We consider the problem of localizing a robot in an initially unfamiliar environment from visual input. The robot is not given a map of the environment, but it does have access to a collection of training examples, each of which specifies the video image observed when the robot is at a particular position and orientation. We address two variants of this problem: estimating the translation of a moving robot whose orientation is known, and estimating both the translation and the orientation of a mobile robot. Performing scene reconstruction to build a metric map of the environment from video images alone is difficult. We avoid this by using an approach in which the robot learns to convert a set of image measurements into a representation of its pose (position and orientation). This yields a metric estimate of the robot's location within the region covered by the statistical map we build, and localization can be performed on-line without a prior location estimate. The conversion from visual data to camera pose is implemented with a multi-layer neural network trained by backpropagation. A key element of the approach is an inconsistency measure used to reject erroneous data and to estimate components of the pose vector. The experimental results reported in this paper suggest that the technique is accurate and flexible, while its on-line computational cost is very low.
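To make the core idea concrete, the following is a minimal sketch (not the paper's implementation) of a multi-layer network, trained with backpropagation, that regresses a pose vector (x, y, theta) from a vector of image measurements. The layer sizes, learning rate, and synthetic training data are illustrative assumptions only; the paper's actual network architecture and image measurements are not specified here.

```python
# Sketch: backpropagation training of a small network mapping image
# measurements to a pose estimate. All sizes and data are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

# Assumed dimensions: 64 image measurements in, 3 pose components out.
n_in, n_hidden, n_out = 64, 32, 3
W1 = rng.normal(0, 0.1, (n_in, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(0, 0.1, (n_hidden, n_out))
b2 = np.zeros(n_out)

def forward(x):
    """One hidden layer with tanh, linear output (pose regression)."""
    h = np.tanh(x @ W1 + b1)
    return h, h @ W2 + b2

def train_step(x, pose_target, lr=1e-2):
    """One backpropagation step on the squared pose error."""
    global W1, b1, W2, b2
    h, pose = forward(x)
    err = pose - pose_target              # gradient of 0.5*||pose - target||^2
    dW2 = np.outer(h, err)
    db2 = err
    dh = (err @ W2.T) * (1.0 - h**2)      # backprop through tanh
    dW1 = np.outer(x, dh)
    db1 = dh
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
    return 0.5 * float(err @ err)

# Training pairs of (image measurements, known pose), analogous to the
# paper's training examples; here random data for illustration only.
X = rng.normal(size=(500, n_in))
P = (X @ rng.normal(size=(n_in, n_out))) * 0.1  # fake "ground truth" poses
for epoch in range(20):
    loss = sum(train_step(x, p) for x, p in zip(X, P)) / len(X)
print("final mean loss:", loss)
```

At run time, localization then amounts to a single forward pass over the current image measurements, which is consistent with the low on-line computational cost claimed above; the inconsistency-based rejection of erroneous data is a separate step not shown in this sketch.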