We consider the problem of localizing a robot in an initially unfamiliar environment from visual input. The robot is not given a map of the environment, but it does have access to a collection of training examples, each of which specifies the video image observed when the robot is at a particular position and orientation. We address two variants of this problem: estimating the translation of a moving robot whose orientation is known, and estimating both the translation and the orientation of a mobile robot. Performing scene reconstruction to build a metric map of the environment from video images alone is difficult. We avoid this by using an approach in which the robot learns to convert a set of image measurements into a representation of its pose (position and orientation). This yields a metric estimate of the robot's location within the region covered by the statistical map we build, and localization can be performed on-line without a prior location estimate. The conversion from visual data to camera pose is implemented with a multi-layer neural network trained by backpropagation. A key element of the approach is an inconsistency measure used to reject erroneous data and to estimate components of the pose vector. The experimental results reported in this paper suggest that the technique is accurate and flexible, while its on-line computational cost is very low.
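To make the core idea concrete, the following is a minimal sketch (not the paper's implementation) of a multi-layer network, trained with backpropagation, that regresses a pose vector (x, y, theta) from a vector of image measurements. The layer sizes, learning rate, and synthetic training data are illustrative assumptions only; the paper's actual network architecture and image measurements are not specified here.

```python
# Sketch: backpropagation training of a small network mapping image
# measurements to a pose estimate. All sizes and data are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

# Assumed dimensions: 64 image measurements in, 3 pose components out.
n_in, n_hidden, n_out = 64, 32, 3
W1 = rng.normal(0, 0.1, (n_in, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(0, 0.1, (n_hidden, n_out))
b2 = np.zeros(n_out)

def forward(x):
    """One hidden layer with tanh, linear output (pose regression)."""
    h = np.tanh(x @ W1 + b1)
    return h, h @ W2 + b2

def train_step(x, pose_target, lr=1e-2):
    """One backpropagation step on the squared pose error."""
    global W1, b1, W2, b2
    h, pose = forward(x)
    err = pose - pose_target              # gradient of 0.5*||pose - target||^2
    dW2 = np.outer(h, err)
    db2 = err
    dh = (err @ W2.T) * (1.0 - h**2)      # backprop through tanh
    dW1 = np.outer(x, dh)
    db1 = dh
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
    return 0.5 * float(err @ err)

# Training pairs of (image measurements, known pose), analogous to the
# paper's training examples; here random data for illustration only.
X = rng.normal(size=(500, n_in))
P = (X @ rng.normal(size=(n_in, n_out))) * 0.1  # fake "ground truth" poses
for epoch in range(20):
    loss = sum(train_step(x, p) for x, p in zip(X, P)) / len(X)
print("final mean loss:", loss)
```

At run time, localization then amounts to a single forward pass over the current image measurements, which is consistent with the low on-line computational cost claimed above; the inconsistency-based rejection of erroneous data is a separate step not shown in this sketch.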