To describe the environment, images must be obtained from representative viewpoints. For the purposes of this discussion, we assume that the viewpoints cover the configuration space in a uniform grid. This is a simplifying assumption, not a requirement or constraint. For computational efficiency, the viewpoints are selected such that the camera faces in a consistent orientation. Once the sample images have been acquired, they are used to automatically learn a suitable set of tracked landmarks for subsequent positioning.
The set of tracked landmarks is initialised with one singleton set for
each candidate landmark observed in a selected bootstrap image from
the database. These candidate landmarks, which become the prototypes for
matching, are selected in this manner in order to guarantee uniqueness:
no two landmark candidates will overlap within the same
image. Matching is based on minimising the
Euclidean distance between the principal-component encodings of the
prototype and of the observed candidate landmarks in each image.
Typically, we select as the bootstrap image the one
taken from the camera position closest to the centroid of all visited
camera positions. Given this initial set of prototypes, the candidate
landmarks in each of the remaining images are considered for inclusion
in one of the tracked landmarks. Consideration for inclusion in a set
is based on the following methodology:
The goal of this method is to grow landmark sets as much as possible in configuration space so that a candidate landmark can be matched to the correct target over a large portion of the space. The local search in the neighbourhood of is performed in order to counter the effects of any instabilities in the underlying landmark detector. Figure 4.3 shows a typical landmark set. Each thumbnail image corresponds to the landmark as detected in the image taken at the corresponding grid position in configuration space. Grid positions with no corresponding thumbnail image indicate positions in the configuration space where no landmark candidate was found that matched the prototype. This can occur under three separate conditions: first, no suitable landmark candidate was detected by the landmark detector; second, a landmark candidate was detected but found a better match to a different prototype in the local neighbourhood; or third, a landmark candidate was detected but differed too greatly in appearance from the prototype - that is, the distance in the subspace was greater than the user-defined threshold.
Figure 4.3: A typical landmark set. Each
thumbnail corresponds to the landmark as detected in the image taken
at the corresponding grid position in camera space.
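As an illustration of the two selection rules described above, the following is a minimal sketch and not the implementation used in this work. It assumes that camera positions and principal-component encodings are available as plain sequences of floats. The function names `select_bootstrap_index` and `match_candidates`, and the use of `None` to mark an unmatched candidate, are hypothetical choices made for this example only. The sketch does not reproduce the local neighbourhood search or the full inclusion procedure.

```python
import math


def select_bootstrap_index(camera_positions):
    """Return the index of the camera position closest to the centroid
    of all visited camera positions (hypothetical helper)."""
    n = len(camera_positions)
    dims = len(camera_positions[0])
    centroid = [sum(p[d] for p in camera_positions) / n for d in range(dims)]
    return min(range(n), key=lambda i: math.dist(camera_positions[i], centroid))


def match_candidates(prototype_codes, candidate_codes, threshold):
    """For each candidate encoding, return the index of the nearest
    prototype encoding under Euclidean distance, or None when that
    distance exceeds the user-defined threshold (hypothetical helper)."""
    matches = []
    for code in candidate_codes:
        distances = [math.dist(code, proto) for proto in prototype_codes]
        nearest = min(range(len(distances)), key=distances.__getitem__)
        # A candidate that differs too greatly from every prototype is left unmatched.
        matches.append(nearest if distances[nearest] <= threshold else None)
    return matches


# Example: a 3 x 3 grid of camera positions and two-dimensional encodings.
grid = [(x, y) for x in range(3) for y in range(3)]
bootstrap = select_bootstrap_index(grid)

prototypes = [(0.0, 0.0), (10.0, 0.0)]
candidates = [(0.5, 0.0), (9.0, 1.0), (5.0, 5.0)]
assignments = match_candidates(prototypes, candidates, threshold=2.0)
```

In this example the third candidate lies further than the threshold from both prototypes and is therefore left unmatched, which corresponds to the third condition listed above for a grid position with no thumbnail.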
A tracked landmark is the essential modelling primitive that defines the "map" and is used for subsequent correspondence and position estimation. It should be noted that the tracking method makes no assumptions regarding a landmark's position within the image: landmarks can be matched regardless of their image position, which relaxes constraints that might otherwise be imposed on the pose of the camera.