GTSAM Geometry Conventions
This post describes the 3D geometry conventions used in the popular GTSAM factor-graph optimization library. It also describes best practices for naming your 3D point and pose variables so that you don't get lost in a web of inverse transformations.
Points
- Always name your 3D points the way you would write them on paper. A point $^cX$ in the camera coordinate system $c$ is named `cX` (see the sketch below).
- 3D points use uppercase letters, 2D points use lowercase letters.
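For example, a hypothetical pair of variables that follows this convention (the names and values are purely illustrative):

```cpp
// 3D point X expressed in the camera frame c: uppercase, prefixed with its frame.
Point3 cX(0.1, -0.2, 2.5);
// 2D image point x (a detection in pixels): lowercase.
Point2 x(320.0, 240.0);
```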
Pose
Name your pose variables the way you would write them on paper. The pose $^wT_c$ of camera $c$ in the world coordinate frame $w$ is named `wTc`.

In GTSAM jargon, `c` is the pose coordinate frame and `w` is the world coordinate frame.
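As a small sketch, the pose of camera $c$ in the world frame $w$ could be built and named like this (the rotation and translation values are arbitrary):

```cpp
// wTc: pose of the camera frame c expressed in the world frame w,
// built from a rotation wRc and a translation wtc.
Rot3 wRc = Rot3::Yaw(0.5);
Point3 wtc(1.0, 0.0, 2.0);
Pose3 wTc(wRc, wtc);
```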
Composing Poses
Math: $^oT_c =~^oT_w~\cdot~^wT_c$.
GTSAM Code:
```cpp
Pose3 oTw = Pose3(...);
Pose3 wTc = Pose3(...);
Pose3 oTc = oTw.compose(wTc);
```
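The naming also makes inverse compositions easy to sanity-check: adjacent frame letters must match and "cancel". A small sketch continuing from the snippet above:

```cpp
// Inverting a pose swaps the frame letters: wTo = (oTw)^-1.
Pose3 wTo = oTw.inverse();
// If a transform arrives in the "wrong" direction, invert it so the
// inner frame letters match before composing: oTc = oTw * (cTw)^-1.
Pose3 cTw = wTc.inverse();
Pose3 oTc_check = oTw.compose(cTw.inverse());  // same as oTc above
```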
Transforming Points From Pose Coordinates
Math: $^w\homo{X} =~^wT_c~\cdot~^c\homo{X}$
GTSAM Code:
```cpp
Point3 wX = wTc.transformFrom(cX);
```
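This is the usual rigid-body action of the pose on the point; a sketch of the same computation done by hand with the rotation and translation of `wTc`:

```cpp
// wX = wRc * cX + wtc
Point3 wX_manual = wTc.rotation().rotate(cX) + wTc.translation();
```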
Transforming Points To Pose Coordinates
Math: $^c\homo{X} =~\left(^wT_c\right)^{-1}~\cdot~^w\homo{X}$
GTSAM Code:
```cpp
Point3 cX = wTc.transformTo(wX);
```
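Since `transformTo` is just the inverse mapping of `transformFrom`, the following sketch (reusing the variables above) computes the same point:

```cpp
// Equivalent: cTw = (wTc)^-1, then map the world point into camera coordinates.
Point3 cX_check = wTc.inverse().transformFrom(wX);
```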
Cameras
The GTSAM pinhole camera classes (e.g. `PinholeBase`) internally use `transformTo()` to transform 3D points into camera coordinates, so you should pass the pose of the camera w.r.t. the world when constructing the object:
```cpp
Pose3 wTc = Pose3(...);
SimpleCamera cam(wTc, K);
```
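As a quick sanity check of the convention, the camera can project a world point directly; a sketch assuming `wX` is a 3D point in world coordinates (e.g. from the snippet above):

```cpp
// project() maps wX into the camera frame via transformTo(), then applies K.
Point2 px = cam.project(wX);
```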
Now you can use `cam` in the `TriangulationFactor`, for example. Other factors, like the `GenericProjectionFactor`, use the same convention:
\(\begin{align*}
^{sensor}\homo{X}
&=~^{sensor}T_{world}~\cdot~^{world}\homo{X}\\
&=~^{sensor}T_{body}~\cdot~^{body}T_{world}~\cdot~^{world}\homo{X}\\
&= \left(^{world}T_{body}~\cdot~^{body}T_{sensor}\right)^{-1}~\cdot~^{world}\homo{X}
\end{align*}\)
GTSAM Code:
```cpp
Pose3 body_T_sensor = ...;
Point2 sensor_p = ...;  // 2D point in the image
// In the following factor,
// Symbol('T', i) is world_T_body for the i'th frame, and
// Symbol('X', j) is the j'th 3D point in world coordinates, i.e. world_Xj.
auto f = GenericProjectionFactor<Pose3, Point3, Cal3_S2>(
    sensor_p, noise, Symbol('T', i), Symbol('X', j), K, body_T_sensor);
```
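As a hedged sketch of how such a factor is typically used (the graph and initial-values setup here is illustrative, not from the post):

```cpp
NonlinearFactorGraph graph;
graph.add(f);  // the projection factor constructed above

// Each key referenced by the factor needs an initial estimate.
Values initial;
initial.insert(Symbol('T', i), Pose3());                 // guess for world_T_body
initial.insert(Symbol('X', j), Point3(0.0, 0.0, 1.0));   // guess for world_Xj
```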
It will transform the world 3D point $^{world}\homo{X}$ into sensor coordinates like so:
```cpp
Pose3 world_T_sensor = world_T_body.compose(body_T_sensor);
Point3 sensor_X = world_T_sensor.transformTo(world_X);
```
and then project it into the image using the intrinsics and compare it to the detection `sensor_p`.
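Conceptually, then, the factor's residual looks something like the following sketch (assuming `K` is a shared pointer to a `Cal3_S2` calibration, as `GenericProjectionFactor` expects, and `world_X` is the current point estimate):

```cpp
// Camera located at the sensor pose, with the calibration K.
PinholeCamera<Cal3_S2> sensorCam(world_T_sensor, *K);
// Project the world point and compare against the measured detection.
Point2 predicted = sensorCam.project(world_X);
Point2 reprojection_error = predicted - sensor_p;
```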