Cable Mind: Camera Placement & Scene Design
The Problem
The policy needs three genuinely different viewpoints for depth cues. Three copies of the same angle are wasted information.
This took the most iteration of anything in the UR5e env.
Camera Iteration
Attempt 1: All three on wrist_3_link
Mounted cameras on the wrist with small X offsets. The wrist flange blocks the center camera entirely. The gripper body sits between the camera and the cable.
Attempt 2: All three on gripper-base
Moved cameras to the gripper body, past the wrist flange. Could see the cable now. But all three had nearly the same view direction ([0, 0, 1, 0] quat, looking along +Z). Three near-identical views with slightly different X offsets. Useless for triangulation.
Attempt 3: Angled 45 degrees on gripper-base
Kept center looking straight, angled left/right 45 degrees inward with wider X offsets. The arm body dominated both side views. At 45 degrees inward, the camera looks right at the wrist links.
Attempt 4: Reduced to 30 degrees
Less arm in frame, but still too much metal and not enough workspace.
Final: Hybrid mounting
Center camera on gripper-base, looking straight down the cable axis. This is the insertion view. It moves with the arm, always showing connector-to-socket alignment.
cam_c = grip_body.add_camera()
cam_c.pos = [0, 0.02, 0.10]
cam_c.quat = [0, 0, 1, 0] # 180 deg around Y, look along +Z
cam_c.fovy = 90
Left and right cameras fixed on worldbody, aimed at the socket area from two sides. These don’t move with the arm. They always show the cable-to-socket gap in profile, giving the policy stable depth/distance cues regardless of arm pose.
cam_l = spec.worldbody.add_camera()
cam_l.pos = [-0.40, 0.30, 0.20]
cam_l.quat = [0.7138, 0.4627, -0.2860, -0.4412] # look-at socket
cam_l.fovy = 50
Quaternions were computed via a look-at function targeting [-0.1, 0.45, 0.05] (slightly above the socket) with world +Z as up. Right camera mirrors the left at [0.20, 0.30, 0.20].
Body-mounted cameras are great for egocentric views (looking where you’re going), but terrible for external perspective. Fixed cameras solve the triangulation problem because they always frame the workspace the same way.
MuJoCo Camera Quaternions
MuJoCo cameras look along -Z in their own frame, with +Y as up. The quat field rotates this default orientation into world coordinates. No xyaxes support in MjSpec; quaternions only.
For a camera at position P looking at target T:
- forward = normalize(T - P) (this is where -Z_cam should point)
- right = normalize(forward x world_up) (camera X axis)
- up_cam = right x forward (camera Y axis)
- Rotation matrix R = [right | up_cam | -forward] (columns)
- Convert R to quaternion [w, x, y, z]
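The steps above can be sketched as a small standard-library Python helper. This is a minimal sketch, not the project's actual utility: the names look_at_quat, _normalize, and _cross are hypothetical, and the matrix-to-quaternion step uses the usual trace-based branch selection, which the notes do not specify.

```python
import math


def _normalize(v):
    """Scale a 3-vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    if n == 0.0:
        raise ValueError("zero-length vector")
    return tuple(c / n for c in v)


def _cross(a, b):
    """Right-handed cross product a x b."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def look_at_quat(pos, target, world_up=(0.0, 0.0, 1.0)):
    """Return [w, x, y, z] for a camera at pos looking at target.

    Follows the MuJoCo convention: the camera looks along its own -Z
    with +Y up. Raises ValueError if the view direction is parallel
    to world_up, because the right vector is then undefined.
    """
    forward = _normalize(tuple(t - p for t, p in zip(target, pos)))
    right = _normalize(_cross(forward, world_up))   # camera X axis
    up_cam = _cross(right, forward)                 # camera Y axis
    back = tuple(-c for c in forward)               # camera Z axis
    # r[i][j] is row i, column j of R = [right | up_cam | -forward]
    r = [[right[i], up_cam[i], back[i]] for i in range(3)]
    trace = r[0][0] + r[1][1] + r[2][2]
    if trace > 0:
        s = 2.0 * math.sqrt(1.0 + trace)
        w, x = 0.25 * s, (r[2][1] - r[1][2]) / s
        y, z = (r[0][2] - r[2][0]) / s, (r[1][0] - r[0][1]) / s
    elif r[0][0] > r[1][1] and r[0][0] > r[2][2]:
        s = 2.0 * math.sqrt(1.0 + r[0][0] - r[1][1] - r[2][2])
        w, x = (r[2][1] - r[1][2]) / s, 0.25 * s
        y, z = (r[0][1] + r[1][0]) / s, (r[0][2] + r[2][0]) / s
    elif r[1][1] > r[2][2]:
        s = 2.0 * math.sqrt(1.0 + r[1][1] - r[0][0] - r[2][2])
        w, x = (r[0][2] - r[2][0]) / s, (r[0][1] + r[1][0]) / s
        y, z = 0.25 * s, (r[1][2] + r[2][1]) / s
    else:
        s = 2.0 * math.sqrt(1.0 + r[2][2] - r[0][0] - r[1][1])
        w, x = (r[1][0] - r[0][1]) / s, (r[0][2] + r[2][0]) / s
        y, z = (r[1][2] + r[2][1]) / s, 0.25 * s
    return [w, x, y, z]


if __name__ == "__main__":
    socket_view = [-0.1, 0.45, 0.05]
    # Left camera: should land close to the cam_l quaternion in these notes.
    print(look_at_quat([-0.40, 0.30, 0.20], socket_view))
    # Right camera at the mirrored position.
    print(look_at_quat([0.20, 0.30, 0.20], socket_view))
```

A camera looking straight down has its view direction parallel to world +Z, so this helper raises there. That case needs a different up vector, or the identity quaternion directly.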
The overhead camera is trivial: quat = [1, 0, 0, 0] (identity) already looks along -Z. The side and third-person cameras required manual quaternion work. The third-person hero shot at [0.8, -0.5, 0.8] with quat = [0.774, 0.504, 0.209, 0.321] took a few tries to get right.
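When a hand-tuned quaternion takes a few tries, one way to shorten the loop is to check where it points before rendering. The sketch below rotates the camera's default -Z view direction by the quaternion and reports the angle to an intended target. The function names camera_forward and aim_error_deg are hypothetical helpers, not part of MuJoCo.

```python
import math


def camera_forward(quat):
    """World-frame view direction for a MuJoCo-style camera quat [w, x, y, z]."""
    w, x, y, z = quat
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n
    # Third column of the rotation matrix is where camera +Z ends up;
    # the camera looks along -Z, so negate it.
    zx = 2.0 * (x * z + w * y)
    zy = 2.0 * (y * z - w * x)
    zz = 1.0 - 2.0 * (x * x + y * y)
    return (-zx, -zy, -zz)


def aim_error_deg(pos, quat, target):
    """Angle in degrees between the view direction and the ray pos -> target."""
    fwd = camera_forward(quat)
    to_target = [t - p for t, p in zip(target, pos)]
    norm = math.sqrt(sum(c * c for c in to_target))
    cos_angle = sum(f * c for f, c in zip(fwd, to_target)) / norm
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard acos against rounding
    return math.degrees(math.acos(cos_angle))


if __name__ == "__main__":
    # Identity looks along -Z; [0, 0, 1, 0] flips that to +Z.
    print(camera_forward([1, 0, 0, 0]))
    print(camera_forward([0, 0, 1, 0]))
    # Left camera from these notes against its look-at target.
    print(aim_error_deg([-0.40, 0.30, 0.20],
                        [0.7138, 0.4627, -0.2860, -0.4412],
                        [-0.1, 0.45, 0.05]))
```

This only checks the optical axis. Roll around that axis (whether the image is upright) still has to be judged from a render.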
Workspace Scene
Added visual context beyond floor + socket:
- Server tray: dark-green PCB base ([0.05, 0.28, 0.08]) at [-0.1, 0.45, 0.003]
- Copper traces: two gold strips across the board
- IC chips: two black boxes at different positions
- Socket: bright green box with dark inset “hole”, sitting on the PCB
All decorative geoms have contype=0, conaffinity=0. No collision, pure visuals. The socket body position is what gets randomized during domain randomization.
Domain Randomization
Three randomization axes per episode:
| What | Range | Why |
|---|---|---|
| Target position | +/-3cm XY, +/-1cm Z | Prevents memorizing one socket location |
| Arm pose | +/-5 deg per joint from home | Different starting configurations |
| Cable damping | +/-20% | Cable dynamics variation |
The socket body position (model.body_pos[socket_id]) is moved to match the randomized target, so the visual socket always corresponds to the reward target.
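A per-episode sampler for the three axes in the table might look like the sketch below. It makes several assumptions that the notes do not confirm: offsets are drawn uniformly, the damping range is a multiplicative scale on a nominal value, and the nominal target, home pose, and damping numbers in the usage block are placeholders. The name sample_episode is hypothetical.

```python
import math
import random


def sample_episode(rng, nominal_target, home_qpos, nominal_damping):
    """Draw one set of randomized episode parameters.

    Ranges follow the table: +/-3 cm in X and Y, +/-1 cm in Z,
    +/-5 degrees per joint, +/-20% cable damping.
    Uniform sampling is an assumption.
    """
    target = [
        nominal_target[0] + rng.uniform(-0.03, 0.03),
        nominal_target[1] + rng.uniform(-0.03, 0.03),
        nominal_target[2] + rng.uniform(-0.01, 0.01),
    ]
    joint_range = math.radians(5.0)
    qpos = [q + rng.uniform(-joint_range, joint_range) for q in home_qpos]
    damping = nominal_damping * rng.uniform(0.8, 1.2)
    return {"target": target, "qpos": qpos, "damping": damping}


if __name__ == "__main__":
    rng = random.Random(0)  # seeded for reproducible episodes
    # Placeholder nominal values, not taken from the environment.
    params = sample_episode(
        rng,
        nominal_target=[-0.1, 0.45, 0.03],
        home_qpos=[0.0, -1.57, 1.57, -1.57, -1.57, 0.0],
        nominal_damping=0.1,
    )
    # The caller would then copy params["target"] into
    # model.body_pos[socket_id] so the visual socket follows the target.
    print(params)
```

Passing the random.Random instance in, rather than using the module-level functions, keeps episodes reproducible from a seed.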
Takeaways
- Camera placement is hard. Five iterations from wrist-mounted to the final hybrid setup. Body-mounted cameras give good egocentric views but bad external perspective. Fixed world cameras solve triangulation.
- MuJoCo quaternion cameras are annoying. No xyaxes in MjSpec, so you have to compute look-at quaternions manually. Worth writing a utility function once and reusing it.
- Decorative geometry matters for vision policies. A bright green socket on a bare floor is ambiguous from certain angles. The PCB, traces, and chips give the CNN texture cues to anchor the socket in the scene.