Cable Mind: UR5e Competition Environment
Session Context
Sessions 1 and 2 used a floating 3-DOF gripper to build intuition. That env topped out at 838 successes with 5.7cm median distance. Good for learning MuJoCo and reward shaping, but the competition uses a UR5e arm with a Robotiq gripper. 7-DOF actions, real joint dynamics, wrist-mounted cameras. Time to build the real thing.
New codename: Cable Mind.
MjSpec Composition
No static XML for this env. MuJoCo’s MjSpec API lets you compose models programmatically. Load the UR5e, load the Robotiq 2F-85, attach the gripper to the arm’s attachment_site, then add everything else in Python.
arm = mujoco.MjSpec.from_file(UR5E_XML)
gripper = mujoco.MjSpec.from_file(GRIPPER_XML)
site = arm.site("attachment_site")
arm.attach(gripper, "gripper-", "", site=site)
The attach() call merges the gripper’s kinematic tree into the arm’s, with "gripper-" as a prefix for all gripper bodies/joints to avoid name collisions. After attachment, spec.body("gripper-base") gives you the gripper’s root body for mounting cameras and the cable chain.
Why MjSpec instead of XML? The cable is procedurally generated (5 ball-joint segments), cameras need computed quaternions, and sensor wiring references bodies that only exist after attachment. Doing all of this in Python is cleaner than templating XML.
The composed model: 30 bodies, 20 joints, 7 actuators.
Action Space: Joint Velocity Control
7-dim continuous [-1, 1]:
| Dims | What | Details |
|---|---|---|
| 0-5 | Joint velocities | Scaled by MAX_JOINT_VEL = [1.0, 1.0, 1.0, 1.5, 1.5, 1.5] rad/s |
| 6 | Gripper command | -1 = open, +1 = closed |
Joint velocity control uses position actuators under the hood:
joint_vel = action[:6] * MAX_JOINT_VEL
self.data.ctrl[:6] = self.data.qpos[:6] + joint_vel * CONTROL_DT
The arm’s position actuators track ctrl targets, so setting ctrl = current_pos + vel * dt gives smooth velocity control at 20Hz (500Hz physics, 25 substeps). The gripper maps [-1, 1] to [0, 255] for the Robotiq’s position actuator.
Observation Space
State mode has 22 dimensions:
| Dims | Source |
|---|---|
| 0-5 | Joint positions (6) |
| 6-11 | Joint velocities (6) |
| 12-14 | Wrist force sensor (3) |
| 15-17 | Wrist torque sensor (3) |
| 18 | Gripper state (1) |
| 19-21 | Target position (3) |
Vision mode uses a Dict observation for SB3’s MultiInputPolicy:
{
"wrist_left": (84, 84, 3) uint8,
"wrist_center": (84, 84, 3) uint8,
"wrist_right": (84, 84, 3) uint8,
"state": (22,) float32
}
Flat Dict, not nested. SB3’s CombinedExtractor needs this. Each camera renders at 84x84, the standard for CNN feature extractors.
The Cable
5 capsule segments connected by ball joints, attached to gripper-base body:
| Parameter | Value | Notes |
|---|---|---|
| Segments | 5 | Ball-joint chain |
| Segment length | 2.5cm | Total ~12.5cm + connector |
| Radius | 1.4cm | Thick for visibility (was 0.6cm) |
| Damping | 0.15 | Prevents wild oscillation |
| Stiffness | 0.08 | Gentle restoring force |
| Color | Bright yellow [1.0, 0.85, 0.0] | High contrast against dark floor |
Connector tip: bright blue cylinder ([0.2, 0.4, 0.95]), chunky RJ45-style plug. Socket: bright green box on a dark-green PCB with copper traces and IC chips for visual context.
The first segment offsets z=0.155 from gripper-base to clear the pinch site. Each subsequent segment offsets by CABLE_SEG_LENGTH. Joint qpos entries are quaternions (ball joints), so the cable occupies qpos[14:38]. 6 ball joints x 4 quaternion values.
Files
| File | What |
|---|---|
envs/cable_env_ur5e.py | Competition-spec env, 525 lines, MjSpec composition |
experiments/smoke_test_ur5e.py | 5-test validation suite (state, vision, render, randomization, full episode) |
Takeaways
- MjSpec composition works well for building complex scenes. Load arm, load gripper,
attach(), then add everything else in Python. No XML templating needed. - Joint velocity control via position actuators (
ctrl = qpos + vel * dt) gives smooth, stable control at 20Hz without needing explicit velocity actuators. - Going from 3-DOF floating gripper to 7-DOF arm is a different problem. The observation space tripled in size, the action space changed completely, and now the policy has to deal with joint limits and self-collision. Starting fresh was the right call.