Cable Mind: UR5e Competition Environment

February 10, 2026

Session Context

Sessions 1 and 2 used a floating 3-DOF gripper to build intuition. That env topped out at 838 successes with 5.7cm median distance. Good for learning MuJoCo and reward shaping, but the competition uses a UR5e arm with a Robotiq gripper. 7-DOF actions, real joint dynamics, wrist-mounted cameras. Time to build the real thing.

New codename: Cable Mind.

MjSpec Composition

No static XML for this env. MuJoCo’s MjSpec API lets you compose models programmatically. Load the UR5e, load the Robotiq 2F-85, attach the gripper to the arm’s attachment_site, then add everything else in Python.

arm = mujoco.MjSpec.from_file(UR5E_XML)
gripper = mujoco.MjSpec.from_file(GRIPPER_XML)
site = arm.site("attachment_site")
arm.attach(gripper, "gripper-", "", site=site)

The attach() call merges the gripper’s kinematic tree into the arm’s, with "gripper-" as a prefix for all gripper bodies/joints to avoid name collisions. After attachment, spec.body("gripper-base") gives you the gripper’s root body for mounting cameras and the cable chain.

Why MjSpec instead of XML? The cable is procedurally generated (5 ball-joint segments), cameras need computed quaternions, and sensor wiring references bodies that only exist after attachment. Doing all of this in Python is cleaner than templating XML.

The composed model: 30 bodies, 20 joints, 7 actuators.

Action Space: Joint Velocity Control

7-dim continuous [-1, 1]:

Dims	What	Details
0-5	Joint velocities	Scaled by `MAX_JOINT_VEL = [1.0, 1.0, 1.0, 1.5, 1.5, 1.5]` rad/s
6	Gripper command	-1 = open, +1 = closed

Joint velocity control uses position actuators under the hood:

joint_vel = action[:6] * MAX_JOINT_VEL
self.data.ctrl[:6] = self.data.qpos[:6] + joint_vel * CONTROL_DT

The arm’s position actuators track ctrl targets, so setting ctrl = current_pos + vel * dt gives smooth velocity control at 20Hz (500Hz physics, 25 substeps). The gripper maps [-1, 1] to [0, 255] for the Robotiq’s position actuator.

Observation Space

State mode has 22 dimensions:

Dims	Source
0-5	Joint positions (6)
6-11	Joint velocities (6)
12-14	Wrist force sensor (3)
15-17	Wrist torque sensor (3)
18	Gripper state (1)
19-21	Target position (3)

Vision mode uses a Dict observation for SB3’s MultiInputPolicy:

{
    "wrist_left":   (84, 84, 3) uint8,
    "wrist_center": (84, 84, 3) uint8,
    "wrist_right":  (84, 84, 3) uint8,
    "state":        (22,) float32
}

Flat Dict, not nested. SB3’s CombinedExtractor needs this. Each camera renders at 84x84, the standard for CNN feature extractors.

The Cable

5 capsule segments connected by ball joints, attached to gripper-base body:

Parameter	Value	Notes
Segments	5	Ball-joint chain
Segment length	2.5cm	Total ~12.5cm + connector
Radius	1.4cm	Thick for visibility (was 0.6cm)
Damping	0.15	Prevents wild oscillation
Stiffness	0.08	Gentle restoring force
Color	Bright yellow `[1.0, 0.85, 0.0]`	High contrast against dark floor

Connector tip: bright blue cylinder ([0.2, 0.4, 0.95]), chunky RJ45-style plug. Socket: bright green box on a dark-green PCB with copper traces and IC chips for visual context.

The first segment offsets z=0.155 from gripper-base to clear the pinch site. Each subsequent segment offsets by CABLE_SEG_LENGTH. Joint qpos entries are quaternions (ball joints), so the cable occupies qpos[14:38]. 6 ball joints x 4 quaternion values.

Files

File	What
`envs/cable_env_ur5e.py`	Competition-spec env, 525 lines, MjSpec composition
`experiments/smoke_test_ur5e.py`	5-test validation suite (state, vision, render, randomization, full episode)

Takeaways

MjSpec composition works well for building complex scenes. Load arm, load gripper, attach(), then add everything else in Python. No XML templating needed.
Joint velocity control via position actuators (ctrl = qpos + vel * dt) gives smooth, stable control at 20Hz without needing explicit velocity actuators.
Going from 3-DOF floating gripper to 7-DOF arm is a different problem. The observation space tripled in size, the action space changed completely, and now the policy has to deal with joint limits and self-collision. Starting fresh was the right call.