
Full Results


# Evaluation Overview

We evaluated TiPToP on 28 manipulation tasks in 3 settings: (i) simulation using IsaacSim, (ii) a real-world DROID hardware setup operated internally by TiPToP's developers at MIT, and (iii) a separate DROID setup operated by an external evaluation team at the University of Pennsylvania not involved in TiPToP's development. Below we present detailed video demonstrations for tasks evaluated on the internal DROID setup, along with complete results over all 28 tasks in the summary table.

Experimental Setup

  • Tasks: 28 tasks total (5 simulation, 8 internal DROID, 15 external DROID)
  • Trials: 10 trials per simulation task, 5 trials per real-world task
  • Total trials: 165 per method (50 simulation, 40 internal, 75 external)
  • Hardware: Franka Emika Panda FR3 with Robotiq 2F-85 gripper
  • Cameras: 1 x ZED Mini wrist camera, 1 x ZED 2i external camera (not used by TiPToP)

# Summary Results

Below is a comprehensive table of all 28 evaluation tasks across simulation, internal DROID, and external DROID settings. Tasks are organized by category (Simple, Distractor, Semantic, Multi-step). Video demonstrations for the internal DROID tasks (marked †) appear in the Video Results section below. For the language prompt and progress metric of each task, see the Evaluation Scene Details table.

Key: SR = Success Rate, TP = Task Progress. † indicates tasks evaluated by the system designers on the internal DROID setup. (sim) indicates simulation tasks in IsaacSim. Unmarked tasks were evaluated by an external team at the University of Pennsylvania on a separate DROID setup. Bold values indicate better performance on that metric. Category and overall rows pool SR over all trials and report TP as the unweighted mean of the per-task TP values.

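| Scene | TiPToP SR | TiPToP TP | \(\pi_{0.5}\)-DROID SR | \(\pi_{0.5}\)-DROID TP |
|---|---|---|---|---|
| *Simple Tasks* | | | | |
| Cube → bowl (sim) | 5/10 | 72.5% | **8/10** | **90%** |
| Can → mug (sim) | **9/10** | **97.5%** | 2/10 | 50% |
| Banana → bin (sim) | 0/10 | 70% | **9/10** | **97.5%** |
| Marker → tray | 3/5 | 80% | **5/5** | **100%** |
| Crackers → tray † | **5/5** | **100%** | 3/5 | 60% |
| Simple total | 22/40 | **84%** | **27/40** | 79.5% |
| *Distractor Tasks* | | | | |
| Meat can → sugar box (sim) | **5/10** | **72.5%** | 0/10 | 5% |
| Coffee capsules → plate | **4/5** | **90%** | 2/5 | 58% |
| Turkish figs → plate | **3/5** | **64%** | 2/5 | 52% |
| Cashews → plate | 0/5 | **16%** | 0/5 | 12% |
| Red cubes → plate | 1/5 | 50% | **5/5** | **92%** |
| Fish → box | **4/5** | **80%** | 0/5 | 10% |
| Crackers → tray (medium) † | **5/5** | **100%** | 3/5 | 80% |
| PB crackers → tray (hard) † | **5/5** | **100%** | 0/5 | 20% |
| Distractor total | **27/45** | **71.6%** | 12/45 | 41.1% |
| *Semantic Tasks* | | | | |
| Toy → matching plate | **4/5** | **90%** | 1/5 | 62% |
| Creeper → plate | **3/5** | **70%** | 0/5 | 0% |
| Largest toy → plate | **3/5** | **70%** | 0/5 | 20% |
| Red A → color pile | **5/5** | **100%** | 3/5 | 80% |
| Banana → box | **2/5** | **40%** | 0/5 | 30% |
| N block → indicated cup | **3/5** | **80%** | 2/5 | 60% |
| Sort blocks by color | **5/5** | **100%** | 0/5 | 32% |
| Banana → matching plate | 1/5 | 20% | **4/5** | **90%** |
| Semantic total | **26/40** | **71.3%** | 10/40 | 46.8% |
| *Multi-step Tasks* | | | | |
| Color cubes → bowl (sim) | **9/10** | **94.6%** | 0/10 | 24.2% |
| AirPods → cup | 1/5 | 55% | **3/5** | **75%** |
| Pack pods → tray † | **4/5** | **80%** | 1/5 | 65.7% |
| Pack pods → tray (obs.) † | **1/5** | **67%** | 0/5 | 64% |
| Aleve bottle → tray (obs.) † | **4/5** | **80%** | 2/5 | 70% |
| Three marbles → cup † | **2/5** | **80%** | 0/5 | 6.7% |
| Marbles + cable † | **2/5** | **70%** | 0/5 | 60% |
| Multi-step total | **23/40** | **75.2%** | 6/40 | 52.2% |
| Overall | **98/165** | **74.6%** | 55/165 | 52.4% |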
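The subtotal rows can be checked against the per-task rows. The sketch below is an illustration, not the authors' evaluation code: it pools SR over trials and averages TP once per task, which reproduces the Simple subtotal (22/40, 84%); weighting TP by trial count instead would give 82.5%, which does not match the table.

```python
# Per-task rows for the "Simple" category, copied from the summary table:
# (task, TiPToP successes, trials, TiPToP task progress in %)
SIMPLE_TIPTOP = [
    ("Cube -> bowl (sim)", 5, 10, 72.5),
    ("Can -> mug (sim)", 9, 10, 97.5),
    ("Banana -> bin (sim)", 0, 10, 70.0),
    ("Marker -> tray", 3, 5, 80.0),
    ("Crackers -> tray", 5, 5, 100.0),
]


def aggregate(rows):
    """Return (successes, trials, unweighted mean TP) for one category."""
    successes = sum(r[1] for r in rows)            # SR pooled over all trials
    trials = sum(r[2] for r in rows)
    mean_tp = sum(r[3] for r in rows) / len(rows)  # each task counts once
    return successes, trials, mean_tp


def trial_weighted_tp(rows):
    """Alternative aggregation: weight each task's TP by its trial count."""
    return sum(r[2] * r[3] for r in rows) / sum(r[2] for r in rows)


s, n, tp = aggregate(SIMPLE_TIPTOP)
print(f"{s}/{n}, TP {tp:.1f}%")  # 22/40, TP 84.0%
print(f"trial-weighted TP {trial_weighted_tp(SIMPLE_TIPTOP):.1f}%")  # trial-weighted TP 82.5%
```

The same two functions applied to the other categories reproduce their subtotal rows as well.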

# Execution Time Comparison

The table below shows average execution times on the 6 representative scenes where both methods succeeded. TiPToP's planning time is shown separately to illustrate the breakdown between planning and execution.

Key observation: TiPToP is faster than \(\pi_{0.5}\text{-DROID}\) in 5 of the 6 scenes and ties it on the sixth (Aleve bottle → tray (obs.), 31.2s each). Even though TiPToP spends significant upfront time on perception and planning, \(\pi_{0.5}\text{-DROID}\) often spends considerable time idling or re-grasping objects.

| Scene | \(\pi_{0.5}\)-DROID Time | TiPToP Total Time | TiPToP Planning Time |
|---|---|---|---|
| *Simulation (IsaacSim)* | | | |
| Cube → bowl | 27.4s | 17.9s | 9.7s |
| Can → mug | 41.0s | 18.6s | 9.2s |
| *Real-World (Internal DROID)* | | | |
| Crackers → tray (simple) | 32.2s | 14.9s | 7.0s |
| Crackers → tray (medium) | 45.2s | 14.9s | 7.3s |
| Pack pods → tray | 53.4s | 47.0s | 20.5s |
| Aleve bottle → tray (obs.) | 31.2s | 31.2s | 16.4s |
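The timing table lends itself to a quick breakdown. The sketch below is illustrative only and assumes that TiPToP's total time already includes its planning time (planning is smaller than the total in every row, but the page does not state this outright); under that assumption it derives TiPToP's non-planning time and the ratio of \(\pi_{0.5}\)-DROID time to TiPToP total time per scene.

```python
# (scene, pi05-DROID time, TiPToP total time, TiPToP planning time), in seconds,
# copied from the execution-time table.
TIMES = [
    ("Cube -> bowl", 27.4, 17.9, 9.7),
    ("Can -> mug", 41.0, 18.6, 9.2),
    ("Crackers -> tray (simple)", 32.2, 14.9, 7.0),
    ("Crackers -> tray (medium)", 45.2, 14.9, 7.3),
    ("Pack pods -> tray", 53.4, 47.0, 20.5),
    ("Aleve bottle -> tray (obs.)", 31.2, 31.2, 16.4),
]


def breakdown(scene, pi05_s, total_s, planning_s):
    """Split TiPToP's total into planning and the remainder, and compare."""
    # Assumption: total_s already includes planning_s.
    execution_s = round(total_s - planning_s, 1)
    speedup = round(pi05_s / total_s, 2)  # > 1 means TiPToP finished sooner
    return scene, execution_s, speedup


for row in TIMES:
    print(breakdown(*row))

# Count scenes where TiPToP is strictly faster (ties do not count).
faster = sum(1 for _, pi05_s, total_s, _ in TIMES if total_s < pi05_s)
print(f"TiPToP strictly faster in {faster} of {len(TIMES)} scenes")
```

The last line prints `TiPToP strictly faster in 5 of 6 scenes`, because the Aleve scene is a tie at 31.2s.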

# Video Results

Below we show the videos for the 8 tasks evaluated on our internal DROID setup, where we have side-by-side video recordings of both systems. For results over all 28 tasks (including simulation and external evaluation), see the summary results table above.

# Crackers → tray

"place the crackers onto the tray"

A simple pick-and-place task where the robot must pick up a cracker box and place it on a designated target surface. This serves as a baseline test for basic manipulation capabilities.

Observations

TiPToP: Achieved 100% success rate (5/5) with consistent execution times averaging 14.9s. All trials completed efficiently with reliable grasp planning and placement.

\(\pi_{0.5}\text{-DROID}\): 60% success rate (3/5). Common failure modes included idling with no progress and timing out at the 120s limit. When successful, execution was slower (avg 32.2s) and less consistent. Some trials showed late progress around the timeout period.

Trial 1
Ours: ✓ 1.0 Successful completion
π₀.₅: ✗ 0.0 Robot just idles
Ours: 1.0 progress - Successful completion. π₀.₅: 0.0 progress - Robot just idles with no progress.
Trial 2
Ours: ✓ 1.0 Successful completion
π₀.₅: ✗ 0.0 Approaches crackers only near 120s
Ours: 1.0 progress - Successful completion. π₀.₅: 0.0 progress - π₀.₅ only approaches the crackers at around 120s and attempts a pick-and-place, but fails the task.
Trial 3
Ours: ✓ 1.0 Successful completion
π₀.₅: ✓ 1.0 After initial idling
Ours: 1.0 progress - Successful completion. π₀.₅: 1.0 progress - After some initial idling, π₀.₅ successfully completes the task.
Trial 4
Ours: ✓ 1.0 Successful completion
π₀.₅: ✓ 1.0 After initial idling
Ours: 1.0 progress - Successful completion. π₀.₅: 1.0 progress - After some initial idling, π₀.₅ successfully completes the task.
Trial 5
Ours: ✓ 1.0 Successful completion
π₀.₅: ✓ 1.0 Carton overturns slightly
Ours: 1.0 progress - Successful completion. π₀.₅: 1.0 progress - π₀.₅ succeeds, though the carton overturns slightly during placement.

# Crackers → tray (medium)

"place the crackers onto the tray"

Place a cracker box onto a tray in the presence of medium clutter (a medicine box and a strawberry). The robot must identify the correct target object and avoid disturbing distractors.

Observations

TiPToP: Perfect success rate (5/5) with average time of 14.9s. Consistently identified and manipulated the correct object despite distractors.

\(\pi_{0.5}\text{-DROID}\): 60% success (3/5) with average time of 45.2s for successful trials. Frequent confusion about task objectives, often manipulating wrong objects (strawberry, medicine box) or placing items incorrectly. One trial showed the tray falling over, and another resulted in the crackers being thrown off the table.

Trial 1
Ours: ✓ 1.0 Successful completion
π₀.₅: ✓ 1.0 Medicine box and strawberry first
Ours: 1.0 progress - Successful completion. π₀.₅: 1.0 progress - Succeeds, though it first puts the medicine box and strawberry into the tray before placing the crackers.
Trial 2
Ours: ✓ 1.0 Successful completion
π₀.₅: ✓ 1.0 Knocks container over, then recovers
Ours: 1.0 progress - Successful completion. π₀.₅: 1.0 progress - Succeeds at first, then knocks the container over, then fixes the placement and picks up the strawberry.
Trial 3
Ours: ✓ 1.0 Successful completion
π₀.₅: ~ 0.5 Throws crackers off table
Ours: 1.0 progress - Successful completion. π₀.₅: 0.5 progress - Puts the strawberry into the tray, then picks up the crackers and throws them off the table, and finally puts the medicine box into the tray instead.
Trial 4
Ours: ✓ 1.0 Successful completion
π₀.₅: ✓ 1.0 Tray falls over afterward
Ours: 1.0 progress - Successful completion. π₀.₅: 1.0 progress - Successfully puts the crackers into the tray, though the tray falls over afterward.
Trial 5
Ours: ✓ 1.0 Successful completion
π₀.₅: ~ 0.5 Placed on medicine box, not in tray
Ours: 1.0 progress - Successful completion. π₀.₅: 0.5 progress - Picks up the crackers but puts them on the medicine box instead of in the tray as required.

# PB crackers → tray (hard)

"place the peanut butter crackers onto the tray"

A more challenging version with heavy clutter: place the cracker box on the tray while navigating around multiple distractor objects including medicine boxes, popcorn containers, and other items that significantly crowd the workspace.

Observations

TiPToP: Maintained 100% success (5/5) even with heavy clutter, averaging 15.2s. Demonstrates robust planning in crowded scenes.

\(\pi_{0.5}\text{-DROID}\): Complete failure (0/5). The VLA struggled significantly with the increased clutter. Common failures included picking the crackers but not placing them in the tray, dropping objects on other items (e.g., popcorn), or showing no progress at all. The presence of many distractors appeared to overwhelm the policy.

Trial 1
Ours: ✓ 1.0 Successful completion
π₀.₅: ~ 0.5 Picks crackers, doesn't place in tray
Ours: 1.0 progress - Successful completion. π₀.₅: 0.5 progress - Picks the crackers but doesn't put them in the tray.
Trial 2
Ours: ✓ 1.0 Successful completion
π₀.₅: ✗ 0.0 Doesn't even pick crackers
Ours: 1.0 progress - Successful completion. π₀.₅: 0.0 progress - Doesn't even pick the crackers.
Trial 3
Ours: ✓ 1.0 Successful completion
π₀.₅: ~ 0.5 Drops on popcorn instead
Ours: 1.0 progress - Successful completion. π₀.₅: 0.5 progress - Picks up the crackers but drops them on the popcorn instead of placing them in the tray.
Trial 4
Ours: ✓ 1.0 Successful completion
π₀.₅: ✗ 0.0 Doesn't do anything
Ours: 1.0 progress - Successful completion. π₀.₅: 0.0 progress - Doesn't do anything.
Trial 5
Ours: ✓ 1.0 Successful completion
π₀.₅: ✗ 0.0 Doesn't do anything
Ours: 1.0 progress - Successful completion. π₀.₅: 0.0 progress - Doesn't do anything.

# Pack pods → tray

"pack the coffee pods onto the rectangular tray"

Pack three coffee pods onto a tray in specific slots. This requires precise placement and understanding of spatial arrangements. Each pod must be placed correctly in its designated position on the tray.

Observations

TiPToP: Strong performance with 80% success (4/5), averaging 46.8s for successful trials. One failure due to planning timeout. Most trials completed all 3 pods successfully, demonstrating good sequential task execution.

\(\pi_{0.5}\text{-DROID}\): 20% success (1/5). Typical failures included placing only 1-2 of the 3 pods correctly, or placing pods in wrong locations (e.g., into a steel mug or the coffee machine instead of the tray). The task's requirement for precise, repeated placements proved challenging for the VLA.

Trial 1
Ours: ✓ 1.0 All 3 pods placed
π₀.₅: ~ 0.85 2/3 coffee pods placed
Ours: 1.0 progress - Successful completion of all 3 pods. π₀.₅: 0.85 progress - 2/3 coffee pods placed correctly; picked up the third pod and placed it into the coffee machine.
Trial 2
Ours: ✓ 1.0 All 3 pods placed
π₀.₅: ~ 0.4 Only 1/3 pod placed
Ours: 1.0 progress - Successful completion of all 3 pods. π₀.₅: 0.4 progress - Only 1/3 pod placed correctly; approaches the other pods but fails to grasp them.
Trial 3
Ours: ✓ 1.0 All 3 pods placed
π₀.₅: ~ 0.85 2/3 succeeded, 3rd incorrectly placed
Ours: 1.0 progress - Successful completion of all 3 pods. π₀.₅: 0.85 progress - 2/3 succeeded, picked the 3rd one but placed it incorrectly.
Trial 4
Ours: ✗ 0.0 Planning failure
π₀.₅: ~ 0.18 Put pod into steel mug
Ours: 0.0 progress - Planning failure (couldn't find a plan). π₀.₅: 0.183 progress - Picked a pod and put it into the steel mug instead of packing it onto the tray.
Trial 5
Ours: ✓ 1.0 Successful completion
π₀.₅: ✓ 1.0 Both methods succeed
Ours: 1.0 progress - Successful completion. π₀.₅: 1.0 progress - Both methods succeed on this trial.
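The fractional \(\pi_{0.5}\)-DROID scores above follow from the per-pod weights in the Evaluation Scene Details table. The sketch below is a hypothetical scorer, not the evaluators' tooling: it assumes "+3.33%" stands for exactly 10/3 % (so three packed pods sum to 100%) and labels each pod with the furthest stage it reached, reading those labels off the trial notes.

```python
# Per-pod weights (in %) for "Pack pods -> tray" from the Scene Details table.
# Assumption: "+3.33%" stands for 10/3 %, so three fully packed pods sum to 100%.
APPROACH, GRASP, PLACE_ON_TRAY = 10 / 3, 15.0, 15.0

# Cumulative score for the furthest stage one pod reached. A pod placed
# somewhere other than the tray earns +0% for the place, so it scores the
# same as "grasp".
STAGE_SCORE = {
    "none": 0.0,
    "approach": APPROACH,
    "grasp": APPROACH + GRASP,
    "on_tray": APPROACH + GRASP + PLACE_ON_TRAY,
}


def pods_progress(stages):
    """Task progress in [0, 1] given the stage reached by each of the 3 pods."""
    if len(stages) != 3:
        raise ValueError("expected one stage per pod (3 pods)")
    return round(sum(STAGE_SCORE[s] for s in stages) / 100, 3)


# pi0.5 trials, with stage labels (hypothetical) read off the notes above:
print(pods_progress(["on_tray", "on_tray", "grasp"]))      # 0.85  (trials 1 and 3)
print(pods_progress(["on_tray", "approach", "approach"]))  # 0.4   (trial 2)
print(pods_progress(["grasp", "none", "none"]))            # 0.183 (trial 4)
```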

# Pack pods → tray (obs.)

"pack the coffee pods onto the rectangular tray"

An extended version of coffee packing that requires moving a Coke can obstacle out of the way before packing all three coffee pods onto the tray. This tests multi-step reasoning and obstacle management.

Observations

TiPToP: 20% success (1/5), taking 61.4s when successful. Common failure modes included grasp failures on pods, planning timeouts, and incomplete pod placement (2/3 pods placed). The need to reason about obstacle removal before placement proved difficult.

\(\pi_{0.5}\text{-DROID}\): Complete failure (0/5). Struggled with the multi-step nature: often managed 1-2 pods but failed on the third, sometimes placing pods incorrectly (on the can rather than the tray, or in the coffee cup). The requirement to move an obstacle first added significant complexity.

Trial 1
Ours: ~ 0.8 2/3 pods, third slipped out
π₀.₅: ~ 0.45 1/3 pod, tried placing on can
Ours: 0.8 progress - 2/3 pods packed; grasped the third one but it slipped out. π₀.₅: 0.45 progress - 1/3 pod placed; tried to place another pod on the Coke can but it fell out.
Trial 2
Ours: ~ 0.8 2/3 pods, failed pick on first pod
π₀.₅: ~ 0.65 2/3 pods, third failed due to can
Ours: 0.8 progress - 2/3 pods; failed a pick on the first coffee pod but successfully moved the Coke can out of the way. π₀.₅: 0.65 progress - 2/3 pods; managed to pack 2 in, but failed to place the third due to the can.
Trial 3
Ours: ✓ 1.0 All 3 pods packed
π₀.₅: ~ 0.55 1/3 pod, others in coffee cup
Ours: 1.0 progress - Successful completion of all 3 pods. π₀.₅: 0.55 progress - 1/3 pod placed; put the other two into the coffee cup instead.
Trial 4
Ours: ✗ 0.0 Failed to find a plan
π₀.₅: ~ 0.9 2/3 pods, failed final placement
Ours: 0.0 progress - Failed to find a plan. π₀.₅: 0.9 progress - 2/3 pods; got lucky moving the can away, picked the last one up but failed to place it properly.
Trial 5
Ours: ~ 0.75 2/3 pods, Gemini vision issue
π₀.₅: ~ 0.65 2/3 pods, third balanced on can
Ours: 0.75 progress - 2/3 pods; missed the third pod due to a Gemini vision issue. π₀.₅: 0.65 progress - 2/3 pods; placed the third pod, but it rests on the Coke can and the other pods without touching the tray.

# Aleve bottle → tray (obs.)

"put the small white aleve bottle into the cardboard tray"

Place a small Aleve bottle into the tray while navigating around obstacles in the workspace. Requires reasoning about which objects need to be moved to create a clear path to the goal.

Observations

TiPToP: Good performance with 80% success (4/5), averaging 31.2s. Successfully moved obstacles when necessary. One failure involved picking a suboptimal grasp to move an obstacle, leading to accidentally picking up the wrong object.

\(\pi_{0.5}\text{-DROID}\): Moderate performance at 40% success (2/5). Failure modes included pushing objects together to create more clutter, flipping the wooden platform, and knocking objects off the table. When successful, completion times were similar to ours (~31s), but execution was less reliable.

Trial 1
Ours: ✓ 1.0 Successful completion
π₀.₅: ✓ 1.0 Shoves box off at 81.6s
Ours: 1.0 progress - Successful completion. π₀.₅: 1.0 progress - Succeeds, though at 81.6s, π₀.₅ accidentally shoves the box off the table.
Trial 2
Ours: ✓ 1.0 Successful completion
π₀.₅: ~ 0.5 Wooden platform flips up, grasped bottle
Ours: 1.0 progress - Successful completion. π₀.₅: 0.5 progress - Pushes the objects such that the wooden platform flips up and can no longer be used, then picks up the Aleve bottle.
Trial 3
Ours: ✗ 0.0 Bad grasp on obstacle
π₀.₅: ~ 0.5 Pushed objects around, grasped bottle
Ours: 0.0 progress - Picked a bad grasp to move an obstacle and failed, then accidentally picked up another object. π₀.₅: 0.5 progress - Pushed objects around to make space and grasped the Aleve bottle, but placed it in a mug.
Trial 4
Ours: ✓ 1.0 Successful completion
π₀.₅: ✓ 1.0 Successful completion. Tips tray at 118s, medicine falls
Ours: 1.0 progress - Successful completion. π₀.₅: 1.0 progress - Successful completion. At around 118s, the robot tips over the wooden tray and the medicine falls out.
Trial 5
Ours: ✓ 1.0 Moved obstacle, completed task
π₀.₅: ~ 0.5 Medicine falls out, struggles to pick
Ours: 1.0 progress - Successfully moved an object out of the way and completed the task. π₀.₅: 0.5 progress - Partially picks up the medicine, but it falls out; struggles to pick it up again with a proper grasp.

# Three marbles → cup

"put only the marbles in the cup"

Place three marbles into a cup. This task is particularly challenging due to the small size of marbles, their tendency to roll, and the precision required for cup placement. Requires careful perception and delicate manipulation.

Observations

TiPToP: Challenging task with 40% success (2/5), averaging 43.7s for successful trials. Main failure modes included misperception of the cup leading to missed placements, marbles rolling away, and unmodeled cables causing collisions with the cup. The precision required for small rolling objects proved difficult.

\(\pi_{0.5}\text{-DROID}\): Complete failure (0/5). Common issues included marbles rolling away before manipulation, picking wrong objects (e.g., larger balls), and general inability to execute precise placement. One trial managed to place one marble but failed on subsequent ones.

Trial 1
Ours: ✓ 1.0 Successful completion
π₀.₅: ✗ 0.0 Balls roll away
Ours: 1.0 progress - Successful completion. π₀.₅: 0.0 progress - Balls roll away; didn't even pick one.
Trial 2
Ours: ~ 0.83 Final ball rolled away
π₀.₅: ✗ 0.0 Picked bigger ball, caused chaos
Ours: 0.83 progress - Missed placing the final ball into the cup (it rolled away), likely due to bad perception of the cup. π₀.₅: 0.0 progress - Picked a much bigger ball and dropped it into the cup, causing chaos in the scene.
Trial 3
Ours: ✓ 1.0 Successful completion
π₀.₅: ~ 0.33 One marble in, fails others
Ours: 1.0 progress - Successful completion. π₀.₅: 0.33 progress - Puts one marble into the cup, but fails at the others.
Trial 4
Ours: ~ 0.83 Third marble, cup misperception
π₀.₅: ✗ 0.0 Complete failure
Ours: 0.83 progress - Failed placement on the third marble once again due to misperception of cup. π₀.₅: 0.0 progress - Complete failure.
Trial 5
Ours: ~ 0.33 Cable collision with cup
π₀.₅: ✗ 0.0 Complete failure
Ours: 0.33 progress - Successfully placed one marble, but because the cable was not modeled, it collided with the cup, moving the cup and causing further collisions. When picking the first marble, the robot also bumped the table, making the other two marbles move. π₀.₅: 0.0 progress - Complete failure.
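The per-trial scores for this task also reproduce the summary table's TP values. The sketch below assumes the "+16.67%" in the Scene Details table stands for exactly one sixth per marble pick and per marble place; the (picks, places) counts are our reading of the trial notes above, not data released by the authors.

```python
from fractions import Fraction


def marble_progress(picks: int, places: int) -> Fraction:
    """Progress for 'Three marbles -> cup': one sixth per pick and per place.

    Assumes +16.67% in the Scene Details table means exactly 1/6.
    """
    if not 0 <= places <= picks <= 3:
        raise ValueError("need 0 <= places <= picks <= 3")
    return Fraction(picks + places, 6)


# (picks, places) per trial, as read from the trial notes (an interpretation).
ours = [(3, 3), (3, 2), (3, 3), (3, 2), (1, 1)]
pi05 = [(0, 0), (0, 0), (1, 1), (0, 0), (0, 0)]


def mean_tp(trials):
    """Mean task progress over trials, kept as an exact fraction."""
    return sum(marble_progress(*t) for t in trials) / len(trials)


print(f"{float(mean_tp(ours)):.1%}")  # 80.0%
print(f"{float(mean_tp(pi05)):.1%}")  # 6.7%
```

Using exact fractions avoids the rounding drift that summing the displayed 0.83 and 0.33 values would introduce.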

# Marbles + cable

"put the small plastic bag of marbles into the black mesh bag, and the cable on top of the empty large plastic bag"

A complex multi-object task: place a small bag of marbles into a mesh bag, then a wire/cable onto the plastic surface. Requires coordinating multiple objects with different properties (flexible cable, plastic bag, deformable mesh bag).

Observations

TiPToP: Difficult task with 40% success (2/5), averaging 35.4s when successful. Failure modes included missed grasps on the cable, missed placements into the mesh bag, and one case where the Gemini vision model incorrectly identified a beaker as the target cable. The combination of deformable objects and precise sequential manipulation proved challenging.

\(\pi_{0.5}\text{-DROID}\): Complete failure (0/5). While some trials showed partial progress (e.g., placing the wire on plastic around 30-90s), the VLA consistently failed to complete the full sequence. Common failures included picking up the plastic sheet and dumping everything out, throwing marbles off the table, or complete inability to progress.

Trial 1
Ours: ~ 0.75 Wire placed, balls missed pouch
π₀.₅: ~ 0.75 Cable on plastic, then ruined
Ours: 0.75 progress - Correctly put the wire on the plastic and picked the balls, but missed placing them in the pouch. π₀.₅: 0.75 progress - Put the marbles on the plastic, then put the cable on the plastic around the 90s mark, but then picked up the plastic and ruined everything.
Trial 2
Ours: ✓ 1.0 Successful completion
π₀.₅: ~ 0.5 Grasped marbles and cable but placement failures
Ours: 1.0 progress - Successful completion. π₀.₅: 0.5 progress - Grasped the bag of marbles but put it on the table; grasped the cable but put it on the mesh pouch; then put the plastic bag on top of the cable and mesh pouch.
Trial 3
Ours: ~ 0.25 Missed grasp and placement
π₀.₅: ~ 0.5 Wire placed, then dumps everything
Ours: 0.25 progress - Missed a grasp on the cable, then grasped the bag of marbles but missed placing it into the mesh bag. π₀.₅: 0.5 progress - At around 30.9s, places the wire on the plastic, but then picks up the plastic and dumps everything out.
Trial 4
Ours: ~ 0.5 Picked beaker, Gemini vision issue
π₀.₅: ~ 0.75 Marbles off table, cable at 63.77s
Ours: 0.5 progress - Picked the beaker instead of the cable due to a Gemini vision issue, placed marbles correctly. π₀.₅: 0.75 progress - Failed to put the marbles into the mesh (threw them both off the table). At 63.77s, placed the cable onto the plastic.
Trial 5
Ours: ✓ 1.0 Successful completion
π₀.₅: ~ 0.5 Threw cable off table, placed marbles on plastic bag
Ours: 1.0 progress - Successful completion. π₀.₅: 0.5 progress - Threw the cable off the table and cleared other objects from the table, then put the marbles onto the plastic bag.

# Evaluation Scene Details

Each entry gives the task identifier (as referenced in the Summary table above), the language prompt given to both systems, and the task progress metric used for evaluation. Scenes are grouped by category: Simple, Distractor, Semantic, and Multi-step. † indicates tasks evaluated by the system designers at MIT. Unmarked scenes were evaluated by external evaluators at the University of Pennsylvania who were not involved in the development of TiPToP. (sim) denotes tasks evaluated in simulation.

Task progress metrics are given in percent. A + or − sign means the amount is added to or subtracted from the running score; an unsigned number is the absolute score awarded for reaching that condition. Progress metrics may vary by evaluator and task: some penalize manipulating distractors while others do not.

| Identifier | Language Prompt | Progress Metric |
|---|---|---|
| *Simple* | | |
| Cube → bowl (sim) | "put the cube in the bowl" | 25% approach cube, 50% grasp, 75% approach bowl with cube, 100% place |
| Can → mug (sim) | "put the can in the mug" | 25% approach can, 50% grasp, 75% approach mug with can, 100% place |
| Banana → bin (sim) | "put banana in the bin" | 25% approach banana, 50% grasp, 75% approach bin with banana, 100% place |
| Marker → tray | "put the marker in the tray" | +25% touch marker, +25% grasp, +25% touch tray, +25% place |
| Crackers → tray † | "place the crackers onto the tray" | 50% grasp crackers, 100% place |
| *Distractor* | | |
| Meat can → sugar box (sim) | "put the meat can on the sugar box" | 25% approach meat can, 50% grasp, 75% approach box with meat can, 100% place |
| Coffee capsules → plate | "put all of the coffee capsules onto the white plate" | +50% per capsule placed, −20% per distractor |
| Turkish figs → plate | "put the turkish figs onto the white plate" | +50% per fig placed, −20% per cashew |
| Cashews → plate | "put the roasted cashews onto the white plate" | +50% per cashew placed, −20% per fig |
| Red cubes → plate | "put the red cubes onto the white plate" | +50% per cube placed, −20% if distractor placed |
| Fish → box | "place the fish into the white box" | +50% pick fish, +50% place into white box |
| Crackers → tray (medium) † | "place the crackers onto the tray" | +50% pick crackers, +50% place on the tray (no penalty for distractor) |
| PB crackers → tray (hard) † | "place the peanut butter crackers onto the tray" | +50% pick crackers, +50% place on the tray (no penalty for distractor) |
| *Semantic* | | |
| Toy → matching plate | "pick up the toy and place on the plate with similar color" | +50% pick toy, +50% place on teal or +30% place on blue |
| Creeper → plate | "pick up the creeper and place onto the purple plate" | +50% pick creeper toy, +50% place onto purple plate |
| Largest toy → plate | "pick up the largest toy and place onto the purple plate" | +50% pick creeper, +50% place onto purple plate, −20% if attempt to place on distractor |
| Red A → color pile | "pick up the red A and place on same color pile" | +50% pick red A block, +50% place onto red pile, −20% knock pile over |
| Banana → box | "pick up the banana and put it in the box" | +50% place banana into any box, +50% place into box with fruit (tests common-sense inference about which box a human would choose) |
| N block → indicated cup | "put the N block into the cup pointed to by the arrow" | +50% grasp N block, +50% place into cup pointed at |
| Sort blocks by color | "sort the blocks into opposite color plates" | +10% per block touched, +40% per correct place |
| Banana → matching plate | "place banana into plate has similar color" | +50% pick banana, +50% place into orange plate |
| *Multi-step* | | |
| Color cubes → bowl (sim) | "put 3 cubes into the bowl" | For up to 3 cubes (normalized to 100%): +5% approach cube, +10% grasp, +10% approach bowl with cube, +15% place |
| AirPods → cup | "place airpods into the yellow cup" | +25% per AirPods picked, +25% per place, −20% distractor |
| Pack pods → tray † | "pack the coffee pods onto the rectangular tray" | For each of the 3 pods: +3.33% approach, +15% grasp, +0% place not in tray, +15% place touching tray |
| Pack pods → tray (obs.) † | "pack the coffee pods onto the rectangular tray" | +12.5% pick can, +12.5% place so that it doesn't obstruct the tray (or +25% for clearing the can obstruction without pick/place); for each of 3 pods: +5% approach, +10% correct pick, +10% correct place into tray |
| Aleve bottle → tray (obs.) † | "put the small white aleve bottle into the cardboard tray" | +10% pick an obstacle object, +10% place obstacle so that it no longer obstructs the Aleve bottle, +30% pick Aleve bottle (+50% if picked without clearing obstacles), +50% place bottle in tray |
| Three marbles → cup † | "put only the marbles in the cup" | +16.67% for each pick of a marble, +16.67% for each place of a marble into the cup |
| Marbles + cable † | "put the small plastic bag of marbles into the black mesh bag, and the cable on top of the empty large plastic bag" | wire: +5% approach, +20% stable pick, +25% stable place atop plastic; marbles pouch: +5% approach, +20% pick, +25% place into mesh bag |
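The two metric styles in the table above (unsigned absolute milestones versus signed additive terms) can be made concrete with two small helpers. Both are hypothetical illustrations, not the evaluators' scoring code; in particular, the page does not say how an additive total outside [0, 100] is handled, so no clamping is applied, and the Coffee capsules example assumes two capsules in the scene (implied by "+50% per capsule placed").

```python
# Unsigned metric (absolute scores), e.g. Cube -> bowl (sim):
# "25% approach cube, 50% grasp, 75% approach bowl with cube, 100% place"
CUBE_TO_BOWL = [
    ("approach cube", 25),
    ("grasp", 50),
    ("approach bowl with cube", 75),
    ("place", 100),
]


def milestone_score(milestones, reached):
    """Absolute score of the highest-valued condition reached (0 if none)."""
    return max((value for name, value in milestones if name in reached), default=0)


def additive_score(terms):
    """Signed metric: sum of amount * count over (amount, count) terms.

    No clamping to [0, 100]; the source does not specify it.
    """
    return sum(amount * count for amount, count in terms)


# Hypothetical Cube -> bowl (sim) trial: grasped the cube, never reached the bowl.
print(milestone_score(CUBE_TO_BOWL, {"approach cube", "grasp"}))  # 50

# Hypothetical Coffee capsules -> plate trial ("+50% per capsule placed,
# -20% per distractor"): two capsules placed plus one distractor.
print(additive_score([(50, 2), (-20, 1)]))  # 80
```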

© 2026 TiPToP Authors