Full Results - TiPToP

# Evaluation Overview

We evaluated TiPToP on 28 manipulation tasks in 3 settings: (i) simulation using IsaacSim, (ii) a real-world DROID hardware setup operated internally by TiPToP's developers at MIT, and (iii) a separate DROID setup operated by an external evaluation team at the University of Pennsylvania not involved in TiPToP's development. Below we present detailed video demonstrations for tasks evaluated on the internal DROID setup, along with complete results over all 28 tasks in the summary table.

Experimental Setup

Tasks: 28 tasks total (5 simulation, 8 internal DROID, 15 external DROID)
Trials: 10 trials per simulation task, 5 trials per real-world task
Total comparisons: 165 trials (50 simulation, 40 internal, 75 external)
Hardware: Franka Emika Panda FR3 with Robotiq 2F-85 gripper
Cameras: 1 x ZED Mini wrist camera, 1 x ZED 2i external camera (not used by TiPToP)
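
As a quick sanity check on the setup figures above, the trial totals follow directly from the task counts and trials per task. The snippet below is only an illustrative bookkeeping sketch; the dictionary layout and variable names are ours, not part of the TiPToP codebase.

```python
# Task counts and trials per task, copied from the setup list above.
settings = {
    "simulation": {"tasks": 5, "trials_per_task": 10},
    "internal DROID": {"tasks": 8, "trials_per_task": 5},
    "external DROID": {"tasks": 15, "trials_per_task": 5},
}

# Trials per setting = number of tasks x trials per task.
trials = {name: s["tasks"] * s["trials_per_task"] for name, s in settings.items()}
total_tasks = sum(s["tasks"] for s in settings.values())
total_trials = sum(trials.values())

print(trials)
print(total_tasks, total_trials)
```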

# Summary Results

Below is a comprehensive table of all 28 evaluation tasks across simulation, internal DROID, and external DROID settings. Tasks are organized by category (Simple, Distractor, Semantic, Multi-step). Click on task names with links to jump to detailed video demonstrations. For a detailed breakdown of the language prompt and progress metric for each task, please see the Scene Details table.

Key: SR = Success Rate, TP = Task Progress. † indicates tasks evaluated by system designers on internal DROID setup. (sim) indicates simulation tasks in IsaacSim. Unmarked tasks were evaluated by an external team at the University of Pennsylvania on a separate DROID setup. Bold values indicate better performance on that metric.

Scene	TiPToP SR	TiPToP TP	\(\pi_{0.5}\)-DROID SR	\(\pi_{0.5}\)-DROID TP
Simple Tasks
Cube → bowl (sim)	5/10	72.5%	8/10	90%
Can → mug (sim)	9/10	97.5%	2/10	50%
Banana → bin (sim)	0/10	70%	9/10	97.5%
Marker → tray	3/5	80%	5/5	100%
Crackers → tray^†	5/5	100%	3/5	60%
	22/40	84%	27/40	79.5%
Distractor Tasks
Meat can → sugar box (sim)	5/10	72.5%	0/10	5%
Coffee capsules → plate	4/5	90%	2/5	58%
Turkish figs → plate	3/5	64%	2/5	52%
Cashews → plate	0/5	16%	0/5	12%
Red cubes → plate	1/5	50%	5/5	92%
Fish → box	4/5	80%	0/5	10%
Crackers → tray (medium)^†	5/5	100%	3/5	80%
PB crackers → tray (hard)^†	5/5	100%	0/5	20%
	27/45	71.6%	12/45	41.1%
Semantic Tasks
Toy → matching plate	4/5	90%	1/5	62%
Creeper → plate	3/5	70%	0/5	0%
Largest toy → plate	3/5	70%	0/5	20%
Red A → color pile	5/5	100%	3/5	80%
Banana → box	2/5	40%	0/5	30%
N block → indicated cup	3/5	80%	2/5	60%
Sort blocks by color	5/5	100%	0/5	32%
Banana → matching plate	1/5	20%	4/5	90%
	26/40	71.3%	10/40	46.8%
Multi-step Tasks
Color cubes → bowl (sim)	9/10	94.6%	0/10	24.2%
AirPods → cup	1/5	55%	3/5	75%
Pack pods → tray^†	4/5	80%	1/5	65.7%
Pack pods → tray (obs.)^†	1/5	67%	0/5	64%
Aleve bottle → tray (obs.)^†	4/5	80%	2/5	70%
Three marbles → cup^†	2/5	80%	0/5	6.7%
Marbles + cable^†	2/5	70%	0/5	60%
	23/40	75.2%	6/40	52.2%
Overall	98/165	74.6%	55/165	52.4%
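
The unlabeled row at the end of each category appears to pool successes over all trials in that category, while its task progress value matches an unweighted mean of the per-task percentages (simulation tasks with 10 trials count the same as real-world tasks with 5). This is an inference from the numbers in the table, not something the page states. A minimal sketch for the TiPToP columns of the Simple category, with a hypothetical `aggregate` helper:

```python
# (successes, trials, task progress %) for TiPToP on the five Simple tasks,
# copied from the summary table.
simple_tiptop = [
    (5, 10, 72.5),   # Cube -> bowl (sim)
    (9, 10, 97.5),   # Can -> mug (sim)
    (0, 10, 70.0),   # Banana -> bin (sim)
    (3, 5, 80.0),    # Marker -> tray
    (5, 5, 100.0),   # Crackers -> tray
]

def aggregate(rows):
    """Pool successes and trials; average task progress per task (unweighted)."""
    successes = sum(r[0] for r in rows)
    trials = sum(r[1] for r in rows)
    mean_tp = round(sum(r[2] for r in rows) / len(rows), 1)
    return successes, trials, mean_tp

print(aggregate(simple_tiptop))  # (22, 40, 84.0)
```

Weighting task progress by trial count instead would give 82.5% for this category, so the unweighted reading is the one consistent with the reported 84%.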

# Execution Time Comparison

The table below shows average execution times on 6 representative scenes where both methods succeeded. TiPToP's planning time is shown separately to illustrate the breakdown between planning and execution.

Key observation: TiPToP is faster than \(\pi_{0.5}\text{-DROID}\) in 5 of 6 scenes and ties on the remaining one (Aleve bottle → tray (obs.), 31.2s each). Even though TiPToP spends significant upfront time on perception and planning, \(\pi_{0.5}\text{-DROID}\) often spends considerable time idling or re-grasping objects.

Scene	\(\pi_{0.5}\)-DROID Time	TiPToP Total Time	TiPToP Planning Time
Simulation (IsaacSim)
Cube → bowl	27.4s	17.9s	9.7s
Can → mug	41.0s	18.6s	9.2s
Real-World (Internal DROID)
Crackers → tray (simple)	32.2s	14.9s	7.0s
Crackers → tray (medium)	45.2s	14.9s	7.3s
Pack pods → tray	53.4s	47.0s	20.5s
Aleve bottle → tray (obs.)	31.2s	31.2s	16.4s
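
To make the planning/execution breakdown concrete, the sketch below derives TiPToP's execution-only time and the ratio of the two methods' times for each row. It assumes that TiPToP's total time is planning time plus execution time, which the table layout suggests but does not state; the `breakdown` helper is hypothetical.

```python
# Times in seconds copied from the table above:
# (pi0.5-DROID time, TiPToP total time, TiPToP planning time)
times = {
    "Cube -> bowl": (27.4, 17.9, 9.7),
    "Can -> mug": (41.0, 18.6, 9.2),
    "Crackers -> tray (simple)": (32.2, 14.9, 7.0),
    "Crackers -> tray (medium)": (45.2, 14.9, 7.3),
    "Pack pods -> tray": (53.4, 47.0, 20.5),
    "Aleve bottle -> tray (obs.)": (31.2, 31.2, 16.4),
}

def breakdown(baseline, total, planning):
    """Return (TiPToP execution time, baseline time / TiPToP total time)."""
    # Assumption: total time = planning time + execution time.
    execution = round(total - planning, 1)
    ratio = round(baseline / total, 2)
    return execution, ratio

for scene, row in times.items():
    execution, ratio = breakdown(*row)
    print(f"{scene}: execution {execution}s, time ratio {ratio}x")
```

A ratio above 1 means \(\pi_{0.5}\)-DROID took longer than TiPToP on that scene.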

# Video Results

Below we show the videos for the 8 tasks evaluated on our internal DROID setup, where we have side-by-side video recordings of both systems. For results over all 28 tasks (including simulation and external evaluation), see the summary results table above.

# Crackers → tray

"place the crackers onto the tray"

A simple pick-and-place task where the robot must pick up a cracker box and place it on a designated target surface. This serves as a baseline test for basic manipulation capabilities.

Observations

TiPToP: Achieved 100% success rate (5/5) with consistent execution times averaging 14.9s. All trials completed efficiently with reliable grasp planning and placement.

\(\pi_{0.5}\text{-DROID}\): 60% success rate (3/5). Common failure modes included idling with no progress, and timeout failures at 120s. When successful, execution was slower (avg 32.2s) and less consistent. Some trials showed late progress around the timeout period.

Trial 1

Ours: ✓ 1.0 Successful completion

π₀.₅: ✗ 0.0 Robot just idles

Ours: 1.0 progress - Successful completion. π₀.₅: 0.0 progress - Robot just idles with no progress.

Trial 2

Ours: ✓ 1.0 Successful completion

π₀.₅: ✗ 0.0 Moves close around 120s

Ours: 1.0 progress - Successful completion. π₀.₅: 0.0 progress - π₀.₅ moves close to the crackers around 120s and attempts a pick-place, but fails the task.

Trial 3

Ours: ✓ 1.0 Successful completion

π₀.₅: ✓ 1.0 After initial idling

Ours: 1.0 progress - Successful completion. π₀.₅: 1.0 progress - After some initial idling, π₀.₅ successfully completes the task.

Trial 4

Ours: ✓ 1.0 Successful completion

π₀.₅: ✓ 1.0 After initial idling

Ours: 1.0 progress - Successful completion. π₀.₅: 1.0 progress - After some initial idling, π₀.₅ successfully completes the task.

Trial 5

Ours: ✓ 1.0 Successful completion

π₀.₅: ✓ 1.0 Carton overturns slightly

Ours: 1.0 progress - Successful completion. π₀.₅: 1.0 progress - π₀.₅ succeeds, though the carton overturns slightly during placement.

# Crackers → tray (medium)

"place the crackers onto the tray"

Place a cracker box onto a tray in the presence of medium clutter (a medicine box and a strawberry). The robot must identify the correct target object and avoid disturbing distractors.

Observations

TiPToP: Perfect success rate (5/5) with average time of 14.9s. Consistently identified and manipulated the correct object despite distractors.

\(\pi_{0.5}\text{-DROID}\): 60% success (3/5) with average time of 45.2s for successful trials. Frequent confusion about task objectives, often manipulating wrong objects (strawberry, medicine box) or placing items incorrectly. One trial showed the tray falling over, and another resulted in the crackers being thrown off the table.

Trial 1

Ours: ✓ 1.0 Successful completion

π₀.₅: ✓ 1.0 Medicine box and strawberry first

Ours: 1.0 progress - Successful completion. π₀.₅: 1.0 progress - Succeeds, though it first puts the medicine box and strawberry into the tray before placing the crackers.

Trial 2

Ours: ✓ 1.0 Successful completion

π₀.₅: ✓ 1.0 Successful completion

Ours: 1.0 progress - Successful completion. π₀.₅: 1.0 progress - Succeeds at first then knocks container over, then fixes the placement and picks up strawberry.

Trial 3

Ours: ✓ 1.0 Successful completion

π₀.₅: ~ 0.5 Throws crackers off table

Ours: 1.0 progress - Successful completion. π₀.₅: 0.5 progress - Puts the strawberry into the tray, then picks the crackers and throws them off the table, finally puts the medicine into the tray instead.

Trial 4

Ours: ✓ 1.0 Successful completion

π₀.₅: ✓ 1.0 Tray falls over afterward

Ours: 1.0 progress - Successful completion. π₀.₅: 1.0 progress - Successfully puts the crackers into the tray, though the tray falls over afterward.

Trial 5

Ours: ✓ 1.0 Successful completion

π₀.₅: ~ 0.5 Placed on medicine box, not in tray

Ours: 1.0 progress - Successful completion. π₀.₅: 0.5 progress - Picks the crackers and puts them on the medicine box, but not in the tray as required.

# PB crackers → tray (hard)

"place the peanut butter crackers onto the tray"

A more challenging version with heavy clutter: place the cracker box on the tray while navigating around multiple distractor objects including medicine boxes, popcorn containers, and other items that significantly crowd the workspace.

Observations

TiPToP: Maintained 100% success (5/5) even with heavy clutter, averaging 15.2s. Demonstrates robust planning in crowded scenes.

\(\pi_{0.5}\text{-DROID}\): Complete failure (0/5). The VLA struggled significantly with the increased clutter. Common failures included picking the crackers but not placing them in the tray, dropping objects on other items (e.g., popcorn), or showing no progress at all. The presence of many distractors appeared to overwhelm the policy.

Trial 1

Ours: ✓ 1.0 Successful completion

π₀.₅: ~ 0.5 Picks crackers, doesn't place in tray

Ours: 1.0 progress - Successful completion. π₀.₅: 0.5 progress - Picks the crackers but doesn't put them in the tray.

Trial 2

Ours: ✓ 1.0 Successful completion

π₀.₅: ✗ 0.0 Doesn't even pick crackers

Ours: 1.0 progress - Successful completion. π₀.₅: 0.0 progress - Doesn't even pick the crackers.

Trial 3

Ours: ✓ 1.0 Successful completion

π₀.₅: ~ 0.5 Drops on popcorn instead

Ours: 1.0 progress - Successful completion. π₀.₅: 0.5 progress - Picks the crackers but drops them on the popcorn instead of placing in tray.

Trial 4

Ours: ✓ 1.0 Successful completion

π₀.₅: ✗ 0.0 Doesn't do anything

Ours: 1.0 progress - Successful completion. π₀.₅: 0.0 progress - Doesn't do anything.

Trial 5

Ours: ✓ 1.0 Successful completion

π₀.₅: ✗ 0.0 Doesn't do anything

Ours: 1.0 progress - Successful completion. π₀.₅: 0.0 progress - Doesn't do anything.

# Pack pods → tray

"pack the coffee pods onto the rectangular tray"

Pack three coffee pods onto a tray in specific slots. This requires precise placement and understanding of spatial arrangements. Each pod must be placed correctly in its designated position on the tray.

Observations

TiPToP: Strong performance with 80% success (4/5), averaging 46.8s for successful trials. One failure due to planning timeout. Most trials completed all 3 pods successfully, demonstrating good sequential task execution.

\(\pi_{0.5}\text{-DROID}\): 20% success (1/5). Typical failures included placing only 1-2 out of 3 pods correctly, or placing pods in wrong locations (e.g., into a coffee mug instead of the tray). The task's requirement for precise, repeated placements proved challenging for the VLA.

Trial 1

Ours: ✓ 1.0 All 3 pods placed

π₀.₅: ~ 0.85 2/3 coffee pods placed

Ours: 1.0 progress - Successful completion of all 3 pods. π₀.₅: 0.85 progress - 2/3 coffee pods placed correctly; picked the third pod but placed it into the coffee machine.

Trial 2

Ours: ✓ 1.0 All 3 pods placed

π₀.₅: ~ 0.4 Only 1/3 pod placed, approaches other pods but fails to grasp them.

Ours: 1.0 progress - Successful completion of all 3 pods. π₀.₅: 0.4 progress - Only 1/3 pod placed correctly.

Trial 3

Ours: ✓ 1.0 All 3 pods placed

π₀.₅: ~ 0.85 2/3 succeeded, 3rd incorrectly placed

Ours: 1.0 progress - Successful completion of all 3 pods. π₀.₅: 0.85 progress - 2/3 succeeded, picked the 3rd one but placed it incorrectly.

Trial 4

Ours: ✗ 0.0 Planning failure

π₀.₅: ~ 0.18 Put pod into steel mug

Ours: 0.0 progress - Planning failure (couldn't find a plan). π₀.₅: 0.183 progress - Picked a pod and put it into the steel mug instead of packing it onto the tray.

Trial 5

Ours: ✓ 1.0 Successful completion

π₀.₅: ✓ 1.0 Both methods succeed

Ours: 1.0 progress - Successful completion. π₀.₅: 1.0 progress - Both methods succeed on this trial.

# Pack pods → tray (obs.)

"pack the coffee pods onto the rectangular tray"

An extended version of coffee packing that requires moving a Coke can obstacle out of the way before packing all three coffee pods onto the tray. This tests multi-step reasoning and obstacle management.

Observations

TiPToP: 20% success (1/5), taking 61.4s when successful. Common failure modes included grasp failures on pods, planning timeouts, and incomplete pod placement (2/3 pods placed). The need to reason about obstacle removal before placement proved difficult.

\(\pi_{0.5}\text{-DROID}\): Complete failure (0/5). Struggled with the multi-step nature: often managed 1-2 pods but failed on the third, sometimes placing pods incorrectly (on the can rather than the tray, or in the coffee cup). The requirement to move an obstacle first added significant complexity.

Trial 1

Ours: ~ 0.8 2/3 pods, third slipped out

π₀.₅: ~ 0.45 1/3 pod, tried placing on can

Ours: 0.8 progress - 2/3 pods packed; grasped the third one but it slipped out. π₀.₅: 0.45 progress - 1/3 pod placed; tried to place another pod on the coke can but it fell out.

Trial 2

Ours: ~ 0.8 2/3 pods, failed pick on first pod

π₀.₅: ~ 0.65 2/3 pods, third failed due to can

Ours: 0.8 progress - 2/3 pods; failed a pick on the first coffee pod but successfully moved the coke can out of the way. π₀.₅: 0.65 progress - 2/3 pods; managed to pack 2 in, but failed to place the third due to the can.

Trial 3

Ours: ✓ 1.0 All 3 pods packed

π₀.₅: ~ 0.55 1/3 pod, others in coffee cup

Ours: 1.0 progress - Successful completion of all 3 pods. π₀.₅: 0.55 progress - 1/3 pod placed; put the other two into the coffee cup instead.

Trial 4

Ours: ✗ 0.0 Failed to find a plan

π₀.₅: ~ 0.9 2/3 pods, failed final placement

Ours: 0.0 progress - Failed to find a plan. π₀.₅: 0.9 progress - 2/3 pods; got lucky moving the can away, picked the last one up but failed to place it properly.

Trial 5

Ours: ~ 0.75 2/3 pods, Gemini vision issue

π₀.₅: ~ 0.65 2/3 pods, third balanced on can

Ours: 0.75 progress - 2/3 pods; missed the third pod due to Gemini vision issue. π₀.₅: 0.65 progress - 2/3 pods; put the third one balanced but it doesn't touch the tray (balanced on the coke can and other pods).
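
The per-trial scores for this task can be rolled up into the success rate and task progress reported in the summary table (1/5 and 67% for TiPToP, 0/5 and 64% for π₀.₅). The sketch below uses the scores from the five trial summary lines and assumes a trial counts as a success only at full progress (1.0); the `summarize` helper is hypothetical.

```python
# Per-trial progress scores (0 to 1) from the trial summaries for this task.
tiptop = [0.8, 0.8, 1.0, 0.0, 0.75]
pi05 = [0.45, 0.65, 0.55, 0.9, 0.65]

def summarize(progress):
    """Return (successes, trials, mean task progress in %)."""
    # Assumption: a trial is a success only when progress reaches 1.0.
    successes = sum(1 for p in progress if p >= 1.0)
    mean_tp = round(100 * sum(progress) / len(progress), 1)
    return successes, len(progress), mean_tp

print(summarize(tiptop))  # (1, 5, 67.0)
print(summarize(pi05))    # (0, 5, 64.0)
```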

# Aleve bottle → tray (obs.)

"put the small white aleve bottle into the cardboard tray"

Pack a medicine box onto a wooden tray while navigating around obstacles in the workspace. Requires reasoning about which objects need to be moved to create a clear path to the goal.

Observations

TiPToP: Good performance with 80% success (4/5), averaging 31.2s. Successfully moved obstacles when necessary. One failure involved picking a suboptimal grasp to move an obstacle, leading to accidentally picking up the wrong object.

\(\pi_{0.5}\text{-DROID}\): Moderate performance at 40% success (2/5). Failure modes included pushing objects together to create more clutter, flipping the wooden platform, and knocking objects off the table. When successful, completion times were similar to ours (~31s), but execution was less reliable.

Trial 1

Ours: ✓ 1.0 Successful completion

π₀.₅: ✓ 1.0 Shoves box off at 81.6s

Ours: 1.0 progress - Successful completion. π₀.₅: 1.0 progress - Succeeds, though at 81.6s, π₀.₅ accidentally shoves the box off the table.

Trial 2

Ours: ✓ 1.0 Successful completion

π₀.₅: ~ 0.5 Wooden platform flips up, grasped bottle

Ours: 1.0 progress - Successful completion. π₀.₅: 0.5 progress - Pushes the objects such that the wooden platform flips up and can't be used anymore, then picks up the Aleve bottle.

Trial 3

Ours: ✗ 0.0 Bad grasp on obstacle

π₀.₅: ~ 0.5 Pushed objects around, grasped bottle

Ours: 0.0 progress - Picked a bad grasp to move an obstacle and failed, then accidentally picked up another object. π₀.₅: 0.5 progress - Pushed objects around to make space and grasped the Aleve bottle, but placed it in the mug.

Trial 4

Ours: ✓ 1.0 Successful completion

π₀.₅: ✓ 1.0 Successful completion. Tips tray at 118s, medicine falls

Ours: 1.0 progress - Successful completion. π₀.₅: 1.0 progress - Successful completion. At around 118s, the robot tips over the wooden tray and the medicine falls out.

Trial 5

Ours: ✓ 1.0 Moved obstacle, completed task

π₀.₅: ~ 0.5 Medicine falls out, struggles to pick

Ours: 1.0 progress - Successfully moved an object out of the way and completed task! π₀.₅: 0.5 progress - Partially picks up the medicine, but it falls out; struggles to pick it up again with a proper grasp.

# Three marbles → cup

"put only the marbles in the cup"

Place three marbles into a cup. This task is particularly challenging due to the small size of marbles, their tendency to roll, and the precision required for cup placement. Requires careful perception and delicate manipulation.

Observations

TiPToP: Challenging task with 40% success (2/5), averaging 43.7s for successful trials. Main failure modes included misperception of the cup leading to missed placements, marbles rolling away, and unmodeled cables causing collisions with the cup. The precision required for small rolling objects proved difficult.

\(\pi_{0.5}\text{-DROID}\): Complete failure (0/5). Common issues included marbles rolling away before manipulation, picking wrong objects (e.g., larger balls), and general inability to execute precise placement. One trial managed to place one marble but failed on subsequent ones.

Trial 1

Ours: ✓ 1.0 Successful completion

π₀.₅: ✗ 0.0 Balls roll away

Ours: 1.0 progress - Successful completion. π₀.₅: 0.0 progress - Balls roll away; didn't even pick one.

Trial 2

Ours: ~ 0.83 Final ball rolled away

π₀.₅: ✗ 0.0 Picked bigger ball, caused chaos

Ours: 0.83 progress - Missed placing the final ball into the cup (it rolled away), likely due to bad perception of the cup. π₀.₅: 0.0 progress - Picked a much bigger ball and dropped it into the cup, causing chaos in the scene.

Trial 3

Ours: ✓ 1.0 Successful completion

π₀.₅: ~ 0.33 One marble in, fails others

Ours: 1.0 progress - Successful completion. π₀.₅: 0.33 progress - Puts one marble into the cup, but fails at the others.

Trial 4

Ours: ~ 0.83 Third marble, cup misperception

π₀.₅: ✗ 0.0 Complete failure

Ours: 0.83 progress - Failed placement on the third marble once again due to misperception of cup. π₀.₅: 0.0 progress - Complete failure.

Trial 5

Ours: ~ 0.33 Cable collision with cup

π₀.₅: ✗ 0.0 Complete failure

Ours: 0.33 progress - Successfully placed one marble, but not modeling the cable caused collision with the cup, making it move and causing further collisions. When picking the first marble, bumped the table making the other two move. π₀.₅: 0.0 progress - Complete failure.

# Marbles + cable

"put the small plastic bag of marbles into the black mesh bag, and the cable on top of the empty large plastic bag"

A complex multi-object task: place a small bag of marbles into a mesh bag, then a wire/cable onto the plastic surface. Requires coordinating multiple objects with different properties (flexible cable, plastic bag, deformable mesh bag).

Observations

TiPToP: Difficult task with 40% success (2/5), averaging 35.4s when successful. Failure modes included missed grasps on the cable, missed placements into the mesh bag, and one case where the Gemini vision model incorrectly identified a beaker as the target cable. The combination of deformable objects and precise sequential manipulation proved challenging.

\(\pi_{0.5}\text{-DROID}\): Complete failure (0/5). While some trials showed partial progress (e.g., placing the wire on plastic around 30-90s), the VLA consistently failed to complete the full sequence. Common failures included picking up the plastic sheet and dumping everything out, throwing marbles off the table, or complete inability to progress.

Trial 1

Ours: ~ 0.75 Wire placed, balls missed pouch

π₀.₅: ~ 0.75 Cable on plastic, then ruined

Ours: 0.75 progress - Correctly put the wire on the plastic and picked the balls, but missed placing them in the pouch. π₀.₅: 0.75 progress - Put the marbles on the plastic, then put the cable on the plastic around the 90s mark, but then picked up the plastic and ruined everything.

Trial 2

Ours: ✓ 1.0 Successful completion

π₀.₅: ~ 0.5 Grasped marbles and cable but placement failures

Ours: 1.0 progress - Successful completion. π₀.₅: 0.5 progress - Grasped bag of marbles but put on table, grasped cable but put on mesh pouch, then put plastic bag on cable and mesh pouch.

Trial 3

Ours: ~ 0.25 Missed grasp and placement

π₀.₅: ~ 0.5 Wire placed, then dumps everything

Ours: 0.25 progress - Missed a grasp on the cable, then grasped bag of marbles but missed a placement into the mesh bag. π₀.₅: 0.5 progress - At around 30.9s, places the wire on the plastic, but then picks the plastic and dumps everything out.

Trial 4

Ours: ~ 0.5 Picked beaker, Gemini vision issue

π₀.₅: ~ 0.75 Marbles off table, cable at 63.77s

Ours: 0.5 progress - Picked the beaker instead of the cable due to a Gemini vision issue, placed marbles correctly. π₀.₅: 0.75 progress - Failed to put the marbles into the mesh (threw them both off the table). At 63.77s, placed the cable onto the plastic.

Trial 5

Ours: ✓ 1.0 Successful completion

π₀.₅: ~ 0.5 Threw cable off table, placed marbles on plastic bag

Ours: 1.0 progress - Successful completion. π₀.₅: 0.5 progress - Threw the cable off the table and cleared other objects from table, put the marbles onto the plastic bag.

# Evaluation Scene Details

Each scene shows an image of the task, its identifier (as referenced in the Summary table above), the language prompt given to both systems, and the task progress metric used for evaluation. Scenes are grouped by category: Simple, Distractor, Semantic, and Multi-step. † indicates tasks evaluated by the system designers at MIT. Unmarked scenes are evaluated by external evaluators at the University of Pennsylvania not involved in the development of TiPToP. (sim) denotes tasks evaluated in simulation.

Task progress metric numbers are reported in %; a + or − sign indicates that the particular denoted amount is added or subtracted from the overall score, and no sign indicates that the number is the absolute score for achieving that particular condition. Progress metrics may vary by evaluator and task. Some metrics penalize manipulating distractors while others do not.

Scene	Identifier / Language Prompt	Progress Metric
Simple
	Cube → bowl (sim) "put the cube in the bowl"	25% approach cube, 50% grasp, 75% approach bowl with cube, 100% place
	Can → mug (sim) "put the can in the mug"	25% approach can, 50% grasp, 75% approach mug with can, 100% place
	Banana → bin (sim) "put banana in the bin"	25% approach banana, 50% grasp, 75% approach bin with banana, 100% place
	Marker → tray "put the marker in the tray"	+25% touch marker, +25% grasp, +25% touch tray, +25% place
	Crackers → tray^† "place the crackers onto the tray"	50% grasp crackers, 100% place
Distractor
	Meat can → sugar box (sim) "put the meat can on the sugar box"	25% approach meat can, 50% grasp, 75% approach box with meat can, 100% place
	Coffee capsules → plate "put all of the coffee capsules onto the white plate"	+50% per capsule placed, −20% per distractor
	Turkish figs → plate "put the turkish figs onto the white plate"	+50% per fig placed, −20% per cashew
	Cashews → plate "put the roasted cashews onto the white plate"	+50% per cashew placed, −20% per fig
	Red cubes → plate "put the red cubes onto the white plate"	+50% per cube placed, −20% if distractor placed
	Fish → box "place the fish into the white box"	+50% pick fish, +50% place into white box
	Crackers → tray (med.)^† "place the crackers onto the tray"	+50% pick crackers, +50% place on the tray (no penalty for distractor)
	PB crackers → tray (hard)^† "place the peanut butter crackers onto the tray"	+50% pick crackers, +50% place on the tray (no penalty for distractor)
Semantic
	Toy → matching plate "pick up the toy and place on the plate with similar color"	+50% pick toy, +50% place on teal or +30% place on blue
	Creeper → plate "pick up the creeper and place onto the purple plate"	+50% pick creeper toy, +50% place onto purple plate
	Largest toy → plate "pick up the largest toy and place onto the purple plate"	+50% pick creeper, +50% place onto purple plate, −20% if attempt to place on distractor
	Red A → color pile "pick up the red A and place on same color pile"	+50% pick red A block, +50% place onto red pile, −20% knock pile over
	Banana → box "pick up the banana and put it in the box"	+50% place banana into any box, +50% place into box with fruit (aims to test common sense of human selection)
	N block → indicated cup "put the N block into the cup pointed to by the arrow"	+50% grasp N block, +50% place into cup pointed at
	Sort blocks by color "sort the blocks into opposite color plates"	+10% per block touched, +40% per correct place
	Banana → matching plate "place banana into plate has similar color"	+50% pick banana, +50% place into orange plate
Multi-step
	Color cubes → bowl (sim) "put 3 cubes into the bowl"	For up to 3 cubes (normalized to 100%): +5% approach cube, +10% grasp, +10% approach bowl with cube, +15% place
	AirPods → cup "place airpods into the yellow cup"	+25% per AirPods picked, +25% per place, −20% distractor
	Pack pods → tray^† "pack the coffee pods onto the rectangular tray"	For each of the 3 pods: +3.33% approach, +15% grasp, +0% place not in tray, +15% place touching tray
	Pack pods → tray (obs.)^† "pack the coffee pods onto the rectangular tray"	+12.5% pick can, +12.5% place s.t. it doesn't obstruct tray (or +25% for clearing can obstruction without pick/place), for each of 3 pods: +5% for approaching pod, +10% for correct pick, +10% for correct place into tray
	Aleve bottle → tray (obs.)^† "put the small white aleve bottle into the cardboard tray"	+10% pick an obstacle object, +10% place obstacle s.t. unobstructs aleve, +30% pick aleve bottle (+50% if picked without clearing obstacles), +50% place bottle in tray
	Three marbles → cup^† "put only the marbles in the cup"	+16.67% for each pick of a marble, +16.67% for each place of a marble into the cup
	Marbles + cable^† "put the small plastic bag of marbles into the black mesh bag, and the cable on top of the empty large plastic bag"	wire: +5% approach, +20% stable pick, +25% stable place atop plastic; marbles pouch: +5% approach, +20% pick, +25% place into mesh bag
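
As an illustration of how the additive metrics work, the sketch below scores the Pack pods → tray metric from the table: each of the 3 pods earns +3.33% for approach, +15% for grasp, and +15% for a placement touching the tray. The stage names and the `pack_pods_progress` helper are hypothetical, and mapping the trial notes to "furthest stage reached" is our interpretation rather than the evaluators' procedure.

```python
# Per-pod credit in percent, from the Pack pods -> tray row of the table.
STAGE_CREDIT = {"approach": 10 / 3, "grasp": 15.0, "place_on_tray": 15.0}
STAGE_ORDER = ["approach", "grasp", "place_on_tray"]

def pack_pods_progress(furthest_stage_per_pod):
    """Sum cumulative credit (%) given the furthest stage each pod reached."""
    total = 0.0
    for furthest in furthest_stage_per_pod:
        if furthest is None:  # pod never approached
            continue
        idx = STAGE_ORDER.index(furthest)
        # A pod that reached a stage also earns credit for all earlier stages.
        total += sum(STAGE_CREDIT[s] for s in STAGE_ORDER[: idx + 1])
    return round(total, 1)

# Two pods on the tray, third grasped but placed elsewhere (+0% for that placement).
print(pack_pods_progress(["place_on_tray", "place_on_tray", "grasp"]))  # 85.0
# One pod on the tray, the other two only approached.
print(pack_pods_progress(["place_on_tray", "approach", "approach"]))    # 40.0
# A single pod grasped and placed off the tray.
print(pack_pods_progress(["grasp", None, None]))                        # 18.3
```

These three cases reproduce the 0.85, 0.4, and 0.183 progress values reported for π₀.₅ in the Pack pods → tray trials.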
