Annotation KPIs
Benchmark Study
Key Facts
- 157h to annotate 1h of Egocentric Video with Ramblr’s AI-powered annotation pipeline
- Quality metrics: 0.94 IOU and 0.90 F1 score achieved
- 97% of 109,370 frames annotated automatically, resulting in an average annotation time of 5.2s per frame
- Diverse dataset with 14 different scene types
- Open vocabulary with 529 unique object instances
- Accurate mask and category annotations for all context-relevant objects on every frame
Test Datasets
- Collection devices: Meta Quest 3, GoPro, Vuzix Blade, Project Aria
- Datasets: Collected by Ramblr, Ego4D, Aria Pilot Dataset
- Total: 1h video duration, 529 unique objects
- Max per video: 5min video duration, 10 unique objects
Video Duration / Scene Type (% of total dataset)
- Cooking: 23%
- Sports: 18%
- Office work: 12%
- Crafting: 10%
- Social interaction: 9%
- Pet care: 7%
- Household activity: 4%
- Arts: 3%
- Eating: 3%
- Studying: 3%
- Personal hygiene: 3%
- Playing: 2%
- Lab work: 1%
- Flight training: <1%
Annotation Time
1 Hour of Egocentric Video
- Annotation time: 111.0h (70.9%)
- Review and correction: 26.3h (16.8%)
- GT annotation for QC: 19.2h (12.3%)
- Avg. annotation time per manually annotated frame: 2.5min
- Avg. annotation time per single frame: 5.2sec
- Video frames: 109,370
- Auto-annotated frames: 105,568 (97%)
- Manually annotated frames: 3,802 (3%)
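The reported shares and per-frame averages can be cross-checked from the totals. The sketch below assumes, since the page does not say so explicitly, that both averages divide the combined effort of all three phases by the respective frame count; under that assumption the published figures are reproduced after rounding.

```python
# Figures as reported in the study; variable names are illustrative only.
PHASE_HOURS = {
    "Annotation time": 111.0,
    "Review and correction": 26.3,
    "GT annotation for QC": 19.2,
}
TOTAL_FRAMES = 109_370
MANUAL_FRAMES = 3_802

# Combined effort across all three phases (about 156.5h, reported as 157h).
total_hours = sum(PHASE_HOURS.values())

# Share of each phase in the combined effort, in percent.
shares = {name: round(100 * hours / total_hours, 1)
          for name, hours in PHASE_HOURS.items()}

# Assumption: averages spread the combined effort over the frame counts.
sec_per_frame = total_hours * 3600 / TOTAL_FRAMES
min_per_manual_frame = total_hours * 60 / MANUAL_FRAMES

auto_frames = TOTAL_FRAMES - MANUAL_FRAMES

print(shares)
print(f"{sec_per_frame:.1f}s per frame, "
      f"{min_per_manual_frame:.1f}min per manually annotated frame")
print(f"{auto_frames:,} auto-annotated frames "
      f"({100 * auto_frames / TOTAL_FRAMES:.0f}%)")
```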
Quality
Average IOU and F1 Score for Dataset
- IOU: 0.94
- F1 score: 0.90
- Ground truth frames for quality control: 1,125 (1%)
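The page does not state how the two metrics are computed. One plausible reading, offered here only as an illustration, is that IOU measures mask overlap against the ground truth frames, while F1 is scored per object instance, counting a prediction as correct when its category matches and its mask overlaps a ground truth instance sufficiently. The function names, the greedy matching, and the 0.5 threshold below are all assumptions, not Ramblr's published method.

```python
def mask_iou(a, b):
    """IoU of two masks given as sets of (row, col) pixel coordinates."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def instance_f1(predicted, ground_truth, iou_threshold=0.5):
    """F1 over (category, mask) instances with greedy one-to-one matching.

    Assumption: a prediction is a true positive when an unmatched ground
    truth instance has the same category and IoU >= iou_threshold.
    """
    unmatched = list(ground_truth)
    true_pos = 0
    for category, mask in predicted:
        best, best_iou = None, iou_threshold
        for gt in unmatched:
            iou = mask_iou(mask, gt[1])
            if gt[0] == category and iou >= best_iou:
                best, best_iou = gt, iou
        if best is not None:
            unmatched.remove(best)
            true_pos += 1
    false_pos = len(predicted) - true_pos
    false_neg = len(unmatched)
    denom = 2 * true_pos + false_pos + false_neg
    return 2 * true_pos / denom if denom else 1.0


def rect(r0, c0, r1, c1):
    """Helper: rectangular mask covering rows r0..r1-1 and cols c0..c1-1."""
    return {(r, c) for r in range(r0, r1) for c in range(c0, c1)}


# Toy frame: one slightly undersized mask, one wrong category.
gt = [("cup", rect(0, 0, 4, 4)), ("phone", rect(10, 10, 14, 14))]
pred = [("cup", rect(0, 0, 4, 3)), ("laptop", rect(10, 10, 14, 14))]

print(mask_iou(pred[0][1], gt[0][1]))  # 12 of 16 pixels overlap -> 0.75
print(instance_f1(pred, gt))           # 1 TP, 1 FP, 1 FN -> 0.5
```

In the toy frame, a mislabeled category lowers F1 even though its mask is perfect, which is why the two metrics can diverge.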
Annotations
- Temporally consistent segmentation of object instances
- Multi-object tracking for all context-relevant objects
- AI-assisted annotation guidance to minimize subjective interpretation
Annotate Context-Relevant Objects
Open vocabulary
All objects the ego interacts with. Ramblr’s AI models detect hand-object interactions and gaze signals to provide annotation guidance.
Closed vocabulary
Objects always considered to be relevant, e.g. mobile phones and laptops. Ramblr’s custom-trained detection model provides annotation guidance.
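The two vocabularies can be read as a single relevance rule: an object is annotated if it belongs to the closed vocabulary or if the ego interacts with it. The sketch below is a minimal illustration of that reading. The `Detection` type, its field names, and the choice to treat either a hand interaction or a gaze signal as sufficient are hypothetical, and the closed vocabulary contains only the two examples the page names.

```python
from dataclasses import dataclass

# Only the two examples named on the page; the full list is not published.
CLOSED_VOCABULARY = {"mobile phone", "laptop"}


@dataclass
class Detection:
    """Hypothetical per-object record produced by upstream models."""
    label: str
    hand_interaction: bool = False  # hand-object interaction detected
    gaze_hit: bool = False          # gaze signal falls on the object


def is_context_relevant(detection):
    """Closed vocabulary is always relevant; otherwise require interaction.

    Assumption: either a hand interaction or a gaze signal is enough.
    """
    if detection.label in CLOSED_VOCABULARY:
        return True
    return detection.hand_interaction or detection.gaze_hit


candidates = [
    Detection("laptop"),                        # closed vocabulary
    Detection("knife", hand_interaction=True),  # open vocabulary, interacted
    Detection("chair"),                         # background, not annotated
]
print([d.label for d in candidates if is_context_relevant(d)])
```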