Worker Posture Analysis

Evaluating computer vision-based markerless motion capture systems for real-world worker posture analysis and ergonomic assessment.

Computer Vision, Pose Estimation, Ergonomics, Deep Learning

Evaluation of Computer Vision-Based Markerless Motion Capture for Worker Posture Assessment

Wang J., Guzowski T., Barnett A., Fethke N., Baek S.

IISE Transactions on Occupational Ergonomics and Human Factors, 2026 (submitted)

Figure: Multi-camera pose estimation comparison

Participants: 46 (recruited for the study)
Median RMSE: 15.07° (across all joint angles)
Camera systems: 3 (monocular, stereo, monocular+depth)
Observations: 3,832 (analyzed across all configurations)

Overview

Can markerless motion capture replace traditional marker-based systems for workplace ergonomic assessment?

Goal: Evaluate pose estimation accuracy for real-world industrial posture analysis

Problem: Traditional marker-based systems require specialized equipment and controlled environments

Method: Tested modern pose estimation algorithms in unconstrained workplace settings

Challenges

1. Limited accuracy of pose estimation in real-world conditions
2. Lack of ground truth data for industrial settings
3. Varying camera configurations and viewpoints
4. Occlusions and complex movements in workplace environments

Methodology

The study compared three camera configurations against a gold-standard 10-camera optical motion capture system in a controlled reaching task with 46 participants.

Participants

Twenty-two males (age: 34.8 ± 15.5 years, height: 1.79 ± 0.08 m) and 24 females (age: 31.5 ± 10.9 years, height: 1.64 ± 0.07 m) were recruited from the University of Iowa community. All were at least 18 years of age and reported no recent upper extremity injury or pain.

Experimental Procedure

Participants performed a repetitive reaching task with the dominant arm, placing a pin into each of 16 holes spaced at 22.5° intervals around a ring. Posture variation was defined as "low" (47 cm ring at 50% reach, waist height) or "high" (85–100 cm ring at 75% reach, knee height). Two trials per condition were performed in randomized sequences.

Camera Systems

Three configurations were evaluated: (1) monocular, with MediaPipe (v0.10.10) applied to a Zed 2i camera for 3D measurement; (2) stereo, with Stereolabs' proprietary 3D body tracking (SDK v3.8.2) on the Zed 2i; and (3) monocular+depth, with MediaPipe on an Intel RealSense D455 providing 2D coordinates combined with depth. A 10-camera OptiTrack Flex 13 system sampling at 120 Hz provided the reference data. The cameras under test were positioned at 45° angles on the participant's dominant and non-dominant sides, at a distance of 2.4 m.

Data Processing

Joint angles were calculated using the cosine rule from key points generated by each system for trunk, shoulder, elbow, and knee. Motion capture data were low-pass filtered (4th-order Butterworth, 6 Hz cutoff) and down-sampled to 30 Hz. Synchronization was achieved by manual labeling of start/end frames refined by minimizing RMSE.
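The two steps described above, a cosine-rule joint angle and an RMSE-minimizing alignment, can be sketched as follows. This is a minimal illustration, not the study's code: the function names, the `max_lag` search window, and the `min_overlap` guard are all assumptions, and the search is shown as a simple integer frame shift applied after both signals are at the same sampling rate.

```python
import math


def joint_angle_deg(a, b, c):
    """Angle at key point b (degrees) between segments b->a and b->c.

    Uses the law of cosines on the triangle a-b-c. Points are 2D or 3D
    coordinate tuples.
    """
    ab = math.dist(a, b)
    cb = math.dist(c, b)
    ac = math.dist(a, c)
    if ab == 0 or cb == 0:
        raise ValueError("degenerate segment: key points coincide")
    cos_theta = (ab**2 + cb**2 - ac**2) / (2 * ab * cb)
    # Clamp to [-1, 1] to absorb floating-point round-off before acos.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))


def rmse(x, y):
    """Root-mean-square error between two equal-length angle series."""
    if len(x) != len(y) or not x:
        raise ValueError("series must be non-empty and of equal length")
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(x, y)) / len(x))


def best_lag(reference, estimate, max_lag, min_overlap=5):
    """Find the frame shift of `estimate` that minimizes RMSE vs `reference`.

    A positive lag means the estimate starts `lag` frames late, so
    estimate[i + lag] is compared with reference[i]. `max_lag` and
    `min_overlap` are hypothetical tuning parameters for this sketch.
    """
    best = None
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            ref, est = reference, estimate[lag:]
        else:
            ref, est = reference[-lag:], estimate
        n = min(len(ref), len(est))
        if n < min_overlap:
            continue  # too little overlap to trust the error value
        err = rmse(ref[:n], est[:n])
        if best is None or err < best[1]:
            best = (lag, err)
    return best


# Elbow angle from shoulder, elbow, and wrist key points (illustrative values).
elbow = joint_angle_deg((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0))

# An estimate that trails the reference by two frames.
reference = [0, 1, 2, 3, 4, 5, 4, 3, 2, 1, 0]
estimate = [9, 9] + reference
lag, err = best_lag(reference, estimate, max_lag=3)
```

In practice the manually labeled start and end frames would set the initial alignment, and a search like `best_lag` would only refine it within a small window.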

Approach

1. Compared multiple state-of-the-art pose estimation algorithms
2. Evaluated single-camera vs multi-camera configurations
3. Tested in both controlled lab and real workplace settings
4. Developed metrics for biomechanical accuracy assessment

Results & Demos

Demo media: Real-world workplace analysis; posture tracking results

Findings

Analysis of 3,832 observations revealed significant effects of camera configuration, posture variation, and viewpoint on joint angle measurement accuracy.

Overall Error

Across all observations, the median RMSE was 15.07° (IQR: 9.63°–26.49°). Each main effect (posture variation, camera configuration, view) and two-way interactions were statistically significant (p < 0.01 for all effects).
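A summary of this form (median plus interquartile range over per-observation RMSE values) could be computed as in the sketch below. The input values are made up for illustration and are not study data, and the quartile method the authors used is not stated here; the "inclusive" method is an assumption.

```python
import statistics


def summarize_rmse(values):
    """Return the median and quartiles of a list of RMSE values (degrees).

    Assumption: quartiles use the "inclusive" interpolation method; other
    methods can give slightly different Q1 and Q3 on small samples.
    """
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"median": q2, "q1": q1, "q3": q3}


# Illustrative per-observation RMSE values in degrees (not study data).
example = [8.0, 10.0, 12.0, 20.0, 30.0]
summary = summarize_rmse(example)
print(f"median {summary['median']:.2f}° (IQR: {summary['q1']:.2f}°–{summary['q3']:.2f}°)")
```

Reporting the median with the IQR rather than a mean with a standard deviation suits error distributions like this one, where the upper quartile sits much farther from the median than the lower quartile does.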

RMSE by Camera Configuration

The monocular configuration produced the lowest median RMSE, while the monocular+depth configuration showed the highest. Stereo had a median comparable to monocular (13.33° vs. 12.96°), but its upper quartile was considerably higher (26.17° vs. 19.70°).

Configuration	Median RMSE	IQR
Monocular (MediaPipe)	12.96°	8.72°–19.70°
Stereo (Zed 2i)	13.33°	8.80°–26.17°
Monocular+Depth (RealSense)	21.07°	13.11°–33.48°

Posture Variation Effects

Higher posture variation increased median RMSE from 11.83° (low) to 17.54° (high) across all configurations. The difference was smallest for monocular (~2°), followed by stereo (~7°) and monocular+depth (~12°). Mean-adjusted analysis revealed a three-fold increase in effect size, indicating camera-specific baselines had masked the true impact of movement complexity.

Viewpoint Effects

The dominant-side camera view produced a median RMSE of 13.27°, versus 16.67° for the non-dominant view. The non-dominant view increased RMSE by ~4° for monocular and ~9° for monocular+depth, but by less than 0.2° for stereo.

Joint-Specific Error

Joint type was the largest source of error variability. After mean adjustment, the hierarchy was: elbow (24.82°) > trunk (9.51°) > shoulder (7.62°) > knee (6.45°). Camera configuration showed a modest decrease in effect size with mean adjustment, suggesting ~11% of apparent performance differences were attributable to systematic biases.

Key Outcomes

Identified optimal camera configurations for workplace deployment
Quantified accuracy limitations of current pose estimation methods
Established guidelines for practical implementation
Submitted findings to a peer-reviewed journal (IISE Transactions on Occupational Ergonomics and Human Factors)

Discussion

The results provide practical guidance for deploying computer vision-based posture assessment in occupational settings.

Monocular as Default

Despite the intuitive expectation that stereo or depth-augmented systems would outperform monocular approaches, the monocular MediaPipe configuration produced the lowest overall error. This suggests that mature 2D-to-3D lifting algorithms can compensate for the absence of direct depth measurement, at least for the joint angles and movement patterns evaluated in this study.

Camera Placement Matters

The dominant-side oblique view produced lower errors than the non-dominant view overall, although the size of the effect depended on the configuration: roughly 4° for monocular and 9° for monocular+depth, but less than 0.2° for stereo. This finding has direct implications for workplace deployment: positioning the camera on the active side of the worker at a 45° angle provides the best joint angle estimates, particularly for the MediaPipe-based monocular and monocular+depth configurations, which were the most sensitive to viewpoint.

Practical Deployment Considerations

Off-the-shelf computer vision systems can support occupational posture assessment when deployed with appropriate expectations. The median RMSE of 15° is within acceptable ranges for many ergonomic screening applications, though joint-specific limitations (particularly at the elbow) should be considered. Movement complexity significantly affects accuracy, suggesting that dynamic, high-variation tasks may require more robust tracking solutions or multi-camera setups.
