TY - JOUR
T1 - Automated surgical workflow recognition in privacy-preserving depth videos of the operating room
AU - Gerats, Beerend G.A.
AU - Wolterink, Jelmer M.
AU - Broeders, Ivo A.M.J.
N1 - Publisher Copyright:
© The Author(s) 2025.
PY - 2025/9
Y1 - 2025/9
N2 - Background: Efficient operating room (OR) workflows have the potential to reduce delays and cancellations, shorten patient waiting lists, and improve satisfaction among patients and staff. Insights for OR efficiency can be extracted from the registration and timing of workflow steps. However, manual registration of these steps is often unreliable. Therefore, we propose to recognize the OR workflow automatically in videos from overhead depth cameras using deep learning. In contrast to regular cameras, depth cameras do not capture fine video details that permit identification of the people recorded. Hence, the privacy of patients and staff is preserved. Methods: We gathered a video dataset of 21 laparoscopic surgeries captured by three depth cameras positioned in different corners of the OR. The procedures were annotated with four phases describing the OR workflow, i.e., turnover, anesthesia, surgery, and wrap-up. We performed an extensive analysis with spatial and temporal deep learning models, including a comparison between multi- and single-view camera setups, and contrasting post-operative with real-time predictions. Along with standard metrics for workflow recognition, we introduce a new evaluation metric that reflects the error in estimated phase duration. Results: The best-performing model, ASFormer, recognized operative phases with 99.7% mean average precision (mAP), enabling the estimation of phase duration with a mean absolute error of 35 seconds. The best-performing spatial model resulted in 89.7% mAP, indicating the importance of temporal modeling. We also found that the three cameras could be replaced by a single camera, with 98.8% mAP, although performance depends on the camera location in the OR. Additionally, we found that real-time prediction is feasible but underperforms with respect to post-operative analysis (94.3% mAP). Conclusions: Automated OR workflow recognition is possible using existing deep learning techniques based on single- and multi-camera setups. The use of privacy-preserving depth videos and a reasonably low phase duration estimation error could have positive implications for practical use.
AB - Background: Efficient operating room (OR) workflows have the potential to reduce delays and cancellations, shorten patient waiting lists, and improve satisfaction among patients and staff. Insights for OR efficiency can be extracted from the registration and timing of workflow steps. However, manual registration of these steps is often unreliable. Therefore, we propose to recognize the OR workflow automatically in videos from overhead depth cameras using deep learning. In contrast to regular cameras, depth cameras do not capture fine video details that permit identification of the people recorded. Hence, the privacy of patients and staff is preserved. Methods: We gathered a video dataset of 21 laparoscopic surgeries captured by three depth cameras positioned in different corners of the OR. The procedures were annotated with four phases describing the OR workflow, i.e., turnover, anesthesia, surgery, and wrap-up. We performed an extensive analysis with spatial and temporal deep learning models, including a comparison between multi- and single-view camera setups, and contrasting post-operative with real-time predictions. Along with standard metrics for workflow recognition, we introduce a new evaluation metric that reflects the error in estimated phase duration. Results: The best-performing model, ASFormer, recognized operative phases with 99.7% mean average precision (mAP), enabling the estimation of phase duration with a mean absolute error of 35 seconds. The best-performing spatial model resulted in 89.7% mAP, indicating the importance of temporal modeling. We also found that the three cameras could be replaced by a single camera, with 98.8% mAP, although performance depends on the camera location in the OR. Additionally, we found that real-time prediction is feasible but underperforms with respect to post-operative analysis (94.3% mAP). Conclusions: Automated OR workflow recognition is possible using existing deep learning techniques based on single- and multi-camera setups. The use of privacy-preserving depth videos and a reasonably low phase duration estimation error could have positive implications for practical use.
KW - UT-Hybrid-D
KW - Operating room efficiency
KW - Surgical workflow recognition
KW - Depth videos
UR - https://www.scopus.com/pages/publications/105012717173
U2 - 10.1007/s00464-025-12031-6
DO - 10.1007/s00464-025-12031-6
M3 - Article
C2 - 40770511
AN - SCOPUS:105012717173
SN - 0930-2794
VL - 39
SP - 5948
EP - 5956
JO - Surgical endoscopy
JF - Surgical endoscopy
IS - 9
ER -