JOURNAL OF NEUROSURGERY: SPINE | 2024
Development and validation of an artificial intelligence model to accurately predict spinopelvic parameters
Edward S. Harake¹, Joseph R. Linzey¹, Cheng Jiang¹, Rushikesh S. Joshi¹, Mark M. Zaki¹, Jaes C. Jones¹, Siri S. Khalsa², John H. Lee¹, Zachary Wilseck¹, Jacob R. Joseph¹, Todd C. Hollon¹, and Paul Park³
¹University of Michigan ²Ohio State University ³University of Tennessee
Abstract
Objective. Achieving appropriate spinopelvic alignment has been shown to be associated with improved clinical symptoms. However, measuring spinopelvic radiographic parameters is time-intensive, and interobserver reliability is a concern. Automated measurement tools promise rapid and consistent measurements, but existing tools still require some degree of manual user entry. This study presents a novel artificial intelligence (AI) tool, SpinePose, that automatically predicts spinopelvic parameters with high accuracy without the need for manual entry.
Methods. SpinePose was trained and validated on 761 sagittal whole-spine X-rays to predict sagittal vertical axis (SVA), pelvic tilt (PT), pelvic incidence (PI), sacral slope (SS), lumbar lordosis (LL), T1-pelvic angle (T1PA), and L1-pelvic angle (L1PA). A separate test set of 40 X-rays was labeled by 4 reviewers, including fellowship-trained spine surgeons and a fellowship-trained radiologist with neuroradiology subspecialty certification. Median errors relative to the most senior reviewer were calculated to determine model accuracy on test images. Intraclass correlation coefficients (ICC) were used to assess inter-rater reliability.
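The error summary described in the Methods can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that "error" means the absolute difference between the model's value and the most senior reviewer's value for the same test image, and that the interquartile range is reported as a single width (Q3 minus Q1) computed with linear interpolation between sorted values. The function name `median_iqr_abs_error` is hypothetical.

```python
import statistics
from typing import Sequence, Tuple


def median_iqr_abs_error(
    predicted: Sequence[float], reference: Sequence[float]
) -> Tuple[float, float]:
    """Return (median, IQR) of absolute errors between paired measurements.

    Hypothetical helper: `predicted` holds model outputs and `reference` holds the
    senior reviewer's values for the same test images, in the same order.
    Requires at least two paired measurements.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must be paired one-to-one")
    # Absolute error per test image (assumed definition of "error").
    errors = [abs(p - r) for p, r in zip(predicted, reference)]
    # method="inclusive" interpolates linearly between sorted errors.
    q1, _, q3 = statistics.quantiles(errors, n=4, method="inclusive")
    return statistics.median(errors), q3 - q1
```

For example, absolute errors of 1, 2, 3, 4, and 5 mm give a median of 3.0 mm and an IQR of 2.0 mm.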
Results. SpinePose exhibited the following median (interquartile range) parameter errors: SVA: 2.2 mm (2.3 mm), p = 0.93; PT: 1.3° (1.2°), p = 0.48; SS: 1.7° (2.2°), p = 0.64; PI: 2.2° (2.1°), p = 0.24; LL: 2.6° (4.0°), p = 0.89; T1PA: 1.1° (0.9°), p = 0.42; and L1PA: 1.4° (1.6°), p = 0.49. Model predictions also exhibited excellent reliability across all parameters (ICC 0.91 to 1.0).
Conclusions. SpinePose accurately predicted spinopelvic parameters with excellent reliability comparable to fellowship-trained spine surgeons and neuroradiologists. Utilization of predictive AI tools in spinal imaging can substantially aid in patient selection and surgical planning.
SpinePose overview
SpinePose training pipeline. (A) Standing whole-spine X-rays acquired at a single academic institution were identified with EMERSE, an intra-institutional free-text search tool, and subsequently processed by the University of Michigan Radiology IT department. (B) After image pre-processing, a senior neurosurgery resident annotated each image with 9 spinal keypoints in total, located at C7, T1, L1, S1, and both femoral heads. Bounding boxes were also drawn around the L1 and S1 levels, because the keypoints at these levels were predicted by dedicated convolutional neural networks (CNNs). (C) Each input image was passed through 3 parallel CNNs: the L1 model, the S1 model, and the R model. The L1 and S1 models used a "top-down", region-based approach, whereas the R model used a "bottom-up" approach. Each model output a set of predicted keypoint masks whose coordinates were compared with the ground-truth keypoints to compute a keypoint loss (L_Keypoint). In addition to L_Keypoint, the L1 and S1 models predicted a bounding box around their respective spinal level, with losses for the box coordinates (L_Box) and the box classification (L_Class). Decreasing loss values over the course of training corresponded to more accurate model predictions. (D) After training and model optimization, the outputs of the 3 models were combined into 1 aggregate output, from which the spinopelvic parameters of interest were calculated automatically.
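Step (D) can be illustrated by computing the parameters from keypoint coordinates using their standard radiographic definitions. The source does not list the 9 keypoints or describe image calibration, so everything below is an assumption: the keypoint names (`C7`, `T1`, `L1`, `L1_ant`, `L1_post`, `S1_ant`, `S1_post`, `FH_left`, `FH_right`), the use of vertebral centroids for C7, T1, and L1, the orientation (patient facing the right edge of the image, so +x is anterior), and a known pixel spacing `mm_per_px` for SVA. The function `spinopelvic_parameters` is hypothetical and is a sketch, not the SpinePose implementation.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]  # (x, y) image pixels, y increasing downward


def _up(p: Point) -> Point:
    # Flip y so that +y points cranially; this keeps the angle math intuitive.
    return (p[0], -p[1])


def _mid(a: Point, b: Point) -> Point:
    return ((a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0)


def _sub(a: Point, b: Point) -> Point:
    return (a[0] - b[0], a[1] - b[1])


def _angle(u: Point, v: Point) -> float:
    # Unsigned angle between vectors u and v, in degrees (0 to 180).
    dot = u[0] * v[0] + u[1] * v[1]
    cross = u[0] * v[1] - u[1] * v[0]
    return math.degrees(math.atan2(abs(cross), dot))


def spinopelvic_parameters(kp: Dict[str, Point], mm_per_px: float) -> Dict[str, float]:
    """Compute parameters from hypothetical keypoints (assumes +x is anterior)."""
    p = {name: _up(xy) for name, xy in kp.items()}
    hip_axis = _mid(p["FH_left"], p["FH_right"])  # midpoint of femoral head centers
    s1_mid = _mid(p["S1_ant"], p["S1_post"])      # midpoint of S1 superior endplate
    s1_plate = _sub(p["S1_ant"], p["S1_post"])    # endplate vector, posterior to anterior

    # SS: endplate inclination to horizontal, positive when the anterior corner is lower.
    ss = math.degrees(math.atan2(-s1_plate[1], s1_plate[0]))
    # PT: angle from vertical to the hip-axis-to-S1-midpoint line,
    # positive when the S1 midpoint lies posterior to the hip axis.
    to_s1 = _sub(s1_mid, hip_axis)
    pt = math.degrees(math.atan2(-to_s1[0], to_s1[1]))
    # PI: angle between the caudally directed endplate normal and the S1-midpoint-to-hip-axis line.
    normal = (s1_plate[1], -s1_plate[0])
    pi_angle = _angle(normal, _sub(hip_axis, s1_mid))
    # LL: Cobb angle between the L1 and S1 superior endplates (magnitude only).
    ll = _angle(_sub(p["L1_ant"], p["L1_post"]), s1_plate)
    # T1PA / L1PA: angle at the hip axis between the line to T1 (or L1) and the line to the S1 midpoint.
    t1pa = _angle(_sub(p["T1"], hip_axis), to_s1)
    l1pa = _angle(_sub(p["L1"], hip_axis), to_s1)
    # SVA: horizontal offset of the C7 plumb line from the S1 posterosuperior corner (+ = anterior).
    sva = (p["C7"][0] - p["S1_post"][0]) * mm_per_px
    return {"SVA": sva, "PT": pt, "PI": pi_angle, "SS": ss, "LL": ll, "T1PA": t1pa, "L1PA": l1pa}
```

With these sign conventions the geometric identity PI = PT + SS holds for typical standing geometry, which makes a convenient sanity check on automated outputs. LL, T1PA, and L1PA are returned as magnitudes only in this sketch.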
Visualizing predictions
Images A and B show a whole-spine and a lumbosacral X-ray, respectively, without spinal instrumentation. Images C and D show the same two views with spinal instrumentation.
ICC heatmap
An intraclass correlation coefficient (ICC) was calculated for each parameter between each pair of raters. On a scale of 0 to 1, the ICC reflects how closely two raters' measurements of the same images agree, with values closer to 1 indicating stronger agreement. SpinePose (AI) shows excellent reliability at all parameters when compared with the ground truth (GT) as well as with each of the 3 remaining raters.
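A pairwise ICC matrix like the one in the heatmap could be computed along the following lines. The source does not state which ICC form was used; this sketch assumes the two-way random-effects, absolute-agreement, single-rater form, ICC(2,1) in Shrout and Fleiss's notation. The names `icc_2_1` and `pairwise_icc`, and the rater labels in any example, are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def icc_2_1(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(2,1) for an n_subjects x k_raters table (two-way random, absolute agreement).

    Undefined (division by zero) if every rating in the table is identical.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Two-way ANOVA sums of squares: subjects (rows), raters (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)


def pairwise_icc(by_rater: Dict[str, List[float]]) -> Dict[Tuple[str, str], float]:
    """ICC(2,1) for every pair of raters measuring the same images for one parameter."""
    out = {}
    for a, b in combinations(by_rater, 2):
        table = list(zip(by_rater[a], by_rater[b]))  # one row per image, one column per rater
        out[(a, b)] = icc_2_1(table)
    return out
```

Because ICC(2,1) penalizes systematic offsets, two raters who differ by a constant amount on every image score below 1 even though their values are perfectly correlated; a consistency form such as ICC(3,1) would not penalize that offset.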