Combining clinical exams can better predict lumbar spine radiographic instability

Our take

Can clinical physical examination tests accurately diagnose lumbar spine instability in patients with chronic low back pain?

None of the five clinical instability tests studied had adequate accuracy to rule in or rule out lumbar radiographic instability on their own. A combined model using body weight, lumbar lordosis, and the prone segmental instability test offered only a modest improvement over individual tests.


Primary study, 202 participants, moderate evidence

Key points

All five clinical tests (aberrant movement, passive lumbar extension, prone segmental instability, H and I, and Pheasant) had positive likelihood ratios between 0.59 and 1.28, far too close to 1 for any single test to reliably confirm or exclude instability.
The H and I test had the highest specificity (82.2%) but very low sensitivity (22.8%), making it a poor standalone diagnostic tool.
A three-variable model combining weight, lumbar lordosis, and the prone segmental instability test raised the diagnostic odds ratio from near 1 for the individual tests to about 3.8, but accuracy remained only moderate (0.66).
Higher lumbar lordosis was significantly associated with instability; greater body weight was inversely associated, possibly reflecting better muscular stabilisation.
The study found a radiographic lumbar instability prevalence of 62.87% in this general chronic low back pain outpatient sample.
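
The likelihood ratios and diagnostic odds ratios quoted in this summary follow directly from sensitivity and specificity. As a minimal sketch (the function name is illustrative and not taken from the paper), the standard formulas can be applied to the H and I test point estimates:

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-, DOR) from sensitivity and specificity given as fractions."""
    lr_pos = sensitivity / (1.0 - specificity)   # how much a positive result raises the odds
    lr_neg = (1.0 - sensitivity) / specificity   # how much a negative result lowers the odds
    dor = lr_pos / lr_neg                        # diagnostic odds ratio
    return lr_pos, lr_neg, dor


# H and I test point estimates as reported in this summary
lr_pos, lr_neg, dor = likelihood_ratios(0.228, 0.822)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}, DOR = {dor:.2f}")
```

With these inputs the LR+ comes out at about 1.28 and the DOR at about 1.36, in line with the reported 1.28 and 1.37; the small gap is expected because the inputs are rounded percentages. Values this close to 1 mean the test result barely changes the odds of instability.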

How it was conducted

Design: Prospective diagnostic cross-sectional study following STARD guidelines
Participants: 202 adults with chronic low back pain (more than 3 months) recruited from an outpatient orthopaedic centre and physiotherapy clinic, November 2019 to August 2020
Index tests: Aberrant movement, passive lumbar extension (PLE), prone segmental instability (PSI), H and I instability test, and Pheasant test
Reference standard: Standing flexion-extension radiography; instability defined as translation >15% of the vertebral body and/or rotation >15 degrees at L1-L4, >20 degrees at L4-L5, or >25 degrees at L5-S1
Model development: Multiple logistic regression with forward selection; internal validation by bootstrapping
Primary outcome: Sensitivity, specificity, positive and negative likelihood ratios, diagnostic odds ratio, and accuracy of each test and the combined model
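
The reference standard above is a threshold rule, which can be sketched in code. This is an illustration only, under two assumptions that the summary does not state: that "L1-L4" covers the segments L1-L2, L2-L3 and L3-L4, and that a patient counts as unstable if any one segment exceeds a threshold. All names and the example measurements are hypothetical.

```python
# Rotation thresholds in degrees per motion segment.
# Assumption: "L1-L4" is read as the segments L1-L2, L2-L3 and L3-L4.
ROTATION_THRESHOLD_DEG = {
    "L1-L2": 15.0,
    "L2-L3": 15.0,
    "L3-L4": 15.0,
    "L4-L5": 20.0,
    "L5-S1": 25.0,
}
# Translation threshold, as a percentage of the vertebral body.
TRANSLATION_THRESHOLD_PCT = 15.0


def segment_unstable(segment, translation_pct, rotation_deg):
    """True if either the translation or the rotation threshold is exceeded."""
    return (translation_pct > TRANSLATION_THRESHOLD_PCT
            or rotation_deg > ROTATION_THRESHOLD_DEG[segment])


def patient_unstable(measurements):
    """measurements: iterable of (segment, translation_pct, rotation_deg).

    Assumption: instability at any single segment classifies the patient as unstable.
    """
    return any(segment_unstable(*m) for m in measurements)


# Hypothetical measurements, not data from the study
example = [("L3-L4", 4.0, 9.0), ("L4-L5", 6.0, 21.0), ("L5-S1", 3.0, 21.0)]
print(patient_unstable(example))
```

In this made-up example the 21 degrees of rotation exceeds the L4-L5 threshold but not the L5-S1 threshold, which shows why the level has to be recorded alongside each measurement.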

What they found

Aberrant movement test: sensitivity 51.2% (95% CI 42.2%-60.1%), specificity 40.5% (95% CI 29.3%-52.6%), LR+ 0.86 (95% CI 0.66-1.12), DOR 0.71 (95% CI 0.40-1.28), accuracy 0.46 (95% CI 0.39-0.53).
Passive lumbar extension test: sensitivity 29.1% (95% CI 21.4%-37.9%), specificity 50.7% (95% CI 38.9%-62.4%), LR+ 0.59 (95% CI 0.41-0.85), DOR 0.42 (95% CI 0.23-0.76), accuracy 0.40 (95% CI 0.33-0.47).
Prone segmental instability test: sensitivity 52.8% (95% CI 43.7%-61.8%), specificity 52.1% (95% CI 39.9%-64.1%), LR+ 1.10 (95% CI 0.82-1.47), DOR 1.22 (95% CI 0.68-2.18), accuracy 0.52 (95% CI 0.45-0.60).
H and I test: sensitivity 22.8% (95% CI 15.9%-31.1%), specificity 82.2% (95% CI 71.5%-90.2%), LR+ 1.28 (95% CI 0.72-2.29), DOR 1.37 (95% CI 0.66-2.83), accuracy 0.53 (95% CI 0.47-0.58).
Pheasant test: sensitivity 52.4% (95% CI 43.3%-61.3%), specificity 49.3% (95% CI 37.4%-61.3%), LR+ 1.03 (95% CI 0.78-1.37), DOR 1.07 (95% CI 0.60-1.91), accuracy 0.51 (95% CI 0.44-0.58).
Final three-variable model (weight + lumbar lordosis + PSI): sensitivity 61.3% (95% CI 52.6%-70.0%), specificity 70.4% (95% CI 60.9%-80.0%), PPV 78.4% (95% CI 71.6%-85.1%), NPV 51.0% (95% CI 42.5%-59.6%), LR+ 2.07 (95% CI 1.08-3.06), LR- 0.55 (95% CI 0.40-0.69), DOR 3.78 (95% CI 1.19-6.34), accuracy 0.66 (95% CI 0.59-0.73), AUC 0.66 (95% CI 0.58-0.74).
Lumbar lordosis was significantly associated with instability (p<0.01); body weight was inversely associated (p=0.03).
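
To see what the combined model's likelihood ratios mean for an individual patient, the pre-test probability can be converted to odds, multiplied by the likelihood ratio, and converted back. The sketch below assumes the sample prevalence reported in this summary (62.87%) as the pre-test probability and uses the rounded LR+ and LR- of the three-variable model; the function name is illustrative.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Update a probability with a likelihood ratio via the odds form of Bayes' rule."""
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


prevalence = 0.6287  # radiographic instability prevalence in this sample

p_after_positive = post_test_probability(prevalence, 2.07)  # model LR+
p_after_negative = post_test_probability(prevalence, 0.55)  # model LR-

print(f"After a positive model result: {p_after_positive:.1%}")
print(f"After a negative model result: {p_after_negative:.1%}")
```

Under these assumptions a positive result raises the probability of instability from about 63% to roughly 78%, close to the reported PPV, while a negative result lowers it only to roughly 48%. That is why the model cannot rule instability out on its own. In a setting with a different prevalence the post-test probabilities would shift accordingly.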

Limitations

Inter-rater and test-retest reliability of the clinical tests were not assessed, so measurement error cannot be estimated.
Lumbar lordosis was measured from radiography, which adds radiation exposure; the authors note a flexible-ruler method could substitute but was not validated here.
The predictive model was built and tested on the same sample; bootstrapping provides only internal validation, so external generalisability is unknown.
The study was conducted at a single outpatient centre in Iran, and results may not generalise to other clinical settings or populations.

Why it matters

For patients: A standard physical examination alone cannot confirm or exclude lumbar instability in chronic low back pain, so referral for flexion-extension X-ray may still be needed for a definitive answer.
For clinicians: None of the five commonly used lumbar instability tests provides meaningful diagnostic value in isolation; a model incorporating lumbar lordosis angle and body weight alongside the prone segmental instability test offers a small incremental gain but is not yet sufficient for confident clinical decision-making.
For readers: This study challenges the assumed utility of widely taught lumbar instability clinical tests and highlights the need for better diagnostic criteria before these tests can be recommended in evidence-based practice guidelines.

Source

doi:10.1016/j.msksp.2022.102504
