Evidence and recommendations for the use of segmental motion testing for patients with LBP: a systematic review

Our take

Are hands-on segmental motion tests of the lumbar spine valid and reliable enough to guide clinical decisions in patients with low back pain?

The evidence on lumbar segmental motion tests is poor overall, and no single test can be strongly recommended in isolation. While specificity is generally high, sensitivity is too low to rule out pathology, and agreement between examiners is mostly below clinically acceptable thresholds.


Systematic review | 13 studies | Limited evidence

Key points

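Thirteen studies covered three test types: passive accessory intervertebral movements (PAIVMs), passive physiological intervertebral movements (PPIVMs), and the prone instability test (PIT)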
Specificity was generally high for PAIVMs and PPIVMs, but sensitivity was consistently poor, making them weak for ruling out lumbar instability
Inter-rater reliability for mobility testing was overwhelmingly poor (most kappa values below 0.4) across both PAIVM and PPIVM studies
Pain provocation as a test outcome showed better reliability than mobility judgement in several studies
The PIT showed the most consistent reliability, with four of six studies exceeding the clinical relevance threshold (kappa 0.54 to 0.87)
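
To make the kappa figures above easier to read, the following is a minimal sketch of unweighted Cohen's kappa for two examiners. It is an illustration only: the included studies may have used weighted kappa or other agreement statistics, and the ratings, the labels "hypo" and "normal", and the function name `cohens_kappa` are hypothetical and not taken from the review.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters judging the same patients."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must judge the same non-empty set of patients")
    n = len(rater_a)

    # Observed agreement: share of patients given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of each rater's marginal proportions per label.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    labels = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)

    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical mobility judgements for 10 patients (not data from the review).
rater_a = ["hypo"] * 4 + ["normal"] * 6
rater_b = ["hypo", "hypo", "normal", "normal", "hypo", "hypo"] + ["normal"] * 4

kappa = cohens_kappa(rater_a, rater_b)
print(f"kappa = {kappa:.2f}")
```

In this made-up example the two examiners agree on 6 of 10 patients, yet kappa is only about 0.17, because 52% agreement would be expected by chance alone. This is why raw percentage agreement can look acceptable while kappa falls below 0.4.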

How it was conducted

Design: Systematic review (PRISMA-DTA)
Databases: PubMed, LIVIVO, and Cochrane Library (searched September 2019)
Studies included: 13 primary studies
Tests evaluated: PAIVMs, PPIVMs, and the prone instability test (PIT)
Quality appraisal: QUADAS-2 for diagnostic accuracy studies; adapted QAREL for reliability studies
Meta-analysis: Not conducted due to clinical and statistical heterogeneity across studies

What they found

PPIVM specificity for detecting lumbar instability was 0.99-1.00; sensitivity was extremely poor at 0.03-0.07; positive likelihood ratios ranged from 4.82 to 26.80 and negative likelihood ratios from 0.93 to 0.98 (Abbott et al. 2005)
PAIVM specificity for detecting lumbar instability ranged from 0.81 to 0.95; sensitivity ranged from 0.17 to 0.46; positive likelihood ratios were predominantly moderate (2.42-9.00); negative likelihood ratios were poor (0.60-0.88) (Abbott et al. 2005; Fritz et al. 2005)
Combined PAIVMs and PPIVMs for detecting painful segments: sensitivity 0.94 and specificity 1.00 when verbal pain response was permitted, which gives a negative likelihood ratio of about 0.06 and a positive likelihood ratio that is unbounded because specificity reached 1.00; mobility judgement alone gave sensitivity 0.53, specificity 0.80 (Phillips and Twomey 1996)
PAIVM inter-rater reliability for mobility was overwhelmingly poor: kappa values ranged from -0.02 to 0.48 across 10 studies; the exception was Landel et al. (kappa 0.71), where examiners only had to agree on the least mobile segment
PPIVM inter-rater reliability ranged from kappa -0.11 to 0.32; intra-rater reliability for flexion testing was kappa 0.31 (Qvistgaard et al. 2007)
PIT inter-rater reliability ranged from kappa 0.27 to 0.87 across six studies; four of six studies exceeded the clinical relevance threshold; studies of patients with a current LBP complaint achieved kappa 0.67-0.87, while studies of chronic or recurrent LBP populations yielded kappa 0.27-0.71
PAIVMs for detecting painful segments showed inter-rater kappa ranging from -0.14 to 0.69 across studies; ICC values for pain intensity at individual lumbar levels ranged from 0.61 to 0.69 (Maher and Adams 1994)
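
The likelihood ratios quoted above follow from sensitivity and specificity through the standard definitions LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity. The sketch below applies those definitions to the Phillips and Twomey (1996) figures reported in this summary. The function name `likelihood_ratios` is hypothetical, and values recomputed from rounded sensitivity and specificity will not necessarily match ratios the original studies derived from raw counts.

```python
import math


def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) from sensitivity and specificity given as proportions."""
    if not (0.0 <= sensitivity <= 1.0 and 0.0 <= specificity <= 1.0):
        raise ValueError("sensitivity and specificity must lie between 0 and 1")

    false_positive_rate = 1.0 - specificity
    false_negative_rate = 1.0 - sensitivity

    # LR+ is unbounded when there are no false positives (specificity of 1.00).
    if false_positive_rate == 0.0:
        lr_pos = math.inf if sensitivity > 0.0 else math.nan
    else:
        lr_pos = sensitivity / false_positive_rate

    # LR- is unbounded when specificity is 0.
    if specificity == 0.0:
        lr_neg = math.inf if false_negative_rate > 0.0 else math.nan
    else:
        lr_neg = false_negative_rate / specificity

    return lr_pos, lr_neg


# Mobility judgement alone (Phillips and Twomey 1996, as reported above).
lr_pos, lr_neg = likelihood_ratios(0.53, 0.80)
print(f"mobility only: LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")

# Verbal pain response permitted: specificity 1.00 makes LR+ unbounded.
lr_pos_pain, lr_neg_pain = likelihood_ratios(0.94, 1.00)
print(f"with pain response: LR+ = {lr_pos_pain}, LR- = {lr_neg_pain:.2f}")
```

An LR- close to 1, as with the PPIVM values of 0.93 to 0.98, means a negative test barely changes the probability of instability, which is the arithmetic behind the statement that these tests cannot rule it out.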

Limitations

Abstract screening was conducted by only one reviewer, increasing the risk of missed studies
Electronic search was restricted to three databases and four languages, potentially excluding relevant literature
The adapted QAREL domains modelled on QUADAS-2 lack validated evidence for their structure, which may affect risk-of-bias ratings
A high proportion of the included reliability studies (7 of 12) received an 'unclear risk of bias' rating for rater blinding because they did not report whether examiners were blinded to clinical information

Why it matters

For patients: Patients should be aware that a clinician's hands-on finding of a stiff or unstable spinal level may not reliably reflect what is actually happening in their spine, and treatment decisions based solely on these tests should be treated with caution.
For clinicians: Clinicians should avoid making diagnostic or management decisions based on a single segmental motion test in isolation, and should instead integrate findings with other clinical information or use test batteries, particularly the PIT, which showed the most consistent inter-rater agreement.
For readers: This review highlights a persistent gap between widespread clinical use of lumbar segmental motion tests and the weak psychometric evidence supporting them, calling for better-designed studies and standardised testing protocols.

Source

doi:10.1016/j.msksp.2019.102076

