Orthopedic test values

May 18, 2021

Did you know that the Jobe test, or "empty can test", is a poor test for diagnosing an injury to the supraspinatus muscle? Did you know that the Lasègue test is even poorer for confirming the presence of a lumbar radiculopathy?

Although these two tests were taught to us in our initial training, and are still taught in many institutes, they are of very limited use for ruling in or ruling out a potential musculoskeletal pathology.

The orthopedic tests we use every day are important because they are part of our clinical reasoning, an essential cognitive process that allows us to develop the most appropriate strategy to help our patient. They are also necessary for our communication with other practitioners and with our patients.

In practice, they help answer two questions, if nothing else: is this patient for me or not? Is this an emergency or not? These are perhaps the most important questions of all! They also feed a fascinating debate in the musculoskeletal field: once a "serious" pathology (in other words, the red flags) has been ruled out, is it really so important to pursue a diagnosis at all costs, or is it more relevant to establish a prognosis in order to best help the patient with their function and quality of life?[1]

According to a 2016 analysis published in the BMJ, medical error is the third leading cause of death in the USA![2] Among these medical errors we naturally find diagnostic error. In 2009, Schiff and his team published a report[3] in which, out of 583 reported diagnostic errors, 28% were considered major, 41% moderate and 31% minor. A significant proportion of these errors were attributed to errors of judgment by the clinician during clinical reasoning (32%) or during the clinical examination (10%).

In order to improve our clinical precision and reduce our cognitive biases, it is preferable to arm ourselves with tests that have value, a value established by studies of high methodological quality.

Sensitivity and specificity

A test can be excellent at ruling in a condition but poor at ruling it out. It is then informative when positive but tells us very little when negative. And vice versa.

We therefore need to quantify the errors the test makes in a sick population (one previously diagnosed by a reference method) and the errors it makes in a so-called healthy population. In a sick population, how many sick people will the test correctly flag, and how many false negatives will fall through the cracks? The proportion of sick people who test positive is the sensitivity (Se) of the test.

In a non-diseased population, how many people will correctly test negative, and how many false positives will test positive even though they are healthy? The proportion of healthy people who test negative is the specificity (Sp) of the test.

With these two values, we can evaluate the risk that a test produces false positives and/or false negatives.
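The two definitions can be sketched in a few lines of Python. This is a minimal illustration, assuming a 2x2 table of counts obtained by comparing the test with a reference. The function names and the counts are hypothetical and do not come from any of the cited studies.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Proportion of sick people (per the reference) who test positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Proportion of healthy people (per the reference) who test negative."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts, for illustration only.
se = sensitivity(true_pos=80, false_neg=20)  # 80 of 100 sick people detected
sp = specificity(true_neg=60, false_pos=40)  # 60 of 100 healthy people cleared
print(f"Se = {se:.0%}, Sp = {sp:.0%}")
```

Note that each value is computed within one group only: sensitivity never looks at healthy people, and specificity never looks at sick people.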

To obtain these values, studies compare the test, in a healthy or affected population, with a gold standard (the method currently considered the most reliable for a given pathology), a reference standard (considered less reliable) or reference criteria (approximate). Unfortunately, although research is crucial, it can be of poor quality. Publications may announce results, but when the methodology is scrutinized, weaknesses and biases may appear. They can stem from the nature of the comparison (comparing a physical maneuver with imaging is in itself a source of bias), from the quality of the reference against which the test is compared, or from the sample size, among other factors.

This is why very different clinimetric values can be found for the same test!

For example, the Apley test, used in theory to evaluate meniscal damage in the knee, shows a Se of 61% and a Sp of 70% in one study (Hegedus and colleagues) but a Se of 16% and a Sp of 100% in another (Pookarnjanamorakot and colleagues).

A score has thus been set up, the QUADAS[4] (Quality Assessment of Diagnostic Accuracy Studies), composed of 14 items, which assesses the methodological quality of a study. The closer the QUADAS score is to 14, the better the quality of the study.

Let's break down these figures and concepts by returning to the famous Jobe test, which obtains rather low scores[5]:

71% sensitivity: out of 100 people with a supraspinatus lesion, only 71 test positive. That leaves 29 false negatives! If a patient who comes to the practice with shoulder pain does have a supraspinatus lesion, there is a 29% risk that the test will miss it!

49% specificity: out of 100 healthy people, 49 will indeed test negative, but the other 51 will test positive. 51 false positives! This means that a person without a supraspinatus lesion has a 51% chance of testing positive in my office.

So is this test reliable for my clinical reasoning? Not really. It is poor at ruling in this hypothesis and just as poor at ruling it out.
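The arithmetic behind these counts can be reproduced with a short sketch. It assumes two groups of 100 people, as in the text. The function name `expected_counts` is hypothetical.

```python
def expected_counts(se: float, sp: float, n_sick: int = 100, n_healthy: int = 100) -> dict:
    """Expected test results in a sick group and a healthy group of given sizes."""
    true_pos = round(se * n_sick)
    true_neg = round(sp * n_healthy)
    return {
        "true_positives": true_pos,
        "false_negatives": n_sick - true_pos,
        "true_negatives": true_neg,
        "false_positives": n_healthy - true_neg,
    }


# Jobe test values quoted in the text: Se 71%, Sp 49%.
jobe = expected_counts(se=0.71, sp=0.49)
print(jobe)
```

With the Jobe values, the false positives (51) outnumber the true negatives (49), which is why a positive result says so little.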

Let us continue our example with the Lasègue test, much loved by clinicians, using the most "optimistic" clinimetric scores[6]:

92% sensitivity: out of 100 people living with lumbar radiculopathy, 8 will be false negatives. This may seem reliable enough to exclude radiculopathy when the test is negative, since there are few false negatives.

28% specificity: out of 100 healthy people, 72 are likely to test positive on the Lasègue test. A considerable number of false positives. So if my test is positive, the risk that it is a false positive is too high.

Judging by its sensitivity and specificity, the Lasègue test would seem to be a good test for excluding lumbar radiculopathy when it is negative, whereas a positive result is not informative and should not influence my clinical reasoning.

It is unfortunately incomplete

It is not enough. Sensitivity and specificity are inseparable and say little on their own since, as we have noted, they are statistics calculated on a population whose status, healthy or affected, is already known. This is far from clinical reality, where the status of the person in front of us is unknown. Other values combine both populations, healthy and affected: the positive and negative likelihood ratios[7], written RV+ and RV- here after the French "rapport de vraisemblance" (LR+ and LR- in the English-language literature). They are calculated from the sensitivity and specificity values. The positive likelihood ratio describes a positive test. It is equal to the rate of positive tests in an affected population (Se) divided by the rate of positive tests in a healthy population (1 - Sp).

That is, RV+ = Se/(1-Sp). It is therefore the ratio of the true positive (TP) rate in a sick population to the false positive (FP) rate, 1-Sp, in a healthy population. The further RV+ rises above 1, the rarer false positives are relative to true positives, and therefore the more a positive test argues for the condition, this time taking both populations into account.

Let's translate: with a given test Y, an individual who really is sick is [RV+] times more likely to have a positive test than a healthy individual. The higher the RV+, the more reliable a positive test.

The negative likelihood ratio is the rate of negative tests in a sick population (1-Se) over the rate of negative tests in a healthy population (Sp).

That is, RV- = (1-Se)/Sp. It is therefore the ratio of the false negative (FN) rate in a sick population to the true negative (TN) rate in a healthy population. Thus, for a given test, the further RV- falls below 1 (the closer it is to 0), the higher the probability that a negative test is a true negative.

Let's translate: with this test Y, a sick individual is [RV-] times as likely to have a negative test as a healthy individual. Thus, the smaller the RV-, the more reliable a negative test.
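Written out in terms of the counts of a 2x2 table, the two ratios take the following form. The symbols TP, FN, TN and FP (true positives, false negatives, true negatives, false positives) are notation assumed here for the counts obtained against the reference.

```latex
\[
\mathrm{Se} = \frac{TP}{TP + FN}, \qquad \mathrm{Sp} = \frac{TN}{TN + FP}
\]
\[
\mathrm{RV}^{+} = \frac{\mathrm{Se}}{1 - \mathrm{Sp}} = \frac{TP / (TP + FN)}{FP / (FP + TN)},
\qquad
\mathrm{RV}^{-} = \frac{1 - \mathrm{Se}}{\mathrm{Sp}} = \frac{FN / (TP + FN)}{TN / (FP + TN)}
\]
```

In both ratios the numerator is a rate measured in the sick group and the denominator a rate measured in the healthy group, which is what lets a single number compare the two populations.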

Let's go back to our examples. How does the Jobe test fare?

RV+ = 1.39 = low diagnostic contribution

RV- = 0.59 = low diagnostic contribution

For the Lasègue test: an RV+ of 1.28 and an RV- of 0.29. Once again, the contribution is "moderate" only in the case of a negative test.

This means that a person with a radiculopathy is only 1.28 times more likely to produce a positive test than a healthy individual. The risk of getting it wrong is great!
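These figures follow directly from the two formulas. The sketch below recomputes them from the Se and Sp values quoted earlier. The function name `likelihood_ratios` is hypothetical.

```python
def likelihood_ratios(se: float, sp: float) -> tuple[float, float]:
    """Return (RV+, RV-) computed from sensitivity and specificity."""
    rv_pos = se / (1 - sp)
    rv_neg = (1 - se) / sp
    return rv_pos, rv_neg


# Se and Sp as quoted in the text.
tests = {"Jobe": (0.71, 0.49), "Lasègue": (0.92, 0.28)}

for name, (se, sp) in tests.items():
    rv_pos, rv_neg = likelihood_ratios(se, sp)
    print(f"{name}: RV+ = {rv_pos:.2f}, RV- = {rv_neg:.2f}")
# Jobe: RV+ = 1.39, RV- = 0.59
# Lasègue: RV+ = 1.28, RV- = 0.29
```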

In their book Orthopedic Physical Examination Tests: An Evidence-Based Approach, Chad Cook and Eric Hegedus state that only 4% of tests have clinimetric properties good enough to stand alone, while 96% have low diagnostic power, or some utility but not enough to stand alone. This is based on more than 870 physical tests reviewed.

Thus, we note that few tests are reliable. To make our physical examinations more reliable, it has been proposed that combining tests into clusters could increase their usefulness[8].

A well-known example is the cluster of Cook and colleagues for cervical myelopathy. This cluster includes:

- Being over 45 years of age

- Sensation of instability on walking

- Positive Hoffmann test

- Inverted supinator sign

- Positive Babinski test

If we look at the clinimetric values of the tests (except the first 2 criteria) independently[9]:

- Hoffmann test: Se 44%, Sp 75%, RV+ 1.8, RV- 0.7

- Inverted supinator sign: Se 61%, Sp 78%, RV+ 2.8, RV- 0.5

- Babinski test: Se 33%, Sp 92%, RV+ 4, RV- 0.7

If we refer to our previous comments, the diagnostic contribution of these tests is weak or even null.

However, when they are combined in a cluster, if the person presents 3 or more positive criteria, the RV+ rises to 30.9 and the probability of myelopathy after testing (the post-test probability) reaches 94%, for a QUADAS of 7[10]. Thus, from 3 positive criteria upward, the person in front of us has a high probability of suffering from cervical myelopathy.
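How much an RV+ of 30.9 shifts our estimate depends on how likely myelopathy was before testing. Here is a minimal sketch, assuming the standard odds form of Bayes' theorem (post-test odds = pre-test odds × likelihood ratio). The function name and the pre-test probabilities are hypothetical and are not taken from the cited studies.

```python
def post_test_probability(pre_test_probability: float, likelihood_ratio: float) -> float:
    """Convert a pre-test probability into a post-test probability via odds."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


RV_POS_CLUSTER = 30.9  # RV+ quoted in the text for 3 or more positive criteria

# Hypothetical pre-test probabilities, for illustration only.
for pre in (0.05, 0.20, 0.50):
    post = post_test_probability(pre, RV_POS_CLUSTER)
    print(f"pre-test {pre:.0%} -> post-test {post:.0%}")
```

The same function with a ratio close to 1, such as the 1.39 of the Jobe test, leaves the probability almost where it started.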

We could also dwell on Mark Laslett's clusters and the diagnosis of sacroiliac pain, but for that I recommend an excellent series of articles by Joshua Lavallée on the Kinefact website[11].

Conclusion

For manual therapists who already see patients in first intention (direct access), and for physiotherapists for whom this should logically soon be the case, it is necessary to be able to exclude what is not within their competence. For this purpose, intuition and experience are not enough. On the contrary, they can be real sources of bias! We must arm ourselves with high-value tools and put them at the service of our clinical reasoning, to help develop and prioritize our hypotheses.

We could also have tackled Fagan's nomogram. Are you up for it, or was this already indigestible enough? Well, maybe next time!

But the fact remains that, as we noted above, very few tests are truly useful and precise for diagnosing the pathology of the person who consults us. Clinicians must undoubtedly reflect on, and become familiar with, the notion of diagnostic discomfort. In the musculoskeletal field, as far as orthopedic testing is concerned, it is highly likely that we will never be able to reach a definitive diagnosis. What we can know is what the person is suffering from! Which functions does the person feel deprived of? Which actions, hobbies and activities can the person no longer perform comfortably? Isn't "I can't run as much as I used to" more important to "diagnose" than knee osteoarthritis, especially given the current literature on the subject and everything we know about radio-clinical discordance? Isn't it more important to develop our prognostic capacities, in order to best accompany the people who call on us and answer their questions, and our prescriptive capacities, in order to give them tools and/or to know which strategies to follow?

The "C-Spine Rule" (see image[12]), for example, is a prescriptive rule that does not allow us to diagnose but is highly useful (Se 100%, Sp 42.5%[13]) for deciding what to do next: imaging or no imaging? The same goes for the Ottawa criteria[14] for ankle sprains: is imaging needed or not? Is a fracture suspected?
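To see why such a rule is useful despite its modest specificity, here is a rough sketch of what the quoted values would mean in a cohort. The cohort size and the 2% injury prevalence are hypothetical, chosen only for illustration, and the sketch assumes that a negative rule means no imaging is ordered.

```python
SE, SP = 1.00, 0.425  # values quoted in the text for the C-Spine Rule

# Hypothetical cohort, for illustration only: 1000 patients, 2% with an injury.
n_patients = 1000
prevalence = 0.02

n_injured = n_patients * prevalence
n_uninjured = n_patients - n_injured

missed_injuries = n_injured * (1 - SE)  # false negatives
imaging_avoided = n_uninjured * SP      # true negatives, assumed to skip imaging

print(f"missed injuries: {missed_injuries:.1f}")
print(f"patients spared imaging: {imaging_avoided:.1f} of {n_patients}")
```

Under these assumptions no injury is missed, while a large share of uninjured patients avoid imaging: the rule decides what to do next without naming a diagnosis.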

What is the patient's real demand? Over the course of a day's consultation, when asked "How can I be of service to you, what do you expect from me?", how many people truly answer "Find out what I have!"?

[1] Hegedus EJ. Studies of quality and impact in clinical diagnosis and decision-making. doi: 10.1179/106698110X12640740713012

[2] Medical error—the third leading cause of death in the US. BMJ 2016;353:i2139. doi: 10.1136/bmj.i2139 (Published 03 May 2016)

[3] Schiff GD, Hasan O, Kim S, et al. Diagnostic Error in Medicine: Analysis of 583 Physician-Reported Errors. Arch Intern Med. 2009;169(20):1881–1887. doi:10.1001/archinternmed.2009.333

[4] Whiting, P., Rutjes, A.W., Reitsma, J.B. et al. The development of QUADAS: a tool for the quality assessment of studies of diagnostic accuracy included in systematic reviews. BMC Med Res Methodol 3, 25 (2003). https://doi.org/10.1186/1471-2288-3-25

[5] Hermans J, Luime JJ, Meuffels DE, Reijman M, Simel DL, Bierma-Zeinstra SMA. Does This Patient With Shoulder Pain Have Rotator Cuff Disease? The Rational Clinical Examination Systematic Review. JAMA. 2013;310(8):837–847. doi:10.1001/jama.2013.276187

[6] van der Windt DA, Simons E, Riphagen II, Ammendolia C, Verhagen AP, Laslett M, Devillé W, Deyo RA, Bouter LM, de Vet HC, Aertgeerts B. Physical examination for lumbar radiculopathy due to disc herniation in patients with low-back pain. Cochrane Database Syst Rev. 2010 Feb 17;(2):CD007431. doi: 10.1002/14651858.CD007431.pub2. PMID: 20166095.

[7] Delacour H., François N., Servonnet A., Gentile A., Roche B. Les rapports de vraisemblance : un outil de choix pour l’interprétation des test biologiques. Immunoanalyse et biologie spécialisée (2009) 24, 92-99. doi:10.1016/j.immbio.2009.01.002

[8] Hegedus EJ, Cook C, Lewis J, Wright A, Park JY. Combining orthopedic special tests to improve diagnosis of shoulder pathology. Phys Ther Sport. 2015 May;16(2):87-92. doi: 10.1016/j.ptsp.2014.08.001. Epub 2014 Aug 10. PMID: 25178255.

[9] Cook C, Roman M, Stewart KM, Leithe LG, Isaacs R. Reliability and diagnostic accuracy of clinical special tests for myelopathy in patients seen for cervical dysfunction. J Orthop Sports Phys Ther. 2009 Mar;39(3):172-8. doi: 10.2519/jospt.2009.2938. PMID: 19252263.

[10] Cook CE, Wilhelm M, Cook AE, Petrosino C, Isaacs R. Clinical tests for screening and diagnosis of cervical spine myelopathy: a systematic review. J Manipulative Physiol Ther. 2011 Oct;34(8):539-46. doi: 10.1016/j.jmpt.2011.08.008. Epub 2011 Sep 6. PMID: 21899892.

[11] http://www.kinefact.com/troubles-musculo-squelettiques/examen-sacro-iliaques-1/

[12] http://www.piriforme.fr/sites/default/files/inline-images/canadian-c-spine.png

[13] Stiell IG, Wells GA, Vandemheen KL, Clement CM, Lesiuk H, De Maio VJ, Laupacis A, Schull M, McKnight RD, Verbeek R, Brison R, Cass D, Dreyer J, Eisenhauer MA, Greenberg GH, MacPhail I, Morrison L, Reardon M, Worthington J. The Canadian C-spine rule for radiography in alert and stable trauma patients. JAMA. 2001 Oct 17;286(15):1841-8. doi: 10.1001/jama.286.15.1841. PMID: 11597285.

[14] Bachmann LM, Kolb E, Koller MT, Steurer J, ter Riet G. Accuracy of Ottawa ankle rules to exclude fractures of the ankle and mid-foot: systematic review. BMJ. 2003 Feb 22;326(7386):417. doi: 10.1136/bmj.326.7386.417. PMID: 12595378; PMCID: PMC149439.

List of references and materials

This blog post does not claim to produce knowledge, its writing is enabled by reading scientific publications, blog posts and other writings.

Shreffler J, Huecker MR. Diagnostic Testing Accuracy: Sensitivity, Specificity, Predictive Values and Likelihood Ratios. [Updated 2021 Mar 3]. In: StatPearls [Internet]. Treasure Island (FL): StatPearls Publishing; 2021 Jan-. Available from: https://www.ncbi.nlm.nih.gov/books/NBK557491/

GUILLAUME CHRISTE, PT, MSc, DAS - Validité d’un test diagnostique : utilité clinique de la sensibilité, spécificité et rapports de vraisemblance

https://www.sfmu.org/upload/70_formation/02_eformation/02_congres/Urgences/urgences2014/donnees/pdf/008.pdf

http://www.piriforme.fr/bdd/orthopedie

http://www.piriforme.fr/stats

https://www.sfmu.org/fr/vie-professionnelle/outils-professionnels/ebm/sesp