Three Functional Capacity Evaluation Myths That Have Cost Your Company A Lot of Money

Darrell Schapmire

Aug 6, 2010

There are three popular myths in the field of functional assessment.  The three types of testing related to these myths constitute the majority of the opportunities during an FCE to collect data which can actually be assessed objectively for validity of effort.  The three most pervasive myths in this field are:

  1. Validity of effort during hand strength assessments can be objectively assessed by using the coefficient of variation (COV), Rapid Exchange Grip (REG) testing and the so-called “Bell Curve Analysis.”
  2. Dynamic (real life) lifting capacities can be determined by isometric testing.  The same technology is capable of objectively classifying validity of effort.
  3. During a lifting task, a “visual estimation of effort” is an accurate method of classifying relative levels of exertion, i.e. “light” or “moderate” or “heavy” and in determining if the person being tested is giving a good effort.

Like all myths, these ideas have been accepted and commonly used in FCE testing protocols because they seem to have at least a plausible element of truth.  But the fact of the matter is simply this:  they are myths (legends and tall tales) that have passed for “science” since the first days of the field known as industrial rehabilitation.

Hand Strength Mythology

The three methods used to classify validity of effort during hand strength testing have an error rate of at least 30%.  Most of the error occurs in the failure to detect feigned weakness.  At least 25 published studies and reviews that have appeared in print in the past 21 years have called the Unholy Trinity---the COV, REG and the Bell Curve---into question.  Nevertheless, these methods remain the most widely used in the classification of validity of effort.

The COV (also CV) has traditionally been used as a measure of consistency of performance.  Mathematically, the COV is the standard deviation divided by the mean and expressed as a percentage.  It is, literally, a measure of inter-data set variability.  The “traditional” cutoff point for separating good effort from poor effort has been 15%---a cutoff that has never been validated in a controlled study.

The assumption in using the COV as an index of effort is that inter-data set variability above a given cutoff is indicative of an invalid effort and variability less than the cutoff indicates a valid effort.  This is fallacious reasoning, simply because some individuals are adept at controlling force output and consistently producing measurements that are less than a maximum effort.  In other words, while it is certainly true that a “high” COV is indicative of an inconsistent performance and is highly likely to be related to poor effort, a low COV does not indicate that a valid effort has been offered.  It should also be pointed out that some completely cooperative persons may have occasional COVs which exceed the mythical 15% cutoff.
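
As a minimal sketch of the arithmetic, assuming three trials per data set and the sample standard deviation (the text does not say which form of the standard deviation is used), the COV can be computed as follows.  The force values are hypothetical and were chosen only to show that a steady, roughly half-strength effort can yield a COV just as low as a maximal one.

```python
from statistics import mean, stdev

CUTOFF_PERCENT = 15.0  # the traditional, unvalidated cutoff discussed in the text


def cov_percent(trials):
    """Coefficient of variation: standard deviation / mean, as a percentage.

    Assumption: the sample standard deviation (n - 1 denominator) is used.
    """
    return stdev(trials) / mean(trials) * 100.0


# Hypothetical grip forces in pounds, three trials each (illustrative only).
maximal_effort = [100.0, 104.0, 98.0]
steady_submaximal = [50.0, 52.0, 49.0]  # about half effort, held consistently

for label, trials in [("maximal", maximal_effort),
                      ("submaximal", steady_submaximal)]:
    cov = cov_percent(trials)
    verdict = "passes" if cov < CUTOFF_PERCENT else "fails"
    print(f"{label}: COV = {cov:.1f}% -> {verdict} the {CUTOFF_PERCENT:.0f}% cutoff")
```

Both hypothetical data sets fall well under the cutoff, which is the problem: the COV measures consistency, and a consistent performance is not necessarily a maximal one.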

Rapid Exchange Grip (REG) testing is also based on a valid physiological principle.  Maximum voluntary grip strength requires 1.0 – 1.5 seconds to generate.  Therefore, a grip of much shorter duration that substantially exceeds a grip of longer duration indicates that the longer duration grip is likely to be a submaximal effort.  There are significant challenges in standardizing the rate at which the dynamometer is exchanged from one hand to the other.  There are no standards for how many exchanges between the hands are required for proper test administration.  Absent the use of expensive (and rarely purchased) computerized systems to standardize REG test administration, standardization is not possible---and this fact should be just one of the lines of attack an attorney should explore when taking legal testimony.  Most disturbing from the standpoint of interpreting REG data is this singular fact:  No controlled study has ever identified what distinguishes an REG that is “too high” when compared to a phasic grip from one that is acceptable.

In addition to the methodological problems with REG testing, a significant flaw in the concept arises, not from the basic physiological premise, but from the amateurish and illogical conclusion that is frequently made on the basis of this testing:  “Since REG forces should be less than phasic grip forces, and since this subject’s REG force is less than his phasic grip force, the subject gave a maximum effort.”  This line of thinking is similar to:  “All carrots are orange.  Jack-o-lanterns are orange.  Therefore, Jack-o-lanterns are carrots.”  The “false negative,” the failure to detect feigned weakness, is frequently the result of this illogical conclusion.

The “Bell Curve Analysis” is also based on a valid physiological principle.  The tenet in this case is the fact that maximum strength is produced in the mid-range of motion.  Therefore, Stokes hypothesized (Stokes HM. The Seriously Uninjured Hand—Weakness of Grip. J Occup Med. 1983;25(9):683-4) that when grip forces are measured at each of the five positions on the Jamar Dynamometer, more force would be produced in Position 2 or Position 3 and less force would be produced in Positions 1, 4 and 5.  Thus, when a line graph is produced which displays the force output for each of the five positions, a “Bell Curve” would be produced.

Consider the graphs below.  They are identical graphs.  Only the sizes, scales of the axes and the amount of space above and below the line have been altered.

Although the Stokes study is one of the most widely-cited “supportive” references in FCE bibliographies, few people realize that the study had only two subjects.  Furthermore:

  1. The size of the graphs is not standardized.
  2. The relative scaling of the x and y axes on such graphs is not standardized.
  3. There is no standardized approach with regard to how much space should occur above and below a line in a line graph.
  4. It is impossible to standardize impressions.
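
The effect of plot size and axis scaling on a visual impression can be put in numbers.  The sketch below is only an illustration: the force values are hypothetical, and `apparent_angle_deg` is a helper invented for this example.  It computes the on-page angle of one and the same line segment (the rise from Position 1 to Position 2) under two different plotting choices.

```python
import math

# Hypothetical grip forces (lb) at Jamar positions 1-5; illustrative only.
positions = [1, 2, 3, 4, 5]
forces = [60.0, 75.0, 72.0, 65.0, 55.0]


def apparent_angle_deg(x0, y0, x1, y1, x_range, y_range, width_in, height_in):
    """On-page angle of a line segment for a given axis range and plot size."""
    x_min, x_max = x_range
    y_min, y_max = y_range
    # Convert data units to inches on the page.
    dx_in = (x1 - x0) / (x_max - x_min) * width_in
    dy_in = (y1 - y0) / (y_max - y_min) * height_in
    return math.degrees(math.atan2(dy_in, dx_in))


# The rise from Position 1 to Position 2, drawn two ways.
segment = (positions[0], forces[0], positions[1], forces[1])

# Tall plot with the y-axis cropped tightly around the data.
steep = apparent_angle_deg(*segment, x_range=(1, 5), y_range=(50, 80),
                           width_in=3.0, height_in=6.0)

# Wide plot with a y-axis running from 0 to 200.
flat = apparent_angle_deg(*segment, x_range=(1, 5), y_range=(0, 200),
                          width_in=6.0, height_in=3.0)

print(f"tight axes, tall plot: {steep:.0f} degrees")
print(f"loose axes, wide plot: {flat:.0f} degrees")
```

The data are identical in both cases.  Only the plotting choices differ, yet the same rise looks like a sharp peak in one rendering and a nearly flat line in the other.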

Isometric Testing

Isometric strength has been promoted as a way to predict dynamic (real life) lifting capacities and to classify validity of effort.  Intuitively, one would think that isometric strength might predict dynamic lifting function.

The purveyors of isometric strength testing have capitalized on the market’s misunderstanding of basic statistics.  The early isometric research found “statistically significant” relationships between isometric strength and dynamic lifting capacities.  However, a correlation is nothing more than an expression of the degree of linearity between two variables.

Consider the graph below.  It depicts a “perfect” correlation.  All of the data points fall on the regression line.  Therefore, it is possible to predict x (horizontal axis) if y (vertical axis) is known, and it is possible to predict y on the basis of knowing x.  The dark horizontal and vertical lines illustrate how these predictions are made.



If correlations are less than 1.0, data points fall on either side of the regression line.  As the r (correlation) values decrease, the spread between the data points widens.  Consider the graphs below.

When a correlation is “less than perfect,” a prediction of x from y (or of y from x) is a range rather than a single value.  The statistic which defines that range is called the standard error of estimate (SEE).  The SEE is similar to the standard deviation (SD) for a distribution of scores.  Approximately ninety-five percent (95%) of a normally distributed population falls within two SDs of the mean.

Likewise for the SEE, approximately 95% of the data points fall within two SEE on either side of the regression line.  This is illustrated in the graphs below.  In these graphs, the range for y, based on the value of x (at the intersection of the black line and the regression line), is the range of scores between the bottommost and topmost black lines.
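
The width of such a prediction range can be worked through with a short sketch.  Everything in it is an assumption made for illustration: the paired isometric and dynamic values are hypothetical, the function names are invented here, and the band of two standard errors of estimate on either side of the line is the simple approximation described above, which ignores the additional uncertainty in the fitted line itself.

```python
import math


def fit_line(xs, ys):
    """Least-squares slope and intercept for predicting y from x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def pearson_r(xs, ys):
    """Pearson correlation coefficient between xs and ys."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def standard_error_of_estimate(xs, ys):
    """Square root of the residual sum of squares divided by (n - 2)."""
    slope, intercept = fit_line(xs, ys)
    residual_ss = sum((y - (intercept + slope * x)) ** 2
                      for x, y in zip(xs, ys))
    return math.sqrt(residual_ss / (len(xs) - 2))


# Hypothetical paired measurements in pounds (illustrative only).
isometric = [80.0, 100.0, 120.0, 140.0, 160.0, 180.0]
dynamic = [45.0, 70.0, 50.0, 85.0, 60.0, 95.0]

r = pearson_r(isometric, dynamic)
see = standard_error_of_estimate(isometric, dynamic)
slope, intercept = fit_line(isometric, dynamic)

# Predict dynamic capacity for an isometric score of 130 lb.
predicted = intercept + slope * 130.0
low, high = predicted - 2 * see, predicted + 2 * see

print(f"r = {r:.2f}, standard error of estimate = {see:.1f} lb")
print(f"predicted dynamic lift: {predicted:.1f} lb "
      f"(approximate 95% range: {low:.1f} to {high:.1f} lb)")
```

With these made-up numbers the correlation is roughly 0.69, yet the predicted range for a single person is more than 60 lb wide.  A relationship can be real across a group and still be of little use for predicting what one individual can lift.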


In a study (130,000 subjects) due for publication at the end of 2010, it has been shown that predictions of this kind are simply not possible to make with any degree of scientific certainty.  Such predictions may result in these potential outcomes:

  1. If used to make hiring decisions, there will be a disparate impact on female job applicants because males have higher isometric strengths than females.
  2. If used for hiring purposes, or for the purpose of predicting the dynamic lifting capacity of an injured worker who is returning to work, it is possible to place persons in jobs that are too physically demanding.
  3. In many cases, indemnity for returning workers will be grossly miscalculated---with errors in both directions, i.e. over- and under-compensation for an injury.

Isometric strength testing has also been promoted as a method to classify validity of effort.  The isometric lifts most often used for this purported purpose are the Static Leg Lift and Static Arm Lift.  Although these isometric tests have been used to classify validity of effort for more than 30 years, until recently there had not been a single controlled study to verify the accuracy of such testing.  A controlled study which will be published at the end of the year (2010) shows definitively that testing of this kind does not, in fact, accurately classify validity of effort.  The study demonstrates with 95% certainty that 40% - 80% of the people who are tested with these static lifts can successfully feign weakness by producing coefficients of variation that are less than 15%.

Visual Estimation of Effort During a Lifting Task

The last, and perhaps the most damaging, myth is related to the “visual estimation of effort” during a lifting task.  The basic concept behind this testing approach is simply this:  It is possible to visually estimate the relative level of exertion and to assess cooperation simply by making a series of visual observations.  These observations are codified in various testing protocols as “operational definitions.”  In essence, they are instructions that tell the observer “what to look for” during the lifting event.

On its face, methodology of this kind is not---and could never be---objective.  That singular fact has not prevented the successful marketing of various testing protocols to clinical entities that perform FCEs.  Virtually all commercially-available FCE testing protocols that do not use isometric testing as an index of effort use the visual estimation of effort approach to classifying the results of an FCE.  There is no essential difference between any of these testing protocols.  They are the same methodology, repackaged with a different brand name and different operational definitions that are only superficially “unique.”

There are other commercially-available FCE protocols using the visual estimation of effort approach.  But the ones on this list comprise the largest share of the FCE market, “homegrown” protocols excepted.  In fact, there are untold hundreds, if not thousands, of clinics using a homegrown combination of testing methodology.  The “supportive” literature for the visual estimation of effort has consisted of approximately 25 correlation studies which focused on:

  1. Inter-rater reliability, i.e. does Tom agree with Sally?
  2. Intra-rater reliability, i.e. does Tom agree with himself?
  3. Inter-protocol reliability, i.e. does one protocol produce the same results as another?
  4. Test-retest reliability, i.e. does a protocol consistently produce the same results?

Oddly enough, the subject of “accuracy” was given very little attention.  Essentially, the same four concepts have been investigated repeatedly by different researchers.  In 2005, though, Reneman et al. conducted another inter-rater reliability study---and also reported the accuracy of the observers in identifying “maximum lifting” capacities.  These words are from the Abstract of the study:  “’Maximal’ performances were correctly rated in 46% to 53% (healthy subjects) and in 5% to 7% (patients with chronic nonspecific low back pain) of the cases.”  Source: M.F. Reneman et al.  Testing lifting capacity: validity of determining effort level by means of observation.  Spine. 13 (2005), pp. E40-6.
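
The difference between reliability and accuracy can be shown with a small sketch.  The ratings below are hypothetical, and simple percent agreement is used here only as a stand-in for whatever reliability statistic a given study reports.

```python
def percent_agreement(a, b):
    """Percentage of cases on which two lists of ratings match."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


# Hypothetical data: whether each of 10 lifts was truly a maximal effort,
# and how two observers rated the same 10 lifts.
truth = ["max", "max", "sub", "sub", "max", "sub", "max", "sub", "sub", "max"]
tom = ["max", "sub", "max", "sub", "sub", "sub", "max", "max", "sub", "sub"]
sally = ["max", "sub", "max", "sub", "sub", "sub", "max", "max", "sub", "max"]

print(f"Tom vs. Sally (reliability): {percent_agreement(tom, sally):.0f}%")
print(f"Tom vs. truth (accuracy):    {percent_agreement(tom, truth):.0f}%")
print(f"Sally vs. truth (accuracy):  {percent_agreement(sally, truth):.0f}%")
```

In this made-up example Tom and Sally agree with each other on 9 of 10 lifts, but they are correct on only 5 and 6 of the 10 lifts respectively.  High agreement between observers says nothing about whether either observer is right.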

In a study that will be published in early 2011, it will be reported that visual estimations of effort are accurate---at a rate that is marginally higher than chance.  Perhaps most damning, it will also be reported that trained and experienced therapists are no more accurate than lay subjects.

Viable alternatives to “standard” assessments of validity of effort have been developed.  The author can provide additional information upon request by phone or email.

About The Author

Darrell Schapmire has a graduate degree in exercise physiology.  Prior to making a mid-life career change at the age of 40, he worked in a variety of fields.  Mr. Schapmire has worked as a construction laborer, laying sidewalks and repairing and constructing agricultural structures. He was also a journeyman in the Signal Department for the Illinois Central Railroad.  As a signalman, he worked on projects that involved the construction, maintenance and repair of wayside signals and crossing protection.  These experiences gave him firsthand knowledge regarding work practices, the use of tools and machinery and the dangers inherent in the work that is performed in “the real world.”  Mr. Schapmire also worked in maximum security prisons in Illinois for more than seven years.  This experience gave him considerable insight into human behavior, specifically manipulative and/or deceptive behavior.  In 1987, Mr. Schapmire returned to university studies on a part time basis, taking classes to prepare for a career change.  He then qualified for entry in 1989 in the graduate program for exercise physiology at Illinois Benedictine College (now Benedictine University) in Lisle, Illinois.  Since graduating in 1991, Mr. Schapmire has worked in the field of industrial rehabilitation as an employee, as the director and partner in a clinic and as the owner of X-RTS, a product development company.  Mr. Schapmire has organized a series of studies which have resulted in a total of six articles published in peer-reviewed journals.

Published Studies

Schapmire D, St James JD, Townsend R, Stewart T, Delheimer S, Focht D. Simultaneous bilateral testing: validation of a new protocol to detect insincere effort during grip and pinch strength testing. J Hand Ther. 2002 Jul-Sep;15(3):242-50.

Schapmire D, St James JD, Townsend R, Feeler L. Accuracy of Visual Estimation of Effort During a Lifting Task.  Accepted for publication in Work, A Journal of Prevention, Assessment and Rehabilitation.

Schapmire D, St James JD, Feeler L, Kleinkort J. Simultaneous Bilateral Hand Strength Testing in a Patient Population, Part I:  Diagnostic, Observational and Subjective Complaint Correlates to Consistency of Effort.  Accepted for publication in Work, A Journal of Prevention, Assessment and Rehabilitation.

St James JD, Schapmire D, Feeler L, Kleinkort J. Simultaneous Bilateral Hand Strength Testing in a Patient Population, Part II: Relationship to a Distraction-Based Lifting Evaluation. Accepted for publication in Work, A Journal of Prevention, Assessment and Rehabilitation.

Feeler L, St James JD, Schapmire D.  Isometric Strength Assessment Part I: Static Testing Does Not Accurately Predict Dynamic Lifting Capacity.  Accepted for publication in Work, A Journal of Prevention, Assessment and Rehabilitation.

Townsend R, Schapmire D, St James JD, Feeler L.  Isometric Strength Assessment Part II: Static Testing Does Not Accurately Classify Validity of Effort.  Accepted for publication in Work, A Journal of Prevention, Assessment and Rehabilitation.