Statistical sampling and extrapolation are methods used by the Office of Inspector General (OIG) and Medicare Administrative Contractors (MACs) to evaluate compliance, identify potential overpayments, and support enforcement under the False Claims Act (FCA).
This article outlines how the government uses statistical sampling and extrapolation in healthcare enforcement, how courts approach these methods, and what healthcare professionals should know to ensure compliance.
1. General FCA Sampling and Extrapolation Case Law
Acceptance for Damages vs. Liability: Courts have long accepted statistical sampling to calculate damages in FCA cases, especially where liability is already established. For example, the Seventh Circuit in United States v. Rogan upheld using sampling to determine damages, rejecting the notion that a judge must “address each of the 1,812 claim forms” individually – “statistical analysis should suffice”. However, whether sampling can prove liability (i.e. falsity of claims) is more contentious. No federal appellate court has categorically barred sampling, but approaches vary by case and circuit.
Supreme Court Guidance: In Tyson Foods, Inc. v. Bouaphakeo (2016) – a non-FCA case – the Supreme Court refused to create any blanket ban on “trial by formula.” It held that representative samples are permissible to establish liability if the sample is reliable, noting that such evidence is treated like any other, and its admissibility “turns on the degree to which the evidence is reliable in proving or disproving the elements of the cause of action”. The Court emphasized it would not categorically exclude statistical proof; instead, the “fairness and utility” of sampling depend on case-specific facts.
Fourth Circuit – Agape Senior: A highly anticipated Fourth Circuit case, United States ex rel. Michaels v. Agape Senior Community, Inc. (2017), ultimately “took a pass” on the issue. The district court had refused to allow sampling to prove falsity in a massive hospice FCA case (requiring claim-by-claim proof). Relators sought an interlocutory appeal. The Fourth Circuit declined to issue a substantive ruling, dismissing the appeal as improvidently granted, effectively leaving in place the fact-specific approach and not endorsing any per se rule against sampling. The Fourth Circuit panel hinted that sampling is an evidentiary matter, not a pure question of law, meaning its use should be determined case-by-case (and thus not immediately appealable).
Circuits Allowing Sampling Evidence: In several cases, appellate courts have signaled openness to sampling given a sound methodology. The Fifth Circuit in United States v. Hodge, for example, affirmed a ~$300 million verdict where FHA loan fraud was proven in part by expert sampling of loan files. The court found the sample evidence sufficiently linked false statements to loan defaults, noting that “connecting false statements and defaults with specific loans is not feasible in a case that relies on sampling and extrapolation”. The Fifth Circuit also rejected the defendant’s Daubert challenge to the reliability of the sampling because the defendant had agreed to the sampling plan during discovery and raised no timely objection. More recently, the Eighth Circuit in United States v. Zorn (2024) upheld the use of sampling in an FCA bench trial. There, the relator’s expert extrapolated from a random sample of 31 patient files to prove overbilling. The Eighth Circuit found no abuse of discretion in admitting the expert, emphasizing that in a bench trial the judge as fact-finder can handle technical evidence with more flexibility. The court acknowledged it “left open” whether sampling would likewise be allowed in a jury trial, but it approved the practice for the bench trial at hand. Notably, the Eighth Circuit echoed earlier courts’ view that FCA remedies tolerate a degree of imprecision – “the Government is entitled to rough remedial justice… according to somewhat imprecise formulas” when full precision is impractical (quoting an earlier FCA case). Similarly, the Eleventh Circuit’s published decision in United States ex rel. Ruckh v. Salus Rehab. (2020) reinstated a large jury verdict for the relator in a nursing home upcoding case, effectively allowing sampling-based evidence to stand. The Eleventh Circuit did not explicitly opine on the sampling’s admissibility because the defense had abandoned that issue on appeal, but it reversed the trial judge’s post-verdict nullification of the jury’s finding. In short, the verdict – which was based on expert reviews of a sample of patient records extrapolated to thousands of claims – was upheld, signaling that reliable sampling evidence can support FCA liability findings if not properly challenged.
District Court Divergence: Because relatively few circuits have squarely ruled on sampling to prove liability, significant guidance comes from notable district court opinions:
- U.S. ex rel. Martin v. Life Care Centers of America (E.D. Tenn. 2014) – a landmark case endorsing sampling. The court faced an FCA suit involving over 54,000 Medicare claims for unnecessary therapy services at 82 facilities. Life Care moved to exclude the government’s statistical sample (a few hundred claims) as proof of falsity. The court denied the motion, reasoning that given the “large number of claims”, a claim-by-claim trial was impracticable in a complex FCA action. It held that statistical sampling could be used to establish liability on all claims, so long as the sample was representative and scientifically valid. Life Care was the first case to squarely permit sampling for FCA liability, not just damages, and it heavily influenced subsequent courts and settlements. (Life Care sought interlocutory Sixth Circuit review, but the case settled in 2016 before a circuit ruling.)
- U.S. ex rel. Guardiola v. Renown Health (D. Nev. 2015) – allowed a relator to use statistical sampling in discovery to identify false claims, indicating a willingness to consider extrapolation at trial (this was a discovery ruling, not a final judgment).
- U.S. ex rel. Ruckh v. Genoa Healthcare (M.D. Fla. 2015) – allowed relator’s late-proposed statistical expert to proceed, expressing an “inclination to allow” sampling to prove FCA violations in a sprawling nursing home case. (This is the same case later appealed as Ruckh v. Salus Rehab. in the Eleventh Circuit.)
- U.S. v. Vista Hospice Care, Inc. (N.D. Tex. 2016) – a leading case rejecting sampling under certain facts. The relator’s experts reviewed 291 hospice patient files and extrapolated that ~12,000 claims were false (patients allegedly ineligible for hospice). The court granted summary judgment for the defense, holding that statistical sampling could not establish falsity where medical judgment and patient-specific factors determined hospice eligibility. It noted “no circuit” had decided if sampling can prove liability when falsity hinges on individual physicians’ clinical judgment. Citing the Supreme Court’s Wal-Mart v. Dukes admonition against replacing individualized proof with a formula, the court found that the “diversity among claims” (different patients, diagnoses, time periods, etc.) made extrapolation too unreliable for falsity in that case. Each hospice certification was so fact-specific and subjective that only a claim-by-claim inquiry could determine truth or falsity. Vista Hospice has since been a touchstone for defendants, especially in cases about medical necessity.
- U.S. ex rel. Paradies v. AseraCare, Inc. (N.D. Ala. 2015), aff’d on other grounds 938 F.3d 1278 (11th Cir. 2019) – another hospice case underscoring the limits of extrapolation when falsity is subjective. The court bifurcated trial into falsity and scienter phases, requiring the government first to prove the sampled claims were objectively false. A jury initially found many claims false (based on expert clinical reviews of a sample of hospice patients), but the judge later ruled that a mere difference of clinical opinion on hospice eligibility cannot prove FCA falsity as a matter of law. The Eleventh Circuit agreed that “clinical judgment” disagreements alone are insufficient – some objective falsehood is needed (e.g. facts misrepresented or ignored) to deem a claim false. AseraCare thus illustrates that even if sampling is allowed, the nature of the claim matters: where proof of falsity requires a case-by-case clinical judgment, courts are reluctant to let statistics substitute for direct evidence on each claim.
Takeaway: In general, courts permit statistical sampling and extrapolation in FCA cases for calculating damages (overpayment amounts) fairly routinely. Many courts also allow sampling to prove liability (falsity/causation), especially when requiring proof of each claim would be effectively impossible (e.g. tens of thousands of claims) and the sampling methodology is robust. As one court noted, statistical evidence is just “evidence” – if it’s reliable, it can be used. No appellate court has banned it per se, and the Supreme Court and multiple circuits have signaled that outright “trial by formula” objections fail if the sample is reliable and the case’s nature doesn’t demand individual proof. However, courts will closely scrutinize reliability (under Daubert and Rule 702) and fit – if a sample is poorly drawn or if claims turn on highly individualized facts (subjective medical necessity, etc.), a court may exclude or limit sampling to avoid unfairness. The use of sampling is ultimately a case-specific inquiry: judges assess whether the sample size and method are statistically valid and whether the inference from sample to universe is reasonable given the nature of the fraud alleged.
2. Urine Drug Testing (UDT)-Specific FCA Cases
FCA cases involving urine drug testing often center on allegations of medically unnecessary testing, “bundled” or upcoded billing, and sometimes kickback schemes involving laboratories. UDT can generate huge volumes of claims (e.g. routine screens on every patient visit, confirmatory tests on every sample), making them candidates for sampling in both audits and litigation. While relatively few appellate decisions focus specifically on UDT, several significant enforcement actions and lower court opinions provide guidance:
- Wagoner (N.D. Indiana 2024): In United States v. Wagoner, the Government sued a pain management physician under the FCA for submitting 5,217 false claims to Medicaid for urine drug screens. The scheme involved billing a simple immunoassay test under an incorrect CPT code (80101) – which was intended for single-class tests – instead of the proper code (80104) for multi-drug panel tests, resulting in higher reimbursements. It was also alleged many of these tests were not medically necessary (e.g. administered at excessive frequency). On summary judgment, Judge Springmann held there was sufficient evidence that using CPT 80101 was “false” coding (since the AMA code book unambiguously required using 80104 for the multiplex test kits) and that the defendants knew or should have known. She therefore denied summary judgment on the FCA false-claim counts related to the improper coding and unnecessary tests. The court emphasized that each claim’s truthfulness depended on objective coding rules and medical necessity standards, which the government’s evidence put in dispute – making it a triable issue under the FCA. Wagoner is a rare published decision in the UDT context, showing that courts will allow the government to reach trial by identifying patterns of unnecessary testing or miscoding. Notably, in Wagoner the government did not need to rely on a sampled subset – it presented a universe of claims data (all 5,000+ claims billed with the wrong code) and expert testimony that none of those claims conformed to billing rules or necessity criteria. This obviated the need for statistical extrapolation, but the case still illustrates how aggregated proof (an analysis of thousands of claims at once) can establish FCA falsity when the scheme is systemic.
- Kickback/Unnecessary Testing Schemes: Many UDT cases have resolved through large settlements rather than published opinions, but they shed light on common fraudulent patterns. For instance, Millennium Health – at one time the nation’s largest UDT lab – paid $256 million in 2015 to settle FCA allegations that it billed Medicare and Medicaid for extensive unnecessary urine drug and genetic tests, and provided physicians with free testing supplies and other inducements to drive referrals. Similarly, in 2024 Precision Toxicology (Precision Diagnostics), a major UDT lab, paid $27 million to settle allegations of medically unnecessary UDT and illegal kickbacks to physicians in the form of free point-of-care test cups, consulting agreements, and more. In these cases, government investigators often identified suspicious patterns (e.g. routine performance of expansive confirmatory testing on every sample regardless of clinical need) and may have used statistical analyses during the investigation to quantify the scope of false billing. While settlements do not create precedent, they underscore that UDT practices are subject to FCA scrutiny, especially where labs blanket-test patients or use one-size-fits-all protocols driven by profit rather than medical necessity.
- Proof and Sampling in UDT Cases: Because UDT fraud cases frequently involve thousands of claims across many patients, courts and enforcers have used sampling in audits and occasionally in litigation. For example, OIG audits of labs often statistically sample claim lines to determine an error rate of improper UDT billing, then extrapolate that to recoup overpayments (more on OIG methodology below). In litigation, if a relator or DOJ needed to prove that hundreds of UDT claims lacked medical necessity, they might use expert medical review on a random sample of patient records. However, a cautionary note from the hospice cases applies: if “medical necessity” for UDT is a judgment call for each patient encounter, a court may be hesitant to accept pure extrapolation. The key is objectivity – many UDT fraud theories are objective (for instance, billing for a test that was not ordered, or billing the wrong code as in Wagoner, or a medical policy that no patient in a certain program should receive a particular panel test). These are well-suited to sampling/extrapolation because the falsity does not hinge on subjective clinical opinions. On the other hand, if a case required proving that each individual drug screen was not reasonable or necessary for that specific patient at that time, a defendant might analogize to Vista Hospice or AseraCare and argue individual proof is required. So far, courts have not squarely addressed a UDT case of that nature at the circuit level.
- Other Notable UDT-Related Cases: A recent case in the Northern District of California (U.S. ex rel. Campie v. Carolina Liquid Chemistries, 2022) illustrates another facet – device accuracy. Whistleblowers alleged a lab equipment manufacturer misrepresented the capabilities of its UDT analyzers, causing false claims. That case ended in summary judgment for the defense (the court found insufficient evidence of falsity/scienter). Although not about sampling, it shows the breadth of UDT-related FCA litigation – from laboratories engaged in unnecessary testing, to coding/billing abuse by clinics, to companies supplying testing technology. And in Zafirov v. Florida Medical Associates (M.D. Fla. 2024), a relator alleged a pain clinic and its lab violated the FCA by performing excessive UDT and waiving copays (an Anti-Kickback Statute issue). Unusually, that case was dismissed on constitutional grounds – the judge held the FCA's qui tam provisions unconstitutional – without reaching the merits. Thus, it did not produce substantive guidance on sampling or proof, but it is a reminder that many UDT FCA cases proceed as non-intervened relator suits, which can lead to novel issues.
In summary, urine drug testing FCA cases often involve high volumes of claims and patterned conduct, making them amenable to statistical evidence in audits or court. The government’s strategy tends to focus on proving an across-the-board practice (e.g. “defendant billed every patient’s test under code X, which is false” or “lab ran 20-panel tests on 100% of samples regardless of doctor orders”). Where such a uniform scheme is shown, courts have allowed the inference that all claims were false without needing to dissect each claim. On the flip side, defendants in UDT cases may argue that medical judgment was involved for each test, pushing for claim-specific proof. Courts have not definitively resolved this tension in the UDT context, but existing FCA precedents suggest that objective, scheme-driven falsity can be proven via representative evidence, whereas pure medical necessity disputes require more caution with extrapolation.
3. OIG Guidance on Statistical Sampling and Extrapolation
The Department of Health and Human Services (HHS) Office of Inspector General (OIG) has long endorsed statistical sampling and extrapolation as essential tools for fighting healthcare fraud. Both in formal publications and in routine practice, OIG has provided guidance and set expectations for using these techniques to identify overpayments and false claims:
- Authority and Use in Audits: Federal law explicitly authorizes statistical sampling in Medicare and Medicaid audits. For example, Medicare program integrity contractors are directed to use sampling and extrapolation when a provider exhibits a sustained or high level of payment error (or when documented education has failed to correct the errors). OIG's own Office of Audit Services routinely employs random sampling to audit providers. The rationale is efficiency and effectiveness: reviewing every claim is often impractical, so OIG uses statistically valid samples to estimate total overcharges. According to a provider guidance issued by one state Medicaid OIG, sampling "offers a mathematical approach" to select claims representative of a period's universe, and extrapolation projects the sample's error rate onto all claims to estimate total false billing. OIG's statisticians typically use a 90% or 95% confidence level and methods "long proven as mathematically valid" (simple random sampling or stratified random sampling). The idea is that the extrapolated amount, calculated at a high confidence level, gives OIG "reasonable" assurance of the overpayment, even if it is an estimate. Importantly, OIG and CMS guidelines require that the sampling methodology be scientifically sound and that providers have an opportunity to rebut it (for instance, during appeals of overpayment demands). (A simplified numerical sketch of this extrapolation arithmetic appears after this list.)
- OIG's RAT-STATS Tool: OIG has developed and freely distributes RAT-STATS, a statistical software package for sample design and extrapolation. OIG encourages its use by contractors and providers alike. RAT-STATS helps ensure consistency and transparency in how samples are drawn and how overpayments are calculated. For example, if OIG audits a hospital, it might input all paid claim lines into RAT-STATS, generate a random sample of a certain size, audit those claims, and then use the software to calculate the overpayment point estimate and confidence interval. The availability of RAT-STATS is a form of informal guidance signaling that methodological rigor is expected in statistical sampling. (A sketch of the sample-selection step in this workflow also appears after this list.)
- Medicare Appeals & OIG Reports: In August 2020, OIG issued a report examining how consistently Medicare contractors and appeals adjudicators review extrapolated overpayments. The report (OIG A-05-18-00024) found some inconsistencies – e.g. certain contractors employed a particular statistical validity test that others did not – potentially leading to uneven results in appeals. OIG recommended that Centers for Medicare and Medicaid Services (CMS) provide additional guidance to contractors to ensure consistency in reviewing extrapolations and to improve data tracking of which appeals involve extrapolated amounts. CMS concurred and agreed to clarify procedures. This underscores OIG’s stance that while sampling is a powerful tool, it must be applied uniformly and fairly. Providers should be judged by the same standards (e.g. what statistical methods are acceptable) in an appeal of an extrapolated overpayment.
- OIG's Provider Self-Disclosure Protocol (SDP): One of the most concrete pieces of OIG guidance on sampling comes from its Self-Disclosure Protocol – the process for providers to voluntarily disclose fraud and negotiate a settlement. In the updated SDP (2013), OIG requires providers to use statistical sampling if the potential damages involve numerous claims. Specifically, if a disclosing party elects to extrapolate damages, the sample "must include, at a minimum, 100 claims" and the mean point estimate must be used for the overpayment calculation. Unlike some payor audits, OIG does not require a particular precision (confidence interval) in the SDP sample, so long as at least 100 items are reviewed. This allows providers to avoid extremely large sample sizes. OIG also instructs that when using a sample, one cannot net out underpayments against overpayments – i.e. if some claims in the sample were under-billed, that doesn't offset the false claims in the calculation. These SDP requirements reflect OIG's view of best practices: use a sufficiently large random sample to ensure validity, be straightforward (take the average error rate as the measure), and don't try to dilute liability by offsetting errors unless they are directly related. The SDP also confirms OIG's general approach in settlements to use a multiplier on damages (often 1.5 times the single damages in self-disclosures), which ties into the idea that a statistically estimated overpayment is a starting point for resolution. (The final sketch after this list walks through this calculation.)
- OIG Compliance Guidance and Industry Education: OIG often emphasizes in its compliance-program guidance that providers should conduct regular self-audits, which may include sampling claims to ensure proper billing. For instance, OIG's "Roadmap for New Physicians" cautions doctors to be mindful of billing rules and suggests auditing a sample of charts/claims periodically as a best practice. State Medicaid OIGs have also published educational articles (such as the Wisconsin DHS-OIG fact sheet) explaining extrapolation in simple terms to demystify it for providers. These resources uniformly convey that statistical extrapolation is here to stay and that providers, by participating in Medicare/Medicaid, effectively accept its use in audits. In fact, many provider agreements and state Medicaid regulations explicitly state that the provider agrees to statistical sampling in audits as a condition of program enrollment.
- “Rough Justice” Philosophy: Both OIG and the Department of Justice have pointed out (often citing case law) that combating fraud requires “rough remedial justice.” This phrase, quoted by the Eighth Circuit in Zorn and originally from the First Circuit, encapsulates the policy that the government need not prove damages with exact precision when the defendant’s conduct (the fraud) makes that difficult. In practical terms, OIG’s stance is that a statistically sound estimate of losses is sufficient to recover funds – perfection is not required. The FCA is remedial and “not a game of gotcha”, so if a provider argues a few claims in the extrapolation were actually proper, the focus is whether the overall methodology is fair and produces a reliable estimate of the total loss. Providers can challenge an extrapolation’s validity (e.g. sample bias or a very wide margin of error), but simply pointing out minor inaccuracies or demanding each claim be individually proven usually will not defeat an otherwise reliable statistical case. OIG guidance implicitly adopts this view – encouraging the use of 99% confidence in some cases or one-tailed intervals that favor the provider, but ultimately insisting that if the method is valid, the extrapolated overpayment is enforceable.
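To make the audit extrapolation arithmetic described above concrete, here is a minimal sketch with invented numbers. It assumes a simple random sample of paid claims and uses a normal-approximation confidence bound; real audits follow the estimator specified in the sampling plan (typically computed in RAT-STATS), so treat this as illustrative only.

```python
# Illustrative only: hypothetical audit results, not OIG's actual estimator.
from statistics import NormalDist, mean, stdev

universe_size = 12_000  # total paid claims in the audit period (assumed)

# Overpayment found on each sampled claim after coding/medical review;
# 0.0 means the claim was paid correctly (all figures invented).
sample_overpayments = [0.0, 42.50, 0.0, 120.00, 0.0, 0.0, 85.75, 0.0, 42.50, 0.0] * 10  # n = 100

n = len(sample_overpayments)
avg = mean(sample_overpayments)        # average overpayment per sampled claim
point_estimate = avg * universe_size   # sample error projected onto the universe

# One-sided 90% lower confidence bound on the total overpayment. Auditors often
# demand this lower bound rather than the point estimate, giving the provider
# the benefit of the sampling error. (Normal approximation here; RAT-STATS uses
# a t-based interval that is slightly wider for small samples.)
standard_error = stdev(sample_overpayments) / n ** 0.5
z_90 = NormalDist().inv_cdf(0.90)
lower_bound = (avg - z_90 * standard_error) * universe_size

print(f"Point estimate of overpayment: ${point_estimate:,.2f}")
print(f"One-sided 90% lower bound:     ${lower_bound:,.2f}")
```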
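The sample-selection step in the RAT-STATS workflow can be sketched the same way. This is not RAT-STATS itself; it simply shows the kind of reproducible simple random sample, documented by a recorded seed, that a defensible methodology requires (claim identifiers and counts are hypothetical).

```python
# Not RAT-STATS: a minimal illustration of reproducible sample selection.
import random

# Hypothetical universe of paid claim-line identifiers for the audit period.
claim_ids = [f"CLM{i:06d}" for i in range(1, 50_001)]

SEED = 20240917    # documenting the seed lets the provider replicate the draw
SAMPLE_SIZE = 100  # set by the sampling plan (cf. the SDP's 100-claim minimum)

rng = random.Random(SEED)
sample = rng.sample(claim_ids, SAMPLE_SIZE)  # simple random sample, no replacement

# These claim lines would then be pulled for coding and medical-necessity review.
print(sample[:5])
```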
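Finally, the Self-Disclosure Protocol damages calculation can be reduced to a few lines. The helper function and figures below are hypothetical; they simply encode the three SDP rules noted above: a sample of at least 100 claims, use of the mean point estimate, and no netting of underpayments against overpayments.

```python
# Hypothetical illustration of the SDP extrapolation rules, not an OIG tool.
def sdp_overpayment_estimate(sample_findings: list[float], universe_size: int) -> float:
    """Extrapolate total overpayment from a sample, SDP-style (illustrative)."""
    if len(sample_findings) < 100:
        raise ValueError("the SDP requires a sample of at least 100 claims")
    # Underpayments (negative findings) are not netted against overpayments.
    overpayments_only = [max(finding, 0.0) for finding in sample_findings]
    mean_overpayment = sum(overpayments_only) / len(sample_findings)  # mean point estimate
    return mean_overpayment * universe_size

# Invented sample: 70 correct claims, 25 overpaid, 5 underpaid (not offset).
findings = [0.0] * 70 + [55.0] * 20 + [130.0] * 5 + [-40.0] * 5
print(f"Estimated overpayment: ${sdp_overpayment_estimate(findings, 8_000):,.2f}")
```

Note that no precision requirement appears in the function, consistent with the SDP's position that a confidence interval is not mandated so long as at least 100 items are reviewed; the settlement multiplier OIG applies (often 1.5 times single damages) is a separate step layered on top of this estimate.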
Bottom Line: The OIG’s formal and informal guidance on statistical sampling can be summarized as follows: Do it right, and it’s a powerful, accepted tool. “Right” means using random, representative samples, proper sample size, and well-recognized statistical techniques to ensure validity and transparency. OIG has provided tools (RAT-STATS), manuals, and protocol requirements to reinforce these best practices. In healthcare FCA audits and investigations, sampling and extrapolation are not only allowed but often expected. Providers are on notice that if they engage in widespread fraud (e.g. upcoding every claim or billing unnecessary tests over years), the government will not hesitate to statistically estimate the full scope of the fraud rather than limit recovery to a handful of claims. Courts generally support this approach as long as due process is respected through sound statistics and an opportunity to challenge the methods. As one court noted, “a statistical estimate may provide a sufficient basis” for FCA liability and damages, particularly when reviewing every claim is infeasible. In sum, both case law and OIG guidance reflect that statistical sampling and extrapolation, when properly applied, are well-accepted in FCA enforcement, striking a balance between accuracy and practicality in safeguarding public funds.
Sources:
- Manatt Health, "Using Statistical Sampling in False Claims Act Cases" (2017).
- United States ex rel. Martin v. Life Care Centers of Am., Inc., 114 F. Supp. 3d 549, 571 (E.D. Tenn. 2014).
- United States v. Rogan, 517 F.3d 449, 453 (7th Cir. 2008).
- United States ex rel. Michaels v. Agape Senior Cmty., Inc., 848 F.3d 330 (4th Cir. 2017).
- United States v. Vista Hospice Care, Inc., 2016 WL 3449833 (N.D. Tex. June 20, 2016).
- United States ex rel. Ruckh v. Salus Rehabilitation, LLC, 963 F.3d 1089 (11th Cir. 2020).
- United States v. Zorn, 61 F.4th 921 (8th Cir. 2024).
- WilmerHale Client Alert, “False Claims Act 2019 Year-in-Review” (discussing 5th Cir. Hodge).
- Bass, Berry & Sims, “Court Rejects Relator’s Use of Statistical Sampling” (July 1, 2016) (summary of Vista Hospice decision).
- United States v. Wagoner, No. 2:17-cv-478, 2024 WL 5050954 (N.D. Ind. Sept. 17, 2024).
- OIG News Release: “Millennium Health Agrees to Pay $256 Million for Unnecessary Drug and Genetic Testing…” (Nov. 23, 2015).
- OIG News Release: “Precision Toxicology Agrees to Pay $27M… for Unnecessary Drug Testing…” (Oct. 2, 2024).
- Wisconsin DHS-OIG, “What Providers Need to Know About Statistical Sampling & Extrapolation” (July 2023).
- OIG Provider Self-Disclosure Protocol (2013 Update) – Hall Render summary.
- HHS-OIG Report A-05-18-00024, “Medicare Contractors Were Not Consistent in How They Reviewed Extrapolated Overpayments in the Provider Appeals Process” (Aug. 2020).
