Big Data Barriers in Healthcare

In the past several years, we have seen tremendous advances in the amount of data we routinely generate and collect, as well as our ability to analyze and understand it.

The intersection of these trends is what we call “Big Data” and it is helping businesses in every industry to become more efficient and productive. Big data analytics is the process of examining large, complex data sets to uncover hidden patterns, unknown correlations and other useful information that can be used to make better decisions.

In healthcare, harnessing and unlocking the potential in existing data sources, including clinical trials, biomedical imaging studies, EHRs, genomic and personal health data, can provide insights to improve decision making in all aspects of clinical trial design and drug development.

Despite significant technological progress, the big data revolution in healthcare is still in its infancy. Like analysts in other industries, healthcare professionals struggle to bring together masses of data and synthesize them into actionable information. This may not be the most formidable challenge, however. Before the promise of big data can be realized, several hurdles specific to the healthcare industry must be cleared.

So, what are the key challenges?

Challenge No. 1: Data Transparency in Clinical Trials

Beyond select clinical trial data made available in journal publications, individual participant data has generally not been shared routinely with the broader scientific community or the public. It is estimated that half of all clinical trial results have never been published, and trials with positive results are twice as likely to be published as others, whether or not they are industry-sponsored (2).

Indeed, the progress towards data transparency has been slow, with clinical research sponsors expressing concern over competition, the loss of trade secrets, and general risks to reputation. But there are signs that this is beginning to change, with pressures mounting from both health regulatory authorities and principal investigators who assert that the benefits to secondary research, patient safety and general research cost savings outweigh the risks. Across the industry, an increasing number of organizations are taking the initiative to share their data more actively.

From the public sector side, the NIH recently launched the Big Data to Knowledge Initiative (BD2K) to enable researchers to better access and utilize big data.

Large public-private collaborations are also underway, such as the National Patient-Centered Clinical Research Network (PCORnet) and Optum Labs, a research collaborative that has brought together academic institutions, health care systems, provider organizations, life sciences companies, and membership and advocacy organizations. Companies like GSK have taken significant steps forward by offering individual researchers a portal for data sharing, which other large pharma companies such as Boehringer Ingelheim, Sanofi, and Pfizer have agreed to use to post their clinical trial data.

The progress should not be overstated, however. Dissension exists, and some major players resist an overhaul of the existing paradigm, particularly in the U.S. Pfizer, the world’s third largest drug firm, has said it will resist demands from investors and transparency campaigners that it disclose results from all historical drug trials before 2007. AbbVie, also a U.S. firm, took legal action against the European Medicines Agency two years ago to block the publishing of trial data relating to its best-selling rheumatoid arthritis treatment, Humira.

And though the WHO recently released a new position statement calling for companies to publish all research studies and suggested specific timetables for making the information available, the FDA has played a more circumscribed role in participant-level data sharing efforts. This is likely driven by the position that commercial confidentiality laws prevent voluntary public disclosure of data.

Challenge No. 2: HIPAA

Unlike organizations in other industries, health care organizations and practitioners face the added challenge of maintaining regulatory compliance with mandates such as the US Health Insurance Portability and Accountability Act (HIPAA). Some recent discussion has suggested that HIPAA may be stifling innovation.

A 2013 Bipartisan Policy Center report, titled “A Policy Forum on the Use of Big Data in Health Care,” suggests that HIPAA is a burden on healthcare organizations that are trying to innovate and maximize the value of patient big data. The report asserts that HIPAA is delaying the meaningful sharing and movement of data, because the federal regulation is “misunderstood, misapplied, and over-applied.”

For the healthcare industry to enable data sharing and integration, the report suggests that HIPAA should be applied in a more precise way that enables a better balance between innovation and patient privacy.

Challenge No. 3: Lack of Patient Data Anonymization Standards

The Bipartisan Policy Center report from 2013 further criticizes the fact that HIPAA stipulates how data should be de-identified, yet there is considerable variability in the practice of anonymization and no obvious governing standards. Patient data anonymization represents both a technical and cultural challenge to leveraging big data in healthcare.

HIPAA requires that data custodians provide for a process to limit the ability to identify a Data Subject from a clinical dataset. Only when clinical data is de-identified according to this process may it be disclosed to a third party or, presumably, used in a Big Data analytics workflow. The interpretation of the requirement to “limit the ability to identify a Data Subject from a clinical dataset” is important here. What does it mean, and how, technically, does one achieve it in diverse data sets, from large influenza studies in healthy populations to small orphan disease studies with limited patient pools?
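One common way to approach the technical side of that question is to drop direct identifiers and coarsen the remaining fields. The Python sketch below is only an illustration under stated assumptions: the field names (name, email, phone, age, zip, visit_date), the function name, and the specific coarsening rules are hypothetical choices for this example, not a procedure taken from the regulation or from the report discussed above, and passing a record through it does not by itself make a dataset compliant.

```python
from datetime import date


def generalize_record(record):
    """Return a copy of a patient record with direct identifiers dropped
    and selected fields coarsened. All field names are hypothetical."""
    direct_identifiers = {"name", "email", "phone"}  # assumed field names
    out = {k: v for k, v in record.items() if k not in direct_identifiers}

    # Collapse very high ages into a single bucket.
    if "age" in out:
        out["age"] = "90+" if out["age"] >= 90 else str(out["age"])

    # Keep only the leading three characters of a postal code.
    if "zip" in out:
        out["zip"] = str(out["zip"])[:3] + "**"

    # Keep only the year of a visit date.
    if "visit_date" in out:
        out["visit_year"] = out.pop("visit_date").year

    return out


example = {
    "name": "Jane Doe",
    "age": 93,
    "zip": "02139",
    "visit_date": date(2014, 5, 17),
    "diagnosis": "influenza",
}
print(generalize_record(example))
```

How far each field has to be coarsened depends on the dataset. A rule that is adequate for a large influenza study may still leave patients in a small orphan disease study easy to single out.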

Challenge No. 4: Patient Data Anonymization Requirements Vary

Since no prescriptive standard exists, the process of anonymization may take many forms and involve several techniques, depending on the risk assessment of the data and the interpretation of the regulations. Though HIPAA provides for a process to limit the ability to identify a Data Subject from a clinical dataset, that de-identification process does not satisfy anonymization standards in the European Union or in Switzerland. Additional measures must be taken to assure compliance with EU and Swiss anonymization rules, requiring sponsors to eliminate or limit the use of so-called “quasi-identifiers,” which may ultimately result in the identification of Data Subjects in a data set.

So, what is a quasi-identifier? While individual fields may not be identifying by themselves, the contents of several fields in combination may be sufficient to result in identification. Such a set of fields in combination is a quasi-identifier. An example could be a collection of fields taken together, such as gender, age, ethnic group, marital status, and geography.
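To make the idea concrete, the Python sketch below counts how many records share each combination of values for a chosen set of fields. The size of the smallest group is a rough indicator of re-identification risk, a measure usually discussed under the name k-anonymity, which is introduced here for illustration and is not prescribed by any of the regulations mentioned above. The sample records, field names, and function names are invented for this example.

```python
from collections import Counter


def smallest_group_size(records, quasi_identifier):
    """Size of the smallest group of records sharing the same values
    for every field in quasi_identifier (0 if there are no records)."""
    counts = Counter(tuple(r[f] for f in quasi_identifier) for r in records)
    return min(counts.values()) if counts else 0


def unique_records(records, quasi_identifier):
    """Records whose combination of quasi-identifier values is unique."""
    counts = Counter(tuple(r[f] for f in quasi_identifier) for r in records)
    return [r for r in records
            if counts[tuple(r[f] for f in quasi_identifier)] == 1]


# Invented sample data; the field names are assumptions for illustration.
records = [
    {"gender": "F", "age": 34, "marital_status": "married", "region": "Geneva"},
    {"gender": "F", "age": 34, "marital_status": "married", "region": "Geneva"},
    {"gender": "M", "age": 34, "marital_status": "single", "region": "Geneva"},
    {"gender": "M", "age": 61, "marital_status": "single", "region": "Zurich"},
]

# One field alone: every value is shared by at least two records.
print(smallest_group_size(records, ["gender"]))

# Four fields in combination: some records become unique.
combo = ["gender", "age", "marital_status", "region"]
print(smallest_group_size(records, combo))
print(len(unique_records(records, combo)))
```

In this toy data set no single field singles anyone out, but the four fields taken together isolate two of the four records, which is exactly the behavior that makes a combination of fields a quasi-identifier.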

To make matters more complicated, most data anonymization standards focus on raw and analysis data; however, accompanying trial documentation could potentially identify individual patients as well. As a result, organizations also need to understand what should be anonymized in addition to the datasets.

Looking Ahead

The vast amount of data generated each day in healthcare and clinical research presents numerous opportunities for the life science research community to access, manage, and apply big data. Insights from big data can potentially influence multiple facets of health care, from assessing the safety and efficacy of different treatments to building predictive models for diagnosing, treating, and delivering care.

But unlocking the potential of big data is no easy matter and requires careful consideration of the context in which the data and technology are used. Patient data privacy and protection, regulatory mandates such as HIPAA, and clinical trial data transparency standards are all critical barriers that must be addressed. Industry consortiums such as CDISC have collaborated to support the standardization of the acquisition, exchange, submission, and archiving of clinical research data and metadata.

Isn’t it time that we turn our attention collectively towards overcoming key precompetitive challenges to big data innovation in healthcare, such as data transparency practices and anonymization standards?

Sujay Jadhav is CEO of goBalto.