Dotmatics

Prioritizing Data Integrity in R&D: Challenges and Best Practices

This article explains why data integrity is critical in R&D and how robust data governance, security, and management practices, applied throughout the research lifecycle, help ensure data accuracy, protect patient safety, uphold product efficacy, and maintain regulatory compliance amid increasing digitization and evolving threats.

Data integrity is an ongoing concern for every R&D organization, at every stage of the research lifecycle. The stakes go beyond delayed timelines or cost overruns: data integrity underpins a culture of quality, product efficacy, patient safety, and an organization's trustworthiness as a brand, partner, or provider.

Prioritizing Data Integrity in the Lab

Good data practices throughout the R&D process strengthen data integrity in the lab. Companies must protect the fidelity and confidentiality of all records and data generated throughout a product’s lifecycle, from the earliest research stages onward, including raw data, metadata, and transformed data. To achieve this, organizations need the right processes and technologies to ensure:

  • Data integrity: Completeness, consistency, validity, and accuracy of data as it is produced, captured, quality checked, transformed, and traced.
  • Data governance: Management and tracking of who can access data, at what level, and how it is used.
  • Data security: Encryption, transfer, storage, and backup of data.
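To make the data integrity dimensions above more concrete, here is a minimal sketch of rule-based checks on a single result record. Everything in it is an illustrative assumption rather than any product's API: the `AssayResult` record layout, the allowed units, and the 24-hour contemporaneity window are all hypothetical. Completeness, validity, and consistency can be checked by rules like these; accuracy generally cannot, because it requires comparison against reference standards or independent measurements.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

# Hypothetical record layout for illustration only; real ELN/LIMS schemas differ.
@dataclass
class AssayResult:
    sample_id: Optional[str]
    instrument_id: Optional[str]
    value: Optional[float]
    unit: Optional[str]
    measured_at: Optional[datetime]
    recorded_at: Optional[datetime]

REQUIRED_FIELDS = ("sample_id", "instrument_id", "value", "unit", "measured_at", "recorded_at")
ALLOWED_UNITS = {"nM", "uM", "mM"}       # assumed controlled vocabulary
MAX_RECORDING_LAG = timedelta(hours=24)  # assumed contemporaneity window

def check_record(rec: AssayResult) -> List[str]:
    """Return a list of integrity problems; an empty list means every check passed."""
    problems: List[str] = []
    # Completeness: every required field is populated.
    for name in REQUIRED_FIELDS:
        if getattr(rec, name) is None:
            problems.append(f"missing {name}")
    # Validity: values fall inside their allowed domains.
    if rec.unit is not None and rec.unit not in ALLOWED_UNITS:
        problems.append(f"invalid unit {rec.unit!r}")
    if rec.value is not None and rec.value < 0:
        problems.append("negative value")
    # Consistency: the result is recorded after, and reasonably soon after, it was measured.
    if rec.measured_at is not None and rec.recorded_at is not None:
        if rec.recorded_at < rec.measured_at:
            problems.append("recorded before measured")
        elif rec.recorded_at - rec.measured_at > MAX_RECORDING_LAG:
            problems.append("not recorded contemporaneously")
    return problems

if __name__ == "__main__":
    good = AssayResult("S-001", "HPLC-02", 12.5, "uM",
                       datetime(2024, 2, 6, 9, 0), datetime(2024, 2, 6, 9, 5))
    bad = AssayResult("S-002", None, -1.0, "mg",
                      datetime(2024, 2, 6, 9, 0), datetime(2024, 2, 5, 9, 0))
    print(check_record(good))  # []
    print(check_record(bad))
```

Returning a list of problems rather than raising on the first failure lets a QC step report every issue with a record at once, which makes review and correction easier to document.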

These factors are interconnected, which makes upholding good data practices in the modern lab more complex.

A Shifting Data Management Landscape

As R&D organizations digitize their data to enable large-scale analytics, data management best practices must evolve. Teams need clear strategies to identify and mitigate threats to data integrity, including technological, managerial, and external risks. In the pharmaceutical sector, the U.S. FDA has reported increasing data integrity violations in recent years. Common violations include data loss, missing metadata, non-contemporaneous collection or backdating, data deletion and copying, sample elimination or reprocessing, poorly investigated out-of-specification results, data access and security issues, and inadequate or disabled audit trails. Such missteps can undermine research validity, endanger patient safety, compromise product efficacy, and jeopardize regulatory approval.

Factors Impacting Good Data Practices

Three key factors complicate good data practices:

  • Multimodal R&D: Generates large volumes of disparate data requiring proper handling.
  • Increased collaboration: Drives wider data sharing, necessitating security and privacy considerations.
  • Artificial intelligence (AI): Changes how data are used to drive innovation.

1. Multimodal R&D

Organizations aiming for innovation diversify their R&D efforts across different scientific areas and modalities. Data flows from various sources, formats, and locations, including internal research groups, specialty equipment, legacy data migrations, and external CROs with distinct systems. This diversity complicates lab integration and data management and puts data integrity and security at risk. Many companies struggle to manage the volume and diversity of data and metadata needed for decision-making.

2. Collaboration

Successful large-scale R&D requires better data flow between research groups to build collective knowledge. The importance of data sharing is underscored by the 2023 NIH Data Management and Sharing Policy, which aims to help confirm findings, encourage data reuse, and spur innovation. However, collaboration is challenging for R&D groups accustomed to working in isolation, and it requires shifts in mindset, culture, governance, and execution. Many teams lack systems to share well-annotated data while controlling access, tracking changes, and ensuring good data practices. As a result, data often ends up scattered across repositories and media rather than centralized in a secure, standardized data pool.

The FAIR guiding principles for scientific data management promote making data findable, accessible, interoperable, and reusable. Making data FAIR requires changes in data formats, data models, storage, and system integration, but these changes can be made incrementally, yielding benefits in time savings, reproducibility, knowledge sharing, and AI readiness.

3. Artificial Intelligence

With AI's arrival in R&D, organizations need data infrastructures that capture and manage the proprietary data that differentiate their research. Becoming AI-ready involves adopting technology and process changes that support data growth, eliminate silos, integrate systems, and normalize data. The goal is to ensure all R&D data is trustworthy, well-structured, correlated, shareable, and model-ready. Meeting these standards is challenging because of complex workflows, data types, and systems, but it is essential. Regulations worldwide are evolving to guide the use of AI and ML in research. The EU's Artificial Intelligence Act, approved by the European Parliament in March 2024, aims to protect health, safety, and fundamental rights as AI becomes integral to innovation. Organizations must ensure their systems help them meet regulatory and ethical requirements, including data integrity, security, traceability, and bias mitigation.

Good Data Practices

Aligning data management with data integrity is vital for long-term research success and for preparing for an automated, connected, and collaborative research future. Systems supporting these imperatives should:

  • Support research transparency, credibility, and reproducibility through complete data capture.
  • Automate results and metadata collection from instruments and lab systems.
  • Tie and track results to precise samples and fully documented experiments.
  • Aggregate R&D data into intelligent, correlated, model-ready structures.
  • Provide tools for scientists to manage, search, and visualize data.
  • Unite data-producing and analyzing applications within a secure data-management platform.
  • Centralize and securely store data with end-to-end encryption.
  • Configure checks and balances throughout R&D using audit trails, QC/QA and SOP checks, signature requirements, permission and access controls, project codes, encrypted reports, and secure dashboards.
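Audit trails are one of the checks listed above, and disabled or inadequate audit trails appear among the common FDA findings. One common way to make an audit trail tamper-evident is to chain entries by hash, so each entry commits to the one before it. The sketch below illustrates that general technique only; it is not how Dotmatics or any particular platform implements audit trails, and the `AuditTrail` class and its entry fields are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Dict, List

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry

def _entry_hash(body: Dict) -> str:
    # Canonical JSON (sorted keys, fixed separators) so identical content hashes identically.
    payload = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

class AuditTrail:
    """Hypothetical append-only log; each entry stores the hash of the previous entry."""

    def __init__(self) -> None:
        self.entries: List[Dict] = []

    def append(self, user: str, action: str, record_id: str, detail: str = "") -> Dict:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS_HASH
        body = {
            "seq": len(self.entries),
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "action": action,
            "record_id": record_id,
            "detail": detail,
            "prev_hash": prev,
        }
        entry = {**body, "hash": _entry_hash(body)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute the chain; editing, reordering, or removing an interior entry fails."""
        prev = GENESIS_HASH
        for i, entry in enumerate(self.entries):
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["seq"] != i or body["prev_hash"] != prev:
                return False
            if _entry_hash(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True

if __name__ == "__main__":
    trail = AuditTrail()
    trail.append("jdoe", "create", "S-001", "value=12.5 uM")
    trail.append("asmith", "update", "S-001", "value=12.7 uM; reason=recalibration")
    print(trail.verify())  # True
    trail.entries[0]["detail"] = "value=99 uM"  # simulate a silent after-the-fact edit
    print(trail.verify())  # False
```

A hash chain on its own cannot detect removal of the most recent entries, so in practice the latest hash would also need to be stored or signed somewhere the log's writers cannot alter.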

References

  1. Chen, S. Culture of Quality: Data Integrity and CGMP Compliance. U.S. Food and Drug Administration - SBIA Generic Drug Forum – April 26, 2022. (Accessed 02/06/2024)
  2. Neumeyer, M. Data Integrity: 2020 FDA Data Integrity Observations in Review. American Pharmaceutical Review. Jun 23, 2020.
  3. Vazquez, M.; Rayser, J. Regulatory warning letters in pharma: What can we learn post-COVID? Cleanroom Technology. July 27, 2022.
  4. 2023 NIH Data Management and Sharing Policy. National Institutes of Health. (Accessed 02/06/2024)
  5. Wilkinson, M., Dumontier, M., Aalbersberg, I. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data, 3, 160018 (2016). https://doi.org/10.1038/sdata.2016.18
  6. Using Artificial Intelligence & Machine Learning in the Development of Drug & Biological Products. Discussion Paper and Request for Feedback. U.S. Food & Drug Administration. 2023. (Accessed 02/06/2024)
  7. Artificial Intelligence in Drug Manufacturing. FDA Center for Drug Evaluation and Research. Discussion Paper. 2023. (Accessed 02/06/2024)