Dotmatics

How to Build an AI-Ready Foundation for Drug Discovery

Successful AI integration in drug discovery requires an AI-ready foundation built on robust data infrastructure and management. Google offers an instructive example: it invested years in data and model development, drew on public data contributions, and released open-source tools to maximize the value of its AI before applying it in real-world products.

As artificial intelligence (AI) gains traction in drug discovery, many companies feel compelled to increase their use of AI. According to Deloitte, more than 60% of biopharma and medtech companies surveyed spent over $20 million on AI programs in 2019; that amount is only expected to grow over time. However, AI investment can be complicated. It often takes longer than expected to see returns because of the time it takes to train models. Many companies struggle to successfully implement AI within their organizations, often due to data challenges.

AI Preparation Precedes Success

Simply put, you can’t benefit from AI if you’re not ready for AI.

With artificial intelligence, data are key. In fact, Gartner reports a notable shift from a model-centric approach toward a more data-centric approach, which looks to improve outcomes through better data management, labeling, and annotation, rather than through tweaking models. Therefore, an essential aspect of being AI-ready is having the infrastructure in place to efficiently collect and use data.

Google as a Pioneer

Google serves as an example, having spent years investing in and training its AI before applying it within widely used tools such as Maps, Search, and YouTube. Google’s AI approach centers on the reciprocal nature of data and models: plentiful, good data are needed to create models, and in turn, those models are needed to derive the most value from that data. This reciprocity extends to how the company draws on the wider public to help build its AI; for example, it has used Google Crowdsource to publicly collect training data, while giving back to the community data set discovery tools (e.g., Google Dataset Search) and open-source AI/ML software (e.g., TensorFlow).

At the annual I/O Developers Conference, Google CEO Sundar Pichai detailed recent ways the company has applied AI, such as mapping rural areas in Google Maps, summarizing documents in Google Workspace, and improving natural language processing and speech recognition in Google Chat and Google Pixel. Notably, Pichai emphasized the importance of setting up for success, commenting, “The advances we’ve shared today are possible only because of our continued innovation in our [technical] infrastructure.”

Preparing for AI in Drug Discovery

Life science and small molecule drug discovery innovators can learn from Google’s commitment to investing in infrastructure that supports large-scale data collection and model refinement.

Innovation with AI typically demands that companies manage their data and workflows differently than they have in the past. In a recent Bio-IT World article, Dotmatics’ Science and Technology Specialist, Will Bowers, reviews best practices companies can adopt to automate data cleaning and AI pipelines for increased enterprise-wide adoption.

As detailed by Towards Data Science, legacy data and technology infrastructures typically cannot accommodate the level of integration and data fluidity that AI requires; scalable, flexible data platforms are better suited to support it. As a result, even innovative companies frequently struggle with data management because of their technology infrastructure and workflow processes. Nearly 30% of the biopharma and medtech companies that Deloitte surveyed said data struggles negatively impact their AI initiatives. Specific pain points include poor-quality data and siloed data systems, two obstacles Dotmatics can help companies overcome.

Dotmatics Can Help You Get AI-Ready

Dotmatics can help life science and small molecule drug discovery companies get AI-ready by providing a unified scientific research-data management platform that:

  • Enables the capture of clean and trustworthy AI-ready data, such as through automated instrument-data collection, database and application integration, and error-proof data entry via electronic laboratory notebooks (ELNs)
  • Removes data silos by seamlessly integrating all the different data types that make up the experimental fabric, such as chemistry, biology, formulation, and physical characterization data
  • Provisions the model-quality data needed for machine learning by breaking away from proprietary data formats, automating QC and QA, and eliminating time-consuming and error-prone data wrangling
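
To make "automating QC" concrete, here is a minimal, purely illustrative Python sketch of the kind of check that can run before data reach a model. It is not Dotmatics code or any product API; the record fields (compound_id, ic50, unit), the unit table, and the rejection rules are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class AssayRecord:
    """Hypothetical assay result; field names are illustrative only."""
    compound_id: str
    ic50: Optional[float]
    unit: str


# Assumed conversion factors to a single reporting unit (nM).
UNIT_TO_NM = {"nM": 1.0, "uM": 1000.0}


def qc_records(
    records: List[AssayRecord],
) -> Tuple[List[AssayRecord], List[Tuple[AssayRecord, str]]]:
    """Split records into (clean, rejected); clean values are normalized to nM."""
    clean: List[AssayRecord] = []
    rejected: List[Tuple[AssayRecord, str]] = []
    seen_ids = set()
    for rec in records:
        if not rec.compound_id.strip():
            rejected.append((rec, "missing compound ID"))
        elif rec.ic50 is None or rec.ic50 <= 0:
            rejected.append((rec, "missing or non-positive IC50"))
        elif rec.unit not in UNIT_TO_NM:
            rejected.append((rec, "unknown unit"))
        elif rec.compound_id in seen_ids:
            # Keep the first measurement; flag later ones for human review.
            rejected.append((rec, "duplicate compound ID"))
        else:
            seen_ids.add(rec.compound_id)
            clean.append(
                AssayRecord(rec.compound_id, rec.ic50 * UNIT_TO_NM[rec.unit], "nM")
            )
    return clean, rejected


# Example input mixing valid, duplicate, incomplete, and mis-labeled rows.
records = [
    AssayRecord("CMPD-001", 12.5, "nM"),
    AssayRecord("CMPD-002", 0.5, "uM"),
    AssayRecord("CMPD-002", 0.4, "uM"),
    AssayRecord("CMPD-003", None, "nM"),
    AssayRecord("CMPD-004", 3.0, "mg"),
]
clean, rejected = qc_records(records)
```

In this sketch, rejected rows are returned with a reason rather than silently dropped, so a scientist can review them instead of losing data. A real pipeline would likely apply far richer rules, but the principle is the same: validation and unit normalization happen once, automatically, at the point of capture.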

Next Steps

Learn more about developments in small molecule discovery with the on-demand webinar, "Reduce Risk, Cost, and Time in the New Era of Small Molecule Drug Discovery."