Data binding in Pharma analytics
In Pharma, business intelligence serves a wide range of interests, goals and uses. Besides having clearly demarcated internal departments such as sales, R&D, operations and legal, pharmaceutical companies operate in an intricate network of contract organizations and regulatory bodies, all of which have different interests in analytics that do not necessarily complement each other. This diversity is reflected in the numerous approaches, methods and options used to build analytic IT solutions in Pharma.
This plurality does not stop executives and IT architects from being tempted by the idea of a one-stop solution for all the analytics needs of a pharmaceutical company. At the center of this vision lies an enterprise data warehouse (EDW). The difficulty of catering to diverse interests from a single solution, however centralized in conception, hits hard as soon as someone begins working on a data binding strategy for the EDW. Almost immediately the following become apparent:
- There is no one-size-fits-all binding approach
- Data must be bound at multiple stages in the solution
- It is not uncommon for data to need rebinding
A few use cases commonly implemented across different departments help illustrate these assertions.
R&D departments use analytics to generate an ISE (integrated summary of efficacy) to support e-submission for drug approval, integrating data from studies conducted across multiple locations. Early binding to models derived from CDISC standards such as SDTM, perhaps as soon as data is aggregated from source clinical trial management systems, ensures consistency and uniformity, both of which are very important for the submission process. On the other hand, patient cohort analyses that assess drug efficacy usually require purpose-built data repositories bound to very specific, exploratory models that may or may not last beyond the study. Since these models lack any degree of standardization, it is not a good idea to bind data to them as soon as it is aggregated from source systems. These are two use cases that could overlap in their use of source data, yet oppose each other in purpose and present significantly different binding demands.
The need to bind at multiple stages is best illustrated by solutions that attempt pharmacovigilance signal detection using spontaneous reporting systems such as FDA's AERS or EMA's EudraVigilance databases. Even before data from these systems can be downloaded for analysis, the drug names used in the systems must be bound to substances defined in authoritative schemes such as the WHO Drug Dictionary. Likewise, descriptions of adverse events may need to be bound to authoritative nomenclature such as MedDRA. Once downloaded, the data must be bound to light rules, for example rules that determine how to handle missing or duplicate records. Depending on the methodology of the solution, this lightly transformed data is next bound to models to create dimensional tables and OLAP cubes. Data mining algorithms are then applied to detect adverse drug reaction signals, and additional mappings are needed in the visualization layer to display the results graphically. On the other hand, when signal detection is attempted using social media data, big data approaches based on free-text mining are called for, and these place little to no emphasis on binding data. This is the same use case presenting very different binding demands depending on how it is approached.
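The first two binding stages described above (binding drug names to reference substances, then applying light rules for missing and duplicate records) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular solution does it: the `DRUG_SYNONYMS` table, the `Report` record and the function names are all hypothetical, and a real solution would load its mappings from a licensed dictionary rather than hard-code them.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical synonym table standing in for an authoritative drug dictionary.
DRUG_SYNONYMS = {
    "tylenol": "paracetamol",
    "acetaminophen": "paracetamol",
    "paracetamol": "paracetamol",
    "advil": "ibuprofen",
    "ibuprofen": "ibuprofen",
}


@dataclass(frozen=True)
class Report:
    """Hypothetical shape of one spontaneous report row."""
    report_id: str
    drug_name: Optional[str]
    event_term: Optional[str]


def bind_drug(name: Optional[str]) -> Optional[str]:
    """Bind a reported drug name to a reference substance, or None if unknown."""
    if name is None:
        return None
    return DRUG_SYNONYMS.get(name.strip().lower())


def apply_light_rules(reports: List[Report]) -> List[Tuple[str, str, str]]:
    """Drop rows with missing or unbindable fields, then drop duplicate report ids."""
    seen = set()
    bound = []
    for r in reports:
        substance = bind_drug(r.drug_name)
        if substance is None or not r.event_term:
            continue  # light rule: skip records with missing or unbound data
        if r.report_id in seen:
            continue  # light rule: keep only the first copy of a report
        seen.add(r.report_id)
        bound.append((r.report_id, substance, r.event_term.strip().lower()))
    return bound


raw = [
    Report("1", "Tylenol", "Nausea"),
    Report("1", "Tylenol", "Nausea"),        # duplicate record
    Report("2", "ACETAMINOPHEN ", "Rash"),   # different name, same substance
    Report("3", None, "Headache"),           # missing drug name
    Report("4", "Advil", None),              # missing event term
]
bound_reports = apply_light_rules(raw)
```

Only after this stage would the rows be bound again, this time to dimensional models, which is why a single binding step is not enough for this use case.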
In recent years, regulators have been embracing new paradigms to assess and monitor manufacturing quality, and expanding the scope of products covered under such initiatives. As an example, the FDA has been promoting a risk-based approach that requires manufacturers to proactively collect and report metrics. However, these initiatives are far from complete. The metrics are still being finalized (an evolving process) and the scope is being expanded to include more sectors, such as sterile injectable products. This matters a great deal to manufacturing analytics solutions meant to help pharmaceutical companies submit reports to regulators for ongoing GMP compliance, because they need to adopt the new paradigms and metric sets. Changes will become more frequent in the coming years, necessitating a binding strategy that can easily expand, modify and deploy master and reference data on demand. These are solutions that need to co-evolve with regulatory initiatives.
What has been discussed so far may suggest that an enterprise data warehouse is not the best solution for disparate objectives. While data marts remain the usual choice for quickly implementing solutions whose objectives require different binding approaches, an EDW need not be ruled out. A centralized, metadata-driven engine that could bind on demand would be an invaluable component for implementing an EDW successfully in the face of objectives as disparate as those found in Pharma. If data can be bound and rebound on demand, no single binding decision has to be permanent, so the question of which decisions will stick largely ceases to be an issue. A neatly designed engine could substantially reduce the tedium of implementing binding strategies by streamlining the binding process and could, additionally, ensure a smooth flow of data across centralized solutions.
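To make the idea of a metadata-driven engine that binds on demand more concrete, here is a minimal Python sketch. It assumes the simplest possible form of binding metadata, a mapping from source field names to target model field names, and every name in it (`BindingEngine`, `register`, `bind`, the field names and the model names) is hypothetical. A real engine would also need types, rules, versioning and lineage.

```python
from typing import Dict, List

Row = Dict[str, object]


class BindingEngine:
    """Keeps binding metadata centrally and applies it only when asked."""

    def __init__(self) -> None:
        # model name -> {source field name: target field name}
        self._bindings: Dict[str, Dict[str, str]] = {}

    def register(self, model_name: str, column_map: Dict[str, str]) -> None:
        """Register or replace the binding metadata for a target model."""
        self._bindings[model_name] = dict(column_map)

    def bind(self, model_name: str, rows: List[Row]) -> List[Row]:
        """Bind unbound source rows to the named model at request time."""
        column_map = self._bindings[model_name]
        return [
            {target: row.get(source) for source, target in column_map.items()}
            for row in rows
        ]


# Source rows are stored unbound, using their original field names.
source_rows = [{"pat_no": "A-001", "yrs": 54, "site": "Boston"}]

engine = BindingEngine()
engine.register("submission", {"pat_no": "subject_id", "yrs": "age"})
engine.register("cohort_study", {"pat_no": "patient", "site": "location"})

submission_view = engine.bind("submission", source_rows)
cohort_view = engine.bind("cohort_study", source_rows)

# Rebinding is a metadata change only; the stored source rows are untouched.
engine.register("cohort_study", {"pat_no": "patient", "yrs": "age_years"})
rebound_view = engine.bind("cohort_study", source_rows)
```

The design choice illustrated here is that the same stored rows serve a standardized model and an exploratory one at once, and that changing a binding means editing metadata rather than reloading data.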