From legacy to leap: Statistical Computing at a Glance (Part 1/6)
We're excited to launch our LinkedIn newsletter, where we'll provide updates covering clinical data science, entimo insights, and other news in the life sciences industry.
The first six editions of the newsletter will focus on entimo's journey to a modern statistical computing environment.
The need for computation in statistical analysis long predates the computer. The origins of statistical computing can be traced back to the 1920s and 1930s, when universities and research labs first got their hands on IBM's mechanical punched-card tabulators. These early machines were a game-changer, enabling more complex models such as analysis of variance and linear regression decades before today's high-powered computers existed. Imagine Ronald Fisher, one of the greatest statisticians of the early 20th century, hunched over sheets of paper, manually calculating the statistics we now take for granted with a few clicks. By the early 1920s, Fisher had upgraded to a "Millionaire" calculating machine, an early commercial calculator capable of direct multiplication, revolutionizing his work at Rothamsted Research.

The 1950s ushered in the computer age, but things weren't nearly as simple as they are today; a look back at statistical programming in that era gives modern users an appreciation of how easy programming has become. As statistical programs for research emerged, routine statistical analyses were run on these machines as well, and they posed a considerable computational challenge. In 1963, Rothamsted statisticians, working with Elliott 401 and Elliott 402 computers, took a whopping 4,731 hours to analyze 14,357 data variables. Just imagine the sheer volume of paper tape they used to program these machines: enough, perhaps, to circle the Earth!
Fast forward a decade, and the development of Genstat (General Statistics) began at Rothamsted, with the programming done in FORTRAN, initially on an IBM machine. Around the same time, at North Carolina State University, SAS® (Statistical Analysis System) was developed by computational statisticians, also for analysing agricultural data to improve crop yields. Meanwhile, social scientists at the University of Chicago began developing SPSS (Statistical Package for the Social Sciences). Although the three packages (Genstat, SAS and SPSS) were developed for different purposes and their functions later diverged, their basic functions covered similar statistical methodologies. Around 1985, SAS and SPSS also released versions for personal computers.
At entimo, we entered the clinical solutions market with the release of PhaLIMS, a Laboratory Information Management System that supports the processing of in vivo and in vitro as well as radioactive and non-radioactive studies, while simultaneously working to develop a statistical computing environment for the life sciences.
In the clinical data world, even before CDISC released the first version of the SDTM standard in 2004, many organizations had SAS installed on individual PCs, with data accessed over the local area network. Many of these SAS programs were written for exploratory or ad-hoc analyses. This decentralized setup posed several challenges, from traceability and version control to reproducibility and regulatory compliance.
Rather than looking for different tools to deal with each of these challenges, we chose to build an integrated solution that would address them all and meet the needs of an entire organisation, with the capability to scale up should the need arise.
The year 2006 was one of many wins. It was the magical summer when Germany hosted the FIFA World Cup, one of the most-watched events in television history. For entimo, it was the year we released our most-used product.
In 2006, entimo released entimICE DARE (entimo's Integrated Clinical Environment), groundbreaking client-server software for statistical computing. It was the first Integrated Clinical Environment on the market built specifically for the life sciences industry, with out-of-the-box functionality that was unrivalled at the time. From its first generation, entimICE DARE provided end-to-end traceability and regulatory compliance, among many other capabilities.
entimICE DARE ultimately evolved into one of the most widely used platforms in the life sciences industry, becoming an end-to-end platform for clinical data analysis. It combined powerful features such as MDR-driven statistical data analysis, native integration with interactive tools like SAS and R, and automated workflows with load balancing. entimICE DARE provided SAS scalability and load balancing years before SAS Grid was launched: its unique concept of a pool manager, controllers, and job dispatchers supported effective load balancing for the daily programming activities of hundreds of users, as well as efficient resource management of scheduled jobs, including compute-intensive workloads. Its access control and synchronization features brought ease and precision to users in ways the market had not seen before.

Building on DARE's success, we later introduced ToxKin, a state-of-the-art tool for non-compartmental analysis of toxicokinetic and pharmacokinetic studies. All of these applications were fully GxP and FDA 21 CFR Part 11 compliant, with electronic signatures and a readable audit trail, something uncommon 20 years ago. They are still in use at many pharmaceutical companies today, a testament to their robustness and long-lasting value.
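For readers curious about what a pool-manager/dispatcher architecture means in practice, here is a minimal, hypothetical sketch of the general pattern in Python: a dispatcher that always hands the next job to the least-loaded worker in a pool. The class and worker names are illustrative assumptions, not entimICE DARE's actual design.

```python
import heapq
from dataclasses import dataclass, field

# Hypothetical sketch of a pool-manager/dispatcher load-balancing pattern.
# Names and structure are illustrative, not entimICE DARE's actual design.

@dataclass(order=True)
class Worker:
    load: int                      # number of jobs currently assigned
    name: str = field(compare=False)

class Dispatcher:
    """Assigns each incoming job to the least-loaded worker in the pool."""

    def __init__(self, worker_names):
        # A min-heap keyed on current load models the pool manager's view.
        self.pool = [Worker(0, name) for name in worker_names]
        heapq.heapify(self.pool)

    def dispatch(self, job_id):
        worker = heapq.heappop(self.pool)   # least-loaded worker
        worker.load += 1
        heapq.heappush(self.pool, worker)
        print(f"job {job_id} -> {worker.name} (load now {worker.load})")
        return worker.name

    def complete(self, worker_name):
        # Decrement the load when a job finishes, then restore heap order.
        for w in self.pool:
            if w.name == worker_name:
                w.load -= 1
        heapq.heapify(self.pool)

# Usage: three compute nodes sharing a stream of analysis jobs.
d = Dispatcher(["node-a", "node-b", "node-c"])
for job in range(5):
    d.dispatch(job)
```

The min-heap keeps dispatch cheap even with hundreds of concurrent users, which is one reason this pattern scales well for both interactive programming sessions and scheduled batch jobs.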
How did we do it? Why did we do it? How did we come up with the first integrated clinical environment on the market? Look out for the story in the next issue of the newsletter.