From legacy to leap: Statistical Computing at a Glance (Part 1/6)
We're excited to launch our LinkedIn newsletter, where we'll provide updates covering clinical data science, entimo insights, and other news in the life sciences industry.
The first six editions of the newsletter will focus on entimo's journey to a modern statistical computing environment.
The need for computation in statistical analysis long predates the computer. The origins of statistical computing can be traced back to the 1920s and 1930s, when universities and research labs first got their hands on IBM's mechanical punched-card tabulators. These early machines were a game-changer, enabling more complex models such as analysis of variance and linear regression decades before today's high-powered computers existed. Imagine Ronald Fisher, one of the greatest statisticians of the early 20th century, hunched over sheets of paper, manually calculating the statistics we now take for granted with a few clicks. By the early 1920s, Fisher had upgraded to a "Millionaire" calculating machine, an early commercial calculator capable of direct multiplication, revolutionizing his work at Rothamsted Research.

The 1950s ushered in the computer age, but things weren't nearly as simple as they are today; a look back at statistical programming in that era gives modern users an appreciation of how easy programming has become. As statistical programs for research emerged, routine statistical analyses were run on these machines as well, and they posed a considerable computational challenge. In 1963, Rothamsted statisticians, working with Elliott 401 and Elliott 402 computers, took a whopping 4,731 hours to analyze 14,357 data variables. Just imagine the sheer volume of paper tape they used to program these machines: enough, perhaps, to circle the Earth!
Fast forward a decade, and the development of Genstat (General Statistics) began at Rothamsted, with the programming done in FORTRAN, initially on an IBM machine. Around the same time, at North Carolina State University, SAS® (Statistical Analysis System) was developed by computational statisticians, also for analysing agricultural data to improve crop yields. Meanwhile, social scientists at the University of Chicago began developing SPSS (Statistical Package for the Social Sciences). Although the three packages (Genstat, SAS and SPSS) were developed for different purposes and their functions later diverged, their basic functions covered similar statistical methodologies. Around 1985, SAS and SPSS also released versions for personal computers.
At entimo, we entered the clinical solutions market with the release of PhaLIMS, a Laboratory Information Management System that supports the processing of in vivo and in vitro as well as radioactive and non-radioactive studies, while simultaneously working to develop a statistical computing environment for the life sciences.
In the clinical data world, even before CDISC released the first version of the SDTM standard in 2004, many organizations had SAS installed on individual PCs, with data accessed over the local area network. Many of these SAS programs were written for exploratory or ad-hoc analyses. This decentralized setup posed several challenges, from traceability and version control to reproducibility and regulatory compliance.
Rather than looking for different tools to deal with each of these challenges, we chose to build an integrated solution that would address them all and meet the needs of an entire organisation, with the capability to scale up should the need arise.
The year 2006 was one of many wins. It was the magical summer when Germany hosted the FIFA World Cup, one of the most-watched events in television history. For entimo, it was the year we released our most-used product.
In 2006, entimo released entimICE DARE (entimo's Integrated Clinical Environment), groundbreaking client-server software for statistical computing. It was the first Integrated Clinical Environment on the market built specifically for the life sciences industry, with out-of-the-box functionality that was unrivalled at the time. From its first generation, entimICE DARE provided end-to-end traceability and regulatory compliance, among many other capabilities.
entimICE DARE ultimately evolved into one of the most widely used platforms in the life sciences industry, becoming an end-to-end platform for clinical data analysis. It combined powerful features such as MDR-driven statistical data analysis, native integration with interactive tools like SAS and R, and automated workflows with load balancing. entimICE DARE provided SAS scalability and load balancing years before SAS Grid was launched: its unique concept of a pool manager, controllers, and job dispatchers supported effective load balancing for the daily programming activities of hundreds of users, as well as efficient resource management of scheduled jobs, including compute-intensive workloads. Its access control and synchronization features brought ease and precision to users in ways the market had not seen before.

Building on DARE's success, we later introduced ToxKin, a state-of-the-art tool for non-compartmental analysis of toxicokinetic and pharmacokinetic studies. All of these applications were fully GxP and FDA 21 CFR Part 11 compliant, with electronic signatures and a readable audit trail, something uncommon 20 years ago. They are still in use at many pharmaceutical companies today, a testament to their robustness and long-lasting value.
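For readers curious about what a pool-manager/dispatcher architecture means in practice, here is a minimal, hypothetical sketch of the general pattern in Python: a dispatcher that always hands the next job to the least-loaded worker in a pool. The class and worker names are illustrative assumptions, not entimICE DARE's actual design.

```python
import heapq
from dataclasses import dataclass, field

# Hypothetical sketch of a pool-manager/dispatcher load-balancing pattern.
# Names and structure are illustrative, not entimICE DARE's actual design.

@dataclass(order=True)
class Worker:
    load: int                      # number of jobs currently assigned
    name: str = field(compare=False)

class Dispatcher:
    """Assigns each incoming job to the least-loaded worker in the pool."""

    def __init__(self, worker_names):
        # A min-heap keyed on current load models the pool manager's view.
        self.pool = [Worker(0, name) for name in worker_names]
        heapq.heapify(self.pool)

    def dispatch(self, job_id):
        worker = heapq.heappop(self.pool)   # least-loaded worker
        worker.load += 1
        heapq.heappush(self.pool, worker)
        print(f"job {job_id} -> {worker.name} (load now {worker.load})")
        return worker.name

    def complete(self, worker_name):
        # Decrement the load when a job finishes, then restore heap order.
        for w in self.pool:
            if w.name == worker_name:
                w.load -= 1
        heapq.heapify(self.pool)

# Usage: three compute nodes sharing a stream of analysis jobs.
d = Dispatcher(["node-a", "node-b", "node-c"])
for job in range(5):
    d.dispatch(job)
```

The min-heap keeps dispatch cheap even with hundreds of concurrent users, which is one reason this pattern scales well for both interactive programming sessions and scheduled batch jobs.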
How did we do it? Why did we do it? How did we come up with the first integrated clinical environment on the market? Look out for the story in the next issue of the newsletter.