Data Analysis & Statistical Insights without Pivot Tables, Data Cleansing or Math – Part 1
The productivity gains from using AI for statistical and data analysis are staggering. About a year ago, we (RFP Ninja) had to write so many instructions to get ChatGPT to read raw tabular data sets correctly 100% of the time that we gave up and used explicit code to ensure accuracy. This past weekend, I wanted to see how much ChatGPT has improved. I started with something small, and once I realized what it could do, I kept pushing the limits.
Eventually, I recreated, and actually improved on, some work I did a few years ago that used 100,000+ rows of machine data, 25 columns wide, as the source. The file was so big that it used to choke my computer whenever I manipulated pivot tables. GPT 4.0 sliced through the data in minutes and gave me deeper analysis than I had produced the first time over the course of a week. If this kind of work were still part of my daily workload, I’d probably still be down that rabbit hole, loading data sets and asking ChatGPT to help me gain insights.
If you’re analyzing data sets on a regular basis, the best approach is to set up your own GPT to do the work. Creating a GPT is like creating a personal AI model, and it takes just a few minutes to set up. This way, you can keep refining the instructions as you learn what works. A separate article covers how to create GPTs.
Here's a quick synopsis of what ChatGPT 4.0 was able to do with the first small test.
Statistical Differences
This example shows that you don’t need to remember much of anything from Stats 101 to be able to get statistical insights using ChatGPT.
This is a little data set of open rates for different outreach email options. You can argue that the sample size isn’t big enough to draw any conclusions, but it’s still a valid test to see how ChatGPT performs.
➡️ GPT Prompt: Look at this table and tell me if there is a statistical difference between any of these results for the percent opened. <<I then pasted the table above into the prompt window>>
Normally, to get this answer, I would have to compare all the pairs. This is tedious.
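For anyone curious what the manual route looks like, here is a minimal sketch in Python. It assumes a two-proportion z-test for each pair, which is my choice for illustration and not necessarily the test ChatGPT ran. The counts are placeholders rather than the data from my table, and the function names are made up for this example.

```python
import math
from itertools import combinations


def two_proportion_z_test(opened_a, sent_a, opened_b, sent_b):
    """Return (z, two-sided p-value) for the difference between two open rates."""
    rate_a = opened_a / sent_a
    rate_b = opened_b / sent_b
    # Pooled open rate under the assumption that both tests perform the same
    pooled = (opened_a + opened_b) / (sent_a + sent_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / sent_a + 1 / sent_b))
    if std_err == 0:
        return 0.0, 1.0
    z = (rate_a - rate_b) / std_err
    # Two-sided tail probability of a standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def compare_all_pairs(results, alpha=0.05):
    """Run the test on every pair; return (name_a, name_b, p_value, is_significant)."""
    rows = []
    for (name_a, (sent_a, opened_a)), (name_b, (sent_b, opened_b)) in combinations(
        results.items(), 2
    ):
        _, p_value = two_proportion_z_test(opened_a, sent_a, opened_b, sent_b)
        rows.append((name_a, name_b, p_value, p_value < alpha))
    return rows


# PLACEHOLDER data: name -> (emails sent, emails opened). Not the article's table.
results = {
    "Test 1": (200, 90),
    "Test 2": (200, 60),
    "Test 3": (200, 75),
}

for name_a, name_b, p_value, significant in compare_all_pairs(results):
    flag = "significant" if significant else "not significant"
    print(f"{name_a} vs {name_b}: p = {p_value:.3f} ({flag})")
```

With only three tests there are three pairs to check. With ten tests there would be 45, which is where the tedium comes in.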
🖥️ ChatGPT 4.0 Response
The statistical comparison shows that there's only one pair with a significant difference (p-value < 0.05):
Test 1 vs. Test 2 (p-value = 0.038)
All other comparisons did not show statistically significant differences. This means that, apart from Test 1 versus Test 2, there isn't enough evidence to conclude that the differences in open rates between the other tests are statistically significant.
The great thing about LLMs is that if you want more info, you just ask; there’s no need to Google p-values.
➡️ Additional Prompt: Explain why the p value of .038 shows the difference is significant
🖥️ ChatGPT 4.0 Response
A p-value of 0.038 means there's a 3.8% chance of seeing this level of difference between Test 1 and Test 2 purely by random chance if there was actually no real difference between them.
Since this value (0.038) is below the common threshold (0.05, or 5%), we say it's statistically significant. In other words, it's unlikely to have occurred just by chance, suggesting that the difference in the percent opened between Test 1 and Test 2 is probably real rather than random variation.
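To make that definition concrete, here is a rough sketch of a shuffle (permutation) simulation in Python. It pretends there is no real difference between two tests by mixing all the emails together, deals them back out at random many times, and counts how often chance alone produces a gap at least as large as the observed one. This is my own illustration, not what ChatGPT ran, and the counts are placeholders rather than the data from my table.

```python
import random


def simulated_p_value(opened_a, sent_a, opened_b, sent_b, trials=5000, seed=0):
    """Estimate how often random reshuffling matches or beats the observed gap."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same estimate
    observed_gap = abs(opened_a / sent_a - opened_b / sent_b)

    # One entry per email across both tests: 1 = opened, 0 = not opened
    total_opened = opened_a + opened_b
    total_sent = sent_a + sent_b
    pool = [1] * total_opened + [0] * (total_sent - total_opened)

    at_least_as_large = 0
    for _ in range(trials):
        rng.shuffle(pool)
        # Randomly reassign emails to the two tests, keeping the group sizes
        rate_a = sum(pool[:sent_a]) / sent_a
        rate_b = sum(pool[sent_a:]) / sent_b
        # Small tolerance guards against floating-point rounding on ties
        if abs(rate_a - rate_b) >= observed_gap - 1e-12:
            at_least_as_large += 1
    return at_least_as_large / trials


# PLACEHOLDER counts (opened, sent) for two tests. Not the article's data.
estimate = simulated_p_value(opened_a=90, sent_a=200, opened_b=60, sent_b=200)
print(f"Share of shuffles with a gap at least this large: {estimate:.4f}")
```

The share printed at the end plays the same role as the p-value: the smaller it is, the harder it is to explain the observed gap as random variation.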
Conclusion
ChatGPT didn't need any explanation of the data I was providing or what kind of statistical test to perform. It took a data set and decided for itself how to provide insights. Imagine what it can do with some additional instructions.
So next, I dug up something with 100,000+ rows of machine data 25 columns wide and started asking questions. The summary of this exercise will be my next post…