The intersection of my data and your Large Language Model.

Over the last few months I've been hooked on how a large language model can interact with private data. This isn't revolutionary; it's just me waking up to the possibilities and sharing observations.

I'm using Retrieval Augmented Generation (RAG), a technique where private data supplements a large language model to get tailored, current and "safer" results. It's implemented in Python with the LangChain and Google Gemini libraries, reading from Google Sheets.

Google Sheets in itself is simple yet insanely powerful. It's by no means the only way to centralise, automate and preprocess private data, but the breadth of possibilities stopped me in my tracks as I fell down the rabbit hole. These are my notes.

The following examples are just raw data in a spreadsheet LangChain'ed to a query prompt; no additional context is provided.
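To make "raw data LangChain'ed to a query prompt" a bit more concrete, here is a minimal sketch of the idea in plain Python. It is not my actual script: the function names, the prompt wording and the sample rows are all made up for illustration, and the model call is left as a pluggable function (`llm`) so that no particular LangChain or Gemini API is assumed.

```python
import csv
import io
from typing import Callable


def sheet_to_markdown(csv_text: str) -> str:
    """Render CSV text (for example, a sheet exported as CSV) as a Markdown table."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ""
    header, body = rows[0], rows[1:]
    lines = [
        "| " + " | ".join(header) + " |",
        "|" + "|".join(["---"] * len(header)) + "|",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)


def build_prompt(csv_text: str, question: str) -> str:
    """Augment the question with the sheet contents; column titles are the only context."""
    table = sheet_to_markdown(csv_text)
    return (
        "Answer the question using only the data in the table below.\n\n"
        f"{table}\n\n"
        f"Question: {question}"
    )


def ask(csv_text: str, question: str, llm: Callable[[str], str]) -> str:
    """Send the augmented prompt to whatever model function is supplied."""
    return llm(build_prompt(csv_text, question))


# Hypothetical sample rows, not the data from the article.
SAMPLE = "Name,City,Date,Spend\nBob,Sydney,2024-05-01,413\nAlice,Perth,2024-05-02,50\n"

if __name__ == "__main__":
    # Stand-in for a real model call: it only reports the prompt size.
    fake_llm = lambda prompt: f"(model would see {len(prompt)} characters)"
    print(ask(SAMPLE, "Who attended from Sydney?", fake_llm))
```

In a real setup, `llm` would wrap the model client of your choice; everything else stays the same.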

Again, none of this is super new. It's just amazing to me, and I was compelled to share.

overly basic example:

raw data
_Query_
Who has a boy's name in this list?   
How many boys attended from Sydney in the last week?  
and how much did they spend?

**Boys' names:** Bob, John, Nigel  
**Boys from Sydney in the last week:**  
**Total spending by boys from Sydney in the last week:** $413 + $98 + $144 = $655

Observation: I never told it which names were "boys" names, what "last week" meant, or even the name of the spreadsheet (it was located in the script's input folder in Google Drive). Those gaps were all inferred from the column titles.
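For contrast, here is roughly what the same question looks like when every inference has to be spelled out by hand. This is a sketch only: the rows, column names and dates are hypothetical (the three Sydney amounts are borrowed from the reply above so the total lines up), and the set of "boys" names and the meaning of "last week" are exactly the assumptions the model filled in on its own.

```python
from datetime import date, timedelta

# Hypothetical rows; the real sheet's columns and values are not shown here.
ROWS = [
    {"name": "Bob", "city": "Sydney", "date": date(2024, 5, 6), "spend": 413},
    {"name": "John", "city": "Sydney", "date": date(2024, 5, 7), "spend": 98},
    {"name": "Nigel", "city": "Sydney", "date": date(2024, 5, 8), "spend": 144},
    {"name": "Amelia", "city": "Sydney", "date": date(2024, 5, 8), "spend": 60},
    {"name": "John", "city": "Melbourne", "date": date(2024, 5, 7), "spend": 75},
    {"name": "Bob", "city": "Sydney", "date": date(2024, 4, 1), "spend": 20},
]

# Assumption: the model worked this out unprompted; code needs it stated.
BOYS_NAMES = {"Bob", "John", "Nigel"}


def sydney_boys_last_week(rows, today):
    """Rows for boys from Sydney dated within the 7 days up to and including `today`."""
    start = today - timedelta(days=7)
    return [
        row
        for row in rows
        if row["name"] in BOYS_NAMES
        and row["city"] == "Sydney"
        and start <= row["date"] <= today
    ]


def total_spend(rows):
    """Sum the spend column over the given rows."""
    return sum(row["spend"] for row in rows)


if __name__ == "__main__":
    matches = sydney_boys_last_week(ROWS, today=date(2024, 5, 10))
    print(len(matches), "attendees, total spend:", total_spend(matches))
```

A check like this is also a cheap way to confirm the model's arithmetic before relying on it.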


[NEXT] Same script, with updated spreadsheet column titles and a modified LangChain prompt:

raw data
Who worked the longest time based on the list? > Amelia (11 hours 29 minutes)

Who checked in, but did not check out? > Charlotte, Richard

Who worked the least number of hours? > John (1 hour 57 minutes)

Tally working totals per role. >
- Cleaner: 7 hours 44 minutes
- Chef: 17 hours 17 minutes
- Driver: 40 hours 57 minutes
- Manager: 17 hours 22 minutes

Estimate hourly rate per role, provide total for all hours worked >

| Role | Hourly Rate | Hours Worked | Total Pay |
|---|---|---|---|
| Cleaner | $10 | 7.4 | $74 |
| Chef | $15 | 8.9 | $133.50 |
| Driver | $12 | 30.7 | $368.40 |
| Manager | $20 | 13.9 | $278 |
| Total |  |  | $853.90 |

Observation: the language model "knew" that hours worked equated to a check-out minus a check-in. I never said that; they were just the column titles. That it then pulled this context into a summary ready for graphing was unexpected.
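The "check-out minus check-in" rule the model inferred can be written down explicitly, which is handy for checking its answers. The sketch below assumes a timestamp format and uses made-up rows (only Amelia's and John's durations echo the replies above; the roles and times are hypothetical).

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Optional

FMT = "%Y-%m-%d %H:%M"  # assumed timestamp format

# Hypothetical rows: (name, role, check-in, check-out). Empty string = no check-out.
SHIFTS = [
    ("Amelia", "Driver", "2024-05-06 07:00", "2024-05-06 18:29"),
    ("John", "Cleaner", "2024-05-06 09:00", "2024-05-06 10:57"),
    ("Charlotte", "Chef", "2024-05-06 08:00", ""),
    ("Olivia", "Driver", "2024-05-06 08:30", "2024-05-06 12:00"),
]


def shift_length(check_in: str, check_out: str) -> Optional[timedelta]:
    """Hours worked = check-out minus check-in; None if either value is missing."""
    if not check_in or not check_out:
        return None
    return datetime.strptime(check_out, FMT) - datetime.strptime(check_in, FMT)


def describe(duration: timedelta) -> str:
    """Format a duration the way the model's replies do."""
    hours, minutes = divmod(int(duration.total_seconds()) // 60, 60)
    hour_word = "hour" if hours == 1 else "hours"
    minute_word = "minute" if minutes == 1 else "minutes"
    return f"{hours} {hour_word} {minutes} {minute_word}"


def summarise(shifts):
    """Return (time worked per person, names missing a check-out, totals per role)."""
    worked, missing = {}, []
    by_role = defaultdict(timedelta)
    for name, role, check_in, check_out in shifts:
        length = shift_length(check_in, check_out)
        if length is None:
            missing.append(name)
            continue
        worked[name] = length
        by_role[role] += length
    return worked, missing, dict(by_role)


if __name__ == "__main__":
    worked, missing, by_role = summarise(SHIFTS)
    longest = max(worked, key=worked.get)
    print("Longest:", longest, describe(worked[longest]))
    print("No check-out:", ", ".join(missing))
    for role, total in by_role.items():
        print(role, describe(total))
```

People without a check-out are left out of the totals here; that is a design choice, and the model may well have handled them differently.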

[NEXT] As a final little experiment, here's some financial information, copied and pasted into the same spreadsheet with new column titles and no additional context.

<prompt> which company would you invest in? using only the data provided in the following table, provide reasoning

Based on the data provided, Company C would be a more attractive investment option compared to Company A and Company B. Here are the reasons:

1. Market Cap: Company C has consistently shown a lower market cap compared to Company A and Company B, indicating a lower valuation. This could provide an opportunity for potential growth and value appreciation.

2. Revenue Growth: Company C has experienced a steady increase in revenue over the years, with a 33% growth from 2020 to 2024. This consistent revenue growth indicates a strong and expanding customer base.

3. Gross Profit: Company C has shown a gradual increase in gross profit, rising from $20 billion in 2020 to $33 billion in 2024. This indicates an improvement in the company's profitability and efficiency.

4. Net Income: Company C has consistently reported higher net income compared to Company A and Company B. In 2024, Company C's net income reached $19 billion, which is significantly higher than the other two companies.

5. EPS: Company C has consistently maintained a higher EPS compared to Company A and Company B, indicating better earnings per share for its investors. In 2024, Company C's EPS was $8.6, which is higher than the other two companies.

6. Shareholder Equity: Company C has experienced a steady increase in shareholder equity over the years, indicating a strong financial position and the ability to fund future growth. In 2024, Company C's shareholder equity reached $887 billion, which is higher than the other two companies.

It's important to note that these are just a few metrics, and a more comprehensive analysis would require considering other factors such as industry trends, competitive landscape, and management quality. However, based on the data provided, Company C appears to be a promising investment option with strong growth potential and financial stability.        

Observation: Would you trust it? The logic and conclusions seem sound enough, and with private data the chance of it hallucinating external context is reduced. It's certainly a believable indicator.
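One way to build that trust is to recompute any figure the reply quotes. As a small sketch, the only metric where the reply gives both a start and an end value is gross profit ($20 billion in 2020, $33 billion in 2024), so that is what is used below; the function names are my own, and the same functions could be pointed at any other column in the sheet.

```python
def pct_growth(start: float, end: float) -> float:
    """Total percentage growth from start to end."""
    return (end - start) * 100 / start


def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate, as a percentage."""
    return ((end / start) ** (1 / years) - 1) * 100


# Gross profit in billions of dollars, as quoted in the model's reply.
GROSS_PROFIT_2020 = 20
GROSS_PROFIT_2024 = 33

if __name__ == "__main__":
    total = pct_growth(GROSS_PROFIT_2020, GROSS_PROFIT_2024)
    yearly = cagr(GROSS_PROFIT_2020, GROSS_PROFIT_2024, years=4)
    print(f"Gross profit growth 2020 to 2024: {total:.0f}% total, {yearly:.1f}% per year")
```

If a recomputed number disagrees with the reply, that is the cue to go back to the raw rows rather than the summary.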


for me, what's next? (application to the real world)

In my previous company I had a fire hose of data coming to me every night via targeted extracts from Salesforce, HubSpot, NetSuite, Chargify, Zendesk and our platforms. BDRs and BDMs. Management summaries and escalation lists. And more. Frankly, this resulted in a lot of 'lists' to scan through and not much real information 😞.

In late 2023 I enhanced this by using Vertex AI from Google and some private platform automation to 'learn' an ideal or exceptional customer trait, and only inform me when an ideal customer did something interesting 🤔. This ran in production for nearly a year.

Millions of daily customer and prospect interactions were distilled down to one or two actions for my team (I had regional responsibility for pre-sales, post-sales, customer upsell and success for a global SaaS company in the media space). My team was now informed with more or less the right information, at more or less the right time. But that was just the tip of the iceberg.

Next it's all about applicability. I want to help others smartly consume data they already know, find the needles in the haystacks, enhance smooth and efficient growth, clairvoyant customer intimacy, call it what you like.

Thanks for your time, and feel free to continue the conversation as I look for my next adventure -> https://www.garudax.id/in/adrianbrittonaus/



More articles by Adrian Britton
