I recently had a very interesting problem to solve: how to safely store and parse user-entered formulas for our analytics platform.

At first glance, it sounds simple. Take a formula, evaluate it, return a number. Something like:

(meta.spends + (ads.spends - 50)) * 100

My first instinct was the obvious one: just evaluate the string. That idea lasted about 30 seconds, because the moment you try to do this in a real system, things get complicated very quickly:

• Users can input anything, which is a security risk
• Formulas need to be reusable and debuggable
• Metrics depend on other metrics, so ordering matters
• You need control over functions and allowed operations

I explored existing options, but they were either unsafe (like "eval"), too limited, or too heavy for what we needed. I realized we had to build a lightweight execution engine, so here's what we ended up doing:

1. Parse the formula into an Abstract Syntax Tree (AST)
2. Evaluate it in a controlled environment (no arbitrary execution)
3. Extract dependencies to understand which variables are required
4. Support custom functions like SUM, AVG, etc.

But the most interesting realization came later. Formulas are not isolated; they form a graph. One metric depends on another, which can depend on another in turn. Suddenly, this becomes a dependency problem. Which means:

• You need to resolve execution order
• You need to detect cycles
• You need to think like a query planner, not just a parser

What started as "just evaluate a formula" turned into designing a small, safe, and extensible computation layer.

And this is something I have noticed repeatedly: the interesting problems are not always the big ones. They are the ones that look simple enough to ignore, until you try to build them properly.

#engineering #systemdesign #python #analytics #backend #softwaredevelopment
Building a Lightweight Formula Execution Engine for Analytics Platform
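A minimal sketch of steps 1 to 3 using Python's built-in ast module, for readers who want to see the shape of the approach. The function names are illustrative, not the author's actual engine, and the dotted metric names from the example (meta.spends) are written with underscores here to keep the sketch short:

import ast
import operator

# Whitelisted operations: anything outside this table is rejected.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def dependencies(formula: str) -> set:
    """Step 3: which variables does the formula need?"""
    tree = ast.parse(formula, mode="eval")
    return {n.id for n in ast.walk(tree) if isinstance(n, ast.Name)}

def evaluate(formula: str, variables: dict):
    """Steps 1 and 2: parse, then walk the tree against a small whitelist."""
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name):
            return variables[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError("disallowed expression")
    return walk(ast.parse(formula, mode="eval").body)

print(dependencies("(meta_spends + (ads_spends - 50)) * 100"))   # {'meta_spends', 'ads_spends'}
print(evaluate("(meta_spends + (ads_spends - 50)) * 100",
               {"meta_spends": 200, "ads_spends": 80}))          # (200 + 30) * 100 = 23000

Once each metric's dependencies are known, execution order across metrics can be resolved with a topological sort; the standard library's graphlib.TopologicalSorter does this and raises CycleError on circular definitions.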
More Relevant Posts
Hello dudes and dudettes!! 🚀

Day 12/150 — Solved LeetCode 380: Insert Delete GetRandom O(1)

Today’s problem felt like a real brain workout 🧠 — not because it was long, but because it demanded the right idea.

At first, it looks simple:
👉 Insert
👉 Delete
👉 Get Random

But the catch? ⚡ All operations must run in O(1) time. That’s where things get interesting.

🧠 Initial Thought Process
Using a list? Insert ✅ Get random ✅ Remove ❌ (takes O(n))
Using a set? Insert ✅ Remove ✅ Get random ❌
So clearly… one data structure alone isn’t enough.

💡 The Breakthrough Moment
The solution clicked when I realized: 👉 why not combine the strengths of both?
Use a list for fast random access.
Use a hash map for instant lookups.
This combination unlocks true O(1) performance for all operations.

🔥 The Most Interesting Part — Deletion Trick
Normally, removing an element from a list is expensive because elements need to shift. But here’s the smart trick:
👉 Swap the element to be removed with the last element
👉 Remove the last element
👉 Update the index in the hash map
That’s it. No shifting. No extra cost. 💥 Constant-time deletion achieved.

📊 How It Works (Simple Flow)
A list keeps all elements; a map stores each value’s index. Whenever you:
Insert → add to list + store index
Remove → swap + pop + update index
GetRandom → pick directly from the list
Everything stays efficient and clean.

😎 What I Learned
Sometimes one data structure isn’t enough — combining them is the real power.
Smart tricks (like swap & pop) can completely change time complexity.
Designing systems is more about thinking than coding.

🎯 Key Takeaway
“Efficiency isn’t about doing things faster… it’s about avoiding unnecessary work.”

🔥 Another solid step forward in the journey. On to the next challenge.

#LeetCode #Algorithms #DataStructures #ProblemSolving #CodingJourney #100DaysOfCode #Python #LearningInPublic
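For anyone who wants the swap-and-pop trick in actual code, here is a minimal sketch of the list + hash map structure the post describes (not necessarily the author's exact submission):

import random

class RandomizedSet:
    def __init__(self):
        self.vals = []   # list: O(1) append and O(1) random access
        self.idx = {}    # hash map: value -> its position in the list

    def insert(self, val):
        if val in self.idx:
            return False
        self.idx[val] = len(self.vals)
        self.vals.append(val)
        return True

    def remove(self, val):
        if val not in self.idx:
            return False
        # swap the target with the last element, then pop: no shifting
        i, last = self.idx[val], self.vals[-1]
        self.vals[i], self.idx[last] = last, i
        self.vals.pop()
        del self.idx[val]
        return True

    def getRandom(self):
        return random.choice(self.vals)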
I Built a Custom Auto-EDA Engine 🚀

Most Data Scientists spend 60% of their time just doing basic EDA. I got tired of writing the same df.describe(), sns.heatmap(), and plt.show() lines for every single project. It felt like manual labor, not data science.

So, I decided to automate it. 🛠️

I built a Smart Auto-EDA Profiler using Python, Pandas, and Plotly. Instead of spending an hour building charts, I now run one function and get a professional, interactive HTML report in seconds.

What makes this "Smart"? Beyond just plotting data, I programmed it to "think" like an analyst:
✅ Automatic Alerts: It flags constant columns, high cardinality, and missing values instantly.
✅ Interactive Visuals: Powered by Plotly, so I can zoom into outliers and hover for exact values.
✅ Statistical Intelligence: It calculates correlations and distribution skewness on the fly.
✅ Portable Reports: Everything is bundled into a single HTML file—perfect for sharing with stakeholders who don't have Python installed.

The goal wasn't just to save time; it was to ensure I never miss a data quality issue ever again.

The Tech Stack: 🐍 Python | 🐼 Pandas | 📊 Plotly | 📝 Jinja2

Automation is the bridge between a "good" analyst and a "great" one. Why do the same task twice when you can build a tool to do it forever?

Check out the screenshots below to see the report in action! 👇

I will be sharing the full report tomorrow. Stay tuned for a detailed breakdown.

#DataScience #Python #Automation #DataAnalytics #Efficiency #Pandas #Programming #MachineLearning
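The post doesn't show the alerting logic, but the "think like an analyst" checks it lists (constant columns, high cardinality, missing values) can be sketched in a few lines of pandas. The function name, threshold, and CSV name below are illustrative, not the author's implementation:

import pandas as pd

def eda_alerts(df: pd.DataFrame, max_cardinality: int = 50) -> list:
    """Flag common data-quality issues before any charts are drawn."""
    alerts = []
    for col in df.columns:
        s = df[col]
        if s.isna().any():
            alerts.append(f"{col}: {s.isna().mean():.1%} missing values")
        if s.nunique(dropna=True) <= 1:
            alerts.append(f"{col}: constant column")
        if s.dtype == object and s.nunique() > max_cardinality:
            alerts.append(f"{col}: high cardinality ({s.nunique()} unique values)")
    return alerts

# usage: print("\n".join(eda_alerts(pd.read_csv("sales.csv"))))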
🚀 𝗖𝗿𝗮𝗰𝗸𝗶𝗻𝗴 𝘁𝗵𝗲 “𝗕𝗲𝘀𝘁 𝗧𝗶𝗺𝗲 𝘁𝗼 𝗕𝘂𝘆 𝗮𝗻𝗱 𝗦𝗲𝗹𝗹 𝗦𝘁𝗼𝗰𝗸” 𝗣𝗿𝗼𝗯𝗹𝗲𝗺

💡 𝗣𝗿𝗼𝗯𝗹𝗲𝗺 𝗦𝘁𝗮𝘁𝗲𝗺𝗲𝗻𝘁
You’re given stock prices where prices[i] = price on day i.
👉 Goal: Buy once and sell once (in the future) to get maximum profit.

📌 𝗘𝘅𝗮𝗺𝗽𝗹𝗲
Input: [4, 2, 3, 4, 5, 2]
Output: 3
✔ Buy at 2 → Sell at 5

🧠 𝗔𝗽𝗽𝗿𝗼𝗮𝗰𝗵 𝟭: 𝗕𝗿𝘂𝘁𝗲 𝗙𝗼𝗿𝗰𝗲 (𝗢(n²))
Check every possible pair of buy & sell days.
❌ Inefficient for large data

⚡ 𝗔𝗽𝗽𝗿𝗼𝗮𝗰𝗵 𝟮: 𝗧𝘄𝗼 𝗣𝗼𝗶𝗻𝘁𝗲𝗿 / 𝗦𝗹𝗶𝗱𝗶𝗻𝗴 𝗪𝗶𝗻𝗱𝗼𝘄 (𝗢(n))
Track buy and sell pointers; update buy when a smaller price appears.
✔ Better performance with linear time

🔥 𝗔𝗽𝗽𝗿𝗼𝗮𝗰𝗵 𝟯: 𝗢𝗽𝘁𝗶𝗺𝗮𝗹 (𝗚𝗿𝗲𝗲𝗱𝘆 - 𝗢(n))
Track the minimum price so far and calculate the profit at each step.
✔ Most efficient and clean solution

💻 𝗢𝗽𝘁𝗶𝗺𝗮𝗹 𝗖𝗼𝗱𝗲

def optimal_stock(prices):
    min_price = float("inf")
    max_profit = 0
    for price in prices:
        min_price = min(min_price, price)
        profit = price - min_price
        max_profit = max(max_profit, profit)
    return max_profit

🔗 𝗚𝗶𝘁𝗛𝘂𝗯 𝗖𝗼𝗱𝗲: https://lnkd.in/g-iaHxs5

🎯 𝗞𝗲𝘆 𝗟𝗲𝗮𝗿𝗻𝗶𝗻𝗴
Always track the minimum before the maximum.
A greedy approach often gives optimal results in linear time.

💬 𝗜𝗻𝘁𝗲𝗿𝘃𝗶𝗲𝘄 𝗧𝗶𝗽
Start with brute force → optimize step by step. This shows strong problem-solving skills. 💡

#DataStructures #Algorithms #CodingInterview #Python #LeetCode #SoftwareEngineering #ProblemSolving #GreedyAlgorithm
I built a web app to make data analysis accessible for beginners.

Try it here: https://lnkd.in/gTgGrczZ

No prior knowledge needed. Upload any dataset and get insights in minutes.

What it does
• Cleans data and handles missing values
• Detects outliers with clear metrics
• Generates visual insights automatically
• Builds and compares models
• Explains results simply for non-technical users

Best experienced on a computer.

Current limitations
• 200 MB file size limit
• Large datasets may take longer to process
• Joins require exact column matches
• UI still evolving

Built with Python and Streamlit, combining logic with AI assistance. Open to feedback.

#DataScience #MachineLearning #Streamlit
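As one concrete example of what "detects outliers with clear metrics" can mean in practice, here is the standard 1.5 x IQR rule in pandas; the app's actual method isn't shown in the post, so treat this as an illustration only:

import pandas as pd

def iqr_outliers(series: pd.Series) -> pd.Series:
    """Return values falling outside 1.5 * IQR, a common outlier definition."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return series[(series < lower) | (series > upper)]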
I thought I was done… but I wasn’t.

A few days ago, I built a Python script to download 3 years’ worth of daily price data from my email. It worked perfectly. One click — everything downloaded.

But there was a problem. Every time I ran it… it downloaded everything again. Same files. Same process. Over and over. Not efficient. Not scalable.

So I did what changed everything: I stopped thinking like a coder… and started thinking like a product builder.

I built a simple interface on top of the script. Now I can:
→ Select a specific date range
→ Download only what I need
→ Avoid duplicates completely

Same automation… but now with control.

That’s when it clicked for me: writing scripts is powerful, but turning them into tools? That’s where the real value is.

This is just one layer of something bigger. I’m building a stock analysis app — and this is the data engine getting smarter.

Next:
→ Smarter parsing
→ Data cleaning pipeline
→ Visual insights

It’s slowly evolving from “code that works” to “something people can actually use”.

Building in public. Let’s see where this goes 🚀

#BuildInPublic #Automation #Python #AI #ProductBuilding #DataScience
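The post doesn't share code, but the "download only what I need" behaviour it describes comes down to a filter like the sketch below. Everything here (the dict of available files, the output directory, the function name) is hypothetical and only illustrates the idea:

from datetime import date
from pathlib import Path

def files_to_fetch(available: dict, start: date, end: date, out_dir: str = "data") -> list:
    """Keep attachments inside the date range that are not already on disk."""
    wanted = []
    for filename, file_date in available.items():   # e.g. {"prices_2024-01-02.csv": date(2024, 1, 2)}
        in_range = start <= file_date <= end
        already_downloaded = (Path(out_dir) / filename).exists()
        if in_range and not already_downloaded:
            wanted.append(filename)
    return wanted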
Stop Guessing, Start Visualizing: Introducing codegraph-viz!

Ever joined a new team and felt completely lost in a massive, sprawling codebase? Or spent hours tracing a bug only to realize you broke a dependency you didn't even know existed? I’ve been there, and that’s exactly why I built codegraph-viz.

codegraph-viz is a zero-config tool that turns any Python project into an interactive, D3.js-powered map. It’s designed to help you understand complex architectures in seconds, not days. No more digging through endless directories just to see how things connect.

🌟 Key Features:
• Interactive Dependency Graphs: Click any node to see imports, dependents, and full source code without leaving the browser.
• 4 Layout Modes: Switch between Force, Grid, Hierarchy, and Radial views to find the perspective that makes sense for your project.
• Impact Analysis: Instantly see which files will be affected if you change a specific module. No more "I only changed one line" production incidents.
• LLM-Ready Exports: Generate a token-efficient JSON index that helps AI agents understand your architecture for 90% fewer tokens.

🛠️ How to Get Started:
You can install and run it directly from your terminal right now:

1️⃣ Install:
pip install codegraph-viz

2️⃣ Scan Your Project:
cd your-project
codegraph scan

Your browser will automatically open with a full interactive map of your codebase. No configuration, no databases, no accounts—just your code, visualized.

Whether you're a new engineer onboarding, a tech lead catching architecture violations, or using AI to help you code, codegraph-viz is built for you.

I've put the PyPI link in the first comment below! 👇

#Python #OpenSource #SoftwareArchitecture #DeveloperTools #DataVisualization #Coding #PythonProgramming #codegraph
🚀 Building an Automated ML Web App with Streamlit — Now Testing Regression!

Last time I showed the app working with a Classification dataset — today I'm back with a Regression dataset, and it handles it just as smoothly! 📊

Here's what the app does automatically once you upload your dataset:
✅ Data Preview & Statistical Summary
✅ Univariate, Bivariate & Multivariate Analysis
✅ Automatic Preprocessing (Encoding + Scaling)
✅ Train/Validation/Test Split
✅ Trains 11 Regression Models automatically:
   Linear, Ridge & Lasso Regression
   KNN, Decision Tree, Bagging
   Random Forest, AdaBoost, GBM
   XGBoost & SVR
✅ Evaluates each model using R², MAE, MSE & RMSE
✅ Automatically picks the best model based on Validation R² Score

The best part? You just select "Regression" from the dropdown, upload your dataset, and the app handles everything from EDA to model comparison — no code needed on your end! 🔥

Previously I showed Classification with metrics like Accuracy, F1, Recall, Precision & AUC-ROC — this app supports both problem types seamlessly!

Still building — next steps include hyperparameter tuning and model export! 💪

🛠️ Tech Stack: Python | Streamlit | Scikit-learn | XGBoost | Pandas | Matplotlib | Seaborn

🔗 GitHub: github.com/Muskanbanu03

#Python #Streamlit #MachineLearning #DataScience #Regression #BuildInPublic #100DaysOfCode
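The selection step ("pick the best model based on Validation R²") can be sketched with scikit-learn roughly as below. This is a trimmed-down candidate list, not the app's code; the real app covers 11 models including XGBoost and SVR, and X is assumed to be already encoded and scaled:

from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

def pick_best_regressor(X, y):
    """Fit each candidate and return the one with the highest validation R2."""
    candidates = {
        "Linear": LinearRegression(),
        "Ridge": Ridge(),
        "Lasso": Lasso(),
        "RandomForest": RandomForestRegressor(random_state=42),
        "GBM": GradientBoostingRegressor(random_state=42),
    }
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)
    scores = {}
    for name, model in candidates.items():
        model.fit(X_train, y_train)
        scores[name] = r2_score(y_val, model.predict(X_val))
    best = max(scores, key=scores.get)
    return best, candidates[best], scores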
🔍 Showcasing my Friend Suggestion System

I previously built an ML-powered friend recommendation system, and I’ve now deployed it to demonstrate how mutual connections and user interactions can be used to generate meaningful friend suggestions.

🔍 How it works:
The system analyzes user data such as shared interests, activity patterns, and existing connections. Using similarity-based algorithms, it ranks and recommends relevant profiles.

To make the concept more intuitive, I demonstrated the working using a graph-based approach (e.g., Alice → Bob → Charlie → David → Alice), where users are represented as nodes and connections as edges. This allows anyone to simulate and understand the backend process of mutual friend connections.

💡 Key Highlights:
• Achieved 88% accuracy in friend recommendations
• Improved user engagement by 30%
• Delivered 92% user satisfaction in internal testing
• Clean and intuitive user interface

⚙️ Tech Stack: Python, C++, HTML, CSS, JavaScript, File Handling, Cloud Hosting, Graph Algorithms

☁️ Deployment: Deployed on Render, making the system accessible for real-time simulation and better understanding of the recommendation logic.

🎥 Demo Video: In the video below, I demonstrate how the system works using a graph example to simulate real-world friend connections.

🔗 Live Demo: https://lnkd.in/gbXmGTEn
💻 GitHub Repository: https://lnkd.in/gZFNQ9u4

This deployment helped me showcase how recommendation systems and graph-based logic work together in real-world applications. Would love your feedback! 🙌

#MachineLearning #GraphTheory #WebDevelopment #CloudComputing #Render #Python #StudentDeveloper
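The graph idea from the demo (users as nodes, friendships as edges, candidates ranked by mutual connections) can be illustrated with a short Python sketch. This is a simplified illustration only; the deployed system also weighs shared interests and activity patterns:

from collections import defaultdict

graph = defaultdict(set)   # user -> set of friends (undirected)

def add_friendship(a, b):
    graph[a].add(b)
    graph[b].add(a)

def suggest_friends(user, top_k=3):
    """Rank non-friends by how many mutual connections they share with the user."""
    counts = defaultdict(int)
    for friend in graph[user]:
        for candidate in graph[friend]:
            if candidate != user and candidate not in graph[user]:
                counts[candidate] += 1
    return sorted(counts, key=counts.get, reverse=True)[:top_k]

# the cycle from the post: Alice - Bob - Charlie - David - Alice
for a, b in [("Alice", "Bob"), ("Bob", "Charlie"), ("Charlie", "David"), ("David", "Alice")]:
    add_friendship(a, b)

print(suggest_friends("Alice"))   # ['Charlie'], who shares two mutual friends (Bob and David)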
I built termrender a few nights ago at 4:00 AM because I needed better human-in-the-loop systems.

If you've ever asked an LLM to draw a bordered dashboard in the terminal, you know the move. It generates every ╭, every ─, every padding space, by hand. A four-line panel with three bullets eats around 200 output tokens — almost all of them decorative glyphs the model computed cell by cell. Slow, expensive, and (the part I hate) usually crooked anyway, because LLMs cannot count to twelve under pressure.

So: termrender. Small Python library, takes directive-flavored markdown, renders ANSI. The agent writes structure, termrender draws pixels.

Instead of this:

╭─ Deploy Status ─────╮
│ ✔ Unit tests passed │
│ ✔ Lint clean │
│ ✖ Integration: 2 failures │
╰─────────────╯

The agent writes this:

:::panel{title="Deploy Status" color="green"}
- ✔ Unit tests passed
- ✔ Lint clean
- ✖ Integration: 2 failures
:::

40 tokens against 200. Same visual. Borders that are straight.

The directive set kept growing as I used it. The one I'm proudest of is the tree — you write indented lines and termrender draws the ├── connectors for you, which agents are notably awful at. There's also columns, callouts, red/green diffs, bar charts, KPI tiles, timelines, mermaid rendered as ASCII, badges. Everything nests. The syntax is MyST/Pandoc fenced directives, already in every model's training data, so agents pick it up zero-shot.

The token savings turned out to be the smaller half of the win. The bigger half is correctness. With hand-drawn ASCII, agents miscount cells, drift on widths, break corners on line wraps. With directives they can't, because they never touch any of it. Every alignment bug I'd been working around just stopped existing.

I went looking for prior art before building. Rich (Python) is the obvious one, but it isn't a file format, and it doesn't support mermaid. Textual is a full TUI framework, way overpowered for one view. glow, mdcat, and glamour render plain markdown gorgeously, but plain markdown has no panels, no columns, no charts. rich-cli's inline tag syntax tops out at colors and basic layout. None of them were designed around the constraint that's specific to LLM authorship: emit as few decorative tokens as possible, and never trust the model with character-level alignment.

Fenced directives were the bet. Borrow what agents already know, broaden the vocabulary, keep them in structure-land. If you're shipping an agent that talks to humans through a terminal, formatting overhead is a real bottleneck on cost and correctness, and directives are roughly the right shape.

pip install termrender
Google NotebookLM is powerful. But the web UI is holding you back.

Here's what most people don't know: you can automate the entire thing with Python →

I spent my Sunday digging into NotebookLM-Py, an unofficial open-source Python library that unlocks capabilities Google hasn't exposed in the browser yet.

What you get with this:
→ Full API + CLI control — create notebooks, add sources, and query programmatically
→ Export to PPTX, JSON, and batch-download artifacts (not possible in the web UI)
→ Connect directly to AI agents like Claude Code, Codex, and other LLMs
→ Bulk-import YouTube videos, PDFs, and Google Drive files as sources
→ Auto-generate podcasts, study guides, flashcards, quizzes, videos, and mind maps

Why this matters if you're building with AI:
→ You can feed your entire knowledge base into NotebookLM via script
→ Generate audio overviews and research reports at scale
→ Pipe outputs directly into your existing AI workflows
→ No more copy-pasting — everything is programmatic

I've packaged up the full setup scripts, docs, and walkthrough. Comment "SCRIPTS" below and I'll DM you the zip.

#NotebookLM #AIAutomation #PythonScripts #AITools #BuildInPublic
Define a PEG grammar -> use pyparsing to generate the AST -> evaluate the AST.
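As a rough illustration of that route, a pyparsing grammar for the formula in the original post could look like the sketch below (pyparsing 3.x; the nested lists produced by infix_notation stand in for a real AST, which a parse action or a separate evaluator would then walk):

import pyparsing as pp

# operands: numbers and dotted metric names like meta.spends
number = pp.pyparsing_common.number
variable = pp.Word(pp.alphas + "_", pp.alphanums + "_.")

# precedence-aware grammar: * and / bind tighter than + and -
expr = pp.infix_notation(number | variable, [
    (pp.one_of("* /"), 2, pp.opAssoc.LEFT),
    (pp.one_of("+ -"), 2, pp.opAssoc.LEFT),
])

tree = expr.parse_string("(meta.spends + (ads.spends - 50)) * 100", parse_all=True)
print(tree.as_list())   # nested lists grouped by operator precedence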