No A/B test? No problem. Learn how Regression Discontinuity Design (RDD) can uncover causal effects in observational data by exploiting sharp cutoffs. Dive into the theory and a real-world e-commerce example in Alejandro Alvarez Pérez's newest article.
How RDD uncovers causal effects in observational data
-
Data scientists often focus on building precise algorithms, but there's a broader horizon to explore. Derek Tran's new article on search models reveals how platform firms can optimize key strategies, from partner acquisition to pricing mechanisms. Learn how these models tackle real-world challenges beyond traditional prediction models.
-
This article presents seven easy-to-implement tricks for performing feature engineering on text data. Depending on the complexity and requirements of the model you feed your data to, you may need a more or less ambitious subset of these tricks. https://lnkd.in/g6y4RZ58
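The article's seven tricks are behind the link, but a few of the most common text features are cheap to compute with the standard library alone. A sketch (my own illustrative picks, not necessarily the seven from the article):

```python
# A handful of common hand-crafted text features; these are generic
# examples, not necessarily the tricks from the linked article.
import re
from collections import Counter

def text_features(doc: str) -> dict:
    tokens = re.findall(r"[a-z']+", doc.lower())
    counts = Counter(tokens)
    return {
        "n_chars": len(doc),
        "n_tokens": len(tokens),
        "n_unique": len(counts),
        "avg_token_len": sum(map(len, tokens)) / max(len(tokens), 1),
        "exclamations": doc.count("!"),
        "uppercase_ratio": sum(c.isupper() for c in doc) / max(len(doc), 1),
        "top_token": counts.most_common(1)[0][0] if counts else "",
    }

feats = text_features("Great product! Really GREAT value.")
```

Features like these can feed a simple model directly, or complement embeddings for a heavier one.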
-
A rare glimpse into the science behind our credit models, and how data becomes impact. Read the full piece on Medium by Puvarith Veerabulyarith.
Too much data? The real challenge is knowing which part truly counts. Discover how our team uses Random Bar to uncover the most impactful features — turning data overload into smarter machine learning. 🧠 Written by Puvarith Veerabulyarith (Golf), Senior Data Scientist 🔗 Read the full article here: https://lnkd.in/gVRDFBPU . #ABACUSdigital #TechForInclusiveGrowth
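"Random Bar" is the team's own name for their method, so the details live in the article. The generic idea of benchmarking features against injected noise can still be sketched: score every real feature, then keep only those that beat a pure-noise probe column. The scoring function, data, and threshold below are my assumptions, and this may well differ from what the team actually does.

```python
# Random-probe feature screening sketch. This is only the generic idea of
# comparing each feature's importance against a pure-noise column; it is
# an assumption, not ABACUS digital's actual "Random Bar" method.
import random

random.seed(1)

def abs_corr(xs, ys):
    """Absolute Pearson correlation as a crude importance score."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs) ** 0.5
    vy = sum((b - my) ** 2 for b in ys) ** 0.5
    return abs(cov / (vx * vy)) if vx and vy else 0.0

n = 500
signal = [random.gauss(0, 1) for _ in range(n)]   # genuinely predictive
junk   = [random.gauss(0, 1) for _ in range(n)]   # unrelated to the target
probe  = [random.gauss(0, 1) for _ in range(n)]   # the injected noise column
target = [s + random.gauss(0, 0.5) for s in signal]

scores = {"signal": abs_corr(signal, target), "junk": abs_corr(junk, target)}
threshold = abs_corr(probe, target)               # noise-level importance
kept = [f for f, s in scores.items() if s > threshold]
```

Anything scoring no better than noise is a candidate for removal, which is one way "data overload" gets cut down to the features that truly count.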
-
Some time back, I dug deeper into K-D trees and explored how they fit within spatial databases. K-D trees also find solid use in ML and geo-proximity use cases. Let's dig deeper ...

K-D trees optimize for both the depth and accessibility of the data stored. Here's a quick two-point gist of how they work:
1. Start from a root node.
2. Recursively split the data across nodes along a specific dimension (the X or Y coordinate). The splitting stops when a certain condition is met, which prompts the formation of leaf nodes.

Different stopping conditions drive different use cases. Here are some of them:
1. Stop when there's only a single point left in the node. This precision is particularly helpful for operations like pinpointing a nearest neighbor, streamlining the search process dramatically.
2. Stop when the number of points in a node hits some limit. This keeps the k-d tree balanced and is hence useful when you need queries to complete in consistent time while taking up minimal resources.
3. Stop when all points in a node show minimal variance along the split dimension. This stopping strategy is leveraged to build decision trees and power unsupervised clustering algorithms, where the leaf nodes form well-defined, almost homogeneous clusters.

Trees are quite an interesting data structure, and there are a ton of other variants, each optimized to solve a certain class of problems really well. It's always amusing and interesting to explore such nuances :)
-
Friction between data science and engineering teams is common, and while it can look like a clash of egos, it's often a symptom of a deeper issue: they're speaking 𝐝𝐢𝐟𝐟𝐞𝐫𝐞𝐧𝐭 𝐥𝐚𝐧𝐠𝐮𝐚𝐠𝐞𝐬. Engineers tend to think, "How does this work, what causes what, and is it practical to build in the real world?" Data scientists tend to think, "What patterns in the data are most likely to lead to the desired outcome?" Imagine the AI is like a GPS that just says "Turn left now" without showing you the map. You don't know 𝘸𝘩𝘺 you're turning left. Is it to avoid traffic, a closed road, or is it a shortcut? Because you're missing the "why," it's difficult to use your own experience to judge if it's a good instruction, and it's frustrating to be told what to do without any context. So, what is the missing layer? The missing layer is a 𝐑𝐨𝐬𝐞𝐭𝐭𝐚 𝐒𝐭𝐨𝐧𝐞, an 𝐈𝐧𝐭𝐞𝐫𝐩𝐫𝐞𝐭𝐚𝐛𝐢𝐥𝐢𝐭𝐲 𝐚𝐧𝐝 𝐑𝐞𝐚𝐬𝐨𝐧𝐢𝐧𝐠 𝐄𝐧𝐠𝐢𝐧𝐞 that translates the 𝘱𝘳𝘰𝘣𝘢𝘣𝘪𝘭𝘪𝘴𝘵𝘪𝘤 why of the data scientist into the 𝘤𝘢𝘶𝘴𝘢𝘭 why of the engineers. Meet your 𝐑𝐨𝐬𝐞𝐭𝐭𝐚 𝐒𝐭𝐨𝐧𝐞, 𝐈𝐆𝐎𝐑. 𝐈𝐆𝐎𝐑 in 30 seconds: https://lnkd.in/gjutggi7 𝐈𝐆𝐎𝐑 demo: https://lnkd.in/gNbb-c5h
-
We’ve built the missing layer for physical AI: a reasoning engine that translates black-box correlations into 𝐜𝐚𝐮𝐬𝐚𝐥, 𝐫𝐞𝐚𝐥-𝐰𝐨𝐫𝐥𝐝 𝐢𝐧𝐬𝐢𝐠𝐡𝐭𝐬 engineers can actually 𝐭𝐫𝐮𝐬𝐭 𝐚𝐧𝐝 𝐚𝐜𝐭 𝐨𝐧. Meet 𝐈𝐆𝐎𝐑. 𝐈𝐆𝐎𝐑 in 30 seconds: https://lnkd.in/gjutggi7 𝐈𝐆𝐎𝐑 demo: https://lnkd.in/gNbb-c5h #AI4Science #PhysicalAI #AIDrivenDiscovery #DeepTech #Materials #ProcessOptimization
-
When Kurve started, our core thesis of extracting schema graphs of relationships fell on deaf ears; many didn't understand the relevance at the time. As time has passed and our product has matured, the market has woken up to the fact that any text-to-SQL effort, or manual analytics and AI done by humans on tabular data, requires reliable metadata. That's why there's been a recent surge in interest around "knowledge graphs" and "semantic layers." We've been building for this moment. Kurve automatically extracts schema graphs of relationships between tables on data lakehouses. When you combine robust schema graphs with data preparation abstractions, such as those Kurve provides, you get reliable data discovery & data prep at scale. Without it, you get failed pilots and zero ROI. If you're a Snowflake customer, try us out on the Marketplace: https://lnkd.in/eTw8t5NP A recent engineering post about Cortex Analyst accuracy with and without semantic view context, including but not limited to relationships:
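To make "extracting schema graphs of relationships" concrete, here is a toy version of one classic heuristic: propose a foreign-key edge when one column's values are almost entirely contained in another's. This is my own sketch of the general idea, not Kurve's actual algorithm, and the tables and threshold are invented.

```python
# Toy schema-graph extraction: guess foreign-key edges between tables by
# value containment. An illustrative heuristic, NOT Kurve's algorithm.
def containment(child_vals, parent_vals):
    """Fraction of child values found in the candidate parent key."""
    child, parent = set(child_vals), set(parent_vals)
    return len(child & parent) / len(child) if child else 0.0

tables = {
    "orders":    {"order_id": [1, 2, 3], "customer_id": [10, 11, 10]},
    "customers": {"customer_id": [10, 11, 12], "region": ["N", "S", "E"]},
}

edges = []
for t1, cols1 in tables.items():
    for t2, cols2 in tables.items():
        if t1 == t2:
            continue
        for c1, v1 in cols1.items():
            for c2, v2 in cols2.items():
                if containment(v1, v2) >= 0.95:  # near-total containment: likely FK
                    edges.append((f"{t1}.{c1}", f"{t2}.{c2}"))
```

A graph of edges like these is exactly the metadata a text-to-SQL system needs to know which joins are even possible.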
-
Are your ML model's performance metrics on panel data too good to be true? Data leakage might be inflating your results. Marco Letta, Augusto Cerqua, and Gabriele Pinto explain how to get a realistic assessment of your model's accuracy.
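The leakage mechanism on panel data is easy to demonstrate: a random row-level split almost guarantees that every test unit also appears in training, so the model can partially memorize unit identity. A small sketch contrasting a random split with a unit-level (grouped) split; the panel shape and names are illustrative, not the authors' setup:

```python
# Why random splits leak on panel data: under a row-level split nearly every
# test unit is also seen in training; a grouped split holds out whole units.
import random

random.seed(2)
panel = [(unit, t) for unit in range(100) for t in range(10)]  # 100 units x 10 periods

# Random row-level split.
random.shuffle(panel)
train, test = panel[:800], panel[800:]
train_units = {u for u, _ in train}
leaked = sum(u in train_units for u, _ in test) / len(test)

# Grouped split: hold out whole units so none is seen in both sets.
held_out = set(range(80, 100))
g_train = [(u, t) for u, t in panel if u not in held_out]
g_test  = [(u, t) for u, t in panel if u in held_out]
g_train_units = {u for u, _ in g_train}
g_leaked = sum(u in g_train_units for u, _ in g_test) / len(g_test)
```

Here `leaked` comes out near 100% for the random split and exactly 0 for the grouped one, which is why grouped (or time-based) splits give the realistic accuracy estimate the authors argue for.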
-
🌐 From Semantics to Decisions: Context Is the New Compute The semantic layer solved what things mean. The next challenge is how they connect. 👉 Meaning without context is like words without grammar — you can read them, but you can’t reason with them. The context graph is that missing layer. It links “User → Activation → Conversion → Revenue,” turning definitions into a system of reasoning. 💡 In the last decade, we optimized analytics for compute — faster queries, bigger clusters. The next decade will optimize for context — systems that understand cause and effect. Compute makes data fast. Context makes data intelligent. The semantic layer built the dictionary. Context builds the language. Decision Intelligence writes the story. 💬 How close is your data stack to understanding relationships, not just definitions? #DecisionIntelligence #SemanticLayer #DataIntelligence #Analytics #KnowledgeGraph #DataModeling #ContextIsTheNewCompute
-
🎯 Tackling Class Imbalance in Data Science. Here's how I've learned to handle class imbalance more effectively:
🔹 1. Understand the Data Before Modeling. Don't rush to train. Visualize class distributions and know the scale of the imbalance before you tweak anything.
🔹 2. Use the Right Metrics. Accuracy alone can be deceptive. Try metrics like Precision, Recall, F1-score, or ROC-AUC; they reveal what your model is really doing.
🔹 3. Resampling Techniques. Oversample the minority class (e.g., SMOTE) or undersample the majority. Both can help, but balance is key: too much oversampling can cause overfitting.
🔹 4. Algorithmic Approaches. Some algorithms handle imbalance better (like XGBoost or Random Forest). You can also assign class weights to make the model "care more" about underrepresented cases.
💡 Good data science isn't just about accuracy; it's about balanced understanding.
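Two of these points in miniature: why accuracy deceives on imbalanced data, and the simplest possible resampling (random duplication of minority rows; SMOTE interpolates new points instead, which this sketch does not do). The data and numbers are made up for illustration:

```python
# (1) Score with recall instead of accuracy; (2) naive random oversampling.
# Toy data; SMOTE would synthesize interpolated minority points instead of
# duplicating them as done here.
import random

random.seed(3)

# 95:5 imbalance: a model predicting all-negative still scores 95% accuracy.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
recall = tp / sum(y_true)   # 0.0: the metric that exposes the problem

# Naive random oversampling: duplicate minority rows until classes balance.
rows = list(zip(y_true, range(100)))   # (label, row_id) pairs
majority = [r for r in rows if r[0] == 0]
minority = [r for r in rows if r[0] == 1]
balanced = majority + random.choices(minority, k=len(majority))
```

The all-negative model looks great by accuracy (0.95) and useless by recall (0.0), which is the whole argument of point 2; and the duplicated minority rows in `balanced` illustrate why aggressive oversampling can overfit.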
-
Thanks Towards Data Science !