No A/B test? No problem. Learn how Regression Discontinuity Design (RDD) can uncover causal effects in observational data by exploiting sharp cutoffs. Dive into the theory and a real-world e-commerce example in Alejandro Alvarez Pérez's newest article.
How RDD uncovers causal effects in observational data
-
Data scientists often focus on building precise algorithms, but there's a broader horizon to explore. Derek Tran's new article on search models reveals how platform firms can optimize key strategies, from partner acquisition to pricing mechanisms. Learn how these models tackle real-world challenges beyond traditional prediction models.
-
This article presents seven easy-to-implement tricks for performing feature engineering on text data. Depending on the complexity and requirements of the model you feed your data to, you may need a more or less ambitious subset of these tricks. https://lnkd.in/g6y4RZ58
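The article's seven tricks are behind the link, but a few of the most common text features are cheap to compute with the standard library alone. A sketch (my own illustrative picks, not necessarily the seven from the article):

```python
# A handful of common hand-crafted text features; these are generic
# examples, not necessarily the tricks from the linked article.
import re
from collections import Counter

def text_features(doc: str) -> dict:
    tokens = re.findall(r"[a-z']+", doc.lower())
    counts = Counter(tokens)
    return {
        "n_chars": len(doc),
        "n_tokens": len(tokens),
        "n_unique": len(counts),
        "avg_token_len": sum(map(len, tokens)) / max(len(tokens), 1),
        "exclamations": doc.count("!"),
        "uppercase_ratio": sum(c.isupper() for c in doc) / max(len(doc), 1),
        "top_token": counts.most_common(1)[0][0] if counts else "",
    }

feats = text_features("Great product! Really GREAT value.")
```

Features like these can feed a simple model directly, or complement embeddings for a heavier one.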
-
A rare glimpse into the science behind our credit models, and how data becomes impact. Read the full piece on Medium by Puvarith Veerabulyarith.
Too much data? The real challenge is knowing which part truly counts. Discover how our team uses Random Bar to uncover the most impactful features — turning data overload into smarter machine learning. 🧠 Written by Puvarith Veerabulyarith (Golf), Senior Data Scientist 🔗 Read the full article here: https://lnkd.in/gVRDFBPU . #ABACUSdigital #TechForInclusiveGrowth
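"Random Bar" is the team's own name for their method, so the details live in the article. The generic idea of benchmarking features against injected noise can still be sketched: score every real feature, then keep only those that beat a pure-noise probe column. The scoring function, data, and threshold below are my assumptions, and this may well differ from what the team actually does.

```python
# Random-probe feature screening sketch. This is only the generic idea of
# comparing each feature's importance against a pure-noise column; it is
# an assumption, not ABACUS digital's actual "Random Bar" method.
import random

random.seed(1)

def abs_corr(xs, ys):
    """Absolute Pearson correlation as a crude importance score."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs) ** 0.5
    vy = sum((b - my) ** 2 for b in ys) ** 0.5
    return abs(cov / (vx * vy)) if vx and vy else 0.0

n = 500
signal = [random.gauss(0, 1) for _ in range(n)]   # genuinely predictive
junk   = [random.gauss(0, 1) for _ in range(n)]   # unrelated to the target
probe  = [random.gauss(0, 1) for _ in range(n)]   # the injected noise column
target = [s + random.gauss(0, 0.5) for s in signal]

scores = {"signal": abs_corr(signal, target), "junk": abs_corr(junk, target)}
threshold = abs_corr(probe, target)               # noise-level importance
kept = [f for f, s in scores.items() if s > threshold]
```

Anything scoring no better than noise is a candidate for removal, which is one way "data overload" gets cut down to the features that truly count.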
-
Some time back, I dug deeper into K-D trees and explored how they fit within spatial databases. K-D trees also find solid use in ML and geo-proximity use cases. Let's dig deeper ...

K-D trees optimize for both the depth and accessibility of the data stored. Here's a quick two-point gist of how they work:
1. Start from a root node.
2. Recursively split the data across nodes along a specific dimension (the X or Y coordinate). The splitting stops when a certain condition is met, which prompts the formation of leaf nodes.

Different stopping conditions drive different use cases. Here are some of them:
1. Stop when there's only a single point left in the node. This precision is particularly helpful for operations like pinpointing a nearest neighbor, streamlining the search process dramatically.
2. Stop when the number of points in a node hits some limit. This keeps the k-d tree balanced and is hence useful when you need queries to complete in consistent time while taking up minimal resources.
3. Stop when all points in a node show minimal variance along the split dimension. This stopping strategy is leveraged to build decision trees and power unsupervised clustering algorithms, where the leaf nodes form well-defined, almost homogeneous clusters.

Trees are quite an interesting data structure, and there are a ton of other variants, each optimized to solve a certain class of problems really well. It's always amusing and interesting to explore such nuances :)
-
Friction between data science and engineering teams is common, and while it can look like a clash of egos, it's often a symptom of a deeper issue: they're speaking 𝐝𝐢𝐟𝐟𝐞𝐫𝐞𝐧𝐭 𝐥𝐚𝐧𝐠𝐮𝐚𝐠𝐞𝐬. Engineers tend to think, "How does this work, what causes what, and is it practical to build in the real world?" Data scientists tend to think, "What patterns in the data are most likely to lead to the desired outcome?" Imagine the AI is like a GPS that just says "Turn left now" without showing you the map. You don't know 𝘸𝘩𝘺 you're turning left. Is it to avoid traffic, a closed road, or is it a shortcut? Because you're missing the "why," it's difficult to use your own experience to judge if it's a good instruction, and it's frustrating to be told what to do without any context. So, what is the missing layer? The missing layer is a 𝐑𝐨𝐬𝐞𝐭𝐭𝐚 𝐒𝐭𝐨𝐧𝐞, an 𝐈𝐧𝐭𝐞𝐫𝐩𝐫𝐞𝐭𝐚𝐛𝐢𝐥𝐢𝐭𝐲 𝐚𝐧𝐝 𝐑𝐞𝐚𝐬𝐨𝐧𝐢𝐧𝐠 𝐄𝐧𝐠𝐢𝐧𝐞 that translates the 𝘱𝘳𝘰𝘣𝘢𝘣𝘪𝘭𝘪𝘴𝘵𝘪𝘤 why of the data scientist into the 𝘤𝘢𝘶𝘴𝘢𝘭 why of the engineers. Meet your 𝐑𝐨𝐬𝐞𝐭𝐭𝐚 𝐒𝐭𝐨𝐧𝐞, 𝐈𝐆𝐎𝐑. 𝐈𝐆𝐎𝐑 in 30 seconds: https://lnkd.in/gjutggi7 𝐈𝐆𝐎𝐑 demo: https://lnkd.in/gNbb-c5h
-
We’ve built the missing layer for physical AI: a reasoning engine that translates black-box correlations into 𝐜𝐚𝐮𝐬𝐚𝐥, 𝐫𝐞𝐚𝐥-𝐰𝐨𝐫𝐥𝐝 𝐢𝐧𝐬𝐢𝐠𝐡𝐭𝐬 engineers can actually 𝐭𝐫𝐮𝐬𝐭 𝐚𝐧𝐝 𝐚𝐜𝐭 𝐨𝐧. Meet 𝐈𝐆𝐎𝐑. 𝐈𝐆𝐎𝐑 in 30 seconds: https://lnkd.in/gjutggi7 𝐈𝐆𝐎𝐑 demo: https://lnkd.in/gNbb-c5h #AI4Science #PhysicalAI #AIDrivenDiscovery #DeepTech #Materials #ProcessOptimization
-
When Kurve started, our core thesis of extracting schema graphs of relationships fell on deaf ears; many didn't understand the relevance at the time. As time has passed and our product has matured, the market has woken up to the fact that any text-to-SQL effort, or manual analytics and AI done by humans on tabular data, requires reliable metadata. That's why there's been a recent surge in interest around "knowledge graphs" and "semantic layers." We've been building for this moment. Kurve automatically extracts schema graphs of relationships between tables on data lakehouses. When you combine robust schema graphs with data preparation abstractions, such as those Kurve provides, you get reliable data discovery & data prep at scale. Without it, you get failed pilots and zero ROI. If you're a Snowflake customer, try us out on the Marketplace: https://lnkd.in/eTw8t5NP A recent engineering post about Cortex Analyst accuracy with and without semantic view context, including but not limited to relationships:
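To make "extracting schema graphs of relationships" concrete, here is a toy version of one classic heuristic: propose a foreign-key edge when one column's values are almost entirely contained in another's. This is my own sketch of the general idea, not Kurve's actual algorithm, and the tables and threshold are invented.

```python
# Toy schema-graph extraction: guess foreign-key edges between tables by
# value containment. An illustrative heuristic, NOT Kurve's algorithm.
def containment(child_vals, parent_vals):
    """Fraction of child values found in the candidate parent key."""
    child, parent = set(child_vals), set(parent_vals)
    return len(child & parent) / len(child) if child else 0.0

tables = {
    "orders":    {"order_id": [1, 2, 3], "customer_id": [10, 11, 10]},
    "customers": {"customer_id": [10, 11, 12], "region": ["N", "S", "E"]},
}

edges = []
for t1, cols1 in tables.items():
    for t2, cols2 in tables.items():
        if t1 == t2:
            continue
        for c1, v1 in cols1.items():
            for c2, v2 in cols2.items():
                if containment(v1, v2) >= 0.95:  # near-total containment: likely FK
                    edges.append((f"{t1}.{c1}", f"{t2}.{c2}"))
```

A graph of edges like these is exactly the metadata a text-to-SQL system needs to know which joins are even possible.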
-
Are your ML model's performance metrics on panel data too good to be true? Data leakage might be inflating your results. Marco Letta, Augusto Cerqua, and Gabriele Pinto explain how to get a realistic assessment of your model's accuracy.
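The leakage mechanism on panel data is easy to demonstrate: a random row-level split almost guarantees that every test unit also appears in training, so the model can partially memorize unit identity. A small sketch contrasting a random split with a unit-level (grouped) split; the panel shape and names are illustrative, not the authors' setup:

```python
# Why random splits leak on panel data: under a row-level split nearly every
# test unit is also seen in training; a grouped split holds out whole units.
import random

random.seed(2)
panel = [(unit, t) for unit in range(100) for t in range(10)]  # 100 units x 10 periods

# Random row-level split.
random.shuffle(panel)
train, test = panel[:800], panel[800:]
train_units = {u for u, _ in train}
leaked = sum(u in train_units for u, _ in test) / len(test)

# Grouped split: hold out whole units so none is seen in both sets.
held_out = set(range(80, 100))
g_train = [(u, t) for u, t in panel if u not in held_out]
g_test  = [(u, t) for u, t in panel if u in held_out]
g_train_units = {u for u, _ in g_train}
g_leaked = sum(u in g_train_units for u, _ in g_test) / len(g_test)
```

Here `leaked` comes out near 100% for the random split and exactly 0 for the grouped one, which is why grouped (or time-based) splits give the realistic accuracy estimate the authors argue for.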
-
🌐 From Semantics to Decisions: Context Is the New Compute The semantic layer solved what things mean. The next challenge is how they connect. 👉 Meaning without context is like words without grammar — you can read them, but you can’t reason with them. The context graph is that missing layer. It links “User → Activation → Conversion → Revenue,” turning definitions into a system of reasoning. 💡 In the last decade, we optimized analytics for compute — faster queries, bigger clusters. The next decade will optimize for context — systems that understand cause and effect. Compute makes data fast. Context makes data intelligent. The semantic layer built the dictionary. Context builds the language. Decision Intelligence writes the story. 💬 How close is your data stack to understanding relationships, not just definitions? #DecisionIntelligence #SemanticLayer #DataIntelligence #Analytics #KnowledgeGraph #DataModeling #ContextIsTheNewCompute
-
🎯 Tackling Class Imbalance in Data Science. Here's how I've learned to handle class imbalance more effectively:
🔹 1. Understand the Data Before Modeling. Don't rush to train. Visualize class distributions and know the scale of the imbalance before you tweak anything.
🔹 2. Use the Right Metrics. Accuracy alone can be deceptive. Try metrics like Precision, Recall, F1-score, or ROC-AUC; they reveal what your model is really doing.
🔹 3. Resampling Techniques. Oversample the minority class (e.g., SMOTE) or undersample the majority. Both can help, but balance is key: too much oversampling can cause overfitting.
🔹 4. Algorithmic Approaches. Some algorithms handle imbalance better (like XGBoost or Random Forest). You can also assign class weights to make the model "care more" about underrepresented cases.
💡 Good data science isn't just about accuracy; it's about balanced understanding.
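Two of these points in miniature: why accuracy deceives on imbalanced data, and the simplest possible resampling (random duplication of minority rows; SMOTE interpolates new points instead, which this sketch does not do). The data and numbers are made up for illustration:

```python
# (1) Score with recall instead of accuracy; (2) naive random oversampling.
# Toy data; SMOTE would synthesize interpolated minority points instead of
# duplicating them as done here.
import random

random.seed(3)

# 95:5 imbalance: a model predicting all-negative still scores 95% accuracy.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
recall = tp / sum(y_true)   # 0.0: the metric that exposes the problem

# Naive random oversampling: duplicate minority rows until classes balance.
rows = list(zip(y_true, range(100)))   # (label, row_id) pairs
majority = [r for r in rows if r[0] == 0]
minority = [r for r in rows if r[0] == 1]
balanced = majority + random.choices(minority, k=len(majority))
```

The all-negative model looks great by accuracy (0.95) and useless by recall (0.0), which is the whole argument of point 2; and the duplicated minority rows in `balanced` illustrate why aggressive oversampling can overfit.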
-
Thanks Towards Data Science !