Since July I've been building a new reporting system from scratch. My IT team is Claude AI. 11 countries. Weekly Excel files. One automated pipeline. This is what it looks like. ↓ Over the coming weeks I'll share how each part works — ingestion, data model, quality checks, outputs. First up: how do you parse 11 different Excel formats into one unified structure? #financeautomation #python #duckdb #claudeai
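To make the idea concrete before the deep dives, here is a minimal sketch of one possible approach in Python: per-country column mappings normalized into a single schema with pandas, then persisted to DuckDB. The file names, country codes, and column mappings below are hypothetical placeholders, not the actual pipeline.

import pandas as pd
import duckdb

# Hypothetical per-country mappings from local Excel headers to one unified schema.
COLUMN_MAPS = {
    "DE": {"Konto": "account", "Betrag": "amount", "Datum": "date"},
    "FR": {"Compte": "account", "Montant": "amount", "Date": "date"},
}

def load_country_file(path: str, country: str) -> pd.DataFrame:
    """Read one country's weekly Excel file and normalize it to the unified schema."""
    df = pd.read_excel(path)                      # needs openpyxl for .xlsx files
    df = df.rename(columns=COLUMN_MAPS[country])  # map local headers to unified names
    df = df[["account", "amount", "date"]]        # keep only the unified columns
    df["country"] = country                       # tag the source country
    return df

frames = [
    load_country_file("reports/de_week_40.xlsx", "DE"),   # placeholder paths
    load_country_file("reports/fr_week_40.xlsx", "FR"),
]
unified = pd.concat(frames, ignore_index=True)

# Persist the unified table in DuckDB for downstream quality checks and outputs.
con = duckdb.connect("reporting.duckdb")
con.register("unified_df", unified)
con.execute("CREATE OR REPLACE TABLE weekly_reports AS SELECT * FROM unified_df")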
Rebuilt a simple data pipeline around SPY to get back into hands-on work. I kept it pretty straightforward—pull data, clean it, add moving averages, generate signals, and see how it performs against buy-and-hold. What stood out this time was how much clarity you get by structuring things properly. Breaking it into steps (ingestion → transform → signals → backtest) made everything easier to reason about. It’s not anything fancy, but going back to basics like this helped more than jumping straight into complex setups. Planning to build on this and experiment with a few more strategies next. Code here: https://lnkd.in/gkbUbbhD #DataEngineering #DataScience #Python #LearningInPublic
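The linked repo has the full code; as a rough, self-contained sketch of the steps described (ingest → transform → signals → backtest), assuming yfinance as the data source and a simple moving-average crossover as the signal:

import pandas as pd
import yfinance as yf

# Ingest: daily SPY closes (yfinance is an assumed source; any OHLC feed works).
prices = yf.download("SPY", start="2015-01-01")["Close"]
if isinstance(prices, pd.DataFrame):   # newer yfinance versions return a one-column frame
    prices = prices.squeeze("columns")

# Transform: fast and slow moving averages.
fast = prices.rolling(50).mean()
slow = prices.rolling(200).mean()

# Signals: long when the fast average is above the slow one (shifted to avoid lookahead).
signal = (fast > slow).astype(int).shift(1).fillna(0)

# Backtest: strategy equity curve vs. buy-and-hold.
daily_ret = prices.pct_change().fillna(0)
strategy = (1 + daily_ret * signal).cumprod()
buy_hold = (1 + daily_ret).cumprod()
print(f"Strategy: {strategy.iloc[-1]:.2f}x   Buy & hold: {buy_hold.iloc[-1]:.2f}x")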
Data collection series · Post 07: Imputation strategies — beyond filling with the mean

Filling missing values with the mean is fast. It's also quietly wrong in most cases. Here are 4 better strategies — and exactly when to use each. ▼

Mean imputation is the default. Everyone learns it first. It's one line of code. It ships fast. But it has a serious flaw: it collapses variance. Replace 500 missing values with the mean — and your distribution gets an artificial spike right in the middle. Your correlations weaken. Your model learns a distorted world.

There are better options. Here's the practical guide.

#Python #DataScience #DataQuality #DataCleaning #Analytics #DataAnalyst #DataAnalytics #DataEngineering #ImputationStrategies
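As a taste of what those alternatives look like in practice, here is a small scikit-learn sketch on toy data (the column names are invented for the example): median imputation for a skewed numeric column, and KNN imputation that borrows information from correlated features.

import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer, KNNImputer

df = pd.DataFrame({
    "income": [32_000, 41_000, np.nan, 250_000, 38_000],  # skewed: median beats mean
    "age":    [25, 31, 29, np.nan, 27],
    "tenure": [1, 4, 3, 20, 2],
})

# Median imputation: robust to the 250k outlier that would drag the mean upwards.
df[["income"]] = SimpleImputer(strategy="median").fit_transform(df[["income"]])

# KNN imputation: fills the missing age from the rows most similar on the other features.
df[["age", "tenure"]] = KNNImputer(n_neighbors=2).fit_transform(df[["age", "tenure"]])

print(df)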
So there’s this exciting concept in data called “imputation.” Okay it’s not that exciting, I just like the name, but it’s actually pretty important. It’s basically when you deal with missing values by filling them in using the rest of the dataset. Not in a vague “surrounding data” way, but using actual methods like mean, median, or mode, sometimes forward or backward fill, and in more serious cases even models to estimate what should be there. The other option is to just delete the missing data. Either drop the rows or even the whole column. This is common with large datasets, especially when the missing values are small enough that removing them won’t mess with the overall analysis. But it’s not something you just do blindly, because depending on why the data is missing, you can end up biasing your results without realizing it. So yeah, it sounds like a small step, but it actually matters. #LearningInPublic #Python #DataCleaning #DataAnalysis #Data
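For anyone following along in pandas, the options described above look roughly like this (toy column names, just to illustrate):

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "temperature": [21.0, np.nan, 23.5, 22.0, np.nan],
    "city": ["Oslo", "Oslo", None, "Bergen", "Bergen"],
})

# Fill numeric gaps with a summary statistic such as the median...
df["temperature"] = df["temperature"].fillna(df["temperature"].median())

# ...or carry the previous value forward (useful for ordered / time-indexed data):
# df["temperature"] = df["temperature"].ffill()

# Fill a categorical column with its mode (most frequent value).
df["city"] = df["city"].fillna(df["city"].mode()[0])

# The other route: drop rows (or whole columns) that contain missing values.
# df = df.dropna()         # drop rows with any missing value
# df = df.dropna(axis=1)   # drop columns with any missing value

print(df)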
𝐓𝐮𝐫𝐧𝐢𝐧𝐠 𝐃𝐚𝐭𝐚 𝐢𝐧𝐭𝐨 𝐈𝐧𝐬𝐢𝐠𝐡𝐭𝐬: 𝐏𝐫𝐞𝐝𝐢𝐜𝐭𝐢𝐧𝐠 𝐂𝐮𝐬𝐭𝐨𝐦𝐞𝐫 𝐒𝐚𝐭𝐢𝐬𝐟𝐚𝐜𝐭𝐢𝐨𝐧 📊

Is it possible to know if a customer is happy before they even leave a review? I recently built a Machine Learning model to answer that exact question. Using an airline customer satisfaction dataset, I developed a classification system with a decision tree that predicts the 'Satisfaction' column with 𝟗𝟐.𝟓% accuracy.

𝐓𝐡𝐞 𝐓𝐞𝐜𝐡 𝐒𝐭𝐚𝐜𝐤:
🐍 Python & Pandas for Data Manipulation
🤖 Scikit-Learn for Model Building
📈 Matplotlib/Seaborn for Visualization

𝐖𝐡𝐚𝐭 𝐈 𝐥𝐞𝐚𝐫𝐧𝐞𝐝: While the accuracy is high, the real value was in the 𝐑𝐞𝐜𝐚𝐥𝐥 (𝟎.𝟗𝟏). This means the model is reliable at catching dissatisfied customers, allowing businesses to intervene early.

Swipe through the slides below to see my process and the final classification report! ➡️ I've added the full code to my GitHub. Check my Featured Section for the link! 🔗 https://lnkd.in/grS89Vty

#MachineLearning #DataScience #CustomerExperience #Python #AI2026 #PortfolioProject
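The real notebook lives in the linked GitHub repo; as a condensed sketch of that kind of workflow (file and column names here are placeholders, and the exact preprocessing will differ):

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report

df = pd.read_csv("airline_satisfaction.csv")            # placeholder file name
X = pd.get_dummies(df.drop(columns=["Satisfaction"]))   # one-hot encode categoricals
y = df["Satisfaction"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

model = DecisionTreeClassifier(max_depth=8, random_state=42)
model.fit(X_train, y_train)

# Accuracy alone can hide class imbalance; the report shows precision and recall per class.
print(classification_report(y_test, model.predict(X_test)))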
📊 Recently explored the 𝘆𝗱𝗮𝘁𝗮-𝗽𝗿𝗼𝗳𝗶𝗹𝗶𝗻𝗴 library for pandas for Exploratory Data Analysis (EDA) and it's a game changer! It provides a complete summary of the dataset with powerful visualizations, helping you quickly understand:

1️⃣ Dataset overview (structure, types)
2️⃣ Missing values detection
3️⃣ Distribution analysis
4️⃣ Correlation insights
5️⃣ Automatic visual reports

💡 One key takeaway: Before starting any data project, it's highly valuable to review your dataset at least once with a ydata-profiling report. It saves time, highlights hidden patterns, and improves decision-making. 🚀 Turning raw data into insights becomes much more efficient!

#DataScience #EDA #Python #DataAnalysis #MachineLearning #LearningJourney
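For reference, generating one of these reports takes only a few lines; the file name below is just an example.

import pandas as pd
from ydata_profiling import ProfileReport

df = pd.read_csv("your_dataset.csv")   # any DataFrame works here

# Build the profile and export it as a standalone HTML report.
profile = ProfileReport(df, title="Dataset Overview", explorative=True)
profile.to_file("dataset_report.html")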
𝐂𝐥𝐞𝐚𝐧𝐞𝐫 𝐂𝐨𝐝𝐞 > 𝐌𝐨𝐫𝐞 𝐂𝐨𝐝𝐞 (𝐀 𝐒𝐦𝐚𝐥𝐥 𝐏𝐚𝐧𝐝𝐚𝐬 𝐑𝐞𝐚𝐥𝐢𝐳𝐚𝐭𝐢𝐨𝐧) Ran into something interesting while working with pandas today — chaining operations actually makes a huge difference. Instead of writing step-by-step code for filtering, grouping, and aggregating, combining them in a single flow made the analysis much cleaner and faster to read. Something like filtering a dataset → grouping by category → calculating averages — all in one pipeline. Feels closer to how real analysis should look instead of breaking everything into isolated steps. Still improving, but this felt like a shift from “practice code” to more structured data work. #Python #Pandas #DataAnalytics #LearningByDoing
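A toy version of that kind of chain, with made-up column names, might look like this:

import pandas as pd

sales = pd.DataFrame({
    "category": ["books", "books", "toys", "toys", "games"],
    "region":   ["EU", "US", "EU", "US", "EU"],
    "revenue":  [120, 90, 300, 250, 180],
})

# Filter -> group -> aggregate -> sort, all in one readable pipeline.
summary = (
    sales
    .query("region == 'EU'")               # keep only EU rows
    .groupby("category", as_index=False)   # group by product category
    .agg(avg_revenue=("revenue", "mean"))  # named aggregation
    .sort_values("avg_revenue", ascending=False)
)
print(summary)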
I'm doing something a little different ----> I'm learning, practicing, and building all at the same time. The data came in as one messy array. Everything was a string ----> step counts, calories, mood, all jumbled together. Before I could analyze anything, I had to separate and convert each column manually:

date, step_count, mood, calories, sleep, activity = data.T
step_count = np.array(step_count, dtype='int')

Took me a while to understand WHY this works. .T transposes the array ----> rows become columns, columns become rows. Suddenly extracting one feature at a time becomes simple.

Lesson: half of data science is just getting the data into a shape you can actually work with.

#Python #NumPy #DataCleaning #DataScience
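For anyone who wants to try the same pattern, here is a self-contained version with a tiny made-up array standing in for the real data:

import numpy as np

# Stand-in for the messy input: every value arrives as a string.
data = np.array([
    ["2024-01-01", "5012", "happy",   "2100", "7.5", "walking"],
    ["2024-01-02", "7230", "neutral", "2300", "6.0", "running"],
])

# Transpose so each column of the original table becomes one row to unpack.
date, step_count, mood, calories, sleep, activity = data.T

# Convert the numeric columns from strings to proper dtypes.
step_count = step_count.astype(int)
calories = calories.astype(int)
sleep = sleep.astype(float)

print(step_count.mean(), list(mood))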
Day 82 - Relational Plots & Time Series Analysis 🚀

Continuing my journey into data visualization, today I focused on understanding relationships in data and extracting insights from time-based patterns using Python. Here's what I explored:

📊 Scatter Plot with Marginal Histograms: Visualizing relationships along with distributions gave a much richer context than a standalone scatter plot.
📈 Line Plot with Seaborn: Improved how I represent trends with cleaner, more intuitive visualizations.
⏳ Time Series Plot with Seaborn & Pandas: Worked with time-indexed data to uncover patterns and trends over time — a key skill in real-world analytics.
📉 Time Series with Rolling Average: Smoothing noisy data using rolling averages helped reveal the underlying trend more clearly.

💡 Key takeaway: Effective visualization isn't just about charts — it's about telling a clear story with data.

#DataScience #Python #Seaborn #Pandas #DataVisualization #TimeSeries #Analytics
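As a small illustration of the rolling-average idea from that list, here is a sketch on synthetic data (the series and the 14-day window are invented for the example):

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Synthetic daily series: a gentle upward trend plus noise.
rng = np.random.default_rng(0)
ts = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=180, freq="D"),
    "value": np.linspace(50, 80, 180) + rng.normal(0, 6, 180),
})

# A 14-day rolling average smooths the noise and reveals the underlying trend.
ts["rolling_14d"] = ts["value"].rolling(14).mean()

sns.lineplot(data=ts, x="date", y="value", alpha=0.4, label="daily")
sns.lineplot(data=ts, x="date", y="rolling_14d", label="14-day average")
plt.tight_layout()
plt.show()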
Turning messy data into meaningful insights is an art—and the right tools make all the difference. 📊✨ From confusing default plots to clean, decision-ready visuals, mastering Python & Seaborn can completely transform how you communicate data in the boardroom. And understanding concepts like Cross Join (Cartesian Product) isn't just theory—it's the foundation of smarter analytics. Stop guessing. Start visualizing. Start influencing decisions. 🚀 #DataAnalytics #Python #Seaborn #DataVisualization #BusinessIntelligence #AnalyticsJourney #DataScience #SQL #LearningEveryday #CareerGrowth #TechSkills #DataDriven #LinkedInLearning
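Since cross joins came up: in pandas a Cartesian product is a one-liner, handy for building every combination of two dimensions before joining in actual values (toy frames below).

import pandas as pd

products = pd.DataFrame({"product": ["A", "B"]})
months = pd.DataFrame({"month": ["Jan", "Feb", "Mar"]})

# Cross join (Cartesian product): every product paired with every month.
grid = products.merge(months, how="cross")
print(grid)   # 2 x 3 = 6 rows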
🚀 𝗗𝗮𝘆 𝟯: 𝗧𝗼𝗱𝗮𝘆 𝗜 𝗲𝘅𝗽𝗹𝗼𝗿𝗲𝗱 𝘀𝗼𝗺𝗲 𝗯𝗮𝘀𝗶𝗰 𝗯𝘂𝘁 𝘃𝗲𝗿𝘆 𝗶𝗺𝗽𝗼𝗿𝘁𝗮𝗻𝘁 𝗳𝘂𝗻𝗰𝘁𝗶𝗼𝗻𝘀 𝗶𝗻 𝗣𝗮𝗻𝗱𝗮𝘀 𝗳𝗼𝗿 𝗱𝗮𝘁𝗮 𝘂𝗻𝗱𝗲𝗿𝘀𝘁𝗮𝗻𝗱𝗶𝗻𝗴 📊

🔍 1. df.head() shows the first 5 rows of the dataset.
🔍 2. df.tail() shows the last 5 rows.
📏 3. df.shape returns the number of rows and columns.
ℹ️ 4. df.info() provides a summary of the dataset (data types, null values).
📊 5. df.describe() gives a statistical summary (mean, min, max, etc.).
📌 6. df.columns shows all column names.

💡 Key Learning: Understanding your dataset is the first step before doing any analysis.

#Day3 #Pandas #Python #DataAnalytics #LearningJourney #DataExploration
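Put together, a quick first look at any dataset is only a handful of calls; the file name below is a placeholder.

import pandas as pd

df = pd.read_csv("sales_data.csv")   # placeholder file name

print(df.head())      # first 5 rows
print(df.tail())      # last 5 rows
print(df.shape)       # (rows, columns)
df.info()             # data types and non-null counts
print(df.describe())  # statistical summary: mean, min, max, quartiles
print(df.columns)     # all column names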