Ever run a Python script and get a frustrating "file not found" error? 😤 This simple snippet can save you hours 👇

import os

# Check if we're in the right place
print("Current directory:", os.getcwd())

# Check if our data file exists
data_path = "data/sales.csv"
if os.path.exists(data_path):
    print(f"Found {data_path}")
else:
    print(f"❌ Cannot find {data_path}")
    print("Make sure you're running from the sales-analysis folder!")

💡 What's happening here?

🔹 os.getcwd()
Prints your current working directory, which tells you where your script is running from. Many errors happen because you're in the wrong folder.

🔹 data_path = "data/sales.csv"
Defines the relative path to your dataset.

🔹 os.path.exists(data_path)
Checks if the file actually exists before trying to use it.

🔹 Conditional check (if / else)
Gives clear feedback: ✔ found the file, or ❌ tells you it's missing.

🚀 Why this matters
- Prevents runtime errors
- Helps debug file path issues quickly
- Makes your scripts more reliable
- Essential habit for data analysis projects 📊

Whether you're working on data science, automation, or AI, always verify your file paths before processing data.

Small habit. Big impact.

#Python #Programming #DataScience #AI #CodingTips #Debugging
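A small hardening of the snippet above (my addition, not part of the original post): resolving the data path relative to the script file itself makes the check independent of whichever folder you happen to run from.

from pathlib import Path

# Locate data/sales.csv next to the script, regardless of the current
# working directory (folder layout assumed from the post above).
script_dir = Path(__file__).resolve().parent
data_path = script_dir / "data" / "sales.csv"

if data_path.exists():
    print(f"Found {data_path}")
else:
    print(f"❌ Cannot find {data_path}")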
Prevent File-Not-Found Errors in Python Scripts
More Relevant Posts
-
Nobody talks about the quiet revolution that already happened in Python data tooling.

Pandas was the default for years. Comfortable. Familiar. Everywhere. But in 2024–2025, something shifted.

Here's what the modern Python data stack actually looks like now:

→ DuckDB for analytical queries on local files
No server. No setup. Just SQL that runs faster than you expect, directly on CSV and Parquet files.

→ Polars for dataframe operations
Written in Rust. Built from scratch for multi-core CPUs. Lazy evaluation by default. On large datasets, it's not 2× faster than Pandas; it's often 10–50×.

→ Pandas is still useful
But mostly as a last step for compatibility, not for computation.

The real insight here isn't the tools. It's the mental model.

The old stack was: load → transform → analyze (all in Pandas).
The new stack is: query first (DuckDB) → transform fast (Polars) → output clean (Pandas if needed).

If you're still running df.groupby() on a 5M-row CSV in Pandas and wondering why your laptop fan is screaming, this is for you.

I wrote a deep dive on exactly this shift, covering benchmarks, real code comparisons, and when to use which tool.

Follow for more practical AI & data engineering content.

What's your current go-to for data wrangling? Still Pandas, or have you made the switch? 👇

#Pandas #Python #DataScience #AI #DataCleaning
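A minimal sketch of that query → transform → output flow (my illustration, not from the post; it assumes a recent duckdb and polars install, and the file name and columns are hypothetical):

import duckdb
import polars as pl

# 1. Query first: DuckDB runs SQL directly on the local file, no server needed
rel = duckdb.sql("""
    SELECT region, amount
    FROM 'data/sales.csv'
    WHERE amount > 0
""")

# 2. Transform fast: hand the result to Polars and use its lazy, multi-core engine
summary = (
    rel.pl()
    .lazy()
    .group_by("region")
    .agg(pl.col("amount").sum().alias("total_sales"))
    .sort("total_sales", descending=True)
    .collect()
)

# 3. Output clean: convert to Pandas only at the edge, for tools that still expect it
print(summary.to_pandas().head())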
-
PCA (Principal Component Analysis): when too many features start becoming a problem…

While working on datasets with a large number of features, I realized something important:

More features ≠ better model.

In fact, too many features can lead to a problem called the Curse of Dimensionality:
- Models become slow
- Computation increases
- Noise increases
- Visualization becomes difficult

Solution → Dimensionality Reduction

PCA is an unsupervised learning technique used when we only have input features (no target/output). It is a feature extraction technique that transforms high-dimensional data into lower dimensions while preserving most of the important information.

In simple words: it keeps the essence of the data but reduces the complexity.

Using PCA helps:
- Reduce the number of features
- Improve model performance
- Reduce computation cost
- Speed up training
- Make data easier to visualize

How PCA works (steps I practiced):

Step 1️⃣: Standardize the data, because PCA is scale-sensitive.

Step 2️⃣: Compute the covariance matrix, to understand relationships between features.

Step 3️⃣: Find eigenvalues & eigenvectors:

import numpy as np
eigen_values, eigen_vectors = np.linalg.eig(cov_matrix)

Step 4️⃣: Select principal components: choose the top components with the highest variance. (A full sketch of these four steps is below.)

PCA is not just about reducing columns… it's about keeping the most important information while removing redundancy.

#Datascience #Dataanalyst #Machinelearning #curseofdimensionality #featureextraction #python #numpy
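A NumPy sketch of those four steps (my illustration, not the author's code; the data and variable names are made up, and it uses np.linalg.eigh, the symmetric-matrix variant of the eig call above):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                 # hypothetical data: 200 samples, 5 features

# Step 1: standardize (PCA is scale-sensitive)
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Step 2: covariance matrix of the standardized features
cov_matrix = np.cov(X_std, rowvar=False)

# Step 3: eigenvalues and eigenvectors (eigh suits symmetric matrices)
eigen_values, eigen_vectors = np.linalg.eigh(cov_matrix)

# Step 4: keep the top-k components with the highest variance
k = 2
order = np.argsort(eigen_values)[::-1]        # sort by explained variance, descending
components = eigen_vectors[:, order[:k]]
X_reduced = X_std @ components                # project onto k dimensions

print("explained variance ratio:", eigen_values[order[:k]] / eigen_values.sum())
print("reduced shape:", X_reduced.shape)      # (200, 2)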
-
📘 Day 7 – Understanding Dictionaries, Tuples & Sets in Python

So far in this journey, we've already explored lists: how to store, access, and manipulate ordered data. Today, we move a step further by understanding three other powerful Python data structures: dictionaries, tuples, and sets.

🔹 1. Dictionary (key-value pairs)
Think of a dictionary like a real-life glossary 📖: each word (key) has a meaning (value).

Example:
student = {
    "name": "Abiodun",
    "track": "AI/ML",
    "day": 7
}

✔ Stores data in key-value format
✔ Fast lookup using keys
✔ Very useful for structured data (e.g., user profiles, configs)

🔹 2. Tuple (ordered but immutable)
Tuples are like lists, but cannot be changed after creation.

Example:
coordinates = (10, 20)

✔ Ordered
✔ Cannot add/remove items
✔ Faster and safer for fixed data

🔹 3. Set (unique, unordered collection)
Sets automatically remove duplicates.

Example:
numbers = {1, 2, 2, 3, 4}
print(numbers)   # Output: {1, 2, 3, 4}

✔ No duplicate values
✔ Unordered
✔ Useful for filtering unique items

💡 Quick Comparison
- List → Ordered, changeable
- Tuple → Ordered, not changeable
- Set → Unordered, no duplicates
- Dictionary → Key-value pairs

#Python #DataStructures #AIJourney #M4ACE #M4ACELearningChallenge #LearningInPublic
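A quick demo tying the three structures together (illustrative values only, building on the examples above):

student = {"name": "Abiodun", "track": "AI/ML", "day": 7}
print(student["track"])        # fast lookup by key -> AI/ML

coordinates = (10, 20)
# coordinates[0] = 99          # would raise TypeError: tuples are immutable

numbers = {1, 2, 2, 3, 4}
print(numbers)                 # duplicates removed -> {1, 2, 3, 4}
print(3 in numbers)            # fast membership test -> True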
-
📊 Beyond the Bell Curve: Handling "Messy" Data in Python

As data scientists, we often dream of perfect, Gaussian (normal) distributions. But in the real world, especially with variables like car prices or housing data, the data is rarely "normal."

I recently worked through a project involving left-skewed and non-parametric data. Here's a breakdown of how I handled it using Python:

1️⃣ Identifying the shape
Before running any tests, I used Matplotlib to visualize the distribution. A high bin count (150) helped reveal a significant left skew, where the mean was being pulled down by a long tail of lower-priced entries.

plt.hist(prices, bins=150)
plt.show()

2️⃣ The transformation strategy
When data is left-skewed, standard parametric tests (like t-tests) can become biased. To pull that "tail" back toward the center and achieve symmetry, I explored square (x²) and cube (x³) transformations. By stretching the right side of the distribution more than the left, these power transformations can often "normalize" the data, allowing for more powerful statistical modeling.

3️⃣ When to stay non-parametric
If the data is truly non-parametric (multimodal or containing extreme gaps), forcing a transformation isn't the answer. In those cases, I pivot to rank-based tests like:
✅ Mann-Whitney U (instead of the t-test)
✅ Kruskal-Wallis (instead of ANOVA)
✅ Spearman's rank (instead of Pearson correlation)

The takeaway: don't just import your library and hit "run." Understanding the geometry of your data is the difference between a biased model and an accurate insight. 💡

#DataScience #Python #Statistics #MachineLearning #Pandas #DataAnalytics #DataIntegrity
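A hedged end-to-end sketch of that workflow (my illustration, not the project's code; the prices are synthetic left-skewed data, the square transform is just one candidate, and the two groups are arbitrary splits):

import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
prices = rng.beta(5, 2, size=1_000) * 50_000   # synthetic left-skewed "prices"

print("skewness before:", round(stats.skew(prices), 2))               # noticeably negative

# Power transform (x squared) to pull the left tail back toward the center
transformed = prices ** 2
print("skewness after squaring:", round(stats.skew(transformed), 2))  # close to zero here

# If the shape stays non-parametric, fall back to rank-based tests,
# e.g. comparing two (arbitrary) groups with Mann-Whitney U
group_a, group_b = prices[:500], prices[500:]
stat, p_value = stats.mannwhitneyu(group_a, group_b)
print("Mann-Whitney U p-value:", round(p_value, 3))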
-
pip install dataprism

We just open-sourced dataprism. A Python EDA library built for how data work happens now, not a decade ago.

Most EDA tools assume you're sitting in a Jupyter notebook, clicking through HTML reports, eyeballing distributions one at a time. They were never designed for multi-source datasets, AI agents or production pipelines.

dataprism is.

We put it up against 5 popular EDA tools. The comparison table speaks for itself.

Here's what it does that no other EDA tool offers out of the box:
1️⃣ Predictive power analysis: know which features matter before you model
2️⃣ Drift detection: catch when your data shifts over time or between splits
3️⃣ Data quality scoring: one number to tell how clean your data really is
4️⃣ Multi-source support: built for datasets stitched from multiple providers
5️⃣ Schema-aware: define data once, analyze it consistently every time
6️⃣ Structured JSON: built for AI agents & pipelines, not just dashboards
7️⃣ Interactive explorer: when you do want to look, it's all there in one page

dataprism is open source under MIT. This is day one.

Try it. Break it. Star it. Contribute to it.
-
Huge kudos to Gunasekaran for open-sourcing this EDA tool we've been using internally in our products, for all data scientists and analysts to leverage. Our clients have already seen its power, especially in BFSI, when scouring through a huge data catalog and trying to find what works for them.

3,400 downloads already and counting!

We will strive to keep adding new and more useful features that make EDA less daunting and your analysis more model-ready.

pip install dataprism | https://lnkd.in/gVWkVB4V
-
We built dataprism at LattIQ to solve our own frustrations with EDA: messy multi-source datasets, no structured output for pipelines, and tools that assumed you'd always be staring at a notebook.

Now it's open source under MIT. If you work with messy, multi-source data, try it and tell us what's missing.

pip install dataprism
-
🧠 Group Anagrams: The "Fingerprint" Strategy

In this problem, I moved beyond the standard sorting approach (O(n · m log m)) to a more efficient frequency-array strategy (O(n · m)).

Memory management: I learned how Python handles memory during loops. By declaring count = [0] * 26 inside the outer loop, I'm giving each word a fresh "sheet of paper" to record its letter frequencies. Once that word is processed and "locked" as a tuple (to serve as a dictionary key), Python's garbage collector steps in to clean up the old list.

The data science connection: this frequency array isn't just a coding trick; it's the foundation of one-hot encoding and bag-of-words in data science. It's how we turn raw text into numerical vectors that AI models can actually understand.

🔍 Longest Common Prefix: The Power of Vertical Scanning

Instead of checking one word at a time, I focused on vertical scanning: checking the first letter of every word, then the second, and so on.

Complexity: achieved O(S) time, where S is the total number of characters. By using the shortest word as my base, I ensured zero wasted cycles and no IndexError traps.

Pythonic elegance: I explored the zip(*strs) strategy. It's amazing how Python can "unpack" a list and group characters by their index in a single line.

The sorting shortcut: a clever logic leap: if you sort the list, you only need to compare the first and last strings. If they share a prefix, everything in the middle must share it too.

The takeaway? Code isn't just about getting the right answer; it's about knowing how your data sits in RAM and how to make every operation count. Onto the next one! 🐍💻 (Both techniques are sketched in code below.)

#DataScience #Python #SoftwareEngineering #Neetcode #ProblemSolving #TechLearning

"6 down, 244 to go. The dashboard might show 6/250, but the real progress is in the 'Medium' difficulty milestone I hit today and the logic I've mastered behind the scenes."
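Minimal sketches of both techniques (my illustration of the standard approaches described above, not the author's exact code; inputs assume lowercase a-z words):

from collections import defaultdict

def group_anagrams(words):
    """O(n * m): bucket words by a 26-slot letter-frequency fingerprint."""
    groups = defaultdict(list)
    for word in words:
        count = [0] * 26                       # a fresh "sheet of paper" per word
        for ch in word:
            count[ord(ch) - ord("a")] += 1
        groups[tuple(count)].append(word)      # tuple = hashable dictionary key
    return list(groups.values())

def longest_common_prefix(strs):
    """Vertical scan via zip(*strs): stop at the first mismatched column."""
    prefix = []
    for column in zip(*strs):                  # i-th character of every word
        if len(set(column)) != 1:
            break
        prefix.append(column[0])
    return "".join(prefix)

print(group_anagrams(["eat", "tea", "tan", "ate", "nat", "bat"]))
print(longest_common_prefix(["flower", "flow", "flight"]))   # -> "fl"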
-
🚀 Built a GUI-Based Data Analysis Tool while Learning Python with AI

As part of my Python learning journey using AI-assisted development, I built a GUI-based data analysis tool that simplifies working with Excel and CSV data by helping users quickly explore datasets, generate summaries, and visualize insights without manual data processing.

🛠 Tech stack: Python, Pandas, Tkinter, Matplotlib

✨ Key features:
✅ Upload & analyze Excel/CSV files
✅ Automatic dataset profiling (rows, columns, headers)
✅ Smart detection of text & numeric columns
✅ GroupBy reports with multiple aggregations
✅ Built-in charts (Bar, Line, Column, Pie)
✅ Export reports (Excel/CSV) & charts (PNG)

🎯 This project helped me gain hands-on experience in Python development, data analysis workflows, and building practical business-focused tools with AI support.

Excited to keep learning and building. Feedback is welcome!

#PythonLearning #DataAnalytics #AIAssistedDevelopment #Tkinter #Pandas #Automation #LearningByDoing
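For readers curious what the core of such a tool looks like, here is a hedged minimal sketch of a file-picker → profiling → chart flow using the same stack (this is my illustration under those assumptions, not the author's code):

import tkinter as tk
from tkinter import filedialog
import pandas as pd
import matplotlib.pyplot as plt

def load_and_profile():
    path = filedialog.askopenfilename(filetypes=[("Data files", "*.csv *.xlsx")])
    if not path:
        return
    df = pd.read_excel(path) if path.endswith(".xlsx") else pd.read_csv(path)

    # Basic dataset profiling: shape, plus text vs. numeric columns
    print(f"{len(df)} rows x {len(df.columns)} columns")
    print("Numeric columns:", list(df.select_dtypes("number").columns))
    print("Text columns:", list(df.select_dtypes("object").columns))

    # One illustrative chart: mean of the first numeric column per first text column
    num = df.select_dtypes("number").columns[0]
    cat = df.select_dtypes("object").columns[0]
    df.groupby(cat)[num].mean().plot(kind="bar", title=f"Mean {num} by {cat}")
    plt.tight_layout()
    plt.show()

root = tk.Tk()
tk.Button(root, text="Open data file", command=load_and_profile).pack(padx=30, pady=20)
root.mainloop()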
-
🌐 Most people work with datasets… but where does the data actually come from?

One of the most interesting things I explored recently was web scraping: collecting data directly from websites instead of relying on pre-built datasets.

💡 What I realized: real-world data is rarely clean or readily available. Before any analysis or AI model, the first step is often:
→ Extracting the data
→ Structuring it properly
→ Handling inconsistencies

🔧 In this project, I worked on:
• Extracting data from web pages
• Parsing and cleaning raw HTML content
• Converting unstructured data into a usable format
• Preparing data for analysis

💡 Key takeaway: data collection itself is a major part of the pipeline, and sometimes more challenging than the analysis. This gave me a better understanding of how data pipelines actually begin.

I've shared the project here:
👉 https://lnkd.in/eRzXNgsZ

Curious to hear:
💬 Have you ever worked on collecting your own dataset instead of using ready-made data?

#WebScraping #Python #DataEngineering #DataCollection #DataScience #BuildInPublic
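A generic sketch of that extract → parse → structure flow (my illustration; the markup, field names, and values are placeholders, not details of the linked project):

import pandas as pd
from bs4 import BeautifulSoup

# In practice the HTML comes from requests.get(url).text; a tiny inline
# sample keeps this sketch self-contained and runnable.
html = """
<div class="product"><h2>Widget A</h2><span class="price">$19.99</span></div>
<div class="product"><h2>Widget B</h2><span class="price">$5.50</span></div>
"""

# Parse: pull the fields we care about out of the raw markup
soup = BeautifulSoup(html, "html.parser")
rows = []
for item in soup.select("div.product"):
    rows.append({
        "name": item.select_one("h2").get_text(strip=True),
        "price": item.select_one("span.price").get_text(strip=True),
    })

# Structure: turn the scraped records into an analysis-ready table
df = pd.DataFrame(rows)
df["price"] = df["price"].str.replace(r"[^\d.]", "", regex=True).astype(float)
print(df)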