🚀 Excited to share my first Python package, smarteda! I’ve been working on strengthening my data analysis skills, and as part of that journey, I built and published my own package on PyPI.
🔹 What is smarteda? A simple, beginner-friendly library designed to make Exploratory Data Analysis (EDA) faster and easier.
🔹 Why did I build it? Instead of just learning concepts, I wanted to create something practical that solves real problems and can be used by others.
🔹 What I learned:
• Structuring a Python package
• Writing reusable code
• Publishing to PyPI
• Thinking like a developer, not just a learner
🔗 PyPI: https://lnkd.in/gA_VzM7K
🔗 GitHub: https://lnkd.in/g8heBZqi
This is just the beginning. I’ll keep improving it and building more tools as I grow in data analytics and machine learning. 👉 Would love your feedback and suggestions!
#Python #DataAnalytics #EDA #MachineLearning #PyPI #OpenSource #LearningInPublic
Introducing smarteda: An EDA Library for Python Developers
If you work with Python for data analysis, especially in financial and economic modelling, chances are that Jupyter Notebook has long been your go-to environment. Today, I want to introduce you to a brilliant alternative that might just change how you work: Marimo. What makes Marimo truly special is that it functions as both a coding environment and an interactive dashboard simultaneously. Instead of just writing scripts, you can seamlessly integrate sliders, tables, charts, and UI elements directly into your workflow. We all know the pain of out-of-order execution in Jupyter, where changing a variable means you have to manually re-run downstream cells to update your results. Marimo solves this problem by acting like a smart spreadsheet. If you change a variable, your entire notebook updates instantly and automatically. Furthermore, Marimo saves your work as standard Python (.py) files rather than JSON. This makes reading the code much more straightforward and turns version control on GitHub from a headache into a breeze. This is just scratching the surface of what this fantastic library can do. I'll be sharing more about Marimo's capabilities soon. Have you had a chance to experiment with Marimo yet? Let me know your thoughts in the comments! Check it out for yourself here: https://marimo.io/ #DataScience #Python #Marimo #JupyterNotebook #DataAnalysis #FinancialModelling #PythonDeveloper #DataEngineering
Machine Learning Financial Data using Backtesting.py #machinelearning #datascience #financialdata #backtestingpy
Backtesting.py is a fast, lightweight Python framework for backtesting trading strategies on historical candlestick (OHLCV) data. It gives you a clean, minimal API to define a strategy, run a simulation, inspect detailed statistics, and explore interactive charts, all without locking you into a particular data provider or indicator library.
The engine simulates a trading strategy running bar by bar through a historical OHLCV dataset. You write a Strategy subclass, declare your indicators in init(), and place orders in next(). The Backtest engine handles order execution, position tracking, commissions, and statistics.
The library is indicator-library-agnostic: you can use NumPy, pandas, TA-Lib, Tulipy, scikit-learn, or any other tool to compute indicator values. If it returns a NumPy array the same length as your data, it works. https://lnkd.in/gC_dz6xG
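To make the bar-by-bar idea concrete, here is a minimal pure-Python sketch of a moving-average crossover strategy stepping through closing prices one bar at a time. This is not the Backtesting.py API (no order book, commissions, or statistics); the function names and the long-only exit logic are simplifications for illustration.

```python
def sma(values, n):
    """Trailing simple moving average; None until n bars are available."""
    return [None if i + 1 < n else sum(values[i + 1 - n:i + 1]) / n
            for i in range(len(values))]

def run_crossover(closes, fast=2, slow=3):
    """Go long when the fast MA is above the slow MA, flat otherwise.
    Returns the final equity multiple for 1 unit of starting capital."""
    fast_ma, slow_ma = sma(closes, fast), sma(closes, slow)
    position, equity, entry = 0, 1.0, 0.0
    for i in range(1, len(closes)):          # like next(): runs once per bar
        if fast_ma[i] is None or slow_ma[i] is None:
            continue                         # not enough history yet
        if position == 0 and fast_ma[i] > slow_ma[i]:
            position, entry = 1, closes[i]   # enter long at this bar's close
        elif position == 1 and fast_ma[i] < slow_ma[i]:
            equity *= closes[i] / entry      # exit and book the return
            position = 0
    if position == 1:                        # close any open position at the end
        equity *= closes[-1] / entry
    return equity
```

A real backtest would also model order types, sizing, and costs; the point here is only the bar-by-bar loop that the engine runs for you.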
@HexSoftwares I just wrapped up a comprehensive exploratory data analysis (EDA) of student performance factors. Using Python (Pandas, Seaborn, Matplotlib), I went beyond the surface to see which habits, and hurdles, impact exam scores the most.
Key Takeaways:
• Study Time vs. Scores: a clear positive correlation (r = 0.45). Effort pays off!
• Consistency is Key: attendance and study hours show the strongest positive correlation with high scores.
• Past as Prologue: previous academic scores remain one of the most reliable predictors of current results.
• The Socioeconomic Gap: high-income access correlates with higher median scores and a more stable performance baseline, though outliers exist in every category and hard work (hours studied) can bridge much of the gap.
• Data Integrity: cleaned and imputed missing categorical data to ensure a robust analysis.
Check out the full breakdown in the video below and explore the code on GitHub! 🔗 GitHub Repository: https://lnkd.in/dT6WRDSz
#DataScience #Python #DataAnalytics #StudentSuccess #MachineLearning
Built a Basic Stock Market Analyzer using Python As part of my learning journey, I created a simple stock analysis dashboard to get hands-on experience with how different Python libraries actually work in real-world scenarios. This is a beginner-level project, but it helped me understand the practical use of tools like yfinance, pandas, numpy, matplotlib, and streamlit. What it does: • Takes a company's stock market symbol as input • Fetches real-time stock data using yfinance • Calculates key metrics like percentage change, volatility, highest & lowest price • Uses moving averages (MA7 & MA30) to identify trends • Visualizes stock performance through graphs • Allows analysis of multiple stocks The focus was not complexity, but building something functional and learning by doing. I completed this project under the guidance of Mohit Payasi, whose support helped me understand the concepts more clearly. Going forward, as I progress in my Machine Learning journey, I plan to enhance this project by adding more advanced features like predictions, better UI, and deeper analysis. Always open to feedback and suggestions! #Python #DataAnalytics #MachineLearning #Streamlit #StockMarket #LearningByDoing #Projects
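As a rough illustration of two of the metrics above, here is a dependency-free sketch of percentage change and a trailing moving average. These are simplified stand-ins, not the project's code; the actual dashboard fetches data with yfinance and would typically use pandas' pct_change and rolling methods instead:

```python
def pct_change(prices):
    """Day-over-day percentage change between consecutive prices."""
    return [(b - a) / a * 100 for a, b in zip(prices, prices[1:])]

def moving_average(prices, window):
    """Trailing simple moving average (e.g. window=7 for MA7)."""
    return [sum(prices[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(prices))]
```

Comparing a short window (MA7) against a longer one (MA30) is what lets the dashboard flag trend changes.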
Machine Learning Time Series Data using pmdarima #machinelearning #datascience #timeseriesdata #pmdarima
pmdarima brings R’s beloved auto.arima to Python, making an even stronger case that you don’t need R for data science. pmdarima is 100% Python + Cython and does not leverage any R code; it is implemented as a powerful yet easy-to-use set of functions and classes that will be familiar to scikit-learn users.
pmdarima is essentially a Python and Cython wrapper around several statistical and machine learning libraries (statsmodels and scikit-learn), and it operates by generalizing all ARIMA models into a single class (unlike statsmodels). It does this by wrapping the respective statsmodels interfaces (ARMA, ARIMA and SARIMAX) inside the pmdarima.ARIMA class, so a bit of monkey patching happens under the hood.
The auto_arima function itself operates a bit like a grid search: it tries various sets of p and q parameters (also P and Q for seasonal models), selecting the model that minimizes the AIC (or BIC, or whichever information criterion you select). To choose the differencing terms, auto_arima uses a test of stationarity (such as the augmented Dickey-Fuller test) and, for seasonal models, a test of seasonality (such as the Canova-Hansen test). https://lnkd.in/gjnJVA5T
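The selection step can be sketched in a few lines: compute AIC = 2k - 2*ln(L) for each candidate (p, q) order and keep the minimum. The log-likelihoods below are made-up numbers; in auto_arima they come from actually fitting each ARIMA model to the series, and the parameter count k is simplified here to p + q + 1:

```python
# Hypothetical log-likelihoods for a few (p, q) candidate orders.
candidates = {
    (0, 0): -120.0,
    (1, 0): -101.5,
    (0, 1): -103.2,
    (1, 1): -100.9,
}

def aic(loglik, n_params):
    """Akaike information criterion: 2k - 2*ln(L). Lower is better."""
    return 2 * n_params - 2 * loglik

# Pick the order whose AIC is smallest; k = p + q + 1 is a simplification.
best = min(candidates, key=lambda pq: aic(candidates[pq], sum(pq) + 1))
print("selected order:", best)
```

Note how the extra parameter penalty in AIC lets a slightly worse-fitting but simpler model win: here (1, 1) has the best likelihood, but (1, 0) is selected.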
🚀 Last month, I built and published my first Python package: Pristinizer. I wanted to solve a simple but real problem in data science: 👉 cleaning and understanding raw datasets takes way too much time. So I built Pristinizer, a lightweight Python package that helps streamline data cleaning + EDA in just a few lines of code.
🔍 What Pristinizer does:
• Cleans messy datasets (duplicates, missing values, column formatting)
• Generates structured dataset summaries
• Visualizes missing data (heatmap, matrix, bar chart)
⚙️ Tech Stack: Python • pandas • matplotlib • seaborn
📦 Try it out:
pip install pristinizer

import pristinizer as ps
df = ps.clean(df)
ps.summarize(df)
ps.missing_heatmap(df)
🧠 What I learned while building this:
• Designing a clean and intuitive API
• Structuring a real-world Python package
• Publishing to PyPI
• Writing proper documentation for users
📌 Next, I’m planning to add:
• Outlier detection
• Automated preprocessing pipelines
• Advanced EDA reports
Would love to hear your thoughts or feedback!
#Python #DataScience #MachineLearning #OpenSource #Pandas #EDA #Projects
Machine Learning Financial Data using alphalens #machinelearning #datascience #financialdata #alphalens Alphalens is a Python library for performance analysis of predictive (alpha) stock factors. It works well with the Zipline open-source backtesting library and with Pyfolio, which provides performance and risk analysis of financial portfolios. https://lnkd.in/gYNTkaGm
Day 49 of my #100DaysOfCode challenge 🚀 Today I worked on a Python program to find the Equilibrium Index of an array. An equilibrium index is an index where the sum of elements on the left equals the sum of elements on the right.
What the program does:
• Takes an array as input
• Finds an index where left sum = right sum
• Returns the index if found, or -1 if no such index exists
How the logic works:
• Calculate the total sum of the array
• Initialize left_sum = 0
• Traverse the array using enumerate()
• For each element, right sum = total_sum - left_sum - current element
• If left_sum == right_sum, return the index
• Otherwise, add the current element to left_sum and continue
• If no equilibrium index is found, return -1
Example:
Input: [-7, 1, 5, 2, -4, 3, 0]
Output: 3 (left sum = right sum = -1)
Another example:
Input: [1, 2, 3]
Output: -1 (no equilibrium index)
Another example:
Input: [0, 1, 0]
Output: 1 (left sum = right sum = 0)
Why this approach is efficient:
• Uses the prefix sum concept
• Avoids nested loops
• Time complexity: O(n)
Key learnings from Day 49:
• Understanding prefix sums
• Optimizing from brute force to O(n)
• Working with running totals
• Strengthening array problem-solving
#100DaysOfCode #Day49 #Python #PythonProgramming #Arrays #PrefixSum #Algorithms #DataStructures #ProblemSolving #CodingPractice #InterviewPrep #LearnByDoing #ProgrammingJourney #DeveloperGrowth #BTech #CSE #AIandML #VITBhopal #TechJourney
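The steps above can be written out as a short function. This is a sketch of the same O(n) prefix-sum idea, not necessarily the author's exact code:

```python
def equilibrium_index(arr):
    """Return the first index where the sum of elements to the left
    equals the sum to the right, or -1 if none exists. O(n) time."""
    total = sum(arr)
    left_sum = 0
    for i, x in enumerate(arr):
        # Right sum is whatever remains after left_sum and the current element.
        if left_sum == total - left_sum - x:
            return i
        left_sum += x
    return -1
```

One pass, one running total: the brute-force version would recompute both sums for every index, which is O(n^2).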
I’m excited to share my latest project: a comprehensive Descriptive Statistics Suite built in Python! 🚀 Before jumping into complex Machine Learning models, every great data story starts with a deep dive into the data's "personality." This project automates that process using the industry-standard stack: NumPy, Pandas, and SciPy. Key highlights of what I’ve built: 🔹 Central Tendency: Automated calculation of Mean, Median, and Mode to find the "heart" of the data. 🔹 Dispersion Analysis: Measuring Variance, Standard Deviation, and IQR to quantify data spread and volatility. 🔹 Distribution Shape: Using Skewness and Kurtosis to identify symmetry and the likelihood of extreme outliers. 🔹 Visualizations: Clean, publication-ready Histograms, Frequency Polygons, and Pie Charts for intuitive storytelling. This repository is designed to be a "one-click" solution for anyone performing initial Exploratory Data Analysis (EDA). 📂 Check out the full code and documentation on GitHub: https://lnkd.in/gBPsc95s I’d love to hear your thoughts or any suggestions for future statistical features! #DataScience #Python #DataAnalytics #Statistics #GitHub #Pandas #NumPy #DataVisualization #MachineLearning #Coding
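As a taste of the central-tendency, dispersion, and shape measures listed above, here is a small self-contained sketch using Python's statistics module. The dataset is a made-up example, and skewness is computed from the population (Fisher) formula rather than SciPy:

```python
import statistics as st

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample

mean   = st.mean(data)            # the "heart" of the data
median = st.median(data)
mode   = st.mode(data)
stdev  = st.pstdev(data)          # population standard deviation

# Population skewness: 0 for symmetric data, > 0 for a right tail.
skew = sum((x - mean) ** 3 for x in data) / (len(data) * stdev ** 3)
```

The same quantities map onto scipy.stats and pandas one-liners in the actual suite; the formulas are what matter here.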
🚀 Day 12 & 13 – Consistency is the Key! Still going strong on my Python learning journey, and these two days were all about revision + real application 💻 🔁 Quick Revision: Revisited core concepts like loops, functions, and conditionals — because strong basics = strong foundation. 💡 Mini Project: Bill Generator Built a simple yet practical Python project using: ✔️ if-elif-else statements ✔️ Operators (arithmetic & logical) ✔️ User inputs for dynamic calculations 🔹 Features included: - Item selection & pricing - Quantity-based calculations - Discount logic - Final bill generation 🧠 What I Improved: - Better problem-solving approach - Writing cleaner, more readable code - Debugging with more confidence - Thinking in a more structured, logical way Every small project is making me more confident and bringing me one step closer to becoming a skilled data professional 📈 🙏 Special thanks to Anurag Srivastava and the Data Engineering Bootcamp for the constant guidance and support! #Python #LearningJourney #100DaysOfCode #DataEngineering #Coding #BeginnerToPro #Consistency
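A bill generator like the one described can be sketched with exactly those building blocks: if-elif-else plus arithmetic operators. The menu items and discount tiers here are invented for illustration, not the bootcamp project's actual values:

```python
# Hypothetical menu: item name -> unit price.
MENU = {"tea": 10.0, "coffee": 25.0, "sandwich": 60.0}

def make_bill(order):
    """order: dict of item -> quantity. Returns the total after a
    tiered discount (5% over 200, 10% over 500)."""
    subtotal = sum(MENU[item] * qty for item, qty in order.items())
    if subtotal >= 500:
        discount = 0.10
    elif subtotal >= 200:
        discount = 0.05
    else:
        discount = 0.0
    return round(subtotal * (1 - discount), 2)
```

Swapping input() calls for the order dict makes it interactive, which is how a console bill generator would typically collect items and quantities.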