Ever wondered what __init__.py actually does?

While setting up my RAG project, my folder looked like this:

```
rag-retriever/
├── src/
│   ├── __init__.py
│   ├── data_loader.py
│   ├── chunk_splitter.py
│   └── semantic_split.py
└── main.py
```

At first, I kept getting import errors, until I understood the real role of __init__.py. It's simple yet powerful:

1) It tells Python that "this folder is a package."
2) It lets you import cleanly from one file to another.

So now I can:
- Inside src/: use from .semantic_split import function_name
- Outside (in main.py): use from src import chunk_splitter

That tiny file makes the entire structure modular and production-ready 🚀

---

In short: __init__.py is your folder's identity card. It transforms random scripts into a real Python package 📦

#Python #LLM #RAG #MLOps #SoftwareEngineering #DataScience #CodingTips #LearningJourney
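Putting the two import styles side by side, as a minimal sketch (using the post's own placeholder name function_name; the call at the end is illustrative):

```python
# src/__init__.py
# Can be left empty; its mere presence marks src/ as a package.

# src/chunk_splitter.py
# Relative import: works between sibling modules inside the package.
from .semantic_split import function_name

# main.py (one level above src/)
# Absolute import: works because src/ contains an __init__.py.
from src import chunk_splitter

chunk_splitter.function_name()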
What is __init__.py and why is it important in Python?
Day 22/50: The Pagination That Tanked at Scale

The Setup: Report works fine with 10,000 rows. With 1M rows, the query times out.

The Problem: Deep pagination with LIMIT/OFFSET. The query had to scan and discard 999,990 rows just to reach page 100,000.

The Investigation: The query looked innocent:

```sql
-- Page 100,000 with 10 items per page
SELECT * FROM orders ORDER BY id LIMIT 10 OFFSET 999990;
```

The database scanned 999,990 rows unnecessarily before returning 10.

The Solution: Switched to keyset pagination (cursor-based):

```python
# Instead of an offset, track the last ID the client saw
last_id = int(request.GET.get('last_id', 0))  # 0 means "start from the beginning"
results = Order.objects.filter(id__gt=last_id).order_by('id')[:10]
```

Time dropped from 12 seconds to 0.2 seconds.

The Lesson: LIMIT/OFFSET doesn't scale. Use cursor-based pagination with indexed lookups for large datasets.

Have you been burned by deep pagination?

#Day22 #50DaysOfDebugging #Database #Pagination #Performance #Scalability #SoftwareEngineering #Django #SQL
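A fuller sketch of the cursor handoff, assuming a Django Order model with an indexed integer primary key (the app, view, and response shape below are my illustration, not from the original post):

```python
from django.http import JsonResponse

from myapp.models import Order  # hypothetical app and model

PAGE_SIZE = 10

def list_orders(request):
    # Cursor is the last id the client saw; 0 means "start from the beginning".
    last_id = int(request.GET.get('last_id', 0))

    # id__gt on the primary key is an indexed range scan, not a full scan.
    page = list(
        Order.objects.filter(id__gt=last_id).order_by('id')[:PAGE_SIZE]
    )

    return JsonResponse({
        'results': [{'id': o.id} for o in page],
        # Client passes this back as ?last_id=... to fetch the next page.
        'next_cursor': page[-1].id if page else None,
    })
```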
I spent 3 days debugging one whitespace.

I used to ignore the "Golden Rule" of Python strings. It cost me hours of frustration until I realized: strings are immutable.

I was writing `text.strip()` thinking it cleaned my data. But the variable remained dirty because I wasn't assigning the result back.

Once I fixed my workflow, I discovered the 3 tools that actually separate pros from beginners:

1. The Janitor: Data is rarely clean. `.strip()` removes the hidden spaces that break your code, while `.zfill()` perfectly pads your IDs.
2. The Power Duo: `.split()` and `.join()` are the most powerful text processing team. They turn messy CSV strings into structured lists instantly.
3. The Modern Standard: Stop using `.format()`. F-strings are cleaner, faster, and the absolute standard for injecting variables.

Stop fighting your data. Start formatting like a pro.

#Python #DataScience #CodingTips #EdTech #TechSkills #DigitalTransformation #DeveloperLife

💡 What is the one coding error you keep making? Share below!
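A quick sketch of the bug and all three tools in action (the sample data is invented):

```python
# Strings are immutable: .strip() returns a NEW string.
text = "  ORD-42  \n"
text.strip()             # result is discarded -- text is still dirty
text = text.strip()      # assign it back: text == "ORD-42"

# The Janitor: pad IDs to a fixed width
order_id = "42".zfill(6)              # "000042"

# The Power Duo: messy CSV string -> clean list -> string
fields = " alice , bob ,carol".split(",")
names = [f.strip() for f in fields]   # ["alice", "bob", "carol"]
line = ";".join(names)                # "alice;bob;carol"

# The Modern Standard: f-strings
print(f"order {order_id} for {names[0]}")  # order 000042 for alice
```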
Speed up your Frappe queries with frappe.get_cached_value()

If you're repeatedly fetching the same field value from the database, don't use frappe.db.get_value() every time: it hits the database on each call. Instead, use frappe.get_cached_value(). It stores the result in memory (cache) and returns it faster on the next request.

When to use it:
- Fetching non-changing fields like settings, configs, defaults
- You want better performance with fewer DB calls
- Accessing single fields from large DocTypes

When NOT to use it:
- When the data changes frequently
- When you need fresh DB values every time
- When fetching multiple rows; use frappe.get_all() or get_list() instead

Use caching smartly: small optimizations add up in big Frappe apps 💪

#Frappe #ERPNext #Backend #Performance #OpenSource #Python

Rushabh Mehta, Hussain Nagaria, Ejaaz Khan, Aditya Hase, Sherin K R, Ritvik Sardana, Manish Dipankar, Frappe, efeone
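A minimal before/after sketch (the DocType, document name, and fieldname are illustrative; this runs inside a Frappe/bench context):

```python
import frappe

# Before: hits the database on every call
currency = frappe.db.get_value("Company", "Acme Corp", "default_currency")

# After: first call populates the cache; later calls are served from memory
currency = frappe.get_cached_value("Company", "Acme Corp", "default_currency")
```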
❓ The answer:
- Sample rows at random for quick insights
- Handle missing values with ease
- Crunch descriptive stats in seconds
- Build correlation matrices to spot trends
- Create frequency tables for instant summaries
- Resample data for time-series magic
- Index numbers by group for advanced grouping
- Work with leading/lagging variables for forecasting
- Compute moving and cumulative aggregations effortlessly
- Craft conditionally formatted plots that pop
- Visualize with pairplots, jitterplots, small multiples, jointplots, and bubble plots for stunning charts

💡 The question: 👉 What are quick wins with Python in Excel?

Each of these is its own quick win... and you'll get all 15 inside this short, practical video course.

🎓 Includes:
- 15 bite-sized lessons (under 10 min each)
- Downloadable Excel files to follow along
- Cheatsheet with every function and workflow
- Lifetime access, no prior Python experience required

💰 15 wins for $15. Doesn't take a Ken Jennings to see that's a smart bet.

👉 https://lnkd.in/gFDirK4J
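A few of those wins sketched in plain pandas (Python in Excel runs pandas under the hood; the tiny DataFrame here is invented, and inside Excel you would load it from a worksheet range instead):

```python
import pandas as pd

df = pd.DataFrame({
    "region": ["north", "south", "north", "south"],
    "sales":  [120, 95, 130, None],
})

sample = df.sample(2, random_state=0)                  # random rows for a quick look
df["sales"] = df["sales"].fillna(df["sales"].mean())   # handle missing values
stats = df["sales"].describe()                         # descriptive stats in one call
by_region = df.groupby("region")["sales"].sum()        # grouped summary table
running = df["sales"].cumsum()                         # cumulative aggregation

print(stats, by_region, running, sep="\n\n")
```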
🚀 𝑨𝒖𝒕𝒐𝒎𝒂𝒕𝒊𝒏𝒈 𝑫𝒂𝒕𝒂 𝑰𝒎𝒑𝒐𝒓𝒕𝒔: 𝑶𝒏𝒍𝒚 𝑾𝒉𝒂𝒕'𝒔 𝑵𝒆𝒆𝒅𝒆𝒅! 🧠📊

Over the last 10 years, I've accumulated a huge collection of performance files in my system. Each file, fortunately, contains its own date, which gave me an idea 💡.

Instead of manually selecting which files to import every quarter, I wrote a small yet powerful Python script that automatically 𝐢𝐦𝐩𝐨𝐫𝐭𝐬 𝐨𝐧𝐥𝐲 𝐭𝐡𝐞 𝐜𝐮𝐫𝐫𝐞𝐧𝐭 𝐪𝐮𝐚𝐫𝐭𝐞𝐫'𝐬 𝐝𝐚𝐭𝐚 from my archive.

Here's how it works 🧾
✅ Scans the target folder for all .zip files
✅ Extracts the date (in DDMMYYYY format) from filenames
✅ Identifies the 𝐜𝐮𝐫𝐫𝐞𝐧𝐭 𝐪𝐮𝐚𝐫𝐭𝐞𝐫 𝐚𝐧𝐝 𝐲𝐞𝐚𝐫 dynamically
✅ Filters and imports only the files that fall within that range

This small automation saves time ⏱️, reduces mistakes ❌, and keeps the data pipeline clean and focused on the 𝐜𝐮𝐫𝐫𝐞𝐧𝐭 𝐪𝐮𝐚𝐫𝐭𝐞𝐫'𝐬 𝐊𝐏𝐈𝐬.

🔹 Function name: get_current_quarter_files()
🔹 Output: A list of .zip files belonging to the 𝐜𝐮𝐫𝐫𝐞𝐧𝐭 𝐪𝐮𝐚𝐫𝐭𝐞𝐫 (𝐞.𝐠., 𝐐𝟒 𝟐𝟎𝟐𝟓)

Python continues to be my go-to tool for streamlining repetitive data engineering tasks, one function at a time 🐍⚙️

#Python #DataAutomation #QuarterlyData #KPIs #DataEngineering #Productivity #Automation
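A minimal sketch of how such a function could look, assuming filenames like report_31122025.zip (the folder path and filename pattern are my assumptions, not from the post):

```python
import re
from datetime import date
from pathlib import Path

def get_current_quarter_files(folder="archive"):
    """Return .zip files whose DDMMYYYY filename date falls in the current quarter."""
    today = date.today()
    quarter = (today.month - 1) // 3 + 1          # 1..4
    matches = []
    for path in Path(folder).glob("*.zip"):
        m = re.search(r"(\d{2})(\d{2})(\d{4})", path.name)  # DDMMYYYY
        if not m:
            continue
        day, month, year = map(int, m.groups())
        if year == today.year and (month - 1) // 3 + 1 == quarter:
            matches.append(path)
    return matches

print(get_current_quarter_files())  # e.g. the files for Q4 2025
```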
🚀 Master the read_csv() Function in Pandas! 🐼

If you've ever worked with data in Python, chances are you've used the legendary function:

pd.read_csv('data.csv')

But did you know it has over 50 parameters that can make your data importing super powerful? ⚙️

Here are some of the most useful ones 👇

🔹 1️⃣ sep: define your separator
pd.read_csv('data.csv', sep=';')
👉 Use this when your file isn't comma-separated (e.g., ; or |).

🔹 2️⃣ header: control header rows
pd.read_csv('data.csv', header=None)
👉 Useful for files without column names.

🔹 3️⃣ names: manually assign column names
pd.read_csv('data.csv', names=['A', 'B', 'C'])

🔹 4️⃣ usecols: read only specific columns
pd.read_csv('data.csv', usecols=['Name', 'Age'])
👉 Saves memory and speeds up loading! ⚡

🔹 5️⃣ dtype: set data types
pd.read_csv('data.csv', dtype={'Age': int})
👉 Prevents unexpected type errors later.

🔹 6️⃣ na_values: handle missing data
pd.read_csv('data.csv', na_values=['N/A', '-'])
👉 Converts custom placeholders into NaN.

🔹 7️⃣ parse_dates: parse date columns automatically
pd.read_csv('data.csv', parse_dates=['Date'])
👉 No more manual date parsing! 📅

💡 Pro Tip: Combine parameters smartly to handle even the messiest CSVs efficiently, as in the sketch below.

With great data comes great responsibility, and read_csv() is your superpower! 💪

#Python #Pandas #DataScience #MachineLearning #Analytics #Coding #PythonTips #100DaysOfCode #DataEngineer #LearnWithMe #CSV 🧠📊🐍
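Combining several parameters on one hypothetical messy export (the filename, column names, and placeholder values are invented for illustration):

```python
import pandas as pd

# Hypothetical export: semicolon-separated, no header row,
# "-" used for missing values, dates in a 'Date' column.
df = pd.read_csv(
    "export.csv",
    sep=";",
    header=None,
    names=["Name", "Age", "Date"],
    usecols=["Name", "Age", "Date"],
    dtype={"Name": "string"},       # leave Age numeric so NaN is allowed
    na_values=["N/A", "-"],
    parse_dates=["Date"],
)
print(df.dtypes)
```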
Day 21/50: The Connection Pool That Slowly Died

The Setup: Application running smoothly. After 48 hours: "No more connections available in pool."

The Problem: Connections leaked. Database connections never got closed, accumulating until the pool was exhausted.

The Investigation:
-> Enabled connection pool logging.
-> Found database queries never closing connections in finally blocks:

```python
# The culprit
cursor = db.cursor()
result = cursor.execute("SELECT * FROM users")
return result  # No cleanup!
```

-> After 48 hours of traffic, all 20 pool connections were open and abandoned.

The Solution:
-> Used context managers for automatic cleanup:

```python
with db.connection.cursor() as cursor:
    result = cursor.execute("SELECT * FROM users")
    return result  # Connection auto-closes on exit
```

Or explicitly close in finally:

```python
try:
    cursor = db.cursor()
    result = cursor.execute("SELECT * FROM users")
finally:
    cursor.close()  # Always runs
```

The Lesson:
-> Connection leaks are silent killers. Always use context managers or try-finally blocks.
-> Monitor pool metrics regularly; don't wait for crashes.

Ever lost a pool to leaks? Tell your story.

#Day21 #50DaysOfDebugging #Python #Database #ConnectionPooling #SoftwareEngineering #Debugging #BestPractices
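For DB-API drivers whose cursors aren't context managers themselves, contextlib.closing from the standard library gives the same guarantee; a runnable sketch using sqlite3 as a stand-in for the pooled database in the post:

```python
import sqlite3
from contextlib import closing

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")

# closing() calls cursor.close() on exit, even if the query raises
with closing(conn.cursor()) as cursor:
    cursor.execute("SELECT * FROM users")
    rows = cursor.fetchall()

print(rows)  # [(1, 'alice')]
conn.close()
```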
Hey everyone! As data professionals, we all know the drill: getting our hands on raw data is often just the beginning. The real magic happens when we transform those messy datasets into sparkling clean, analysis-ready gold. Python, with its incredible ecosystem, is my absolute superpower here.

Mastering a few key tricks can save you hours and make your data cleaning workflow not just efficient, but genuinely enjoyable. Think about leveraging Pandas' `apply()` with custom functions for complex transformations, or using powerful string methods (`.str.contains()`, `.str.replace()`) and regex for pattern matching and normalization. Even smart use of `fillna()` or `dropna()` with specific strategies can drastically improve data quality.

These aren't just lines of code; they're your secret weapons for taming even the wildest data.

#PythonForData #DataCleaning #DataAnalytics #Pandas #PythonTricks

What's your absolute favorite Python trick for turning a data mess into a masterpiece? Share your insights below!
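A small sketch of those weapons working together (the messy DataFrame is invented for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "name":  ["  Alice ", "BOB", None],
    "phone": ["(555) 123-4567", "555.987.6543", "n/a"],
})

# String methods + regex: normalize names, strip phone punctuation
df["name"] = df["name"].str.strip().str.title()
df["phone"] = df["phone"].str.replace(r"\D", "", regex=True)

# fillna with an explicit strategy instead of silently dropping rows
df["name"] = df["name"].fillna("unknown")

# apply() with a custom function for a check regex alone can't express
df["valid_phone"] = df["phone"].apply(lambda p: len(p) == 10)

print(df)
```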
The latest English Indices of Deprivation data was released last week! Link here -> https://lnkd.in/eud9sCHj

Like with a lot of official statistics, the data is often scattered across multiple files and sheets. I know that a lot of data analysts, and data folk in general, will be busy wrangling away and starting to analyse the data.

So... I wrote a super simple Python package to help speed up the process. Install the package. Run "iod load" in your terminal. Load the latest data into DuckDB in a minute or two and away you go!

Link to the Python package is in the comments :)
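Once the data is in DuckDB, exploring it takes a few lines. A sketch of what that might look like (the database filename and table name below are placeholders, not from the package; check its docs for the real ones):

```python
import duckdb

# Hypothetical: connect to whatever database file `iod load` created
con = duckdb.connect("iod.duckdb")

# List the tables the loader produced, then peek at one of them
print(con.sql("SHOW TABLES"))
con.sql("SELECT * FROM imd_scores LIMIT 5").show()  # table name is a guess
```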
Yes, even I ran into import errors while trying to dockerize my TensorFlow-FastAPI web app. The cause was some custom classes that couldn't be shared across the folder's modules. Adding __init__.py resolved it.