NOBOT - Status Update: Docker, ORM, and DB Design 🚀

Building in public means showing the process, even when it’s not "pixel-perfect". Here’s what’s happening with NOBOT:

1. Why PostgreSQL? 🐘
After a long brainstorming session (SQLite vs. one central hub), I decided to bet on PostgreSQL. With all the relations I have planned, Postgres just felt like the right, solid choice for the backbone of the system.

2. The Docker Dopamine Hit 🐳
There’s nothing like the satisfaction of seeing docker-compose up working for the first time. I just pushed the Dockerfile and docker-compose.yml to GitHub. Seeing the DB and data services running in containers gave me a huge smile – a small win for the infrastructure!

3. From Pydantic Contracts to ORM 🧹
I’m using Contract-Driven Development, so I started with Pydantic models to define how data should flow. Now I’m bridging that with an ORM. It lets me map my contracts directly to the database, keeps the code clean, and saves me from writing manual SQL modules.

4. Thinking Out Loud 🧠
The attached diagram is my current "brain dump." It’s changing every hour as I review my early contracts and refine the logic. Visualizing the schema helps me find edge cases before they turn into annoying bugs.

I’m curious — what are your favorite tools for database design? I’m using dbdiagram.io right now, but sometimes I still think about the chaotic energy of Microsoft Paint! 😂

#BuildInPublic #Python #PostgreSQL #Docker #AI #RPA #SoftwareEngineering #NOBOT #IDP
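The post describes mapping Pydantic contracts onto ORM models without showing code, so here is a minimal sketch of what that bridge could look like with SQLAlchemy. Every name in it (TaskContract, TaskRecord, the tasks table, the connection string) is a hypothetical illustration, not NOBOT's actual contracts.

```python
# Hypothetical sketch: a Pydantic "contract" mirrored by a SQLAlchemy model.
# TaskContract, TaskRecord, the tasks table, and the DSN are illustrative only.
from datetime import datetime

from pydantic import BaseModel
from sqlalchemy import Column, DateTime, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class TaskContract(BaseModel):
    """Contract: defines how task data flows through the app."""
    name: str
    status: str = "draft"


class TaskRecord(Base):
    """ORM model: the same shape, persisted to PostgreSQL."""
    __tablename__ = "tasks"
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)
    status = Column(String, default="draft")
    created_at = Column(DateTime, default=datetime.utcnow)


engine = create_engine("postgresql+psycopg2://user:pass@localhost:5432/nobot")  # placeholder DSN
Base.metadata.create_all(engine)

# The contract validates incoming data; the ORM persists it.
contract = TaskContract(name="classify invoices")
with Session(engine) as session:
    session.add(TaskRecord(**contract.model_dump()))
    session.commit()
```

The point of the pattern is that the Pydantic model stays the single source of truth for validation, while the ORM model mirrors it for persistence, so no hand-written SQL module sits in between.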
More Relevant Posts
A few weeks ago, I took on an end-to-end Docker project thinking it would be straightforward. I came out thinking completely differently about how software is actually built.

Here is what I built: 🐳 A multi-container data architecture, a Streamlit application talking to a PostgreSQL database, orchestrated seamlessly with Docker Compose.

The data flow looks like this:
→ A user uploads a CSV through the Streamlit UI.
→ SQLAlchemy processes and persists the data into PostgreSQL.
→ The database serves summary statistics back to the interface.
→ All of this runs in isolated containers that find each other by name, not by IP.

Three things I will never forget from this project:

1️⃣ Your Dockerfile order is a caching strategy, not a style choice. Code changes constantly, so COPY . . goes at the bottom. Dependencies change rarely, so pip install goes at the top. Flip that, and every tiny typo fix costs you a full 3-minute rebuild.

2️⃣ Volumes are what give databases a memory. Containers are ephemeral by design. Without a named volume mounted to /var/lib/postgresql/data, every docker compose down takes your data with it.

3️⃣ POSTGRES_HOST=app-db just works. Docker Compose creates an internal network and handles DNS resolution automatically. Your app never needs to know a container's dynamic IP address.

Shoutout to the Data Engineering Community for being part of this space that keeps pushing me to go deeper than just "making things work." That accountability matters.

If you are learning Docker or containerizing your first data pipeline, the repo is open. Clone it, break it, learn from it. Links to the code and my full architectural write-up are in the comments! 👇

#DataEngineering #Docker #Data #Python #Streamlit #PostgreSQL #LearningInPublic #SoftwareArchitecture #DevOps #BuildInPublic
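For point 3, the application side only needs the Compose service name. Here is a minimal sketch of what that connection could look like from Python: POSTGRES_HOST=app-db comes from the post, while the other environment variable names, the uploads table, and the pandas-based persistence are assumptions, not necessarily how the linked repo does it.

```python
# Minimal app-side sketch of the connection described in point 3. Only
# POSTGRES_HOST=app-db comes from the post; other names are assumptions.
import os

import pandas as pd
from sqlalchemy import create_engine

# Compose's internal DNS resolves the service name "app-db" to the DB container.
host = os.getenv("POSTGRES_HOST", "app-db")
user = os.getenv("POSTGRES_USER", "postgres")
password = os.getenv("POSTGRES_PASSWORD", "postgres")
database = os.getenv("POSTGRES_DB", "appdata")

engine = create_engine(f"postgresql+psycopg2://{user}:{password}@{host}:5432/{database}")


def persist_csv(path: str) -> int:
    """Load an uploaded CSV, append it to the uploads table, return the row count."""
    df = pd.read_csv(path)
    df.to_sql("uploads", engine, if_exists="append", index=False)
    return len(df)
```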
A few months ago, I barely knew what an ORM was. Today I'm designing relational databases from scratch and querying them with raw SQL like it's second nature. Here's what I've been building 👇

🛠️ The Project
A full data modelling and SQL project built in Python — designing schemas, seeding realistic test data, and running analytical queries against a live PostgreSQL database.

📐 The Stack
→ SQLAlchemy ORM to define clean, Pythonic relational models
→ PostgreSQL as the database engine (running locally via Docker)
→ pgcli for a smoother terminal querying experience with syntax highlighting and autocomplete
→ Claude Code inside VS Code as my AI pair programmer

🗂️ The Schema
I modelled four core entities and their relationships:
• Users → with emails, names, and timestamps
• Addresses → linked to users via foreign key, with a default flag
• Products → with categories, pricing, stock, and unique SKUs
• Orders → tying it all together

The thing nobody tells you about data engineering: the modelling decisions you make early ripple through everything downstream. Get the foreign keys wrong and your joins become a nightmare. I learned that the hard way — which is honestly the best way.

💡 Key Takeaways
→ SQLAlchemy keeps your schema readable and maintainable without writing raw DDL
→ pgcli makes working in the terminal genuinely enjoyable
→ Thinking carefully about entity relationships before writing a single line of code saves you hours of refactoring later
→ Seeding realistic synthetic data early forces you to stress-test your schema assumptions

📍 What's Next
Layering in complex analytical queries, exploring how this data model feeds into a broader pipeline, and eventually connecting it to a transformation layer with dbt.

Always building. The fundamentals matter more than the frameworks. 🚀

#DataEngineering #SQL #Python #SQLAlchemy #PostgreSQL #LearningInPublic #BuildInPublic #MachineLearning #DataScience #CareerJourney
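A minimal sketch of two of the four entities described above (Users and Addresses), assuming a SQLAlchemy declarative setup; column details beyond what the post lists are illustrative, not the project's actual schema.

```python
# Sketch of the User/Address relationship described above. Columns beyond the
# ones the post mentions (emails, names, timestamps, default flag) are assumed.
from datetime import datetime

from sqlalchemy import Boolean, Column, DateTime, ForeignKey, Integer, String
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()


class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    email = Column(String, unique=True, nullable=False)
    name = Column(String, nullable=False)
    created_at = Column(DateTime, default=datetime.utcnow)
    addresses = relationship("Address", back_populates="user")


class Address(Base):
    __tablename__ = "addresses"
    id = Column(Integer, primary_key=True)
    user_id = Column(Integer, ForeignKey("users.id"), nullable=False)  # the FK that keeps joins sane
    line1 = Column(String, nullable=False)
    is_default = Column(Boolean, default=False)  # the "default flag" from the post
    user = relationship("User", back_populates="addresses")
```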
🚀 What happens when you give your Relational Data a brain? You get the future of Enterprise Search. 🙄

Forget choosing between precise #SQL queries and powerful #RAG. #IBMBob (our favorite tech architect) just proved you can have both, and the results are game-changing.

🏗️ Bob engineered a hybrid architecture that integrates the structured world of PostgreSQL with the unstructured capabilities of a Retrieval-Augmented Generation (RAG) engine. The secret ingredient? Open source #Docling (by IBM Research).

Here’s how this integration stack works:
1️⃣ Docling acts as the ultimate document processor, extracting hierarchical information and rich semantics from your files that simple text extraction misses.
2️⃣ This structured information is then integrated directly into #PostgreSQL, creating a "Smart DB" that understands both numbers and narratives.
3️⃣ The RAG engine can now query both relational tables (SQL) and vector embeddings simultaneously to provide contextual, accurate answers.

The result: a system that can answer queries like: "Show me the top 5 candidates who live in New York (SQL) AND have deep experience in Python, as evidenced by their project summaries (RAG)."

⁉️ Intrigued? Read #Bob’s full implementation breakdown in my latest blog post! https://lnkd.in/ejnrSF5v
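The post doesn't show the query layer. Assuming the embeddings land in a pgvector column alongside the relational data, a combined SQL-plus-vector retrieval could look roughly like this sketch; the candidates table, its columns, and the connection string are hypothetical, not taken from the blog post.

```python
# Hypothetical sketch of a combined SQL + vector retrieval, assuming candidate
# embeddings live in a pgvector column; table, columns, and DSN are illustrative.
from sqlalchemy import create_engine, text

engine = create_engine("postgresql+psycopg2://user:pass@localhost:5432/smartdb")  # placeholder DSN


def top_candidates(query_embedding: list[float], city: str, limit: int = 5):
    # Relational filter (SQL) plus semantic ranking (pgvector cosine distance) in one statement.
    emb_literal = "[" + ",".join(str(x) for x in query_embedding) + "]"  # pgvector text format
    stmt = text(
        "SELECT name, city, summary "
        "FROM candidates "
        "WHERE city = :city "                                   # structured filter
        "ORDER BY summary_embedding <=> CAST(:emb AS vector) "  # semantic ranking
        "LIMIT :lim"
    )
    with engine.connect() as conn:
        return conn.execute(stmt, {"city": city, "emb": emb_literal, "lim": limit}).fetchall()
```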
AutoHub — Serve Layer Deep Dive

Everything before this — discovery, extraction, normalization, storage — exists so that a single API call can return clean, structured car data on demand. This is what the pipeline was building toward.

The catalog endpoint
GET /catalog is the core of the serve layer. One query with chained joinedload across the whole hierarchy — Brand, Model, Variant, Spec, plus Images — returns the full nested structure as a single JSON response. The joinedload chain is what makes this work cleanly: SQLAlchemy walks Brand → Model → Variant → Spec in one shot, no N+1, no separate calls per record. The response shape is defined entirely by the BrandNested Pydantic schema — same validation layer, same structure, every time.

Granular endpoints
/catalog/brands, /catalog/models, /catalog/variants, /catalog/specs, and /catalog/images exist as separate endpoints alongside the full catalog — for targeted reads and writes without pulling the entire hierarchy.

The auth split
All read endpoints are public. All write endpoints require JWT — Depends(get_current_user) on every POST, PUT, and DELETE. Consistent across catalog, news, and users. Login returns a unified 401 for both wrong email and wrong password — same response either way, no information leaked about which field failed.

Pipeline trigger
POST /pipeline/run kicks off the full automation as a background task via FastAPI's BackgroundTasks. The endpoint returns immediately, the pipeline runs async. JWT protected.

News
Separate router with full CRUD. Create with images in one request, update replaces the image set, delete with cascade. Same auth pattern throughout.

Six posts covering Discover, Download, Extract, Normalize, Store, and Serve. AutoHub's core pipeline is documented end to end — from finding a brochure URL to serving structured car data via REST API. Still on the roadmap: PostgreSQL migration, containerization, and cloud deployment. The image pipeline post series starts next.

GitHub: https://lnkd.in/gkhKP5Ww

#Python #FastAPI #SQLAlchemy #BackendDevelopment #AutoHub
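A rough sketch of the /catalog read path, trimmed to two levels (Brand → Model) to stay short; AutoHub's real models, schemas, and session handling will differ.

```python
# Sketch of the eager-loaded catalog read described above, simplified to two levels.
from pydantic import BaseModel, ConfigDict
from sqlalchemy import Column, ForeignKey, Integer, String
from sqlalchemy.orm import Session, declarative_base, joinedload, relationship

Base = declarative_base()


class Brand(Base):
    __tablename__ = "brands"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    models = relationship("Model", back_populates="brand")


class Model(Base):
    __tablename__ = "models"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    brand_id = Column(Integer, ForeignKey("brands.id"))
    brand = relationship("Brand", back_populates="models")


class ModelNested(BaseModel):
    model_config = ConfigDict(from_attributes=True)
    id: int
    name: str


class BrandNested(BaseModel):
    model_config = ConfigDict(from_attributes=True)
    id: int
    name: str
    models: list[ModelNested]


def get_catalog(session: Session) -> list[BrandNested]:
    # One query with an eager-loaded hierarchy: no N+1, no per-record round trips.
    brands = session.query(Brand).options(joinedload(Brand.models)).all()
    # The Pydantic schema defines the response shape, same validation layer as writes.
    return [BrandNested.model_validate(b) for b in brands]
```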
Most data analysts on my team spent more time writing SQL than actually analysing data. So I built a fix — without touching our existing Superset setup.

It's called a Text-to-SQL Sidecar: a standalone FastAPI microservice that sits alongside Apache Superset and turns plain English into validated, safe SQL. You ask: "which products had the highest return rate last quarter?" It generates, validates, and executes the SQL — then hands the results back.

A few things I was deliberate about:
→ AST-level SQL validation (not string matching, which is trivially bypassable)
→ Per-database table allowlists so the LLM can only touch what it's supposed to
→ Schema caching so we're not hammering the DB on every request
→ LLM-agnostic design — swap the endpoint URL, change the model
→ Reasoning traces returned alongside SQL so analysts can actually trust the output

Superset never needs to know it exists. It just receives SQL.

I wrote up the full implementation — architecture, code walkthrough, and the design decisions that make it production-ready. Link in the comments 👇

#DataEngineering #AI #SQL #FastAPI #ApacheSuperset #LLM #Python
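The post doesn't say which parser backs the AST-level validation; one way to do it is with sqlglot, checking the statement type and every referenced table against a per-database allowlist. Treat the allowlist contents and the function shape as assumptions rather than the sidecar's actual code.

```python
# One possible AST-level validator using sqlglot (an assumption, not necessarily
# the sidecar's parser). The allowlist contents are hypothetical.
import sqlglot
from sqlglot import exp

ALLOWED_TABLES = {"orders", "products", "returns"}  # hypothetical per-database allowlist


def validate_sql(sql: str, dialect: str = "postgres") -> None:
    """Raise ValueError unless the statement is a plain SELECT over allowed tables."""
    tree = sqlglot.parse_one(sql, read=dialect)

    # Reject anything that isn't a SELECT (no INSERT/UPDATE/DELETE/DDL).
    if not isinstance(tree, exp.Select):
        raise ValueError("only SELECT statements are allowed")

    # Walk the AST and check every referenced table against the allowlist.
    for table in tree.find_all(exp.Table):
        if table.name.lower() not in ALLOWED_TABLES:
            raise ValueError(f"table not allowed: {table.name}")


validate_sql("SELECT product_id, COUNT(*) FROM returns GROUP BY product_id")  # passes
# validate_sql("DELETE FROM orders")  # would raise ValueError
```

Working on the parse tree rather than the raw string is what makes the check robust: comments, odd whitespace, or obfuscated keywords don't fool it, because only the resolved statement type and table references matter.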
Do you know the difference between a static default and a dynamic callable in your ORM? It’s a small distinction in code that makes a massive difference in your database. 🚀

📍 Static Defaults
These are defined once when the model is initialized. Every new record gets the exact same value.
Use case: Setting a starting status (e.g., status='draft') or a counter starting at 0.

📍 Dynamic Defaults (Callables)
These are calculated at the moment the record is created. By passing a function (like a lambda or a method), the ORM executes that logic for every single insert.
Use case: Timestamps (datetime.now), UUIDs, or record-specific tokens.

⚠️ The Common Trap: One of the most frequent bugs is passing default=datetime.now() (with parentheses) instead of default=datetime.now.
With (): the time is captured once, when the model class is defined (typically at server start). Every record will have the same timestamp until you restart the service!
Without (): the ORM calls the function fresh for every new entry.

Check out the infographic below for a side-by-side comparison using SQLAlchemy examples!

#Python #ORM #SQLAlchemy #BackendDevelopment #CleanCode #SoftwareEngineering #Odoo #OdooDevelopment #DatabaseDesign #ProgrammingTips #WebDevelopment #BackendEngineering #PythonDev #CodingBestPractices #ERP #FullStackDeveloper
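The attached infographic isn't reproduced here, but a minimal SQLAlchemy sketch of the same distinction and the same trap looks like this (the table and column names are illustrative):

```python
# Static vs. dynamic (callable) column defaults in SQLAlchemy; names are illustrative.
from datetime import datetime

from sqlalchemy import Column, DateTime, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class Document(Base):
    __tablename__ = "documents"

    id = Column(Integer, primary_key=True)

    # Static default: every new record gets the same literal value.
    status = Column(String, default="draft")

    # Dynamic default: the callable runs fresh on every insert.
    created_at = Column(DateTime, default=datetime.now)

    # The trap: datetime.now() here is evaluated ONCE, when this class is defined,
    # so every record would share that single timestamp.
    # broken_at = Column(DateTime, default=datetime.now())
```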
In the last couple of days, while working on a RAG implementation, we realized accuracy isn’t just about better embeddings or larger models.

The real challenge we faced was context loss—situations where related information existed in the knowledge base but wasn’t retrieved together. This led to fragmented, inconsistent answers, even though the data was technically present.

That’s when we explored an alternative framework: LightRAG. By combining graph-based knowledge structures with vector search, LightRAG enables:
Deeper contextual understanding
Relationship-aware retrieval
Significantly more accurate and coherent responses

Why LightRAG stood out 👇
✅ Graph-aware indexing
✅ Dual-level retrieval (low-level details + high-level knowledge)
✅ Easy implementation using PostgreSQL with graph support
✅ Incremental updates for fast-changing data

For anyone struggling with context fragmentation in traditional RAG pipelines, LightRAG offers a compelling and practical approach.

Explore implementation details here: https://lnkd.in/dpbWmR8X

#RAG #LightRAG #GenAI #LLM #GraphDatabase #PostgreSQL #AIEngineering #KnowledgeGraphs
New project unlocked🔓

I just finished building a 𝗖𝘂𝘀𝘁𝗼𝗺𝗲𝗿 𝗟𝗶𝗳𝗲𝘁𝗶𝗺𝗲 𝗩𝗮𝗹𝘂𝗲 (𝗖𝗟𝗩) 𝗣𝗿𝗲𝗱𝗶𝗰𝘁𝗶𝗼𝗻 𝗦𝘆𝘀𝘁𝗲𝗺. The starting question: 𝘩𝘰𝘸 𝘮𝘶𝘤𝘩 𝘳𝘦𝘷𝘦𝘯𝘶𝘦 𝘸𝘪𝘭𝘭 𝘦𝘢𝘤𝘩 𝘤𝘶𝘴𝘵𝘰𝘮𝘦𝘳 𝘨𝘦𝘯𝘦𝘳𝘢𝘵𝘦 𝘰𝘷𝘦𝘳 𝘵𝘩𝘦𝘪𝘳 𝘭𝘪𝘧𝘦𝘵𝘪𝘮𝘦 𝘪𝘯 𝘰𝘶𝘳 𝘣𝘶𝘴𝘪𝘯𝘦𝘴𝘴?

Using the PostgreSQL DVD Rental dataset, I built an end-to-end pipeline:
- Designed an ETL pipeline that processes ~14,000 transactions from 9 tables into a customer-level OLAP star schema
- Engineered RFM-based features (Recency, Frequency, Monetary) for CLV modeling
- Trained and compared multiple ML models (Linear Regression, Random Forest, Gradient Boosting) using a chronological split and TimeSeriesSplit to avoid data leakage
- Deployed everything into an interactive Django web app with a prediction form and business recommendations
- The final model (Gradient Boosting) achieved strong performance, with R² close to 0.99 and low prediction error

One insight that came out of the analysis: customers who rent frequently, even at lower spend per transaction, often generate more lifetime value than occasional high spenders. Frequency matters more than monetary average!

One limitation is that the dataset is static (historical DVD rental data), so the model reflects past behavior patterns rather than real-time customer activity. Additionally, some features like recency and tenure showed very low importance, likely due to the limited time range of the dataset, but they were kept to ensure the model remains interpretable, aligned with business logic, and more generalizable to real-world scenarios beyond this dataset.

This project helped me understand how data engineering, machine learning, and business thinking come together in a real system, not just a model.

🖇️GitHub → https://lnkd.in/g4k7iQuy

Would love any feedback or thoughts!🖖🏻

#DataAnalytics #MachineLearning #Django #Python #PostgreSQL #PortfolioProject
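A minimal sketch of the RFM-plus-chronological-split idea described above; the column names, cutoff date, and CSV extract are assumptions rather than the project's actual code.

```python
# Illustrative sketch of RFM feature engineering with a chronological split.
# Column names, the cutoff date, and the rentals.csv extract are assumptions.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

transactions = pd.read_csv("rentals.csv", parse_dates=["payment_date"])  # hypothetical extract

cutoff = pd.Timestamp("2005-07-01")  # chronological split: features before, target after
history = transactions[transactions["payment_date"] < cutoff]
future = transactions[transactions["payment_date"] >= cutoff]

# RFM features per customer, computed only from the history window (no leakage).
rfm = history.groupby("customer_id").agg(
    recency_days=("payment_date", lambda s: (cutoff - s.max()).days),
    frequency=("payment_date", "count"),
    monetary_avg=("amount", "mean"),
)

# Target: revenue each customer generates after the cutoff (0 if they never return).
target = future.groupby("customer_id")["amount"].sum()
data = rfm.join(target.rename("future_revenue")).fillna({"future_revenue": 0.0})

model = GradientBoostingRegressor(random_state=42)
model.fit(data[["recency_days", "frequency", "monetary_avg"]], data["future_revenue"])
```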
🐛 #PythonJourney | Day 148 — Debugging SQLAlchemy Models & Type Compatibility

Today was about learning through debugging. I encountered multiple SQLAlchemy type compatibility issues and learned valuable lessons about database design.

Key accomplishments:

✅ Fixed critical SQLAlchemy issues:
• JSONB and INET are PostgreSQL-specific types
• Must import from sqlalchemy.dialects.postgresql
• Resolved naming conflicts (metadata is reserved)

✅ Solved type mismatches:
• User.id must be UUID(as_uuid=True)
• URL.user_id must match User.id type exactly
• Foreign key constraints require compatible types
• PostgreSQL is strict about type casting

✅ Debugged relationship definitions:
• back_populates must reference the matching attribute on the other class
• Cascade deletes prevent orphaned data
• Bidirectional relationships need proper naming

✅ Created test user script:
• Generates database tables automatically
• Creates sample user with API key
• Tests database connectivity

✅ All 5 SQLAlchemy models are now production-ready:
• User (authentication)
• URL (shortened URLs)
• Click (event tracking)
• ClickAggregate (analytics summaries)
• AuditLog (compliance)

What I learned today:
→ Database type safety is critical
→ PostgreSQL has its own type system (JSONB, UUID, INET)
→ SQLAlchemy type imports matter - core vs dialect-specific
→ Debugging error messages contain the actual problem - read them carefully
→ Foreign key constraints are strict about type compatibility

The lesson: Sometimes the best learning comes from fixing errors. Each error message was an opportunity to understand the framework better.

#Python #SQLAlchemy #PostgreSQL #DatabaseDesign #Backend #Debugging #SoftwareDevelopment #TechLearning
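A condensed sketch of the lessons above: dialect-specific imports, a UUID(as_uuid=True) primary key with a matching foreign key type, and a renamed metadata attribute. The real project has five models and more columns than shown here, so treat the details as illustrative.

```python
# Condensed sketch of the type-compatibility fixes described above (illustrative).
import uuid

from sqlalchemy import Column, DateTime, ForeignKey, String, func
from sqlalchemy.dialects.postgresql import INET, JSONB, UUID  # dialect-specific types
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()


class User(Base):
    __tablename__ = "users"
    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    api_key = Column(String, unique=True, nullable=False)
    urls = relationship("URL", back_populates="user", cascade="all, delete-orphan")


class URL(Base):
    __tablename__ = "urls"
    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    # The FK must use exactly the same type as User.id; PostgreSQL won't cast for you.
    user_id = Column(UUID(as_uuid=True), ForeignKey("users.id"), nullable=False)
    short_code = Column(String, unique=True, nullable=False)
    target_url = Column(String, nullable=False)
    # "metadata" is reserved on declarative models, so the attribute gets another name.
    meta = Column("metadata", JSONB, default=dict)
    created_from_ip = Column(INET)
    created_at = Column(DateTime(timezone=True), server_default=func.now())
    user = relationship("User", back_populates="urls")
```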
🚀 #PythonJourney | Day 151 — BREAKTHROUGH: API Fully Functional & First Successful Request

Today marks a major milestone: the URL Shortener API is LIVE and responding correctly! After 8 days of building and debugging, I finally got the first successful POST request working. This breakthrough moment proves that all the pieces fit together.

Key accomplishments:

✅ Fixed critical database type mismatch:
• PostgreSQL was storing user_id as VARCHAR
• SQLAlchemy was trying to query with UUID
• Solution: Dropped volumes, rebuilt schema from scratch

✅ Fixed Pydantic response validation:
• Model had clicks_total, database had total_clicks
• Docker image was caching old code
• Solution: Forced rebuild of container image

✅ First successful API call:
• POST /api/v1/urls now returns proper JSON
• Short code generated automatically
• URL stored in database correctly
• Full response validation passing

✅ Production-ready API endpoints confirmed:
• Authentication working (API key validation)
• Request validation (Pydantic models)
• Database operations (CRUD)
• Error handling (proper HTTP status codes)
• Response serialization (JSON output)

✅ Lessons learned about debugging:
• Always check the actual container logs
• Volume management is critical in Docker
• Type consistency across layers matters
• Docker caching can hide recent changes
• Patience and persistence beat quick fixes

What happened today:
→ Identified the root cause through careful log analysis
→ Understood the full request/response cycle
→ Learned when to reset vs. when to patch
→ Experienced the joy of a working API!

The API now successfully:
- Validates user authentication
- Creates shortened URLs with unique codes
- Stores data in PostgreSQL
- Returns properly formatted JSON responses
- Handles errors gracefully

This is what backend development is about: building reliable systems piece by piece, debugging methodically, and celebrating when it finally works.

Status update:
- ✅ Backend: FUNCTIONAL
- ✅ Database: WORKING
- ✅ API Endpoints: RESPONDING
- ✅ Authentication: VERIFIED
- ⏳ Full test suite: Next
- ⏳ Deployment: Next week

#Python #FastAPI #Backend #API #PostgreSQL #Docker #Debugging #SoftwareDevelopment #Victory #CodingJourney
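A sketch of the response-model side of that Pydantic fix: the field name has to match what the ORM object actually exposes, and from_attributes lets FastAPI serialize the SQLAlchemy instance directly. The field names below are illustrative, not the project's exact schema.

```python
# Sketch of a response schema aligned with the ORM column names (illustrative).
from datetime import datetime
from uuid import UUID

from pydantic import BaseModel, ConfigDict


class URLResponse(BaseModel):
    # from_attributes lets FastAPI build the response straight from the ORM object.
    model_config = ConfigDict(from_attributes=True)

    id: UUID
    short_code: str
    target_url: str
    total_clicks: int = 0   # was clicks_total; mismatched names fail response validation
    created_at: datetime
```

When a container keeps serving stale code, running docker compose build --no-cache followed by docker compose up (or simply docker compose up --build) is one way to force the image rebuild described above.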