Moving my Python projects from local scripts to the cloud.

I recently finished a weather data project that helped me bridge the gap between writing code and deploying a functional system. Instead of just running a script on my PC, I wanted to see if I could build a pipeline that manages itself. The project tracks weather trends in Greece, and here is how I put it together:

The Pipeline: I used Python and Pandas to fetch and clean data from the Visual Crossing API.
The Database: To keep the data persistent, I used a cloud-hosted PostgreSQL database. I had to make the load logic idempotent so the script wouldn't create duplicates during daily runs (see the upsert sketch after this post).
The Automation: I used GitHub Actions to schedule the ETL process. It now runs automatically every morning without me needing to touch it.
The Environment: I wrapped the whole thing in Docker to make sure it works exactly the same in the cloud as it does on my machine.
The UI: I built a simple Streamlit dashboard to visualize the results.

The most challenging part wasn't the code itself but the "plumbing": handling secrets securely, setting up CI/CD workflows, and troubleshooting environment mismatches.

Live Dashboard: https://lnkd.in/d6GK6DZb
GitHub Repo: https://lnkd.in/dvzBrqgM

#DataEngineering #Python #Docker #Automation #LearningInPublic
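A minimal sketch of the idempotent daily load described above, assuming a PostgreSQL table keyed by city and observation date; the table, column names, and environment variable are illustrative, not taken from the linked repo.

```python
# Illustrative only: an ON CONFLICT upsert keyed by (city, obs_date) so the
# scheduled GitHub Actions run can re-execute safely without duplicating rows.
import os

import pandas as pd
from sqlalchemy import create_engine, text

engine = create_engine(os.environ["DATABASE_URL"])  # secret injected by the CI runner

def upsert_daily_weather(df: pd.DataFrame) -> None:
    """Insert one row per (city, obs_date); re-runs update in place instead of duplicating."""
    stmt = text("""
        INSERT INTO daily_weather (city, obs_date, temp_max, temp_min, precip_mm)
        VALUES (:city, :obs_date, :temp_max, :temp_min, :precip_mm)
        ON CONFLICT (city, obs_date) DO UPDATE SET
            temp_max  = EXCLUDED.temp_max,
            temp_min  = EXCLUDED.temp_min,
            precip_mm = EXCLUDED.precip_mm
    """)
    with engine.begin() as conn:
        # Passing a list of dicts runs an executemany over all fetched rows.
        conn.execute(stmt, df.to_dict(orient="records"))
```

The unique constraint on (city, obs_date) is what makes the re-run safe: the second execution of the same day updates existing rows rather than appending copies.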
Alex Mallis’ Post
More Relevant Posts
🚀 Built an End-to-End Data Pipeline using API, Python & SQL Server!

Excited to share a hands-on project where I implemented a complete data pipeline across two systems 💻

🔹 Project Overview:
✔ Extracted data from PostgreSQL (Laptop 1)
✔ Exposed data via Django API (JSON format)
✔ Accessed API from another machine (Laptop 2)
✔ Converted JSON → CSV using Python (pandas)
✔ Dynamically created table (no manual schema!)
✔ Loaded data into SQL Server using pyodbc

🔹 Architecture:
PostgreSQL → Django API → JSON → Python → CSV → SQL Server

🔹 Key Learnings:
💡 API as a bridge between systems
💡 Handling JSON data in real-world scenarios
💡 Automating schema creation
💡 Cross-machine data transfer
💡 Building end-to-end ETL pipelines

This project gave me practical exposure to how modern data pipelines work in real-world data engineering 🚀 Looking forward to building more scalable and production-ready pipelines!

#DataEngineering #Python #SQLServer #FastAPI #Django #ETL #DataPipeline #APIs #LearningInPublic #100DaysOfCode
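A rough sketch of the JSON → CSV → SQL Server leg of that pipeline, assuming the Django endpoint returns a flat JSON list; the URL, table name, and connection string below are placeholders, not the author's actual values.

```python
# Hypothetical implementation of: API JSON -> pandas -> CSV -> dynamic table -> pyodbc load.
import pandas as pd
import pyodbc
import requests

API_URL = "http://192.168.1.10:8000/api/records/"   # placeholder Django endpoint

def load_api_to_sql_server() -> None:
    # 1. Pull JSON from the API and flatten it into a DataFrame; keep a CSV copy.
    rows = requests.get(API_URL, timeout=30).json()
    df = pd.json_normalize(rows)
    df.to_csv("records.csv", index=False)

    # 2. Derive a simple schema from the DataFrame columns (all NVARCHAR for brevity).
    cols = ", ".join(f"[{c}] NVARCHAR(255)" for c in df.columns)

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;"
        "DATABASE=staging;Trusted_Connection=yes;"
    )
    cur = conn.cursor()
    cur.execute(f"IF OBJECT_ID('api_records', 'U') IS NULL CREATE TABLE api_records ({cols})")

    # 3. Bulk insert; fast_executemany speeds up large batches considerably.
    placeholders = ", ".join("?" for _ in df.columns)
    cur.fast_executemany = True
    cur.executemany(
        f"INSERT INTO api_records VALUES ({placeholders})",
        df.astype(str).values.tolist(),
    )
    conn.commit()
    conn.close()
```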
🚀 𝗤𝘂𝗶𝘇 𝗔𝗽𝗽𝗹𝗶𝗰𝗮𝘁𝗶𝗼𝗻 𝗕𝗮𝗰𝗸𝗲𝗻𝗱 𝗔𝗣𝗜 – 𝗕𝘂𝗶𝗹𝘁 𝘄𝗶𝘁𝗵 𝗙𝗮𝘀𝘁𝗔𝗣𝗜

I recently built a backend system for a Quiz Application using modern Python backend technologies.

🔧 𝗧𝗲𝗰𝗵 𝗦𝘁𝗮𝗰𝗸:
• FastAPI (High-performance API framework)
• SQLAlchemy (ORM for database management)
• PostgreSQL (Relational database)
• Pydantic (Data validation & schema handling)

📌 𝗞𝗲𝘆 𝗙𝗲𝗮𝘁𝘂𝗿𝗲𝘀:
• RESTful API endpoints for questions and choices
• One-to-many relationship between Questions and Choices
• Secure database session handling with dependency injection
• Proper request validation using Pydantic models
• Clean and scalable backend architecture

🔗 𝗔𝗣𝗜 𝗘𝗻𝗱𝗽𝗼𝗶𝗻𝘁𝘀:
• GET /questions/{question_id} → Fetch a specific question
• GET /choices/{question_id} → Fetch all choices for a question
• POST /questions → Create a question with multiple choices

🧠 𝗪𝗵𝗮𝘁 𝗜 𝗟𝗲𝗮𝗿𝗻𝗲𝗱:
• How FastAPI handles async backend development efficiently
• Working with SQLAlchemy ORM for relational data modeling
• Designing clean backend architecture with separation of concerns
• Implementing database relationships and migrations logic

💻 𝗚𝗶𝘁𝗛𝘂𝗯 𝗥𝗲𝗽𝗼𝘀𝗶𝘁𝗼𝗿𝘆:
👉 https://lnkd.in/dHJczetV

This project helped me strengthen my understanding of backend development, API design, and database integration.

#FastAPI #Python #BackendDevelopment #APIs #SQLAlchemy #PostgreSQL #SoftwareEngineering #LearningByBuilding
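As a rough illustration (not the repo's actual code), here is how the one-to-many Question/Choice mapping and a dependency-injected session might look with FastAPI and SQLAlchemy; module layout, names, and the connection string are assumptions.

```python
# Minimal sketch: SQLAlchemy 1.4+ models with a one-to-many relationship,
# plus one FastAPI endpoint using a per-request session dependency.
from fastapi import Depends, FastAPI, HTTPException
from sqlalchemy import Boolean, Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship, sessionmaker

Base = declarative_base()
engine = create_engine("postgresql+psycopg2://user:pass@localhost/quiz")  # placeholder DSN
SessionLocal = sessionmaker(bind=engine)

class Question(Base):
    __tablename__ = "questions"
    id = Column(Integer, primary_key=True, index=True)
    question_text = Column(String, nullable=False)
    choices = relationship("Choice", back_populates="question")

class Choice(Base):
    __tablename__ = "choices"
    id = Column(Integer, primary_key=True, index=True)
    choice_text = Column(String, nullable=False)
    is_correct = Column(Boolean, default=False)
    question_id = Column(Integer, ForeignKey("questions.id"))
    question = relationship("Question", back_populates="choices")

app = FastAPI()

def get_db():
    db = SessionLocal()          # one session per request, closed afterwards
    try:
        yield db
    finally:
        db.close()

@app.get("/questions/{question_id}")
def read_question(question_id: int, db: Session = Depends(get_db)):
    q = db.get(Question, question_id)
    if q is None:
        raise HTTPException(status_code=404, detail="Question not found")
    return {"id": q.id, "question_text": q.question_text}
```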
🚀 pytest-capquery 0.3 is live!

This release was heavily focused on the Developer Experience (DX). We've officially introduced automated SQL snapshot testing, heavily inspired by the Jest framework. Instead of manually hardcoding and maintaining massive SQL strings in your Python tests, you can now seamlessly generate and validate physical .sql execution baselines with zero friction.

To dive deeper into the "why," I've just published a new article breaking down the reality of database performance in production. The post covers:
- 🚨 A painfully familiar SRE late-night "novel"
- 🏢 The cultural divide between Developers and DBAs
- 🛡️ Common architectural pitfalls (like the Python GC trap and the JOIN illusion)
- 💡 How pytest-capquery bridges the gap, complete with a Getting Started guide

You can read the full breakdown here: https://lnkd.in/dJzBQ8nV

If you care about engineering excellence, catching N+1 regressions in CI, and building robust backend systems, I invite you to check out the repository! Follow the project, drop a star, or open a PR. Together we can do more! 🤝

🔗 https://lnkd.in/d9EJgd8V

#Python #SQLAlchemy #Pytest #SRE #EngineeringExcellence #OpenSource #DatabasePerformance
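This is not pytest-capquery's API; it's only a bare-bones illustration of the underlying idea the post describes: record every SQL statement a test emits and assert against a baseline, so an N+1 regression fails in CI instead of in production.

```python
# Generic pytest + SQLAlchemy sketch of statement capture (not the library's code).
import pytest
from sqlalchemy import create_engine, event, text

@pytest.fixture
def captured_sql():
    engine = create_engine("sqlite:///:memory:")
    statements = []

    # Record every statement the engine sends to the database.
    @event.listens_for(engine, "before_cursor_execute")
    def record(conn, cursor, statement, parameters, context, executemany):
        statements.append(statement)

    yield engine, statements

def test_single_query_not_n_plus_one(captured_sql):
    engine, statements = captured_sql
    with engine.connect() as conn:
        conn.execute(text("SELECT 1"))
    # Snapshot-style assertion: the baseline for this code path is one statement.
    assert len(statements) == 1
```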
🚀 Just finished a hands-on Data Engineering workshop by DataTalksClub led by Alexey Grigorev and built a data ingestion pipeline using Docker.

I've worked with data pipelines before, but this was my first time containerizing the workflow and connecting everything end-to-end.

Here's what I built:
- Ingested NYC taxi data using Python (pandas + SQLAlchemy)
- Loaded the data into a PostgreSQL database
- Used Docker to run Postgres and pgAdmin
- Connected services using Docker networks

💡 One thing I explored beyond the workshop: I integrated the ingestion pipeline into the Docker Compose setup, so the database, UI, and ingestion job can run together as a single system. This made the setup cleaner and closer to how real-world data pipelines are structured.

Key things I learned:
- Difference between running locally vs inside containers
- How Docker networking works (localhost vs container names)
- Why volumes are important for data persistence
- How to build reproducible environments with Docker

The project is available on my GitHub (linked in my profile) — open to feedback and suggestions!

#DataEngineering #Docker #Python #PostgreSQL
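A condensed sketch of what that ingestion step typically looks like; the credentials and the "pgdatabase" service name are assumptions based on common compose setups, not necessarily this repo's values.

```python
# Chunked CSV ingestion into Postgres with pandas + SQLAlchemy.
import pandas as pd
from sqlalchemy import create_engine

# Inside the compose network, the host is the service name, not localhost.
engine = create_engine("postgresql://root:root@pgdatabase:5432/ny_taxi")

def ingest(csv_path: str, table: str = "yellow_taxi_data") -> None:
    # Stream the file in chunks so a multi-GB CSV never has to fit in memory.
    for i, chunk in enumerate(pd.read_csv(csv_path, chunksize=100_000)):
        chunk.to_sql(table, engine,
                     if_exists="replace" if i == 0 else "append",
                     index=False)
        print(f"inserted chunk {i}")
```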
Designed and implemented a modular ETL pipeline in Python to extract data from a REST API, transform and normalize JSON structures, and load processed data into PostgreSQL using SQLAlchemy. Focused on clean separation of pipeline stages and scalable architecture. Tech: Python, Pandas, SQLAlchemy, PostgreSQL. Link => https://lnkd.in/dWNjvx9n
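A minimal sketch of that stage separation, assuming a public JSON endpoint and placeholder connection details (none of these names are from the linked repo).

```python
# Extract -> transform -> load split, kept as small independent functions.
import pandas as pd
import requests
from sqlalchemy import create_engine

def extract(url: str) -> list[dict]:
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    return resp.json()

def transform(records: list[dict]) -> pd.DataFrame:
    # Flatten nested JSON into columns such as "address.city".
    return pd.json_normalize(records).drop_duplicates()

def load(df: pd.DataFrame, table: str, dsn: str) -> None:
    engine = create_engine(dsn)
    df.to_sql(table, engine, if_exists="append", index=False)

if __name__ == "__main__":
    df = transform(extract("https://jsonplaceholder.typicode.com/users"))
    load(df, "users_raw", "postgresql+psycopg2://user:pass@localhost:5432/etl")
```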
I use Apache Airflow every day. Here's what it actually does. 🔧⚡

When I first heard "workflow orchestration," I had absolutely no idea what that meant. 😅
Now? It's the backbone of every pipeline I build. 💪

So what is Apache Airflow? 🤔
Airflow is a platform that lets you schedule, monitor, and manage data pipelines written entirely in Python 🐍

At its core, everything in Airflow is a DAG (Directed Acyclic Graph) 🔁
A DAG is just a fancy way of saying: "run these tasks, in this order, on this schedule."

A simple real-world example: 📦
➡️ Extract data from an API
➡️ Transform it with Python
➡️ Load it into Snowflake ❄️
➡️ Trigger an alert if anything fails 🚨

Airflow handles ALL of that automatically, every day, while you sleep 😴✨

3 things I love about it: ❤️
✅ Tasks are just Python functions (no new language to learn!)
✅ The UI makes it easy to see what ran, what failed, and why
✅ Retry logic and alerting are built right in

Is it perfect? No 😬
But for orchestrating complex pipelines, nothing has come close for me.

If you're getting into data engineering — Airflow is worth learning early. 🚀

What tool do you use for pipeline orchestration? Drop it below 👇😊

#DataEngineering #ApacheAirflow #DataPipelines #ETL #LearningInPublic #Python
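A toy DAG in Airflow's TaskFlow style (Airflow 2.4+ parameter names assumed) showing the extract → transform → load shape from the example above; the endpoint and task bodies are stand-ins, and the warehouse load is stubbed with a print.

```python
# Illustrative daily DAG: extract -> transform -> load, with built-in retries.
from datetime import datetime

from airflow.decorators import dag, task

@dag(
    schedule="@daily",                     # run every morning
    start_date=datetime(2024, 1, 1),
    catchup=False,
    default_args={"retries": 2},           # retry logic comes for free
)
def api_to_warehouse():

    @task
    def extract() -> list[dict]:
        import requests
        return requests.get("https://api.example.com/data", timeout=30).json()

    @task
    def transform(rows: list[dict]) -> list[dict]:
        return [r for r in rows if r]      # keep only non-empty records

    @task
    def load(rows: list[dict]) -> None:
        print(f"would load {len(rows)} rows into the warehouse")

    load(transform(extract()))

api_to_warehouse()
```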
A few weeks ago, I took on an end-to-end Docker project thinking it would be straightforward. I came out thinking completely differently about how software is actually built.

Here is what I built: 🐳 A multi-container data architecture, a Streamlit application talking to a PostgreSQL database, orchestrated seamlessly with Docker Compose.

The data flow looks like this:
→ A user uploads a CSV through the Streamlit UI.
→ SQLAlchemy processes and persists the data into PostgreSQL.
→ The database serves summary statistics back to the interface.
→ All of this runs in isolated containers that find each other by name, not by IP.

Three things I will never forget from this project:

1️⃣ Your Dockerfile order is a caching strategy, not a style choice. Code changes constantly - so COPY . . goes at the bottom. Dependencies change rarely, so pip install goes at the top. Flip that, and every tiny typo fix costs you a full 3-minute rebuild.

2️⃣ Volumes are what give databases a memory. Containers are ephemeral by design. Without a named volume mounted to /var/lib/postgresql/data, every docker compose down takes your data with it.

3️⃣ POSTGRES_HOST=app-db just works. Docker Compose creates an internal network and handles DNS resolution automatically. Your app never needs to know a container's dynamic IP address.

Shoutout to the Data Engineering Community for being part of this space that keeps pushing me to go deeper than just "making things work." That accountability matters.

If you are learning Docker or containerizing your first data pipeline, the repo is open. Clone it, break it, learn from it. Links to the code and my full architectural write-up are in the comments! 👇

#DataEngineering #Docker #Data #Python #Streamlit #PostgreSQL #LearningInPublic #SoftwareArchitecture #DevOps #BuildInPublic
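A small sketch of point 3️⃣ above: the Streamlit app resolves Postgres by its Compose service name through environment variables rather than by IP. The env var names, defaults, and table name are illustrative, not necessarily the repo's.

```python
# Streamlit app connecting to the "app-db" compose service by name.
import os

import pandas as pd
import streamlit as st
from sqlalchemy import create_engine

host = os.getenv("POSTGRES_HOST", "app-db")   # compose service name, resolved by Docker DNS
engine = create_engine(
    f"postgresql+psycopg2://{os.getenv('POSTGRES_USER', 'app')}:"
    f"{os.getenv('POSTGRES_PASSWORD', 'app')}@{host}:5432/"
    f"{os.getenv('POSTGRES_DB', 'app')}"
)

st.title("CSV summary")
uploaded = st.file_uploader("Upload a CSV")
if uploaded is not None:
    df = pd.read_csv(uploaded)
    df.to_sql("uploads", engine, if_exists="append", index=False)  # persist to Postgres
    st.dataframe(df.describe())                                    # summary stats back to the UI
```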
# NeoSQLite is now 3x faster than MongoDB on the same hardware

But the real story isn't just the numbers—it's how we got here.

## From Python Fallbacks to SQL-Native: A 12-Month Journey

When we started building NeoSQLite, we took a "get it working first" approach. Complex aggregation operations like `$in`, `$nin`, `$elemMatch`, and `$project` were handled by Python fallbacks—meaning we'd fetch ALL documents from SQLite, then filter them in Python. It worked, but it was slow.

Then we started dogfooding with **Neo-Bloggy** (our blogging platform that runs entirely on NeoSQLite instead of MongoDB). Production usage revealed the pain points real users would face.

## The SQL-Tier Revolution (v1.14.x series)

Over the last 6 releases, we systematically moved operations from Python into native SQL:

**v1.14.0** — Moved `$project` stage to SQL-tier (no more loading full documents just to project 2 fields)

**v1.14.9-10** — Fixed `$elemMatch` and `$in`/`$nin` on array fields. Instead of returning 0 results or unfiltered documents, they now use proper SQL CTE patterns with `json_each()`

**v1.14.11** — Added native regex operators (`$regexMatch`, `$regexFind`) directly in SQL tier using custom SQLite functions. Array operators got **10-100x speedup** with CTE patterns

**v1.14.12** — Fixed the "malformed JSON" edge case (because even SQLite has its quirks with `json_each()` syntax!)

## The NX-27017 Milestone

In v1.13.0, we shipped something unexpected—a **MongoDB Wire Protocol Server** that lets PyMongo connect directly to SQLite. No code changes needed. This isn't just an API clone; it's full wire protocol compatibility with 100% test parity against real MongoDB.

## What This Means

- **3x faster** than MongoDB for typical operations
- **30-300x faster** for index operations (SQLite's B-trees are fast)
- **Zero network overhead** — embedded database, embedded performance
- **Drop-in replacement** — existing PyMongo code works unchanged

## The Lesson

Building a database isn't about getting the API right. It's about getting the execution model right. Every time we pushed logic from Python down to SQL, we got closer to SQLite's raw performance while maintaining MongoDB's developer experience.

The 3x number isn't theoretical—it's measured against a real MongoDB instance in our CI pipeline, running 54 different operation categories across 10 iterations each.

Want to try it?

```bash
pip install neosqlite
```

Or check out the discussion: https://lnkd.in/gAdPAeCc
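Not NeoSQLite's internal code, but a tiny stand-alone demo of the general technique the post describes: pushing an `$in`-style filter on an array field into SQLite with `json_each()` instead of fetching every document and filtering in Python (assumes an SQLite build with the JSON functions, which modern Python ships by default).

```python
# Evaluate the equivalent of {"tags": {"$in": ["python"]}} entirely in SQL.
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, data TEXT)")
conn.executemany(
    "INSERT INTO docs (data) VALUES (?)",
    [(json.dumps({"name": n, "tags": t}),)
     for n, t in [("a", ["python", "sqlite"]), ("b", ["mongo"]), ("c", ["python"])]],
)

rows = conn.execute("""
    SELECT DISTINCT d.id, d.data
    FROM docs AS d, json_each(d.data, '$.tags') AS t
    WHERE t.value IN ('python')
""").fetchall()

for _id, data in rows:
    print(_id, json.loads(data)["name"])   # -> a and c match, b does not
```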
Sail 0.6 is out. Three new surfaces, all Arrow-native:

- Arrow UDFs from Spark 4. Python functions decorated with @arrow_udf run against Arrow data directly. Because Sail executes Python Arrow UDFs inline within the same Rust process, Python code runs at native speed with zero-copy data transfer, avoiding the separate-process overhead inherent in Spark's architecture.

- Variant type in SQL. Parse JSON into a variant with parse_json, then query it with variant_get and path expressions. Lookups run against binary data instead of re-parsing strings.

- Arrow Flight SQL server on the wire. The first alternative protocol Sail supports beyond Spark Connect. Start a Flight SQL server powered by Sail and connect from any Flight SQL client to query it directly.

Read the full post: https://lnkd.in/gdAJ28gw
🚀 #PythonJourney | Day 151 — BREAKTHROUGH: API Fully Functional & First Successful Request

Today marks a major milestone: **the URL Shortener API is LIVE and responding correctly!**

After 8 days of building and debugging, I finally got the first successful POST request working. This breakthrough moment proves that all the pieces fit together.

Key accomplishments:

✅ Fixed critical database type mismatch:
• PostgreSQL was storing user_id as VARCHAR
• SQLAlchemy was trying to query with UUID
• Solution: Dropped volumes, rebuilt schema from scratch

✅ Fixed Pydantic response validation:
• Model had clicks_total, database had total_clicks
• Docker image was caching old code
• Solution: Forced rebuild of container image

✅ First successful API call:
• POST /api/v1/urls now returns proper JSON
• Short code generated automatically
• URL stored in database correctly
• Full response validation passing

✅ Production-ready API endpoints confirmed:
• Authentication working (API key validation)
• Request validation (Pydantic models)
• Database operations (CRUD)
• Error handling (proper HTTP status codes)
• Response serialization (JSON output)

✅ Lessons learned about debugging:
• Always check the actual container logs
• Volume management is critical in Docker
• Type consistency across layers matters
• Docker caching can hide recent changes
• Patience and persistence beat quick fixes

What happened today:
→ Identified the root cause through careful log analysis
→ Understood the full request/response cycle
→ Learned when to reset vs. when to patch
→ Experienced the joy of a working API!

The API now successfully:
- Validates user authentication
- Creates shortened URLs with unique codes
- Stores data in PostgreSQL
- Returns properly formatted JSON responses
- Handles errors gracefully

This is what backend development is about: building reliable systems piece by piece, debugging methodically, and celebrating when it finally works.

Status update:
- ✅ Backend: FUNCTIONAL
- ✅ Database: WORKING
- ✅ API Endpoints: RESPONDING
- ✅ Authentication: VERIFIED
- ⏳ Full test suite: Next
- ⏳ Deployment: Next week

#Python #FastAPI #Backend #API #PostgreSQL #Docker #Debugging #SoftwareDevelopment #Victory #CodingJourney
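The repo's code isn't shown in the post, but a hedged sketch of the two consistency lessons above (UUID typing end to end, and keeping ORM and response-model field names aligned) could look like this; all names are illustrative.

```python
# Keeping the database column, ORM model, and Pydantic v2 response model consistent.
import uuid

from pydantic import BaseModel, ConfigDict
from sqlalchemy import Column, Integer, String
from sqlalchemy.dialects.postgresql import UUID
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class ShortUrl(Base):
    __tablename__ = "urls"
    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    user_id = Column(UUID(as_uuid=True), nullable=False)   # UUID in the DB, not VARCHAR
    short_code = Column(String(12), unique=True, nullable=False)
    total_clicks = Column(Integer, default=0)

class ShortUrlOut(BaseModel):
    # from_attributes lets FastAPI serialize the ORM object directly.
    model_config = ConfigDict(from_attributes=True)
    id: uuid.UUID
    short_code: str
    total_clicks: int        # same name as the ORM attribute, so validation can't mismatch
```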