Q1-3 2025 Job Listings API and Analysis Project

Hey everyone,

Today I would like to share a programming project I recently completed. It takes a dataset of Q1-3 2025 job listings, provided by Workforce Opportunities & Residency Cayman in the Cayman Islands, and ingests it into a PostgreSQL database, which is then served via a FastAPI REST API. The API is demoed in a Jupyter notebook that requests data from the API over HTTP and performs data analysis on it.

The project follows the pyproject template and focuses on a clear separation of concerns, with dependency injection achieved by decoupling the creation and usage of the various components. Logging is implemented throughout the program to provide clear visibility into runtime behavior, facilitating debugging and issue diagnosis.

Database Layer
Data is normalized into a central fact table with dimension tables to reduce redundancy, then ingested via a staging table using transactions and procedural statements to ensure data integrity before being committed. A view serves the fact table, abstracting SQL logic from the API layer, with indices to reduce join complexity and speed up query-parameter-filtered searches.

API Layer
An asynchronous API built with FastAPI and psycopg serves data from the PostgreSQL database. An asynchronous connection pool, opened for the API's lifespan, reduces the startup cost of initiating new connections. The main route is '/jobs', which exposes the view created at the SQL layer. Query parameters can be passed on all routes for filtered searches. The routes have been integration tested using pytest and the FastAPI TestClient to verify that data is correctly served from the database and that errors are correctly caught.

Data Analysis Layer
Analysis was initially performed directly in the database at the SQL layer using CTEs and analytic (window) functions. To demo how an end user could utilize the API, data analysis is also performed in a Jupyter notebook, where data is fetched from the API using the requests library over HTTP. Data is handled using pandas, modeled using scikit-learn, and visualized using matplotlib.

*All sensitive data is hashed using hashlib.

Please visit the GitHub repository to view the code and documentation, as well as the video demonstration I made below. If you have any feedback, or would like to chat with me directly in more detail about my project, please leave a comment or send me a message. Thanks!

github: https://lnkd.in/ex8jnnur
https://lnkd.in/e7D7ip-d
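For readers curious what the API layer described above might look like in code, here is a minimal sketch of an async FastAPI app with a psycopg connection pool opened for the app's lifespan. The DSN, the view name jobs_view, and the category column are illustrative assumptions, not the project's actual schema.

```python
# Minimal sketch, not the project's actual code: the view/column names and the
# DSN below are assumptions for illustration.
from contextlib import asynccontextmanager

from fastapi import FastAPI
from psycopg.rows import dict_row
from psycopg_pool import AsyncConnectionPool

DSN = "postgresql://user:password@localhost:5432/jobs"  # placeholder connection string

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Open the pool once at startup so requests reuse warm connections.
    async with AsyncConnectionPool(DSN, kwargs={"row_factory": dict_row}) as pool:
        app.state.pool = pool
        yield

app = FastAPI(lifespan=lifespan)

@app.get("/jobs")
async def list_jobs(category: str | None = None, limit: int = 50):
    # Build the filter from optional query parameters; the view hides the joins.
    query = "SELECT * FROM jobs_view"
    params: list = []
    if category is not None:
        query += " WHERE category = %s"
        params.append(category)
    query += " LIMIT %s"
    params.append(limit)
    async with app.state.pool.connection() as conn:
        cur = await conn.execute(query, params)
        return await cur.fetchall()
```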
More Relevant Posts
Day 10/30 — Social Network Analyzer (Python + MySQL)

🔹 Project Overview:
Developed a Social Network Analyzer system using Python and MySQL to model user relationships, analyze connections, and recommend new links using graph-based algorithms.

🔹 Tools Used: Python | MySQL | Data Structures | Graph Algorithms | NetworkX | Matplotlib

🔹 Key Features:
• Designed relational database to manage users and connections
• Built graph structure to represent real-world relationships
• Implemented BFS to find shortest connection paths
• Identified mutual connections between users
• Developed recommendation engine based on shared connections
• Added network visualization for interactive analysis
• Created CLI-based interface with clean and colored output

🔹 What I Learned:
• Applying graph algorithms in real-world scenarios
• Working with MySQL for structured data management
• Building scalable backend logic using Python
• Visualizing relationships using network graphs
• Designing modular and maintainable code

🔗 GitHub Repository: https://lnkd.in/dpSCzhQG

Would appreciate your feedback and suggestions 🙌

#30DaysOfCoding #PythonProjects #SQL #DataStructures #BackendDevelopment #LearningByDoing
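To give a feel for the graph logic this post describes, here is a small hedged sketch using NetworkX. The user names, edges, and the friends-of-friends heuristic are illustrative assumptions; the actual repository may implement these differently.

```python
import networkx as nx

# Toy friendship graph; in the project these edges would come from MySQL.
G = nx.Graph()
G.add_edges_from([
    ("alice", "bob"), ("bob", "carol"), ("alice", "dave"),
    ("dave", "carol"), ("carol", "erin"),
])

# Shortest connection path (BFS, since the graph is unweighted).
print(nx.shortest_path(G, "alice", "erin"))  # e.g. ['alice', 'bob', 'carol', 'erin']

# Mutual connections between two users.
print(set(G.neighbors("alice")) & set(G.neighbors("carol")))  # {'bob', 'dave'}

# Recommend friends-of-friends, ranked by number of shared connections.
def recommend(G, user, k=3):
    scores = {}
    for friend in G.neighbors(user):
        for fof in G.neighbors(friend):
            if fof != user and not G.has_edge(user, fof):
                scores[fof] = scores.get(fof, 0) + 1
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(recommend(G, "alice"))  # ['carol']: shares two connections with 'alice'
```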
I was loading CSV files into SQL Server. It was slow. Then I switched to BULK INSERT. 💥 Everything changed.

BULK INSERT is a native SQL method. It is built for speed. But the real power comes when you combine it with Python.

𝗪𝗵𝗮𝘁 𝗱𝗶𝗱 𝗜 𝗱𝗼?
✔️ Python to handle multiple CSV files
✔️ Python to clean and normalize data
✔️ BULK INSERT for fast loading into SQL Server

This combination is simple. And very powerful. Python manages flexibility. SQL manages performance.

𝗧𝗵𝗲 𝗿𝗲𝘀𝘂𝗹𝘁:
• a faster ingestion process
• a cleaner pipeline
• a more reliable system

𝗗𝗮𝘁𝗮 𝗶𝗻𝗴𝗲𝘀𝘁𝗶𝗼𝗻 𝗶𝘀 𝗻𝗼𝘁 𝗷𝘂𝘀𝘁 𝗹𝗼𝗮𝗱𝗶𝗻𝗴. 𝗜𝘁’𝘀 𝗮𝗯𝗼𝘂𝘁 𝘀𝗽𝗲𝗲𝗱 𝗮𝗻𝗱 𝗰𝗼𝗻𝘁𝗿𝗼𝗹.

Curious how it works?
🔗 GitHub repository: https://lnkd.in/dwjwP-bh

P.S. I know BULK sounds a lot like “HULK”… not very original, but I like it 😄
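As a rough illustration of the pattern (not the repository's exact code), the Python side can loop over the cleaned CSVs and issue one BULK INSERT per file. The table name, paths, and driver version below are placeholders.

```python
# Hedged sketch of Python orchestrating BULK INSERT; all names are placeholders.
from pathlib import Path
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;DATABASE=Staging;Trusted_Connection=yes;"
)
cursor = conn.cursor()

# Loop over the CSVs Python has already cleaned and let SQL Server do the loading.
# Caveat: the file path must be visible to the SQL Server service itself.
for csv_path in Path(r"C:\data\clean").glob("*.csv"):
    cursor.execute(f"""
        BULK INSERT dbo.Sales
        FROM '{csv_path}'
        WITH (FIRSTROW = 2, FIELDTERMINATOR = ',', ROWTERMINATOR = '\\n', TABLOCK);
    """)
    conn.commit()
conn.close()
```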
I'm often asked how to handle edge cases when building data layers with MongoDB and Python. Simple CRUD is great, but real-world apps need robust query patterns and clean architecture.

Working in VS Code on this project, I focused on layering logic. Instead of calling the database directly from the application layer, I used a modular service pattern (like user_service.py calling db_utils.py).

A few key practices I implemented:
✅ Robust Error Handling: Ensuring a clean return for cases like invalid ObjectIds, which prevents app crashes.
✅ Modular Query Logic: Abstracting queries into specific, reusable functions (e.g., get_users_by_college) makes the main logic much easier to read and test.
✅ Automated Postman-Free Testing: In my terminal, you can see I'm using curl and echo to script a "Full CRUD Test Cycle." This is a fast, reproducible way to verify APIs during development.

What's your go-to pattern for structuring database interactions in your applications? Do you stick with raw queries, ORMs, or custom data access objects? Let me know in the comments!

GitHub link -> https://lnkd.in/dASzkj7T

#mongodb #python #development #dataservices #vscode #backend #programming #softwareengineering
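A hedged sketch of what this layering can look like with pymongo; the function, database, and collection names here are illustrative stand-ins for the author's actual modules.

```python
from bson import ObjectId
from bson.errors import InvalidId
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
users = client["app_db"]["users"]  # placeholder database/collection names

# db_utils-style helper: a malformed id returns None instead of crashing the app.
def get_user_by_id(user_id: str):
    try:
        oid = ObjectId(user_id)
    except InvalidId:
        return None
    return users.find_one({"_id": oid})

# user_service-style reusable query, kept out of the application layer.
def get_users_by_college(college: str):
    return list(users.find({"college": college}, {"password": 0}))
```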
I built a TCG database in Python to organize and analyze card data pulled from a public API. I stored everything in a SQLite database for querying and deeper analysis.

https://lnkd.in/gGi36EBa

I quickly ran into some data challenges:
* Some cards have multiple Pokémon in their name (e.g. Pheromosa & Buzzwole), which caused duplicate entries when querying by individual names.
* Certain special variants (like promotional or championship cards) were not consistently represented in the API.
* The API data structure required careful JSON parsing to extract pricing across different variants (holofoil, reverse holo, etc.).

I am still iterating on the project, but it taught me about API limitations and was a great exercise in combining Python, APIs, and SQL.

The attached photo is the top 10 most expensive Ultra Beast-tagged cards I found in the dataset:
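To illustrate the kind of JSON parsing involved, here is a defensive pattern with chained .get() calls; the nesting mirrors what TCG pricing APIs typically return, but the exact field names and values are assumptions.

```python
import sqlite3

# One card's pricing block, shaped roughly like the API's nested JSON.
card = {
    "name": "Pheromosa & Buzzwole-GX",
    "tcgplayer": {"prices": {
        "holofoil": {"market": 42.50},
        "reverseHolofoil": {"market": None},
    }},
}

conn = sqlite3.connect("tcg.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS prices (card_name TEXT, variant TEXT, market REAL)"
)

# Chained .get() calls guard against cards missing a variant or the whole block.
for variant, info in card.get("tcgplayer", {}).get("prices", {}).items():
    conn.execute(
        "INSERT INTO prices VALUES (?, ?, ?)",
        (card["name"], variant, info.get("market")),
    )
conn.commit()
conn.close()
```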
I used to think SQL and Python were separate skills… Now I realize they're incomplete without each other.

Because in real-world systems:
👉 SQL stores and retrieves data
👉 Python processes and automates it

💡 Today I integrated SQL with Python, and it unlocked a completely new level of understanding.

📊 What this combination allows you to do:
• Store structured data efficiently (SQL)
• Query large datasets quickly
• Process results dynamically (Python)
• Build complete data workflows
👉 This is how real applications are built

💡 Real-world example: an e-commerce system 👇
• Store orders in the database (SQL)
• Query revenue by category
• Load results into pandas
• Use Python to automate reports
👉 End-to-end data flow (see the sketch below)

Before this:
❌ SQL = only querying
❌ Python = only scripting

After this:
✅ SQL + Python = complete system

💡 Biggest realization: tools don't create value…
👉 Integration does

📌 Mistakes I learned from:
• Doing everything in Python (slow)
• Writing inefficient SQL queries
• Not using database strengths properly
👉 Right tool + right job = real efficiency

💬 Let's discuss: Do you prefer doing aggregations in SQL or pandas, and why?

#Python #SQL #DataEngineering #PythonDeveloper #BackendDevelopment #DataAnalytics #SQLtoPython #CodingJourney #LearnInPublic #DevelopersIndia #Tech #100DaysOfCode #BuildInPublic #PythonTutorial
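Here is the sketch referenced above, a concrete (illustrative) version of the e-commerce example: the aggregation happens in SQL, and Python takes over for the reporting step. The orders table and the SQLite stand-in are assumptions.

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect("shop.db")  # stand-in database; assumes an "orders" table

# SQL does what it is best at: set-based aggregation over the full table.
query = """
    SELECT category, SUM(amount) AS revenue
    FROM orders
    GROUP BY category
    ORDER BY revenue DESC;
"""
revenue = pd.read_sql_query(query, conn)

# Python does what it is best at: automating the downstream report.
revenue.to_csv("weekly_revenue_report.csv", index=False)
conn.close()
```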
Database Migration Project

- Moves data from SQL Server to PostgreSQL using Python
- An ETL pipeline that extracts, transforms, and loads data across databases
- Clean error handling so nothing silently breaks

GITHUB: https://lnkd.in/dZutvSBY

If you're into data engineering, ETL, or just like building things with Python, I'd love to connect.

#DataEngineering #Python #ETL #PostgreSQL #SQLServer
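A minimal sketch of what one such migration step could look like; the table, column, and credential values are placeholders, and this is not the repository's actual code.

```python
import pyodbc
import psycopg2

def migrate_customers():
    # Extract from SQL Server (source); all names/credentials are placeholders.
    src = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=localhost;DATABASE=Legacy;Trusted_Connection=yes;"
    )
    dst = psycopg2.connect("dbname=warehouse user=etl password=secret host=localhost")
    try:
        rows = src.cursor().execute("SELECT id, name, email FROM dbo.Customers").fetchall()
        # Transform: trim whitespace and normalize emails before loading.
        clean = [(r.id, r.name.strip(), r.email.lower()) for r in rows]
        # Load into PostgreSQL; the connection context commits on success
        # and rolls back (re-raising) on failure, so nothing silently breaks.
        with dst, dst.cursor() as cur:
            cur.executemany(
                "INSERT INTO customers (id, name, email) VALUES (%s, %s, %s)", clean
            )
    finally:
        src.close()
        dst.close()

if __name__ == "__main__":
    migrate_customers()
```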
Building your first data pipeline with Python + SQL is easier than you think. You don't need complex tools to get started. Just the right flow 👇

1️⃣ Start with the connection
Use Python to connect to your database:
→ SQLAlchemy
→ pandas
Define your source and target tables clearly

2️⃣ Extract & transform in one flow
→ Write a clean SQL query to extract data
→ Load it into a pandas DataFrame
→ Apply transformations (cleaning, joins, calculations)

3️⃣ Load & schedule
→ Use df.to_sql() to load data back
→ Wrap everything in a single .py file
→ Schedule it using cron (or Airflow later)

That's it. You've built your first pipeline using Python + SQL (a minimal version is sketched below). Start simple. Focus on understanding the flow. Tools can come later.

But many people struggle at this stage. They focus too much on tools, ignore the fundamentals, and underestimate SQL. This often leads to random learning, no clear structure, no preparation strategy…

And when you're stuck in that loop, having the right mentor can make a huge difference. That's why, if you want to go deeper into building real-world pipelines, I recommend checking out Bosscoder Academy's Data Engineering program. They focus on fundamentals, projects, and system-level thinking.

🔗 Check their program here: bcalinks.com/39Hf27EV

Every advanced pipeline starts with a simple one.

#DataEngineering #Python #SQL
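The minimal version promised above, assuming a raw_orders table and a local Postgres; every name here is a placeholder, not a prescribed setup.

```python
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql://user:password@localhost:5432/analytics")

# 1. Extract with a clean SQL query.
df = pd.read_sql("SELECT order_id, amount, created_at FROM raw_orders", engine)

# 2. Transform in pandas: aggregate revenue per day.
df["created_at"] = pd.to_datetime(df["created_at"])
daily = df.groupby(df["created_at"].dt.date)["amount"].sum().reset_index(name="revenue")

# 3. Load back into the database; schedule this whole script with cron.
daily.to_sql("daily_revenue", engine, if_exists="replace", index=False)
```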
Working with data often means jumping between tools, but what if you could bring everything together seamlessly?

Recently, I explored integrating SQL with Python, and it completely changed how I approach data analysis. Instead of manually exporting data, I connected directly to my SQL Server database using Python. With just a few lines of code, I was able to:
🔹 Establish a secure connection using pyodbc
🔹 Fetch data directly from SQL tables
🔹 Convert the data into a pandas DataFrame
🔹 Prepare it for further analysis and visualization

Here's a small part of the process I used (sketched in code below):
1. Defined connection parameters (server, database, driver)
2. Created a connection string
3. Connected to SQL Server using pyodbc
4. Queried system tables to explore available data

What I found most valuable is how this integration removes friction from the workflow. No more repetitive exports, just real-time access to structured data.

This approach opens up powerful possibilities:
✔ Automating data pipelines
✔ Performing advanced analysis in Python
✔ Creating visualizations with libraries like matplotlib
✔ Building scalable data workflows

For anyone working with data, combining SQL and Python is not just a technical skill, it's a productivity booster.

#DataAnalytics #Python #SQL #DataScience #Automation #LearningJourney
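A short sketch of those four steps; the server, database, and driver values are placeholders. Note that pandas officially prefers SQLAlchemy connections, though raw pyodbc connections work with a warning.

```python
import pandas as pd
import pyodbc

# 1-2. Define connection parameters and build the connection string.
conn_str = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myserver;DATABASE=mydb;"
    "Trusted_Connection=yes;"  # or UID=...;PWD=... for SQL authentication
)

# 3. Connect to SQL Server.
conn = pyodbc.connect(conn_str)

# 4. Query system tables to explore the available data as a DataFrame.
tables = pd.read_sql(
    "SELECT TABLE_SCHEMA, TABLE_NAME FROM INFORMATION_SCHEMA.TABLES", conn
)
print(tables.head())
conn.close()
```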
Every Python Developer Must Know These 30 Concepts 👇

1. Variables & Data Types (int, float, list, tuple, dict, set)
2. Mutable vs Immutable Objects
3. List Comprehensions
4. Generators & yield
5. Functions & Lambda Expressions
6. *args and **kwargs
7. Decorators
8. Closures in Python
9. Recursion
10. Exception Handling (try, except, finally, custom exceptions)
11. File Handling (read, write, context managers)
12. Context Managers (with statement)
13. Object Oriented Programming (Classes, Objects)
14. Inheritance & Multiple Inheritance
15. Magic Methods (dunder methods like __init__, __str__)
16. Dataclasses
17. Modules & Packages
18. Virtual Environments (venv)
19. Package Management (pip)
20. Iterators & Iterable Protocol
21. Multithreading vs Multiprocessing
22. Async Programming (asyncio, async/await)
23. GIL (Global Interpreter Lock)
24. Memory Management & Garbage Collection
25. Logging in Python
26. Testing (unittest, pytest)
27. Working with APIs (requests, JSON handling)
28. Serialization (pickle, JSON)
29. Pythonic Coding (PEP 8, idiomatic Python)
30. Performance Optimization (profiling, caching, time complexity)
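As a quick taste of a few items from the list (4, 6, 7, and 12), here is a tiny self-contained example; it is illustrative, not exhaustive.

```python
from contextlib import contextmanager
import time

def timed(func):                       # 7. Decorator: wraps a function transparently
    def wrapper(*args, **kwargs):      # 6. *args / **kwargs forward any signature
        start = time.perf_counter()
        result = func(*args, **kwargs)
        print(f"{func.__name__} took {time.perf_counter() - start:.4f}s")
        return result
    return wrapper

def countdown(n):                      # 4. Generator: lazy values, constant memory
    while n > 0:
        yield n
        n -= 1

@contextmanager
def tag(name):                         # 12. Context manager via contextlib
    print(f"<{name}>")
    yield
    print(f"</{name}>")

@timed
def total(n):
    return sum(countdown(n))

with tag("demo"):
    print(total(100_000))
```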
Ever been stuck with unstructured data in Excel sheets or spreadsheets and needed to push that messy data into a structured database? 🤯

Recently, I faced a similar challenge: a large spreadsheet filled with inconsistent, unstructured data that needed to be transformed into multiple clean tables. Doing it manually would've been time-consuming and error-prone.

Here comes Python 🐍

Instead of struggling with manual cleanup, I built a small data pipeline using Python to automate the entire process, from parsing and structuring the data to inserting it directly into a PostgreSQL Supabase database. What could've taken hours was reduced to minutes, with better accuracy and scalability.

As software engineers, knowing the right tool can turn a messy problem into an elegant solution.

#Python #DataEngineering #Automation #PostgreSQL #Supabase #SoftwareEngineering
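A sketch of such a pipeline under stated assumptions: the spreadsheet layout and column names are invented, and the Supabase DSN is a placeholder (Supabase exposes a standard PostgreSQL connection string).

```python
import pandas as pd
from sqlalchemy import create_engine

# Parse the messy sheet (requires openpyxl) and normalize the headers.
df = pd.read_excel("messy_data.xlsx")
df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
df = df.dropna(how="all").drop_duplicates()

# Split into clean tables, e.g. one per entity (columns are hypothetical).
customers = df[["customer_name", "customer_email"]].drop_duplicates()

# Load directly into Postgres via the Supabase connection string.
engine = create_engine("postgresql://user:password@db.xxxx.supabase.co:5432/postgres")
customers.to_sql("customers", engine, if_exists="append", index=False)
```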