Q1-3 2025 Job Listings API and Analysis Project

Hey everyone, today I'd like to share a programming project I recently completed. It takes a dataset of Q1-3 2025 job listings, provided by Workforce Opportunities & Residency Cayman in the Cayman Islands, ingests it into a PostgreSQL database, and serves it via a FastAPI REST API. The API is then demoed in a Jupyter notebook, which requests data from the API over HTTP and performs data analysis on it. The project follows the pyproject template and focuses on a clear separation of concerns, using dependency injection to decouple the creation and usage of the various components. Logging is implemented throughout to provide clear visibility into runtime behavior, making debugging and issue diagnosis easier.

Database Layer
Data is normalized into a central fact table with dimension tables to reduce redundancy, then ingested via a staging table using transactions and procedural statements to ensure data integrity before being committed. A view serves the fact table so that SQL logic is abstracted away from the API layer, with indices to reduce join complexity and speed up query-parameter-filtered searches.

API Layer
An asynchronous API built with FastAPI and psycopg serves data from the Postgres database. An asynchronous connection pool, held open for the API's lifespan, avoids the startup cost of opening a new connection per request. The main route is '/jobs', which exposes the view created at the SQL layer. Query parameters can be passed on all routes for filtered searches. The routes are integration tested with pytest and the FastAPI TestClient to verify that data is correctly served from the database and that errors are correctly caught.

Data Analysis Layer
Analysis was initially performed directly in the database at the SQL layer using CTEs and analytic functions.
To demo how an end user could utilize the API, data analysis is also performed in a Jupyter notebook, where data is fetched from the API using requests for HTTP. Data is handled using pandas, modeled using scikit-learn, and visualized using matplotlib.

*All sensitive data is hashed using hashlib.

Please visit the GitHub repo to view the code and documentation, as well as the video demonstration I made below. If you have any feedback, or would like to chat with me directly in more detail about the project, please leave a comment or send me a message. Thanks!

github: https://lnkd.in/ex8jnnur
https://lnkd.in/e7D7ip-d
