Scrolling through negative headlines every day? I decided to change that.

𝐝𝐚𝐢𝐥𝐲𝐩𝐨𝐬𝐢𝐭𝐢𝐯𝐞.𝐧𝐞𝐰𝐬 automatically highlights positive news that actually matters. It's an automated data pipeline + backend system that aggregates news from 16+ trusted sources and uses AI to filter noise and surface articles with genuine positive human impact.

The frontend is intentionally simple: it highlights the day's most relevant positive news. The real work is in the pipeline, data flow, and backend architecture.

🔧 What I built:

🧠 𝐃𝐚𝐭𝐚 𝐏𝐢𝐩𝐞𝐥𝐢𝐧𝐞 (Python 3.12)
A fully automated pipeline running 24/7:
• Hourly and daily jobs fetching RSS feeds from BBC, Nature, MIT, Forbes, The Guardian, HBR and 10+ other sources
• GPT-4o-mini scoring each article on positivity and human impact (0-1 scale)
• Batch AI processing (20 articles per call, structured JSON output), bringing AI cost down to cents per day
• PostgreSQL deduplication by URL (ON CONFLICT DO NOTHING)
• Graceful degradation: one failing source never stops the pipeline

🧩 𝐑𝐄𝐒𝐓 𝐀𝐏𝐈 (Node.js + TypeScript)
• Express.js with strict TypeScript
• Helmet security headers and rate limiting (100 req / 10 min)
• PostgreSQL connection pooling (pg)
• Flexible filters by score, category, source, date range, country, and language

☁️ 𝐈𝐧𝐟𝐫𝐚𝐬𝐭𝐫𝐮𝐜𝐭𝐮𝐫𝐞 (AWS)
• Single EC2 instance running Docker Compose
• 4 containers: PostgreSQL, pipeline, API, and Nginx
• Nginx as reverse proxy and static file server
• HTTPS with Let's Encrypt (Certbot auto-renewal)
• Custom domain

🎯 𝐊𝐞𝐲 𝐞𝐧𝐠𝐢𝐧𝐞𝐞𝐫𝐢𝐧𝐠 𝐝𝐞𝐜𝐢𝐬𝐢𝐨𝐧𝐬
• Dual AI scoring: positivity alone isn't enough
• Final score = positivity × 0.45 + human impact × 0.55 (see the sketch below)
• Batch-first AI design to reduce cost and latency
• Database-level guarantees instead of application logic
• Fully containerized services for isolation and reproducibility

📊 𝐑𝐞𝐬𝐮𝐥𝐭
A production system running 24/7, automatically curating positive news from around the world. Total infrastructure cost under $3 per month.

🌐 Live: https://dailypositive.news
💻 GitHub: https://github.com/rafhaelbrum/dailypositive-news

I built this project to practice backend and data engineering concepts on a real-world problem. Feedback is very welcome.

#Python #TypeScript #NodeJS #PostgreSQL #Docker #AWS #OpenAI #Backend
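To make two of the decisions above concrete (the weighted final score and the database-level dedup), here's a minimal Python sketch. The table and column names are my own assumptions, not the project's actual schema:

```python
# Minimal sketch of the scoring + dedup ideas from the post above.
# Table/column names (articles, url, title, final_score) are assumptions.
import psycopg2

POSITIVITY_WEIGHT = 0.45
IMPACT_WEIGHT = 0.55

def final_score(positivity: float, human_impact: float) -> float:
    """Combine the two 0-1 AI scores exactly as the post describes."""
    return positivity * POSITIVITY_WEIGHT + human_impact * IMPACT_WEIGHT

def insert_article(conn, url: str, title: str, positivity: float, impact: float) -> None:
    # Database-level dedup: a UNIQUE constraint on url plus
    # ON CONFLICT DO NOTHING makes a re-fetched article a no-op,
    # instead of deduplicating in application logic.
    with conn.cursor() as cur:
        cur.execute(
            """
            INSERT INTO articles (url, title, final_score)
            VALUES (%s, %s, %s)
            ON CONFLICT (url) DO NOTHING
            """,
            (url, title, final_score(positivity, impact)),
        )
    conn.commit()
```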
Automated Positive News Aggregator with Python & Node.js
🚀 Just published a new deep-dive for backend engineers and AI builders!

I walked through how to build a DuckDuckGo Search Storage API using FastAPI + SQLModel — a clean, modern stack that's perfect for lightweight search pipelines, data ingestion, and AI-powered applications.

This piece breaks down:
🔹 Designing a modular API architecture
🔹 Integrating DuckDuckGo search programmatically
🔹 Persisting structured results with SQLModel
🔹 Clean async patterns for high-performance workloads
🔹 Why this pattern is ideal for LLM agents, retrieval layers, and microservices

If you're exploring search augmentation, RAG pipelines, or lightweight data services, this is a practical blueprint you can adapt instantly.

📘 Read the full blog: Building a DuckDuckGo Search Storage API with FastAPI and SQLModel
https://lnkd.in/gigV2dAq
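A minimal sketch of the store-results pattern the post describes, assuming FastAPI + SQLModel. The model fields and endpoint shape are my guesses, and the actual DuckDuckGo call is stubbed out:

```python
# Sketch: persist search results as rows via FastAPI + SQLModel.
# Model fields and endpoint shape are illustrative assumptions.
from fastapi import FastAPI
from sqlmodel import Field, Session, SQLModel, create_engine

class SearchResult(SQLModel, table=True):
    id: int | None = Field(default=None, primary_key=True)
    query: str
    title: str
    url: str

engine = create_engine("sqlite:///results.db")
SQLModel.metadata.create_all(engine)
app = FastAPI()

@app.post("/search")
def search_and_store(query: str) -> list[SearchResult]:
    # Replace this stub with a real DuckDuckGo client call.
    raw = [{"title": "example", "url": "https://example.com"}]
    rows = [SearchResult(query=query, **r) for r in raw]
    with Session(engine) as session:
        session.add_all(rows)
        session.commit()
        for row in rows:
            session.refresh(row)  # populate generated primary keys
    return rows
```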
🚀 Milestone Reached: Connecting the Brain and the Memory 🧠💾

Following up on my last post about the Local AI Assistant Platform, I've hit a major development milestone: the integration of Spring Security (JWT) and a PostgreSQL-backed persistence layer into the Hexagonal Architecture.

It's one thing to have a chat working in memory; it's another to build a production-ready system where:

🛡️ Security is a Shield: we've moved to a stateless JWT architecture. Using a custom UserDetailsService, every request is authenticated against our PostgreSQL database. No valid token, no AI access.
📂 Memory is Permanent: conversations aren't just transient. Every interaction is mapped from domain objects to JPA entities, so the AI "remembers" context even after a Docker container restart.
🏗️ Clean Architecture Prevails: the core AI logic remains "pure." Whether I'm storing history in Postgres or eventually scaling to a vector store, the domain logic stays framework-independent.

What's now live in the stack:
✅ Orchestration: Spring AI (providing the seamless bridge between Java and our local models)
✅ Auth: stateless JWT with BCrypt password hashing for secure identity management
✅ Database: PostgreSQL (Dockerized) handling user data and session history
✅ Architecture: fully implemented Ports and Adapters for clean data mapping
✅ Local LLM: powered locally (Dockerized) by Ollama for 100% data privacy

The Big Picture: Moving toward RAG 🔍
Beyond simple history, I'm designing the foundation for Retrieval-Augmented Generation (RAG). Next stop: building the Angular frontend to bring this all together.

I'm curious — for those of you building AI apps, are you using Spring AI for its portability, or sticking to standard REST templates? And are you jumping into vector stores early to support RAG from day one?

#SoftwareArchitecture #SpringAI #SpringBoot #SpringSecurity #HexagonalArchitecture #Ollama #Java #PostgreSQL #FullStack #AIEngineering #RAG #GenerativeAI #Docker
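The stateless-token idea itself is framework-agnostic. As a minimal cross-language illustration (PyJWT here, not the Spring Security setup the post describes), issuing and verifying a signed token with no server-side session might look like this; the secret, claims, and expiry are illustrative assumptions:

```python
# Framework-agnostic sketch of stateless JWT auth (PyJWT, pip install PyJWT).
# Secret, claims, and expiry policy here are illustrative assumptions.
import datetime

import jwt

SECRET = "change-me"

def issue_token(username: str) -> str:
    claims = {
        "sub": username,
        "exp": datetime.datetime.now(datetime.timezone.utc)
        + datetime.timedelta(hours=1),
    }
    return jwt.encode(claims, SECRET, algorithm="HS256")

def authenticate(token: str) -> str:
    # Raises jwt.InvalidTokenError (expired/forged) -> reject the request.
    claims = jwt.decode(token, SECRET, algorithms=["HS256"])
    return claims["sub"]  # no server-side session: the token is the state
```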
I had a RAG system. It worked. It was slow. Processing 100 documents took 500 seconds. Users were waiting. That's the moment I started digging.

Fix #1: Parallel Embedding Processing
The bottleneck was obvious. I was calling the embedding API one document at a time. Sequential. Waiting for each result before moving to the next. asyncio.gather() changed everything. Instead of: call API → wait → call API → wait → repeat. I did: call all APIs at once. Grab all results together.
Result: 50 seconds dropped to 8 seconds for 100 documents.

Fix #2: True Async with AWS SDK
I thought my function was async. It wasn't. boto3 (the AWS SDK) is synchronous. When I called send_message(), it blocked the entire event loop. Other requests piled up waiting. asyncio.to_thread() was the fix. Wrap the sync call in a thread pool. The event loop stays free. Uploads no longer freeze the app.
Impact: 60% latency reduction. Document processing happens in the background.

Fix #3: Push Filtering to the Database
I was fetching 1000 vectors from Qdrant, then filtering them in Python. Silly in hindsight. I was doing it anyway. The fix: Qdrant supports native database-level filters. Filter there. Return only what you need.
Query time: 500ms → 100ms. 5x faster. Network transfer drops. Processing drops.

Three small changes. Three massive wins: parallelization, true async, and database filtering. None required rewriting the system. (A sketch of all three fixes follows below.)

The lesson? Look for three things: waiting (parallelization), blocking (true async), and work that belongs elsewhere (push it down to the database). Most performance problems aren't architectural. They're patterns. Find the waits. Eliminate them.

What's your biggest RAG bottleneck right now?
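Here is a compact sketch of all three fixes, assuming the qdrant-client and boto3 libraries; queue, collection, and helper names are illustrative:

```python
# Sketches of the three fixes. Names (queue URL, collection, embed_one)
# are illustrative assumptions, not from the actual system.
import asyncio

import boto3
from qdrant_client import QdrantClient
from qdrant_client.models import FieldCondition, Filter, MatchValue

# Fix #1: embed documents concurrently instead of one at a time.
async def embed_all(docs: list[str], embed_one) -> list[list[float]]:
    # embed_one is an async callable wrapping your embedding API.
    return await asyncio.gather(*(embed_one(d) for d in docs))

# Fix #2: boto3 is synchronous; run it in a worker thread so the
# event loop keeps serving requests during the SQS round-trip.
sqs = boto3.client("sqs")

async def enqueue(queue_url: str, body: str) -> None:
    await asyncio.to_thread(
        sqs.send_message, QueueUrl=queue_url, MessageBody=body
    )

# Fix #3: let Qdrant filter server-side instead of fetching 1000
# vectors and filtering them in Python.
client = QdrantClient(host="localhost")

def search_filtered(vector: list[float], tenant: str):
    return client.search(
        collection_name="docs",
        query_vector=vector,
        query_filter=Filter(
            must=[FieldCondition(key="tenant", match=MatchValue(value=tenant))]
        ),
        limit=10,
    )
```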
Want to build your own RAG & Vector Search app from scratch? Here is the blueprint.

I've open-sourced a new project demonstrating how to implement search patterns. The goal was to move beyond simple vector lookups and implement Hybrid Search—combining the precision of keyword matching with the semantic understanding of vector embeddings.

Key Technical Implementations:
1. Data Ingestion Pipeline: Python scripts using langchain and voyageai to chunk, embed, and index PDF data.
2. Vector Search: leveraging MongoDB Atlas Vector Search for dense retrieval.
3. RRF (Reciprocal Rank Fusion): a custom aggregation pipeline to merge and rank results from keyword and vector queries (a minimal sketch follows below).
4. RAG Architecture: context-aware generation using Llama 3 on Groq.

This project is a great reference for anyone looking to integrate AI search features into a Node.js application.

Grab the code from the repo link and build your own AI search engine this weekend: https://lnkd.in/gzJuDbJ7

#GenAI #LLM #MongoDB #VoyageAI #Groq
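For readers who want the fusion step without the MongoDB aggregation syntax, here is a minimal pure-Python sketch of Reciprocal Rank Fusion. The constant k = 60 is the conventional default, not necessarily what the repo uses:

```python
# Reciprocal Rank Fusion: score(doc) = sum over result lists of 1/(k + rank).
# The repo implements this as a MongoDB aggregation pipeline; the math is the same.
def rrf(result_lists: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    scores: dict[str, float] = {}
    for results in result_lists:  # e.g. [keyword_ids, vector_ids]
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# A doc ranked well by both keyword and vector search rises to the top.
keyword_hits = ["a", "b", "c"]
vector_hits = ["b", "d", "a"]
print(rrf([keyword_hits, vector_hits])[:3])
```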
🚀 Just Built: Chat with Your Codebase using RAG + LLMs 💬

Ever opened a large GitHub repository and thought: "Where is this logic even implemented?" 😅 So I built a solution.

I recently developed Chat with Your Codebase, a full-stack AI application that lets you ask natural language questions about any GitHub repository and get grounded answers directly from the code.

🔍 What it does
Load a GitHub repository
Parse and chunk Python, JavaScript, and TypeScript files
Index code using vector embeddings
Ask questions like "Where is authentication handled?" or "Explain this function"
Get answers with exact code references & line numbers (a simplified chunking sketch follows below)

🧠 Tech Stack
RAG (Retrieval-Augmented Generation)
Streamlit (UI + deployment)
LangChain
MongoDB Atlas (Vector Search)
Tree-sitter for code parsing
Groq LLM API
GitPython

✨ Why I built this
Reading unfamiliar codebases is slow
Documentation is often outdated
AI is most powerful when grounded in real source code

This project helped me deeply understand:
Production-ready RAG pipelines
Vector databases in real apps
Cloud deployment challenges (Streamlit, secrets, envs)
Parsing multi-language codebases properly

🔗 Demo & Code
👉 GitHub: https://lnkd.in/gAp-6wkj
👉 Live App: https://lnkd.in/gEcUqTcz

Would love feedback, ideas, or collaboration thoughts 🙌 If you're working with large codebases or LLMs, let's connect!

#AI #LLM #RAG #Streamlit #LangChain #MongoDB #Groq #OpenSource #MachineLearning #DeveloperTools
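The project itself uses Tree-sitter for syntax-aware parsing; as a rough stand-in, here is a deliberately simplified line-window chunker showing how file paths and line numbers can travel with each chunk, so answers can cite exact locations. All names are illustrative:

```python
# Simplified stand-in for the Tree-sitter chunking step: fixed-size
# line windows that keep file path + line numbers with each chunk.
from dataclasses import dataclass
from pathlib import Path

@dataclass
class CodeChunk:
    path: str
    start_line: int  # 1-based, inclusive
    end_line: int
    text: str

def chunk_file(path: Path, window: int = 40, overlap: int = 10) -> list[CodeChunk]:
    lines = path.read_text(encoding="utf-8", errors="ignore").splitlines()
    chunks: list[CodeChunk] = []
    step = window - overlap  # overlapping windows keep context across cuts
    for start in range(0, max(len(lines), 1), step):
        block = lines[start:start + window]
        if not block:
            break
        chunks.append(CodeChunk(
            path=str(path),
            start_line=start + 1,
            end_line=start + len(block),
            text="\n".join(block),
        ))
    return chunks
```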
🚀 Build Update: I shipped a full-stack RAG application.

I built DebugAI to challenge myself to create a production-ready AI application from scratch. The goal was simple: create a tool that parses error logs and uses RAG (Retrieval-Augmented Generation) to find relevant Stack Overflow discussions. It's a practical implementation of modern AI engineering patterns, focusing on performance and observability.

🛠️ The Tech Stack (the "real" work):
Backend: FastAPI (Python), async architecture
Database: Supabase (PostgreSQL + pgvector)
Performance: Redis caching for instant repeat results (a sketch of the pattern follows below)
Observability: custom cost tracking & analytics
Search: semantic search using OpenAI embeddings
Frontend: Next.js 14 + Tailwind CSS

🔮 What's Next?
This project was step one. Now that I have the foundation, I'm planning to experiment with agentic workflows. My next goal is to build a "Self-Evolving AI Engineer" (SEAE): move beyond simple RAG and build agents that can self-diagnose and learn from feedback loops. It's a big learning curve, but that's the fun part.

Check out the GitHub repository: https://lnkd.in/e5hj4gVx
Check out the demo: https://lnkd.in/em_9MVhK

#SoftwareEngineering #FastAPI #RAG #Supabase #Redis #ProjectShowcase #LearningInPublic #ai
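As an illustration of the Redis layer mentioned above, here is a minimal cache-aside sketch. The key scheme, TTL, and run_search hook are assumptions, not the project's actual code:

```python
# Cache-aside sketch: hash the normalized query, serve cached results
# when present, hit the embedding + vector search path only on a miss.
import hashlib
import json

import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)
TTL_SECONDS = 3600  # illustrative TTL

def cached_search(query: str, run_search) -> list[dict]:
    key = "search:" + hashlib.sha256(query.strip().lower().encode()).hexdigest()
    hit = r.get(key)
    if hit is not None:
        return json.loads(hit)      # instant repeat result
    results = run_search(query)     # embed + pgvector query on a miss
    r.setex(key, TTL_SECONDS, json.dumps(results))
    return results
```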
I built a full-stack AI-powered churn prediction system in one sitting. Here's what it does and why it matters. (Emphasis on "why it matters.")

1. The problem is real
Companies lose customers every day without seeing it coming. By the time they notice, it's too late.

2. So I built an early warning system
Upload customer data. Train a Random Forest model. Get risk scores. Every customer scored. Every risk explained. Every action suggested.

3. The stack is simple but complete
Next.js frontend. FastAPI backend. PostgreSQL. scikit-learn. One docker compose up and the whole thing runs.

4. It doesn't just predict, it explains
Each customer gets a risk band: Low, Medium, or High. Plus the top reasons driving their churn risk. Plus suggested retention actions your team can act on today. (A sketch of the scoring step follows below.)

5. I built it to learn, not just to ship
Could I have used a SaaS tool? Sure. But building it taught me more about ML pipelines, full-stack architecture, and data modeling than any course ever did.

6. The lesson I keep relearning
The best way to learn something is to build something real with it. Not tutorials. Not theory. A real system that solves a real problem.

Tech breakdown:
- Frontend: Next.js + TypeScript + Tailwind CSS
- Backend: FastAPI + SQLAlchemy + Alembic
- ML: scikit-learn RandomForestClassifier
- Infra: Docker Compose + PostgreSQL
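Here is a hedged sketch of what the scoring step could look like with scikit-learn. The band thresholds are illustrative assumptions, not the project's actual cutoffs:

```python
# RandomForestClassifier probabilities mapped to Low/Medium/High bands.
# Thresholds (0.4, 0.7) are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def train(X: np.ndarray, y: np.ndarray) -> RandomForestClassifier:
    model = RandomForestClassifier(n_estimators=200, random_state=42)
    model.fit(X, y)  # y: 1 = churned, 0 = retained
    return model

def risk_band(p: float) -> str:
    if p >= 0.7:
        return "High"
    if p >= 0.4:
        return "Medium"
    return "Low"

def score(model: RandomForestClassifier, X: np.ndarray) -> list[dict]:
    probs = model.predict_proba(X)[:, 1]  # P(churn) per customer
    return [{"churn_probability": float(p), "band": risk_band(p)} for p in probs]
```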
Regex: a double-edged sword that can be both elegantly concise and catastrophically inefficient. I recently faced this when a seemingly innocuous regex in a Node.js automation service brought a critical backend process to its knees.

On specific, longer input strings, the regex would trigger catastrophic backtracking, freezing the CPU for a full 10 seconds. This wasn't a dev environment hiccup; it was happening in production, directly impacting data ingestion and real-time processing pipelines.

The impact was immediate and severe. A core MERN stack service, responsible for processing incoming payloads, became a bottleneck. Latency spiked, messages queued up in Kafka, and downstream systems starved for data. What looked like an elegant, compact solution for parsing a complex, semi-structured log format was, in reality, a ticking performance bomb waiting for the right input to detonate.

Our resolution wasn't to painstakingly debug and optimize the regex itself. Instead, we opted for a fundamental shift: replacing it entirely with a dedicated string-parsing approach. For this particular use case, building a lightweight state machine parser (or leveraging a purpose-built library designed for robust, predictable parsing) was the clear winner. This provided an O(N) performance guarantee: processing time scales linearly with input size, not exponentially. (A tiny illustration follows below.)

The lesson here is profound for engineering leaders: while regex excels at simple validation and pattern matching, it's rarely the right tool for complex, multi-state string parsing, especially when dealing with varied or untrusted input from external systems. Prioritize readability, maintainability, and predictable performance. Sometimes a few extra lines of explicit code beat a cryptic one-line regex that hides a potential DoS vulnerability.

#SoftwareEngineering #BackendDevelopment #PerformanceTuning #Regex #CatastrophicBacktracking #Nodejs #MERNStack #AWS #Docker #Scalability #SystemDesign #Automation #DevOps #TechLeadership #CTO #Founders #EngineeringManagement #CodeQuality #TechnicalDebt #SoftwareArchitecture #ProblemSolving #DataProcessing #WebDevelopment #AIAutomation #DistributedSystems
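To make the failure mode and the fix concrete, here is a tiny Python illustration (the post's service was Node.js, but backtracking regex engines behave analogously); the log format is invented for the example:

```python
# Catastrophic backtracking vs. a linear single-pass parser.
import re

# DANGER: nested quantifiers like (a+)+ backtrack exponentially on
# non-matching input. Compiling is harmless; matching is not:
EVIL = re.compile(r"^(a+)+$")
# EVIL.match("a" * 30 + "b")  # effectively hangs the process

def parse_kv_line(line: str) -> dict[str, str]:
    """O(N) single-pass parser for 'key=value key=value ...' lines.

    A plain character scan: time grows linearly with input length,
    no matter how adversarial the input is.
    """
    fields: dict[str, str] = {}
    key, buf, in_key = "", [], True
    for ch in line + " ":  # trailing sentinel space flushes the last field
        if in_key and ch == "=":
            key, buf, in_key = "".join(buf), [], False
        elif not in_key and ch == " ":
            fields[key] = "".join(buf)
            buf, in_key = [], True
        elif ch != " " or not in_key:
            buf.append(ch)
    return fields

print(parse_kv_line("level=error code=502 msg=timeout"))
# {'level': 'error', 'code': '502', 'msg': 'timeout'}
```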
🚀 FastAPI Async:

async def (the definition): defines a coroutine. It tells FastAPI that this function performs I/O and should not block the main execution thread.

await (the yield): the pause button. It tells the event loop: "I'm waiting for data (DB/API). Go handle other incoming requests while I sit idle."

asyncio.gather (parallel execution): fires multiple coroutines at once and returns a list of results only after the slowest task completes.

asyncio.TaskGroup (structured concurrency): introduced in Python 3.11. It manages multiple tasks safely; if one task fails, it cancels and cleans up the others automatically to prevent "zombie" tasks.

📂 When to use async def vs. def

Use async def for I/O-bound tasks:
• Calling external REST APIs (via httpx)
• Querying databases (via asyncpg, motor, or SQLAlchemy async sessions)
• Reading/writing files or interacting with cloud storage (S3/GCS)

Use def (standard) for CPU-bound tasks:
• Heavy data manipulation (Pandas, NumPy)
• Image processing or machine learning model inference
• Legacy "blocking" libraries (like requests or psycopg2)

Note: FastAPI automatically offloads plain def endpoints to a separate threadpool so they don't freeze your API.

💡 Best Practices & Golden Rules
• No time.sleep in async: never use time.sleep() inside an async def block. It stops the entire event loop (and your whole server). Use await asyncio.sleep() instead.
• Use async drivers: you only get the performance benefits of async if your database driver is also asynchronous. A blocking driver inside an async function is a hidden performance bottleneck.
• Leverage TaskGroup for reliability: move away from gather for complex workflows. TaskGroup provides better error handling and ensures that if one part of your parallel logic crashes, the whole group is handled gracefully.
• Keep it non-blocking: if a loop takes 2 seconds of pure computation, don't put it in an async def. Move that logic to a def function or a background task.

(A short sketch of these patterns follows below.)
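A minimal sketch pulling these rules together; endpoints and URLs are placeholders:

```python
# FastAPI sketch: an async I/O endpoint, concurrent fan-out with
# TaskGroup (Python 3.11+), and a plain def endpoint for CPU-bound work.
import asyncio

import httpx
from fastapi import FastAPI

app = FastAPI()

async def fetch_json(url: str) -> dict:
    async with httpx.AsyncClient() as client:
        resp = await client.get(url)  # await yields the event loop
        return resp.json()

@app.get("/combined")
async def combined():
    # Structured concurrency: if one task raises, the other is cancelled
    # and the error propagates, instead of leaving a "zombie" task.
    async with asyncio.TaskGroup() as tg:
        t1 = tg.create_task(fetch_json("https://example.com/a"))
        t2 = tg.create_task(fetch_json("https://example.com/b"))
    return {"a": t1.result(), "b": t2.result()}

@app.get("/report")
def heavy_report():
    # Plain def: FastAPI runs this in its threadpool, so the CPU-bound
    # loop below never blocks the event loop.
    return {"total": sum(i * i for i in range(10_000_000))}
```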
𝟱𝟬𝟬 𝘂𝘀𝗲𝗿𝘀 𝗵𝗶𝘁 𝗺𝘆 𝗮𝗶 𝗯𝗮𝗰𝗸𝗲𝗻𝗱 𝗮𝘁 𝘁𝗵𝗲 𝘀𝗮𝗺𝗲 𝘁𝗶𝗺𝗲 𝗮𝗻𝗱 𝗶𝘁 𝗰𝗿𝗮𝘀𝗵𝗲𝗱.

Recently, I launched an AI-driven automation tool to handle real-time tasks for multiple users. The first day was chaotic: 500+ users hit the system at once, AI responses slowed, and some data got corrupted.

𝗪𝗛𝗔𝗧 𝗪𝗘𝗡𝗧 𝗪𝗥𝗢𝗡𝗚:
• High concurrency exposed race conditions
• Standard ORM saves were not thread-safe
• No caching → every request hit the database

𝗛𝗢𝗪 𝗜 𝗙𝗜𝗫𝗘𝗗 𝗜𝗧 (see the sketch below):
• Row-level locking → 0 race conditions
• Redis caching → latency under 200ms
• Optimized Django backend → seamless handling of hundreds of simultaneous requests

𝗥𝗘𝗦𝗨𝗟𝗧:
✅ AI tasks automated in 1.8 seconds
✅ Data integrity 100%
✅ System now scales effortlessly

𝗧𝗔𝗞𝗘𝗔𝗪𝗔𝗬𝗦 𝗙𝗢𝗥 𝗔𝗟𝗟 𝗗𝗘𝗩𝗘𝗟𝗢𝗣𝗘𝗥𝗦:
• Don't just write code. Ask: "What breaks if users grow 10× overnight?"
• Real expertise = keeping systems alive under pressure

#BackendEngineering #PythonDeveloper #Scalability #SystemDesign #Python #Django #AI
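For readers curious what those fixes look like in Django terms, here is a hedged sketch; the model and cache keys are illustrative, not from the actual project:

```python
# Django sketch: select_for_update() for row-level locking, and the
# cache framework (e.g. backed by Redis) in front of the database.
from django.core.cache import cache
from django.db import transaction

from myapp.models import TaskRecord  # hypothetical model

def complete_task(task_id: int, result: str) -> None:
    # Row-level lock: concurrent writers block here instead of
    # clobbering each other's saves (no more lost updates).
    with transaction.atomic():
        task = TaskRecord.objects.select_for_update().get(pk=task_id)
        task.result = result
        task.status = "done"
        task.save()

def get_task_status(task_id: int) -> str:
    # Cache-aside: serve hot reads from the cache, fall back to the DB.
    key = f"task:{task_id}:status"
    status = cache.get(key)
    if status is None:
        status = TaskRecord.objects.get(pk=task_id).status
        cache.set(key, status, timeout=60)
    return status
```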