✅ #PythonJourney | Day 154 — Test Suite Complete: 14 Tests, 100% Endpoint Coverage

Today: Completed the comprehensive test suite. Every API endpoint now has automated tests validating behavior, error handling, and authentication.

Key accomplishments:

✅ Full test coverage (14 tests):
• Health Check: 1 test
• Create URL: 4 tests (success, invalid format, no auth, invalid auth)
• List URLs: 3 tests (empty, with data, no auth)
• Get URL Details: 2 tests (success, not found)
• Delete URL: 2 tests (success, not found)
• Get Analytics: 2 tests (success, not found)

✅ Testing patterns implemented:
• Fixture-based setup (conftest.py)
• Isolated database per test
• Mock user creation
• Authentication validation
• Error condition testing
• Status code verification

✅ All edge cases covered:
• Valid requests return proper responses
• Invalid inputs rejected with 422
• Missing auth returns 401
• Non-existent resources return 404
• Successful deletes return 204
• Analytics properly calculated

✅ Test execution:
• 14 passed in 2.51s
• Zero flaky tests
• All database operations isolated
• Clean setup and teardown

What I learned today:
→ Comprehensive testing catches edge cases early
→ Fixtures reduce boilerplate and improve maintainability
→ Test isolation prevents hidden dependencies
→ Fast tests enable rapid development cycles
→ Good test names document expected behavior

The test suite now validates:
- ✅ API contract (request/response format)
- ✅ Authentication (API key validation)
- ✅ Authorization (users see only their data)
- ✅ Error handling (proper HTTP status codes)
- ✅ Business logic (URL creation, deletion)
- ✅ Data persistence (database operations)

This is production-grade testing:
- Every endpoint tested
- Every error case covered
- Fast feedback on code changes
- Confidence to refactor safely
- Documentation through tests

Current status:
- ✅ Backend: Production-ready
- ✅ Tests: 14/14 passing (100%)
- ✅ Code coverage: All endpoints
- ✅ API: Fully validated
- ⏳ Deployment: Next (GCP)

From zero to production-grade in 154 days. The backend is ready for real-world use.

Next: Deploy to Google Cloud Platform (GCP).

#Python #Testing #Pytest #Backend #API #Quality #SoftwareDevelopment #TDD #Production
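For anyone who wants to see the fixture-based, isolated-database pattern in code, here is a minimal sketch. It is not the project's actual test file: the `app.main` / `app.database` imports, the `X-API-Key` header name, and the 201 status for creation are assumptions for illustration.

```python
# Minimal sketch of the pattern described above (assumed module and header names).
import pytest
from fastapi.testclient import TestClient
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from sqlalchemy.pool import StaticPool

from app.main import app                 # hypothetical application module
from app.database import Base, get_db    # hypothetical models / DB dependency


@pytest.fixture()
def client():
    # Fresh in-memory SQLite database for every test -> full isolation
    engine = create_engine(
        "sqlite://",
        connect_args={"check_same_thread": False},
        poolclass=StaticPool,
    )
    Base.metadata.create_all(engine)
    TestingSession = sessionmaker(bind=engine)

    def override_get_db():
        db = TestingSession()
        try:
            yield db
        finally:
            db.close()

    app.dependency_overrides[get_db] = override_get_db
    yield TestClient(app)
    app.dependency_overrides.clear()


def test_create_url_requires_auth(client):
    # Missing API key -> 401
    resp = client.post("/api/v1/urls", json={"original_url": "https://example.com"})
    assert resp.status_code == 401


def test_create_url_success(client):
    resp = client.post(
        "/api/v1/urls",
        json={"original_url": "https://example.com"},
        headers={"X-API-Key": "test-key"},  # assumed auth header / seeded key
    )
    assert resp.status_code == 201  # assuming the endpoint returns 201 Created
    assert "short_code" in resp.json()
```

The dependency override plus a per-test in-memory engine is what makes "isolated database per test" cheap: no teardown scripts, no shared state between tests.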
✅ #PythonJourney | Day 152 — All API Endpoints Tested & Production Ready

Today: Comprehensive endpoint testing. The entire URL Shortener API is now fully operational!

Key accomplishments:

✅ Tested 4 critical endpoints:
• POST /api/v1/urls → Creates shortened URL with auto-generated short code
• GET /api/v1/urls → Returns user's URL list (ordered by newest first)
• GET /api/v1/urls/{url_id} → Retrieves specific URL details
• GET /{short_code} → Redirects to original URL + tracks click in database

✅ Fixed SQLAlchemy Click model:
• Issue: Composite primary key (id + clicked_at) prevented autoincrement
• Solution: Made id the sole primary key, clicked_at just a timestamp
• Result: Click tracking now works perfectly

✅ Verified full request/response cycle:
• Authentication: API key validation ✓
• Input validation: Pydantic models ✓
• Database operations: CRUD complete ✓
• Click tracking: Events recorded correctly ✓
• Response serialization: JSON output perfect ✓

✅ Data flow confirmed:
1. User creates URL → Stored in PostgreSQL
2. User accesses short code → Redirect happens
3. Click event → Recorded in clicks table
4. URL counter → Incremented automatically
5. JSON response → Properly formatted

What I learned today:
→ Comprehensive testing reveals edge cases early
→ SQLAlchemy's primary key behavior affects autoincrement
→ Docker image caching can hide recent code changes
→ Click tracking requires careful database schema design
→ Manual testing validates the entire architecture

The API is now:
- ✅ Accepting requests from multiple sources
- ✅ Storing data reliably in PostgreSQL
- ✅ Returning proper JSON responses
- ✅ Tracking user behavior
- ✅ Handling redirects correctly
- ✅ Managing database transactions safely

Endpoints remaining to test:
- GET /api/v1/urls/{url_id}/analytics (analytics aggregation)
- DELETE /api/v1/urls/{url_id} (soft delete)

Status: API Core is production-ready. Ready for comprehensive test suite (pytest) next.

This is what backend development looks like: build → test → debug → iterate → victory!

#Python #FastAPI #API #Testing #Backend #PostgreSQL #Docker #SoftwareDevelopment #StartupLife
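The Click model fix is easier to see in code. Below is a sketch of the before/after change described above; everything other than the id/clicked_at adjustment (table name aside) is an assumption about the real schema.

```python
# Sketch of the Click model fix; only the id/clicked_at change is from the post.
from datetime import datetime, timezone
from sqlalchemy import Column, DateTime, Integer
from sqlalchemy.orm import declarative_base

Base = declarative_base()

# Before (broken): a composite primary key (id + clicked_at) meant id was no
# longer a plain INTEGER PRIMARY KEY, so it stopped autoincrementing:
#
#     id = Column(Integer, primary_key=True)
#     clicked_at = Column(DateTime, primary_key=True)

class Click(Base):
    __tablename__ = "clicks"

    # After (fixed): id is the sole primary key, so autoincrement works again.
    id = Column(Integer, primary_key=True, autoincrement=True)

    # clicked_at is now just a timestamp, not part of the key.
    clicked_at = Column(
        DateTime, default=lambda: datetime.now(timezone.utc), nullable=False
    )

    # The FK back to the urls table is omitted here; the real model presumably
    # carries a url_id column for the click counter and analytics joins.
```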
💻 What “Structured Coding” Means in Claude

Claude is strongest when:
- the problem is clearly defined
- input → output is well structured
- you want clean, production-style code

👉 Think:
- APIs
- Backend logic
- Data processing
- Automation scripts

NOT:
- messy brainstorming code
- vague “build something cool” prompts

🧠 Why Claude is Good at This
- Follows instructions very strictly
- Writes clean, readable code
- Handles large codebases (huge context)

🔧 Example: Build a Flask API (Structured Backend)

📝 Prompt you give Claude:

Build a Flask API for user login with:
- JWT authentication
- Password hashing
- In-memory database
- Endpoints: /register, /login, /profile

💡 Claude Output (Typical Quality)

from flask import Flask, request, jsonify
import jwt, datetime
from werkzeug.security import generate_password_hash, check_password_hash

app = Flask(__name__)
app.config['SECRET_KEY'] = 'secret'
users = {}

@app.route('/register', methods=['POST'])
def register():
    data = request.json
    users[data['email']] = generate_password_hash(data['password'])
    return jsonify({"message": "User registered"})

@app.route('/login', methods=['POST'])
def login():
    data = request.json
    user = users.get(data['email'])
    if user and check_password_hash(user, data['password']):
        token = jwt.encode({
            'email': data['email'],
            'exp': datetime.datetime.utcnow() + datetime.timedelta(hours=1)
        }, app.config['SECRET_KEY'])
        return jsonify({"token": token})
    return jsonify({"error": "Invalid credentials"}), 401

Register → Store Hashed Password → Login → Verify → Generate Token → Access System

#Python #DataScientist #DataAnalyst #CS #IT #BCA #MCA
# NeoSQLite is now 3x faster than MongoDB on the same hardware

But the real story isn't just the numbers—it's how we got here.

## From Python Fallbacks to SQL-Native: A 12-Month Journey

When we started building NeoSQLite, we took a "get it working first" approach. Complex aggregation operations like `$in`, `$nin`, `$elemMatch`, and `$project` were handled by Python fallbacks—meaning we'd fetch ALL documents from SQLite, then filter them in Python. It worked, but it was slow.

Then we started dogfooding with **Neo-Bloggy** (our blogging platform that runs entirely on NeoSQLite instead of MongoDB). Production usage revealed the pain points real users would face.

## The SQL-Tier Revolution (v1.14.x series)

Over the last 6 releases, we systematically moved operations from Python into native SQL:

**v1.14.0** — Moved `$project` stage to SQL-tier (no more loading full documents just to project 2 fields)

**v1.14.9-10** — Fixed `$elemMatch` and `$in`/`$nin` on array fields. Instead of returning 0 results or unfiltered documents, they now use proper SQL CTE patterns with `json_each()`

**v1.14.11** — Added native regex operators (`$regexMatch`, `$regexFind`) directly in SQL tier using custom SQLite functions. Array operators got **10-100x speedup** with CTE patterns

**v1.14.12** — Fixed the "malformed JSON" edge case (because even SQLite has its quirks with `json_each()` syntax!)

## The NX-27017 Milestone

In v1.13.0, we shipped something unexpected—a **MongoDB Wire Protocol Server** that lets PyMongo connect directly to SQLite. No code changes needed. This isn't just an API clone; it's full wire protocol compatibility with 100% test parity against real MongoDB.

## What This Means

- **3x faster** than MongoDB for typical operations
- **30-300x faster** for index operations (SQLite's B-trees are fast)
- **Zero network overhead** — embedded database, embedded performance
- **Drop-in replacement** — existing PyMongo code works unchanged

## The Lesson

Building a database isn't about getting the API right. It's about getting the execution model right. Every time we pushed logic from Python down to SQL, we got closer to SQLite's raw performance while maintaining MongoDB's developer experience.

The 3x number isn't theoretical—it's measured against a real MongoDB instance in our CI pipeline, running 54 different operation categories across 10 iterations each.

Want to try it?

```bash
pip install neosqlite
```

Or check out the discussion: https://lnkd.in/gAdPAeCc
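The post doesn't show the generated SQL, but the general idea of pushing an array-field `$in` match down to SQLite with `json_each()` can be illustrated with the standard library alone. This is not NeoSQLite's actual output (and it uses a plain EXISTS subquery rather than the CTE form the release notes mention); it only shows why the SQL tier avoids loading every document into Python.

```python
# Rough illustration of translating {"tags": {"$in": [...]}} into SQLite JSON1.
# NOT NeoSQLite's generated SQL; requires an SQLite build with the JSON1 extension.
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, body TEXT)")
docs = [
    {"name": "a", "tags": ["python", "sqlite"]},
    {"name": "b", "tags": ["mongodb"]},
    {"name": "c", "tags": ["sqlite", "json"]},
]
conn.executemany(
    "INSERT INTO documents (body) VALUES (?)", [(json.dumps(d),) for d in docs]
)

# {"tags": {"$in": ["sqlite", "rust"]}} -> EXISTS over json_each on the array field
rows = conn.execute(
    """
    SELECT d.id, d.body
    FROM documents AS d
    WHERE EXISTS (
        SELECT 1
        FROM json_each(d.body, '$.tags') AS t
        WHERE t.value IN ('sqlite', 'rust')
    )
    """
).fetchall()
print([json.loads(body)["name"] for _, body in rows])  # ['a', 'c']
```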
🚀 pytest-capquery 0.3 is live!

This release was heavily focused on the Developer Experience (DX). We've officially introduced automated SQL snapshot testing, heavily inspired by the Jest framework. Instead of manually hardcoding and maintaining massive SQL strings in your Python tests, you can now seamlessly generate and validate physical .sql execution baselines with zero friction.

To dive deeper into the "why," I've just published a new article breaking down the reality of database performance in production. The post covers:
- 🚨 A painfully familiar SRE late-night "novel"
- 🏢 The cultural divide between Developers and DBAs
- 🛡️ Common architectural pitfalls (like the Python GC trap and the JOIN illusion)
- 💡 How pytest-capquery bridges the gap, complete with a Getting Started guide

You can read the full breakdown here: https://lnkd.in/dJzBQ8nV

If you care about engineering excellence, catching N+1 regressions in CI, and building robust backend systems, I invite you to check out the repository! Follow the project, drop a star, or open a PR. Together we can do more! 🤝

🔗 https://lnkd.in/d9EJgd8V

#Python #SQLAlchemy #Pytest #SRE #EngineeringExcellence #OpenSource #DatabasePerformance
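pytest-capquery's own API is in the linked repository; as a generic sketch of the underlying idea (capture the SQL a test actually emits, then assert on it to catch N+1 regressions), a plain SQLAlchemy event listener is enough. The fixture and test names below are made up for illustration.

```python
# Generic sketch of capturing emitted SQL in a test; not pytest-capquery's API.
import pytest
from sqlalchemy import create_engine, event, text


@pytest.fixture()
def capture_queries():
    engine = create_engine("sqlite://")
    statements = []

    # Record every statement the engine sends to the DBAPI cursor.
    @event.listens_for(engine, "before_cursor_execute")
    def _record(conn, cursor, statement, parameters, context, executemany):
        statements.append(statement)

    yield engine, statements


def test_listing_is_not_n_plus_one(capture_queries):
    engine, statements = capture_queries
    with engine.connect() as conn:
        conn.execute(text("CREATE TABLE users (id INTEGER PRIMARY KEY)"))
        conn.execute(text("SELECT * FROM users"))

    # The simplest N+1 guard: a hard cap on how many SELECTs the code path runs.
    selects = [s for s in statements if s.lstrip().upper().startswith("SELECT")]
    assert len(selects) == 1
```

Snapshot testing takes this one step further: instead of counting statements, the captured SQL is written to a .sql baseline file and diffed on later runs.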
⁉️ So, Claude Code's "source code" leaked, and it changed the post I was about to make, as I wanted to discuss this literally a few days ago.

First, though this potentially gives competitors a look behind the curtain, the meat of what makes Claude Code what it is are the MODELS powering it. That source code didn't leak, because there is no source code. Only data, lots of compute, and training.

What was actually interesting in this leak were the instructions guiding the model, which were fairly bare bones all things considered. Nothing that prompt engineers weren't already doing when setting up their own systems to get Claude, or any other model, to do whatever their wrapper's goal was.

The most interesting thing though was the Python file containing tools for how to use BASH. This was most likely written by Claude itself, which is almost certainly what Anthropic's CEO means when he says Claude is writing itself. Even that shouldn't have surprised anyone, and I'm genuinely shocked I haven't seen more people discussing it. I actually brought this up at SONODAY on Friday (when I should have posted about this).

If you are a heavy Claude Code user, familiar with command line tooling, and have a technical understanding of LLMs, what Claude Code was doing was fairly obvious. Every prompt triggered tools like grep to locate words and intents from your prompt. It picked out keywords from your prompt, checked them against its memory of your project (100% a markdown file it creates, as we know now, which is just language), then found the best terms to grep for, grabbed the context, combined it back with your original prompt, and output code that fit both what the project IS and what you WANT.

When I noticed this, I changed how I worked with Claude. My job as the manager was to guide it using language, because language is how it works. Specific phrases, words, and ideas, repeated consistently, because that is exactly what it was grepping for.

A real example: when building https://listen.sonoday.com, I had Claude name a component "The Stage" and add that in comments throughout the code. Whenever I wanted to work on it, I just typed "The Stage" into the prompt and we had a shared language. Other developers are attempting to describe what they want, ignoring helping themselves and Claude work in the future... instead of creating a shared language for the project. Once you understand how Claude Code actually works, you can direct it effectively!

So no, I am completely unsurprised that the leak showed heavy BASH and command line tooling. That was there if you were observant. What does surprise me is how many people seemingly missed it, or just don't want to talk about it, favoring narratives about how AI will totally replace us.

What do you think of the leak, and will it change how you use Claude, AI models, or anything else? 🤔 I should really write this type of long-form stuff on my personal website, don't you think?
Built a Spark pipeline for 3B rows. Took 4 days. Looked “fine” on paper. It wasn’t.

Here are the exact errors and signals that showed up, and what they were really trying to say:

1. "java.lang.OutOfMemoryError: Java heap space"
Translation: you tried to process too much data in one place.
Typical causes:
- Huge partitions
- pandas in the middle of a Spark pipeline
- reading large CSV shards into driver memory

2. "GC overhead limit exceeded"
Translation: JVM is spending all its time cleaning memory, not doing work.
Usually means:
- memory pressure is already critical
- you're close to a crash

3. Executor lost / container killed
Seen as:
- “ExecutorLostFailure”
- “Container killed by YARN / Kubernetes”
Translation:
- executor hit memory limit or got OOM-killed
- often caused by skew or massive shuffle

4. Driver crashes / notebook just dies
No clean error sometimes.
Translation:
- driver ran out of memory
- very common when using pandas ("read_csv") on large files

5. Slow jobs with low CPU usage
No explicit error. Just pain.
Translation:
- I/O bottleneck
- single-threaded processing hiding inside a “distributed” pipeline

6. Tiny files / too many stages / constant job triggers
Symptoms:
- many small writes
- frequent stage execution
Translation:
- poor batching strategy
- excessive overhead per operation

7. “Response too large” (BigQuery)
Translation:
- trying to pull data via API instead of exporting
- wrong data movement strategy

The real lesson

None of these are “Spark problems”. They’re architecture problems.
- Distributed system → forced through single node
- Parallel pipeline → serialized with pandas
- Columnar systems → converted to CSV

The fix (what actually works)
- Keep everything distributed end-to-end
- Use Parquet, not CSV
- Avoid pandas in big data paths
- Repartition aggressively
- Let Spark read directly from storage

One line summary

If your big data pipeline feels slow, somewhere in the middle, you probably turned it into a small data pipeline. And that’s where everything breaks.
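A compressed sketch of that fix list in PySpark follows; the bucket paths, column names, and partition counts are placeholders, not a tuned recipe, but the shape is the point: the read, the transformations, and the write all stay distributed.

```python
# Sketch of "keep it distributed end-to-end"; paths, columns, and counts are placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("distributed-end-to-end").getOrCreate()

# Let Spark read directly from storage, in a columnar format (Parquet, not CSV).
events = spark.read.parquet("gs://my-bucket/events/")

# Repartition before the heavy shuffle so no single partition blows up an executor.
events = events.repartition(2000, "customer_id")

# Transformations stay in Spark -- no .toPandas() in the middle of the pipeline.
daily = (
    events
    .withColumn("day", F.to_date("event_ts"))
    .groupBy("customer_id", "day")
    .agg(F.count("*").alias("events"), F.sum("amount").alias("total"))
)

# Write Parquet back out with a sensible file count instead of thousands of tiny files.
daily.coalesce(200).write.mode("overwrite").parquet("gs://my-bucket/daily_agg/")
```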
Built a cloud-native ETL pipeline that actually scales (9B rows written in 1hr) - here’s the architecture in a nutshell: → BigQuery for high-performance querying → GCS as a staging layer → Spark for distributed transformations → Snowflake as the final warehouse Key design principles: • Each system is used for what it’s best at — no forced compromises • No single-machine bottlenecks — everything is horizontally scalable • Columnar, strongly-typed formats end-to-end for efficiency • Clean authentication across both execution layers (Python driver + JVM executors) Result: a pipeline that handles billions of rows reliably without falling apart under load. This is what happens when architecture is intentional, not accidental. #DataEngineering #BigData #ETL #CloudArchitecture #Spark #Snowflake #BigQuery
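Purely as an illustration of the staging pattern (BigQuery export → GCS Parquet → Spark → Snowflake), the Spark leg can look roughly like the sketch below. None of this code is from the post: the bucket, table, and connection settings are placeholders, and it assumes the Snowflake Spark connector is on the classpath.

```python
# Illustration only; bucket names, table names, and credentials are placeholders.
#
# Step 1 (outside Spark): export the BigQuery result to GCS as Parquet, e.g.
#   bq extract --destination_format PARQUET 'proj:dataset.events' 'gs://stage-bucket/events/*.parquet'
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("bq-gcs-spark-snowflake").getOrCreate()

# Step 2: Spark reads the columnar, strongly-typed staging files directly from GCS.
df = spark.read.parquet("gs://stage-bucket/events/")

# Step 3: distributed transformations; nothing funnels through the driver.
clean = df.dropDuplicates(["event_id"]).withColumn("load_date", F.current_date())

# Step 4: write to Snowflake via the Spark connector (assumed available on the cluster).
sf_options = {
    "sfURL": "myaccount.snowflakecomputing.com",
    "sfUser": "LOADER",
    "sfPassword": "***",
    "sfDatabase": "ANALYTICS",
    "sfSchema": "RAW",
    "sfWarehouse": "LOAD_WH",
}
(
    clean.write
    .format("net.snowflake.spark.snowflake")
    .options(**sf_options)
    .option("dbtable", "EVENTS")
    .mode("append")
    .save()
)
```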
Everyone talks about MCP. Almost nobody actually builds one.

Here is how to build your first MCP server and connect it to Claude (~1 hour). Save this post. You will need it.

What is MCP in 1 sentence? It is USB-C for AI. One protocol, any tool. Build it once, and it works with Claude, ChatGPT, Cursor, and any other MCP-compatible client.

1️⃣ The Basics (5 min)
🔹 Install Claude Desktop.
🔹 Install Node.js 18+ or Python 3.10+.
🔹 Pick an SDK: use Python + FastMCP for your first server. It has the least boilerplate.

2️⃣ Create the project (5 min)

🔹 Bash
mkdir my-mcp-server
cd my-mcp-server
pip install fastmcp

Create main.py:

🔹 Python
from fastmcp import FastMCP

mcp = FastMCP("my-first-server")

@mcp.tool()
def add_numbers(a: int, b: int) -> int:
    """Add two numbers together."""
    return a + b

if __name__ == "__main__":
    mcp.run()

That is it. You have a functioning MCP server.

3️⃣ Connect to Claude (5 min)

Open your claude_desktop_config.json:
Mac: ~/Library/Application Support/Claude/
Windows: %APPDATA%\Claude\

Add this block:

🔹 JSON
{
  "mcpServers": {
    "my-first-server": {
      "command": "python",
      "args": ["/absolute/path/to/main.py"]
    }
  }
}

Restart Claude. Ask it: "add 5 and 7 using my tool."

4️⃣ Make it useful (30 min)

The calculator is a toy. Replace it with something real - wrap a GitHub API, connect to your local DB, or link your internal docs. The pattern is always:
🔹 Define a tool with @mcp.tool().
🔹 Write a clear description (Claude uses this to decide when to trigger it).
🔹 Return structured data.

Mistakes I made (so you don't have to):
🔹 Never use print() or console.log() in stdio servers. It breaks the protocol.
🔹 Use logging to stderr.
🔹 Descriptions matter more than code. Write docstrings for a human; that is how the AI understands the tool's purpose.
🔹 Validate inputs. Use Pydantic or Zod. Never trust what the AI sends you.
🔹 Restart is mandatory. Claude won't "hot-reload" your config changes.

The magic isn't in the protocol itself. It's the realization that you don't need a middleman SaaS for everything anymore. Your AI can now talk directly to your data and infrastructure on your own terms.

What tool would you build first?

#MCP #AI #Claude #Anthropic #SoftwareEngineering #DeveloperTools #AIAgents #Python #TechLeadership #BuildInPublic
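Putting step 4 and the "mistakes" list together, a slightly more useful tool than the calculator could look like the sketch below: stderr logging instead of print(), a docstring Claude can use for routing, Pydantic validation of inputs, and structured output. The GitHub lookup itself is a stub (a hypothetical example, not a real integration) that you would replace with your own API or database call.

```python
# Step-4 sketch: a FastMCP tool that follows the post's advice. The repo lookup
# is a stub; wire in your real API or database here.
import logging
import sys

from fastmcp import FastMCP
from pydantic import BaseModel, Field

# Never print() in a stdio server: log to stderr instead.
logging.basicConfig(stream=sys.stderr, level=logging.INFO)
log = logging.getLogger("my-first-server")

mcp = FastMCP("my-first-server")


class RepoQuery(BaseModel):
    owner: str = Field(min_length=1)
    name: str = Field(min_length=1)


@mcp.tool()
def describe_repo(owner: str, name: str) -> dict:
    """Return basic metadata about a GitHub repository given its owner and name."""
    query = RepoQuery(owner=owner, name=name)  # never trust what the AI sends you
    log.info("describe_repo called for %s/%s", query.owner, query.name)
    # Stub result; replace with a real GitHub API call in your own server.
    return {"full_name": f"{query.owner}/{query.name}", "stars": 0, "topics": []}


if __name__ == "__main__":
    mcp.run()
```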
When we launched Data Legion, the API was the only interface. A few months later there are four: REST API, Python and Node.js SDKs, an MCP server, and now a CLI. Each layer exists because a different consumer needed a different way in. SDKs for application code. MCP for AI assistants like Claude and ChatGPT. The CLI for AI coding agents and shell scripts. The interesting part: we built the CLI last, after watching how agents actually interact with data tools. Here's why we built each layer and what we learned. https://lnkd.in/gM2KjGzR #B2BData #AIAgents #DeveloperTools #MCPServer #APIDesign
I've been deep in AI agent development for the past few months, and one thing kept catching my attention: the way agents interact with CLIs versus MCP servers is very different under the hood.

Watching Claude Code move through a CLI was almost elegant — fast, lightweight, barely touching the context window. But when we swapped in an MCP server for the same work, context usage jumped noticeably. That gap matters when you're building agents meant to stay efficient across long tasks.

Then I noticed something else: Claude doesn't just use CLIs — it actually reads --help text and operates through stdin to figure out what commands to run and what output to expect. It's essentially learning the tool on the fly from the interface itself. That was a lightbulb moment for me.

So I thought — what if I built a CLI specifically designed for an agent to use? At Data Legion we have an API for company and person enrichment/search, and I wanted to see if I could just talk to Claude in plain English and have it enrich and search people and companies on my behalf. No custom integrations, no hand-holding.

It worked. Claude authenticated to our API, figured out the right calls to make, and came back with exactly the people I was looking for — all from a plain English ask.

Building this taught me a lot about what makes a CLI agent-friendly, how Claude reasons through unfamiliar tooling, and where that pattern beats MCP for certain use cases.

For more information: https://lnkd.in/grAD4e86
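The post doesn't include code, but the "agent-friendly CLI" idea can be sketched in a few lines: thorough --help text the agent can read, and a single machine-readable JSON object on stdout it can parse. The command and field names below are invented for illustration; this is not Data Legion's actual CLI.

```python
# Sketch of an "agent-friendly" CLI: rich --help text plus JSON-only stdout.
# Command and field names are hypothetical, not Data Legion's real tool.
import argparse
import json
import sys


def main() -> int:
    parser = argparse.ArgumentParser(
        prog="dl",
        description="Enrich and search companies and people. "
                    "All output is a single JSON object on stdout.",
    )
    sub = parser.add_subparsers(dest="command", required=True)

    enrich = sub.add_parser("enrich-company", help="Enrich one company by domain.")
    enrich.add_argument("--domain", required=True,
                        help="Company website domain, e.g. acme.com")

    args = parser.parse_args()
    if args.command == "enrich-company":
        # Stub response; a real implementation would call the enrichment API here.
        result = {"domain": args.domain, "name": None, "employees": None}
        json.dump(result, sys.stdout)
        print()
        return 0
    return 1


if __name__ == "__main__":
    sys.exit(main())
```

An agent can run `dl --help`, then `dl enrich-company --help`, and know exactly what to call and what shape of output to expect — the same self-description that makes the CLI cheap on context.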