Anton Rodzevich’s Post

I’ve been building a side project: a web-based combat tracker for a custom TTRPG. You can check out the repo here: https://lnkd.in/dZrM-mhe.

I ran the full delivery loop, requirements through tests, while tightening agentic pipelines so they could run on trial-tier models and still land close to what I'd get from heavier ones. The bet was that clearer prompts and smaller scopes would do more than burning tokens, and that's where most of the learning actually happened.

On the app itself: I drafted and refined requirements and scope in markdown in the repo (requirements-done, backlog notes) so changes could be checked against written intent. I used those pipelines to turn ideas into small, agent-ready stories. For design, Stitch let me iterate on layout and tone early; screens were then built as Flask templates and static assets so they still matched real routes, forms, and Socket.IO events.

The stack is Flask + SQLAlchemy + SQLite, with Socket.IO for live updates. I added pytest where it helped, plus browser automation only where it paid off, and a one-command DB init so a fresh clone isn’t blocked on missing tables. The Python backend is mine line by line, with AI used in a teaching / review mode rather than "write the app for me" mode, which for me beat a generic paid course.

This isn't evidence that agents replace engineers. It's one more example of using AI as leverage on a loop you still own. If you're trying something similar, the README and branch layout are meant to read without insider context; you're welcome to reuse the Skills in the repo if they help. If you’re using Cursor or similar tools, the practical suggestion is the same: treat AI as leverage on that loop, not as a substitute for thinking.

#Python #Flask #Cursor #AgenticAI #OpenSource #TTRPG
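As a concrete illustration of the "one-command DB init" idea, here is a minimal sketch using Flask's CLI and Flask-SQLAlchemy. The names (create_app, tracker.db, init-db) are assumptions for illustration, not necessarily what the repo does:

```python
# Minimal sketch of a one-command DB init via Flask's CLI.
# Names here are illustrative, not the repo's actual code.
import click
from flask import Flask
from flask_sqlalchemy import SQLAlchemy

db = SQLAlchemy()

def create_app() -> Flask:
    app = Flask(__name__)
    app.config["SQLALCHEMY_DATABASE_URI"] = "sqlite:///tracker.db"
    db.init_app(app)

    @app.cli.command("init-db")
    def init_db() -> None:
        """Create every SQLAlchemy table so a fresh clone can boot immediately."""
        db.create_all()
        click.echo("Database initialized.")

    return app
```

A fresh clone would then run `flask --app app init-db` once before starting the server.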
More Relevant Posts
My AI agent spent 20 minutes debugging the wrong file. I only know because I built the thing that caught it.

A few weeks ago I built agent-replay-debugger - a CLI that turns agent session traces into interactive timelines. v1 was basically a fancy log viewer. It told you what happened (the agent read 40 files) but not why it read 40 files when it only needed 3.

So I added --analyze. One flag, and every reasoning block gets classified by an LLM: is the agent planning? Investigating? Implementing? Or - my personal favorite - is it backtracking because it just realized it's been editing the wrong file for the last 15 minutes?

On a real 2-hour session with 600+ events, I got exactly 2 red flags. Those 2 flags were worth more than the other 598 events combined. Total cost of running the analysis: 2 cents.

What else is new: the viewer used to show one flat blob per session. Now each user message creates its own span - a 2-hour session becomes 33 clickable nodes in the DAG, each showing how long the agent spent and how many tool calls it made. You can instantly see that "PR 1" took 2 hours and 83 tool calls while "list issues" took 10 seconds and 1 call.

Also shipped a pick command because I got tired of copy-pasting UUIDs:

ard view $(ard pick chore-champions)

Still zero runtime dependencies. Still pure Python stdlib. 188 tests, 100% coverage enforced in CI. The --analyze flag talks to the Anthropic API using urllib - no SDK needed.

Live demo (real session, LLM-annotated, all secrets auto-scrubbed): https://lnkd.in/gRFB7uWf
Code: https://lnkd.in/gPTUt4ue

#buildInPublic #AIAgents #LLM #Python #OpenSource #DevTools #AIEngineering
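For anyone curious how far "pure stdlib" stretches, here is a rough sketch of an SDK-free Anthropic call with urllib. It is an assumption about the shape of --analyze's internals, not the tool's actual code; the model name and classification prompt are placeholders:

```python
# Sketch: calling the Anthropic Messages API with nothing but the stdlib.
# Prompt and model name are illustrative, not the tool's actual values.
import json
import os
import urllib.request

def classify_reasoning_block(text: str) -> str:
    payload = {
        "model": "claude-3-5-haiku-latest",
        "max_tokens": 50,
        "messages": [{
            "role": "user",
            "content": "Classify this agent reasoning block as one of: "
                       f"planning, investigating, implementing, backtracking.\n\n{text}",
        }],
    }
    req = urllib.request.Request(
        "https://api.anthropic.com/v1/messages",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "x-api-key": os.environ["ANTHROPIC_API_KEY"],
            "anthropic-version": "2023-06-01",
            "content-type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["content"][0]["text"]
```

The only real requirements are the x-api-key and anthropic-version headers; everything else is plain JSON over HTTPS.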
I added memory to my chatbot — and what I expected to be a minor upgrade turned into a bit of a "wait, this is actually cool" moment.

Version 1 was straightforward: ask a question, get an answer, start fresh next time. Useful, but cold. Like a vending machine that also talks.

Version 2 remembers you. Not in some fancy way — it's literally reading and writing to a .txt file. But because LangChain feeds that history back into the model at every turn, the conversation has continuity. You can say "remember what I told you earlier?" and it actually can.

Building it made me realize: memory isn't just a feature. It's the thing that makes an AI feel like it's actually *with* you instead of just responding *to* you.

The stack is still simple — Python, LangChain, Ollama running LLaMA 3.2 locally. No external APIs, no data leaving my machine.

Where I want to take it:
— Smarter memory with a vector database
— Distinguishing between what to remember long-term vs short-term
— A proper UI so it doesn't live only in a terminal

It's still early. But it's starting to feel like I'm building something, not just tinkering.

Code link is in the comments. 👇

#AI #MachineLearning #Python #LangChain #Chatbot #BuildInPublic #GenAI
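The core loop is simple enough to sketch. A minimal version of the pattern described, assuming the langchain-ollama package; the file name and prompt framing are illustrative, and the actual project may differ:

```python
# Sketch of file-backed memory fed into a local model on every turn.
# File name and prompt format are illustrative, not the project's code.
from pathlib import Path
from langchain_ollama import ChatOllama

MEMORY_FILE = Path("memory.txt")
llm = ChatOllama(model="llama3.2")  # local via Ollama, no external APIs

def chat(user_input: str) -> str:
    history = MEMORY_FILE.read_text() if MEMORY_FILE.exists() else ""
    # The conversation so far is injected into the prompt each turn,
    # which is what gives the chatbot continuity.
    prompt = f"{history}\nUser: {user_input}\nAssistant:"
    reply = llm.invoke(prompt).content
    with MEMORY_FILE.open("a") as f:
        f.write(f"\nUser: {user_input}\nAssistant: {reply}")
    return reply
```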
Everyone’s building AI agents. Almost nobody has a good way to debug why they fail.

You run a workflow, get a bad output, tweak a prompt, try again, and hope for the best. That loop gets messy fast. So I built the fix. 🛠️

EvalForge is a lightweight tool for testing and debugging AI agent workflows. Load a benchmark, run a planner → worker → reviewer pipeline, and instantly see which tasks passed, which were flagged, and where things broke.

Why I built it: a lot of agent demos look impressive when they work once. What's much harder, and far more useful, is understanding why they fail consistently.

With EvalForge, you can:
✅ Upload a benchmark or load sample tasks
✅ Run an eval flow in seconds
✅ Filter passed vs flagged tasks
✅ Inspect stage-by-stage traces
✅ Compare outcomes and spot failure patterns fast

Built for the OpenAI X Handshake Codex Creator Challenge, and honestly, this was such a fun reminder that AI can speed up building, but product thinking, iteration, and scoping still matter just as much.

Stack: Next.js · TypeScript · FastAPI · Python · Render · Vercel

You can try EvalForge here: https://lnkd.in/gcK_8unR

And if you check it out, comment what I should improve or change to make it even better for the challenge requirements 👇

#OpenAI #Codex #Handshake #AIShowcase #BuildInPublic #AIAgents #SoftwareEngineering #FullStackDevelopment #Nextjs #FastAPI #StudentBuilder #TechProjects
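To make planner → worker → reviewer concrete, here is a toy version of that flow in plain Python. The stage logic is stubbed out (EvalForge presumably calls an LLM at each stage), but the per-stage trace and the passed/flagged split are the shape being described:

```python
# Toy sketch of a planner -> worker -> reviewer eval flow.
# Stage bodies are stubs; the real tool calls LLMs at each stage.
from dataclasses import dataclass, field

@dataclass
class TaskResult:
    task: str
    trace: list = field(default_factory=list)  # one entry per pipeline stage
    passed: bool = False

def run_eval(task: str) -> TaskResult:
    r = TaskResult(task)
    plan = f"plan: break '{task}' into steps"      # planner stage (stub)
    r.trace.append(("planner", plan))
    output = f"draft answer for '{task}'"          # worker stage (stub)
    r.trace.append(("worker", output))
    r.passed = bool(output.strip())                # reviewer stage (stub)
    r.trace.append(("reviewer", "passed" if r.passed else "flagged"))
    return r

results = [run_eval(t) for t in ["summarize report", "extract dates"]]
flagged = [r for r in results if not r.passed]     # "filter passed vs flagged"
```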
We all know git blame — but it was missing one thing: a sense of humor.

Introducing gitblame-ai 🔥

It scans your repo, identifies the most "interesting" (read: suspicious) lines of code using custom heuristics, and sends them to Claude AI for a brutally honest critique.

🚢 Features:
Roast Mode: Savage senior engineer critiques.
Corporate Mode: Passive-aggressive "circling back" on your variables.
Pirate Mode: "Arrr, this indentation be shallower than a coral reef!"
Smarter Scanning: Automatically respects your .gitignore.

Perfect for team standups, Friday afternoon vibes, or just holding your friends accountable in the funniest way possible.

🚀 Try it now: pip install gitblame-ai
⭐ Star it on GitHub: https://lnkd.in/dtww5zyG

#Python #OpenSource #AI #Git #ClaudeAI #DeveloperTools #BuildInPublic
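For the curious, one plausible way to implement a .gitignore-respecting scan is the pathspec library, which implements git's wildmatch rules. This is a guess at the approach, not gitblame-ai's actual code:

```python
# A guess at how a .gitignore-respecting scan could work, using pathspec.
# Not gitblame-ai's actual implementation.
from pathlib import Path
import pathspec

spec = pathspec.PathSpec.from_lines(
    "gitwildmatch", Path(".gitignore").read_text().splitlines()
)
candidates = [
    p for p in Path(".").rglob("*.py")
    if not spec.match_file(str(p))  # skip anything git would ignore
]
```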
📣 We ran chunk_size=300 on the same document across three frameworks.

SynapseKit: 12 chunks. LangChain: 12 chunks. LlamaIndex: 2 chunks.

Same parameter. Same document. A sixfold difference in output. Zero error messages.

Here's what's happening: LlamaIndex's SentenceSplitter interprets chunk_size as tokens, not characters. chunk_size=300 means 300 tokens, roughly 1,200 characters. On a 1,972-character document that gives you 2 chunks averaging 986 characters each instead of the 12 chunks averaging 163 characters you'd expect.

This is documented behavior. It is also the most common source of confusion when engineers copy parameters from a LangChain tutorial into LlamaIndex. Same parameter name. Completely different semantics. Your retrieval granularity changes drastically and nothing tells you why.

The rule: never copy chunk parameters across frameworks without checking the unit. chunk_size=300 means...

SynapseKit → 300 characters → 12 chunks
LangChain → 300 characters → 12 chunks
LlamaIndex → 300 tokens (~1,200 chars) → 2 chunks

⚠ A few other things worth knowing from this benchmark:

LangChain ships 8 built-in splitters. LlamaIndex ships 9. SynapseKit ships 2. But two of LlamaIndex's splitters — SentenceWindowNodeParser and HierarchicalNodeParser — have no equivalent in the other frameworks and solve real production problems that the others don't address at all.

LangChain's standalone splitter API is the most debuggable. You can inspect chunks before indexing. SynapseKit's chunking is opaque — parameters live on the Retriever and you can't see the split before it's indexed.

Chunking is not configuration. It's architecture. The split you choose affects embedding quality, retrieval precision, and whether your LLM gets enough context. The tutorials that sprint past it in two lines are the same tutorials whose RAG demos fall apart on real documents.

Full benchmark + reproducible Kaggle notebook → engineersofai.com

#Python #AI #LLM #RAG #MLEngineering #OpenSource #AIEngineering #EngineersOfAI #SynapseKit
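The mismatch is easy to reproduce with the two public frameworks. A minimal sketch, assuming langchain-text-splitters and llama-index are installed; the sample text is a stand-in for the 1,972-character benchmark document, so exact chunk counts will differ slightly:

```python
# Minimal repro of the unit mismatch described above.
from langchain_text_splitters import RecursiveCharacterTextSplitter
from llama_index.core.node_parser import SentenceSplitter

# Stand-in for the ~1,972-character benchmark document.
text = "Retrieval quality depends on how you chunk your documents. " * 33

# LangChain: chunk_size counts CHARACTERS.
lc_chunks = RecursiveCharacterTextSplitter(
    chunk_size=300, chunk_overlap=0
).split_text(text)

# LlamaIndex: the same parameter counts TOKENS (roughly 4 characters each).
li_chunks = SentenceSplitter(chunk_size=300, chunk_overlap=0).split_text(text)

print(len(lc_chunks), len(li_chunks))  # many small chunks vs. a few large ones
```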
🚀 Built Something Useful for Every Claude Developer

While working with Claude Code, I realized one big gap — there’s no clear visibility into usage, tokens, or costs. So I built a solution 👇

🔗 https://lnkd.in/g7kCBnCn

💡 Claude Usage Dashboard
A lightweight, local-first tool to track, analyze, and optimize your Claude usage in real-time.

✨ What it does:
• Tracks token usage across sessions
• Estimates API costs
• Provides a clean dashboard + CLI insights
• Detects anomalies & suggests optimizations
• Includes a budget guard (yes, it can even stop overspending)

⚡ Best part: No setup headache. No dependencies. Just run it with Python.

🧠 Why I built this: When you're building with LLMs, visibility = control. This tool gives you exactly that.

If you're working with Claude or exploring AI tools, this might help you 👇

Would love your feedback, ideas, or contributions 🙌

#AI #LLM #Claude #OpenSource #Developers #Python #BuildInPublic #GitHub
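The cost-estimation idea reduces to multiplying token counts by per-million-token prices. A hypothetical sketch (the prices and model keys below are illustrative placeholders, not the tool's actual tables or current Anthropic pricing):

```python
# Hypothetical cost estimator: tokens times per-million-token prices.
# Prices and model keys are illustrative, not current Anthropic pricing.
PRICES_PER_MTOK = {  # (input_usd, output_usd) per million tokens
    "claude-sonnet": (3.00, 15.00),
    "claude-haiku": (0.80, 4.00),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    in_price, out_price = PRICES_PER_MTOK[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

print(estimate_cost("claude-haiku", 120_000, 8_000))  # rough session cost in USD
```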
I spent one afternoon building an AI agent from scratch. Why? Because I wanted to understand the Agent Client Protocol (ACP).

If you haven't looked at it yet, think of it as HTTP, but for AI agents. It allows any agent to talk to any client (IDE, terminal, script) using a universal language.

🛠️ The Project: A SchemaCheck Agent
I built an agent that validates data files in real-time.
→ Mixed types in JSON? Caught.
→ Inconsistent CSV columns? Caught.
→ Missing fields or nulls? All caught.

The biggest surprise? The AI wasn't the hard part. It was understanding the protocol.

The "Lightbulb" Moment: I swapped Gemini CLI for the GitHub Copilot CLI as my ACP server. It only took two lines of code to switch the backend. That is the power of a standard.

I’ve open-sourced the project on GitHub. Feel free to clone it, poke around, or contribute: https://lnkd.in/gpcmvYqH

#AI #AgentClientProtocol #BuildInPublic #Python #GenerativeAI #Coding
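Here is a hypothetical illustration of why an ACP backend swap can be a two-line change: the client just spawns a different process and speaks the same JSON-RPC-over-stdio protocol to it. The command-line flags below are approximations, not verified invocations:

```python
# Illustrative only: swapping ACP backends by changing the spawned command.
# The exact CLI flags are assumptions, not verified invocations.
import subprocess

# AGENT_CMD = ["gemini", "--experimental-acp"]   # backend #1
AGENT_CMD = ["copilot", "--acp"]                 # backend #2: the only change

# ACP agents speak JSON-RPC over stdio, so the client pipes the same
# initialize/prompt messages to whichever process it spawned.
proc = subprocess.Popen(AGENT_CMD, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
```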
🛠️ Giving LLMs Hands and Feet: Mastering #LangChain #Tools & #Agents

An #LLM on its own is a brilliant thinker, but it’s "locked in a room" with no way to touch the real world. It can tell you how to book a flight, but it can't actually book it—unless you give it Tools.

In my latest guide, I dive deep into #LangChain #Tools — the bridge between reasoning and action.

The "Tooling" Hierarchy:
🔌 #BuiltInTools: Ready-made connectors for Google Search, Wikipedia, and the Python REPL.
🐍 #CustomTools (@tool): Turning any Python function into an LLM-callable action with just one decorator.
✅ #StructuredTools (Pydantic): Production-grade tools with strict schema validation for complex APIs.
🧳 #Toolkits: Grouping related actions (like a "Google Drive Toolkit") for modular agent design.

The Secret Sauce: Tool Binding & Calling
The magic isn't just in the tool itself; it's in the Reasoning Loop. The LLM decides which tool to use and what arguments to send. As developers, we execute that call and feed the result back, creating a loop of autonomous intelligence.

Are you building passive chatbots or active agents? Let's discuss the future of AI agency below! 👇

#GenerativeAI #LangChain #GenerativeAIUsingLangChain #AIAgents #Python #LLMOps #SoftwareEngineering #Automation #Pydantic #Innovation #Tools #ToolsCreation #ToolsBinding #ToolsCalling #ToolsExecution
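A minimal sketch of the @tool decorator and tool binding flow. The model is a placeholder (any chat model with tool-calling support works), and book_flight is a toy example:

```python
# Sketch of @tool plus tool binding. Model choice and tool are illustrative.
from langchain_core.tools import tool
from langchain_ollama import ChatOllama

@tool
def book_flight(origin: str, destination: str) -> str:
    """Book a flight between two airports."""  # docstring becomes the tool description
    return f"Booked {origin} -> {destination}"

llm = ChatOllama(model="llama3.2").bind_tools([book_flight])
msg = llm.invoke("Book me a flight from JFK to SFO")

# The model executes nothing; it returns the call it wants made.
# The developer runs it and feeds the result back, closing the reasoning loop.
for call in msg.tool_calls:
    print(call["name"], call["args"])
```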
Excited to share my latest project: a Dynamic Wumpus Logic Agent! 🚀🕵️‍♂️

I just wrapped up building a fully autonomous, knowledge-based web agent that navigates a randomized Wumpus World. But here is the catch: it doesn't use basic pathfinding or standard collision detection. It relies entirely on pure propositional logic to mathematically deduce safe paths!

Built with a Python (Flask) backend and a reactive JavaScript UI, the agent senses its environment and proves its next move is safe before taking a single step. Building the AI from scratch was an incredible learning experience, especially tackling these two major hurdles:

🧠 1. The Logic Parser & CNF Translation
The physical world is messy, but logic requires strict formatting. When the agent feels a 'Breeze', it's intuitive for a human to know a Pit is nearby. But teaching a parser to dynamically read the grid, translate a rule like Breeze <=> (Pit1 OR Pit2) into clean, machine-readable Conjunctive Normal Form (CNF), and maintain that state in a growing Knowledge Base was a massive architectural puzzle.

⚙️ 2. The Resolution Refutation Engine
To prove a cell is safe, the agent uses proof by contradiction. Writing the resolution loop was tricky — standard theorem provers can easily get trapped in infinite loops or suffer from exponential memory bloat as clauses combine. By implementing a Set of Support strategy, I was able to optimize the algorithm to focus strictly on the target coordinates, allowing the backend to find contradictions and resolve safety checks in milliseconds.

Check out the project in action! I would love to hear from other devs — have you ever tried building a custom inference engine or theorem prover? What optimization strategies did you use? Let me know below! 👇

#ArtificialIntelligence #Python #Flask #WebDevelopment #SoftwareEngineering #LogicProgramming #AI #TechProjects

🔗 Check out the code repository here: https://lnkd.in/dgxRgFRw
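For readers who haven't built one: the biconditional above compiles to three CNF clauses, (-Breeze OR Pit1 OR Pit2), (Breeze OR -Pit1), and (Breeze OR -Pit2), and "proving a cell safe" means deriving the empty clause after asserting the negated query. A compact sketch of such an engine, illustrative rather than the repo's actual code:

```python
# Compact propositional resolution by refutation with a set-of-support
# restriction. Illustrative, not the repo's actual implementation.
def negate(lit: str) -> str:
    return lit[1:] if lit.startswith("-") else "-" + lit

def resolve(c1: frozenset, c2: frozenset) -> list:
    """All resolvents of two clauses (clauses are frozensets of literals)."""
    return [(c1 - {lit}) | (c2 - {negate(lit)}) for lit in c1 if negate(lit) in c2]

def entails(kb: set, query: str) -> bool:
    """Prove KB |= query by refutation: assume -query, hunt for the empty clause.
    Set of support: every step involves a clause descended from the negated
    query, which keeps the search aimed at the target cell."""
    sos = {frozenset([negate(query)])}
    seen = set(sos)
    while sos:
        clause = sos.pop()
        for other in kb | seen:
            for resolvent in resolve(clause, other):
                if not resolvent:          # empty clause: contradiction found
                    return True
                if resolvent not in seen:
                    seen.add(resolvent)
                    sos.add(resolvent)
    return False

# Breeze <=> (Pit1 OR Pit2) in CNF, plus "no breeze felt here":
kb = {
    frozenset(["-Breeze", "Pit1", "Pit2"]),
    frozenset(["Breeze", "-Pit1"]),
    frozenset(["Breeze", "-Pit2"]),
    frozenset(["-Breeze"]),
}
print(entails(kb, "-Pit1"))  # True: no breeze proves the neighbor is pit-free
```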
After implementing it in production, here's what I learned about AI-assisted test generation.

I built a workflow using GitHub Copilot that generates Python Selenium test scripts with 95% accuracy. Here's exactly how it works and what I learned.

The setup:
Test cases exported from our test management tool as structured JSON
Each JSON contains: test ID, steps, expected results, and requirements
A carefully engineered prompt with framework context injected

What the LLM does:
Reads the JSON input
Searches existing page objects for matching methods
Calls existing methods where found
Creates skeleton methods where not found
Generates complete test scripts ready for review

3 things that made the difference:
1️⃣ Structured input beats free text — JSON gave 95% accuracy. Free text gave 60%.
2️⃣ Context injection is everything — giving the LLM your actual page objects and utilities means it generates integrated code, not generic boilerplate.
3️⃣ Prompt versioning is as important as code versioning — every new GPT model update degraded output until I retuned the prompt. Treat your prompts like code.

The result: test authoring time reduced by 60%. SDETs now review and add locators — the heavy lifting is done.

This is not AI replacing testers. This is AI doing the repetitive work so testers focus on what matters — judgment, coverage strategy, and edge cases. AI works best with structure, context, and human oversight. Not a replacement for engineering thinking — a force multiplier for it.

What has your experience been with AI-assisted test generation? Drop a comment 👇

#TestAutomation #AITesting #Playwright #Python #SDET #GenAI #QualityEngineering #Automation
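A hypothetical sketch of what the inputs might look like: one structured test case plus injected framework context assembled into a single prompt. Field names mirror the post's description; the real prompt is presumably far more detailed:

```python
# Hypothetical sketch of the workflow's inputs. Field names and the page
# object path are illustrative, not the team's actual schema.
import json
from pathlib import Path

test_case = {
    "test_id": "TC-1042",
    "requirements": ["REQ-88"],
    "steps": ["Open login page", "Enter valid credentials", "Click Sign In"],
    "expected_results": ["Dashboard is displayed"],
}

# "Context injection": paste the real page objects into the prompt so the
# model reuses existing methods instead of generating generic boilerplate.
page_objects = Path("pages/login_page.py").read_text()  # hypothetical path

prompt = (
    "You generate Python Selenium tests for our framework.\n"
    "Reuse methods from the page objects below where they match a step; "
    "create skeleton methods where none match.\n\n"
    f"### Page objects\n{page_objects}\n\n"
    f"### Test case\n{json.dumps(test_case, indent=2)}"
)
```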