Everyone's generating code with LLMs. Almost nobody is systematically checking it.

I ran into this while translating Python to C++. The translation part was easy. Knowing whether the output was correct? That was the real problem. I haven't written C++ for a while, so I might not even recognize a wrong answer.

So instead of hoping the output is correct, I added a verification layer: 3 agents, each with a different job.

→ Agent 1 (Gemini 2.5 Flash): translates Python to C++
→ Agent 2 (GPT-5 Mini): reads the original Python and generates test expectations
→ Agent 3 (GPT-5 Mini): evaluates the C++ against those expectations and flags issues

If the evaluation fails, the issues get fed back to the translator. It retries (up to 3 rounds) until it passes or returns its best effort. (There's a rough sketch of this loop at the end of the post.)

One deliberate choice: the translator and the evaluator use different LLMs. If the same model translates and evaluates, it tends to confirm its own mistakes. Gemini translates, GPT evaluates. A genuine second opinion.

The whole verification is static analysis. No compiler, no execution. The evaluator reads the C++ and reasons about correctness. For deterministic math code, this works surprisingly well. For anything more complex, the lack of execution is a real gap, and the obvious next step.

It started as a course exercise where you translate Python to C++ and manually compile to verify. I wanted to automate the part where you stare at the output and hope.

Repo: https://lnkd.in/g3tWFUPZ
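In case the control flow is easier to see as code, here's a minimal sketch of that loop. It is not the repo's implementation: call_llm, the prompts, and the function names are placeholders; only the model choices and the 3-round retry come from the post above.

```python
# Minimal sketch of the translate -> evaluate -> retry loop (not the repo's code).
# call_llm() is a stand-in for whatever Gemini/OpenAI client you actually use.

MAX_ROUNDS = 3

def call_llm(model: str, prompt: str) -> str:
    """Placeholder: send the prompt to the named model and return its text reply."""
    raise NotImplementedError("wire up your own LLM client here")

def translate(python_src: str, feedback: str = "") -> str:
    # Agent 1: translator
    prompt = f"Translate this Python to C++:\n{python_src}\n"
    if feedback:
        prompt += f"Fix these issues from the previous attempt:\n{feedback}"
    return call_llm("gemini-2.5-flash", prompt)

def expectations(python_src: str) -> str:
    # Agent 2: reads the original Python, writes test expectations
    return call_llm("gpt-5-mini", f"Read this Python and list input/output expectations:\n{python_src}")

def evaluate(cpp_src: str, expected: str) -> tuple[bool, str]:
    # Agent 3: static evaluation only -- no compiler, no execution
    verdict = call_llm("gpt-5-mini",
                       "Does this C++ meet these expectations? Reply PASS or list issues.\n"
                       f"Expectations:\n{expected}\nC++:\n{cpp_src}")
    return verdict.strip().startswith("PASS"), verdict

def translate_with_verification(python_src: str) -> str:
    expected = expectations(python_src)
    cpp, feedback = "", ""
    for _ in range(MAX_ROUNDS):
        cpp = translate(python_src, feedback)
        ok, feedback = evaluate(cpp, expected)   # evaluator issues become translator feedback
        if ok:
            return cpp
    return cpp   # best effort after 3 rounds
```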
"If you are an experienced software engineer, you can learn Python in a few hours." Don't believe it! After 10+ years if not 20+ of writing Java, I’ve spent the last year diving deep into Python. Sure, I could write a for loop in an hour, but writing truly idiomatic, type-safe Python? That is a different journey entirely. We are still in a transition phase where we have to review code carefully, especially the vibe code, and the "simple" way isn't always the "right" way. Mastering the nuances of the type system is what separates a script from a production-grade system. Take a look at this evolution of a simple intent label as an example(a real story from the work): The "Just-do-it" approach (Generic): label: str = Field(description="Must be one of: fully_understand, partial_understand, or not_understand") The Problem: The LLM might "hallucinate" and send "mostly_understand" or just "understand". Your code won't catch it until it's too late. The "Pythonic Master" approach (Strict): label: Literal["fully_understand", "partial_understand", "not_understand"] = Field(description="intent understanding label") This uses Constrained Decoding. It doesn’t just "suggest" a value to an LLM; it mathematically restricts the output. It turns a runtime guessing game into a compile-time guarantee. This is one common task while building AI Agent: turn non-deterministic to deterministic. Syntax is easy. Semantics and type-safety are where the real work happens. Never stop learning, respect the complexity of the craft. Aim for the masterpiece! #SoftwareEngineering #Python #Java #VibeCoding #LLMs #TypeSafety #Pythonic #Agent #AIAgent
I built a library. ~900 downloads in one month. No marketing. No funding. Just <300 lines of Python.

Here's what I learned building 𝗔𝗴𝗲𝗻𝘁𝗞𝘂𝗯𝗲-𝗠𝗶𝗻𝗶:

Most people think agent orchestration is magic. It's not. It's a task list that knows which tasks depend on which. That's it.

I was tired of reading agent framework docs that hid everything behind abstractions. You use it, it works, but you have no idea why. So I built the smallest possible version that actually ships. 300 lines. Zero dependencies. Open source.

It does four things:
- Defines agents and their dependencies as a DAG
- Runs independent tasks in parallel automatically
- Emits events at every step so you can see exactly what's happening
- Shares memory so downstream agents use upstream outputs

That's the whole engine. No magic. Just graph traversal and a scheduler. (There's a minimal sketch of the idea at the end of this post.)

The moment it clicked for me was when engineers started using it as a teaching tool, not just a production tool.

"𝗜 𝗳𝗶𝗻𝗮𝗹𝗹𝘆 𝘂𝗻𝗱𝗲𝗿𝘀𝘁𝗮𝗻𝗱 𝘄𝗵𝗮𝘁 𝗟𝗮𝗻𝗴𝗚𝗿𝗮𝗽𝗵 𝗶𝘀 𝗱𝗼𝗶𝗻𝗴 𝘂𝗻𝗱𝗲𝗿 𝘁𝗵𝗲 𝗵𝗼𝗼𝗱."

That's the comment I keep seeing.

AgentKube-Mini is not trying to beat LangGraph. Use LangGraph when you need tool loops, human-in-the-loop, or state persistence. It's genuinely better for that. The real unlock? Run your LangGraph sub-agents INSIDE an AgentKube-Mini DAG. Best of both worlds.

900 downloads taught me one thing: engineers are hungry to understand, not just use.

Are you building on top of frameworks, or do you actually know what's underneath? Drop it below. 👇
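To make the "no magic" claim concrete, here's a minimal sketch of the same pattern. It is not AgentKube-Mini's source, just an illustration of a dependency-aware scheduler with shared memory; the agent names are placeholders and the event hook is only marked in a comment.

```python
# Sketch: agents as nodes in a DAG, run in dependency order, in parallel where possible,
# with shared memory so downstream agents see upstream outputs.
from concurrent.futures import ThreadPoolExecutor

def run_dag(agents: dict, deps: dict) -> dict:
    """agents: name -> callable(memory); deps: name -> list of upstream agent names."""
    memory: dict = {}                      # shared memory across agents
    done: set = set()
    with ThreadPoolExecutor() as pool:
        while len(done) < len(agents):
            # Everything whose dependencies are already done can run now, in parallel.
            ready = [n for n in agents
                     if n not in done and all(d in done for d in deps.get(n, []))]
            if not ready:
                raise ValueError("cycle or unsatisfiable dependency in the DAG")
            futures = {n: pool.submit(agents[n], memory) for n in ready}
            for name, fut in futures.items():
                memory[name] = fut.result()   # an event could be emitted here per step
                done.add(name)
    return memory

# Usage: 'summarize' and 'extract' run in parallel; 'report' waits for both.
results = run_dag(
    agents={
        "summarize": lambda mem: "a short summary",
        "extract":   lambda mem: ["fact 1", "fact 2"],
        "report":    lambda mem: f"{mem['summarize']} | {mem['extract']}",
    },
    deps={"report": ["summarize", "extract"]},
)
print(results["report"])
```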
Wrote code in Sublime, just regular old Python autocomplete and syntax highlighting. Asked Claude to challenge ME to write some code (JS world ~10 years, not much Python since college). When I didn't know something, I looked at documentation.

I'll tell ya what, my solution was 💩. But it worked. I had fun. I felt achievement.

I was reminded that in Python, dict() on a list of 2-item lists pops them out as nice key-value pairs. Re-learned that line.split() works on any amount of whitespace in strings; nothing like that in JS. List comprehensions only came back to me after I submitted my dooky solution.

Practicing string manipulation and data structures feels important... How else do we learn to put better things in and get better things out?

Maybe with future models it won't matter, and the AI will just be better than you at everything from database design to dev ops. For now, I think there's still a reason to hone your craft, and a reason they have a bunch of PhDs building these models.
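For anyone else coming back to Python after years of JS, the two rediscoveries mentioned above look like this:

```python
# dict() accepts any iterable of 2-item pairs, including a list of 2-item lists.
pairs = [["a", 1], ["b", 2]]
print(dict(pairs))      # {'a': 1, 'b': 2}

# With no argument, str.split() splits on runs of any whitespace and drops the empties.
line = "  alpha \t beta   gamma "
print(line.split())     # ['alpha', 'beta', 'gamma']
```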
With low-quality tests, you're paying tokens to fix them while getting little to none of the benefit tests are supposed to provide.

Let's be real. Most of the Python tests out there are a waste of time. They are there to make the manager happy, to pass the compliance review, or to exercise dominance. I'm talking about tests that:
- break due to unrelated changes,
- make you restart the CI/CD pipeline and hope they pass on the next run,
- take forever to run,
- pass while production is broken.

Back in the day, you complained about having to work with such tests. Nowadays, we're paying LLM tokens while Claude Code fixes them over and over. Pure waste of time and money.

In my latest article, I describe 7 qualities of highly valuable tests that every developer should know: qualities that help you ship faster with AI without losing confidence or turning your status page into a traffic light 🚦

Don't forget to subscribe so you don't miss the next tip 🔔
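The article isn't excerpted here, but the first failure mode in that list ("break due to unrelated changes") is easy to illustrate. The function and tests below are made-up examples, not taken from the article:

```python
import pytest

def withdraw(balance: float, amount: float) -> float:
    if amount > balance:
        raise ValueError(f"insufficient funds: balance {balance}, requested {amount}")
    return balance - amount

# Brittle: pins the exact error wording, so any copy tweak breaks CI without a real bug.
def test_withdraw_brittle():
    with pytest.raises(ValueError, match=r"insufficient funds: balance 50\.0, requested 80\.0"):
        withdraw(50.0, 80.0)

# Valuable: asserts the contract callers rely on, and survives harmless refactors.
def test_withdraw_behavior():
    with pytest.raises(ValueError):
        withdraw(50.0, 80.0)
    assert withdraw(100.0, 30.0) == 70.0
```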
The reason I started programming in Python was its simplicity, but with maturity it seems Python has some big inherent flaws that aren't going anywhere soon.

The biggest: the GIL (Global Interpreter Lock). It limits true parallelism, unlike Java or Go or other compiled languages. The fact that no matter how many threads you add, the locking only adds overhead for CPU-bound tasks instead of reducing the runtime is baffling, and a complete waste of resources.

If you're interested in building high-performance systems, or at least in entertaining the idea of building one, Python as a language looks like a bottleneck. And I'm a firm believer that much of what's being built these days in the name of AI is merely API calls, which could just as well be written in higher-performance languages, unless you depend on the open-source ecosystem for core machine learning and deep learning.

One can argue that the GIL doesn't affect I/O-bound tasks, and that when we build with TensorFlow, PyTorch, or CUDA, the code executing under the hood is almost always C++. But I would argue that still limits our performance-critical systems, and why settle for something inferior when you can have something superior? The ecosystem argument is understandable, to be honest; not everything is measured in raw speed, business impact matters too.

I so wish the whole thing could be changed. It's too late now, I assume! CPython 3.13 ships an experimental free-threaded build with no GIL, but best of luck using it in production; only god knows what bugs it comes with.
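If you want to see the GIL's effect on your own machine, here's a small experiment (a sketch; absolute timings depend on the hardware, the point is the thread-vs-process ratio for CPU-bound work):

```python
import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def burn(n: int) -> int:
    # Pure-Python CPU-bound loop: exactly the case the GIL serializes.
    return sum(i * i for i in range(n))

def timed(executor_cls, workers: int = 4, n: int = 2_000_000) -> float:
    start = time.perf_counter()
    with executor_cls(max_workers=workers) as ex:
        list(ex.map(burn, [n] * workers))
    return time.perf_counter() - start

if __name__ == "__main__":   # guard needed for ProcessPoolExecutor on spawn-based platforms
    print(f"threads:   {timed(ThreadPoolExecutor):.2f}s")    # barely faster than serial
    print(f"processes: {timed(ProcessPoolExecutor):.2f}s")   # scales with the core count
```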
Stop Writing "Mystery Code": The Power of Python Docstrings 🐍

When we build functions, we often focus solely on making the code run. However, well-designed code isn't just about execution; it's about communication. If you've ever returned to a project after a few months only to realize you've forgotten how your own functions work, you know the struggle.

This is where docstrings become your best friend. A docstring is a specialized string used to describe what a function does, acting as a built-in manual that stays with your code wherever it goes.

How to Structure Your Documentation
To move beyond basic notes and write professional-grade documentation, you should follow a multi-line format. This is particularly useful for complex research or data science functions that handle multiple variables (there's a full example at the end of this post):
- The One-Line Summary: Always start with a brief, high-level description of the function's purpose immediately following the function definition, wrapped in triple quotation marks """.
- Defining Arguments (Args): After a blank line, list the function's arguments. For each one, specify the name, the expected data type in brackets, and a brief description of what it represents.
- Describing Returns: Finally, include a "Returns" section. This tells the user exactly what the function will output and what data type to expect (e.g., a float, a list, or a dataframe).

The Bottom Line
Writing documentation might feel like an extra step, but it is the hallmark of a disciplined developer. Whether you are working on academic research or building a commercial app, docstrings ensure your work is scalable, shareable, and, most importantly, understandable.

How do you document your projects? Let's share best practices in the comments! 👇

#PythonProgramming #CleanCode #DataEngineering #CodingTips #TechCommunity
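Here's what that structure looks like on a small, made-up function, following the Args/Returns layout described above:

```python
def moving_average(values: list[float], window: int) -> list[float]:
    """Compute the simple moving average of a numeric series.

    Args:
        values (list[float]): The input series, ordered oldest to newest.
        window (int): Number of consecutive values to average; must be >= 1.

    Returns:
        list[float]: One average per full window, i.e. len(values) - window + 1 entries.
    """
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

help(moving_average)   # the docstring travels with the function wherever it goes
```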
↩️ I didn't expect Python to send me straight back to 1999.

A few weeks ago, I started diving deeper into Python. What I didn't expect was rediscovering patterns from the Smalltalk systems I worked on in 1999. Yes, Smalltalk. Back then I was knee-deep in large, object-heavy systems. I thought I had safely archived those memories in the "nice, but let's not go back there" drawer. But suddenly, Python's dynamic nature and object model gave me a familiar déjà-vu. Did Python just wink at me?

🈹 My Vibe Coding Expedition (a.k.a. Pair Programming with a Hyperactive Alien)
Then I ventured into Vibe Coding: sketching code interactively with an LLM. Fun? Absolutely. Predictable? Not quite.
😵💫 Ask for a design idea → get a chaotic buffet of functions you never ordered.
😵💫 As the project grows, the LLM starts to drift, forgetting its own methods like someone who walked into the kitchen and can't remember why.
😵💫 Keeping the architecture clean requires long, surprisingly philosophical negotiations.
😵💫 And yes, sometimes the LLM becomes… stubborn. It happens.

I learned quickly: CONTEXT IS EVERYTHING! So I started asking the LLM to write Markdown summaries of our decisions. Otherwise, every new session felt like onboarding a colleague with complete amnesia.

🈴 And Then Came Skills, RAG, MCP & Tools (a.k.a. "Congratulations, you're now building agentic software.")
Somewhere along the way, I stumbled into the broader ecosystem:
👽 Skills vs. RAG: realizing that sometimes the model doesn't need more data; it needs a well-defined capability.
👽 MCP (Model Context Protocol): suddenly I'm designing structured tool interfaces like I'm negotiating API contracts with an alien species.
👽 Tools: because of course the LLM requires a toolbox now. Why wouldn't it?

At this point it became clear: I'm not "just coding" anymore. I'm orchestrating a small team of invisible interns with questionable attention spans.

📜 The Re-Discovery: Spec-Driven Development (SDD)
Eventually I circled back to Spec-Driven Development (shoutout to the Martin Fowler article by Birgitta Böckeler: "Understanding Spec-Driven-Development: Kiro, spec-kit, and Tessl"). And I had to smile. Because in 1999, this was simply called:
😶 Doing your job properly. 😶
* You write the spec.
* You implement the spec.
* If you change the code, you change the spec first.
🙎♂️ Back then it felt bureaucratic.
🤖 Today, in an AI-assisted world, it suddenly feels like the future.
🧐 The spec becomes the primary artifact.
🧐 The code becomes the byproduct.

A final realization about working with LLMs: 👺 Their answers always make some sense. In Some Sense. 👺

#Python #AI #VibeCoding #SoftwareArchitecture #Smalltalk #AgenticAI #MCP #RAG #SpecDrivenDevelopment #TechHistory
I heard a tip to use Rust instead of Python whenever you're coding with AI, because of the speed and, more importantly, the validation: the code won't compile if there are errors. Unlike in Python, where you have to do the validation for the AI and go back and forth with prompts to fix things.

I'm finding it way faster to generate code. Even though I don't know Rust that well, it will be a great learning experience. Right now I'm using Claude Code, but I might switch back to OpenCode's models again to see if that works. https://lnkd.in/gaDkaHXu
I switched from n8n to Python + Claude Code mid-project. Best call I made all quarter. Here's the honest comparison.

n8n is not the automation tool you think it is. It's perfect for 3-step workflows. It becomes a debugging nightmare past that. I've built workflows in both; here's the honest breakdown.

n8n wins when:
→ The workflow is small (under 5 nodes)
→ Speed to first result matters more than everything else
→ The person building it isn't a developer

But complexity changes the math fast. A 20-node workflow breaks. You open the visual editor to find the problem. Half your afternoon is gone. And the AI token cost while building medium to large flows? Every tweak, every node adjustment burns more than you'd expect. It compounds quietly.

That's where OpenClaw (or Claude Code) + Python changes everything. For medium to large workflows:
→ Debugging is just reading code, no visual maze
→ Building is faster, with less back-and-forth with the AI
→ Token usage drops significantly

The visual layer feels like a feature when you start. It becomes friction when the workflow grows. Code doesn't have that problem.

My rule now:
→ Quick, simple automations → n8n
→ Everything from medium up → Python + Claude Code

(And I am NOT a Python developer! I can just about understand the generated code. But that is not the point. I only have to specify what I want, and if anything breaks, describe what broke and how it is supposed to behave. With n8n, on the other hand, debugging is a nightmare. Try it out!!!)

The tool you prototype with isn't always the one you should scale with.

Follow me for more honest takes on AI tooling. What's your experience been? Drop your thoughts below.