I didn't just build a Machine Learning project. I built a system that failed at every stage before it finally worked. 🚧

🧠 Project: End-to-End House Price Prediction System

At first, I thought: Train a model → deploy it → done. But real-world ML taught me something very different.

---

⚙️ What I built:
• Random Forest ML model
• Flask web application (API + UI)
• MySQL database integration
• Full ML pipeline (preprocess → train → deploy)

---

💥 Real challenges I faced:
❌ My model file became 5GB+ → learned why model optimization matters
❌ Model saving/loading broke (.pkl errors)
❌ Scikit-learn version mismatch
❌ Feature mismatch between training & prediction
❌ Flask errors due to invalid user inputs
❌ MySQL issues:
   • Access denied
   • Socket errors
   • Server not starting
   • Full reinstall required
❌ Deployment struggle: I tried deploying on AWS / Google Cloud, but I didn't have a credit card → couldn't proceed. So I adapted.

---

🚧 What I did instead:
• Shifted deployment to a free platform (Render)
• Temporarily disabled MySQL integration in the deployed version
• Kept the backend logic ready for the database
• Focused on learning system design

---

🧠 What I learned:
✔ Bigger model ≠ better model
✔ ML pipelines break easily without consistency
✔ Deployment is harder than training
✔ Real engineering = adapting to constraints

---

🚀 Final outcome: a working ML system that:
• Predicts house prices in real time
• Runs via a Flask web interface
• Is designed with production thinking

---

📈 Next steps:
• Full deployment with database
• UI/UX improvements
• Model optimization

---

💬 Biggest takeaway: "You don't need perfect resources to build real projects. You just need persistence to keep fixing what breaks."

---

🔗 Try the live app: 👉 https://lnkd.in/gcFNjfFi
💻 Explore the code: 👉 https://lnkd.in/gbRqxkNW

#MachineLearning #DataScience #Python #Flask #MySQL #AIProjects
Building a Machine Learning House Price Prediction System with Flask and MySQL
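Two of the failures described above (the 5GB model file and the train/serve feature mismatch) have cheap mitigations. A minimal sketch, assuming scikit-learn, joblib, and hypothetical column names; this is not the project's actual code:

# Sketch only: hypothetical data and column names.
# Mitigations for model size and train/serve feature mismatch.
import json
import joblib
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

df = pd.read_csv("housing.csv")                      # hypothetical training data
X, y = df.drop(columns=["price"]), df["price"]

# Fewer / shallower trees keep the serialized model small; tune against validation error.
model = RandomForestRegressor(n_estimators=100, max_depth=12, random_state=42)
model.fit(X, y)

# compress=3 writes a much smaller file than a plain pickle dump.
joblib.dump(model, "model.joblib", compress=3)

# Persist the exact training column order to avoid feature mismatch at prediction time.
with open("feature_columns.json", "w") as f:
    json.dump(list(X.columns), f)

# At serving time: align incoming data to the saved columns before predicting.
with open("feature_columns.json") as f:
    columns = json.load(f)
incoming = pd.DataFrame([{"area": 1200, "bedrooms": 3}])   # hypothetical request payload
incoming = incoming.reindex(columns=columns, fill_value=0)
prediction = joblib.load("model.joblib").predict(incoming)

Pinning the scikit-learn version in requirements.txt (so training and serving use the same release) also avoids the pickle-compatibility errors mentioned in the post.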
Leveling Up: Integrating Flask with SQLAlchemy for Dynamic Data Persistence! 🖥️🗄️

A great user interface means nothing if the data doesn't stick around. Today I took a massive leap forward in my backend development journey by connecting my Flask applications to a real database using SQLAlchemy. Moving beyond volatile, in-memory data, I learned how to make information persistent.

Here's a breakdown of my latest CRUD (Create, Read, Update, Delete) integration milestones:

🏗️ Database connection & setup: Configured my Flask app to communicate with a database using SQLAlchemy, mapping Python classes to database tables (the magic of ORM!).

📩 Create: inserting data: Bridged the gap between my front-end Flask forms (WTForms) and the backend. User input is now captured and stored securely in the database.

📊 Read: displaying data: Queried the database to retrieve stored records and rendered them dynamically in my HTML templates. Seeing real-time data flow from the DB to the user interface was highly rewarding!

This isn't just about code; it's the foundation for the student portal I'm building. Now I can ensure that user registrations and test submissions are saved permanently. Up next: implementing the 'Update' and 'Delete' functionalities to complete the full CRUD cycle!

#Python #Flask #SQLAlchemy #Database #WebDevelopment #BackendDeveloper #CRUD #CodingJourney #LearningToCode #DataPersistence #TechCommunity #FullStack #ORM
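For readers following along, a minimal sketch of the Create and Read steps described above, assuming Flask-SQLAlchemy with SQLite and a hypothetical Student model (not the actual student-portal schema):

# Sketch: Flask + Flask-SQLAlchemy "Create" and "Read".
# The Student model, routes, and template name are hypothetical.
from flask import Flask, request, render_template
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__)
app.config["SQLALCHEMY_DATABASE_URI"] = "sqlite:///portal.db"
db = SQLAlchemy(app)

class Student(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(80), nullable=False)
    email = db.Column(db.String(120), unique=True, nullable=False)

@app.route("/register", methods=["POST"])
def register():
    # Create: persist form input (a WTForms form would normally validate this first).
    student = Student(name=request.form["name"], email=request.form["email"])
    db.session.add(student)
    db.session.commit()
    return "saved", 201

@app.route("/students")
def students():
    # Read: query all rows and hand them to a template for rendering.
    return render_template("students.html", students=Student.query.all())

if __name__ == "__main__":
    with app.app_context():
        db.create_all()   # create the tables on first run
    app.run(debug=True)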
If you're building RAG pipelines or LLM-powered apps, this open-source tool deserves a spot in your stack. 🔥

Meet OpenDataLoader PDF — a free, local, Apache 2.0 PDF parser built specifically for AI engineers. Here's why it's a game-changer for your workflow 👇

➡️ Dead simple to get started
Just pip install opendataloader-pdf and convert in 3 lines. That's it. No complex setup, no cloud dependency.

🧠 Built for RAG, not just text extraction
For RAG pipelines, you need a parser that preserves document structure, maintains correct reading order, and provides element coordinates for citations. OpenDataLoader is designed specifically for this. It outputs structured JSON with bounding boxes, handles multi-column layouts with XY-Cut++, and runs locally without a GPU.

📦 Multiple output formats out of the box
Structured Markdown for chunking, JSON with bounding boxes for source citations, and HTML.

🔍 Handles the hard stuff
Built-in OCR in 80+ languages in hybrid mode, works with poor-quality scans, and handles complex/borderless tables, LaTeX formulas, and AI-generated picture/chart descriptions.

🔗 Plays nicely with your existing tools
LangChain document loader support lets you parse PDFs directly into structured Document objects for RAG pipelines. Python, Node.js, and Java SDKs are also available.

🔒 Privacy-first by design
Process your documents with the complete security of local execution. Your data stays on your machine, always, powering private, AI-driven document workflows with peace of mind.

📊 Benchmark-backed performance
Ranked #1 in benchmarks with a 0.90 overall score and 93% table accuracy.

🛡️ AI safety built in
Built-in prompt injection filtering catches hidden text, off-page content, and invisible layers. Something most parsers completely ignore.

Whether you're building a document Q&A system, automating data ingestion, or powering enterprise search, stop wrestling with unstructured PDFs and let OpenDataLoader do the heavy lifting.

👉 Check out the first comment for the link to the GitHub repo.

Credit: Stanislav Beliaev
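To see why the bounding-box JSON matters for RAG, here is a rough consumption sketch. The field names (elements, text, page, bbox) are assumptions for illustration, not OpenDataLoader's documented schema; check the repo for the real output format:

# Sketch: turning parser JSON output into chunks that keep page/bbox metadata for citations.
# The field names used here are ASSUMED for illustration only.
import json

with open("report.parsed.json") as f:          # hypothetical parser output file
    doc = json.load(f)

chunks = []
for element in doc.get("elements", []):
    chunks.append({
        "text": element.get("text", ""),
        # Keep location metadata so answers can cite the page and region of the source PDF.
        "metadata": {"page": element.get("page"), "bbox": element.get("bbox")},
    })

print(f"{len(chunks)} chunks ready for embedding")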
𝗪𝗵𝘆 𝗕𝘂𝗶𝗹𝗱 𝗔 𝗟𝗼𝗰𝗮𝗹 𝗠𝗖𝗣 𝗦𝗲𝗿𝘃𝗲𝗿

You can use MCP servers to connect AI models to external tools and data sources. There are prebuilt servers that connect to GitHub, Slack, and Google Drive. But these servers solve general problems. Your problems are specific. Maybe you have a local database and you want an AI to query it. Maybe you have a folder full of markdown files and you want an AI to search them. You can build your own MCP server to solve these problems.

To get started, you need:
- Python installed on your machine
- The fastmcp library installed using pip

You can build a simple server using the following code:

from fastmcp import FastMCP
from pathlib import Path

mcp = FastMCP("Local Notes Search")
NOTES_DIR = Path.home() / "notes"

@mcp.tool()
def search_notes(query: str) -> str:
    """Return snippets from markdown notes that contain the query."""
    results = []
    for f in NOTES_DIR.glob("*.md"):
        content = f.read_text()
        if query.lower() in content.lower():
            results.append(f"{f.name}: {content[:150]}")
    return "\n".join(results) if results else "No matches found."

if __name__ == "__main__":
    mcp.run()   # defaults to stdio transport, which Claude Desktop expects

You can connect this server to Claude Desktop by adding it to your config file (sample entry below). Restart Claude Desktop and your tool will show up in the tools menu.

You can make your server more useful by adding more tools. For example, you can add a tool to list all available notes or read the full content of a specific note.

When to use a local server:
- For personal productivity
- For experimentation

When to use a remote server:
- When you need to access the server from multiple devices
- When you need to share the server with a team
- When you need to trigger the server from other services

Start with one tool and make it solve one problem you have. You will find the second and third tools quickly after that.

Source: https://lnkd.in/gtw5FP2C
Optional learning community: https://t.me/GyaanSetuAi
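For reference, the Claude Desktop entry mentioned above goes in claude_desktop_config.json and typically looks roughly like this; the server name and file path are placeholders to adjust for your setup:

{
  "mcpServers": {
    "local-notes": {
      "command": "python",
      "args": ["/absolute/path/to/notes_server.py"]
    }
  }
}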
I built a recommendation engine that had to respond in under 200ms. Here's what I learned about the gap between "it works" and "it works at scale."

The first version was straightforward. Python service, takes user behavioral data, scores items, returns a ranked list. In development it worked great. In production with real traffic, it was way too slow.

The problem wasn't the algorithm. It was when we were doing the work. We were computing recommendations at request time. Every API call triggered a fresh scoring pass over the dataset. At low traffic, fine. At real traffic, timeouts.

The fix was separating the work into two parts:
→ Precompute: a background pipeline that scored and ranked recommendations ahead of time based on behavioral signals, then wrote the results to Redis
→ Serve: the API just read from Redis. No computation at request time. Sub-200ms, consistently.

But the harder part wasn't the caching. It was knowing which strategy to trust. We had multiple ranking approaches. Instead of picking one based on gut feeling, we ran them side by side and compared on three signals:
1. Engagement: did users actually click/act on what we recommended?
2. Latency: did the serving path stay fast?
3. Coverage: were we recommending the same 20 items to everyone, or actually personalizing?

That comparison was more valuable than any single optimization. It turned "we think this ranking is better" into "here's the data, pick the tradeoff you want."

The takeaway: personalization is easy to demo and hard to ship. The difference is knowing what to precompute, what to serve live, and having the discipline to measure which approach actually works instead of guessing.

#softwareengineering #python #recommendationsystems
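A minimal sketch of that precompute/serve split, assuming redis-py, a local Redis instance, and hypothetical key names and scoring function (not the author's actual pipeline):

# Sketch: precompute rankings into Redis, serve by key lookup only.
import json
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def precompute(user_ids, score_items):
    # Background job: score and rank per user, then cache the result.
    for user_id in user_ids:
        ranked = score_items(user_id)                          # the expensive work happens here
        r.set(f"recs:{user_id}", json.dumps(ranked), ex=3600)  # expire after an hour

def serve(user_id):
    # Request path: a single Redis read, no scoring at request time.
    cached = r.get(f"recs:{user_id}")
    return json.loads(cached) if cached else []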
𝗣𝗮𝗻𝗱𝗮𝘀 𝘃𝘀 𝗣𝗼𝗹𝗮𝗿𝘀 — Which One Should You Be Using in 2026?

If you've worked with data pipelines or ML workflows, you've probably come across this debate. I've been exploring both recently, and here's a straightforward take:

𝗣𝗮𝗻𝗱𝗮𝘀 — The OG
- Massive ecosystem and community support
- Simple, readable syntax
- Works seamlessly with NumPy, Scikit-learn, Matplotlib
- Great for EDA and quick prototyping
Limitations:
- Slower on large datasets
- Higher memory usage
- Limited parallelism

𝗣𝗼𝗹𝗮𝗿𝘀 — The New Challenger
- Built in Rust, designed for performance
- Multi-threaded execution
- Lazy evaluation (optimizes queries before running)
- More memory efficient (Apache Arrow)
Limitations:
- Smaller ecosystem (for now)
- Slight learning curve for Pandas users
- Limited native support in some ML libraries

𝗣𝗲𝗿𝗳𝗼𝗿𝗺𝗮𝗻𝗰𝗲 𝗰𝗵𝗲𝗰𝗸: On large datasets (~10M+ rows), Polars can be 5–10x faster than Pandas for operations like groupby, joins, and aggregations — and the gap only increases with scale.

𝗦𝗼 𝘄𝗵𝗶𝗰𝗵 𝗼𝗻𝗲 𝘀𝗵𝗼𝘂𝗹𝗱 𝘆𝗼𝘂 𝘂𝘀𝗲?
- Use Pandas for quick analysis, prototyping, and when you rely heavily on the ML ecosystem
- Use Polars when working with large datasets or building performance-critical pipelines

𝗠𝘆 𝘁𝗮𝗸𝗲𝗮𝘄𝗮𝘆: Pandas is still very relevant, but Polars is growing fast. Knowing both is a practical advantage.

Polars GitHub: https://lnkd.in/gnYVCWGS

#Python #DataEngineering #Pandas #Polars #DataScience #MLOps
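To see the syntax difference, here is the same aggregation in both libraries, assuming a hypothetical sales.csv with region and revenue columns (group_by is the method name in recent Polars releases):

# Sketch: same groupby-mean in Pandas (eager) and Polars (lazy).
import pandas as pd
import polars as pl

# Pandas: eager execution, the whole file is loaded before the groupby runs.
pdf = pd.read_csv("sales.csv")
pandas_result = pdf.groupby("region")["revenue"].mean()

# Polars: scan_csv builds a lazy query plan; collect() runs the optimized,
# multi-threaded plan at the end.
polars_result = (
    pl.scan_csv("sales.csv")
      .group_by("region")
      .agg(pl.col("revenue").mean())
      .collect()
)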
One of the biggest challenges in vector search is not retrieval itself. It is the query interface. qql-go was built to solve this particular problem: agents first, humans too.

The starting point was QQL (qdrant query language), originally shared by Kameshwara Pavan Kumar Mantha. The original idea, repo, and write-up came from that work. The idea opens up the possibility of giving vector retrieval a cleaner interface for repeated use inside agent workflows. That is what led to qql-go: an independent Go port and extension of the idea.

Repo: https://lnkd.in/gXjQdjaw

The focus was simple: clean CLI, structured output, and a path that works well inside Skills. 👉 Install the Skill, and the agent can do the rest. That makes the whole thing much easier to start with, especially for Qdrant Cloud.

Qdrant gives a very good entry point here:
1. Free dense-vector inference (sentence-transformers/all-minilm-l6-v2)
2. Free BM25 inference (qdrant/bm25)
3. Free ColBERT multivector model (answerdotai/answerai-colbert-small-v1)
4. 4 GB always-free cloud tier

So you can start with a real hybrid+reranking retrieval setup without spending money upfront. That is the part that matters. A retrieval interface becomes much more useful when it is: easy for agents to call, easy for humans to inspect, and cheap enough for people to actually adopt.

Credit to Kameshwara Pavan Kumar Mantha for putting the original QQL idea out there and giving others something worth building on.

📖 Read the full article from the qql creator: https://lnkd.in/g_nh9T7s
Original qql repo: https://lnkd.in/gwppzjgw

#Qdrant #Retrieval #AIEngineering #OpenSource #GoLang #DeveloperTools #Agents #VectorSearch #Skills
From Prompt Hacks to Structured Output — How LLMs Learned to Speak JSON

When building software, everything speaks JSON. So when LLMs could only return free-form text, we all hacked around it — injecting prompts like "output in JSON format" with a schema and sample data. Fragile, unreliable, lots of retries.

Then the progression started:

Function calling (June 2023) — OpenAI let you describe functions with JSON Schema. The model returns structured arguments. Schema-guided but best-effort.

JSON mode (Nov 2023) — Guaranteed valid JSON output, but no schema enforcement. You still pray the model follows your structure.

Structured Outputs (Aug 2024) — The game changer. Constrained decoding that guarantees the output validates against your exact schema. LLMs became programmable functions.

Claude took a different path — forced tool use since April 2024. No constrained decoding, but reliable in practice.

On the library side: I started with LangChain for provider abstraction, but hit the lag problem. Working with AWS Bedrock + Claude, it didn't support the latest features. If you stick with one provider, use the native SDK. If you need portability, the abstraction helps. Trade-offs.

I switched to Instructor (by Jason Liu). It patches native SDKs, uses Pydantic models to define output schemas, and retries with validation errors fed back to the model. Clean, focused, works across all providers.

The ecosystem has converged on Pydantic for schema definition — Field descriptions, Literal types for categorical fields, nested models, validators. Write Python, get structured output.

If you're still parsing LLM output with regex, stop. Structured output is mature and widely supported.

What tools are you using for structured output?

Full post: https://lnkd.in/gPrCjubx
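A minimal sketch of that Instructor-plus-Pydantic pattern, assuming the OpenAI SDK; the Ticket schema and model name are placeholders, not the author's setup:

# Sketch: structured output via Instructor + Pydantic (schema and model are hypothetical).
from typing import Literal

import instructor
from openai import OpenAI
from pydantic import BaseModel, Field

class Ticket(BaseModel):
    summary: str = Field(description="One-line summary of the issue")
    priority: Literal["low", "medium", "high"]        # constrained categorical field
    tags: list[str] = Field(default_factory=list)

client = instructor.from_openai(OpenAI())             # patches the SDK to accept response_model

ticket = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=Ticket,                             # Instructor validates and retries on failure
    messages=[{"role": "user", "content": "The checkout page times out for EU users."}],
)
print(ticket.model_dump())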
How to go from Web Developer to Data Scientist without quitting your job. The exact roadmap I followed 🧵

I did this transition while working full-time at Codelounge. Here's the month-by-month breakdown:

Month 1-2 — Foundation
→ Python basics (1 hour per day — practice over theory)
→ Pandas & NumPy with real datasets
→ Don't touch ML yet — just get comfortable with data

Month 3-4 — Core ML
→ IBM Data Science Certificate (Coursera)
→ Scikit-learn — classification, regression, clustering
→ Start a small project immediately — don't wait

Month 5-6 — Build Something Real
→ Pick ONE problem you genuinely care about
→ Apply everything you've learned on real data
→ Push to GitHub — make it public from Day 1

Month 7+ — Specialize & Show Up
→ Pick a niche — NLP, Computer Vision, or MLOps
→ Post about your progress on LinkedIn weekly
→ Contribute to open source projects

The secret nobody tells you: your web dev skills make every step faster.
→ You already know Git, databases, APIs, debugging
→ You understand production systems
→ You know how to ship — most data scientists don't

My proof: I built DiagnosBot — 82% accuracy, open source, real application. Not just a certificate. Not just a notebook. A product.

Save this roadmap 🔖
Share with a developer considering the switch 👇

#CareerTransition #DataScience #WebDevelopment #Python #MachineLearning #Roadmap #AI
During my PhD, one of the biggest problems I had was "I changed this file, and want to re-run parts of my data analysis, but not the WHOLE thing, just what depends on the file I changed."

If you ask a dev, they will answer "easy, write a makefile!", and they're kind of right! One of my bigger regrets is not using a build system to manage my code and experimental data. I started with a "tiny" Python script, and couldn't fix things before it was a slow behemoth. Make's atrocious syntax is my excuse. But trust me, twisting a build system to update your data analysis is a great idea.

And it turns out there is a plethora of tooling to do that, except it goes by different names. Old-school devs call them build systems, statisticians call them "pipeline tools", and devs with high salaries call them "workflow management platforms". At the core, they are the same thing. It's like Excel: update one bit of data, and everything that depends on it is updated too. But only that!

The industry standard is called Apache Airflow, built by Airbnb. Spotify made a smaller one called Luigi. But the one I find most accessible is called Apache Hamilton (https://lnkd.in/dBnXs5Dw). For people using R, there's also the "targets" package, which again does the same thing.

Importantly, Hamilton lets you see the network of dependencies and also track the provenance (also called "lineage") of each piece of data. So if you see something wrong with one table, you know how it was calculated!
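A minimal sketch of the Hamilton style, where each function's parameter names declare its upstream dependencies, so the framework can build the dependency graph and run only the nodes needed for the outputs you request. The dataset path and column handling here are hypothetical:

# Sketch only: Hamilton dataflow style with hypothetical data.
# --- analysis.py ---
import pandas as pd

def raw_measurements(data_path: str) -> pd.DataFrame:
    # Leaf node: its value comes from the inputs passed at execution time.
    return pd.read_csv(data_path)

def cleaned(raw_measurements: pd.DataFrame) -> pd.DataFrame:
    # The parameter name matches the node above, so Hamilton wires the dependency.
    return raw_measurements.dropna()

def summary_stats(cleaned: pd.DataFrame) -> pd.DataFrame:
    return cleaned.describe()

# --- run.py ---
from hamilton import driver
import analysis

dr = driver.Builder().with_modules(analysis).build()
# Only the nodes needed for the requested outputs are executed.
print(dr.execute(["summary_stats"], inputs={"data_path": "experiment.csv"}))

Cross-run caching and the graph/lineage visualization mentioned in the post are separate Hamilton features; the sketch only shows the dependency wiring.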