SlothDB is a super fast embedded SQL database. 😁 You point SQL at a file. Parquet, CSV, JSON, Avro, Arrow, SQLite, Excel. No server, no import step, no extension to install before you can read a Parquet file. Same embedded model as SQLite and DuckDB, different defaults.

A few things we cared about while building it:

It is one binary. Drop slothdb.exe somewhere, run it.

It runs in the browser. The WASM build is 1.3 MB, and the edge variant fits under Workers' 1 MB script cap.

It is fast enough to be worth the swap for analytical work. On a 5-query warm batch over 10M rows, SlothDB finishes in 138 ms. DuckDB 1.1.5 finishes the same batch on the same hardware in 540 ms.

It is also early. v0.1.8 shipped today. The Python wheel had a packaging bug last week that I only caught because a stranger filed an issue. So if you hit a rough edge, file one. We read every one.

Try it in 10 seconds at https://slothdb.org or pip install slothdb. Our GitHub repo: https://lnkd.in/gxCSmACA

#SQL #DataEngineering #DuckDB #OpenSource #OLAP
SlothDB Embedded SQL Database for Fast Analytics
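If you want a feel for what "point SQL at a file" means in practice, here is a minimal sketch. The file, columns, and path-in-FROM syntax are my assumptions based on the DuckDB-style model the post describes, not confirmed SlothDB syntax, so check the docs at slothdb.org first.

-- Sketch only: querying a Parquet file directly, assuming SlothDB
-- accepts a file path in FROM the way DuckDB does.
SELECT
    customer_id,                  -- hypothetical columns in events.parquet
    COUNT(*)    AS events,
    SUM(amount) AS total_amount
FROM 'events.parquet'             -- no import step: the file is the table
WHERE event_date >= DATE '2024-01-01'
GROUP BY customer_id
ORDER BY total_amount DESC
LIMIT 10;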
🚨 SELECT * is silently killing your SQL queries.

I've seen a 40ms query turn into a 9-second nightmare, all because of one hidden TEXT column. And the culprit? A lazy SELECT * in production.

Here's what most developers don't realize:
→ Every unused column travels across the network on every query
→ LOB columns (TEXT, BLOB) silently explode your RAM usage
→ Databases can't optimize what they don't know you need
→ Schema changes break your app, often without a single error thrown

In a benchmark of 10,000 rows with 22 columns, SELECT * consumed 6× more memory than explicit column lists.

The fix is simple. The discipline is the hard part. Name your columns. Every. Single. Time.

I put together a 7-slide breakdown covering:
✅ Why SELECT * hurts performance
✅ Real benchmark numbers
✅ The breaking changes it causes
✅ The exact fix with code examples
✅ 3 production SQL rules to live by

Swipe through the doc and save it for your next code review.

What's the worst SELECT * story from your production database? Drop it in the comments, I'd love to hear it. 👇

#SQL #DataAnalysis #QueryOptimization #Backend #DatabasePerformance #Programming #TechTips
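For concreteness, here is the shape of the fix with a hypothetical orders table; the table and columns are mine, not from the benchmark above.

-- Before: drags every column, including a large TEXT notes column, over the wire.
SELECT * FROM orders WHERE status = 'shipped';

-- After: name exactly what the endpoint needs; the TEXT column never leaves the database.
SELECT order_id, customer_id, total_amount, shipped_at
FROM orders
WHERE status = 'shipped';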
I reduced an endpoint response time by 75% without changing a single line of infrastructure. I just changed how the query reached the database.

The scenario: a report with 5 JOINs, SUM and COUNT aggregations, city and date filters, running against 1 million records.

Spring Data JPQL with DTO projection: 1,240ms at p95. Starting point.

Native SQL with nativeQuery = true: 780ms. 37% faster just by writing raw SQL and removing the ORM translation overhead.

Materialized View mapped as a read-only entity: 310ms. The endpoint became a simple SELECT with filters. The database had already done the heavy lifting before the request arrived.

The lesson was straightforward: the ORM is not the villain, but it has a cost that shows up when the query gets heavy. Knowing when to move away from JPQL and when to go beyond Native SQL makes a real difference in production.

Which of these approaches are you using today for reports with many JOINs?

#Java #SpringBoot #Backend #SoftwareEngineering #DatabasePerformance
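A rough sketch of the materialized-view step, simplified to three invented tables since the post doesn't show its schema (PostgreSQL syntax; the refresh cadence is whatever the report tolerates):

-- Precompute the heavy join + aggregation once, outside the request path.
CREATE MATERIALIZED VIEW sales_report_mv AS
SELECT c.city,
       o.order_date,
       SUM(oi.amount)       AS total_amount,
       COUNT(DISTINCT o.id) AS order_count
FROM orders o
JOIN order_items oi ON oi.order_id = o.id
JOIN customers c    ON c.id = o.customer_id
GROUP BY c.city, o.order_date;

-- The endpoint then runs a plain filtered SELECT against the view.
SELECT city, order_date, total_amount, order_count
FROM sales_report_mv
WHERE city = 'Lisbon' AND order_date >= DATE '2024-01-01';

-- Refresh when the underlying data changes enough to matter.
REFRESH MATERIALIZED VIEW sales_report_mv;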
CTEs have quietly become my favorite SQL feature. Not because they're fancy, but because I can come back to a query three months later and actually understand what past-me was doing. 🧐

Instead of one monstrous nested subquery, you get named blocks that read top to bottom. customers_last_year AS (...), their_orders AS (...), final select. That's it. 👇

Recursive CTEs took me longer to warm up to. I avoided them for months because the syntax looked intimidating. Then I had to flatten an employee-manager hierarchy and spent an afternoon fighting it with self-joins before giving up and trying a recursive CTE. Took about 8 lines. Should have learned it sooner. 🔁🤓

Fair warning: they're not always faster. A temp table or indexed subquery sometimes wins on performance. But for making queries you won't hate opening later, CTEs are the move. 💡

#SQL #CTEs #DataAnalytics #Programming
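For reference, the employee-manager flattening looks roughly like this; the employees table and columns are assumed, not the author's actual schema (WITH RECURSIVE is PostgreSQL/MySQL syntax; SQL Server drops the RECURSIVE keyword).

-- Walk the reporting chain from the top down, assuming
-- employees(id, name, manager_id) with NULL manager_id at the root.
WITH RECURSIVE org AS (
    SELECT id, name, manager_id, 1 AS depth
    FROM employees
    WHERE manager_id IS NULL          -- anchor: the top of the hierarchy
    UNION ALL
    SELECT e.id, e.name, e.manager_id, org.depth + 1
    FROM employees e
    JOIN org ON e.manager_id = org.id -- recursive step: direct reports
)
SELECT * FROM org ORDER BY depth, name;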
Learning SQL separately from Django, and it hits different when you see the raw logic behind the ORM.

Today's practice:
→ INSERT INTO — 9 records into a Books table
→ SELECT * — full table query
→ SELECT Price FROM Books — targeted column fetch

No framework. No abstraction. Just SQL talking directly to the database.

Foundations matter. Building mine.

#SQL #MySQL #BackendDevelopment #LearningInPublic
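In case it helps anyone following along, a minimal version of today's statements against an assumed Books(Title, Author, Price) table; the real schema may differ.

-- Assumed schema, for illustration only.
CREATE TABLE Books (
    Id     INT AUTO_INCREMENT PRIMARY KEY,
    Title  VARCHAR(200),
    Author VARCHAR(100),
    Price  DECIMAL(8, 2)
);

INSERT INTO Books (Title, Author, Price)
VALUES ('Clean Code', 'Robert C. Martin', 32.99),
       ('SQL Basics', 'Jane Doe', 19.50);   -- repeat until 9 records are in

SELECT * FROM Books;        -- full table query
SELECT Price FROM Books;    -- targeted column fetch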
Exploring MySQL Stored Procedures through a different lens ✍️

I recently created this hand-drawn, architectural-style infographic to break down one of the most powerful features in SQL—Stored Procedures (SP). Instead of just reading documentation, I mapped everything visually:

• Syntax & structure using "DELIMITER //"
• Parameters: IN, OUT, INOUT
• Variable declarations
• Control flow (IF-ELSE, CASE, loops)
• Error handling with handlers

This approach helped me understand not just how stored procedures work, but why they matter—modularity, performance, and cleaner database logic.

Sometimes, slowing down and sketching concepts like a developer's notebook makes complex topics much easier to grasp.

If you're learning SQL or backend development, try turning concepts into visual notes—it's a game changer.

#MySQL #SQL #WebDevelopment #BackendDevelopment #Database #Programming #LearningJourney #DeveloperNotes #100DaysOfCode
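To make that list concrete, here is a small MySQL procedure touching most of those pieces; the products table and the discount rule are invented for the example.

DELIMITER //
CREATE PROCEDURE get_discounted_price (
    IN  p_product_id  INT,
    OUT p_final_price DECIMAL(10, 2)
)
BEGIN
    DECLARE v_price DECIMAL(10, 2);           -- variable declaration

    SELECT price INTO v_price
    FROM products                              -- hypothetical table
    WHERE id = p_product_id;

    IF v_price >= 100 THEN                     -- control flow
        SET p_final_price = v_price * 0.90;
    ELSE
        SET p_final_price = v_price;
    END IF;
END //
DELIMITER ;

-- Usage:
-- CALL get_discounted_price(42, @price);
-- SELECT @price;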
I just shipped something I'm really proud of. 🚀

Semicolon — an open-source SQL formatter that turns messy, unreadable queries into clean, structured code in seconds.

You don't need to decode a wall of SQL just to find where the JOIN stops and the WHERE starts. You don't need to spend minutes formatting it perfectly. Semicolon handles it for you instantly.

Just install it, point it at your SQL, and you're done.
→ pip install semicolonfmt
→ semicolon query.sql (format a file)
→ semicolon . (format everything in a directory)

What it does:
✅ Formats messy SQL into clean, consistent, scannable queries
✅ Works on single files or entire directories
✅ CI/CD check mode so unformatted SQL never slips into prod
✅ Pre-commit hook support
✅ Zero config. Just run it.

It's open source, it's free, and it's just getting started.

⭐ If you like it, give it a star on GitHub
🔧 Test it, push it to the limits, and open a PR if you spot something off
🔗 https://lnkd.in/dmYG-t4c

Clean SQL is not a nice-to-have. It's a craft. Let's treat it like one. 💪

#OpenSource #SQL #PostgreSQL #Python #DevTools #BuildingInPublic #CleanCode
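A before/after of the kind of transformation this targets; the "after" layout here is my illustration, not necessarily Semicolon's exact output style.

-- Before: one unreadable line.
select o.id,c.name,sum(oi.amount) total from orders o join customers c on c.id=o.customer_id join order_items oi on oi.order_id=o.id where o.status='paid' group by o.id,c.name;

-- After: one clause per line, joins and filters visible at a glance.
SELECT
    o.id,
    c.name,
    SUM(oi.amount) AS total
FROM orders AS o
JOIN customers AS c
    ON c.id = o.customer_id
JOIN order_items AS oi
    ON oi.order_id = o.id
WHERE o.status = 'paid'
GROUP BY
    o.id,
    c.name;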
🚀 Why Scalar Functions Can Hurt Your SQL Performance!

While working on backend systems and writing SQL queries, I discovered a common mistake that can silently slow down your application… 💡 Using Scalar Functions inside queries.

At first, they seem very convenient — wrapping logic into a reusable function sounds like a clean solution. But here's the problem 👇

❌ Scalar Functions are executed row by row
❌ They prevent SQL Server from optimizing the query properly
❌ They can significantly slow down performance on large datasets

📌 Example:
SELECT name, dbo.GetDiscount(price) FROM products;

This looks clean… but behind the scenes, the function is executed for EACH row 😬

✅ Better Approaches:
✔️ Use JOINs or inline logic instead
✔️ Prefer Inline Table-Valued Functions (iTVF)
✔️ Handle logic in the application layer when appropriate

🔥 Key Takeaway: "Clean code is not always fast code — always think about performance!"

Have you ever faced performance issues because of Scalar Functions? 👇

#SQLServer #Backend #Java #SpringBoot #Performance #SoftwareEngineering #Database #CleanCode
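Here is a sketch of the inline TVF rewrite for that example, assuming GetDiscount is a simple price-based rule; the 10%-over-100 rule is invented for illustration.

-- Inline table-valued function: the optimizer can expand this into the query plan,
-- unlike a scalar UDF that runs once per row.
CREATE FUNCTION dbo.GetDiscountInline (@price DECIMAL(10, 2))
RETURNS TABLE
AS
RETURN
    SELECT CASE WHEN @price >= 100
                THEN @price * 0.90
                ELSE @price
           END AS discounted_price;
GO

-- Usage with CROSS APPLY instead of a per-row scalar call.
SELECT p.name, d.discounted_price
FROM products AS p
CROSS APPLY dbo.GetDiscountInline(p.price) AS d;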
When querying large datasets from a database, utilizing SQL command statements like LIMIT and OFFSET can significantly enhance performance. By limiting the number of records retrieved, we can improve the speed of applications that rely on database interactions.

For instance, if we have thousands of records in a database, we can design an algorithm to manage data retrieval effectively. By specifying a limit of 100 records per query, we can streamline the process. Here's how it works:

- For a table with 1000 records, we can retrieve 100 records at a time.
- The OFFSET allows us to specify where to start the next set of records.
- After the first query returns records 1 to 100, the next query starts at record 101, then 201, and so on.
- This approach continues until we have retrieved all records, taking 10 queries to page through all 1000 records.

Implementing LIMIT and OFFSET is a practical strategy for managing large datasets efficiently.

#computerprogramming #python #mysql #sql #limit #offset
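The paging pattern described above looks like this in SQL, with a hypothetical records table:

-- Page 1: records 1-100
SELECT * FROM records ORDER BY id LIMIT 100 OFFSET 0;

-- Page 2: records 101-200
SELECT * FROM records ORDER BY id LIMIT 100 OFFSET 100;

-- Page n: OFFSET = (n - 1) * 100; stop when a page returns fewer than 100 rows.
SELECT * FROM records ORDER BY id LIMIT 100 OFFSET 900;   -- page 10 of a 1000-row table

Note the ORDER BY: without a stable ordering, consecutive pages can overlap or skip rows.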
SQL Progress: Logic & CASE Statements!

Today I solved another Medium challenge on LeetCode. This problem was a great lesson in how to calculate percentages and rates directly in SQL.

What I learned today:

1. AVG with CASE WHEN: I learned that I can use AVG(CASE WHEN condition THEN 1.0 ELSE 0.0 END) to calculate a rate. It's a very clear way to express it.

2. Handling NULLs in Rates: By using a LEFT JOIN between the Signups and Confirmations tables, I ensured that users with no actions are still included, and the AVG function automatically treats them as 0 if they don't meet the "confirmed" criteria.

3. Precision with ROUND: Used ROUND(..., 2) to make sure the final confirmation rate is clean and meets the required format (0.00).

I would love to learn from your experience: is there a cleaner method?

"A little done consistently is better than a lot done in bursts."

#SQL #DataEngineering #PostgreSQL #LeetCode #100DaysOfCode #DataAnalytics #ProblemSolving
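For anyone curious, the shape of the query, assuming the usual Signups(user_id) / Confirmations(user_id, action) layout from that problem; column names may differ from the exact statement.

-- LEFT JOIN keeps users with no confirmation requests; their rate becomes 0.00.
SELECT
    s.user_id,
    ROUND(AVG(CASE WHEN c.action = 'confirmed' THEN 1.0 ELSE 0.0 END), 2)
        AS confirmation_rate
FROM Signups AS s
LEFT JOIN Confirmations AS c
    ON c.user_id = s.user_id
GROUP BY s.user_id;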