Open Source projects to watch for in 2026 (Data Infra space)

Each year I hand-pick some of the Open Source projects that I personally think move the data infrastructure ecosystem forward. These are projects that show real traction, solve genuine problems, and influence how modern data platforms are built. The important thing to note is that Open Source continues to be the backbone of today's data infrastructure! I consider myself very lucky to have had the opportunity to work with some of these projects in the last couple of years. Here is my list (for 2026):

✅ Apache Iceberg: Iceberg continues to be THE open table format of choice for building lakehouse architectures. 2025 has been a huge year for Iceberg, with countless organic adoptions and contributions from the community. Iceberg's simplicity and community continue to be its success factors.

✅ Apache Arrow: Arrow is everywhere. What started as an in-memory columnar format is now a multi-language foundation for high-performance data processing and data transport across systems.

✅ Apache DataFusion: DataFusion is an extensible query engine that uses Arrow as its in-memory format. 2025 has seen many adoptions of DataFusion - Arroyo, Comet, dbt Fusion, and ParadeDB, to name a few.

✅ Apache Hudi: Hudi isn't just a table format - it is an end-to-end open lakehouse platform with a robust storage engine. I cannot overstate the innovation Hudi has brought to the lakehouse space: advanced indexes, non-blocking concurrency control, and clustering/compaction as a service.

✅ Apache Ozone: Ozone is a scalable, distributed object store designed for lakehouse workloads, AI/ML, and cloud-native applications. With S3-compatible APIs and a Hadoop-compatible filesystem, it's a compelling open-source alternative for object storage.

✅ Lance by LanceDB: Lance is a lakehouse format for multimodal AI. It includes both a file format and a table format.
I have seen some really good applications of Lance in the past year for vector search, full-text search, and random access.

✅ Velox: Velox is a high-performance, open-source C++ execution engine built for reuse across batch, interactive, streaming, and AI workloads. Velox has seen some strong adoptions - Presto C++, Gluten, and NVIDIA cuDF - and I really like the community that is growing organically around it.

✅ Vortex: Vortex introduces a fresh take on columnar formats, focusing on fast random access to compressed data and zero-copy interoperability with Arrow. Its extensible design supports both general analytics and specialized embedded use cases.

✅ Apache Fluss (Incubating): Fluss is a streaming storage engine built for real-time analytics. It's built around the idea of "streams" as continuously updating tables, and I think there's huge scope for low-latency updates, real-time ingestion, and more.

What other OSS projects should I keep an eye out for?

#dataengineering #softwareengineering
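Iceberg's snapshot-based design is easier to appreciate with a toy sketch. The following is a hypothetical, stdlib-only illustration of the core idea behind table formats like Iceberg (not its actual implementation): data files are immutable, and each commit appends a new snapshot listing the files a reader may see, which yields snapshot isolation and "time travel" almost for free.

```python
from dataclasses import dataclass, field

@dataclass
class Table:
    # Each snapshot is the full list of data files visible at that commit.
    snapshots: list = field(default_factory=list)

    def commit(self, new_files):
        # A commit never mutates old state; it appends a new snapshot.
        current = self.snapshots[-1] if self.snapshots else []
        self.snapshots.append(current + list(new_files))

    def scan(self, snapshot_id=-1):
        # Readers resolve one fixed snapshot, so concurrent commits
        # cannot change what an in-flight query sees.
        return self.snapshots[snapshot_id]

t = Table()
t.commit(["data-0.parquet"])
t.commit(["data-1.parquet"])
print(t.scan())   # latest snapshot: both files
print(t.scan(0))  # time travel to the first commit
```

Real Iceberg adds manifests, statistics, and atomic metadata swaps on top, but the immutable-files-plus-snapshot-list shape is the heart of it.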
Open Source Innovation Platforms
Summary
Open source innovation platforms are collaborative environments where anyone can access, use, improve, or build upon freely shared software and tools—helping businesses, developers, and communities accelerate new ideas and solutions. By enabling open contribution and integration, these platforms are transforming how organizations address technical challenges and create value in data, AI, and digital products.
- Explore community options: Evaluate open source tools and platforms that fit your needs, allowing flexibility without being locked into a single vendor's ecosystem.
- Encourage developer contributions: Invite external developers to build on your platform, which increases innovation and helps your product evolve with user needs.
- Build sustainable growth: Align your platform’s goals with its community to create a cycle where more contributors drive greater value and long-term success for everyone involved.
-
Open-source AI isn’t just one category. It’s an ecosystem. Most people miss this. That’s why they struggle to pick the right tools. Here’s the clarity you actually need ⤵️

✦ Open-Source Base Models - Where everything begins. They give you text, image, speech, and multimodal power, with the freedom to fine-tune for your domain. Best for teams that want flexibility instead of lock-in.

✦ Model Deployment Tools - This is the bridge from “cool demo” to “real product.” Scalable infra, production-ready APIs, private inference - everything enterprises need to operationalize AI.

✦ Specialized Libraries - Your precision tools. Image cleanup, audio enhancement, extraction, conversions - targeted solutions that save you from building basics from scratch.

✦ RAG Engines - Where models become knowledge-aware. They fetch, reason, and respond using your data, without retraining. Perfect for intelligence, documentation, and enterprise knowledge systems.

✦ LLM Frameworks - The orchestration layer. Memory, tools, agents, routing - the components that help you build full AI workflows instead of one-off prompts.

✦ Agentic Frameworks - The advanced layer. Systems that can plan, decide, execute, and complete multi-step tasks. This is where AI stops being a tool… and starts becoming a teammate.

---

When you understand these layers, you stop “trying tools.” You start architecting solutions. And that’s the difference between experiments and real AI products.

📌 If you want a high-res PDF:
1. Follow Sufyan Maan, M.Eng.
2. Like the post.
3. Repost to your network.
4. Subscribe to: sufyanmaan.substack.com
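To make the RAG layer concrete, here is a deliberately tiny, stdlib-only sketch of what a RAG engine does under the hood. Everything here is invented for illustration - real engines score with embeddings and a vector index rather than keyword overlap - but the retrieve-then-prompt shape is the same:

```python
def score(query, doc):
    # Crude relevance: fraction of query words appearing in the document.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q)

def retrieve(query, docs, k=1):
    # Return the k most relevant documents for the query.
    return sorted(docs, key=lambda d: score(query, d), reverse=True)[:k]

docs = [
    "Invoices are due within 30 days of receipt.",
    "Refunds are processed in 5 business days.",
]

# The retrieved context is spliced into the prompt, so the model answers
# from your data without any retraining.
context = retrieve("when are invoices due", docs)[0]
prompt = f"Answer using this context:\n{context}\n\nQ: when are invoices due"
print(context)
```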
-
Open Source is Eating the Data Stack. What's Replacing Microsoft & Informatica Tools?

I've been reading a great discussion about replacing traditional proprietary data tools with open-source alternatives. Companies are increasingly worried about vendor lock-in, rising costs, and scalability limitations with tools like SQL Server, SSIS, and Power BI. The consensus is clear: open source is winning in modern data engineering.

💡 What's particularly interesting is the emerging standard stack that data teams are gravitating toward:
• PostgreSQL or DuckDB for warehousing
• dbt or SQLMesh for transformations
• Dagster or Airflow for orchestration
• Superset, Metabase, or Lightdash for visualization
• Airbyte or dlt for ingestion

As one data engineer noted, "Your best hedge against vendor lock-in is having a warehouse and a business-facing data model worked out. It's hard work but keeping that layer allows you to change tools, mix tools, lower maintenance by implementing business logic in a sharable way."

I see this shift every day. Teams want the flexibility to choose best-of-breed tools while maintaining unified control and visibility across their entire data platform. That's exactly why you should build your data platform on tooling that integrates with your favorite tools rather than trying to replace them. Vertical integration sounds great - if you enjoy vendor lock-in, slow velocity, and rising costs.

Python-based, code-first approaches are replacing visual drag-and-drop ETL tools. We all know SSIS is horrible to debug, slow, and outdated. The modern data engineer wants software engineering practices like version control, testing, and modularity. The real value isn't just cost savings - it's improved developer experience, better reliability, and the freedom to adapt as technology evolves.

For those considering this transition, start small. Replace one component at a time and build your skills.
Remember that open source requires investment in engineering capabilities - but that investment pays dividends in flexibility and innovation. Where do you stand on the proprietary vs. open source debate? And if you've made the switch, what benefits have you seen? #DataEngineering #OpenSource #ModernDataStack #Dagster #dbt #DataOrchestration #DataMesh
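The quoted advice about keeping business logic in a portable, business-facing model can be sketched in a few lines. This hypothetical example uses Python's built-in sqlite3 to stand in for the warehouse; the table and view names are invented, but the staging-then-mart layering is the same pattern dbt and SQLMesh materialize:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Raw layer: whatever the ingestion tool lands.
    CREATE TABLE raw_orders (id INTEGER, amount_cents INTEGER, status TEXT);
    INSERT INTO raw_orders VALUES
        (1, 1000, 'paid'), (2, 250, 'refunded'), (3, 4000, 'paid');

    -- Staging model: clean and rename.
    CREATE VIEW stg_orders AS
    SELECT id, amount_cents / 100.0 AS amount_usd, status FROM raw_orders;

    -- Mart model: the business-facing definition of revenue.
    CREATE VIEW mart_revenue AS
    SELECT SUM(amount_usd) AS revenue FROM stg_orders WHERE status = 'paid';
""")
revenue = con.execute("SELECT revenue FROM mart_revenue").fetchone()[0]
print(revenue)  # 50.0
```

Because the logic lives in plain SQL views over the warehouse, the BI tool, orchestrator, or even the engine underneath can be swapped without rewriting the business definitions - which is exactly the lock-in hedge the post describes.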
-
💡 Meta’s OSPO: Driving Innovation and Community Engagement

Meta has built one of the most influential OSPOs in the world, not just through code but by shaping the cultural and technical foundations of modern development. Projects like PyTorch, React, and GraphQL have redefined how we build, scale, and collaborate. In terms of headcount, Meta's OSPO is likely among the smallest globally, with just a few individuals, because open source is embedded in the fabric of the company - and that is how Meta rolls right now.

Some standout elements of Meta’s open source strategy:

🧠 AI at Scale with PyTorch: PyTorch, born at Meta and hosted in The Linux Foundation under its own PyTorch Foundation, is a cornerstone of the global AI/ML community. It is THE framework of choice for research and production, from startups to hyperscalers.

🖼 Frontend Revolution with React: React changed the game in web development, introducing a declarative, component-based UI model.

🔎 Smarter APIs with GraphQL: Meta launched GraphQL, now under the GraphQL Foundation, providing a more efficient, flexible alternative to REST. It’s now an industry standard.

🌐 Infrastructure-Grade Open Source: From Presto, now in the Presto Foundation, to HHVM, Meta’s open source stack is built for the scale of its internal infrastructure and shared with the world. These tools power massive data workloads and inspire contributions across enterprise and research sectors.
🌱 Community Building and Developer Enablement
Meta backs its code with community:
☑️ Developer conferences
☑️ Grants & funding for ecosystem projects
☑️ Educational content, docs, and tutorials to onboard new contributors

🛡 Governance and OSPO Culture
Meta’s OSPO leads with strategic alignment and sustainability:
☑️ Ensures license compliance & risk mitigation at scale
☑️ Empowers internal devs with contribution pathways and tooling
☑️ Fosters long-term project health

🔭 Looking Ahead
Expect Meta’s OSPO to go deeper on:
☑️ Open AI research tools & public datasets
☑️ Sustainable computing and energy-efficient frameworks
☑️ Web3 and decentralized infra
☑️ IoT-ready open source integration
☑️ Open cloud and edge-native platforms

Meta’s OSPO is an innovation engine and community amplifier. Few companies have shaped the open source landscape in recent years as fundamentally as Meta. To view the scope of OSS efforts at Meta, check out "Meta Open Source: 2024 by the numbers": https://lnkd.in/dK45KsrJ

🧭 This post is part of an ongoing series spotlighting the role of OSPOs in driving strategic value through open source. Tomorrow, I am covering Google. Tune in!

🔁 If you find this post valuable, please share it with your network.

#OpenSource #OSPO #OpenStandards #DevRel #OpenSourceStrategy
The Linux Foundation Linux Foundation Europe Linux Foundation Japan TODO (OSPO) Group OpenChain Project

⚠️ This post represents my views and does not reflect those of my current or past employers. ⚠️
-
Opening Your Product Can Fuel Explosive Growth Not every product can — or should — become a platform. But if you’re building something that can support open APIs and foster an ecosystem, leaving space for others to build on top of your product can drive explosive growth. Here’s the key: It’s not just about user innovation. It’s about aligning incentives to create a scalable, self-sustaining ecosystem. That’s what we did at Magento, and what Figma (link in comments) got right. ➡️ If You Can Be a Platform, Open It Up Figma’s plugin system didn’t just enhance their product; it created a flywheel of innovation. When we built Magento, we left intentional gaps for developers to fill with their own solutions. If your product can support it, opening it up is a powerful strategy for scaling. ➡️ Align Incentives with Your Users By creating opportunities for developers to build on your platform, you establish a mutually beneficial relationship. At Magento, developers and partners built profitable businesses, and that fueled our growth. Figma’s plugin approach did the same. ➡️ Create a Flywheel of Growth The more developers and creators contribute, the more valuable the platform becomes. This network effect — as demonstrated by Magento’s marketplace and many other successful platforms — drives long-term scalability and deepens community engagement. Leave room for others to innovate and build on your product, and you’ll create not just growth — but a thriving, resilient ecosystem that scales itself. (in the photos, a community contributor barcamp event at Magento's annual conference, Imagine. 2016)
-
Hugging Face: The Open-Source Powerhouse Driving the Future of #AI

Hugging Face has rapidly become the home of open innovation in AI. It combines 100+ large language models (#LLMs) and tools from industry leaders like OpenAI, Google, NVIDIA, and DeepSeek AI. It’s the platform where AI research meets real-world application.

What makes it so powerful?
✅ Unified Access - Write, code, generate images, or run web searches seamlessly. Everything happens in one place.
✅ Smarter Model Routing with Omni - Hugging Face’s new Omni agent automatically picks the best model for your task. You can also manually explore 100+ open models right inside their interface.
✅ Fully Open Source - Developers can dive deep into the architecture. Its Arch-Router-1.5B system powers intelligent model routing efficiently, and it’s open source for everyone to explore.

Real-World Example from Our Work:
We recently used Hugging Face’s Transformers library to fine-tune a DistilBERT model for one of our fintech clients.
📈 Result:
• Improved classification accuracy by 20%
• Reduced inference costs by 25%
• Enabled real-time sentiment and entity recognition in their analytics dashboard

This project showcased how open-source innovation can deliver enterprise-grade performance at a reasonable cost. Hugging Face isn’t just a collection of models - it’s a thriving ecosystem empowering developers, businesses, and creators to build transparent, scalable, and intelligent AI systems.

👉 Explore it yourself and see how open source AI is redefining the future. 🔗 Link in comments.

#OpenSource #ArtificialIntelligence #Innovation #REAdvisory
-
How Do Open-Source Projects Transform Embedded Systems Development?

In the intricate world of embedded systems, open-source projects are not just tools; they're the foundation for tomorrow's innovations. These projects empower developers, fuel technological advancements, and democratize access to technology.

🔹 Operating Systems & Frameworks
- Linux Kernel is the stable and versatile core for a wide range of devices, from desktops to embedded systems.
- FreeRTOS is a real-time operating system for microcontrollers, providing a lightweight, efficient kernel with a focus on simplicity and ease of use.
- Zephyr Project offers a scalable real-time operating system for connected, resource-constrained devices, emphasizing security and modularity.
- Yocto Project enables the creation of custom Linux distributions for embedded devices, allowing deep customization.
- Raspberry Pi OS supports educational and hobbyist projects with a vast community and comprehensive resources.

🔹 Development Platforms & Tools
- Arduino simplifies prototyping and learning with its user-friendly platform, welcoming hobbyists and experts alike.
- Espressif IDF (IoT Development Framework) enables advanced applications on ESP32 and ESP8266 SoCs, offering a comprehensive set of tools and libraries for high-performance wireless solutions.
- OpenWrt revolutionizes networking projects with its open-source router firmware, offering unparalleled flexibility and control over network devices.
- Eclipse offers a versatile IDE for diverse programming needs, preferred by developers worldwide.
- PlatformIO is a cross-platform IoT development ecosystem that simplifies programming and integrating microcontroller boards, supporting numerous platforms and frameworks.

🔹 Machine Learning & IoT
- TensorFlow Lite and EdgeX Foundry enhance devices with smart decision-making and seamless IoT integration, driving forward the edge computing revolution.
- OpenCV is essential for computer vision projects, enabling powerful image processing and analysis on embedded devices.

🔹 Programming Languages & Scripting
- Rust offers memory safety without a garbage collector, revolutionizing system-level programming with its focus on safety and performance.
- MicroPython brings the simplicity and power of Python to microcontrollers, making programming accessible and fun for developers of all skill levels.

🔹 Hardware Platforms
- BeagleBoard, Adafruit Feather, and Pine64 champion open-source hardware, inspiring creators to explore and innovate.

Are you familiar with all of these open-source projects? Which ones have you used in your own embedded systems projects, and how have they influenced your work?

📢 Exciting Announcement! I've created a Telegram channel to share more interesting insights with students and professionals looking to scale their skills. Join us at https://lnkd.in/e5tSAtAN

#embeddedSystems #OpenSource #IOT
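The scheduling idea at the heart of an RTOS like FreeRTOS or Zephyr can be illustrated in plain Python. This is a conceptual toy only - real kernels are preemptive C code with priorities and interrupts - but it shows the core shape: many tasks, each runs a slice, then yields back to a round-robin ready queue.

```python
from collections import deque

def task(name, steps):
    # A generator stands in for a task; 'yield' plays the context switch.
    for i in range(steps):
        yield f"{name}:{i}"

def round_robin(tasks):
    ready, trace = deque(tasks), []
    while ready:
        t = ready.popleft()
        try:
            trace.append(next(t))   # run one slice of the task
            ready.append(t)         # then send it to the back of the queue
        except StopIteration:
            pass                    # task finished; drop it
    return trace

print(round_robin([task("blink", 2), task("uart", 2)]))
# ['blink:0', 'uart:0', 'blink:1', 'uart:1']
```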
-
Open-source innovation is powering the next generation of LLM applications - and 2025 is all about building faster, smarter, and more modular. This visual neatly categorizes the best open-source tools every LLM developer should know, across four key pillars:

1. Frameworks – From LangChain and LlamaIndex to AutoGen and Semantic Kernel, these tools help you orchestrate agents, manage memory, and design robust LLM workflows.
2. Libraries – Tools like Transformers (Hugging Face), Promptify, and FastEmbed simplify access to models, embedding generation, and prompt optimization.
3. Vector Databases – Whether it’s ChromaDB for open-source simplicity, Weaviate for semantic search, or Pinecone for managed RAG, this stack lets you choose the right fit for scalable similarity search.
4. Dev Tools / Utilities – From no-code builders like Flowise AI to developer agents like OpenDevin and Continue.dev, these utilities supercharge productivity and bring copilots into every dev workflow.

Whether you're building your first RAG pipeline or scaling multi-agent systems, this toolkit is your starting point for LLM development done right.

📌 Save this list and bookmark your next experiment.
👉 Follow Vaibhav Aggarwal for more deep dives on GenAI architecture, tools, and open-source ecosystems.
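The vector-database pillar boils down to one operation worth seeing once: nearest-neighbour search over embeddings. Here is a hedged, stdlib-only sketch - the vectors are invented toy values, and production systems like ChromaDB or Weaviate use real embedding models plus approximate indexes (e.g. HNSW) instead of brute force:

```python
import math

def cosine(a, b):
    # Cosine similarity between two 2-D vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# A tiny "index": document id -> embedding (made-up 2-D values).
index = {
    "doc_cats":   [0.9, 0.1],
    "doc_dogs":   [0.8, 0.3],
    "doc_stocks": [0.1, 0.95],
}

def search(query_vec, k=1):
    # Brute-force top-k by similarity; real vector DBs use ANN indexes.
    return sorted(index, key=lambda d: cosine(query_vec, index[d]), reverse=True)[:k]

print(search([0.85, 0.2]))  # nearest to the pet documents
```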
-
A Useful Tool to Track the AI Open-Source Ecosystem

If you work in AI or build AI products, keeping up with new repositories and projects can feel overwhelming. I recently discovered a platform that helps track the rapidly growing AI open-source ecosystem: https://goodailist.com

Here’s what makes it interesting:
• Tracks 14,000+ AI open-source repositories
• Includes contributions from 145,000+ developers worldwide
• Uses 123 AI-related keywords and topics to discover new repositories daily
• Automatically categorizes projects and surfaces trending repositories
• Shows where contributors are located globally, making it easier to connect with AI builders in different cities or countries

The annotations are generated using AI, so they may not always be perfect, but it’s still a great way to discover emerging tools, frameworks, and research projects early. For anyone working in AI Engineering, Machine Learning, or AI Applications, this is a useful resource to stay updated with the latest innovations.

💬 Curious to know - how do you usually discover new AI repositories or open-source projects?

#AIEngineering #ArtificialIntelligence #MachineLearning #OpenSource #AIApplications #AIInnovation
-
The shift towards LLM-powered products can be viewed as just four basic layers.

1. Frontier Models - These are expensive to train, and only very few people in the world know how to train them to be useful. Developed by leading AI research labs and companies like OpenAI and Anthropic, they form the core AI capabilities.

2. Inference Layer - Enables the deployment and usage of these advanced frontier models. Together AI and Amazon Web Services (AWS) are examples of these inference platforms.

3. Closed-Source Agentic Platforms - These platforms make it simple for you to build agents and connect them together to solve any business need. Reducing complexity with an intuitive design is critical in this layer. Some examples are Ema Unlimited, Taskade, and Relevance AI.

4. Open-Source Agentic Frameworks - Projects like CrewAI, AutoGen, and OpenAI's Swarm offer open-source tools for building and integrating AI agents. These are an alternative to the closed-source platforms listed above.

As this landscape matures, we will see existing software products replaced by "AI-native" solutions built on top of these agentic platforms. So, in the near future, instead of Salesforce Admins we will have a new category of jobs called 'Agentic Admin'. This role will need people who specialize in selecting, configuring, and optimizing the platforms above for your business use case.

Sharing a growing list of these agentic platforms here: https://lnkd.in/gjmfR3kT

#GenAI #AI #Agentic #FrontierModels
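The common pattern behind the agentic frameworks above can be sketched without any framework at all. This is a hypothetical minimal agent loop, not CrewAI's or AutoGen's actual API: a policy (which in a real agent would be an LLM call) picks a tool, the loop executes it, and the observation feeds back until the task is done.

```python
def calculator(expr):
    # Toy tool: evaluate arithmetic. Never eval untrusted input in real code.
    return eval(expr, {"__builtins__": {}})

TOOLS = {"calculator": calculator}

def policy(task, history):
    # Stand-in for an LLM call deciding the next action.
    if not history:
        return ("calculator", task)   # no observations yet: use the tool
    return ("finish", history[-1])    # otherwise, report the last result

def run_agent(task):
    history = []
    while True:
        action, arg = policy(task, history)
        if action == "finish":
            return arg
        history.append(TOOLS[action](arg))  # execute tool, record observation

print(run_agent("2 + 3 * 4"))  # 14
```

Real frameworks wrap this loop with memory, multi-agent routing, and guardrails, but selecting a tool, executing it, and feeding the observation back is the kernel they all share.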