Federated Learning - Build Models without Sharing Data

Some of the most popular high-quality AI applications we use today, such as Google's Gemini and PaLM or OpenAI's ChatGPT, are proprietary AI models. We do not have access to the source code; the companies behind them maintain ownership and control over access, and many advanced features are locked behind a paywall. Closed-source code, usage restrictions, and paywalls are not inherently evil or immoral. Companies need to make money, and one way they do this is by protecting their IP and charging for their services. The algorithm behind an AI model is only one part of the puzzle. Unlike purely procedural or logic-based code, AI models rely heavily on the data they are trained on, and the energy and infrastructure costs of operating them are so high that these companies could not offer all their services for free even if they wanted to.

Ever since I entered the world of Ubuntu Linux, I have been a fan of open-source software. In the past, I had to pay high prices for, or was locked out of, software like Microsoft Office, Adobe Photoshop, Adobe Illustrator, MathWorks' MATLAB, and more, until I learned that Linux had open-source equivalents such as LibreOffice, Krita, GIMP, Inkscape, and Octave that satisfied my needs. This software was free: free in price and in what we were allowed to do with it. Users like me could use it without worrying about company usage limitations. We could write code to extend and expand the software. We could use add-ons from other users unrelated to the original creators. And often, this software was available on multiple operating systems. The beauty of open-source software is that it is usually patent-free and available to all. It is software democratized.

But like in many democracies, people always want benefits, but rarely want to pay for them. Open-source software is often built and maintained by enthusiasts and receives little to no funding support. This makes long-term development, maintenance, and updates of open-source software difficult. The challenge is even more stark when the software developed by the open-source community requires running and maintaining resources, such as in the cloud.

AI models are not simply software algorithms. They require data to be collected, stored, analyzed, trained, fine-tuned, and deployed regularly. High-tech companies spend a lot of money on servers, energy, and engineers to review and fine-tune their models, which is difficult for a standard open-source project to sustain.

Enter Federated Learning

One way for open-source communities to build AI models is federated learning. Federated learning, or collaborative learning, is a technique in which multiple clients collaborate to train a model while keeping their data decentralized. The methodology was designed primarily to minimize data sharing and enhance data privacy. Instead of feeding all training data to a single central model, several local copies of the model run on different machines, each training only on the data available on that machine. Once a local model has been trained, only its model parameters, such as the weights and biases, are shared. The original data never leaves the local machine, minimizing data sharing and thus the risk of data breaches over networks.
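The core of the idea fits in a few lines of Python. This is a toy sketch using NumPy; the dataset and the `update_message` structure are illustrative inventions, not any real federated-learning protocol. A client fits a model on data that stays local and transmits only the fitted parameters:

```python
import numpy as np

# Hypothetical local dataset -- this never leaves the client machine.
rng = np.random.default_rng(0)
X_local = rng.normal(size=(100, 3))
y_local = X_local @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

# Train a simple linear model locally (ordinary least squares).
weights, *_ = np.linalg.lstsq(X_local, y_local, rcond=None)

# Only the learned parameters are transmitted, not X_local or y_local.
update_message = {"weights": weights.tolist()}
print(update_message)
```

The message sent over the network contains three floating-point numbers, while the raw data (400 values here, and gigabytes in a real deployment) stays put.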

Figure 1: Federated learning using a centralized, orchestrated setup.

There are several ways to design a system to accomplish federated learning, and I will discuss two here.

Centralized Federated Learning

In centralized federated learning, there is a central machine and several local machines, which can be spread out across the web. The AI models are initially identical on the global machine (called the centralized server) and the local machines (called clients), but as each client trains its copy on the data in its local database, the models diverge. Each local machine sends only its updated model parameters to the central server, which aggregates them to update the global model. It never sends the data itself.
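A common way for the central server to combine client updates is federated averaging (often called FedAvg): the server takes a mean of the clients' parameter vectors, weighted by how much data each client trained on. A minimal sketch, assuming NumPy and made-up client values:

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Federated averaging: weighted mean of client parameter
    vectors, weighted by the number of local training examples."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# Three clients report updated parameters after local training.
client_weights = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
client_sizes = [100, 100, 200]  # clients with more data get more influence

global_weights = fedavg(client_weights, client_sizes)
print(global_weights)  # weighted average: [3.5 4.5]
```

The server then pushes `global_weights` back to the clients as the starting point for the next round, and the cycle repeats.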


Figure 2: Centralized federated learning. Clients train on local data and send only updated model parameters to the central server.

Decentralized Federated Learning

In decentralized federated learning, there is no global machine or centralized server. Instead, only client machines communicate with each other peer-to-peer. Client machines can join or leave the network anytime, and there is no single point of failure. This makes the decentralized system more flexible and resilient. The local machines (clients) train their local AI models based on data in their local database and only share the model parameters with other client machines.
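One simple peer-to-peer scheme for this is gossip averaging: pairs of peers repeatedly average their parameters with each other, so every copy drifts toward the network-wide mean without any server. The sketch below is a toy illustration (NumPy, one-dimensional "parameters" for clarity; real systems exchange full weight tensors):

```python
import numpy as np

def gossip_round(params, pairs):
    """One gossip round: each listed pair of peers averages its
    parameter vectors with the other; no central server involved."""
    new = {peer: vec.copy() for peer, vec in params.items()}
    for a, b in pairs:
        avg = (params[a] + params[b]) / 2.0
        new[a], new[b] = avg, avg.copy()
    return new

# Each peer holds its own locally trained parameters.
params = {"A": np.array([0.0]), "B": np.array([4.0]), "C": np.array([8.0])}

# Peers exchange parameters pairwise over several rounds.
for pairs in [[("A", "B")], [("B", "C")], [("A", "C")]]:
    params = gossip_round(params, pairs)

print(params)  # all values have moved toward the global mean (4.0)
```

Because any pair of reachable peers can average at any time, peers joining or leaving mid-training do not stall the system, which is the resilience property described above.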

Figure 3: Decentralized Federated Learning. There is no central global AI. Instead, only client machines train the model based on local data and then share the updated model parameters with other client machines.

How is Federated Learning helpful to the Open-source community?

Federated learning is an excellent method that the open-source community should use for training AI models. Consider the benefits:

  1. Collaboration and Data Privacy: This allows developers to collaborate on training models without sharing sensitive data, which improves data privacy and collaboration.
  2. Resource Efficiency: Whether an open-source project chooses centralized or decentralized federated learning, no single server has to ingest and store all the training data, which would drive up cost and consume significant bandwidth. Instead, smaller volumes of data are trained on separate local machines, and only the model parameters are shared. Resource usage is distributed across multiple machines, and bandwidth usage drops because only the model parameters travel over the network, not the data itself.
  3. Diverse Data Utilization: Since any client machine can run the local AI model, the data on which the AI model is trained benefits from greater diversity. No central authority or organization decides which data to use.
  4. Continuous Learning: Anyone can update open-source software or fork it to create their own variant. With an open-source AI algorithm and a federated learning system, the AI model can be maintained and improved by parties other than the original creators of the AI software. The models can be continuously improved, and in the case of centralized federated learning, if the central server is taken down, a new one can be set up in its place.

This should reduce the expenses needed to run an AI project in the open-source community by distributing the cost of training across client machines. It also distributes the burden of data collection, consumption, and cleaning across the users' local machines.

A Real World Example

Researchers from Arizona State University developed a machine learning model called Ark+. This model uses federated learning to evaluate images of chest radiography and diagnose diseases.

Chest radiography is used to diagnose lung diseases. If given chest radiographic data, machine learning models can be trained to detect and diagnose various lung diseases. However, a challenge researchers face is the lack of data to train these models. Healthcare data is protected in the United States, and sharing images of patient radiography carries risks. If the data is leaked, it can expose hospitals and medical institutions to legal risks and can impact the patients themselves. Therefore, institutions generally guard this data and are reticent to share medical or personally identifiable information.

Figure 4: This diagram shows how the Ark+ model was trained using data from distinct sources with heterogeneous expert labels using federated learning (Ma et al., 2025).

For researchers, this reluctance to share medical data poses a problem, as it is hard to train a machine learning model without sufficient data. Machine learning models before Ark+ faced issues of generalizability, adaptability, robustness, and extensibility.

Ark+’s federated learning methodology solves the problem of sharing sensitive healthcare data. The initial Ark+ model is shared with other institutions, who can use the model to train on their local data. Once the model is trained on the local data, only the model parameters are shared with the central research hub. This way, the sensitive data is never shared over networks and remains in the local/client machines. Over time, the central model updates and improves based on the model parameters received from remote clients.

References

Ma, D., Pang, J., Gotway, M. B., & Liang, J. (2025). A fully open AI foundation model applied to chest radiography. Nature, 1–11. https://doi.org/10.1038/s41586-025-09079-8

Kim, N. (2025). An open AI model could help medical experts to interpret chest X-rays. Nature. https://doi.org/10.1038/d41586-025-01525-x
