Decentralized Machine Learning at the Edge (DMLE)
Data is increasingly produced by decentralized sources such as mobile phones, vehicles, IoT devices, and smart sensors. These sources generate large, high-frequency data streams that render centralized processing in a cloud or computing cluster infeasible. Consider autonomous driving: an autonomous vehicle generates around 1GB of data per second [1]. If only 10 million cars (roughly the number of cars Volkswagen sold in 2018) produced data at this rate, processing it centrally would require handling 10 petabytes per second; in comparison, all LHC experiments at CERN combined process around 25GB per second [2]. At even larger scales, consider industry: Siemens collects around 2 exabytes of data per day [3] from sensors in machines, wind parks, and gas turbines. Moreover, much of this data is privacy-sensitive: sensor data from machines can reveal corporate secrets, and data from mobile phones or autonomous vehicles can infringe on users' privacy.
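The data-rate comparison above can be checked with a quick back-of-the-envelope calculation (using the approximate figures from the text):

```python
# Back-of-the-envelope check of the data-rate claims (approximate figures from [1,2]).
gb_per_car_per_second = 1                # ~1 GB/s per autonomous vehicle [1]
cars = 10_000_000                        # roughly the number of cars VW sold in 2018

total_gb_per_second = gb_per_car_per_second * cars
print(total_gb_per_second / 1e6, "PB/s")          # 10.0 PB/s (1 PB = 10^6 GB)

lhc_gb_per_second = 25                   # all LHC experiments combined [2]
print(f"{total_gb_per_second / lhc_gb_per_second:,.0f}x the LHC data rate")
```

That is, the fleet would produce roughly 400,000 times the data rate of all LHC experiments combined.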
Thus, centralizing data in a cluster or cloud has three major disadvantages: (i) it does not scale well with the number of data-generating devices and neglects their computing power, (ii) it requires a prohibitive amount of communication, and (iii) it often requires sharing privacy-sensitive data.
To overcome these disadvantages, data can be processed at - or close to - the data-generating devices, an approach often called edge computing or in-situ processing. Such decentralized approaches reduce communication overhead and utilize the processing power of the data-generating devices. This not only yields communication-efficient methods but also avoids centralizing privacy-sensitive data, enabling novel, large-scale applications.
Recent advances in machine learning at the edge [4,5,6], in particular for deep learning [7] - where it was termed federated learning [8] - shift the focus from high-performance computation in clusters to decentralized learning. Federated learning is now gaining considerable interest as an approach to bring machine learning to edge devices [9,10,11,12], in particular mobile phones [13,14,15].
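To illustrate the core idea behind federated learning, here is a minimal sketch of federated averaging in the spirit of [8]: each client trains locally on its own data, and a server averages the resulting models, weighted by local dataset size. This is a toy illustration (linear least squares with plain gradient descent), not the full algorithm from the paper:

```python
import numpy as np

def local_train(w, X, y, lr=0.1, epochs=5):
    # Local update: a few epochs of gradient descent on least squares.
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

def federated_averaging(client_data, rounds=20, dim=2):
    # Each round: clients train locally on their private data; the server
    # averages the returned weights, weighted by local dataset size.
    w_global = np.zeros(dim)
    for _ in range(rounds):
        updates = [local_train(w_global.copy(), X, y) for X, y in client_data]
        sizes = np.array([len(y) for _, y in client_data], dtype=float)
        w_global = np.average(updates, axis=0, weights=sizes)
    return w_global

# Toy example: three clients hold disjoint samples from the same linear model.
rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0])
clients = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    y = X @ w_true + 0.01 * rng.normal(size=50)
    clients.append((X, y))

w = federated_averaging(clients)
print(np.round(w, 2))  # converges close to [2., -1.] without sharing raw data
```

The key property is that only model parameters cross the network; the raw, possibly privacy-sensitive data never leaves the clients.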
To bring together researchers and practitioners, we organize the second edition of the workshop on Decentralized Machine Learning at the Edge (DMLE) in conjunction with ECMLPKDD 2019. The workshop aims to provide a platform for the exchange of novel concepts and ideas, and to disseminate decentralized learning, parallelization, and federated learning approaches within the machine learning community. It addresses theoretical and empirical aspects of machine learning at the edge, including large-scale machine learning, federated learning, communication efficiency, theoretical guarantees for distributed learning, in-situ processing, data mining from distributed sources, privacy aspects, resource-constrained machine learning for edge devices, and hardware aspects of edge devices.
We happily welcome submissions to the workshop (see the call for papers), and hope to have lively discussions with both researchers and practitioners at the workshop on the 16th of September in Würzburg, Germany.
[1] Shi, Weisong et al. Edge computing: Vision and challenges. Internet of Things Journal, pages 637–646. IEEE, 2016.
[2] CERN. Processing: What to record? Retrieved 27.3.2019.
[3] Rüdiger Köhn. Ringen um die Vorherrschaft über Industrie 4.0. Blick in die Zukunft: Trends und Szenarien für die Welt von morgen. FAZ, 2014.
[4] Kamp, Michael, et al. Communication-efficient distributed online prediction by dynamic model synchronization. In ECMLPKDD. Springer, 2014.
[5] Kamp, Michael, et al. Communication-efficient distributed online learning with kernels. In ECMLPKDD. Springer, 2016.
[6] Kamp, Michael, et al. Effective parallelisation for machine learning. Advances in Neural Information Processing Systems, 2017.
[7] Kamp, Michael, et al. Efficient decentralized deep learning by dynamic model averaging. In ECMLPKDD. Springer, 2018.
[8] McMahan, Brendan, et al. Communication-efficient learning of deep networks from decentralized data. In Artificial Intelligence and Statistics, pages 1273–1282, 2017.
[9] Yang, Q., et al. Federated machine learning: Concept and applications. ACM Transactions on Intelligent Systems and Technology (TIST) 10(2), 2019.
[10] Abadi, M., et al. Deep learning with differential privacy. In Proceedings of the ACM SIGSAC Conference on Computer and Communications Security, 2016.
[11] Mohri, M., Sivek, G., Suresh, A.T. Agnostic federated learning. CoRR, 2019.
[12] Zhao, Y., et al. Federated learning with non-iid data. CoRR, 2018.
[13] Hard, A., et al. Federated learning for mobile keyboard prediction. CoRR, 2018.
[14] Yang, T., et al. Applied federated learning: Improving Google keyboard query suggestions. CoRR, 2018.
[15] Bonawitz, K., et al. Towards federated learning at scale: System design. CoRR, 2019.