FLAIR AI/ML Research Project
A European consortium project on which I collaborate. In many AI use cases, the training stage is performed on a central server, which means that data must be shared. We are therefore developing a solution based on Federated Learning (which avoids sharing any kind of data), integrating the toolchain of VEDLIoT (also an EU-funded project) into our use case (voice recognition) within a 5G network setup.
As the VEDLIoT documentation states, «The primary goal of the VEDLIoT toolchain is the optimisation of existing Deep Neural Networks towards a specific target hardware using the EmbeDL optimiser technology». Furthermore, TensorFlow is listed among the supported frameworks.
Federated Learning
Federated Learning (FL) is a distributed and decentralized machine learning approach in which clients contribute to learning a global model without sending any data to a central server, for privacy purposes. A core assumption in an FL solution is that data is not equally distributed across clients, i.e., it is non-Independent and Identically Distributed (non-IID), which calls for dedicated distributed training strategies.
The steps of an FL solution are (a code sketch follows the list):
- The server starts from an untrained model.
- The model is sent to each selected client.
- The clients perform training on their private data.
- The server collects the updates from all clients and aggregates them according to an FL strategy, obtaining a new global model. The resulting merged model is distributed to all clients.
- The iterative training process continues until convergence (or until an unrecoverable error stops it).
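To make the loop concrete, here is a minimal, framework-agnostic sketch of these steps with FedAvg-style weighted averaging; the toy model, the two clients, and the `aggregate` helper are hypothetical illustrations, not code from our project:

```python
import numpy as np

def aggregate(results):
    """FedAvg-style aggregation: average client weights, weighted by the
    number of local training examples each client used."""
    total_examples = sum(num_examples for _, num_examples in results)
    num_layers = len(results[0][0])
    return [
        sum(weights[layer] * num_examples for weights, num_examples in results)
        / total_examples
        for layer in range(num_layers)
    ]

# Toy global model: two weight tensors, initialised by the server (step 1).
global_weights = [np.zeros((4, 2)), np.zeros(2)]

for round_num in range(3):  # step 5: iterate until convergence
    results = []
    for num_examples in (100, 300):  # two hypothetical clients
        # Steps 2-3: each client receives the model and trains on private
        # data; here local training is faked with a small random update.
        local_weights = [w + 0.01 * np.random.randn(*w.shape) for w in global_weights]
        results.append((local_weights, num_examples))
    # Step 4: the server aggregates the updates into a new global model.
    global_weights = aggregate(results)
```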
FL framework options
TensorFlow Federated (TFF)
TensorFlow Federated (TFF) is an open-source framework for machine learning and other computations on decentralized data, developed by the TensorFlow community (led by Google). Its interface is organized in two main layers:
- Federated Learning (FL) API: This layer offers a set of high-level interfaces that allow developers to apply the included implementations of federated training and evaluation to their existing TensorFlow models. These high-level interfaces make it easy to implement a federated learning solution from scratch or from an existing TensorFlow model.
- Federated Core (FC) API: FC is the core of the TFF system; it provides a low-level interface to fully scale a solution and to define and customize new federated algorithms and federated computations. In fact, the Federated Learning layer is built on top of the Federated Core layer.
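To give a flavour of the lower layer, here is the canonical Federated Core example from the TFF tutorials (a sketch; exact symbols depend on the TFF version):

```python
import tensorflow as tf
import tensorflow_federated as tff

# A federated computation written against the low-level Federated Core API:
# it takes one float per client and returns their mean, placed at the server.
@tff.federated_computation(tff.type_at_clients(tf.float32))
def federated_average(client_values):
    return tff.federated_mean(client_values)

# In simulation, the "clients" are simply the elements of a Python list.
print(federated_average([1.0, 2.0, 3.0]))  # -> 2.0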
Any device able to run the TensorFlow Runtime (TFRT) can execute TFF; in other words, to run TFF on a device, the device needs to support TFRT. TFRT is responsible for the efficient execution of kernels (low-level, device-specific primitives) on the target hardware. The figure below depicts the architecture of TFRT inside the TensorFlow environment.
Although TFF appears flexible, customizable, and functional, the current public version cannot be integrated into a real deployment, since it does not support multi-node execution. As of now (August 2022), we can only perform simulations on federated datasets.
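Concretely, a TFF experiment stays on a single machine and draws its "clients" from a pre-partitioned simulation dataset, e.g. federated EMNIST (a sketch based on the public TFF API):

```python
import tensorflow_federated as tff

# TFF ships datasets that are already partitioned per simulated client;
# federated EMNIST is the standard example.
emnist_train, emnist_test = tff.simulation.datasets.emnist.load_data()
print(f"simulated clients: {len(emnist_train.client_ids)}")

# Materialise one simulated client's local dataset for inspection.
client_ds = emnist_train.create_tf_dataset_for_client(emnist_train.client_ids[0])
for example in client_ds.take(1):
    print(example["pixels"].shape, example["label"].numpy())
```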
Flower
Flower is a novel end-to-end open-source FL framework, supported by a growing community of researchers, engineers, students, professionals, academics, and other enthusiasts. It enables a seamless transition from experimental simulation to systems research on real edge devices; devices such as the NVIDIA Jetson, in particular, are easy to set up. Flower offers a stable, language- and ML-framework-agnostic implementation of the core components of an FL system, and provides higher-level abstractions that enable researchers to experiment and implement new ideas on top of a reliable stack.
Moreover, Flower allows existing ML training pipelines to be rapidly transitioned into an FL setup in order to evaluate their convergence properties and training time in a federated setting. Most importantly, Flower provides support for extending FL implementations to mobile and wireless clients with heterogeneous compute, memory, and network resources.
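To illustrate how small that transition can be, here is a minimal client sketch using the Flower 1.x NumPyClient API; the toy weights and the fake training step are hypothetical placeholders for a real pipeline:

```python
import flwr as fl
import numpy as np

class ToyClient(fl.client.NumPyClient):
    """Minimal Flower client; a single NumPy array stands in for real
    model weights, and 'training' is faked with a constant update."""

    def __init__(self):
        self.weights = [np.zeros(10)]

    def get_parameters(self, config):
        return self.weights

    def fit(self, parameters, config):
        self.weights = parameters
        # Real local training on private data would happen here.
        self.weights = [w + 0.1 for w in self.weights]
        return self.weights, 100, {}  # updated weights, #examples, metrics

    def evaluate(self, parameters, config):
        loss = float(np.mean(np.square(parameters[0])))
        return loss, 50, {"loss": loss}

# Connect to a Flower server (address is illustrative).
fl.client.start_numpy_client(server_address="127.0.0.1:8080", client=ToyClient())
```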
Software plan
Main plan
We have researched many FL framework options and concluded that, at this time (August 2022), the most viable option is Flower, since:
- We can have a single-node simulation.
- We can have a multi-node execution.
- It has been proven to work in real-world experiments.
In addition, Flower makes it very easy to run Federated Learning workloads on edge devices. As shown in the figure below, embedded devices such as the NVIDIA Jetson Xavier NX and the Raspberry Pi can act as Flower clients. Flower also offers demos (e.g. "Federated Learning on Embedded Devices with Flower") that walk through the necessary configuration.
Flower Federation Loop Implementation
Before presenting the experiments, it is worth briefly explaining how the Flower framework architecture works. FL can be described as an interplay between global and local computations. Global computations are executed on the server side and are responsible for orchestrating the learning procedure over the connected clients. Local computations are executed on individual clients and have access to the training data. The Flower architecture reflects that perspective.
As in all FL setups, the client side and the server side must share the same model graph. However, the training stage is performed locally on each client with its local data; the server does not store any data and only aggregates the model updates received from the clients.
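The server side is correspondingly small; a minimal sketch, assuming the Flower 1.x API:

```python
import flwr as fl

# Minimal Flower server sketch (Flower 1.x API): the server holds no training
# data; it only orchestrates rounds and aggregates the clients' updates.
fl.server.start_server(
    server_address="0.0.0.0:8080",
    config=fl.server.ServerConfig(num_rounds=3),
    strategy=fl.server.strategy.FedAvg(),
)
```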
Global logic for client selection, configuration, parameter update aggregation, and federated or centralized model evaluation is represented through the Strategy abstraction. A Strategy instance implements a single FL algorithm and decides the behaviour of the server in Flower. The framework provides a variety of already implemented Strategies, which rely on popular FL aggregation algorithms such as Federated Averaging. Every built-in Strategy comes with its own set of predefined configuration options.
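For example, the built-in FedAvg Strategy exposes its configuration options as constructor arguments (parameter names follow the Flower 1.x API; values are purely illustrative):

```python
import flwr as fl

# Configuring the built-in FedAvg Strategy.
strategy = fl.server.strategy.FedAvg(
    fraction_fit=0.5,         # sample 50% of available clients for training
    fraction_evaluate=0.5,    # sample 50% of available clients for evaluation
    min_fit_clients=2,        # never train with fewer than 2 clients
    min_available_clients=2,  # wait until at least 2 clients are connected
)
```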
Overall architecture
As shown below, we have an overall view of the software structure and its related hardware components. Concretely, the entire architecture for the FL implementation and integration can be divided into three main components:
- MEC Server: It will run the Flower framework integrated with the VEDLIoT toolchain to obtain the FL server software. The developed software will be pushed through the VEDLIoT t.RECS platform, using its accelerators. For the software-defined mobile network we will use the Low Latency Multi-access Edge Computing platform (the LL-MEC software by MOSAIC5G).
- 5G Base Station & 5G Core: As mentioned, FLAIR's 5G network implementation will use open-source software for the nodes. Specifically, we will use OpenAirInterface for the 5G Base Station and the 5G Core Network. Moreover, the 5G Base Station will use an FPGA-based SDR (Software-Defined Radio), allowing the modulation/demodulation and the processing of radio signals to be done in software.
- 5G IoT Device(s): Several IoT devices running the Flower framework, also integrated with the VEDLIoT toolchain, to obtain the FL client software. The developed software will also be pushed through the VEDLIoT u.RECS platform, using its accelerators.
Evaluation of Scalable Federated Learning using Virtual Client Engine
For confidentiality reasons, the detailed research paper on our Federated Learning project cannot be published. However, we are excited to share a focused evaluation titled "Evaluation of Scalable Federated Learning Using Virtual Client Engine in Flower".
Download Article
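For context, simulations with Flower's Virtual Client Engine go through a single entry point that spawns virtual clients on demand; a sketch assuming the Flower 1.x API, reusing the hypothetical ToyClient from the earlier client sketch:

```python
import flwr as fl

def client_fn(cid: str):
    # The Virtual Client Engine instantiates clients on demand; ToyClient
    # refers to the hypothetical client sketched earlier.
    return ToyClient()

history = fl.simulation.start_simulation(
    client_fn=client_fn,
    num_clients=10,                    # virtual clients on a single machine
    client_resources={"num_cpus": 1},  # resources reserved per client
    config=fl.server.ServerConfig(num_rounds=3),
    strategy=fl.server.strategy.FedAvg(),
)
```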