This is my Final Year Project as an undergraduate Computer Science student at UCC. It is a study of federated machine learning, with Google's approach to federated learning implemented both in TensorFlow Federated and from scratch. Several extensions to federated learning were also implemented, including weighted averaging and selective inclusion (a model is included only if its evaluation is at most one standard deviation worse than the average of all evaluations). The extensions were also implemented in a peer-to-peer context, where every user makes the decision independently instead of a central agent performing the averaging for them.
Final grade achieved: 255/300 (85%)
Federated machine learning is the idea (from Google) of anonymised machine learning (or rather deep learning). It is a way to get a neural network trained on everyone's data without having direct access to everyone's data.
Traditionally, a neural network would require a lot of data from users to train a model that is fairly accurate. With the federated approach, the users don't have to share their data with anyone else to obtain a better overall model. Instead, they train a model locally on their own data, and then send the weights and biases of the model (the original user data cannot be recreated with these weights and biases) to a server, which averages them and sends the result back to all the users. Because the weights and biases are being sent instead of the users' actual data (like their images), privacy is maintained and, in effect, a model is trained using anonymised data from several users.
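The server-side averaging step described above can be sketched as follows. This is a minimal illustration and not the code from the notebooks: it assumes each user's model is represented as a list of NumPy arrays (one per layer, in the shape that Keras `model.get_weights()` returns), and that the server takes a plain unweighted mean. The function name `federated_average` and the two example clients are hypothetical.

```python
import numpy as np


def federated_average(client_weights):
    """Return the per-layer mean of the parameters sent by all clients.

    client_weights: list with one entry per client; each entry is a list
    of np.ndarray (weights and biases, layer by layer).
    """
    # zip(*...) groups the same layer from every client together,
    # so each group can be stacked and averaged element-wise.
    return [
        np.mean(np.stack(layer_group), axis=0)
        for layer_group in zip(*client_weights)
    ]


# Two hypothetical clients, each with one weight matrix and one bias vector.
client_a = [np.array([[1.0, 2.0], [3.0, 4.0]]), np.array([0.0, 1.0])]
client_b = [np.array([[3.0, 4.0], [5.0, 6.0]]), np.array([2.0, 3.0])]

# The server averages the parameters and would send these back to every user.
global_weights = federated_average([client_a, client_b])
```

With a Keras model, the averaged list could then be loaded on each user's side with `model.set_weights(global_weights)`; only the parameters travel over the network, never the training data.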
My project is based on implementing the way in which Google does this, then implementing several more strategies proposed by Derek and comparing their outcomes. At a high level, these strategies include discarding the weights of users under certain conditions, or using a weighted average of their weights and biases.
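The two strategies can be sketched roughly as below. This is an illustration under stated assumptions, not the notebooks' implementation: it assumes each user's model is evaluated by an accuracy score (higher is better), that "at most one standard deviation worse than the average" means `accuracy >= mean - std`, and that the weighted average uses coefficients proportional to accuracy. The function names and example numbers are hypothetical.

```python
import numpy as np


def select_clients(accuracies):
    """Indices of clients whose accuracy is at most one std below the mean."""
    acc = np.asarray(accuracies, dtype=float)
    threshold = acc.mean() - acc.std()
    return [i for i, a in enumerate(acc) if a >= threshold]


def weighted_average(client_weights, scores):
    """Per-layer average with coefficients proportional to `scores`."""
    coeffs = np.asarray(scores, dtype=float)
    coeffs = coeffs / coeffs.sum()  # normalise so the coefficients sum to 1
    return [
        sum(c * layer for c, layer in zip(coeffs, layer_group))
        for layer_group in zip(*client_weights)
    ]


# Four hypothetical clients; the last one evaluates much worse than the rest.
accuracies = [0.90, 0.88, 0.92, 0.40]
# Toy single-layer "models": client i holds a vector filled with the value i.
client_weights = [[np.full((2,), float(i))] for i in range(4)]

# Selective inclusion: drop clients that fall below the threshold.
kept = select_clients(accuracies)

# Weighted averaging over the clients that were kept.
merged = weighted_average(
    [client_weights[i] for i in kept],
    [accuracies[i] for i in kept],
)
```

In the peer-to-peer setting, each user would run the same two steps on the models it receives from its peers instead of relying on a central server to do so.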
This dataset was found on the UCI Machine Learning Repository and can be found here. The downloaded csv file needs to be placed in the /datasets directory for the gestures.ipynb notebook to work.
This dataset was found on Kaggle and can be accessed here. The natural_images folder needs to be placed in the /datasets directory for it to work with the code in the notebooks. Its full path is therefore /datasets/natural_images.
Set up a virtual environment using venv, virtualenv, or conda (recommended when not using tensorflow-federated) and then install the following packages:
tensorflow scipy numpy pandas scikit-learn matplotlib jupyterlab pillow
Note: The latest version of tensorflow includes tensorflow-gpu, but you may need to install it manually as well.
Optionally, add tensorflow_federated and nest_asyncio to the package list above so that the TensorFlow Federated notebooks (*tff*.ipynb) work.
More info found here
ssh -L <PORT1>:localhost:<PORT2> [email protected] -t ssh -L <PORT2>:localhost:<PORT3> [email protected]
Example:
ssh -L 6543:localhost:6542 [email protected] -t ssh -L 6542:localhost:6541 [email protected]
jupyter lab --no-browser --port <PORT3>
Example:
jupyter lab --no-browser --port 6541
localhost:<PORT1>
Example:
localhost:6543