Federated runtime with TLS: network is not getting updated properly with current FQDN #1265

Open
payalcha opened this issue Jan 13, 2025 · 3 comments · May be fixed by #1327

@payalcha
Contributor

payalcha commented Jan 13, 2025

Describe the bug
I tried to run the Federated Runtime with TLS enabled. During the federation run, the system picks a random port and the system hostname as the FQDN instead of the assigned port (50050) and FQDN (localhost).

To Reproduce
Steps to reproduce the behavior:

  1. Create certificates using the steps below:
     [image]
  2. Run the envoys and director using:
     [image]
  3. Run the notebook [MNIST_Watermarking.ipynb](https://github.com/securefederatedai/openfl/blob/develop/openfl-tutorials/experimental/workflow/301_MNIST_Watermarking.ipynb).

Expected behavior
The notebook should run to completion without any errors.

Screenshots
image

Desktop (please complete the following information):

  • OS: Ubuntu
  • Browser: Chrome
  • Version: 22
  • Python: 3.10

Additional context
After hardcoding the FQDN and port in the default network.yaml, it worked.
Our understanding is that these values should be updated at runtime rather than hardcoded.

@scngupta-dsp
Contributor

Hi @payalcha

We followed the process that was shared and enabled TLS in the FederatedRuntime using the steps below. As the director and envoy logs show, the run completes successfully.

Could you review the steps below and share the steps you followed, so that we can reproduce the issue?

Install the CA server: fx pki install -p </path/to/ca/dir> --ca-url <host:port>

(env_openfl_latest) scngupta@soc-9TTR9K3:~/openfl_latest/openfl/openfl-tutorials/experimental/workflow/FederatedRuntime/301_MNIST_Watermaking$ fx pki install -p server/ --ca-url localhost:50050
The password will encrypt some ca files
Enter the password:
Repeat for confirmation:
[16:03:03] INFO Creating CA ca.py:159
CA binaries will be downloaded now [Y/n]: y
[16:03:14] INFO Create CA Config ca.py:247

Generating root certificate...
all done!

Generating intermediate certificate...
all done!

✔ Root certificate: /home/scngupta/openfl_latest/openfl/openfl-tutorials/experimental/workflow/FederatedRuntime/301_MNIST_Watermaking/server/step_config/certs/root_ca.crt
✔ Root private key: /home/scngupta/openfl_latest/openfl/openfl-tutorials/experimental/workflow/FederatedRuntime/301_MNIST_Watermaking/server/step_config/secrets/root_ca_key
✔ Root fingerprint: e9575ce274f70944aae7bfb6725bc485f69caa42efabdb05da08d0736853224e
✔ Intermediate certificate: /home/scngupta/openfl_latest/openfl/openfl-tutorials/experimental/workflow/FederatedRuntime/301_MNIST_Watermaking/server/step_config/certs/intermediate_ca.crt
✔ Intermediate private key: /home/scngupta/openfl_latest/openfl/openfl-tutorials/experimental/workflow/FederatedRuntime/301_MNIST_Watermaking/server/step_config/secrets/intermediate_ca_key
✔ Database folder: /home/scngupta/openfl_latest/openfl/openfl-tutorials/experimental/workflow/FederatedRuntime/301_MNIST_Watermaking/server/step_config/db
✔ Default configuration: /home/scngupta/openfl_latest/openfl/openfl-tutorials/experimental/workflow/FederatedRuntime/301_MNIST_Watermaking/server/step_config/config/defaults.json
✔ Certificate Authority configuration: /home/scngupta/openfl_latest/openfl/openfl-tutorials/experimental/workflow/FederatedRuntime/301_MNIST_Watermaking/server/step_config/config/ca.json

Your PKI is ready to go. To generate certificates for individual services see 'step help ca'.

FEEDBACK 😍 🍻
The step utility is not instrumented for usage statistics. It does not
phone home. But your feedback is extremely valuable. Any information you
can provide regarding how you’re using step helps. Please send us a
sentence or two, good or bad: [email protected] or join
https://github.com/smallstep/certificates/discussions.
Success! Your step-ca config has been updated. To pick up the new configuration SIGHUP (kill -1 ) or restart the step-ca process.
Your public key has been saved in /home/scngupta/openfl_latest/openfl/openfl-tutorials/experimental/workflow/FederatedRuntime/301_MNIST_Watermaking/server/step_config/certs/pub.json.
Your private key has been saved in /home/scngupta/openfl_latest/openfl/openfl-tutorials/experimental/workflow/FederatedRuntime/301_MNIST_Watermaking/server/step_config/secrets/priv.json.
Success! Your step-ca config has been updated. To pick up the new configuration SIGHUP (kill -1 ) or restart the step-ca process.

Run the CA Server: fx pki run -p </path/to/ca/dir>

(env_openfl_latest) scngupta@soc-9TTR9K3:~/openfl_latest/openfl/openfl-tutorials/experimental/workflow/FederatedRuntime/301_MNIST_Watermaking$ fx pki run -p server/
[16:04:02] INFO Up CA server
ca.py:188
badger 2025/01/13 16:04:02 INFO: All 0 tables opened in 0s
2025/01/13 16:04:02 Serving HTTPS on localhost:50050 ...

Generate a token for the director: fx pki get-token -n <director_fqdn> --ca-path </path/to/ca/dir> --ca-url <host:port>

(env_openfl_latest) scngupta@soc-9TTR9K3:~/openfl_latest/openfl/openfl-tutorials/experimental/workflow/FederatedRuntime/301_MNIST_Watermaking$ fx pki get-token -n soc-9TTR9K3 --ca-path server --ca-url localhost:50050
✔ Provisioner: provisioner (JWK) [kid: IoTD0r5mzGdWnJRapNzVp0FyVNFKuf99Q3NKKKWSORA]
Token:
ZXlKaGJHY2lPaUpGVXpJMU5pSXNJbXRwWkNJNklrbHZWRVF3Y2pWdGVrZGtWMjVLVW1Gd1RucFdjREJHZVZaT1JrdDFaams1VVROT1MwdExWMU5QVWtFaUxDSjBlWEFpT2lKS1YxUWlmUS5leUpoZFdRaU9pSm9kSFJ3Y3pvdkwyeHZZMkZzYUc5emREbzFNREExTUM4eExqQXZjMmxuYmlJc0ltVjRjQ0k2TVRjek5qYzJORGcxT1N3aWFXRjBJam94TnpNMk56WTBOVFU1TENKcGMzTWlPaUp3Y205MmFYTnBiMjVsY2lJc0ltcDBhU0k2SWpsa05qbGxZekEwTW1SaE5Ua3daRGRrWW1WalpqaGpaVEpoTm1FMU1ERXpNV05sT1RBME56TmlNVFZqWkRnM1lURTROV00xWkdWa1pUZzNaamxoTlRnaUxDSnVZbVlpT2pFM016WTNOalExTlRrc0luTmhibk1pT2xzaWMyOWpMVGxVVkZJNVN6TWlYU3dpYzJoaElqb2laVGsxTnpWalpUSTNOR1kzTURrME5HRmhaVGRpWm1JMk56STFZbU0wT0RWbU5qbGpZV0UwTW1WbVlXSmtZakExWkdFd09HUXdOek0yT0RVek1qSTBaU0lzSW5OMVlpSTZJbk52WXkwNVZGUlNPVXN6SW4wLnhlbUdUdUl5bmxleW40LVVuX3hKcmNIZDFDbFNIMkQ3a0NYVEZkUi1PMFRzalc5cWo1S3FzaEQ3T1NoN0NaTjZzSk9vZ3FxVDd3cHIydjdYNzBjcld3.LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUJrVENDQVRhZ0F3SUJBZ0lSQVAvTHpJWE1WZmJXZ2o5OHErY2Y2Zjh3Q2dZSUtvWkl6ajBFQXdJd0pqRU4KTUFzR0ExVUVDaE1FYm1GdFpURVZNQk1HQTFVRUF4TU1ibUZ0WlNCU2IyOTBJRU5CTUI0WERUSTFNREV4TXpFdwpNek14TkZvWERUTTFNREV4TVRFd016TXhORm93SmpFTk1Bc0dBMVVFQ2hNRWJtRnRaVEVWTUJNR0ExVUVBeE1NCmJtRnRaU0JTYjI5MElFTkJNRmt3RXdZSEtvWkl6ajBDQVFZSUtvWkl6ajBEQVFjRFFnQUVLY1c3Z213aWY5UDQKaVRpaWNUeENETEgxaG5OSWJNNXZETEpBOWFYb3EzVHRtT3kzVHdIOGZUNnFTMFZWRWhGMjJQem1jYURYVS9kTQpObEVpS2NLQVNhTkZNRU13RGdZRFZSMFBBUUgvQkFRREFnRUdNQklHQTFVZEV3RUIvd1FJTUFZQkFmOENBUUV3CkhRWURWUjBPQkJZRUZJMWdDaEVBWlV0WWJ0cmx2YVliVFE2eWhJMlVNQW9HQ0NxR1NNNDlCQU1DQTBrQU1FWUMKSVFDT1VmWTU4M3ZmMjVlVmRTUmhlTnByRm43MlQvNVdjdHBnbjNXVFJHZ0Q3d0loQUtmeE9QMDRNWDNvZTBvbApUTmRBa0t4Z3pkaVUzMTQrMXVzVGx0MlorODgvCi0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K

✔️ OK
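
To check which name a token was issued for before certifying it, the token can be decoded locally. This is a minimal sketch, assuming the layout that the output above suggests: two base64 segments joined by a dot, the first wrapping a JWT and the second wrapping the root CA certificate in PEM form. The function names are hypothetical and this layout is inferred from the token text, not taken from OpenFL documentation.

```python
import base64
import json


def _b64decode_padded(segment: str, urlsafe: bool = False) -> bytes:
    """Decode base64 text, restoring any '=' padding that was stripped."""
    padded = segment + "=" * (-len(segment) % 4)
    decoder = base64.urlsafe_b64decode if urlsafe else base64.b64decode
    return decoder(padded)


def inspect_token(token: str) -> dict:
    """Return the JWT claims and root-certificate PEM carried by a token.

    Assumed layout (inferred, not documented):
    base64(JWT) + "." + base64(root CA certificate as PEM).
    """
    jwt_b64, cert_b64 = token.strip().split(".", 1)
    jwt = _b64decode_padded(jwt_b64).decode("ascii")
    # A JWT is header.payload.signature, each part base64url-encoded.
    _header, payload, _signature = jwt.split(".")
    claims = json.loads(_b64decode_padded(payload, urlsafe=True))
    root_pem = _b64decode_padded(cert_b64).decode("ascii")
    return {"claims": claims, "root_pem": root_pem}
```

If the assumption holds, `inspect_token(token)["claims"]` should contain `sans` and `sub` entries carrying the name passed to `-n`, which is the name the resulting certificate will be valid for.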

Certify the token for the director: fx pki certify -n <director_fqdn> -t <director_token>

(env_openfl_latest) scngupta@soc-9TTR9K3:~/openfl_latest/openfl/openfl-tutorials/experimental/workflow/FederatedRuntime/301_MNIST_Watermaking/director$ fx pki certify -n soc-9TTR9K3 -t ZXlKaGJHY2lPaUpGVXpJMU5pSXNJbXRwWkNJNklrbHZWRVF3Y2pWdGVrZGtWMjVLVW1Gd1RucFdjREJHZVZaT1JrdDFaams1VVROT1MwdExWMU5QVWtFaUxDSjBlWEFpT2lKS1YxUWlmUS5leUpoZFdRaU9pSm9kSFJ3Y3pvdkwyeHZZMkZzYUc5emREbzFNREExTUM4eExqQXZjMmxuYmlJc0ltVjRjQ0k2TVRjek5qYzJORGcxT1N3aWFXRjBJam94TnpNMk56WTBOVFU1TENKcGMzTWlPaUp3Y205MmFYTnBiMjVsY2lJc0ltcDBhU0k2SWpsa05qbGxZekEwTW1SaE5Ua3daRGRrWW1WalpqaGpaVEpoTm1FMU1ERXpNV05sT1RBME56TmlNVFZqWkRnM1lURTROV00xWkdWa1pUZzNaamxoTlRnaUxDSnVZbVlpT2pFM016WTNOalExTlRrc0luTmhibk1pT2xzaWMyOWpMVGxVVkZJNVN6TWlYU3dpYzJoaElqb2laVGsxTnpWalpUSTNOR1kzTURrME5HRmhaVGRpWm1JMk56STFZbU0wT0RWbU5qbGpZV0UwTW1WbVlXSmtZakExWkdFd09HUXdOek0yT0RVek1qSTBaU0lzSW5OMVlpSTZJbk52WXkwNVZGUlNPVXN6SW4wLnhlbUdUdUl5bmxleW40LVVuX3hKcmNIZDFDbFNIMkQ3a0NYVEZkUi1PMFRzalc5cWo1S3FzaEQ3T1NoN0NaTjZzSk9vZ3FxVDd3cHIydjdYNzBjcld3.LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUJrVENDQVRhZ0F3SUJBZ0lSQVAvTHpJWE1WZmJXZ2o5OHErY2Y2Zjh3Q2dZSUtvWkl6ajBFQXdJd0pqRU4KTUFzR0ExVUVDaE1FYm1GdFpURVZNQk1HQTFVRUF4TU1ibUZ0WlNCU2IyOTBJRU5CTUI0WERUSTFNREV4TXpFdwpNek14TkZvWERUTTFNREV4TVRFd016TXhORm93SmpFTk1Bc0dBMVVFQ2hNRWJtRnRaVEVWTUJNR0ExVUVBeE1NCmJtRnRaU0JTYjI5MElFTkJNRmt3RXdZSEtvWkl6ajBDQVFZSUtvWkl6ajBEQVFjRFFnQUVLY1c3Z213aWY5UDQKaVRpaWNUeENETEgxaG5OSWJNNXZETEpBOWFYb3EzVHRtT3kzVHdIOGZUNnFTMFZWRWhGMjJQem1jYURYVS9kTQpObEVpS2NLQVNhTkZNRU13RGdZRFZSMFBBUUgvQkFRREFnRUdNQklHQTFVZEV3RUIvd1FJTUFZQkFmOENBUUV3CkhRWURWUjBPQkJZRUZJMWdDaEVBWlV0WWJ0cmx2YVliVFE2eWhJMlVNQW9HQ0NxR1NNNDlCQU1DQTBrQU1FWUMKSVFDT1VmWTU4M3ZmMjVlVmRTUmhlTnByRm43MlQvNVdjdHBnbjNXVFJHZ0Q3d0loQUtmeE9QMDRNWDNvZTBvbApUTmRBa0t4Z3pkaVUzMTQrMXVzVGx0MlorODgvCi0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K
CA binaries will be downloaded now [Y/n]: y
✔ CA: https://localhost:50050/1.0/sign
✔ Certificate: /home/scngupta/openfl_latest/openfl/openfl-tutorials/experimental/workflow/FederatedRuntime/301_MNIST_Watermaking/director/cert/soc-9TTR9K3.crt
✔ Private Key: /home/scngupta/openfl_latest/openfl/openfl-tutorials/experimental/workflow/FederatedRuntime/301_MNIST_Watermaking/director/cert/soc-9TTR9K3.key

✔️ OK

Generate a token for the envoy: fx pki get-token -n <envoy_name> --ca-path </path/to/ca/dir> --ca-url <host:port>

(env_openfl_latest) scngupta@soc-9TTR9K3:~/openfl_latest/openfl/openfl-tutorials/experimental/workflow/FederatedRuntime/301_MNIST_Watermaking/Bangalore$ fx pki get-token -n Bangalore --ca-path ../server/ --ca-url localhost:50050
✔ Provisioner: provisioner (JWK) [kid: IoTD0r5mzGdWnJRapNzVp0FyVNFKuf99Q3NKKKWSORA]
Token:
ZXlKaGJHY2lPaUpGVXpJMU5pSXNJbXRwWkNJNklrbHZWRVF3Y2pWdGVrZGtWMjVLVW1Gd1RucFdjREJHZVZaT1JrdDFaams1VVROT1MwdExWMU5QVWtFaUxDSjBlWEFpT2lKS1YxUWlmUS5leUpoZFdRaU9pSm9kSFJ3Y3pvdkwyeHZZMkZzYUc5emREbzFNREExTUM4eExqQXZjMmxuYmlJc0ltVjRjQ0k2TVRjek5qYzJOVEExTkN3aWFXRjBJam94TnpNMk56WTBOelUwTENKcGMzTWlPaUp3Y205MmFYTnBiMjVsY2lJc0ltcDBhU0k2SWpGbVpqUXlNamMyTW1VellUbGlOMk0yT1dJNVpUZGhZemRpWWpabE9XRTVNemczWWprME1tWTVOR1JtTWpGak0yRm1ObUU1T0dJNFlqazNORGd6TWpNaUxDSnVZbVlpT2pFM016WTNOalEzTlRRc0luTmhibk1pT2xzaVFtRnVaMkZzYjNKbElsMHNJbk5vWVNJNkltVTVOVGMxWTJVeU56Um1OekE1TkRSaFlXVTNZbVppTmpjeU5XSmpORGcxWmpZNVkyRmhOREpsWm1GaVpHSXdOV1JoTURoa01EY3pOamcxTXpJeU5HVWlMQ0p6ZFdJaU9pSkNZVzVuWVd4dmNtVWlmUS56YnFOVWV2SXA4LXA3RkJuQ3lwZTBkOWthV3dhUGxxSWVwSjBvQzh3RExYUGZnZEU5NlVvNHV1cDNHLWtpQzBTT0lILWVVMmF6dkR2V0xoel85dEhEdw==.LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUJrVENDQVRhZ0F3SUJBZ0lSQVAvTHpJWE1WZmJXZ2o5OHErY2Y2Zjh3Q2dZSUtvWkl6ajBFQXdJd0pqRU4KTUFzR0ExVUVDaE1FYm1GdFpURVZNQk1HQTFVRUF4TU1ibUZ0WlNCU2IyOTBJRU5CTUI0WERUSTFNREV4TXpFdwpNek14TkZvWERUTTFNREV4TVRFd016TXhORm93SmpFTk1Bc0dBMVVFQ2hNRWJtRnRaVEVWTUJNR0ExVUVBeE1NCmJtRnRaU0JTYjI5MElFTkJNRmt3RXdZSEtvWkl6ajBDQVFZSUtvWkl6ajBEQVFjRFFnQUVLY1c3Z213aWY5UDQKaVRpaWNUeENETEgxaG5OSWJNNXZETEpBOWFYb3EzVHRtT3kzVHdIOGZUNnFTMFZWRWhGMjJQem1jYURYVS9kTQpObEVpS2NLQVNhTkZNRU13RGdZRFZSMFBBUUgvQkFRREFnRUdNQklHQTFVZEV3RUIvd1FJTUFZQkFmOENBUUV3CkhRWURWUjBPQkJZRUZJMWdDaEVBWlV0WWJ0cmx2YVliVFE2eWhJMlVNQW9HQ0NxR1NNNDlCQU1DQTBrQU1FWUMKSVFDT1VmWTU4M3ZmMjVlVmRTUmhlTnByRm43MlQvNVdjdHBnbjNXVFJHZ0Q3d0loQUtmeE9QMDRNWDNvZTBvbApUTmRBa0t4Z3pkaVUzMTQrMXVzVGx0MlorODgvCi0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K

✔️ OK

Certify the token for the envoy: fx pki certify -n <envoy_name> -t <envoy_token>

(env_openfl_latest) scngupta@soc-9TTR9K3:~/openfl_latest/openfl/openfl-tutorials/experimental/workflow/FederatedRuntime/301_MNIST_Watermaking/Bangalore$ fx pki certify -n Bangalore -t ZXlKaGJHY2lPaUpGVXpJMU5pSXNJbXRwWkNJNklrbHZWRVF3Y2pWdGVrZGtWMjVLVW1Gd1RucFdjREJHZVZaT1JrdDFaams1VVROT1MwdExWMU5QVWtFaUxDSjBlWEFpT2lKS1YxUWlmUS5leUpoZFdRaU9pSm9kSFJ3Y3pvdkwyeHZZMkZzYUc5emREbzFNREExTUM4eExqQXZjMmxuYmlJc0ltVjRjQ0k2TVRjek5qYzJOVEExTkN3aWFXRjBJam94TnpNMk56WTBOelUwTENKcGMzTWlPaUp3Y205MmFYTnBiMjVsY2lJc0ltcDBhU0k2SWpGbVpqUXlNamMyTW1VellUbGlOMk0yT1dJNVpUZGhZemRpWWpabE9XRTVNemczWWprME1tWTVOR1JtTWpGak0yRm1ObUU1T0dJNFlqazNORGd6TWpNaUxDSnVZbVlpT2pFM016WTNOalEzTlRRc0luTmhibk1pT2xzaVFtRnVaMkZzYjNKbElsMHNJbk5vWVNJNkltVTVOVGMxWTJVeU56Um1OekE1TkRSaFlXVTNZbVppTmpjeU5XSmpORGcxWmpZNVkyRmhOREpsWm1GaVpHSXdOV1JoTURoa01EY3pOamcxTXpJeU5HVWlMQ0p6ZFdJaU9pSkNZVzVuWVd4dmNtVWlmUS56YnFOVWV2SXA4LXA3RkJuQ3lwZTBkOWthV3dhUGxxSWVwSjBvQzh3RExYUGZnZEU5NlVvNHV1cDNHLWtpQzBTT0lILWVVMmF6dkR2V0xoel85dEhEdw==.LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUJrVENDQVRhZ0F3SUJBZ0lSQVAvTHpJWE1WZmJXZ2o5OHErY2Y2Zjh3Q2dZSUtvWkl6ajBFQXdJd0pqRU4KTUFzR0ExVUVDaE1FYm1GdFpURVZNQk1HQTFVRUF4TU1ibUZ0WlNCU2IyOTBJRU5CTUI0WERUSTFNREV4TXpFdwpNek14TkZvWERUTTFNREV4TVRFd016TXhORm93SmpFTk1Bc0dBMVVFQ2hNRWJtRnRaVEVWTUJNR0ExVUVBeE1NCmJtRnRaU0JTYjI5MElFTkJNRmt3RXdZSEtvWkl6ajBDQVFZSUtvWkl6ajBEQVFjRFFnQUVLY1c3Z213aWY5UDQKaVRpaWNUeENETEgxaG5OSWJNNXZETEpBOWFYb3EzVHRtT3kzVHdIOGZUNnFTMFZWRWhGMjJQem1jYURYVS9kTQpObEVpS2NLQVNhTkZNRU13RGdZRFZSMFBBUUgvQkFRREFnRUdNQklHQTFVZEV3RUIvd1FJTUFZQkFmOENBUUV3CkhRWURWUjBPQkJZRUZJMWdDaEVBWlV0WWJ0cmx2YVliVFE2eWhJMlVNQW9HQ0NxR1NNNDlCQU1DQTBrQU1FWUMKSVFDT1VmWTU4M3ZmMjVlVmRTUmhlTnByRm43MlQvNVdjdHBnbjNXVFJHZ0Q3d0loQUtmeE9QMDRNWDNvZTBvbApUTmRBa0t4Z3pkaVUzMTQrMXVzVGx0MlorODgvCi0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K
CA binaries will be downloaded now [Y/n]: y
✔ CA: https://localhost:50050/1.0/sign
✔ Certificate: /home/scngupta/openfl_latest/openfl/openfl-tutorials/experimental/workflow/FederatedRuntime/301_MNIST_Watermaking/Bangalore/cert/Bangalore.crt
✔ Private Key: /home/scngupta/openfl_latest/openfl/openfl-tutorials/experimental/workflow/FederatedRuntime/301_MNIST_Watermaking/Bangalore/cert/Bangalore.key

✔️ OK

Same steps repeated for Chandler

Generate token for experiment manager: fx pki get-token -n <manager> --ca-path </path/to/ca/dir> --ca-url <host:port>

✔ Provisioner: provisioner (JWK) [kid: IoTD0r5mzGdWnJRapNzVp0FyVNFKuf99Q3NKKKWSORA]
Token:
ZXlKaGJHY2lPaUpGVXpJMU5pSXNJbXRwWkNJNklrbHZWRVF3Y2pWdGVrZGtWMjVLVW1Gd1RucFdjREJHZVZaT1JrdDFaams1VVROT1MwdExWMU5QVWtFaUxDSjBlWEFpT2lKS1YxUWlmUS5leUpoZFdRaU9pSm9kSFJ3Y3pvdkwyeHZZMkZzYUc5emREbzFNREExTUM4eExqQXZjMmxuYmlJc0ltVjRjQ0k2TVRjek5qYzJOVE13T0N3aWFXRjBJam94TnpNMk56WTFNREE0TENKcGMzTWlPaUp3Y205MmFYTnBiMjVsY2lJc0ltcDBhU0k2SW1WbVpHTmlaV1JtWTJOa01UWmhPV0kzTUdRMVlUbGlOV0V6T1dJM05UbGhNMkprWkRkbVpHUTFNelF3WVdGaE1EUTJOall3WkRCa1pUWXhObVZsTWpraUxDSnVZbVlpT2pFM016WTNOalV3TURnc0luTmhibk1pT2xzaVRXRnVZV2RsY2lKZExDSnphR0VpT2lKbE9UVTNOV05sTWpjMFpqY3dPVFEwWVdGbE4ySm1ZalkzTWpWaVl6UTROV1kyT1dOaFlUUXlaV1poWW1SaU1EVmtZVEE0WkRBM016WTROVE15TWpSbElpd2ljM1ZpSWpvaVRXRnVZV2RsY2lKOS5MaldiVDYxWmVmWVlsWGdmVF82OTRKWHdFR3hzRU1PT2NmZFIxOW1hYlBLSTRwMUlxZHJ3X3FIeWlIR1I3cUhLbk9iZTJzd2FNMVNnWkgyMVhTT3l5dw==.LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUJrVENDQVRhZ0F3SUJBZ0lSQVAvTHpJWE1WZmJXZ2o5OHErY2Y2Zjh3Q2dZSUtvWkl6ajBFQXdJd0pqRU4KTUFzR0ExVUVDaE1FYm1GdFpURVZNQk1HQTFVRUF4TU1ibUZ0WlNCU2IyOTBJRU5CTUI0WERUSTFNREV4TXpFdwpNek14TkZvWERUTTFNREV4TVRFd016TXhORm93SmpFTk1Bc0dBMVVFQ2hNRWJtRnRaVEVWTUJNR0ExVUVBeE1NCmJtRnRaU0JTYjI5MElFTkJNRmt3RXdZSEtvWkl6ajBDQVFZSUtvWkl6ajBEQVFjRFFnQUVLY1c3Z213aWY5UDQKaVRpaWNUeENETEgxaG5OSWJNNXZETEpBOWFYb3EzVHRtT3kzVHdIOGZUNnFTMFZWRWhGMjJQem1jYURYVS9kTQpObEVpS2NLQVNhTkZNRU13RGdZRFZSMFBBUUgvQkFRREFnRUdNQklHQTFVZEV3RUIvd1FJTUFZQkFmOENBUUV3CkhRWURWUjBPQkJZRUZJMWdDaEVBWlV0WWJ0cmx2YVliVFE2eWhJMlVNQW9HQ0NxR1NNNDlCQU1DQTBrQU1FWUMKSVFDT1VmWTU4M3ZmMjVlVmRTUmhlTnByRm43MlQvNVdjdHBnbjNXVFJHZ0Q3d0loQUtmeE9QMDRNWDNvZTBvbApUTmRBa0t4Z3pkaVUzMTQrMXVzVGx0MlorODgvCi0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K

Certify the token for the experiment manager: fx pki certify -n <manager> -t <manager_token>

(env_openfl_latest) scngupta@soc-9TTR9K3:~/openfl_latest/openfl/openfl-tutorials/experimental/workflow/FederatedRuntime/301_MNIST_Watermaking/workspace$ fx pki certify -n Manager -t ZXlKaGJHY2lPaUpGVXpJMU5pSXNJbXRwWkNJNklrbHZWRVF3Y2pWdGVrZGtWMjVLVW1Gd1RucFdjREJHZVZaT1JrdDFaams1VVROT1MwdExWMU5QVWtFaUxDSjBlWEFpT2lKS1YxUWlmUS5leUpoZFdRaU9pSm9kSFJ3Y3pvdkwyeHZZMkZzYUc5emREbzFNREExTUM4eExqQXZjMmxuYmlJc0ltVjRjQ0k2TVRjek5qYzJOVE13T0N3aWFXRjBJam94TnpNMk56WTFNREE0TENKcGMzTWlPaUp3Y205MmFYTnBiMjVsY2lJc0ltcDBhU0k2SW1WbVpHTmlaV1JtWTJOa01UWmhPV0kzTUdRMVlUbGlOV0V6T1dJM05UbGhNMkprWkRkbVpHUTFNelF3WVdGaE1EUTJOall3WkRCa1pUWXhObVZsTWpraUxDSnVZbVlpT2pFM016WTNOalV3TURnc0luTmhibk1pT2xzaVRXRnVZV2RsY2lKZExDSnphR0VpT2lKbE9UVTNOV05sTWpjMFpqY3dPVFEwWVdGbE4ySm1ZalkzTWpWaVl6UTROV1kyT1dOaFlUUXlaV1poWW1SaU1EVmtZVEE0WkRBM016WTROVE15TWpSbElpd2ljM1ZpSWpvaVRXRnVZV2RsY2lKOS5MaldiVDYxWmVmWVlsWGdmVF82OTRKWHdFR3hzRU1PT2NmZFIxOW1hYlBLSTRwMUlxZHJ3X3FIeWlIR1I3cUhLbk9iZTJzd2FNMVNnWkgyMVhTT3l5dw==.LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUJrVENDQVRhZ0F3SUJBZ0lSQVAvTHpJWE1WZmJXZ2o5OHErY2Y2Zjh3Q2dZSUtvWkl6ajBFQXdJd0pqRU4KTUFzR0ExVUVDaE1FYm1GdFpURVZNQk1HQTFVRUF4TU1ibUZ0WlNCU2IyOTBJRU5CTUI0WERUSTFNREV4TXpFdwpNek14TkZvWERUTTFNREV4TVRFd016TXhORm93SmpFTk1Bc0dBMVVFQ2hNRWJtRnRaVEVWTUJNR0ExVUVBeE1NCmJtRnRaU0JTYjI5MElFTkJNRmt3RXdZSEtvWkl6ajBDQVFZSUtvWkl6ajBEQVFjRFFnQUVLY1c3Z213aWY5UDQKaVRpaWNUeENETEgxaG5OSWJNNXZETEpBOWFYb3EzVHRtT3kzVHdIOGZUNnFTMFZWRWhGMjJQem1jYURYVS9kTQpObEVpS2NLQVNhTkZNRU13RGdZRFZSMFBBUUgvQkFRREFnRUdNQklHQTFVZEV3RUIvd1FJTUFZQkFmOENBUUV3CkhRWURWUjBPQkJZRUZJMWdDaEVBWlV0WWJ0cmx2YVliVFE2eWhJMlVNQW9HQ0NxR1NNNDlCQU1DQTBrQU1FWUMKSVFDT1VmWTU4M3ZmMjVlVmRTUmhlTnByRm43MlQvNVdjdHBnbjNXVFJHZ0Q3d0loQUtmeE9QMDRNWDNvZTBvbApUTmRBa0t4Z3pkaVUzMTQrMXVzVGx0MlorODgvCi0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K
CA binaries will be downloaded now [Y/n]: y
✔ CA: https://localhost:50050/1.0/sign
✔ Certificate: /home/scngupta/openfl_latest/openfl/openfl-tutorials/experimental/workflow/FederatedRuntime/301_MNIST_Watermaking/workspace/cert/Manager.crt
✔ Private Key: /home/scngupta/openfl_latest/openfl/openfl-tutorials/experimental/workflow/FederatedRuntime/301_MNIST_Watermaking/workspace/cert/Manager.key

✔️ OK

Start Director: fx director start -c director_config.yaml -rc cert/root_ca.crt -pk cert/soc-9TTR9K3.key -oc cert/soc-9TTR9K3.crt
The settings were modified as below to avoid connection failures on the envoys:

settings:
  listen_host: 0.0.0.0
  listen_port: 50050
  envoy_health_check_period: 5 # in seconds

Start Bangalore: fx envoy start -n Bangalore --envoy-config-path Bangalore_config.yaml -dh soc-9TTR9K3 -dp 50050 -rc cert/root_ca.crt -pk cert/Bangalore.key -oc cert/Bangalore.crt

Start Chandler: fx envoy start -n Chandler --envoy-config-path Chandler_config.yaml -dh soc-9TTR9K3 -dp 50050 -rc cert/root_ca.crt -pk cert/Chandler.key -oc cert/Chandler.crt

Jupyter Notebook: edited code as follows

director_info = {
    'director_node_fqdn':'soc-9TTR9K3',
    'director_port':50050,
    'cert_chain': '/home/scngupta/openfl_latest/openfl/openfl-tutorials/experimental/workflow/FederatedRuntime/301_MNIST_Watermaking/workspace/cert/root_ca.crt',
    'api_cert': '/home/scngupta/openfl_latest/openfl/openfl-tutorials/experimental/workflow/FederatedRuntime/301_MNIST_Watermaking/workspace/cert/Manager.crt',
    'api_private_key': '/home/scngupta/openfl_latest/openfl/openfl-tutorials/experimental/workflow/FederatedRuntime/301_MNIST_Watermaking/workspace/cert/Manager.key',    
}

authorized_collaborators = ['Bangalore', 'Chandler']

federated_runtime = FederatedRuntime(
    collaborators=authorized_collaborators,
    director=director_info, 
    notebook_path='./MNIST_Watermarking.ipynb',
    tls=True
)
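
Since a wrong certificate path only surfaces later as a connection failure, it may help to check the paths before constructing the runtime. This is a small sketch; `missing_cert_files` is a hypothetical helper (not part of OpenFL) that only looks at the three certificate keys used in `director_info` above.

```python
import os


def missing_cert_files(director_info: dict) -> list:
    """Return the certificate keys whose paths are unset or not files."""
    cert_keys = ("cert_chain", "api_cert", "api_private_key")
    missing = []
    for key in cert_keys:
        path = director_info.get(key)
        # Treat an absent key, an empty value, or a non-file path as missing.
        if not path or not os.path.isfile(path):
            missing.append(key)
    return missing
```

Calling `missing_cert_files(director_info)` before `FederatedRuntime(...)` and stopping if the list is non-empty separates path mistakes from genuine TLS problems.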

LOGS
director_with_tls.log
Bangalore_with_tls.log
Chandler_with_tls.log

@ishant162 ishant162 self-assigned this Jan 16, 2025
@ishant162
Collaborator

Initial analysis of the logs shared by @payalcha and @noopurintel indicates a hostname mismatch between the Aggregator and the Director.

Here's a detailed breakdown:

  1. Director's Hostname:
  • The Director is configured to use localhost as its hostname.
(tls_fix) ishant@soc-PF2F2E02:~/tls_issue/openfl/openfl-tutorials/experimental/workflow/FederatedRuntime/301_MNIST_Watermaking/director$ fx director start -c director_config.yaml -rc cert/root_ca.crt -pk cert/localhost.key -oc cert/localhost.crt
/home/ishant/miniforge3/envs/tls_fix/lib/python3.10/site-packages/_distutils_hack/__init__.py:30: UserWarning: Setuptools is replacing distutils. Support for replacing an already imported distutils is deprecated. In the future, this condition will fail. Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml
  warnings.warn(
[12:15:01] INFO     🧿 Starting the Director Service.                                                                                             director.py:86
           INFO     Starting director server on localhost:50050                                                                           director_server.py:146
[12:15:04] INFO     Envoy Bangalore is attempting to connect                                                                              director_server.py:179
           INFO     Envoy Bangalore is connected                                                                                          director_server.py:182
[12:15:33] INFO     Experiment FederatedFlow_MNIST_Watermarking registered                                                                director_server.py:314
  2. Aggregator's Hostname:
  • During plan parsing the Aggregator's hostname resolves to the machine's FQDN.
INFO     FL-Plan hash is 4794523320c407e0031c301f6b13c63c5d5775c829163efeb254226fc9104fd31c1c82cafea463f5e0a77d16aa9dca1b                 plan.py:267
           INFO     FL-Plan hash is 9ef6330c565bbde25bfddf17852707863d0bd2ecee6a001082cc4ca1dc6af00be38e8db81cd01b8568156dfb1a9e0321                 plan.py:267
           INFO     Parsing Federated Learning Plan : SUCCESS : plan/plan.yaml.                                                                      plan.py:161
           INFO     aggregator:                                                                                                                      plan.py:166
                      settings:
                        rounds_to_train: 1
                      template: openfl.experimental.workflow.component.Aggregator
                    collaborator:
                      settings: {}
                      template: openfl.experimental.workflow.component.Collaborator
                    federated_flow:
                      settings:
                        checkpoint: true
                        model: src.experiment.model
                        optimizer: src.experiment.optimizer
                        watermark_pretrain_optimizer: src.experiment.watermark_pretrain_optimizer
                        watermark_retrain_optimizer: src.experiment.watermark_retrain_optimizer
                      template: src.experiment.FederatedFlow_MNIST_Watermarking
                    network:
                      settings:
                        agg_addr: soc-PF2F2E02.clients.intel.com  // Machine's FQDN
                        agg_port: 58646
                        cert_folder: cert
                        client_reconnect_interval: 5
                        disable_client_auth: false
                        hash_salt: auto
                        tls: true
                      template: openfl.federation.Network
  3. Subsequently, when the collaborator tries to connect to the aggregator, the connection fails, probably due to the hostname discrepancy.

Next Steps: Need to analyze the hostnames used in aggregator server and collaborator clients.

@ishant162
Collaborator

ishant162 commented Jan 22, 2025

Detailed Analysis:

  1. Director:

The Director is started with the hostname set to localhost, and the mTLS certificates were generated with localhost as the Common Name (CN).

(tls_fix) ishant@soc-PF2F2E02:~/tls_issue/openfl/openfl-tutorials/experimental/workflow/FederatedRuntime/301_MNIST_Watermaking/director$ fx director start -c director_config.yaml -rc cert_one/root_ca.crt -pk cert_one/localhost.key -oc cert_one/localhost.crt
[08:25:33] INFO     🧿 Starting the Director Service.                                                                                             director.py:86
           INFO     Starting director server on localhost:50050                                                                           director_server.py:146
[08:25:42] INFO     Envoy Bangalore is attempting to connect                                                                              director_server.py:179
           INFO     Envoy Bangalore is connected                                                                                          director_server.py:182
  2. Aggregator URI Configuration in network.yaml:
  • When the Director spawns the Aggregator, it uses the network section from plan.yaml to configure the Aggregator's URI.
  • In this configuration: agg_addr is dynamically populated with the machine's Fully Qualified Domain Name (FQDN) and agg_port is assigned a random port during the Plan parsing process.

For Example:

           INFO     Parsing Federated Learning Plan : SUCCESS : plan/plan.yaml.                                                                      plan.py:161
           INFO     aggregator:                                                                                                                      plan.py:166
                      settings:
                        rounds_to_train: 1
                      template: openfl.experimental.workflow.component.Aggregator
                    collaborator:
                      settings: {}
                      template: openfl.experimental.workflow.component.Collaborator
                    federated_flow:
                      settings:
                        checkpoint: true
                        model: src.experiment.model
                        optimizer: src.experiment.optimizer
                        watermark_pretrain_optimizer: src.experiment.watermark_pretrain_optimizer
                        watermark_retrain_optimizer: src.experiment.watermark_retrain_optimizer
                      template: src.experiment.FederatedFlow_MNIST_Watermarking
                    network:
                      settings:
                        agg_addr: soc-PF2F2E02.clients.intel.com   // Machine's FQDN
                        agg_port: 53584
                        cert_folder: cert
                        client_reconnect_interval: 5
                        disable_client_auth: false
                        hash_salt: auto
                        tls: true
                      template: openfl.federation.Network
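
The dynamic population of agg_addr and agg_port described above can be sketched as follows. This is an illustration under stated assumptions, not OpenFL's code: the `auto` placeholder value, the function name `resolve_network_settings`, and the port formula are all assumptions made for this example.

```python
import socket

AUTO = "auto"  # assumed placeholder meaning "fill in at plan-parsing time"


def resolve_network_settings(settings: dict, plan_hash: str) -> dict:
    """Return a copy of the settings with placeholder address/port filled in.

    Explicit values are kept as-is; only placeholders are replaced.
    """
    resolved = dict(settings)
    if resolved.get("agg_addr", AUTO) == AUTO:
        # Use whatever fully qualified name the OS reports for this machine.
        resolved["agg_addr"] = socket.getfqdn()
    if resolved.get("agg_port", AUTO) == AUTO:
        # Map the leading hash digits into the port range 49152-60998.
        resolved["agg_port"] = int(plan_hash[:8], 16) % (60999 - 49152) + 49152
    return resolved
```

With the plan hash beginning `9ef6330c` from the earlier log, this formula yields port 58646, the agg_port shown in that log, which suggests the port is derived from the plan hash rather than chosen at random. Under this scheme, explicit values such as `agg_addr: localhost` and `agg_port: 50050` pass through unchanged, which would be consistent with the hardcoding workaround reported in the issue.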

However, in openfl/experimental/workflow/transport/grpc/aggregator_server.py, the Aggregator's listen address is hardcoded to [::], which binds the server to all available network interfaces (IPv6 and IPv4). As a result, the agg_addr in the network configuration is effectively ignored on the Aggregator side.

For example:

           INFO     Building `openfl.experimental.workflow.component.Aggregator` Module.                                                             plan.py:207
           INFO     MetaflowInterface creation.                                                                                                aggregator.py:129
           INFO     🧿 Starting the Aggregator Service.                                                                                        experiment.py:193
           INFO     Starting Aggregator gRPC Server on [::]:53584                                                                       aggregator_server.py:221
           INFO     Starting Aggregator gRPC Server                                                                                            experiment.py:196
           INFO     Starting round 1...                                                                                                        aggregator.py:207
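
The difference between the address a server binds to and the name a client dials can be seen with plain sockets. This is a minimal sketch using only the standard library; it uses the IPv4 wildcard 0.0.0.0 instead of [::] so that it does not depend on IPv6 being available, and `wildcard_bind_demo` is a hypothetical name.

```python
import socket


def wildcard_bind_demo() -> tuple:
    """Bind to all IPv4 interfaces, then connect through the loopback address."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("0.0.0.0", 0))  # port 0: let the OS pick a free port
    server.listen(1)
    bound_host, port = server.getsockname()

    # The client chooses its own target address; the wildcard is never dialed.
    client = socket.create_connection(("127.0.0.1", port), timeout=5)
    conn, _peer = server.accept()
    dialed_host = client.getpeername()[0]

    for sock in (conn, client, server):
        sock.close()
    return bound_host, dialed_host
```

The wildcard only decides which interfaces accept connections. The name or address the client dials is a separate value, and with TLS enabled it is that dialed name which is checked against the server certificate.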
  3. Collaborator's Connection to Aggregator via Envoy:

When the Envoy spawns the Collaborator, it retrieves the agg_addr (e.g., soc-PF2F2E02.clients.intel.com) and agg_port from the network configuration in plan.yaml to connect to the Aggregator. The connection still fails, as in the following example:

           INFO     🧿 Starting the Collaborator Service.                                                                                           envoy.py:197
           INFO     Building `openfl.experimental.workflow.component.Collaborator` Module.                                                           plan.py:207
           INFO     Waiting for tasks...                                                                                                     collaborator.py:180
           INFO     Response code: StatusCode.UNAVAILABLE                                                                                aggregator_client.py:54
           INFO     Attempting to connect to aggregator at soc-PF2F2E02.clients.intel.com:53584                                          aggregator_client.py:29
[08:26:18] INFO     Response code: StatusCode.UNAVAILABLE                                                                                aggregator_client.py:54
           INFO     Attempting to connect to aggregator at soc-PF2F2E02.clients.intel.com:53584                                          aggregator_client.py:29
  4. Root Cause: mTLS Certificate Domain Mismatch:

The failure occurs due to a mismatch in the mTLS certificate domain validation. If the Aggregator's gRPC server certificate's Common Name (CN) or Subject Alternative Name (SAN) does not include the hostname that the Collaborator uses to connect (as defined in agg_addr), the connection will fail.

For example:

If agg_addr is set to soc-PF2F2E02.clients.intel.com but the Aggregator's certificate is created with a different CN/SAN (e.g., localhost), the mTLS handshake will fail.
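
The matching rule behind this failure can be illustrated with a simplified check. This is a sketch only: `hostname_matches` is a hypothetical helper that handles exact names and a single leading `*.` wildcard, whereas real TLS stacks apply the stricter rules of RFC 6125.

```python
def hostname_matches(dialed: str, cert_names: list) -> bool:
    """Simplified check of a dialed hostname against certificate names.

    Supports exact matches and one leading '*.' wildcard label only.
    """
    dialed = dialed.lower().rstrip(".")
    for name in cert_names:
        name = name.lower().rstrip(".")
        if name == dialed:
            return True
        if name.startswith("*.") and "." in dialed:
            # '*' stands in for exactly one left-most label.
            if dialed.split(".", 1)[1] == name[2:]:
                return True
    return False
```

For the failing case above, `hostname_matches("soc-PF2F2E02.clients.intel.com", ["localhost"])` is false, while a certificate that lists the dialed name passes the check.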

Success Case Example:

  • Director TLS Certificate Common Name (CN): soc-PF2F2E02 // Machine's FQDN

  • Connections:

Director : soc-PF2F2E02:50050 <-> Envoy : soc-PF2F2E02:50050 → Successfully Connected
Aggregator : [::]:53584 <-> Collaborator : soc-PF2F2E02:53584 → Successfully Connected

  • Reason for Successful Connection:

The Aggregator and Collaborator successfully connect because the hostname used by the Collaborator to connect with the Aggregator matches the Common Name (CN) in the TLS certificate.

Next Steps:

  • Working on a potential fix for this issue.
