Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

track: VLS <> CLN desync in v24.02 #476

Open
cdecker opened this issue Jul 7, 2024 · 13 comments · Fixed by #488
Open

track: VLS <> CLN desync in v24.02 #476

cdecker opened this issue Jul 7, 2024 · 13 comments · Fixed by #488
Labels
component::vls P1 High Priority issues state::upstream These are issues that are related to upstream projects and should be tackled there. state::verifying upstream::cln

Comments

@cdecker
Copy link
Collaborator

cdecker commented Jul 7, 2024

Some users have been reporting an issue with the VLS signer refusing to sign a channel re-establish (likely the commitment secret exchagne) due to the following error:

FAILED PRECONDITION: policy failure: get_per_commitment_secret: cannot revoke commitment_number 695 when next_holder_commit_num is 696

On the server-side we can see the same in the log lines:

UNUSUAL 02c811e575be2df47d8b48dab3d3f1c9b0f6e16d0d40b5ed78253308fc2bd7170d-channeld-chan#1: Adding HTLC 18446744073709551615 too slow: killing connection
[BreezSdk] {INFO} (2024-07-07T12:27:19.990279Z) : node-logs: INFO    02c811e575be2df47d8b48dab3d3f1c9b0f6e16d0d40b5ed78253308fc2bd7170d-chan#1: Peer transient failure in CHANNELD_NORMAL: Adding HTLC timed out: killed connection

Which then eventually times out the reconnection timeout, we try to pay anyway, but fail because no channels are available.

This does appear to be a new issue, but reminiscent of #431 which was fixed by v24.02. There appears to be another non-atomic transition in VLS <> CLN that we are not reflecting.

@cdecker cdecker added P1 High Priority issues component::vls state::investigating This is currently under investigation upstream::cln labels Jul 7, 2024
@devrandom
Copy link
Collaborator

if next_holder_commit_num is 696, then the current is 695. and of course we can't revoke the current.

so the question is how we arrived at this state.

the question is whether 696 was actually signed by VLS?

  • if so, how did VLS forget that?
  • if not, why does CLN think it's OK to revoke 695?

logs regarding the signing of holder commitment transactions would be useful here.

@cdecker
Copy link
Collaborator Author

cdecker commented Jul 8, 2024

I'm not quite sure how this could happen, as it appears that the CLN node made progress while the VLS state did not? That's what I'm reading out of the error reporting that CLN is trying to revoke our current commitment number, right? But we persist the VLS state first, and only then pass it to the node, so how could the latter make progress without the former?

@cdecker
Copy link
Collaborator Author

cdecker commented Jul 20, 2024

I have a node (03442386091363708e6e3bcda4dd978d7777743fde7a4990166cdec2e5065149f1/ prod/ttt) that is affected by this issue. Pulling logs, state and CLN DB to reproduce the case. Let's see if we can find the exact cause for the desync, or at least reproduce the state and then come up with a fix after the fact to recover.

@cdecker
Copy link
Collaborator Author

cdecker commented Jul 20, 2024

We have the following entry in the DB for the channel:

  3 |       1 |                  |                    5 |                     6 |     3 |      1 |             0 |             0 |            22092 |             22089 |        10842 | \x91fcb6f06b5b94586b13101b827bb204d78137d87031a346b27501280b239f0d |                 0 |          105000 |                     1 |             0 |        1330066 | \x03daa3e7827dcda27b240797ff43691f62ef5db21a3c78da57b33b9c99f280592b | \x02e338223fb0b618b46f221ed5fd2d4094da857ca63dc72ba6e9cba8bf477fb72f | \x03c9e0ad6900853d166362c7c6a15524c40046f8bd201ea140b247d514fd57d618 | \x02595a3116d6984f639f3d0510c303c9dc286b03a9fccefb009ca4088cd00653d3 | \x033e1e74247f8b4825a308560b4f40dc20c4837f3f7543a55ae81d8264ecf2f9c1 | \x03a947b234e56bc725692f99d40c2421146d58f4ac8eafe0b1fefc8eb71392e895 | \x03ecb61cb9a984e727f38f94f5f298e18ebbbb4296a0bd408bb5003be156acafd0 |                      |                       |                  3 |                                                                        |                     3 |                        |                     | \x70736274ff010204020000000103049a91f02001040101010501040106010301fb04020000000001012b289a01000000000022002087f8eb092754e219e7a66a6b900360a4cd31b927ed4855469bcbd1268e49b69a220203daa3e7827dcda27b240797ff43691f62ef5db21a3c78da57b33b9c99f280592b473044022044b87211cb947c15edb8331dc7600f731326e1315bc2bcbf40007d6032d74bef02201012db5ad200ee9af5d9c91a3291736ab86903fa0297f3d2664304c60bb1900d010103040100000001054752210293585196fac8ce82ba42cae3c51deda74667739edf287588cbbe69bf3a07a98d2103daa3e7827dcda27b240797ff43691f62ef5db21a3c78da57b33b9c99f280592b52ae22060293585196fac8ce82ba42cae3c51deda74667739edf287588cbbe69bf3a07a98d084330c3f000000000220603daa3e7827dcda27b240797ff43691f62ef5db21a3c78da57b33b9c99f280592b086c959ddf00000000010e2091fcb6f06b5b94586b13101b827bb204d78137d87031a346b27501280b239f0d010f0400000000011004539ca180000101282103daa3e7827dcda27b240797ff43691f62ef5db21a3c78da57b33b9c99f280592bac736460b2680103084a01000000000000010422002096516bcdab4c23f3fa72953bd7214d1fed2c5b537462593b5e5af212d9c96ea300010128210293585196fac8ce82ba42cae3c51deda74667739edf287588cbbe69bf3a07a98dac736460b2680103084a010000000000000104220020d5d031569d1d9b2697f4e6c58064cae906570a0ca246592abb644b1e070fb28e0001014d6321026fdba2f7c432947b88de2c15567679d08d343b1c25a2a5b5ef586577f058d91f67029000b2752102877e42c9c5704f1d9fdc3f2a7f933a3657cf80ddaf941ccbcbf2e011c808f98368ac010308320500000000000001042200207157815c402b50c0949747da7d649bcd7467531893e05599a9e5dd65065c12c9000101252103c9e0ad6900853d166362c7c6a15524c40046f8bd201ea140b247d514fd57d618ad51b2010308368c0100000000000104220020586919ba64a087365ece03fe2fd9648aac3d09e81e324b41593ddad4aada812100 | \x44b87211cb947c15edb8331dc7600f731326e1315bc2bcbf40007d6032d74bef1012db5ad200ee9af5d9c91a3291736ab86903fa0297f3d2664304c60bb1900d |                      |                      |         808181 |               0 |                  32 |                    32 |            41794214 |              41794214 |                10842 |                   9411 |             91166719 |               40464148 |                 1250 |                20638 |                  0 |           10000140 |                             | \x080000000000002a59 |         1000 |          10 |                                |                     |                        |                       0 | \x5120f380254c56eefffb235a5f30c2099e7aab4a308caa78b00f801f37e41370029b |                   0 |                     0 | \x91fcb6f06b5b94586b13101b827bb204d78137d87031a346b27501280b239f0d |              |      2 |                   3 |                               0 | \x036e4b8a8e12b5ad38f3230d23b5f003d9b15e9573f1d8476bfa8377d684b41b10 | \x03219ada05bf77e5a90bca7af3bbb70dc9467cd87c5a2cf7e38890fb5377079fb4 | \x0274bac2cc991ef2e158a7b2db36f06e07337d42a1fd7f20d85ac5030717baed97 | \x03edc7507bc2ad555b1a1ba1ebaf8931e99bbd6945c4c6f510f0b3ed0fcb05c089 | \x0293585196fac8ce82ba42cae3c51deda74667739edf287588cbbe69bf3a07a98d |                     |                       |                            0 |                             0 |                  |                     |                    |            0 |         105000000 |                 0 | -8834417532920193432 |  3982172984378634980 | 888605506458025984 |                             0 |                            0 | \x400000401000   |                 1 |                1000 |                 10 |                       34 |                       -1 |                        0 |             1720866916

Notice the nedxt_index_local=22092 which is also reflected in the below error message:

[2024-07-20T14:46:39Z DEBUG gl_client::signer] Handling message GetPerCommitmentPoint(GetPerCommitmentPoint { commitment_number: 22092 })
[2024-07-20T14:46:39Z DEBUG gl_client::signer] Current approvals: []
[2024-07-20T14:46:39Z INFO  vls_protocol_signer::handler] Restore node 03442386091363708e6e3bcda4dd978d7777743fde7a4990166cdec2e5065149f1
[2024-07-20T14:46:39Z INFO  lightning_signer::node]   Restore channel 02c811e575be2df47d8b48dab3d3f1c9b0f6e16d0d40b5ed78253308fc2bd7170d0100000000000000 outpoint Some(OutPoint { txid: 2aadd40a3058ddcbf8791602478a72ab43ef582d03737f938751e15b7065444e, vout: 0 })
[2024-07-20T14:46:39Z INFO  lightning_signer::node]   Restore channel 02c811e575be2df47d8b48dab3d3f1c9b0f6e16d0d40b5ed78253308fc2bd7170d0200000000000000 outpoint Some(OutPoint { txid: ff010cbb204960a71eb4acdead435cd1949cfbff6dee674156a6abb541ada7c3, vout: 0 })
[2024-07-20T14:46:39Z INFO  lightning_signer::node]   Restore channel 02c811e575be2df47d8b48dab3d3f1c9b0f6e16d0d40b5ed78253308fc2bd7170d0300000000000000 outpoint Some(OutPoint { txid: 0d9f230b280175b246a33170d83781d704b27b821b10136b58945b6bf0b6fc91, vout: 0 })
[2024-07-20T14:46:39Z INFO  lightning_signer::node] Restore node 03442386091363708e6e3bcda4dd978d7777743fde7a4990166cdec2e5065149f1 on testnet
[2024-07-20T14:46:39Z INFO  vls_protocol_signer::handler] vls_protocol_signer::handler::HandlerBuilder::do_build:: NodeState::summary 0344: 1 invoices, 0 issued_invoices, 1 payments, excess_amount 0
[2024-07-20T14:46:39Z INFO  vls_protocol_signer::handler] signer protocol_version 5, node hsm_wire_max_version 5, setting protocol_version to 5
[2024-07-20T14:46:39Z DEBUG gl_client::signer] Updating state from 0 context request
[2024-07-20T14:46:39Z ERROR lightning_signer::util::status] FAILED PRECONDITION: policy failure: get_per_commitment_point: commitment_number 22092 invalid when next_holder_commit_num is 22092
[2024-07-20T14:46:39Z ERROR lightning_signer::util::status] BACKTRACE:
       0: backtrace::capture::Backtrace::create

[2024-07-20T14:46:39Z ERROR vls_protocol_signer::handler] Signing(Status { code: FailedPrecondition, message: "policy failure: get_per_commitment_point: commitment_number 22092 invalid when next_holder_commit_num is 22092" })

And the signer state for this channel is this (formatted as a JSON document):

{
    "blockheight": None,
    "channel_setup": {
        "channel_value_sat": 105000,
        "commitment_type": "AnchorsZeroFeeHtlc",
        "counterparty_points": {
            "delayed_payment_basepoint": "033e1e74247f8b4825a308560b4f40dc20c4837f3f7543a55ae81d8264ecf2f9c1",
            "funding_pubkey": "03daa3e7827dcda27b240797ff43691f62ef5db21a3c78da57b33b9c99f280592b",
            "htlc_basepoint": "02595a3116d6984f639f3d0510c303c9dc286b03a9fccefb009ca4088cd00653d3",
            "payment_point": "03c9e0ad6900853d166362c7c6a15524c40046f8bd201ea140b247d514fd57d618",
            "revocation_basepoint": "02e338223fb0b618b46f221ed5fd2d4094da857ca63dc72ba6e9cba8bf477fb72f"
        },
        "counterparty_selected_contest_delay": 144,
        "counterparty_shutdown_script": None,
        "funding_outpoint": {"txid": "91fcb6f06b5b94586b13101b827bb204d78137d87031a346b27501280b239f0d", "vout": 0},
        "holder_selected_contest_delay": 144,
        "holder_shutdown_script": None,
        "is_outbound": False,
        "push_value_msat": 0
    },
    "channel_value_satoshis": 105000,
    "enforcement_state": {
        "channel_closed": False,
        "counterparty_secrets": {
            "old_secrets": [
                [[216, 142, 164, 23, 77, 205, 108, 33, 114, 15, 186, 148, 200, 222, 118, 180, 109, 0, 222, 144, 68, 188, 165, 0, 151, 28, 154, 24, 161, 115, 81, 40], 281474976688569],
                [[129, 119, 227, 98, 245, 155, 79, 163, 184, 252, 82, 148, 98, 89, 99, 239, 181, 13, 206, 209, 156, 250, 79, 109, 156, 41, 4, 217, 34, 195, 79, 181], 281474976688570],
                [[189, 236, 242, 157, 175, 129, 246, 142, 121, 103, 118, 37, 162, 128, 245, 210, 253, 42, 238, 143, 28, 239, 113, 233, 214, 88, 248, 204, 171, 134, 236, 230], 281474976688572],
                [[148, 181, 99, 13, 185, 143, 42, 51, 211, 235, 107, 6, 97, 64, 181, 118, 204, 213, 21, 125, 155, 6, 96, 222, 64, 237, 46, 29, 178, 153, 26, 70], 281474976688568],
                [[59, 54, 146, 91, 49, 211, 208, 113, 170, 170, 215, 51, 98, 254, 155, 170, 55, 112, 116, 19, 110, 70, 248, 198, 162, 76, 248, 215, 119, 160, 246, 27], 281474976688592],
                [[87, 170, 72, 42, 98, 152, 174, 197, 80, 115, 100, 179, 187, 16, 198, 38, 168, 197, 30, 87, 205, 147, 111, 38, 56, 90, 156, 80, 10, 9, 53, 229], 281474976688608],
                [[101, 181, 121, 243, 54, 211, 24, 37, 130, 3, 9, 171, 217, 133, 152, 12, 52, 121, 106, 67, 64, 236, 174, 148, 181, 84, 86, 9, 223, 86, 55, 101], 281474976688576],
                [[145, 166, 87, 205, 168, 157, 2, 64, 22, 223, 188, 69, 180, 17, 228, 143, 165, 1, 106, 40, 49, 207, 224, 35, 33, 214, 216, 53, 223, 210, 138, 243], 281474976688768],
                [[173, 217, 166, 113, 98, 235, 5, 3, 138, 106, 109, 59, 75, 29, 192, 253, 175, 76, 22, 172, 209, 174, 87, 2, 225, 10, 219, 32, 215, 46, 14, 128], 281474976688896],
                [[42, 143, 149, 197, 27, 180, 199, 250, 249, 108, 15, 70, 187, 128, 76, 189, 39, 51, 180, 112, 39, 176, 207, 19, 246, 164, 235, 91, 113, 113, 222, 114], 281474976688640],
                [[241, 50, 195, 86, 217, 83, 174, 42, 250, 189, 30, 189, 84, 75, 181, 3, 206, 235, 166, 138, 19, 40, 13, 255, 191, 175, 37, 7, 248, 245, 166, 159], 281474976689152],
                [[77, 214, 9, 134, 195, 112, 174, 9, 107, 164, 61, 226, 217, 85, 27, 101, 152, 182, 169, 235, 96, 72, 44, 90, 148, 72, 170, 51, 26, 9, 227, 72], 281474976692224],
                [[222, 36, 179, 80, 238, 24, 83, 5, 61, 14, 223, 138, 19, 156, 165, 76, 225, 200, 68, 127, 3, 248, 203, 146, 217, 95, 130, 18, 217, 218, 34, 194], 281474976690176],
                [[218, 140, 179, 160, 87, 48, 211, 238, 168, 39, 239, 120, 155, 55, 14, 101, 180, 207, 38, 200, 101, 223, 42, 0, 32, 182, 45, 49, 102, 206, 129, 198], 281474976702464],
                [[9, 12, 81, 184, 24, 251, 48, 65, 240, 120, 65, 255, 28, 207, 23, 191, 169, 81, 95, 92, 242, 129, 116, 251, 216, 111, 249, 46, 37, 127, 33, 191], 281474976694272],
                [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 281474976710656],
                [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 281474976710656],
                [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 281474976710656],
                [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 281474976710656],
                [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 281474976710656],
                [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 281474976710656],
                [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 281474976710656],
                [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 281474976710656],
                [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 281474976710656],
                [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 281474976710656],
                [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 281474976710656],
                [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 281474976710656],
                [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 281474976710656],
                [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 281474976710656],
                [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 281474976710656],
                [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 281474976710656],
                [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 281474976710656],
                [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 281474976710656],
                [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 281474976710656],
                [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 281474976710656],
                [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 281474976710656],
                [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 281474976710656],
                [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 281474976710656],
                [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 281474976710656],
                [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 281474976710656],
                [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 281474976710656],
                [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 281474976710656],
                [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 281474976710656],
                [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 281474976710656],
                [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 281474976710656],
                [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 281474976710656],
                [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 281474976710656],
                [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 281474976710656],
                [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 281474976710656]
            ]
        },
        "current_counterparty_commit_info": {"feerate_per_kw": 1405, "is_counterparty_broadcaster": True, "offered_htlcs": [], "received_htlcs": [], "to_broadcaster_value_sat": 101430, "to_countersigner_value_sat": 1330},
        "current_counterparty_point": "03ecb61cb9a984e727f38f94f5f298e18ebbbb4296a0bd408bb5003be156acafd0",
        "current_counterparty_signatures": ["3044022044b87211cb947c15edb8331dc7600f731326e1315bc2bcbf40007d6032d74bef02201012db5ad200ee9af5d9c91a3291736ab86903fa0297f3d2664304c60bb1900d", []],
        "current_holder_commit_info": {"feerate_per_kw": 1405, "is_counterparty_broadcaster": False, "offered_htlcs": [], "received_htlcs": [], "to_broadcaster_value_sat": 1330, "to_countersigner_value_sat": 101430},
        "initial_holder_value": 0,
        "next_counterparty_commit_num": 22089,
        "next_counterparty_revoke_num": 22088,
        "next_holder_commit_info": None,
        "next_holder_commit_num": 22092,
        "previous_counterparty_commit_info": None,
        "previous_counterparty_point": "028dd8f530f122ebc4fb19fecd7c7a9312088db9dbb5d917516e2626c8d1d8e933"
    },
    "id": None
}

Let"s dig in :-)

@cdecker
Copy link
Collaborator Author

cdecker commented Jul 20, 2024

Notice that this is my personal node, which is why I'm happy to share this information publicly. This is not something we do for user nodes, but in this case not having to jump through hoops is useful ^^

@devrandom
Copy link
Collaborator

devrandom commented Jul 22, 2024

so, yes, it's looks desynced - CLN thinks that 22092 is the current holder commit num, and VLS thinks it's the next.

but I was more interested in logs, to understand how we got there.

@devrandom
Copy link
Collaborator

or to put it more precisely:

if 22092 is the current commit num, then 22091 must have been revoked. so the thing to check is that VLS actually got that revocation. if it didn't, then something is wrong on the CLN side. if it did, then something is wrong on VLS side.

just to make sure about correct 2-phase commit: you first persist the VLS state, and only when that is successful you return the result to CLN, right?

@cdecker
Copy link
Collaborator Author

cdecker commented Jul 23, 2024

Yeah, I'll look through the logs asap to see how we got into this state (I have them pulled, just didn't have time to give them a go).

The 2PC idea is essentially that we delay the ratcheting on the vls state until CLN has committed, and make the ratcheting idempotent (so that we can replay it against the signer to sync it up again). From what I can tell here CLN advanced for some reason but the VLS state lagged behind, so maybe this is a revocation that CLN processed but didn't quite forward to the signer yet?

@devrandom
Copy link
Collaborator

well, I don't see how that can logically happen, because VLS would not have released the revocation secret if it didn't actually revoke and persist. that's the mystery...

@ksedgwic
Copy link
Collaborator

Maybe VLS regressed the commitment number due to VLS #502

@devrandom
Copy link
Collaborator

ah, very good point. will follow up offline

@devrandom
Copy link
Collaborator

the relevant fix is in 0.12, currently in release-candidate, should be final mid next week.

@devrandom devrandom reopened this Aug 2, 2024
@cdecker cdecker added state::verifying state::upstream These are issues that are related to upstream projects and should be tackled there. and removed state::investigating This is currently under investigation labels Aug 9, 2024
@cdecker cdecker pinned this issue Aug 9, 2024
@devrandom
Copy link
Collaborator

to clarify, our fix prevents a specific desync scenario. it does not correct a desynced state that has already been persisted

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component::vls P1 High Priority issues state::upstream These are issues that are related to upstream projects and should be tackled there. state::verifying upstream::cln
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants