Rare hang #58

Open
HyperCodec opened this issue May 15, 2024 · 39 comments · May be fixed by #80

@HyperCodec
Owner

Not sure how this is happening, but in extremely rare circumstances the program can hang indefinitely. See the #57 workflow run for more info.

My guess is that some rare edge case causes an RwLock to be locked and then used again by a child node, but this shouldn't be possible given the well-tested circulation prevention algorithm. This definitely needs further debugging, but it is so rare and obscure that it is hard to catch, let alone capture the details of what happened.

@HyperCodec HyperCodec added the bug Something isn't working label May 15, 2024
@HyperCodec HyperCodec self-assigned this May 15, 2024
@Bowarc

Bowarc commented May 31, 2024

After a couple of generations, it just stops and I don't know why.

It happens 100% of the time with my current test; I pushed it at https://github.com/Bowarc/doodlai_jump/tree/ea955a6b681fcbaa2a4e3ec6d81f14970d5414b7

(The /ring package handles training and is the one that hangs after a couple of generations; game is a library for a really simple version of Doodle Jump; display lets you watch the AI play.)

@HyperCodec
Owner Author

Hmm, so it's probably a recursive RwLock. I'll have to look into it further. Most likely some internal function is creating a cyclic neuron dependency (e.g. the DFS check not working).
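To make the recursive-RwLock hypothesis concrete, here is a minimal, hypothetical sketch (not the crate's code): each neuron sits behind a std `RwLock`, evaluation holds a neuron's write lock while recursing into its inputs, and a cycle therefore asks the same call chain to lock a neuron it already holds. With a blocking `write()` that situation may deadlock or panic (std leaves re-locking on the same thread unspecified), so the sketch uses `try_write()` to report the cycle instead. `Neuron` and `process` are illustrative names.

```rust
use std::sync::{RwLock, TryLockError};

// Hypothetical neuron: (index of input neuron, weight) pairs plus a bias.
struct Neuron {
    inputs: Vec<(usize, f32)>,
    bias: f32,
}

// Evaluates neuron `i`, holding its write lock for the whole recursive call.
// Returns Err(j) if neuron `j` is already locked further up this call chain,
// which (in single-threaded use) means the inputs form a cycle through `j`.
fn process(neurons: &[RwLock<Neuron>], i: usize) -> Result<f32, usize> {
    let guard = match neurons[i].try_write() {
        Ok(guard) => guard,
        // A blocking `write()` at this point is where a cycle would hang instead.
        Err(TryLockError::WouldBlock) => return Err(i),
        Err(TryLockError::Poisoned(_)) => panic!("lock for neuron {i} is poisoned"),
    };
    let mut sum = guard.bias;
    for &(input, weight) in &guard.inputs {
        sum += weight * process(neurons, input)?;
    }
    Ok(sum)
}
```

Under rayon, `WouldBlock` can also just mean another thread legitimately holds the lock, so this trick only works as a single-threaded diagnostic.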

@HyperCodec
Owner Author

Btw @Bowarc, can you use the serde feature to dump a JSON (or RON) file of the generation that hangs? (The easiest way is probably to overwrite the same file every generation and then stop the program once it hangs.)
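A minimal std-only sketch of the overwrite-every-generation approach. It assumes the population has already been serialized to a string, e.g. with `serde_json::to_string(&genomes)` once the crate's serde feature is enabled; `overwrite_snapshot` and the file names are hypothetical.

```rust
use std::fs;
use std::io;
use std::path::Path;

// Replaces the snapshot at `path` with `contents`. Writing to a temporary
// sibling file and renaming it over the target means an interrupted write
// never leaves a truncated dump behind (rename replaces an existing file
// on both Unix and Windows).
fn overwrite_snapshot(path: &Path, contents: &str) -> io::Result<()> {
    let tmp = path.with_extension("tmp");
    fs::write(&tmp, contents)?;
    fs::rename(&tmp, path)
}
```

Calling it once per generation, just before that generation runs, leaves the file holding the population that was running when the program was stopped.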

@Bowarc

Bowarc commented May 31, 2024

OK, I'll do that tomorrow.

@Bowarc

Bowarc commented May 31, 2024

Well, I stayed up longer than expected 😅
Here is the DNA of every genome from a sim that froze:

DNA { network: NeuralNetworkTopology { input_layer: [RwLock { data: NeuronTopology { inputs: [], bias: 0.19538373, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.9611819, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.5509694, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.31042653, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.9654784, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.81183213, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.86611843, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.9298546, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.8283311, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.8759112, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.4996699, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.5423544, activation: linear_activation
 }, poisoned: false, .. }], hidden_layers: [], output_layer: [RwLock { data: NeuronTopology { inputs: [], bias: 0.010660529, activation: sigmoid
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [(Input(6), -0.1170296)], bias: 0.39411813, activation: sigmoid
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [(Input(7), -0.8857193), (Input(11), -0.97913766), (Input(2), 0.2923255), (Input(1), 0.26824117), (Input(2), -0.8934064), (Input(10), -0.19709682), (Input(9), -0.92098737), (Input(6), -0.9772694), (Input(7), 0.08727813), (Input(3), -0.61651254), (Input(9), 0.42674088), (Input(7), -0.801528), (Input(1), 0.6078919)], bias: 0.7993354, activation: sigmoid
 }, poisoned: false, .. }], mutation_rate: 0.01, mutation_passes: 3 } }
DNA { network: NeuralNetworkTopology { input_layer: [RwLock { data: NeuronTopology { inputs: [], bias: 0.19538373, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.9611819, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.5509694, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.31042653, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.9654784, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.81183213, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.86611843, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.9298546, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.8283311, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.8759112, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.4996699, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.5423544, activation: linear_activation
 }, poisoned: false, .. }], hidden_layers: [RwLock { data: NeuronTopology { inputs: [(Input(3), -0.5625169)], bias: 0.59482414, activation: sigmoid
 }, poisoned: false, .. }], output_layer: [RwLock { data: NeuronTopology { inputs: [(Hidden(0), 0.9505495)], bias: 0.010660529, activation: sigmoid
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [(Input(6), -0.1170296)], bias: 0.39411813, activation: sigmoid
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [(Input(7), -0.8857193), (Input(11), -0.97913766), (Input(2), 0.2923255), (Input(1), 0.26824117), (Input(2), -0.8934064), (Input(10), -0.19709682), (Input(9), -0.92098737), (Input(6), -0.9772694), (Input(7), 0.08727813), (Input(3), -0.61651254), (Input(9), 0.42674088), (Input(7), -0.801528), (Input(1), 0.6078919)], bias: 0.7993354, activation: sigmoid
 }, poisoned: false, .. }], mutation_rate: 0.01, mutation_passes: 3 } }
DNA { network: NeuralNetworkTopology { input_layer: [RwLock { data: NeuronTopology { inputs: [], bias: 0.19538373, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.9611819, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.5509694, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.31042653, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.9654784, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.81183213, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.86611843, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.9298546, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.8283311, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.8759112, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.4996699, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.5423544, activation: linear_activation
 }, poisoned: false, .. }], hidden_layers: [], output_layer: [RwLock { data: NeuronTopology { inputs: [], bias: 0.010660529, activation: sigmoid
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [(Input(6), -0.1170296)], bias: 0.39411813, activation: sigmoid
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [(Input(7), -0.8857193), (Input(11), -0.97913766), (Input(2), 0.2923255), (Input(1), 0.26824117), (Input(2), -0.8934064), (Input(10), -0.19709682), (Input(9), -0.92098737), (Input(6), -0.9772694), (Input(7), 0.08727813), (Input(3), -0.61651254), (Input(9), 0.42674088), (Input(7), -0.801528), (Input(1), 0.6078919)], bias: 0.7993354, activation: sigmoid
 }, poisoned: false, .. }], mutation_rate: 0.01, mutation_passes: 3 } }
DNA { network: NeuralNetworkTopology { input_layer: [RwLock { data: NeuronTopology { inputs: [], bias: 0.19538373, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.9611819, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.5509694, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.31042653, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.9654784, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.81183213, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.86611843, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.9298546, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.8283311, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.8759112, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.4996699, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.5423544, activation: linear_activation
 }, poisoned: false, .. }], hidden_layers: [], output_layer: [RwLock { data: NeuronTopology { inputs: [], bias: 0.010660529, activation: sigmoid
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [(Input(6), -0.1170296)], bias: 0.39411813, activation: sigmoid
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [(Input(7), -0.8857193), (Input(11), -0.97913766), (Input(2), 0.2923255), (Input(1), 0.26824117), (Input(2), -0.8934064), (Input(10), -0.19709682), (Input(9), -0.92098737), (Input(6), -0.9772694), (Input(7), 0.08727813), (Input(3), -0.61651254), (Input(9), 0.42674088), (Input(7), -0.801528), (Input(1), 0.6078919)], bias: 0.7993354, activation: sigmoid
 }, poisoned: false, .. }], mutation_rate: 0.01, mutation_passes: 3 } }
DNA { network: NeuralNetworkTopology { input_layer: [RwLock { data: NeuronTopology { inputs: [], bias: 0.19538373, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.9611819, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.5509694, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.31042653, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.9654784, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.81183213, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.86611843, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.9298546, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.8283311, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.8759112, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.4996699, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.5423544, activation: linear_activation
 }, poisoned: false, .. }], hidden_layers: [], output_layer: [RwLock { data: NeuronTopology { inputs: [], bias: 0.010660529, activation: sigmoid
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [(Input(6), -0.1170296)], bias: 0.39411813, activation: sigmoid
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [(Input(7), -0.8857193), (Input(11), -0.97913766), (Input(2), 0.2923255), (Input(1), 0.26824117), (Input(2), -0.8934064), (Input(10), -0.19709682), (Input(9), -0.92098737), (Input(6), -0.9772694), (Input(7), 0.08727813), (Input(3), -0.61651254), (Input(9), 0.42674088), (Input(7), -0.801528), (Input(1), 0.6078919)], bias: 0.7993354, activation: sigmoid
 }, poisoned: false, .. }], mutation_rate: 0.01, mutation_passes: 3 } }
DNA { network: NeuralNetworkTopology { input_layer: [RwLock { data: NeuronTopology { inputs: [], bias: 0.19538373, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.9611819, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.5509694, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.31042653, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.9654784, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.81183213, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.86611843, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.9298546, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.8283311, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.8759112, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.4996699, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.5423544, activation: linear_activation
 }, poisoned: false, .. }], hidden_layers: [], output_layer: [RwLock { data: NeuronTopology { inputs: [], bias: 0.010660529, activation: sigmoid
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [(Input(6), -0.1170296)], bias: 0.39411813, activation: sigmoid
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [(Input(7), -0.8857193), (Input(11), -0.97913766), (Input(2), 0.2923255), (Input(1), 0.26824117), (Input(2), -0.8934064), (Input(10), -0.19709682), (Input(9), -0.92098737), (Input(6), -0.9772694), (Input(7), 0.08727813), (Input(3), -0.61651254), (Input(9), 0.42674088), (Input(7), -0.801528), (Input(1), 0.6078919)], bias: 0.7993354, activation: sigmoid
 }, poisoned: false, .. }], mutation_rate: 0.01, mutation_passes: 3 } }
DNA { network: NeuralNetworkTopology { input_layer: [RwLock { data: NeuronTopology { inputs: [], bias: 0.19538373, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.9611819, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.5509694, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.31042653, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.9654784, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.81183213, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.86611843, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.9298546, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.8283311, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.8759112, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.4996699, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.5423544, activation: linear_activation
 }, poisoned: false, .. }], hidden_layers: [], output_layer: [RwLock { data: NeuronTopology { inputs: [], bias: 0.010660529, activation: sigmoid
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [(Input(6), -0.1170296)], bias: 0.39411813, activation: sigmoid
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [(Input(7), -0.8857193), (Input(11), -0.97913766), (Input(2), 0.2923255), (Input(1), 0.26824117), (Input(2), -0.8934064), (Input(10), -0.19709682), (Input(9), -0.92098737), (Input(6), -0.9772694), (Input(7), 0.08727813), (Input(3), -0.61651254), (Input(9), 0.42674088), (Input(7), -0.801528), (Input(1), 0.6078919)], bias: 0.7993354, activation: sigmoid
 }, poisoned: false, .. }], mutation_rate: 0.01, mutation_passes: 3 } }
DNA { network: NeuralNetworkTopology { input_layer: [RwLock { data: NeuronTopology { inputs: [], bias: 0.19538373, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.9611819, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.5509694, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.31042653, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.9654784, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.81183213, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.86611843, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.9298546, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.8283311, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.8759112, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.4996699, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.5423544, activation: linear_activation
 }, poisoned: false, .. }], hidden_layers: [], output_layer: [RwLock { data: NeuronTopology { inputs: [], bias: 0.010660529, activation: sigmoid
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [(Input(6), -0.1170296)], bias: 0.39411813, activation: sigmoid
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [(Input(7), -0.8857193), (Input(11), -0.97913766), (Input(2), 0.2923255), (Input(1), 0.26824117), (Input(2), -0.8934064), (Input(10), -0.19709682), (Input(9), -0.92098737), (Input(6), -0.97373414), (Input(7), 0.08727813), (Input(3), -0.61651254), (Input(9), 0.42674088), (Input(7), -0.801528), (Input(1), 0.6078919)], bias: 0.7993354, activation: sigmoid
 }, poisoned: false, .. }], mutation_rate: 0.01, mutation_passes: 3 } }
DNA { network: NeuralNetworkTopology { input_layer: [RwLock { data: NeuronTopology { inputs: [], bias: 0.19538373, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.9611819, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.5509694, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.31042653, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.9654784, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.81183213, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.86611843, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.9298546, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.8283311, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.8759112, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.4996699, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.5423544, activation: linear_activation
 }, poisoned: false, .. }], hidden_layers: [], output_layer: [RwLock { data: NeuronTopology { inputs: [], bias: 0.010660529, activation: sigmoid
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [(Input(6), -0.1170296)], bias: 0.39411813, activation: sigmoid
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [(Input(7), -0.8857193), (Input(11), -0.97913766), (Input(2), 0.2923255), (Input(1), 0.26824117), (Input(2), -0.8934064), (Input(10), -0.19709682), (Input(9), -0.92098737), (Input(6), -0.9772694), (Input(7), 0.08727813), (Input(3), -0.61651254), (Input(9), 0.42674088), (Input(7), -0.801528), (Input(1), 0.6078919)], bias: 0.7993354, activation: sigmoid
 }, poisoned: false, .. }], mutation_rate: 0.01, mutation_passes: 3 } }
DNA { network: NeuralNetworkTopology { input_layer: [RwLock { data: NeuronTopology { inputs: [], bias: 0.19538373, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.9611819, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.5509694, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.31042653, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.9654784, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.81183213, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.86611843, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.9298546, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.8283311, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.8759112, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.4996699, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.5423544, activation: linear_activation
 }, poisoned: false, .. }], hidden_layers: [], output_layer: [RwLock { data: NeuronTopology { inputs: [], bias: 0.010660529, activation: sigmoid
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [(Input(6), -0.1170296)], bias: 0.39411813, activation: sigmoid
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [(Input(7), -0.8857193), (Input(11), -0.97913766), (Input(2), 0.2923255), (Input(1), 0.26824117), (Input(2), -0.8934064), (Input(10), -0.19709682), (Input(9), -0.92098737), (Input(6), -0.9772694), (Input(7), 0.08727813), (Input(3), -0.61651254), (Input(9), 0.42674088), (Input(7), -0.801528), (Input(1), 0.6078919)], bias: 0.7993354, activation: sigmoid
 }, poisoned: false, .. }], mutation_rate: 0.01, mutation_passes: 3 } }

I made a new commit if you want to check it: 8d75367

@HyperCodec
Owner Author

Something I noticed is that, in every genome, one of the output neurons has a lot of inputs from input-layer neurons. Given how lopsided that is compared to the other neurons, I doubt it is just a product of evolution. Probably another issue to fix.

Anyway, I created #61 for the duplicate neuron references in that output neuron's inputs.
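A sketch of a duplicate-input check along the lines of what #61 tracks. It uses a hypothetical `NeuronLocation` enum that mirrors the `Input(n)` / `Hidden(n)` labels in the dump above, not the crate's actual types: count repeated sources in a neuron's input list, and refuse to add a connection whose source is already present.

```rust
use std::collections::HashSet;

// Hypothetical stand-in for the crate's neuron location type.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum NeuronLocation {
    Input(usize),
    Hidden(usize),
    Output(usize),
}

// Number of entries whose source already appeared earlier in the list.
fn count_duplicate_inputs(inputs: &[(NeuronLocation, f32)]) -> usize {
    let mut seen = HashSet::new();
    inputs.iter().filter(|(loc, _)| !seen.insert(*loc)).count()
}

// Adds `source` as an input only if it is not already connected;
// returns whether the connection was added.
fn try_add_input(inputs: &mut Vec<(NeuronLocation, f32)>, source: NeuronLocation, weight: f32) -> bool {
    if inputs.iter().any(|&(existing, _)| existing == source) {
        return false;
    }
    inputs.push((source, weight));
    true
}
```

Fed the source indices of the large output neuron in the dump (7, 11, 2, 1, 2, 10, 9, 6, 7, 3, 9, 7, 1), `count_duplicate_inputs` reports 5 repeats among 13 connections.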

@HyperCodec
Owner Author

HyperCodec commented Jun 1, 2024

Merged #62, which addresses the main suspect for this issue.

@Bowarc, can you try running with neat = { git = "https://github.com/hypercodec/neat", branch = "dev", features = ["whateveryouhadbefore"] } (substituting the features you were already using) and see if it still hangs?

@Bowarc

Bowarc commented Jun 1, 2024

I've now tested over 3k generations and it seems stable. Thanks for the fix! (I had ["crossover", "rayon", "serde"] as features.)

@HyperCodec
Owner Author

Np

@HyperCodec
Owner Author

You can use the dev branch for now, but it's not a good branch to stay on because of large API changes; definitely switch back to stable after the next release.

@Bowarc

Bowarc commented Jun 1, 2024

Alright, thanks !

@Bowarc

Bowarc commented Jun 1, 2024

Oh
I swapped to DivisionReproduction (I was on CrossoverReproduction before), and on the first try it deadlocked after about 125 generations.

Here is the simulation data:
sim.backup.txt

I ran more tests, even going back to CrossoverReproduction with crossover_pruning_nextgen, but it appears to deadlock 100% of the time again.
After more tests I found that with too few genomes per generation (<100) it deadlocks in about 10/100 generations.
It seems fine with 1000 genomes per generation.

DivisionReproduction still hangs after a while with 1000 genomes; here is the sim data:
sim.backup.txt

@HyperCodec HyperCodec reopened this Jun 1, 2024
@HyperCodec
Owner Author

Interesting that it made it through ~3k generations on CrossoverReproduction without deadlocking the first time but not the second time. Perhaps you just got really lucky on that run. At least this rules out the duplicate neuron inputs as the sole cause of the deadlock (although they were probably also causing deadlocks in and of themselves; maybe there are just multiple issues here).

@HyperCodec
Owner Author

After looking through your backup files, I noticed there are still duplicate inputs. This time I'm not sure how they are being created.

@Bowarc

Bowarc commented Jun 2, 2024

While testing performance and learning curves, I found that a high mutation rate (>= 0.1) deadlocks in fewer than 50 generations 100% of the time. Now that I think of it, that might be the difference between the run where I said it looked good and the one where I said it didn't work again.

Example:

pub const NB_GAMES: usize = 3;
pub const GAME_TIME_S: usize = 20; // Number of seconds we let the AI play before recording its score
pub const GAME_DT: f64 = 0.05; // 0.0166
pub const NB_GENERATIONS: usize = 100;
pub const NB_GENOME_PER_GEN: usize = 2000;

neat::NeuralNetworkTopology::new(0.2, 3, rng) // mutation_rate = 0.2, mutation_passes = 3

Deadlocks in 15 generations

sim15.backup.txt
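A rough back-of-the-envelope for why a higher rate might surface the bug faster. This rests on an assumption about the crate's internals that the thread does not confirm: that each of the `mutation_passes` rolls independently mutates a genome with probability `mutation_rate`. The helper names are hypothetical.

```rust
// Chance that at least one of `passes` mutation rolls fires, under the
// ASSUMPTION that each pass independently mutates with probability `rate`.
fn p_at_least_one_mutation(rate: f64, passes: i32) -> f64 {
    1.0 - (1.0 - rate).powi(passes)
}

// Expected number of genomes per generation that receive at least one mutation
// (same ASSUMPTION as above).
fn expected_mutated_genomes(genomes: usize, rate: f64, passes: i32) -> f64 {
    genomes as f64 * p_at_least_one_mutation(rate, passes)
}
```

With the 0.01 rate seen in the earlier dump and 3 passes, about 3% of genomes mutate per generation; at 0.2 it is about 49%, i.e. roughly 59 versus 976 of 2000 genomes. More structural edits per generation would mean more chances to produce whatever topology triggers the hang.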

@HyperCodec
Owner Author

So yeah, the deadlock is probably caused by one of the mutations.

@HyperCodec
Owner Author

I wonder whether the deadlock happens during the mutation phase. That would make it hard to debug accurately, since the network hasn't finished mutating when it deadlocks.
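One way to check which phase hangs is to run each phase under a watchdog. This is a generic std-only sketch (`run_phase_with_watchdog` is a hypothetical helper, not part of the crate): the work runs on its own thread, and the caller gets an error naming the phase if it hasn't finished in time. Rust cannot kill the stuck thread, so this only diagnoses the hang; it does not recover from it.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::Duration;

// Runs `work` on a separate thread and waits at most `timeout` for it.
// The error message names the phase, so a hang shows up as e.g. "phase `run` ...".
// A timed-out thread keeps running in the background.
fn run_phase_with_watchdog<T: Send + 'static>(
    phase: &str,
    timeout: Duration,
    work: impl FnOnce() -> T + Send + 'static,
) -> Result<T, String> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // Ignore the send error if the watchdog has already given up.
        let _ = tx.send(work());
    });
    match rx.recv_timeout(timeout) {
        Ok(value) => Ok(value),
        Err(RecvTimeoutError::Timeout) => Err(format!("phase `{phase}` still running after {timeout:?}")),
        Err(RecvTimeoutError::Disconnected) => Err(format!("phase `{phase}` panicked")),
    }
}
```

Wrapping the mutation step and the running step separately, each with a generous timeout, shows which one never returns.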

@HyperCodec
Owner Author

HyperCodec commented Jun 7, 2024

This might not mean anything, but I just ran some stress tests on Windows on the dev branch (with the rayon and crossover features) and it didn't deadlock once.

Either I'm just really lucky or this is platform-specific.

@Bowarc

Bowarc commented Jun 7, 2024

Have you tried a high mutation rate?

@HyperCodec
Owner Author

Yeah, I just got lucky; it happens on every platform.

I did more testing and found that the deadlock happens during the running phase, which means it is still probably some kind of recursive RwLock.

@HyperCodec
Owner Author

I still can't find this deadlock even after weeks; it's really evasive.

It's almost certainly a recursive RwLock or a duplicated input, but I have code that prevents both of those.

I thought it might be one of the while loops that reroll until a valid state is reached, looping forever because no valid state exists. But the deadlock doesn't happen during mutation, so it can't be that (although I probably do want to patch it: it's extremely unlikely to ever happen, but it is still possible).

I'm really just out of ideas for what could possibly cause this issue.
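The reroll-until-valid loops mentioned above can be closed off by capping the attempts. A generic sketch with a hypothetical `reroll_bounded` helper (not the crate's code): it returns `None` instead of spinning forever when no valid candidate exists.

```rust
// Draws candidates until one is valid, giving up after `max_attempts` draws.
// Returns None instead of looping forever when no valid state exists.
fn reroll_bounded<T>(
    max_attempts: usize,
    mut draw: impl FnMut() -> T,
    is_valid: impl Fn(&T) -> bool,
) -> Option<T> {
    for _ in 0..max_attempts {
        let candidate = draw();
        if is_valid(&candidate) {
            return Some(candidate);
        }
    }
    None
}
```

In a mutation step the caller would then simply skip that mutation when `None` comes back.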

@HyperCodec
Owner Author

While I think this is definitely a high-priority issue that urgently needs to be fixed, I'll take a break from it so it doesn't keep taking time away from new features and such.

@HyperCodec
Owner Author

HyperCodec commented Jul 12, 2024

I think I found the cause: if every thread in the pool is blocked on a lock waiting for other tasks, rayon has no free thread left to run those dependency tasks.
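This kind of starvation can be reproduced without rayon. The std-only model below is a simplified stand-in, not rayon's scheduler (rayon's own `join`/`scope` waits can run other queued work; a thread blocked on an RwLock cannot). A parent job occupies a worker, queues a child job on the same pool, and blocks waiting for it. With one worker the child never gets a thread; with two it does. `start_pool` and `parent_waits_on_child` are hypothetical names, and the timeouts keep the demo from hanging.

```rust
use std::sync::mpsc::{self, Sender};
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

type Job = Box<dyn FnOnce() + Send + 'static>;

// A deliberately tiny fixed-size pool: `workers` threads pull jobs from one shared queue.
fn start_pool(workers: usize) -> Sender<Job> {
    let (tx, rx) = mpsc::channel::<Job>();
    let rx = Arc::new(Mutex::new(rx));
    for _ in 0..workers {
        let rx = Arc::clone(&rx);
        thread::spawn(move || loop {
            // The queue lock is released at the end of this statement, before the job runs.
            let job = match rx.lock().unwrap().recv() {
                Ok(job) => job,
                Err(_) => break, // all senders dropped: shut down
            };
            job();
        });
    }
    tx
}

// A parent job queues a child job on the same pool, then blocks its worker
// waiting for the child's result (a stand-in for blocking on a neuron's lock).
// Returns true if the child got to run before `timeout`.
fn parent_waits_on_child(workers: usize, timeout: Duration) -> bool {
    let pool = start_pool(workers);
    let pool_for_child = pool.clone();
    let (done_tx, done_rx) = mpsc::channel();
    pool.send(Box::new(move || {
        let (child_tx, child_rx) = mpsc::channel();
        pool_for_child
            .send(Box::new(move || {
                let _ = child_tx.send(42); // the parent may already have given up
            }))
            .unwrap();
        let _ = done_tx.send(child_rx.recv_timeout(timeout).ok());
    }))
    .unwrap();
    matches!(done_rx.recv_timeout(timeout * 2), Ok(Some(42)))
}
```

In the crate's case the "pool" is rayon's global pool and the blocked parents are nested `process_neuron` calls, so a deep enough network can occupy every thread at once.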

@HyperCodec
Owner Author

Created rayon-rs/rayon#1181 and am waiting for confirmation on a solution. If rayon takes too long to introduce a fix, I can probably make a temporary workaround here.

@dsgallups

I'm not sure if this helps: I've been working on a crate based on yours and noticed that the network topology can create cycles in the neural network's data structure. Please let me know if I'm missing something! (Drawing a picture real quick.)

@dsgallups

dsgallups commented Sep 15, 2024

Visual example attached

[Image: diagram of a cycle in the network topology]

While NeuralNetworkTopology::mutate checks for duplicate inputs, it does not appear to resolve graph cycles. I think back-edge detection would work here.

Edit: I've implemented this here
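Back-edge detection can be sketched with the usual three-color DFS. It runs over a plain adjacency list, which is a hypothetical representation and not NeuralNetworkTopology's actual layout. A node is gray while it is on the current DFS path, so reaching a gray node again means there is a cycle.

```rust
/// DFS state per node: not visited yet, on the current DFS path, or finished.
#[derive(Clone, Copy, PartialEq)]
enum Color {
    White,
    Gray,
    Black,
}

/// `adj[n]` lists the neurons that neuron `n` feeds into.
/// Returns true if the graph contains a cycle, detected as a DFS back edge
/// (an edge to a node that is still on the current path).
fn has_cycle(adj: &[Vec<usize>]) -> bool {
    // Recursive for brevity; very deep networks may prefer an explicit stack.
    fn visit(n: usize, adj: &[Vec<usize>], color: &mut [Color]) -> bool {
        color[n] = Color::Gray;
        for &m in &adj[n] {
            match color[m] {
                Color::Gray => return true, // back edge
                Color::White => {
                    if visit(m, adj, color) {
                        return true;
                    }
                }
                Color::Black => {} // fully explored, no cycle through it
            }
        }
        color[n] = Color::Black;
        false
    }

    let mut color = vec![Color::White; adj.len()];
    for n in 0..adj.len() {
        if color[n] == Color::White && visit(n, adj, &mut color) {
            return true;
        }
    }
    false
}
```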

@HyperCodec
Owner Author

HyperCodec commented Sep 15, 2024


I had a DFS algorithm that was attempting to resolve these loops. I'm pretty sure I had it working, but it's hard to tell with how random things are in genetic simulations.

https://github.com/HyperCodec/neat/blob/main/src/topology/mod.rs#L119
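A check done at mutation time only needs to answer one question per new connection: can the target already reach the source? If it can, adding from -> to closes a loop. This sketch uses an iterative DFS over a hypothetical adjacency list. It is not the code linked above.

```rust
/// `adj[n]` lists the neurons that neuron `n` feeds into (assumed acyclic).
/// Returns true if adding the edge `from -> to` would close a cycle,
/// i.e. if `to` can already reach `from`. Uses an explicit stack, so deep
/// networks cannot overflow the call stack.
fn would_create_cycle(adj: &[Vec<usize>], from: usize, to: usize) -> bool {
    let mut stack = vec![to];
    let mut seen = vec![false; adj.len()];
    while let Some(n) = stack.pop() {
        if n == from {
            return true;
        }
        // Skip nodes already expanded; mark this one as expanded.
        if std::mem::replace(&mut seen[n], true) {
            continue;
        }
        stack.extend(adj[n].iter().copied());
    }
    false
}
```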

I've also narrowed this down to pretty much only ever happening with the rayon feature enabled, so I'm thinking it's probably some kind of lock collision. CPU usage drops sharply, which also suggests that the threads are paused.

@HyperCodec
Owner Author

Now that I think about it, I should really use a seeded RNG when testing these things to get rid of some of the randomness.
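In Rust, the rand crate covers this with StdRng::seed_from_u64 from its SeedableRng trait. To keep the sketch dependency-free, here is a tiny hand-rolled SplitMix64 generator instead. The same seed always yields the same sequence of rolls, so a run that hangs could be replayed exactly.

```rust
/// Minimal seeded PRNG (SplitMix64). Same seed => same sequence, so a
/// failing simulation can be reproduced. Not cryptographically secure.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform f32 in [0, 1), e.g. for "should this mutation happen?" rolls.
    fn next_f32(&mut self) -> f32 {
        // Top 24 bits fit exactly in an f32 mantissa.
        (self.next_u64() >> 40) as f32 / (1u64 << 24) as f32
    }
}
```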

@dsgallups

dsgallups commented Sep 24, 2024

Found it in my fork. On deeply nested structures, par_iter(...).sum() can end up with every thread blocked, so no value can be returned when the inputs are summed:

From neat/src/runnable.rs, lines 106 to 111 in 228f7af:

        .par_iter()
        .map(|&(n2, w)| {
            let processed = self.process_neuron(n2);
            processed * w
        })
        .sum();

The sum operation can never complete, even if all of the iterator's components have returned. At the instant the final child completes, its thread is returned to the pool. Before the sum operation can be handed to that last free thread, the thread is allocated to another par_iter that will block. At that point, every other thread in rayon's pool is blocked (some of them waiting for this node's function to return), so none can be given out to complete the sum. I had posted a proof of concept, but I have since made my repo private.
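One way to avoid blocking inside the pool altogether is to resolve the evaluation order before computing anything, so no neuron ever waits on another. This is a swapped-in technique and not the crate's current approach. The sketch uses Kahn's algorithm over a hypothetical flat representation where inputs[n] holds (source, weight) pairs. Activation functions are left out, and None signals a cycle.

```rust
use std::collections::VecDeque;

/// Hypothetical flat network: `inputs[n]` lists `(source, weight)` pairs feeding
/// neuron `n`. Neurons without inputs take their value from `sensor_values`.
/// Activation functions are omitted. Returns None if the topology has a cycle.
fn evaluate(inputs: &[Vec<(usize, f32)>], sensor_values: &[(usize, f32)]) -> Option<Vec<f32>> {
    let n = inputs.len();
    let mut indegree: Vec<usize> = inputs.iter().map(|ins| ins.len()).collect();
    let mut outputs: Vec<Vec<usize>> = vec![Vec::new(); n];
    for (dst, ins) in inputs.iter().enumerate() {
        for &(src, _) in ins {
            outputs[src].push(dst);
        }
    }
    let mut values = vec![0.0f32; n];
    for &(idx, v) in sensor_values {
        values[idx] = v;
    }

    // Kahn's algorithm: start from neurons with no unresolved inputs.
    let mut ready: VecDeque<usize> = (0..n).filter(|&i| indegree[i] == 0).collect();
    let mut processed = 0;
    while let Some(i) = ready.pop_front() {
        if !inputs[i].is_empty() {
            // Every source is already final, so this is a plain read, never a wait.
            let total: f32 = inputs[i].iter().map(|&(src, w)| values[src] * w).sum();
            values[i] = total;
        }
        processed += 1;
        for &dst in &outputs[i] {
            indegree[dst] -= 1;
            if indegree[dst] == 0 {
                ready.push_back(dst);
            }
        }
    }
    // Neurons that never became ready sit on a cycle.
    (processed == n).then_some(values)
}
```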

@HyperCodec
Owner Author

HyperCodec commented Sep 24, 2024

Are you sure this is caused by the lazy stacked sum and not by the call to map before it, which uses RwLocks and such?

If sum is causing this, would converting back to a single-threaded iterator after mapping solve the issue?

@dsgallups

dsgallups commented Sep 24, 2024

Good point! Lemme make a real fork rq with rayon and run it with a high SplitConnection mutation rate and compare.

@dsgallups

dsgallups commented Sep 24, 2024

Ah, you were right: for_each also does not complete, even after every result has been returned. I've essentially been using tracing logs to determine this. Here are the details, using a RwLock instead of .sum():

        let sum = RwLock::new(0.); // no `mut` needed: the RwLock provides interior mutability

        self.inputs()
            .unwrap()
            .par_iter()
            .enumerate()
            .for_each(|(idx, input)| {
                info!(
                    "{} REQUEST INPUT ({}/{})",
                    self.id_short(),
                    idx,
                    num_inputs - 1
                );
                let res = input.get_input_value(self.id_short(), idx);
                info!(
                    "{} RECEIVED INPUT ({}/{}) ({})",
                    self.id_short(),
                    idx,
                    num_inputs - 1,
                    res
                );
                let mut sum = sum.write().unwrap();
                *sum += res;
            });

        info!("{} RETURNING RESULT FROM INPUTS", self.id_short());

        let sum = sum.into_inner().unwrap();
        self.activated_value = Some(sum);

The log below shows a neuron that has received all of its inputs back. However, the function never returns. Log lines from other threads follow, but this node's final info! trace (RETURNING RESULT FROM INPUTS) is never emitted.

2024-09-24T16:05:14.445053Z  INFO candle_neat::simple_net::neuron: 398ba9 RECEIVED INPUT (0/1) (0)
2024-09-24T16:05:14.445084Z  INFO candle_neat::simple_net::neuron: 398ba9 RECEIVED INPUT (1/1) (0)

@dsgallups

dsgallups commented Sep 24, 2024

One interesting thing to note: at least on my end, attaching by_uniform_blocks(1) to the parallel iterator stops this blocking behavior. At least, that's what I've found after running a super high split rate for 5-6 minutes. I'm pretty sure this just makes the iterator sequential, but yeah lol

@HyperCodec
Owner Author

[Image: neat hang diagram]

This is a little diagram I made explaining my earlier theory. I'm not sure what can be done to prevent this without either completely forking rayon (to make a custom lock type compatible with it) or writing some hacky spinlock solution with tons of rayon::yield_now() calls.

@HyperCodec
Owner Author

HyperCodec commented Sep 25, 2024

This doesn't always deadlock because rayon is work-stealing. If any thread finishes before the others (for example, because the dependency task was the first one added to its queue, or because all the base tasks are on some other thread), it can steal tasks from the waiting threads, which prevents a deadlock.

This deadlock only happens when every thread has a waiting task at the front of its queue, which isn't very common (and gets much rarer with each CPU core added).

@HyperCodec
Owner Author

@dsgallups would you be able to look into this a bit? There is an issue on the rayon GitHub about it (rayon-rs/rayon#592), but it's been open since 2018 and doesn't look like it's going to be fixed any time soon.

@HyperCodec
Owner Author

From that issue, it looks like there is a workaround using a custom ThreadPool for the locking work, but I'm not sure how well that'll work with a recursive algorithm like this.

@dsgallups

dsgallups commented Sep 25, 2024

If this were async, I'd know how to handle it with tokio, since a task hands its thread back to the pool at every await point. It's an interesting challenge to determine when an iterator is ready to complete while all threads are blocked. I'll take a look into it.

Edit: going to see if rayon-rs/rayon#1175 is a quick win.

@dsgallups

Unfortunately, I've decided not to pursue debugging rayon. Instead, I'm doing network expansion: transforming the network into a set of tensors as defined here and running it on candle-rs. Hope someone else will be able to figure this one out! Just wanted to give an update. Thanks for your efforts!
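The network-expansion route sidesteps the deadlock because the graph is resolved up front. Below is a minimal sketch of a plausible first step: grouping neurons into layers so that every input of a neuron sits in an earlier layer. It assumes a flat list of input indices per neuron. That representation is hypothetical; it is not the linked definition, which isn't available here. Each layer could then be evaluated as one batch, for example one tensor op or a par_iter over just that layer, without any neuron waiting on another.

```rust
/// `inputs[n]` lists the source neurons feeding neuron `n` (hypothetical layout).
/// Groups neurons into layers (longest-path depth) so every input of a neuron
/// is in an earlier layer. Returns None if the topology has a cycle.
fn layers(inputs: &[Vec<usize>]) -> Option<Vec<Vec<usize>>> {
    let n = inputs.len();
    let mut indegree: Vec<usize> = inputs.iter().map(Vec::len).collect();
    let mut outputs: Vec<Vec<usize>> = vec![Vec::new(); n];
    for (dst, ins) in inputs.iter().enumerate() {
        for &src in ins {
            outputs[src].push(dst);
        }
    }

    // Layer 0: neurons with no inputs (sensors).
    let mut frontier: Vec<usize> = (0..n).filter(|&i| indegree[i] == 0).collect();
    let mut result = Vec::new();
    let mut placed = 0;
    while !frontier.is_empty() {
        let mut next = Vec::new();
        for &i in &frontier {
            for &dst in &outputs[i] {
                indegree[dst] -= 1;
                if indegree[dst] == 0 {
                    // All of dst's inputs are now in earlier layers.
                    next.push(dst);
                }
            }
        }
        placed += frontier.len();
        result.push(std::mem::replace(&mut frontier, next));
    }
    // Neurons never placed are part of a cycle.
    (placed == n).then_some(result)
}
```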

@HyperCodec HyperCodec linked a pull request Nov 21, 2024 that will close this issue

3 participants