Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FR: Remove producer/consumer; Confusing output #491

Open
utdrmac opened this issue Aug 14, 2021 · 12 comments
Open

FR: Remove producer/consumer; Confusing output #491

utdrmac opened this issue Aug 14, 2021 · 12 comments
Labels
enhancement New feature or request

Comments

@utdrmac
Copy link
Contributor

utdrmac commented Aug 14, 2021

There is no real purpose behind the P/C operation of TRD. It does not make TRD process any faster due to parallelism nor does it improve concurrency. What it does do is present confusion:

2021-08-14 13:19:25,125 - producer  - INFO - Calculation report is created at '/home/stuff/.trd/reports/tz1RpNqAzQNaZuxeLSbfEzVEzUWssddsL6Kw/calculations/383.csv'
2021-08-14 13:19:25,125 - producer  - INFO - Reward creation is done for cycle 383, created 1135 rewards.
2021-08-14 13:19:25,126 - producer  - INFO - Run mode ONETIME satisfied. Terminating...
2021-08-14 13:19:25,126 - producer  - DEBUG - Producer returning...
2021-08-14 13:19:25,735 - MainThread - INFO - Interrupted.
2021-08-14 13:19:25,735 - MainThread - INFO - TRD is shutting down...
2021-08-14 13:19:25,735 - MainThread - INFO - --------------------------------------------------------
2021-08-14 13:19:25,736 - MainThread - INFO - Sensitive operations are in progress!
2021-08-14 13:19:25,736 - MainThread - INFO - Please wait while the application is being shut down!
2021-08-14 13:19:25,736 - MainThread - INFO - --------------------------------------------------------

TRD remained like this for a full minute. Is TRD stuck? Looks like it is shutting down/stopping. But no, not really. In the background is another thread injecting the multiple batches of payouts.

IMO, we should remove the P/C in favor of a simple payout loop. The P/C is just unnecessary overhead/complexity.

@utdrmac utdrmac added the enhancement New feature or request label Aug 14, 2021
@utdrmac
Copy link
Contributor Author

utdrmac commented Apr 9, 2022

More on this. When using -M 3 mode, the messages are confusing as it says TRD is shutting down, but not really. In fact, there are still many batches to process until it can shut down.

2022-04-09 19:04:10,231 - consumer0 - INFO - 1276 payments will be done in 5 batches
...
2022-04-09 19:04:14,189 - producer  - INFO - Reward creation is done for cycle 462, created 1571 rewards.
2022-04-09 19:04:14,189 - producer  - INFO - Run mode ONETIME satisfied. Terminating...
2022-04-09 19:04:14,189 - producer  - INFO - Sending KeyboardInterrupt signal.
2022-04-09 19:04:22,746 - MainThread - INFO - Interrupted.
2022-04-09 19:04:22,746 - MainThread - INFO - TRD is shutting down...
2022-04-09 19:04:22,746 - MainThread - INFO - --------------------------------------------------------
2022-04-09 19:04:22,747 - MainThread - INFO - Sensitive operations are in progress!
2022-04-09 19:04:22,747 - MainThread - INFO - Please wait while the application is being shut down!
2022-04-09 19:04:22,747 - MainThread - INFO - --------------------------------------------------------
2022-04-09 19:04:22,747 - MainThread - INFO - Lock file removed!
2022-04-09 19:04:41,480 - consumer0 - INFO - Operation op3DA5U2YgHBSVAsbQo58yZoujdZyz8uHPoYsvZT91MBVdWDUP5 is included
2022-04-09 19:04:41,483 - consumer0 - INFO - Payment of batch 1 succeeded in 1 attempt(s)
2022-04-09 19:04:41,484 - consumer0 - INFO - Payment of batch 2 started
... (TRD continues)

@ericlavoie
Copy link
Contributor

It's normal there are a few thread/process that must end before it does shut down. This is a message from the main thread/process telling you the shutdown command is received and processing. It will close once the child process are ended, do not kill it. It probably either finishing it's batch payments or waiting for payment confirmation. This is correct, not a bug.

@utdrmac
Copy link
Contributor Author

utdrmac commented Apr 9, 2022

Yes, I completely understand what is happening. The feature request (FR) is to remove the threaded nature as it is completely unnecessary and outputs confusing information.

@ericlavoie
Copy link
Contributor

You want a redesign that is not threaded because the log is too hard to understand? Is that correct?

@jdsika
Copy link
Contributor

jdsika commented Apr 9, 2022

Maybe improve the log messages and see if there are actual race conditions. Have a second pair of eyes on the threading. Rebuilding the whole thing? .... mäh... IMO it is working as intended with just some confusing output

@ericlavoie
Copy link
Contributor

ericlavoie commented Apr 9, 2022

I agree with jdiska, the output is only confusing if you do not understand the threaded nature, maybe a section to explain it. Also some better output message in the log.
Something like "mainthread received shutdown commmand, waiting on child threads to complete before shutting down". Would be more descriptive.

@utdrmac
Copy link
Contributor Author

utdrmac commented Apr 10, 2022

You want a redesign that is not threaded because the log is too hard to understand? Is that correct?

I never said hard to understand. I said it was confusing to state the program was exiting/shutting down, when in fact it could be 10m until it stops. I'm also saying that the threaded nature provides absolutely zero benefit to the application's performance as a whole and only adds unnecessary code complexity. I'd estimate a good 500-900 lines of code could be removed and the performance/behavior would be exactly the same.

Changing the messaging would be a good band-aid so that it's no longer confusing.

@ericlavoie
Copy link
Contributor

Actually if more than one ops occurs during that span, it does make sense. From what I saw of the code, it would be a fair bit of redesign. There are IMO benefits to the multi threaded design in the way it works. All because the shutdown occurs 10 minutes after it says it is shutting down. Is it overly complex, yeah I agree. But a redesign would be as complex and doing sequential with no thread will bring problems of it's own.

@nicolasochem
Copy link
Contributor

👍 I never understood the benefits of producer/consumer. It's like we are forcing ourselves to do it async processing in what is fundamentally a synchronous path. Step 1: calculate the rewards. Step 2: pay them. Why do we have this in the first place?

I could see paying several cycles in a row, in this case you calculate all cycles in a sequence and you pay all batches in a sequence, in parallel. So you can save time here. But what's the point? Paying several cycles in a row is dangerous and unusual, not something we want to make faster.

@ericlavoie
Copy link
Contributor

Maybe it would male sense, if you ran payment for more than one baker on one system (it does not) It could be the reason it was designed that way.

@nicolasochem
Copy link
Contributor

@habanoz can you please provide historical context on this? Why do we have a producer and a consumer?

@habanoz
Copy link
Contributor

habanoz commented Apr 12, 2022

@habanoz can you please provide historical context on this? Why do we have a producer and a consumer?

It was designed this way from the very beginning. TRD is not designed solely for single baker single cycle payments. Multiple bakers support was on the road map but eventually had to be dropped. (see #15 )

P/C is preferred due to its extensible and scalable nature. Another benefit is concern separation:

Producers
As far as I can remember there are three producers:

  1. Reward producer (aka payment producer): There can be more than one pending cycle to pay even for a single baker. Reward calculation can be slow and may benefit from asynchronous behavior.
  2. Retry producer: This one runs parallel to the reward producer and tries to complete failed payments.
  3. Batch producer: This one is provided as part of the batch payment utility.

Consumer
There is only one consumer. Payment consumer only knows how to pay. It is only interested in addresses and amounts. It produces a report of payments attempted.

Regards

nicolasochem added a commit that referenced this issue Aug 22, 2023
We added code in #653 to have TRD send a non-zero exit status when
anything wrong happens - including payout failures, or (relevant here)
keyboard interrupts.

However, it turns out that when using TRD in runmode -2, -3, -4, the
regular course of operations is to have the producer thread send a
KeyboardInterrupt when done... which is then treated as error!

This is done with `_thread.interrupt_main()`. It's bad: humans send
interrupts, programs should not. Instead, I'm suggesting that the
producer use `SIGUSR1` to trigger a normal interruption.

This triggers the same code path as today, just with an exit status of
SUCCESS. So I expect that the main thread will still wait for the
consumer, and it won't actually exit before payouts are done, if there
are any.

I'll deploy in a small baker and see if that's true. I have already
verified that when no payouts are due, exit code is now zero.

For the record, I still believe this entire producer/consumer logic
should be thrown away and we should simply do things in order
(calculate, then pay). I've ranted about this before (#491).
jdsika added a commit that referenced this issue Feb 7, 2024
* fix #678 - Fix single-shot mode always exiting in error (#679)
   * We added code in #653 to have TRD send a non-zero exit status when anything wrong happens - including payout failures, or (relevant here) keyboard interrupts.
   * However, it turns out that when using TRD in runmode -2, -3, -4, the regular course of operations is to have the producer thread send a KeyboardInterrupt when done... which is then treated as error!
   * This is done with `_thread.interrupt_main()`. It's bad: humans send interrupts, programs should not. Instead, I'm suggesting that the producer use `SIGUSR1` to trigger a normal interruption.
   * This triggers the same code path as today, just with an exit status of SUCCESS. So I expect that the main thread will still wait for the consumer, and it won't actually exit before payouts are done, if there are any.
   * For the record, I still believe this entire producer/consumer logic should be thrown away and we should simply do things in order (calculate, then pay). I've ranted about this before (#491).
* catch the consumer failures and fail the main process accordingly
* Also propagate error for producer failures (e.g. tzkt unresponsive)
* Contributor: nicolasochem, Effort=3h
* Reviewer: jdsika, Effort=0.5h
---------

Co-authored-by: Carlo van Driesten <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

5 participants