Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: tasks container gets stuck on updating remote instances nightly #5112

Open
5 tasks done
Illecors opened this issue Oct 9, 2024 · 8 comments
Open
5 tasks done
Labels
bug Something isn't working

Comments

@Illecors
Copy link

Illecors commented Oct 9, 2024

Requirements

  • Is this a bug report? For questions or discussions use https://lemmy.ml/c/lemmy_support
  • Did you check to see if this issue already exists?
  • Is this only a single bug? Do not put multiple bugs in one issue.
  • Do you agree to follow the rules in our Code of Conduct?
  • Is this a backend issue? Use the lemmy-ui repo for UI / frontend issues.

Summary

Every midnight the scheduled daily tasks kick off and get stuck at the same spot. Per @Nutomic suggestion I have patched the scheduler to be more verbose. Attaching the diff, the log at midnight and the log at point of getting stuck in the technical details field.

The tasks container stays stuck indefinitely. In this particular case I have restarted it after ~8 hours, but I had it stuck in this state for 5 days when I was busy and it just wouldn't come back into action on its own.

I'm also aware from the chat on matrix that nobody else seems be experiencing this.

Steps to Reproduce

  1. Run scheduled tasks in a separate container
  2. Wait till midnight on the server timezone for daily tasks to kick in
  3. Get stuck

Technical Details

diff --git a/src/scheduled_tasks.rs b/src/scheduled_tasks.rs
index 2885b5f74..77a96af72 100644
--- a/src/scheduled_tasks.rs
+++ b/src/scheduled_tasks.rs
@@ -101,8 +101,11 @@ pub async fn setup(context: LemmyContext) -> LemmyResult<()> {

   // Manually run the scheduler in an event loop
   loop {
+    info!("Beginning of loop");
     scheduler.run_pending().await;
+    info!("Middle of loop");
     tokio::time::sleep(Duration::from_millis(1000)).await;
+    info!("End of loop");
   }
 }
lemmy-tasks-blue-1  | 2024-10-08T23:59:54.456075Z  INFO lemmy_server::scheduled_tasks: End of loop
lemmy-tasks-blue-1  | 2024-10-08T23:59:54.456231Z  INFO lemmy_server::scheduled_tasks: Beginning of loop
lemmy-tasks-blue-1  | 2024-10-08T23:59:54.456416Z  INFO lemmy_server::scheduled_tasks: Middle of loop
lemmy-tasks-blue-1  | 2024-10-08T23:59:55.458001Z  INFO lemmy_server::scheduled_tasks: End of loop
lemmy-tasks-blue-1  | 2024-10-08T23:59:55.458069Z  INFO lemmy_server::scheduled_tasks: Beginning of loop
lemmy-tasks-blue-1  | 2024-10-08T23:59:55.458160Z  INFO lemmy_server::scheduled_tasks: Middle of loop
lemmy-tasks-blue-1  | 2024-10-08T23:59:56.459704Z  INFO lemmy_server::scheduled_tasks: End of loop
lemmy-tasks-blue-1  | 2024-10-08T23:59:56.459761Z  INFO lemmy_server::scheduled_tasks: Beginning of loop
lemmy-tasks-blue-1  | 2024-10-08T23:59:56.459853Z  INFO lemmy_server::scheduled_tasks: Middle of loop
lemmy-tasks-blue-1  | 2024-10-08T23:59:57.460583Z  INFO lemmy_server::scheduled_tasks: End of loop
lemmy-tasks-blue-1  | 2024-10-08T23:59:57.460671Z  INFO lemmy_server::scheduled_tasks: Beginning of loop
lemmy-tasks-blue-1  | 2024-10-08T23:59:57.460759Z  INFO lemmy_server::scheduled_tasks: Middle of loop
lemmy-tasks-blue-1  | 2024-10-08T23:59:58.462408Z  INFO lemmy_server::scheduled_tasks: End of loop
lemmy-tasks-blue-1  | 2024-10-08T23:59:58.462486Z  INFO lemmy_server::scheduled_tasks: Beginning of loop
lemmy-tasks-blue-1  | 2024-10-08T23:59:58.462565Z  INFO lemmy_server::scheduled_tasks: Middle of loop
lemmy-tasks-blue-1  | 2024-10-08T23:59:59.463334Z  INFO lemmy_server::scheduled_tasks: End of loop
lemmy-tasks-blue-1  | 2024-10-08T23:59:59.463420Z  INFO lemmy_server::scheduled_tasks: Beginning of loop
lemmy-tasks-blue-1  | 2024-10-08T23:59:59.463545Z  INFO lemmy_server::scheduled_tasks: Middle of loop
lemmy-tasks-blue-1  | 2024-10-09T00:00:00.464812Z  INFO lemmy_server::scheduled_tasks: End of loop
lemmy-tasks-blue-1  | 2024-10-09T00:00:00.464941Z  INFO lemmy_server::scheduled_tasks: Beginning of loop
lemmy-tasks-blue-1  | 2024-10-09T00:00:00.469886Z  INFO lemmy_server::scheduled_tasks: Updating active site and community aggregates ...
lemmy-tasks-blue-1  | 2024-10-09T00:00:00.471131Z DEBUG tokio_postgres::query: executing statement s3 with parameters: [ToSqlHelper(PgTypeMetadata(Ok(InnerPgTypeMetadata { oid: 23, array_oid: 1007 })), Some([0, 0, 0, 1]))]
lemmy-tasks-blue-1  | 2024-10-09T00:00:00.472182Z  INFO lemmy_server::scheduled_tasks: Updating hot ranks for all history...
lemmy-tasks-blue-1  | 2024-10-09T00:00:00.474675Z  INFO lemmy_server::scheduled_tasks: Overwriting deleted posts...
lemmy-tasks-blue-1  | 2024-10-09T00:11:14.811490Z  INFO lemmy_server::scheduled_tasks: Finished process_hot_ranks_in_batches execution for community (processed 18 rows)
lemmy-tasks-blue-1  | 2024-10-09T00:11:14.811643Z  INFO lemmy_server::scheduled_tasks: Finished hot ranks update!
lemmy-tasks-blue-1  | 2024-10-09T00:11:14.812197Z DEBUG tokio_postgres::query: executing statement s3 with parameters: [ToSqlHelper(PgTypeMetadata(Ok(InnerPgTypeMetadata { oid: 23, array_oid: 1007 })), Some([0, 0, 0, 1]))]
lemmy-tasks-blue-1  | 2024-10-09T00:11:14.813463Z DEBUG tokio_postgres::query: executing statement s213 with parameters: [ToSqlHelper(PgTypeMetadata(Ok(InnerPgTypeMetadata { oid: 1186, array_oid: 1187 })), Some([0, 0, 0, 0, 35, 195, 70, 0, 0, 0, 0, 0, 0, 0, 0, 0]))]
lemmy-tasks-blue-1  | 2024-10-09T00:11:14.814833Z  INFO lemmy_server::scheduled_tasks: Done.
lemmy-tasks-blue-1  | 2024-10-09T00:11:14.814954Z DEBUG tokio_postgres::query: executing statement s3 with parameters: [ToSqlHelper(PgTypeMetadata(Ok(InnerPgTypeMetadata { oid: 23, array_oid: 1007 })), Some([0, 0, 0, 1]))]
lemmy-tasks-blue-1  | 2024-10-09T00:11:14.823327Z DEBUG tokio_postgres::prepare: preparing query s6000 with types [Text, Text, Bool, Interval, Text]: UPDATE "post" SET "body" = $1, "name" = $2 WHERE ((("post"."deleted" = $3) AND ("post"."updated" < (CURRENT_TIMESTAMP - $4))) AND ("post"."body" != $5))
lemmy-tasks-blue-1  | 2024-10-09T00:11:14.825148Z DEBUG tokio_postgres::query: executing statement s6000 with parameters: [ToSqlHelper(PgTypeMetadata(Ok(InnerPgTypeMetadata { oid: 25, array_oid: 1009 })), Some([42, 80, 101, 114, 109, 97, 110, 101, 110, 116, 108, 121, 32, 68, 101, 108, 101, 116, 101, 100, 42])), ToSqlHelper(PgTypeMetadata(Ok(InnerPgTypeMetadata { oid: 25, array_oid: 1009 })), Some([42, 80, 101, 114, 109, 97, 110, 101, 110, 116, 108, 121, 32, 68, 101, 108, 101, 116, 101, 100, 42])), ToSqlHelper(PgTypeMetadata(Ok(InnerPgTypeMetadata { oid: 16, array_oid: 1000 })), Some([1])), ToSqlHelper(PgTypeMetadata(Ok(InnerPgTypeMetadata { oid: 1186, array_oid: 1187 })), Some([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1])), ToSqlHelper(PgTypeMetadata(Ok(InnerPgTypeMetadata { oid: 25, array_oid: 1009 })), Some([42, 80, 101, 114, 109, 97, 110, 101, 110, 116, 108, 121, 32, 68, 101, 108, 101, 116, 101, 100, 42]))]
lemmy-tasks-blue-1  | 2024-10-09T00:11:15.339651Z  INFO lemmy_server::scheduled_tasks: Done.
lemmy-tasks-blue-1  | 2024-10-09T00:11:15.339703Z  INFO lemmy_server::scheduled_tasks: Overwriting deleted comments...
lemmy-tasks-blue-1  | 2024-10-09T00:11:15.342155Z DEBUG tokio_postgres::prepare: preparing query s6001 with types [Text, Bool, Interval, Text]: UPDATE "comment" SET "content" = $1 WHERE ((("comment"."deleted" = $2) AND ("comment"."updated" < (CURRENT_TIMESTAMP - $3))) AND ("comment"."content" != $4))
lemmy-tasks-blue-1  | 2024-10-09T00:11:15.343474Z DEBUG tokio_postgres::query: executing statement s6001 with parameters: [ToSqlHelper(PgTypeMetadata(Ok(InnerPgTypeMetadata { oid: 25, array_oid: 1009 })), Some([42, 80, 101, 114, 109, 97, 110, 101, 110, 116, 108, 121, 32, 68, 101, 108, 101, 116, 101, 100, 42])), ToSqlHelper(PgTypeMetadata(Ok(InnerPgTypeMetadata { oid: 16, array_oid: 1000 })), Some([1])), ToSqlHelper(PgTypeMetadata(Ok(InnerPgTypeMetadata { oid: 1186, array_oid: 1187 })), Some([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1])), ToSqlHelper(PgTypeMetadata(Ok(InnerPgTypeMetadata { oid: 25, array_oid: 1009 })), Some([42, 80, 101, 114, 109, 97, 110, 101, 110, 116, 108, 121, 32, 68, 101, 108, 101, 116, 101, 100, 42]))]
lemmy-tasks-blue-1  | 2024-10-09T00:11:27.800172Z  INFO lemmy_server::scheduled_tasks: Done.
lemmy-tasks-blue-1  | 2024-10-09T00:11:27.801892Z DEBUG tokio_postgres::query: executing statement s3 with parameters: [ToSqlHelper(PgTypeMetadata(Ok(InnerPgTypeMetadata { oid: 23, array_oid: 1007 })), Some([0, 0, 0, 1]))]
lemmy-tasks-blue-1  | 2024-10-09T00:11:27.804195Z DEBUG tokio_postgres::prepare: preparing query s6002 with types [Bool, Bool]: UPDATE "person" SET "banned" = $1 WHERE (("person"."banned" = $2) AND ("person"."ban_expires" < CURRENT_TIMESTAMP))
lemmy-tasks-blue-1  | 2024-10-09T00:11:27.805885Z DEBUG tokio_postgres::query: executing statement s6002 with parameters: [ToSqlHelper(PgTypeMetadata(Ok(InnerPgTypeMetadata { oid: 16, array_oid: 1000 })), Some([0])), ToSqlHelper(PgTypeMetadata(Ok(InnerPgTypeMetadata { oid: 16, array_oid: 1000 })), Some([1]))]
lemmy-tasks-blue-1  | 2024-10-09T00:11:29.246266Z DEBUG tokio_postgres::query: executing statement s117 with parameters: []
lemmy-tasks-blue-1  | 2024-10-09T00:11:29.248486Z DEBUG tokio_postgres::query: executing statement s3 with parameters: [ToSqlHelper(PgTypeMetadata(Ok(InnerPgTypeMetadata { oid: 23, array_oid: 1007 })), Some([0, 0, 0, 1]))]
lemmy-tasks-blue-1  | 2024-10-09T00:11:29.255067Z DEBUG tokio_postgres::query: executing statement s122 with parameters: [ToSqlHelper(PgTypeMetadata(Ok(InnerPgTypeMetadata { oid: 1186, array_oid: 1187 })), Some([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 0, 0, 0, 0]))]
lemmy-tasks-blue-1  | 2024-10-09T00:11:29.263052Z  INFO lemmy_server::scheduled_tasks: Done.
lemmy-tasks-blue-1  | 2024-10-09T00:11:29.265113Z  INFO lemmy_server::scheduled_tasks: Updating instances software and versions...
lemmy-tasks-blue-1  | 2024-10-09T00:11:29.265654Z DEBUG tokio_postgres::query: executing statement s3 with parameters: [ToSqlHelper(PgTypeMetadata(Ok(InnerPgTypeMetadata { oid: 23, array_oid: 1007 })), Some([0, 0, 0, 1]))]
lemmy-tasks-blue-1  | 2024-10-09T00:11:29.271431Z DEBUG tokio_postgres::prepare: preparing query s6003: SELECT "instance"."id", "instance"."domain", "instance"."published", "instance"."updated", "instance"."software", "instance"."version" FROM "instance"
lemmy-tasks-blue-1  | 2024-10-09T00:11:29.280640Z DEBUG tokio_postgres::query: executing statement s6003 with parameters: []
lemmy-tasks-blue-1  | 2024-10-09T00:11:30.123936Z DEBUG HTTP request{http.method=GET http.scheme=https http.host=secluded.ch net.host.port=0 otel.kind="client" otel.name=GET}: reqwest::connect: starting new connection: https://secluded.ch/
lemmy-tasks-blue-1  | 2024-10-09T00:11:30.129713Z DEBUG hyper::client::connect::dns: resolving host="secluded.ch"
lemmy-tasks-blue-1  | 2024-10-09T00:11:30.253521Z DEBUG HTTP request{http.method=GET http.scheme=https http.host=secluded.ch net.host.port=0 otel.kind="client" otel.name=GET}: hyper::client::connect::http: connecting to 185.173.128.28:443
lemmy-tasks-blue-1  | 2024-10-09T00:11:30.277869Z DEBUG HTTP request{http.method=GET http.scheme=https http.host=secluded.ch net.host.port=0 otel.kind="client" otel.name=GET}: hyper::client::connect::http: connected to 185.173.128.28:443
lemmy-tasks-blue-1  | 2024-10-09T00:11:30.506587Z DEBUG hyper::proto::h1::io: flushed 140 bytes
lemmy-tasks-blue-1  | 2024-10-09T00:11:30.550820Z DEBUG hyper::proto::h1::io: parsed 19 headers
lemmy-tasks-blue-1  | 2024-10-09T00:11:30.551424Z DEBUG hyper::proto::h1::conn: incoming body is content-length (111 bytes)
lemmy-tasks-blue-1  | 2024-10-09T00:11:30.553105Z DEBUG hyper::proto::h1::conn: incoming body completed
lemmy-tasks-blue-1  | 2024-10-09T00:11:30.554046Z DEBUG HTTP request{http.method=GET http.scheme=https http.host=secluded.ch net.host.port=0 otel.kind="client" otel.name=GET}: hyper::client::pool: pooling idle connection for ("https", secluded.ch)
lemmy-tasks-blue-1  | 2024-10-09T00:11:30.577242Z DEBUG HTTP request{http.method=GET http.scheme=https http.host=secluded.ch net.host.port=0 otel.kind="client" otel.name=GET}: hyper::client::pool: reuse idle connection for ("https", secluded.ch)
lemmy-tasks-blue-1  | 2024-10-09T00:11:30.604265Z DEBUG hyper::proto::h1::io: flushed 132 bytes
lemmy-tasks-blue-1  | 2024-10-09T00:11:30.616997Z DEBUG hyper::proto::h1::io: parsed 19 headers
lemmy-tasks-blue-1  | 2024-10-09T00:11:30.617054Z DEBUG hyper::proto::h1::conn: incoming body is content-length (258 bytes)
lemmy-tasks-blue-1  | 2024-10-09T00:11:30.617136Z DEBUG hyper::proto::h1::conn: incoming body completed
lemmy-tasks-blue-1  | 2024-10-09T00:11:30.617359Z DEBUG HTTP request{http.method=GET http.scheme=https http.host=secluded.ch net.host.port=0 otel.kind="client" otel.name=GET}: hyper::client::pool: pooling idle connection for ("https", secluded.ch)

Version

0.19.5

Lemmy Instance URL

lemmy.cafe

@Illecors Illecors added the bug Something isn't working label Oct 9, 2024
@Nutomic
Copy link
Member

Nutomic commented Oct 10, 2024

So the loop stops at midnight, but the process keeps handling HTTP requests? And there is no error message in between? Thats very weird, I cant think of any reason why this would happen.

@Illecors
Copy link
Author

This is just the tasks container - does it actually handle any requests?

I have - for now - set up a cron that restarts the tasks container at 00:30 just to keep the other tasks applied, but this means that my instance always thinks 98% of the fediverse is dead as no other interaction happens to keep them "updated".

@Nutomic
Copy link
Member

Nutomic commented Oct 10, 2024

No I misread, the second log shows some outgoing HTTP requests. Seems that the loop is stopped, but some scheduled tasks are still active at that point. The logs you shared are missing the most relevant part between 0:00 and 0:11, can you share that also?

@Illecors
Copy link
Author

Attaching the full log from midnight till it stops printing anything at all
midnight.log

@Nutomic
Copy link
Member

Nutomic commented Oct 10, 2024

So it prints "beginning of loop", but not "middle of loop" which indicates that the problem is with scheduler.run_pending().await;. Not sure what that could be, and unfortunately the library hasnt been maintained for over 2 years.

@dessalines
Copy link
Member

dessalines commented Oct 15, 2024

The only thing I can think of, is that you might be running lemmy manually with an older rust version that doesn't play well with the library or async. Or there is some misconfigured network setting causing the fetch to stall out.

@Illecors
Copy link
Author

It's literally whatever the dockerfile contains. It's also the same with the official 0.19.5 as well as 0.19.6-beta.9.

@dessalines
Copy link
Member

Hrm... if its docker then I can only think that its some network issue causing the job to stall out. It looks like we have reqwest timeouts on that client also though. Not really sure what it could be.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants