
Resource limit #55

Draft: wants to merge 15 commits into master

Conversation

mrputty (Contributor) commented Oct 18, 2020

Before going further with this, I thought I'd post a draft for design review. Does what I have here make sense? It meets all my requirements, and though this isn't fully backwards compatible, the changes required to integrate it into my project were straightforward.

Suggested order for review:

  1. src/OddJobs/Types.hs
  2. src/OddJobs/ConfigBuilder.hs
  3. src/OddJobs/Job.hs
  4. Everything else

Update 2020-10-20: I decided to move forward with some additional work, so I'll keep pushing commits to this review as I go. @saurabhnanda, I've merged your fix/tests branch from #4 into this branch so I could get the existing tests updated for my changes.

saurabhnanda and others added 4 commits October 16, 2020 23:34
LogJobStart was not being logged - FIXED
…saurabhnanda#38)

Many small changes were required, but the majority were mechanical
changes. With the exception of the `createJob`/`scheduleJob` family of
functions, this should be mostly backwards compatible with existing
user code that does not use "internal" functions. Further discussion
about how to handle backwards compatibility is needed.

Note also that this commit by itself is NOT compatible with existing
databases. A future commit will address migration from existing to new
database definitions.

The intention is for this feature to have a negligible impact on
performance when the resource limiting features are not being used,
and only a minor impact when they are used. However, performance
testing has not yet been done.

Material changes include:

- New resource table
- New `resource_id` column in the job table
- New `cfgDefaultResourceLimit` configuration value
- New `createResourceJob` and `scheduleResourceJob` functions
- New `TableNames` type to carry the now multiple values needed
- Updated existing functions and queries to account for these changes
- Updated dequeueing queries to implement resource concurrency limits
- New `DevelDb` module to the `devel` executable to help ad hoc testing

TODO:
- Resource record management functions
- Migration functionality (need to discuss database versioning)
- Unit tests
- Performance tests
- Documentation updates
- Resource management UI
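For orientation, the schema changes listed above could look roughly like the following. This is a hypothetical sketch only: the table and column names are inferred from the change list, not taken from the PR's actual migration.

```haskell
-- Hypothetical DDL for the new resource table and the new `resource_id`
-- column on the job table; names are illustrative, not the PR's actual
-- migration.
createResourceTableSql :: String
createResourceTableSql = unlines
  [ "create table resources"
  , "  ( resource_id    text primary key"
  , "  , resource_limit int not null"
  , "  );"
  ]

addResourceColumnSql :: String
addResourceColumnSql =
  "alter table jobs add column resource_id text references resources(resource_id);"

main :: IO ()
main = putStrLn createResourceTableSql >> putStrLn addResourceColumnSql
```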
@@ -0,0 +1,49 @@
{-# LANGUAGE DeriveAnyClass #-}
mrputty (Contributor Author):

This stuff was helpful for me during testing so I thought I'd leave it here.

@@ -35,8 +35,22 @@ import Control.Monad.Logger (LogLevel)
-- @
type TableName = PGS.Query
mrputty (Contributor Author):

The changes for #53 will need to be integrated here.

pgEventName :: TableName -> Query
pgEventName tname = "job_created_" <> tname
-- | Specifies the table names used for job management.
data TableNames = TableNames
mrputty (Contributor Author):

This seemed the obvious name, but if there are any future features that might require more database configuration than just table names, it might be worthwhile to make this name more general now.

saurabhnanda (Owner):

Simpler to add this in the top-level Config object with cfgJobTableName being mandatory, and cfgJobResourceTableName being a Maybe x?
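A minimal sketch of that suggestion, where resource limiting is opt-in per config. Only the two field names come from the comment; the record and helper are illustrative:

```haskell
-- Sketch: job table name mandatory, resource table optional (Maybe), so
-- the resource feature is enabled only when a resource table is given.
data Config = Config
  { cfgJobTableName         :: String
  , cfgJobResourceTableName :: Maybe String
  }

resourceLimitingEnabled :: Config -> Bool
resourceLimitingEnabled cfg =
  case cfgJobResourceTableName cfg of
    Nothing -> False
    Just _  -> True

main :: IO ()
main = print (resourceLimitingEnabled (Config "jobs" Nothing))
```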

newtype JobRunnerName = JobRunnerName { unJobRunnerName :: Text } deriving (Eq, Show, FromField, ToField, Generic, ToJSON, FromJSON)

newtype ResourceId = ResourceId { unResourceId :: Text } deriving (Eq, Show, FromField, ToField, Generic, ToJSON, FromJSON)
mrputty (Contributor Author):

TODO: Add a comment on the type explaining its purpose as an opaque identifier.

saurabhnanda (Owner):

Style nitpick - I've started preferring rawX instead of unX, so rawResourceId

, cfgOnJobTimeout = (const $ pure ())
, cfgConcurrencyControl = ccControl
, cfgDefaultResourceLimit = 1
mrputty (Contributor Author):

My thought is that 1 (meaning jobs with a resource with no limit would "allow execution with minimal impact") is a reasonable default. The other more or less reasonable defaults are 0 (meaning these jobs would never run) and maxBound (meaning there would effectively be no limit to the concurrency for these jobs). Thoughts?
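The three candidate defaults can be read off a pure mirror of the SQL `coalesce` fallback (the function name here is hypothetical, for illustration only):

```haskell
import Data.Maybe (fromMaybe)

-- Mirrors coalesce(resource_limit, default): a job whose resource has no
-- explicit limit gets the configured default limit.
effectiveLimit :: Maybe Int -> Int -> Int
effectiveLimit explicitLimit def = fromMaybe def explicitLimit

main :: IO ()
main = do
  print (effectiveLimit Nothing 0)         -- default 0: such jobs never run
  print (effectiveLimit Nothing 1)         -- default 1: minimal concurrency
  print (effectiveLimit Nothing maxBound)  -- default maxBound: effectively unlimited
  print (effectiveLimit (Just 5) 1)        -- an explicit limit always wins
```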

@@ -63,14 +63,15 @@ mkConfig logger tname dbpool ccControl jrunner configOverridesFn =
, cfgDbPool = dbpool
, cfgOnJobStart = (const $ pure ())
, cfgDefaultMaxAttempts = 10
, cfgTableName = tname
, cfgTableNames = simpleTableNames tname
mrputty (Contributor Author):

This allows backwards compatibility, assuming that for simple use cases, users don't plan on using resources and don't care about controlling every table name.

"update " <> tnJob tnames <>
"\n set status = ?, locked_at = now(), locked_by = ?, attempts=attempts+1" <>
"\nwhere id in ( select id from " <> tnJob tnames <> " j_out" <>
"\n where (run_at<=now() AND (status in ? OR (status = ? and locked_at < now() - ? * interval '1 second')))" <>
mrputty (Contributor Author):

I am strongly conditioned, by painful experience, to do datetime evaluation either on the database client or on the database server, but not both, wherever possible. Here, I've moved all the datetime logic into the query.

saurabhnanda (Owner):

I agree about using the DB as the single source of truth for current_timestamp.

"\n and j_in.locked_at >= now() - ? * interval '1 second' )" <>
"\n < coalesce(( select resource_limit from " <> tnResource tnames <> " r" <>
"\n where r.resource_id = j_out.resource_id ), ?)" <>
"\n end" <>
mrputty (Contributor Author):

This is the core of the feature. You can see that in the case where there is no resource, performance should not really be affected. The sum and the resource limit lookup are both supported by indices, so hopefully they are also performant, but testing is needed.

By doing this all within the query, there is no need for any additional "bookkeeping" (and no need for the transaction management issues that would bring). This is why I was adamant about doing this inside the library. Of course all of this could be externalized, but, like transaction management, that could be rather disruptive.
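For readers skimming the diff, the gate being described has roughly this shape. This is a simplified sketch with assumed table names (`jobs`, `resources`) and without the lock-timeout window; the real query is parameterized and built from `TableNames`:

```haskell
import Data.List (isInfixOf)

-- Simplified shape of the resource gate: a candidate row is eligible when
-- it has no resource, or when the count of currently locked jobs on the
-- same resource is below coalesce(resource_limit, default).
resourceGateSql :: String
resourceGateSql = unlines
  [ "j_out.resource_id is null"
  , "or ( select count(*) from jobs j_in"
  , "     where j_in.resource_id = j_out.resource_id"
  , "       and j_in.status = 'locked' )"
  , "   < coalesce(( select resource_limit from resources r"
  , "                where r.resource_id = j_out.resource_id ), 1)"
  ]

main :: IO ()
main = putStr resourceGateSql
```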

saurabhnanda (Owner):

Let's discuss just this SQL query first. The other stuff is about naming, style nit-picks, etc.

@@ -532,21 +557,33 @@ jobEventListener :: (HasJobRunner m)
jobEventListener = do
log LevelInfo $ LogText "Starting the job monitor via LISTEN/NOTIFY..."
pool <- getDbPool
tname <- getTableName
mrputty (Contributor Author):

Here too, I moved the datetime logic into the database.

mrputty (Contributor Author) left a comment:

Suggested order for review:

  1. src/OddJobs/Types.hs
  2. src/OddJobs/ConfigBuilder.hs
  3. src/OddJobs/Job.hs
  4. Everything else

mrputty mentioned this pull request Oct 18, 2020
kanagarajkm and others added 11 commits October 19, 2020 16:07
* setup a simple CI using GitHub actions

* describe a simple dev setup process

* build branches with pull requests against master, run build on multiple operating systems

* trigger CI for all branches

* always checkout the ref

* build only on pull_request events not to duplicate builds

* build only on Ubuntu
Because PostgreSQL limits notification payloads to 8000 bytes by
default, passing the full job record in the notification artificially
limits the job payload size.  By passing only required fields in the
notification, this limitation is removed.
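As an illustration of the approach (the field set and hand-rolled encoding here are assumptions, not the PR's actual payload):

```haskell
-- Build a small NOTIFY payload carrying only what's needed to locate the
-- job, instead of the full row, so PostgreSQL's 8000-byte notification
-- limit no longer constrains the job's own payload size.
notifyPayload :: Int -> String -> String
notifyPayload jobId runAt =
  "{\"id\":" <> show jobId <> ",\"run_at\":\"" <> runAt <> "\"}"

main :: IO ()
main = putStrLn (notifyPayload 42 "2020-10-19T16:07:00Z")
```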
…saurabhnanda#38)

Because there are unit tests that delete tables, and now that there are
multiple table names in play, it seemed useful to explicitly implement
a deletion function. Because this deserved a comment, I also added
comments to the other externally useful functions in this module, and
rearranged the file a bit to highlight these definitions.
Now, if the environment variable "ODD_JOBS_TEST_DB_CONNECT" is set, it
will be used as the connection string for unit tests. If not, the
existing hard-coded default connection will be used.

This is useful when developing in a containerized environment.
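The lookup described above amounts to the following; the default string here is a placeholder, not the project's actual hard-coded connection string:

```haskell
import Data.Maybe (fromMaybe)
import System.Environment (lookupEnv)

-- Pure part: use the env value when present, else the hard-coded default.
connStringFrom :: Maybe String -> String
connStringFrom = fromMaybe "host=localhost dbname=odd_jobs_test"

-- IO wrapper consulted during test setup.
testDbConnString :: IO String
testDbConnString = connStringFrom <$> lookupEnv "ODD_JOBS_TEST_DB_CONNECT"

main :: IO ()
main = testDbConnString >>= putStrLn
```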
This includes the changes that are on `fix/tests` branch.
saurabhnanda (Owner) left a comment:

Do we have an explain analyze for the before/after versions of jobPollingSql? Let's start with the perf impact there.


@@ -158,9 +172,10 @@ instance FromJSON Status where
Left e -> fail e
Right r -> pure r


newtype JobRunnerName = JobRunnerName { unJobRunnerName :: Text } deriving (Eq, Show, FromField, ToField, Generic, ToJSON, FromJSON)
saurabhnanda (Owner):

TODO: Change to rawJobRunnerName

-- that unlike the limit of `cfgConcurrencyControl`, this IS a global limit
-- across the entire job-queue. This means that this limit will be applied
-- regardless of what machine the jobs are running on.
, cfgDefaultResourceLimit :: Int
saurabhnanda (Owner):

Another thought:

cfgPerResourceConcurrencySettings :: Maybe (TableName, Int)

"update " <> tnJob tnames <>
"\n set status = ?, locked_at = now(), locked_by = ?, attempts=attempts+1" <>
"\nwhere id in ( select id from " <> tnJob tnames <> " j_out" <>
"\n where (run_at<=now() AND (status in ? OR (status = ? and locked_at < now() - ? * interval '1 second')))" <>
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree about using the DB as the single source of truth for current_timetamp.

"\n and j_in.locked_at >= now() - ? * interval '1 second' )" <>
"\n < coalesce(( select resource_limit from " <> tnResource tnames <> " r" <>
"\n where r.resource_id = j_out.resource_id ), ?)" <>
"\n end" <>
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's discuss just this SQL query first. The other stuff is about naming, style nit-picks, etc.

saurabhnanda (Owner):

Apart from the perf concerns, I was also wondering if there is any way we can support a simpler use-case: concurrency limits on a job-type basis. With this PR as-is, users who have a simpler requirement of throttling certain job-types will have to take an additional step of attaching a resource-id to each job.

mrputty (Contributor Author) commented Oct 22, 2020

(@saurabhnanda, I missed this earlier, and I don't know how to reply to this inline above, so let's move more general stuff here.)

Apart from the perf concerns, I was also wondering if there is any way we can support a simpler use-case: concurrency limits on a job-type basis. With this PR as-is, users who have a simpler requirement of throttling certain job-types will have to take an additional step of attaching a resource-id to each job.

Yes, there would have to be an explicit resource created for each throttled job type. Where would the resource/job limit be specified and stored if not for that? I suppose you could have a helper function that would allow a user to declare a job-level limit in code, I guess as part of their code that manages migrations?

Hmm, I'm having trouble wrapping my head around how this would work. Could you flesh out the requirements of what you would like to see?
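One possible shape for the helper being mused about here (entirely hypothetical, not code from the PR): derive the resource id deterministically from the job type, so throttling a job type reduces to creating a single resource row.

```haskell
newtype ResourceId = ResourceId { unResourceId :: String } deriving (Eq, Show)

-- Hypothetical helper: one resource per job type, with a stable id, so a
-- per-job-type concurrency limit reuses the resource mechanism unchanged.
jobTypeResource :: String -> ResourceId
jobTypeResource jobType = ResourceId ("job-type:" <> jobType)

main :: IO ()
main = print (jobTypeResource "SendEmail")
```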

mrputty (Contributor Author) commented Nov 5, 2020

I have spent a bit of time validating the performance of the core query changes. I have an updated query and will post some analysis soon. Unfortunately, I've been distracted by other work and looming deadlines. I will get back to this as soon as I can.

ivb-supercede added a commit to SupercedeTech/odd-jobs that referenced this pull request Sep 2, 2021
This is based on the system in saurabhnanda#55, and aims to
address saurabhnanda#38, where jobs need to have some limited
access to resources that control how many can run simultaneously.

Unlike that PR, this implements a system where jobs can require access
to 0 or more resources, with different amounts of usage. This is because
of a business need for jobs to be able to require multiple resources.

The implementation is intended to have no performance impact on existing
code wherever the user has not opted in to resource-based concurrency.
This is done by having parallel implementations wherever necessary that
switch based on the concurrency control chosen.

This subsumes and replaces the job-type-based concurrency control,
because it is possible to model that system using job types as
resources.
ivb-supercede added a commit to SupercedeTech/odd-jobs that referenced this pull request Sep 2, 2021
ivb-supercede added a commit to SupercedeTech/odd-jobs that referenced this pull request Feb 1, 2023