Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cursor Helper #491

Merged
merged 17 commits into from
Apr 5, 2024
Merged

Cursor Helper #491

merged 17 commits into from
Apr 5, 2024

Conversation

josephjclark
Copy link
Collaborator

@josephjclark josephjclark commented Mar 20, 2024

This PR adds a cursor() function designed to make the timestamp and pagination cursor pattern easier.

Closes #486

This approach leans heavily on input state and works best in v2.

See the docs PR at OpenFn/docs#465

General approach

Unlike v1, Lightning (and the CLI) give us good control of the input state. We need to use that.

So, if you want to run a job with the cursor in a specific place, just set the input to { cursor: '<date>' } and that will be respected by the job.

To make date management easier, you pass human friendly strings now, today, yesterday, n hours ago, n days ago, and start (the time at which the run started). Note that these will parse relative to system time - which in lightning's case probably means UTC.

Usage

Use the cursor function from common to set a cursor value on state.

cursor($.cursor, { defaultValue:  'today' })

This pattern will allow you to use the value on state if provided, but if no cursor is passed will default to "today".

You can update the cursor within the job by setting state.cursor. Or, if you want to use a natural language timestamp, use the cursor operation again.

cursor($.cursor, { defaultValue:  'today' })
fn(/* do something */)
cursor('now')

The value that gets written to state.cursor will be an actual ISO date, so when the next job runs, the cursor will be set to this start time.

Note that adaptors will need to be updated to export cursor from common.

Options

The second argument accepts options:

  • options.defaultValue is the value that will be used if the value argument is falsy
  • options.key lets you change the state key used (ie, use page instead of cursor)

Later we might introduce a timezone and a flag to disable natural language date parsing.

Usage examples

Basic usage: a cursor which starts from the start of the current day

cursor('today')
fn(() => { /* do the thing using state.cursor */ })

Typical usage: use the input state if provided, or else default to 'now'. At the end of the job, set the cursor to the start time, ready for the next run.

cursor($.cursor, { defaultValue: 'now' })
fn(() => { /* do the thing, use and set state.cursor */ })
cursor('start')

Pagination cursor which defaults to page 0

cursor($.cursor, { defaultValue: 0, key: 'page' })
fn(() => { /* do the thing, use and set state.page */ })

@josephjclark

This comment was marked as outdated.

@josephjclark josephjclark requested a review from mtuchi March 20, 2024 13:08
mtuchi

This comment was marked as resolved.

@josephjclark

This comment was marked as outdated.

@mtuchi
Copy link
Collaborator

mtuchi commented Apr 4, 2024

Hiya @josephjclark here is my feedback

1st. You need to expandReference on cursor value

cursor(state=> state.cursor || 'start'); Does not work because the value does not use expandReferences.

My quick fix was

 return state => {
    const [resolvedValue] = newExpandReferences(state, value);
    if (!cursorStart) {
      cursorStart = new Date().toISOString();
    }

    if (typeof resolvedValue === 'string') {
      state.cursor = parseDate(resolvedValue, cursorStart);
      console.log(`Setting cursor "${resolvedValue}" to ${state.cursor}`);
    } else {
      state.cursor = resolvedValue;
      console.log(`Setting cursor to ${resolvedValue}`);
    }

    return state;
  };

2nd. Inaccurate cursor dates for some values

My test code

cursor('start');
cursor('now');
cursor('today');
cursor('yesterday');
cursor('1 hour ago');
cursor('2 hours ago');
cursor('3 hours ago');
cursor('72 hours ago');
cursor('1 day ago');
cursor('2 days ago');
cursor('3 days ago');
cursor('end');

CLI logs

Setting cursor "start" to 2024-04-04T07:54:46.484Z
Setting cursor "now" to 2024-04-04T07:54:46.484Z
Setting cursor "today" to 2024-04-03T21:00:00.000Z ❌ 
Setting cursor "yesterday" to 2024-04-02T21:00:00.000Z ❌ 
Setting cursor "1 hour ago" to 2024-04-04T06:54:46.485Z
Setting cursor "2 hours ago" to 2024-04-04T05:54:46.485Z
Setting cursor "3 hours ago" to 2024-04-04T04:54:46.485Z
Setting cursor "72 hours ago" to 2024-04-01T07:54:46.485Z
Setting cursor "1 day ago" to 2024-04-03T21:00:00.000Z
Setting cursor "2 days ago" to 2024-04-03T21:00:00.000Z ❌ 
Setting cursor "3 days ago" to 2024-04-03T21:00:00.000Z ❌ 
Setting cursor "end" to 2024-04-04T07:54:46.485Z

❌ Means inaccurate

3rd. Timezone ?

This is more of a questions + suggestion

  • What timezone is used ?
  • Can i change a timezone ? or more important one do we want this ability and why ?

It will be best to log the timezone as well because the timestamp does not match my local time so i am not sure which timezone the cursor is using

@josephjclark
Copy link
Collaborator Author

Thank you @mtuchi - I'll look into your feedback. But what I really need to know is: does this actually help anything?

Maybe I can fix the issues and we can try and test some existing cursor jobs with it, see whether it helps at all. The date conversions in the water-aid (?) workflow we were looking at seem pretty hairy - we need to work out how this function can help

@mtuchi
Copy link
Collaborator

mtuchi commented Apr 4, 2024

For some reason i am tempted to ask for support for a callback in a cursor function. I think there was a case in AKF workflow were i need to create a cursor at the end of upsert operation then post it to inbox. The triggered workflow was using that cursor to clear some data in salesforce.

I guess my point here is the cursor that was created after the upsert operation is not related to that workflow. So the final state state.cursor should not use that cursor. I should be able to specify something like state.clearRecordsCursor when i am setting the cursor for the clear records workflow

@mtuchi
Copy link
Collaborator

mtuchi commented Apr 4, 2024

@josephjclark I think it's useful because it just cut down these lines of code 👇🏽 into one line of code

fn(state => {
  const manualCursor = '2023-01-15';
  const cursor = manualCursor ? new Date(manualCursor) : new Date();

  console.log(cursor, 'Cursor');

  return { ...state, cursor };
});

To

cursor(state => state.cursor || '2023-01-15');

Which can even be something like this 👇🏽 if initial input is { "manualCursor": "2023-01-15"}

cursor(state => state.cursor || state.manualCursor);

I think using the cursor() helper is way cleaner.

@mtuchi
Copy link
Collaborator

mtuchi commented Apr 4, 2024

I just got another light bulk moment 💡, Sometimes i need to set the cursor to todays date or a specific date but i don't need the timestamp. So i usually have this pattern new Date().toISOString().slice(0, 10);. This will give me today's date in a format of YYYY-MM-DD

I was wondering if the cursor() function have an options params cursor(value, options. callback= s=> s) This way i can do something like this cursor('today', {format: 'YYY-MM-DD'})

@josephjclark
Copy link
Collaborator Author

josephjclark commented Apr 4, 2024

@mtuchi I think you're starting to get it! 💡

I'm adding an options object shortly, so you'll be able to do this (which is super readable)

cursor($.cursor, { defaultValue: '2023-01-15' });

I really don't think you need to explicitly use a manual cursor key. But sure, if you absolutely need it, that pattern works.

tbh I can't see why you'd need a callback - it's a synchronous function after all. But you can set or clear it at the end of the job:

cursor()

Instead of new Date().toISOString().slice(0, 10), why woudn;t you just do setCursor('today') or setCursor('2024-03-12')? Both should work (all Date.parse() formats should work)

@mtuchi
Copy link
Collaborator

mtuchi commented Apr 4, 2024

My wishful thinking, i am not sure if it's a good pattern but here is how i would use the callback

cursor('now', { format: 'YYY-MM-DD' });
cursor(
  state => state?.manualCursor || state.cursor,
  { format: 'YYY-MM-DD' },
  state => ({
    ...state,
    matchExactDate: state.cursor === state.manualCursor,
  })
);

@josephjclark
Copy link
Collaborator Author

josephjclark commented Apr 4, 2024

Regarding timezones, well, you've hurt my brain 🤯

For now, all timezones will be in UTC, and I think we'll release in that state.

I would like to support this: cursor('today', { timezone: 'EST' }), which would set the timestamp to a UTC date representing midnight on the current day in US Eastern time.

But I have two problems with this:

  1. I have no idea how to do it (I've been looking at libraries a bit bit it feels like a very hard problem!)
  2. I'll bet most databases are in UTC anyway, so maybe a UTC timezone probably makes most sense.

We will need to document that if you cursor('today'), we'll take the current time in UTC and go back to the start of that day - which could be a different calendar day if you're in the East

I suppose this is an argument for cursor('24h') (EDIT oh, you can do cursor('24 hours ago'))

@mtuchi
Copy link
Collaborator

mtuchi commented Apr 4, 2024

On timezone, i think we should lock it to UTC or (which ever timezone lighting is using). And add in CLI logs the timezone.
Eg: Setting cursor "now" to 2024-04-04T12:07:08.775Z UTC Timezone

@josephjclark
Copy link
Collaborator Author

@mtuchi I'm still investigating but with regard to those time errors you reported, I actually think time zones are the cause! The dates you're seeing in the console are UTC times, not local times. So what it reports for "today" isn't the start of your local today, it's the start of your today in UTC time

🤯

I suppose when you run in the CLI, all these dates wil be in you sytem time (I've just verified this), so outputting the locale time, rather than UTC time, would make sense.

But when running in Lightning, and Lightning is on UTC time, these relative times get very complex

@mtuchi
Copy link
Collaborator

mtuchi commented Apr 4, 2024

@josephjclark by the way, not to add more complexity i saw this library that is related to date-fns timezone - https://www.npmjs.com/package/date-fns-tz i don't know if it will help or makes things more complicated

@josephjclark
Copy link
Collaborator Author

@mtuchi Thank you - I know about date-fns-tz and I've been looking at it this morning. I think it might be part of the solution, but I can't quite see how it goes yet.

Again, I think for now we just have to say "all dates are generated in UTC". If that doesn't work for you, should should be able to pass a date string like August 19, 1975 23:15:30 GMT+00:00 and it should parse with the timezone (converted to UTC). It's just the automatic conversion of "today" that's hard.

@josephjclark
Copy link
Collaborator Author

josephjclark commented Apr 4, 2024

Ok @mtuchi can you do a retest?

  • When you set a cursor like "today", the CLI should print the time in your local time zone, so the dates should be accurate now
  • There's an options argument now, so you can do cursor($.cursor, { defaultValue: 'today', key: 'startDate' })

I'm going to update the main text in the PR to reflect the most recent styles. I'll tweak docs too.

@josephjclark josephjclark marked this pull request as ready for review April 4, 2024 14:50
@mtuchi
Copy link
Collaborator

mtuchi commented Apr 5, 2024

Hiya @josephjclark, Loving the $.cursor 🕺🏽. Here is my feedback

1. Invalid Date Error in console.log

I was getting this error Setting cursor "end" to Invalid Date when setting cursor like this cursor($.cursor, { defaultValue: 'today', key: 'lastRunDate' })

Updating the console.log to use state[cursorKey] instead of state.cursor should solve the problem

Setting cursor "${cursor}" to ${new Date(state[cursorKey] ).toLocaleString()

2. Timestamp/Timezone

The timestamp logs now reflect my local timezone Eg: Setting cursor "end" to 4/5/2024, 11:31:20 AM

If it's not too much to ask can we print the timezone as well ?
Just modifying console to use.toLocaleString('en-US', { timeZoneName: 'short' })} which will print Setting cursor "end" to 4/5/2024, 11:31:20 AM GMT+3

3. n days ago not working

Specifying the number of days ago still is not working. It always set the cursor to todays date
Eg:

cursor('2 days ago');
cursor('22 days ago');

// Will set the cursor to todays 
Setting cursor "2 days ago" to 4/5/2024, 12:00:00 AM 
Setting cursor "22 days ago" to 4/5/2024, 12:00:00 AM 

4. expandReference on options

I think we should expand reference on options as well. It will allow me todo something like this

 cursor($.cursor, { defaultValue: $.manualCursor })

And i can easily change the initial state. state.manualCursor: '72 hours ago'

5. Pagination cursor

In most cases pagination cursor will be an object. Eg

{
  "_metadata": 
  {
      "page": 5,
      "per_page": 20,
      "page_count": 20,
      "total_count": 521,
      "Links": [
        {"self": "/products?page=5&per_page=20"},
        {"first": "/products?page=0&per_page=20"},
        {"previous": "/products?page=4&per_page=20"},
        {"next": "/products?page=6&per_page=20"},
        {"last": "/products?page=26&per_page=20"},
      ]
  },
  "records": [
    {
      "id": 1,
      "name": "Widget #1",
      "uri": "/products/1"
    },
  ]
}

So i would assume the way i would set the cursor($.response._metadata) or cursor($.response._metadata.next) [assumes state.response._metadata.next]

In that case the console.log should also print the JSON object.
Right now if you set a cursor to an object. Eg: cursor({next: 22}) It will log Setting cursor to [object Object]

Please consider changing the console.log to console.log('Setting cursor to ', cursor); So that the log will print Setting cursor to { next: 22 }

@mtuchi
Copy link
Collaborator

mtuchi commented Apr 5, 2024

Also don't forget to build to generate the latest ast.json file.

@josephjclark
Copy link
Collaborator Author

@mtuchi this is amazing feedback - you've caught some brilliant issues there!

It's all been addressed. I've also added a seperate PR for docs - we should merge both at about the same time.

I've still got to do versioning stuff - I'll create a release branch for that

@josephjclark josephjclark changed the base branch from main to release-cursor April 5, 2024 11:28
@mtuchi
Copy link
Collaborator

mtuchi commented Apr 5, 2024

Hey @josephjclark tested the cursor everything works perfect ✅ 🥳 ,
One last feedback 🤞🏽

I find the date logs eg:3/14/2024 a bit confusing, because it's formatted MM-DD-YYY while the actual state.cursor is formatted YYY-MM-DDT:010. So it's a bit of a mind map exercise whenever i have to compare what i am seeing in CLI logs to what i have in final output.json.

My quick suggestion to fix this, can we adjust the logs to be like this

const humanLocaleDate = date.toLocaleString(undefined, {
  weekday: "long",
  year: "numeric",
  month: "short",
  day: "numeric",
  timeZoneName: "short",
});

That will turn the entire date into human readable form which is very easy to relate to. Eg Setting cursor "end" to: Friday, Apr 5, 2024, GMT+3

@josephjclark
Copy link
Collaborator Author

@mtuchi thanks - I think if we're gonna do that we'll just use date.format, or maybe date-fns.format. But I agree and I'll make that change now!

@josephjclark josephjclark merged commit a97de01 into release-cursor Apr 5, 2024
1 check passed
@josephjclark josephjclark deleted the cursor-helper branch April 5, 2024 12:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

common: create a cursor function / cron job helper
2 participants