Reuters poller business logic #92

Merged
merged 5 commits into from
Jan 22, 2025

bryophyta
Contributor

@bryophyta bryophyta commented Dec 19, 2024

What does this change?

First pass at business logic for Reuters feed.

Steps:

1. Auth

  • Read auth token from Secrets
  • If a request fails with that token, request a new one using the client ID and secret from Secrets, and use it for subsequent calls

2. Request the feed via search for the last n*1.2 seconds, where n is the interval, in seconds, between poller runs.

  • NB: per our conversation with Reuters' API team, there is currently no supported way to get 'only new items since last poll', so this is my suggested heuristic for picking up new stories. It's not foolproof.
  • This means any two consecutive queries will overlap. For now we rely on the ingestion lambda to deduplicate stories, unless that proves problematic for performance or other reasons.

3. Pass the most recent auth token back

The poller lambda wrapper, as of #99, will check this value and update the Secret value if needed.
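
For step 2, the overlapping window can be sketched roughly as below. This is only an illustration of the n*1.2 heuristic: the helper names (searchWindowSeconds, searchWindow) and the Date-based window shape are assumptions made for the example, not necessarily what the poller uses.

```typescript
// Hypothetical helpers illustrating the "last n*1.2 seconds" heuristic.
// pollIntervalSeconds is "n": the interval between poller runs.
function searchWindowSeconds(
  pollIntervalSeconds: number,
  overlapFactor = 1.2,
): number {
  // Round up so the window is never shorter than n * overlapFactor.
  return Math.ceil(pollIntervalSeconds * overlapFactor);
}

// Returns the time range a single poll would search over.
function searchWindow(
  now: Date,
  pollIntervalSeconds: number,
): { start: Date; end: Date } {
  const windowMs = searchWindowSeconds(pollIntervalSeconds) * 1000;
  return { start: new Date(now.getTime() - windowMs), end: now };
}
```

With a 60 second interval, each poll looks back 72 seconds, so two consecutive polls share a 12 second overlap. Stories published in that overlap are returned twice and left for the ingestion lambda to deduplicate.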


The implementation of (1) adds some complexity and mutation that it would be nice to avoid, but the API guidelines say we shouldn't request new access tokens too often, so we should respect that. It could be refactored so that, e.g., fetchWithReauth and fetchAllPages are defined outside the scope of the main function and take accessToken as an argument. On reflection, though, declaring accessToken via let inside the main function makes it more explicit (a) that it can mutate, and (b) which functions can mutate it. I'm open to other perspectives.
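
To make the trade-off concrete, here is a minimal sketch of the closure approach, not the code in this PR. The runPoll wrapper, the injected auth and runQuery functions, and the use of a 401 status to signal a rejected token are all assumptions made so the example is self-contained.

```typescript
// Hypothetical signatures: the real poller calls the Reuters API directly.
type AuthFn = () => Promise<string>;
type QueryFn = (
  query: string,
  token: string,
) => Promise<{ status: number; body: unknown }>;

async function runPoll(
  storedToken: string | undefined,
  auth: AuthFn,
  runQuery: QueryFn,
  queries: string[],
): Promise<{ results: unknown[]; accessToken: string }> {
  // Declared with let: only functions in this scope can reassign it.
  let accessToken = storedToken ?? (await auth());

  async function fetchWithReauth(query: string): Promise<unknown> {
    const first = await runQuery(query, accessToken);
    // Assumption: a 401 means the stored token was rejected.
    if (first.status !== 401) return first.body;

    // Refresh once, then reuse the new token for all later calls.
    accessToken = await auth();
    const retry = await runQuery(query, accessToken);
    if (retry.status === 401) {
      throw new Error('Auth still failing after requesting a new token');
    }
    return retry.body;
  }

  const results: unknown[] = [];
  for (const query of queries) {
    results.push(await fetchWithReauth(query));
  }
  // Step 3: hand the most recent token back to the caller.
  return { results, accessToken };
}
```

In this sketch a stale token costs one extra auth call per run, however many queries follow, and the caller can compare the returned accessToken with the stored one to decide whether the Secret needs updating.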

Script

Also adds a script that fetches raw item data from the Reuters API, for debugging.

How to test

  • Run locally
  • Deploy to CODE, see that Reuters stories are being processed as expected.

How can we measure success?

Have we considered potential risks?

Images

Accessibility

@bryophyta bryophyta force-pushed the pf/reuters-poller branch 2 times, most recently from 1d949b5 to 4889fdc Compare December 19, 2024 18:01
@bryophyta bryophyta changed the base branch from main to pf/ap-poller-business-logic December 23, 2024 10:26
@bryophyta bryophyta force-pushed the pf/ap-poller-business-logic branch from fb0920c to f6de1a7 Compare December 23, 2024 16:43
@bryophyta bryophyta force-pushed the pf/reuters-poller branch 2 times, most recently from c983801 to 684492e Compare December 23, 2024 16:58
@bryophyta bryophyta force-pushed the pf/reuters-poller branch 3 times, most recently from d644737 to ef43389 Compare January 13, 2025 10:21
@bryophyta bryophyta changed the base branch from pf/ap-poller-business-logic to main January 14, 2025 14:07
@bryophyta bryophyta changed the base branch from main to pf/ap-business-logic January 14, 2025 15:45
Base automatically changed from pf/ap-business-logic to main January 16, 2025 14:46
In conversation with Reuters they said that there's currently no way of
requesting just the stories that have changed since last request
('endCursor' shouldn't be relied upon for this).

So we should search for a time period that overlaps with our last
request, and page through the results. The ingestion lambda handles
deduplication now, so no need to deduplicate here unless we spot
performance issues, for instance.
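
A rough sketch of the "page through the results" step, under stated assumptions: the Page shape, the nextCursor field name, and the maxPages guard are hypothetical and not taken from the Reuters API. The cursor here is only used to walk the pages of a single search, not to resume from a previous poll.

```typescript
// Hypothetical page shape: field names are illustrative only.
interface Page<T> {
  items: T[];
  nextCursor?: string;
}

// Collects every page of one search into a single array.
async function fetchAllPages<T>(
  fetchPage: (cursor?: string) => Promise<Page<T>>,
  maxPages = 50, // guard against an endless cursor loop
): Promise<T[]> {
  const all: T[] = [];
  let cursor: string | undefined = undefined;
  for (let i = 0; i < maxPages; i++) {
    const page: Page<T> = await fetchPage(cursor);
    all.push(...page.items);
    // No further cursor means this was the last page.
    if (!page.nextCursor) return all;
    cursor = page.nextCursor;
  }
  throw new Error(`Stopped after ${maxPages} pages`);
}
```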
@bryophyta bryophyta marked this pull request as ready for review January 21, 2025 17:36
@bryophyta bryophyta requested a review from a team as a code owner January 21, 2025 17:36

let accessToken = ACCESS_TOKEN ?? (await auth(CLIENT_ID, CLIENT_SECRET));

async function fetchWithReauth(query: string) {
Contributor Author

@bryophyta bryophyta Jan 21, 2025

this function is defined within the scope of the main poller function so that it can read and mutate accessToken.

I went back and forth on whether to handle this in a more 'pure' way, but as mentioned in the PR description this felt like a good balance in context. Open to other perspectives though!

Member

@andrew-nowak andrew-nowak left a comment

lgtm! great work!

'https://api.thomsonreuters.com/auth/reutersconnect.contentapi.read https://api.thomsonreuters.com/auth/reutersconnect.contentapi.write';
const authUrl = 'https://auth.thomsonreuters.com/oauth/token';
const grantType = 'client_credentials';
const audience = '7a14b6a2-73b8-4ab2-a610-80fb9f40f769';
Member

just double checking that this value is fine to be made public? iirc it's a value that's shared by all customers, is that right?

Contributor Author

yep that's right -- it's from the Reuters API documentation 👍

@bryophyta bryophyta changed the title Pf/reuters poller Reuters poller business logic Jan 22, 2025
@bryophyta bryophyta merged commit a4ee908 into main Jan 22, 2025
3 checks passed
@bryophyta bryophyta deleted the pf/reuters-poller branch January 22, 2025 15:32