Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework Replies #20

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Rework Replies #20

wants to merge 2 commits into from

Conversation

right-mouse
Copy link
Member

Reworks the "reply" portion of eelbot so that it uses the database for selecting a reply from. Replies are also more dynamic in nature and support matching using regex.

@boarnoah
Copy link
Member

boarnoah commented Jul 1, 2023

This PR proposes the following IIUC,

A reply is defined as a regex pattern + some config (the prompt), along with a list of replies (some prompts might only have one reply but others might have lots) that Eelbot will reply with once there is a match.

The way this data is organized in the database is a unconventional. Conventional thinking to me suggests organizing the data as follows

  • A prompts table, which stores regex pattern(s) & the associated config
  • A replies table with a FK to the prompt table, by which we can select a reply that is suitable for a given prompt

This design is:

  • Prompts are stored in config, the pattern & the associated config
  • Prompts point to replies.<reply-type> tables, so in general there would be N reply tables for N prompts (you could have multiple prompts point to the same reply table)

There isn't any obvious examples of people doing this in relational DBs, and while I personally do feel a more conventional design would have suited our purpose, it will be interesting to investigate this further, to see

  1. if it has been done before
  2. if there are advantages to this approach when dealing with large data sets

So a bunch of thoughts & follow up work (not necessarily needed for this PR to merge) in no particular order

Some of these questions / points are better thought of in the context of "how would something similar to this behave with a very very large dataset, the typical: 'How would you build twitter?' realm of questions"

  • this looks a lot like how document databases typically do horizontal scaling by implementing the concept of a sharding with partition key
    • not similar to a relational database's concept of partition, which is generally used to split a homogenous dataset by time etc... into separate tables under the hood.
    • I can't find too many examples of a design like this with relational databases, but would like to gather them to see any pros/cons, closest I found was: https://news.ycombinator.com/item?id=28776786, there is plenty of material about read replicas for high availability for relational DBs out there, this is more complex than that
  • minor point, wonder if this will cause DB tools like PGAdmin etc..., to keel over quite soon since the number of tables in the schema grows with the database, in fairness I suspect if this kind of design was done at a large scale you are likely using bespoke tools to interact with it anyway
  • as you mentioned this makes maintaining a gapless sequence much easier
    • in a typical one large table design, there is IIUC always a hard limit with trying to maintain a gapless sequence & scaling up for multiple writers, since with this design a typical write interacts with a much smaller table, this is not an issue
    • in fairness to Postgres, https://www.postgresql.org/files/developer/concurrency.pdf as I understand it typical works at a row level, which means in most cases you can already do concurrent writes, you would need to sacrifice having a gapless sequence however. Curious if that is a worthy sacrifice in most situations
  • with our current design this is built around all the data being organized into tables that share the same table design, this does not require it
    • similar to typical document databases, this means we could if we want organize data differently by generation or type if desired
    • Wonder if this could be an advantage with large databases compared to a more conventional design. Where your time for migrations that mutate existing data (ex: splitting data off from a table etc...) increases with the amount of data you have (presumably linearly)

Comment on lines +63 to +71
# Add custom replies by creating a table under the `reply` schema in the database first, then adding an entry here to
# reference it. Check the README for more details.
- table_name: "myreply"
regexps:
- "a{3,6}"
percent: 33
min_delay: 3
max_delay: 6
timeout: 600
Copy link
Member

@boarnoah boarnoah Jul 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are going to do this, IMO this needs to live in the DB not in config, it doesn't fully achieve the goal of being able to extend replies to -> easily / with eelbot-ui this way.

If you do this lets see if we can avoid having so many long type switches for all the numerical types in golang :P

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I did this is because the config is loaded once at startup whereas the DB tables can be changed at runtime (replies added/removed etc.). I felt like putting the config in the DB could imply that this too could be changed on the fly. I'll move the config to the DB now, and in a future MR perhaps, I think hot reloading of the config can be handled in the app by watching for changes (not sure how easy it is in Postgres).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(not sure how easy it is in Postgres).

Not difficult IIUC. But I figure what we will more likely want to consider is having eelbot-ui signal to eelbot about changes rather than a DB level event.

I felt like putting the config in the DB could imply that this too could be changed on the fly

Yep, suspected this was the reason. But I do think we should build this around the hopefully realistic future case where eelbot-ui is going to be used to enter in a lot of new reply types without having to re-deploy eelbot often.

@right-mouse
Copy link
Member Author

@CodiumAI-Agent /review

@CodiumAI-Agent
Copy link

PR Review

⏱️ Estimated effort to review [1-5]

3, because the PR involves changes across multiple files with modifications that include database interactions, regex operations, and handling of different data types which require careful review to ensure correctness and efficiency.

🧪 Relevant tests

No

🔍 Possible issues

Possible Bug: The conversion of all numeric types to float64 in the toStrKeys function might lead to precision issues or unexpected behavior when dealing with large integers.

Performance Concern: The function getDuration in commands/command.go could be optimized by using a type switch instead of multiple if-else conditions.

🔒 Security concerns

No

Code feedback:
relevant filecmd/eelbot/main.go
suggestion      

Consider using a type switch in the toStrKeys function to handle different numeric types more cleanly and efficiently. This approach reduces the redundancy of multiple case statements and improves readability. [important]

relevant linecase int:

relevant filecommands/command.go
suggestion      

Optimize the getDuration function by implementing a type switch to handle different types of numeric inputs. This will make the code cleaner and potentially more performant by reducing the number of type assertions. [important]

relevant linecase int:

relevant filecmd/eelbot/main.go
suggestion      

Ensure that the conversion of numeric types to float64 does not cause precision issues, especially with large integers. Consider adding a check or a different handling mechanism for very large numbers. [medium]

relevant linecase uint64:

relevant filecommands/command.go
suggestion      

Refactor the repeated code in getDuration by using a helper function or a loop to handle the conversion of different integer types to time.Duration, reducing code duplication. [medium]

relevant linecase int:


✨ Review tool usage guide:

Overview:
The review tool scans the PR code changes, and generates a PR review which includes several types of feedbacks, such as possible PR issues, security threats and relevant test in the PR. More feedbacks can be added by configuring the tool.

The tool can be triggered automatically every time a new PR is opened, or can be invoked manually by commenting on any PR.

  • When commenting, to edit configurations related to the review tool (pr_reviewer section), use the following template:
/review --pr_reviewer.some_config1=... --pr_reviewer.some_config2=...
[pr_reviewer]
some_config1=...
some_config2=...

See the review usage page for a comprehensive guide on using this tool.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants