-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Get reformulator working #7
base: public
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thank you very much for taking the time. This really means a lot to me!
This is my attempt to clean up the readme and code to make sense.
Has anyone gotten this to work? There are a bunch of problem, which I've tried to address, but have a few questions.
As I said I might have broken a few things here and there during the publishing rush. Sorry about that. In general the illustrator should be the most recent code and have the least technical debt I believe.
If you're willing/able to point me in the right direction, I can fix the code and update the readme.
That's really great to hear and the help I was desperately looking for! I can find the time to point you in the good direction for sure but I'm stretched very thin otherwise so rather not code most things myself. You are a godsend :)
I did a review of your PR and left a few comments.
I just noticed that unfortunately my TODO parser I'm using to sync my README with logseq was missing quite a few bulletpoints so the outline in the roadmap was not clear. I fixed that, so now what do you think about the way forward to sensify the code? I believe the main thing is that those scripts should instead inherit from a common class and share many methods. That would make it both easier to add more tools if ever needed and more urgently to understand / maintain / unclutter and in the end make usable.
Why is it that the example dataset uses clozes? Is that how you make all of yours cards? I found that the reformulator (using my cards, your dataset) will turn them into clozes, but that doesn't work unless the note type is cloze. |
except AssertionError as e: | ||
red(e) | ||
except Exception as e: | ||
red(e) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure about this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a personal thing, we can change it.
Basically, I make it so that if we fail an assert, it's probably an issue with the invocation, and we log it. If it's another type of exception, then we probably don't expect it, so we log it, but we also re-raise the exception to print the stack trace. I can remove it, but it's helping me with debugging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whatever helps you helps all of us here so no biggy
See my reply here |
Also note that my datasets were intially written in french and I used claude sonnet 3.5 to translate them to english hastily. I have not had the time to proofread them but the idea should be there still. |
I think a good idea would be to use platformdirs to find a good location for the dbs, ideally with an abilitty to overload it using an env var |
@@ -670,6 +678,7 @@ def apply_reformulate(self, log: Dict) -> None: | |||
nid, | |||
fields={ | |||
self.field_name: log["note_field_formattednewcontent"], | |||
# TODO: Might be nice to not require this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Iirc it was necessary to be rock solid sure that we can rollback easily. But yeah maybe we could just store the previous version and use only the db to handle rollbacks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To clarify: here I was refering to the fact that where have many strings lile "note_field_*" and would be nice if we didn't
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not entirely sure of the purpose of the db. Does it reflect changes that haven't been committed, or the latest version? having the db is nice for persistence, but it also opens up to some weird state where the notes don't match the database, so generally I prefer having a single source of truth.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The purpose of the database is only to act as a kind of very reliable and easy to parse logfile. It does not matter if there are inconsistencies because, for example if the user modified itself since the last time they ran the script. But it ensures complete reliability because it allows to roll back. To me, working with LLM's, it's very important to be able to rollback. Say in six months there is a shiny new LLM that is very cheap and possibly very good. Well, there is only one way to find out if it's good enough to handle real world notes. And then it's only after a few hundred reviews that you can actually judge if it does not make some weird edge case mistakes.
I don't know, just to give an example, at some point I realized that some of my cards related to hours were wrongly parsed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, yeah I see that it's only a log and is outside of anything with the anki reformulator field.
self.db_content = self.load_db() | ||
if not self.db_content: | ||
red("Empty database. If you have already ran anki_reformulator " | ||
"before then something went wrong!") | ||
whi("Trying to create a new database") | ||
whi("Creating a empty database") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo, should be "an"
if buffer: | ||
try: | ||
_ = rtoml.loads("".join(buffer + [line])) | ||
# TODO: What are you trying to do here? Just check that adding the line keeps valid toml? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so. Irrc the thing with toml was to have a human readable way to see what happened using the addon. As a lot of log is packed into it I also added code to try rolling back if --reset was used and in case the db failed to recover. I can be fine with only storing data to the db, but also with storing all metadata of all scripts into a single field.
Btw rtoml was better in some aspects than toml but I remember a bit having much trouble in some situation (lile dumping then loading resulting in different values especially when None are involved but can't remember more specifically.)
dictionary = json.loads(zlib.decompress(row[0])) | ||
dictionaries.append(dictionary) | ||
return dictionaries | ||
# TODO: Why do you compress? This just makes it more difficult to debug |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Intially I just dumped json but the size got out of hand surprisingly quickly so I compressed it with zlib and found out I might as well use sqlite.
This was totally amateurish, if I had to do it again I would use sqlite only and enable the built in compressions of course. But still being technically an amateur I'm open to any suggestion of course
This would also make it easy to switch to storing the db into the addon package in anki. Well actually I don't know if that's the best way, I have faint memories of some addon storing their own data in their own table of the anki db file. That might be fine if text but storing media like images seems unreasonnably costly to the ankidev. |
This is my attempt to clean up the readme and code to make sense.
Has anyone gotten this to work? There are a bunch of problem, which I've tried to address, but have a few questions.
If you're willing/able to point me in the right direction, I can fix the code and update the readme.
Main issues: