Get reformulator working #7

Grazfather · 2025-01-08T02:26:50Z

This is my attempt to clean up the README and the code so that they make sense.

Has anyone gotten this to work? There are a bunch of problems, which I've tried to address, but I have a few questions.

If you're willing/able to point me in the right direction, I can fix the code and update the readme.

Main issues:

Creating the loggers is broken (fixed)
It expects a database, but the schema isn't well defined. Did you start with a db? Maybe a method should be added to create an empty database.
It probably should not sync the database when things don't work; that's a good way to trash your database.
I think it expects custom note types with specific fields? If this is true, it should be documented.
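Here is a minimal sketch of the "create an empty database" method suggested above. It assumes Python's built-in `sqlite3` and a hypothetical single-table schema: a `logs` table with one compressed-JSON `data` column, inferred from the `row[0]` blob that the loader decompresses further down this thread. `create_schema`, `open_db` and `is_empty` are illustrative names, not functions from the repo.

```python
from __future__ import annotations

import sqlite3
from pathlib import Path


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the log table if it does not exist yet (safe to call repeatedly)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS logs ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " data BLOB NOT NULL)"  # hypothetical: one zlib-compressed JSON dict per row
    )
    conn.commit()


def open_db(path: str | Path) -> sqlite3.Connection:
    """Open the database, creating the file and an empty schema on first use."""
    conn = sqlite3.connect(path)
    create_schema(conn)
    return conn


def is_empty(conn: sqlite3.Connection) -> bool:
    """True if no log entries have been written yet."""
    (count,) = conn.execute("SELECT COUNT(*) FROM logs").fetchone()
    return count == 0
```

`CREATE TABLE IF NOT EXISTS` is idempotent, so calling `open_db` on every start would give a fresh install an empty database instead of an error and leave an existing one untouched.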

thiswillbeyourgithub

Thank you very much for taking the time. This really means a lot to me!

> This is my attempt to clean up the README and the code so that they make sense.

> Has anyone gotten this to work? There are a bunch of problems, which I've tried to address, but I have a few questions.

As I said, I might have broken a few things here and there during the publishing rush. Sorry about that. In general, I believe the illustrator has the most recent code and the least technical debt.

> If you're willing/able to point me in the right direction, I can fix the code and update the readme.

That's really great to hear, and it's the help I was desperately looking for! I can definitely find the time to point you in the right direction, but I'm otherwise stretched very thin, so I'd rather not code most things myself. You are a godsend :)

I did a review of your PR and left a few comments.

I just noticed that, unfortunately, the TODO parser I use to sync my README with Logseq was missing quite a few bullet points, so the roadmap outline was not clear. I fixed that. So, what do you think about the way forward to make sense of the code? I believe the main thing is that those scripts should inherit from a common class and share many methods. That would make it easier to add more tools if ever needed and, more urgently, make the code easier to understand, maintain and declutter, and in the end make it usable.
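Here is one way the shared-parent-class idea could look, offered only as a sketch. `AnkiTool`, `Reformulator`, `process_note` and `save` are hypothetical names. The real scripts would put the LLM calls, Anki access and db logging where the placeholders are. The base class owns the shared loop, and each tool implements only the part that differs.

```python
from abc import ABC, abstractmethod


class AnkiTool(ABC):
    """Hypothetical common parent for reformulator.py, the illustrator, etc."""

    name = "base"  # each subclass overrides this (e.g. for log and db names)

    def __init__(self, dry_run: bool = False) -> None:
        self.dry_run = dry_run
        # Stand-in for writing back to Anki and the log db: nid -> new content.
        self.saved: dict[int, str] = {}

    def run(self, note_ids: list[int]) -> dict[int, str]:
        """Shared driver loop, written once instead of once per script."""
        results: dict[int, str] = {}
        for nid in note_ids:
            results[nid] = self.process_note(nid)
            if not self.dry_run:
                self.save(nid, results[nid])
        return results

    def save(self, nid: int, content: str) -> None:
        """Shared persistence step; a real one would update the note and log it."""
        self.saved[nid] = content

    @abstractmethod
    def process_note(self, nid: int) -> str:
        """The only tool-specific part, e.g. asking an LLM to rewrite the note."""


class Reformulator(AnkiTool):
    name = "reformulator"

    def process_note(self, nid: int) -> str:
        return f"reformulated note {nid}"  # placeholder for the LLM call
```

Adding another tool would then mean writing one small subclass. `abc.abstractmethod` makes Python refuse to instantiate a subclass that forgets to define `process_note`.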


Grazfather · 2025-01-08T20:59:08Z

Why is it that the example dataset uses clozes? Is that how you make all of your cards? I found that the reformulator (using my cards, your dataset) will turn them into clozes, but that doesn't work unless the note type is cloze.
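Here is a sketch of a guard for the cloze mismatch described above. It assumes the standard Anki cloze syntax, `{{c1::answer}}`, optionally written as `{{c1::answer::hint}}`. Whether the note type is a cloze type is passed in as a boolean; looking that up from Anki is left out. `cloze_numbers` and `check_reformulation` are hypothetical names.

```python
import re

# Anki cloze deletions look like {{c1::answer}} or {{c1::answer::hint}}.
CLOZE_RE = re.compile(r"\{\{c(\d+)::")


def cloze_numbers(text: str) -> set[int]:
    """Return the cloze indices used in text (1 for c1, 2 for c2, ...)."""
    return {int(n) for n in CLOZE_RE.findall(text)}


def check_reformulation(new_text: str, note_is_cloze: bool) -> None:
    """Reject an LLM output whose cloze-ness does not match the note type.

    note_is_cloze is an assumed input; how to read it from Anki is not shown.
    """
    has_cloze = bool(cloze_numbers(new_text))
    if has_cloze and not note_is_cloze:
        raise ValueError("LLM produced clozes but the note type is not cloze")
    if note_is_cloze and not has_cloze:
        raise ValueError("cloze note type but the LLM output has no cloze")
```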


thiswillbeyourgithub · 2025-01-09T10:26:26Z

reformulator.py

+    except AssertionError as e:
+        red(e)
+    except Exception as e:
+        red(e)


Are you sure about this?

This is a personal thing, we can change it.

Basically, I make it so that if we fail an assert, it's probably an issue with the invocation, and we log it. If it's another type of exception, then we probably don't expect it, so we log it, but we also re-raise the exception to print the stack trace. I can remove it, but it's helping me with debugging.
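The convention described above can be sketched as a standalone wrapper. `red` here is only a stand-in for the repo's logging helper of the same name, and `run_guarded` is a hypothetical name.

```python
import sys


def red(msg: object) -> None:
    """Stand-in for the repo's red() logging helper: print to stderr."""
    print(msg, file=sys.stderr)


def run_guarded(func, *args, **kwargs):
    """Call func, treating failed asserts as user errors and the rest as bugs."""
    try:
        return func(*args, **kwargs)
    except AssertionError as e:
        # Most likely a bad invocation: log it and stop without a traceback.
        red(e)
        return None
    except Exception as e:
        # Unexpected: log it, then re-raise so the stack trace is printed.
        red(e)
        raise
```

One caveat of this design is that `assert` statements are stripped when Python runs with `-O`. Invocation checks that rely on them would then silently disappear. Raising a dedicated exception type for invocation errors would avoid that.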

Whatever helps you helps all of us here, so no biggie.


thiswillbeyourgithub · 2025-01-09T10:33:58Z

> Why is it that the example dataset uses clozes? Is that how you make all of your cards? I found that the reformulator (using my cards, your dataset) will turn them into clozes, but that doesn't work unless the note type is cloze.

See my reply here

thiswillbeyourgithub · 2025-01-09T10:35:13Z

Also note that my datasets were initially written in French, and I hastily used Claude 3.5 Sonnet to translate them to English. I have not had the time to proofread them, but the idea should still be there.

thiswillbeyourgithub · 2025-01-09T10:38:52Z

I think a good idea would be to use platformdirs to find a good location for the dbs, ideally with the ability to override it using an env var.
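Here is a sketch of the platformdirs suggestion. `user_data_dir` is the real platformdirs function that returns the per-user data directory for the current OS. The environment variable `ANKI_REFORMULATOR_DB_DIR`, the app name and `db_path` are hypothetical.

```python
import os
from pathlib import Path

# Hypothetical names: neither the env var nor the app name come from the repo.
ENV_VAR = "ANKI_REFORMULATOR_DB_DIR"
APP_NAME = "anki_reformulator"


def db_path(tool: str = "reformulator") -> Path:
    """Return where a tool's database should live.

    The environment variable wins if it is set; otherwise fall back to the
    per-user data directory that platformdirs picks for the current OS.
    """
    override = os.environ.get(ENV_VAR)
    if override:
        base = Path(override)
    else:
        # Imported lazily so the override path works without platformdirs.
        from platformdirs import user_data_dir
        base = Path(user_data_dir(APP_NAME))
    return base / f"{tool}.db"
```

Callers would create the parent directory before opening the db, e.g. with `path.parent.mkdir(parents=True, exist_ok=True)`.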

thiswillbeyourgithub · 2025-01-09T19:07:19Z

reformulator.py

@@ -670,6 +678,7 @@ def apply_reformulate(self, log: Dict) -> None:
            nid,
            fields={
                self.field_name: log["note_field_formattednewcontent"],
+                # TODO: Might be nice to not require this


IIRC, it was necessary to be rock-solid sure that we could roll back easily. But yeah, maybe we could just store the previous version and use only the db to handle rollbacks?

To clarify: here I was referring to the fact that we have many strings like "note_field_*", and it would be nice if we didn't.
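Here is a sketch of one way to avoid the "note_field_*" string keys. A dataclass whose attributes replace the dictionary keys means a misspelled field can be caught by type checkers and editors, rather than surfacing only as a `KeyError` at runtime. `ReformulationLog` is a hypothetical name. Only the formatted-new-content field corresponds to a key seen in the diff; the other fields are illustrative.

```python
from dataclasses import asdict, dataclass


@dataclass
class ReformulationLog:
    """Hypothetical typed replacement for dicts keyed by "note_field_*" strings."""

    nid: int
    field_name: str
    old_content: str
    formatted_new_content: str  # was log["note_field_formattednewcontent"]

    def to_dict(self) -> dict:
        """Plain dict, e.g. for JSON storage in the db."""
        return asdict(self)

    @classmethod
    def from_dict(cls, d: dict) -> "ReformulationLog":
        """Rebuild an entry from a stored dict."""
        return cls(**d)
```

`apply_reformulate` could then write `fields={self.field_name: log.formatted_new_content}`. Meanwhile, `to_dict`/`from_dict` would keep the on-disk format a plain dictionary.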

I am not entirely sure of the purpose of the db. Does it reflect changes that haven't been committed, or the latest version? Having the db is nice for persistence, but it also opens the door to weird states where the notes don't match the database, so I generally prefer having a single source of truth.

The purpose of the database is only to act as a very reliable, easy-to-parse log file. Inconsistencies don't matter, for example if the user modified a note themselves since the last time they ran the script. What matters is that it ensures complete reliability by allowing rollbacks. When working with LLMs, being able to roll back is very important to me. Say in six months there is a shiny new LLM that is very cheap and possibly very good. There is only one way to find out whether it's good enough to handle real-world notes: try it. And only after a few hundred reviews can you actually judge whether it makes weird edge-case mistakes.
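Here is a sketch of why an append-only log makes that kind of trial cheap to undo. It assumes a hypothetical `edits` table that records note id and model name for each change; this is not the repo's actual schema. Because rows are never updated or deleted, the content a note had before a given model touched it can always be recovered.

```python
import sqlite3
import time


def open_log(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a hypothetical append-only table of note edits."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS edits ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " nid INTEGER NOT NULL, model TEXT NOT NULL,"
        " old TEXT NOT NULL, new TEXT NOT NULL, ts REAL NOT NULL)"
    )
    return conn


def record_edit(conn: sqlite3.Connection, nid: int, model: str,
                old: str, new: str) -> None:
    """Log one edit; rows are only ever inserted, never updated or deleted."""
    conn.execute(
        "INSERT INTO edits (nid, model, old, new, ts) VALUES (?, ?, ?, ?, ?)",
        (nid, model, old, new, time.time()),
    )
    conn.commit()


def contents_before(conn: sqlite3.Connection, model: str) -> dict[int, str]:
    """Map each note touched by `model` to its content before that model's first edit."""
    rows = conn.execute(
        "SELECT nid, old FROM edits WHERE model = ? ORDER BY id", (model,)
    ).fetchall()
    restore: dict[int, str] = {}
    for nid, old in rows:
        restore.setdefault(nid, old)  # the earliest row for each note wins
    return restore
```

Writing the returned contents back to the notes is left to the caller. A careful version would first check that each note still holds that model's last output, so that manual edits made since then are not clobbered.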

I don't know, just to give an example: at some point I realized that some of my cards related to hours had been parsed wrongly.

Cool, yeah, I see that it's only a log and is separate from anything involving the Anki reformulator field.


thiswillbeyourgithub · 2025-01-15T16:54:13Z

reformulator.py

        self.db_content = self.load_db()
        if not self.db_content:
            red("Empty database. If you have already ran anki_reformulator "
                "before then something went wrong!")
-            whi("Trying to create a new database")
+            whi("Creating a empty database")


Typo, should be "an"

thiswillbeyourgithub · 2025-01-15T16:59:59Z

reformulator.py

                if buffer:
                    try:
-                        _ = rtoml.loads("".join(buffer + [line]))
+                        # TODO: What are you trying to do here? Just check that adding the line keeps valid toml?


I think so. IIRC, the point of the TOML was to have a human-readable way to see what happened, using the addon. Since a lot of log data is packed into it, I also added code to try rolling back if --reset was used and the db failed to recover. I'd be fine with only storing data in the db, but also with storing all the metadata of all scripts in a single field.

Btw, rtoml was better than toml in some respects, but I vaguely remember having a lot of trouble in some situations (like dumping then loading resulting in different values, especially when None was involved, but I can't remember more specifically).

thiswillbeyourgithub · 2025-01-15T17:02:13Z

reformulator.py

-            dictionary = json.loads(zlib.decompress(row[0]))
-            dictionaries.append(dictionary)
-        return dictionaries
+        # TODO: Why do you compress? This just makes it more difficult to debug


Initially I just dumped JSON, but the size got out of hand surprisingly quickly, so I compressed it with zlib and then realized I might as well use SQLite.

This was totally amateurish; if I had to do it again, I would use SQLite only and enable its built-in compression, of course. But since I'm still technically an amateur, I'm open to any suggestions, of course.
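Here is a sketch of the current storage scheme: JSON compressed with `zlib`, one BLOB per row, matching the `json.loads(zlib.decompress(row[0]))` loader in the diff. It also adds a hypothetical `dump_readable` helper that addresses the debugging concern by pretty-printing every entry. Table and function names are illustrative.

```python
import json
import sqlite3
import zlib


def open_logs(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a db with a single table of compressed JSON blobs."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS logs ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT, data BLOB NOT NULL)"
    )
    return conn


def save_entry(conn: sqlite3.Connection, entry: dict) -> None:
    """Store one log dict as zlib-compressed UTF-8 JSON."""
    blob = zlib.compress(json.dumps(entry, ensure_ascii=False).encode("utf-8"))
    conn.execute("INSERT INTO logs (data) VALUES (?)", (blob,))
    conn.commit()


def load_entries(conn: sqlite3.Connection) -> list[dict]:
    """Same idea as the existing loader: decompress each row, then parse it."""
    rows = conn.execute("SELECT data FROM logs ORDER BY id").fetchall()
    return [json.loads(zlib.decompress(row[0])) for row in rows]


def dump_readable(conn: sqlite3.Connection) -> str:
    """Debug helper: pretty-print every entry so the db can be inspected."""
    return "\n".join(
        json.dumps(e, indent=2, ensure_ascii=False) for e in load_entries(conn)
    )
```

Storing uncompressed JSON in a TEXT column instead would keep rows readable in any SQLite browser, at the cost of the size growth described above.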

thiswillbeyourgithub · 2025-01-15T17:12:02Z

> I think a good idea would be to use platformdirs to find a good location for the dbs, ideally with the ability to override it using an env var.

This would also make it easy to switch to storing the db in the addon package in Anki. Well, actually I don't know if that's the best way; I have faint memories of some addons storing their own data in their own table of the Anki db file. That might be fine for text, but storing media like images there seems unreasonably costly for the Anki devs.

wip: Make sense of everything

f7a97ce

Grazfather force-pushed the fixup branch from e56f653 to f7a97ce on January 8, 2025 02:28

thiswillbeyourgithub reviewed Jan 8, 2025


Grazfather added 2 commits January 8, 2025 15:42

More work

a3fdc97

more

212b88c

Grazfather added 2 commits January 8, 2025 16:03

Remove API key loading

c364342

more

0c0aec0

Grazfather commented Jan 8, 2025


thiswillbeyourgithub reviewed Jan 9, 2025


Grazfather added 3 commits January 9, 2025 09:29

More fixes and Qs

cf2fca5

Better error for main field index

811d4e0

Add reformulate method so all work is not done in init

f34de04

Grazfather changed the title from "wip: Make sense of everything" to "Get reformulator working" on Jan 9, 2025


Grazfather added 2 commits January 9, 2025 15:29

more cleanup

25813f6

fix

f907ea1

thiswillbeyourgithub reviewed Jan 15, 2025
