Merge pull request #4 from frli4797:flags

Fixing the daemon and dry-run flags
frli4797 · Nov 1, 2024 · 6dad1a7
2 parents 849544e + be37a53

Showing 3 changed files with 146 additions and 58 deletions.
diff --git a/README.md b/README.md
@@ -1,21 +1,99 @@
 # imap_sorting_hat = "ish"
 
-Magically sort email into smart folders. This is copied (but not forked) from @kenseehart/[imap_sorting_hat](https://github.com/kenseehart/imap_sorting_hat) to support some additional changes and experimentation for my own learning.
+Magically sort email into smart folders. **ish** works by downloading plain-text versions of all the emails in the source folders and using a multi-class classifier to move the unread ones to the destination folders.
+
+Initially, the classifier needs to be trained on what your emails look like and where you like to keep them. This is done by downloading all emails (as plain text) from your destination folders and caching them locally. That cache, essentially a write-through cache, is used to acquire text embeddings from OpenAI for all the emails seen, and the embeddings are cached on disk as well. The embeddings constitute the training data for a RandomForest, and the destination folders are used as the classifier's classes.
+
+Once trained, the model is stored as well and then used whenever a new message is discovered, assuming **ish** is running in non-interactive, polling mode. When a new (unseen/unread) message is discovered, **ish** classifies it and moves it according to what it learned during training.
+
+**ish** can also run in interactive mode. It will then try to move **all** messages from the source folder(s), but ask the user about each one. This can be a good option when first training the model, and it also ensures that you don't end up with email in random folders in a cold-start situation.
 
 - No rule programming. Instead, just move a few emails into a smart folder and **ish** will quickly learn what the messages have in common.
 - Any folder can be labeled a smart folder.
 - Uses the lates OpenAI language model technology to quickly sort emails into corresponding folders.
 - Compatible with all imap email clients.
 - Works for all common languages.
 
+This is copied (but not forked) from @kenseehart/[imap_sorting_hat](https://github.com/kenseehart/imap_sorting_hat) to support some additional changes and experimentation for my own learning.
+
+## Configuring
+
+To configure **ish**, you need a directory where **ish** keeps the e-mail text and the cached embeddings used to train the model. It also stores the trained model there as a pickle. By default, this directory is `${HOME}/.ish`:
+
+```text
+.ish
+├── data
+│   ├── embd
+│   ├── model.pkl
+│   └── msgs
+└── settings.yaml
+```
+
+### Example settings.yaml
+
+```yaml
+host: imap.mail.me.com
+username: [email protected]
+password: this-is-a-mock-password
+source_folders:
+- INBOX
+destination_folders:
+- News
+- Notifications
+- School
+- Travel
+ignore_folders:
+- Archive
+- Deleted Messages
+- Drafts
+- Sent Messages
+- Junk
+openai_api_key: ll-ddjg-RI51oFV-0Du9Xo4ERraVFd0UvcwFPP0wUkTB2tC
+openai_model: text-embedding-3-small
+```
+
+## Running
+
+### Command line
+
+Run **ish** by issuing
+`python3 ish.py`.
+The main program accepts the following parameters:
+
+```text
+usage: ish.py [-h] [--learn-folders] [--interactive] [--dry-run] [--daemon] [--config-path CONFIG_PATH] [--verbose]
+options:
+  -h, --help            show this help message and exit
+  --learn-folders, -l   Learn based on the contents of the destination folders
+  --interactive, -i     Prompt user before moving anything
+  --dry-run, -n         Don't actually move emails
+  --daemon, -D          Run in daemon/polling mode
+  --config-path, -C CONFIG_PATH
+                        Path for config file and data. Will default to /Users/fredriklilja/.ish
+  --verbose, -v         Verbose/debug mode
+```
+
+### In Docker
+
+You can also run **ish** in a Docker container:
+`docker run -it -v ./.ish:/opt/ish/config -e ISH_DAEMON=True -e ISH_DEBUG=True -e ISH_LEARN=True frli4797/ish`
+
+## Take care
+
+I make no guarantees that this will work with your mail provider, nor that it will work well for your language. This is an application I have been tinkering with because I got fed up with sifting through my email and wanted to learn something new. Email might be destroyed, misplaced or lost by using this little tool. Take care. Make backups.
+
 ## Future development
 
 - [x] Make it work (kind of).
 - [x] Create command line parameters for the usual tasks and tweaks, such as traing, inference, dry-run
 - [x] Optimize embedding calls to OpenAI by batching
-- [ ] Dockerize
+- [x] Dockerize
 - [ ] Use dev container
-- [ ] Daemonize to be able to run this as a service
+- [x] Service/daemon mode, so this can run as a service that polls the IMAP server periodically
 - [ ] Add Ollama as a potential source of embeddings
 
+## Other
+
+I experimented with other embedding models using Ollama. Unfortunately, precision suffered noticeably in these experiments, especially for a mailbox with messages in multiple languages.
+
 Thanks to: [@kenseehart](https://github.com/kenseehart)
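The README describes caching message text locally and using that write-through cache to fetch OpenAI embeddings, which are also cached on disk. A minimal sketch of such an embedding cache follows, assuming a hypothetical `embed_batch` callable that wraps a batched embeddings request; the actual layout under `data/embd` and ish's own helper names are not shown in this diff and may differ. It uses `shelve` (which `ish.py` already imports), keyed by a SHA-256 hash of the message text.

```python
import hashlib
import shelve
from typing import Callable, Sequence

# Hypothetical signature for a batched embedding call (for example a wrapper
# around the OpenAI embeddings endpoint); ish's real helper may differ.
EmbedBatch = Callable[[Sequence[str]], list[list[float]]]


def cached_embeddings(
    texts: Sequence[str], cache_path: str, embed_batch: EmbedBatch
) -> list[list[float]]:
    """Return one embedding per text, calling embed_batch only for cache misses."""
    # Key each text by a content hash so identical bodies share one entry.
    keys = [hashlib.sha256(t.encode("utf-8")).hexdigest() for t in texts]
    with shelve.open(cache_path) as cache:
        # Distinct misses, in first-seen order, so each is embedded once.
        missing = list(dict.fromkeys(k for k in keys if k not in cache))
        if missing:
            by_key = dict(zip(keys, texts))
            fresh = embed_batch([by_key[k] for k in missing])
            for k, vec in zip(missing, fresh):
                cache[k] = vec  # write-through: persist before returning
        return [cache[k] for k in keys]
```

Only cache misses reach `embed_batch`, so re-running training over unchanged destination folders should not trigger additional embedding calls.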
diff --git a/imap.py b/imap.py
@@ -73,10 +73,11 @@ def mesg_to_text(mesg: email.message.Message) -> str:
 
 
 class ImapHandler:
-    def __init__(self, settings: Settings) -> None:
+    def __init__(self, settings: Settings, readonly=False) -> None:
         self.__settings = settings
         self.__imap_conn = None
         self.logger = logging.getLogger(self.__class__.__name__)
+        self.__readonly = readonly
 
     def get_connection(self):
         return self.__imap_conn
@@ -199,7 +200,7 @@ def search(self, folder: str, search_args=None) -> list[int]:
     def __search(self, folder: str, search_args=None) -> list[int]:
         if search_args is None:
             search_args = ["ALL"]
-        self.__imap_conn.select_folder(folder)
+        self.__imap_conn.select_folder(folder, self.__readonly)
         results = self.__imap_conn.search(search_args)
         return results
 
@@ -234,7 +235,7 @@ def move(
                 dest_folder,
             )
             raise ValueError("Expected uids to be a list")
-        self.__imap_conn.select_folder(folder)
+        self.__imap_conn.select_folder(folder, self.__readonly)
         if flag_messages:
             self.__imap_conn.add_flags(uids, [imapclient.FLAGGED])
         if flag_unseen:
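`ImapHandler` now forwards a `readonly` flag to `select_folder`. With IMAPClient, `select_folder(folder, readonly=True)` opens the folder with `EXAMINE`, so the session is not allowed to change it; per RFC 3501 this also means fetching a message does not set `\Seen`. A mutating call such as a move on such a folder is then expected to be rejected by the server. The sketch below is a hypothetical guard (`GuardedMover` and `ReadonlyError` are illustrative names, not part of ish) that fails fast instead, assuming a connection object with IMAPClient-style `select_folder(folder, readonly=...)` and `move(uids, folder)` methods.

```python
from typing import Any


class ReadonlyError(RuntimeError):
    """Raised when a mutating call is attempted on a read-only handler."""


class GuardedMover:
    """Hypothetical wrapper that refuses to mutate the mailbox when readonly=True.

    `conn` is assumed to expose IMAPClient-style select_folder(folder, readonly)
    and move(uids, folder) methods.
    """

    def __init__(self, conn: Any, readonly: bool = False) -> None:
        self._conn = conn
        self._readonly = readonly

    def move(self, folder: str, uids: list[int], dest_folder: str) -> None:
        if self._readonly:
            # Fail fast rather than sending a MOVE the server would reject.
            raise ReadonlyError(
                f"refusing to move {uids} from {folder} to {dest_folder}"
            )
        # A move needs the folder opened read-write (SELECT, not EXAMINE).
        self._conn.select_folder(folder, readonly=False)
        self._conn.move(uids, dest_folder)
```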

diff --git a/ish.py b/ish.py
@@ -13,6 +13,7 @@
 Status: Early development
 """
 
+from enum import Enum
 import logging
 import os
 import shelve
@@ -53,6 +54,9 @@ def env_to_bool(key: str):
     return os.environ.get(key) is not None
 
 
+Action = Enum("Action", ["YES", "NO", "QUIT"])
+
+
 class ISH:
     debug = False
     _exit_event = Event()
@@ -62,7 +66,11 @@ class ISH:
     _daemon = False
 
     def __init__(
-        self, interactive: bool = False, train: bool = False, daemon: bool = False
+        self,
+        interactive: bool = False,
+        train: bool = False,
+        daemon: bool = False,
+        dry_run=False,
     ) -> None:
         self.logger = logging.getLogger(self.__class__.__name__)
 
@@ -75,6 +83,7 @@ def __init__(
         self._interactive = interactive
         self._train = train
         self._daemon = daemon
+        self._dry_run = dry_run
 
         self.classifier: RandomForestClassifier = None
         self.moved = 0
@@ -379,7 +388,6 @@ def classify_messages(self, source_folders: List[str]) -> None:
             source_folders (List[str]): list of source folders
         """
         imap_conn: ImapHandler = self.__imap_conn
-
         self.skipped = 0
         self.moved = 0
         classifier = self.classifier
@@ -411,48 +419,44 @@ def classify_messages(self, source_folders: List[str]) -> None:
                     "body": mesgs[uid]["body"][0:100],
                 }
                 if top_probability > 0.25:
-                    self.logger.info(
-                        "\n%3i From %s: %s",
-                        uid,
-                        mess_to_move["from"],
-                        mess_to_move["body"],
-                    )
+                    self._log_move(uid, "Going to move", ranks, mess_to_move)
 
-                    for p, c in ranks[:3]:
-                        self.logger.info("%.2f: %s", p, c)
-
-                    if self.interactive and not self.__select_move(dest_folder):
-                        self.logger.debug(
-                            """Skipping due to probability %.2f
-                                %i From %s: %s""",
-                            top_probability,
-                            uid,
-                            mess_to_move["from"],
-                            mess_to_move["body"],
-                        )
-                        self.skipped += 1
-                        continue
+                    if self.interactive:
+                        answer = self.__select_move(dest_folder)
+                        if answer == Action.NO:
+                            self.skipped += 1
+                            continue
+                        elif answer == Action.QUIT:
+                            break
 
                     if dest_folder not in to_move:
                         to_move[dest_folder] = [mess_to_move]
                     else:
                         to_move[dest_folder].append(mess_to_move)
 
                 else:
-                    self.logger.debug(
-                        """Skipping due to probability %.2f
-                                %i From %s: %s""",
-                        top_probability,
-                        uid,
-                        mess_to_move["from"],
-                        mess_to_move["body"],
+                    self._log_move(
+                        uid, "Skipping due to probability", ranks, mess_to_move
                     )
                     self.skipped += 1
+
             self.logger.info("Finished predicting %s", folder)
             self.moved += self.move_messages(folder, to_move)
         self.logger.info("Finished moved %i and skipped %i", self.moved, self.skipped)
 
-    def __select_move(self, dest_folder: str) -> bool:
+    def _log_move(self, uid, text, ranks, mess_to_move):
+        self.logger.info(
+            "%s\n%3i From %s: %s",
+            text,
+            uid,
+            mess_to_move["from"],
+            mess_to_move["body"],
+        )
+
+        for p, c in ranks[:3]:
+            self.logger.info("%.2f: %s", p, c)
+
+    def __select_move(self, dest_folder: str) -> Action:
         """Interactively ask user if to move.
 
         Args:
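`__select_move` now returns an `Action` member instead of calling `sys.exit(0)`, and keeps prompting until it gets `y`, `n` or `q`. In `classify_messages`, `Action.QUIT` triggers `break`, which only leaves the loop over the current folder's messages: messages already confirmed for that folder are still handed to `move_messages`, and, assuming the enclosing loop iterates over `source_folders`, the next source folder is then processed. The sketch below mirrors the prompt loop with an injectable `ask` function (a hypothetical parameter defaulting to the built-in `input`) so it can be exercised without a terminal; unlike the diff, it also tolerates surrounding whitespace and upper-case answers.

```python
from enum import Enum
from typing import Callable

# Same functional-API enum as in the diff; members get values 1, 2, 3.
Action = Enum("Action", ["YES", "NO", "QUIT"])

_ANSWERS = {"y": Action.YES, "n": Action.NO, "q": Action.QUIT}


def select_move(dest_folder: str, ask: Callable[[str], str] = input) -> Action:
    """Prompt until the user answers y, n or q; any other input re-prompts."""
    while True:
        opt = ask(f"Move message to {dest_folder}? [y]yes, [n]no, [q]quit: ")
        action = _ANSWERS.get(opt.strip().lower())
        if action is not None:
            return action
```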
@@ -465,12 +469,11 @@ def __select_move(self, dest_folder: str) -> bool:
         while opt not in ["y", "n", "q"]:
             opt = input(f"Move message to {dest_folder}? [y]yes, [n]no, [q]quit:")
             if opt == "y":
-                return True
+                return Action.YES
             if opt == "q":
                 self.logger.info("Quitting.")
-                sys.exit(0)
-            else:
-                return False
+                return Action.QUIT
+        return Action.NO
 
     def move_messages(self, folder: str, messages: dict[str, list]) -> int:
         """Move the messages market for moving, by target folder.
@@ -487,16 +490,23 @@ def move_messages(self, folder: str, messages: dict[str, list]) -> int:
         for dest_folder in messages:
             messages_list = messages[dest_folder]
             uids: list = [mess["uid"] for mess in messages_list]
-            if len(uids) > 0:
-                imap_conn.move(
-                    folder,
+            if not self._dry_run:
+                if len(uids) > 0:
+                    imap_conn.move(
+                        folder,
+                        uids,
+                        dest_folder,
+                        flag_messages=True,
+                        flag_unseen=not self.interactive,
+                    )
+                moved += len(uids)
+            else:
+                self.logger.info(
+                    "Dry run. WOULD have moved UID %s from %s to %s",
                     uids,
+                    folder,
                     dest_folder,
-                    flag_messages=True,
-                    flag_unseen=not self.interactive,
                 )
-            moved += len(uids)
-
         return moved
 
     def run(self) -> int:
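`classify_messages` queues a message only when its top-ranked folder scores above 0.25, collecting UIDs per destination in `to_move`; `move_messages` then issues one move per folder or, in dry-run mode, only logs what it would have moved. A self-contained sketch of those two steps follows, assuming a hypothetical `uid -> {folder: probability}` input in place of the RandomForest output and a plain `move(uids, dest_folder)` callable in place of `ImapHandler.move`; `plan_moves` and `apply_moves` are illustrative names. It uses `collections.defaultdict` instead of the explicit `if dest_folder not in to_move` branch.

```python
import logging
from collections import defaultdict
from typing import Callable

logger = logging.getLogger("ish_sketch")

# Same cut-off as the `top_probability > 0.25` check in classify_messages.
THRESHOLD = 0.25


def plan_moves(
    predictions: dict[int, dict[str, float]], threshold: float = THRESHOLD
) -> tuple[dict[str, list[int]], list[int]]:
    """Group UIDs by their most likely folder; skip low-confidence messages.

    `predictions` maps uid -> {folder: probability}, a hypothetical shape
    standing in for the classifier's per-class probabilities.
    """
    to_move: dict[str, list[int]] = defaultdict(list)
    skipped: list[int] = []
    for uid, probs in predictions.items():
        # Rank folders by probability, highest first (like `ranks` in the diff).
        ranks = sorted(((p, c) for c, p in probs.items()), reverse=True)
        top_probability, dest_folder = ranks[0]
        if top_probability > threshold:
            to_move[dest_folder].append(uid)
        else:
            skipped.append(uid)
    return dict(to_move), skipped


def apply_moves(
    plan: dict[str, list[int]],
    move: Callable[[list[int], str], None],
    dry_run: bool = False,
) -> int:
    """Issue one move per destination folder; in dry-run mode only log.

    Returns the number of messages actually moved (0 for a dry run).
    """
    moved = 0
    for dest_folder, uids in plan.items():
        if not uids:
            continue
        if dry_run:
            logger.info("Dry run. WOULD have moved UID %s to %s", uids, dest_folder)
            continue
        move(uids, dest_folder)
        moved += len(uids)
    return moved
```

Keeping the dry-run check at the single point where mail is mutated, as the diff does, means classification, ranking and logging behave the same in both modes.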
@@ -527,12 +537,6 @@ def run(self) -> int:
             new_var = 10
             self._exit_event.wait(POLL_TIME_SEC)
 
-        #       except Exception as e:
-        #           base_logger.error("Something went wrong. Unknown error.")
-        #           base_logger.info(e, stack_info=True)
-        #           return -1
-        #       finally:
-        #           self.close()
         return 0
 
     def close(self):
@@ -559,14 +563,14 @@ def __do_exit(self, signum, frame):
 def main(args: Dict[str, str]):
     ISH.debug = bool(args.pop("verbose"))
     dry_run = bool(args.pop("dry_run"))  # noqa: F841
-    daemonize = bool(args.pop("daemon"))  # noqa: F841
+    daemonize = bool(args.pop("daemon"))
     interactive = bool(args.pop("interactive"))
     train = bool(args.pop("learn_folders"))
     config_path = args.pop("config_path")
     if config_path is not None and not config_path == "":
         os.environ["ISH_CONFIG_PATH"] = config_path
 
-    ish = ISH(interactive=interactive, train=train, daemon=daemonize)
+    ish = ISH(interactive=interactive, train=train, daemon=daemonize, dry_run=dry_run)
     r = ish.run()
     sys.exit(r)
 
@@ -576,7 +580,12 @@ def main(args: Dict[str, str]):
     import argparse
 
     userhomedir = Settings.get_user_directory()
-    parser = argparse.ArgumentParser(description="Lorem ipsum")
+    parser = argparse.ArgumentParser(
+        description="""Magically sort email into smart folders.
+                            ish works by downloading plain text versions of all the \
+                            emails in the source email folders and moving the unread ones \
+                            to the destination folders, using a multi-class classifier."""
+    )
     # Environment variables always takes precedence.
     parser.add_argument(
         "--learn-folders",
@@ -605,9 +614,9 @@ def main(args: Dict[str, str]):
     parser.add_argument(
         "--daemon",
         "-D",
-        help="Run in daemon mode (NOT IMPLEMENTED)",
+        help="Run in daemon/polling mode",
         action="store_true",
-        default=bool(os.environ.get("ISH_DAEMON")),
+        default=env_to_bool("ISH_DAEMON"),
     )
 
     parser.add_argument(
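`env_to_bool` returns `os.environ.get(key) is not None`, so any value enables the flag: `ISH_DAEMON=False` or `ISH_DAEMON=0` still turns daemon mode on, while the Docker example's `ISH_DAEMON=True` works as intended. A stricter parser is sketched below as a possible alternative; `env_flag`, its `env` parameter and the accepted spellings are assumptions, not part of ish.

```python
import os
from typing import Mapping, Optional

_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off", ""}


def env_flag(
    key: str, default: bool = False, env: Optional[Mapping[str, str]] = None
) -> bool:
    """Parse an environment variable as a boolean flag.

    Unset -> default; "1/true/yes/on" -> True; "0/false/no/off" or empty -> False
    (case-insensitive). Anything else raises ValueError instead of guessing.
    """
    source = os.environ if env is None else env
    raw = source.get(key)
    if raw is None:
        return default
    value = raw.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    raise ValueError(f"{key}={raw!r} is not a recognised boolean")
```

It could then replace the current default, e.g. `default=env_flag("ISH_DAEMON")`.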