Merge pull request #4 from frli4797:flags
Fixing flags daemon and dry run
frli4797 authored Nov 1, 2024
2 parents 849544e + be37a53 commit 6dad1a7
Showing 3 changed files with 146 additions and 58 deletions.
84 changes: 81 additions & 3 deletions README.md
@@ -1,21 +1,99 @@
# imap_sorting_hat = "ish"

Magically sort email into smart folders. This is copied (but not forked) from @kenseehart/[imap_sorting_hat](https://github.com/kenseehart/imap_sorting_hat) to support some additional changes and experimentation for my own learning.
Magically sort email into smart folders. **ish** works by downloading plain-text versions of all the emails in the source folders and moving the unread ones to the destination folders, using a multi-class classifier.

Initially, the classifier needs to be trained on what your emails look like and where you like to keep them. This is done by downloading all emails (as plain text) from your destination folders and caching them locally. That cache, essentially a write-through cache, is used to acquire text embeddings from OpenAI for all the emails seen. The embeddings are also cached on disk. They constitute the training data for a RandomForest, and the destination folders are used as the classes for the classifier.

The model, once trained, is stored as well and is then used whenever a new email message is discovered, assuming that **ish** is running in non-interactive, polling mode. Once a new (unseen/unread) message is discovered, **ish** classifies it and moves it according to its prior experience (training).
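
A minimal sketch of the decision step described above, assuming the classifier yields one probability per destination folder. The 0.25 threshold mirrors the `top_probability > 0.25` check in `ish.py`; the function name `pick_destination` and the sample probabilities are made up for illustration and are not part of the repository.

```python
from typing import Optional

# Threshold taken from the `top_probability > 0.25` check in ish.py.
MIN_PROBABILITY = 0.25


def pick_destination(
    probabilities: dict[str, float], threshold: float = MIN_PROBABILITY
) -> Optional[str]:
    """Return the most probable folder, or None if the classifier is unsure.

    Hypothetical helper: the name and signature are assumptions.
    """
    if not probabilities:
        return None
    # Rank (probability, folder) pairs from most to least probable,
    # similar to the `ranks` list that ish.py logs.
    ranks = sorted(((p, folder) for folder, p in probabilities.items()), reverse=True)
    top_probability, dest_folder = ranks[0]
    return dest_folder if top_probability > threshold else None


# Illustrative probabilities only; real values come from the trained model.
print(pick_destination({"News": 0.62, "Travel": 0.30, "School": 0.08}))
print(pick_destination({"News": 0.25, "Travel": 0.25, "School": 0.25, "Notifications": 0.25}))
```

With four evenly matched folders no probability exceeds the threshold, so the message would be skipped rather than moved.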

**ish** can also be run in interactive mode. It will then try to move **all** messages from the source folder(s), but will ask the user about every message. This can be a good option when first training the model, and it helps ensure that you don't end up with email in random folders in a cold-start situation.
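
As a rough illustration of the interactive prompt, the sketch below mirrors the yes/no/quit choice that `ish.py` models with its `Action` enum. The `ask` parameter and the `.strip().lower()` normalisation are assumptions added here so the function can be tried without a terminal; they are not taken from the repository.

```python
from enum import Enum
from typing import Callable

# Same functional-API enum as in ish.py.
Action = Enum("Action", ["YES", "NO", "QUIT"])


def select_move(dest_folder: str, ask: Callable[[str], str] = input) -> Action:
    """Keep asking until the user answers y, n or q.

    `ask` is a hypothetical injection point; by default it is `input`.
    """
    answers = {"y": Action.YES, "n": Action.NO, "q": Action.QUIT}
    opt = ""
    while opt not in answers:
        prompt = f"Move message to {dest_folder}? [y]yes, [n]no, [q]quit:"
        opt = ask(prompt).strip().lower()
    return answers[opt]
```

Returning `Action.QUIT` instead of exiting lets the caller stop the loop and still move the messages that were already approved.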

- No rule programming. Instead, just move a few emails into a smart folder and **ish** will quickly learn what the messages have in common.
- Any folder can be labeled a smart folder.
- Uses the latest OpenAI language model technology to quickly sort emails into corresponding folders.
- Compatible with all imap email clients.
- Works for all common languages.

This is copied (but not forked) from @kenseehart/[imap_sorting_hat](https://github.com/kenseehart/imap_sorting_hat) to support some additional changes and experimentation for my own learning.

## Configuring

**ish** needs a directory in which to store e-mail text and cached embeddings, which are used to train the model. It also stores the model there as a pickle. By default, all of this is kept in a single directory, `${HOME}/.ish`:

```text
.ish
├── data
│   ├── embd
│   ├── model.pkl
│   └── msgs
└── settings.yaml
```
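
The layout above could be resolved in code roughly as follows. This is only a sketch: `ish_paths` is a hypothetical helper, and the reading of `msgs` as cached message text and `embd` as cached embeddings is inferred from the directory names. `ISH_CONFIG_PATH` is the environment variable that `ish.py` sets from `--config-path`.

```python
import os
from pathlib import Path


def ish_paths(environ=os.environ) -> dict[str, Path]:
    """Resolve the config directory and the entries beneath it.

    Hypothetical helper, not part of the repository.
    """
    # Fall back to ~/.ish when ISH_CONFIG_PATH is unset or empty.
    base = Path(environ.get("ISH_CONFIG_PATH") or Path.home() / ".ish")
    data = base / "data"
    return {
        "settings": base / "settings.yaml",
        "messages": data / "msgs",    # assumed: cached plain-text emails
        "embeddings": data / "embd",  # assumed: cached embeddings
        "model": data / "model.pkl",  # pickled classifier
    }


for name, path in ish_paths().items():
    print(f"{name}: {path}")
```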

### Example settings.yaml

```yaml
host: imap.mail.me.com
username: [email protected]
password: this-is-a-mock-password
source_folders:
- INBOX
destination_folders:
- News
- Notifications
- School
- Travel
ignore_folders:
- Archive
- Deleted Messages
- Drafts
- Sent Messages
- Junk
openai_api_key: ll-ddjg-RI51oFV-0Du9Xo4ERraVFd0UvcwFPP0wUkTB2tC
openai_model: text-embedding-3-small
```
## Running
### Command line
Run **ish** with
`python3 ish.py`
The main program accepts a few parameters:

```text
usage: ish.py [-h] [--learn-folders] [--interactive] [--dry-run] [--daemon] [--config-path CONFIG_PATH] [--verbose]

options:
  -h, --help            show this help message and exit
  --learn-folders, -l   Learn based on the contents of the destination folders
  --interactive, -i     Prompt user before moving anything
  --dry-run, -n         Don't actually move emails
  --daemon, -D          Run in daemon mode (NOT IMPLEMENTED)
  --config-path, -C CONFIG_PATH
                        Path for config file and data. Will default to /Users/fredriklilja/.ish
  --verbose, -v         Verbose/debug mode
```
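
To make the effect of `--dry-run` concrete, here is a simplified sketch of the pattern this commit adds to `move_messages`: in a dry run the intended move is only logged, and nothing is counted as moved. The `mover` callable stands in for the IMAP handler's `move` method and is an assumption made so the sketch runs without a mail server.

```python
import logging

logger = logging.getLogger("ish-sketch")


def move_messages(mover, folder, messages, dry_run=False) -> int:
    """Move messages grouped by destination folder; return how many were moved.

    `mover(folder, uids, dest_folder)` is a hypothetical stand-in for the
    real IMAP call.
    """
    moved = 0
    for dest_folder, messages_list in messages.items():
        uids = [mess["uid"] for mess in messages_list]
        if not uids:
            continue
        if dry_run:
            # Report what would happen, but leave the mailbox untouched.
            logger.info(
                "Dry run. WOULD have moved UID %s from %s to %s",
                uids,
                folder,
                dest_folder,
            )
            continue
        mover(folder, uids, dest_folder)
        moved += len(uids)
    return moved
```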

### In Docker

You can also run **ish** in a Docker container.
`docker run -it -v ./.ish:/opt/ish/config -e ISH_DAEMON=True -e ISH_DEBUG=True -e ISH_LEARN=True frli4797/ish`
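
A note on how these environment flags appear to be read: the `env_to_bool` line shown in the diff returns `os.environ.get(key) is not None`, so a variable counts as enabled as soon as it is set, whatever its value. The sketch below reproduces that check and contrasts it with a stricter variant; `env_to_bool_strict` and the `environ` parameter are hypothetical and not part of the repository.

```python
import os


def env_to_bool(key: str, environ=os.environ) -> bool:
    # Same test as the line shown in ish.py: the variable only has to exist.
    return environ.get(key) is not None


def env_to_bool_strict(key: str, environ=os.environ) -> bool:
    # Hypothetical alternative: only common "truthy" spellings enable the flag.
    return environ.get(key, "").strip().lower() in {"1", "true", "yes", "on"}


env = {"ISH_DAEMON": "False"}
print(env_to_bool("ISH_DAEMON", env))         # set, so treated as enabled
print(env_to_bool_strict("ISH_DAEMON", env))  # "False" is not a truthy spelling
```

Under the presence-only check, leaving a variable out of the `docker run` command is the way to disable it.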

## Take care

I make no guarantees that this will work with your mail provider, nor that it will work well for your language. This is an application that I've been tinkering with as I got fed up with sifting through my email and wanted to learn something new. Email might be destroyed, misplaced or lost when using this little tool. Take care. Make backups.

## Future development

- [x] Make it work (kind of).
- [x] Create command line parameters for the usual tasks and tweaks, such as training, inference, dry-run
- [x] Optimize embedding calls to OpenAI by batching
- [ ] Dockerize
- [x] Dockerize
- [ ] Use dev container
- [ ] Daemonize to be able to run this as a service
- [x] Service mode/daemon mode to be able to run this as a service, polling the imap server every so often
- [ ] Add Ollama as a potential source of embeddings

## Other

I made some experiments with other embedding models using Ollama. Unfortunately, the precision really suffered in these experiments, especially with a mailbox containing messages in multiple languages.

Thanks to: [@kenseehart](https://github.com/kenseehart)
7 changes: 4 additions & 3 deletions imap.py
@@ -73,10 +73,11 @@ def mesg_to_text(mesg: email.message.Message) -> str:


class ImapHandler:
    def __init__(self, settings: Settings) -> None:
    def __init__(self, settings: Settings, readonly=False) -> None:
        self.__settings = settings
        self.__imap_conn = None
        self.logger = logging.getLogger(self.__class__.__name__)
        self.__readonly = readonly

    def get_connection(self):
        return self.__imap_conn
@@ -199,7 +200,7 @@ def search(self, folder: str, search_args=None) -> list[int]:
    def __search(self, folder: str, search_args=None) -> list[int]:
        if search_args is None:
            search_args = ["ALL"]
        self.__imap_conn.select_folder(folder)
        self.__imap_conn.select_folder(folder, self.__readonly)
        results = self.__imap_conn.search(search_args)
        return results

@@ -234,7 +235,7 @@ def move(
                dest_folder,
            )
            raise ValueError("Expected uids to be a list")
        self.__imap_conn.select_folder(folder)
        self.__imap_conn.select_folder(folder, self.__readonly)
        if flag_messages:
            self.__imap_conn.add_flags(uids, [imapclient.FLAGGED])
        if flag_unseen:
113 changes: 61 additions & 52 deletions ish.py
@@ -13,6 +13,7 @@
Status: Early development
"""

from enum import Enum
import logging
import os
import shelve
@@ -53,6 +54,9 @@ def env_to_bool(key: str):
    return os.environ.get(key) is not None


Action = Enum("Action", ["YES", "NO", "QUIT"])


class ISH:
    debug = False
    _exit_event = Event()
@@ -62,7 +66,11 @@ class ISH:
    _daemon = False

    def __init__(
        self, interactive: bool = False, train: bool = False, daemon: bool = False
        self,
        interactive: bool = False,
        train: bool = False,
        daemon: bool = False,
        dry_run=False,
    ) -> None:
        self.logger = logging.getLogger(self.__class__.__name__)

@@ -75,6 +83,7 @@ def __init__(
        self._interactive = interactive
        self._train = train
        self._daemon = daemon
        self._dry_run = dry_run

        self.classifier: RandomForestClassifier = None
        self.moved = 0
@@ -379,7 +388,6 @@ def classify_messages(self, source_folders: List[str]) -> None:
            source_folders (List[str]): list of source folders
        """
        imap_conn: ImapHandler = self.__imap_conn

        self.skipped = 0
        self.moved = 0
        classifier = self.classifier
@@ -411,48 +419,44 @@ def classify_messages(self, source_folders: List[str]) -> None:
"body": mesgs[uid]["body"][0:100],
}
if top_probability > 0.25:
self.logger.info(
"\n%3i From %s: %s",
uid,
mess_to_move["from"],
mess_to_move["body"],
)
self._log_move(uid, "Going to move", ranks, mess_to_move)

for p, c in ranks[:3]:
self.logger.info("%.2f: %s", p, c)

if self.interactive and not self.__select_move(dest_folder):
self.logger.debug(
"""Skipping due to probability %.2f
%i From %s: %s""",
top_probability,
uid,
mess_to_move["from"],
mess_to_move["body"],
)
self.skipped += 1
continue
if self.interactive:
answer = self.__select_move(dest_folder)
if answer == Action.NO:
self.skipped += 1
continue
elif answer == Action.QUIT:
break

if dest_folder not in to_move:
to_move[dest_folder] = [mess_to_move]
else:
to_move[dest_folder].append(mess_to_move)

else:
self.logger.debug(
"""Skipping due to probability %.2f
%i From %s: %s""",
top_probability,
uid,
mess_to_move["from"],
mess_to_move["body"],
self._log_move(
uid, "Skipping due to probability", ranks, mess_to_move
)
self.skipped += 1

self.logger.info("Finished predicting %s", folder)
self.moved += self.move_messages(folder, to_move)
self.logger.info("Finished moved %i and skipped %i", self.moved, self.skipped)

def __select_move(self, dest_folder: str) -> bool:
def _log_move(self, uid, text, ranks, mess_to_move):
self.logger.info(
"%s\n%3i From %s: %s",
text,
uid,
mess_to_move["from"],
mess_to_move["body"],
)

for p, c in ranks[:3]:
self.logger.info("%.2f: %s", p, c)

def __select_move(self, dest_folder: str) -> Action:
"""Interactively ask user if to move.
Args:
@@ -465,12 +469,11 @@ def __select_move(self, dest_folder: str) -> bool:
while opt not in ["y", "n", "q"]:
opt = input(f"Move message to {dest_folder}? [y]yes, [n]no, [q]quit:")
if opt == "y":
return True
return Action.YES
if opt == "q":
self.logger.info("Quitting.")
sys.exit(0)
else:
return False
return Action.QUIT
return Action.NO

def move_messages(self, folder: str, messages: dict[str, list]) -> int:
"""Move the messages marked for moving, by target folder.
@@ -487,16 +490,23 @@ def move_messages(self, folder: str, messages: dict[str, list]) -> int:
for dest_folder in messages:
messages_list = messages[dest_folder]
uids: list = [mess["uid"] for mess in messages_list]
if len(uids) > 0:
imap_conn.move(
folder,
if not self._dry_run:
if len(uids) > 0:
imap_conn.move(
folder,
uids,
dest_folder,
flag_messages=True,
flag_unseen=not self.interactive,
)
moved += len(uids)
else:
self.logger.info(
"Dry run. WOULD have moved UID %s from %s to %s",
uids,
folder,
dest_folder,
flag_messages=True,
flag_unseen=not self.interactive,
)
moved += len(uids)

return moved

def run(self) -> int:
@@ -527,12 +537,6 @@ def run(self) -> int:
new_var = 10
self._exit_event.wait(POLL_TIME_SEC)

# except Exception as e:
# base_logger.error("Something went wrong. Unknown error.")
# base_logger.info(e, stack_info=True)
# return -1
# finally:
# self.close()
return 0

def close(self):
@@ -559,14 +563,14 @@ def __do_exit(self, signum, frame):
def main(args: Dict[str, str]):
    ISH.debug = bool(args.pop("verbose"))
    dry_run = bool(args.pop("dry_run")) # noqa: F841
    daemonize = bool(args.pop("daemon")) # noqa: F841
    daemonize = bool(args.pop("daemon"))
    interactive = bool(args.pop("interactive"))
    train = bool(args.pop("learn_folders"))
    config_path = args.pop("config_path")
    if config_path is not None and not config_path == "":
        os.environ["ISH_CONFIG_PATH"] = config_path

    ish = ISH(interactive=interactive, train=train, daemon=daemonize)
    ish = ISH(interactive=interactive, train=train, daemon=daemonize, dry_run=dry_run)
    r = ish.run()
    sys.exit(r)

@@ -576,7 +580,12 @@ def main(args: Dict[str, str]):
import argparse

userhomedir = Settings.get_user_directory()
parser = argparse.ArgumentParser(description="Lorem ipsum")
parser = argparse.ArgumentParser(
description="""Magically sort email into smart folders.
**ish** works by downloading plain text versions of all the \
emails in the source email folders and move those unread to \
the destination folders, by using a multi class classifier."""
)
# Environment variables always takes precedence.
parser.add_argument(
"--learn-folders",
@@ -605,9 +614,9 @@ def main(args: Dict[str, str]):
    parser.add_argument(
        "--daemon",
        "-D",
        help="Run in daemon mode (NOT IMPLEMENTED)",
        help="Run in daemon/polling mode",
        action="store_true",
        default=bool(os.environ.get("ISH_DAEMON")),
        default=env_to_bool("ISH_DAEMON"),
    )

    parser.add_argument(