Developer¶
To simplify the continuous development of the gmailsorter service, the whole development is based on the gmailsorter
python module. While this module is used in the webservice, the docker container and the stand-alone python application
the later two also allow loading the current state of the model directly.
Python Interface¶
Just install the gmailsorter
python package and then import the Gmail
class and the function load_client_secrets_file
from the gmailsorter
module:
from gmailsorter import Gmail, load_client_secrets_file
Initialize gmailsorter¶
Create a gmail
object from the Gmail()
class:
gmail = Gmail(
client_config=load_client_secrets_file(
client_secrets_file="/absolute/path/to/credentials.json"
),
connection_str="sqlite:////absolute/path/to/email.db",
)
Based on the configuration from the previous section, the function load_client_secrets_file
is used to load the
credentials.json
file and provide its content as python dictionary to the client_config
parameter of the Gmail()
class. In addition to the client_config
parameter the Gmail()
class also requires a connection to an SQL database
which is provided as connection_str
. In addition the email_download_format
can be specified as either metadata
or
full
, where the primary difference is whether the content of the email is stored or not.
Sync local database with email account¶
To reduce the communication overhead, the emails are stored locally in an SQLite database.
gmail.update_database(quick=False)
By setting the optional flag quick
to True
only new emails are downloaded while changes to existing emails are
ignored.
Generate pandas dataframe for emails¶
Load all emails from the local SQLite database and combine them in a pandas DataFrame for further postprocessing:
df = gmail.get_all_emails_in_database()
Download specific label from email server¶
Download emails with the label "MyLabel"
from the email server:
df = gmail.download_emails_for_label(label="MyLabel")
In this case the emails are not stored in the local SQLite database.
Filter emails using machine learning¶
Assign new email labels to the emails with the label "MyLabel"
:
gmail.filter_messages_from_server
label="MyLabel",
recommendation_ratio=0.9,
)
This functionality is based on the download_emails_for_label()
function above. It checks the server for new emails for
a selected label "MyLabel"
. Then reloads the machine learning model from the local SQLite database and trys to predict
the correct labels for these emails. The recommendation_ratio
defines the level of certainty required to actually move
the email, with 0.9
equalling a certainty of 90%.
Future directions¶
The current machine learning model is limited in the precision and memory usage. So there is a great interest to replace it with a computationally more efficient model. All suggestions and feedback are welcome. Beyond the optimization of the machine learning model and general improvements to the stability of the code base, the extension to other email services would be great but is currently on hold based on limited resources.