An example of a real-world DS&A problem, commonly referred to as N+1 Queries.
@bryant-finney has encountered this problem several times in his career, and this example demonstrates how to recognize it in production.
A team of developers has built a web application that cat owners use to...
- browse veterinarians at participating clinics
- schedule and track their cats' veterinary appointments with their preferred provider
The application is built using a Python web framework, and the team is using an ORM to interact with the database. This repo emulates a portion of the application, defining the database models and a connection protocol for reading from the database.
The team has recently launched a new Success Stories page, which advertizes veterinary success stories from different clinics and pet owners. We tested the page extensively in a QA environment, and we used an A/B testing strategy to verify things worked well in production (for a small subset of users).
However, we were shocked to discover one morning that the production environment was down. All servers were had plateaued at 25% CPU utilization, and the application was completely unresponsive. Then we connected to the database, and we saw it was executing thousands of SQL queries every minute.
From here, we connected to a server hosting the application and concluded the issue was caused by the N+1 Queries problem.
See if you can spot it here in the code! Then, discuss ways we could fix it.
The data structures used in this problem are located in src/models.py. These structures model the data stored in the database (called an ORM, or object-relational mapping). The database schema is shown below:
src/db.py contains the code for connecting to the database and selecting data from it. In order to populate content in the new Success Stories page, we needed write an algorithm to select all cats that had been treated at a given clinic (so that the front end to display the cats' success stories). We implemented this algorithm in the db.Connection.get_cats_seen_at():
def get_cats_seen_at(self, clinic_name: str) -> list[models.Cat]:
"""Return all cats seen at the given clinic."""
clinic = self.get_clinic(clinic_name)
veterinarians = self.get_veterinarians(clinic)
all_appointments = [
appointment
for veterinarian in veterinarians
for appointment in self.get_appointments(veterinarian)
]
all_cats: list[models.Cat] = []
for appointment in all_appointments:
cat = self.get_cat(appointment.cat_id)
if cat not in all_cats:
all_cats.append(cat)
return all_cats
Furthermore, we tested to make sure everything works in the notebooks/testing.ipynb notebook.
This project requires the following system dependencies. Follow each link to open the installation pages:
- git-lfs: used for storing the large datasets in data/
- After installing
git-lfs
, rungit lfs install
to initialize it
- After installing
- Python 3.8+
- Poetry
- There are multiple ways to install it; using pipx works well
- To install
pipx
:# the following assumes python3 is installed on your system python3 -m pip install --user pipx && python3 -m pipx ensurepath
- And then to install
poetry
:pipx install poetry
With poetry
installed, you can now clone the project and install its dependencies:
git clone https://github.com/bryant-finney/python-dsa.git &&
cd python-dsa
# install dependencies to a virtual environment
poetry install
# to activate the virtual environment, run...
poetry shell
Using with VSCode
This project is set up to work with VSCode. Just run the following command to launch the project:
code python-dsa.code-workspace
When the editor opens, press ⌘ + ⇧ + x
(mac) / ctrl + shift + x
(windows) to open the extensions view and search @recommended
to list extensions recommended for this project.