Merge pull request #12 from EBI-Metagenomics/feature/cazy-annotations
CAZyme annotations on MAGs
SandyRogers authored Jul 8, 2024
2 parents a98c3ce + 686a513 commit 21fd0a0
Showing 22 changed files with 323 additions and 11 deletions.
7 changes: 4 additions & 3 deletions .coveragerc
@@ -1,7 +1,8 @@
 [run]
 omit =
-    *tests*
-    *migrations*
+    */tests/*
+    */migrations/*
     *settings*
     *asgi.py
-    *wsgi.py
+    *wsgi.py
+    *generate_dev_data.py
1 change: 1 addition & 0 deletions .gitignore
@@ -18,6 +18,7 @@ __pycache__
 .coverage
 .pytest_cache
 test-results
+screenshot_website*

 # Elastic Beanstalk Files
 .elasticbeanstalk/*
Binary file added docs/img/website/mag-annotations.png
Binary file modified docs/img/website/mag-catalogue.png
Binary file added docs/img/website/mag-containment.png
31 changes: 31 additions & 0 deletions docs/website.qmd
@@ -125,6 +125,37 @@ MAGs can be found by searching on accession or taxonomy, or for the accession of

The MAGs in a catalogue can be downloaded as a TSV file, using the "Download all as TSV" button.

#### MAG Annotations
MGnify Genomes catalogues use a standardised pipeline ([Gurbich et al. 2023](https://europepmc.org/article/MED/36806692)) to annotate
the assembled genomes with various tools.
The annotations are computed on the species-level cluster representative genomes,
and all of them can be accessed via the data portal’s links to MGnify.

Given the HoloFood project's aims, [CAZy](http://www.cazy.org) (Carbohydrate-Active enZymes) annotations are particularly relevant to
the HoloFood MAG catalogues.

A summary of the CAZy annotations, in the form of counts per CAZy category, is therefore shown on the detail view of each MAG.
(Note that, like all MAG annotations, these CAZy annotations refer to the MAG's cluster representative genome, not necessarily the HoloFood-data-derived MAG itself.)

![Screenshot of a MAG’s detail page, including CAZy annotations](/img/website/mag-annotations.png)

These annotations are also available via each genome's [API](api.ipynb) endpoint.
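
As a rough illustration of what a client might do with these annotations, the sketch below reads the CAZy counts out of a genome payload. The nested `annotations` → `cazy` shape matches what the API test in this commit asserts; the accession and the exact JSON text are illustrative assumptions, not a real response.

```python
import json

# Illustrative payload only: the accession is a placeholder, and the counts
# mirror the test fixture added in this commit.
example_response = """
{
  "accession": "MGYG000000001",
  "annotations": {"cazy": {"GH": 6, "PL": 5, "CE": 4, "AA": 3, "CB": 2, "GT": 1}}
}
"""


def cazy_counts(payload: dict) -> dict:
    """Return the per-category CAZy counts, or an empty dict if absent."""
    return dict(payload.get("annotations", {}).get("cazy", {}))


counts = cazy_counts(json.loads(example_response))
print(counts["GH"])
```

Using chained `.get()` calls with empty-dict defaults keeps the helper safe for genomes that were imported without a CAZy annotations file.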

#### MAG containment within samples
To facilitate the linking of MAGs to other samples within the HoloFood dataset, the data portal also includes a list of "containments" for each MAG within all of the project’s metagenomic samples.

For each MAG, a sample list was found using Mastiff (a tool based on [sourmash](https://sourmash.bio), [Irber et al. 2022](https://www.biorxiv.org/content/10.1101/2022.11.02.514947v1)).
Each sample in this list contains the MAG at some level: the containment is the fraction of the MAG’s k-mers that are present in the sample’s sequencing reads.
The list can be filtered to find only the samples that contain the MAG above some minimum containment threshold.
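
The containment idea can be sketched with exact k-mer sets. This is only a toy version of the concept: Mastiff and sourmash work on sampled sketches of k-mers rather than full sets, and the function names, the tiny `k`, and the "at or above" threshold rule here are assumptions made for illustration.

```python
def kmers(sequence: str, k: int) -> set:
    """Return the set of distinct k-mers in a sequence."""
    return {sequence[i : i + k] for i in range(len(sequence) - k + 1)}


def containment(mag_sequence: str, sample_reads: list, k: int = 4) -> float:
    """Fraction of the MAG's distinct k-mers found in any of the sample's reads."""
    mag_kmers = kmers(mag_sequence, k)
    if not mag_kmers:
        return 0.0
    sample_kmers = set()
    for read in sample_reads:
        sample_kmers |= kmers(read, k)
    return len(mag_kmers & sample_kmers) / len(mag_kmers)


def samples_above(containments: dict, minimum: float) -> dict:
    """Keep only samples whose containment is at or above the threshold."""
    return {sample: c for sample, c in containments.items() if c >= minimum}


# "ACGTACGT" has four distinct 4-mers; the single read covers two of them.
print(containment("ACGTACGT", ["ACGTA"], k=4))
```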

![Screenshot of a MAG’s sample containment list](/img/website/mag-containment.png)

These sample containment lists are also available via each genome's [API](api.ipynb) endpoint, as well as via the TSV export option above the table.

Together with the [MAG’s CAZy annotations](website.qmd#mag-annotations), this feature means the prevalence of
carbohydrate-active enzymes can be compared at the genome level between samples originating from animals
under different experimental conditions.

### Viral Catalogues
![Screenshot of a viral catalogue](/img/website/viral-catalogue.png)
Viral catalogues are lists of the unique (at species-level) viruses found in HoloFood samples.
8 changes: 7 additions & 1 deletion holofood/api.py
@@ -192,7 +192,13 @@ def resolve_representative_url(obj: Genome):

     class Config:
         model = Genome
-        model_fields = ["accession", "cluster_representative", "taxonomy", "metadata"]
+        model_fields = [
+            "accession",
+            "cluster_representative",
+            "taxonomy",
+            "metadata",
+            "annotations",
+        ]


 class GenomeSampleContainmentSchema(ModelSchema):
15 changes: 15 additions & 0 deletions holofood/filters.py
@@ -16,6 +16,7 @@
 from django.forms import NumberInput
 from django.utils.safestring import mark_safe

+from holofood.forms import CazyAnnotationsFilterForm
 from holofood.models import (
     Sample,
     Genome,
@@ -123,13 +124,27 @@ class Meta:
 class GenomeFilter(django_filters.FilterSet):
     class Meta:
         model = Genome
+        form = CazyAnnotationsFilterForm

         fields = {
             "accession": ["icontains"],
             "cluster_representative": ["icontains"],
             "taxonomy": ["icontains"],
         }
+
+    def filter_queryset(self, queryset):
+        qs = queryset
+        for name, value in self.form.cleaned_data.items():
+            if name in self.filters:
+                qs = self.filters[name].filter(qs, value)
+
+        filters = Q()
+        if self.data:
+            cazy_annotations = self.data.getlist("cazy_annotations")
+            for key in cazy_annotations:
+                filters &= Q(**{f"annotations__cazy__{key}__gt": 0})
+        return qs.filter(filters)


 class GenomeSampleContainmentFilter(django_filters.FilterSet):
     minimum_containment = django_filters.NumberFilter(
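
The `filter_queryset` override above ANDs one `annotations__cazy__<key>__gt=0` condition per ticked checkbox, so a genome matches only if every selected category has a positive count. A plain-Python sketch of the same semantics, with no Django involved, might look like this; `has_all_categories` is a hypothetical helper, not part of the codebase.

```python
def has_all_categories(annotations: dict, selected: list) -> bool:
    """True if every selected CAZy category has a count greater than zero.

    Mirrors AND-ing one "count > 0" condition per selected key; with no
    selection the (empty) condition matches everything.
    """
    cazy = annotations.get("cazy", {})
    return all(cazy.get(key, 0) > 0 for key in selected)


# Counts taken from the test fixture added in this commit.
genome_annotations = {
    "cazy": {"GH": 6, "PL": 5, "CE": 4, "AA": 3, "CB": 2, "GT": 1, "CL": 0}
}

print(has_all_categories(genome_annotations, ["GH", "PL"]))
print(has_all_categories(genome_annotations, ["GH", "CL"]))
```

A category that is absent from the JSON is treated here like a zero count, which is the assumed behaviour of the database lookup when the key is missing.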
75 changes: 75 additions & 0 deletions holofood/forms.py
@@ -0,0 +1,75 @@
+from django import forms
+from django.forms.widgets import SelectMultiple
+from django.utils.html import format_html
+from django.utils.encoding import force_str
+
+
+class CazyCheckboxesWidget(SelectMultiple):
+    def __init__(self, *args, **kwargs):
+        super().__init__(*args, **kwargs)
+        self.choices = (
+            [
+                ("GH", "Glycoside Hydrolase"),
+                ("CB", "Carbohydrate-Binding"),
+                ("PL", "Polysaccharide Lyases"),
+                ("CE", "Carbohydrate Esterases"),
+                ("AA", "Auxiliary Activities"),
+                ("GT", "GlycosylTransferases"),
+            ],
+        )
+
+    def render(self, name, value, attrs=None, renderer=None):
+        output = []
+        if not isinstance(value, list):
+            value = [value]
+        for option_value, option_label in self.choices:
+            final_attrs = self.build_attrs(self.attrs, attrs)
+            final_attrs["type"] = "checkbox"
+            final_attrs["name"] = name
+            final_attrs["value"] = option_value
+            final_attrs["id"] = "id_%s_%s" % (name, option_value)
+
+            if value and force_str(option_value) in value:
+                final_attrs["checked"] = "checked"
+            else:
+                final_attrs.pop("checked", None)
+
+            output.append(
+                format_html(
+                    '<div class="hf-checkbox">'
+                    '<input{} /> <label for="{}">{}</label>'
+                    "</div>",
+                    format_html(
+                        "".join(
+                            f' {key}="{value}"' for key, value in final_attrs.items()
+                        )
+                    ),
+                    final_attrs["id"],
+                    option_label,
+                )
+            )
+
+        return format_html("".join(output))
+
+
+class CazyAnnotationsFilterForm(forms.Form):
+    field_order = [
+        "accession__icontains",
+        "cluster_representative__icontains",
+        "taxonomy__icontains",
+        "cazy_annotations",
+    ]
+    cazy_annotations = forms.MultipleChoiceField(
+        choices=[
+            ("GH", "Glycoside Hydrolase"),
+            ("CB", "Carbohydrate-Binding"),
+            ("PL", "Polysaccharide Lyases"),
+            ("CE", "Carbohydrate Esterases"),
+            ("AA", "Auxiliary Activities"),
+            ("GT", "GlycosylTransferases"),
+        ],
+        widget=CazyCheckboxesWidget,
+        required=False,
+        label="CAZy Annotations present",
+        help_text="Annotated on species rep.",
+    )
20 changes: 20 additions & 0 deletions holofood/management/commands/import_mag_catalogue.py
@@ -42,6 +42,11 @@ def add_arguments(self, parser):
             type=str,
             help="System (chicken/salmon) of the catalogue (or None to copy from related MAG catalogue)",
         )
+        parser.add_argument(
+            "--representatives_cazy_annotations_file",
+            type=argparse.FileType("r"),
+            help="Optional path to a TSV file listing cazy annotations for the cluster representative MAGs.",
+        )

     @staticmethod
     def _parse_taxonomic_lineage(lineage_string: str) -> str:
@@ -79,6 +84,17 @@ def handle(self, *args, **options):
             system=options["system"],
         )
         logging.info(f"Created MAG {catalogue=}")
+
+        cazy_file = options["representatives_cazy_annotations_file"]
+        cazy_annotations = {}
+        if cazy_file:
+            cazy_reader = DictReader(cazy_file, delimiter="\t")
+            for row in cazy_reader:
+                cazy_annotations.setdefault(row["Genome"], {})[
+                    row["CAZy_category"]
+                ] = int(row["Counts"])
+            cazy_file.close()
+
         for mag in reader:
             mag_data = {
                 field_name: mag[col_name]
@@ -90,6 +106,10 @@ def handle(self, *args, **options):
                 mag_data["taxonomy"]
             )

+            mag_data["annotations"] = {
+                "cazy": cazy_annotations.get(mag_data["cluster_representative"], {})
+            }
+
             metadata = {
                 col_name: col_val
                 for col_name, col_val in mag.items()
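
The import step above folds a long-format TSV (one row per genome and CAZy category) into a nested dict keyed by representative genome. A standalone sketch of that parsing, using only the standard library, could look like the following; `parse_cazy_tsv` is a hypothetical helper name, and the in-memory TSV (including the unnamed leading index column) is an assumption about the file layout.

```python
import csv
import io


def parse_cazy_tsv(handle) -> dict:
    """Build {genome: {cazy_category: count}} from a long-format TSV."""
    annotations = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        annotations.setdefault(row["Genome"], {})[row["CAZy_category"]] = int(
            row["Counts"]
        )
    return annotations


# Assumed layout: an unnamed index column followed by the three named columns.
example_tsv = (
    "\tGenome\tCAZy_category\tCounts\n"
    "0\tMGYG000290000\tGH\t6\n"
    "1\tMGYG000290000\tPL\t5\n"
)

print(parse_cazy_tsv(io.StringIO(example_tsv)))
```

Genomes that never appear in the file simply have no entry, which is why the import command falls back to an empty dict with `.get(..., {})`.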
17 changes: 17 additions & 0 deletions holofood/migrations/0038_genome_annotations.py
@@ -0,0 +1,17 @@
+# Generated by Django 4.2 on 2024-07-02 16:01
+
+from django.db import migrations, models
+
+
+class Migration(migrations.Migration):
+    dependencies = [
+        ("holofood", "0037_genomesamplecontainment"),
+    ]
+
+    operations = [
+        migrations.AddField(
+            model_name="genome",
+            name="annotations",
+            field=models.JSONField(blank=True, default=dict),
+        ),
+    ]
1 change: 1 addition & 0 deletions holofood/models.py
@@ -456,6 +456,7 @@ class Genome(models.Model):
     )
     taxonomy = models.CharField(max_length=200)
     metadata = models.JSONField(default=dict, blank=True)
+    annotations = models.JSONField(default=dict, blank=True)

     class Meta:
         ordering = ("accession",)
3 changes: 3 additions & 0 deletions holofood/tests/conftest.py
@@ -966,6 +966,9 @@ def create_genome_objects(sample: Sample) -> GenomeCatalogue:
             "catalogue": catalogue,
             "taxonomy": "Root > Foods > Donuts > Sugar Monster",
             "metadata": {},
+            "annotations": {
+                "cazy": {"GH": 6, "PL": 5, "CE": 4, "AA": 3, "CB": 2, "GT": 1, "CL": 0}
+            },
         },
     )

8 changes: 8 additions & 0 deletions holofood/tests/static_fixtures/mag-catalogue-cazy.tsv
@@ -0,0 +1,8 @@
Genome CAZy_category Counts
0 MGYG000290000 GH 6
1 MGYG000290000 PL 5
2 MGYG000290000 CE 4
3 MGYG000290000 AA 3
4 MGYG000290000 CB 2
5 MGYG000290000 GT 1
6 MGYG000290000 CL 0
1 change: 1 addition & 0 deletions holofood/tests/test_api.py
@@ -388,6 +388,7 @@ def test_mag_catalogues(client, chicken_mag_catalogue):
         "sample": "SAMEA00000006",
         "containment": 0.7,
     }
+    assert data.get("annotations", {}).get("cazy", {}).get("GH") == 6


 @pytest.mark.django_db
1 change: 1 addition & 0 deletions holofood/tests/test_import_comands.py
@@ -161,6 +161,7 @@ def test_import_mag_catalogue():
         "public-donut-v1-0",
         "Donut Surface",
         "chicken",
+        f"--representatives_cazy_annotations_file={tests_path}/static_fixtures/mag-catalogue-cazy.tsv",
     )
     logging.info(out)

25 changes: 22 additions & 3 deletions holofood/tests/test_website.py
@@ -60,7 +60,7 @@ def test_web(self, m):

         wait = WebDriverWait(self.selenium, 10)

-        # # ---- Home page ---- #
+        # ---- Home page ---- #
         self.selenium.get(self.live_server_url)
         self.selenium.add_cookie(
             {
@@ -381,8 +381,27 @@ def test_web(self, m):
         )
         self.assertIn("export", export_link.get_attribute("href"))

+        # chart should be showing cazys. test accessible/aria version of chart rather than svg.
+        cazy_accessible_table = self.selenium.find_element(
+            by=By.XPATH, value="//*[@id='cazy_chart']//table/tbody"
+        )
+        self.assertEqual(
+            len(cazy_accessible_table.find_elements(by=By.TAG_NAME, value="tr")), 6
+        )
+
+        # element is hidden so use selenium script to get text
+        first_cazy_label = self.selenium.execute_script(
+            "return document.evaluate(\"//*[@id='cazy_chart']//table/tbody/tr[1]/td[1]\", document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null).singleNodeValue.textContent;"
+        )
+        self.assertEqual(first_cazy_label.strip(), "Glycoside Hydrolase")
+
+        first_cazy_count = self.selenium.execute_script(
+            "return document.evaluate(\"//*[@id='cazy_chart']//table/tbody/tr[1]/td[2]\", document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null).singleNodeValue.textContent;"
+        )
+        self.assertEqual(first_cazy_count.strip(), "6")
+
         # should be one sample containing this MAG
-        table = self.selenium.find_element(by=By.TAG_NAME, value="tbody")
+        table = self.selenium.find_element(by=By.CLASS_NAME, value="vf-table__body")
         self.assertEqual(len(table.find_elements(by=By.TAG_NAME, value="tr")), 1)

         # change containment to very high, so MAG is contained sufficiently in NO samples
@@ -397,7 +416,7 @@ def test_web(self, m):
             "minimum_containment=0.9",
             self.selenium.current_url,
         )
-        table = self.selenium.find_element(by=By.TAG_NAME, value="tbody")
+        table = self.selenium.find_element(by=By.CLASS_NAME, value="vf-table__body")
         self.assertEqual(table.size["height"], 0)

         # ---- Viral catalogues ---- #
17 changes: 17 additions & 0 deletions holofood/views.py
@@ -326,6 +326,23 @@ def get_context_data(self, **kwargs):
         context["catalogue"] = get_object_or_404(
             GenomeCatalogue, id=self.kwargs.get("catalogue_pk")
         )
+        cazy = self.object.annotations.get("cazy")
+        if cazy:
+            cazy_categories = {
+                "GH": "Glycoside Hydrolase",
+                "CB": "Carbohydrate-Binding",
+                "PL": "Polysaccharide Lyases",
+                "CE": "Carbohydrate Esterases",
+                "AA": "Auxiliary Activities",
+                "GT": "GlycosylTransferases",
+            }
+            context["cazy_annotations"] = {
+                cazy_categories[cat]: count
+                for cat, count in cazy.items()
+                if cat in cazy_categories
+            }
+        else:
+            context["cazy_annotations"] = {}
         return context


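
The view change above relabels the stored category codes with human-readable names and silently drops any code it does not recognise (such as `CL` in the test fixture). A self-contained sketch of that mapping, outside of Django, might be written as follows; the module-level constant and the function name are illustrative assumptions.

```python
from typing import Optional

# Same code-to-label pairs as in the view; held in a constant for this sketch.
CAZY_CATEGORIES = {
    "GH": "Glycoside Hydrolase",
    "CB": "Carbohydrate-Binding",
    "PL": "Polysaccharide Lyases",
    "CE": "Carbohydrate Esterases",
    "AA": "Auxiliary Activities",
    "GT": "GlycosylTransferases",
}


def label_cazy_counts(cazy: Optional[dict]) -> dict:
    """Map category codes to display labels, skipping unknown codes."""
    if not cazy:
        return {}
    return {
        CAZY_CATEGORIES[code]: count
        for code, count in cazy.items()
        if code in CAZY_CATEGORIES
    }


print(label_cazy_counts({"GH": 6, "PL": 5, "CL": 0}))
```

With the fixture's seven codes, six survive the mapping, which is consistent with the website test expecting six rows in the accessible chart table.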
6 changes: 3 additions & 3 deletions requirements-docs.txt
@@ -1,3 +1,3 @@
-jupyterlab==3.4.8
-pandas==1.5.1
-matplotlib==3.6.1
+jupyterlab==4.2.3
+pandas==2.2.2
+matplotlib==3.9.1