- Check chromosomes in variant locations: watch for scaffold, null, and non-autosomal chromosomes for Goat and Sheep
- Rename objects (use names in a consistent way, e.g. TOP, BOT)
- Release a smarter coordinate version with information on every variant defined in database (which will be used as reference)
- Map affymetrix snps in OARV3 coordinates
- Check if ``rs_id`` is still valid or not (with EVA)
- Fix issues with sample countries using reverse geocoding (#112)
  * fix country for Merino (Sheep)
  * fix country for Sumavska (Sheep)
  * fix GPS location for Latxa (Sheep)
  * fix country for Karakul (Sheep)
  * fix country for Romanov (Sheep)
  * fix country for Suffolk (Sheep)
  * fix country for Texel (Sheep)
- Manage python packages with poetry (#128)
- Add data for Guisandesa goats (#117)
- Rename ``manifacturer`` into ``manufacturer``
- Convert genotypes from top to forward (#111)
- Update dependencies
- Load phenotypes for Fosses, Provencale goat breeds
- Add sex for Fosses, Provencale goat breeds
- Add sex while importing metadata
- Load multiple phenotypes for Boutsko foreground sheep
- Add multiple phenotypes as a list (#103)
- Update datasets metadata
- Update dependencies
- Capitalize ``species_class`` parameter in ``src.data.import_breeds.py``
- Generate output files for OARV4 and CHIR1 (#87)
- Import data from dbSNP152 (#15)
- Import data from IGGC (#18)
- Split ``import_consortium.py`` into ``import_isgc.py`` and ``import_iggc.py`` to import data from Sheep and Goat genome consortia respectively
- Force data update when importing from consortium
- Track date when importing from consortium
- Determine ``illumina_top`` data directly from variant for Sheep when importing from consortium data
- Uniform note metadata field (add a note parameter to metadata import)
- Import data from Cortellari et al 2021 (https://doi.org/10.1038/s41598-021-89900-2)
- Import data from Burren et al 2016 (https://doi.org/10.1111/age.12476)
- Revise illumina A/B genotype tracking
- Import from Illumina report with only 3 columns in SNP list file
- Update dependencies
- Import background data from Gaouar et al 2017 (https://doi.org/10.1038/hdy.2016.86)
- Import from plink with illumina coding (as specified in manifest: neither top nor forward)
- Import background data from Belabdi et al 2019 (https://doi.org/10.1038/s41598-019-44137-y)
- Import background data from Ciani et al 2020 (https://doi.org/10.1186/s12711-020-00545-7)
- Import background data from Barbato et al 2017 (https://doi.org/10.1038/s41598-017-07382-7)
- Update species for European mouflon
- Support species update with ``import_metadata.py``
- Import 18 Welsh breeds as background genotypes
- Rename two Welsh breeds
- Model doi in datasets
- Upgrade CI workflows to ``actions/cache@v3``
- Add ``SNPconvert.py`` script
- Import genotypes of other WPs coming from Uruguay
- Deal with affymetrix report with fewer SNPs than declared
- Add an option to skip coordinate check when importing affymetrix report
- Import from affymetrix a limited number of samples
- Skip sample creation when there's no alias
- Support for missing columns in affymetrix report files
- Support invalid python names in ``src.features.affymetrix.read_affymetrixRow``
- Update requirements
- Deal with missing files in ``import_datasets.py``
- Update Uruguay metadata locations
- Move Galway sheep to Ireland country (Ovine HapMap)
- Update requirements
- Read from affymetrix A/B reportfile
- Import latest Uruguayan data (#65)
- Configure database connection (#66)
- Update sex in ped file if the information is available in database
- Enable continuous integration for documentation (ReadTheDocs)
- Update documentation
- Track full species information in Sample (support for multi-species sheep and goats)
- Updated isheep exploration notebooks
- Deal with unknown countries and species
- Fix issues related to aliases when creating samples or adding metadata
- Fetch variants using positions
- Import from plink using genomic coordinates
- Import 50K, 600K and WGS isheep datasets (#47)
- Fix issue in ``src.features.plinkio.plink_binary_exists``
- Code refactoring in ``src.features.plinkio``
- Import data from Sheep HapMap V2
- Update requirements
- Import data from Hungary (#53)
- Create a new sample when having the same ``original_id`` in dataset but for a different breed
- ``illumina_top`` is an attribute of variant, and is set when the first location is loaded.
- Check variants data before update (#56)
- Simplified ``import_affymetrix`` script
- Import custom affymetrix chips (Oar_v3.1)
- Support source and destination assemblies when importing from plink or affymetrix source files
- Deal with spaces in filenames while importing from plink
- Add ``affy_snp_id`` primary key
- Update ``import_affymetrix.py`` script
- Import data from Spain (#52)
- Fix 20220503 dataset breed and churra chip name
- Track manifest probe ``sequence`` values by ``chip_name``
- Track ``probeset_id`` by ``chip_name``
- Search for affymetrix ``probeset_id`` in the proper ``chip_name`` while importing samples
- Track multiple ``rs_id``
- Fetch churra coordinates by ``rs_id`` and ``probeset_id`` and filter out unmanaged SNPs
- If ``src_dataset`` and ``dst_dataset`` are equal, provide only ``src_dataset``
- Model location with ``MultiPointField``
- Describe smarter metadata
- Import Sweden goat metadata
- Import latest 290 samples Greek dataset
- Fix issue with Greek sample names (``B273`` converted into ``B273A``)
- Add latest 19 Greek sheep samples
- Add a country collection
- Update dependencies
- Add 270 Frizarta background samples
- Import from ab plink and support multiple missing letters
- Track database status and constants
- Add foreground/background type attribute in ``SampleSpecies``
- Update dependencies
- Add make rule to pack results and make checksum
- Move Greek foreground metadata to a custom phenotypes dataset
- Update Greek foreground metadata
- Import phenotypes from Uruguay
- Import phenotypes using alias
- Allow phenotypes for ambiguous sex animals
- Import French goat foreground dataset
- Pin ``plinkio`` to support extra-chroms in plink binary files
- Import 5 Sweden Sheep background genotypes
- Force half-missing SNPs to be MISSING
- Add the README.txt.ftp
- Bug fixed in importing multibreed reportfile (setting FID properly in output)
- Set nullable ``ListField`` for sample locations and variant consequences
- Capitalize phenotype values (e.g. milk -> Milk)
- Import Greek chios-mytilini-boutsko sheep dataset
- Track multiple locations for a sample (deal with transhumant breeds)
- Import Greek skopelios-eghoria goat dataset
- Use sample data to deal with multi breeds illumina row files
- Determine fid from database with IlluminaReportIO
- Import Greek frizarta-chios-pelagonia sheep dataset
- Import Greek frizarta-chios sheep dataset
- Import Sweden foreground goat dataset
- Update ADAPTmap breed names and phenotypes import
- Check that breed exists while inserting phenotype data
- Import French foreground sheep dataset
- Use ``elemMatch`` in projection in ``plinkio.SmarterMixin.fetch_coordinates`` (ex: ``VariantSheep.objects.fields(elemMatch__locations={"imported_from": "SNPchiMp v.3", "version": "Oar_v4.0"})``)
- Use ``elemMatch`` to search a SNP within the desired coordinate systems in ``plinkio.SmarterMixin.fetch_coordinates``
- Skip SNPchimp indels when importing from SNPchimp
- Skip illumina indels when reading from manifest
- Add ``chip_name`` in Dataset (database value, not user value)
- Skip ``null`` fields when importing datasets
- Import Uruguay sheep affymetrix data
- Import from affymetrix dataset
- Rely on original affymetrix coordinate system to determine illumina top alleles
- Search samples aliases while importing genotypes
- Clearly state when creating samples (ignore samples if not defined in database)
- Track sample aliases for ``original_id``
- Import samples from file by providing country and breeds values as parameters
- Import sheep coordinates from genome project
- Security updates
- Fix github Workflow
- ``dbSNP`` feature library refactor
- fix linter issues
- Transform affymetrix unmapped chrom to ``0``
- Transform SNPchiMp unmapped chroms to ``0``
- ignore affymetrix insertions and deletions
- join affymetrix data with illumina relying on ``cust_id``
- define ``illumina_top`` from affymetrix flanking sequences
- load data from affymetrix manifest
- calculate illumina_top from affymetrix sequence
- Test import data from snpchimp
- Import ``OARV4`` coordinates
- ``data/common`` module refactoring
- Fix bug in importing dataset order
- Model affymetrix fields
- Read from affymetrix manifest file
- Track illumina manufactured date
- Upgrade dependencies
- Enable continuous integration
- Github Workflow
- Coverage
- Deal with multi-sheets ``.xlsx`` documents
- Import phenotypes (from a source dataset to a destination dataset)
- Define phenotype attribute as a ``mongoengine.DynamicDocument`` field
- Import metadata or phenotype by breeds or by samples
- Import metadata (from a source dataset to a destination dataset)
- Forcing ``plink`` chrom options when converting in binary formats
- import data from ADAPTmap project
- Import goat breeds (from a source dataset to a destination dataset)
- Import goat data from plink files
- Import goat metadata
- Import goat data from manifest and snpchimp
- configure ``mongodb-express`` credentials
- Add Goat Related tables
  - add ``variantGoat`` collection
  - add ``sampleGoat`` collection
- Unset ped columns if relationship can't be derived from data (e.g. Brazilian BSI)
- Deal with geographical coordinates
- Add features to samples (relying on metadata file)
- Breed name should be a unique key within species
- make rule to clean-up ``interim`` data
- skip already processed file from import
- Deal with ``mother_id`` and ``father_id`` (search for ``smarter_id`` in database)
- Deal with multi-countries dataset
- track country in aliases while importing breeds from dataset
- Track ``chip_name`` with samples
- Deal with binary plink files
- Search breed by aliases used in ``dataset``:
  - match fid with breed aliases in ``dataset``
  - store aliases by ``dataset``
- Add breeds from ``.xlsx`` files
- Merge multiple files per dataset
- Import from an illumina report file
- Deal with AB allele coding
- Deal with plink text files using modules
- Fix SNPchiMp data import
- Determine ``illumina_top`` coding as a property relying on database data
- Support multi-manifest upload (extend database with HD chip)
- Deal with compressed manifest
- Add breeds with CLI
- Check coordinates format relying on DRM
- Test stuff with ``mongomock``
- Start with project documentation
- Explore background datasets
- Merge plink binary files
- Convert from ``forward`` to ``illumina_top`` coordinates
- Convert to plink binary format
- Manage database credentials
- Import samples into ``smarter`` database while fixing coordinates and genotypes
- Configure tox and sphinx environments
- Model breeds in ``smarter`` database
- Import datasets into database
- Read from dbSNP xml dump file
- Import SNPchiMp data into ``smarter`` database
- Import Illumina manifest data into database
- Model objects with ``mongoengine``
- Model smarter ids
- Configure environments, requirements and dependencies