Ensembl Genomio Pipelines

Genomio prepare pipeline

Module: `Bio::EnsEMBL::Pipeline::PipeConfig::BRC4_genome_prepare_conf`

Genome prepare pipeline for BRC/Metazoa

Description

Retrieve data for a genome from INSDC and prepare the following files in a separate folder for each genome:

FASTA for DNA sequences
FASTA for protein sequences
GFF gene models
JSON functional annotation
JSON seq_region
JSON genome
JSON manifest

The JSON files follow the schemas defined in the `src/python/ensembl/io/genomio/data/schemas` folder.

These files can then be fed to the Genome loader pipeline.
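Before handing the prepared folders to the Genome loader pipeline, a quick check can catch truncated or malformed JSON files. The sketch below is a minimal example, assuming bash, `python3` on the `PATH`, and JSON files ending in `.json`. The function name `check_json_files` is hypothetical. It only checks JSON syntax; it does not validate the files against the genomio schemas.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report JSON files under a folder that do not parse.
# Checks syntax only, NOT conformance to the genomio schemas.
# Assumes python3 is on PATH (json.tool is in its standard library)
# and that the JSON files use a ".json" extension.
check_json_files() {
  local dir="$1" bad=0 f
  # -print0 with read -d '' keeps file names containing spaces intact.
  while IFS= read -r -d '' f; do
    if ! python3 -m json.tool "$f" > /dev/null 2>&1; then
      echo "Malformed JSON: $f" >&2
      bad=$((bad + 1))
    fi
  done < <(find "$dir" -type f -name '*.json' -print0)
  # Succeed only if every file parsed.
  [ "$bad" -eq 0 ]
}
```

For example, `check_json_files "$OUTPUT"` scans every genome folder under the output directory and lists any file that fails to parse.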

How to run

init_pipeline.pl Bio::EnsEMBL::Pipeline::PipeConfig::BRC4_genome_prepare_conf \
    --host $HOST --port $PORT --user $USER --pass $PASS \
    --hive_force_init 1 \
    --pipeline_dir temp/prepare \
    --data_dir $INPUT \
    --output_dir $OUTPUT \
    ${OTHER_OPTIONS}
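The variables in this command must be set beforehand. The sketch below is a minimal wrapper, assuming bash, that stops early if a required variable is missing. It passes any extra options through `"$@"`, so option values containing spaces survive intact. The function name `run_genome_prepare` is hypothetical. The `INIT_PIPELINE` override is also hypothetical and exists only so the call can be dry-run, for example with `echo`.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the init_pipeline.pl call shown above.
# Extra pipeline options are passed as the function's arguments.
run_genome_prepare() {
  # In a script, a missing or empty variable makes bash print the
  # message and exit instead of launching a half-configured pipeline.
  : "${HOST:?HOST is not set}" "${PORT:?PORT is not set}"
  : "${USER:?USER is not set}" "${PASS:?PASS is not set}"
  : "${INPUT:?INPUT is not set}" "${OUTPUT:?OUTPUT is not set}"

  # INIT_PIPELINE is a hypothetical override (e.g. "echo" for a dry run).
  "${INIT_PIPELINE:-init_pipeline.pl}" \
    Bio::EnsEMBL::Pipeline::PipeConfig::BRC4_genome_prepare_conf \
    --host "$HOST" --port "$PORT" --user "$USER" --pass "$PASS" \
    --hive_force_init 1 \
    --pipeline_dir temp/prepare \
    --data_dir "$INPUT" \
    --output_dir "$OUTPUT" \
    "$@"
}
```

Running `INIT_PIPELINE=echo run_genome_prepare --merge_split_genes 1` prints the full command line without starting the pipeline.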

Parameters

Option	Default value	Meaning
`--pipeline_name`	brc4_genome_prepare	Name of the hive pipeline
`--pipeline_dir`		Temporary directory for this pipeline run
`--data_dir`		Directory with a JSON file for each genome to prepare, following the format set by `src/python/ensembl/io/genomio/data/schemas/genome.json`
`--output_dir`		Directory where the prepared files are stored
`--merge_split_genes`	0	Gene features are sometimes split in a GFF file. Ensembl expects genes to be contiguous, so this option merges the parts into one.
`--exclude_seq_regions`		Seq_regions to leave out (applies to all genomes; should seldom be used)
`--validate_gene_id`	0	Enforce a strong gene ID pattern (replace the ID with the GeneID if available)
`--ensembl_mode`	0	By default, additional metadata is set for BRC genomes. With this parameter, vanilla Ensembl metadata is used instead.
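As an illustration, the boolean flags from this table can be collected in a bash array and expanded in place of `${OTHER_OPTIONS}`. The sketch below is minimal and assumes bash. The variable name `other_options` and the chosen combination of flags are only examples, not a recommended configuration. `--exclude_seq_regions` is left out because its value format is not described here.

```shell
#!/usr/bin/env bash
# Illustrative option set built from the Parameters table (values are examples).
other_options=(
  --merge_split_genes 1   # merge split gene parts into one contiguous gene
  --validate_gene_id 1    # enforce a strong gene ID pattern
  --ensembl_mode 1        # vanilla Ensembl metadata instead of BRC metadata
)
# Expanding with "${other_options[@]}" keeps every element as a separate,
# unsplit argument, unlike an unquoted string variable.
printf '%s\n' "${other_options[@]}"
```

At the end of the `init_pipeline.pl` command, `"${other_options[@]}"` then takes the place of `${OTHER_OPTIONS}`.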
