How to handle two contigs from the same assembly that slightly overlap #329

Open
Isoris opened this issue Sep 15, 2023 · 7 comments

@Isoris

Isoris commented Sep 15, 2023

Hello,

Thank you for providing this tool.

I'm currently building a species pangenome using Illumina assemblies. However, I've noticed that some contigs appear to overlap or map to the same region without collapsing when I use odgi extract followed by viz.

Is there a method to collapse overlapping alignments into a single alignment for a sample?

Thank you for your assistance.
[Image: SIKU01_000692_SIKU01_000693_706654_708999_Operon_34]

[Image: SIKU01_000323_SIKU01_000329_302761_310606_Operon_321]

Thank you for your answer.
Quentin

@subwaystation
Member

Hi @Isoris,

odgi viz uses binning to visualize the graph, so what you see here is only a rough summary of the base-pair-level picture.
Did you take a look at these regions with https://github.com/chfi/waragraph? In the 1D viz, you can zoom in and verify that the assemblies indeed overlap and do not have SNPs. In the 2D viz, you can take a closer look at the nodes and the path positions. This might help you get an idea of how to manually adjust your input sequences!

In PGGB, there is no method to merge overlapping contigs. In odgi viz you can merge paths by prefixes with -M, but that is for visualization purposes only, as it may not be 100% accurate.
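To illustrate why a binned view cannot by itself tell a small overlap from two contigs that merely meet, here is a toy model in Python. It is not odgi's implementation: the bin width, the coordinates, and the function name `bins_touched` are all assumptions made for the example.

```python
def bins_touched(start, end, bin_width):
    """Return the set of bin indices covered by the half-open interval [start, end)."""
    first = start // bin_width
    last = (end - 1) // bin_width
    return set(range(first, last + 1))


BIN_WIDTH = 100  # hypothetical bin width in base pairs

# Case 1: two contigs that overlap by 10 bp (990..1000).
overlapping = bins_touched(0, 1000, BIN_WIDTH) & bins_touched(990, 2000, BIN_WIDTH)

# Case 2: two contigs that only abut at position 990, inside the same bin.
abutting = bins_touched(0, 990, BIN_WIDTH) & bins_touched(990, 2000, BIN_WIDTH)

# Both cases share exactly one bin, so they look the same at bin resolution.
print(overlapping, abutting)
```

In this toy model both cases light up the same single shared bin, which is why zooming to base-pair level is needed to confirm a real overlap.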

@Isoris
Author

Isoris commented Sep 19, 2023

Hi @subwaystation, thank you for your assistance.

In my bacterial dataset, comprising a minimal example of three bacteria, I observed the following results:

Without merging by prefixes, the visualization looks like this:
[Image: viz_1]

However, after merging the paths by prefixes, the visualization changes to the one shown below:

[Image: viz_2]

My inquiry is: Is it feasible to extract the complete paths post-merge? Specifically, I am interested in obtaining the nodes and edges present on the left-hand side of the visual. My aim is to extract these "scaffolds" or paths, enabling me to subsequently remap my short reads onto them.

Ultimately, my goal is to produce a unified genome graph, as opposed to a fragmented genome graph. As evident from the left side of the visualization, the merged paths of blue and violet do share an overlap. This suggests that we possess the requisite positional or genomic context information. Using this, I hope to reconstruct a cohesive graph, wherein all nodes within this particular interval are interconnected, deriving from all the individual contigs.

I believe this is possible because I have a set of 80 samples of the same species, with short-read data and de novo contigs, as well as 2 reference genomes.

I would be grateful for any suggestions.
Quentin.
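As a rough illustration of the "collapse overlapping contigs per sample" idea, the sketch below merges overlapping intervals sample by sample. It assumes each contig's span has already been exported onto one shared coordinate axis (for instance, against one of the reference genomes). It works on coordinates only, does not reconcile the underlying sequences, and is not a PGGB or odgi feature. The function name and the example records are hypothetical.

```python
from collections import defaultdict


def merge_intervals_per_sample(records):
    """Merge overlapping half-open intervals (start, end) for each sample.

    records: iterable of (sample, start, end) tuples on one shared axis.
    Returns {sample: [(start, end), ...]} with overlaps collapsed.
    """
    by_sample = defaultdict(list)
    for sample, start, end in records:
        by_sample[sample].append((start, end))

    merged = {}
    for sample, intervals in by_sample.items():
        intervals.sort()
        out = [intervals[0]]
        for start, end in intervals[1:]:
            last_start, last_end = out[-1]
            if start < last_end:  # overlaps the previous interval
                out[-1] = (last_start, max(last_end, end))
            else:
                out.append((start, end))
        merged[sample] = out
    return merged


# Hypothetical contig placements; the coordinates are made up for illustration.
records = [
    ("sampleA", 0, 1200),
    ("sampleA", 1150, 3000),  # overlaps the first contig by 50 bp
    ("sampleA", 5000, 6000),
    ("sampleB", 100, 900),
]
merged = merge_intervals_per_sample(records)
print(merged)
```

The merged spans only say where a sample is covered. Turning them into a single scaffold sequence would still require choosing which contig's bases to keep in each overlap.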

@Isoris
Author

Isoris commented Sep 19, 2023

For instance, for another subset of bacteria of the same species, we obtain this:

[Image: combined_renamed Run_2 fasta gz f7ea872 417fcdf 483d7ba smooth final og lay draw_multiqc]

Without merging by prefixes, the visualization looks like this:

[Image: viz_paths]

However, after merging the paths by prefixes, the visualization changes to the one shown below:

[Image: viz_merged_paths]

We can clearly see that it is theoretically possible to integrate the floating "subgraphs" into the main graph, or at least to merge some of them into a larger graph.

@alarawms

Dear @Isoris, thanks.
I have a question: I ran into the same issue. How did you merge paths by prefixes?

@Isoris
Author

Isoris commented Oct 10, 2023

> Dear @Isoris, thanks.
> I have a question: I ran into the same issue. How did you merge paths by prefixes?

odgi viz -M

That merges paths by prefix, if I remember correctly. I will send you my code later this afternoon.

Basically, paths are merged when they share the same prefix before the first separator.
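To preview which paths would end up grouped under which prefix, a small Python sketch follows. It assumes simple "path name starts with prefix" matching, which may not be exactly how odgi matches. The path names and the helper `group_paths_by_prefix` are hypothetical.

```python
def group_paths_by_prefix(path_names, prefixes):
    """Map each prefix to the path names that start with it."""
    groups = {prefix: [] for prefix in prefixes}
    for name in path_names:
        for prefix in prefixes:
            if name.startswith(prefix):
                groups[prefix].append(name)
                break  # assign each path to the first matching prefix only
    return groups


# Hypothetical path names in a sample#haplotype#contig style.
paths = ["SIKU01#1#contig_692", "SIKU01#1#contig_693", "REF2#1#chr"]
groups = group_paths_by_prefix(paths, ["SIKU01", "REF2"])
print(groups)
```

Under plain prefix matching, a short prefix such as `S1` would also capture names beginning with `S10`, so it is worth checking the groups before trusting a merged picture.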

@alarawms

I did it, and it worked.
The trick is to use only the sample identifiers, without the rest of the path name (no # for the haplotype or anything). Pass them as a text file containing the list of names, one sample per line, and then the merging occurs.

22-prefix is a text file with sample names, one sample per line.

odgi viz -M ../../22-prefix -i out.og -o out.og-m.png
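A prefix file like the one described above could be generated rather than typed by hand. The Python sketch below assumes the path names use `#` as the separator and that you already have them as a list (for example saved from `odgi paths -i out.og -L`, if your odgi version supports that option). The names, the output location, and the helper `write_prefix_file` are hypothetical.

```python
import tempfile
from pathlib import Path


def write_prefix_file(path_names, out_file, separator="#"):
    """Write the unique text before the first separator of each name, one per line."""
    prefixes = []
    for name in path_names:
        prefix = name.split(separator, 1)[0]
        if prefix not in prefixes:  # keep first-seen order, drop duplicates
            prefixes.append(prefix)
    Path(out_file).write_text("\n".join(prefixes) + "\n")
    return prefixes


# Hypothetical path names; in practice they would come from your own graph.
names = ["SIKU01#1#contig_692", "SIKU01#1#contig_693", "REF2#1#chr"]
out_file = Path(tempfile.mkdtemp()) / "prefixes.txt"
prefixes = write_prefix_file(names, out_file)
print(out_file.read_text())
```

The resulting file has one sample name per line and can be passed to `odgi viz -M` in place of `22-prefix`.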

@Isoris
Author

Isoris commented Oct 10, 2023

> I did it, and it worked.
> The trick is to use only the sample identifiers, without the rest of the path name (no # for the haplotype or anything). Pass them as a text file containing the list of names, one sample per line, and then the merging occurs.
>
> 22-prefix is a text file with sample names, one sample per line.

> odgi viz -M ../../22-prefix -i out.og -o out.og-m.png

Wow, I never knew about that. Thank you so much for the tips.
