I am confused, and need your help #164
Replies: 27 comments
-
Thank you Kelvin!
One thing I still cannot find a justification for is the network
generated from dandelion['edges']: all the clusters are connected, and
intra-cluster connections are generated. That means multiple clones are
connected in dandelion.
1- How can we justify the intra-connections of clones biologically? Do you
have any comments on the justification of intra-cluster edges from a
biological perspective?
2- Most of the clones in my dandelion file have an unassigned clone-id. Why
can this happen?
Thank you,
Sara
On Mon, Jul 11, 2022 at 1:40 PM Zewen Kelvin Tuong wrote:
Hi Sara,
1- is each node in the dandelion network a clone?
each node is a single cell:
[image: image]
<https://user-images.githubusercontent.com/26215587/178317378-fe93bfca-b00d-44f4-8981-8d3c93ceb32c.png>
Each connected component (network) is most often 1 clone. There
are situations where a network comprises multiple clones, because
some cells have multiple BCRs/TCRs and dandelion merges them into a single
network just for the visualisation.
2- how the clone network is generated?
In a simple example, all cells that were assigned a clone id of
1_1_1_1, including cells that have clone ids of 1_2_3_4|1_1_1_1 (an example
of a single cell expressing two pairs of BCRs), are selected, and
pairwise Levenshtein distances are calculated for every pair of cells
within this subset. The calculation is performed on each IGH/IGK/IGL layer
separately. The layers are then summed (simple matrix addition),
forming a distance matrix like this:
[image: image]
<https://user-images.githubusercontent.com/26215587/178321696-49800642-edd5-43f5-a88a-640752814772.png>
I've coloured the upper triangle grey because it just mirrors the
lower triangle.
A minimum spanning tree is then calculated, which will form something like
this:
[image: image]
<https://user-images.githubusercontent.com/26215587/178322113-2623d4e2-651a-4489-85f2-b806ec3fdd64.png>
I've coloured the edge weights (levenshtein distance) blue
One special circumstance in the constructed minimum spanning tree is
that the choice of Cell 1 as the cell connected to Cell 4 is arbitrary: Cell 3
and Cell 2 have an equal chance of being selected for Cell 1's position because
they are the same distance apart. So I added a step to 'rescue' those
connections/edges, making it look like:
[image: image]
<https://user-images.githubusercontent.com/26215587/178322811-bb4aec4f-1299-4543-921e-94e074f0d797.png>
I've coloured the rescued edges as orange.
That's it.
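The steps described above (per-chain Levenshtein distances, summed across chains, then a minimum spanning tree plus 'rescued' ties) can be sketched roughly as follows. This is an illustration only, not dandelion's actual implementation: the toy sequences, the `build_clone_graph` helper and in particular the tie-rescue rule (re-adding every edge whose weight equals a node's lightest tree edge) are assumptions made for the example.

```python
import itertools

import networkx as nx


def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


def build_clone_graph(cells):
    """Return (mst, rescued) for the cells of one clone.

    `cells` maps a barcode to {chain layer: sequence}. Every cell is assumed
    to carry the same layers (a simplification for this toy example).
    """
    full = nx.Graph()
    full.add_nodes_from(cells)
    for u, v in itertools.combinations(cells, 2):
        # Summing per-layer distances is the same as adding the layer matrices.
        dist = sum(levenshtein(cells[u][layer], cells[v][layer]) for layer in cells[u])
        full.add_edge(u, v, weight=dist)

    mst = nx.minimum_spanning_tree(full, weight="weight")

    # Assumed rescue rule: for each node, re-add every edge of the full graph
    # that ties with the lightest tree edge touching that node.
    rescued = mst.copy()
    for node in mst.nodes:
        incident = [d["weight"] for _, _, d in mst.edges(node, data=True)]
        if not incident:
            continue  # single-cell clone, nothing to rescue
        lightest = min(incident)
        for nbr, d in full[node].items():
            if d["weight"] == lightest:
                rescued.add_edge(node, nbr, weight=d["weight"])
    return mst, rescued


# Toy clone: cells 1-3 are identical and cell 4 differs by one heavy-chain base,
# so which of cells 1-3 the tree attaches to cell 4 is arbitrary.
cells = {
    "cell1": {"IGH": "AAAT", "IGK": "CC"},
    "cell2": {"IGH": "AAAT", "IGK": "CC"},
    "cell3": {"IGH": "AAAT", "IGK": "CC"},
    "cell4": {"IGH": "AAAA", "IGK": "CC"},
}
mst, rescued = build_clone_graph(cells)
print(mst.number_of_edges(), rescued.number_of_edges())
```

In this toy case the tree alone links cell 4 to just one of the three identical cells, while the rescued graph links it to all of them, which is the situation the orange edges are meant to capture.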
3- why after generating the .tsv file, some of the cells have different
cluster_id?
I'm not sure what you mean by this. Unless you are asking why the numbers
change each time you run it: that has to do with a random argsort whenever
lists of V/D/J and lengths are sorted. The numbers don't have any
particular meaning other than to say whether or not two different clones
share similar criteria, so I've never enforced that the numbers stay
identical all the time.
4- We expected to see the same germline in all the cells in the network.
But the germlines of cells in the network are different. Why?
I'm unsure how this can happen, other than the possibility I described
above where a cell can have multiple BCRs, and also when cells have
multiple light chains. Are you sure that the different germlines you are
seeing are not just IGH/IGK/IGL? Otherwise, I'll need an
example where you've observed this.
-
Thank you Kelvin!
Two questions I have:
1- From the dandelion network, how can I extract the single-cell IDs in
the biggest clone?
2- Are you saying there is no biological justification for using the "edges"
that were in the previous version?
Thanks,
Sara
On Tue, Jul 12, 2022 at 6:41 AM Zewen Kelvin Tuong wrote:
Hi Sara,
1- How can we justify the intra-connections of clones biologically? Do you
have any comments on the justification of intra-cluster edges from a
biological perspective?
The network structure should look like this:
[image: image]
<https://user-images.githubusercontent.com/26215587/178472236-e01d6ce4-430c-4cdc-988c-50353c8303ff.png>
Just a side note: in the latest update (v0.2.4), .edges has been removed
because its behaviour was a bit random with respect to which nodes were
selected as source/target, and this could make the edge table unstable,
even though the eventual network is still the same. I've elected to operate
directly on the networkx graphs, as their behaviour is more consistent.
2- Most of the clones in my dandelion file have an unassigned clone-id. Why
can this happen?
Can you try updating your dandelion version and see if this persists?
-
Hi Kelvin again:
My questions are:
1- From the dandelion network, how can I extract the single-cell IDs in
the biggest clone?
2- Are you saying there is no biological justification for using the "edges"
that were in the previous version?
3- Why do I see different VDJs in the same clone?
Thanks,
Sara
-
Sorry Kelvin,
I have added another question here. My questions are:
1- From the dandelion network, how can I extract the single-cell IDs in
the biggest clone?
2- How can I get the size of clones?
2- Are you saying there is no biological justification for using the "edges"
that were in the previous version?
3- Why do I see different VDJs in the same clone?
Thanks,
Sara
-
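The biggest clone should have a clone_id_by_size of 1. So you can just
use the size ids from the metadata.
To get the size of clones, run ddl.tl.clone_size.
That's correct: there is no biological justification for the "edges" from the previous version.
I'm not sure how different VDJs can end up in the same clone. Can you show me an example?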
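What ranking clones by size amounts to can be sketched without dandelion at all. In the snippet below the barcodes, the `cell_to_clone` mapping and both helper functions are hypothetical stand-ins, and counting a cell with a `|`-joined id towards each of its clones is an assumption made for the example. With a real object, the suggestion in this thread is to run ddl.tl.clone_size and then filter the metadata on clone_id_by_size.

```python
from collections import Counter


def rank_clones_by_size(cell_to_clone):
    """Return ({clone_id: size}, {clone_id: rank}), where rank 1 is the largest clone."""
    sizes = Counter()
    for clone_field in cell_to_clone.values():
        if not clone_field:  # unassigned clone id
            continue
        # Assumption: a cell with two clone ids counts towards both clones.
        for clone_id in clone_field.split("|"):
            sizes[clone_id] += 1
    # Sort by decreasing size; ties are broken alphabetically for reproducibility.
    ordered = sorted(sizes, key=lambda c: (-sizes[c], c))
    ranks = {clone_id: i for i, clone_id in enumerate(ordered, start=1)}
    return dict(sizes), ranks


def cells_in_clone(cell_to_clone, clone_id):
    """Barcodes of all cells whose clone field contains `clone_id`."""
    return sorted(
        cell for cell, field in cell_to_clone.items()
        if field and clone_id in field.split("|")
    )


# Hypothetical toy metadata: barcode -> clone_id ("" means unassigned).
cell_to_clone = {
    "AAAC-1": "1_1_1_1",
    "AAAG-1": "1_1_1_1",
    "AAAT-1": "1_2_3_4|1_1_1_1",
    "CCCA-1": "2_1_1_1",
    "CCCG-1": "",
}
sizes, ranks = rank_clones_by_size(cell_to_clone)
biggest = min(ranks, key=ranks.get)  # the clone with rank 1
print(biggest, sizes[biggest], cells_in_clone(cell_to_clone, biggest))
```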
-
Thanks Kelvin for all your answers. Really appreciate your time!
I could see that after adding vdj, adata2 = ddl.pp.filter_contigs(new_vdj,
adata, filter_rna=True), I get a clone_id for all cells.
My question is how I can extract all the cells in the BCR network (the
visualized network). I want to extract the clone_id and cell_barcode from
the visualized BCR network.
Thank you again!
Sara
-
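I see. For that you need to extract from the graph itself: vdj.graph[0] or vdj.graph[1], either will work.
You would want to follow the instructions here:
https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.components.connected_components.html
which basically should look like:
    import networkx as nx

    G = vdj.graph[1]
    # find the largest connected network
    largest_cc = max(nx.connected_components(G), key=len)
    # subset to largest_cc
    # (as posted, S is a list with one subgraph per component, so the S.nodes
    # call below does not work; this is corrected later in the thread)
    S = [G.subgraph(c).copy() for c in nx.connected_components(G)]
    # this should give you the list of nodes that are in this network
    S.nodes
Then you should be able to just match them against the metadata:
    newvdj = vdj[vdj.metadata_names.isin(list(S.nodes))].copy()
    newvdj.metadata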
-
Thank you Kelvin. I could see you made a lot of updates on your tutorial.
That helped me to find the problem.
Best,
Sara
-
So Kelvin, I have another question:
I got a much better network before filtering the contigs. After filtering the
contigs, my network shrank significantly.
Is there any justification for using the network without filtering the
contigs (in which most of the clones have no ids)?
-
Hmm, I think the data may be artificial, in that those cells are connected because they do not have a good set of BCRs. So my current judgement is no, it's not recommended to use a network that is formed by unassigned ids.
However, you could disregard dandelion's mode of assigning clone ids and just replace them with clone ids set by your preferred criterion. Or don't specify any and leave all clone ids blank, in which case you will end up with a fully connected network, and you can use other methods to break this up, e.g. Louvain clustering as implemented in various graphing tools. I guess you have to ask yourself what the purpose of this approach would be, to validate why you chose that route.
Kelvin
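As a rough illustration of the "break up a fully connected network with Louvain clustering" idea, here is a sketch using networkx's own Louvain implementation (available in recent networkx releases). The toy cells, the `group` labels and the similarity weights are made up for the example. One thing to keep in mind is that Louvain treats edge weights as affinities, so a distance-based weight would have to be converted to a similarity first; how best to do that is left open here.

```python
import itertools

import networkx as nx
from networkx.algorithms.community import louvain_communities

# Hypothetical toy data: six cells that fall into two groups.
cells = ["c1", "c2", "c3", "c4", "c5", "c6"]
group = {"c1": "a", "c2": "a", "c3": "a", "c4": "b", "c5": "b", "c6": "b"}

# Fully connected network; weight is a similarity (higher means closer).
G = nx.Graph()
for u, v in itertools.combinations(cells, 2):
    similarity = 10.0 if group[u] == group[v] else 0.1
    G.add_edge(u, v, weight=similarity)

# louvain_communities returns a list of sets of nodes; the seed keeps the
# run reproducible.
communities = louvain_communities(G, weight="weight", seed=0)
for i, members in enumerate(communities):
    print(i, sorted(members))
```

Whether communities found this way mean anything biologically is a separate question, which is the point made above about asking what the purpose of the approach is.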
-
So Kelvin, is this filter_contigs command necessary?
-
Yes, because the whole point is to remove all ambiguous BCR chains. You can also use scirpy's method to define clones and see if that makes a difference.
-
Thanks Kelvin.
One more question: how can it happen that my Cell Ranger results have
v-call and j-call information for each cell, but dandelion has left the
v and j genotype columns empty, and the clone-id column empty as well? I then
have unassigned clones, and my BCR network shows all these cells in one
big clone.
How is this possible?
-
And one more question: could you tell me the correct singularity command for
the preprocessing step, with all the parameters needed for correct
filtering, including contig filtering and everything?
-
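The pre-processing will reannotate the V and J calls, using igblastn and blastn. Where the call was deemed too low confidence, dandelion will remove the V/J call annotation, but this would largely be consistent with how igblastn is performed (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3692102/).
During post-processing, i.e. filter_contigs or check_contigs, a contig-level QC assessment is performed where I ask whether the assignments make sense: e.g. IGHV must pair with an IGHJ in the same contig; if it's missing either, then it's not a good productive contig. There are several other logical checks like that along the way, to ensure that what we end up with are good sets of contigs.
Where filter_contigs and check_contigs differ is that filter_contigs is stricter, and also performs a hard cell-level QC where it checks if a cell has 1 or many sets of heavy+light chains. If many, filter_contigs will remove the cell. For check_contigs, the cell-level QC is a soft check that just populates the chain_status column in .metadata to indicate if particular cells display ambiguous contigs.
clone_id thus relies on all these checks succeeding:
1. It MUST have a V gene, a J gene and a CDR3 sequence
2. It MUST have at least 1 heavy chain
If a cell only has light chains, then the clone id will not be defined. The rationale is that biologically, IGH rearrangement occurs prior to IGK/IGL rearrangement, i.e. you must have a productive heavy chain before the light chain will be rearranged.
So unless you are still using an older version of dandelion, I'm not sure it's possible to form a network of unassigned clones. Regardless, this is still a bug and these should be removed/ignored. I'll need a more concrete example to be able to diagnose it.
The current singularity script just does the pre-processing. All the filtering steps are considered post-processing and you'll have to follow the tutorial.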
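The kind of contig-level and cell-level rule discussed in this exchange (a V call paired with a J call of the same locus, a CDR3, and at least one heavy chain before a clone id can be defined) can be sketched as below. This is a simplified illustration, not dandelion's code: the field names `v_call`, `j_call` and `junction` are assumed to follow AIRR-style column naming, the helper functions are hypothetical, and the real filter_contigs and check_contigs run more checks than these.

```python
def locus_of(gene_call):
    """Locus prefix of a gene call, e.g. 'IGH' for 'IGHV1-69'."""
    return gene_call[:3]


def contig_is_usable(contig):
    """Contig-level check: V call, J call and CDR3 present, V and J from the same locus."""
    v, j, cdr3 = contig.get("v_call"), contig.get("j_call"), contig.get("junction")
    if not (v and j and cdr3):
        return False
    return locus_of(v) == locus_of(j)


def can_assign_clone_id(contigs):
    """A cell needs at least one usable heavy-chain contig to get a clone id."""
    return any(contig_is_usable(c) and locus_of(c["v_call"]) == "IGH" for c in contigs)


def cell_level_qc(contigs, strict):
    """Return (keep, ambiguous) for one cell.

    strict=True mimics a hard check (ambiguous cells are dropped);
    strict=False mimics a soft check (ambiguous cells are only flagged).
    """
    usable = [c for c in contigs if contig_is_usable(c)]
    n_heavy = sum(locus_of(c["v_call"]) == "IGH" for c in usable)
    n_light = len(usable) - n_heavy
    ambiguous = n_heavy > 1 or n_light > 1
    keep = not (strict and ambiguous)
    return keep, ambiguous


# Hypothetical toy contigs.
heavy = {"v_call": "IGHV1-69", "j_call": "IGHJ4", "junction": "TGTGCGAGA"}
heavy2 = {"v_call": "IGHV3-23", "j_call": "IGHJ6", "junction": "TGTGCGAAA"}
light = {"v_call": "IGKV1-5", "j_call": "IGKJ1", "junction": "TGTCAGCAG"}
no_j = {"v_call": "IGHV3-23", "j_call": "", "junction": "TGTGCG"}

print(can_assign_clone_id([heavy, light]))                 # heavy + light pair
print(can_assign_clone_id([light]))                        # light chain only
print(cell_level_qc([heavy, heavy2, light], strict=True))  # two heavy chains, hard check
```

Under these assumptions a light-chain-only cell, or a cell whose heavy contig lost its J call, ends up without a clone id, which is one way many cells could appear unassigned.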
-
Thanks Kelvin.
During filter_contigs, when I run
vdj, adata2 = ddl.pp.filter_contigs(new_vdj, adata, library_type='tr-ab', filter_rna=True)
I get this error:
TypeError: update_metadata() got an unexpected keyword argument 'library_type'
How can I get rid of this error, given that it is recommended to define the
type of library?
-
And sorry Kelvin,
I am going to generate a network of all the edges with the nx package (like the
graph that you sent me a few days ago), and you mentioned that the 'edges'
from the dandelion package are not reliable. I need a way to get the
edges, but it is not clear to me how to do that.
Any comments?
-
And Kelvin,
Would you please provide a short explanation of graph[0] and graph[1]?
After plotting, it looks like all clones are connected together. I am confused
about how they are connected.
Thanks,
Sara
…On Tue, Jul 19, 2022 at 11:29 AM Sara Moien ***@***.***> wrote:
And sorry Kelvin,
I am going to generate a network of all the edges from nx package (like
the graph that you sent me a few days ago) and you mentioned that
the 'edges' from the dandelion package is not reliable. I need a way
that gives me edges.
But it is not clear for me how to do that.
Any comments?
On Tue, Jul 19, 2022 at 9:39 AM Sara Moien ***@***.***> wrote:
> Thanks Kelvin.
> During filter_contig "vdj, adata2 = ddl.pp.filter_contigs(new_vdj, adata,
> library_type ='tr-ab', filter_rna = True)"
>
> I get this error:
>
> TypeError: update_metadata() got an unexpected keyword argument 'library_type'
>
>
> How I can get rid of this error? Since it is recommended to define the
> type of library.
>
> On Tue, Jul 19, 2022 at 7:54 AM Zewen Kelvin Tuong <
> ***@***.***> wrote:
>
>> One more question: how this can happen that my cell ranger results has
>> the
>> v-call and j-call information for each cell. But dandelion has put empty
>> for v and j genotypes columns, and also empty column for clone-id? Then I
>> have unassigned clone and my bcr network is showing all these cells in a
>> big clone.
>>
>> the pre-processing will reannotate the V and J calls, using igblastn and
>> blastn. Where it was deemed that the call was too low confidence, dandelion
>> will remove the V/J call annotation, but would largely be consistent with
>> how igblastn is performed (
>> https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3692102/).
>>
>> during post-processing i.e. filter_contigs or check_contigs, a *contig
>> level* QC assessment is performed where i ask whether the assignments
>> make sense:
>>
>> e.g. IGHV must pair with a IGHJ in the same contig - if it's missing
>> either, then it's not a good productive contig.
>> there's several other logical checks like that along the way, to ensure
>> that what we end up with are good sets of contigs.
>>
>> Where filter_contigs and check_contigs differ, is that filter_contigs
>> is stricter, and also performs a hard *cell level* QC where it checks
>> if a cell has 1 or many sets of heavy+light chains. If many,
>> filter_contigs will remove. For check_contigs, the cell level QC is a
>> soft check, and just populates in the .metadata's chain_status column -
>> to indicate if particular cells display ambiguous contigs.
>>
>> clone_id thus relies on all these checks succeeding:
>>
>> 1. It MUST have a V gene, a J gene and a CDR3 sequence.
>> 2. It MUST have at least 1 heavy chain.
>>
>> If a cell only has light chains, then the clone id will not be defined.
>> The rationale is that biologically, IGH rearrangement occurs prior to
>> IGK/IGL rearrangement, i.e. you must have a productive heavy chain before
>> the light chain is rearranged.
>>
>> So unless you are still using an older version of dandelion, I'm not sure
>> it's possible to form a network of unassigned clones. Regardless, this
>> is still a bug and such a network should be removed/ignored. I'll need a
>> more concrete example to be able to diagnose this bug.
>>
>>> And one more question: can I ask for the correct singularity command for
>>> the preprocessing step, one that has all the necessary parameters for
>>> correct filtering, including contig filtering and everything?
>>
>> The current singularity script just does the pre-processing. All the
>> filtering steps are considered post-processing, and you'll have to follow
>> the tutorial for them.
>>
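The clone_id requirements quoted above (a V gene, a J gene, a CDR3 sequence, and at least one heavy chain) can be sketched as a small check. This is only an illustration and not dandelion's implementation: the function names are hypothetical, the dictionary keys borrow AIRR-style column names (locus, v_call, j_call, junction), and only BCR loci are considered.

```python
HEAVY_LOCI = {"IGH"}  # assumption: BCR data only


def is_complete(contig):
    """A contig needs a V call, a J call and a CDR3/junction sequence."""
    return all(contig.get(key) for key in ("v_call", "j_call", "junction"))


def can_define_clone_id(cell_contigs):
    """True if the cell has at least one complete heavy-chain contig."""
    return any(
        c.get("locus") in HEAVY_LOCI and is_complete(c) for c in cell_contigs
    )


# Toy contigs; the gene names and sequences are illustrative only.
paired_cell = [
    {"locus": "IGH", "v_call": "IGHV1-2", "j_call": "IGHJ4", "junction": "TGTGCGAGA"},
    {"locus": "IGK", "v_call": "IGKV1-5", "j_call": "IGKJ1", "junction": "TGTCAGCAG"},
]
light_only_cell = [paired_cell[1]]
missing_j_cell = [
    {"locus": "IGH", "v_call": "IGHV1-2", "j_call": "", "junction": "TGTGCGAGA"},
]

print(can_define_clone_id(paired_cell))      # True
print(can_define_clone_id(light_only_cell))  # False
print(can_define_clone_id(missing_j_cell))   # False
```

Under these assumptions, a cell whose V or J call was removed during reannotation, or a cell with light chains only, ends up without a clone id, which is consistent with the explanation above.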
-
You are not using the correct version of dandelion. Please uninstall and reinstall it; dandelion.__version__ has to be 0.2.4.
I would suggest that you learn how to use the networkx package (https://networkx.org/documentation/stable/), because this isn't the place to ask questions related to it.
Sorry, the code I used above is wrong. It should be:
S = G.subgraph(largest_cc)
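The corrected line, S = G.subgraph(largest_cc), keeps only the largest connected component of a graph G; in networkx, largest_cc is usually obtained with max(nx.connected_components(G), key=len). To show what that does without needing networkx installed, here is a rough standard-library sketch. The helper name connected_components and the toy cell names are made up for illustration.

```python
from collections import defaultdict, deque


def connected_components(nodes, edges):
    """Return a list of node sets, one per connected component."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        component, queue = {start}, deque([start])
        while queue:  # breadth-first search from the start node
            node = queue.popleft()
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    component.add(nxt)
                    queue.append(nxt)
        components.append(component)
    return components


nodes = ["cell1", "cell2", "cell3", "cell4", "cell5", "cell6"]
edges = [("cell1", "cell2"), ("cell2", "cell3"), ("cell4", "cell5")]

components = connected_components(nodes, edges)
largest_cc = max(components, key=len)
singletons = sorted(n for c in components if len(c) == 1 for n in c)
# Edges of the induced subgraph: both endpoints lie in the largest component.
largest_edges = [(a, b) for a, b in edges if a in largest_cc and b in largest_cc]

print(sorted(largest_cc))  # ['cell1', 'cell2', 'cell3']
print(singletons)          # ['cell6']
print(largest_edges)       # [('cell1', 'cell2'), ('cell2', 'cell3')]
```

In this toy example cell6 is a singleton: it would show up in a graph that contains all nodes, but not in one restricted to connected nodes.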
-
Hi Kelvin,
One question: how does the filter_contigs function work?
Does dandelion remove the light chain?
We want to see which criteria filter_contigs looks at when filtering
contigs, because many of our cells are excluded in the filtering step, which
is strange.
Thanks,
Sara
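For readers with the same question: earlier in this thread, filter_contigs is described as applying a hard cell-level QC (a cell with many sets of heavy+light chains is removed), while check_contigs applies a soft check that only annotates the cell. The following is a loose sketch of that distinction and not dandelion's code; the function names, the cell_id and locus keys, and the status labels are all hypothetical.

```python
from collections import Counter


def chain_counts(contigs):
    """Count heavy (IGH) and light (IGK/IGL) contigs per cell."""
    heavy, light = Counter(), Counter()
    for c in contigs:
        if c["locus"] == "IGH":
            heavy[c["cell_id"]] += 1
        elif c["locus"] in ("IGK", "IGL"):
            light[c["cell_id"]] += 1
    return heavy, light


def soft_check(contigs):
    """Soft QC: label every cell, drop nothing."""
    heavy, light = chain_counts(contigs)
    cells = {c["cell_id"] for c in contigs}
    return {
        cell: "multiple" if heavy[cell] > 1 or light[cell] > 1 else "single"
        for cell in cells
    }


def hard_filter(contigs):
    """Hard QC: drop all contigs from cells labelled 'multiple'."""
    status = soft_check(contigs)
    return [c for c in contigs if status[c["cell_id"]] == "single"]


contigs = [
    {"cell_id": "A", "locus": "IGH"},
    {"cell_id": "A", "locus": "IGK"},
    {"cell_id": "B", "locus": "IGH"},
    {"cell_id": "B", "locus": "IGH"},
    {"cell_id": "B", "locus": "IGL"},
]

print(sorted(soft_check(contigs).items()))
# [('A', 'single'), ('B', 'multiple')]
print(sorted({c["cell_id"] for c in hard_filter(contigs)}))
# ['A']
```

Under these assumptions, a dataset in which many barcodes carry extra contigs would lose many cells to the hard filter, whereas the soft check would keep them and only flag them.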
…On Tue, Jul 19, 2022 at 11:14 PM Sara Moien ***@***.***> wrote:
Thanks Kelvin,
I made a mess of updating my dandelion. I uninstalled it, and to
re-install it I am following the instructions from your tutorial:
https://sc-dandelion.readthedocs.io/en/latest/README.html#installation
For the installation step "conda install -c conda-forge python-igraph leidenalg"
I get an error:
ERROR: Failed building wheel for leidenalg
Do you have any idea how I can install leidenalg? Or is there any other
way to install dandelion?
Thanks,
Sara
On Tue, Jul 19, 2022 at 1:58 PM Zewen Kelvin Tuong <
***@***.***> wrote:
>> During filter_contig "vdj, adata2 = ddl.pp.filter_contigs(new_vdj, adata,
>> library_type ='tr-ab', filter_rna = True)" I get this error: TypeError:
>> update_metadata() got an unexpected keyword argument 'library_type'
>
> You are not using the correct version of dandelion. Please uninstall and
> reinstall it. dandelion.__version__ has to be 0.2.4.
>
>> I am going to generate a network of all the edges with the nx package
>> (like the graph that you sent me a few days ago), and you mentioned that
>> the 'edges' from the dandelion package are not reliable. I need a way
>> that gives me the edges, but it is not clear to me how to do that.
>
> I would suggest that you learn how to use the networkx package
> <https://networkx.org/documentation/stable/>, because this isn't the
> place to ask questions related to it.
>
> https://networkx.org/documentation/stable/reference/generated/networkx.convert_matrix.to_pandas_edgelist.html
>
>> Would you please provide a short explanation of graph[0] and graph[1]?
>> After plotting, it looks like all clones are connected together. I am
>> confused about how they are connected.
>
> graph[0] contains all nodes (including singletons) and graph[1] contains
> only connected nodes.
>
> Sorry, the code I used above is wrong. It should be:
>
> S = G.subgraph(largest_cc)
>
-
Hi Kelvin,
How can we tell dandelion to consider both heavy and LIGHT chains?
Currently it is only generating clone_id based on the heavy chain, but
we need to look at both chains.
Thanks,
Sara
-
Sorry Kelvin,
How can I get the original version of dandelion?
-
This assertion is not true. Dandelion will consider both heavy and light chains if they are there.
filter_contigs does not normally remove light chains. Do you see a lot of situations where a single cell barcode has more than two contigs assigned to it? If so, then your original data needs to be assessed to check that it's correct and of high quality.
The filter_contigs criteria are in the documentation. Please read it.
You can pip install an earlier version, as they are all on PyPI. However, earlier versions should not change this behavior of missing clone_ids, as I highly suspect that your issue is with your data rather than the tool itself.
Please provide a screenshot of your data/error, or send the data to my email so I can diagnose whether it's a genuine problem. I wouldn't need the full data; just a couple of the rows you are experiencing issues with will suffice. If that is not possible, then I suggest that you start from the original cellranger outputs and just read them in with ddl.read_10x_vdj or ddl.read_10x_airr.
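Before and after reinstalling, it can help to confirm which version is actually installed in the active environment. A small standard-library sketch follows; the helper name installed_version is just for illustration, and sc-dandelion is the distribution name taken from the PyPI link in this thread.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


found = installed_version("sc-dandelion")
if found is None:
    print("sc-dandelion is not installed in this environment")
else:
    print(f"sc-dandelion {found} is installed")
```

A specific release can then be requested with pip's usual package==version syntax, for example pip install sc-dandelion==0.2.4 for the version named earlier in the thread.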
-
Thanks Kelvin.
For now, we are trying to make sure we are doing the correct steps.
I have some other questions:
1- What is the criterion for connecting one cluster to another cluster?
Are clusters connected through cells that differ by one nucleotide?
2- I ran one dataset with dandelion 0.1.12 and the output vdj had around
6000 rows (in vdj.metadata). But with dandelion 0.2.4, running on the same
data generates a vdj.metadata with around 300 rows. How do these two
dandelion versions differ?
Thanks,
Sara
…On Wed, Jul 20, 2022 at 5:55 PM Zewen Kelvin Tuong ***@***.***> wrote:
> How can we tell dandelion to consider both heavy and LIGHT chains?
> Currently it is only generating clone_id based on the heavy chain, but
> we need to look at both chains.
This assertion is not true. Dandelion will consider both heavy and light
chains *if they are there*.
Thus, your description is only possible if your light chain rows are not
there (because they were filtered away because of quality issues), or are
not formed properly (and thus filtered away because of quality issues).
> One question: how does the filter_contigs function work?
> Does dandelion remove the light chain?
It does not normally remove them.
*Do you see a lot of situations where a single cell barcode has more than
two contigs assigned to it?*
If so, then your original data needs to be assessed to check that it's
correct and of high quality.
> We want to see which criteria filter_contigs looks at when filtering
> contigs.
This is in the documentation
<https://sc-dandelion.readthedocs.io/en/latest/modules/dandelion.preprocessing.filter_contigs.html>.
Please read it.
> How can I get the original version of dandelion?
You can pip install an earlier version, as they are all on PyPI
<https://pypi.org/project/sc-dandelion/>.
However, earlier versions should not change this behavior of missing
clone_ids, as I highly suspect that your issue is with your data rather
than the tool itself.
Please provide a screenshot of your data/error, or send the data to my
email so I can diagnose whether it's a genuine problem. I wouldn't need the
full data; just a couple of the rows you are experiencing issues with will
suffice. If that is not possible, then I suggest that you start from
the original cellranger outputs and just read them in with ddl.read_10x_vdj
<https://sc-dandelion.readthedocs.io/en/latest/modules/dandelion.utilities.read_10x_vdj.html>
or ddl.read_10x_airr
<https://sc-dandelion.readthedocs.io/en/latest/modules/dandelion.utilities.read_10x_airr.html>
.
-
As i've explained above - this is determined if the
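As a reminder of the explanation given earlier in the thread: within a clone, pairwise Levenshtein distances are computed for each chain layer separately, the layers are summed into a single distance matrix, and a minimum spanning tree is then calculated on it. A toy, standard-library sketch of those three steps follows. It is not dandelion's implementation: the sequences are invented, only two layers (heavy and light) are shown, and Prim's algorithm is used here simply as one common way to obtain a minimum spanning tree.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (0 if equal)
            ))
        previous = current
    return previous[-1]


def summed_distances(cells):
    """Sum the per-layer distances for every pair of cells."""
    names = list(cells)
    return {
        (u, v): sum(levenshtein(x, y) for x, y in zip(cells[u], cells[v]))
        for i, u in enumerate(names) for v in names[i + 1:]
    }


def minimum_spanning_tree(names, dist):
    """Prim's algorithm on a complete graph; returns (u, v, weight) edges."""
    def d(u, v):
        return dist[(u, v)] if (u, v) in dist else dist[(v, u)]

    in_tree, tree = {names[0]}, []
    while len(in_tree) < len(names):
        # Cheapest edge that links the tree to a node not yet in it.
        u, v = min(
            ((u, v) for u in in_tree for v in names if v not in in_tree),
            key=lambda edge: d(*edge),
        )
        in_tree.add(v)
        tree.append((u, v, d(u, v)))
    return tree


# Invented (heavy, light) sequences for three cells of one clone.
cells = {
    "cell1": ("CARDYW", "CQQSYT"),
    "cell2": ("CARDFW", "CQQSYT"),
    "cell3": ("CARDFW", "CQQAFT"),
}
dist = summed_distances(cells)
tree = minimum_spanning_tree(list(cells), dist)

print(dist)
# {('cell1', 'cell2'): 1, ('cell1', 'cell3'): 3, ('cell2', 'cell3'): 2}
print(tree)
# [('cell1', 'cell2', 1), ('cell2', 'cell3', 2)]
```

In this sketch the edges are simply the lowest-cost links that keep the cells connected, so no fixed "one nucleotide" threshold is involved.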
You can see the various code changes here: https://github.com/zktuong/dandelion/releases
The largest difference between v0.1.12 and v0.2.x is that the preprocessing step has a 'strict' mode by default, which could be why your dataset is now reduced. The rest of the changes are to do with speed upgrades.
So, instead of using
-
Description of the question
Hi Kelvin,
I have some basic questions about how dandelion works, and I am trying to understand the biological meaning of each step in dandelion. I am asking these questions to complete the puzzle.
Would you please help me answer them:
1- Is each node in the dandelion network a clone?
2- How is the clone network generated? I have already read all the tutorials and papers, but I think there are some inconsistencies between the paper and the tutorial. It would be great if you could briefly give me the steps. I am very confused.
3- Why, after generating the .tsv file, do some of the cells have a different cluster_id?
4- We expected to see the same germline in all the cells in the network, but the germlines of the cells in the network are different. Why?
Thank you,
Sara
Minimal example
NA
Any error message produced by the code above
OS information
NA
Version information
NA
Additional context
No response