short_long does not support gtf without gene line. #42

Open
dputhier opened this issue Sep 22, 2017 · 14 comments

@dputhier
Owner

dputhier commented Sep 22, 2017

It seems that short_long does not support files without gene lines. I think the gene_ids could be taken from the transcript lines, or even from the exon lines, since it is very common to have no gene line. What do you think about this?

# Here toto.gtf has no gene line
gtftk convert_ensembl -i toto.gtf | gtftk select_by_key -k feature -v gene -n | gtftk short_long
# segfault
gtftk convert_ensembl -i toto.gtf | gtftk short_long
# ok
@dputhier dputhier added the bug label Sep 29, 2017
@fafa13
Collaborator

fafa13 commented Nov 8, 2017

It is done exactly as you wanted: gene_ids are taken from the transcript line if such a line exists. If not, they are taken from the exon lines.
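
For illustration, the fallback described above can be sketched in Python. This is not the libgtftk implementation: the helper name `gene_id_sources` and the regular expression used for the attribute column are assumptions made for this example. For each gene_id it records whether the ID was found on a transcript line or, failing that, only on an exon line.

```python
import re

# Assumed attribute syntax: key "value"; key "value"; ...
ATTR_RE = re.compile(r'(\w+)\s+"([^"]*)"')


def gene_id_sources(gtf_lines):
    """Map each gene_id to the feature it was read from (hypothetical helper).

    Transcript lines take priority; exon lines are used only for genes
    that have no transcript line.
    """
    sources = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        # A GTF record has 9 tab-separated columns; column 3 is the feature.
        if len(cols) < 9 or cols[2] not in ("transcript", "exon"):
            continue
        gene_id = dict(ATTR_RE.findall(cols[8])).get("gene_id")
        if gene_id is None:
            continue
        # A transcript line always wins; an exon line only fills a gap.
        if cols[2] == "transcript" or gene_id not in sources:
            sources[gene_id] = cols[2]
    return sources
```

With a file that has no gene line at all, every gene still ends up in the returned mapping, which is the behaviour requested in this issue.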

@fafa13
Collaborator

fafa13 commented Nov 8, 2017

My previous comment was about the convert_to_ensembl function. You're right, the get_shortest/longest_transcript functions don't work if there are no gene lines. I will try to fix that.

@fafa13
Collaborator

fafa13 commented Nov 8, 2017

I just pushed a fix for this bug. Can you test it and tell me if it works?

@dputhier
Owner Author

This does not seem to fix issue #42.
The command still ends with a segfault. Are you sure you have pushed?

  gtftk get_example -d mini_real |  gtftk select_by_key -V 2 -C -k feature -v exon  | gtftk short_long
  Segmentation fault: 11

@dputhier dputhier reopened this Nov 29, 2017
@dputhier
Owner Author

This one is not fixed:

    $ gtftk get_example -d mini_real |  gtftk select_by_key -V 2 -C -k feature -v exon  | gtftk short_long
        |--- 11:13-DEBUG-select_by_key : Creating a GTF instance.
        |--- 11:13-DEBUG_MEM-select_by_key : GTF created (#lines=137670, file=-, ptr_addr=0x7fea45b21b20, id=4525751952, nb=1).
        |--- 11:13-DEBUG-select_by_key : Calling extract_data.
        |--- 11:13-DEBUG-select_by_key : Calling select_by_key (key=feature, value=exon)
        |--- 11:13-DEBUG-select_by_key : Creating a GTF instance.
        |--- 11:13-DEBUG_MEM-select_by_key : GTF created (#lines=64251, file=-, ptr_addr=0x7fea48198370, id=4525482768, nb=2).
        |--- 11:13-DEBUG_MEM-select_by_key : GTF deleted (#lines=137670, file=-, ptr_addr=0x7fea45b21b20, id=4525751952, nb=1).
        |--- 11:13-DEBUG-select_by_key : Writing a GTF (#lines=64251, file=-, ptr_addr=0x7fea48198370, id=4525482768, nb=2).
        |--- 11:13-INFO-select_by_key : GTF written.
        |--- 11:13-DEBUG_MEM-select_by_key : GTF deleted (#lines=64251, file=-, ptr_addr=0x7fea48198370, id=4525482768, nb=2).
    Segmentation fault: 11

@fafa13
Collaborator

fafa13 commented Apr 27, 2018

If you add a "gtftk convert_ensembl" step after select_by_key, the error goes away. I will try to do what you suggested earlier, that is, to get transcript_ids from exon lines.
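
As a rough sketch of what working from exon lines alone could look like, the Python below picks one transcript per gene without needing any gene or transcript line. It is not the libgtftk code: `select_transcripts` is a hypothetical name, and measuring a transcript by the sum of its exon lengths is an assumption about how "short" and "long" are defined.

```python
import re
from collections import defaultdict

# Assumed attribute syntax: key "value"; key "value"; ...
ATTR_RE = re.compile(r'(\w+)\s+"([^"]*)"')


def select_transcripts(gtf_lines, longest=True):
    """Return {gene_id: transcript_id}, using exon lines only (hypothetical)."""
    lengths = defaultdict(int)  # (gene_id, transcript_id) -> summed exon length
    for line in gtf_lines:
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) < 9 or cols[2] != "exon":
            continue
        attrs = dict(ATTR_RE.findall(cols[8]))
        if "gene_id" not in attrs or "transcript_id" not in attrs:
            continue
        # GTF coordinates are 1-based and inclusive.
        exon_len = int(cols[4]) - int(cols[3]) + 1
        lengths[(attrs["gene_id"], attrs["transcript_id"])] += exon_len

    chosen = {}  # gene_id -> (transcript_id, length)
    for (gene_id, tx_id), length in lengths.items():
        best = chosen.get(gene_id)
        # On ties, the transcript seen first in the input is kept.
        if best is None or (length > best[1] if longest else length < best[1]):
            chosen[gene_id] = (tx_id, length)
    return {gene_id: tx_id for gene_id, (tx_id, _) in chosen.items()}
```

Because the gene and transcript identifiers are read from the attribute column of each exon line, the exon-only output of select_by_key would be enough input for this approach.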

@fafa13
Collaborator

fafa13 commented Apr 27, 2018

I have pushed an attempt to fix this one. The changes are big, so I made a copy of the select_transcript function and called it select_transcript2. Can you test your example in Python after replacing the calls to select_transcript with the new function?

@dputhier
Owner Author

I will try this ASAP.

@dputhier
Owner Author

I end up with 61 failures out of 414 tests. The select_transcript2 function also seems to have problems:

  gtftk short_long -i gtftk/data/simple_03/simple_short_long.gtf
  >segfault

I have created a branch in gtftk containing your latest libgtftk version (feature_libgtftk_bc072e5). Maybe it will help with debugging.

@fafa13
Collaborator

fafa13 commented May 2, 2018

I can't see the select_transcript2 function in the new branch. Is that expected?

@dputhier
Owner Author

dputhier commented May 2, 2018

Did you try:

git pull
git checkout feature_libgtftk_bc072e5

@fafa13
Collaborator

fafa13 commented May 2, 2018

I did even more:
git clone [email protected]:dputhier/gtftk.git
git checkout feature_libgtftk_bc072e5

And still no select_transcript2 function.

@fafa13
Collaborator

fafa13 commented May 18, 2018

This one seems to be fixed by the last push.

@dputhier
Owner Author

To be tested.
