Hello! Could you post a quick tutorial on how to format a linseed object? #7

Open
methornton opened this issue Nov 1, 2019 · 2 comments

@methornton

Hello!

I am working through the tutorial and I have my own RNA-seq data that I would like to process with linseed. Does the LinseedObject function require the data to be formatted exactly like "GSE19830_series_matrix.txt"? I have an RNA-seq data set with gene annotation, raw counts, and RPKM values. I don't know how many cell types are present, but I expect at least 10-12.

Can you tell me which of these fields must be supplied?

Fields:

     ‘exp’ List of two elements raw and normalized gene expression
          dataset

     ‘name’ Character, optional, dataset name

     ‘cellTypeNumber’ Identified cell type number, required for
          projection, corner detection and deconvolution

     ‘projection’ Projection of genes into space lower-dimensionality
          (presumably simplex)

     ‘endpoints’ Simplex corners (in normalized, non-reduced space)

     ‘endpointsProjection’ Simplex corners (in reduced space)

     ‘distances’ Stores distances for every gene to each corner in
          reduced space

     ‘markers’ List that stores signatures genes for deconvolution, can
          be set manually or can be obtained by ‘selectGenes(k)’

     ‘signatures’ Deconvolution signature matrix

     ‘proportions’ Deconvolution proportion matrix

     ‘pairwise’ Calculated pairwise collinearity measure

The header of my RNA-seq data looks like this:

EnsemblID	EntrezID	RGD_ID	Geneme	GeneType	logFC	logCPM	LR	PValue	FDR	SA33599_rev	SA33601_rev	SA33604_rev	SA33598_rev	SA33600_rev	SA33602_rev	SA33603_rev	SA33605_rev	SA33606_rev	SA33598_rev_RPKM	SA33599_rev_RPKM	SA33600_rev_RPKM	SA33601_rev_RPKM	SA33602_rev_RPKM	SA33603_rev_RPKM	SA33604_rev_RPKM	SA33605_rev_RPKM	SA33606_rev_RPKM	Chr	Strand	length	NoExons	RNACentralID	miRBaseID	miRBaseACC	TM_Helix	HAMAP_ID	Description
ENSRNOG00000005609	29458	3165	Neurod1	protein_coding	-4.41557073893638	5.09105209110567	111.392747290707	4.85365557971023E-26	7.76293673418854E-22	174	218	11	16	41	27	42	388	5	0.720808668436819	13.0576466284548	1.93454971657025	10.9567107210054	1.75455632681648	1.57289902939802	0.773458305076102	14.4906203372679	0.33082393003395	3	-1	5248	3						neuronal differentiation 1 [Source:RGD Symbol;Acc:3165]
ENSRNOG00000003680	25451	2650	Gabrb2	protein_coding	-4.82293017899498	4.31972433520164	107.686834920917	3.14786664687739E-25	2.51734895750785E-21	98	134	5	6	144	14	25	225	3	0.672937124992248	18.3090140777356	16.9153796849254	16.7668593078227	2.2649301153254	2.33085245162443	0.875260735086981	20.919966761681	0.494164322054507	10	1	2108	10				TMhelix		gamma-aminobutyric acid type A receptor beta 2 subunit [Source:RGD Symbol;Acc:2650]

I can get the 'normCounts' from the R package 'edgeR' if necessary. How should I format them? Any advice or assistance is greatly appreciated! Thank you!
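
One way to reduce a table like the one above to a plain gene-by-sample matrix is sketched below in Python. This is only an illustration, not part of linseed: the function names `extract_matrix` and `write_matrix`, the output header `gene_id`, and the choice of the `_RPKM` columns are assumptions, and the embedded sample keeps just a few columns and values copied from the rows shown above.

```python
import csv
import io

# Trimmed sample built from the two rows posted above (only a few columns kept).
SAMPLE = "\n".join("\t".join(fields) for fields in [
    ["EnsemblID", "Geneme", "SA33598_rev", "SA33598_rev_RPKM", "SA33599_rev_RPKM"],
    ["ENSRNOG00000005609", "Neurod1", "16", "0.720808668436819", "13.0576466284548"],
    ["ENSRNOG00000003680", "Gabrb2", "6", "0.672937124992248", "18.3090140777356"],
])


def extract_matrix(table_text, id_column="EnsemblID", suffix="_RPKM"):
    """Keep the gene ID column plus every column whose name ends in `suffix`.

    Returns (sample_names, rows), where rows maps gene ID -> list of floats
    in the same order as sample_names.
    """
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    value_columns = [c for c in reader.fieldnames if c.endswith(suffix)]
    # Strip the suffix so the sample names match the raw-count column names.
    samples = [c[: -len(suffix)] for c in value_columns]
    rows = {}
    for record in reader:
        rows[record[id_column]] = [float(record[c]) for c in value_columns]
    return samples, rows


def write_matrix(samples, rows, handle, id_header="gene_id"):
    """Write a tab-separated matrix: one header line, then one line per gene."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow([id_header] + samples)
    for gene_id, values in rows.items():
        writer.writerow([gene_id] + values)


if __name__ == "__main__":
    samples, rows = extract_matrix(SAMPLE)
    out = io.StringIO()
    write_matrix(samples, rows, out)
    print(out.getvalue(), end="")
```

The annotation and statistics columns (logFC, PValue, Chr, and so on) are dropped, leaving one row per gene and one column per sample.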

@pushtiks

pushtiks commented Dec 4, 2019

Hi! I'm also testing linseed and have used CPM (from edgeR), TPM (from RSEM), and FPKM (from Cufflinks) matrices.

Matrices looked like:
transcript_id sample1 sample2 sample3 <--------header
ENST000000000 5.456 7.876 4.194 <-------- transcript/gene ID and its expression values per sample in CPMs/TPMs/FPKMs

I entered the expected cell type number by hand in the R script. I don't know whether linseed accepts more than one number at once, so I tried a different expected number on each script run.

So far my results are not as beautiful as they could be.

A more detailed tutorial would be appreciated! :)

@konsolerr
Collaborator

@methornton

You can just pass the expression matrix (basically a matrix object) to the constructor of the Linseed class.
I would suggest using something like TPMs, or any normalization that already takes library size into account.

Cheers and sorry for the slow replies,
Konstantin
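
For anyone who only has RPKM/FPKM values, the usual per-sample conversion to TPM is TPM_g = RPKM_g / sum(RPKM) * 1e6, where the sum runs over all genes in that sample. A minimal Python sketch follows; the function name `rpkm_to_tpm` is hypothetical and this is not part of linseed.

```python
def rpkm_to_tpm(rpkm_values):
    """Convert one sample's RPKM (or FPKM) values to TPM.

    The input must contain every gene measured in the sample, because the
    values are rescaled so that the sample sums to one million.
    """
    total = sum(rpkm_values)
    if total <= 0:
        raise ValueError("RPKM values must sum to a positive number")
    return [value / total * 1e6 for value in rpkm_values]


if __name__ == "__main__":
    # Toy input, not real data: two genes with RPKM 1 and 3.
    print(rpkm_to_tpm([1.0, 3.0]))
```

Apply the conversion to the full gene list of each sample before any filtering; rescaling a subset of genes gives values that are no longer comparable to TPMs computed on the whole sample.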
