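I've noted before how you might detect RNASeq sample swaps in engineered cell line samples. But what if we don't have a priori genotype knowledge? If the sample metadata records sex, a quick sanity check is expression of the Y-linked SRY gene (RefSeq ID NM_003140), which should be detectable only in males^. Starting from per-sample kallisto output, combine the TPM columns into one table and pull out the SRY row: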
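# Header row: one column per sample, named after its kallisto output directory
$ perl -e 'print "target_id\t",join("\t",map {/(.*)\//;$1} @ARGV),"\n";' *.kallisto/abundance.tsv > all_abundance.tsv
# Data rows: transcript ID plus each sample's TPM (5th column of every abundance.tsv, i.e. 0-based fields 4, 9, 14, ...)
$ paste *.kallisto/abundance.tsv | perl -ane 'print $F[0];for (1..$#F){print "\t$F[$_]" if /[49]$/}print "\n"' | tail -n +2 >> all_abundance.tsv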
# Pull the SRY row, transpose it into a column, and paste it next to the sample metadata
$ grep NM_003140 all_abundance.tsv | perl -ane 'print join("\n",@F),"\n"' | paste meta.tab -
sample path sex age NM_003140
A A.kallisto M 50 0.603562
B B.kallisto M 75 0.540668
C C.kallisto M 27 0.519294
D D.kallisto F 35 0
E E.kallisto M 46 0
F F.kallisto M 74 0.970973
G G.kallisto M 41 0.57206
H H.kallisto F 30 0
I I.kallisto M 19 0.246618
J J.kallisto F 39 0.381072
K K.kallisto F 61 0
L L.kallisto M 37 0.304948
M M.kallisto M 65 0
N N.kallisto F 78 0
O O.kallisto F 57 0
P P.kallisto F 53 0
Q Q.kallisto F 52 0
R R.kallisto F 73 0
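Eyeballing works for 18 samples, but the same comparison can be scripted. Here is a minimal sketch, assuming the pasted output above was saved to a whitespace-delimited file with the same five columns and a header line. The function name `flag_sry_mismatches` and the default cutoff of zero TPM are my own illustrative choices, not anything kallisto provides.

```shell
# Hypothetical helper: list samples whose recorded sex disagrees with SRY expression.
# Assumed columns: sample, path, sex (M/F), age, SRY TPM; the first line is a header.
# Optional second argument: TPM cutoff above which SRY counts as detected (default 0).
flag_sry_mismatches() {
  awk -v cutoff="${2:-0}" 'NR > 1 {
    expressed = ($5 + 0 > cutoff + 0)
    if ($3 == "M" && !expressed) print $1 "\trecorded M, no SRY detected"
    if ($3 == "F" && expressed)  print $1 "\trecorded F, SRY detected"
  }' "$1"
}
```

Called as `flag_sry_mismatches sry_by_sample.tab` (a hypothetical file name) on the table above, this flags E and M (recorded male, no SRY) and J (recorded female, SRY detected). Treat these as candidates for follow-up rather than confirmed swaps.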
Update: Even more reliable is the presence of XIST, a long non-coding RNA (polyadenylated, so usually captured by mammalian RNASeq protocols), in female samples but not male ones. Its RefSeq ID is NR_001564. This wouldn't be reliable in Turner Syndrome subjects, but hopefully your study does not include them; if it does, use the combination of XIST absence and SRY presence to identify males.
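The two-marker rule can be written down the same way. This is a rough sketch under stated assumptions: it expects a table you have built yourself with the columns sample, recorded sex, SRY TPM and XIST TPM (that layout, the function name `call_sex_from_markers`, and the zero-TPM cutoff are all hypothetical). A sample is called male only when SRY is detected and XIST is not, female only when XIST is detected and SRY is not, and anything else is left ambiguous.

```shell
# Hypothetical helper: infer sex from SRY and XIST TPM and compare with the record.
# Assumed columns: sample, recorded sex (M/F), SRY TPM, XIST TPM; first line is a header.
# Optional second argument: TPM cutoff above which a marker counts as detected (default 0).
call_sex_from_markers() {
  awk -v cutoff="${2:-0}" 'BEGIN { OFS = "\t" }
  NR > 1 {
    sry  = ($3 + 0 > cutoff + 0)
    xist = ($4 + 0 > cutoff + 0)
    if (sry && !xist)      inferred = "M"
    else if (xist && !sry) inferred = "F"
    else                   inferred = "ambiguous"   # both or neither marker detected
    print $1, $2, inferred, (inferred == $2 ? "ok" : "check")
  }' "$1"
}
```

The output has one line per sample: name, recorded sex, inferred sex, and `ok` or `check`. Samples with neither marker detected (as would be expected for Turner Syndrome) come out as `ambiguous` rather than being forced into a call.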
_________
^Sure, there are rare exceptions to the nominal male=Y rule; I'm ignoring them here.