I'm not so silly as to get involved in the Qiime vs. Mothur methodology debate, but let's suppose you've settled on Mothur. Mothur's kind of nice insofar as it's a single binary (Qiime is typically installed as a virtual machine), and the recommended Silva reference files can be downloaded in a bundle from the Mothur Web site. There are two issues with using the Mothur-provided reference files for eukaryotic analysis: 1) the taxonomy for eukaryotes is truncated the same way as for prokaryotes (6 levels) even though the eukaryotic tree is deeper, so quite often you end up with only a class-, order- or family-level designation, and 2) the prebuilt DB is Silva 123 from July 2015. As of this writing Silva is at version 128, with 577,832 quality entries vs. 526,361 in version 123, and 18,213 Eukarya represented vs. 16,209. Here's how to address both the eukaryotic and latest-version issues:
Download and unpack (as newer versions come out, adjust the URLs accordingly):
wget -O SSURef_NR99_latest_opt.arb.gz https://www.arb-silva.de/fileadmin/silva_databases/current/ARB_files/SSURef_128_SILVA_23_09_16_opt.arb.gz
gzip -d SSURef_NR99_latest_opt.arb.gz
wget -O tax_slv_ssu_latest.txt https://www.arb-silva.de/fileadmin/silva_databases/current/Exports/taxonomy/tax_slv_ssu_128.txt
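If you expect to repeat this for later releases, the two URLs can be built from a release number and the date stamp in the ARB file name. This is only a sketch: silva_urls is a hypothetical helper, and the URL pattern is inferred from the two release-128 links above, so check the SILVA download page before trusting it for another version.

```shell
# Hypothetical helper: print the ARB and taxonomy URLs for a SILVA release.
# Assumption: later releases keep the same naming pattern as release 128.
#   $1 = release number (e.g. 128)
#   $2 = date stamp embedded in the ARB file name (e.g. 23_09_16)
silva_urls() {
    version=$1
    stamp=$2
    base="https://www.arb-silva.de/fileadmin/silva_databases/current"
    printf '%s\n' "$base/ARB_files/SSURef_${version}_SILVA_${stamp}_opt.arb.gz"
    printf '%s\n' "$base/Exports/taxonomy/tax_slv_ssu_${version}.txt"
}

# Reproduce the two URLs used above for release 128.
silva_urls 128 23_09_16
```

Each printed line can then be handed to wget -O with the "latest" file names used in the rest of this post.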
Export the FastA files:
Launch arb:
arb SSURef_NR99_latest_opt.arb
Export as per http://blog.mothur.org/2015/12/03/SILVA-v123-reference-files/, saving the file as silva.full_latest.fasta, then quit arb.
Note that I compiled ARB from source since I'm on CentOS 7 and they only have precompiled binaries for earlier operating systems. If you're in the same boat, you'll need to install the following packages as root (or via sudo); not all of them are documented in the installation instructions.
yum install libxml2 transfig libXp libtiff gnuplot-common xorg-x11-xbitmaps Xaw3d xorg-x11-fonts-misc xfig xfig-common motif gnuplot libtiff-devel libxml2-devel libxml2-python lynx glib2-devel imake libXmu-devel libXp-devel motif-devel
Format the sequences:
Here we deviate a bit from the Mothur README for simplicity, and to get the right taxonomic labels for eukaryotes.
mothur "#screen.seqs(fasta=silva.full_latest.fasta, start=1044, end=43116, maxambig=5, processors=8); pcr.seqs(start=1044, end=43116, keepdots=T); degap.seqs(); unique.seqs();"
grep ">" silva.full_latest.good.pcr.ng.unique.fasta | cut -f 1 | cut -c 2- > silva.full_latest.good.pcr.ng.unique.accnos
mothur "#get.seqs(fasta=silva.full_latest.good.pcr.fasta, accnos=silva.full_latest.good.pcr.ng.unique.accnos)"
mv silva.full_latest.good.pcr.pick.fasta silva.full_latest.align
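Before moving on, it can be worth confirming that the picked alignment holds exactly one record per ID in the accnos file. A minimal sketch, where count_seqs and check_pick are hypothetical helper names and the accnos file is assumed to have one ID per line:

```shell
# Count FASTA records, i.e. header lines starting with ">".
count_seqs() {
    grep -c '^>' "$1"
}

# Compare the number of IDs in an accnos file with the records in a FASTA file.
# Prints a one-line verdict and returns non-zero on a mismatch.
check_pick() {
    accnos=$1
    fasta=$2
    n_ids=$(grep -c . "$accnos")      # non-empty lines = sequence IDs
    n_seqs=$(count_seqs "$fasta")
    if [ "$n_ids" -eq "$n_seqs" ]; then
        echo "OK: $n_seqs sequences"
    else
        echo "MISMATCH: $n_ids IDs but $n_seqs sequences"
        return 1
    fi
}

# Usage (file names as produced by the steps above):
#   check_pick <your .accnos file> silva.full_latest.align
```

A mismatch usually means one of the intermediate file names did not line up with what Mothur actually wrote to disk.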
Run my eukaryotic labelling adjustment script (modified from https://raw.githubusercontent.com/rec3141/diversity-scripts/master/convert_silva_taxonomy.r, since that one didn't work for me and wasn't parameterized):
Rscript convert_silva_taxonomy.r tax_slv_ssu_latest.txt silva.full_latest.align silva.full_latest.tax
Now you're good to go for any type of SSU analysis (16S or 18S), following something like the ever-popular MiSeq SOP.
If the above instructions failed for you, download my SILVA 128 tax file here, and the fasta and align.
Update: there seems to be a problem with the taxonomic classifications of 4 Ralstonia sequences in the current SILVA release: they have only two levels of classification. You'll need to manually fix those in the output taxonomy file to get it to work properly.
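To find entries like these rather than hunting by eye, something along the following lines should work. It is a sketch that assumes the output taxonomy file uses the usual Mothur layout (sequence ID, a tab, then ranks separated and terminated by semicolons); short_taxa is a hypothetical helper name.

```shell
# List entries whose taxonomy string has fewer than a minimum number of ranks.
# Assumed input format: "ID<TAB>Rank1;Rank2;...;" (one entry per line).
#   $1 = taxonomy file
#   $2 = minimum number of ranks expected (default 3)
# Output columns: ID, number of ranks found, original taxonomy string.
short_taxa() {
    tax_file=$1
    min_levels=${2:-3}
    awk -F '\t' -v min="$min_levels" '{
        tax = $2
        sub(/;+$/, "", tax)            # drop the trailing semicolon(s)
        n = split(tax, ranks, ";")     # one array element per rank
        if (n < min) print $1 "\t" n "\t" $2
    }' "$tax_file"
}

# Usage:
#   short_taxa silva.full_latest.tax 3
```

With a threshold of 3, entries carrying only two levels of classification are printed along with their IDs, which you can then correct by hand.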