Many files in bioinformatics are gzip-compressed to save space. Sure, you can stream decompressed data to programs that need raw FASTQ (or whatever) using gzip -cd file.gz | , but what if you need to merge a bunch of FASTQ files (or any other files without headers)? In a stroke of genius, the gzip designers made each compressed stream (a "member" in the gzip format) independent, so you can concatenate .gz files without decompressing them, and the result decompresses to the concatenation of the original contents. For example, to merge the lane files of a paired-end Illumina NextSeq 500 run:
cat *_R1*.fastq.gz > outdir/R1.fastq.gz
cat *_R2*.fastq.gz > outdir/R2.fastq.gz
It's as simple as that!
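If you want to convince yourself before trusting this with real data, here is a minimal sketch of a sanity check. The file names (lane1.fastq, lane2.fastq, merged.fastq.gz) and the two tiny reads are made up for illustration, and everything lives in a temporary directory. It compresses two files separately, concatenates the .gz files, and compares a checksum of the decompressed result against a checksum of the concatenated raw files.

```shell
# Hypothetical demo data in a temporary directory (not real sequencing output).
tmp=$(mktemp -d)
printf '@read1\nACGT\n+\nIIII\n' > "$tmp/lane1.fastq"
printf '@read2\nTTGA\n+\nIIII\n' > "$tmp/lane2.fastq"

# Compress each lane on its own, keeping the uncompressed originals.
gzip -c "$tmp/lane1.fastq" > "$tmp/lane1.fastq.gz"
gzip -c "$tmp/lane2.fastq" > "$tmp/lane2.fastq.gz"

# Merge the compressed files by plain concatenation, no decompression.
cat "$tmp/lane1.fastq.gz" "$tmp/lane2.fastq.gz" > "$tmp/merged.fastq.gz"

# Test the integrity of the merged archive.
gzip -t "$tmp/merged.fastq.gz"

# Checksum of the decompressed merge vs. checksum of the raw files in the same order.
merged_sum=$(gzip -cd "$tmp/merged.fastq.gz" | cksum)
expected_sum=$(cat "$tmp/lane1.fastq" "$tmp/lane2.fastq" | cksum)

if [ "$merged_sum" = "$expected_sum" ]; then
    echo "merged archive matches the concatenated originals"
else
    echo "mismatch between merged archive and originals" >&2
fi
```

One thing to keep in mind with paired-end data: the shell expands each glob in sorted order, so the R1 and R2 merges only stay in sync if the lane files are named consistently for both reads. Remove the temporary directory ("$tmp") when you are done.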