

I would like to use a compression algorithm other than gzip on fastq files for long-term storage. Decompressing and re-compressing could free about half the disk space I am using, which would save me approximately 20-30 TB now and more in the future. (Funny, for some people this is probably big and for others ridiculously small :) The bottom line here is that disk space is money, and up-scaling means the costs won't be linear but probably more expensive.

I have found a few contenders, and the most solid of them is fqzcomp-4.6 ( ). It boasts very good compression rates (compressed files about twice as small as gzip's), is fast enough, and installs easily. Now, these are important data, so I need to be sure that in 2, 5, or 10 years I will be able to get them back. I saw this other question (fastq compression tools of choice), but I really want to know whether one of these tools is a) good enough to improve compression and b) dependable. The article cited by Charles Plessy doesn't help much in this regard.

For those of you who work in big groups, institutes, or organisations: what fastq compression tool would you recommend from the point of view of data safety? What have you used with success?

I did a quick compression ratio comparison for gzip, bz2, and fqzcomp. I used the default parameter values for gzip and bz2, and -Q -s5+ -e -q 3 for fqzcomp. Here are the compression ratios (compared to the uncompressed files):

Algo     ratio
fqzcomp  0.101 - 0.181 (depending on the Q param value)

From these figures, we can see that bz2 reduces files only by an additional ~20% when compared to gzip. However, fqzcomp files can be as much as 2.7 times smaller than gzip ones and 2.2 times smaller than bz2 ones. Of course, these figures will change with different fastq files, and they assume you do not care much about the quality values (for Ion Proton, which we sequence in high volume, we actually don't care that much), but the potential gain is considerable. This is why I am seriously considering this algorithm.

I think that as long as you pick a reasonable compression tool for which you have the source code and enough instructions to build it on a clean system, you'll be fine. It may depend on the details of your use case, but to me this seems like a lot of worry over a very, very tiny part of the problem. I would be much more worried about other aspects of the long-term storage problem, like adequate off-site backups and catastrophic fail-safe plans at the physical level.
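The quick ratio comparison described above can be sketched as a short script. This is a minimal illustration using gzip and bzip2 only (fqzcomp is not assumed to be installed here; it would be invoked the same way, with the flags quoted above). The sample file is synthetic, so its ratios will be far better than those of real fastq data.

```shell
#!/bin/sh
set -e

# Build a small stand-in fastq file (synthetic, for illustration only).
: > sample.fastq
for i in $(seq 1 200); do
  printf '@read%s\nACGTACGTACGTACGT\n+\nIIIIIIIIIIIIIIII\n' "$i" >> sample.fastq
done

orig=$(wc -c < sample.fastq)

gzip  -kf sample.fastq   # writes sample.fastq.gz, keeps the original
bzip2 -kf sample.fastq   # writes sample.fastq.bz2

for f in sample.fastq.gz sample.fastq.bz2; do
  comp=$(wc -c < "$f")
  # ratio = compressed size / original size (smaller is better)
  awk -v c="$comp" -v o="$orig" -v n="$f" 'BEGIN { printf "%s %.3f\n", n, c/o }'
done
```

Because the synthetic reads are highly repetitive, the printed ratios will be much smaller than the ~0.27/~0.22 range typical of real fastq files; only the method carries over.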

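The dependability concern can also be made operational: whatever compressor is chosen, record a checksum of the original file and verify a decompress round trip before deleting the uncompressed copy. A minimal sketch, with gzip standing in for the chosen tool (fqzcomp would slot in the same way) and a hypothetical file name:

```shell
#!/bin/sh
set -e

printf '@read1\nACGTACGT\n+\nIIIIIIII\n' > data.fastq   # stand-in file

# Record a checksum of the original before compressing.
sha256sum data.fastq > data.fastq.sha256

gzip -kf data.fastq                 # compress, keep the original for now

# Decompress to a fresh copy and compare it byte-for-byte to the original.
gzip -dc data.fastq.gz > roundtrip.fastq
cmp data.fastq roundtrip.fastq

# Years later, the stored checksum lets you confirm a restored file is intact.
sha256sum -c data.fastq.sha256
```

Only after `cmp` succeeds would one delete the uncompressed original; the `.sha256` file is kept alongside the archive.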