FASTQ Format Specification


FASTQ format stores sequences and their Phred qualities in a single file. It is concise and compact. FASTQ was first widely used at the Sanger Institute, and therefore we usually take the Sanger specification as the standard FASTQ format, or simply FASTQ format. Although a Solexa/Illumina read file looks much like FASTQ, the two differ in that the qualities are scaled differently. If the quality string contains a character with an ASCII code higher than 90, your file is probably in the Solexa/Illumina format.
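The ASCII-code rule of thumb above can be sketched in Perl as follows. This is only a heuristic, not part of either specification: the function name looks_like_solexa is a hypothetical helper, and a standard FASTQ file with Phred qualities above 57 would also contain characters above code 90.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if any character of the quality string
# has an ASCII code higher than 90, and 0 otherwise.
sub looks_like_solexa {
    my ($qual) = @_;
    for my $c (split //, $qual) {
        return 1 if ord($c) > 90;
    }
    return 0;
}

print looks_like_solexa('IIII<<;5'), "\n";   # prints 0: all codes are <= 90
print looks_like_solexa('hhhhZZ[b'), "\n";   # prints 1: 'h' is ASCII 104
```

In practice it is safer to look at many reads before deciding, since a single short quality string may not contain any high-quality base.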



FASTQ Format Specification


  • <fastq>, <blocks> and so on represent non-terminal symbols.
  • Characters in red are regex-like operators.
  • '\n' stands for the newline character (the Return key).




  • The <seqname> following '+' is optional, but if it appears right after '+', it should be identical to the <seqname> following '@'.
  • The length of <seq> is identical to the length of <qual>. Each character in <qual> represents the Phred quality of the corresponding nucleotide in <seq>.
  • If the Phred quality is $Q, which is a non-negative integer, the corresponding quality character can be calculated with the following Perl code:
      $q = chr(($Q<=93? $Q : 93) + 33);
    where chr() is the Perl function to convert an integer to a character based on the ASCII table.
  • Conversely, given a character $q, the corresponding Phred quality can be calculated with:
      $Q = ord($q) - 33;
    where ord() gives the ASCII code of a character.
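The two conversions and the length requirement above can be combined into a small round-trip sketch. The subroutine names phred_to_char, char_to_phred and decode_quals are hypothetical and chosen for illustration only; the arithmetic is the one given in the list.

```perl
use strict;
use warnings;

# Phred quality (non-negative integer) -> quality character, capped at 93.
sub phred_to_char {
    my ($Q) = @_;
    return chr(($Q <= 93 ? $Q : 93) + 33);
}

# Quality character -> Phred quality.
sub char_to_phred {
    my ($q) = @_;
    return ord($q) - 33;
}

# Hypothetical helper: decode a whole <qual> string into a list of Phred
# qualities, after checking that it is as long as <seq>.
sub decode_quals {
    my ($seq, $qual) = @_;
    die "seq and qual differ in length\n" unless length($seq) == length($qual);
    return map { char_to_phred($_) } split //, $qual;
}

my @Q = decode_quals('ACGT', '!5I~');
print join(',', @Q), "\n";                           # prints 0,20,40,93
print phred_to_char(40), phred_to_char(200), "\n";   # prints I~
```

Note that the cap at 93 makes the encoding lossy for larger qualities: 200 is written as '~', which decodes back to 93.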

Solexa/Illumina Read Format

The syntax of the Solexa/Illumina read format is almost identical to that of the FASTQ format, but the qualities are scaled differently. Given a character $sq, the following Perl code gives the Phred quality $Q:

    $Q = 10 * log(1 + 10 ** ((ord($sq) - 64) / 10.0)) / log(10);
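
As an illustration, the conversion can be applied to a whole quality string to re-encode Solexa/Illumina qualities as FASTQ quality characters. This is a minimal sketch: the subroutine names are hypothetical, and rounding the Phred value to the nearest integer is an assumption made here, since the formula itself yields a real number.

```perl
use strict;
use warnings;

# Solexa/Illumina quality character -> Phred quality (a real number).
sub solexa_char_to_phred {
    my ($sq) = @_;
    return 10 * log(1 + 10 ** ((ord($sq) - 64) / 10.0)) / log(10);
}

# Hypothetical helper: re-encode a Solexa/Illumina quality string as a
# FASTQ quality string. Each Phred value is rounded to the nearest integer
# (an assumption) and capped at 93 before 33 is added.
sub solexa_to_fastq_qual {
    my ($squal) = @_;
    return join '', map {
        my $Q = int(solexa_char_to_phred($_) + 0.5);
        chr(($Q <= 93 ? $Q : 93) + 33);
    } split //, $squal;
}

printf "%.2f\n", solexa_char_to_phred('@');   # prints 3.01
print solexa_to_fastq_qual('h@;'), "\n";      # prints I$"
```

The two scales nearly coincide for high qualities ('h' maps to a Phred quality of about 40) and diverge for low ones, where the Phred value stays positive.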