WU-BLAST was developed and is maintained entirely by Warren Gish. He was one of the original authors of BLAST while at the NCBI but is now at Washington University in St. Louis (where the WU comes from). Development began in 1994 at Version 1.4, before BLAST had gapped alignments. Quite a lot has changed since then. Paradoxically, WU-BLAST is more similar to the original BLAST than the current NCBI version.
WU-BLAST is useful because it offers more command-line parameters than NCBI-BLAST, giving advanced users more precise control over the program. It is also faster. Table 14-1 lists features that are unique to WU-BLAST or that differ significantly from NCBI-BLAST.
Table 14-1. WU- and NCBI-BLAST feature differences

| Feature | WU-BLAST | NCBI-BLAST |
|---|---|---|
| Word size | Any word size for any program mode. Neighborhood words are turned off for word sizes of 5 or greater, but may be activated by setting an explicit value for T. | blastn has a minimum word size of 7. blastp, blastx, tblastn, and tblastx have word sizes of 2 or 3. Neighborhood words are never used for blastn. |
| Nucleotide scoring | Choice of match/mismatch or scoring matrix. | Only match/mismatch scoring. |
| Nucleotide statistics | Karlin-Altschul parameters are available for several match/mismatch values and gap costs. | Karlin-Altschul parameters are always computed without respect to gap costs. Reported E-values may greatly overestimate significance. |
| altscore | Allows score modification for any matrix (e.g., to set stop scores lower). | Nothing similar. |
| H, K, L, gapH, gapK, gapL | Allow you to supply values for the Karlin-Altschul parameters; especially useful with unsupported scoring schemes. | Nothing similar. Unsupported scoring schemes are fatal errors. |
| Alias databases | No, but virtual databases offer similar functionality. | Yes, both alias and virtual databases are supported. |
| Gapped alignment | All programs. | All programs except tblastx. |
| /etc/sysblast | Allows system administrators to set system-wide resource restrictions. | Nothing similar. |
| Database subset selection | Yes, via dbrecmin and dbrecmax. | No, but alias databases can be used for static splitting. |
| Restricted region of query | The nwstart and nwlen parameters restrict seeding but not alignment. | -L restricts both seeding and alignment. |
| links | Displays the order of alignments in a group. | Nothing similar. |
| topcomboN | Allows restriction of the number of alignment groups. Groups are clearly labeled. | Nothing similar. |
| kap | Computes significance without sum statistics. | Nothing similar. |
| olf, golf, olmax, golmax | Allow setting of overlap rules for HSP consistency. | Fixed internally. |
| notes, warnings, errors | Descriptive messages at various levels of caution. | Most error messages are terse and not user friendly. |
| Output formats | Only the standard format. | Multiple output report formats including HTML, ASN.1, XML, tabular, and anchored multiple alignments. See Appendix A. |
To use the most recent version of WU-BLAST, you must have a site license from Washington University in St. Louis. The product is free for academic use, but commercial users must pay a fee. Unlike NCBI-BLAST, the source code isn't freely available. For the latest information on WU-BLAST, visit the official site at http://blast.wustl.edu. If you want to try WU-BLAST, an early version is available without license.
14.1 Usage Statements
All WU-BLAST programs provide usage statements if they are executed without any arguments. They are sometimes lengthy, so it's best to pipe them through a pager such as less or more.
blastn | more
xdformat | less
xdget | less
14.2 Command-Line Syntax
WU-BLAST command-line syntax isn't uniform across all programs. The BLAST programs blastn, blastp, blastx, tblastn, and tblastx use a slightly different syntax than do xdformat and xdget.
The BLAST program options come after the mandatory arguments of database and query sequence. The command-line structure is as follows:
[program name] [blast database] [query sequence] [parameters]
Parameter names and their arguments can be written in several ways in the BLAST programs. The following command lines are all equivalent:
blastn db query E=10
blastn db query -E 10
blastn db query E 10
blastn db query -E=10
This book uses the first form to avoid confusion with NCBI-BLAST.
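The equivalence of these four forms can be illustrated with a small normalizer. This is a hypothetical sketch, not WU-BLAST's own parser. The function name parse_params and the set VALUE_PARAMS (which parameters take a value) are assumptions made for the example. The input is only the parameter tokens that follow the database and query.

```python
# Hypothetical sketch: normalize the equivalent WU-BLAST parameter forms
# (E=10, -E 10, E 10, -E=10) into one dictionary. Not WU-BLAST's parser.

# Assumed subset of parameters that take a value; anything else is a flag.
VALUE_PARAMS = {"B", "E", "M", "N", "Q", "R", "S", "T", "V", "W", "X"}


def parse_params(tokens):
    """Return {name: value} for the parameter tokens after db and query."""
    params = {}
    i = 0
    while i < len(tokens):
        token = tokens[i].lstrip("-")  # a leading dash is optional
        if "=" in token:
            name, value = token.split("=", 1)  # E=10 or -E=10
        elif token in VALUE_PARAMS:
            name, value = token, tokens[i + 1]  # E 10 or -E 10
            i += 1  # the value token has been consumed
        else:
            name, value = token, True  # bare flag such as nogap or kap
        params[name] = value
        i += 1
    return params


forms = ["E=10", "-E 10", "E 10", "-E=10"]
for form in forms:
    print(form, "->", parse_params(form.split()))
```

The space-separated forms are the reason a table like VALUE_PARAMS is needed. Without knowing that E takes a value, a parser cannot tell whether the token after it is an argument or another parameter.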
xdformat and xdget use the traditional Unix syntax where the parameters precede the mandatory arguments:
[program name] [parameters] [mandatory arguments]
The xdformat and xdget options are all single letters preceded by a single dash. For parameters that require a value, a space between the parameter and its value is optional. As is typical for Unix programs, a double dash indicates the end of command-line options and a single dash signifies stdin.
xdformat -p protein_db
xdformat -n -I nucleotide_db
zcat fasta.*.gz | xdformat -n -o my_db -- -
14.3 WU-BLAST Parameters
WU-BLAST has many control parameters, some of which are esoteric and rarely useful. The most important parameters are listed here.
altscore=[string]
Default: Off
Defines an alternate scoring system for any pair of letters. For example, altscore="M M -3" changes the score of M-M pairs to -3, and altscore="A C 4" gives a score of 4 if the query is A and the subject is C. Letters may be designated as any to change an entire row or column. The score can be given as min or max for the minimum and maximum scores in the matrix or na to make the score infinitely low. To set the score of all rows and columns containing stop codons to negative infinity, set altscore="* any na" and altscore="any * na". If you change the scoring parameters, you may also want to adjust gapL, gapH, and gapK.
See also
nogap, gapL, gapH, gapK
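One way to picture the altscore rules is as in-place edits to a scoring matrix. The sketch below is a hypothetical illustration; the function apply_altscore and the three-letter toy matrix are assumptions, not WU-BLAST internals. It models na as negative infinity and resolves min and max against the matrix's current scores.

```python
import math

# Hypothetical illustration of altscore rules applied to a scoring matrix
# stored as {query_letter: {subject_letter: score}}. Not WU-BLAST code.


def apply_altscore(matrix, rule):
    """Apply one rule of the form "QUERY SUBJECT VALUE" to matrix in place."""
    q, s, value = rule.split()
    scores = [v for row in matrix.values() for v in row.values()]
    if value == "min":
        new = min(scores)
    elif value == "max":
        new = max(scores)
    elif value == "na":
        new = -math.inf  # "infinitely low"
    else:
        new = int(value)
    rows = list(matrix) if q == "any" else [q]
    for r in rows:
        cols = list(matrix[r]) if s == "any" else [s]
        for c in cols:
            matrix[r][c] = new  # rows are query letters, columns subject letters
    return matrix


# Toy matrix: +5 on the diagonal, -1 elsewhere; "*" stands for a stop codon.
letters = "AM*"
matrix = {q: {s: 5 if q == s else -1 for s in letters} for q in letters}
for rule in ["M M -3", "A M 4", "* any na", "any * na"]:
    apply_altscore(matrix, rule)

print(matrix["M"]["M"], matrix["A"]["M"], matrix["M"]["A"])  # -3 4 -1
print(matrix["*"]["A"], matrix["A"]["*"])  # -inf -inf
```

The "A M 4" rule changes only the query-A/subject-M cell, so the matrix becomes asymmetric. This mirrors the query/subject distinction in the description above.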
B=[integer]
Default: 250
Sets the number of database hits to report. A warning is issued if this number is exceeded. It is typical to set this parameter to a very high value, such as B=100000, to ensure that no alignments are missed.
bottom
Default: Off
Programs: blastn, tblastx, blastx
Search only the bottom strand of the query.
See also
top
cpus=[integer]
Default: 4 for blastn; all for blastp, blastx, tblastn, and tblastx
Sets the number of processors to use. If cpus isn't set, blastn limits itself to 4 processors, and the other programs may use all processors on the system. See Chapter 10 for information on the /etc/sysblast file, which sets systemwide resource limitations.
dbrecmax=[integer]
Default: Last database record
Last database record number to search.
See also
dbrecmin, qrecmin, qrecmax
dbrecmin=[integer]
Default: 1
First database record number to search. For example, by setting dbrecmin=1 dbrecmax=10, only the first 10 database sequences are searched.
See also
dbrecmax, qrecmin, qrecmax
E=[number]
Default: 10
This is the E from the Karlin-Altschul equation. Database hits whose E-value is greater than this threshold will not be reported. If both E and S are set, the more restrictive parameter is used.
See also
S
E2=[number]
Default: Variable; calculated from scoring parameters
Sets the alignment threshold for ungapped alignments. When E2 and S2 are set, the more restrictive parameter is used.
See also
S2, gapE2, gapS2
echofilter
Default: Off
Prints out the query sequence after all filtering is performed. This is useful for troubleshooting when there are no database hits, and you suspect the filtering is too aggressive.
See also
filter, wordmask, maskextra
errors
Default: Off
Suppresses nonfatal error messages. It is generally a good idea to pay attention to error messages, but at times it is useful to block them.
See also
nonnegok, novalidctxok
filter=[string]
Default: Off
Processes the query sequence with the specified filtering method. Letters are replaced with X and N for proteins and nucleotides, respectively.
seg
Identifies low-complexity regions in both nucleotide and amino acid sequences.
dust
The standard low-complexity filter for nucleotide sequences. Generally less sensitive than seg.
xnu
Finds short repeats in protein sequences.
seg+xnu
Combines both seg and xnu.
ccp
Coiled-coil filter for proteins.
Multiple filtering methods may be specified on the same command line; for example:
blastp nr query filter=seg filter=ccp filter=xnu
See also
echofilter, maskextra, wordmask
gapE2=[number]
Default: Variable; calculated from scoring parameters
Expectation threshold for saving individual gapped alignments. When gapE2 and gapS2 are set, the more restrictive parameter is used.
See also
gapS2, E2, S2
gapH=[number]
Default: Variable; depends on scoring parameters
Sets the value of H (information per aligned letter) for gapped alignments. If a particular combination of scoring matrix (or match/mismatch scores) and gap values doesn't already have precomputed values for gapH, gapK, and gapL, WU-BLAST uses ungapped statistics. In this case, the resulting E-values may be much too low. A warning is issued when this is the case. Computing proper values for gapped Karlin-Altschul parameters requires simulations with random sequences that determine what ungapped scoring scheme is most similar to the gapped scoring scheme.
See also
H, K, gapK, L, gapL, warnings
gapK=[number]
Default: Variable; depends on scoring parameters
Sets the value of the Karlin-Altschul K parameter for gapped alignments. See the description for gapH.
See also
H, gapH, K, L, gapL
gapL=[number]
Default: Variable; depends on scoring parameters
Sets the value of the Karlin-Altschul parameter lambda (information per unit score) used for gapped alignments. See the description for gapH.
See also
H, gapH, K, gapK, L
gapS2=[integer]
Default: Variable; calculated from scoring parameters
Score threshold for saving individual gapped alignments. Alignments below the threshold aren't reported.
See also
gapE2
gapsepqmax=[int]
Default: Unlimited
Maximum separation allowed between gapped alignments along the query.
See also
gapsepsmax, hspsepqmax, hspsepsmax
gapsepsmax=[int]
Default: Unlimited
Maximum separation allowed between gapped alignments along the subject.
See also
gapsepqmax, hspsepqmax, hspsepsmax
gapX
Default: Variable; depends on scoring parameters
Sets the alignment extension cutoff for gapped alignment.
See also
X
gi
Default: Off
Displays the GenInfo identifiers of database hits, if present.
golf=[number]
Default: 0.1
Maximum fractional length overlap for gapped alignment consistency. See the description for olf.
golmax=[integer]
Default: Unlimited
Maximum absolute length of overlap for gapped alignment consistency. See the description for olf.
gspmax=[integer]
Default: 1000
Sets the maximum number of gapped alignments per subject sequence. gspmax is bounded by hspmax. A value of 0 implies no limit.
See also
hspmax
H=[number]
Default: Variable; depends on scoring parameters
Sets the value of the Karlin-Altschul parameter H.
See also
gapH, K, gapK, L, gapL
hspmax=[integer]
Default: 1000
Sets the maximum number of ungapped alignments per subject sequence. A warning is issued if this limit is exceeded. A value of 0 implies no limit.
See also
gspmax
hitdist=[integer]
Default: 0, off
Maximum distance between word hits for the two-hit seeding algorithm. WU-BLAST uses one-hit seeding by default.
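Two-hit seeding can be sketched as follows. This is a simplified illustration under stated assumptions. The helper name two_hit_seeds is hypothetical, and word hits are given as (query, subject) start positions. The requirement that the two hits must not overlap (a distance of at least W) comes from the usual description of the two-hit method, not from WU-BLAST's source.

```python
# Simplified two-hit seeding sketch (hypothetical helper, not WU-BLAST code).
# A word hit starts an extension only if an earlier hit on the same diagonal
# lies between W and hitdist letters upstream of it.


def two_hit_seeds(hits, W, hitdist):
    """hits: (query_pos, subject_pos) word-start pairs. Returns seeding hits."""
    last_on_diagonal = {}  # diagonal (subject - query) -> last query position
    seeds = []
    # Scan in subject order, as a database scan would.
    for q, s in sorted(hits, key=lambda h: (h[1], h[0])):
        diagonal = s - q
        previous = last_on_diagonal.get(diagonal)
        if previous is not None and W <= q - previous <= hitdist:
            seeds.append((q, s))
        last_on_diagonal[diagonal] = q
    return seeds


hits = [(0, 10), (5, 15), (30, 40), (2, 50)]
print(two_hit_seeds(hits, W=3, hitdist=20))  # [(5, 15)]
print(two_hit_seeds(hits, W=3, hitdist=40))  # [(5, 15), (30, 40)]
```

With the default hitdist=0, this filtering step is off, and every word hit is a candidate for extension (one-hit seeding).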
hspsepqmax=[int]
Default: Unlimited
Maximum separation allowed between alignments along the query.
hspsepsmax=[int]
Default: Unlimited
Maximum separation allowed between alignments along the subject.
K=[number]
Default: Variable; depends on scoring parameters
Sets the value for K from the Karlin-Altschul equation.
See also
gapK, H, gapH, L, gapL
kap
Default: Off
Assesses individual alignment scores with Karlin-Altschul statistics rather than using sum statistics on groups of alignments.
L=[number]
Default: Variable; depends on scoring parameters
Sets lambda (nats per unit score) from the Karlin-Altschul equation.
See also
gapL, H, gapH, K, gapK
lcfilter
Default: Off
Filters lowercase letters in the query sequence. The lowercase letters are treated as if they had been filtered out by one of the filtering programs.
See also
echofilter, filter, wordmask, lcmask
lcmask
Default: Off
Masks lowercase letters in the query sequence for seeding only. Lowercase letters aren't used in the initial word search but are available for alignment during the extension stage. This is known as soft masking.
See also
echofilter, filter, wordmask, lcfilter
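A short nucleotide query shows the difference between hard masking (filter, lcfilter) and soft masking (lcmask, wordmask). In this sketch the helper names are hypothetical. Excluding every word that touches a masked letter is an assumption about how seeding treats soft-masked regions.

```python
# Hard vs. soft masking of lowercase query letters (hypothetical helpers).


def hard_mask(query, mask_char="N"):
    """lcfilter-style: lowercase letters become N (use X for proteins)."""
    return "".join(mask_char if c.islower() else c for c in query)


def seed_words(query, W):
    """lcmask-style: words containing lowercase letters are not used as seeds."""
    return [
        (i, query[i:i + W])
        for i in range(len(query) - W + 1)
        if not any(c.islower() for c in query[i:i + W])
    ]


query = "ACGTacgtACGT"
print(hard_mask(query))      # ACGTNNNNACGT
print(seed_words(query, 4))  # [(0, 'ACGT'), (8, 'ACGT')]
print(query.upper())         # soft masking: every letter remains for extension
```

After hard masking, the middle region can never contribute to an alignment. With soft masking it cannot start one but can still be aligned once an extension reaches it.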
links
Default: Off
Displays group information. Parentheses indicate the position of the reported alignment within its group. In the following example, the group contains three alignments. The alignment shown, the second one reported, has a score of 159 and is the last alignment in the chain.
Score = 159 (61.0 bits), Sum P(3) = 2.7e-38
Identities = 26/39 (66%), Positives = 32/39 (82%)
Links = 1-3-(2)
See also
topcomboN
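The Links notation can be read mechanically. The function parse_links below is hypothetical. It assumes exactly the format shown in the example above: alignment numbers joined by hyphens, with the current alignment in parentheses.

```python
import re


def parse_links(line):
    """Return (chain order, current alignment) from a 'Links = ...' line."""
    chain = line.split("=", 1)[1].strip()
    order, current = [], None
    for token in chain.split("-"):
        match = re.fullmatch(r"\((\d+)\)", token)
        if match:  # the parenthesized number marks the alignment being shown
            current = int(match.group(1))
            order.append(current)
        else:
            order.append(int(token))
    return order, current


order, current = parse_links("Links = 1-3-(2)")
print(order, current)            # [1, 3, 2] 2
print(order.index(current) + 1)  # 3: alignment 2 is third (last) in the chain
```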
M=[integer]
Default: +5 blastn
Sets the match score. This parameter is usually used for blastn only but may be used for other programs.
See also
N
maskextra=[integer]
Default: Off
Extends masking an extra distance of [integer] letters.
See also
echofilter, filter, wordmask, lcfilter, lcmask
matrix=[file]
Default: BLOSUM62
Programs: blastp, blastx, tblastn, tblastx
Specifies a scoring matrix file. The default is BLOSUM62. A large number of scoring matrices are distributed with WU-BLAST in the matrix/aa directory. Nucleotide matrices for use with blastn are in matrix/nt.
N=[integer]
Default: -4 blastn
Sets the mismatch score. This parameter is usually used for blastn only but may be used for other programs.
See also
M
nogap
Default: Off
Turns off gapped alignment. This parameter is useful in conjunction with altscore to prevent alignments from extending across stop codons.
See also
altscore
nonnegok
Default: Off
Under Karlin-Altschul statistics, the expected score per aligned pair of letters must be negative. WU-BLAST normally exits with a fatal error if this isn't the case. Scoring schemes with positive expected scores are sometimes useful, and setting nonnegok silences the error condition.
See also
novalidctxok, errors
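For match/mismatch scoring, the sign condition that nonnegok relaxes is easy to check by hand or in code. This sketch assumes uniform base frequencies (0.25 each), which simplifies the real composition. The function name expected_score is illustrative.

```python
# Expected score per aligned pair, the sum over i, j of p_i * p_j * s(i, j),
# for match/mismatch scoring (illustrative helper; uniform frequencies assumed).


def expected_score(match, mismatch, freqs):
    total = 0.0
    for a, pa in freqs.items():
        for b, pb in freqs.items():
            total += pa * pb * (match if a == b else mismatch)
    return total


uniform = {base: 0.25 for base in "ACGT"}
print(expected_score(5, -4, uniform))  # -1.75: the blastn defaults M=5, N=-4 qualify
print(expected_score(5, -1, uniform))  # 0.5: positive under these frequencies
```

With M=5 and N=-1, one match outweighs the three mismatches expected per four random pairs. That is the situation in which WU-BLAST reports the error that nonnegok silences.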
nosegs
Default: Off
WU-BLAST doesn't allow alignments to cross hyphen characters that act as query segment boundaries (e.g., for draft sequence). nosegs effectively converts hyphens to Ns.
notes
Default: Off
Suppresses informational messages. For example, if you are intentionally searching for a low-complexity sequence, you may wish to disable the message that suggests that a low-complexity filter would help remove meaningless alignments.
See also
errors, warnings
novalidctxok
Default: Off
If a sequence can't generate any significant HSPs, WU-BLAST normally exits with an error stating that there are no valid contexts. You may encounter this error when searching a collection of sequencing reads, some of which are mostly (or completely) Ns. Setting novalidctxok allows the search to continue without error.
See also
nonnegok, errors
nwlen=[integer]
Default: End of sequence
Sets the length of region for seeding.
See also
nwstart
nwstart=[integer]
Default: 1
Sets the starting position for seeding alignments. Together, nwstart and nwlen restrict seeding to a specific region of the query; alignments may still extend outside this region. For example, nwstart=500 nwlen=200 seeds positions 500 to 699 of the query sequence.
See also
nwlen
o=[file]
Default: stdout
Write results to this file instead of to stdout (the screen).
olf=[number]
Default: 0.125
Maximum fractional length of overlap for alignment consistency.
Consistent alignments must be ordered and have minimal overlap (see Chapter 5). The permitted overlap is expressed both as a relative fraction and as an absolute number of letters. The default setting, 0.125, prevents alignments whose overlap is more than 12.5 percent of the length of either alignment from being in the same group. The golf parameter plays the same role for gapped alignments. The olmax and golmax parameters control the absolute length of the overlap.
olmax=[integer]
Default: Unlimited
Maximum absolute length of overlap for alignment consistency. See the description for olf.
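One reading of the olf and olmax limits can be sketched for two intervals on the same sequence. The helper overlap_ok is hypothetical. It checks only the overlap limits, not the ordering requirement. It treats coordinates as 1-based and inclusive and uses olf=0.125 from the olf entry.

```python
# Overlap test in the spirit of olf/olmax (hypothetical helper). Intervals
# are (start, end), 1-based and inclusive, on the query or on the subject.


def overlap_ok(a, b, olf=0.125, olmax=None):
    overlap = max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)
    shorter = min(a[1] - a[0] + 1, b[1] - b[0] + 1)
    if overlap > olf * shorter:  # too large a fraction of either alignment
        return False
    if olmax is not None and overlap > olmax:  # too many letters in absolute terms
        return False
    return True


print(overlap_ok((1, 100), (91, 200)))           # True: 10 letters <= 12.5
print(overlap_ok((1, 100), (81, 200)))           # False: 20 letters > 12.5
print(overlap_ok((1, 100), (91, 200), olmax=5))  # False: 10 letters > 5
```

Comparing against the shorter interval is equivalent to the "either alignment" wording. If the overlap exceeds the fraction of the shorter one, it exceeds the fraction of at least one alignment.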
postsw
Default: Off
Programs: blastp
Performs Smith-Waterman alignment after initial BLAST alignment to return the single maximum-scoring pair rather than several high-scoring pairs.
Q=[integer]
Default: 10 blastn, 9 blastp, blastx, tblastn, tblastx
Sets the cost for the first gap character.
See also
R
qoffset=[integer]
Default: 0
Adjusts the query numbering by this amount. For example, if the first 25 bases of your query sequence are known to be vector sequence, setting qoffset=25 bases the numbering on the insert sequence.
qrecmax=[integer]
Default: 1
Last query sequence to search. See the description for qrecmin.
qrecmin=[integer]
Default: 1
By default, WU-BLAST produces one BLAST report for each query sequence in a FASTA file containing multiple sequences. Setting qrecmin and qrecmax allows you to select a subset of query sequences, in much the same way that dbrecmin and dbrecmax select database records.
See also
qrecmax, dbrecmin, dbrecmax
R=[integer]
Default: 10 blastn, 2 blastp, blastx, tblastn, tblastx
Sets the cost for the second and remaining gap characters.
See also
Q
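Q and R together define an affine gap cost: a gap of k letters costs Q + (k - 1)R. A minimal sketch using the defaults listed above follows. The function name gap_cost is illustrative.

```python
def gap_cost(length, Q, R):
    """Q for the first gap character, R for each additional one."""
    if length < 1:
        raise ValueError("a gap has at least one character")
    return Q + (length - 1) * R


print(gap_cost(3, Q=9, R=2))    # 13 with the blastp defaults
print(gap_cost(3, Q=10, R=10))  # 30 with the blastn defaults
```

NCBI-BLAST conventionally charges a gap of k letters G + kE (open plus k extensions). The same pair of numbers therefore gives different costs in the two programs.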
restest
Default: Off
blastp and blastx statistical tests are based on the number of residues (letters) in the database. If Z is set in conjunction with restest, blastn, tblastn, and tblastx will also be based on the number of letters.
See also
seqtest, Z
S=[integer]
Default: Variable; calculated from E
Sets the final score threshold. Since S and E are interconvertible through the Karlin-Altschul equation, setting S effectively sets E, and vice versa. When both are set, the more restrictive one is used.
See also
E
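The interconversion of S and E follows from the Karlin-Altschul equation E = Kmn e^(-λS), where m and n are the query and database lengths. The sketch below uses only that basic form. WU-BLAST's reported values may involve further corrections, such as sum statistics. The K and λ values in the example are placeholders, not parameters for any real scoring scheme, and the function names are illustrative.

```python
import math

# Basic Karlin-Altschul relation E = K * m * n * exp(-lam * S).
# K and lam below are placeholder values chosen for illustration only.


def evalue(S, K, lam, m, n):
    return K * m * n * math.exp(-lam * S)


def score_for_evalue(E, K, lam, m, n):
    """Invert the equation: the score S whose expectation is E."""
    return math.log(K * m * n / E) / lam


def cutoff_score(E, S, K, lam, m, n):
    """When both E and S are given, keep the more restrictive (higher) score."""
    return max(S, score_for_evalue(E, K, lam, m, n))


K, lam, m, n = 0.1, 0.3, 300, 1e8
s10 = score_for_evalue(10, K, lam, m, n)
print(round(s10, 1))                       # score at which E = 10
print(evalue(s10, K, lam, m, n))           # back to (approximately) 10
print(cutoff_score(10, 80, K, lam, m, n))  # 80: here S is stricter than E=10
```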
S2=[integer]
Default: Variable; depends on scoring parameters
Score threshold for individual ungapped alignments. If both S2 and E2 are set, the more restrictive one is used.
See also
E2, gapS2, gapE2
seqtest
blastn, tblastn, and tblastx statistical tests are based on the number of sequences in the database. If Z is set in conjunction with seqtest, blastp and blastx will also be based on the number of sequences.
See also
restest, Z
span, span1, span2
Default: span2
WU-BLAST normally discards an HSP that is contained completely within a larger, higher-scoring HSP in both the query and the subject. This behavior is called span2. If span1 is set, an HSP is discarded if it is contained within a larger HSP in either the query or the subject; unlike span2, both conditions aren't required. This is useful if the sequences contain many repeats. To keep all alignments, set span; the output may become very large.
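The span modes differ only in the containment test applied to each HSP. The helper filter_hsps below is hypothetical. It represents an HSP as (query start, query end, subject start, subject end, score). It assumes that for span1, as for span2, only a higher-scoring HSP can cause a discard.

```python
# Hypothetical sketch of span, span1, and span2 filtering.
# HSP = (q_start, q_end, s_start, s_end, score), 1-based inclusive coordinates.


def filter_hsps(hsps, mode="span2"):
    if mode == "span":
        return list(hsps)  # keep everything
    kept = []
    for h in hsps:
        discard = False
        for other in hsps:
            if other[4] <= h[4]:
                continue  # only higher-scoring HSPs can cause a discard
            in_query = other[0] <= h[0] and h[1] <= other[1]
            in_subject = other[2] <= h[2] and h[3] <= other[3]
            if mode == "span2" and in_query and in_subject:
                discard = True
            elif mode == "span1" and (in_query or in_subject):
                discard = True
            if discard:
                break
        if not discard:
            kept.append(h)
    return kept


big = (1, 100, 1, 100, 200)
both = (10, 20, 10, 20, 30)          # inside big on query and subject
query_only = (10, 20, 500, 510, 30)  # inside big on the query only
hsps = [big, both, query_only]
print(len(filter_hsps(hsps, "span2")))  # 2: only `both` is dropped
print(len(filter_hsps(hsps, "span1")))  # 1: both smaller HSPs are dropped
print(len(filter_hsps(hsps, "span")))   # 3: nothing is dropped
```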
T=[integer]
Default: 11 blastp, 12 blastx, 13 tblastn, 13 tblastx
Sets the neighborhood word threshold score. Setting this value extremely high eliminates neighborhood words, so seeding requires exact word matches. T, W, and hitdist are the most effective parameters for controlling the sensitivity and speed of BLAST searches.
See also
W, hitdist
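A toy example shows how T controls the size of a word's neighborhood. The four-letter alphabet and the +5/-1 scoring function are assumptions chosen to keep the arithmetic readable. Real protein searches use a substitution matrix such as BLOSUM62. The function names are illustrative.

```python
from itertools import product

# Toy neighborhood-word generation. The alphabet and scoring are
# illustrative assumptions, not a real substitution matrix.


def toy_score(a, b):
    return 5 if a == b else -1


def neighborhood(word, alphabet, score, T):
    """All words of the same length that score at least T against `word`."""
    hits = []
    for letters in product(alphabet, repeat=len(word)):
        candidate = "".join(letters)
        if sum(score(x, y) for x, y in zip(word, candidate)) >= T:
            hits.append(candidate)
    return hits


print(len(neighborhood("ACD", "ACDE", toy_score, T=9)))  # 10: the word plus 9 one-letter variants
print(neighborhood("ACD", "ACDE", toy_score, T=15))      # ['ACD']: exact match only
```

Raising T from 9 to 15 here shrinks the neighborhood to the word itself. This is the effect the T entry describes for very high thresholds: fewer seeds, faster searches, and lower sensitivity.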
top
Default: Off
Programs: blastn, tblastx, blastx
Searches only the top strand of the query.
See also
bottom
topcomboN=[integer]
Default: Off
Reports the number of consistent, or collinear, HSP combinations.
V=[integer]
Default: 500
Controls the number of one-line summaries.
See also
B
warnings
Default: Off
WU-BLAST reports various warning conditions. This parameter turns them off.
See also
notes, errors
wink=[integer]
Default: 1
Words are created by sliding a window of width W by wink letters at a time. If W equals wink, words don't overlap.
See also
W, T, hitdist
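The sliding-window behavior of W and wink can be sketched in a few lines. The helper query_words is hypothetical. Applying it to a single query string is an illustrative assumption, since the entry above does not say which sequence the window slides along.

```python
def query_words(seq, W, wink=1):
    """Slide a window of width W along seq, advancing wink letters each step."""
    return [(i, seq[i:i + W]) for i in range(0, len(seq) - W + 1, wink)]


seq = "ACGTTGCAAC"
print(len(query_words(seq, W=4)))     # 7 overlapping words
print(query_words(seq, W=4, wink=4))  # [(0, 'ACGT'), (4, 'TGCA')]: no overlap
```

With wink equal to W, the words tile the sequence without overlap. Any trailing letters shorter than W (here the final "AC") start no word.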
wordmask=[method]
Default: Off
Filters the query sequence for seeding only. Low-complexity regions in the query sequence aren't used in the initial word search but are available for alignment during the extension stage. This is called soft masking.
See also
filter, lcfilter, lcmask, echofilter, maskextra
W=[integer]
Default: 11 blastn, 3 blastp, blastx, tblastn, tblastx
Sets the word size for seeding alignments.
See also
T, hitdist, wink
X=[integer]
Default: Variable; depends on scoring parameters
Controls the alignment extension cutoff for ungapped alignments.
See also
gapX
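X and gapX are X-drop cutoffs: extension continues until the running score falls more than X below the best score seen so far. Below is a minimal one-directional, ungapped sketch. It assumes blastn-style +5/-4 scoring for the example, and the function names are hypothetical.

```python
def match_mismatch(a, b, M=5, N=-4):
    return M if a == b else N


def xdrop_extend(query, subject, q, s, X, score=match_mismatch):
    """Extend rightward from query[q] / subject[s] without gaps.

    Returns (best_score, best_length). Extension stops once the running
    score drops more than X below the best score seen so far."""
    running = best = best_len = 0
    i = 0
    while q + i < len(query) and s + i < len(subject):
        running += score(query[q + i], subject[s + i])
        i += 1
        if running > best:
            best, best_len = running, i
        elif best - running > X:
            break
    return best, best_len


query = "ACGTTTTTACGT"
subject = "ACGTAAAAACGT"
print(xdrop_extend(query, subject, 0, 0, X=10))   # (20, 4): stops in the mismatches
print(xdrop_extend(query, subject, 0, 0, X=100))  # (24, 12): bridges them
```

A small X stops early in the run of four mismatches. A larger X lets the extension cross them and reach the matching segment beyond, which yields a higher final score.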
Y=[number]
Default: Variable; depends on scoring parameters
Sets the size of the query sequence.
See also
Z
Z=[number]
Default: Variable; depends on scoring parameters
Sets the size of the database in letters (restest is assumed), but Z may also be used to mean the number of sequences if seqtest is set.
See also
Y, seqtest, restest
14.4 xdformat Parameters
xdformat formats BLAST databases from FASTA files. It can also report descriptive information about a database and dump a database's entire contents in FASTA format.
Here are some examples:
xdformat -n files
xdformat -p files
zcat fasta.*.gz | xdformat -o my_db -n -- -
xdformat -n -i database
xdformat -n -r database > fasta_file
-A [0..2]
Default: 2
When indexing accession.version identifiers, you have three indexing options:
0
Accession only; version isn't stored
1
Stored as accession.version
2
Stored as both accession only and accession.version
-a [database]
Appends sequences to the named database. If the database is indexed, the appended sequences will also be indexed.
-c [character]
Default: Off
If an invalid letter is encountered, xdformat terminates and reports an error message. If this occurs, check the sequence file for errors. After checking, you may either skip illegal characters with -k or change them to a legal character with -c. The typical operation for nucleotides is to set -c N, and for proteins -c X.
See also
-k
-D [integer]
Default: Unlimited
Sets the maximum length for definition lines.
-d [string]
Default: None
Sets a user-defined release date for the database. The date may have 63 characters at most.
See also
-v
-e [file]
Default: stderr
Appends information and errors to the named file.
-G
Default: Off
Prefaces each sequence with the database record number in the format of gnl|xdf|#.
-i
Default: Off
Reports descriptive information about a BLAST database. This is useful for determining when a database was created, how many sequences it contains, and if it is indexed.
-K [integer]
Default: Unlimited
Sets the maximum number of identifiers with Control-A separators. This is useful for trimming highly redundant sequences created with nrdb or another redundancy purifier that uses Control-A separators.
-k
Default: Off
If an invalid letter is encountered, xdformat terminates. If this occurs, you can either skip illegal characters with -k or change them to a legal letter with -c. Check the errors to ensure the input file is formatted properly.
See also
-c
-L [number]
Default: 100000000 (100 million letters)
Sets the maximum sequence length. For optimal performance, break up large sequences into smaller fragments no larger than 1 million letters.
-l [number]
Default: 0
Sets the minimum sequence length.
-M [number]
Default: 96m
Sets the cache size for indexing. For faster indexing, the size may be increased (for example, -M 512m).
-O [4..8]
Default: 4
Sets the number of bytes of precision. The default value allows databases of up to 4 billion amino acids or 16 billion nucleotides. If you expect a database to contain more than this limit, increasing precision by one level multiplies the limit by 256. Setting -O is necessary only if you append to the database because the precision automatically increases appropriately when databases are created.
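The limits quoted for -O are consistent with treating stored offsets as unsigned integers -O bytes wide. The sketch assumes that nucleotides are packed four to a byte, as the 4-to-16 ratio in the text suggests. The function name max_letters is illustrative. The text's figures are rounded binary quantities: 256^4 is 2^32 (about 4.3 billion), and four times that is 2^34.

```python
def max_letters(nbytes, nucleotide=False):
    """Approximate database size limit for -O nbytes (assumed model)."""
    limit = 256 ** nbytes  # values addressable with nbytes-wide offsets
    return 4 * limit if nucleotide else limit  # assumed 4 bases per byte


for nbytes in (4, 5):
    print(nbytes, max_letters(nbytes), max_letters(nbytes, nucleotide=True))
```

Each additional byte multiplies both limits by 256, matching the rule stated in the entry.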
-P [integer]
Default: 60
This option applies only when dumping the entire content of a database with -r. -P controls the length of the sequence lines; -P 0 puts the whole sequence on one line.
See also
-r
-q [0..3]
Default: 0
Certain files may contain numerous nonfatal errors in their identifier format. -q quiets these errors.
0
No silencing
1
Silences field 1 errors
2
Silences field 2 errors
3
Silences all fields
-r
Default: Off
Reports (dumps) the entire database content to stdout in FASTA format.
-T [string]
Default: Off
This option lets you restrict indexing of identifiers to a particular database name or tag. The [string] has two parts: part 1 is the name of the database (e.g., gb for GenBank or emb for EMBL—see Chapter 10), and part 2 is either blank or a number.
blank
Index all identifiers.
0
Don't index.
1
Index only field 1.
2
Index only field 2.
Here are some examples:
-T emb0 doesn't index EMBL records.
-T gb1 indexes GenBank accession but not locus.
-T gb2 indexes GenBank locus but not accession.
-T gb indexes both accession and locus of GenBank records.
-v
Default: Off
Sets a user-defined version string for the database (a maximum of 63 characters).
See also
-d
-X
Default: Off
Databases that are formatted but not indexed may be indexed or re-indexed (e.g., with a different indexing scheme) with -X. In the following examples, the two commands on Line 1 are equivalent to the one on Line 2.
xdformat -n nt_db ; xdformat -n -X nt_db
xdformat -n -I nt_db
14.5 xdget Parameters
xdget retrieves sequences in FASTA format from databases formatted with xdformat (not formatdb, pressdb, or setdb). The database must have been indexed before xdget can be used (see -I and -X in Section 14.4).
Here are a few example command lines. If identifiers contain vertical bars, as in the second example, enclose the string in quotes to prevent the shell from interpreting them as pipes. Quoting isn't required for identifiers read from a file.
xdget -n db 12345
xdget -p nr 'gi|11611819|gb|AAG39070.1|'
xdget -n -f db files_of_ids
-A [n, 0]
Default: n
Given an accession number without a version, xdget retrieves the latest version. This behavior can be set explicitly with -A n. If -A 0 is set, the earliest version is retrieved.
See also
-d, -N
-a [integer]
Default: 1
The -a and -b parameters retrieve a subsequence. For example, to retrieve just nucleotides 1 to 100, include -a 1 -b 100. For nucleotide sequences, if -a is greater than -b, the subsequence is returned as its reverse complement.
See also
-b, -r, -t
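The coordinate rules for -a and -b can be sketched as follows. The helper subseq is hypothetical. It assumes 1-based inclusive coordinates, treats -b 0 as the end of the sequence, and returns the reverse complement when the start is greater than the end.

```python
# Complement table for DNA, preserving case; other letters pass through.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def subseq(seq, a=1, b=0):
    """Mimic -a/-b: 1-based inclusive; b=0 means end of sequence."""
    if b == 0:
        b = len(seq)
    if a <= b:
        return seq[a - 1:b]
    # Start after end: return the reverse complement of positions b..a.
    return seq[b - 1:a][::-1].translate(COMPLEMENT)


seq = "AAACCCGT"
print(subseq(seq, 1, 3))  # AAA
print(subseq(seq, 8, 6))  # ACG (reverse complement of CGT)
print(subseq(seq, 4))     # CCCGT
```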
-b [integer]
Default: 0, end of sequence
See -a above.
-d
Default: Off
Ordinarily, when duplicate identifiers are present, only one is retrieved. With -d, all duplicates are reported. Having duplicate identifiers is generally not a good idea.
See also
-A, -N
-D [integer]
Default: Unlimited
Sets the maximum definition line length. Storing arbitrary data in definition lines is common, so this option is useful when you don't need the whole definition line.
-e [file]
Default: stderr
Appends messages and errors to log file.
-F
Default: Off
Flushes the output stream after each request. This is useful for preventing I/O deadlocks between communicating processes.
-f
Default: Off
Indicates that files of identifiers are given on the command line. The file format is one identifier per line.
-G
Default: Off
Prefaces each definition line with its record number using the gnl namespace. The format is gnl|xdf|#.
-o [file]
Default: stdout
Writes the FASTA output to the named file rather than to stdout.
-N [0, n]
Default: 0
For sequences with duplicate identifiers, the first one is retrieved by default. It is set explicitly with -N 0. Setting -N n retrieves the last one. Accession numbers with version numbers have different rules.
See also
-A, -d
-P [integer]
Default: 60
Sets the maximum line length for sequence data. Setting -P 0 puts the entire sequence on one line.
-r
Default: Off
Returns the reverse complement for nucleotide sequences.
-T [string]
Default: Off
This option lets you restrict the lookup of identifiers to a particular database name or tag. For example, to look only in GenBank sequences, use -T gb. For only local, use -T lcl. For tags with multiple identifiers, a numeric suffix identifies which one to select. For example, -T gb1 selects accessions and -T gb2 selects loci. To prevent lookups in a database name, use zero. For example, -T gb0 omits GenBank records.
-t
Default: Off
Translates nucleotide sequences.