Package jebl.evolution.sequences
Class Utils
java.lang.Object
jebl.evolution.sequences.Utils
- Version:
- $Id: Utils.java 918 2008-06-04 01:28:08Z twobeers $
- Author:
- Andrew Rambaut, Alexei Drummond
-
Method Summary
Modifier and Type | Method | Description
static State[] | cleanSequence(CharSequence seq, SequenceType type) | Produce a clean sequence filtered of spaces and digits.
static NucleotideState[] | complement(NucleotideState[] sequence) |
static int | getGaplessLocation(Sequence sequence, int gappedLocation) | Gets the site location index for this sequence excluding any gaps.
static int | getGappedLocation(Sequence sequence, int gaplessLocation) | Gets the site location index for this sequence that corresponds to a location given excluding all gaps.
static byte[] | getStateIndices(State[] sequence) |
static int | getStopCodonCount(Sequence sequence) | Counts the number of stop codons in an amino acid sequence.
static SequenceType | guessSequenceType | Guess type of sequence from contents.
static boolean | isPredominantlyRNA(CharSequence sequenceString, int maximumNonGapsToLookAt) | Is the given nucleotide sequence predominantly RNA (i.e. does it contain more occurrences of "U" than "T")?
static State[] | replaceStates(State[] sequence, List<State> searchStates, State replaceState) | Searches a sequence for any of the given states and replaces them.
static State[] | reverse |
static String | reverseComplement(String nucleotideSequence) |
static NucleotideState[] | reverseComplement(NucleotideState[] sequence) |
static String | reverseComplementWithGaps(String nucleotideSequence) |
static State[] | stripGaps | Strips a sequence of gaps.
static State[] | stripStates(State[] sequence, List<State> stripStates) | Strips a sequence of any states given.
static String | toString |
static String | translate(String nucleotideSequence, GeneticCode geneticCode) | A wrapper for translateCharSequence(CharSequence,GeneticCode) that takes a nucleotide sequence as a String only rather than a CharSequence.
static Sequence | translate(Sequence sequence, GeneticCode geneticCode) |
static Sequence | translate(Sequence sequence, GeneticCode geneticCode, int readingFrame) |
static AminoAcidState[] | translate(State[] states, GeneticCode geneticCode) | Translates each of a given sequence of NucleotideStates or CodonStates to the AminoAcidState corresponding to it under the given genetic code.
static AminoAcidState[] | translate(State[] states, GeneticCode geneticCode, int readingFrame) | Translates each of a given sequence of NucleotideStates or CodonStates to the AminoAcidState corresponding to it under the given genetic code.
static String | translateCharSequence(CharSequence nucleotideSequence, GeneticCode geneticCode) | Translates the given nucleotideSequence into an amino acid sequence string, using the given geneticCode.
-
Method Details
-
translate
Translates a given Sequence to a corresponding Sequence under the given genetic code. Simply a utility function that calls AminoAcidState[] translate(final State[] states, GeneticCode geneticCode).
- Parameters:
sequence - the Sequence.
geneticCode -
- Returns:
-
translate
Translates a given Sequence to a corresponding Sequence under the given genetic code. Simply a utility function that calls AminoAcidState[] translate(final State[] states, GeneticCode geneticCode).
- Parameters:
sequence - the Sequence.
geneticCode -
readingFrame -
- Returns:
-
translate
Translates each of a given sequence of NucleotideStates or CodonStates to the AminoAcidState corresponding to it under the given genetic code. Translation doesn't stop at stop codons; these are translated to AminoAcids.STOP_STATE. If translating from NucleotideState and the number of states is not a multiple of 3, then the excess states at the end are silently dropped.
- Parameters:
states - States to translate; must all be of the same type, either NucleotideState or CodonState.
geneticCode -
- Returns:
-
translate
Translates each of a given sequence of NucleotideStates or CodonStates to the AminoAcidState corresponding to it under the given genetic code. Translation doesn't stop at stop codons; these are translated to AminoAcids.STOP_STATE. If translating from NucleotideState and the number of states is not a multiple of 3, then the excess states at the end are silently dropped.
- Parameters:
states - States to translate; must all be of the same type, either NucleotideState or CodonState.
geneticCode -
readingFrame -
- Returns:
-
isPredominantlyRNA
Is the given nucleotide sequence predominantly RNA (i.e. does it contain more occurrences of "U" than "T")?
- Parameters:
sequenceString - the sequence string to inspect to determine if it's RNA
maximumNonGapsToLookAt - for performance reasons, only look at a maximum of this many non-gap residues in deciding if the sequence is predominantly RNA. Can be -1 or Integer.MAX_VALUE to look at the entire sequence.
- Returns:
- true if the given nucleotide sequence is predominantly RNA
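The counting rule described above can be sketched without JEBL on the classpath. This is an illustrative stand-alone sketch, not the library's implementation: the class name RnaGuessSketch, the treatment of '-' as the only gap character, and the case-insensitive comparison are all assumptions.

```java
/** Stand-alone sketch of the "more U than T" rule; not JEBL's code. */
final class RnaGuessSketch {
    /**
     * Returns true if 'U' occurs more often than 'T' among the first
     * maxNonGaps non-gap characters. A negative limit means "no limit".
     * Assumptions: '-' is the only gap character; case is ignored.
     */
    static boolean isPredominantlyRna(CharSequence seq, int maxNonGaps) {
        int limit = maxNonGaps < 0 ? Integer.MAX_VALUE : maxNonGaps;
        int seen = 0;
        int u = 0;
        int t = 0;
        for (int i = 0; i < seq.length() && seen < limit; i++) {
            char c = Character.toUpperCase(seq.charAt(i));
            if (c == '-') {
                continue; // gaps do not count toward the limit
            }
            seen++;
            if (c == 'U') {
                u++;
            } else if (c == 'T') {
                t++;
            }
        }
        return u > t;
    }
}
```

Capping the scan at a fixed number of non-gap residues keeps the check cheap on long sequences, at the cost of a wrong answer when the inspected prefix is unrepresentative.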
-
reverseComplement
-
reverseComplementWithGaps
-
translateCharSequence
public static String translateCharSequence(CharSequence nucleotideSequence, GeneticCode geneticCode)
Translates the given nucleotideSequence into an amino acid sequence string, using the given geneticCode. The translation is done triplet by triplet, starting with the triplet at index 0..2 in nucleotideSequence, then the one at index 3..5, and so on until fewer than 3 nucleotides are left. This method uses translate(State[],GeneticCode) to do the translation, hence it shares some properties with that method: 1.) Any excess nucleotides at the end will be silently discarded, 2.) Translation doesn't stop at stop codons; instead, they are translated to "*", which is AminoAcids.STOP_STATE's code.
- Parameters:
nucleotideSequence - nucleotide sequence to translate
geneticCode - genetic code to use for the translation
- Returns:
- A string with length nucleotideSequence.length() / 3 (rounded down), the translation of nucleotideSequence with the given genetic code
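The triplet-by-triplet behaviour can be illustrated with a small stand-alone sketch. It does not use JEBL's GeneticCode; instead it assumes a hypothetical, deliberately incomplete codon table (a few entries of the standard genetic code) and maps any codon missing from that table to 'X'. The class name TripletTranslationSketch is likewise an assumption.

```java
import java.util.Map;

/** Stand-alone sketch of triplet-wise translation; not JEBL's code. */
final class TripletTranslationSketch {
    // Hypothetical, incomplete codon table (entries from the standard code).
    private static final Map<String, Character> CODONS = Map.of(
            "ATG", 'M', "GCT", 'A', "AAA", 'K', "TGG", 'W',
            "TAA", '*', "TAG", '*', "TGA", '*');

    /**
     * Translates codons at 0..2, 3..5, and so on. Trailing nucleotides
     * that do not form a full triplet are dropped, and stop codons
     * become '*' without ending the translation.
     */
    static String translate(CharSequence nucleotides) {
        StringBuilder protein = new StringBuilder(nucleotides.length() / 3);
        for (int i = 0; i + 3 <= nucleotides.length(); i += 3) {
            String codon = nucleotides.subSequence(i, i + 3).toString();
            // Assumption: codons outside the toy table are reported as 'X'.
            protein.append(CODONS.getOrDefault(codon, 'X'));
        }
        return protein.toString();
    }
}
```

The loop condition `i + 3 <= length` is what makes the result length equal to the input length divided by 3, rounded down.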
-
translate
A wrapper for translateCharSequence(CharSequence,GeneticCode) that takes a nucleotide sequence as a String only rather than a CharSequence. This is to preserve backwards compatibility with existing compiled code.
- Parameters:
nucleotideSequence - nucleotide sequence string to translate
geneticCode - genetic code to use for the translation
- Returns:
- A string with length nucleotideSequence.length() / 3 (rounded down), the translation of nucleotideSequence with the given genetic code
-
stripGaps
Strips a sequence of gaps.
- Parameters:
sequence - the sequence
- Returns:
- the stripped sequence
-
stripStates
Strips a sequence of any states given.
- Parameters:
sequence - the sequence
stripStates - the states to strip
- Returns:
- an array of states
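As a rough illustration of stripping, the following stand-alone sketch filters characters out of a string. It works on plain characters rather than JEBL State objects; the class name StripSketch and the use of '-' as the gap character are assumptions.

```java
import java.util.Set;

/** Stand-alone sketch of state stripping on plain characters; not JEBL's code. */
final class StripSketch {
    /** Returns seq with every character contained in toStrip removed. */
    static String strip(CharSequence seq, Set<Character> toStrip) {
        StringBuilder kept = new StringBuilder(seq.length());
        for (int i = 0; i < seq.length(); i++) {
            char c = seq.charAt(i);
            if (!toStrip.contains(c)) {
                kept.append(c);
            }
        }
        return kept.toString();
    }

    /** Gap stripping as a special case (assumption: '-' is the gap character). */
    static String stripGaps(CharSequence seq) {
        return strip(seq, Set.of('-'));
    }
}
```

Expressing gap stripping as a call to the general routine mirrors the relationship the two method descriptions suggest, though the source does not say how the library implements either one.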
-
replaceStates
Searches a sequence for any of the given states and replaces them.
- Parameters:
sequence - the sequence
searchStates - the states to search for
- Returns:
- an array of states
-
reverse
-
complement
-
reverseComplement
-
getStateIndices
-
getGaplessLocation
Gets the site location index for this sequence excluding any gaps. The location is indexed from 0.
- Parameters:
sequence - the sequence
gappedLocation - the location including gaps
- Returns:
- the location without gaps.
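The gapped-to-gapless mapping amounts to counting the non-gap sites that precede the given position. The stand-alone sketch below shows this on a plain string. It is not JEBL's implementation: the class name GaplessIndexSketch and the '-' gap character are assumptions, and so is the result for a position that is itself a gap (here it maps to the index of the next residue).

```java
/** Stand-alone sketch of gapped-to-gapless index mapping; not JEBL's code. */
final class GaplessIndexSketch {
    /**
     * Returns the number of non-gap characters strictly before
     * gappedLocation, which is the 0-based gapless index of that site.
     * Assumption: '-' is the only gap character.
     */
    static int gaplessLocation(CharSequence seq, int gappedLocation) {
        int gapless = 0;
        for (int i = 0; i < gappedLocation; i++) {
            if (seq.charAt(i) != '-') {
                gapless++;
            }
        }
        return gapless;
    }
}
```

For the aligned string "AC--GT", the G at gapped index 4 has gapless index 2, because only A and C precede it.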
-
getGappedLocation
Gets the site location index for this sequence that corresponds to a location given excluding all gaps. The first non-gapped site in the sequence has a gaplessLocation of 0.
- Parameters:
sequence - the sequence
gaplessLocation -
- Returns:
- the site location including gaps
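The reverse mapping walks the gapped sequence and counts non-gap sites until the requested one is reached. The stand-alone sketch below illustrates this on a plain string. The class name GappedIndexSketch, the '-' gap character, and the -1 result for an out-of-range location are assumptions; the source does not say how JEBL handles that case.

```java
/** Stand-alone sketch of gapless-to-gapped index mapping; not JEBL's code. */
final class GappedIndexSketch {
    /**
     * Returns the 0-based position in seq (gaps included) of the non-gap
     * character whose gapless index is gaplessLocation.
     * Assumption: returns -1 if there is no such character.
     */
    static int gappedLocation(CharSequence seq, int gaplessLocation) {
        int seen = -1; // gapless index of the most recent non-gap character
        for (int i = 0; i < seq.length(); i++) {
            if (seq.charAt(i) != '-' && ++seen == gaplessLocation) {
                return i;
            }
        }
        return -1; // hypothetical sentinel for "not found"
    }
}
```

For "--AC-G", gapless location 0 is the A at gapped index 2, which matches the statement that the first non-gapped site has a gaplessLocation of 0.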
-
guessSequenceType
Guess type of sequence from contents.
- Parameters:
seq - the sequence
- Returns:
- SequenceType.NUCLEOTIDE or SequenceType.AMINO_ACID, if sequence is believed to be of that type. If the sequence contains characters that are valid for neither of these two sequence types, then this method returns null.
-
getStopCodonCount
Counts the number of stop codons in an amino acid sequence.
- Parameters:
sequence - the sequence in which to count stop codons
- Returns:
- the number of stop codons
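Because stop codons are translated to "*" (the code of AminoAcids.STOP_STATE, as noted for translateCharSequence), counting them in a translated string reduces to counting that character. The following stand-alone sketch shows the idea on a plain string; the class name StopCountSketch is an assumption, and the library method itself operates on a Sequence rather than a string.

```java
/** Stand-alone sketch of stop-codon counting on a string; not JEBL's code. */
final class StopCountSketch {
    /** Counts occurrences of '*' in an amino acid string. */
    static int stopCount(CharSequence aminoAcids) {
        int count = 0;
        for (int i = 0; i < aminoAcids.length(); i++) {
            if (aminoAcids.charAt(i) == '*') {
                count++;
            }
        }
        return count;
    }
}
```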
-
cleanSequence
Produce a clean sequence filtered of spaces and digits.
- Parameters:
seq - the sequence
type - the sequence type
- Returns:
- An array of valid states of SequenceType (may be shorter than the original sequence)
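The filtering step can be sketched on a plain string as follows. This stand-alone sketch only removes whitespace and digits. It does not check characters against a SequenceType, which the library method appears to do given its return description. The class name CleanSketch and the choice to drop all whitespace (not just spaces) are assumptions.

```java
/** Stand-alone sketch of whitespace/digit filtering; not JEBL's code. */
final class CleanSketch {
    /**
     * Returns raw with whitespace and digits removed, for example to
     * strip position numbers and column spacing from pasted sequence text.
     * Assumption: no validation against a sequence alphabet is done here.
     */
    static String clean(CharSequence raw) {
        StringBuilder out = new StringBuilder(raw.length());
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (!Character.isWhitespace(c) && !Character.isDigit(c)) {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

As the return description says, the cleaned result may be shorter than the input, since every filtered character is dropped rather than replaced.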
-
toString
-