Repeated DNA Sequences

07/16/2016 Hash Table Bit Manipulation

Question

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that > occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",

Return: ["AAAAACCCCC", "CCCCCAAAAA"].


Solution

Result: Accepted Time: 136 ms

Here should be some explanations.

class Solution {
public:
    int chrs[256]={0};
    int get_value(const string s)
    {
        int ret = 0;
        for(const auto x:s)
            ret = (ret<<2)|chrs[x];
        return ret;
    }
    vector<string> findRepeatedDnaSequences(string s) {
        chrs['A'] = 0;
        chrs['C'] = 1;
        chrs['G'] = 2;
        chrs['T'] = 3;
        unordered_map<int,int> mp;
        vector<string> ret;
        const int limit = int(s.length()) - 9;
        for(int i = 0; i < limit; i++)
        {
            string tmp = s.substr(i,10);
            int value = get_value(tmp);
            mp[value]++;
            if(mp[value] == 2)
                ret.push_back(tmp);
        }
        return ret;
    }
};

Complexity Analytics

  • Time Complexity:
  • Space Complexity: