Distinct Subsequences

Difficulty Level Hard
Frequently asked in Amazon Bloomberg Mathworks
Dynamic Programming StringViews 2242

Given two strings S and P1, we have to count all the number of distinct subsequences of S which equals P1.

Note: A subsequence of a given string is a string that we archive by deleting some characters or possible zero characters also from the original string. We can’t change the order of the elements present in the original string.

For example:

“EGY” is a subsequence of “AEDGYS” while “DAY” is not.

Note: Every substring is a subsequence but every subsequence is not a substring.

Brute force approach

We will check all the subsequences of string S with length the same as string P1.

Algorithm:

  1. Take an empty string “ans” which will store your string for the current state.
  2. Run a recursion in which arguments pointers pointing to the index of string S and string P on which we will operate.
  3. Put base cases.
    • If the pointer of string T becomes equal to the length of string P1, then return 1.
    • If the pointer of string S becomes equal to the length of string S, then return 0.

Optimal Sub-structure:

To consider all subsequence of string S, there can be two cases for every index.

  • Case 1: The item is included in the optimal subset.
  • Case 2: The item is not included in the optimal set.

If the values of pointers of both strings are not equal then only case 2 is considered.

Implementation

C++ Program For Distinct Subsequences

#include <bits/stdc++.h>
using namespace std;
int ds(string &s, string &t, string &ans, int pointer_S, int pointer_T)
{
    if (pointer_T == t.length())
    {
        return 1;
    }
    if (pointer_S >= s.length())
    {
        return 0;
    }
    int count = 0;
    count += ds(s, t, ans, pointer_S + 1, pointer_T);
    if (s[pointer_S] == t[pointer_T])
    {
        ans += s[pointer_S];
        count += ds(s, t, ans, pointer_S + 1, pointer_T + 1);
        ans.pop_back();
    }
    return count;
}
int main()
{
    string S, P;
    cin >> S >> P;
    string ans;
    cout << ds(S, P, ans, 0, 0) << endl;
    return 0;
}

 

If we observe the above function carefully, we can see that it computes the same sub-problems again and again. See the following recursion tree. The time complexity of this naive recursive solution is exponential.

Distinct Subsequences

 

Since sub problems are evaluated again, this problem has Overlapping Sub-problems property.

So this problem has both properties for a dynamic programming problem:

  1. Overlapping sub problems
  2. Optimal substructure

Like in other dynamic programming problems, in this also we will store the answers of every state so that we will not compute the same sub problem again and again.

Dynamic programming approach

Algorithm

  1. Initialize an array dp[][] of size (length of string S+1)*(length of string P1+1) with zeros, where dp[i][j] represents the number of distinct subsequences of string S[0….i-1] which are equal to P1[0….j-1].
  2. Base case:

for I in range 0 to n

dp[i][0]=1, because to 1 subsequence (empty subsequence) which is equal to empty string.

  1. Now there are two cases

CASE 1: if S[i-1]==P1[j-1]

In this case we can choose either ith element or i-1th element for jth element, so

dp[i][j]=dp[i-1][j-1]+dp[i-1][j].

CASE 2: if S[i-1]!=P1[j-1]

In this case we can only choose  i-1th element for jth element, so

dp[i][j]= dp[i-1][j].

  1. Return dp[n][m].

Implementation

C++ Program For Distinct Subsequences

#include <bits/stdc++.h>
using namespace std;
int distinctSubsequence(string s, string t)
{
    long long n, m;
    n = s.length();
    m = t.length();
    vector<vector<long long>> dp(n + 1, vector<long long>(m + 1, 0));
    for (long long i = 0; i <= n; i++)
    {
        dp[i][0] = 1;
    }
    for (long long i = 1; i <= n; i++)
    {
        for (long long j = 1; j <= m; j++)
        {
            if (s[i - 1] == t[j - 1])
            {
                dp[i][j] = dp[i - 1][j] + dp[i - 1][j - 1];
            }
            else
            {
                dp[i][j] = dp[i - 1][j];
            }
        }
    }
    return dp[n][m];
}
int main()
{
    string s="abbbcdd", p="abc";
    cout << distinctSubsequence(s, p) << endl;
    return 0;
}

Output

3

JAVA Program For Distinct Subsequences

public class Main
{
    public static int distinctSubsequence(String s, String t)
    {
        int n = s.length(), m = t.length();
        int[][] dp = new int[n + 1][m + 1];
        for (int i = 0; i <= n; i++) {
            dp[i][0] = 1;
        }
        
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < m; j++) {
                if (s.charAt(i) == t.charAt(j)) dp[i + 1][j + 1] = dp[i][j] + dp[i][j + 1];
                else dp[i + 1][j + 1] = dp[i][j + 1];
            }
        }
        return dp[n][m];
    }
  public static void main(String[] args) {
    String s="abccccdee", p ="abe";
    System.out.println(distinctSubsequence(s,p));
  }
}

Output

2

Time Complexity

T(n) = O(n*m) as we are filling the 2D matrix( dp ) of size n*m using two nested loops, hence the time complexity will be O(n*m)

where n is the length of 1st string and m is the length of the 2nd string.

References

Translate »