Longest Common Substring using Recursion and DP - recursion

I'm trying to find the Longest Common Substring of two strings using recursion and DP. Please note that I'm not referring to the Longest Common Subsequence. So, if the two strings were
String s1 = "abcdf"; String s2 = "bzcdf"
Longest Common Substring == "cdf" (not "bcdf").
Basically, the matching characters have to be contiguous.
I am trying to do this using recursion and backtracking. However, the problem is that with a recursion such as the one below, the +1 is added up front in a frame higher up in the call stack, which is unaware of whether the characters still to come are contiguous or not. And so, going by the example above, "bcdf" would be the answer.
public class ThisIsLongestCommonSubsequence_NotSubstring {
    public static void main(String[] args) {
        String s1 = "abcdgh";
        String s2 = "abefgh";
        System.out.println(fun(s1, s1.length() - 1, s2, s2.length() - 1));
    }

    static int fun(String s1, int i, String s2, int j) {
        if (i == -1 || j == -1)
            return 0;
        int ret = 0;
        if (s1.charAt(i) == s2.charAt(j))
            ret = fun(s1, i - 1, s2, j - 1) + 1;
        else
            ret = max(fun(s1, i - 1, s2, j), fun(s1, i, s2, j - 1));
        return ret;
    }

    static int max(int a, int b) {
        return a > b ? a : b;
    }
}
For now, the code below is what I have come up with. Note how I reset the count to 0 every time I find a mismatch. I keep track of the number of matching characters in a variable called int count, and record the highest value seen at any point in the program in a variable called int maxcount. My code is below.
public class LongestContinuousSubstringGlobalvariable {
    static int maxcount = 0;

    public static void main(String[] args) {
        String s1 = "abcdghijl";
        String s2 = "abefghijk";
        fun(s1, s2, s1.length() - 1, s2.length() - 1, 0);
        System.out.println("maxcount == " + maxcount);
    }

    static void fun(String s1, String s2, int i, int j, int count) {
        if (i == -1 || j == -1)
            return;
        if (s1.charAt(i) == s2.charAt(j)) {
            if (count + 1 > maxcount)
                maxcount = count + 1;
            fun(s1, s2, i - 1, j - 1, count + 1);
        } else {
            fun(s1, s2, i - 1, j, 0);
            fun(s1, s2, i, j - 1, 0);
        }
    }
}
This works fine. However, there are a couple of things I don't like about my code:
The use of the global variable (static int maxcount) to compare across frames.
I don't think this is real dynamic programming or backtracking, since the lower frame is not returning its output to a higher frame, which then decides what to do with it.
Please give me your input on how I can achieve this without the global variable, and using backtracking.
PS : I am aware of other approaches to the problem, like keeping a matrix, and doing something like
M[i][j] = M[i-1][j-1]+1 if(str[i] == str[j])
The objective is not to solve the problem, but to find an elegant recursive/backtracking solution.
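One way to avoid the global variable, sketched here in Python rather than Java, is to split the work into two recursive functions that both return values to their callers. The names `suffix` and `best` are illustrative and not from the question: `suffix(i, j)` is the length of the common run ending exactly at positions `i` and `j`, and `best(i, j)` takes the maximum over every pair of end positions up to `(i, j)`. Memoization via `lru_cache` is an assumption added to keep the recursion from re-solving the same pairs.

```python
from functools import lru_cache


def longest_common_substring_length(s1: str, s2: str) -> int:
    @lru_cache(maxsize=None)
    def suffix(i: int, j: int) -> int:
        # Length of the longest common run that ends exactly at s1[i] and s2[j].
        if i < 0 or j < 0 or s1[i] != s2[j]:
            return 0
        return suffix(i - 1, j - 1) + 1

    @lru_cache(maxsize=None)
    def best(i: int, j: int) -> int:
        # Best run over all end positions (i', j') with i' <= i and j' <= j.
        if i < 0 or j < 0:
            return 0
        return max(suffix(i, j), best(i - 1, j), best(i, j - 1))

    return best(len(s1) - 1, len(s2) - 1)


print(longest_common_substring_length("abcdf", "bzcdf"))
```

Every frame hands its result back up the call stack, so no shared state is needed. For long strings the recursion depth would have to be raised or the recursion converted to a loop.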

It could probably be done in Prolog. The following is the code I put together with help from these posts: Foreach not working in Prolog, http://obvcode.blogspot.in/2008/11/working-with-strings-in-prolog.html and How do I find the longest list in a list of lists?
myrun(S1, S2):-
writeln("-------- codes of first string ---------"),
string_codes(S1, C1list),
writeln(C1list),
writeln("-------- codes of second string ---------"),
string_codes(S2, C2list),
writeln(C2list),
writeln("--------- substrings of first --------"),
findall(X, sublist(X, C1list), L),
writeln(L),
writeln("--------- substrings of second --------"),
findall(X, sublist(X, C2list), M),
writeln(M),
writeln("------ codes of common substrings -------"),
intersection(L,M, Outl),
writeln(Outl),
writeln("--------- common strings in one line -------"),
maplist(string_codes, Sl, Outl),
writeln(Sl),
writeln("------ common strings one by one -------"),
maplist(writeln, Sl),
writeln("------ find longest -------"),
longest(Outl, LongestL),
writeln(LongestL),
string_codes(LongestS, LongestL),
writeln(LongestS).
sublist(S, L) :-
append(_, L2, L),
append(S, _, L2).
longest([L], L) :-
!.
longest([H|T], H) :-
length(H, N),
longest(T, X),
length(X, M),
N > M,
!.
longest([H|T], X) :-
longest(T, X),
!.
It runs showing all the steps: it converts the strings to codes, then builds all possible substrings of both, then finds those which are common and lists them:
?- myrun("abcdf", "bzcdf").
-------- codes of first string ---------
[97,98,99,100,102]
-------- codes of second string ---------
[98,122,99,100,102]
--------- substrings of first --------
[[],[97],[97,98],[97,98,99],[97,98,99,100],[97,98,99,100,102],[],[98],[98,99],[98,99,100],[98,99,100,102],[],[99],[99,100],[99,100,102],[],[100],[100,102],[],[102],[]]
--------- substrings of second --------
[[],[98],[98,122],[98,122,99],[98,122,99,100],[98,122,99,100,102],[],[122],[122,99],[122,99,100],[122,99,100,102],[],[99],[99,100],[99,100,102],[],[100],[100,102],[],[102],[]]
------ codes of common substrings -------
[[],[],[98],[],[99],[99,100],[99,100,102],[],[100],[100,102],[],[102],[]]
--------- common strings in one line -------
[,,b,,c,cd,cdf,,d,df,,f,]
------ common strings one by one -------
b
c
cd
cdf
d
df
f
------ find longest -------
[99,100,102]
cdf
true.
Ignore the 'true' at end.
If explanatory parts are removed, program is much shorter:
myrun(S1, S2):-
string_codes(S1, C1list),
string_codes(S2, C2list),
findall(X, sublist(X, C1list), L),
findall(X, sublist(X, C2list), M),
intersection(L,M, Outl),
longest(Outl, LongestL),
string_codes(LongestS, LongestL),
writeln(LongestS).
sublist(S, L) :-
append(_, L2, L),
append(S, _, L2).
longest([L], L) :-
!.
longest([H|T], H) :-
length(H, N),
longest(T, X),
length(X, M),
N > M,
!.
longest([H|T], X) :-
longest(T, X),
!.
?- myrun("abcdf", "bzcdf").
cdf
true.

Related

Memoization code for "Longest Common Substring" doesn't work as expected

I was able to think of a recursive solution for the problem "Longest Common Substring" but when I try to memoize it, it doesn't seem to work as I expected it to, and throws a wrong answer.
Here is the recursive code.
int lcs(string X, string Y, int i, int j, int count)
{
    if (i == 0 || j == 0)
        return count;
    if (X[i - 1] == Y[j - 1])
        count = lcs(X, Y, i - 1, j - 1, count + 1);
    count = max(count, max(lcs(X, Y, i, j - 1, 0), lcs(X, Y, i - 1, j, 0)));
    return count;
}
int longestCommonSubstr(string S1, string S2, int n, int m)
{
    // The plain recursive version takes no dp table.
    return lcs(S1, S2, n, m, 0);
}
And here is the memoized code.
int lcs(string X, string Y,int i, int j, int count,vector<vector<vector<int>>>& dp)
{
if (i == 0 || j == 0)
return count;
if(dp[i - 1][j - 1][count] != -1)
return dp[i - 1][j - 1][count];
if (X[i - 1] == Y[j - 1])
count = lcs(X, Y, i - 1, j - 1, count + 1, dp);
count = max(count,max(lcs(X,Y,i, j-1, 0,dp),lcs(X,Y,i - 1, j, 0,dp)));
return dp[i-1][j-1][count]=count;
}
int longestCommonSubstr(string S1, string S2, int n, int m)
{
int maxSize=max(n,m);
vector<vector<vector<int>>> dp(n,vector<vector<int>>(m,vector<int>(maxSize,-1)));
return lcs(S1,S2,n,m,0,dp);
}
I do know that the problem can be solved using a 2D DP vector as well but my objective was to convert my original recursive solution to a memoized solution and not write a solution from scratch. And as I have 3 parameters which are changing, so it should use a 3D DP table.
Can anyone figure out what's wrong or help me out with a 3D DP solution with recursive code same or similar to mine.
Note:-
An interesting observation, the max function for some reason works from left to right on my Mac system and on Ubuntu running under parallels as well, but the same function works from right to left in Windows machine and in online compilers. I do not know the reason but I would be happy to know about it. I'm running the code in an M1 Mac, I don't know if the ARM compiler is different from x86 Mac compiler or not.
Another thing, the memoized code gives different answers depending upon which recursive call is called first on the line,
count = max(count,max(lcs(X,Y,i, j-1, 0),lcs(X,Y,i - 1, j, 0)));
If I swap the positions of the function call statements then it gives a correct output but for that specific test case and probably similar cases.
This Memo solution gives TLE as well in large test cases, and I do not know why.
I recently started studying DP and this is the only question which I wasn't able to solve by just modifying the original recursive solution. It has been two days and I just can't figure out the proper reasons.
Submission Link:- https://practice.geeksforgeeks.org/problems/longest-common-substring1452/1/#
Any help in this regard would be great.
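As far as I can tell, one likely culprit in the memoized code is that `count` is overwritten before the line `dp[i-1][j-1][count] = count`, so the result is stored under a different `count` than the one the lookup at the top of the function uses. A minimal Python sketch of the same three-parameter recursion, keyed on the incoming `(i, j, count)`, is below. It relies on `functools.lru_cache`, which always caches under the original arguments; the function names mirror the question but the code is an illustration, not a drop-in fix for the C++ submission.

```python
from functools import lru_cache


def longest_common_substr(x: str, y: str) -> int:
    @lru_cache(maxsize=None)
    def lcs(i: int, j: int, count: int) -> int:
        # The cache key is the incoming (i, j, count), never a modified count.
        if i == 0 or j == 0:
            return count
        best = count
        if x[i - 1] == y[j - 1]:
            best = lcs(i - 1, j - 1, count + 1)
        return max(best, lcs(i, j - 1, 0), lcs(i - 1, j, 0))

    return lcs(len(x), len(y), 0)


print(longest_common_substr("abcdf", "bzcdf"))
```

The state space here is up to n * m * min(n, m) entries, which may also explain the TLE on large inputs; the two-dimensional table mentioned in the question avoids the third dimension.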

Edit distance leetcode

I am working on the Edit Distance problem, and before moving to the DP approach I am trying to solve it recursively. I am running into a logical error, please help.
Here is my code -
class Solution {
public int minDistance(String word1, String word2) {
int n=word1.length();
int m=word2.length();
if(m<n)
return Solve(word1,word2,n,m);
else
return Solve(word2,word1,m,n);
}
private int Solve(String word1,String word2,int n,int m){
if(n==0||m==0)
return Math.abs(n-m);
if(word1.charAt(n-1)==word2.charAt(m-1))
return 0+Solve(word1,word2,n-1,m-1);
else{
//insert
int insert = 1+Solve(word1,word2,n-1,m);
//replace
int replace = 1+Solve(word1,word2,n-1,m-1);
//delete
int delete = 1+Solve(word1,word2,n-1,m);
int max1 = Math.min(insert,replace);
return Math.min(max1,delete);
}
}
}
Here I check the last character of both strings. If the two characters are equal, I simply move both strings to n-1 and m-1 respectively.
Otherwise, I have three cases (insertion, deletion and replacement), and I have to take the minimum of the three.
If I replace the character, I simply move both pointers to n-1 and m-1.
If I insert a character, then by my logic I should insert it at the end of the shorter string and move the pointers to n-1 and m.
To delete a character, I think I should delete it from the longer string, which is why I move the pointers to n-1 and m. But I think I am making a mistake here, please help.
Leetcode is giving me wrong answer for word1 = "plasma" and word2 = "altruism".
The problem is that the recursive expression for the insert-case is the same as for the delete-case.
Reasoning further, it turns out the one for the insert-case is wrong. In that case we choose to resolve the letter in word2 (at index m-1) through insertion, so it should not be considered any more during the recursive process. On the other hand the considered letter in word1 could still be matched with another letter in word2, so that letter should still be considered during the recursive process.
That means that m should be decremented, not n.
So change:
int insert = 1+Solve(word1,word2,n-1,m);
to:
int insert = 1+Solve(word1,word2,n,m-1);
...and it will work. What remains is to add memoization for good efficiency.
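A minimal Python sketch of the corrected recursion with memoization added, assuming the same index convention as the Java `Solve` (n and m are the lengths of the prefixes still under consideration). The use of `lru_cache` and the name `min_distance` are choices made for this illustration.

```python
from functools import lru_cache


def min_distance(word1: str, word2: str) -> int:
    @lru_cache(maxsize=None)
    def solve(n: int, m: int) -> int:
        # One prefix is empty: insert or delete everything that is left.
        if n == 0 or m == 0:
            return abs(n - m)
        if word1[n - 1] == word2[m - 1]:
            return solve(n - 1, m - 1)
        insert = 1 + solve(n, m - 1)       # resolve word2[m-1] by insertion
        replace = 1 + solve(n - 1, m - 1)  # resolve both letters at once
        delete = 1 + solve(n - 1, m)       # drop word1[n-1]
        return min(insert, replace, delete)

    return solve(len(word1), len(word2))


print(min_distance("plasma", "altruism"))
```

With the cache, each (n, m) pair is solved once, so the running time is proportional to the product of the two lengths.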
A clean Python solution based on DP:
from functools import cache  # Python 3.9+

class Solution:
    def minDistance(self, word1: str, word2: str) -> int:
        return self.edit_distance(word1, word2)

    @cache  # memoize so each pair of suffixes is solved only once
    def edit_distance(self, s, t):
        # Edge conditions
        if len(s) == 0:
            return len(t)
        if len(t) == 0:
            return len(s)
        # If 1st char matches
        if s[0] == t[0]:
            return self.edit_distance(s[1:], t[1:])
        else:
            return min(
                1 + self.edit_distance(s[1:], t),     # delete
                1 + self.edit_distance(s, t[1:]),     # insert
                1 + self.edit_distance(s[1:], t[1:])  # replace
            )

How to find a pair of numbers in a list given a specific range?

The problem is as such:
Given an array of N numbers, find two numbers in the array whose range (max - min) is K.
for example:
input:
5 3
25 9 1 6 8
output:
9 6
So far, what I've tried is first sorting the array and then finding two complementary numbers using a nested loop. However, because this is a brute-force method, I don't think it is as efficient as other possible approaches.
import java.util.*;
public class Main {
public static void main(String[] args) {
Scanner sc = new Scanner(System.in);
int n = sc.nextInt(), k = sc.nextInt();
int[] arr = new int[n];
for(int i = 0; i < n; i++) {
arr[i] = sc.nextInt();
}
Arrays.sort(arr);
int count = 0;
int a, b;
for(int i = 0; i < n; i++) {
for(int j = i; j < n; j++) {
if(Math.max(arr[i], arr[j]) - Math.min(arr[i], arr[j]) == k) {
a = arr[i];
b = arr[j];
}
}
}
System.out.println(a + " " + b);
}
}
Much appreciated if the solution was in code (any language).
Here is code in Python 3 that solves your problem. This should be easy to understand, even if you do not know Python.
This routine uses your idea of sorting the array, but I use two variables left and right (which define two places in the array) where each makes just one pass through the array. So other than the sort, the time efficiency of my code is O(N). The sort makes the entire routine O(N log N). This is better than your code, which is O(N^2).
I never use the inputted value of N, since Python can easily handle the actual size of the array. I add a sentinel value to the end of the array to make the inner short loops simpler and quicker. This involves another pass through the array to calculate the sentinel value, but this adds little to the running time. It is possible to reduce the number of array accesses, at the cost of a few more lines of code--I'll leave that to you. I added input prompts to aid my testing--you can remove those to make my results closer to what you seem to want. My code prints the larger of the two numbers first, then the smaller, which matches your sample output. But you may have wanted the order of the two numbers to match the order in the original, un-sorted array--if that is the case, I'll let you handle that as well (I see multiple ways to do that).
# Get input
N, K = [int(s) for s in input('Input N and K: ').split()]
arr = [int(s) for s in input('Input the array: ').split()]

arr.sort()
sentinel = max(arr) + K + 2
arr.append(sentinel)

left = right = 0
while arr[right] < sentinel:
    # Move the right index until the difference is too large
    while arr[right] - arr[left] < K:
        right += 1
    # Move the left index until the difference is too small
    while arr[right] - arr[left] > K:
        left += 1
    # Check if we are done
    if arr[right] - arr[left] == K:
        print(arr[right], arr[left])
        break
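For comparison, here is a sketch of a different technique from the two-pointer scan above: a hash set that avoids sorting altogether. The function name `find_pair` and its return convention (larger number first, or `None` when no pair exists) are assumptions made for this example.

```python
from typing import List, Optional, Tuple


def find_pair(arr: List[int], k: int) -> Optional[Tuple[int, int]]:
    seen = set()
    for x in arr:
        # A partner for x is any earlier value exactly k above or below it.
        if x + k in seen:
            return (x + k, x)
        if x - k in seen:
            return (x, x - k)
        seen.add(x)
    return None


print(find_pair([25, 9, 1, 6, 8], 3))
```

Each element is looked up and inserted once, so the expected running time is linear in N, at the cost of extra memory for the set.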

why there is infinite recursion?

Given the following function definition, what value is returned by the call dub(2.0, 4)?
double dub(double z, int n) {
if (z == 0) return z;
return 2 * dub(z, n-1);
}
Your question is not very clear. Are you asking "why does this function result in infinite recursion?" If so, the reason is that you are checking the wrong value in the base case. The function should be written like this:
double dub(double z, int n) {
if (n == 0) return z;
return 2 * dub(z, n-1);
}
Notice that the condition was changed to n == 0 instead of z == 0.
what value is returned by the call dub(2.0, 4)?
It will not return, let alone return any value.
The reason is z never becomes 0, so your base case condition never becomes true, hence the function never returns. So it gets called infinitely many times, which you have already seen as your title says. A recursive function ends when it reaches the base case, and in your situation the base case is never attained.
Read about recursion:
Programmers.stackexchange
Wikipedia
Maybe this is what you originally intended:
double dub(double z, int n) {
if (n == 0) return z; // checking if n is 0, instead of z
return 2 * dub(z, n-1);
}
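To see what the corrected base case computes, here is the same function transcribed into Python as a quick check. Each call doubles the result of the call below it, so the value is z multiplied by 2 to the power n.

```python
def dub(z: float, n: int) -> float:
    # Base case on n, the argument that actually shrinks on every call.
    if n == 0:
        return z
    return 2 * dub(z, n - 1)


print(dub(2.0, 4))
```

For dub(2.0, 4) the chain of calls unwinds as 2 * 2 * 2 * 2 * 2.0.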

How many valid parenthesis combinations?

We have:
n1 number of {} brackets ,
n2 number of () brackets ,
n3 number of [] brackets ,
How many different valid combinations of these brackets can we have?
What I tried: I wrote brute-force code in Java (shown below) that counts all possible combinations. I know it's the worst possible solution,
(the code is for general case in which we can have different types of brackets)
Any mathematical approach ?
Note 1: valid combination is defined as usual, e.g. {{()}} : valid , {(}){} : invalid
Note 2: let's assume that we have 2 pairs of {} , 1 pair of () and 1 pair of [], the number of valid combinations would be 168 and the number of all possible (valid & invalid) combinations would be 840
static void paranthesis_combination(char[] open , char[] close , int[] arr){
int l = 0;
for (int i = 0 ; i < arr.length ; i++)
l += arr[i];
l *= 2;
paranthesis_combination_sub(open , close , arr , new int[arr.length] , new int[arr.length], new StringBuilder(), l);
System.out.println(paran_count + " : " + valid_paran_count);
return;
}
static void paranthesis_combination_sub(char[] open , char[] close, int[] arr , int[] open_so_far , int[] close_so_far, StringBuilder strbld , int l){
if (strbld.length() == l && valid_paran(open , close , strbld)){
System.out.println(new String(strbld));
valid_paran_count++;
return;
}
for (int i = 0 ; i < open.length ; i++){
if (open_so_far[i] < arr[i]){
strbld.append(open[i]);
open_so_far[i]++;
paranthesis_combination_sub(open , close, arr , open_so_far , close_so_far, strbld , l);
open_so_far[i]--;
strbld.deleteCharAt(strbld.length() -1 );
}
}
for (int i = 0 ; i < open.length ; i++){
if (close_so_far[i] < open_so_far[i]){
strbld.append(close[i]);
close_so_far[i]++;
paranthesis_combination_sub(open , close, arr , open_so_far , close_so_far, strbld , l);
close_so_far[i]--;
strbld.deleteCharAt(strbld.length() -1 );
}
}
return;
}
C_n is the nth Catalan number, C(2n,n)/(n+1), and gives the number of valid strings of length 2n that use only (). So if we change all [] and {} into (), there would be C_{n1+n2+n3} ways. Then there are C(n1+n2+n3,n1) ways to change n1 of the () pairs back to {}, and C(n2+n3,n3) ways to change n3 of the remaining () pairs into []. Putting that all together, there are C(2n1+2n2+2n3,n1+n2+n3) * C(n1+n2+n3,n1) * C(n2+n3,n3) / (n1+n2+n3+1) ways.
As a check, when n1=2, n2=n3=1, we have C(8,4)C(4,2)C(2,1)/5=168.
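The formula can be cross-checked against a direct enumeration for small inputs. The sketch below is in Python; the function names are made up for this illustration, and the enumeration tracks a stack of open bracket types so that only properly nested strings are counted.

```python
from math import comb


def count_by_formula(n1: int, n2: int, n3: int) -> int:
    n = n1 + n2 + n3
    catalan = comb(2 * n, n) // (n + 1)  # valid shapes using a single bracket type
    return catalan * comb(n, n1) * comb(n2 + n3, n3)


def count_by_enumeration(n1: int, n2: int, n3: int) -> int:
    def go(remaining: tuple, stack: tuple) -> int:
        # remaining[k]: pairs of type k not yet opened; stack: currently open types.
        if not stack and not any(remaining):
            return 1
        total = 0
        for kind in range(3):
            if remaining[kind] > 0:
                nxt = list(remaining)
                nxt[kind] -= 1
                total += go(tuple(nxt), stack + (kind,))
        if stack:
            # Only the most recently opened bracket may be closed.
            total += go(remaining, stack[:-1])
        return total

    return go((n1, n2, n3), ())


print(count_by_formula(2, 1, 1), count_by_enumeration(2, 1, 1))
```

The enumeration is exponential and only meant for small sanity checks; the closed form is what scales.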
In general, infinitely many. However, I assume you meant to ask how many combinations there are for a limited string length. For simplicity, let's assume that the limit is an even number. Then let's create an initial string:
(((...()...))) with length equal to the limit.
Then, we can switch any instance of () pair with [] or {} parenthesis. However, if we change an opening brace, then we ought to change the matching closing brace. So, we can look only at the opening braces, or at pairs. For each parenthesis pair we have 4 options:
leave it unchanged
change it to []
change it to {}
remove it
So, for each of (l/2) objects we choose one of four labels, which gives:
4^(l/2) possibilities.
EDIT: this assumes only "concentric" parenthesis strings (contained in each other), as you've suggested in your edit. Intuitively however, a valid combination is also: ()[]{} - this solution does not take this into account.
