Tracing a recursive function for longest common substring - recursion

I am getting confused tracing the following recursive approach to find the longest common substring. The last two lines are where my confusion is. Specifically how is the count variable getting the answer when characters of both string matches? In the last line which "count" does this refer to i.e count in the function definition or the updated count from function call? Are there any resources for better understanding of recursions?
int recursive_substr(string a, string b, int m, int n,int count){
if (m == -1 || n == -1) return count;
if (a[m] == b[n]) {
count = recursive_substr(a,b,m-1,n-1,++count);
}
return max(count,max(recursive_substr(a,b,m,n-1,0),recursive_substr(a,b,m-1,n,0)));
}

The first thing to understand is what values to use for the parameters the very first time you call the function.
Consider the two following strings:
std::string a = "helloabc";
std::string b = "hello!abc";
To figure out the length of the longest common substring, you can call the function this way:
int length = recursive_substr(a, b, a.length()-1, b.length()-1, 0);
So, m begins as the index of the last character in a, and n begins as the index of the last character in b. count begins as 0.
During execution, m represents the index of the current character in a, n represents the index of the current character in b, and count represents the length of the current common substring.
Now imagine we're in the middle of the execution, with m=4 and n=5 and count=3.
We're there:
a= "helloabc"
^m
b="hello!abc" count=3
^n
We just saw the common substring "abc", which has length 3, and that is why count=3. Now, we notice that a[m] == 'o' != '!' == b[n]. So, we know that we can't extend the common substring "abc" into a longer common substring. We make a note that we have found a common substring of length 3, and we start looking for another common substring between "hello" and "hello!". Since 'o' and '!' are different, we know that we should exclude at least one of the two. But we don't know which one. So, we make two recursive calls:
count1 = recursive_substr(a,b,m,n-1,0); // length of longest common substring between "hello" and "hello"
count2 = recursive_substr(a,b,m-1,n,0); // length of longest common substring between "hell" and "hello!"
Then, we return the maximum of the three lengths we've collected:
the length count==3 of the previous common substring "abc" we had found;
the length count1==5 of the longest common substring between "hello" and "hello";
the length count2==4 of the longest common substring between "hell" and "hello!".

Related

Extract all substrings in string

I want to extract all substrings that begin with M and are terminated by a *
The string below as an example;
vec<-c("SHVANSGYMGMTPRLGLESLLE*A*MIRVASQ")
Would ideally return;
MGMTPRLGLESLLE
MTPRLGLESLLE
I have tried the code below;
regmatches(vec, gregexpr('(?<=M).*?(?=\\*)', vec, perl=T))[[1]]
but this drops the first M and only returns the first string rather than all substrings within.
"GMTPRLGLESLLE"
You can use
(?=(M[^*]*)\*)
See the regex demo. Details:
(?= - start of a positive lookahead that matches a location that is immediately followed with:
(M[^*]*) - Group 1: M, zero or more chars other than a * char
\* - a * char
) - end of the lookahead.
See the R demo:
library(stringr)
vec <- c("SHVANSGYMGMTPRLGLESLLE*A*MIRVASQ")
matches <- stringr::str_match_all(vec, "(?=(M[^*]*)\\*)")
unlist(lapply(matches, function(z) z[,2]))
## => [1] "MGMTPRLGLESLLE" "MTPRLGLESLLE"
If you prefer a base R solution:
vec <- c("SHVANSGYMGMTPRLGLESLLE*A*MIRVASQ")
matches <- regmatches(vec, gregexec("(?=(M[^*]*)\\*)", vec, perl=TRUE))
unlist(lapply(matches, tail, -1))
## => [1] "MGMTPRLGLESLLE" "MTPRLGLESLLE"
This could be done instead with a for loop on a char array converted from you string.
If you encounter a M you start concatenating chars to a new string until you encounter a *, when you do encounter a * you push the new string to an array of strings and start over from the first step until you reach the end of your loop.
It's not quite as interesting as using REGEX to do it, but it's failsafe.
It is not possible to use regular expressions here, because regular languages don't have memory states required for nested matches.
stringr::str_extract_all("abaca", "a[^a]*a") only gives you aba but not the sorrounding abaca.
The first M was dropped, because (?<=M) is a positive look behind which is by definition not part of the match, but just behind it.

Edit distance leetcode

So I am doing this question of EDIT DISTANCE and before going to DP approach I am trying to solve this question in recursive manner and I am facing some logical error, please help....
Here is my code -
class Solution {
public int minDistance(String word1, String word2) {
int n=word1.length();
int m=word2.length();
if(m<n)
return Solve(word1,word2,n,m);
else
return Solve(word2,word1,m,n);
}
private int Solve(String word1,String word2,int n,int m){
if(n==0||m==0)
return Math.abs(n-m);
if(word1.charAt(n-1)==word2.charAt(m-1))
return 0+Solve(word1,word2,n-1,m-1);
else{
//insert
int insert = 1+Solve(word1,word2,n-1,m);
//replace
int replace = 1+Solve(word1,word2,n-1,m-1);
//delete
int delete = 1+Solve(word1,word2,n-1,m);
int max1 = Math.min(insert,replace);
return Math.min(max1,delete);
}
}
}
here I am checking the last element of both the strings if both the characters are equal then simple moving both string to n-1 and m-1 resp.
Else
Now I am having 3 cases of insertion , deletion and replace ,and between these 3 I have to find minima.
If I am replacing the character then simply I moved the character to n-1 & m-1.
If I am inserting the character from my logic I think I should insert the character at the last of smaller length string and move the pointer to n-1 and m
To delete the element I think I should delete the element from the larger length String that's why I move pointer to n-1 and m but I think I am making mistake here please help.
Leetcode is giving me wrong answer for word1 = "plasma" and word2 = "altruism".
The problem is that the recursive expression for the insert-case is the same as for the delete-case.
Reasoning further, it turns out the one for the insert-case is wrong. In that case we choose to resolve the letter in word2 (at index m-1) through insertion, so it should not be considered any more during the recursive process. On the other hand the considered letter in word1 could still be matched with another letter in word2, so that letter should still be considered during the recursive process.
That means that m should be decremented, not n.
So change:
int insert = 1+Solve(word1,word2,n-1,m);
to:
int insert = 1+Solve(word1,word2,n,m-1);
...and it will work. Then remains to add the memoization for getting a good efficiency.
Python clean DP based solution,
class Solution:
def minDistance(self, word1: str, word2: str) -> int:
return self.edit_distance(word1, word2)
#cache
def edit_distance(self, s, t):
# Edge conditions
if len(s) == 0:
return len(t)
if len(t) == 0:
return len(s)
# If 1st char matches
if s[0] == t[0]:
return self.edit_distance(s[1:], t[1:])
else:
return min(
1 + self.edit_distance(s[1:], t), # delete
1 + self.edit_distance(s, t[1:]), # insert
1 + self.edit_distance(s[1:], t[1:]) # replace
)

How to match phonemic transcriptions with a single vowel except if a condition applies

I have phonemic transcriptions of English words such as these:
test <- c("ˈsɜːtnli", "ˈtwɛnti", "ˈfɒksi", "kɑːnt", "ʧeɪnʤd", "vɪkˈtɔːrɪə", "wɒznt", "ðeər", "dɪdnt",
"ˈdɪzni", "ˈəʊnli", "ˈfæbrɪks", "sɪˈkjʊərɪti", "ˈnjuːzˌpeɪpər", "ɑhɑː")
I'd like to match mono-syllabic words, i.e., words that contain a single vowel. My set of phonemic vowels is this:
vowel <- "iː|aɪ|ɔː|ɔɪ|əʊ|ɛə|eɪ|aʊ|eə|uː|ɑː|ɪə|ɜː|ʊə|ə|ɪ|ɒ|ʊ|ʌ|æ|e|ɑ|ɛ|i"
Using str_count and the vector vowel as pattern, I'm able to match a fairly good set of words:
library(stringr)
test[str_count(test, vowel) == 1]
[1] "kɑːnt" "ʧeɪnʤd" "wɒznt" "ðeər" "dɪdnt"
However, wɒznt and dɪdntcan be seen as bi-syllabic (as the nsound can replace a vowel so that nt counts as a second vowel). So the question is, how can I match mono-syllabic words except those that end in nt?
What I've tried so far is this set operation, which works well but looks clumsy:
setdiff(test[str_count(test, vowel) == 1], test[str_count(test, paste0("[^", vowel, "]nt$")) == 1])
[1] "kɑːnt" "ʧeɪnʤd" "ðeər"
I'd much rather have a single more concise regex. Any ideas?
You can use
test <- c("ˈsɜːtnli", "ˈtwɛnti", "ˈfɒksi", "kɑːnt", "ʧeɪnʤd", "vɪkˈtɔːrɪə", "wɒznt", "ðeər", "dɪdnt",
"ˈdɪzni", "ˈəʊnli", "ˈfæbrɪks", "sɪˈkjʊərɪti", "ˈnjuːzˌpeɪpər", "ɑhɑː")
vowel <- "iː|aɪ|ɔː|ɔɪ|əʊ|ɛə|eɪ|aʊ|eə|uː|ɑː|ɪə|ɜː|ʊə|ə|ɪ|ɒ|ʊ|ʌ|æ|e|ɑ|ɛ|i"
library(stringr)
p <- paste0("^(?!.*(?<!",vowel,")nt$)(?:(?!",vowel,").)*(?:",vowel,")(?:(?!",vowel,").)*$")
test[str_detect(test, p)]
## => [1] "kɑːnt" "ʧeɪnʤd" "ðeər"
See the online R demo. See the regex demo. The pattern means
^ - start of string
(?!.*(?<!",vowel,")nt$) - immediately to the right, there must not be any 0+ chars other than line break chars as many as possible followed with nt (not preceded with any of the specified vowel sound sequences) and end of string
(?:(?!",vowel,").)* - any char but a line break char, zero or more times as many as possible, that does not start a vowel char sequence
(?:",vowel,") - any of the specified vowel sound sequences
(?:(?!",vowel,").)* - any char but a line break char, zero or more times as many as possible, that does not start a vowel char sequence
$ - end of string.
This is a somewhat concise solution (thanks to #G5W for the decisive hint):
vowel_cc <- paste0(unique(unlist(strsplit(gsub("\\|", "", vowel), ""))), collapse = "")
vowel_cc
[1] "iːaɪɔəʊɛeuɑɜɒʌæ"
test[str_count(test, paste0(vowel, "|[^", vowel_cc, "]+nt$")) == 1]
[1] "kɑːnt" "ʧeɪnʤd" "ðeər"
This solution uses a vector vowel_cc consisting of all unique characters in vowels. These serve as input for a negated character class. The pattern specifies nt as one of the vowel alternatives on the condition that it be preceded by one or more non-vowel_ccs and occur at string end.

Longest substring in alphabetical order [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 8 years ago.
Improve this question
Write a program that prints the longest substring of s in which the letters occur in alphabetical order. For example, if s = 'azcbobobegghakl', then your program should print
Longest substring in alphabetical order is: beggh
In the case of ties, print the first substring. For example, if s = 'abcbcd', then your program should print
Longest substring in alphabetical order is: abc
Here you go edx student i've been helped to finish the code :
from itertools import count
def long_sub(input_string):
maxsubstr = input_string[0:0] # empty slice (to accept subclasses of str)
for start in range(len(input_string)): # O(n)
for end in count(start + len(maxsubstr) + 1): # O(m)
substr = input_string[start:end] # O(m)
if len(substr) != (end - start): # found duplicates or EOS
break
if sorted(substr) == list(substr):
maxsubstr = substr
return maxsubstr
sub = (long_sub(s))
print "Longest substring in alphabetical order is: %s" %sub
These are all assuming you have a string (s) and are needing to find the longest substring in alphabetical order.
Option A
test = s[0] # seed with first letter in string s
best = '' # empty var for keeping track of longest sequence
for n in range(1, len(s)): # have s[0] so compare to s[1]
if len(test) > len(best):
best = test
if s[n] >= s[n-1]:
test = test + s[n] # add s[1] to s[0] if greater or equal
else: # if not, do one of these options
test = s[n]
print "Longest substring in alphabetical order is:", best
Option B
maxSub, currentSub, previousChar = '', '', ''
for char in s:
if char >= previousChar:
currentSub = currentSub + char
if len(currentSub) > len(maxSub):
maxSub = currentSub
else: currentSub = char
previousChar = char
print maxSub
Option C
matches = []
current = [s[0]]
for index, character in enumerate(s[1:]):
if character >= s[index]: current.append(character)
else:
matches.append(current)
current = [character]
print "".join(max(matches, key=len))
Option D
def longest_ascending(s):
matches = []
current = [s[0]]
for index, character in enumerate(s[1:]):
if character >= s[index]:
current.append(character)
else:
matches.append(current)
current = [character]
matches.append(current)
return "".join(max(matches, key=len))
print(longest_ascending(s))
The following code solves the problem using the reduce method:
solution = ''
def check(substr, char):
global solution
last_char = substr[-1]
substr = (substr + char) if char >= last_char else char
if len(substr) > len(solution):
solution = substr
return substr
def get_largest(s):
global solution
solution = ''
reduce(check, list(s))
return solution

Stuck in an infinite loop in a function

I'm stuck in an infinite loop in this function:
let rec showGoatDoorSupport(userChoice, otherGuess, aGame) =
if( (userChoice != otherGuess) && (List.nth aGame otherGuess == "goat") ) then otherGuess
else showGoatDoorSupport(userChoice, (Random.int 3), aGame);;
And here's how I'm calling the function:
showGoatDoorSupport(1, 2, ["goat"; "goat"; "car"]);
In the first condition in the function, I compare the first 2 input parameters (1 and 2) if the are different, and if the item in the list at index "otherGuess" is not equal to "goat", I want to return that otherGuess.
Otherwise, I want to run the function again with a random number between 0-2 as the second input parameter.
The point is to keep trying to run the function until the second parameter doesnt equal the first, and that slot in the List isn't "goat", then return that slot number.
Don't use ==, it checks for physical equality. Use =. Two different strings will never be physically equal, even if they contain the same sequence of characters. (This is necessary, because strings are mutable in OCaml.)
$ ocaml
OCaml version 4.00.0
# "abc" == "abc";;
- : bool = false
# "abc" = "abc";;
- : bool = true
Another to do that is to use the String.compare. An example:
if String.compare str1 str2 = 0 then (* case equal *)
else (* case not equal *)

Resources