What's the time complexity to find all substrings? - recursion

I wrote this code to generate all unique substrings of a given string, and I'm struggling to work out its time complexity when I use recursion. I believe a good time complexity for this problem is O(N²), but what is my code's complexity, and how can I improve it?
'''
a b c d
ab ac ad bc bd cd
abc abd acd bcd
abcd
'''
dict_poss = {}

# get all letters
def func(string):
    for i, value in enumerate(string):
        dict_poss[value] = True
        recursive(value, string[i+1:])

# get all combinations
def recursive(letter, string):
    for i, value in enumerate(string):
        if letter + value not in dict_poss:
            dict_poss[letter + value] = True
            recursive(letter + value, string[i+1:])
    return

func("abcd")
print(dict_poss)

From what you have written at the top there, you are trying to find all possible subsequences of the string, in which case this will be O(2^n). Think of the number of possible binary strings of length N: you can construct a subsequence from each binary string by using it as a mask (take a letter if its bit is 1, ignore it if 0).
If you want to find all possible substrings, the answer comes down to how strings are implemented in the language you're using. In C++ it's fairly trivial to do it in O(n²), but in Java it would be O(n³), since substring/concatenation is O(n) (although you can get O(n²) in Java too, you just have to be clever about it). I'm not sure what it is in Python, which I'm guessing this is (you should tag your question with the language you're using when you include code), but you could look it up. You could also time it with differently sized inputs; it wouldn't be hard to get a measurable runtime for an O(n²) algorithm.
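To make the two notions concrete, here is a minimal Python sketch of both interpretations (the helper names are mine, not from the question); the question's code generates subsequences, not substrings:

```python
def subsequences(s):
    """All distinct non-empty subsequences via binary masks:
    bit i of the mask decides whether s[i] is kept -- O(2^n) masks."""
    n = len(s)
    return {''.join(s[i] for i in range(n) if mask >> i & 1)
            for mask in range(1, 1 << n)}

def substrings(s):
    """All distinct non-empty contiguous substrings: O(n^2) start/end pairs."""
    n = len(s)
    return {s[i:j] for i in range(n) for j in range(i + 1, n + 1)}

print(len(subsequences("abcd")))  # 15, matching the listing in the question
print(sorted(substrings("abcd")))
```

For "abcd" the two sets differ: "ad" is a subsequence but not a substring, which is why the counts (and complexities) diverge.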

Related

Count all possible computer programs of length N

I will make this easy to understand using JavaScript as the language.
Using ASCII as the alphabet, and given all possible "statements" of length N formed from ASCII characters, how many of them are valid JS programs? An example for length 8:
var i=0; //Is a string 8 characters long, and valid JS
jar i=0; //Is a string 8 characters long, but invalid JS
gfjsjhh3 //Is a string 8 characters long, but invalid JS
Now imagine we have all possible strings 8 characters long.
How many of them are valid JS?
Further rules:
1) variables are as short as possible
2) no blank spaces except where necessary
More formal definition of problem:
If we are given a grammar G for alphabet K, and also the collection of all possible combinations (statements) of length N that can be formed with K(or K* up to length N), how many of those statements will satisfy grammar G.
I know you are thinking this is some academic, away-from-reality stuff. However, if the number of statements that are valid programs is much smaller than the total number of combinations, you could use small "numbers" to refer to the locations where these sparse programs appear in the sea of combinations, and send this "address" instead of the whole program, greatly reducing the payload.
The grammar is usually only one aspect of validity checks; check out ES6 §5.3. And grammars are poorly suited to capturing your "as short as possible" requirement. But grammars are a good tool to reason about things, so let's concentrate on that. This will still allow invalid programs, and programs with long variable names, but as a starting point for a proof of concept (whatever your concept may be) it should suffice.
You might start by thinking about derivation trees based on some BNF grammar for JS. Something like sections §11 to §15 of the ES6 standard, but make sure that the grammar you use goes down to the character level, not treating a whole identifier as a single terminal. For your hypothetical compression scheme, if you encode the choice made at each node of the tree, you have something like the compression effect you describe.
In order to count the number of programs of a given length, you could do dynamic programming on the BNF rules. So if you have a rule which says
IndexedMemberExpression ::= MemberExpression '[' Expression ']'
(loosely modeled after ES6 §12.3) then you know that an IndexedMemberExpression of length n is made up of a MemberExpression of length i and an Expression of length n − i − 2, for some 0 ≤ i ≤ n − 2, so if you know how many ways there are for those, you know for the whole expression. Have one array of length 1000 for each BNF non-terminal, fill them in order of increasing length, and you get the number of derivation trees (i.e. grammatically correct programs) for the root rule.
Actually an IndexedMemberExpression is just one kind of MemberExpression, and you'd want to sum all of those up. So it's more like this:
allocate an array of size 1000 for each non-terminal,
and initialize all the elements to zero
for n from 0 to 1000:
    …
    # compute PrimaryExpression[n] first
    # rule MemberExpression ::= PrimaryExpression
    MemberExpression[n] = PrimaryExpression[n]
    # rule MemberExpression ::= MemberExpression '[' Expression ']'
    if n >= 2:
        for i from 0 to n - 2:
            MemberExpression[n] += MemberExpression[i] * Expression[n-i-2]
    # rule MemberExpression ::= MemberExpression '.' IdentifierName
    if n >= 1:
        for i from 0 to n - 1:
            MemberExpression[n] += MemberExpression[i] * IdentifierName[n-i-1]
    …
Do this for all the rules, in an order which makes sure that you fully update each non-terminal on the left hand side of the increments before using it for the first time on the right hand side.
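As a sanity check of the scheme, the same dynamic program can be run in Python on a hypothetical two-rule toy grammar (my own invention, not from the ES6 spec): M ::= 'x' | M '[' M ']'.

```python
# Toy grammar (not real JS):
#   M ::= 'x'
#   M ::= M '[' M ']'
N = 30
M = [0] * (N + 1)          # M[n] = number of derivations of length n
for n in range(1, N + 1):
    if n == 1:
        M[n] += 1          # rule M ::= 'x'
    for i in range(1, n - 1):          # rule M ::= M '[' M ']':
        M[n] += M[i] * M[n - i - 2]    # left part length i, right part n-i-2

print(M[1], M[4], M[7], M[10])  # 1 1 2 5
```

The counts check out by hand: length 1 gives 'x', length 4 gives 'x[x]', and length 7 gives the two strings 'x[x][x]' and 'x[x[x]]'.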
Note that for almost any character sequence of length n, the corresponding string literal will be a valid JS expression of length n + 2. So don't expect the number of valid programs to have a radically different asymptotic behavior compared to the number of arbitrary character sequences.

Divisibility function in SML

I've been struggling with the basics of functional programming lately. I started writing small functions in SML, so far so good. However, there is one problem I cannot solve. It's from Project Euler (https://projecteuler.net/problem=5), and it asks for the smallest natural number that is divisible by all the numbers from 1 to n (where n is the argument of the function I'm trying to build).
Searching for a solution, I found that through prime factorization you can analyze all the numbers from 1 to n, keep the highest power of each prime that occurs across the factorizations, and then multiply those together to get the result (e.g. for n = 10, that number is 2520).
Can you help me on implementing this to an SML function?
Thank you for your time!
Since coding is not a spectator sport, it wouldn't be helpful for me to give you a complete working program; you'd have no way to learn from it. Instead, I'll show you how to get started, and start breaking down the pieces a bit.
Now, Mark Dickinson is right in his comments above that your proposed approach is neither the simplest nor the most efficient; nonetheless, it's quite workable, and plenty efficient enough to solve the Project Euler problem. (I tried it; the resulting program completed instantly.) So, I'll go with it.
To start with, if we're going to be operating on the prime decompositions of positive integers (that is: the results of factorizing them), we need to figure out how we're going to represent these decompositions. This isn't difficult, but it's very helpful to lay out all the details explicitly, so that when we write the functions that use them, we know exactly what assumptions we can make, what requirements we need to satisfy, and so on. (I can't tell you how many times I've seen code-writing attempts where different parts of the program disagree about what the data should look like, because the exact easiest form for one function to work with was a bit different from the exact easiest form for a different function to work with, and it was all done in an ad hoc way without really planning.)
You seem to have in mind an approach where a prime decomposition is a product of primes to the power of exponents: for example, 12 = 2² × 3¹. The simplest way to represent that in Standard ML is as a list of pairs: [(2,2),(3,1)]. But we should be a bit more precise than this; for example, we don't want 12 to sometimes be [(2,2),(3,1)] and sometimes [(3,1),(2,2)] and sometimes [(3,1),(5,0),(2,2)]. So, we can say something like "The prime decomposition of a positive integer is represented as a list of prime–exponent pairs, with the primes all being positive primes (2,3,5,7,…), the exponents all being positive integers (1,2,3,…), and the primes all being distinct and arranged in increasing order." This ensures a unique, easy-to-work-with representation. (N.B. 1 is represented by the empty list, nil.)
By the way, I should mention — when I tried this out, I found that everything was a little bit simpler if instead of storing exponents explicitly, I just repeated each prime the appropriate number of times, e.g. [2,2,3] for 12 = 2 × 2 × 3. (There was no single big complication with storing exponents explicitly, it just made a lot of little things a bit more finicky.) But the below breakdown is at a high level, and applies equally to either representation.
So, the overall algorithm is as follows:
Generate a list of the integers from 1 to 10, or 1 to 20.
This part is optional; you can just write the list by hand, if you want, so as to jump into the meatier part faster. But since your goal is to learn the basics of functional programming, you might as well do this using List.tabulate [documentation].
Use this to generate a list of the prime decompositions of these integers.
Specifically: you'll want to write a factorize or decompose function that takes a positive integer and returns its prime decomposition. You can then use map, a.k.a. List.map [documentation], to apply this function to each element of your list of integers.
Note that this decompose function will need to keep track of the "next" prime as it's factoring the integer. In some languages, you would use a mutable local variable for this; but in Standard ML, the normal approach is to write a recursive helper function with a parameter for this purpose. Specifically, you can write a function helper such that, if n and p are positive integers, p ≥ 2, where n is not divisible by any prime less than p, then helper n p is the prime decomposition of n. Then you just write
local
    fun helper n p = ...
in
    fun decompose n = helper n 2
end
Use this to generate the prime decomposition of the least common multiple of these integers.
To start with, you'll probably want to write a lcmTwoDecompositions function that takes a pair of prime decompositions, and computes the least common multiple (still in prime-decomposition form). (Writing this pairwise function is much, much easier than trying to create a multi-way least-common-multiple function from scratch.)
Using lcmTwoDecompositions, you can then use foldl or foldr, a.k.a. List.foldl or List.foldr [documentation], to create a function that takes a list of zero or more prime decompositions instead of just a pair. This makes use of the fact that the least common multiple of { n1, n2, …, nN } is lcm(n1, lcm(n2, lcm(…, lcm(nN, 1)…))). (This is a variant of what Mark Dickinson mentions above.)
Use this to compute the least common multiple of these integers.
This just requires a recompose function that takes a prime decomposition and computes the corresponding integer.
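Since the answer deliberately leaves the SML exercise to the reader, here is the same pipeline sketched in Python instead (all function names are mine), just to show that the decompose → pairwise-lcm → fold → recompose plan hangs together:

```python
from functools import reduce

def decompose(n):
    """Prime decomposition as a sorted list of (prime, exponent) pairs;
    trial division, fine for small n. decompose(1) == []."""
    result, p = [], 2
    while n > 1:
        e = 0
        while n % p == 0:
            n //= p
            e += 1
        if e:
            result.append((p, e))
        p += 1
    return result

def lcm_two(d1, d2):
    """LCM of two decompositions: keep the max exponent of each prime."""
    merged = dict(d1)
    for p, e in d2:
        merged[p] = max(merged.get(p, 0), e)
    return sorted(merged.items())

def recompose(d):
    """Turn a decomposition back into the integer it represents."""
    return reduce(lambda acc, pe: acc * pe[0] ** pe[1], d, 1)

print(recompose(reduce(lcm_two, map(decompose, range(1, 11)), [])))  # 2520
```

The fold over `lcm_two` with the empty decomposition (i.e. 1) as the initial value mirrors the List.foldl step described above.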

How to represent Graham's number in programming language (specifically python)?

I'm new to programming, and I would like to know how to represent Graham's number in python (the language I decided to learn as my first). I can show someone in real life on paper how to somewhat get to Graham's number, but for anyone who doesn't know what it is, here it is.
So imagine you have a 3. Now I can represent powers of 3 using ^ (up arrow). So 3^3 = 27. 3^^3 = 3^3^3 = 3^(3^3) = 3^27 = 7625597484987. 3^^^3 = 3^7625597484987 = scary big number. Now we can see that every time you add an up arrow, the number gets massively big.
Now imagine 3^^^^3 = 3^(3^7625597484987)... this number is stupid big.
So now that we have 3^(3^7625597484987), we will call this G1. That's 3^^^^3 (4 arrows in between the 3's).
Now G2 is basically 3 with G1 number of arrows in between them. Whatever number 3^(3^7625597484987) is, is the number of arrows in between the 2 3's of G2. So this number has 3^(3^7625597484987) number of arrows.
Now G3 has G2 number of arrows in between the 2 3's. We can see that this number (G3) is just huge. Stupidly huge. So each G has the number of arrows represented by the G before the current G.
Do this over and over again, plugging each G's value in as the number of arrows in the next G. Do this until G64: THAT'S Graham's number.
Now here is my question. How do you represent "the number of a certain thing (in this case arrows) is the number of arrows in the next G"? How do you represent the number of something "goes into" the next string in programming. If I can't do it in python, please post which languages this would be possible in. Thanks for any responses.
This is an example of a recursively defined number, which can be expressed using a base case and a recursive case. These two cases capture the idea of "every output is calculated from a previous output following the rule '____,' except for the first one which is ____."
A simple canonical one would be the factorial, which uses the base case 1! = 1 (or sometimes 0! = 1) and the recursive case n! = n * (n-1)! (when n>1). Or, in a more code-like fashion, f(1) = 1 and f(n) = n * f(n-1). For more information on how to write a function that behaves like this, find a good resource to learn about recursion.
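As a concrete instance of that pattern, the factorial's base case and recursive case fit in a couple of lines of Python:

```python
def f(n):
    """n! via the base case f(1) = 1 and the recursive case f(n) = n * f(n-1)."""
    return 1 if n == 1 else n * f(n - 1)

print(f(5))  # 120
```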
The G1, G2, G3, etc. in the definition of Graham's number are like these calls to the previous results. You need a function to represent an "arrow" so you can call an arbitrary number of them. So to translate your mental model into code:
"[...] we will call this [3^^^^3] G1. [...] 'The number of a certain thing (in this case arrows) is the number of arrows in the next G' [...]"
Let the aforementioned function be arrow(n) = 3^^^...^3 (where there are n arrows). Then to get Graham's number, your base case will be G(1) = arrow(4), and your recursive case will be G(n) = arrow(G(n-1)) (when n>1).
Note that arrow(n) will also have a recursive definition (see Knuth's up-arrow notation), as you showed but didn't describe, by calculating each output from the previous (emphasis added):
3^3 = 27.
3^^3 = 3^3^3 = 3^(3^3) = 3^27 = 7625597484987.
3^^^3 = 3^7625597484987 = scary big number.
You might describe this as "every output [arrow(n)] is 3 to the power of the previous output, except for the first one [arrow(1)] which is just 3." I'll leave the translation from description to definition as an exercise.
(Also, I hope you didn't actually want to represent Graham's number, but rather just its calculation, since it's too big even for Python or any computer for that matter.)
class GrahamsNumber(object):
    pass

G = GrahamsNumber()
For most purposes, this is as good of a representation of Graham's number as any other one. Sure, you can't do anything useful with G, but for the most part you can't do anything useful with Graham's number either, so it accurately reflects realistic use cases.
If you do have some specific use cases (and they're feasible), then a representation can be tailored to allow those use cases; but we can't guess what you want to be able to do with G, you have to spell it out explicitly.
For fun, I implemented hyperoperators in python as the module hyperop and one of my examples is Graham's number:
from hyperop import hyperop

def GrahamsNumber():
    # This may take awhile...
    g = 4
    for n in range(1, 64 + 1):
        g = hyperop(g + 2)(3, 3)
    return g
The magic is in the recursion, which loosely stated looks like H[n](x,y) = reduce(lambda x,y: H[n-1](y,x), [a,]*b).
To answer your question, this function (in theory) will calculate the number in question. Since there is no way this program will ever finish before the heat death of the Universe, you gain more of an understanding from the "function" itself than the actual value.
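If you want to experiment with the recursive structure itself rather than the unreachable value, a small base such as 2 keeps the numbers finite; here is a minimal sketch of Knuth's up-arrow (the name `arrow` is my own):

```python
def arrow(n, a, b):
    """a ↑^n b in Knuth's up-arrow notation, for n >= 1 and b >= 1."""
    if n == 1:
        return a ** b        # one arrow is plain exponentiation
    if b == 1:
        return a             # a ↑^n 1 = a for any number of arrows
    return arrow(n - 1, a, arrow(n, a, b - 1))

print(arrow(2, 3, 3))  # 7625597484987, the 3^^3 from the question
print(arrow(3, 2, 3))  # 65536: 2^^^3 = 2^^4 = 2^2^2^2
```

With base 3 and three or more arrows the same function is correct but will never terminate in practice, which is the point being made above.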

Recursive definition of the set of strings over {a,b} that contain one b and an even number of a's before the first b

I'm working on a problem from the Languages and Machines: An Introduction to the Theory of Computer Science (3rd Edition) in Chapter 2 Example 6.
I need help finding the answer of:
A recursive definition of the set of strings over {a,b} that contain exactly one b and an even number of a's before the first b?
When looking for a recursive definition, start by identifying the base cases and then look for the recursive steps - like you're doing induction. What are the smallest strings in this language? Well, any string must have a b. Is b alone a string in the language? Why yes it is, since there are zero as that come before it and zero is an even number.
Rule 1: b is in L.
Now, given any string in the language, how can we get more strings? Well, we can apparently add any number of as to the end of the string and get another string in the language. In fact, we can get all such strings from b if we simply allow you to add one more a to the end of a string in the language. From x in L, we therefore recover xa, xaa, ..., xa^n, ... = xa*.
Rule 2: if x is in L then xa is in L.
Finally, what can we do to the beginning of strings in our language? The number of a's must be even. So far, rules 1 and 2 only allow us to construct strings that have zero a's before the b. We should be able to get two, four, six, and so on, all the even numbers of a's. A rule that lets us prepend two a's to any string in our language will let us add ever more a's to the beginning while maintaining the evenness property we require. Starting with x in L, we recover aax, aaaax, ..., (aa)^n x, ... = (aa)*x.
Rule 3: if x is in L, then aax is in L.
Optionally, you may add the sometimes implicitly understood rule that only those things allowed by the aforementioned rules are allowed. Otherwise, technically anything is allowed since we haven't explicitly disallowed anything yet.
Rule 4: Nothing is in L unless by virtue of some combination of the rules 1, 2 and/or 3 above.
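The three rules can be checked mechanically against a direct membership test. This Python sketch (my own, not from the book) brute-forces all short strings over {a,b} and confirms the rules generate exactly the language:

```python
from itertools import product

def in_language(s):
    """Direct check: exactly one b, with an even number of a's before it."""
    return s.count('b') == 1 and s.index('b') % 2 == 0

def generate(max_len):
    """Everything reachable from rules 1-3, up to a length bound."""
    L = {'b'}                              # rule 1: b is in L
    frontier = {'b'}
    while frontier:
        new = set()
        for x in frontier:
            for y in (x + 'a', 'aa' + x):  # rule 2: xa; rule 3: aax
                if len(y) <= max_len and y not in L:
                    new.add(y)
        L |= new
        frontier = new
    return L

brute = {''.join(p) for n in range(1, 8)
         for p in product('ab', repeat=n) if in_language(''.join(p))}
print(generate(7) == brute)  # True
```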

modifying an element of a list in-place in J, can it be done?

I have been playing with an implementation of lookandsay (OEIS A005150) in J. I have made two versions, both very simple, using while.-type control structures. One recurses, the other loops. Because I am compulsive, I started running comparative timings on the versions.
Look-and-say is the sequence 1 11 21 1211 111221; that is, "one 1", "two 1s", etc.
For early elements of the list (up to around 20) the looping version wins, but only by a tiny amount. Timings around 30 cause the recursive version to win, by a large enough amount that the recursive version might be preferred if the stack space were adequate to support it. I looked at why, and I believe that it has to do with handling intermediate results. The 30th number in the sequence has 5808 digits. (32nd number, 9898 digits, 34th, 16774.)
When you are doing the problem with recursion, you can hold the intermediate results in the recursive call, and the unstacking at the end builds the results so that there is minimal handling of the results.
In the looping version, you need a variable to hold the result, and every loop iteration appends two more elements to it.
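For reference, the step being timed is language-agnostic; a minimal Python sketch of the looping approach (all names are mine), which makes the "append two elements per run" cost visible:

```python
def look_and_say(digits):
    """One look-and-say step on a digit string: '1211' -> '111221'."""
    out = []
    i = 0
    while i < len(digits):
        j = i
        while j < len(digits) and digits[j] == digits[i]:
            j += 1                   # j - i is the length of the current run
        out.append(str(j - i))       # emit the count...
        out.append(digits[i])        # ...then the digit, two appends per run
        i = j
    return ''.join(out)

s = '1'
for _ in range(4):
    s = look_and_say(s)
print(s)  # 111221
```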
The problem, as I see it, is that I can't find any way in J to modify an extant array without completely reassigning it. So I am saying
try. o =. o,e,(0&{y) catch. o =. e,(0&{y) end.
to put an element into o where o might not have a value when we start. That may be notably slower than
o =. i.0
.
.
.
o =. (,o),e,(0&{y)
The point is that the result gets the wrong shape without the ravels, or so it seems. It is inheriting a shape from i.0 somehow.
But even functions like } (amend) don't modify a list; they return a list that has a modification made to it, and if you want to keep the list you need to assign it. As the size of the assigned list increases (as you walk the number from beginning to end building the next number), the assignment seems to take more and more time. This assignment is really the only thing I can see that would make element 32 (9898 digits) take less time in the recursive version, while element 20 (408 digits) takes less time in the loopy version.
The recursive version builds the return with:
e,(0&{y),(,lookandsay e }. y)
The above line is both the return line from the function and the recursion, so the whole return vector gets built at once as the call gets to the end of the string and everything unstacks.
In APL I thought that one could say something on the order of:
a[1+rho a] <- new element
But when I try this in NARS2000 I find that it causes an index error. I don't have access to any other APL, I might be remembering this idiom from APL Plus, I doubt it worked this way in APL\360 or APL\1130. I might be misremembering it completely.
I can find no way to do that in J. It might be that there is no way to do that, but the next thought is to pre-allocate an array that could hold results, and to change individual entries. I see no way to do that either - that is, J does not seem to support the APL idiom:
a<- iota 5
a[3] <- -1
Is this one of those side effect things that is disallowed because of language purity?
Does the interpreter recognize a=. a,foo or some of its variants as a thing that it should fastpath to a[>:#a]=.foo internally?
This is the recursive version, just for the heck of it. I have tried a bunch of different versions and I believe that the longer the program, the slower, and generally, the more complex, the slower. Generally, the program can be chained so that if you want the nth number you can do lookandsay^: n ] y. I have tried a number of optimizations, but the problem I have is that I can't tell what environment I am sending my output into. If I could tell that I was sending it to the next iteration of the program I would send it as an array of digits rather than as a big number.
I also suspect that if I could figure out how to make a tacit version of the code, it would run faster, based on my finding that when I add something to the code that should make it shorter, it runs longer.
lookandsay=: 3 : 0
  if. 0 = # ,y do. return. end. NB. return on empty argument
  if. 1 ~: ##$ y do. NB. convert rank 0 argument to list of digits
    y =. (10&#.^:_1) x: y
    f =. 1
    assert. 1 = ##$ y NB. the converted argument must be rank 1
  else.
    NB. yw =. y
    f =. 0
  end.
  NB. e should be a count of the digits that match the leading digit.
  e=.+/*./\y=0&{y
  if. f do.
    o=. e,(0&{y),(,lookandsay e }. y)
    assert. e = 0&{ o
    10&#. x: o
    return.
  else.
    e,(0&{y),(,lookandsay e }. y)
    return.
  end.
)
I was interested in the characteristics of the numbers produced. I found that if you start with a 1, the numerals never get higher than 3. If you start with a numeral higher than 3, it will survive as a singleton, and you can also get a number into the generated numbers by starting with something like 888888888 which will generate a number with one 9 in it and a single 8 at the end of the number. But other than the singletons, no digit gets higher than 3.
Edit:
I did some more measuring. I had originally written the program to accept either a vector or a scalar, the idea being that internally I'd work with a vector. I had thought about passing a vector from one layer of code to the other, and I still might, using a left argument to control the code. When I pass the top level a vector, the code runs enormously faster, so my guess is that most of the CPU time is being eaten by converting very long numbers to and from vectors of digits. The recursive routine always passes down a vector when it recurs, which might be why it is almost as fast as the loop.
That does not change my question.
I have an answer for this which I can't post for three hours. I will post it then, please don't do a ton of research to answer it.
assignments like
arr=. 'z' 15} arr
are executed in place. (See JWiki article for other supported in-place operations)
The interpreter determines that only a small portion of arr is updated, and does not create an entire new list to reassign.
What happens in your case is not that the array is being reassigned, but that it grows many times in small increments, causing repeated memory allocation and reallocation.
If you preallocate (by assigning it some large chunk of data), then you can modify it with } without too much penalty.
After I asked this question, to be honest, I lost track of this web site.
Yes, the answer is that the language has no form that means "update in place", but if you use two forms
x =: x , most anything
or
x =: most anything } x
then the interpreter recognizes those as special and does update in place unless it can't. There are a number of other specials recognized by the interpreter, like:
199(1000&|@^)199
That combined operation is modular exponentiation. It never calculates the whole exponentiation, as
199(1000&|^)199
would - that just ends as _ without the @.
So it is worth reading the article on specials. I will mark someone else's answer up.
The link that sverre provided above ( http://www.jsoftware.com/jwiki/Essays/In-Place%20Operations ) shows the various operations that support modifying an existing array rather than creating a new one. They include:
myarray=: myarray,'blah'
If you are interested in a tacit version of the lookandsay sequence see this submission to RosettaCode:
las=: ,#((# , {.);.1~ 1 , 2 ~:/\ ])&.(10x&#.inv)#]^:(1+i.#[)
5 las 1
11 21 1211 111221 312211
