How to determine the time-complexity of a simple palindrome R code? - r

I am pretty new to calculating the time complexity of an algorithm or a piece of code, so I'm not sure what the complexity of the following function is:
isPalindrome <- function(num){
  if(num < 0) return(F)
  rev <- 0
  orig_num <- num
  while(num != 0){
    pop <- num %% 10
    num <- num %/% 10
    rev <- rev*10 + pop
  }
  if(orig_num == rev) return(T)
  else return(F)
}
And calling the function, e.g. isPalindrome(122221) will return TRUE.
The basic idea is that the reversed number is calculated and then compared against the original number; if they are equal, the number is a palindrome.
My basic intuition was that in order to calculate the reversed number, the while loop goes through every digit, so e.g. for a 4-digit number like 1221 there are 4 iterations to perform (each taking some time to complete). If my number becomes twice as long in digits, e.g. 12222221, then 8 iterations are needed. The input grew by a factor of 2 and the time also grew by a factor of 2, so the time complexity should be O(n). Is this correct?

Your intuition is right: the algorithm runs in O(n), where n is the number of digits of the input. Expressed in terms of the numeric value num, that is O(floor(log10(num)) + 1), since floor(log10(num)) + 1 is exactly the digit count.
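If you want to check this empirically, here is a small sketch (loopIterations is a hypothetical helper written just for this illustration, not part of the question's code) that counts how many times the digit-stripping loop runs:
# Hypothetical helper: count iterations of the while loop from isPalindrome
loopIterations <- function(num) {
  iters <- 0
  while (num != 0) {
    num <- num %/% 10
    iters <- iters + 1
  }
  iters
}

loopIterations(1221)        # 4 iterations for a 4-digit number
loopIterations(12222221)    # 8 iterations for an 8-digit number
floor(log10(12222221)) + 1  # 8 -- the digit-count formula agrees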

Related

R Precision for Double - Why code returns negative when positive outcome expected?

I am testing 2 ways of calculating prod(b-a), where a and b are vectors of length n: prod(b-a) = (b1-a1)*(b2-a2)*(b3-a3)*...*(bn-an), where b_i > a_i > 0 for all i = 1, 2, 3, ..., n. For some special cases, another way (Method 2) of calculating this prod(b-a) is more efficient. It uses the following formula, which expands the product and sums the resulting terms: for every subset S of {1, ..., n}, take (-1)^|S| * prod_{i in S} a_i * prod_{i not in S} b_i, and add all 2^n of these terms together (this is exactly what the loop in the testing code below does).
Here is my question: when a_i is very close to b_i, the true outcome can be very, very close to 0, something like 10^(-16). Method 1 (subtract and multiply) always returns a positive output. Method 2, using the formula, sometimes returns a negative output (about 7~8% of the time in my experiment). Mathematically these 2 methods should return exactly the same output, but in floating-point arithmetic they apparently produce different outputs.
Here are my codes to run the test. When I run the testing code 10000 times, about 7~8% of my runs for Method 2 return a negative output. According to the official documentation, the R double has a precision of "2.225074e-308" as indicated by the R parameter .Machine$double.xmin. Why is it producing negative values when the differences are between 10^(-16) and 10^(-18)? Any help that sheds light on this will be appreciated. I would also love some suggestions concerning how to practically increase the precision to a higher level, as indicated in the R documentation.
########## Testing code 1.
ftest1case <- function(a,b) {
  n <- length(a)
  if (length(b) != n) stop("--------- length a and b are not right.")
  if (any(b < a)) stop("---------- b has to be greater than a all the time.")
  out1 <- prod(b-a)
  out2 <- 0
  N <- 2^n
  for (i in 1:N) {
    tidx <- rev(as.integer(intToBits(x=i-1))[1:n])
    tsign <- ifelse((sum(tidx) %% 2) == 0, 1.0, -1.0)
    out2 <- out2 + tsign*prod(b[tidx==0])*prod(a[tidx==1])
  }
  c(out1, out2)
}
########## Testing code 2.
ftestManyCases <- function(N, printFreq=1000, smallNum=10^(-20))
{
  tt <- matrix(0, nrow=N, ncol=2)
  n <- 12
  for (i in 1:N) {
    a <- runif(n,0,1)
    b <- a + runif(n,0,1)*0.1
    tt[i,] <- ftest1case(a=a, b=b)
    if ((i %% printFreq) == 0) cat("----- i = ", i, "\n")
    if (tt[i,2] < smallNum) cat("------ i = ", i, " ---- Negative summation found.\n")
  }
  tout <- apply(tt, 2, FUN=function(x) { round(sum(x < smallNum)/N, 6) })
  names(tout) <- c("PerLess0_Method1", "PerLess0_Method2")
  list(summary=tout, data=tt)
}
######## Step 1. Test for 1 case.
n<-12
a<-runif(n,0,1)
b<-a+runif(n,0,1)*0.1
ftest1case(a=a,b=b)
######## Step 2 Test Code 2 for multiple cases.
N<-300
tt<-ftestManyCases(N=N,printFreq = 100)
tt[[1]]
It's hard for me to imagine when an algorithm that consists of generating 2^n permutations and adding them up is going to be more efficient than a straightforward product of differences, but I'll take your word for it that there are some special cases where it is.
As suggested in comments, the root of your problem is the accumulation of floating-point errors when adding values of different magnitudes; see here for an R-specific question about floating point and here for the generic explanation.
First, a simplified example:
n <- 12
set.seed(1001)
a <- runif(n, 0, 1)
b <- a + 0.01
prod(a-b) ## 1e-24
out2 <- 0
N <- 2^n
out2v <- numeric(N)
for (i in 1:N) {
  tidx <- rev(as.integer(intToBits(x=i-1))[1:n])
  tsign <- ifelse((sum(tidx) %% 2) == 0, 1.0, -1.0)
  j <- as.logical(tidx)
  out2v[i] <- tsign*prod(b[!j])*prod(a[j])
}
sum(out2v) ## -2.011703e-21
Using extended precision (with 1000 bits of precision) to check that the simple/brute force calculation is more reliable:
library(Rmpfr)
a_m <- mpfr(a, 1000)
b_m <- mpfr(b, 1000)
prod(a_m-b_m)
## 1.00000000000000857647286522936696473705868726043995807429578968484409120647055193862325070279593735821154440625984047036486664599510856317884962563644275433171621778761377125514191564456600405460403870124263023336542598111475858881830547350667868450934867675523340703947491662460873009229537576817962228e-24
This proves the point in this case, but in general doing extended-precision arithmetic will probably kill any performance gains you would get.
Redoing the permutation-based calculation with mpfr values (using out2 <- mpfr(0, 1000), and going back to the out2 <- out2 + ... running summation rather than accumulating the values in a vector and calling sum()) gives an accurate answer (at least to the first 20 or so digits, I didn't check farther), but takes 6.5 seconds on my machine (instead of 0.03 seconds when using regular floating-point).
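For reference, a minimal sketch of that extended-precision re-run (assuming a, b, n, N, a_m, and b_m are already defined as in the snippets above; this is a reconstruction, not the answerer's exact code):
out2 <- mpfr(0, 1000)
for (i in 1:N) {
  tidx  <- rev(as.integer(intToBits(x=i-1))[1:n])
  tsign <- ifelse((sum(tidx) %% 2) == 0, 1.0, -1.0)
  j     <- as.logical(tidx)
  # running summation in extended precision instead of accumulating a vector
  out2  <- out2 + tsign * prod(b_m[!j]) * prod(a_m[j])
}
out2  # agrees with the extended-precision product prod(a_m - b_m) to many digits, but is much slower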
Why is this calculation problematic? First, note the difference between .Machine$double.xmin (approx 2e-308), which is the smallest floating-point value that the system can store, and .Machine$double.eps (approx 2e-16), which is the smallest value x such that 1 + x > 1, i.e. the smallest relative value that can be added without being rounded away entirely (values a little bit bigger than this relative magnitude will experience severe, but not catastrophic, cancellation).
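A quick sanity check of those two quantities (not from the original answer, just an illustration):
.Machine$double.xmin                # ~2.225074e-308: smallest positive double
.Machine$double.eps                 # ~2.220446e-16: machine epsilon
1 + .Machine$double.eps   == 1      # FALSE: the increment is still representable
1 + .Machine$double.eps/2 == 1      # TRUE: the increment is rounded away entirely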
Now look at the distribution of the individual terms stored in out2v:
hist(out2v)
There are clusters of negative and positive numbers of similar magnitude. If our summation happens to add a bunch of values that almost cancel (so that the result is very close to 0), then add that to another value that is not nearly zero, we'll get bad cancellation.
It's entirely possible that there's a way to rearrange this calculation so that bad cancellation doesn't happen, but I couldn't think of one easily.

R stops working and arrows disappearing with prime check function

prime <- function(number){
  if (number != 2){
    for (num in 1:number){
      while ((number %% num) == 0){
        counter <- 0
        counter <- counter + 1
      }
    }
    return((counter - 2) == 0)
  } else {
    FALSE
  }
}
My function was designed as a primality test: prime numbers are divisible only by themselves and 1. So I looped over all the numbers from 1 to n (the number itself) and counted the number of divisions with remainder 0. The result must be 2 (n/n and n/1 both have remainder 0), so (counter-2)==0 returns TRUE if the number is prime. The only exception is 2. But my code doesn't work, and it also freezes RStudio: the code-line arrows disappear and R stops returning any value.
What is wrong with this code?
I don't think you need a loop for this. Also, I'm not sure what you mean by the line arrow disappears, but this code works for me:
is.prime <- function(x){
  if (x == 2) return(TRUE)
  sum(x %% 1:x == 0) == 2
}
is.prime(2)
#> [1] TRUE
is.prime(10)
#> [1] FALSE
is.prime(11)
#> [1] TRUE
There are at least 3 problems with your code:
1. You are arbitrarily defining 2 as being non-prime. It is prime and there is no reason to treat it as a special case.
2. You are constantly resetting counter to 0. It should be initialized just once, outside of the for loop.
3. Your while loop is an infinite loop. If (number %% num) == 0, then nothing in the body of the loop will make that condition false, which is why RStudio hangs when you run your code. The fix is to change this loop into something which is not a loop at all -- it is really an if statement that you need.
There was another thing which isn't incorrect but is somewhat awkward: using (counter - 2) == 0 to test if counter == 2.
Fixing these problems leads to the following code:
prime <- function(number) {
  counter <- 0
  for (num in 1:number) {
    if ((number %% num) == 0) {
      counter <- counter + 1
    }
  }
  counter == 2
}
This succeeds in correctly testing for primes. Note that it is an extremely inefficient test: your version would use 1,000,000 remainder operations to classify 1,000,000 as nonprime, when surely one operation would suffice -- just divide by 2. To make it more efficient you could exploit the fact that a number is either prime or divisible by a number no bigger than its square root. For an odd number, you can test whether it is prime by checking whether it has an odd divisor less than or equal to its square root. This would allow you to check whether a number in the vicinity of 1,000,000 is prime with at most 500 remainder operations, rather than the 1,000,000 that your approach would use.
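A sketch of that square-root optimisation (the name is.prime.fast and the exact structure are illustrative, not from the original answer):
is.prime.fast <- function(x) {
  if (x < 2)          return(FALSE)
  if (x %in% c(2, 3)) return(TRUE)
  if (x %% 2 == 0)    return(FALSE)  # one remainder operation rules out all even numbers
  if (x < 9)          return(TRUE)   # 5 and 7 have no candidate divisors up to their square root
  # only odd candidate divisors up to sqrt(x) need to be checked
  all(x %% seq(3, floor(sqrt(x)), by = 2) != 0)
}

is.prime.fast(997)      # TRUE
is.prime.fast(1000000)  # FALSE, settled by the single division by 2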

R poisson simulation

I need to write a function which calculates the number of arrivals until time t in n trials. The arguments should be lambda (which I can assume to be between 0.1 and 1), the time t (which I can assume to be less than or equal to 1), and the number of counts to be sampled.
I've previously written a function which takes a vector of length n which has the first n-1 elements as inter-event times and the nth element as the time t, and it counts the number of events which occur before t.
inp <- readline(prompt="Input vector with each element separated by a space")
inp <- strsplit(inp," ")
inp <- as.integer(as.vector(inp[[1]]))
t <- tail(inp, n=1)
c.e <- function(x) {
  inp = x
  stopped = NA
  for (i in seq_along(inp)) {
    runsum <- sum(inp[1:i])
    cat("The sum of the", i, "first elements is", runsum, "\n")
    if (runsum > tail(inp, 1)) {
      stopped = i - 1
      break()
    }
  }
  stopped
}
cat(c.e(inp), "events occur in", t, "time units")
(e.g. inputting 1 2 3 4 7 would output that 3 events occur in 7 time units)
I think I need to use and possibly edit this function in order to get it to do what I need it do, but I'm really not sure how to do this. Any help would be appreciated :)
You could edit this into your old function if you want something like the printing and the nice inputs/outputs that you've got, but as for what you've actually asked for, it seems like all you need is counts <- function(lambda, t, n) sum(rpois(n, lambda) < t).
rpois generates the trial results, with parameters n (the number of trials) and lambda; we then compare them with t (the time) and sum the results.
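A minimal usage sketch (the seed and parameter values are arbitrary illustrations, not from the answer):
counts <- function(lambda, t, n) sum(rpois(n, lambda) < t)

set.seed(42)                          # arbitrary seed, only for reproducibility
counts(lambda = 0.5, t = 1, n = 10)   # how many of the 10 sampled counts fall below t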

how many unique powers are for x^y for x in 1-1000 and y in 1-1000 using R

Using R, for integers x and y ∈ [1, 1000], calculate how many unique powers x^y exist.
This is what I have right now; I just don't know how to eliminate the duplicate numbers:
x <- 1:1000
y <- 1:1000
for (i in x) {
  for (j in y) {
    print(i^j)
  }
}
A combinatorial approach to this could split the numbers from 1-1000 into equivalence classes where each number in the class is the power of some other number. For instance, we would split the numbers 1-10 into (1), (2, 4, 8), (3, 9), (5), (6), (7), (10). None of the powers of values between equivalence classes will coincide, so we can just handle each equivalence class separately.
num.unique.comb <- function(limit) {
  # Count number of powers in each equivalence class (labeled by lowest val)
  num.powers <- rep(0, limit)
  # Handle 1 as special case
  num.powers[1] <- 1
  # Beyond sqrt(limit), all unhandled numbers are in own equivalence class
  handled <- c(T, rep(F, limit-1))
  for (base in 2:ceiling(sqrt(limit))) {
    if (!handled[base]) {
      # Handle all the values in 1:limit that are powers of base
      num.handle <- floor(log(limit, base))
      handled[base^(1:num.handle)] <- T
      # Compute the powers of base that we cover
      num.powers[base] <- length(unique(as.vector(outer(1:num.handle, 1:limit))))
    }
  }
  num.powers[!handled] <- limit
  # Handle sums too big for standard numeric types
  library(gmp)
  print(sum(as.bigz(num.powers)))
}
num.unique.comb(10)
# [1] 76
num.unique.comb(1000)
# [1] 978318
One nice property of this combinatorial approach is that it's very fast compared to a brute-force approach. For instance, it takes less than 0.1 seconds to compute with limit set to 1000. This allows us to compute the result for much larger values:
# ~0.15 seconds
num.unique.comb(10000)
# [1] 99357483
# ~4 seconds
num.unique.comb(100000)
# [1] 9981335940
# ~220 seconds
num.unique.comb(1000000)
# [1] 999439867182
This is a pretty neat result -- in under 4 minutes we can compute the number of unique values within 1 trillion numbers, where each number can have up to 6 million digits!
Update: Based on this combinatorial code I've updated the OEIS entry for this sequence to include terms up to 10,000.
A brute-force approach would be to just compute all the powers and count the number of unique values:
num.unique.bf <- function(limit) {
  length(unique(as.vector(sapply(1:limit, function(x) x^(1:limit)))))
}
num.unique.bf(10)
# [1] 76
A problem with this brute-force analysis is that you are dealing with large numbers that will create numerical issues. For instance:
1000^1000
# [1] Inf
As a result we get an inaccurate value:
# Wrong due to numerical issues!
num.unique.bf(1000)
# [1] 119117
However, a package like gmp enables us to compute numbers even as large as 1000^1000. My computer has trouble storing all 1 million numbers in memory at once, so I'll write them to a file (size for n=1000 is 1.2 GB on my computer) and then compute the number of unique values in that file:
library(gmp)
num.unique.bf2 <- function(limit) {
  sink("foo.txt")
  for (x in 1:limit) {
    vals <- as.bigz(x)^(1:limit)
    for (idx in 1:limit) {
      cat(paste0(as.character(vals[idx]), "\n"))
    }
  }
  sink()
  as.numeric(system("sort foo.txt | uniq | wc -l", intern=T))
}
num.unique.bf2(10)
# [1] 76
num.unique.bf2(1000)
# [1] 978318
A quick visit to the OEIS (click the link for the first 1000 values) shows that this is correct. This approach is rather slow (roughly 40 minutes on my computer), and combinatorial approaches should be significantly faster.

Why do I get different answers for these two algorithms in R?

This is quite literally the first problem in Project Euler. I created these two algorithms to solve it, but they each yield different answers. Basically, the job is to write a program that sums all the multiples of 3 or 5 that are under 1000.
Here is the correct one:
divisors <- 0
for (i in 1:999){
  if ((i %% 3 == 0) || (i %% 5 == 0)){
    divisors <- divisors + i
  }
}
The answer it yields is 233168
Here is the wrong one:
divisors <- 0
for (i in 1:999){
  if (i %% 3 == 0){
    divisors <- divisors + i
  }
  if (i %% 5 == 0){
    divisors <- divisors + i
  }
}
This gives the answer 266333
Can anyone tell me why these two give different answers? The first is correct, and obviously the simpler solution. But I want to know why the second one isn't correct.
EDIT: fudged the second answer on accident.
Because multiples of 15 will add i once in the first code sample and twice in the second code sample. Multiples of 15 are multiples of both 3 and 5.
To make them functionally identical, the second would have to be something like:
divisors <- 0
for (i in 1:999) {
  if (i %% 3 == 0) {
    divisors <- divisors + i
  } else {
    if (i %% 5 == 0) {
      divisors <- divisors + i
    }
  }
}
But, to be honest, your first sample seems far more logical to me.
As an aside (and moot now that you've edited it), I'm also guessing that your second output value of 26633 was a typo. Unless R wraps integers around at some point, I'd expect it to be more than the first example (such as the value 266333 which I get from a similar C program, so I'm assuming you accidentally left off a 3).
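As a purely illustrative aside (not from either answer), R's vectorised comparison operators let you compute the same sum without an explicit loop:
i <- 1:999
sum(i[i %% 3 == 0 | i %% 5 == 0])  # 233168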
I don't know R very well, but right off the bat, I see a potential problem.
In your first code block, the if statement is true if either of the conditions is true. In your second block, i is added twice whenever both conditions are met.
Consider the number 15. In your first code block, the if statement will trigger once, but in the second, both if statements will trigger, which is probably not what you want.
I can tell you exactly why that's incorrect, conceptually.
Take the summation of all integers up to 333 and multiply it by 3; you'll get x.
Take the summation of all integers up to 199 and multiply it by 5; you'll get y.
Take the summation of all integers up to 66 and multiply it by 15; you'll get z.
x + y = 266333
x + y - z = 233168
15 is divisible by both 3 and 5. You've counted all multiples of 15 twice.
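That counting argument is easy to check directly in R (199 appears because 5*199 = 995 is the largest multiple of 5 below 1000):
x <- 3  * sum(1:333)   # 166833: sum of the multiples of 3 below 1000
y <- 5  * sum(1:199)   # 99500:  sum of the multiples of 5 below 1000
z <- 15 * sum(1:66)    # 33165:  sum of the multiples of 15 below 1000
x + y                  # 266333: multiples of 15 counted twice
x + y - z              # 233168: each multiple counted exactly once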
