A confusion about `ifelse` - r

I am confused about how ifelse works and hope someone can explain.
Consider the code below:
x1 = c(1,4,3)
y1 = c(2,3,5)
# 1
> ifelse(x1 > y1, x1^2 + y1^2,y1)
[1] 2 25 5
# 2
> ifelse(x1 > y1, sum(x1),y1)
[1] 2 8 5
# Based on #1, I expected the second element to be sum(x1[2]) == sum(4) == 4, not 8.
Why?
Update:
After reading the book The Art of R Programming, I solved my problem.
ifelse(b,u,v) where b is a Boolean vector, and u and v are vectors.
The return value is itself a vector; element i is u[i] if b[i] is true, or v[i] if b[i] is false
So
ifelse(x1 > y1, sum(x1),y1) == ifelse(x1 > y1, c(sum(x1),sum(x1),sum(x1)),c(2,3,5)) # by recycling
# then b = c(F,T,F), u = c(8,8,8), v = c(2,3,5)
# therefore output would be (v[1],u[2],v[3]), i.e.
# [1] 2 8 5

sum(x1) is 8 because 1 + 4 + 3 = 8. You might wonder why ifelse seems to evaluate the two expressions differently. It does not. The expression x1^2 + y1^2 is computed element-wise, so it yields a vector of the same length as x1, namely c(5, 25, 34), and ifelse picks its second element. sum(), by contrast, collapses its whole argument to a single number, 8, which is then recycled to length 3. Try ifelse(x1 > y1, x1, y1) to see the element-wise selection on its own.
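To make the recycling visible, the sketch below evaluates each argument of ifelse separately before combining them. The names test, yes_vec and yes_sum are only illustrative and do not appear in the question.

```r
x1 <- c(1, 4, 3)
y1 <- c(2, 3, 5)

test <- x1 > y1                 # FALSE TRUE FALSE

# Case 1: the "yes" argument already has one value per element
yes_vec <- x1^2 + y1^2          # 5 25 34
res1 <- ifelse(test, yes_vec, y1)

# Case 2: sum() returns a single number; recycle it by hand to length 3
yes_sum <- sum(x1)              # 8
res2 <- ifelse(test, rep(yes_sum, length(test)), y1)

print(res1)  # 2 25 5
print(res2)  # 2 8 5

# Recycling by hand gives the same result as passing sum(x1) directly
same <- identical(res2, ifelse(test, sum(x1), y1))
print(same)
```

In both cases ifelse only selects by position; the difference comes entirely from what the "yes" argument evaluates to before selection.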

Related

How to add possible divisor numbers?

How do I retrieve the maximum sum of possible divisors of a number?
I have the function below, which returns the possible divisors of a number.
Code
divisors <- function(x) {
  y <- seq_len(ceiling(x / 2))
  y[x %% y == 0]
}
Example
Divisors of 99 will give the below possible values.
divisors(99)
[1] 1 3 9 11 33
My expected logic:
Go from the last value to the first value in the divisors output.
The last number is 33. The next smaller number that divides 33 is 11, so I select 11. Continuing from 11, the next smaller number that divides 11 is 1, so I select 1. Now add all the selected numbers.
33 + 11 + 1 = 45
Move to the next number, 11. The next smaller number that divides 11 is 1, so I select 1. Now add the numbers.
11 + 1 = 12
Move to the next number, 9. The next smaller number that divides 9 is 3, and the next smaller number that divides 3 is 1, so I select 3 and 1. Now add the numbers.
9 + 3 + 1 = 13
Move to the next number, 3. The next smaller number that divides 3 is 1, so I select 1. Now add the numbers.
3 + 1 = 4
The maximum among these is 45.
I am struggling to write this logic in R. Help or advice is much appreciated.
Note: prime numbers can be ignored.
Update
For large integers, e.g., values close to the maximum integer .Machine$integer.max (itself a prime number), you can run the code below (note that I modified the functions divisors and f a bit)
divisors <- function(x) {
  y <- seq(x / 2)
  y[as.integer(x) %% y == 0]
}
f <- function(y) {
  if (length(y) <= 2) {
    return(as.integer(sum(y)))
  }
  l <- length(y)
  h <- y[l]
  yy <- y[-l]
  h + f(yy[h %% yy == 0])
}
and you will see
> n <- .Machine$integer.max - 1
> x <- divisors(n)
> max(sapply(length(x):2, function(k) f(head(x, k))))
[1] 1569603656
You can define a recursive function f that gives successive divisors
f <- function(y) {
  if (length(y) == 1) {
    return(y)
  }
  h <- y[length(y)]
  yy <- y[-length(y)]
  c(f(yy[h %% yy == 0]), h)
}
and you will see all possible successive divisor tuples
> sapply(rev(seq_along(x)), function(k) f(head(x, k)))
[[1]]
[1] 1 11 33
[[2]]
[1] 1 11
[[3]]
[1] 1 3 9
[[4]]
[1] 1 3
[[5]]
[1] 1
Then, we apply f within sapply like below
> max(sapply(rev(seq_along(x)), function(k) sum(f(head(x, k)))))
[1] 45
which gives the desired output.
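The same walk can also be written without recursion. The following is only a sketch under the reading that, from each starting divisor, we repeatedly jump to the largest smaller divisor that divides the current value. chain_sum is a hypothetical helper name, not part of the answer above.

```r
# divisors() as given in the question
divisors <- function(x) {
  y <- seq_len(ceiling(x / 2))
  y[x %% y == 0]
}

# Sum of the chain that starts at `start` and repeatedly moves to the
# largest smaller element of `d` that divides the current value.
chain_sum <- function(d, start) {
  total <- 0
  cur <- start
  repeat {
    total <- total + cur
    smaller <- d[d < cur & cur %% d == 0]
    if (length(smaller) == 0) break
    cur <- max(smaller)
  }
  total
}

d <- divisors(99)
sums <- vapply(d, function(s) chain_sum(d, s), numeric(1))
print(sums)       # one chain sum per starting divisor
print(max(sums))  # 45
```

For 99 the chain sums are 1, 4, 13, 12 and 45, matching the hand calculation in the question.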
You can also use the following solution. It may look a little complicated, and there is surely an easier, more efficient one, but I thought it could be useful to you. I will start from your divisors output:
> x
[1] 1 3 9 11 33
# First I created a list whose first element is our original x and from then on
# I subset the first element till the last element of the list
lst <- lapply(0:(length(x)-1), function(a) x[1:(length(x)-a)])
> lst
[[1]]
[1] 1 3 9 11 33
[[2]]
[1] 1 3 9 11
[[3]]
[1] 1 3 9
[[4]]
[1] 1 3
[[5]]
[1] 1
Then I wrote a custom function to implement your conditions and gather the desired output. For this purpose I created a function factory, that is, a function that creates a function.
As you might have noticed, the outermost function does not take any arguments. It only sets up an empty vector out in which to save the selected elements. out is created in the execution environment of the outermost function, which shields it from anything that happens in the global environment.
The inner function is the one that takes our vector x, so in general we call the whole setup as fnf()(x). The first element of out is the first element of the sorted x (33). Then I found all elements of x that divide it, i.e., those that leave a remainder of 0. After I found them, I took the second one (11), since the first one is 33 itself, and stored it in out. Then I trimmed x so that it starts at that value (dropping 33) and repeated the same process.
Since the process repeats, I thought this was a good case for recursion. Recursion is a programming technique in which a function calls itself from inside its own body. As you might have noticed, I call fn inside fn to repeat the process, each time on a shorter vector.
This may sound a bit complicated, but I believe there are some good points here for you to pick up for future exploration. I found them very useful and hope that is the case for you too.
fnf <- function() {
  out <- c()
  fn <- function(x) {
    out <<- c(out, x[1])
    z <- x[out[length(out)] %% x == 0]
    if (length(z) >= 2) {
      out[length(out) + 1] <<- z[2]
    } else {
      return(out)
    }
    x <- x[!duplicated(x)][which(x[!duplicated(x)] == z[2]):length(x[!duplicated(x)])]
    fn(x)
    out[!duplicated(out)]
  }
}
# The result of applying the custom function on `lst` would result in your
# divisor values
lapply(lst, function(x) fnf()(sort(x, decreasing = TRUE)))
[[1]]
[1] 33 11 1
[[2]]
[1] 11 1
[[3]]
[1] 9 3 1
[[4]]
[1] 3 1
[[5]]
[1] 1
In the end we sum each element and extract the max value
Reduce(max, lapply(lst, function(x) sum(fnf()(sort(x, decreasing = TRUE)))))
[1] 45
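The function-factory idea used by fnf can be seen in isolation in a much smaller sketch. make_accumulator is a hypothetical helper written only for illustration; it shows how `<<-` updates a vector that lives in the enclosing function's environment, and that each call to the factory gets its own fresh copy.

```r
make_accumulator <- function() {
  out <- c()                 # lives in the factory's execution environment
  function(value) {
    out <<- c(out, value)    # `<<-` modifies `out` in the enclosing environment
    out
  }
}

acc <- make_accumulator()
acc(33)
acc(11)
res <- acc(1)
print(res)    # 33 11 1

# A second factory call starts from an empty `out` again
other <- make_accumulator()
res2 <- other(9)
print(res2)   # 9
```

This is why the answer calls fnf() anew for every element of lst: each call starts with an empty out.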
To test a very large integer, I used @ThomasIsCoding's modified divisors function:
divisors <- function(x) {
  y <- seq(x / 2)
  y[as.integer(x) %% y == 0]
}
x <- divisors(.Machine$integer.max - 1)
lst <- lapply(0:(length(x)-1), function(a) x[1:(length(x)-a)])
Reduce(max, lapply(lst, function(x) sum(fnf()(sort(x, decreasing = TRUE)))))
[1] 1569603656
You'll need to recurse. If I understand correctly, this should do what you want:
fact <- function(x) {
  x <- as.integer(x)
  div <- seq_len(abs(x)/2)
  factors <- div[x %% div == 0L]
  return(factors)
}
maxfact <- function(x) {
  factors <- fact(x)
  if (length(factors) < 3L) {
    return(sum(factors))
  } else {
    return(max(factors + mapply(maxfact, factors)))
  }
}
maxfact(99)
[1] 45

How can I find and replace a specific sequence of numbers in a vector in R?

I need to replace the sequence "1,0,1" with "1,1,1" whenever it is found in a vector. How can I do this?
x <- c(1,2,3,4,1,0,1)
Edit:
This search needs to be dynamic. If after changing from 1,0,1 to 1,1,1 another 1,0,1 occurs, this must also be replaced.
Considering:
x <- c (1,2,3,4,1,0,1,0,1,2)
I want the algorithm to do:
x <- c (1,2,3,4,1,1,1,0,1,2)
And after:
x <- c (1,2,3,4,1,1,1,1,1,2)
Here is a function that handles a sub-vector of any length. Solutions that convert to and from strings are likely to be inefficient for long vectors, and solutions that hard-code a sub-vector of length 3 only work for length 3. This one works as long as the source vector is at least as long as the sub-vector being sought.
#' Find a matching sub-vector
#'
#' Given a vector (`invec`) and a no-larger sub-vector (`subvec`),
#' determine if the latter occurs perfectly.
#' @param invec vector
#' @param subvec vector
#' @return integer positions, length 0 or more
find_subvec <- function(invec, subvec) {
  sublen <- seq_along(subvec) - 1L
  if (length(subvec) > length(invec)) return(integer(0))
  which(
    sapply(seq_len(length(invec) - length(subvec) + 1L),
           function(i) all(subvec == invec[i + sublen]))
  )
}
Use:
find_subvec(c(1,2,3,4,1,0,1), c(1,0,1))
# [1] 5
find_subvec(c(1,2,3,4,1,0,1,0,1), c(1,0,1))
# [1] 5 7
A literal replacement:
x <- c(1,2,3,4,1,0,1)
y <- c(1,0,1)
z <- c(1,1,1)
ind <- find_subvec(x, y)
for (i in ind) x[i + seq_along(y) - 1] <- z
x
# [1] 1 2 3 4 1 1 1
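The question's edit asks for the search to be repeated after each replacement. One way to sketch that is to search again after every single replacement until nothing matches. replace_subvec and its max_iter argument are hypothetical names introduced here; the iteration cap is only a safeguard against patterns and replacements that would keep re-creating each other.

```r
# find_subvec() as in the answer above (vapply used for a type-stable result)
find_subvec <- function(invec, subvec) {
  sublen <- seq_along(subvec) - 1L
  if (length(subvec) > length(invec)) return(integer(0))
  which(
    vapply(seq_len(length(invec) - length(subvec) + 1L),
           function(i) all(subvec == invec[i + sublen]),
           logical(1))
  )
}

# Replace the first match, then search again, until no match remains.
replace_subvec <- function(x, pattern, replacement, max_iter = 10000L) {
  stopifnot(length(pattern) == length(replacement),
            !identical(pattern, replacement))
  for (iter in seq_len(max_iter)) {
    ind <- find_subvec(x, pattern)
    if (length(ind) == 0L) break
    x[ind[1] + seq_along(pattern) - 1L] <- replacement
  }
  x
}

x <- c(1, 2, 3, 4, 1, 0, 1, 0, 1, 2)
res <- replace_subvec(x, c(1, 0, 1), c(1, 1, 1))
print(res)  # 1 2 3 4 1 1 1 1 1 2
```

Re-running the search after each replacement costs more than a single pass, but it also catches matches that only appear because of an earlier replacement.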
There could be edge cases, as mentioned by @Onyambu, where the expected results are not clear, but one option could be:
x + (x == 0 & c(NA, head(x, -1)) == 1 & c(tail(x, -1), NA) == 1)
[1] 1 2 3 4 1 1 1
Here, it is not treating x as a string, but it is assessing whether the lag and lead values are 1 and the value in the middle is 0.
This should work well enough
library(tidyverse)
x <- c(1,2,3,4,1,0,1,0,1)
x %>%
reduce(str_c) %>%
str_replace_all("(?<=1)0(?=1)","1")
#> [1] "123411111"
Created on 2020-06-14 by the reprex package (v0.3.0)

Find the closest average of three numbers - code optimization

This may seem trivial, but I have code that finds the average of the closest two numbers in a set of three numbers. Here are five examples:
x1 <- c(1,5,7)
x2 <- c(NA,2,3)
x3 <- c(2,6,4)
x4 <- c(1,NA,NA)
x5 <- c(1,3,1)
I would get an output of
y1 = 6
y2 = 2.5
y3 = 4
y4 = 1
y5 = 1
respectively. Basically, finding the closest 2 numbers, then averaging them, accounting for NA and ties.
This code is a monstrosity:
x <- x[!is.na(x)]
x <- x[order(x)]
y <- ifelse(length(x) == 1, x,
       ifelse(length(x) == 2, mean(x),
         ifelse(length(x) == 3,
           ifelse(abs(x[1] - x[2]) == abs(x[2] - x[3]), mean(x),
             ifelse(abs(x[1] - x[2]) > abs(x[2] - x[3]), mean(x[2:3]),
               ifelse(abs(x[1] - x[2]) < abs(x[2] - x[3]), mean(x[1:2]),
                 "error"))), NA)))
It works, but because this is part of a larger for loop, I was wondering whether there is a better way of doing this.
We define an S3 generic with "list" and "default" methods.
The "default" method takes a vector and sorts it (which also removes NA values). If the length of what is left is <= 1, it returns the single value, or NA if there is none. If the length is 2, or the two successive differences are the same, it returns the mean of all values; otherwise, it finds the index of the first element of the closest pair and returns the mean of that pair.
The "list" method applies the default method repeatedly.
mean_min_diff <- function(x) UseMethod("mean_min_diff")
mean_min_diff.list <- function(x) sapply(x, mean_min_diff.default)
mean_min_diff.default <- function(x) {
  x0 <- sort(x)
  if (length(x0) <= 1) c(x0, NA)[1]
  else if (length(x0) == 2 || sd(diff(x0)) == 0) mean(x0)
  else mean(x0[seq(which.min(diff(x0)), length = 2)])
}
Now test it out:
mean_min_diff(x1)
## [1] 6
mean_min_diff(list(x1, x2, x3, x4, x5))
## [1] 6.0 2.5 4.0 1.0 1.0

R: Vectorize Finite Difference Equations

I'm trying to move some Fortran code to R for finite differences related to chemical kinetics.
Sample Fortran loop:
DOUBLE PRECISION, DIMENSION (2000,2) :: data=0.0
DOUBLE PRECISION :: k1=5.0, k2=20.0, dt=0.0005
DO i=2, 2000
data(i,1) = data(i-1,1) + data(i-1,1)*(-k1)*dt
data(i,2) = data(i-1,2) + ( data(i-1,1)*k1*dt - data(i-1,2)*k2*dt )
...
END DO
The analogous R code:
k1=5
k2=20
dt=0.0005
data=data.frame(cbind(c(500,rep(0,1999)),rep(0,2000)))
a.fun=function(y){
y2=y-k1*y*dt
return(y2)
}
apply(data,2,a.fun)
This overwrites my first value in the dataframe and leaves zeros elsewhere. I'd like to run this vectorized and not using a for loop since they are so slow in R. Also, my function only calculates the first column so far. I can't get the second column working until I get the syntax right on the first.
It's not necessarily true that R is bad at loops. It very much depends on what you are doing. Using k1, k2, dt and data from the question (i.e. the four lines beginning with k1=5) and formulating the problem in terms of an iterated matrix, the loop in the last line below returns nearly instantaneously on my PC:
z <- as.matrix(data)
m <- matrix(c(1-k1*dt, k1*dt, 0, 1-k2*dt), 2)
for(i in 2:nrow(z)) z[i, ] <- m %*% z[i-1, ]
(You could also try storing the vectors in columns of z rather than rows since R stores matrices by column.)
Here is the first bit of the result:
> head(z)
X1 X2
[1,] 500.0000 0.000000
[2,] 498.7500 1.250000
[3,] 497.5031 2.484375
[4,] 496.2594 3.703289
[5,] 495.0187 4.906905
[6,] 493.7812 6.095382
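If a loop-free version is still wanted, one possible sketch uses the fact that the first column is a geometric sequence and the second is a first-order linear recurrence, which stats::filter can evaluate with method = "recursive". This is an alternative formulation, not the approach of the answer above; the names a, b, u, n and a0 are assumptions made for the example, and the loop at the end is only there as a cross-check.

```r
k1 <- 5
k2 <- 20
dt <- 0.0005
n  <- 2000
a0 <- 500   # initial value of the first column, as in the question

# Column 1 is geometric: a[i] = a0 * (1 - k1*dt)^(i - 1)
a <- a0 * (1 - k1 * dt)^(0:(n - 1))

# Column 2 obeys b[i] = (1 - k2*dt) * b[i-1] + k1*dt * a[i-1], with b[1] = 0.
# A recursive filter computes y[i] = u[i] + f * y[i-1], starting from zero.
u <- c(0, k1 * dt * a[-n])
b <- as.numeric(stats::filter(u, 1 - k2 * dt, method = "recursive"))

print(head(cbind(a, b)))

# Cross-check against the explicit loop from the question's Fortran
z <- matrix(0, n, 2)
z[1, ] <- c(a0, 0)
for (i in 2:n) {
  z[i, 1] <- z[i - 1, 1] - k1 * dt * z[i - 1, 1]
  z[i, 2] <- z[i - 1, 2] + k1 * dt * z[i - 1, 1] - k2 * dt * z[i - 1, 2]
}
agree <- all.equal(cbind(a, b), z, check.attributes = FALSE)
print(agree)
```

The closed form for the first column relies on the update being a constant multiple of the previous value; a more complicated rate law would need the loop or a proper ODE solver.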
Maybe this can help.
I think you need an initial condition for data[1,2]. I assumed data[1,1] is 500 and data[1,2] is 0 at the start.
The code goes like this:
> ## Define two vectors x and y
> x <- seq(from=0,length=2000,by=0)
> y <- seq(from=0,length=2000,by=0)
>
> ## Constants
> k1 = 5.0
> dt = 0.0005
> k2 = 20.0
>
> ## Initialize x[1]=500 and y[1]=0
> x[1]=500
> y[1] = 0
>
> for (i in 2:2000){
+ x[i]=x[i-1]+x[i-1]*-k1*dt
+ y[i] = y[i-1]+x[i-1]*k1*dt-y[i-1]*k2*dt
+ }
>
> finaldata <- data.frame(x,y)
> head(finaldata)
x y
1 500.0000 0.000000
2 498.7500 1.250000
3 497.5031 2.484375
4 496.2594 3.703289
5 495.0187 4.906905
6 493.7812 6.095382
I hope this helps.

find contiguous responses greater than x

I'm trying to find the point at which participants reach 8 contiguous responses in a row that are greater than 3. For example:
x <- c(2,2,4,4,4,4,5,5,5,5,7)
I want to return the value 10.
I tried the code (thanks @DWin):
which( rle(x)$values>3 & rle(x)$lengths >= 8)
sum(rle(x)$lengths[ 1:(min(which(rle(x)$lengths >= 8))-1) ]) + 8
The problem with the above code is that it only works if the contiguous responses are all identical and greater than 3. Thus, for the x above, the code finds no match.
If:
x <- c(2,2,4,4,4,4,4,4,4,4,7)
the code works fine, but this isn't how my data looks.
Thanks in advance!
Why don't you create a new vector that contains the identical values that rle needs to work properly? You can use ifelse() for this and put everything into a function:
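FUN <- function(x, value, runlength) {
  # Recode to 1/0 so that rle() sees one run per stretch above `value`
  x2 <- ifelse(x > value, 1, 0)
  r <- rle(x2)
  # First run that is long enough and consists of values above `value`
  i <- which(r$values == 1 & r$lengths >= runlength)[1]
  if (is.na(i)) return(NA_integer_)
  # Position = total length of all earlier runs + runlength
  sum(r$lengths[seq_len(i - 1)]) + runlength
}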
> FUN(x, value = 3, runlength = 8)
[1] 10
You could just recode your data so that the responses only distinguish the condition of interest (greater than 3 or not). Your code will then work as it is if you replace x with x1.
x1 <- ifelse( x > 3, 4, 0 )
But if I were already doing this, I might rewrite the code slightly more clearly this way.
runl <- rle(x1)
i <- which( runl$lengths >= 8 & runl$values > 3 )[1]
sum( runl$lengths[seq_len(i - 1)] ) + 8
Here's a vectorized way of doing it with just cumsum and cummax. Let's take an example that has a short (less than length 8) sequence of elements greater than 3 as well as a long one, just to make sure it's doing the right thing.
> x <- c(2,2,4,5,6,7,2,2,4,9,8,7,6,5,4,5,6,9,2,2,9)
> x3 <- x > 3
> cumsum(x3) - cummax(cumsum(x3)*(!x3))
[1] 0 0 1 2 3 4 0 0 1 2 3 4 5 6 7 8 9 10 0 0 1
> which( cumsum(x3) - cummax(cumsum(x3)*(!x3)) == 8)[1]
[1] 16
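The trick works because cumsum(x3) counts all qualifying responses so far, while cummax(cumsum(x3) * (!x3)) remembers that count as of the most recent non-qualifying response; their difference is the length of the current run. Wrapped in a function, this could look like the sketch below. first_run_end and its argument names are hypothetical and chosen only for this example.

```r
# Position at which the first run of `runlength` consecutive values
# greater than `threshold` is completed; NA if there is no such run.
first_run_end <- function(x, threshold, runlength) {
  above <- x > threshold
  total <- cumsum(above)                       # qualifying values seen so far
  run <- total - cummax(total * (!above))      # length of the current run
  which(run == runlength)[1]
}

p1 <- first_run_end(c(2,2,4,4,4,4,5,5,5,5,7), threshold = 3, runlength = 8)
p2 <- first_run_end(c(2,2,4,5,6,7,2,2,4,9,8,7,6,5,4,5,6,9,2,2,9), 3, 8)
p3 <- first_run_end(c(1, 2, 3), 3, 2)

print(p1)  # 10
print(p2)  # 16
print(p3)  # NA
```

Because which(...)[1] on an empty result gives NA, a vector with no sufficiently long run returns NA rather than raising an error.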

Resources