How can I complete this Fibonacci sequence evaluation in R?

Greetings good people of Stackland!
Recently I was given this task:
Generate the Fibonacci sequence in any language.
Evaluate whether each value is odd or even.
Sum the even numbers such that their total is not >500,000.
I chose to do this in R, as I am learning the language and thought it would be a good exercise.
I have managed to complete step 2 of the task but haven't been able to proceed any further. Please see code and comments below.
len <- 50
fibvals <- numeric(len)
fibvals[1] <- 1
fibvals[2] <- 1
for(i in 3:len) { fibvals[i] <- fibvals[i-1]+fibvals[i-2]}
fibvals
[1] 1 1 2 3 5
[6] 8 13 21 34 55
[11] 89 144 233 377 610
[16] 987 1597 2584 4181 6765
[21] 10946 17711 28657 46368 75025
[26] 121393 196418 317811 514229 832040
[31] 1346269 2178309 3524578 5702887 9227465
[36] 14930352 24157817 39088169 63245986 102334155
[41] 165580141 267914296 433494437 701408733 1134903170
[46] 1836311903 2971215073 4807526976 7778742049 12586269025
# Creates a variable called len in which the value 50 is stored
# Creates a var called fibvals, which is a numeric datatype, which should have len (50) vals
# Sets the value of the first entry in fibvals to 1
# Sets the value of the second entry in fibvals to 1
# Loop - "for (i in 3:len)" dictates that the loop should be executed between step 3 and step 50 (denoted by "len")
# Loop - Defines a loop step "i" as being the result of the (current i - the before it) + (current i - i two before it)
# Loop - Example 5 = (5-3) + (5-2) OR 2 + 3 = 5 | Example 21 = (21-13) + (21-8) OR 8 + 13 = 21
is.even <- function(x){ x %% 2 == 0 }
# Creates a UDF to check if values are odd or even by using modulo:
# if the remainder is 0 when a value is divided by 2, it is an even number
is.even(fibvals)
[1] FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE TRUE FALSE
[11] FALSE TRUE FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE
[21] TRUE FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE TRUE
[31] FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE TRUE FALSE
[41] FALSE TRUE FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE
# Evaluates all Fibonacci values on odd or even property
What I need is a bit of guidance as to where I should go from here.
Should I create a data.table and query that using the SQL package, or is there a much more elegant and less cumbersome way?
Thanks in advance!

To pick out the even numbers from the first 50 Fibonacci numbers, you can use:
even_numbers <- fibvals[fibvals %% 2 == 0]
Then compute the cumulative sum of those even numbers and impose the cap on the running total:
cumsum(even_numbers) < 500000
Your desired Fibonacci numbers are therefore
even_numbers[cumsum(even_numbers) < 500000]
and their sum is
sum(even_numbers[cumsum(even_numbers) < 500000])
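For reference (my computed check, using the fibvals printed in the question), the selected even values and their capped sum come out as:
even_numbers[cumsum(even_numbers) < 500000]
# [1]      2      8     34    144    610   2584  10946  46368 196418
sum(even_numbers[cumsum(even_numbers) < 500000])
# [1] 257114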

This would do it:
fsum <- 0
for (i in 1:len) {
  if (is.even(fibvals[i]) && (fsum + fibvals[i]) <= 500000) {
    fsum <- fsum + fibvals[i]
  }
}
The sum would then be stored in fsum.

Here's a way to do it with a recursive function:
getEvenWithFibber <- function(y = c(1, 1), s = 0, threshold = 500000) {
  if (s + y[1] + y[2] < threshold)
    getEvenWithFibber(y = c(y[1] + y[2], y), s = s + ifelse(y[1] %% 2 == 0, y[1], 0))
  else
    list(sum = s, seq = y, iseven = y %% 2 == 0)
}
getEvenWithFibber()

Related

Looping through 2 vectors of different dimension in R

I have two character vectors a and b with different lengths. I have to take each element of a, compare it with all elements of b, and note the element if there is a close match. For matching I'm using the agrepl function.
Following is the sample data
a <- c("US","Canada","United States","United States of America")
b <- c("United States","U.S","United States","Canada", "America", "Spain")
Following is the code that I'm using to match. Please help me avoid the for loop, as my real data has more than 900 and 5,000 records respectively.
for(i in 1:4) {
  for(j in 1:6) {
    bFlag <- agrepl(a[i], b[j], max.distance = 0.1, ignore.case = TRUE)
    if(bFlag) {
      # Custom logic
    } else {
      # Custom logic
    }
  }
}
You don't need a double loop, since agrepl's second argument accepts vectors of length >= 1. So you could do something like:
lapply(a, function(x) agrepl(x, b, max.distance = 0.1, ignore.case = TRUE))
# [[1]]
# [1] TRUE TRUE TRUE FALSE FALSE TRUE
#
# [[2]]
# [1] FALSE FALSE FALSE TRUE FALSE FALSE
#
# [[3]]
# [1] TRUE FALSE TRUE FALSE FALSE FALSE
#
# [[4]]
# [1] FALSE FALSE FALSE FALSE FALSE FALSE
You can add some custom logic inside the lapply call if needed, but that's not specified in the question so I'll just leave the output as a list of logicals.
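For instance (my illustration, not part of the original answer), if the custom logic is simply to collect the matching strings themselves, you can subset b inside the call:
## reuses a and b as defined in the question
lapply(a, function(x) b[agrepl(x, b, max.distance = 0.1, ignore.case = TRUE)])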
If you want indices (of TRUEs) instead of logicals, you can use agrep instead of agrepl:
lapply(a, function(x) agrep(x, b, max.distance = 0.1,ignore.case = TRUE))
# [[1]]
# [1] 1 2 3 6
#
# [[2]]
# [1] 4
#
# [[3]]
# [1] 1 3
#
# [[4]]
# integer(0)
If you only want the first TRUE index, you can use:
sapply(a, function(x) agrep(x, b, max.distance = 0.1,ignore.case = TRUE)[1])
# US Canada United States United States of America
# 1 4 1 NA

Vectorize R command (part 2)

Yesterday I asked a very simple vectorization question and got some great answers. Today the question is a bit more complex and I'm wondering if R has a function to speed up the runtime of this loop through vectorization.
The loop is
for(j in 1:N) {
  A[j, 1] = B[max(which(C[j] >= D))]
}
I tried
A[,1] = B[max(which(C>=D))];
and this dropped the runtime considerably ... but the answer was wrong. Is there a "correct" way to do this in R?
EDIT1:
Thanks for the questions regarding data. I will give sizes of the arrays here:
We are looping over 1:N
A is N x 1
B is length M
C is length N
D is length M
If it matters in terms of speed, in this example, N = 844, M = 2500.
EDIT2:
And here are some values for a smaller simulated dataset:
B <- c(1.0000000, 1.0000000, 1.0000000, 0.9565217, 0.9565217, 0.9565217, 0.9565217,
0.9565217, 0.9565217, 0.9565217, 0.8967391, 0.8369565, 0.7771739, 0.7173913,
0.7173913, 0.7173913, 0.7173913, 0.7173913, 0.6277174, 0.6277174, 0.5230978,
0.5230978, 0.3923234, 0.3923234, 0.3923234)
C <- c(0.10607, 0.14705, 0.43607, 0.56587, 0.76203, 0.95657, 1.03524, 1.22956, 1.39074, 2.36452)
D <- c(0.10607, 0.13980, 0.14571, 0.14705, 0.29412, 0.33693, 0.43607, 0.53968, 0.56587,
0.58848, 0.64189, 0.65475, 0.75518, 0.76203, 0.95657, 1.03524, 1.05454, 1.18164,
1.22956, 1.23760, 1.39074, 1.87604, 2.36452, 2.89497, 4.42393)
The result should be:
> A
[,1]
[1,] 1.0000000
[2,] 0.9565217
[3,] 0.9565217
[4,] 0.9565217
[5,] 0.7173913
[6,] 0.7173913
[7,] 0.7173913
[8,] 0.6277174
[9,] 0.5230978
[10,] 0.3923234
If you are eager to get the answer immediately, jump to Conclusion. I offer a single line of R code with maximum efficiency. For details and ideas, read through the following.
Code re-shaping and problem re-definition
When OP asks for vectorization of the following loop:
for(j in 1:N) A[j, 1] <- B[max(which(C[j] >= D))]
the first thing I do is transform it into a nicer version:
## stage 1: index computation (need vectorization)
id <- integer(N); for(j in 1:N) id[j] <- max(which(D <= C[j]))
## stage 2: shuffling (readily vectorized)
A[, 1] <- B[id]
Now we see that only stage 1 needs to be vectorized. This stage essentially does the following:
D[1] D[2] D[3] ... D[M]
C[1]
C[2]
C[3]
.
.
C[N]
For each row j, find the cut off location k(j) in D, such that D[k(j) + 1], D[k(j) + 2], ..., D[M] > C[j].
Efficient algorithm based on sorting
There is actually an efficient algorithm to do this:
sort C in ascending order, into CC (record ordering index iC, such that C[iC] == CC)
sort D in ascending order, into DD (record ordering index iD, such that D[iD] == DD)
By sorting, we substantially reduce the work complexity.
If the data are unsorted, we have to explicitly scan all elements D[1], D[2], ..., D[M] in order to decide on k(j). That is an O(M) cost for each row, thus an O(MN) cost in total.
However, if the data are sorted, we only need to do the following:
j = 1: search `D[1], D[2], ..., D[k(1)]`, till `D[k(1) + 1] > C[1]`;
j = 2: search `D[k(1) + 1], D[k(1)+2], ..., D[k(2)]`, till `D[k(2) + 1] > C[2]`;
...
For each row, only a partial search is applied, and the overall complexity is only O(M): the D vector is touched once, rather than N times as in the trivial implementation. As a result, after sorting, the algorithm is N times faster! For large M and N, this is a huge difference. As you said in another comment, this code will be called millions of times, so we definitely want the O(M) algorithm instead of the O(MN) one.
Also note that the memory cost of this approach is O(M + N): we only concatenate two vectors, rather than expanding them into an M-by-N matrix, so the storage saving is noticeable as well.
In fact, we can take one step further and convert this comparison problem into a matching problem, which is easier to vectorize in R.
## version 1:
CCDD <- c(CC, DD) ## combine CC and DD
CCDD <- sort(CCDD, decreasing = TRUE) ## sort into descending order
id0 <- M + N - match(CC, CCDD) + 1
id <- id0 - 1:N
To understand why this works, consider an alternative representation:
## version 2:
CCDD <- c(CC, DD) ## combine CC and DD
CCDD <- sort(CCDD) ## sort into ascending order
id0 <- match(CC, CCDD)
id <- id0 - 1:N
Now the following diagram illustrates what CCDD vector looks like:
CCDD: D[1] D[2] C[1] D[3] C[2] C[3] D[4] D[5] D[6] C[4] .....
id0: 3 5 6 10 .....
id : 2 3 3 6 .....
So CCDD[id] gives D[2], D[3], D[3], D[6], ..., exactly the last elements no greater than C[1], C[2], C[3], C[4], .... Therefore, id is just the index we want!
People may wonder why I suggest "version 1" rather than "version 2". When there are tied values in CCDD, "version 2" gives the wrong result, because match() takes the first element that matches, ignoring later matches. So instead of matching from left to right (in ascending index), we have to match from right to left (in descending index).
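A tiny illustration of the tie problem (my example, not from the original answer): match() reports only the first of tied values, while reversing lets us recover the last one:
x <- c(1, 2, 2, 3)
match(2, x)                        # 2 -- the first of the tied positions
length(x) - match(2, rev(x)) + 1   # 3 -- the last tied position, via reversal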
Using OP's data
With this in mind, I start looking at OP's data. Now amazingly, OP's data are already sorted:
C <- c(0.10607, 0.14705, 0.43607, 0.56587, 0.76203, 0.95657, 1.03524, 1.22956, 1.39074, 2.36452)
D <- c(0.10607, 0.13980, 0.14571, 0.14705, 0.29412, 0.33693, 0.43607, 0.53968, 0.56587, 0.58848,
0.64189, 0.65475, 0.75518, 0.76203, 0.95657, 1.03524, 1.05454, 1.18164, 1.22956, 1.23760,
1.39074, 1.87604, 2.36452, 2.89497, 4.42393)
M <- length(D); N <- length(C)
is.unsorted(C)
# FALSE
is.unsorted(D)
#FALSE
Furthermore, OP has already combined C and D:
all(C %in% D)
# TRUE
It seems that OP and I have the same idea about efficiency. Presumably OP once had a shorter D vector, and the D vector he supplied is really the CCDD vector I mentioned above!
Now, in this situation, things are simple: we just need a single line:
id <- M - match(C, rev(D)) + 1
Note the rev(): OP has sorted D in ascending order, so I need to reverse it. This single line may look very different from the "version 1" code, but nothing is wrong here. Remember, the D used here is really the CCDD in "version 1", and the M here is really the M + N there. Also, there is no need to subtract 1:N from id, due to our different definition of D.
Checking result
Now, the trivial R-loop gives:
id <- integer(N); for(j in 1:N) id[j] <- max(which(D <= C[j]))
id
# [1] 1 4 7 9 14 15 16 19 21 23
Well, our single line, vectorized code gives:
id <- M - match(C, rev(D)) + 1
id
# [1] 1 4 7 9 14 15 16 19 21 23
Perfect match, hence we are doing the right thing.
Conclusion
So, Laurbert, this is the answer you want:
A[, 1] <- B[M - match(C, rev(D)) + 1]
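As a side note (my addition, not part of the original answer): base R's findInterval() performs exactly this sorted search, returning for each C[j] the largest k with D[k] <= C[j] (and 0 if there is none). Assuming D is sorted ascending and every C[j] >= D[1], the whole task reduces to:
A[, 1] <- B[findInterval(C, D)]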
You can use outer for this.
Your code:
A1 <- matrix(NA_real_, ncol = 1, nrow = length(C))
for(j in seq_along(C)) {
  A1[j, 1] = B[max(which(C[j] >= D))]
}
Test if the elements of C are larger/equal the elements of D with outer:
test <- outer(C, D, FUN = ">=")
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25]
# [1,] TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# [2,] TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# [3,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# [4,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# [5,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# [6,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# [7,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# [8,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
# [9,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE
#[10,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE
Note that this can use a lot of memory for large vectors.
Then find the last TRUE value in each row:
ind <- max.col(test, ties.method = "last") * (rowSums(test) > 0)
rowSums(test) > 0 tests whether a row contains any TRUE values and makes the corresponding element of ind 0 otherwise. It's undefined what you'd want to happen in this case. (A 0 index is ignored during subsetting; possibly you'd want NA in your final result instead?)
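If NA is indeed what you want, one possible tweak (my sketch, based on that assumption) is to turn the 0 indices into NA before subsetting, since indexing with NA yields NA while a 0 index silently drops the element and would misalign the result:
ind_na <- replace(ind, ind == 0, NA_integer_)
A2 <- as.matrix(B[ind_na], ncol = 1)   # rows of C with no TRUE become NA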
Now subset:
A2 <- as.matrix(B[ind], ncol = 1)
# [,1]
# [1,] 1.0000000
# [2,] 0.9565217
# [3,] 0.9565217
# [4,] 0.9565217
# [5,] 0.7173913
# [6,] 0.7173913
# [7,] 0.7173913
# [8,] 0.6277174
# [9,] 0.5230978
#[10,] 0.3923234
Are the results identical?
identical(A2, A1)
#[1] TRUE
The data (please use dput next time to provide example data):
B <- c(1.0000000, 1.0000000, 1.0000000, 0.9565217, 0.9565217, 0.9565217, 0.9565217,
0.9565217, 0.9565217, 0.9565217, 0.8967391, 0.8369565, 0.7771739, 0.7173913,
0.7173913, 0.7173913, 0.7173913, 0.7173913, 0.6277174, 0.6277174, 0.5230978,
0.5230978, 0.3923234, 0.3923234, 0.3923234)
C <- c(0.10607, 0.14705, 0.43607, 0.56587, 0.76203, 0.95657, 1.03524, 1.22956, 1.39074,
2.36452)
D <- c(0.10607, 0.13980, 0.14571, 0.14705, 0.29412, 0.33693, 0.43607, 0.53968, 0.56587,
0.58848, 0.64189, 0.65475, 0.75518, 0.76203, 0.95657, 1.03524, 1.05454, 1.18164,
1.22956, 1.23760, 1.39074, 1.87604, 2.36452, 2.89497, 4.42393)
After seeing @Roland's answer, I think I understand better what you are asking. To double check: you want to compare each value of C (individually) against all values of D, and get the largest index of D (let's call it k) that holds a value no larger than C[j]. You then want to use it to assign the corresponding value of B to A, so that A[j] = B[k]. Is this correct?
I don't have an answer regarding how to vectorize what you want to do, but I do have some suggestions on how to speed it up. Before that, let me ask whether it's actually worth going through the effort. For the larger example you mentioned (N~1000, M~2500), your loop still runs in well under a second on my laptop. Unless this calculation is done many times over inside another loop, it seems like unnecessary optimization...
Also, as @Roland pointed out, it's not clear what should happen if there is a value in C that's smaller than all values in D. These functions (including your original loop) will not work if that happens and would need some slight tweaking.
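For instance, one possible guard (my sketch; returning NA for such values of C is an assumption about the desired behavior):
f_vapply_safe <- function(B, C, D){
  vapply(C, function(x) {
    k <- which(x >= D)                      # indices of D not exceeding x
    if (length(k) > 0) B[max(k)] else NA_real_
  }, numeric(1))
}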
Anyway, these are my suggestions:
First, let me wrap your loop into a function for convenience.
f_loop <- function(B, C, D){
  N <- length(C)
  A <- matrix(0, ncol = 1, nrow = N)
  for(j in 1:N) {
    A[j, 1] = B[max(which(C[j] >= D))]
  }
  return(A)
}
If you want it to look a bit more "R-like" you can replace the loop with one of the *apply functions. In this case, it also runs slightly faster than the loop.
vapply(C, function(x) B[max(which(x>=D))], 0)
## Wrapped into a function for easier reference
f_vapply <- function(B, C, D){
  vapply(C, function(x) B[max(which(x >= D))], 0)
}
My other suggestion is uglier (and not really "R-like"), but can help speed things up a lot (if that's the end goal here). I used the inline package to create a compiled version of your loop (note that depending on your OS and R setup, you may need to download additional tools or packages to be able to compile code).
## Translate loop into Fortran
loopcode <-
" integer i, j, k
do i = 1, n
k = 0
do j = 1, m
if (C(i) >= D(j)) k = j
end do
A(i) = B(k)
end do
"
## Compile into function
library(inline)
loopfun <- cfunction(sig = signature(A = "numeric", B = "numeric", C = "numeric",
                                     D = "numeric", n = "integer", m = "integer"),
                     dim = c("(n)", "(m)", "(n)", "(m)", "", ""),
                     loopcode, language = "F95")
## Wrap into function for easier reference
f_compiled <- function(B, C, D){
  A <- C
  n <- length(A)
  m <- length(B)
  out <- loopfun(A, B, C, D, n, m)
  return(as.matrix(out$A, ncol = 1))
}
Let's check that the results all match:
cbind(A, f_loop(B, C, D), f_vapply(B, C, D), f_compiled(B, C, D))
## [,1] [,2] [,3] [,4]
## [1,] 1.0000000 1.0000000 1.0000000 1.0000000
## [2,] 0.9565217 0.9565217 0.9565217 0.9565217
## [3,] 0.9565217 0.9565217 0.9565217 0.9565217
## [4,] 0.9565217 0.9565217 0.9565217 0.9565217
## [5,] 0.7173913 0.7173913 0.7173913 0.7173913
## [6,] 0.7173913 0.7173913 0.7173913 0.7173913
## [7,] 0.7173913 0.7173913 0.7173913 0.7173913
## [8,] 0.6277174 0.6277174 0.6277174 0.6277174
## [9,] 0.5230978 0.5230978 0.5230978 0.5230978
## [10,] 0.3923234 0.3923234 0.3923234 0.3923234
And check the speed:
microbenchmark(f_loop(B, C, D), f_vapply(B, C, D), f_compiled(B, C, D))
## Unit: microseconds
## expr min lq mean median uq max neval cld
## f_loop(B, C, D) 52.804 54.8075 57.34588 56.5420 58.4615 83.843 100 c
## f_vapply(B, C, D) 38.677 41.5055 43.21231 42.8825 44.1525 65.355 100 b
## f_compiled(B, C, D) 17.095 18.2775 20.55372 20.1770 21.4710 66.407 100 a
We can also try it with vectors of similar size to the larger ones you mentioned (note the change in units for the results):
## Make the vector larger for benchmark
B <- rep(B, 100) # M = 2500
C <- rep(C, 100) # N = 1000
D <- rep(D, 100) # M = 2500
microbenchmark(f_loop(B, C, D), f_vapply(B, C, D), f_compiled(B, C, D))
## Unit: milliseconds
## expr min lq mean median uq max neval cld
## f_loop(B, C, D) 24.380069 24.85061 25.99855 25.839282 25.952433 62.75721 100 b
## f_vapply(B, C, D) 23.543749 24.18427 25.34881 25.015859 25.179924 62.60746 100 b
## f_compiled(B, C, D) 1.976611 2.01403 2.06750 2.032864 2.057594 3.13658 100 a
EDIT:
I realized that if you always want the largest index of D for which C[j] >= D holds, it of course makes much more sense to loop through D starting from the end of the array and exit as soon as the first hit is found (instead of looping through the full array).
This is a small tweak to the Fortran code I wrote above that takes advantage of that.
loopcode <-
" integer i, j, k
do j = 1, n
k = 0
do i = m, 1, -1
if (C(j) >= D(i)) then
k = i
exit
end if
end do
A(j) = B(k)
end do
"
I won't include it in the benchmarks, because its runtime depends heavily on the actual data points. But it is obvious that the worst-case behavior is the same as the previous loop (e.g., if the index of interest occurs at the beginning, D is scanned in full), while the best case almost completely eliminates the scan of D (e.g., if the condition holds at the end of the array).

R - Convert this nested for loop (MATLAB) to R [duplicate]

This question already has an answer here:
R - My conditional and nested for loop take too long. How to vectorize?
(1 answer)
Closed 9 years ago.
I need to convert this for loop into R
for ii = 100:(size(start,1)-N)
if start(ii) == 1 && mean(start(ii-11:ii-1)) == 0
count = count + 1;
sif(count,:) = s(ii:ii+N-1);
time(count) = ii*1/FS;
end
end
The start vector is a one-dimensional vector of true and false values, about 3 million elements in total.
As loops in R take a long time, it takes about 3 hours to execute the code, so it needs to be vectorized.
If someone could help, I would really appreciate it.
Edit
Here is my R code with just a simple count (which takes hours to execute)
for(ii in 100:sp)
{
  if(start(ii) == 1 && mean(start(ii-11:ii-1)) == 0)
  {
    count = count + 1
  }
}
Edit-2
Here are the dummy values:
start:
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[13] TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
N:
[1] 882
FS:
[1] 44100
s:
[1] 1.762390e-01 1.797791e-01 1.826172e-01 1.795044e-01 1.724243e-01
[6] 1.665039e-01 1.640625e-01 1.634827e-01 1.628723e-01 1.606750e-01
I just created some dummy data:
set.seed(1234)
start = sample(c(TRUE,FALSE), 300000, replace=TRUE)
N = 882
count = 0
Your R code takes:
system.time(
  for(ii in 100:(length(start)-N))
  {
    if(start(ii) == 1 && mean(start((ii-11):(ii-1))) == 0)
    {
      count = count + 1
    }
  })
## user system elapsed
## 15.42 0.00 15.43
There is a function in R called start, and it was being called instead of indexing the vector start. The correct and faster way is:
system.time(
  for(ii in 100:(length(start)-N))
  {
    if(start[ii] == 1 && mean(start[(ii-11):(ii-1)]) == 0)
    {
      count = count + 1
    }
  })
## user system elapsed
## 2.04 0.00 2.04
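The count itself can also be vectorized (my sketch, not part of the original answer): the condition "start[ii] is TRUE and the previous 11 values are all FALSE" can be expressed with cumsum, which removes the loop entirely:
cs <- c(0, cumsum(start))          # cs[k + 1] = number of TRUEs in start[1:k]
idx <- 100:(length(start) - N)
prev11 <- cs[idx] - cs[idx - 11]   # TRUEs among start[(ii-11):(ii-1)]
count <- sum(start[idx] & prev11 == 0)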

Creation of a specific vector without loop or recursion in R

I've got a first vector, let's say x, that consists only of 1's and -1's. Then I have a second vector y that consists of 1's, -1's, and zeros. Now I'd like to create a vector z that contains a 1 at index i if x[i] equals 1 and a 1 exists within the n preceding elements of y (y[(i-n):i]).
More formally: z <- ifelse(x == 1 && 1 %in% y[(index(y)-n):index(y)], 1, 0)
I'm looking to create such a vector in R without looping or recursion. The proposition above does not work, since it does not evaluate the expression y[(index(y)-n):index(y)] element by element.
Thanks a lot for your support
Here's an approach that uses the cumsum function to count the number of ones seen so far. If the count of ones at position i is larger than the count at position i-n, then a 1 occurs within the window and the condition is satisfied.
## Generate some random y's.
> y <- sample(-1:1, 25, replace=T)
> y
[1] 0 1 -1 -1 -1 -1 -1 1 -1 -1 -1 -1 0 0 -1 -1 -1 1 -1 1 1 0 0 0 1
> n <- 3
## Compute number of ones seen at each position.
> cs <- cumsum(ifelse(y == 1, 1, 0))
> lagged.cs <- c(rep(0, n), cs[1:(length(cs)-n)])
> (cs - lagged.cs) > 0
[1] FALSE TRUE TRUE TRUE FALSE FALSE FALSE TRUE TRUE TRUE FALSE FALSE
[13] FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE FALSE
[25] TRUE
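To get the z vector from the question, combine this window test with the condition on x (my one extra line, assuming x and y have the same length):
z <- as.numeric(x == 1 & (cs - lagged.cs) > 0)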
You could use lapply like this, although it is essentially a pretty way to write a loop; I'm not sure if it will be faster (it may or may not be).
y1 <- unlist(lapply(1:length(x), function(i){1 %in% y[max(0, (i-n)):i]}))
z <- as.numeric(x==1) * as.numeric(y1)

Insert elements into a vector at given indexes

I have a logical vector, for which I wish to insert new elements at particular indexes. I've come up with a clumsy solution below, but is there a neater way?
probes <- rep(TRUE, 15)
ind <- c(5, 10)
probes.2 <- logical(length(probes)+length(ind))
probes.ind <- ind + 1:length(ind)
probes.original <- (1:length(probes.2))[-probes.ind]
probes.2[probes.ind] <- FALSE
probes.2[probes.original] <- probes
print(probes)
gives
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
and
print(probes.2)
gives
[1] TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE FALSE
[13] TRUE TRUE TRUE TRUE TRUE
So it works but is ugly looking - any suggestions?
These are all very creative approaches. I think working with indexes is definitely the way to go (Marek's solution is very nice).
I would just mention that there is a function to do roughly that: append().
probes <- rep(TRUE, 15)
probes <- append(probes, FALSE, after=5)
probes <- append(probes, FALSE, after=11)
Or you could do this in a loop with your indexes (you need to grow the "after" value on each iteration):
probes <- rep(TRUE, 15)
ind <- c(5, 10)
for(i in 0:(length(ind)-1))
probes <- append(probes, FALSE, after=(ind[i+1]+i))
Incidentally, this question was also previously asked on R-Help. As Barry says:
"Actually I'd say there were no ways of doing this, since I dont think you can actually insert into a vector - you have to create a new vector that produces the illusion of insertion!"
You can do some magic with indexes:
First create vector with output values:
probs <- rep(TRUE, 15)
ind <- c(5, 10)
val <- c( probs, rep(FALSE,length(ind)) )
# > val
# [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
# [13] TRUE TRUE TRUE FALSE FALSE
Now the trick: each old element gets an integer rank, each new element gets a half-rank.
id <- c( seq_along(probs), ind+0.5 )
# > id
# [1] 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0 11.0 12.0 13.0 14.0 15.0
# [16] 5.5 10.5
Then use order() to put the values in the proper order:
val[order(id)]
# [1] TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE FALSE
# [13] TRUE TRUE TRUE TRUE TRUE
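The same trick generalizes to arbitrary inserted values (my wrapper, not from the original answer):
insert_at <- function(x, ind, values) {
  ## originals get integer ranks; new values get half-ranks just after ind
  id <- c(seq_along(x), ind + 0.5)
  c(x, values)[order(id)]
}
insert_at(rep(TRUE, 15), c(5, 10), c(FALSE, FALSE))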
probes <- rep(TRUE, 1000000)
ind <- c(50:100)
val <- rep(FALSE,length(ind))
new.probes <- vector(mode="logical",length(probes)+length(val))
new.probes[-ind] <- probes
new.probes[ind] <- val
Some timings:
My method
user system elapsed
0.03 0.00 0.03
Marek method
user system elapsed
0.18 0.00 0.18
R append with for loop
user system elapsed
1.61 0.48 2.10
How about this:
> probes <- rep(TRUE, 15)
> ind <- c(5, 10)
> probes.ind <- rep(NA, length(probes))
> probes.ind[ind] <- FALSE
> new.probes <- as.vector(rbind(probes, probes.ind))
> new.probes <- new.probes[!is.na(new.probes)]
> new.probes
[1] TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE FALSE
[13] TRUE TRUE TRUE TRUE TRUE
That is sorta tricky. Here's one way. It iterates over the insertion points, inserting one element each time, so it's not too efficient.
probes <- rep(TRUE, 15)
probes.ind <- ind + 0:(length(ind)-1)
for (i in probes.ind) {
probes <- c(probes[1:i], FALSE, probes[(i+1):length(probes)])
}
> probes
[1] TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE FALSE
[13] TRUE TRUE TRUE TRUE TRUE
This should even work if ind has repeated elements, although ind does need to be sorted for the probes.ind construction to work.
Or you can do it using the insertRow function from the miscTools package.
library(miscTools)
probes <- rep(TRUE, 15)
ind <- c(5,10)
for (i in ind){
  probes <- as.vector(insertRow(as.matrix(probes), i, FALSE))
}
I came up with an answer that's easy to understand and fairly fast to run, building off Wojciech's answer above. I'll adapt the method to the example here, but it can easily be generalized to pretty much any data type and an arbitrary pattern of missing points (shown below).
probes <- rep(TRUE, 15)
ind <- c(5,10)
probes.final <- rep(FALSE, length(probes)+length(ind))
probes.final[-ind] <- probes
The data I needed this for is sampled at a regular interval, but many samples are thrown out, and the resulting data file only includes the timestamps and measurements for those retained. I needed to produce a vector containing all the timestamps, and a data vector with NAs inserted for the timestamps that were tossed. I used a "not in" helper function to make it a bit simpler.
`%notin%` <- Negate(`%in%`)
dat <- rnorm(50000) # Data given
times <- seq(from=554.3, by=0.1, length.out=70000) # "Original" time stamps
times <- times[-sample(2:69999, 20000)] # "Given" times with arbitrary points missing from interior
times.final <- seq(from=times[1], to=times[length(times)], by=0.1)
na.ind <- which(times.final %notin% times)
dat.final <- rep(NA, length(times.final))
dat.final[-na.ind] <- dat
Hi, I had the same doubt, but I couldn't understand what people had answered, because I'm still learning the language. So I tried to make my own, and I suppose it works! I created a vector and wanted to insert the value 100 after the 3rd, 5th and 6th indexes. This is what I wrote:
vector <- c(0:9)
indexes <- c(6, 3, 5)
indexes <- indexes[order(indexes)]
i <- 1
j <- 0
while(i <= length(indexes)){
  vector <- append(vector, 100, after = indexes[i] + j)
  i <- i + 1
  j <- j + 1
}
vector
The vector "indexes" must be in ascending order for this to work. This is why I put them in order at the third line.
The variable "j" is necessary because at each iteration, the length of the new vector increases and the original values are moved.
In the case you wish to insert the new value next to each other, simply repeat the number of the index. For instance, by assigning indexes <- c(3, 5, 5, 5, 6), you should get vector == 0 1 2 100 3 4 100 100 100 5 100 6 7 8 9
