Say I have a vector like so:
vector <- 1:9
#$ [1] 1 2 3 4 5 6 7 8 9
I now want to repeat each consecutive chunk of x elements (i to i+x-1) n times, like so for x = 3 and n = 2:
#$ [1] 1 2 3 1 2 3 4 5 6 4 5 6 7 8 9 7 8 9
I'm accomplishing this like so:
index <- NULL
x <- 3
n <- 2
for (i in 1:(length(vector)/x)) {
  index <- c(index, rep(c(1:x + (i-1)*x), n))
}
#$ [1] 1 2 3 1 2 3 4 5 6 4 5 6 7 8 9 7 8 9
This works just fine, but I have a hunch there's got to be a better way (especially since usually, a for loop is not the answer).
P.S.: the use case for this is actually repeating rows in a data frame, but just getting the index vector would be fine.
You can try to first split the vector, then use rep and unlist:
x <- 3 # this is the length of each subset sequence from i to i+x (see above)
n <- 2 # this is how many times you want to repeat each subset sequence
unlist(lapply(split(vector, rep(1:(length(vector)/x), each = x)), rep, n), use.names = FALSE)
# [1] 1 2 3 1 2 3 4 5 6 4 5 6 7 8 9 7 8 9
Or, you can try creating a matrix with x rows (so that each column holds one chunk) and converting it to a vector:
c(do.call(rbind, replicate(n, matrix(vector, nrow = x), FALSE)))
# [1] 1 2 3 1 2 3 4 5 6 4 5 6 7 8 9 7 8 9
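As a further sketch in the same spirit as the matrix approach, the chunks can be stored as matrix columns and the columns repeated by index. `repeat_chunks` is a hypothetical helper name, the sketch assumes the vector length is a multiple of x, and the data frame at the end is made up to illustrate the row-repeating use case from the question.

```r
# Hypothetical helper: repeat each chunk of x consecutive elements n times.
# Assumes length(v) is a multiple of x.
repeat_chunks <- function(v, x, n) {
  stopifnot(length(v) %% x == 0)
  m <- matrix(v, nrow = x)                      # one chunk per column
  cols <- rep(seq_len(ncol(m)), each = n)       # repeat each column n times
  as.vector(m[, cols, drop = FALSE])            # flatten column by column
}

index <- repeat_chunks(1:9, x = 3, n = 2)

# Made-up data frame, only to show the index being used on rows
df <- data.frame(id = 1:9, value = letters[1:9])
repeated <- df[index, ]
```

Because the chunk length is passed as the number of rows, this should also work when the number of chunks differs from x (for example `repeat_chunks(1:12, 3, 2)`).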
I was wondering if you had any idea what R code I could use to automate my process.
I would like to repeat "chunks" of an initial vector (Vec1). I divide the vector in groups of 4 values and repeat each group 5 times. Currently, with my bad technique, each time I add a new experiment to the analysis I have to manually create a vector to indicate which chunk I would like to repeat next. In the end I put the vector corresponding to each experiment together to get my desired output.
Vec1 is a simple numeric vector that grows in size with each new experiment: each new experiment extends the vector by 4 additional values.
Exp1 <- rep(Vec1 [1:4], times=5)
Exp2 <- rep(Vec1 [5:8], times=5)
Exp3 <- rep(Vec1 [9:12], times=5)
NewVector<- c(Exp1, Exp2, Exp3)
Could I use a trick to automate it?
Many thanks for the help,
Best regards,
Edouard M.
I don't know about "automate". You could write a function that takes the values 1:4 and adds multiples of 4 to it.
add_exp <- function(values = 1:4, n = 0) {
rep(values, 5) + 4 * n
}
Then add_exp() gives:
[1] 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4
And add_exp(n = 1) gives:
[1] 5 6 7 8 5 6 7 8 5 6 7 8 5 6 7 8 5 6 7 8
So you could get NewVector using:
NewVector<- c(add_exp(), add_exp(n = 1), add_exp(n = 2))
Or if you wanted to use lapply to supply the values of n:
NewVector <- unlist(lapply(0:2, function(x) add_exp(n = x)))
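If the goal is the values of Vec1 rather than the index positions, the same idea can be driven by the length of Vec1, so nothing has to be edited by hand when an experiment is added. This is a minimal sketch, assuming Vec1 always holds a multiple of 4 values. The contents of Vec1 are made up, and `chunk`, `reps` and `n_exp` are illustrative names.

```r
# Made-up data: three experiments of 4 values each
Vec1 <- seq(10, 120, by = 10)

chunk <- 4   # values added per experiment
reps  <- 5   # how many times each chunk is repeated

n_exp <- length(Vec1) / chunk          # number of experiments so far
stopifnot(n_exp == round(n_exp))       # assumption: no partial experiment

# For each experiment k, pick its chunk of Vec1 and repeat it
NewVector <- unlist(lapply(seq_len(n_exp), function(k) {
  rep(Vec1[(k - 1) * chunk + seq_len(chunk)], times = reps)
}))
```

Appending four more values to Vec1 and rerunning the last two statements would then pick up the new experiment automatically.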
Using sequence:
n <- 3L # number of experiments
v <- 4L # length of vector added for each experiment
r <- 5L # number of replications
sequence(rep(v, n*r), rep(seq(1, n*v, v), each = r))
#> [1] 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 5 6 7 8 5
#> [26] 6 7 8 5 6 7 8 5 6 7 8 5 6 7 8 9 10 11 12 9 10 11 12 9 10
#> [51] 11 12 9 10 11 12 9 10 11 12
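As far as I know, the two-argument form of sequence() (a vector of lengths plus a vector of starting values) needs R 4.0.0 or later. On older versions, the same index can be sketched with plain rep() arithmetic. The names `within_chunk` and `chunk_offset` are only illustrative.

```r
n <- 3L # number of experiments
v <- 4L # length of vector added for each experiment
r <- 5L # number of replications

# Position inside a chunk: 1..v, repeated for every replication of every experiment
within_chunk <- rep(seq_len(v), times = n * r)

# Offset of each experiment's chunk (0, v, 2v, ...), held for r * v elements
chunk_offset <- rep((seq_len(n) - 1L) * v, each = r * v)

idx <- within_chunk + chunk_offset
```

Indexing with `Vec1[idx]` would then give the repeated values themselves.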
How do I generate a vector containing the sequence of integers i in the range 1 < i < n? That is, every number in the vector should be a positive integer greater than 1 but less than n.
Here is what I tried below:
n <- 10
my_seq <- seq(from => 1, to =< n)
It gave me this error:
Error: unexpected '>' in "my_seq <- seq(from =>"
my expected output should be
[1] 2 3 4 5 6 7 8 9
It depends on which type of vector you need. Below are some examples:
If you want an ascending sequence (without duplicates):
seq(n-2)+1
# [1] 2 3 4 5 6 7 8 9
If you want to shuffle the values 2 to n-1:
sample(n-2)+1
# [1] 6 7 9 5 8 4 2 3
If you need random integers from 2 to n-1 that allow duplicates:
sample(n-2,replace = TRUE)+1
# [1] 5 2 8 9 4 3 6 9
You could generate the sequence using
n <- 10
2:(n-1)
#[1] 2 3 4 5 6 7 8 9
OR
seq(2, n - 1)
You can also do:
tail(head(1:n, -1), -1)
[1] 2 3 4 5 6 7 8 9
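One caveat with the colon and seq() forms: when n is 2 there is no integer strictly between 1 and n, but `2:(n-1)` and `seq(2, n - 1)` count downwards and return 2 1. A small sketch that returns an empty vector in that case, using a hypothetical helper name `open_range`:

```r
# Hypothetical helper: integers i with 1 < i < n, empty if there are none
open_range <- function(n) {
  seq_len(max(n - 2, 0)) + 1L
}

open_range(10)  # 2 3 4 5 6 7 8 9
open_range(3)   # 2
open_range(2)   # integer(0)
```

seq_len() is used here because it yields a zero-length vector for a count of 0 instead of counting down.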
I have a list of numbers and would like to find which is the next highest compared to each number in a data.frame. I have:
list <- c(3,6,9,12)
X <- c(1:10)
df <- data.frame(X)
And I would like to add a variable to df being the next highest number in the list. i.e:
X Y
1 3
2 3
3 3
4 6
5 6
6 6
7 9
8 9
9 9
10 12
I've tried:
df$Y <- which.min(abs(list-df$X))
but that gives an error message and would just get the closest value from the list, not the next above.
Another approach is to use findInterval:
df$Y <- list[findInterval(X, list, left.open=TRUE) + 1]
> df
X Y
1 1 3
2 2 3
3 3 3
4 4 6
5 5 6
6 6 6
7 7 9
8 8 9
9 9 9
10 10 12
You could do this...
df$Y <- sapply(df$X, function(x) min(list[list>=x]))
df
X Y
1 1 3
2 2 3
3 3 3
4 4 6
5 5 6
6 6 6
7 7 9
8 8 9
9 9 9
10 10 12
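The two answers differ when a value of X is larger than every number in the list: the sapply version calls min() on an empty vector (which gives Inf with a warning), while the findInterval version returns NA. A sketch that wraps the findInterval approach in a function, with `next_highest` and `candidates` as hypothetical names (the argument is not called `list`, to avoid masking the base function); sorting first is an added assumption so that unsorted input also works:

```r
# Hypothetical helper: smallest candidate that is >= each element of x,
# or NA when no candidate is large enough.
next_highest <- function(x, candidates) {
  candidates <- sort(candidates)   # findInterval needs sorted breakpoints
  candidates[findInterval(x, candidates, left.open = TRUE) + 1]
}

df <- data.frame(X = 1:10)
df$Y <- next_highest(df$X, c(3, 6, 9, 12))

next_highest(13, c(3, 6, 9, 12))  # NA, nothing in the list is >= 13
```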
I have a sensor that measures a variable, and when there is no connection it always returns the last value seen instead of NA. So in my vector I would like to replace these identical values with an imputed value (for example with na.approx).
set.seed(3)
vec <- round(runif(20)*10)
#### [1] 2 8 4 3 6 6 1 3 6 6 5 5 5 6 9 8 1 7 9 3
But I only want to tag runs longer than 2 (3 or more identical numbers), because 2 identical numbers can appear naturally. (In the previous example, the run to tag would be 5 5 5.)
I tried to do it with diff to tag my identical points (c(0, diff(vec) == 0)) but I don't know how to deal with the length == 2 condition...
EDIT
my expected output could be like this:
#### [1] 2 8 4 3 6 6 1 3 6 6 5 NA NA 6 9 8 1 7 9 3
(The second identical value of a sequence of 3 or more is very probably a wrong value too)
Thanks
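You can use the lag function from dplyr (stats::lag from base R does not shift the values of a plain vector). Note that this tags only the third and later values of a run, so the second 5 is kept:
library(dplyr)
set.seed(3)
vec <- round(runif(20)*10)
vec
# [1] 2 8 4 3 6 6 1 3 6 6 5 5 5 6 9 8 1 7 9 3
vec[vec == lag(vec) & vec == lag(vec, 2)] <- NA
vec
# [1] 2 8 4 3 6 6 1 3 6 6 5 5 NA 6 9 8 1 7 9 3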
You can use rle to get the indices of the positions where NA should be assigned.
vec[with(data = rle(vec),
         expr = unlist(sapply(which(lengths > 2), function(i)
             (sum(lengths[1:i]) - (lengths[i] - 2)):sum(lengths[1:i]))))] = NA
vec
#[1] 2 8 4 3 6 6 1 3 6 6 5 NA NA 6 9 8 1 7 9 3
As a function:
foo = function(X, length){
    replace(x = X,
            list = with(data = rle(X),
                        expr = unlist(sapply(which(lengths > length), function(i)
                            (sum(lengths[1:i]) - (lengths[i] - length)):sum(lengths[1:i])))),
            values = NA)
}
foo(X = vec, length = 2)
#[1] 2 8 4 3 6 6 1 3 6 6 5 NA NA 6 9 8 1 7 9 3
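The same rle idea can also be sketched as a logical mask, which avoids computing start and end indices by hand. `tag_stuck` and `max_run` are hypothetical names. Unlike foo above, this sketch always keeps only the first value of a long run, whatever `max_run` is. The vector is typed in literally so the example does not depend on the random seed.

```r
# Hypothetical helper: in runs longer than max_run, keep the first value
# and replace the rest with NA.
tag_stuck <- function(x, max_run = 2) {
  r <- rle(x)
  run_len    <- rep(r$lengths, times = r$lengths)  # length of the run each element is in
  pos_in_run <- sequence(r$lengths)                # 1, 2, 3, ... within each run
  x[run_len > max_run & pos_in_run > 1] <- NA
  x
}

vec <- c(2, 8, 4, 3, 6, 6, 1, 3, 6, 6, 5, 5, 5, 6, 9, 8, 1, 7, 9, 3)
tagged <- tag_stuck(vec)
```

The NA values could then be filled by an interpolation routine such as the na.approx mentioned in the question.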
Let's say I have a data frame with the following structure:
DF <- data.frame(x = 0:4, y = 5:9)
> DF
x y
1 0 5
2 1 6
3 2 7
4 3 8
5 4 9
What is the most efficient way to turn 'DF' into a data frame with the following structure:
w x y
1 0 5
1 1 6
2 1 6
2 2 7
3 2 7
3 3 8
4 3 8
4 4 9
Where w is a length 2 window rolling through the dataframe 'DF.' The length of the window should be arbitrary, i.e a length of 3 yields
w x y
1 0 5
1 1 6
1 2 7
2 1 6
2 2 7
2 3 8
3 2 7
3 3 8
3 4 9
I am a bit stumped by this problem, because the data frame can also contain an arbitrary number of columns, i.e. w,x,y,z etc.
/edit 2: I've realized edit 1 is a bit unreasonable, as xts doesn't seem to deal with multiple observations per data point
My approach would be to use the embed function. The first thing to do is to create a rolling sequence of indices into a vector. Take a data-frame:
df <- data.frame(x = 0:4, y = 5:9)
nr <- nrow(df)
w <- 3 # window size
i <- 1:nr # indices of the rows
iw <- embed(i,w)[, w:1] # matrix of rolling-window indices of length w
> iw
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 2 3 4
[3,] 3 4 5
wnum <- rep(1:nrow(iw),each=w) # window number
inds <- i[c(t(iw))] # the indices flattened, to use below
dfw <- sapply(df, '[', inds)
dfw <- transform(data.frame(dfw), w = wnum)
> dfw
x y w
1 0 5 1
2 1 6 1
3 2 7 1
4 1 6 2
5 2 7 2
6 3 8 2
7 2 7 3
8 3 8 3
9 4 9 3
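One thing to keep in mind with sapply(df, '[', inds) is that it builds a matrix, so a data frame with mixed column types would be coerced to a single type. A sketch that indexes the rows of the data frame directly and puts w first, as in the question. `rolling_windows` is a hypothetical helper name, and the row indices are built with rep() rather than embed().

```r
# Hypothetical helper: stack all rolling windows of w consecutive rows,
# with a leading column w giving the window number.
rolling_windows <- function(df, w) {
  n_win <- nrow(df) - w + 1
  stopifnot(n_win >= 1)
  starts <- seq_len(n_win)
  # row indices: start of each window plus offsets 0..(w - 1)
  inds <- rep(starts, each = w) + rep(seq_len(w) - 1L, times = n_win)
  out <- cbind(w = rep(starts, each = w), df[inds, , drop = FALSE])
  rownames(out) <- NULL
  out
}

DF <- data.frame(x = 0:4, y = 5:9)
res2 <- rolling_windows(DF, 2)
res3 <- rolling_windows(DF, 3)
```

Because the rows are taken with `df[inds, ]`, any number of columns of any type should carry over unchanged.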