R: How to make sequence (1,1,1,2,3,3,3,4,5,5,5,6,7,7,7,8)

Title says it all: how would I code a repeating sequence where the base unit is the vector c(1,1,1,2), repeated 4 times, with the values incremented by 2 on each repetition?
I've tried various combinations of rep, times, each and seq, but can't get the desired result.

c(1,1,1,2) + rep(seq(0, 6, 2), each = 4)
# [1] 1 1 1 2 3 3 3 4 5 5 5 6 7 7 7 8

The rep function allows for a vector of the same length as x to be used in the times argument. We can extend the desired pattern with the super secret rep_len.
rep(1:8, rep_len(c(3, 1), 8))
#[1] 1 1 1 2 3 3 3 4 5 5 5 6 7 7 7 8
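To see why this works, it may help to look at the intermediate times vector. A minimal sketch (the names values, counts and result are illustrative, not from the answer):

```r
values <- 1:8

# rep_len() recycles c(3, 1) until there is one count per value
counts <- rep_len(c(3, 1), length(values))
counts
# [1] 3 1 3 1 3 1 3 1

# each element of `values` is repeated by its matching count
result <- rep(values, times = counts)
result
# [1] 1 1 1 2 3 3 3 4 5 5 5 6 7 7 7 8
```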

I'm not sure if I get it right but what's wrong with something as simple as that:
base <- c(1,1,1,2)   # base repeat unit
step <- 2            # increment applied on each repetition
vec <- c(base, step+base, 2*step+base, 3*step+base)

I accepted luke as it is the easiest for me to understand (and closest to what I was already trying, but failing with!)
I have used this final form:
> c(1,1,1,2)+rep(c(0,2,4,6),each=4)
[1] 1 1 1 2 3 3 3 4 5 5 5 6 7 7 7 8

You could do:
pattern <- rep(c(3, 1), len = 50)
unlist(lapply(1:8, function(x) rep(x, pattern[x])))
[1] 1 1 1 2 3 3 3 4 5 5 5 6 7 7 7 8
This lets you adjust the length of the pattern through the len argument of rep, and it avoids the addition that some of the other answers use.
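The same idea can be wrapped in a small function so that the number of values and the repeat pattern become parameters. This is only a sketch; patterned_seq and its arguments are hypothetical names, not part of the answer above:

```r
# hypothetical helper: repeat 1..n, cycling through the counts in `pattern`
patterned_seq <- function(n, pattern = c(3, 1)) {
  counts <- rep(pattern, length.out = n)   # one repeat count per value
  unlist(lapply(seq_len(n), function(i) rep(i, counts[i])))
}

patterned_seq(8)
# [1] 1 1 1 2 3 3 3 4 5 5 5 6 7 7 7 8

patterned_seq(5, pattern = c(2, 1))
# [1] 1 1 2 3 3 4 5 5
```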

How about:
input <- c(1,1,1,2)
n <- 4
increment <- 2
sort(rep.int(seq.int(from = 0, by = increment, length.out = n), length(input))) + input
[1] 1 1 1 2 3 3 3 4 5 5 5 6 7 7 7 8

Related

Vector recycling concept in R

I am trying to understand the working of vector recycling in R. I have 2 vectors
c(2,4,6)
and
c(1,2)
And I want to use the rep() to produce an output as follows:
[1] 2 4 6 4 8 12
From what I understand of ?rep, there are times and each parameters that control the repetition, which I tried:
> rep(c(2,4,6), times=2)
[1] 2 4 6 2 4 6
But I also need the first vector to be multiplied by the first element of the second vector, and then by the second element of the second vector. I'm not sure how to proceed.
You can use:
rep(c(2,4,6), 2) * rep(c(1,2), each=3)
#[1] 2 4 6 4 8 12
or with auto recycling:
c(2,4,6) * rep(c(1,2), each=3)
#[1] 2 4 6 4 8 12
Alternatively, outer could be used:
c(outer(c(2,4,6), c(1,2)))
#[1] 2 4 6 4 8 12
Also crossprod could be used:
c(crossprod(t(c(2,4,6)), c(1,2)))
#[1] 2 4 6 4 8 12
Or %*%:
c(c(2,4,6) %*% t(c(1,2)))
#[1] 2 4 6 4 8 12
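The matrix-based variants all build the same 3 x 2 table of products and then flatten it with c(). A short sketch of the intermediate steps, using the illustrative names x, y and m:

```r
x <- c(2, 4, 6)
y <- c(1, 2)

m <- outer(x, y)   # 3 x 2 matrix with m[i, j] = x[i] * y[j]
m
#      [,1] [,2]
# [1,]    2    4
# [2,]    4    8
# [3,]    6   12

c(m)               # c() flattens the matrix column by column
# [1]  2  4  6  4  8 12

# recycling version: x is reused for each block of length(x) multipliers
x * rep(y, each = length(x))
# [1]  2  4  6  4  8 12
```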

R: How to create equal blocks of variables randomly?

I have a data frame of n = 20 variables (number of columns) spread over b = 5 blocks (4 variables per block).
I would like to create p = 4 random and equal sized blocks of variables from the 5 blocks of variables.
I tried :
sample(x = 1:p, size = n, replace = TRUE)
[1] 1 1 1 1 1 1 1 1 1 2 2 2 3 3 3 4 4 4 4 4
Example of expected result (5 variables per block):
[1] 4 1 2 1 4 2 3 1 2 3 2 1 4 3 1 2 3 3 4 4
Thanks for your help !
You can try:
sample(x = rep(1:p,n/p), size = n, replace = FALSE)
Having discussed this in comments below, here is a solution:
Create a vector that looks like what you want, and then use sample to randomly sort it by sampling the whole vector without replacement:
p <- 4
b <- 5
sample(rep(1:p, b), size = p * b)
[1] 3 1 4 3 3 4 1 1 4 2 2 4 3 2 1 2 2 4 3 1
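One way to check that the blocks really are equal in size is to tabulate the labels. The sketch below assumes n is a multiple of p, as in the question; the seed is arbitrary and only makes the run reproducible, and block is an illustrative name:

```r
set.seed(42)   # arbitrary seed, only for reproducibility
p <- 4         # number of blocks
n <- 20        # number of variables

# build n labels with each block repeated n / p times, then shuffle them
block <- sample(rep(seq_len(p), length.out = n))

table(block)   # every block label should appear n / p = 5 times
```

Because sampling is without replacement, the shuffle only changes the order of the labels, never how often each one occurs.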

Percolation clustering

Consider the following groupings:
> data.frame(x = c(3:5,7:9,12:14), grp = c(1,1,1,2,2,2,3,3,3))
   x grp
1  3   1
2  4   1
3  5   1
4  7   2
5  8   2
6  9   2
7 12   3
8 13   3
9 14   3
Let's say I don't know the grp values but only have a vector x. What is the easiest way to generate grp values, essentially an id field of groups of values within a threshold from each other? Is this a percolation algorithm?
One option would be to compare the next with the current value and check if the difference is greater than 1, and get the cumulative sum.
df1 <- data.frame(x = c(3:5,7:9,12:14))   # the x values from the question
df1$grp <- cumsum(c(TRUE, diff(df1$x) > 1))
df1$grp
#[1] 1 1 1 2 2 2 3 3 3
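The same idea generalizes to any gap threshold. A minimal sketch, assuming x is already sorted in increasing order; group_by_gap and threshold are hypothetical names introduced here:

```r
# hypothetical helper: start a new group whenever the gap exceeds `threshold`
group_by_gap <- function(x, threshold = 1) {
  # the leading TRUE opens group 1; each later TRUE bumps the group id
  cumsum(c(TRUE, diff(x) > threshold))
}

x <- c(3:5, 7:9, 12:14)

group_by_gap(x)
# [1] 1 1 1 2 2 2 3 3 3

group_by_gap(x, threshold = 2)
# [1] 1 1 1 1 1 1 2 2 2
```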
EDIT: From @geotheory's comments.

Wrap around subtraction

I have these numbers:
login.day$wday
[1] 5 6 7 1 2 3 4
and I want to map them to:
login.day$wday
[1] 4 5 6 7 1 2 3
1 is subtracted from each number, and if the answer is 0, it wraps around back to 7. This is embarrassingly simple but I just can't figure it out. My attempt keeps giving me a zero:
> (login.day$wday + 6) %% 7
[1] 4 5 6 0 1 2 3
Prefer solution in R. Is it possible to do with modulo arithmetic or must I use an if statement?
Mathematically equivalent to the other solution, and with some explanation.
(login.day$wday - 1 - 1) %% 7 + 1
The problem is that it is hard to do modular arithmetic with numbers starting at 1.
We start by subtracting 1 to shift everything down by 1, so we have zero-based numbers in the range [0,6].
We then subtract 1, because that is what we were trying to do to begin with.
Next, we take the modulus, and add 1 back to shift everything back up to the range [1,7].
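The three steps can be sketched as a general helper for any shift and cycle length. wrap_shift, k and n are hypothetical names used only for illustration; the sketch relies on R's %% returning a non-negative result for a positive modulus:

```r
# hypothetical helper: shift 1-based cyclic values by k, wrapping within 1..n
wrap_shift <- function(x, k, n = 7) {
  # to zero-based, apply the shift, wrap, then back to one-based
  (x - 1 + k) %% n + 1
}

wday <- c(5, 6, 7, 1, 2, 3, 4)

wrap_shift(wday, -1)
# [1] 4 5 6 7 1 2 3

wrap_shift(wday, 1)
# [1] 6 7 1 2 3 4 5
```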
(login.day$wday + 5) %% 7 + 1
perhaps?
The boundary conditions are 7 -> 6, 1 -> 7 and 2 -> 1.
The result had to involve %% 7 as you so rightly spotted.
And since the last of these boundary conditions results in 1, then we need to add 1 after doing the modulo, and reduce the number added before the modulo by 1.
I have a silly function I've written called shift that does this:
shift <- function(x = 1:10, n = 1) {
  if (n == 0) x <- x
  else x <- c(tail(x, -n), head(x, n))
  x
}
x <- c(5, 6, 7, 1, 2, 3, 4)
shift(x, -1)
# [1] 4 5 6 7 1 2 3
shift(x, -2)
# [1] 3 4 5 6 7 1 2
The use I had in mind for this was something like the following:
set.seed(1)
X <- sample(7, 20, TRUE)
X
# [1] 2 3 5 7 2 7 7 5 5 1 2 2 5 3 6 4 6 7 3 6
shift(sort(unique(X)), -1)[X]
# [1] 1 2 4 6 1 6 6 4 4 7 1 1 4 2 5 3 5 6 2 5
I like the solution of @merlin2011, but just to add to the options, here is a lookup table approach:
c(7, 1:6)[login.day$wday]
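A quick way to confirm that the two modulo expressions and the lookup table agree is to run all of them over every possible weekday. In this sketch wday stands in for login.day$wday, and a, b and d are illustrative names:

```r
wday <- 1:7   # every possible input value

a <- (wday - 1 - 1) %% 7 + 1   # shift to zero-based, subtract, wrap, shift back
b <- (wday + 5) %% 7 + 1       # same thing, with -2 written as +5 modulo 7
d <- c(7, 1:6)[wday]           # lookup table: position i holds the answer for i

a
# [1] 7 1 2 3 4 5 6

all(a == b, a == d)
# [1] TRUE
```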

Excel OFFSET function in r

I am trying to simulate the OFFSET function from Excel. I understand that this can be done for a single value but I would like to return a range. I'd like to return a group of values with an offset of 1 and a group size of 2. For example, on row 4, I would like to have a group with values of column a, rows 3 & 2. Sorry but I am stumped.
Is it possible to add this result to the data frame as another column using cbind or similar? Alternatively, could I use this in a vectorized function so I could sum or mean the result?
Mockup Example:
> df <- data.frame(a=1:10)
> df
    a
1   1
2   2
3   3
4   4
5   5
6   6
7   7
8   8
9   9
10 10
> #PROCESS
> df
    a     b
1   1    NA
2   2   (1)
3   3 (1,2)
4   4 (2,3)
5   5 (3,4)
6   6 (4,5)
7   7 (5,6)
8   8 (6,7)
9   9 (7,8)
10 10 (8,9)
This should do the trick:
df$b1 <- c(rep(NA, 1), head(df$a, -1))
df$b2 <- c(rep(NA, 2), head(df$a, -2))
Note that the result has to live in two columns, because an ordinary data frame column holds one simple value per row. (Unless you want to resort to complex numbers.) head with a negative n drops that many elements from the end; try head(1:10, -2). rep is repetition and c is concatenation. Assigning with <- to df$b1 adds the column if it doesn't exist yet.
What Excel calls OFFSET is sometimes also referred to as lag.
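Since the question also asks about summing or averaging the group, here is a minimal sketch that wraps the same construction in a helper and then combines the lagged columns. lag_by is a hypothetical name, and the sketch assumes k is at least 1 and smaller than length(x):

```r
# hypothetical helper: shift x down by k positions, padding the top with NA
lag_by <- function(x, k) {
  c(rep(NA, k), head(x, -k))
}

df <- data.frame(a = 1:10)
df$b1 <- lag_by(df$a, 1)   # value one row up
df$b2 <- lag_by(df$a, 2)   # value two rows up

# vectorized sum and mean of the group; rows with an NA stay NA
df$s <- df$b1 + df$b2
df$m <- rowMeans(df[, c("b1", "b2")])

df$s[4]
# [1] 5
df$m[4]
# [1] 2.5
```

Pass na.rm = TRUE to rowMeans if the partially filled rows near the top should yield a value instead of NA.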
EDIT: Following Greg Snow's comment, here's a version that's more elegant, but also more difficult to understand:
df <- cbind(df, as.data.frame((embed(c(NA, NA, df$a), 3))[,c(3,2)]))
Try it component by component to see how it works.
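Broken into components, the one-liner might look like the following sketch, where a, padded and e are illustrative names:

```r
a <- 1:10

padded <- c(NA, NA, a)   # two leading NAs: the first rows have no history
e <- embed(padded, 3)    # row i holds padded[i + 2], padded[i + 1], padded[i]

e[1:4, ]
#      [,1] [,2] [,3]
# [1,]    1   NA   NA
# [2,]    2    1   NA
# [3,]    3    2    1
# [4,]    4    3    2

e[1:4, c(3, 2)]          # keep the two lagged columns, older value first
#      [,1] [,2]
# [1,]   NA   NA
# [2,]   NA    1
# [3,]    1    2
# [4,]    2    3
```

Column 1 of e is just a itself, which is why only columns 3 and 2 are bound back onto the data frame.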
Do you want something like this?
> df <- data.frame(a=1:10)
> b=t(sapply(1:10, function(i) c(df$a[(i+2)%%10+1], df$a[(i+4)%%10+1])))
> s = sapply(1:10, function(i) sum(b[i,]))
> df = data.frame(df, b, s)
> df
    a X1 X2  s
1   1  4  6 10
2   2  5  7 12
3   3  6  8 14
4   4  7  9 16
5   5  8 10 18
6   6  9  1 10
7   7 10  2 12
8   8  1  3  4
9   9  2  4  6
10 10  3  5  8

Resources