Vector recycling concept in R - r

I am trying to understand the working of vector recycling in R. I have 2 vectors
c(2,4,6)
and
c(1,2)
I want to use rep() to produce the following output:
[1] 2 4 6 4 8 12
From what I understand of ?rep, the times and each parameters control how the repetition is done, so I tried:
> rep(c(2,4,6), times=2)
[1] 2 4 6 2 4 6
But in the desired output the first vector is multiplied by the first element of the second vector, and then by the second element of the second vector. I am not sure how to proceed.

You can use:
rep(c(2,4,6), 2) * rep(c(1,2), each=3)
#[1] 2 4 6 4 8 12
or with auto recycling:
c(2,4,6) * rep(c(1,2), each=3)
#[1] 2 4 6 4 8 12
Alternatively, outer could be used:
c(outer(c(2,4,6), c(1,2)))
#[1] 2 4 6 4 8 12
Also crossprod could be used:
c(crossprod(t(c(2,4,6)), c(1,2)))
#[1] 2 4 6 4 8 12
Or %*%:
c(c(2,4,6) %*% t(c(1,2)))
#[1] 2 4 6 4 8 12
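To see what the recycling in the second version is doing, here is a minimal sketch (the variable names x, y, multipliers, recycled and explicit are only for illustration). The shorter operand is repeated until it matches the longer one, which is why the explicit rep() on the left-hand side is optional:

```r
x <- c(2, 4, 6)
y <- c(1, 2)

# Repeat each element of y once per element of x: 1 1 1 2 2 2
multipliers <- rep(y, each = length(x))

# x has length 3 and multipliers has length 6, so x is recycled
# to 2 4 6 2 4 6 before the element-wise multiplication.
recycled <- x * multipliers

# Spelling the recycling out with rep() gives the same result.
explicit <- rep(x, times = length(y)) * multipliers

print(recycled)
```

Recycling is silent only when the longer length is a multiple of the shorter one. Computing x * y directly would recycle y to 1 2 1 and emit a warning, which is not what the question asks for.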

Related

Create all possible combinations from two values for each element in a vector in R

I am trying to create vectors in which each element takes one of two values, drawn from the corresponding positions of two different vectors.
For example, if there are two vectors a and b, where a is c(6,2,9) and b is c(12,5,15), then the output should be the following 8 vectors:
6 2 9
6 2 15
6 5 9
6 5 15
12 2 9
12 2 15
12 5 9
12 5 15
The following piece of code works,
aa1 <- c(6,12)
aa2 <- c(2,5)
aa3 <- c(9,15)
for (a1 in 1:2)
  for (a2 in 1:2)
    for (a3 in 1:2)
    {
      v <- c(aa1[a1], aa2[a2], aa3[a3])
      print(v)
    }
But I was wondering whether there is a simpler way to do this than writing nested for loops, since the number of loops grows with the number of elements in the final vector.
expand.grid makes all combinations of whatever vectors you pass it. In this case you first need to rearrange your vectors into a pair of first elements, a pair of second elements, and a pair of third elements, so that the ultimate call is:
expand.grid(c(6, 12), c(2, 5), c(9, 15))
A quick way to rearrange the vectors in base R is Map, the multivariate version of lapply, with c() as the function:
a <- c(6, 2, 9)
b <- c(12, 5, 15)
Map(c, a, b)
## [[1]]
## [1] 6 12
##
## [[2]]
## [1] 2 5
##
## [[3]]
## [1] 9 15
Conveniently expand.grid is happy with either individual vectors or a list of vectors, so we can just call:
expand.grid(Map(c, a, b))
## Var1 Var2 Var3
## 1 6 2 9
## 2 12 2 9
## 3 6 5 9
## 4 12 5 9
## 5 6 2 15
## 6 12 2 15
## 7 6 5 15
## 8 12 5 15
If Map is confusing, you can instead put a and b in a list and use purrr::transpose, which flips a list of two elements of length three into a list of three elements of length two:
library(purrr)
list(a, b) %>% transpose() %>% expand.grid()
This returns the same thing.
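Note that expand.grid varies its first argument fastest, so the rows come out in a different order from the listing in the question. If the order matters, one possible base R sketch is to reverse the list before expanding and reverse the columns afterwards (pairs, grid and combos are illustrative names, not part of the original answer):

```r
a <- c(6, 2, 9)
b <- c(12, 5, 15)

# Pair up first, second and third elements: list(c(6,12), c(2,5), c(9,15))
pairs <- Map(c, a, b)

# Expand the reversed list so that the last pair varies fastest,
# then put the columns back in their original order.
grid <- expand.grid(rev(pairs))
combos <- unname(as.matrix(grid[, ncol(grid):1]))

print(combos)
```

Each row of combos is then one of the 8 vectors, starting with 6 2 9 and ending with 12 5 15.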
I think what you're looking for is expand.grid.
a <- c(6,2,9)
b <- c(12,5,15)
expand.grid(a,b)
Var1 Var2
1 6 12
2 2 12
3 9 12
4 6 5
5 2 5
6 9 5
7 6 15
8 2 15
9 9 15

First occurrence of each value in a vector depending on a condition

From a vector:
v <- c(2,2,2,2,5,7,7,5,5,7,3,3,3)
and according to the condition v[i] != v[i+1], how can I obtain:
[1] 2 5 7 5 7 3
The rle function will do this. rle stands for run length encoding.
v <- c(2,2,2,2,5,7,7,5,5,7,3,3,3)
rle(v)$values
## [1] 2 5 7 5 7 3
This can also be done using diff:
v[c(TRUE,diff(v)!=0)]
#[1] 2 5 7 5 7 3
Or using rleid from library(data.table)
library(data.table)
setDT(list(v))[,V1[1L] ,rleid(V1)]$V1
#[1] 2 5 7 5 7 3
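The diff version relies on v being numeric. A small variation that applies the condition v[i] != v[i+1] literally, and so should also work for character vectors, is to compare the vector with a shifted copy of itself. This is only a sketch; keep and chars are made-up names, and it assumes a non-empty vector without NA values:

```r
v <- c(2, 2, 2, 2, 5, 7, 7, 5, 5, 7, 3, 3, 3)

# TRUE for the first element and wherever a value differs from its predecessor
keep <- c(TRUE, v[-1] != v[-length(v)])
firsts <- v[keep]
print(firsts)

# The same comparison on a character vector, where diff() would fail
chars <- c("a", "a", "b", "a")
char_firsts <- chars[c(TRUE, chars[-1] != chars[-length(chars)])]
print(char_firsts)
```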

Finding the minimum positive value

I guess I don't know which.min as well as I thought.
I'm trying to find the occurrence in a vector of a minimum value that is positive.
TIME <- c(0.00000, 4.47104, 6.10598, 6.73993, 8.17467, 8.80862, 10.00980, 11.01080, 14.78110, 15.51520, 16.51620, 17.11680)
For each value z from 1 to 19, I want the index of the element of TIME that is closest to but above z. I tried the following code:
vec <- sapply(seq(1,19,1), function(z) which.min((z-TIME > 0)))
vec
#[1] 2 2 2 2 3 3 5 5 7 7 8 9 9 9 10 11 12 1 1
To my mind, the last two values of vec should be '12, 12'. The reason it's doing this is because it thinks that '0.0000' is closest to 0.
So, I thought that maybe it was because I exported the data from external software and that 0.0000 wasn't really 0. But,
TIME[1]==0 #TRUE
Then I got further confused. Why do these give the answer of index 1, when really they should be an ERROR?
which.min(0 > 0 ) #1
which.min(-1 > 0 ) #1
I'll be glad to be put right.
EDIT:
In a nutshell, what is a better way to get this result:
#[1] 2 2 2 2 3 3 5 5 7 7 8 9 9 9 10 11 12 12 12
which shows the index of TIME that gives the smallest possible positive value when each of the values 1 to 19 is subtracted from the elements of TIME.
The natural function to use here (both to limit typing and for efficiency) is actually not which.min + sapply but the cut function, which will determine which range of times each of the values 1:19 falls into:
cut(1:19, breaks=TIME, right=FALSE)
# [1] [0,4.47) [0,4.47) [0,4.47) [0,4.47) [4.47,6.11) [4.47,6.11) [6.74,8.17)
# [8] [6.74,8.17) [8.81,10) [8.81,10) [10,11) [11,14.8) [11,14.8) [11,14.8)
# [15] [14.8,15.5) [15.5,16.5) [16.5,17.1) <NA> <NA>
# 11 Levels: [0,4.47) [4.47,6.11) [6.11,6.74) [6.74,8.17) [8.17,8.81) ... [16.5,17.1)
From this, you can easily determine what you're looking for, which is the index of the smallest element in TIME greater than the cutoff:
(x <- as.numeric(cut(1:19, breaks=TIME, right=FALSE))+1)
# [1] 2 2 2 2 3 3 5 5 7 7 8 9 9 9 10 11 12 NA NA
The last two entries appear as NA because there is no element in TIME that exceeds 18 or 19. If you wanted to replace these with the index of the last (largest) element in TIME, you could do so with replace:
replace(x, is.na(x), length(TIME))
# [1] 2 2 2 2 3 3 5 5 7 7 8 9 9 9 10 11 12 12 12
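As for why which.min(0 > 0) returns 1 instead of an error: which.min accepts logical input and treats FALSE as 0 and TRUE as 1, so it returns the position of the first FALSE, or 1 when every element is the same. A short sketch of that behaviour (the TIME values are shortened here for illustration):

```r
# A single FALSE is its own minimum, so the index is 1
single <- which.min(0 > 0)

# With mixed values the first FALSE (0) wins
mixed <- which.min(c(TRUE, TRUE, FALSE, TRUE))

# All TRUE: every element ties, so the first index is returned.
# This is the z = 18 and z = 19 case in the question.
TIME <- c(0, 4.47104, 17.1168)
all_true <- which.min(18 - TIME > 0)

# which() makes the "no match" case visible as NA instead
no_match <- which(TIME > 18)[1]

print(c(single, mixed, all_true, no_match))
```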
Here's one way:
x <- t(outer(TIME,1:19,`-`))
max.col(ifelse(x<0,x,Inf),ties="first")
# [1] 2 2 2 2 3 3 5 5 7 7 8 9 9 9 10 11 12 12 12
It's computationally wasteful to take all the differences in this way, since both vectors are ordered.
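Because TIME is sorted, another option that avoids the full matrix of differences is findInterval, which counts how many elements of TIME are less than or equal to each z. This is a sketch rather than a benchmarked solution; z and idx are illustrative names, and capping at length(TIME) is the same fallback choice as in the replace step above:

```r
TIME <- c(0.00000, 4.47104, 6.10598, 6.73993, 8.17467, 8.80862,
          10.00980, 11.01080, 14.78110, 15.51520, 16.51620, 17.11680)
z <- 1:19

# Number of elements of TIME that are <= z, plus one, is the index
# of the first element strictly greater than z.
idx <- findInterval(z, TIME) + 1

# For z beyond the last element, fall back to the last index.
idx <- pmin(idx, length(TIME))

print(idx)
```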

Excel OFFSET function in r

I am trying to simulate the OFFSET function from Excel. I understand that this can be done for a single value but I would like to return a range. I'd like to return a group of values with an offset of 1 and a group size of 2. For example, on row 4, I would like to have a group with values of column a, rows 3 & 2. Sorry but I am stumped.
Is it possible to add this result to the data frame as another column using cbind or similar? Alternatively, could I use this in a vectorized function so I could sum or mean the result?
Mockup Example:
> df <- data.frame(a=1:10)
> df
a
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
10 10
> #PROCESS
> df
a b
1 1 NA
2 2 (1)
3 3 (1,2)
4 4 (2,3)
5 5 (3,4)
6 6 (4,5)
7 7 (5,6)
8 8 (6,7)
9 9 (7,8)
10 10 (8,9)
This should do the trick:
df$b1 <- c(rep(NA, 1), head(df$a, -1))
df$b2 <- c(rep(NA, 2), head(df$a, -2))
Note that the result will have to live in two columns, because a data frame column normally holds one simple (atomic) value per row. (Unless you want to resort to list columns or complex numbers.) head with a negative argument drops that many elements from the tail; try head(1:10, -2). rep is repetition, c is concatenation. The <- assignment adds a new column if it's not there yet.
What Excel calls OFFSET is sometimes also referred to as lag.
EDIT: Following Greg Snow's comment, here's a version that's more elegant, but also more difficult to understand:
df <- cbind(df, as.data.frame((embed(c(NA, NA, df$a), 3))[,c(3,2)]))
Try it component by component to see how it works.
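On the follow-up about summing or averaging the group: once the lagged values sit in their own columns, rowSums and rowMeans can be applied across them directly. A minimal sketch, assuming the b1/b2 columns from the first version above; the names lagged, s and m are placeholders, and na.rm = TRUE is just one possible way to treat the incomplete first rows:

```r
df <- data.frame(a = 1:10)

# Lag by one and by two rows, padding the top with NA
df$b1 <- c(NA, head(df$a, -1))
df$b2 <- c(NA, NA, head(df$a, -2))

# Vectorized sum and mean over the two lagged columns
lagged <- df[, c("b1", "b2")]
df$s <- rowSums(lagged, na.rm = TRUE)
df$m <- rowMeans(lagged, na.rm = TRUE)

print(df)
```

With na.rm = TRUE the first row, which has no lagged values at all, gets a sum of 0 and a mean of NaN; drop the argument if NA is preferred there.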
Do you want something like this?
> df <- data.frame(a=1:10)
> b=t(sapply(1:10, function(i) c(df$a[(i+2)%%10+1], df$a[(i+4)%%10+1])))
> s = sapply(1:10, function(i) sum(b[i,]))
> df = data.frame(df, b, s)
> df
a X1 X2 s
1 1 4 6 10
2 2 5 7 12
3 3 6 8 14
4 4 7 9 16
5 5 8 10 18
6 6 9 1 10
7 7 10 2 12
8 8 1 3 4
9 9 2 4 6
10 10 3 5 8

Splitting a vector into two

How can I split a vector into two, such that each new vector is a random sample of the original? I always want to split exactly in half. For instance:
x <- 1:10
obj <- splitMyVector(x)
obj$a
> 5 3 9 7 10
obj$b
> 8 4 1 6 2
Note: the purpose for this is to do a split half reliability.
split(sample(x),letters[seq(length(x))%%2+1])
$a
[1] 9 7 10 4 2
$b
[1] 6 1 8 3 5
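Wrapped up as the splitMyVector function from the question, this could look as follows. It is only a sketch: the function name comes from the question, the internal names are made up, and indexing with sample.int is used instead of sample(x) so that a length-one numeric x is not silently treated as 1:x:

```r
splitMyVector <- function(x) {
  # Shuffle by position so the values of x are never reinterpreted
  shuffled <- x[sample.int(length(x))]
  # Alternate the labels a, b, a, b, ... over the shuffled values
  labels <- rep(c("a", "b"), length.out = length(x))
  split(shuffled, labels)
}

set.seed(42)  # only to make the example reproducible
x <- 1:10
obj <- splitMyVector(x)
print(obj$a)
print(obj$b)
```

For an odd-length vector, obj$a ends up with one more element than obj$b.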
