Picking a variable in R

I have two variables, X and state, which are given below:
set.seed(3)
state <- rbinom(15,4,0.6)
X <- c(1:15)
X
state
and the output is
> state
[1] 3 2 3 3 2 2 4 3 2 2 2 2 2 2 1
> X
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
I want to select the Xs corresponding to the same state. Any idea how to do this in R?

Using split, you get a list with one element per state (4 here):
ll <- split(X,state)
ll
$`1`
[1] 15
$`2`
[1] 2 5 6 9 10 11 12 13 14
$`3`
[1] 1 3 4 8
$`4`
[1] 7
ll[3]
$`3`
[1] 1 3 4 8
Generally, we use ave to perform operations by group.
For example here I get the mean of X by state:
ave(X,state,FUN = mean)
[1] 4.000000 9.111111 4.000000 4.000000 9.111111 9.111111 7.000000 4.000000 9.111111 9.111111 9.111111 9.111111 9.111111 9.111111 15.000000
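If only one state is of interest, the selection can be sketched in two equivalent ways. This is a minimal example that hard-codes the state vector printed in the question, so it does not depend on the random number generator; the names by_state and group_means are illustrative only. Note that ll[3] above selects by position and returns a list, whereas [["3"]] selects by name and returns the vector itself. tapply is included as a contrast to ave: it returns one value per group instead of one value per observation.

```r
# State vector copied from the output shown in the question
state <- c(3, 2, 3, 3, 2, 2, 4, 3, 2, 2, 2, 2, 2, 2, 1)
X <- 1:15

# Logical indexing: the Xs whose state equals 3
X[state == 3]            # 1 3 4 8

# Same result via split, selecting the list element by name
by_state <- split(X, state)
by_state[["3"]]          # 1 3 4 8

# One mean per state (ave would repeat it for every observation)
group_means <- tapply(X, state, mean)
group_means
```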

Another way is to put your variables in a data frame and then select them from there:
> df <- data.frame(x = X, state = state)
> df
x state
1 1 3
2 2 2
3 3 3
4 4 3
5 5 2
6 6 2
7 7 4
8 8 3
9 9 2
10 10 2
11 11 2
12 12 2
13 13 2
14 14 2
15 15 1
> df[df$state == 3,]
x state
1 1 3
3 3 3
4 4 3
8 8 3
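The same data frame can also be filtered with subset, or with %in% when several states are wanted at once. The sketch below rebuilds df from the values printed in the question; the name picked is just illustrative.

```r
# Data frame rebuilt from the values shown in the question
df <- data.frame(
  x = 1:15,
  state = c(3, 2, 3, 3, 2, 2, 4, 3, 2, 2, 2, 2, 2, 2, 1)
)

# subset() evaluates the condition inside the data frame
subset(df, state == 3)

# Several states at once: keep only the x column for states 1 and 4
picked <- df[df$state %in% c(1, 4), "x"]
picked                   # 7 15
```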

Related

R Subtracting columns within a list

I'd like to subtract specific columns within a list. I'm still learning how to properly use the apply functions. For example, given
> b <- list(data.frame(12:16, 3*2:6), data.frame(10:14, 2*1:5))
> b
[[1]]
X12.16 X3...2.6
1 12 6
2 13 9
3 14 12
4 15 15
5 16 18
[[2]]
X10.14 X2...1.5
1 10 2
2 11 4
3 12 6
4 13 8
5 14 10
I'd like some function x so that I get
> x(b)
[[1]]
X12.16 X3...2.6 <newcol>
1 12 6 6
2 13 9 4
3 14 12 2
4 15 15 0
5 16 18 -2
[[2]]
X10.14 X2...1.5 <newcol>
1 10 2 8
2 11 4 7
3 12 6 6
4 13 8 5
5 14 10 4
Thanks in advance.
If your data.frames had nice and consistent names, you could use transform with lapply
b <- list(data.frame(a=12:16, b=3*2:6), data.frame(a=10:14, b=2*1:5))
lapply(b, transform, c=a-b)
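If the data frames come without usable names, as in the question, one possible approach is to rename the columns first and then apply transform. This is only a sketch: the column names a, b and c and the variable names named and res are arbitrary choices, not something the question prescribes.

```r
# The list from the question, with auto-generated column names
b <- list(data.frame(12:16, 3*2:6), data.frame(10:14, 2*1:5))

# Give every data frame the same two column names
named <- lapply(b, setNames, c("a", "b"))

# transform() looks up a and b inside each data frame
res <- lapply(named, transform, c = a - b)
res
```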
Here is a solution:
lapply(b, function(x) {
x[, 3] <- x[, 1] - x[, 2]
x
})
[[1]]
X12.16 X3...2.6 V3
1 12 6 6
2 13 9 4
3 14 12 2
4 15 15 0
5 16 18 -2
[[2]]
X10.14 X2...1.5 V3
1 10 2 8
2 11 4 7
3 12 6 6
4 13 8 5
5 14 10 4
With dplyr:
library(dplyr)
lapply(b, function(x) x %>% mutate(new_col = .[[1]]-.[[2]]))
Result:
[[1]]
X12.16 X3...2.6 new_col
1 12 6 6
2 13 9 4
3 14 12 2
4 15 15 0
5 16 18 -2
[[2]]
X10.14 X2...1.5 new_col
1 10 2 8
2 11 4 7
3 12 6 6
4 13 8 5
5 14 10 4

Generating Permutations of Values Within Multiple Lists [duplicate]

I'm trying to generate permutations by taking 1 value from 3 different lists
l <- list(A=c(1:13), B=c(1:5), C=c(1:3))
Desired result => Matrix of all the permutations where the first value can be 1-13, second value can be 1-5, third value can be 1-3
I tried using permn from the combinat package, but it seems to just rearrange the 3 lists.
> permn(l)
[[1]]
[[1]]$A
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13
[[1]]$B
[1] 1 2 3 4 5
[[1]]$C
[1] 1 2 3
[[2]]
[[2]]$A
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13
[[2]]$C
[1] 1 2 3
[[2]]$B
[1] 1 2 3 4 5
....
Expected output
[,1] [,2] [,3]
[1,] 1 1 3
[2,] 1 2 1
[3,] 1 1 2
[4,] 1 1 3
and so on...
We can use expand.grid. It can be applied directly to the list:
expand.grid(l)
You can create a data frame using do.call and expand.grid. If you really need a matrix, use as.matrix on the result:
> l <- list(A=c(1:13), B=c(1:5), C=c(1:3))
> out <- do.call(expand.grid, l)
> head(out)
A B C
1 1 1 1
2 2 1 1
3 3 1 1
4 4 1 1
5 5 1 1
6 6 1 1
> tail(out)
A B C
190 8 5 3
191 9 5 3
192 10 5 3
193 11 5 3
194 12 5 3
195 13 5 3
> tail(as.matrix(out))
A B C
[190,] 8 5 3
[191,] 9 5 3
[192,] 10 5 3
[193,] 11 5 3
[194,] 12 5 3
[195,] 13 5 3
>
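A quick sanity check on the result: the number of rows should equal the product of the lengths of the input vectors, here 13 * 5 * 3 = 195. The sketch below also strips the dimnames to get a bare matrix like the one in the question's expected output; the names expected_rows and m are illustrative.

```r
l <- list(A = 1:13, B = 1:5, C = 1:3)
out <- expand.grid(l)

# One row per combination: product of the vector lengths
expected_rows <- prod(lengths(l))
nrow(out) == expected_rows   # TRUE

# Plain matrix without row or column names
m <- unname(as.matrix(out))
head(m)
```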

Getting all combinations in R, repetition allowed

The built-in combn only gives half the combinations:
> t(combn(1:5, 2))
[,1] [,2]
[1,] 1 2
[2,] 1 3
[3,] 1 4
[4,] 1 5
[5,] 2 3
[6,] 2 4
[7,] 2 5
[8,] 3 4
[9,] 3 5
[10,] 4 5
For example there is no (1,1) nor (2,1).
How can I get all combinations?
As @akrun said, it looks like expand.grid will do it.
> expand.grid(rep(list(1:5), 2))
Var1 Var2
1 1 1
2 2 1
3 3 1
4 4 1
5 5 1
6 1 2
7 2 2
8 3 2
9 4 2
10 5 2
11 1 3
12 2 3
13 3 3
14 4 3
15 5 3
16 1 4
17 2 4
18 3 4
19 4 4
20 5 4
21 1 5
22 2 5
23 3 5
24 4 5
25 5 5
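To make the difference between the variants concrete, the following sketch counts three cases: combn (no repetition, order ignored), expand.grid (repetition allowed, order matters), and a filtered grid that keeps repetition but ignores order. The filter Var1 <= Var2 and the variable names are assumptions added for illustration; they are not part of the original answer.

```r
# Repetition allowed, order matters: 5^2 rows
ordered_pairs <- expand.grid(rep(list(1:5), 2))

# Repetition allowed, order ignored: keep one of (1,2) and (2,1)
unordered_pairs <- ordered_pairs[ordered_pairs$Var1 <= ordered_pairs$Var2, ]

counts <- c(
  combn     = ncol(combn(1:5, 2)),
  ordered   = nrow(ordered_pairs),
  unordered = nrow(unordered_pairs)
)
counts                   # 10 25 15
```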
You could get the Cartesian product using merge:
merge(1:5, 1:5)
Output:
x y
1 1 1
2 2 1
3 3 1
4 4 1
5 5 1
6 1 2
7 2 2
8 3 2
9 4 2
10 5 2
11 1 3
12 2 3
13 3 3
14 4 3
15 5 3
16 1 4
17 2 4
18 3 4
19 4 4
20 5 4
21 1 5
22 2 5
23 3 5
24 4 5
25 5 5
Using sqldf:
library(sqldf)
df1 <- data.frame(a = 1:5)
df2 <- df1
sqldf("SELECT df1.a, df2.a FROM df1
CROSS JOIN df2")
These are actually called permutations with repetition. Besides the recommendations already given, you can use the gtools::permutations function:
gtools::permutations(5, 2, 1:5, repeats.allowed=TRUE)

How to change the way split returns values in R?

I'm working on a project and I want to take a matrix, split it by the values w and x, and then for each of those splits find the maximum value of y.
Here's an example matrix
> rah = cbind(w = 1:6, x = 1:3, y = 12:1, z = 1:12)
> rah
w x y z
[1,] 1 1 12 1
[2,] 2 2 11 2
[3,] 3 3 10 3
[4,] 4 1 9 4
[5,] 5 2 8 5
[6,] 6 3 7 6
[7,] 1 1 6 7
[8,] 2 2 5 8
[9,] 3 3 4 9
[10,] 4 1 3 10
[11,] 5 2 2 11
[12,] 6 3 1 12
So I run split
> doh = split(rah, list(rah[,1], rah[,2]))
> doh
$`1.1`
[1] 1 1 1 1 12 6 1 7
$`2.1`
integer(0)
$`3.1`
integer(0)
$`4.1`
[1] 4 4 1 1 9 3 4 10
$`5.1`
integer(0)
$`6.1`
integer(0)
$`1.2`
integer(0)
$`2.2`
[1] 2 2 2 2 11 5 2 8
$`3.2`
integer(0)
$`4.2`
integer(0)
$`5.2`
[1] 5 5 2 2 8 2 5 11
...
So I'm a bit confused as to how to take the output of split, use it to group the rows with matching combinations of w and x values (such as row 1 and row 7), and then compare them to find the one with the highest y value.
EDIT: Informative answers so far but I just realized that I forgot to mention one very important part: I want to keep the whole row (x,w,y,z).
Use aggregate instead
> aggregate(y ~ w + x, max, data=rah)
w x y
1 1 1 12
2 4 1 9
3 2 2 11
4 5 2 8
5 3 3 10
6 6 3 7
If you want to use split, try
> split_rah <- split(rah[,"y"], list(rah[, "w"], rah[, "x"]))
> ind <- sapply(split_rah, function(x) length(x)>0)
> sapply(split_rah[ind], max)
1.1 4.1 2.2 5.2 3.3 6.3
12 9 11 8 10 7
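To keep the whole row (w, x, y, z), as the edit to the question asks, one possible sketch is to split a data frame rather than a matrix. split flattens a matrix into plain vectors, which is why the output in the question looks scrambled, whereas a data frame is split row-wise. The names df, groups and top_rows are illustrative, and drop = TRUE is used to skip the empty w/x combinations.

```r
rah <- cbind(w = 1:6, x = 1:3, y = 12:1, z = 1:12)
df <- as.data.frame(rah)

# Row-wise split; drop = TRUE discards combinations that never occur
groups <- split(df, list(df$w, df$x), drop = TRUE)

# From each group keep the full row with the largest y
top_rows <- do.call(rbind, lapply(groups, function(g) g[which.max(g$y), ]))
top_rows
```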
Just for the record, summaryBy from the doBy package works in the same fashion as aggregate:
> library(doBy)
> summaryBy(y ~ w + x, FUN=max, data=as.data.frame(rah))
w x y.max
1 1 1 12
2 2 2 11
3 3 3 10
4 4 1 9
5 5 2 8
6 6 3 7
data.table solution:
> library(data.table)
> dt <- data.table(rah)
> dt[, max(y), by=list(w, x)]
w x V1
1: 1 1 12
2: 2 2 11
3: 3 3 10
4: 4 1 9
5: 5 2 8
6: 6 3 7
> tapply(rah[,"y"], list( rah[,"w"], rah[,"x"]), max)
1 2 3
1 12 NA NA
2 NA 11 NA
3 NA NA 10
4 9 NA NA
5 NA 8 NA
6 NA NA 7
Another option using the plyr package:
library(plyr)
ddply(as.data.frame(rah),.(w,x),summarize,z=max(y))
w x z
1 1 1 12
2 2 2 11
3 3 3 10
4 4 1 9
5 5 2 8
6 6 3 7

Sequence expansion question

I have a sequence of 'endpoints', e.g.:
c(7,10,5,11,15)
that I want to expand to a sequence of 'elapsed time' between the endpoints, e.g.
c(7,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,1,2,3,4,5,6,7,8,9,10,11,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15)
What's the most efficient way to do this in R? I'm imagining some creative use of the embed function, but I can't quite get there without using an ugly for loop.
Here's the naive way to do this:
expandSequence <- function(x) {
out <- x[1]
for (y in (x[-1])) {
out <- c(out,seq(1,y))
}
return(out)
}
expandSequence(c(7,10,5,11,15))
There is a base function to do this, called, wait for it, sequence:
sequence(c(7,10,5,11,15))
[1] 1 2 3 4 5 6 7 1 2 3 4 5 6 7 8 9 10 1 2 3 4 5 1 2 3
[26] 4 5 6 7 8 9 10 11 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
In your case it seems your first endpoint is in fact not part of the sequence, so it becomes:
c(7, sequence(c(10,5,11,15)))
[1] 7 1 2 3 4 5 6 7 8 9 10 1 2 3 4 5 1 2 3 4 5 6 7 8 9
[26] 10 11 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
How about this:
> x <- c(7,10,5,11,15)
> unlist(sapply(x,seq))
[1] 1 2 3 4 5 6 7 1 2 3 4 5 6 7 8 9 10 1 2 3 4 5 1 2
[25] 3 4 5 6 7 8 9 10 11 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
With the first element kept as-is at the front:
c( x[1], unlist( sapply( x[seq(2,length(x))], seq ) ) )
And a slightly more readable version:
library(taRifx)
c( x[1], unlist( sapply( shift(x,wrap=FALSE), seq ) ) )
A combination of lapply() and seq_len() is useful here:
expandSequence <- function(x) {
out <- lapply(x[-1], seq_len)
do.call(c, c(x[1], out))
}
Which gives:
> pts <- c(7,10,5,11,15)
> expandSequence(pts)
[1] 7 1 2 3 4 5 6 7 8 9 10 1 2 3 4 5 1 2 3 4
[21] 5 6 7 8 9 10 11 1 2 3 4 5 6 7 8 9 10 11 12 13
[41] 14 15
(An alternative is:
expandSequence <- function(x) {
out <- lapply(x[-1], seq_len)
unlist(c(x[1], out), use.names = FALSE)
}
)
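One reason to prefer seq_len (or sequence) over seq here shows up when an endpoint is 0: seq(0) counts down and returns 1 0, while seq_len(0) returns an empty vector. The sketch below uses a made-up input containing a zero to show the difference; the endpoint values and variable names are assumptions for illustration only.

```r
# Hypothetical endpoints, including a zero-length interval
pts <- c(7, 2, 0, 3)

# seq(0) yields c(1, 0), so a spurious "1 0" appears in the result
with_seq <- c(pts[1], unlist(lapply(pts[-1], seq)))
with_seq                 # 7 1 2 1 0 1 2 3

# seq_len(0) is empty, so the zero endpoint contributes nothing
with_seq_len <- c(pts[1], unlist(lapply(pts[-1], seq_len)))
with_seq_len             # 7 1 2 1 2 3

# sequence() behaves like the seq_len version
with_sequence <- c(pts[1], sequence(pts[-1]))
```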

Resources