R - List of combinations with outer() and expand.grid()

I have a list of prime numbers which I multiply using outer() and upper.tri() to get a unique set of numbers.
primes <- c(2, 3, 5, 7, 11, 13, 17, 19, 23, 29)
m <- outer(primes, primes, "*")
unq <- m[which(upper.tri(m))]
> unq
6 10 15 14 21 35 22 33 55 77 26 39 65 91 143 34 51 85 119 187 221 38 57 95 133 209 247 323 46 69 115 161 253 299 391 437 58 87 145 203 319 377 493 551 667
Each of the original prime numbers represents a set of two numbers:
a2 <- c(1,1)
a3 <- c(1,2)
a5 <- c(2,2)
a7 <- c(1,3)
a11 <- c(1,4)
a13 <- c(2,3)
a17 <- c(2,4)
a19 <- c(3,3)
a23 <- c(3,4)
a29 <- c(4,4)
Combining two of these sets produces 4 pairs of numbers:
expand.grid(a2,a3)
1 1
1 1
1 2
1 2
So what I would like to do is have a kind of a list of lists, with each prime number having all 4 possible combinations.
I tried something like this, but I am missing some fundamentals here:
outer(a ,a , "expand.grid")
So the result would look something like this for the first prime:
6 c(11, 11, 12, 12)

I'm not sure I understand correctly, but I hope this helps:
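#function to pass to `outer`
fun <- function(x, y)
{
#look up the pair vectors a<x> and a<y> defined above (e.g. a2, a3);
#the local names p and q avoid shadowing the global a2
p <- get(paste0("a", x))
q <- get(paste0("a", y))
#paste each row of the grid into one string, e.g. 1 and 2 -> "12"
res <- apply(expand.grid(p, q), 1, paste, collapse = "")
res2 <- paste(res, collapse = ";")
return(res2)
}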
#`outer` a vectorized `fun`
m2 <- outer(primes, primes, Vectorize(fun))
#select `upper.tri`
unq2 <- m2[upper.tri(m2)]
#combine to a list
myls <- lapply(as.list(unq2), function(x) as.numeric(unlist(strsplit(x, ";"))))
names(myls) <- unq
myls
#$`6`
#[1] 11 11 12 12
#$`10`
#[1] 12 12 12 12
#$`15`
#[1] 12 22 12 22
#$`14`
#[1] 11 11 13 13
#...
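A possible variant, sketched here under the assumption that the pair vectors are kept in one list in the same order as primes rather than in separate a2, a3, ... variables. The names pairs, idx and combos are illustrative only and do not appear in the answer above. It avoids get() by indexing the list with the row and column positions of the upper triangle.

```r
primes <- c(2, 3, 5, 7, 11, 13, 17, 19, 23, 29)
# pair vectors from the question, stored in one list (assumed layout)
pairs <- list(c(1, 1), c(1, 2), c(2, 2), c(1, 3), c(1, 4),
              c(2, 3), c(2, 4), c(3, 3), c(3, 4), c(4, 4))

# row/column positions of the upper triangle, in the same column-major
# order that m[upper.tri(m)] uses
m <- outer(primes, primes, "*")
idx <- which(upper.tri(m), arr.ind = TRUE)

# for each position, build the 4 combinations and paste each row into a number
combos <- lapply(seq_len(nrow(idx)), function(k) {
  g <- expand.grid(pairs[[idx[k, 1]]], pairs[[idx[k, 2]]])
  as.numeric(paste0(g$Var1, g$Var2))
})
names(combos) <- m[upper.tri(m)]

combos[["6"]]
# [1] 11 11 12 12
```

Because idx and m[upper.tri(m)] follow the same ordering, each list element is named by the product it belongs to.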

Related

Add number to vector repeatedly and duplicate vector

I have two values,
3 and 12,
and I make a vector:
num1 <- 3
num2 <- 12
a <- c(num1, num2)
I want to add a number (12) to vector "a" repeatedly,
appending each result to build a new vector
like this:
3,12, 15,24, 27,36, 39,48 ....
The repeat count "n" is 6.
I don't have any idea how to do this.
Here are two methods in base R.
With outer, you could do
c(outer(c(3, 12), (12 * 0:4), "+"))
[1] 3 12 15 24 27 36 39 48 51 60
Or with sapply, you can explicitly loop through and calculate the pairs of sums.
c(sapply(0:4, function(i) c(3, 12) + (12 * i)))
[1] 3 12 15 24 27 36 39 48 51 60
outer returns a matrix where every pair of elements of the two vectors have been added together. c is used to return a vector. sapply loops through 0:4 and then calculates the element-wise sum. It also returns a matrix in this instance, so c is used to return a vector.
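To make the intermediate matrix visible, here is a small sketch with four repetitions. The names step, n, m and v are illustrative and not from the answer.

```r
a <- c(3, 12)
step <- 12   # amount added per repetition (assumed from the question)
n <- 4       # number of pairs to produce

# one column per repetition: column k holds a + step * (k - 1)
m <- outer(a, step * (0:(n - 1)), "+")
m
#      [,1] [,2] [,3] [,4]
# [1,]    3   15   27   39
# [2,]   12   24   36   48

# c() flattens the matrix column by column, which gives the desired order
v <- c(m)
v
# [1]  3 12 15 24 27 36 39 48
```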
Here is a somewhat generic function that takes as input your original vector a, the number to add (12), and n:
f1 <- function(vec, x, n){
len1 <- length(vec)
v1 <- sapply(seq(n/len1), function(i) x*i)
v2 <- rep(v1, each = n/length(v1))
v3 <- rep(vec, n/len1)
return(c(vec, v3 + v2))
}
f1(a, 12, 6)
#[1] 3 12 15 24 27 36 39 48
f1(a, 11, 12)
#[1] 3 12 14 23 25 34 36 45 47 56 58 67 69 78
f1(a, 3, 2)
#[1] 3 12 6 15
EDIT
If by n=6 you mean 6 times the whole vector then,
f1 <- function(vec, x, n){
len1 <- length(vec)
v1 <- sapply(seq(n), function(i) x*i)
v2 <- rep(v1, each = len1)
v3 <- rep(vec, n)
return(c(vec, v3 + v2))
}
f1(a, 12, 6)
#[1] 3 12 15 24 27 36 39 48 51 60 63 72 75 84
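Using rep for repeating and cumsum for the addition (the result starts at the first incremented pair, so prepend a with c(a, ...) if the original values are needed too):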
n = 6
rep(a, n) + cumsum(rep(c(12, 0), n))
# [1] 15 24 27 36 39 48 51 60 63 72 75 84
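If the sequence should start at the original pair 3, 12, one possible variant replaces cumsum with an explicit offset vector. This is a sketch, assuming n counts pairs including the original one; res is an illustrative name.

```r
a <- c(3, 12)
n <- 6

# offsets 0, 12, 24, ... each repeated once per element of a
res <- rep(a, n) + rep(12 * (0:(n - 1)), each = length(a))
res
# [1]  3 12 15 24 27 36 39 48 51 60 63 72
```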

Apply a family of functions over nested list -R

I need to apply a family of functions of the form (a*x + b + c) to
a nested list, e.g. in the following form:
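library(magrittr) # provides the %>% pipe used below
map_function <- function(x,y){
return(linear_function(x[1],x[2],x[3],y))
}
linear_function <- function(x1,x2,x3,y){
# smallest value of x1*x + x2 + x3 over the elements of y
g <- sapply(y, function(x){x1*x+x2+x3}) %>% min(.)
return(g)
}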
over two lists, so that map_function is applied to every combination of elements from
pr_list <- list(c(1,2,3),c(4,5,6))
and
f_list <- list(c(234,34),c(456,34,567),c(111,222))
It should generate a nested list or matrix of 3 elements with 2 values in each. What is the R way to do this without a for loop?
For example, if the output is a matrix, its elements would be
M11 <- linear_function(pr_list[[1]][1],pr_list[[1]][2],pr_list[[1]][3],f_list[[1]] )
M12 <- linear_function(pr_list[[1]][1],pr_list[[1]][2],pr_list[[1]][3],f_list[[2]] )
M13 <- linear_function(pr_list[[1]][1],pr_list[[1]][2],pr_list[[1]][3],f_list[[3]] )
M21 <- linear_function(pr_list[[2]][1],pr_list[[2]][2],pr_list[[2]][3],f_list[[1]] )
M22 <- linear_function(pr_list[[2]][1],pr_list[[2]][2],pr_list[[2]][3],f_list[[2]] )
M23 <- linear_function(pr_list[[2]][1],pr_list[[2]][2],pr_list[[2]][3],f_list[[3]] )
M <- list(c(M11,M21),c(M12,M22),c(M13,M23))
print(M)
[[1]]
[1] 39 147
[[2]]
[1] 39 147
[[3]]
[1] 116 455
This is my best guess for you.
x <- data.frame(x1 = c(1,2,3), x2 = c(4,5,6))
y <- data.frame(y1 = c(234,34,NA),y2 = c(456,34,567), y3 = c(111,222,NA))
linear_function <- function(x, y){x[[1]]*y +x[[2]]+x[[3]]}
Which when applied like this, results in the following.
> linear_function(x$x1, y)
   y1  y2  y3
1 239 461 116
2  39  39 227
3  NA 572  NA
> linear_function(x$x2, y)
   y1   y2  y3
1 947 1835 455
2 147  147 899
3  NA 2279  NA
If you want a single object:
> z <- lapply(x, linear_function, y)
> z
$x1
   y1  y2  y3
1 239 461 116
2  39  39 227
3  NA 572  NA
$x2
   y1   y2  y3
1 947 1835 455
2 147  147 899
3  NA 2279  NA
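To reproduce the nested list M from the question directly, including the min() step, nested lapply/sapply calls may be enough. This is a sketch working on the original pr_list and f_list; linear_min and M_mat are illustrative names, not part of the question or the answer above.

```r
pr_list <- list(c(1, 2, 3), c(4, 5, 6))
f_list  <- list(c(234, 34), c(456, 34, 567), c(111, 222))

# minimum of p[1]*y + p[2] + p[3] over the elements of y
linear_min <- function(p, y) min(p[1] * y + p[2] + p[3])

# outer loop over f_list, inner loop over pr_list
M <- lapply(f_list, function(y) sapply(pr_list, linear_min, y = y))

# same values as a 2 x 3 matrix (rows: pr_list, columns: f_list)
M_mat <- sapply(f_list, function(y) sapply(pr_list, linear_min, y = y))
```

Here M[[1]] is c(39, 147) and M[[3]] is c(116, 455), matching the expected output printed in the question.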

R programming(sum of products)

I'm trying to find the sum of products of two data frames.
data<-w1 w2 w3 w4
4 6 8 5
where w1 w2 w3 w4 are column names
and I have one more dataframe
data2<-p1 p2 p3 p4
3 4 5 6
5 6 8 4
4 6 6 8
3 5 8 9
My result should be like this:
result <- w1*p1+w2*p2+w3*p3+w4*p4
result1 <- 4*3+6*4+8*5+5*6 # result on row 1
result2 <- 4*5+6*6+8*8+5*4 # result on row 2
and so on for each row in data2.
How can I do this in general?
Thanks
The fastest way is to fall back on R's linear algebra (even more so if you have big data frames):
> as.matrix(data2) %*% unlist(data)
# [,1]
#[1,] 106
#[2,] 140
#[3,] 140
#[4,] 151
Or sweep:
> rowSums(sweep(as.matrix(data2), 2, unlist(data), `*`))
#[1] 106 140 140 151
Data
data=data.frame(a=4,b=6,c=8,d=5)
data2=data.frame(a=c(3,5,4,3),b=c(4,6,6,5),c=c(5,8,6,8),d=c(6,4,8,9))
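As a sanity check on why the matrix product works: each result is the dot product of one row of data2 with the weight vector. The sketch below compares an explicit row-by-row sum with %*%; the names w, p, by_row and by_matmul are illustrative.

```r
w <- c(w1 = 4, w2 = 6, w3 = 8, w4 = 5)
p <- data.frame(p1 = c(3, 5, 4, 3), p2 = c(4, 6, 6, 5),
                p3 = c(5, 8, 6, 8), p4 = c(6, 4, 8, 9))

# explicit version: multiply each row by the weights and add up
by_row <- apply(as.matrix(p), 1, function(r) sum(r * w))

# matrix product computes the same four dot products at once;
# drop() turns the 4 x 1 result into a plain vector
by_matmul <- drop(as.matrix(p) %*% w)

all.equal(unname(by_row), unname(by_matmul))
# [1] TRUE
```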
You could use mapply:
df1 <- data.frame(w1 = 4, w2 = 6, w3 = 8, w4 = 5)
df2 <- data.frame(p1 = c(3, 5, 4, 3), p2 = c(4, 6, 6, 5),
p3 = c(5, 8, 6, 8), p4 = c(6, 4, 8, 9))
This multiplies each column of df2 by the corresponding value in df1 (mapply pairs the columns up because a data frame is treated as a list in this context):
> (tmp <- mapply(`*`, df2, df1))
p1 p2 p3 p4
[1,] 12 24 40 30
[2,] 20 36 64 20
[3,] 16 36 48 40
[4,] 12 30 64 45
> sum(tmp)
[1] 537
Edit: If you want the sum of each row of the above matrix, you can use either apply(tmp, 1, sum) or rowSums:
> rowSums(tmp)
[1] 106 140 140 151

Condense a matrix in R

I have loaded a table of integer data with 2,200 columns. What I'd like to do is condense the data down by averaging the values in every 5 columns and placing that in a new column in a new table.
For example, if I had:
Col1 | Col2 | Col3 | Col4 | Col5 | Col6 | Col7 | Col8 | Col9 | Col10
2 4 6 8 10 12 14 16 18 20
I would get:
Col1 | Col2
6 16
Which is just the average of the values in columns 1-5 from the original table in Col1 and the average of the values in columns 6-10 in Col2.
I haven't quite wrapped my head around R syntax, so any help would be appreciated.
Here's one approach that's applicable if the number of elements to be grouped is divisible by n (5, in your case):
x <- 1:100
n <- 5
tapply(x, rep(seq(1, length(x), n), each=n), mean)
# 1 6 11 16 21 26 31 36 41 46 51 56 61 66 71 76 81 86 91 96
# 3 8 13 18 23 28 33 38 43 48 53 58 63 68 73 78 83 88 93 98
The first row of output contains element names, and the second row contains means of successive groups of n elements.
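To apply this to all rows of a matrix or data.frame, you can do the following (the result has one row per group and one column per original row, so wrap it in t() if you want the original orientation):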
m <- matrix(1:1000, ncol=100)
apply(m, 1, function(x) tapply(x, rep(seq(1, length(x), n), each=n), mean))
EDIT
This alternative approach will give you some performance gains due to vectorisation with rowMeans:
t(mapply(function(x, y) rowMeans(m[, x:y]),
seq(1, ncol(m), n), seq(n, ncol(m), n)))
Oops, I see this is the comment of @user20650 on @jbaums's answer. The rowsum function splits the rows of a matrix by a factor and sums the columns of each split. So for
m <- matrix(1:1000, ncol=100)
n <- 5
we have
rowsum(t(m), rep(seq_len(ncol(m) / n), each=n)) / n
This is fast, if that's important
library(microbenchmark)
f0 = function(m, n) rowsum(t(m), rep(seq_len(ncol(m) / n), each=n)) / n
f1 = function(m, n)
apply(m, 1, function(x) tapply(x, rep(seq(1, length(x), n), each=n), mean))
f2 = function(m, n)
t(mapply(function(x, y) rowMeans(m[, x:y]),
seq(1, ncol(m), n), seq(n, ncol(m), n)))
all.equal(f0(m, n), f1(m, n), check.attributes=FALSE)
## [1] TRUE
all.equal(f0(m, n), f2(m, n), check.attributes=FALSE)
## [1] TRUE
microbenchmark(f0(m, n), f1(m, n), f2(m, n))
## Unit: microseconds
##      expr      min        lq   median        uq      max neval
##  f0(m, n)  164.351  170.1675  176.730  187.8570  237.419   100
##  f1(m, n) 8060.639 8513.3035 8696.742 8908.5190 9771.019   100
##  f2(m, n)  540.894  588.3820  603.787  634.1615  732.209   100
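Applied to the ten-column example from the question, the rowsum approach could look like the sketch below. The names grp and condensed are illustrative, and it assumes the number of columns is a multiple of n.

```r
# one row, ten columns, as in the question
m <- matrix(seq(2, 20, by = 2), nrow = 1)
n <- 5

# group label for each column: 1 1 1 1 1 2 2 2 2 2
grp <- rep(seq_len(ncol(m) / n), each = n)

# rowsum works on rows, so transpose, sum per group, divide by n,
# then transpose back to one column per group
condensed <- t(rowsum(t(m), grp) / n)
condensed
#      1  2
# [1,] 6 16
```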
Here's another approach using a loop and rowMeans, in case you prefer a loop here. It works for matrices and data frames, but needs adjustment for vectors.
# example data
dat <- as.data.frame( matrix(1:20,ncol=10,byrow=TRUE) )
# pick range
range <- 5
ind <- seq(1,ncol(dat),range)
newdat <- NULL
for(i in ind){
newcol <- rowMeans(dat[,i:(i+range-1)])
newdat <- cbind(newdat, newcol)
}
Will result in:
> newdat
newcol newcol
[1,] 3 8
[2,] 13 18
@jbaums's answer looks pretty good. Since I had already started this answer, I thought I would post my solution as well.
#Make some fake data
require(data.table)
data <- data.table(t(iris[,1:4]))
#Transpose since rows are easier to deal with than columns
data <- data.table(t(data))
data[ , row := .I]
#Average every 5 rows
data <- data[ , lapply(.SD,mean), by=cut(row,seq(0,nrow(data),5))]
#Transpose back to original results
result <- data.table(t(data))
If you wanted to get the means of the elements from col1-col5, col6-col10, etc.
m1 <- matrix(c(rep(1:100, 2), 1:20), ncol=22)
n <- 5
p1 <- prod(dim(m1))
n1 <- nrow(m1)*n
n2 <- p1-p1%%n1
c(rowMeans(matrix(m1[1:n2], nrow=p1%/%n1, byrow=TRUE)), mean(m1[(n2+1):p1]))
#[1] 25.5 75.5 25.5 75.5 10.5
Or
sapply(seq(1,ncol(m1), by=n), function(i) mean(m1[,i:(min(c(i+n-1), ncol(m1)))]) )
#[1] 25.5 75.5 25.5 75.5 10.5
With some labels
indx <- seq(1,n2/nrow(m1), by=n)
indx1 <- paste("Col",paste(indx, indx+4, sep="-"),sep="_")
indx2 <- paste("Col", paste(seq(p1%%n1+1, ncol(m1)),collapse="-"), sep="_")
c(rowMeans(matrix(m1[1:n2], nrow=p1%/%n1, byrow=TRUE, dimnames=list(indx1, NULL))), setNames(mean(m1[(n2+1):p1]), indx2))
# Col_1-5 Col_6-10 Col_11-15 Col_16-20 Col_21-22
# 25.5 75.5 25.5 75.5 10.5
Update
I realized that you wanted the rowMeans by splitting up columns 1:5, 6:10, 11:15 etc. If that is the case:
res1 <- cbind( colMeans(aperm(array(m1[1:n2], dim=c(nrow(m1), n, p1%/%n1)), c(2,1,3))),
rowMeans(m1[,(ncol(m1)-ncol(m1)%%n+1):ncol(m1)]))
which is equal to manually splitting the columns:
res2 <- cbind(rowMeans(m1[,1:5]), rowMeans(m1[,6:10]), rowMeans(m1[,11:15]),
rowMeans(m1[,16:20]), rowMeans(m1[,21:22]))
identical(res1,res2)
#[1] TRUE
colnames(res1) <- c(indx1,indx2)
res1
# Col_1-5 Col_6-10 Col_11-15 Col_16-20 Col_21-22
#[1,] 21 71 21 71 6
#[2,] 22 72 22 72 7
#[3,] 23 73 23 73 8
#[4,] 24 74 24 74 9
#[5,] 25 75 25 75 10
#[6,] 26 76 26 76 11
#[7,] 27 77 27 77 12
#[8,] 28 78 28 78 13
#[9,] 29 79 29 79 14
#[10,] 30 80 30 80 15

How to transform a dataframe in an ordered matrix?

Please run the following code:
A <- matrix(11, nrow = 4, ncol = 3)
A[,2] <- seq(119, 122, 1)
A[,3] <- seq(45, 42)
B <- matrix(39, nrow = 4, ncol = 3)
B[,2] <- seq(119, 122, 1)
B[,3] <- seq(35, 32)
C <- matrix(67, nrow = 4, ncol = 3)
C[,2] <- seq(119, 122, 1)
C[,3] <- seq(27, 24)
D <- rbind(A, B, C)
You will get D which is a 12 x 3 matrix; I would like to know the most efficient way to obtain Mat starting from D.
> Mat
11 39 67
119 45 35 27
120 44 34 26
121 43 33 25
122 42 32 24
In fact, Mat is the last column of D indexed by the first and the second column of D; e.g. consider Mat[1,1] which is equal to 45: it comes from the only row of D which is identified by 11 and 119.
How may I obtain it?
Thanks,
You can use xtabs:
xtabs(D[,3]~D[,2]+D[,1])
D[, 1]
D[, 2] 11 39 67
119 45 35 27
120 44 34 26
121 43 33 25
122 42 32 24
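Another option is dcast from reshape2:
library(reshape2)
# data.frame(D) names the columns X1, X2, X3; X3 holds the values
dcast(data.frame(D), X2 ~ X1, value.var = "X3")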
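A base R alternative is tapply with two grouping vectors, sketched below. D is rebuilt compactly here so the block runs on its own; since each (row, column) combination occurs exactly once, the aggregating function (sum here) just returns the single value.

```r
# same 12 x 3 matrix as in the question, built in one step
D <- cbind(rep(c(11, 39, 67), each = 4),
           rep(119:122, times = 3),
           c(45:42, 35:32, 27:24))

# rows indexed by column 2 of D, columns by column 1 of D
Mat <- tapply(D[, 3], list(D[, 2], D[, 1]), sum)
Mat
#     11 39 67
# 119 45 35 27
# 120 44 34 26
# 121 43 33 25
# 122 42 32 24
```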
