An error about vectorization in R

My R code is as follows. The main task is to count how many times each row is repeated.
library(plyr)
data<-data.frame(1,2,3);
x <- read.table(text = "ID1 ID2 n m
13 156 12 15
94 187 14 16
66 297 41 48
29 89 42 49
78 79 51 79", header= TRUE)
distfunc <- function(data,ID1,ID2,n,m){
X1<-ID1; ################
X2<-ID2; ################
X3<-unlist(mapply(':', n, m));
data<-rbind(data,data.frame(X1,X2,X3));
return(data);
}
data<-distfunc(data,x$ID1, x$ID2,x$n, x$m)
data<-data[-1,]
plyr::count(data, names(data)); ## Calculates the row number of repetitions
The error message I get:
Error in data.frame(X1, X2, X3) :
arguments imply differing number of rows: 5, 52
I tried to fix it following R Error: "In numerical expression has 19 elements: only the first used", but that failed and the result was wrong. This problem is not the same as that one.

I suppose you want to do:
# library(plyr)
# data<-data.frame(1,2,3);
x <- read.table(header=TRUE, text =
"ID1 ID2 n m
13 156 12 15
94 187 14 16
66 297 41 48
29 89 42 49
78 79 51 79")
#distfunc <- function(data, ID1, ID2, n, m) {
# X1 <- ID1 ################
# X2 <- ID2 ################
# X3 <- unlist(mapply(':', n, m))
# data <- rbind(data, data.frame(X1,X2,X3))
#}
#data <- distfunc(data, x$ID1, x$ID2, x$n, x$m)
L <- apply(x, 1, function(x) data.frame(X1=x[1], X2=x[2], X3=x[3]:x[4], row.names=NULL))
data <- L[[1]]
for (i in 2:length(L)) data <- rbind(data, L[[i]])
or with a more readable function inside apply():
L <- apply(x, 1, function(r) data.frame(X1=r["ID1"], X2=r["ID2"], X3=r["n"]:r["m"], row.names=NULL))
data <- L[[1]]; for (i in 2:length(L)) data <- rbind(data, L[[i]])
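In both variants the list of per-row data frames can also be collapsed in one step instead of the loop:
data <- do.call(rbind, L)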
Here is a simpler variant:
data <- data.frame(X1=x$ID1[1], X2=x$ID2[1], X3=x$n[1]:x$m[1])
for (i in 2:nrow(x)) data <- rbind(data, data.frame(X1=x$ID1[i], X2=x$ID2[i], X3=x$n[i]:x$m[i]))

I just fixed it. The problem was that X1 and X2 had only one value per row of x, while X3 had one value per element of each n:m range, so the columns of the data.frame had different lengths. Repeating the IDs to match fixes it:
distfunc <- function(data, ID1, ID2, n, m) {
  X1 <- rep(ID1, m - n + 1)  # repeat each ID once per element of its n:m range
  X2 <- rep(ID2, m - n + 1)
  X3 <- unlist(mapply(':', n, m))
  data <- rbind(data, data.frame(X1, X2, X3))
  return(data)
}
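With that change X1, X2 and X3 all have length sum(m - n + 1) (52 here), so data.frame() no longer complains. A quick check with the x from the question:
data <- data.frame(1, 2, 3)                     # dummy first row, as in the question
data <- distfunc(data, x$ID1, x$ID2, x$n, x$m)
data <- data[-1, ]                              # drop the dummy row
nrow(data)                                      # 52
plyr::count(data, names(data))                  # number of repetitions of each row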

Related

Split a dataframe or matrix into multiple parts using assign in a loop

I have a matrix consisting of monthly values from 2004 to 2018. I would like to split it and save the pieces as individual years with the corresponding months, like this:
...
Sigma.ma2004 <- Sigma.ma[1:12,]
Sigma.ma2005 <- Sigma.ma[13:24,]
Sigma.ma2006 <- Sigma.ma[25:36,]
Sigma.ma2007 <- Sigma.ma[37:48,]
Sigma.ma2008 <- Sigma.ma[49:60,]
Sigma.ma2009 <- Sigma.ma[61:72,]
Sigma.ma2010 <- Sigma.ma[73:84,]
Sigma.ma2011 <- Sigma.ma[85:96,]
Sigma.ma2012 <- Sigma.ma[97:108,]
Sigma.ma2013 <- Sigma.ma[109:120,]
Sigma.ma2014 <- Sigma.ma[121:132,]
Sigma.ma2015 <- Sigma.ma[133:144,]
Sigma.ma2016 <- Sigma.ma[145:156,]
Sigma.ma2017 <- Sigma.ma[157:168,]
Sigma.ma2018 <- Sigma.ma[169:180,]
...
I tried to create a loop for it.
...
start_var <- seq(from = 1 ,to = 169, by = 12)
end_var <- seq(from = 12, to = 180, by = 12)
for (i in 1:length(start_var)){
for(j in 2004:2018){
assign(paste("Sigma.ma",j,sep=""), Sigma.ma[start_var[i]:end_var[i],])
}
}
...
The individual parts are saved, but they all contain the same values.
Where is the mistake?
Instead of creating multiple objects in the global environment, you could store the data in a list: convert the matrix into a data frame and then use split.
df <- as.data.frame(Sigma.ma)
temp <- split(df, gl(nrow(df)/12, 12))
This will give you a list of data frames which you can access as temp[[1]], temp[[2]] and so on.
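Optionally, name the list elements by year (assuming the 180 monthly rows from 2004 to 2018, i.e. 15 chunks of 12) so a given year can be looked up directly:
names(temp) <- 2004:2018
temp[["2004"]]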
Using a reproducible example,
mat <- matrix(1:100, 10)
df <- as.data.frame(mat)
split(df, gl(nrow(df)/2, 2))
#$`1`
# V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
#1 1 11 21 31 41 51 61 71 81 91
#2 2 12 22 32 42 52 62 72 82 92
#$`2`
# V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
#3 3 13 23 33 43 53 63 73 83 93
#4 4 14 24 34 44 54 64 74 84 94
#....
#....
The mistake in your code is the nested loop: for every i, the inner loop over j reassigns all fifteen Sigma.maYYYY objects with the same slice, so after the last iteration they all hold the slice for the final i. A single loop over the years is enough; something like this should work:
years <- 2004:2018
for(i in 1:length(years)) {
start_row <- (i - 1) * 12 + 1
end_row <- start_row + 11
assign(paste0("Sigma.ma", years[i]), Sigma.ma[start_row:end_row, ])
}
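A quick check against the manual assignments from the question (Sigma.ma2005 should be rows 13 to 24):
identical(Sigma.ma2005, Sigma.ma[13:24, ])
# [1] TRUE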
Maybe you can try sapply() over assign() like below. Note that assign() inside a function needs envir = .GlobalEnv, otherwise the objects are created in the function's own environment and discarded:
sapply(2004:2018, function(k) assign(paste0("Sigma.ma", k), Sigma.ma[(k - 2004) * 12 + (1:12), ], envir = .GlobalEnv))
Another approach would be to convert your matrix into a 3D array.
n_years <- 3
start_year <- 2004
n_rows <- 5
set.seed(123)
Sigma.ma <- matrix(sample(100, n_years * n_rows * 12, replace = T), nrow = n_rows)
array(Sigma.ma, dim = c(n_rows, 12, n_years))
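Each year is then one slice of the array; for example (assuming the objects created above):
arr <- array(Sigma.ma, dim = c(n_rows, 12, n_years))
arr[, , 2]                                      # second year (2005): all rows, 12 months
identical(c(arr[, , 2]), c(Sigma.ma[, 13:24]))  # same values as columns 13:24
# [1] TRUE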

R lapply on a function with three variables

I have a function with three variables
fcalc <- function(n1,n2,n3){
calc1 <- n1*5+n2*10-n3*2
}
I want to pass the values:
1:2 to n1
3:5 to n2
and
6:9 to n3
However, when I try
list1 <- lapply(1:2, 3:5, 6:9, fcalc)
list1
I get an error.
E.g. when n1 = 1, n2 = 3 and n3 = 6,
the function would give
calc1 <- 1*5+3*10-6*2
23
I would be grateful for your help.
An option with pmap
library(purrr)
library(tidyr)
fcalc <- function(n1, n2, n3) n1 * 5 + n2 * 10 - n3 * 2
pmap_dbl(crossing(n1 = 1:2, n2 = 3:5, n3 = 6:9), fcalc)
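A quick check against the value worked out by hand in the question (n1 = 1, n2 = 3, n3 = 6):
fcalc(1, 3, 6)
# [1] 23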
If what you actually want is to get calc1 for each combination of the values, then this could be your solution:
fcalc <- function(x) {
x[1]*5+x[2]*10-x[3]*2
}
(df <- expand.grid(x1=1:2,x2=3:5,x3=6:9))
(df$calc1 <- apply(df,1,fcalc))
You can also use outer():
fcalc <- function(x, y, z) {
c(outer(outer(x * 5, y * 10, FUN = "+"), z * 2, FUN = "-"))
}
fcalc(1:2, 3:5, 6:9)
[1] 23 28 33 38 43 48 21 26 31 36 41 46 19 24 29 34 39 44 17 22 27 32 37 42

Find the maximum values of a variable in a list

I have a list of the following structure and I intend to find the maximum value of X2 in the second variable (b) of each list element.
Sample data:
[[1]]
[[1]]$a
[1] 2
[[1]]$b
X1 X2
1 58 1686729
2 106 1682303
[[2]]
[[2]]$a
[1] 3
[[2]]$b
X1 X2
1 24 1642468
2 89 1695581
3 156 1634019
I looked into multiple filters that can be applied to the list like:
library(rlist)
list.filter(result, max(b$area))
and also tried lapply but with no success
lapply(result, function(x) x[which.max(x$b)])
I need the following output:
a x1 x2
2 58 1686729
3 89 1695581
With lapply() you can find the row with the maximum X2 in $b of each list element, then cbind() it with the a element.
l_max <- lapply(l, function(x) {
b <- x$b
cbind(a=x$a, b[which.max(b$X2),])
})
Use bind_rows() from dplyr to bind the results together.
l_max %>%
dplyr::bind_rows()
# a X1 X2
# 1 2 58 1686729
# 2 3 89 1695581
Example data:
l <- list(
list(a = 2,
b = data.frame(X1 = c(58, 106), X2 = c(1686729, 1682303))),
list(a = 3,
b = data.frame(X1 = c(24, 89,156), X2 = c(1642468, 1695581,1634019)))
)
With your example:
l_max <- lapply(l, function(x) {
b <- x$b
cbind(a = x$a, b[which.max(b[,2]),]) # NOTICE I used [,2] to refer to the second column
#b$area works too if all df share the col name
})
l_max %>%
dplyr::bind_rows()
# a rt area
# 1 2 58 1686729
# 2 3 89 1695581
# 3 4 101 1679889
# 4 5 88 1695983
# 5 6 105 1706445
# 6 7 121 1702019
Another solution with purrr::map_df() avoids the use of bind_rows():
purrr::map_df(l, function(x) {
b <- x$b
cbind(a = x$a, b[which.max(b[,2]),])
})
All base R using mapply():
t(mapply(function(x) {
b <- x$b
cbind(a = x$a, b[which.max(b[,2]),])
}, l))
Or with Map():
do.call("rbind", Map(function(x) {
b <- x$b
cbind(a = x$a, b[which.max(b[,2]),])
}, l))
You can also use sapply() (here l is the example list from above; unlist() turns each selected row into a numeric vector so the result simplifies to a matrix):
t(sapply(l, function(elem) {
  b <- elem$b
  c(a = elem$a, unlist(b[which.max(b$X2), ]))  # b$area for the column name in your own data
}))

Condense a matrix in R

I have loaded a table of integer data with 2,200 columns. What I'd like to do is condense the data down by averaging the values in every 5 columns and placing that in a new column in a new table.
For example, if I had:
Col1 | Col2 | Col3 | Col4 | Col5 | Col6 | Col7 | Col8 | Col9 | Col10
2 4 6 8 10 12 14 16 18 20
I would get:
Col1 | Col2
6 16
Which is just the average of the values in columns 1-5 from the original table in Col1 and the average of the values in columns 6-10 in Col2.
I haven't quite wrapped my head around R syntax, so any help would be appreciated.
Here's one approach that's applicable if the number of elements to be grouped is divisible by n (5, in your case):
x <- 1:100
n <- 5
tapply(x, rep(seq(1, length(x), n), each=n), mean)
# 1 6 11 16 21 26 31 36 41 46 51 56 61 66 71 76 81 86 91 96
# 3 8 13 18 23 28 33 38 43 48 53 58 63 68 73 78 83 88 93 98
The first row of output contains element names, and the second row contains means of successive groups of n elements.
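If the names are not needed, an equivalent result as a plain numeric vector (still assuming the length is a multiple of n) comes from folding the vector into an n-row matrix and taking column means:
colMeans(matrix(x, nrow = n))
# [1]  3  8 13 18 23 28 33 38 43 48 53 58 63 68 73 78 83 88 93 98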
To apply this to all rows of a matrix or data.frame, you can do, e.g.:
m <- matrix(1:1000, ncol=100)
apply(m, 1, function(x) tapply(x, rep(seq(1, length(x), n), each=n), mean))
EDIT
This alternative approach will give you some performance gains due to vectorisation with rowMeans:
t(mapply(function(x, y) rowMeans(m[, x:y]),
seq(1, ncol(m), n), seq(n, ncol(m), n)))
Oops, I see this is @user20650's comment on @jbaums' answer. The rowsum function splits the rows of a matrix by a grouping factor and sums the columns within each group. So for
m <- matrix(1:1000, ncol=100)
n <- 5
we have
rowsum(t(m), rep(seq_len(ncol(m) / n), each=n)) / n
This is fast, if that's important
library(microbenchmark)
f0 = function(m, n) rowsum(t(m), rep(seq_len(ncol(m) / n), each=n)) / n
f1 = function(m, n)
apply(m, 1, function(x) tapply(x, rep(seq(1, length(x), n), each=n), mean))
f2 = function(m, n)
t(mapply(function(x, y) rowMeans(m[, x:y]),
seq(1, ncol(m), n), seq(n, ncol(m), n)))
all.equal(f0(m, n), f1(m, n), check.attributes=FALSE)
## [1] TRUE
all.equal(f0(m, n), f2(m, n), check.attributes=FALSE)
## [1] TRUE
microbenchmark(f0(m, n), f1(m, n), f2(m, n))
## Unit: microseconds
## expr min lq median uq max neval
## f0(m, n) 164.351 170.1675 176.730 187.8570 237.419 100
## f1(m, n) 8060.639 8513.3035 8696.742 8908.5190 9771.019 100
## f2(m, n) 540.894 588.3820 603.787 634.1615 732.209 100
Here's another approach using a loop and rowMeans, in case you prefer a loop. It will work for matrices, but needs adjustment for vectors.
# example data
dat <- as.data.frame( matrix(1:20,ncol=10,byrow=TRUE) )
# pick range
range <- 5
ind <- seq(1,ncol(dat),range)
newdat <- NULL
for(i in ind){
newcol <- rowMeans(dat[,i:(i+range-1)])
newdat <- cbind(newdat, newcol)
}
Will result in:
> newdat
newcol newcol
[1,] 3 8
[2,] 13 18
@jbaums' answer looks pretty good. Since I had already started this answer, I thought I would post my solution as well.
#Make some fake data (4 variables in rows, 150 observations in columns, like a wide table)
require(data.table)
data <- data.table(t(iris[,1:4]))
#Transpose since rows are easier to deal with than columns
data <- data.table(t(data))
data[ , row := .I]
#Mean of every 5 rows (the question asks for averages; exclude the helper row column)
data <- data[ , lapply(.SD, mean), by = .(grp = cut(row, seq(0, nrow(data), 5))),
              .SDcols = setdiff(names(data), "row")]
#Transpose back to the original layout, dropping the grouping column
result <- data.table(t(data[ , !"grp"]))
If you wanted to get the means of the elements from col1-col5, col6-col10, etc.
m1 <- matrix(c(rep(1:100, 2), 1:20), ncol=22)
n <- 5
p1 <- prod(dim(m1))
n1 <- nrow(m1)*n
n2 <- p1-p1%%n1
c(rowMeans(matrix(m1[1:n2], nrow=p1%/%n1, byrow=TRUE)), mean(m1[(n2+1):p1]))
#[1] 25.5 75.5 25.5 75.5 10.5
Or
sapply(seq(1,ncol(m1), by=n), function(i) mean(m1[,i:(min(c(i+n-1), ncol(m1)))]) )
#[1] 25.5 75.5 25.5 75.5 10.5
With some labels
indx <- seq(1,n2/nrow(m1), by=n)
indx1 <- paste("Col",paste(indx, indx+4, sep="-"),sep="_")
indx2 <- paste("Col", paste(seq(p1%%n1+1, ncol(m1)),collapse="-"), sep="_")
c(rowMeans(matrix(m1[1:n2], nrow=p1%/%n1, byrow=TRUE, dimnames=list(indx1, NULL))), setNames(mean(m1[(n2+1):p1]), indx2))
# Col_1-5 Col_6-10 Col_11-15 Col_16-20 Col_21-22
# 25.5 75.5 25.5 75.5 10.5
Update
I realized that you wanted the rowMeans by splitting up columns 1:5, 6:10, 11:15 etc. If that is the case:
res1 <- cbind( colMeans(aperm(array(m1[1:n2], dim=c(nrow(m1), n, p1%/%n1)), c(2,1,3))),
rowMeans(m1[,(ncol(m1)-ncol(m1)%%n+1):ncol(m1)]))
which is equal to manually splitting the columns:
res2 <- cbind(rowMeans(m1[,1:5]), rowMeans(m1[,6:10]), rowMeans(m1[,11:15]),
rowMeans(m1[,16:20]), rowMeans(m1[,21:22]))
identical(res1,res2)
#[1] TRUE
colnames(res1) <- c(indx1,indx2)
res1
# Col_1-5 Col_6-10 Col_11-15 Col_16-20 Col_21-22
#[1,] 21 71 21 71 6
#[2,] 22 72 22 72 7
#[3,] 23 73 23 73 8
#[4,] 24 74 24 74 9
#[5,] 25 75 25 75 10
#[6,] 26 76 26 76 11
#[7,] 27 77 27 77 12
#[8,] 28 78 28 78 13
#[9,] 29 79 29 79 14
#[10,] 30 80 30 80 15

R - List of combinations with outer() and expand.grid()

I have a list of prime numbers which I multiply using outer() and upper.tri() to get a unique set of numbers.
primes <- c(2, 3, 5, 7, 11, 13, 17, 19, 23, 29)
m <- outer(primes, primes, "*")
unq <- m[which(upper.tri(m))]
> unq
6 10 15 14 21 35 22 33 55 77 26 39 65 91 143 34 51 85 119 187 221 38 57 95 133 209 247 323 46 69 115 161 253 299 391 437 58 87 145 203 319 377 493 551 667
Each of the original prime numbers represents a set of two numbers:
a2 <- c(1,1)
a3 <- c(1,2)
a5 <- c(2,2)
a7 <- c(1,3)
a11 <- c(1,4)
a13 <- c(2,3)
a17 <- c(2,4)
a19 <- c(3,3)
a23 <- c(3,4)
a29 <- c(4,4)
Combining two such sets of two numbers with expand.grid() produces 4 combinations:
expand.grid(a2,a3)
1 1
1 1
1 2
1 2
So what I would like to do is have a kind of a list of lists, with each prime number having all 4 possible combinations.
I tried something like this, but I am missing some fundamentals here:
outer(a ,a , "expand.grid")
So the result would look something like this for the first product (6):
6 c(11, 11, 12, 12)
I'm not sure I understand correctly, but I hope this helps:
#function to `outer`
fun <- function(x, y)
{
a1 <- get(paste0("a", x))
a2 <- get(paste0("a", y))
res <- apply(expand.grid(a1, a2), 1, paste, collapse = "")
res2 <- paste(res, collapse = ";")
return(res2)
}
#`outer` a vectorized `fun`
m2 <- outer(primes, primes, Vectorize(fun))
#select `upper.tri`
unq2 <- m2[upper.tri(m2)]
#combine to a list
myls <- lapply(as.list(unq2), function(x) as.numeric(unlist(strsplit(x, ";"))))
names(myls) <- unq
myls
#$`6`
#[1] 11 11 12 12
#$`10`
#[1] 12 12 12 12
#$`15`
#[1] 12 22 12 22
#$`14`
#[1] 11 11 13 13
#...
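If the pasted digit codes ever become ambiguous (e.g. once a set member has two digits), a variant that keeps each combination as rows of a data frame instead of pasting could look like this, using the same primes and a2 … a29 objects:
sets <- list(a2, a3, a5, a7, a11, a13, a17, a19, a23, a29)
idx  <- combn(seq_along(primes), 2)        # each column is one pair of primes
myls2 <- lapply(seq_len(ncol(idx)), function(k)
  expand.grid(sets[[idx[1, k]]], sets[[idx[2, k]]]))
names(myls2) <- apply(idx, 2, function(k) primes[k[1]] * primes[k[2]])
myls2[["6"]]
#   Var1 Var2
# 1    1    1
# 2    1    1
# 3    1    2
# 4    1    2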
