I know that a list can contain data frames or matrices.
Here is an example:
df<- cbind(column=c(1,2,3), column=c(2,3,4), column=c(3,4,5), column=c(4,5,6))
col<-list()
col[1]<- list(df[,1:2])
col[2]<- list(df[,2:3])
col[3]<- list(df[,3:4])
The result is:
col
[[1]]
column column
[1,] 1 2
[2,] 2 3
[3,] 3 4
[[2]]
column column
[1,] 2 3
[2,] 3 4
[3,] 4 5
[[3]]
column column
[1,] 3 4
[2,] 4 5
[3,] 5 6
However, when I use a for loop:
col<-list()
for(i in 1:3){
col[i]<- list(df[,i:i+1])
}
The result is:
col
[[1]]
[1] 2 3 4
[[2]]
[1] 3 4 5
[[3]]
[1] 4 5 6
What is the difference between these two approaches?
How can I get the same result using a for loop?
In the for statement you need to add parentheses, like this:
col<-list()
for(i in 1:3){
col[i]<- list(df[,i:(i+1)])
}
In your code, "i:i+1" is parsed as "(i:i)+1" because ":" has higher precedence than "+". Since i:i is just i, the index is the single value i+1 (the same as (i+1):(i+1)), so each iteration selects one column instead of two.
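To make the precedence rule concrete, here is a small sketch that reuses the df from the question. The names a and b are only illustrative.

```r
df <- cbind(column = c(1, 2, 3), column = c(2, 3, 4),
            column = c(3, 4, 5), column = c(4, 5, 6))
i <- 2

a <- i:i + 1    # parsed as (i:i) + 1, a single index
b <- i:(i + 1)  # the intended range of two indices

one_col  <- df[, a]  # one column, dropped to a plain vector
two_cols <- df[, b]  # two columns, still a matrix
```

Here a is 3 and b is 2 3, which is why the loop without parentheses stored vectors rather than two-column matrices.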
tl;dr What is the idiomatic way to identify groups of identical rows in a matrix in R?
Given an n-by-2 matrix where some rows occur more than once,
> mat <- matrix(c(2,5,5,3,4,6,2,5,4,6,4,6), ncol=2, byrow=T)
> mat
[,1] [,2]
[1,] 2 5
[2,] 5 3
[3,] 4 6
[4,] 2 5
[5,] 4 6
[6,] 4 6
I am looking to get the groups of row indices of identical rows. In the example above, rows (1,4) are identical, and so are rows (3,5,6). Finally, there is row (2). I am looking to get these groups, represented in whatever way is idiomatic in R.
The output could be something like this,
> groups <- matrix(c(1,1, 2,2, 3,3, 4,1, 5,3, 6,3), ncol=2, byrow=T)
> groups
[,1] [,2]
[1,] 1 1
[2,] 2 2
[3,] 3 3
[4,] 4 1
[5,] 5 3
[6,] 6 3
where the first column contains the row indices of mat and the second the group index for each row index. Or it could be like this:
> split(groups[,1], groups[,2])
$`1`
[1] 1 4
$`2`
[1] 2
$`3`
[1] 3 5 6
Either will do. I am not sure what is the best way to represent groups in R, and advice on this is also welcome.
For benchmarking purposes, here's a larger dataset:
set.seed(123)
n <- 10000000
mat <- matrix(sample.int(10, 2*n, replace = T), ncol=2)
Paste each row into a single key, then cbind the sequence of row indices with the match between each key and the unique keys:
v1 <- paste(mat[,1], mat[,2])
# or if there are more columns
#v1 <- do.call(paste, as.data.frame(mat))
out <- cbind(seq_len(nrow(mat)), match(v1, unique(v1)))
-output
> out
[,1] [,2]
[1,] 1 1
[2,] 2 2
[3,] 3 3
[4,] 4 1
[5,] 5 3
[6,] 6 3
If we want a list output
split(out[,1], out[,2])
-output
$`1`
[1] 1 4
$`2`
[1] 2
$`3`
[1] 3 5 6
Benchmarks
With the OP's big data
> system.time({
+ v1 <- paste(mat[,1], mat[,2])
+
+ out <- cbind(seq_len(nrow(mat)), match(v1, unique(v1)))
+
+ })
user system elapsed
2.603 0.130 2.706
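The paste/match idea above can be wrapped in a small helper that works for any number of columns. This is only a sketch: the name row_groups and the "\r" separator are assumptions, chosen so that keys built from columns that might contain spaces are less likely to collide.

```r
# Hypothetical helper: group index for each row of a matrix
row_groups <- function(m) {
  # Build one key per row by pasting all columns together
  key <- do.call(paste, c(as.data.frame(m), sep = "\r"))
  # Group number = position of the key among the unique keys
  match(key, unique(key))
}

mat_small <- matrix(c(2, 5, 5, 3, 4, 6, 2, 5, 4, 6, 4, 6),
                    ncol = 2, byrow = TRUE)
g <- row_groups(mat_small)
row_sets <- split(seq_along(g), g)
```

For the small example, g is 1 2 3 1 3 3 and row_sets holds the index sets (1, 4), (2) and (3, 5, 6).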
I have a matrix with 6000 columns, and each column belongs to one of 100 "groups". I need to convert this matrix into a list of 100 smaller matrices. This is a toy example of what I have:
mat = cbind(c(2,2,2),c(3,3,3),c(4,4,4),c(1,1,1))
colnames(mat) = c("2018.3 1","2018.3 2","2019.1 1","2019.2 2")
The "group" is identified by the last part of each column name, so here there are 2 groups. The result I need would look like:
list(cbind(c(2,2,2),c(4,4,4)),cbind(c(3,3,3),c(1,1,1)))
I think it should be something like this:
lapply(do.call(cbind,sapply(something here to find the columns in each group)))
but I haven't figured out exactly how to do it.
#Obtain the last part of each column name
groups = sapply(strsplit(x = colnames(mat), split = " "), function(x) x[2])
#Go through each unique column name and extract the corresponding columns
lapply(unique(groups), function(x) mat[,which(groups == x)])
#[[1]]
# 2018.3 1 2019.1 1
#[1,] 2 4
#[2,] 2 4
#[3,] 2 4
#[[2]]
# 2018.3 2 2019.2 2
#[1,] 3 1
#[2,] 3 1
#[3,] 3 1
OR
lapply(split(1:NCOL(mat), sapply(strsplit(x = colnames(mat), split = " "),
function(x) x[2])), function(i) mat[,i])
#$`1`
# 2018.3 1 2019.1 1
#[1,] 2 4
#[2,] 2 4
#[3,] 2 4
#$`2`
# 2018.3 2 2019.2 2
#[1,] 3 1
#[2,] 3 1
#[3,] 3 1
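One caveat with mat[, i] is that a group with a single column is dropped to a plain vector. A hedged variant, assuming the group label is whatever follows the last space in the column name, keeps every piece a matrix with drop = FALSE. The name by_group is only illustrative.

```r
mat <- cbind(c(2, 2, 2), c(3, 3, 3), c(4, 4, 4), c(1, 1, 1))
colnames(mat) <- c("2018.3 1", "2018.3 2", "2019.1 1", "2019.2 2")

# Strip everything up to and including the last space
groups <- sub("^.* ", "", colnames(mat))

# Split the column indices by group, then subset without dropping dimensions
by_group <- lapply(split(seq_len(ncol(mat)), groups),
                   function(i) mat[, i, drop = FALSE])
```

With 6000 columns in 100 groups the same two lines apply unchanged, since nothing depends on the number of groups.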
This is my problem:
I have a grid (see plot below), and I need to get and store in a list the coordinates of each vertex of each block (cell). The order of blocks that I need is '1-1', ... '4-1', '1-2', ... '4-2'. To keep it simple, I'm just working with the indices for now.
Based on two vectors with the common East and North coordinates, I've written a little function that partially produces the output I need. It skips the cells '1-2' and '2-2' (see output below). I can't see exactly where the error is, but I suspect the issue is in my nested for loop. (There are many questions on for loops, but none helped me with my problem.)
Any help will be appreciated, and I apologise if this is too basic to be asked here.
vectors:
x.breaks <- c(191789.1, 291789.1, 391789.1)
y.breaks <- c(5172287, 5272287, 5372287, 5472287, 5572287)
Function:
getting_vertices <- function(x.breaks, y.breaks){
xs <- list()
ys <- list()
polys <- list()
for(i in 1 : (length(x.breaks)-1)){
xs[[i]] <- c(i, i+1 , i+1, i, i)
}
for(j in 1 : (length(y.breaks)-1)){
ys[[j]] <- c(j, j, j+1, j+1, j)
}
for(v in 1 : length(sapply(ys, length)) ){
for(k in 1: length(sapply(xs, length))){
polys[[v*k]] <- cbind(xs[[k]], ys[[v]])
}
}
return(polys)
}
getting_vertices(x.breaks, y.breaks)
Output (this is partially correct):
[[1]]
[,1] [,2]
[1,] 1 1
[2,] 2 1
[3,] 2 2
[4,] 1 2
[5,] 1 1
[[2]]
[,1] [,2]
[1,] 1 2
[2,] 2 2
[3,] 2 3
[4,] 1 3
[5,] 1 2
[[3]]
[,1] [,2]
[1,] 1 3
[2,] 2 3
[3,] 2 4
[4,] 1 4
[5,] 1 3
[[4]]
[,1] [,2]
[1,] 1 4
[2,] 2 4
[3,] 2 5
[4,] 1 5
[5,] 1 4
[[5]]
NULL
[[6]]
[,1] [,2]
[1,] 2 3
[2,] 3 3
[3,] 3 4
[4,] 2 4
[5,] 2 3
[[7]]
NULL
[[8]]
[,1] [,2]
[1,] 2 4
[2,] 3 4
[3,] 3 5
[4,] 2 5
[5,] 2 4
The logic behind the line polys[[v*k]] <- ... is incorrect: the product v*k is not unique for each (v, k) pair. For example, v=2, k=1 will overwrite v=1, k=2. There are no combinations of v and k whose product is 5 or 7, hence these entries are empty.
I expect that you meant to write something like:
polys[[v+(k-1)*(length(ys))]] <- ...
or
polys[[k+(v-1)*(length(xs))]] <- ...
depending on the order that you want your results in.
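A minimal sketch of the function with the first indexing scheme, assuming the first number in the '1-1', ... '4-1' labels is the y (north) index, so that v runs fastest. The names nx and ny are introduced here for illustration.

```r
getting_vertices <- function(x.breaks, y.breaks) {
  nx <- length(x.breaks) - 1  # number of cells along x
  ny <- length(y.breaks) - 1  # number of cells along y
  polys <- vector("list", nx * ny)
  for (k in seq_len(nx)) {
    for (v in seq_len(ny)) {
      # Unique slot per (v, k) pair: v varies fastest within each k
      polys[[v + (k - 1) * ny]] <- cbind(c(k, k + 1, k + 1, k, k),
                                         c(v, v, v + 1, v + 1, v))
    }
  }
  polys
}

x.breaks <- c(191789.1, 291789.1, 391789.1)
y.breaks <- c(5172287, 5272287, 5372287, 5472287, 5572287)
polys <- getting_vertices(x.breaks, y.breaks)
```

Every one of the 8 slots is now filled exactly once, so no entry is NULL and none is overwritten.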
I am confused about the output of the replicate function in R. I am trying to use it in two different ways that (in my mind) should both give a matrix as output.
If I use
replicate(5, seq(1,5,1))
I get a 5x5 matrix:
[,1] [,2] [,3] [,4] [,5]
[1,] 1 1 1 1 1
[2,] 2 2 2 2 2
[3,] 3 3 3 3 3
[4,] 4 4 4 4 4
[5,] 5 5 5 5 5
...and that's OK, I get that.
But if I instead use:
replicate(5, for(i in 1:5){print(i)})
I get the following:
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[[1]]
NULL
[[2]]
NULL
[[3]]
NULL
[[4]]
NULL
[[5]]
NULL
Can anyone explain why this happens?
thanks :)
A for loop returns NULL. So in the second case, the replicate function is executing for(i in 1:5){print(i)} five times, which is why you see all those numbers printed out.
Then it is putting the return values in a list, so the return value of the replicate call is a list of five NULLs, which gets printed out. Executing
x<-replicate(5, for(i in 1:5){print(i)})
x
should clarify.
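A small sketch of the same point without the printing: the value of a for loop is NULL, so replicate has nothing to simplify into a matrix unless the expression ends with an actual value. The names loop_value, x and y are only illustrative.

```r
# The value of a for loop itself is NULL
loop_value <- for (i in 1:3) i

# Each replication evaluates to NULL, so the result stays a list
x <- replicate(3, for (i in 1:2) NULL)

# Ending the expression with a vector lets replicate build a matrix
y <- replicate(3, {
  d <- numeric(2)
  for (i in 1:2) d[i] <- i
  d
})
```

Here x is a list of three NULLs, while y is a 2x3 matrix whose columns are all 1, 2.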
As @mrip says, a for loop returns NULL, so you need to assign to an object within the loop and return that object to replicate so it can be output. However, mrip's code still results in NULLs from each iteration of the replicate evaluation.
You also need to assign the output of replicate to a name, so it doesn't just evaporate, er, get garbage collected. And you need to add d as a separate final statement, so that the whole expression inside the curly braces evaluates to 'something' rather than NULL.
d <- numeric(5); res <- replicate(5, {
for(i in 1:5){d[i] <- print(i)} ; d}
)
[1] 1
[1] 2
snipped
[1] 4
[1] 5
> res
[,1] [,2] [,3] [,4] [,5]
[1,] 1 1 1 1 1
[2,] 2 2 2 2 2
[3,] 3 3 3 3 3
[4,] 4 4 4 4 4
[5,] 5 5 5 5 5
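In the second call each for loop evaluates to NULL, so replicate gives a list back, while the seq() call gives a vector back each time, which replicate can bind into a matrix. This should give you the same result as the seq() version using a for loop: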
foo <- function(){
b = list()
for(i in 1:5) b[i] <- i
do.call(c, b)
}
replicate(5, foo())
I am looking for a more versatile way to get from a data.frame to a multidimensional array.
I would like to be able to create as many dimensions as needed from as many variables in the data frame as desired.
Currently, the method has to be tailored to each data.frame and requires subsetting to form a vector.
I would love something along the lines of the melt/cast methods from the reshape2 package.
data<-data.frame(coord.name=rep(1:10, 2),
x=rnorm(20),
y=rnorm(20),
ID=rep(c("A","B"), each=10))
data.array<-array(dim=c(10, 2, length(unique(data$ID))))
for(i in 1:length(unique(data$ID))){
data.array[,1,i]<-data[data$ID==unique(data$ID)[i],"x"]
data.array[,2,i]<-data[data$ID==unique(data$ID)[i],"y"]
}
data.array
, , 1
[,1] [,2]
[1,] 1 1
[2,] 3 3
[3,] 5 5
[4,] 7 7
[5,] 9 9
[6,] 1 1
[7,] 3 3
[8,] 5 5
[9,] 7 7
[10,] 9 9
, , 2
[,1] [,2]
[1,] 2 2
[2,] 4 4
[3,] 6 6
[4,] 8 8
[5,] 10 10
[6,] 2 2
[7,] 4 4
[8,] 6 6
[9,] 8 8
[10,] 10 10
You may have had trouble applying the reshape2 functions for a somewhat subtle reason. The difficulty was that your data.frame has no column that can be used to direct how you want to arrange the elements along the first dimension of an output array.
Below, I explicitly add such a column, calling it "row". With it in place, you can use the expressive acast() or dcast() functions to reshape the data in any way you choose.
library(reshape2)
# Use this or some other method to add a column of row indices.
data$row <- with(data, ave(ID==ID, ID, FUN = cumsum))
m <- melt(data, id.vars = c("row", "ID"))
a <- acast(m, row ~ variable ~ ID)
a[1:3, , ]
# , , A
#
# x y
# 1 1 1
# 2 3 3
# 3 5 5
#
# , , B
#
# x y
# 1 2 2
# 2 4 4
# 3 6 6
I think this is right:
array(unlist(lapply(split(data, data$ID), function(x) as.matrix(x[ , c("x", "y")]))), c(10, 2, 2))
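The split/unlist one-liner above can be generalised a little so that the dimensions are not hard-coded. This is a sketch under the assumption that every group has the same number of rows; the function name df_to_array and its arguments are hypothetical.

```r
# Hypothetical helper: rows x value columns x groups
df_to_array <- function(df, value_cols, group_col) {
  pieces <- split(df[value_cols], df[[group_col]])
  n_rows <- unique(vapply(pieces, nrow, integer(1)))
  # Assumption: all groups contribute the same number of rows
  stopifnot(length(n_rows) == 1L)
  array(unlist(lapply(pieces, as.matrix), use.names = FALSE),
        dim = c(n_rows, length(value_cols), length(pieces)),
        dimnames = list(NULL, value_cols, names(pieces)))
}

set.seed(1)
data <- data.frame(coord.name = rep(1:10, 2),
                   x = rnorm(20),
                   y = rnorm(20),
                   ID = rep(c("A", "B"), each = 10))
arr <- df_to_array(data, c("x", "y"), "ID")
```

Because the result carries dimnames, a slice such as arr[, "x", "A"] can be pulled out by name rather than by position.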