Extract and organize data from subsetted lists on R - r

I have spent the last few days trying to solve this by myself using several different sources of information, including other questions here on Stack Overflow, but failed. I'm a complete beginner, so that's probably why I'm struggling so much with this.
I created the dummy data below to illustrate what my original data looks like.
list1<-list(path = ".../folder1/folder2/Country_State_Species_Individual1.png",
matrix1 = cbind(1:3, 1:9),
matrix2 = cbind(1:3, 1:9),
matrix3 = cbind(1:3, 1:9))
list2<-list(path = ".../folder1/folder2/Country_State_Species_Individual2.png",
matrix1 = cbind(1:3, 1:9),
matrix2 = cbind(1:3, 1:9),
matrix3 = cbind(1:3, 1:9))
list3<-list(path = ".../folder1/folder2/Country_State_Species_Individual3.png",
matrix1 = cbind(1:3, 1:9),
matrix2 = cbind(1:3, 1:9),
matrix3 = cbind(1:3, 1:9))
general_list <- list(list1, list2, list3)
As you can see, it is a big list (general_list) composed of small lists (list1, list2, list3) that are identical in structure.
My initial goal can be described in two steps:
1 – Sample 6 random rows from each matrix2 and save each of these outputs in a new object.
2 – Rename these objects using the information contained in the original file name stored in the path.
I want to rename the extracted matrices this way because I need to be able to sort the matrices by the variables expressed in the file names (Country, State and especially Individual). But maybe there is a more efficient or practical way to do this.
Would the most advisable way to store these new objects be in a new list?
I would also be happy to receive any suggestions on how to achieve my initial goal and how to proceed in order to optimize the storage of these new objects (having in mind that they will be used in some analysis after everything is done).
Best regards!

We loop over 'general_list', extract matrix2 from each element, sample 6 of its rows without replacement, collect the results in a new list ('out'), and name each element of that list with the basename of the 'path' element (with the file extension removed).
out <- lapply(general_list, function(x) {
x1 <- x$matrix2
x1[sample(nrow(x1), 6, replace = FALSE),] })
names(out) <- sapply(general_list,
function(x) tools::file_path_sans_ext(basename(x$path)))
out
#$Country_State_Species_Individual1
# [,1] [,2]
#[1,] 3 9
#[2,] 2 2
#[3,] 1 7
#[4,] 1 4
#[5,] 3 6
#[6,] 2 8
#$Country_State_Species_Individual2
# [,1] [,2]
#[1,] 3 3
#[2,] 1 7
#[3,] 3 9
#[4,] 2 2
#[5,] 3 6
#[6,] 1 1
#$Country_State_Species_Individual3
# [,1] [,2]
#[1,] 3 3
#[2,] 2 2
#[3,] 1 4
#[4,] 2 5
#[5,] 1 7
#[6,] 3 6
Or using tidyverse
library(dplyr)
library(purrr)
out <- map(general_list, ~ .x %>%
pluck('matrix2') %>%
as.data.frame %>%
sample_n(6) %>%
as.matrix)
names(out) <- map_chr(general_list, ~
tools::file_path_sans_ext(basename(.x$path)))
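Because the element names carry the metadata, one way to sort or filter the resulting list is to split the names on "_" and build a small lookup table. The following is a minimal sketch, assuming every file name has exactly four underscore-separated fields; the country, state and species values are made up for illustration and do not come from the question.

```r
# Hypothetical sampled matrices, named following the
# Country_State_Species_Individual pattern
out <- list(
  Brazil_SP_SpeciesA_Individual2    = matrix(1:12, ncol = 2),
  Argentina_BA_SpeciesB_Individual1 = matrix(1:12, ncol = 2),
  Brazil_RJ_SpeciesA_Individual3    = matrix(1:12, ncol = 2)
)

# Split each name on "_" and stack the pieces into a character matrix
parts <- do.call(rbind, strsplit(names(out), "_", fixed = TRUE))

# One row of metadata per list element
meta <- data.frame(
  name       = names(out),
  country    = parts[, 1],
  state      = parts[, 2],
  species    = parts[, 3],
  individual = parts[, 4],
  stringsAsFactors = FALSE
)

# Sort by country, then state, and reorder the list with the same index
ord <- order(meta$country, meta$state)
out_sorted <- out[ord]

# Keep only the matrices from one country
out_brazil <- out[meta$country == "Brazil"]
```

Keeping the matrices in one named list and the metadata in a separate table means the list can be reordered or subset without creating one object per individual.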

Related

how to select neighbouring elements in a vector and put them into a list or matrix in R

I have a problem about how to select neighboring elements in a vector and put them into a list or matrix in R.
For example:
vl <- c(1,2,3,4,5)
I want to get the results like this:
1,2
2,3
3,4
4,5
The results can be in a list or matrix
I know we can use a loop to get the results, like this:
pl <- list()
k <- 0
for (p in seq_len(length(vl) - 1)) {
k <- k + 1
pl[[k]] <- sort(c(vl[p], vl[p + 1]))
}
But I have a large dataset, and using a loop is relatively slow.
Is there a function that gets the results directly?
Many thanks!
We can use head and tail to ignore the last and first element respectively.
data.frame(a = head(vl, -1), b = tail(vl, -1))
# a b
#1 1 2
#2 2 3
#3 3 4
#4 4 5
EDIT
If the data needs to be sorted we can use apply row-wise to sort it.
vl <- c(2,5,3,1,6,4)
t(apply(data.frame(a = head(vl, -1), b = tail(vl, -1)), 1, sort))
# [,1] [,2]
#[1,] 2 5
#[2,] 3 5
#[3,] 1 3
#[4,] 1 6
#[5,] 4 6
You can do:
matrix(c(vl[-length(vl)], vl[-1]), ncol = 2)
[,1] [,2]
[1,] 1 2
[2,] 2 3
[3,] 3 4
[4,] 4 5
If you want to sort two columns rowwise, then you can use pmin() and pmax() which will be faster than using apply(x, 1, sort) with a large number of rows.
sapply(c(pmin, pmax), do.call, data.frame(vl[-length(vl)], vl[-1]))
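The pmin()/pmax() idea can also be written out step by step, which may be easier to read than the sapply/do.call one-liner. This is a small sketch using the unsorted example vector from the other answers; the names left, right and sorted_pairs are only illustrative.

```r
vl <- c(2, 5, 3, 1, 6, 4)

left  <- vl[-length(vl)]  # every element except the last
right <- vl[-1]           # every element except the first

# pmin/pmax compare the two vectors element by element, so each row
# ends up with the smaller neighbour first and the larger one second
sorted_pairs <- cbind(pmin(left, right), pmax(left, right))
sorted_pairs
#      [,1] [,2]
# [1,]    2    5
# [2,]    3    5
# [3,]    1    3
# [4,]    1    6
# [5,]    4    6
```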
The problem can also be solved by applying the sort() function on a rolling window of length 2:
vl <- c(2,5,3,1,6,4)
zoo::rollapply(vl, 2L, sort)
which returns a matrix as requested:
[,1] [,2]
[1,] 2 5
[2,] 3 5
[3,] 1 3
[4,] 1 6
[5,] 4 6
Note that this uses the modified input vector vl, which the OP posted in the comments.
Besides zoo, there are also other packages which offer rollapply functions, e.g.,
t(rowr::rollApply(vl, sort, 2L, 2L))

Return a dataframe of averages from a list of dataframes

I have a list of 22 data frames, each with 49 columns and 497 rows.
I need to produce an average/mean data frame from these 22.
I have already tried the following, where myfiles2 is the list of data frames:
ans1 = aaply(laply(myfiles2, as.matrix), c(2, 3), mean)
ans2 <- do.call("mean", myfiles2)
ans3 <- lapply(myfiles2, function (x) lapply(x, mean, na.rm=TRUE))
ans4 <- Reduce("+", myfiles2)/length(myfiles2)
ans5 <- lapply(myfiles2, mean)
The list of dataframes was created using
myfiles2 = lapply(filesToProcess, read.csv, skip=2, colClasses=colClasses)
Taking the first value in each dataframe manually and calculating the mean with mean() works.
Trying to use mean or calculating it as shown above across the list of dataframes gives an incorrect result.
The result I'm looking for is a [49X497] dataframe with each location containing the mean calculated from the same location in the 22 dataframes.
All values are 10 significant figures with 4 decimal places.
You may use simplify2array() in base R.
Example
list1
# [[1]]
# [,1] [,2] [,3] [,4]
# [1,] 1 9 8 3
# [2,] 5 2 6 11
# [3,] 12 4 10 7
#
# [[2]]
# [,1] [,2] [,3] [,4]
# [1,] 4 12 3 6
# [2,] 9 2 1 7
# [3,] 5 8 10 11
#
# [[3]]
# [,1] [,2] [,3] [,4]
# [1,] 5 8 1 12
# [2,] 4 3 7 6
# [3,] 2 10 11 9
t(apply(simplify2array(list1), 1:2, mean))
# [,1] [,2] [,3]
# [1,] 3.333333 6.000000 6.333333
# [2,] 9.666667 2.333333 7.333333
# [3,] 4.000000 4.666667 10.333333
# [4,] 7.000000 8.000000 9.000000
Data
set.seed(42)
list1 <- replicate(3, matrix(sample(1:12), 3, 4), simplify=FALSE)
Use the abind package to create a 3D array from your list of data.frames;
library(abind)
myfiles2 <- abind(myfiles2, along = 3)
or in Base R:
myfiles2 <- simplify2array(myfiles2)
Then, use apply() to take the mean for each cell across all 22 data.frames:
apply(myfiles2, 1:2, mean)
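As a small self-contained illustration of the cell-wise mean, the sketch below uses three made-up 2x2 data frames in place of the 22 real ones. It assumes all columns are numeric and converts each data frame with as.matrix() before stacking, so that simplify2array() receives matrices; na.rm = TRUE is added so that a single missing cell does not turn the whole mean into NA.

```r
# Hypothetical stand-in for the list of data frames
dfs <- list(
  data.frame(a = c(1, 2), b = c(3, 4)),
  data.frame(a = c(3, 4), b = c(5, NA)),
  data.frame(a = c(5, 6), b = c(7, 8))
)

# Stack the data frames into a 2 x 2 x 3 array
arr <- simplify2array(lapply(dfs, as.matrix))

# Mean of each cell across the third dimension, ignoring NAs
cell_means <- apply(arr, 1:2, mean, na.rm = TRUE)
cell_means
#      a b
# [1,] 3 5
# [2,] 4 6
```

Note that Reduce("+", dfs) / length(dfs) would return NA for the cell containing the missing value, because addition propagates NA.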
After the hint from @tom above, the final solution I arrived at was to change the list of data frames into a single data frame containing all the data and use the tidyverse to process it.
A few small tidy-ups were needed:
An errant character column from the origin of the data
A column with data in both upper and lower case
Avoiding the character columns in the mean calculation
Then putting the character columns and the mean data frame back together to get it back in the correct order.
So...
Change the format to a single data frame and fix the non-numeric column
myfiles3 <- myfiles2 %>%
bind_rows() %>%
transform(EdgeStepL2 = as.numeric(EdgeStepL2))
ensure the section names are in uppercase to be consistent
myfiles3$Section <- str_to_upper(myfiles3$Section)
calculate the mean of each cell grouped by common values.
myfiles4 <- myfiles3 %>% group_by(Section, Chainage) %>%
summarise_at(vars("East":"Surf.Det"), ~ mean(., na.rm = TRUE))
myfiles5 <- data.frame(myfiles2[[1]][1:2])
myfiles6 <- left_join(myfiles5, myfiles4)
This is not the simple solution I had hoped for, but here is some advice for the next person who tries this:
Look for NAs everywhere in the data.
Make sure that all the columns you run the mean (or any other function) on contain values you can calculate with.

How to use cbind on matrices in one list and place them in another (r)

I'm trying to join matrices stored in nested lists and place them in a new list. For example, if I have a list of fruit, I would like to take various matrices stored under Kiwi, and join them together as one matrix in a new list.
This will generate something that looks like my data:
#Define some important things
Fruits = c('Mango', 'Kiwi')
Attr = c('Size', 'Shape')
#generate empty lists
MyFruitList <- lapply(Fruits, function(q) {
EmptySublist <- setNames(vector("list", length(Attr)), Attr)
})
names(MyFruitList) <- Fruits
#Full lists with example matrices
MyFruitList[['Mango']][['Size']] <- matrix(c(3,5,7,2), nrow=2, ncol=2)
MyFruitList[['Mango']][['Shape']] <- matrix(c(3,6,7,5), nrow=2, ncol=2)
MyFruitList[['Kiwi']][['Size']] <- matrix(c(1,3,4,2), nrow=2, ncol=2)
MyFruitList[['Kiwi']][['Shape']] <- matrix(c(2,4,5,1), nrow=2, ncol=2)
And here is what I have been trying to use to move the matrices stored under Kiwi and Mango into a new list.
#Obviously this doesn't actually work
MyFruitListAsRows <- lapply(Fruits, function(i) {
MyFruitListAsRows <- matrix(cbind(paste0(MyFruitList[i])))
})
names(MyFruitListAsRows) <- paste0(Fruits, "Row")
Ideally I should end up with a list called MyFruitListAsRows which contains two 2-by-4 matrices, one for Kiwi and one for Mango, each containing the respective Size and Shape data from the original MyFruitList list.
e.g. For Mango it would look like this:
[,1] [,2] [,3] [,4]
[1,] 3 7 3 7
[2,] 5 2 6 5
(Sorry that the numbers are overly similar, that was not well planned and might make it hard at first to recognise where I'd like my numbers to go)
Having been constructed from this:
$Size
[,1] [,2]
[1,] 3 7
[2,] 5 2
$Shape
[,1] [,2]
[1,] 3 7
[2,] 6 5
Edit: I have tried to adapt the advice of Ronak Shah and done the following:
library(tidyverse)
MyFruitListAsRows <- map(MyFruitList[i], bind_cols)
but running either,
MyFruitListAsRows[['KiwiRow']]
MyFruitListAsRows[['MangoRow']]
produces:
Error in x[i, , drop = FALSE] : subscript out of bounds
If I try to get RStudio to show me what's in either of my new lists in a window, RStudio encounters a fatal error and crashes.
We can use base R to loop over each element of MyFruitList and cbind its matrices with do.call:
lapply(MyFruitList, function(x) do.call(cbind, x))
#$Mango
# [,1] [,2] [,3] [,4]
#[1,] 3 7 3 7
#[2,] 5 2 6 5
#$Kiwi
# [,1] [,2] [,3] [,4]
#[1,] 1 4 2 5
#[2,] 3 2 4 1
You can also use cbind.data.frame here.
Using tidyverse we can map over each list and then cbind
library(tidyverse)
map(MyFruitList, cbind.data.frame)
#$Mango
# Size.1 Size.2 Shape.1 Shape.2
#1 3 7 3 7
#2 5 2 6 5
#$Kiwi
# Size.1 Size.2 Shape.1 Shape.2
#1 1 4 2 5
#2 3 2 4 1
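The question also asked for result names ending in "Row". A minimal sketch of the base R approach with that renaming step added is shown below; it rebuilds the example list directly rather than through the empty-list scaffolding, which is a simplification and not part of the original answer.

```r
# Rebuild the example data from the question
MyFruitList <- list(
  Mango = list(Size  = matrix(c(3, 5, 7, 2), nrow = 2, ncol = 2),
               Shape = matrix(c(3, 6, 7, 5), nrow = 2, ncol = 2)),
  Kiwi  = list(Size  = matrix(c(1, 3, 4, 2), nrow = 2, ncol = 2),
               Shape = matrix(c(2, 4, 5, 1), nrow = 2, ncol = 2))
)

# Column-bind the matrices stored under each fruit
MyFruitListAsRows <- lapply(MyFruitList, function(x) do.call(cbind, x))

# Append "Row" to each element name, as in the question
names(MyFruitListAsRows) <- paste0(names(MyFruitListAsRows), "Row")

MyFruitListAsRows[["MangoRow"]]
#      [,1] [,2] [,3] [,4]
# [1,]    3    7    3    7
# [2,]    5    2    6    5
```

Iterating over MyFruitList itself, rather than over the character vector Fruits, lets lapply carry the element names through so they only need the suffix added.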

Reshape each row of a data.frame to be a matrix in R

I am working with the hand-written zip codes dataset. I have loaded the dataset like this:
digits <- read.table("./zip.train",
quote = "",
comment.char = "",
stringsAsFactors = FALSE)
Then I get only the ones:
ones <- digits[digits$V1 == 1, -1]
Right now, ones has 442 rows and 256 columns. I need to transform each row of ones into a 16x16 matrix. I think what I am looking for is a list of 16x16 matrices like the ones in this question:
How to create a list of matrix in R
But I tried that with my data and it did not work.
At first I tried ones <- apply(ones, 1, matrix, nrow = 16, ncol = 16), but it does not work the way I expected. I also tried lapply with no luck.
An alternative is to just change the dims of your matrix.
Consider the following matrix "M":
M <- matrix(1:12, ncol = 4)
M
# [,1] [,2] [,3] [,4]
# [1,] 1 4 7 10
# [2,] 2 5 8 11
# [3,] 3 6 9 12
We are looking to create a three dimensional array from this, so you can specify the dimensions as "row", "column", "third-dimension". However, since the matrix is constructed by column, you first need to transpose it before changing the dimensions.
`dim<-`(t(M), c(2, 2, nrow(M)))
# , , 1
#
# [,1] [,2]
# [1,] 1 7
# [2,] 4 10
#
# , , 2
#
# [,1] [,2]
# [1,] 2 8
# [2,] 5 11
#
# , , 3
#
# [,1] [,2]
# [1,] 3 9
# [2,] 6 12
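Though there are probably simpler ways, you can try lapply. A single row of a data frame is itself a one-row data frame, so unlist it into a plain vector before building the matrix:
ones_matrix <- lapply(seq_len(nrow(ones)), function(i) matrix(unlist(ones[i, ]), nrow = 16))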
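To check that the row-wise lapply approach and the dim-changing approach agree, here is a small sketch on a made-up 3-row, 4-column data frame reshaped into 2x2 matrices. The data frame ones_demo is hypothetical and only stands in for the 442 x 256 digit data; unlist() is used because a single data-frame row is a list rather than a numeric vector.

```r
# Hypothetical stand-in for `ones`: 3 rows, 4 columns
ones_demo <- data.frame(V2 = c(1, 5, 9),
                        V3 = c(2, 6, 10),
                        V4 = c(3, 7, 11),
                        V5 = c(4, 8, 12))

# Approach 1: one matrix per row, collected in a list
mats <- lapply(seq_len(nrow(ones_demo)),
               function(i) matrix(unlist(ones_demo[i, ]), nrow = 2))

# Approach 2: transpose, then reinterpret the values as a 3D array
arr <- array(t(as.matrix(ones_demo)), dim = c(2, 2, nrow(ones_demo)))

mats[[1]]
#      [,1] [,2]
# [1,]    1    3
# [2,]    2    4
```

Both approaches fill each small matrix column by column from the values of one original row, so mats[[i]] and arr[, , i] hold the same numbers.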

Find combinations of objects in R

I have the following example code:
library(caTools)
sample1 = rnorm(20)
sample2 = rnorm(30)
sample3 = rnorm(40)
# could be more samples
args = list(sample1, sample2, sample3) # could be more
> combs(c(args), k=2)
[,1] [,2]
[1,] Numeric,20 Numeric,30
[2,] Numeric,20 Numeric,40
[3,] Numeric,30 Numeric,40
However, this is not what is desired. I would like to feed combs input that should give the same as:
> combs(c("sample1","sample2", "sample3"),k=2)
[,1] [,2]
[1,] "sample1" "sample2"
[2,] "sample1" "sample3"
[3,] "sample2" "sample3"
and from there I would want to use get to extract the vectors for each sampleX object by row.
How can I do this without hardcoding "sample1", "sample2", etc., so that it works for as many samples as are fed to it?
From library(gtools):
combinations(3,2,c("sample1","sample2", "sample3"))
Result:
[,1] [,2]
[1,] "sample1" "sample2"
[2,] "sample1" "sample3"
[3,] "sample2" "sample3"
The same result can be obtained if those objects are named elements of a list:
tmp <- list(sample1=1:3,sample2=4:6,sample3=7:9)
combinations(3,2,names(tmp))
Or, if those objects are all in an environment:
tmp <- new.env()
tmp$sample1 <- 1:3
tmp$sample2 <- 4:6
tmp$sample3 <- 7:9
combinations(3,2,objects(tmp))
How about this? I use simplified data as an illustrative example.
Edit
Thanks to @GSee for recommending two improvements to this approach [see comment].
This is not something I'd be keen to do, but we can use ls with its pattern argument to return the names of all objects in your global environment that match the pattern, i.e. all objects whose names start with "sample" followed by digits (so be careful), and then collect them in a list using mget.
We then get the combinations of list elements using combn and use an anonymous function to combine all elements of list pairs using expand.grid. If you want this as a two column data.frame you can use do.call and rbind the returned list together:
sample1 <- 1:2
sample2 <- 3:4
sample3 <- 5:6
args <-mget( ls( pattern = "^sample\\d+") , env = .GlobalEnv )
res <- combn( length(args) , 2 , FUN = function(x) expand.grid(args[[x[1]]] , args[[x[2]]]) , simplify = FALSE )
do.call( rbind , res )
Var1 Var2
1 1 3
2 2 3
3 1 4
4 2 4
5 1 5
6 2 5
7 1 6
8 2 6
9 3 5
10 4 5
11 3 6
12 4 6
Here is an approach
# put samples in separate structure, for instance a list
samples <- list(s1=rnorm(20), s2=rnorm(30), s3=rnorm(40))
cmb <- t(combn(names(samples),m=2))
apply(cmb,1,FUN=function(x) list(samples[[x[[1]]]], samples[[x[[2]]]]))
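Once the samples live in a named list, combn() can also apply a function to each pair of names directly. The sketch below uses small fixed vectors instead of rnorm() so the result is reproducible; the per-pair statistic (difference in means) is only an illustrative choice and is not something the question asked for.

```r
# Hypothetical samples stored in a named list
samples <- list(s1 = c(1, 3), s2 = c(4, 6), s3 = c(10, 12))

# All pairs of names, one character vector of length 2 per pair
pairs <- combn(names(samples), 2, simplify = FALSE)

# Look up both vectors by name and compute something for each pair
mean_diff <- vapply(pairs,
                    function(p) mean(samples[[p[1]]]) - mean(samples[[p[2]]]),
                    numeric(1))
names(mean_diff) <- vapply(pairs, paste, character(1), collapse = "-")
mean_diff
# s1-s2 s1-s3 s2-s3
#    -3    -9    -6
```

Because the pairs are generated from names(samples), adding more samples to the list requires no other change.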
