Select specific columns in an R nested list - r

Suppose I have a list of data frames, like this:
M1 <- data.frame(matrix(1:4, nrow = 2, ncol = 2))
M2 <- data.frame(matrix(1:9, nrow = 3, ncol = 3))
M3 <- data.frame(matrix(1:4, nrow = 2, ncol = 2))
mlist <- list(M1, M2, M3)
Now I want to select the X1 column from every data frame. I tried:
M.X1 <- mlist$X1
but it returned NULL:
> mlist$X1
NULL
I don't want to use a for loop to extract X1 from each data frame. Is there a better way to do this? And what if I want to extract column X3, which does not exist in every data frame?

Normally you can use lapply as below:
lapply(mlist, function(x) x$X2)
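The second argument is a function, defined inline, that lapply calls on each element of mlist. mlist$X1 returns NULL because $ looks for an element named X1 in the list itself, not inside the data frames it holds.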
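For the X3 case, where the column is missing from some data frames, here is a minimal sketch. The names x1, x3 and x3_present are only illustrative, and returning NULL for a missing column is one possible convention, not the only one.

```r
M1 <- data.frame(matrix(1:4, nrow = 2, ncol = 2))
M2 <- data.frame(matrix(1:9, nrow = 3, ncol = 3))
M3 <- data.frame(matrix(1:4, nrow = 2, ncol = 2))
mlist <- list(M1, M2, M3)

# Extract X1 from every data frame; `[[` is used as the function
x1 <- lapply(mlist, `[[`, "X1")

# Extract X3 where it exists, and return NULL where it does not
x3 <- lapply(mlist, function(df) if ("X3" %in% names(df)) df[["X3"]] else NULL)

# Optionally keep only the data frames that actually had an X3 column
x3_present <- Filter(Negate(is.null), x3)
```

The result stays a list, so each element can have a different length.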

Related

How to create a matrix from sub-elements of a list (in R)?

To put it simply, I have a list of DFMs (LD1) created with the quanteda package. Each DFM holds different texts of different lengths.
Now I want to calculate and compare lexical diversity for each text within each DFM and across DFMs.
lex.div <-lapply(LD1, function(x) {textstat_lexdiv(x,measure = "all")})
This leaves me with a list of S3 objects, each of which holds the different lexical diversity measures as components.
lex.div[[1]]$TTR
[1] 0.2940000 0.2285000 0.2110000 0.1912500 0.1802000 0.1671667 0.1531429 0.1483750 0.1392222
[10] 0.1269000
lex.div[[2]]$TTR
[1] 0.3840000 0.2895000 0.2273333 0.2047500 0.1922000 0.1808333 0.1677143 0.1616250 0.1530000
[10] 0.1439000 0.1352727 0.1279167 0.1197692 0.1125000 0.1069333
Here comes the problem: I need all the TTR values in one matrix. I want lex.div[[1]]$TTR to be the first row of the matrix, lex.div[[2]]$TTR to be the second, and so on. Note that lex.div[[1]]$TTR and lex.div[[2]]$TTR have different lengths.
Here is what I've done so far:
m1 <-matrix(lex.div[[1]]$TTR, nrow = 1, ncol = length(lex.div[[1]]$TTR))
m.sup <- if(ncol(m1) < 30) {mat.to.add = matrix(NA, nrow = nrow(m1), ncol = 30 - ncol(m1))}
m1 <-cbind(m1, m.sup)
m2 <-matrix(lex.div[[2]]$TTR, nrow = 1, ncol = length(lex.div[[2]]$TTR))
m.sup <- if(ncol(m2) < 30) {mat.to.add = matrix(NA, nrow = nrow(m2), ncol = 30 - ncol(m2))}
m2 <-cbind(m2, m.sup)
m3 <-matrix(lex.div[[3]]$TTR, nrow = 1, ncol = length(lex.div[[3]]$TTR))
m.sup <- if(ncol(m3) < 30) {mat.to.add = matrix(NA, nrow = nrow(m3), ncol = 30 - ncol(m3))}
m3 <-cbind(m3, m.sup)
...
m.total <-rbind (m1,m2,m3...)
But I cannot do it this way for every element. Can you help me write a for loop or something similar to get it done more easily and quickly?
You can try the code below:
TTRs <- lapply(lex.div, `[[`, "TTR")
m <- t(sapply(TTRs, `length<-`, max(lengths(TTRs))))
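To see what the two lines do, here is a small self-contained sketch. The lex.div below is a hypothetical stand-in built from plain data frames with made-up TTR values, not real textstat_lexdiv output, and n_max, padded and m_alt are just illustrative names. Assigning a larger length to a vector pads it with NA, which is what makes the rows line up.

```r
# Hypothetical stand-in for the list returned by lapply(LD1, textstat_lexdiv, ...)
lex.div <- list(
  data.frame(TTR = c(0.29, 0.23, 0.21)),
  data.frame(TTR = c(0.38, 0.29, 0.23, 0.20, 0.19))
)

# Pull the TTR vector out of every element
TTRs <- lapply(lex.div, `[[`, "TTR")

# Length of the longest vector
n_max <- max(lengths(TTRs))

# Pad every vector with NA up to n_max, written out step by step
padded <- lapply(TTRs, function(v) {
  length(v) <- n_max  # extending a vector fills the new slots with NA
  v
})

# One row per list element
m <- do.call(rbind, padded)

# The compact form from the answer
m_alt <- t(sapply(TTRs, `length<-`, n_max))
```

This way there is no need to hard-code the width of 30; the widest element decides the number of columns.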

Create subset matrix according to criteria / Extract key rows according to criteria

I want to subset the rows of my original matrix into two separate matrices.
I set up the problem as follows:
set.seed(2)
Mat1 <- data.frame(matrix(nrow = 4, ncol =10, data = rnorm(40,0,1)))
keep.rows = matrix(nrow =2, ncol =4)
keep.rows[,1] = c(1,2)
keep.rows[,2] = c(2,3)
keep.rows[,3] = c(2,3)
keep.rows[,4] = c(1,2)
Mat1
          X1         X2         X3        X4           X5         X6         X7         X8        X9        X10
1  0.9959846 -2.2079198 -0.3869496 -1.183606  1.959357077  1.0744594 -0.8621983 -0.4213736 0.4718595  1.2309537
2 -1.6957649  1.8221225  0.3866950 -1.358457  0.007645872  0.2605978  2.0480403 -0.3508344 1.3589398  1.1471368
3 -0.5333721 -0.6533934  1.6003909 -1.512671 -0.842615198 -0.3142720  0.9399201 -1.0273806 0.5641686  0.1065980
4 -1.3722695 -0.2846812  1.6811550 -1.253105 -0.601160105 -0.7496301  2.0086871 -0.2505191 0.4559801 -0.7833167
Mat1 is my original matrix. From the keep.rows matrix, I want to create two output matrices. The first (Output1) should store all the rows whose numbers are specified in keep.rows. The second (Output2) should store all remaining rows. In my actual application the matrices are very large, so they cannot be sorted manually as I do here.
I need:
1) A function that does this simply over large matrices.
2) Ideally, one where I can change the number of entries to "keep" each time. In this case I store 3 entries. However, if my keep.rows matrix were 2x2, I might want to store five entries each time.
Results should be of the form:
Output1 <- data.frame(matrix(nrow = 2, ncol =10))
Output1[1:2,1:3] <- Mat1[c(1,2), 1:3]
Output1[1:2,4:6] <- Mat1[c(2,3), 4:6]
Output1[1:2,7:9] <- Mat1[c(2,3), 7:9]
Output1[1:2,10] <- Mat1[c(1,2), 10]
Output2 <- data.frame(matrix(nrow = 2, ncol =10))
Output2[1:2,1:3] <- Mat1[c(3,4), 1:3]
Output2[1:2,4:6] <- Mat1[c(1,4), 4:6]
Output2[1:2,7:9] <- Mat1[c(1,4), 7:9]
Output2[1:2,10] <- Mat1[c(3,4), 10]
IMPORTANT: In the answer, I need Output2 to be specified in a way that keeps all remaining rows. In my application the keep.rows matrix is the same size, but Mat1 contains more than 1000 rows.
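You can use sapply with simplify = FALSE to iterate over the column indices of Mat1 (seq_along(Mat1)) and subset each column using the matching column of keep.rows. The expression (i+2) %/% 3 maps columns 1 to 3 of Mat1 to column 1 of keep.rows, columns 4 to 6 to column 2, and so on. do.call(cbind, ...) then binds the returned list of one-column data frames into a single data.frame. To get the remaining rows, simply place a - before keep.rows.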
Output1 <- do.call(cbind, sapply(seq_along(Mat1), function(i) Mat1[keep.rows[,(i+2) %/% 3], i, drop = FALSE], simplify = FALSE))
Output2 <- do.call(cbind, sapply(seq_along(Mat1), function(i) Mat1[-keep.rows[,(i+2) %/% 3], i, drop = FALSE], simplify = FALSE))
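If the number of columns per keep.rows column needs to change, the same idea could be wrapped in a small function. This is only a sketch: split_by_keep and its block argument are hypothetical names, not part of the answer above, and the toy Mat1 is filled with 1:40 instead of random numbers so the result is easy to check by eye.

```r
# Toy data: 4 rows, 10 columns, filled column by column with 1:40
Mat1 <- data.frame(matrix(1:40, nrow = 4, ncol = 10))
keep.rows <- cbind(c(1, 2), c(2, 3), c(2, 3), c(1, 2))

split_by_keep <- function(df, keep, block = 3) {
  # Column i of df uses column ceiling(i / block) of keep
  keep_col <- ceiling(seq_along(df) / block)
  take <- function(sign) {
    # sign = 1 selects the listed rows, sign = -1 drops them
    cols <- lapply(seq_along(df), function(i) df[[i]][sign * keep[, keep_col[i]]])
    as.data.frame(setNames(cols, names(df)))
  }
  list(kept = take(1), rest = take(-1))
}

res <- split_by_keep(Mat1, keep.rows, block = 3)
Output1 <- res$kept
Output2 <- res$rest
```

With block = 3, ceiling(i / block) gives the same column mapping as (i+2) %/% 3. The sketch assumes keep has at least ceiling(ncol(df) / block) columns, and because the rest is taken with negative indices, it keeps all remaining rows however many rows df has.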

Remove NA value within a list of dataframes

I'm sure there is a very easy answer to this, but I can't find one. In a separate post, "How do I remove empty data frames from a list?", I looked at removing an empty data frame from a list of data frames.
But how can you do this when one of the items in the list is not a data frame and is just an NA value? Modifying the parameters of the question above slightly, you have:
M1 <- data.frame(matrix(1:4, nrow = 2, ncol = 2))
M2 <- NA
M3 <- data.frame(matrix(9:12, nrow = 2, ncol = 2))
mlist <- list(M1, M2, M3)
I would like to remove M2 in this instance, but I have several of these empty entries, so I would like a function that removes them all simultaneously.
I have tried a couple of solutions from the question above, which do not work:
mlist[sapply(mlist, function(x) dim(x)[1]) > 0]
## Error: (list) object cannot be coerced to type 'double'
Filter(function(x) dim(x), mlist)  ## Incorrect output
Thank you in advance for any help!
One option is to use Filter to check whether the list elements are data.frames:
Filter(is.data.frame, mlist)
#[[1]]
# X1 X2
#1 1 3
#2 2 4
#[[2]]
# X1 X2
#1 9 11
#2 10 12
Here's a slightly different way to get your result
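library(tidyverse)
M1 <- data.frame(matrix(1:4, nrow = 2, ncol = 2))
M2 <- NA
M3 <- data.frame(matrix(9:12, nrow = 2, ncol = 2))
M4 <- NA
mlist <- list(M1, M2, M3, M4)
# Collect the positions of the NA elements
indexes <- tibble(index = integer())
for (i in seq_along(mlist)) {
  # identical() yields a single TRUE/FALSE; is.na() on a data frame
  # returns a matrix, not the single value that if() needs
  if (identical(mlist[[i]], NA)) {
    new_index <- tibble(index = i)
    indexes <- bind_rows(new_index, indexes)
  }
}
indexnums <- indexes %>% pull(index)
# Drop the NA elements; skip when none were found, since mlist[-integer(0)] is empty
if (length(indexnums) > 0) mlist <- mlist[-indexnums]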
With this, you check if each list element is NA or not, then add the index number to a table if it is, then you pull those index numbers out and subset the list. If you have a lot of these in your data set this should remove them all.
Hope this helps. Two more options:
# Method 1
mlist[!is.na(mlist)]
# Method 2
replace(mlist, is.na(mlist), NULL)
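Why is.na works directly on the list: for a list, is.na flags only those elements that are a single atomic NA, so the data frames come out as FALSE. A short sketch comparing it with the Filter approach (flags, kept_filter and kept_isna are illustrative names):

```r
M1 <- data.frame(matrix(1:4, nrow = 2, ncol = 2))
M3 <- data.frame(matrix(9:12, nrow = 2, ncol = 2))
mlist <- list(M1, NA, M3, NA)

# One flag per list element: TRUE only for the bare NA entries
flags <- is.na(mlist)

# Keep by type (only data frames survive)
kept_filter <- Filter(is.data.frame, mlist)

# Keep by value (everything that is not a single NA survives)
kept_isna <- mlist[!flags]
```

The two differ if the list can also hold other non-data-frame objects: Filter(is.data.frame, ...) drops them, while the is.na version keeps them.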

Replicate by columns, but transposed, in R with vectorization

I have a matrix like this:
m1 <- matrix(c(1,2,3,4,5,6,7,8,9), nrow = 3, byrow = TRUE)
and I would like to have every column repeated "m" times, but transposed into rows, with the results concatenated horizontally. That is, if "m" is 3, I would like something like this:
matrix(c(1,4,7,2,5,8,3,6,9,1,4,7,2,5,8,3,6,9,1,4,7,2,5,8,3,6,9),
nrow = 3, byrow = TRUE)
Is there any vectorized way to do this?
I have tried using rep to replicate the columns and then transposing, but I end up with too many rows.
We can use rep
matrix(rep(m1, each=nrow(m1)), nrow=3)
Or
`dim<-`(rep(m1, each=nrow(m1)), dim(m1)*c(1,3))
Or
t(replicate(nrow(m1), c(m1)))
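The answers above use nrow(m1), which happens to equal 3 here. If "m" should be chosen freely, a sketch along the same lines could look like this (m, res and expected are illustrative names):

```r
m1 <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3, byrow = TRUE)
m <- 3  # number of repetitions, i.e. rows in the result

# rep(..., each = m) repeats every element m times in column-major order,
# so filling a matrix with m rows puts one copy of c(m1) in each row
res <- matrix(rep(m1, each = m), nrow = m)

# The target from the question, for comparison
expected <- matrix(c(1,4,7,2,5,8,3,6,9,1,4,7,2,5,8,3,6,9,1,4,7,2,5,8,3,6,9),
                   nrow = 3, byrow = TRUE)
```

Each row of res is c(m1), the columns of m1 laid end to end, and changing m only changes the number of rows.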
data
m1 <- matrix(c(1,2,3,4,5,6,7,8,9), nrow = 3, byrow = TRUE)

Combining many matrices with different names in R

I have 141 matrices with the same dimensions, but with different names like:
mat_1, mat_55, mat_154, ...
I have their names in another matrix:
"mat_1" , "mat_55" , ...
Now I'm trying to combine all of them into a single matrix. Should I write all of their names manually in rbind(), or is there another way?
rbind(mat_1,mat_55,....)
mat_1 = matrix(1:10, ncol = 2)
mat_2 = matrix(11:20, ncol = 2)
mat_3 = matrix(21:30, ncol = 2)
names = c('mat_1','mat_2','mat_3')
x = lapply(lapply(names, as.symbol), eval)
do.call("rbind", x)
You can use
do.call(rbind, mget(mat_names))
where mat_names is the name of your vector containing the matrix names.
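A small runnable sketch of the mget approach, assuming the matrices live in the current environment. The three matrices and their contents are made up for illustration.

```r
# Three small stand-ins for the 141 matrices
mat_1   <- matrix(1:10, ncol = 2)
mat_55  <- matrix(11:20, ncol = 2)
mat_154 <- matrix(21:30, ncol = 2)

# Character vector holding the object names
mat_names <- c("mat_1", "mat_55", "mat_154")

# mget() looks the names up and returns the objects as a named list;
# do.call() then passes every list element to rbind() as one argument
combined <- do.call(rbind, mget(mat_names))
dim(combined)
```

The rows appear in the order given by mat_names, so sorting or filtering that vector controls the layout of the combined matrix.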
