R code runs too slow, how to rewrite this code

The input.txt contains 8000000 rows and 4 columns. The first 2 columns are text. The last 2 columns are numbers. The number of unique symbols (e.g., "c33") in columns 1 and 2 is not fixed. The values in columns 3 and 4 are the numbers of unique symbols in columns 1 and 2, respectively, after splitting by "]".
Each row of the input.txt file looks like this:
c33]c21]c5]c7]c8]c9 TPS2]MIC17]ERG3]NNF1]CIS3]CWP2 6 6
**The desired result:
row[ , ] represents characters like "c33 c21 c5 c7 c8 c9" or "TPS2 MIC17 ERG3 NNF1 CIS3 CWP2", and | . | represents the number of characters, e.g. |c33 c21 c5 c7 c8 c9| = 6.
If two rows overlap (overlap >= 0.6), the numbers of these two rows are written to a file.**
The code is as follows, but it runs too slowly.
The code:
library(compiler)
enableJIT(3)
data<-read.table("input.txt",header=FALSE)
row<-8000000
for (i in 1:(row-1)){
row11<-unlist(strsplit(as.character(data[i,1]),"]"))
row12<-unlist(strsplit(as.character(data[i,2]),"]"))
s1<-data[i,3]*data[i,4]
zz<-file(paste("output",i,".txt",sep=""),"w")
for (j in (i+1):row)
{ row21<-unlist(strsplit(as.character(data[j,1]),"]"))
row22<-unlist(strsplit(as.character(data[j,2]),"]"))
up<-length(intersect(row11,row21))*length(intersect(row12,row22))
s2<-data[j,3]*data[j,4]
down<-min(s1,s2)
if ((up/down)>=0.6) cat(i,"\t",j,"\n",file=zz,append=TRUE)
}
close(zz)
}
The result:
Each row produces its own file, which looks like this:
1 23
1 67
1 562
1 78
...
To make it run faster, I rewrote the code as follows.
The input.txt now contains 16000000 rows. The number of columns is not fixed. The number of unique symbols (e.g., "c33") in each row is not fixed. Each pair of rows in input.txt looks like this:
The 1st row (odd row1): c33 c21 c5 c7 c8
The 2nd row (even row1): TPS2 MIC17 ERG3 NNF1 CIS3 CWP2 MCM6
The 3rd row (odd row2): c33 c21 c5 c21 c18 c4 c58
The 4th row (even row2): TPS12 MIC3 ERG2 NNF1 CIS4
**The desired result:
If a pair of rows overlaps (overlap >= 0.6) with another pair of rows, the numbers of these two pairs are written to a file.**
The code:
library(compiler)
enableJIT(3)
con <- file("input.txt", "r")
zz<-file("output.txt","w")
oddrow1<-readLines(con,n=1)
j<-0
i<-0
while( length(oddrow1) != 0 ){
oddrow1<-strsplit(oddrow1," ")
evenrow1<-readLines(con,n=1)
evenrow1<-strsplit(evenrow1," ")
j<-j+1
con2 <- file("input.txt", "r")
readLines(con2,n=(j*2))
oddrow2<-readLines(con2,n=1)
i<-j
while( length(oddrow2) != 0 ){
i<-i+1
oddrow2<-strsplit(oddrow2," ")
evenrow2<-readLines(con2,n=1)
evenrow2<-strsplit(evenrow2," ")
oddrow1<-unlist(oddrow1)
oddrow2<-unlist(oddrow2)
evenrow1<-unlist(evenrow1)
evenrow2<-unlist(evenrow2)
up<-length(intersect(oddrow1,oddrow2))*length(intersect(evenrow1,evenrow2))
down<-min(length(oddrow1)*length(evenrow1),length(oddrow2)*length(evenrow2))
if ((up/down)>=0.6) {cat(j,"\t",i,"\n",file=zz,append=TRUE) }
oddrow2<-readLines(con2,n=1)
}
close(con2)
oddrow1<-readLines(con,n=1)
}
close(con)
close(zz)
The result:
It produces a single file, which looks like this:
1 23
1 67
1 562
1 78
2 25
2 89
3 56
3 79
...
Both of the above methods are too slow. How can I rewrite this code so that it runs faster? Thank you!

Well, I suspect this uses too much memory for your size of data, but perhaps it will provoke some ideas.
First, make up some data, with 20 total unique values and 5 to 10 in each cell.
set.seed(5)
n <- 1000L
ng <- 20
g1 <- paste(sample(10000:99999, ng))
g2 <- paste(sample(10000:99999, ng))
n1 <- sample(5:10, n, replace=TRUE)
n2 <- sample(5:10, n, replace=TRUE)
x1 <- sapply(n1, function(i) paste(g1[sample(ng, i)], collapse="|"))
x2 <- sapply(n2, function(i) paste(g2[sample(ng, i)], collapse="|"))
Load the Matrix library and define a helper function that takes a list of string vectors and converts it to a sparse matrix, with one column per unique string and a 1 wherever that string is present in a row.
library(Matrix)
str2mat <- function(s) {
n <- length(s)
ni <- sapply(s, length)
s <- unlist(s)
u <- unique(s)
spMatrix(nrow=n, ncol=length(u), i=rep(1L:n, ni), j=match(s, u), x=rep(1, length(s)))
}
OK, now we can actually do something. First create the matrices and get the total number present in each row.
m1 <- str2mat(strsplit(x1, "|", fixed=TRUE))
m2 <- str2mat(strsplit(x2, "|", fixed=TRUE))
n1 <- rowSums(m1)
n2 <- rowSums(m2)
Now we can use crossproducts of these matrices to get the numerator, and outer with pmin to get the denominator. We can then compute the overlap and test whether it is > 0.6. Since we have the whole matrix, we're not interested in the diagonal or the lower half. (There are ways of storing this kind of matrix more efficiently with the Matrix library, but I'm not sure how.) We then get the rows that have enough overlap with which.
num <- tcrossprod(m1)*tcrossprod(m2)
n12 <- n1*n2
den <- outer(n12, n12, pmin)
use <- num/den > 0.6
diag(use) <- FALSE
use[lower.tri(use)] <- FALSE
out <- which(use, arr.ind=TRUE)
> head(out)
[,1] [,2]
[1,] 64 65
[2,] 27 69
[3,] 34 81
[4,] 26 82
[5,] 5 85
[6,] 21 115
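If the full n-by-n matrix does not fit in memory, one option is to compare a block of rows against all rows at a time and keep only the pairs that pass. The following is a minimal sketch of that idea, not a tuned implementation. It uses dense base R matrices to stay self-contained, the helpers indicator and overlap_pairs and the toy data are made up for illustration, and it uses >= 0.6 as in the question. The same tcrossprod calls should also accept the sparse matrices from the Matrix package.

```r
# Toy data in the question's format: symbols separated by "]".
col1 <- c("c33]c21]c5]c7]c8]c9", "c33]c21]c5]c7]c8", "c1]c2]c3", "c33]c21]c5")
col2 <- c("TPS2]MIC17]ERG3", "TPS2]MIC17]ERG3]NNF1", "CIS3]CWP2", "TPS2]MIC17")

# Hypothetical helper: one 0/1 row per input row, one column per unique symbol.
indicator <- function(s) {
  u <- unique(unlist(s))
  t(vapply(s, function(v) as.numeric(u %in% v), numeric(length(u))))
}

m1 <- indicator(strsplit(col1, "]", fixed = TRUE))
m2 <- indicator(strsplit(col2, "]", fixed = TRUE))
n12 <- rowSums(m1) * rowSums(m2)

# Hypothetical helper: compare `chunk` rows at a time against all rows,
# so only a chunk-by-n block is held in memory at once.
overlap_pairs <- function(m1, m2, n12, threshold = 0.6, chunk = 2L) {
  res <- list()
  for (start in seq(1L, nrow(m1), by = chunk)) {
    idx <- start:min(start + chunk - 1L, nrow(m1))
    num <- tcrossprod(m1[idx, , drop = FALSE], m1) *
      tcrossprod(m2[idx, , drop = FALSE], m2)
    den <- outer(n12[idx], n12, pmin)
    hit <- which(num / den >= threshold, arr.ind = TRUE)
    hit <- cbind(i = idx[hit[, 1]], j = hit[, 2])
    # keep each pair once (upper triangle only)
    res[[length(res) + 1L]] <- hit[hit[, "i"] < hit[, "j"], , drop = FALSE]
  }
  do.call(rbind, res)
}

hits <- overlap_pairs(m1, m2, n12)
hits
```

The chunk size trades memory for the number of iterations; each block could also be appended to the output file instead of being collected in a list.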

Related

how to create a row that is calculated from another row automatically like how we do it in excel?

Does anyone know how to have a row in R that is calculated from another row automatically? For example,
let's say in Excel I want to make a row C, which is made up of (B2/B1),
e.g. C1 = B2/B1
C2 = B3/B2
...
Cn = Bn+1/Bn
In Excel, we only need to do one calculation and then drag it down. How do we do it in R?

In R you work with columns as vectors so the operations are vectorized. The calculations as described could be implemented by the following commands, given a data.frame df (i.e. a table) and the respective column names as mentioned:
df["C1"] <- df["B2"]/df["B1"]
df["C2"] <- df["B3"]/df["B2"]
In R you usually would name the columns according to the content they hold. With that, you refer to the columns by their name, although you can also address the first column as df[, 1], the first row as df[1, ] and so on.
EDIT 1:
There are multiple ways - and certainly some more elegant ways to get it done - but for understanding I kept it in simple base R:
Example dataset for demonstration:
df <- data.frame("B1" = c(1, 2, 3),
"B2" = c(2, 4, 6),
"B3" = c(4, 8, 12))
Column calculation:
for (i in 1:(ncol(df) - 1)) {
col_name <- paste0("C", i)
df[col_name] <- df[, i+1]/df[, i]
}
Output:
B1 B2 B3 C1 C2
1 1 2 4 2 2
2 2 4 8 2 2
3 3 6 12 2 2
So you iterate through the available columns B1/B2/B3. Dynamically create a column name in every iteration, based on the number of the current iteration, and then calculate the respective column contents.
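For comparison, the same column ratios can be computed without a loop by dividing the data frame without its first column by the data frame without its last column. This is only a sketch on the example dataset above; the intermediate name ratios is made up here.

```r
df <- data.frame(B1 = c(1, 2, 3),
                 B2 = c(2, 4, 6),
                 B3 = c(4, 8, 12))

# Element-wise division of two equally sized data frames:
# (B2, B3) divided by (B1, B2)
ratios <- df[-1] / df[-ncol(df)]
names(ratios) <- paste0("C", seq_along(ratios))

df <- cbind(df, ratios)
df
```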
EDIT 2:
Rowwise, as you actually meant it apparently, works similarly:
a <- c(10,15,20, 1)
df <- data.frame(a)
for (i in 1:nrow(df)) {
df$b[i] <- df$a[i+1]/df$a[i]
}
Output:
a b
1 10 1.500000
2 15 1.333333
3 20 0.050000
4 1 NA

You can do this just using vectors, without a for loop.
a <- c(10,15,20, 1)
df <- data.frame(a)
df$b <- c(df$a[-1], 0) / df$a
print(df)
a b
1 10 1.500000
2 15 1.333333
3 20 0.050000
4 1 0.000000
Explanation:
In the example data, df$a is the vector 10 15 20 1.
df$a[-1] is the same vector with its first element removed, 15 20 1.
We then use c() to add a new element to the end, so that the vector has the same length as before:
c(df$a[-1],0) which is 15 20 1 0
What we want for column b is this vector divided by the original df$a.
So:
df$b <- c(df$a[-1], 0) / df$a
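If the last row should be missing rather than 0, as in the loop version, the padding value can be NA instead. A small sketch of that variant:

```r
a <- c(10, 15, 20, 1)
df <- data.frame(a)

# Pad with NA so the last ratio, which has no following value, is NA
df$b <- c(df$a[-1], NA) / df$a
df
```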

How to discover patterns in sequences of numbers

I have a list in R in which every single row is a half verse of Bible text. The columns are: B(ook), C(hapter), V(erse), H(alf verse), and a1–a31. The columns named a plus an integer hold codes that represent Hebrew cantillation marks.
What I need is a way to find patterns in the sequences of numbers that tells me which combinations of integers occur and how many times.
E.g.: how many times is 74 followed by 63; how many times is 63 preceded by 05.
Ideally it would also tell me about combinations of more than two. E.g.: how many times is 74 preceded by 05 which is preceded by 35.
Finally I'd need to chart this in some way.
Below are the header and the first 3 rows of the list.
B,C,V,H,a1,a2,a3,a4,a5,a6,a7,a8,a9,a10,a11,a12,a13,a14,a15,a16,a17,a18,a19,a20,a21,a22,a23,a24,a25,a26,a27,a28,a29,a30,a31
Genesis,1,1,A,73,74,92
Genesis,1,1,B,71,73,71,00
Genesis,1,2,A,81,71,3303,80,73,74,92

I have a rather complicated solution; I am not sure it is the best one.
It makes use of data.table for the reshaping and the data manipulation.
The data:
library(data.table)
df <- setDT(read.table(text="
B,C,V,H,a1,a2,a3,a4,a5,a6,a7
Genesis,1,1,A,73,74,92,NA,NA,NA,NaN
Genesis,1,1,B,71,73,71,00,NA,NA,NA
Genesis,1,2,A,81,71,3303,80,73,74,92",h=T,sep = ","))
I transform the data this way:
test <- melt(df,measure.vars = patterns("^a"))
plouf <- dcast(test[!is.na(value)],variable ~B+C+V+H,value.var = "value")
variable Genesis_1_1_A Genesis_1_1_B Genesis_1_2_A
1: a1 73 71 81
2: a2 74 73 71
3: a3 92 71 3303
4: a4 NA 0 80
5: a5 NA NA 73
6: a6 NA NA 74
7: a7 NA NA 92
I then create a vector of all combinations of 3 successive numbers:
allcomb <- unlist(lapply(1:(nrow(plouf)-2),function(i){
plouf[i:(i+2),lapply(.SD,function(col){paste(col,collapse = ",")}),.SDcols = grep("Genesis",names(plouf),value = T)]
}))
It is a bit tricky:
plouf[1:3,lapply(.SD,function(col){paste(col,collapse = ",")}),.SDcols = grep("Genesis",names(plouf),value = T)]
concatenates the first 3 lines of all columns of plouf specified by .SDcols = grep("Genesis",names(plouf),value = T)
[1] "Genesis_1_1_A" "Genesis_1_1_B" "Genesis_1_2_A"
that is, the columns beginning with Genesis. Doing that for all successive combinations of 3 lines, and transforming the output into a vector, gives me the vector allcomb. It contains combinations with NA, which you can clean out:
allcomb <- allcomb[!grepl("NA",allcomb)]
Genesis_1_1_A Genesis_1_1_B Genesis_1_2_A Genesis_1_1_B Genesis_1_2_A Genesis_1_2_A Genesis_1_2_A Genesis_1_2_A
"73,74,92" "71,73,71" "81,71,3303" "73,71,0" "71,3303,80" "3303,80,73" "80,73,74" "73,74,92"
Having all the combinations as text, you can use table to count the occurrences of each combination, leading to the results you wanted:
> table(allcomb)
3303,80,73 71,3303,80 71,73,71 73,71,0 73,74,92 80,73,74 81,71,3303
1 1 1 1 2 1 1
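The names of the vector allcomb are the column names in which each combination occurs. You can thus trace each repetition back to its source (an exact comparison is used rather than grep, so that e.g. "73,74,92" cannot also match "173,74,92"):
sapply(unique(allcomb),function(comb){
names(allcomb)[allcomb == comb]}
)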
$`73,74,92`
[1] "Genesis_1_1_A" "Genesis_1_2_A"
$`71,73,71`
[1] "Genesis_1_1_B"
$`81,71,3303`
[1] "Genesis_1_2_A"
$`73,71,0`
[1] "Genesis_1_1_B"
$`71,3303,80`
[1] "Genesis_1_2_A"
$`3303,80,73`
[1] "Genesis_1_2_A"
$`80,73,74`
[1] "Genesis_1_2_A"
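The same counts can also be obtained in base R by building the n-grams of each half verse directly. This is a sketch under assumptions: the list seqs and the helpers ngrams and count_ngrams are hypothetical names, and the codes are kept as character so that values such as "00" or "05" keep their leading zeros.

```r
# Toy sequences, one character vector of codes per half verse
seqs <- list(
  Genesis_1_1_A = c("73", "74", "92"),
  Genesis_1_1_B = c("71", "73", "71", "00"),
  Genesis_1_2_A = c("81", "71", "3303", "80", "73", "74", "92")
)

# Hypothetical helper: all runs of n consecutive codes within one half verse
ngrams <- function(x, n) {
  if (length(x) < n) return(character(0))
  starts <- seq_len(length(x) - n + 1)
  vapply(starts, function(i) paste(x[i:(i + n - 1)], collapse = ","),
         character(1))
}

# Hypothetical helper: count every n-gram across all half verses
count_ngrams <- function(seqs, n) {
  all_ngrams <- unlist(lapply(seqs, ngrams, n = n), use.names = FALSE)
  sort(table(all_ngrams), decreasing = TRUE)
}

bigrams <- count_ngrams(seqs, 2)
trigrams <- count_ngrams(seqs, 3)
bigrams["73,74"]  # how many times 73 is followed by 74
```

Because the result is an ordinary table, it could be passed to barplot() for a quick chart of the most frequent combinations.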

how to iterate through each element in a matrix in r

Context: I am iterating through several variables in my dataset and performing a pairwise t.test between the factors for each of those variables (which I have successfully managed to do). An example of the result I have is as follows:
Table of P-values between classes 11,12,13 and 14
My next task, which I am having difficulty with, is presenting those values as a table in which each element shows whether the test between the two classes passes at a certain threshold (say .05): 1 if the value is below 0.05 and 0 if it is above. The table should also display the ratio of tests passed to tests conducted (the number of entries below 0.05 over the total number of entries in the triangular matrix). With reference to the image above, the output should look like this:
Ideal Matrix
So the problem is essentially that I have to iterate through the first matrix (excluding the first row and first column), apply a function, then generate a new row and column holding the row and column summaries. Any help or advice would be appreciated.

R is not really a useful tool to build such a table, but here is one solution.
Data (shortened the decimals for convenience):
mat <- matrix(c(.569, .0001, .1211, NA, .0001, .3262, NA, NA, .0001), nrow = 3)
[,1] [,2] [,3]
[1,] 0.5690 NA NA
[2,] 0.0001 0.0001 NA
[3,] 0.1211 0.3262 1e-04
First we convert to the 0,1 scheme by using ifelse with the condition < .05:
mat <- ifelse(mat < .05, 1, 0)
Then we add another column with the rowSums:
mat <- cbind(mat, rowSums(mat, na.rm = T))
Then we add another row with the colSums of the boolean matrix !is.na(mat), therefore counting the numbers of non NA per column:
mat <- rbind(mat, colSums(!is.na(mat)))
Then we change the lower right cell to the sum of the inner matrix divided by the amount of non NA of the inner matrix:
mat[nrow(mat), ncol(mat)] <- sum(mat[1:(nrow(mat)-1), 1:(ncol(mat)-1)], na.rm = T)/
sum(!is.na(mat[1:(nrow(mat)-1), 1:(ncol(mat)-1)]))
Finally, we change the row and column names:
rownames(mat) <- c(12:14, "SumCount")
colnames(mat) <- c(11:13, "SumScore")
End result:
> mat
11 12 13 SumScore
12 0 NA NA 0.0
13 1 1 NA 2.0
14 0 0 1 1.0
SumCount 3 2 1 0.5
Notice that no looping was necessary, as R is very efficient with vectorized operations on matrices.

Here is one way of doing what you want.
First I will make up a matrix.
set.seed(3781)
pval <- matrix(runif(9, 0, 0.07), 3)
is.na(pval) <- upper.tri(pval)
dimnames(pval) <- list(12:14, 11:13)
Now the question.
Ideal <- matrix(as.integer(pval < 0.05), nrow(pval))
dimnames(Ideal) <- dimnames(pval)
Ideal
# 11 12 13
#12 1 NA NA
#13 1 1 NA
#14 1 0 0
r <- sum(Ideal, na.rm = TRUE)/sum(!is.na(Ideal))
r
#[1] 0.6666667
So now all that is needed is to add the extra row and column.
Ideal <- rbind(Ideal, colSums(!is.na(Ideal)))
Ideal <- cbind(Ideal, rowSums(Ideal, na.rm = TRUE))
Ideal[nrow(pval) + 1, ncol(pval) + 1] <- r
rownames(Ideal)[nrow(pval) + 1] <- "SumCount"
colnames(Ideal)[ncol(pval) + 1] <- "SumScore"
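The steps above can also be wrapped into a single function so that they can be reused for each variable. This is a sketch, assuming the p-values arrive as a lower-triangular matrix with NA elsewhere; the function name summarise_pvals and its alpha argument are made up for illustration.

```r
# Hypothetical helper: 0/1 pass matrix plus summary row and column
summarise_pvals <- function(pval, alpha = 0.05) {
  pass <- (pval < alpha) * 1                      # 1 = passes, 0 = fails, NA kept
  with_score <- cbind(pass, SumScore = rowSums(pass, na.rm = TRUE))
  rbind(with_score,
        SumCount = c(colSums(!is.na(pass)),       # tests conducted per column
                     mean(pass, na.rm = TRUE)))   # overall share of tests passed
}

pval <- matrix(c(.569, .0001, .1211, NA, .0001, .3262, NA, NA, .0001),
               nrow = 3, dimnames = list(12:14, 11:13))
res <- summarise_pvals(pval)
res
```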

Finding out the percentage of times a sequence in one column is the same as in another column

I hope I articulate this properly. I have a data set with two columns that I am trying to compare in a memory experiment. Recall.CRESP is a column specifying the correct answers on a memory test, selected through grid coordinates. Recall.RESP shows the participants' responses.
The columns look something like this:
|Recall.CRESP | Recall.RESP |
|---------------------------------|---------------------------------|
|grid35grid51grid12grid43grid54 | grid35grid51grid12grid43grid54 |
|grid22grid53grid35grid21grid44 | grid23grid53grid35grid21grid43 |
|grid12grid14grid15grid41grid23 | grid12grid24grid31grid41grid25 |
|grid15grid41grid33grid24grid55 | grid15grid41grid33grid14grid55 |
I have the following line of code to tell me the percentage of times per row that the columns are identical to each other:
paste0((100*with(Data, mean(Recall.CRESP==Recall.RESP, na.rm = TRUE))), "%")
So for example, in my dataset 20% of the time column Recall.CRESP matches Recall.RESP exactly, signifying that a subject scored 5 out of 5 in their memory test 20% of the time.
However, I want to be able to expand on this in two ways. The first is that, rather than a percentage of when the rows are identical, I would like a percentage for when there is a partial match in the sequence. For instance, grid11grid42grid22grid51grid32 and grid11grid15grid55grid42grid32 share a match of 2/5, with both the first and the last grid coordinate being identical. I am not sure how to specify the request in R for a partial sequence match of 2/5 (or any other outcome out of 5). Also keep in mind that in this example grid42 shows up in both sequences, but is not correctly recalled, since it is remembered out of position in Recall.RESP. The order is important in these sequences.
The other point is that so far I have described the experiment in terms of checking accuracy for forwards recall of memory items. Yet I also have separate data where participants were recalling in backwards order. So for example, grid11grid22grid33grid44grid55 from Recall.CRESP and grid51grid44grid33grid22grid11 from Recall.RESP match correctly in 4 out of 5 positions. How can I turn the code around to check for reverse sequences and calculate percentages out of 5?
Any thoughts would be greatly appreciated.

I would separate the strings into columns of matrices, which will make them easy to compare and manipulate:
# borrowing Oriol's nicely shared data
Recall.CRESP <- c('grid35grid51grid12grid43grid54',
'grid22grid53grid35grid21grid44',
'grid12grid14grid15grid41grid23',
'grid15grid41grid33grid24grid55')
Recall.RESP <- c('grid35grid51grid12grid43grid54',
'grid23grid53grid35grid21grid43',
'grid12grid24grid31grid41grid25',
'grid15grid41grid33grid14grid55')
# function to create matrices
matrixify = function(dat) {
dat = do.call(rbind, strsplit(dat, split = "grid"))
dat = dat[, -1]
mode(dat) = "numeric"
return(dat)
}
cresp_mat = matrixify(Recall.CRESP)
resp_mat = matrixify(Recall.RESP)
## an example of what we made: just the numbers in the right order
cresp_mat
# [,1] [,2] [,3] [,4] [,5]
# [1,] 35 51 12 43 54
# [2,] 22 53 35 21 44
# [3,] 12 14 15 41 23
# [4,] 15 41 33 24 55
## Calculating results is now easy:
(forwards = rowMeans(cresp_mat == resp_mat))
# [1] 1.0 0.6 0.4 0.8
(reverse = rowMeans(cresp_mat == resp_mat[, 5:1]))
# [1] 0.2 0.2 0.0 0.2
You could, of course, assign the results to be new columns of your original data.
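To get the percentage of trials for each possible score out of 5, which is what the question asks about for partial matches, the per-row scores can be tabulated. The sketch below is self-contained and uses a different splitter based on regmatches, so the helper split_grid and the variable names are assumptions for illustration only.

```r
Recall.CRESP <- c('grid35grid51grid12grid43grid54',
                  'grid22grid53grid35grid21grid44',
                  'grid12grid14grid15grid41grid23',
                  'grid15grid41grid33grid24grid55')
Recall.RESP <- c('grid35grid51grid12grid43grid54',
                 'grid23grid53grid35grid21grid43',
                 'grid12grid24grid31grid41grid25',
                 'grid15grid41grid33grid14grid55')

# Hypothetical helper: one row per trial, one column per position
split_grid <- function(x) {
  do.call(rbind, regmatches(x, gregexpr("grid[0-9]+", x)))
}

cresp <- split_grid(Recall.CRESP)
resp <- split_grid(Recall.RESP)

# Number of positions recalled correctly in each trial
score <- rowSums(cresp == resp)

# Share of trials with each score from 0 to 5
score_dist <- prop.table(table(factor(score, levels = 0:ncol(cresp))))
round(100 * score_dist)
```

For backwards recall, the same tabulation applies after reversing the response columns, e.g. comparing cresp with resp[, ncol(resp):1].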

Here is my solution:
Recall.CRESP <- c('grid35grid51grid12grid43grid54',
'grid22grid53grid35grid21grid44',
'grid12grid14grid15grid41grid23',
'grid15grid41grid33grid24grid55')
Recall.RESP <- c('grid35grid51grid12grid43grid54',
'grid23grid53grid35grid21grid43',
'grid12grid24grid31grid41grid25',
'grid15grid41grid33grid14grid55')
df <- data.frame(Recall.CRESP, Recall.RESP, stringsAsFactors = F)
df$correctNormal <- NA
df$correctReverse <- NA
for (row in 1:nrow(df)) {
crespVector <- unlist(strsplit(as.character(df[row, 1]), 'grid'))[-1]
respVector <- unlist(strsplit(as.character(df[row, 2]), 'grid'))[-1]
correctNormal <- 0
correctReverse <- 0
for (i in 1:length(crespVector)) {
if (crespVector[i] == respVector[i]) correctNormal <- correctNormal + 1
if (crespVector[i] == respVector[length(respVector) + 1 - i]) correctReverse <- correctReverse + 1
}
df$correctNormal[row] = correctNormal / 5
df$correctReverse[row] = correctReverse / 5
}
df
## Recall.CRESP Recall.RESP correctNormal correctReverse
## 1 grid35grid51grid12grid43grid54 grid35grid51grid12grid43grid54 1.0 0.2
## 2 grid22grid53grid35grid21grid44 grid23grid53grid35grid21grid43 0.6 0.2
## 3 grid12grid14grid15grid41grid23 grid12grid24grid31grid41grid25 0.4 0.0
## 4 grid15grid41grid33grid24grid55 grid15grid41grid33grid14grid55 0.8 0.2

In R, reorganize list based on element names (rbind and indicator variable)

I am trying to reorganize my data, basically a list of data.frames.
Its elements represent subjects of interest (A and B), with observations on x and y, collected on two occasions (1 and 2).
I am trying to make this a list that contains data.frames referring to the subjects, with the information on which occasion x and y were collected being stored in the respective data.frames as new variable, as opposed to the element name:
library('rlist')
A1 <- data.frame(x=sample(1:100,2),y=sample(1:100,2))
A2 <- data.frame(x=sample(1:100,2),y=sample(1:100,2))
B1 <- data.frame(x=sample(1:100,2),y=sample(1:100,2))
B2 <- data.frame(x=sample(1:100,2),y=sample(1:100,2))
list <- list(A1=A1,A2=A2,B1=B1,B2=B2)
A <- do.call(rbind,list.match(list,"A"))
B <- do.call(rbind,list.match(list,"B"))
list <- list(A=A,B=B)
list <- lapply(list,function(x) {
y <- data.frame(x)
y$class <- c(rep.int(1,2),rep.int(2,2))
return(y)
})
> list
$A
x y class
A1.1 66 96 1
A1.2 76 58 1
A2.1 50 93 2
A2.2 57 12 2
$B
x y class
B1.1 58 56 1
B1.2 69 15 1
B2.1 77 77 2
B2.2 9 9 2
In my real world problem there are about 500 subjects, not always two occasions, differing numbers of observations.
So my example above is just to illustrate where I want to get, and I am stuck at how to pass to the do.call-rbind that it should, based on elements names, bind subject-specific elements as new list elements together, while assigning a new variable.
To me, this is a somewhat fuzzy task, and the closest I got was the rlist package. This question is related but uses unique to identify elements, whereas in my case it seems to be more a regex problem.
I'd be happy even for instructions on how to use google, any keywords for further research etc.

From the data you provided:
subj <- sub("[A-Z]*", "", names(lst))
newlst <- Map(function(x, y) {x[,"class"] <- y;x}, lst, subj)
First we use a regular expression to isolate the number that will go in the class column. In this case, I matched on capital letters and erased them, leaving the number, so "A1" becomes "1". Please note that the real names may require a different regex pattern.
Then we use Map to create a new column for each data frame and save the result to a new list called newlst. Map takes the first element of each argument, carries out the function, and then continues with the following elements. So the first data frame in lst and the first number in subj are used first. The anonymous function I used is function(x,y) {x[, "class"] <- y; x}. It takes two arguments: the first is the data frame, the second is the column value.
Now it's much easier to move forward. We can create a vector called uniq.nmes with the names of the data frames that we will combine, where "A1" becomes "A". Then we can rbind on that match:
uniq.nmes <- unique(sub("\\d", "", names(lst)))
lapply(uniq.nmes, function(x) {
do.call(rbind, newlst[grep(x, names(newlst))])
})
# [[1]]
# x y class
# A1.1 1 79 1
# A1.2 30 13 1
# A2.1 90 39 2
# A2.2 43 22 2
#
# [[2]]
# x y class
# B1.1 54 59 1
# B1.2 83 90 1
# B2.1 85 36 2
# B2.2 91 28 2
Data
A1 <- data.frame(x=sample(1:100,2),y=sample(1:100,2))
A2 <- data.frame(x=sample(1:100,2),y=sample(1:100,2))
B1 <- data.frame(x=sample(1:100,2),y=sample(1:100,2))
B2 <- data.frame(x=sample(1:100,2),y=sample(1:100,2))
lst <- list(A1=A1,A2=A2,B1=B1,B2=B2)

It sounds like you're doing a lot of gymnastics because you have a specific form in mind. What I would suggest is first trying to make the data tidy. Without reading the link, the quick summary is to put your data into a single data frame, where it can be easily processed.
The quick version of the answer (here I've used lst instead of list for the name to avoid confusion with the built-in list) is to do this:
do.call(rbind,
lapply(seq_along(lst), function(i) {
lst[[i]]$type <- names(lst)[i]; lst[[i]]
})
)
What this will do is create a single data frame, with a column, "type", that contains the name of the list item in which that row appeared.
Using a slightly simplified version of your initial data:
lst <- list(A1=data.frame(x=rnorm(5)), A2=data.frame(x=rnorm(3)), B=data.frame(x=rnorm(5)))
lst
$A1
x
1 1.3386071
2 1.9875317
3 0.4942179
4 -0.1803087
5 0.3094100
$A2
x
1 -0.3388195
2 1.1993115
3 1.9524970
$B
x
1 -0.1317882
2 -0.3383545
3 0.8864144
4 0.9241305
5 -0.8481927
And then applying the function from above:
df <- do.call(rbind,
lapply(seq_along(lst), function(i) {
lst[[i]]$type <- names(lst)[i]; lst[[i]]
})
)
df
x type
1 1.3386071 A1
2 1.9875317 A1
3 0.4942179 A1
4 -0.1803087 A1
5 0.3094100 A1
6 -0.3388195 A2
7 1.1993115 A2
8 1.9524970 A2
9 -0.1317882 B
10 -0.3383545 B
11 0.8864144 B
12 0.9241305 B
13 -0.8481927 B
From here we can process to our heart's content: operations like df$subject <- gsub("[0-9]*", "", df$type) extract the non-numeric portion of type, and tools like split can be used to generate the sub-lists that you mention in your question.
In addition, once the data is in this form, you can use functions like by and aggregate, or libraries like dplyr or data.table, to do more advanced split-apply-combine operations for data analysis.
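Putting these pieces together, here is a sketch of the full round trip on made-up data: stack the list, derive the subject and occasion from the element name, and split back into one data frame per subject. The column names subject and class and the assumption that names look like letters followed by digits are illustrative only.

```r
# Toy list: subject A seen on two occasions, subject B on one
lst <- list(A1 = data.frame(x = 1:2, y = 3:4),
            A2 = data.frame(x = 5:6, y = 7:8),
            B1 = data.frame(x = 9:11, y = 12:14))

# Tag every row with the name of the list element it came from, then stack
tagged <- Map(function(d, nm) { d$type <- nm; d }, lst, names(lst))
df <- do.call(rbind, tagged)

# Assumes names are letters (subject) followed by digits (occasion)
df$subject <- sub("[0-9]+$", "", df$type)
df$class <- as.integer(sub("^[A-Za-z]+", "", df$type))

# One data frame per subject, with the occasion stored as a column
by_subject <- split(df[c("x", "y", "class")], df$subject)
by_subject
```

This handles differing numbers of occasions and observations per subject, since nothing in it depends on the sizes of the individual data frames.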
