Get a column from a list of data frames in R

I am an R beginner and I am stuck on this problem. I had a data frame, and by using the split() function I created a list of data frames, e.g.:
dfList <- split(mtcars, mtcars$cyl)
Now I want to retrieve a column of a specific dataframe, e.g. column 2 from dataframe 1, so something like
dfList[1][2]
What I can do right now is write for loops to get inside the data structure, but I can't find a one-liner to do it, if one exists. How can I do that? Thanks in advance!

I'm putting docendo's comment here to close out the question.
If you want to extract an element from a list (and treat it like a data.frame) rather than subset a list (to create a smaller list), you need to use the [[ ]] syntax. Plus, to get a column by index from a data.frame, you either need to use [[ idx ]] or [, idx ]. These are pretty basic indexing operations that you will probably want to review if you will be programming in R. So your "correct" call is probably
dfList[[1]][[2]]
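As a quick check of the difference, here is a minimal sketch using the dfList from the question. The names sub_list, first_df, col2 and same are only illustrative and do not appear in the original post.

```r
# Split mtcars by number of cylinders, as in the question
dfList <- split(mtcars, mtcars$cyl)

sub_list <- dfList[1]      # single brackets: still a list (of length one)
first_df <- dfList[[1]]    # double brackets: the data frame itself

col2 <- dfList[[1]][[2]]   # second column of the first data frame, as a vector
same <- dfList[[1]][, 2]   # equivalent matrix-style indexing

class(sub_list)
class(first_df)
identical(col2, same)
```

Since split() names the list elements after the grouping values, dfList[["4"]][[2]] should pick out the same column by name rather than by position.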

Related

Logical comparison of elements from named list vs named vector in R

Sorry if this is a duplicate, I read through a number of threads but couldn't really find a good explanation.
I have a dataset (dataframe) where I calculated the mean value of each column. I now want to do some logical comparisons between these values. I used lapply to get the means
means_list <- lapply(dataset_df, mean)
which outputs a named list. But when I try to compare two elements of this list, e.g.
means_list["condition1"] > means_list["condition2"]
I get an error ("comparison of these types is not implemented").
I don't get that error if I use sapply instead so that I'm working with a named vector. I can also get around the error by converting the list to a dataframe with as.data.frame first.
So, I feel like I'm doing something wrong when subsetting a named list here but I don't quite understand how. Is there a correct way to subset the list so that I can do the logical comparison? Or is this not possible with named lists?
Thanks!
To access an element of a list by its name, you have to use double brackets. Single brackets return a list of length one, and lists cannot be compared with >:
means_list[["condition1"]] > means_list[["condition2"]]
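A small sketch of why the two forms behave differently, assuming a made-up dataset_df with two numeric columns (the data and the name means_vec are hypothetical):

```r
# Hypothetical data standing in for dataset_df
dataset_df <- data.frame(condition1 = c(1, 2, 3), condition2 = c(4, 5, 6))

means_list <- lapply(dataset_df, mean)   # named list
means_vec  <- sapply(dataset_df, mean)   # named numeric vector

is.list(means_list["condition1"])        # single brackets keep the list wrapper
is.numeric(means_list[["condition1"]])   # double brackets return the number

# All three comparisons below work on plain numbers
res_brackets <- means_list[["condition1"]] > means_list[["condition2"]]
res_dollar   <- means_list$condition1 > means_list$condition2
res_vector   <- means_vec["condition1"] > means_vec["condition2"]
```

The $ operator is another way to extract a list element by name, which is why the second comparison also works.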

Is there a way to apply plyr's count() function to every column individually?

Similar to this question but for R. I want to get a summary count of every variable in each column of a data frame.
Currently, doing something like plyr::count(df[,1:10]) checks for how many times every variable in a row match. Instead, I just want a quick way of printing out what all my variables even are, though. I know this can be done with C-style recursion, but I'm hoping for a more elegant/simpler solution.
You can use lapply:
lapply(df, plyr::count)
Alternatively, keeping everything in base R, you can use table with stack to get similar output:
lapply(df, function(x) stack(table(x)))
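To see what the base R variant produces, here is a small sketch on a made-up data frame (the columns colour and size are hypothetical):

```r
# Hypothetical data frame with two categorical columns
df <- data.frame(colour = c("red", "blue", "red"),
                 size   = c("S", "S", "L"))

# One frequency table per column, returned as a named list
counts <- lapply(df, table)

# The same counts reshaped into one small data frame per column
counts_df <- lapply(df, function(x) stack(table(x)))

counts$colour
counts_df$size
```

Because lapply works column by column, each column gets its own table, regardless of how the values combine across rows.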

Using a list of matrix names

I have 75 matrices that I want to search through. The matrices are named a1r1, a1r2, a1r3, a1r4, a1r5, a2r1,...a15r5, and I have a list with all 75 of those names in it; each matrix has the same number of rows and columns. Inside some nested for loops, I also have a line of code that, for the first matrix looks like this:
total <- (a1r1[row,i]) + (a1r1[row,j]) + (a1r1[row,k])
(i, j, k, and row are all variables that I am looping over.) I would like to automate this line so that the for loops would fully execute using the first matrix in the list, then fully execute using the second matrix and so on. How can I do this?
(I'm an experienced programmer, but new to R, so I'm willing to be told I shouldn't use a list of the matrix names, etc. I realize too that there's probably a better way in R than for loops, but I was hoping for sort of quick and dirty at my current level of R expertise.)
Thanks in advance for the help.
Here is the R way to do this:
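# '+' lets the first number have two digits, so a10r1 ... a15r5 match as well
lapply(ls(pattern = '^a[0-9]+r[0-9]$'),
       function(nn) {
         x <- get(nn)              # look up the matrix by its name
         sum(x[row, c(i, j, k)])   # add the three entries of this row
       })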
ls returns a character vector of the variable names that match a given pattern.
lapply loops over that vector.
get turns a name into the variable it refers to.
Indexing with several columns at once, together with the vectorized sum function, adds the entries in one step.
It's not bad practice to build vectors of names designating your objects automatically. You can build them with paste, rep, and sequences such as 0:10. Once you have a vector of object names (let's call it mylist), applying get to each name (for example with lapply), or mget to the whole vector, gives the objects themselves.
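A minimal sketch of building the names and fetching the objects. It assumes four small stand-in matrices rather than the 75 from the question; mat_names, mats and totals are hypothetical names, and the index values are only illustrative.

```r
# Hypothetical stand-ins: four small matrices named a1r1, a1r2, a2r1, a2r2
mat_names <- paste0("a", rep(1:2, each = 2), "r", rep(1:2, times = 2))
for (nm in mat_names) assign(nm, matrix(1:6, nrow = 2))

# mget() looks up several names at once and returns a named list of the objects
mats <- mget(mat_names)

# Illustrative index values; in the question these come from the loops
row <- 1; i <- 1; j <- 2; k <- 3
totals <- sapply(mats, function(x) sum(x[row, c(i, j, k)]))
totals
```

Once the matrices live in a list like mats, the names are only needed for labelling, and the loops can run over the list directly.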

How to order a matrix by all columns

Ok, I'm stuck in a dumbness loop. I've read thru the helpful ideas at How to sort a dataframe by column(s)?, but need one more hint. I'd like a function that takes a matrix with an arbitrary number of columns, and sorts by all columns in sequence. E.g., for a matrix foo with N columns, it does the equivalent of foo[order(foo[,1],foo[,2],...foo[,N]),]. I am happy to use a with or by construction, and if necessary define the colnames of my matrix, but I can't figure out how to automate the collection of arguments to order (or to with).
Or, I should say, I could build the entire bloody string with paste and then call it, but I'm sure there's a more straightforward way.
The most elegant (for certain values of "elegant") way would be to turn it into a data frame, and use do.call:
foo[do.call(order, as.data.frame(foo)), ]
This works because a data frame is just a list of variables with some associated attributes, and can be passed to functions expecting a list.
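A small sketch on a made-up two-column matrix, comparing the do.call form with the hand-written call (the data and the names sorted and by_hand are hypothetical):

```r
# Small hypothetical matrix with ties in the first column
foo <- matrix(c(2, 1, 2, 1,
                9, 8, 7, 6), ncol = 2)

# as.data.frame() turns each column into a list element, and
# do.call() passes those elements to order() as separate arguments
sorted  <- foo[do.call(order, as.data.frame(foo)), ]
by_hand <- foo[order(foo[, 1], foo[, 2]), ]

identical(sorted, by_hand)
```

One caveat: the data frame's column names are passed along as argument names, so a matrix whose column names clash with order's own arguments (decreasing, for instance) may need unname() around the data frame first.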

Writing a loop to apply the operator 'data.frame' multiple times

I would like to write a loop to create multiple data frames from a set of already existing matrices.
I've imported and created these using the code:
temp<-list.files(pattern="*.csv")
ddives <- lapply(temp, read.csv)
so 'ddives' is my set of csv files. I now want to create a data frame out of each of these using a looped version of the code:
d.dives1<- data.frame(ddives[1])
A quick primer on terminology before I answer your question:
The result of read.csv() is a data.frame.
The result of lapply() is a list.
Thus you now have a list of data frames.
If you can safely assume that the data frames in the list have the same structure (i.e. the same number of columns and the same classes), then you can use rbind() to combine your list of data frames into a single data.frame.
To make this easier, you can use do.call() as follows:
do.call(rbind, ddives)
do.call constructs a call from the function using the list elements as arguments. If they are named, they are passed as named arguments, otherwise in order (as always in R). In this case you apply rbind to all of the elements in your list, thus creating a single data.frame.
This is clearly untested, since I don't have your data. But, in general, do.call is a useful function for this type of operation.
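Since the original data is not available, here is a sketch with two made-up data frames standing in for the imported csv files (the columns depth and time and the name all_dives are hypothetical):

```r
# Hypothetical stand-ins for the data frames read from the csv files
ddives <- list(
  data.frame(depth = c(10, 12), time = c(1, 2)),
  data.frame(depth = c(20, 25), time = c(3, 4))
)

# Equivalent to rbind(ddives[[1]], ddives[[2]])
all_dives <- do.call(rbind, ddives)

nrow(all_dives)
```

The advantage of do.call here is that the call does not change when the number of files does.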
As this is a follow up to the earlier question you posted, try this:
for (i in seq_along(ddives)) assign(temp[i], ddives[[i]])
If you really want a looped version of your code, this would be:
for (i in seq_along(ddives)) {
  # [[i]] extracts the data frame itself; [i] would assign a one-element list
  assign(paste("d.dives", i, sep = ""), ddives[[i]])
}
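A self-contained sketch of such a loop, with two made-up data frames in place of the imported files (the depth column is hypothetical). It uses ddives[[i]] so that each new variable holds a data frame rather than a list of length one.

```r
# Hypothetical stand-ins for the data frames read from the csv files
ddives <- list(data.frame(depth = 1:2), data.frame(depth = 3:4))

for (i in seq_along(ddives)) {
  # Creates d.dives1, d.dives2, ... in the current environment
  assign(paste0("d.dives", i), ddives[[i]])
}

is.data.frame(d.dives1)
is.data.frame(d.dives2)
```

Keeping the data frames in the list and indexing with ddives[[i]] is often easier to work with than many separate variables, but the loop does what the question asks.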
