I have 75 matrices that I want to search through. The matrices are named a1r1, a1r2, a1r3, a1r4, a1r5, a2r1,...a15r5, and I have a list with all 75 of those names in it; each matrix has the same number of rows and columns. Inside some nested for loops, I also have a line of code that, for the first matrix looks like this:
total <- (a1r1[row,i]) + (a1r1[row,j]) + (a1r1[row,k])
(i, j, k, and row are all variables that I am looping over.) I would like to automate this line so that the for loops would fully execute using the first matrix in the list, then fully execute using the second matrix and so on. How can I do this?
(I'm an experienced programmer, but new to R, so I'm willing to be told I shouldn't use a list of the matrix names, etc. I realize too that there's probably a better way in R than for loops, but I was hoping for sort of quick and dirty at my current level of R expertise.)
Thanks in advance for the help.
Here is the R way to do this:
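lapply(ls(pattern = '^a[0-9]+r[0-9]+$'),   # '+' so that a10r1 ... a15r5 match too
       function(nn) {
         x <- get(nn)                       # look up the matrix by its name
         sum(x[row, c(i, j, k)])            # x[row,i] + x[row,j] + x[row,k]
       })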
ls returns the names of the variables that match a given pattern (in alphabetical order)
lapply loops over the resulting vector of names
get turns a name into the object it refers to
indexing with a vector of columns, combined with the vectorized sum function, replaces the three separate additions
It's not bad practice to build lists of object names automatically. You can build such lists with paste, rep, and sequences such as 0:10. Once you have a vector of object names (let's call it mylist), applying get to each name, or calling mget on the whole vector, gives the objects themselves.
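A minimal sketch of that idea, using small made-up matrices in a scratch environment. The environment `env`, the 3x4 dimensions, and the fixed values of `row`, `i`, `j`, and `k` are assumptions for illustration only. `mget` fetches several objects by name at once and returns a named list, so the matrices are visited in the order in which the names were built rather than in the alphabetical order of `ls`:

```r
# Build the 75 names in the intended order: a1r1, a1r2, ..., a15r5
nms <- paste0("a", rep(1:15, each = 5), "r", rep(1:5, times = 15))

# Demo data only: one small matrix per name, kept in a scratch environment
env <- new.env()
for (nm in nms) assign(nm, matrix(seq_len(12), nrow = 3), envir = env)

# mget() looks up all names at once and returns a named list of matrices
mats <- mget(nms, envir = env)

# Placeholders for the variables that the real code loops over
row <- 2; i <- 1; j <- 2; k <- 4

# Apply the same computation to every matrix in turn
totals <- vapply(mats, function(m) sum(m[row, c(i, j, k)]), numeric(1))
```

If the matrices can be created inside a list in the first place, the name lookup is not needed at all and the loop body can simply index the list.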
This is a question about general approach in R. I'm trying to find my way into the R language, but the data types and loop approaches (apply, sapply, etc.) are a bit unclear to me.
What is my target:
Query data from API with parameters from a config list with multiple parameters. Return the data as aggregated data.frame.
First I want to define a list of multiple vectors (columns):
site        segment     id
google.com  Googleuser  123
bing.com    Binguser    456
How do I manage such a list of value groups (row by row)? data.frames are column focused; as far as I can tell, you can't write a data.frame row by row in an R script. So the only way I found to define this initial config table is a CSV file, which is an approach I would rather avoid, but I can't find a more elegant way.
Now I want to query my data, let's say with this function:
query.data <- function(site, segment, id) {
  config <- define_request(site, segment, id)
  result <- query_api(config)
  return(result)  # return is a function in R, so the parentheses are required
}
This will give me a data.frame as a result, this means every time I query data the same columns are used. So my result should be one big data.frame, not a list of similar data.frames.
Now sapply lets me use one parameter list plus several static parameters. mapply works, but it gives me my data in some crazy output that I can't handle, or even understand exactly what it is.
In principle the list of data.frames is ok, the data is correct, but it feels cumbersome to me.
What core concepts of R have I not understood yet? What would be the approach?
If you have an lapply/sapply solution that returns a list of data frames with identical columns, you can easily get a single large data frame with do.call(). do.call() passes the items of a list as the arguments to another function, allowing you to do things such as
big.df <- do.call(rbind, list.of.dfs)
which appends the component data frames into a single large data frame.
In general do.call(rbind,something) is a good trick to keep in your back pocket when working with R, since often the most efficient way to do something will be some kind of apply function that leaves you with a list of elements when you really want a single matrix/vector/dataframe/etc.
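As a rough sketch of the whole pipeline from the question: the config table can be written inline with `data.frame`, one vector per column, and `Map` can walk its columns in parallel. The body of `query.data` below is a stand-in, since `define_request` and `query_api` are not shown in the question, and the `hits` column is invented purely so there is something to return:

```r
# Config table written inline, one vector per column (no CSV file needed)
config <- data.frame(
  site    = c("google.com", "bing.com"),
  segment = c("Googleuser", "Binguser"),
  id      = c(123, 456),
  stringsAsFactors = FALSE
)

# Stand-in for the real API call: returns a small data.frame per request
query.data <- function(site, segment, id) {
  data.frame(site = site, segment = segment, id = id, hits = nchar(site),
             stringsAsFactors = FALSE)
}

# Map() steps through the three columns in parallel and always returns a list
list.of.dfs <- Map(query.data, config$site, config$segment, config$id)

# Stack the per-request results into one data.frame
big.df <- do.call(rbind, list.of.dfs)
rownames(big.df) <- NULL
```

`Map` is `mapply` with `SIMPLIFY = FALSE`. With the default simplification, `mapply` tries to collapse the data frames into a matrix of list cells, which is a likely source of the confusing output described in the question.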
This is my first post, and I think I have looked thoroughly for my answer with no luck, but I might not be typing in the right search terms, since I am relatively new to R. I apologize if this has been answered before and if it has a link would be greatly appreciated.
In essence, I am trying to make a loop that will operate on a set of data frames that I have read into R from .txt files using read.table. I am working with simulated vegetation data organized into many species by site matrices, so it would be best for me if I could create loops that will just operate on the objects I have read in using some functions I have made and then put out new objects into my workspace with a specific naming pattern (e.g. put "_av" on the end of the name of the object operated on when creating a new object).
For convenience's sake, let's say I have only four matrices I want to work with, all of which contain the phrase "mod" for model. I have read that I can put these data frames into a list of data frames with the following code:
list.mods=lapply(ls(pattern="mod"),get)
This does create a list, but I have been having trouble getting my functions to actually operate on it. From what I have read, this is the best way to make a list of objects you want to operate on.
So let's say that list.mods is now my list of operable matrices: mod1, mod2, mod3, and mod4. Also, let's say I have a function that simply calculates Bray-Curtis dissimilarity as follows:
library(vegan)  # provides vegdist()
bc <- function(x) {
  vegdist(x, method = "bray")
}
I can use this by typing in:
mod1.bc=bc(mod1)
That works. But it seems like I should be able to apply my list of models to the function bc and have it output the models with a pattern mod1.bc, mod2.bc, mod3.bc, and mod4.bc. I cannot get my list of files to work in the function much less save each operation as a new object with a patterned name.
What am I doing wrong? In the end I might have as many as a hundred models or more and would really appreciate being able to create a list of items that I can run through loops.
Thanks in advance.
You can use lapply again:
new.list.mods <- lapply(list.mods, bc)
This will return a new list in which each element is the result of applying bc to the corresponding element of list.mods.
The 'apply' family of functions in R basically allows you to save typing. If that's easier for you to understand, you can use a 'for loop' instead. Of course you will need to know how to access elements in a list for that. There is a question about that.
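A small sketch of both variants, using made-up matrices in a scratch environment. Here `dist` from base R stands in for the Bray-Curtis function so that the example runs without the vegan package, and the names `env`, `mod.names`, and `loop.result` are assumptions for illustration. Building the list with `mget` keeps the object names, which makes the patterned output names easy to derive:

```r
# Demo data standing in for the models read with read.table
env <- new.env()
for (n in 1:4) {
  assign(paste0("mod", n), matrix(as.numeric(1:6) * n, nrow = 3), envir = env)
}

# Stand-in for the Bray-Curtis function; swap in vegdist() when vegan is loaded
bc <- function(x) dist(x)

# A named list of the models; the anchored pattern skips objects like list.mods
mod.names <- ls(env, pattern = "^mod[0-9]+$")
list.mods <- mget(mod.names, envir = env)

# Apply bc to every model; lapply keeps the names of its input
new.list.mods <- lapply(list.mods, bc)

# The same thing written as an explicit for loop over the names
loop.result <- list()
for (nm in names(list.mods)) {
  loop.result[[nm]] <- bc(list.mods[[nm]])
}

# Rename the results to mod1.bc, mod2.bc, ...
names(new.list.mods) <- paste0(names(list.mods), ".bc")
```

If separate workspace objects are really wanted, `list2env(new.list.mods, envir = globalenv())` would create them from the named list, though keeping the results together in one list is usually easier to work with once there are a hundred models.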
How about collecting the names of the models/objects you want into a list:
mod_list <- sapply(ls(pattern = "^mod[0-9]+$"), as.name)
and then looping over them with your function, evaluating each name only when it is needed:
output_list <- lapply(mod_list, function(nm) bc(eval(nm)))
With this approach you avoid creating the potentially large and redundant list.mods object in your example. Also, I think this will result in conveniently named lists.
I am an R beginner and I am stuck on this problem. I had a dataframe and by using the split() function I have created a list of dataframes, e.g:
dfList <- split(mtcars, mtcars$cyl)
Now I want to retrieve a column of a specific dataframe, e.g. column 2 from dataframe 1, so something like
dfList[1][2]
What I can do right now is write for loops to get inside the data structure, but I can't find a one-liner to do it, if one exists. How can I do that? Thanks in advance!
I'm putting docendo's comment here to close out the question.
If you want to extract an element from a list (and treat it like a data.frame) rather than subset a list (to create a smaller list), you need to use the [[ ]] syntax. Plus, to get a column by index from a data.frame, you either need to use [[ idx ]] or [, idx ]. These are pretty basic indexing operations that you will probably want to review if you will be programming in R. So your "correct" call is probably
dfList[[1]][[2]]
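The difference between the two bracket forms can be seen on the same `mtcars` split. This is only an illustrative sketch; the variable names `sub`, `first`, and `col.a` to `col.c` are made up here:

```r
dfList <- split(mtcars, mtcars$cyl)   # list of 3 data frames, named "4", "6", "8"

sub   <- dfList[1]     # single bracket: still a list, of length 1
first <- dfList[[1]]   # double bracket: the data frame itself

col.a <- dfList[[1]][[2]]    # column 2 as a plain vector
col.b <- dfList[[1]][, 2]    # the same column via matrix-style indexing
col.c <- dfList[["4"]]$cyl   # the same column by name (column 2 of mtcars is cyl)
```

All three column extractions return the same vector; indexing by name tends to be more robust if the column order ever changes.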
I'm really new to R but have a computer science background. I currently am trying to read in a bunch of different data files and then perform some analysis (the same) on each of them.
Right now, I have a list of datasets. So, my first data set is in list[[1]], second in list[[2]], etc. So, what I was going to do is loop on the length of the list and call some function passing values from two columns into that function from each unique dataset. I was reading an article on this, however, and found that:
foo = seq(1, 100, by=2)
foo.squared = NULL
foo.squared = foo^2
will square all the values within foo. So, is there any way to do something similar for my case? For example, passing in values from all the datasets in the list or something?
To make this more concrete, I have a list of datasets named data_list and each data set is identical with columns a, b and c. I need to call a function f with the arguments a and b from the datasets. Is there any way to do this besides using a for loop?
Please let me know if that makes sense. Sorry for any confusion, like I said, I am very new to this language. Thank you for your help!
Use this:
lapply(data_list, function(x) f(x$a, x$b))
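To see it run end to end, here is a small sketch with made-up data. The three data frames and the function `f` are hypothetical stand-ins for the real data sets and analysis:

```r
# Demo data: three data sets with identical columns a, b and c
data_list <- list(
  data.frame(a = 1:3,   b = 4:6,   c = 7:9),
  data.frame(a = 10:12, b = 13:15, c = 16:18),
  data.frame(a = 0:2,   b = 0:2,   c = 0:2)
)

# Hypothetical f: any function of the two columns
f <- function(a, b) sum(a * b)

# lapply returns a list with one result per data set
res.list <- lapply(data_list, function(x) f(x$a, x$b))

# vapply does the same but returns a numeric vector and checks each result
res.vec <- vapply(data_list, function(x) f(x$a, x$b), numeric(1))
```

`vapply` is worth considering when `f` returns a single number per data set, since it fails early if one call returns something of a different shape.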
OK, I'm stuck in a dumbness loop. I've read through the helpful ideas at "How to sort a dataframe by column(s)?", but need one more hint. I'd like a function that takes a matrix with an arbitrary number of columns and sorts by all columns in sequence, i.e., for a matrix foo with N columns, does the equivalent of foo[order(foo[,1],foo[,2],...foo[,N]),]. I am happy to use a with or by construction, and if necessary define the colnames of my matrix, but I can't figure out how to automate the collection of arguments to order (or to with).
Or, I should say, I could build the entire bloody string with paste and then call it, but I'm sure there's a more straightforward way.
The most elegant (for certain values of "elegant") way would be to turn it into a data frame, and use do.call:
foo[do.call(order, as.data.frame(foo)), ]
This works because a data frame is just a list of variables with some associated attributes, and can be passed to functions expecting a list.
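The same idea wrapped in a function, as a sketch. The name `sort_by_all` and the demo matrix are assumptions for illustration. Here the columns are collected into an unnamed list with `lapply` instead of `as.data.frame`, so that a column name such as `decreasing` cannot be mistaken for a named argument of `order`:

```r
# Demo matrix with ties in the first and second columns
foo <- matrix(c(2, 1, 2, 1,
                9, 5, 3, 5,
                1, 8, 7, 4), ncol = 3)

# Sort the rows of a matrix by all of its columns, left to right
sort_by_all <- function(m) {
  # One unnamed list element per column, passed to order() as separate arguments
  keys <- lapply(seq_len(ncol(m)), function(j) m[, j])
  m[do.call(order, keys), , drop = FALSE]   # drop = FALSE keeps a 1-row result a matrix
}

sorted <- sort_by_all(foo)
```

For a matrix without column names, this gives the same result as the `as.data.frame` one-liner above.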