How to make loops in R that operate on and return multiple objects - r

This is my first post. I think I have looked thoroughly for an answer with no luck, but I might not be using the right search terms, since I am relatively new to R. I apologize if this has been answered before; if it has, a link would be greatly appreciated.
In essence, I am trying to write a loop that will operate on a set of data frames that I have read into R from .txt files using read.table. I am working with simulated vegetation data organized into many species-by-site matrices, so ideally I could write loops that apply some functions I have made to the objects I have read in and then put new objects into my workspace with a specific naming pattern (e.g. appending "_av" to the name of the object that was operated on).
For convenience's sake, let's say I have only four matrices I want to work with, all of which contain the phrase "mod" (for model). I have read that I can put these data frames into a list of data frames with the following code:
list.mods <- lapply(ls(pattern = "mod"), get)
This does create a list, but I have been having trouble getting my functions to actually operate on it. From what I have read, this is the best way to make a list of the objects you want to operate on.
So let's say that list.mods is now my list of operable matrices: mod1, mod2, mod3, and mod4. Also, let's say I have a function that simply calculates Bray-Curtis dissimilarity, as follows:
library(vegan)   # vegdist() comes from the vegan package
bc <- function(x){
  vegdist(x, method = "bray")
}
I can use this by typing in:
mod1.bc <- bc(mod1)
That works. But it seems like I should be able to apply my list of models to the function bc and have it output new objects with the pattern mod1.bc, mod2.bc, mod3.bc, and mod4.bc. I cannot get my list of objects to work in the function, much less save each result as a new object with a patterned name.
What am I doing wrong? In the end I might have as many as a hundred models or more and would really appreciate being able to create a list of items that I can run through loops.
Thanks in advance.

You can use lapply again:
new.list.mods <- lapply(list.mods, bc)
This will return a new list in which each element is the result of applying bc to the corresponding element of list.mods.
The 'apply' family of functions in R basically saves you typing. If a for loop is easier for you to understand, you can use one instead; for that you will need to know how to access elements of a list, which is covered in other questions.
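For example, a minimal for-loop sketch of the same idea, assuming the matrices are named mod1 ... mod4 in the workspace as in the question (the anchored pattern "^mod" just keeps helper variables from being picked up):
obj.names <- ls(pattern = "^mod")             # "mod1", "mod2", "mod3", "mod4"
results <- vector("list", length(obj.names))
names(results) <- paste0(obj.names, ".bc")    # the ".bc" naming pattern
for (i in seq_along(obj.names)) {
  results[[i]] <- bc(get(obj.names[i]))       # fetch each object by name, then apply bc()
}
# results[["mod1.bc"]] now holds the Bray-Curtis matrix for mod1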

How about collecting the names of the models/objects you want into a list:
mod_list <- sapply(ls(pattern = "mod"), as.name)
and then looping over them with your function:
output_list <- lapply(mod_list, function(m) bc(eval(m)))
With this approach you avoid creating the potentially large and redundant list.mods object from your example. It also gives you a conveniently named output list.
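An alternative sketch that skips the as.name()/eval() step entirely: base R's mget() fetches all matching objects at once and returns a list already named after them.
list.mods <- mget(ls(pattern = "^mod"))
output_list <- lapply(list.mods, bc)
names(output_list) <- paste0(names(output_list), ".bc")   # optional ".bc" suffix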

Related

Extracting Nested Elements of an R List Generated by Loops

For lists within lists produced by a loop in R (in this example, a list of caret models), I get an object with an unpredictable length and unpredictable names for the inner elements, such as list[[1]][[n repeats of 1]][[2]], where the internal [[1]] is repeated a number of times that depends on the function's input. In some cases the length n is not known, for example when accessing older stored lists for which the input was not saved. While there are ways to work with an unknown list index, such as list[[length(list)]], there appears to be no equivalent for repeated nested elements, which makes accessing them and passing them to various jobs awkward. I assume there is an efficient way to access them that I have missed, so I'm asking for help, with an example case given below.
The function I'm writing returns several outputs in a list. The final list, which has a complicated structure, is produced by returning something like:
return(list(listOfModels, trainingData, testingData))
listOfModels has a variable length, depending on which models are given as input and potentially on other conditions evaluated inside the function. It is built by:
listOfModels <- list(c(listOfModels, list(trainedModel)))
Where the "trainedModel" refers to the most recently trained model generated in the loop. The models used and the number of them may vary each time depending on choice. An unfortunate result is a complicated nested lists within a list.
That is, output[[1]] contains the models I want to access more efficiently (which are themselves list objects), while output[[2]] and output[[3]] are the dataframes used to train and evaluate the models. Accessing the dataframes is simple, with a defined, reproducible structure each time (they are always output[[2]] and output[[3]]), but output[[1]] becomes a mess.
The only handle I have on this structure is the fact that each [[1]] is nested one level deeper than the last, and all of the nested elements except one end in [[2]]. Given that pattern, there is an ugly solution that works, but it is not a desirable format to work with. For example, after evaluating n models given by a vector of strings called inputList, with the function's output stored as "output", [[1]] can be repeated tens to hundreds of times:
for (i in (1:length(inputList) - 1)) {
  # Build the string "output[[1]]...[[1]][[2]]" with i + 1 copies of "[[1]]",
  # then parse and evaluate it to pull out one model.
  eval(rlang::parse_expr(paste0(c("output", rep("[[1]]", 1 + i), "[[2]]"), collapse = "")))
}
This can be used to apply all the models to some downstream task, such as making predictions on new data. When the length of inputList is not known, it could be discovered by repeating the expression until an error occurs, or something similar. The approach can also be modified to reach a specific model within inputList, if I know the original input and can work out that model's position. But the code is bulky compared to simply calling output[[1]][[n]] in some predictable format for any n. A bigger problem arises when accessing older saved runs for which the input list of models was not kept, leaving n unknown; I don't know of any way to use something like length() or lengths() to count how many nested elements exist within such a list. (For my example, output[[1]] is of length 1 no matter how many repeated [[1]] elements there are.)
I believe the simplest solution is to change the way the list is saved by the function so that I can access it with a systematic reference. However, I have a bunch of old lists that I still want to access and work with, and I'd also like to have better control over working with lists in general, so any help would be greatly appreciated.
I expected there would be some way to query the structure of nested R lists, which could be used to pass nested elements to separate functions, without having to use very long repetition of brackets.
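For what it's worth, a minimal sketch of the "change how the list is built" fix (train_one_model() is a hypothetical stand-in for whatever training call the loop makes; inputList, trainingData and testingData are the names from the question): append each model with c(old, list(new)) instead of wrapping the whole accumulator in list() again, so the models stay one level deep.
listOfModels <- list()
for (model_name in inputList) {
  trainedModel <- train_one_model(model_name)           # hypothetical training call
  listOfModels <- c(listOfModels, list(trainedModel))   # flat append, no extra nesting
}
output <- list(models = listOfModels, training = trainingData, testing = testingData)
# output$models[[n]] (or output[[1]][[n]]) now reaches model n directly,
# and length(output$models) reports how many models were stored.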

Is there a way I can make R work within a list

Is there any way to make R work 'within' a list? I.e. to make it assume a certain list is the current environment?
I have some code in which all the objects that I manipulate are in a list, call it "mylist".
Is there a way I can tell R that all objects are to be looked for in mylist, so that instead of writing
mylist$object
I can just write
object
and R will automatically look for it in mylist?
I am a fairly new R user, so apologies if I've misused any terminology.
Are you looking for with()?
For example,
mylist <- list(object = c(1:10),
               obj2 = c("a", "b"))
with(mylist, sum(object))
[1] 55
with(mylist, obj2)
[1] "a" "b"
# etc
If not, it sounds like you might not actually want to store your objects as elements in a list; instead, it might be better to create individual objects in your global environment to reference.
If you have multiple lists, each containing the same kind of objects, and you want to perform the same sort of tasks on those lists, then it would be time to look into writing a function. Still, even within a function you will want to pull those components apart (or wrap the relevant code in with()) to be able to call the elements of a list by their names (e.g. object or obj2).
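If you really do want the list's elements available as top-level objects, one option (a sketch, with the usual caveat that cluttering the global environment is rarely the best design) is base R's list2env(), which copies every named element of a list into an environment:
list2env(mylist, envir = globalenv())
object        # now works without the mylist$ prefix
sum(object)   # [1] 55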
It might help to read up on lists and environments in R. A few good resources to that end:
(lists) - http://rforpublichealth.blogspot.com/2015/03/basics-of-lists.html
(environments) - http://adv-r.had.co.nz/Environments.html

R approach for iterative querying

This is a question about a general approach in R. I'm trying to find my way into the R language, but the data types and loop approaches (apply, sapply, etc.) are still a bit unclear to me.
What is my target:
Query data from an API with parameters taken from a config list holding multiple parameter sets, and return the data as one aggregated data.frame.
First I want to define a list of multiple vectors (columns):
site        segment     id
google.com  Googleuser  123
bing.com    Binguser    456
How do I manage such a list of value groups (row by row)? data.frames are column-focused; you can't easily write a data.frame row by row in an R script. So far the only way I have found to define this initial config table is a CSV file, which is an approach I'd like to avoid, but I can't find anything more elegant.
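For what it's worth, the config table can be written directly in the script without a CSV; a couple of sketches (the column and value names are simply those from the example above):
# column-wise with base R
config <- data.frame(site    = c("google.com", "bing.com"),
                     segment = c("Googleuser", "Binguser"),
                     id      = c(123, 456),
                     stringsAsFactors = FALSE)
# or row-wise, if you are open to the tibble package
config <- tibble::tribble(
  ~site,        ~segment,     ~id,
  "google.com", "Googleuser", 123,
  "bing.com",   "Binguser",   456
)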
Now I want to query my data, lets say with this function:
query.data <- function(site, segment, id){
  config <- define_request(site, segment, id)
  result <- query_api(config)
  return(result)
}
This will give me a data.frame as a result, this means every time I query data the same columns are used. So my result should be one big data.frame, not a list of similar data.frames.
Now, sapply lets me iterate over one parameter list while holding the other parameters fixed. mapply, which iterates over several parameter lists at once, does work, but it gives me the data in some crazy output format that I can't handle or even fully understand.
In principle the list of data.frames is ok, the data is correct, but it feels cumbersome to me.
What core concepts of R I did not understand yet? What would be the approach?
If you have an lapply/sapply solution that returns a list of dataframes with identical columns, you can easily get a single large dataframe with do.call(). do.call() passes the items of a list as the arguments to another function, allowing you to do things such as
big.df <- do.call(rbind, list.of.dfs)
which appends the component dataframes into a single large dataframe.
In general do.call(rbind,something) is a good trick to keep in your back pocket when working with R, since often the most efficient way to do something will be some kind of apply function that leaves you with a list of elements when you really want a single matrix/vector/dataframe/etc.
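Putting the pieces together for this question, a sketch (assuming query.data() from above returns a data.frame with the same columns every time, and config is the parameter table): Map() calls the function once per row and returns a list, and do.call(rbind, ...) stacks that list into one data.frame.
results <- Map(query.data, config$site, config$segment, config$id)
big.df  <- do.call(rbind, results)
# mapply(query.data, config$site, config$segment, config$id, SIMPLIFY = FALSE)
# is equivalent; SIMPLIFY = FALSE is what stops mapply from mangling the output.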

R: calling function on multiple list entries

I'm really new to R but have a computer science background. I am currently trying to read in a bunch of different data files and then perform the same analysis on each of them.
Right now, I have a list of datasets. So my first data set is in list[[1]], the second in list[[2]], etc. What I was going to do is loop over the length of the list and call some function, passing in values from two columns of each dataset. I was reading an article on this, however, and found that:
foo = seq(1, 100, by=2)
foo.squared = NULL
foo.squared = foo^2
will square all the values within foo. So, is there any way to do something similar for my case? For example, passing in values from all the datasets in the list or something?
To make this more concrete, I have a list of datasets named data_list and each data set is identical with columns a, b and c. I need to call a function f with the arguments a and b from the datasets. Is there any way to do this besides using a for loop?
Please let me know if that makes sense. Sorry for any confusion, like I said, I am very new to this language. Thank you for your help!
Use this:
lapply(data_list, function(x) f(x$a, x$b))
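A toy example of that pattern, with f and the columns a and b as stand-ins for the real analysis:
data_list <- list(data.frame(a = 1:3, b = 4:6),
                  data.frame(a = 7:9, b = 10:12))
f <- function(a, b) sum(a * b)               # any analysis of the two columns
lapply(data_list, function(x) f(x$a, x$b))   # returns a list: 32 and 266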

Saving many subsets as dataframes using "for"-loops

This question might be very simple, but I cannot find a good way to solve it:
I have a dataset with many subgroups, which need to be analysed both all together and on their own. Therefore, I want to create subsets for the groups and use them for the later analysis. Both the definition of the subsets and the analysis should be partly done with loops, in order to save space and to ensure that the same analysis is done for all subgroups.
Here is an example of my code using an example dataframe from the boot package:
library(boot)   # the aids data frame comes from the boot package
data(aids)
qlist <- c("1","2","3","4")
for (i in length(qlist)) {
  paste("aids.sub.", qlist[i], sep="") <- subset(aids, quarter==qlist[i])
}
The variable that contains the subgroups in my dataset is stored as a string, which is why I added the qlist part; it would not be required otherwise.
Make a list of the subsets with lapply:
lapply(qlist, function(x) subset(aids, quarter==x))
Equivalently, avoiding the subset():
lapply(qlist, function(x) aids[aids$quarter==x,])
It is likely the case that using a list will make the subsequent code easier to write and understand. You can subset the list to get a single data frame (just as you can use one of the subsets, as created below). But you can also iterate over it (using for or lapply) without having to construct variable names.
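If you also want the list elements named like the assign() version below, a small sketch with setNames():
aids.sub <- setNames(lapply(qlist, function(x) aids[aids$quarter == x, ]),
                     paste0("aids.sub.", qlist))
aids.sub[["aids.sub.2"]]   # pull out a single subset by name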
To do the job as you are asking, use assign:
for (i in qlist) {
  assign(paste("aids.sub.", i, sep=""), subset(aids, quarter==i))
}
Note the removal of the length() function, and that this is iterating directly over qlist.
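The objects created this way are then retrieved by the constructed name, for example with get() (or simply by typing the name once you know it):
head(get("aids.sub.2"))   # same as head(aids.sub.2)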
