Function whose output is a list over matrix rows in R

I have the following problem. I have a function f whose output is a list with two elements. Let's say
f<-function(row, number){
#some procedures
p1<-vector1
p2<-vector2
return(list(p1,p2))
}
Now I have to apply this function to the 190 rows of a DB with a large number of columns, and I have to do it many times, so I'm looking for a function like "apply" that lets me save the output of the function for all 190 rows in a list. I mean a function whose output (applied over the rows of my DB) is something like this:
output[[i]][[j]] = f(row_i, number)[[j]]
I hope this is clear enough.
Notes:
It'd be awesome if the code ran in the least possible time because, as I said, I have to do this procedure many times.
I don't know whether I need to remove the "list" part from the function I define to get the result I'm looking for.
Thank you so much.
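A minimal sketch of one way to get the output[[i]][[j]] structure described above, assuming the data is a numeric matrix (a data frame could be converted with as.matrix first). The body of f and the toy matrix m are hypothetical stand-ins, not the original code:

```r
# Hypothetical stand-in for the real f: returns a list of two vectors per row.
f <- function(row, number) {
  p1 <- row * number   # placeholder for vector1
  p2 <- cumsum(row)    # placeholder for vector2
  list(p1, p2)
}

# Toy matrix standing in for the 190-row data set.
m <- matrix(1:6, nrow = 3, byrow = TRUE)

# One list element per row; each element is the two-element list returned by f.
output <- lapply(seq_len(nrow(m)), function(i) f(m[i, ], number = 2))

output[[2]][[1]]  # first component of f() for row 2
```

Because lapply always returns a list, the list(p1, p2) return value of f can probably stay as it is.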

Related

Lapply on a list of a list

This is about some code in R.
I have separated a big data file "All_data.csv" into smaller data sets, one per individual per year.
So the list looks like this:
All_individuals is a list of 25 individuals. If you take the first element of that list you get:
Individual 1: dataframe_year1, dataframe_year2.
If you take the second element you get, for instance:
Individual 2: dataframe_year1, dataframe_year2, dataframe_year3.
etc., so the lists within the list differ in length.
Now I want to run an (analysis) function on the dataframes. I do not need to store the output in the list again per se.
I solved it by calling lapply on the list All_data with a function I defined myself, which in turn calls lapply again with my analysis function. But I was wondering if there is another way, because this seems a bit inefficient.
split <- function (All_data)
{
#function that splits files by date and individual
#returns list of individuals and within that list is another list of dataframes. Called All_individuals
}
Make_analysis <- function (All_individuals)
{
Listfiles <- split (All_individuals)
HRE <- lapply (Listfiles, Doall)
}
Analysis <- function (files)
{
...
}
function calls:
lapply (All_data, Make_analysis)
Could anyone help?
Also, is this the best way to go if I want to parallelise the analysis with rslurm to run it on an HPC? Then I could replace lapply with slurm_map, right?
My function works as it is, but it seems very inefficient. I would like some tips on how to make the code more efficient, and on how to parallelise it with rslurm.
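One possible alternative to the nested lapply, sketched under the assumption that every individual is itself a list of data frames: remove one level of nesting with unlist(recursive = FALSE) and then call lapply once. The toy data and the body of Analysis below are hypothetical:

```r
# Toy nested list: individuals -> one data frame per year (hypothetical data).
All_individuals <- list(
  ind1 = list(y1 = data.frame(x = 1:3), y2 = data.frame(x = 4:6)),
  ind2 = list(y1 = data.frame(x = 7:9), y2 = data.frame(x = 10:12),
              y3 = data.frame(x = 13:15))
)

# Hypothetical analysis function: here just the mean of column x.
Analysis <- function(df) mean(df$x)

# Remove exactly one level of nesting, giving a flat list of data frames.
flat <- unlist(All_individuals, recursive = FALSE)

# A single lapply now covers every individual/year combination.
results <- lapply(flat, Analysis)
```

A flat list is also the shape that lapply-style parallel helpers generally take as input. Whether that suits a particular cluster setup is something to check against the rslurm documentation.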

R approach for iterative querying

This is a question about general approach in R. I'm trying to find my way into the R language, but the data types and loop approaches (apply, sapply, etc.) are a bit unclear to me.
What is my target:
Query data from an API using a config list with multiple parameters. Return the data as one aggregated data.frame.
First I want to define a list of multiple vectors (columns)
site        segment     id
google.com  Googleuser  123
bing.com    Binguser    456
How do I manage such a list of value groups (row by row)? data.frames are column focused, so you can't write a data.frame row by row in an R script. The only way I found to define this initial config table is a csv, which is an approach I'd rather avoid, but I can't find a more elegant way.
Now I want to query my data, let's say with this function:
query.data <- function(site, segment, id){
config <- define_request(site, segment, id)
result <- query_api(config)
return(result)
}
This gives me a data.frame as a result, which means every query returns the same columns. So my result should be one big data.frame, not a list of similar data.frames.
Now, sapply lets me use one parameter list plus multiple static parameters. mapply works, but it gives me my data in some crazy output structure that I can't handle or even fully understand.
In principle the list of data.frames is OK and the data is correct, but it feels cumbersome to me.
Which core concepts of R have I not understood yet? What would be the approach?
If you have an lapply/sapply solution that returns a list of dataframes with identical columns, you can easily get a single large dataframe with do.call(). do.call() passes the items of a list as the arguments to another function, allowing you to do things such as
big.df <- do.call(rbind, list.of.dfs)
Which would append the component dataframes into a single large dataframe.
In general do.call(rbind,something) is a good trick to keep in your back pocket when working with R, since often the most efficient way to do something will be some kind of apply function that leaves you with a list of elements when you really want a single matrix/vector/dataframe/etc.
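As a rough sketch of how this could fit the question, the config table can be written directly in the script with data.frame(), and Map() (mapply without simplification) calls the query function once per row. The body of query.data below is a hypothetical stand-in for the real API call, and the hits column is invented for illustration:

```r
# Config table defined in the script itself (values from the question).
config <- data.frame(
  site    = c("google.com", "bing.com"),
  segment = c("Googleuser", "Binguser"),
  id      = c(123, 456),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the real API query; returns a one-row data frame.
query.data <- function(site, segment, id) {
  data.frame(site = site, segment = segment, id = id,
             hits = nchar(site),  # invented column, for illustration only
             stringsAsFactors = FALSE)
}

# Map() always returns a list: one data frame per config row.
list.of.dfs <- Map(query.data, config$site, config$segment, config$id)

# Stack the per-row results into a single data frame.
big.df <- do.call(rbind, list.of.dfs)
rownames(big.df) <- NULL
```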

R - Please explain this code and how to make a function that outputs like it?

I am new to R and mostly working with old code written by someone else. And I am trying to create my own R functions.
I found some of the following code used for eigenvalue decomposition.
eigenMatrix = eigen(myMatrix)[[2]]
eigenVals = eigen(myMatrix)[[1]]
Here a single function seems to output two different data structures, a vector or a matrix, depending on the value in the brackets.
When I search for functions with multiple outputs, they usually use lists to return multiple variables at once, which does not seem to work here, possibly because of the different types.
I don't understand why there are two sets of brackets or how the underlying function would work.
The posted code takes the eigen function, which returns a list with 2 values.
Then the [[]] are used to extract the first and second items from the list.
The [[]] is needed to return the underlying structure, and is better explained here: How to Correctly Use Lists in R?
Also, since the eigen function is run twice the code in the question is inefficient.
resultList = eigen(myMatrix)
eigenMatrix = resultList[[2]]
eigenVals = resultList[[1]]
This code is better because eigen is run only once: the result of the function is saved as a list, and the values are then read from that list.
The function itself can be coded like any function with multiple outputs, such as here: https://stat.ethz.ch/pipermail/r-help/2007-March/126851.html or here: How to assign from a function with multiple outputs?
The list values can hold any structure and [[]] can be used to return the underlying structure of each value.
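A small sketch of a function written in the same style. summarise_matrix and its component names are hypothetical; the only fact assumed about eigen is that its result has components named values and vectors:

```r
# Hypothetical function returning two results of different types in one list.
summarise_matrix <- function(m) {
  list(totals = colSums(m),   # a numeric vector
       scaled = m / max(m))   # a matrix
}

myMatrix <- matrix(c(2, 0, 0, 4), nrow = 2)
res <- summarise_matrix(myMatrix)

res[[1]]     # extract by position: the vector
res$scaled   # extract by name: the matrix

# eigen() works the same way; its components can also be read by name.
e <- eigen(myMatrix)
e$values     # same object as e[[1]]
e$vectors    # same object as e[[2]]
```

Naming the list elements lets callers use $ instead of remembering positions.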

Pass Individual Arguments from a Vector to a Complex Function

Problem: What is the best way to loop through a vector of IDs, so that one ID is passed as an argument to a function, the function runs, then the next ID is used to run the function again, and so on until the function has been run 30 times with the 30 IDs in my vector?
Additional Info: I have a complex function that retrieves data from several different sources, manipulates it, writes it to a different table, and then emails me when it's done. It has several arguments that are hard coded, and an ID argument that I enter manually each time I want to run it.
I'm sorry that I can't give a lot of specifics, but this is an extremely simplified version of my setup
#Manually Entered Arguments
ID<-3402
Arg1<- "Jon_Doe"
Arg2<- "Jon_Doe#gmail.com"
#Run Function
RunFun <- function (ID, arg1, arg2) {...}
Now, I have 30 non-sequential IDs (all numerical) that I have imported from an Excel column using:
ID.Group<- scan()
I know that it is extremely inefficient to run each ID through the function one at a time, but the complexity of the function and technological limitations only allow for one to be run at a time.
I am just getting started with R, so I'm sorry if any of this didn't make sense. I have spent the last 5 hours trying to figure this out so any help would be greatly appreciated.
Thank you!
The Vectorize function is actually a wrapper around mapply and is often used when vectorization is not a natural outcome of the function body. If you wrote the function with default values for arg1 and arg2, like this:
RunFun <- function (ID, arg1="Jon_Doe", arg2="Jon_Doe#gmail.com") {...}
V.RunFun <- Vectorize(RunFun)
V.RunFun ( IDvector )
This is often used with integrate or outer which require that their arguments return a vector of equal length to input.
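A runnable sketch of the idea, with a hypothetical body for RunFun and made-up IDs. Since the real function is run for its side effects, a plain lapply over the ID vector is shown as an equivalent alternative:

```r
# Hypothetical stand-in for the real function; it accepts one ID at a time.
RunFun <- function(ID, arg1 = "Jon_Doe", arg2 = "Jon_Doe#gmail.com") {
  stopifnot(length(ID) == 1)       # mimics the "one ID per run" limitation
  paste(arg1, "processed ID", ID)
}

ID.Group <- c(3402, 118, 2077)     # made-up IDs for illustration

# Vectorize() wraps mapply(), so RunFun is still called once per ID.
V.RunFun <- Vectorize(RunFun, vectorize.args = "ID")
vec.results <- V.RunFun(ID.Group)

# Equivalent explicit loop that always returns a list.
list.results <- lapply(ID.Group, RunFun)
```

Either way the function runs sequentially, once per ID, which matches the constraint that only one ID can be processed at a time.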

How to order a matrix by all columns

OK, I'm stuck in a dumbness loop. I've read through the helpful ideas at How to sort a dataframe by column(s)?, but need one more hint. I'd like a function that takes a matrix with an arbitrary number of columns and sorts by all columns in sequence. E.g., for a matrix foo with N columns, it does the equivalent of
foo[order(foo[,1],foo[,2],...foo[,N]),]
I am happy to use a with or by construction, and if necessary define the colnames of my matrix, but I can't figure out how to automate the collection of arguments to order (or to with).
Or, I should say, I could build the entire bloody string with paste and then call it, but I'm sure there's a more straightforward way.
The most elegant (for certain values of "elegant") way would be to turn it into a data frame, and use do.call:
foo[do.call(order, as.data.frame(foo)), ]
This works because a data frame is just a list of variables with some associated attributes, and can be passed to functions expecting a list.
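A quick check of the idiom on a toy matrix (the values are made up). Each column of the data frame becomes one argument to order, so ties in the first column are broken by the second, then the third:

```r
# Toy 4 x 3 matrix, filled column by column.
foo <- matrix(c(2, 1, 2, 1,
                9, 8, 7, 8,
                5, 6, 4, 3), ncol = 3)

# Equivalent to order(foo[, 1], foo[, 2], foo[, 3]).
idx <- do.call(order, as.data.frame(foo))

sorted <- foo[idx, ]
sorted
```

One caveat worth keeping in mind: the column names are passed to order as argument names, so a matrix whose columns happen to be called decreasing or method would need renaming first.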

Resources