Pass Individual Arguments from a Vector to a Complex Function - r

Problem: What is the best way to loop through a vector of IDs, so that one ID is passed as an argument to a function, the function runs, then the next ID is used to run the function again, and so on until the function has been run 30 times with the 30 IDs in my vector?
Additional Info: I have a complex function that retrieves data from several different sources, manipulates it, writes it to a different table, and then emails me when it's done. It has several arguments that are hard-coded, and an ID argument that I manually input each time I want to run it.
I'm sorry that I can't give a lot of specifics, but this is an extremely simplified version of my setup:
#Manually Entered Arguments
ID<-3402
Arg1<- "Jon_Doe"
Arg2<- "Jon_Doe#gmail.com"
#Run Function
RunFun <- function (ID, arg1, arg2) {...}
Now, I have 30 non-sequential IDs (all numerical) that I have imported from an Excel column using:
ID.Group<- scan()
I know that it is extremely inefficient to run each ID through the function one at a time, but the complexity of the function and technological limitations only allow for one to be run at a time.
I am just getting started with R, so I'm sorry if any of this didn't make sense. I have spent the last 5 hours trying to figure this out so any help would be greatly appreciated.
Thank you!

The Vectorize function is actually a wrapper around mapply and is often used when vectorization is not a natural outcome of the function body. If you wrote the function with default values for arg1 and arg2, you could vectorize it and call it on the whole vector of IDs, like this:
RunFun <- function (ID, arg1="Jon_Doe", arg2="Jon_Doe#gmail.com") {...}
V.RunFun <- Vectorize(RunFun)
V.RunFun(ID.Group)
Vectorize is often used with integrate or outer, which require that the function they are given return a vector of the same length as its input.
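To make the options concrete, here is a minimal sketch. The body of RunFun below is a made-up stand-in (the real one is not shown in the question), and the three IDs are invented. It compares a plain lapply call with the Vectorize approach; both call the function once per ID, in order.

```r
# Hypothetical stand-in for the real RunFun, whose body is not shown in the question.
RunFun <- function(ID, arg1 = "Jon_Doe", arg2 = "Jon_Doe#gmail.com") {
  paste("processed", ID, "for", arg1)
}

# Made-up IDs; in the question these come from scan().
ID.Group <- c(3402, 1187, 9050)

# Option 1: lapply calls RunFun once per ID and collects the results in a list.
results <- lapply(ID.Group, RunFun)

# Option 2: Vectorize builds a wrapper (via mapply) that loops over ID only.
V.RunFun <- Vectorize(RunFun, vectorize.args = "ID")
v.results <- V.RunFun(ID.Group)
```

If the function is called only for its side effects (writing a table, sending an email), a plain for loop over ID.Group would do the same job; none of these options runs the IDs in parallel.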

Related

Replace for loop with vectorized call of a function returning multiple values

I have the following function: problema_firma_emprestimo(r,w,r_emprestimo,posicao,posicao_banco), where all inputs are scalars.
This function returns three different matrices, using
return demanda_k_emprestimo,demanda_l_emprestimo,lucro_emprestimo
I need to run this function for a series of values of posicao_banco that are stored in a vector.
I'm doing this using a for loop, because I need three separate matrices, each storing one of the three outputs of the function, where the first dimension of each matrix corresponds to the index of posicao_banco. My code for this part is:
demanda_k_emprestimo = zeros(num_bancos,na,ny);
demanda_l_emprestimo = similar(demanda_k_emprestimo);
lucro_emprestimo = similar(demanda_k_emprestimo);
for i in eachindex(posicao_bancos)
    demanda_k_emprestimo[i,:,:], demanda_l_emprestimo[i,:,:], lucro_emprestimo[i,:,:] = problema_firma_emprestimo(r,w,r_emprestimo[i],posicao,posicao_bancos[i]);
end
Is there a fast and clean way of doing this using vectorized functions? Something like problema_firma_emprestimo.(r,w,r_emprestimo[i],posicao,posicao_bancos) ? When I do this, I get a tuple with the result, but I can't find a good way of unpacking the answer.
Thanks!
Unfortunately, it's not easy to use broadcasting here, since then you will end up with output that is an array of tuples, instead of a tuple of arrays. I think a loop is a very good approach, and has no performance penalty compared to broadcasting.
I would suggest, however, that you organize your output array dimensions differently, so that i indexes into the last dimension instead of the first:
for i in eachindex(posicao_bancos)
    demanda_k_emprestimo[:, :, i] , ...
end
This is because Julia arrays are column major, and this way the output values are filled into the output arrays in the most efficient way. You could also consider making the output arrays into vectors of matrices, instead of 3D arrays.
On a side note: since you are (or should be) creating an MWE for the sake of the people answering, it would be better if you used shorter and less confusing variable names. In particular for people who don't understand Portuguese (I'm guessing), your variable names are super long, confusing and make the code visually dense. Telling the difference between demanda_k_emprestimo and demanda_l_emprestimo at a glance is hard. The meaning of the variables is not important either, so it's better to just call them A and B or X and Y, and the functions foo or something.

Can someone please explain me this code? especially the role of "function x and [[x]]"?

This is code in R, and I'm having trouble understanding the role of function(x) and qdata[[x]] in it. Can someone explain it to me piece by piece? I didn't write this code. Thank you
outs=lapply(names(qdata[,12:35]), function(x)
hist(qdata[[x]],data=qdata,main="Histogram of Quality Trait",
xlab=as.character(x),las=1.5)$out)
This code generates a series of histograms, one for each of columns 12 to 35 of the dataframe qdata. The lapply function iterates over the column names. At each iteration, the name of the current column is passed as argument "x" to the anonymous function defined by "function(x)". The body of the function is a call to the hist() function, which creates the histogram. qdata[[x]] (where x is the name of a column) extracts the data from that column. I am actually confused by "data=qdata".
We don't have the data object named qdata, so we cannot really be sure what will happen with this code. It appears that the author of this code is trying to collect the value of a component named out from each call to hist. If qdata is an ordinary dataframe, then I suspect that this code will fail in that goal, because the value returned by hist does not have an out component (look at the output of ?hist). When I run this with a simple dataframe, I do get histogram plots that appear in my interactive plotting device, but I get NULL values for the components of outs. Furthermore, the 12 warnings are caused by the fact that the hist function has no data parameter.
qdata <- data.frame(a=rnorm(10), b=rnorm(10))
outs=lapply(names(qdata), function(x)
hist(qdata[[x]],data=qdata,main="Histogram of Quality Trait",
xlab=as.character(x),las=1.5)$out)
#There were 12 warnings (use warnings() to see them)
> str(outs)
List of 2
$ : NULL
$ : NULL
So I think we need to be concerned about the level of R knowledge of the author of this code. It's possible I'm wrong about this presumption. The hist function is generic, and it is possible that some unreferenced package has a method designed to handle a data object and return an out value when delivered a vector having a particular class. In a typical starting situation with only the base packages loaded, however, there are only three hist.* functions:
methods(hist)
#[1] hist.Date* hist.default hist.POSIXt*
#see '?methods' for accessing help and source code
As for the questions about the role of function and [[x]]: the keyword function creates a function object that can receive parameter values, then do operations, and finally return results. In this case the names get passed to the anonymous function and become, each in turn, the value of the local name x, and that value is used by the '[['-function to look up the column in what I am presuming is the 'qdata' dataframe.
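Here is a small sketch of the same pattern with the questionable parts removed. It assumes a made-up two-column dataframe in place of the real qdata, and it uses plot = FALSE so that hist only computes the histogram object instead of drawing it.

```r
# Made-up data standing in for the real qdata, which is not available.
set.seed(1)
qdata <- data.frame(a = rnorm(10), b = rnorm(10))

# names(qdata) gives the column names; each name is passed in turn as x,
# and qdata[[x]] extracts that column as a plain numeric vector.
res <- lapply(names(qdata), function(x) hist(qdata[[x]], plot = FALSE))
names(res) <- names(qdata)

# The returned objects have components such as breaks and counts, but no out.
names(res[["a"]])
res[["a"]]$counts
is.null(res[["a"]]$out)
```

Each element of res is an object of class "histogram", so the counts or breaks can be pulled out afterwards with $ or [[ ]].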

Function whose output is a list over matrix rows in R

I have the following problem. I have a function f, whose output is a list with two elements. Let's say
f<-function(row, number){
  #some procedures
  p1<-vector1
  p2<-vector2
  return(list(p1,p2))
}
Now, I have to apply this function to 190 rows of a DB with a large number of columns, and I have to do it many times, so I'm looking for a function like "apply" that lets me save the output of the function for the 190 rows in a list. I mean, a function whose output (applied over the rows of my DB) is something like below:
output[[i]][[j]] = f(row_i, number)[[j]]
I hope this is clear enough.
Notes:
It'd be awesome if the code ran in as little time as possible, because, as I said, I have to do this procedure a lot of times.
I don't know if it's necessary to remove the "list" part from the function I define to obtain the result I'm looking for.
Thank you so much.
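One possible sketch of the requested shape, assuming the DB is a numeric matrix. The body of f and the data below are invented for illustration; only the output[[i]][[j]] structure is taken from the question.

```r
# Hypothetical f: returns a list of two vectors computed from one row.
f <- function(row, number) {
  p1 <- as.numeric(row) * number
  p2 <- as.numeric(row) + number
  list(p1, p2)
}

# Made-up data: 4 rows instead of 190.
DB <- matrix(1:12, nrow = 4)
number <- 2

# lapply over the row indices keeps each call's list intact,
# so output[[i]][[j]] is f(DB[i, ], number)[[j]].
output <- lapply(seq_len(nrow(DB)), function(i) f(DB[i, ], number))
```

With this approach the "list" part of f can stay as it is, since lapply simply collects whatever each call returns.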

R - Please explain this code and how to make a function that outputs like it?

I am new to R and mostly working with old code written by someone else. And I am trying to create my own R functions.
I found some of the following code used for eigenvalue decomposition.
eigenMatrix = eigen(myMatrix)[[2]]
eigenVals = eigen(myMatrix)[[1]]
Here there is a single function that can output 2 different data structures, a vector and a matrix, depending on the value in the brackets.
When I search for functions with multiple outputs, they usually use lists to output multiple variables at once, which does not work, possibly because of the different types.
I don't understand why there are two sets of brackets and how the underlying function would work.
The posted code calls the eigen function, which returns a list with 2 components (values and vectors).
Then the [[]] are used to extract the first and second items from the list.
The [[]] is needed to return the underlying structure, and is better explained here: How to Correctly Use Lists in R?
Also, since the eigen function is run twice, the code in the question is inefficient.
resultList = eigen(myMatrix)
eigenMatrix = resultList[[2]]
eigenVals = resultList[[1]]
This code is better since eigen is run only once: it saves the result of the function as a list and then reads the values from the list.
The function itself can be coded like any function with multiple outputs, such as here: https://stat.ethz.ch/pipermail/r-help/2007-March/126851.html or here: How to assign from a function with multiple outputs?
The list values can hold any structure and [[]] can be used to return the underlying structure of each value.
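As a rough illustration, the sketch below first reads both components from a single eigen call, then defines a made-up function (summarize_matrix is a hypothetical name, not from the question) that returns a vector and a matrix together in one named list.

```r
# A small symmetric matrix, chosen only for illustration.
myMatrix <- matrix(c(2, 0, 0, 3), nrow = 2)

# Call eigen once; the components can be read by position or by name.
resultList <- eigen(myMatrix)
eigenVals <- resultList[[1]]      # same as resultList$values (a vector)
eigenMatrix <- resultList[[2]]    # same as resultList$vectors (a matrix)

# Hypothetical function with two outputs of different types in one list.
summarize_matrix <- function(m) {
  list(col_means = colMeans(m),
       centered = sweep(m, 2, colMeans(m)))
}

res <- summarize_matrix(matrix(1:6, nrow = 3))
res[[1]]        # the vector of column means, same as res$col_means
res$centered    # the matrix with the column means subtracted
```

Naming the list components is optional, but it lets callers write res$col_means instead of remembering positions.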

How to use a value that is specified in a function call as a "variable"

I am wondering if it is possible in R to use a value that is declared in a function call as a "variable" part of the function itself, similar to the functionality that is available in SAS IML.
Given something like this:
put.together <- function(suffix, numbers) {
  new.suffix <<- as.data.frame(numbers)
  return(new.suffix)
}
x <- c(seq(1000,1012, 1))
put.together(part.a, x)
new.part.a ##### does not exist!!
new.suffix ##### does exist
As it is written, the function returns a dataframe called new.suffix, as it should because that is what I'm asking it to do.
I would like to get a dataframe returned that is called new.part.a.
EDIT: Additional information was requested regarding the purpose of the analysis
The purpose of the question is to produce dataframes that will be sent to another function for analysis.
There exists a data bank where elements are organized into groups by number, and other people organize the groups into a meaningful set.
Each group has an id number. I use the information supplied by others to put the groups together as they are specified.
For example, I would be given a set of id numbers like: part-1 = 102263, 102338, 202236, 302342, 902273, 102337, 402233.
So, part-1 has seven groups, each group having several elements.
I use the id numbers in a merge so that only the groups of interest are extracted from the large data bank.
The following is what I have for one set:
### all.possible.elements.bank <- .csv file from large database ###
id.part.1 <- as.data.frame(c(102263, 102338, 202236, 302342, 902273, 102337, 402233))
bank.names <- c("bank.id")
colnames(id.part.1) <- bank.names
part.sort <- matrix(seq(1,nrow(id.part.1),1))
sort.part.1 <- cbind(id.part.1, part.sort)
final.part.1 <- as.data.frame(merge(sort.part.1, all.possible.elements.bank,
by="bank.id", all.x=TRUE))
The process above is repeated many, many times.
I know that I could do this for all of the collections that I would pull together, but I thought I would be able to wrap the selection process into a function. The only things that would change would be the part numbers (part-1, part-2, etc..) and the groups that are selected out.
It is possible using the assign function (and possibly deparse and substitute), but doing things like this is strongly discouraged. Why can't you just return the data frame (with the function taking only the numbers) and call the function like:
new.part.a <- put.together(x)
This is generally the better approach.
If you really want to change things in the global environment then you may want a macro; see the defmacro function in the gtools package and, most importantly, read the document in the references section on the help page.
This is rarely something you should want to do... assigning to things out of the function environment can get you into all sorts of trouble.
However, you can do it using assign:
put.together <- function(suffix, numbers) {
  assign(paste('new',
               deparse(substitute(suffix)),
               sep='.'),
         as.data.frame(numbers),
         envir=parent.env(environment()))
}
put.together(part.a, 1:20)
But like Greg said, it's usually not necessary, and always dangerous if used incorrectly.
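For the repeated merge described in the question, one alternative to assign is to keep the results in a named list. The sketch below is only an illustration under stated assumptions: select.part is a hypothetical helper name, the small data bank is made up in place of the real .csv file, and the id sets are shortened.

```r
# Made-up stand-in for the large data bank read from the .csv file.
all.possible.elements.bank <- data.frame(
  bank.id = c(102263, 102263, 102338, 202236, 555555),
  element = c("a", "b", "c", "d", "e")
)

# Hypothetical helper: wraps the selection step and returns the merged data frame.
select.part <- function(ids, bank) {
  sort.part <- data.frame(bank.id = ids, part.sort = seq_along(ids))
  merge(sort.part, bank, by = "bank.id", all.x = TRUE)
}

# Made-up id sets, one entry per part.
part.ids <- list(
  part.1 = c(102263, 102338),
  part.2 = c(202236, 402233)
)

# One data frame per part, stored under the part's name.
final.parts <- lapply(part.ids, select.part, bank = all.possible.elements.bank)
final.parts$part.1
```

Each result is then available as final.parts$part.1, final.parts[["part.2"]], and so on, and the whole list can be passed to the next analysis function without creating variables in the global environment.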

Resources