What is the best way to write a function in R where the output depends on an argument?

I want to write a function that calculates income tax in the UK, potentially inside a package. The precise formula changes regularly, every year say, but instead of writing a new function every time it changes, I want to use an argument, 'year', that controls the behavior of the function. So for example:
income_tax(x = 25000, year = '2019/20')
where x is a vector of incomes and year specifies the tax rules to apply.
What is the best way to organize and manage this function, considering that the formula for each year can be quite complex, and new updates will be added each year?
Is there an object-oriented solution? Or should I write an internal function for each year and use some if/else logic inside the main function?

It's a bit of a meta-question, as it's not specific to R, but... I imagine that even though the years are different, there are some parts that are common (in form, even if not in all the details) to several years. I'd be tempted to have a list of functions (one per year) that gets called by income_tax, and these in turn are bespoke, but can call common functionality:
year_fns <- list(
`2019/20` = function(x, ...) {
taxable <- post_allowance(x, 11000)
# now some bespoke stuff for this year
if (taxable > 100e3 ){
...
}
....
final_value
},
`2020/21` = #another function
...
)
post_allowance <- function(x, allowance) {
# some common functionality
}
and then you'd just calculate income tax via
income_tax <- function(x, year, ...) {
year_fns[[year]](x, ...)
}
This may take a bit of getting used to, but it's a beauty of R that functions are as easy to handle as numbers, strings etc. So we're looking up the correct function to call in the year_fns[[year]] part, and then calling that function with the (x, ...) part.
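To make the lookup-then-call idea concrete, here is a minimal runnable sketch. The allowances and rates are made-up placeholders, not real UK tax rules, and the error check for an unknown year is an addition that the answer above does not include.

```r
# Shared helper: income remaining after a personal allowance (never negative).
post_allowance <- function(x, allowance) pmax(x - allowance, 0)

# One function per tax year. The numbers are placeholders for illustration,
# NOT real UK tax rules.
year_fns <- list(
  "2019/20" = function(x, ...) 0.20 * post_allowance(x, 10000),
  "2020/21" = function(x, ...) 0.25 * post_allowance(x, 12000)
)

income_tax <- function(x, year, ...) {
  fn <- year_fns[[year]]  # NULL when the year has no entry in the list
  if (is.null(fn)) {
    stop("No tax rules defined for year '", year, "'", call. = FALSE)
  }
  fn(x, ...)
}

tax_1920 <- income_tax(c(5000, 25000), year = "2019/20")
tax_2021 <- income_tax(c(5000, 25000), year = "2020/21")
```

Adding a new year then means adding one entry to year_fns, while income_tax itself stays unchanged.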

In R you can pass around expressions (or parts of it), just like numbers and strings, meaning that they can be deployed as arguments in function calls. Most likely, the function expression() may suit your needs.

Related

R: Not to look for variables outside a function if they do not exist within it

This function is OK in R:
f <- function(x) {
x + y
}
Because if the variable y is not defined inside the function f(), R will look for it outside the environment of the function, in its parent environment.
Apart from the fact that this behavior can be a bug generator, what is the point of functions having input parameters? Anyway, all the variables inside a function can be searched outside of it.
Is there any way not to look for variables outside a function if they do not exist within the function?
Some reasons for using parameters that came to my mind:
Without parameters, users have to define variables before using the function, and these variable names need to match the variable names used within the function -- this is impractical.
How is anyone supposed to know/remember the names of the variables within a function? How do I know which variables within a function are purely local, and which variables have to exist outside of the function?
Input parameters can be passed directly as values or as a variable (and the variable name does not matter).
Input parameters communicate the intended usage of the function; it is clear what data is needed to operate it (or at the very least: how many values need to be inserted by the user of the function)
Input parameters can be documented properly using Rd files (or roxygen syntax)
I am sure there are many other reasons to use input parameters.
M. Papenberg provides a very good explanation.
Here's a quick addendum on how to make a function not look for objects in parent environments:
Just provide them in the parameter list! This might sound stupid, but that's what you should always do unless you have good reason to do otherwise. In your example only x is passed to the function. So, if the idea here is that x should be returned if y doesn't exist, you can go for default parameters. In this case this could be done as
f <- function(x, y = 0) {
x + y
}
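As a small illustration of the point about parameters, the sketch below defines a global y and compares a function that relies on it with one that takes y as an argument. The names f_free and f_arg are made up for this example.

```r
y <- 100  # a variable that happens to exist outside the functions

# y is a free variable here, so R finds it in the enclosing environment.
f_free <- function(x) {
  x + y
}

# y is a parameter here, so the outside y is never consulted.
f_arg <- function(x, y = 0) {
  x + y
}

res_free    <- f_free(1)        # picks up the outside y
res_default <- f_arg(1)         # uses the default y = 0
res_given   <- f_arg(1, y = 5)  # uses the value passed in
```

Because the argument shadows any outside variable of the same name, the result of f_arg depends only on what the caller passes.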

Pass Individual Arguments from a Vector to a Complex Function

Problem: What is the best way to loop through a vector of IDs, so that one ID is passed as an argument to a function, the function runs, then the next ID is used to run the function again, and so on until the function has been run 30 times with the 30 IDs in my vector?
Additional Info: I have a complex function that retrieves data from several different sources, manipulates it, writes it to a different table, and then emails me when it's done. It has several arguments that are hard-coded, and an ID argument that I manually input each time I want to run it.
I'm sorry that I can't give a lot of specifics, but this is an extremely simplified version of my setup
#Manually Entered Arguments
ID<-3402
Arg1<- "Jon_Doe"
Arg2<- "Jon_Doe#gmail.com"
#Run Function
RunFun <- function (ID, arg1, arg2) {...}
Now, I have 30 non-sequential IDs (all numerical) that I have imported from an Excel column using:
ID.Group<- scan()
I know that it is extremely inefficient to run each ID through the function one at a time, but the complexity of the function and technological limitations only allow for one to be run at a time.
I am just getting started with R, so I'm sorry if any of this didn't make sense. I have spent the last 5 hours trying to figure this out so any help would be greatly appreciated.
Thank you!
The Vectorize function is actually a wrapper to mapply and is often used when vectorization is not a natural outcome of the function body. If you wrote the function with values for the arg1 and arg2 like this:
RunFun <- function (ID, arg1="Jon_Doe", arg2="Jon_Doe#gmail.com") {...}
V.RunFun <- Vectorize(RunFun)
V.RunFun(ID.Group)
This is often used with integrate or outer which require that their arguments return a vector of equal length to input.
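If the goal is simply to run the function once per ID, a plain loop or lapply also does the job. The sketch below uses a hypothetical stand-in, run_fun, that only builds a string, and three invented IDs, since the real function and ID list are not shown in the question.

```r
# Hypothetical stand-in for the real function: it only returns a message.
run_fun <- function(ID, arg1 = "Jon_Doe", arg2 = "placeholder") {
  paste("processed", ID, "for", arg1)
}

# Invented IDs for illustration; in the question these come from scan().
id_group <- c(3402, 1187, 5920)

# Option 1: an explicit loop, one call per ID, in order.
results_loop <- vector("list", length(id_group))
for (i in seq_along(id_group)) {
  results_loop[[i]] <- run_fun(id_group[i])
}

# Option 2: lapply performs the same one-at-a-time calls.
results_apply <- lapply(id_group, run_fun)
```

Both options call the function sequentially, which matches the constraint that only one ID can be processed at a time.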

Subsetting within a function

I'm trying to subset a dataframe within a function using a mixture of fixed variables and some variables which are created within the function (I only know the variable names, but cannot vectorise them beforehand). Here is a simplified example:
a<-c(1,2,3,4)
b<-c(2,2,3,5)
c<-c(1,1,2,2)
D<-data.frame(a,b,c)
subbing<-function(Data,GroupVar,condition){
g=Data$c+3
h=Data$c+1
NewD<-data.frame(a,b,g,h)
subset(NewD,select=c(a,b,GroupVar),GroupVar%in%condition)
}
Keep in mind that in my application I cannot compute g and h outside of the function. Sometimes I'll want to make a selection according to the values of h (as above) and other times I'll want to use g. There's also the possibility I may want to use both, but even just being able to subset using 1 would be great.
subbing(D,GroupVar=h,condition=5)
This returns an error saying that the object h cannot be found. I've tried to amend subset using as.formula and all sorts of things but I've failed every single time.
Besides the ease of the function there is a further reason why I'd like to use subset.
In the function I'm actually working on I use subset twice. The first time it's the simple subset function. It's just been pointed out below that another blog explored how it's probably best to use the good old data[colnames()=="g",]. Thanks for the suggestion, I'll have a go.
There is however another issue. I also use subset (or rather a variation) in my function because I'm dealing with several complex design surveys (see package survey), so subset.survey.design allows you to get the right variance estimation for subgroups. If I selected my group using [] I would get the wrong s.e. for my parameters, so I guess this is quite an important issue.
Thank you
The error arises when the function evaluates GroupVar: R looks for an object called h in the calling environment, not for a column of that name in the data frame.
The best thing to do is refer to the column names in quotes in the subset function. But of course, then you'd have to sidestep the condition part:
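subbing <- function(Data, GroupVar, condition) {
  ....
  # GroupVar is a column name given as a string, e.g. "h"
  DF <- subset(Data, select=c("a","b", GroupVar))
  # return the filtered rows (ending on an assignment would return them invisibly)
  DF[DF[,3] %in% condition,]
}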
That will do the trick, although it can be annoying to have one data frame indexing inside another.
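A complete version of this idea might look like the following sketch. It passes the column name as a string and indexes with [[ ]] instead of subset, so it does not address the survey-design case mentioned in the question. The condition value 3 is chosen here so that some rows match.

```r
D <- data.frame(a = c(1, 2, 3, 4), b = c(2, 2, 3, 5), c = c(1, 1, 2, 2))

subbing <- function(Data, GroupVar, condition) {
  # Build the derived columns inside the function, as in the question.
  NewD <- data.frame(a = Data$a, b = Data$b, g = Data$c + 3, h = Data$c + 1)
  # GroupVar is a string, so [[ ]] looks it up as a column name.
  keep <- NewD[[GroupVar]] %in% condition
  NewD[keep, c("a", "b", GroupVar), drop = FALSE]
}

res_h <- subbing(D, GroupVar = "h", condition = 3)
res_g <- subbing(D, GroupVar = "g", condition = 4)
```

The same function handles either derived column because the choice is just a string argument.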

How to use a value that is specified in a function call as a "variable"

I am wondering if it is possible in R to use a value that is declared in a function call as a "variable" part of the function itself, similar to the functionality that is available in SAS IML.
Given something like this:
put.together <- function(suffix, numbers) {
new.suffix <<- as.data.frame(numbers)
return(new.suffix)
}
x <- c(seq(1000,1012, 1))
put.together(part.a, x)
new.part.a ##### does not exist!!
new.suffix ##### does exist
As it is written, the function returns a dataframe called new.suffix, as it should because that is what I'm asking it to do.
I would like to get a dataframe returned that is called new.part.a.
EDIT: Additional information was requested regarding the purpose of the analysis
The purpose of the question is to produce dataframes that will be sent to another function for analysis.
There exists a data bank where elements are organized into groups by number, and other people organize the groups into a meaningful set.
Each group has an id number. I use the information supplied by others to put the groups together as they are specified.
For example, I would be given a set of id numbers like: part-1 = 102263, 102338, 202236, 302342, 902273, 102337, 402233.
So, part-1 has seven groups, each group having several elements.
I use the id numbers in a merge so that only the groups of interest are extracted from the large data bank.
The following is what I have for one set:
### all.possible.elements.bank <- .csv file from large database ###
id.part.1 <- as.data.frame(c(102263, 102338, 202236, 302342, 902273, 102337, 402233))
bank.names <- c("bank.id")
colnames(id.part.1) <- bank.names
part.sort <- matrix(seq(1,nrow(id.part.1),1))
sort.part.1 <- cbind(id.part.1, part.sort)
final.part.1 <- as.data.frame(merge(sort.part.1, all.possible.elements.bank,
by="bank.id", all.x=TRUE))
The process above is repeated many, many times.
I know that I could do this for all of the collections that I would pull together, but I thought I would be able to wrap the selection process into a function. The only things that would change would be the part numbers (part-1, part-2, etc..) and the groups that are selected out.
It is possible using the assign function (and possibly deparse and substitute), but it is strongly discouraged to do things like this. Why can't you just return the data frame and call the function like:
new.part.a <- put.together(x)
Which is the generally better approach.
If you really want to change things in the global environment then you may want a macro; see the defmacro function in the gtools package and, most importantly, read the document in the references section on the help page.
This is rarely something you should want to do... assigning to things out of the function environment can get you into all sorts of trouble.
However, you can do it using assign:
put.together <- function(suffix, numbers) {
assign(paste('new',
deparse(substitute(suffix)),
sep='.'),
as.data.frame(numbers),
envir=parent.env(environment()))
}
put.together(part.a, 1:20)
But like Greg said, it's usually not necessary, and always dangerous if used incorrectly.
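One way to follow the "just return the data frame" advice for many parts at once is to keep the results in a named list rather than in separately named global variables. This is a sketch under assumptions: put_together is a simplified, hypothetical version of the function, and the split of id numbers into part.a and part.b is invented for illustration.

```r
# Simplified, hypothetical version: returns the data frame instead of
# assigning it into the global environment.
put_together <- function(numbers) {
  data.frame(bank.id = numbers, part.sort = seq_along(numbers))
}

# Invented grouping of id numbers, one element per part.
parts <- list(
  part.a = c(102263, 102338, 202236),
  part.b = c(302342, 902273)
)

# One data frame per part, addressable by name.
new_parts <- lapply(parts, put_together)
part_a_df <- new_parts$part.a
```

The names then live in the list, so new_parts[["part.a"]] plays the role that new.part.a would have played as a global variable.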

The Art of R Programming : Where else could I find the information?

I came across the editorial review of the book The Art of R Programming, and found this
The Art of R Programming takes you on a guided tour of software development with R, from basic types and data structures to advanced topics like closures, recursion, and anonymous functions
I immediately became fascinated by the idea of anonymous functions, something I had come across in Python in the form of lambda functions but could not make the connection in the R language.
I searched in the R manual and found this
Generally functions are assigned to symbols but they don't need to be. The value returned by the call to function is a function. If this is not given a name it is referred to as an anonymous function. Anonymous functions are most frequently used as arguments to other functions such as the apply family or outer.
These things for a not-very-long-time programmer like me are "quirky" in a very interesting sort of way.
Where can I find more of these for R (without having to buy a book) ?
Thank you for sharing your suggestions
Functions don't have names in R. Whether you happen to put a function into a variable or not is not a property of the function itself, so there are not two sorts of functions, anonymous and named. The best we can do is to agree to call a function which has never been assigned to a variable anonymous.
A function f can be regarded as a triple consisting of its formal arguments, its body and its environment accessible individually via formals(f), body(f) and environment(f). The name is not any part of that triple. See the function objects part of the language definition manual.
Note that if we want a function to call itself then we can use Recall to avoid knowing whether or not the function was assigned to a variable. The alternative is that the function body must know that the function has been assigned to a particular variable and what the name of that variable is. That is, if the function is assigned to variable f, say, then the body can refer to f in order to call itself. Recall is limited to self-calling functions. If we have two functions which mutually call each other then a counterpart to Recall does not exist -- each function must name the other which means that each function must have been assigned to a variable and each function body must know the variable name that the other function was assigned to.
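The two points above can be tried out directly. The sketch below inspects the three components of a small function and then uses Recall in a factorial, both through a renamed variable and in a function that was never assigned at all. The names add, fact and g are chosen only for this example.

```r
add <- function(x, y) x + y

arg_names <- names(formals(add))  # the formal argument names
add_body  <- body(add)            # the unevaluated expression x + y
add_env   <- environment(add)     # where free variables are looked up

# Recall calls "the current function", whatever variable holds it.
fact <- function(n) if (n <= 1) 1 else n * Recall(n - 1)

g <- fact
rm(fact)               # the original variable is gone
renamed_result <- g(5)  # still works, because the body never mentions "fact"

# The same recursion in a function that is never assigned to a variable.
anon_result <- (function(n) if (n <= 1) 1 else n * Recall(n - 1))(5)
```

Had the body called fact(n - 1) instead of Recall(n - 1), the call g(5) would fail after rm(fact).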
There's not a lot to say about anonymous functions in R. Unlike Python, where lambda functions require special syntax, in R an anonymous function is simply a function without a name.
For example:
function(x,y) { x+y }
whereas a normal, named, function would be
add <- function(x,y) { x+y }
Functions are first-class objects, so you can pass them (regardless of whether they're anonymous) as arguments to other functions. Examples of functions that take other functions as arguments include apply, lapply and sapply.
Get Patrick Burns' "The R Inferno" at his site
There are several good web sites with basic introductions to R usage.
I also like Zoonekynd's manual
Great answers about style so far. Here's an answer about a typical use of anonymous functions in R:
# Make some data up
my.list <- list()
for( i in seq(100) ) {
my.list[[i]] <- lm( runif(10) ~ runif(10) )
}
# Do something with the data
sapply( my.list, function(x) x$qr$rank )
We could have named the function, but for simple data extractions and so forth it's really handy not to have to.
