Unexpected R behavior with function parameter

I'm an R newbie. I wrote a function that takes 3 parameters. In my code I use one of the parameters to help me read files from a directory. There are 100 files in the directory. The code works fine when I pass all the function parameters and specify the files I want to read.
functionX(var1, var2, id) and functionX(var1, var2, id = 1:100)
## Below is the first line of code for me that uses "id".
sub.file.names <- file.names[id] ### Get file names
The odd thing is that when a value for "id" is not passed to the function initially (or set with a 1:100 default), the code seems to read all the file names anyway. And it does so even though a value for "id" has never been established.
It's as if R treats the two definitions below the same when the caller omits "id", e.g. functionx("var1", "var2") with no id argument:
functionx(var1, var2, id)
functionx(var1, var2, id = 1:100)
Any pointers on why this is happening would be great to know. I feel the answer is obvious, but have not been able to figure it out.

Let me try to explain what is happening with a simple example. Consider the following function
foo = function(i){
LETTERS[i]
}
When you try foo(), you will notice that the function returns all 26 uppercase letters. Why does that happen? Well, everything that happens in R is a function call. So when you write LETTERS[i], you are essentially calling the function `[`. So, the function call is
`[`(LETTERS, i)
Since i is missing, this call is executed as `[`(LETTERS) (essentially LETTERS[]), which returns all elements of the vector. Note that this occurs because the `[` function allows the i argument to be missing when it is called. Check ?`[`
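To make the point above concrete, here is a small sketch showing that indexing is an ordinary call to `[` and that an empty index returns the whole vector. Only foo comes from the answer; the other variable names are just for illustration.

```r
# Indexing with [ is a function call under the hood.
first_three <- `[`(LETTERS, 1:3)   # same as LETTERS[1:3]
all_letters <- LETTERS[]           # empty index: every element is returned

# foo forwards its (possibly missing) argument i straight to `[`.
foo <- function(i) {
  LETTERS[i]
}
no_arg <- foo()                    # i is missing, so this behaves like LETTERS[]
```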
If you want the function to act differently when id is missing, either check for missing(id), or explicitly set it to NULL as default. So, if you do
foo2 = function(i = NULL){
LETTERS[i]
}
foo2() will return a zero length character vector.
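Both options mentioned above can be sketched as follows. pick_files, its error message and the file names are hypothetical, made up for this illustration; foo2 is the function from the answer.

```r
# Hypothetical helper: refuse to run when id is not supplied.
pick_files <- function(file.names, id) {
  if (missing(id)) {
    stop("argument 'id' must be supplied")
  }
  file.names[id]
}

files <- c("a.csv", "b.csv", "c.csv")   # made-up file names
picked <- pick_files(files, 2:3)        # selects the second and third name

# With a NULL default, an omitted index selects nothing instead of everything.
foo2 <- function(i = NULL) {
  LETTERS[i]
}
nothing <- foo2()                       # zero-length character vector
```

Which of the two to use depends on whether an omitted id should be an error or simply mean "no files".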

Related

code executes outside function, but not inside it

My problem is that some code works outside a function, but not inside it. In my example, the content of certain cells should be transferred from the input table to the output table. Because rows/cols may be removed or added, I don't access the cells by their index (e.g. input[3,4]), but by applying a condition (e.g. input[(which(input$code=="A1")),(which(colnames(input)=="kg"))]).
so here's a minimized version of my data:
input<-data.frame(animal=c("cat","dog","mouse","deer","lion"),
m=c(0.5,1,0.1,1.5,3),
kg=c(5,20,0.2,50,100),
code=c("A4","A5","A3","A1","A2"))
output<-data.frame(code=c("A1","A2","A3","A4","A5"),
kg=numeric(5))
execution outside the function, that works (the content of a cell of the input table should be copied to a suitable one in the output table):
row_out<-which(output$code=="A1")
col_out<-which(colnames(output)=="kg")
row_in<-which(input$code=="A1")
col_in<-which(colnames(input)=="kg")
output[row_out,col_out]<-input[row_in,col_in]
and the function, that contains the same code, which worked outside, except for the substitution of the quoted code expression for a function argument (codeexpression):
fun_transfer<-function(codeexpression){
row_out<-which(output$code==codeexpression)
col_out<-which(colnames(output)=="kg")
row_in<-which(input$code==codeexpression)
col_in<-which(colnames(input)=="kg")
output[row_out,col_out]<-input[row_in,col_in]
}
Problem: now the execution of
fun_transfer("A4")
does not lead to an error, nor to a result in the output table.
Why doesn't this function work or rather what does it do? Is there a problem with quotation marks?
any help would be appreciated
thanks,
Michel
In the best case, data enters a function as argument and leaves it as a return value.
Outside of a function
output[row_out,col_out] <- input[row_in,col_in]
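changes the existing data.frame. Inside a function, the same assignment only modifies a local copy of output, which is discarded when the function returns, so the global output stays untouched. You can (or better: should) not change a variable outside the function from within the function.
Just end your function with a return statement to return the changed data frame to the caller, and assign the result back to output.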
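Following the advice above, a minimal sketch of the function could look like this. Passing output and input in as arguments is my own variation (the original function reads them from the global environment); the data is the sample data from the question.

```r
input <- data.frame(animal = c("cat", "dog", "mouse", "deer", "lion"),
                    m = c(0.5, 1, 0.1, 1.5, 3),
                    kg = c(5, 20, 0.2, 50, 100),
                    code = c("A4", "A5", "A3", "A1", "A2"))
output <- data.frame(code = c("A1", "A2", "A3", "A4", "A5"),
                     kg = numeric(5))

# Data comes in as arguments and the modified copy goes out as the return value.
fun_transfer <- function(output, input, codeexpression) {
  row_out <- which(output$code == codeexpression)
  row_in <- which(input$code == codeexpression)
  output[row_out, "kg"] <- input[row_in, "kg"]
  output   # return the changed data frame to the caller
}

# The caller decides what to do with the result; here it replaces output.
output <- fun_transfer(output, input, "A4")
```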
Edit
It appears as if what you try to write is a lesser version of merge. If the following answers your question it will probably be more concise, faster and more idiomatic:
input<-data.frame(animal=c("cat","dog","mouse","deer","lion"),
m=c(0.5,1,0.1,1.5,3),
kg=c(5,20,0.2,50,100),
code=c("A4","A5","A3","A1","A2"))
output<-data.frame(code=c("A1","A2","A3","A4","A5"))
output <- merge(output, input[, c("code", "kg")], by = "code",
all.x = TRUE, all.y = FALSE)
print(output)

Why the code only works with numbers and not letters?

I have to use the code below but I don't completely understand how it works. Why won't it work if I change du.4 to du.f and then use f when calling the function? For some reason it only works with numbers and I do not understand why.
This is the error that it is giving in the case of du.f
Error in paste("Meth1=", nr, ".ps", sep = "") : object 'f' not found
du.4 <- function(u,v,a){(exp(a)*(-1+exp(a*v)))/(-exp(a)+exp(a+a*u)-exp(a*(u+v))+exp(a+a*v))}
plotmeth1 <- function(data1,data2,alpha,nr) {
psfile <-paste("Meth1=",nr,".ps",sep="")
diffmethod <-paste("du.",nr,sep="")
title=paste("Family",nr)
alphavalue <-paste("alpha=",round(alpha,digits=3),sep="")
#message=c("no message")
postscript(psfile)
data3<-sort(eval(call(diffmethod,data1,data2,alpha)))
diffdata <-data3[!is.na(data3)]
#if(length(data3)>length(diffdata))
#{message=paste("Family ",nr,"contains NA!")}
tq <-((1:length(diffdata))/(length(diffdata)+1))
plot(diffdata,tq,main=title,xlab="C1[F(x),G(y)]",ylab="U(0,1)",type="l")
legend(0.6,0.3,c(alphavalue))
abline(0,1)
#dev.off()
}
In R, a dot is used as just another character in identifiers. It is often used for clarity but doesn't have a formal function in defining the part after the dot as being in a name-space given by the part of the identifier before the dot. In something like du.f you can't refer to the function by f alone, even if your computation is inside of an environment named du. You can of course define a function named du.4 and then use 4 all by itself, but when you do so you are using the number 4 as just a number and not as a reference to the function. For example, if
du.4 <- function(u,v,a){(exp(a)*(-1+exp(a*v)))/(-exp(a)+exp(a+a*u)-exp(a*(u+v))+exp(a+a*v))}
Then du.4(1,2,3) evaluates to 21.08554 but attempting to use 4(1,2,3) throws the error
Error: attempt to apply non-function
In the case of your code, you are using paste to assemble the function name as a string, which is then passed to call and evaluated with eval. It makes sense to paste the literal number 4 onto the string 'du.' (since paste will convert 4 to the string '4') but it doesn't make sense to paste an undefined f onto 'du.'. It does, however, make sense to paste the literal string 'f' onto 'du.', so that the function call plotmeth1(data1, data2, alpha, 'f') will work even though plotmeth1(data1, data2, alpha, f) will fail.
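As a rough illustration of the mechanism described above, the sketch below builds a function name from a string and calls it. The body of du.f is a hypothetical stand-in, not a real family function from the question.

```r
# Hypothetical stand-in for a second family function.
du.f <- function(u, v, a) u + v + a

nr <- "f"                                   # a string, not a bare symbol f
diffmethod <- paste("du.", nr, sep = "")    # gives "du.f"

# Same mechanism as in plotmeth1: build a call from the name, then evaluate it.
via_call <- eval(call(diffmethod, 1, 2, 3))

# do.call also accepts a function name given as a string.
via_do_call <- do.call(diffmethod, list(1, 2, 3))
```

Writing nr <- f instead would fail at that assignment already, because no object named f exists.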
See this question for more about the use of the dot in R identifiers.

R: IF statement evaluating expression despite condition being FALSE?

I've got a large function in R and the users have the ability to not include/specify an object. If they DO, the code checks to make sure the names in that object match the names in another. If they DON'T, there's no need to do that checking. The code line is:
if(exists("grids")) if(!all(expvarnames %in% names(grids))) {stop("Not all expvar column names found as column names in grids")}
But I'm getting the following error:
Error in match(x, table, nomatch = 0L) : argument "grids" is missing, with no default
Well in this trial run, grids is SUPPOSED to be missing. If I try
if(exists("grids")) print("yay")
Then nothing prints, i.e. the absence of grids means the expression isn't evaluated, which is as I'd expect. So can anyone think why R seems to be evaluating the subsequent IF statement in the main example? Should I slap another set of curly brackets around the second one??
Thanks!
Edit: more problems. Removing "grids," from the functions list of variables means it works if there's no object called grids and you don't specify it in the call (i.e. function(x,grids=whatever)). And keeping "grids," IN the functions list of variables means it works if there IS an object called grids and you do specify it in the call.
Please see this: http://i.imgur.com/9mr1Lwi.png
using exists(grids) is out because exists wants "quotes" and without em everything fails. WITH them ("grids"), I need to decide whether to keep "grids," in the functions list. If I don't, but I specify it in the call (function(x,grids=whatever)) then I get unused argument fail. If I DO, but don't specify it in the call because grids doesn't exist and I don't want to use it, I get match error, grids missing no default.
How do I get around this? Maybe list it in the function variables list as grids="NULL", then rather than if(exists("grids")) do if(grids!="NULL")
I still don't know why the original match problem is happening though. Match is from the expvarnames/grids names checker, which is AFTER if(exists("grids")) which evaluates to FALSE. WAaaaaaaiiiiittttt..... If I specify grids in the function variables list, i.e. simply putting function(x,grids,etc){do stuff}, does that mean the function CREATES an object called grids, within its environment?
Man this is so f'd up....
testfun <- function(x,grids)
{if(exists("grids")) globalgrids<<-grids
print(x+1)}
testfun(1) # Error in testfun(1) : argument "grids" is missing, with no default
testfun <- function(x,grids)
{if(exists("grids")) a<<-c(1,2,3)
print(x+1)}
testfun(1) #2 (and globally assigns a)
So in the first example, the function seems to have created an object called "grids" because exists("grids") evaluates to true. But THEN, ON THE SAME LINE, when asked to do something with grids, it says it doesn't exist! Schroedinger's object?!
This is proven in example 2: grids evaluates true and a is globally assigned then the function does its thing. Madness. Complete madness. Does anyone know WHY this ridiculousness is going on? And is the best solution to use my grids="NULL" default in the functions variables list?
Thanks.
Reproducible example, if you want to but I've already done it for every permutation:
testfun <- function(x,grids)
{if(exists("grids")) if(!all(expvarnames %in% names(grids))) {stop("Not all expvar column names found as column names in grids")}
print(x+1)}
testfun(1)
testfun(x=1,grids=grids)
grids<-data.frame(c(1,2,3),c(1,2,3),c(1,2,3))
expvarnames <- c("a","b","c")
colnames(grids) <- c("a","b","c")
Solution
Adapting your example use:
testfun <- function(x,grids = NULL)
{
if(!is.null(grids)){
if(!all(expvarnames %in% names(grids))){
stop("Not all expvar column names found as column names in grids")
}
print(x+1)
}
}
Using this, testfun(1) prints nothing. Because grids defaults to NULL, the function can check whether an argument was supplied (is.null(grids)) and skips the rest of its body when it was not.
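If the intent is for the function to carry on when grids is omitted, as the original one-liner in the question did, a variant along these lines may be closer to what is wanted. Passing expvarnames as an argument with a default is an assumption made here to keep the sketch self-contained.

```r
testfun <- function(x, grids = NULL, expvarnames = c("a", "b", "c")) {
  # Only validate the column names when grids was actually supplied.
  if (!is.null(grids)) {
    if (!all(expvarnames %in% names(grids))) {
      stop("Not all expvar column names found as column names in grids")
    }
  }
  # The rest of the function runs in both cases.
  x + 1
}

without_grids <- testfun(1)
with_grids <- testfun(1, grids = data.frame(a = 1, b = 2, c = 3))
```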
The Reason the Problem Occurs
We go through each of the examples:
testfun <- function(x,grids)
{if(exists("grids")) globalgrids<<-grids
print(x+1)}
testfun(1) # Error in testfun(1) : argument "grids" is missing, with no default
Here we call the function testfun, giving only the x argument. testfun knows it needs two arguments, and so creates local variables x and grids. We have then given an argument to x and so it assigns the value to x. There is no argument to grids, however the variable has still been created, even though no value has been assigned to it. So grids exists, but has no value.
From this exists("grids") will be TRUE, but when we try to do globalgrids<<-grids we will get an error as grids has not been assigned a value, and so we can't assign anything to globalgrids.
testfun <- function(x,grids)
{if(exists("grids")) a<<-c(1,2,3)
print(x+1)}
testfun(1) #2 (and globally assigns a)
This, however, is fine. grids exists as in the previous case, but we never actually try to access its value, so the error from the first example is never triggered.
In the solution, we simply set a default value for grids, which means we always get something whenever we access the variable. Unlike in the previous cases, where nothing was stored there, we now get NULL.
The main point of this is that when you declare arguments in your function, they are created each time you use the function. They exist. However, if you don't assign them values in your function call then they will exist, but have no value. Then when you try and use them, their lack of values will throw an error.
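The behaviour described above can be checked directly. In this sketch probe and testfun2 are hypothetical names; missing() is the base R function for asking whether an argument was supplied, and it offers an alternative to the NULL default.

```r
# An omitted argument still has a binding in the function's environment,
# so exists() reports TRUE, while missing() reports that no value was given.
probe <- function(x, grids) {
  c(exists = exists("grids", inherits = FALSE),
    missing = missing(grids))
}
status <- probe(1)

# Alternative to the NULL default: test with missing() before touching grids.
testfun2 <- function(x, grids) {
  if (!missing(grids)) {
    message("grids supplied with ", ncol(grids), " columns")
  }
  x + 1
}
result <- testfun2(1)   # grids is never evaluated, so no error
```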
> a <- c(1,2,3,4)
> b <- c(2,4,6,8)
> if(exists("a")) if(!all(a %in% b)) {stop("Not all a in b")}
Error: Not all a in b
> rm(a)
> if(exists("a")) if(!all(a %in% b)) {stop("Not all a in b")}
>
When a does not exist, the expression does not evaluate, as expected. Before testing your first expression, make sure that grids does not exist by running rm(grids) in the console.
Richard Scriven's comment got me thinking: grids was an argument in my function but was optional, so maybe shouldn't be specified (like anything in "..." optional functions). I commented it out and it worked. Hooray, cheers everyone.

Passing arguments to functions, and variable scopes in R

I'm writing a simple function that takes two arguments (state, outcome). State is used to subset a dataframe later.
Having said that, part of the requirement is that state be a length 2 character vector. I need to write more code to ensure that the state that is passed conforms to this requirement.
So I wrote the following:
best <- function(state, outcome) {
outcome <- read.csv("outcome-of-care-measures.csv", colClasses = "character")
state <- vector(mode = "character", length = 2)
st.checkTbl <- outcome[8]
state
}
However, when I call the function and pass the arguments:
best("AXA") or best("FOO") or even best("TX") or best(AL)
All I get back is: "" ""
If I comment out the #state <- ... then it passes the argument just fine and it prints "FOO" or "AXA" or "TX", etc.
How can I ensure that the argument passed to the function is stored as a variable (state) in the function? Or, am I way overthinking this? Really I just wanted to test that what I am passing to the state argument can be printed for test purposes.
Sorry for the 101 lesson.
You would generally read your data outside of any function, like so:
outcome.data <- read.csv("outcome-of-care-measures.csv", colClasses = "character")
Otherwise, since a function has its own environment, all the variables defined inside of it will vanish upon its return, unless they themselves are returned by the function with return(...). Several objects can be returned by putting them in a list: return(list(item1=var1, item2=var2)).
Some functions, such as assign, have the envir parameter that can be set to .GlobalEnv to change this behavior. Altering an object can also be done inside a function using the <<- operator instead of <-, although this practice is generally recommended against.
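A minimal sketch of the return-a-list pattern mentioned above; make_stats and its contents are hypothetical and only serve as an illustration.

```r
# Variables created inside the function are local to it;
# whatever the caller needs has to be returned.
make_stats <- function(values) {
  total <- sum(values)
  average <- mean(values)
  list(total = total, average = average)   # several results in one list
}

stats <- make_stats(c(2, 4, 6))
stats$total     # access one returned item by name
stats$average
```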
As a side note, when using a function, you need to define clearly:
What are its inputs
What does it do
What does it return
It's not useful, for instance, to use outcome as a function parameter and then read the content of a csv file into a variable that is also named outcome. Your argument is then useless as it will be overwritten. The same happens with state: that's why you had to comment out the line defining your state variable inside the function to actually be able to use state as it was received by the function.
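The overwriting described above is easy to reproduce. In the sketch below both function names are hypothetical, and the check assumes that "length 2" in the question means a single two-character abbreviation such as "TX".

```r
# What the question's function effectively does: the argument is overwritten.
best_overwrites <- function(state) {
  state <- vector(mode = "character", length = 2)  # replaces the caller's value
  state
}

# Hypothetical alternative: validate the argument and leave its value alone.
best_checks <- function(state) {
  if (!is.character(state) || length(state) != 1 || nchar(state) != 2) {
    stop("state must be a single two-letter abbreviation")
  }
  state
}

lost <- best_overwrites("TX")   # two empty strings
kept <- best_checks("TX")       # the value that was passed in
```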
This surely won't answer all your questions, but hopefully it can help you clarify certain things. For the rest there are plenty of good tutorials to learn further on how to program in R and how/when to use functions. Best of luck and happy learning!

Referencing a function parameter in R

I'm working on a function and need to know how to reference the incoming parameters.
For example, in python or lots of other languages, you can reference the input parameters something like this:
sys.argv[1:].
How can I reference the name of a parameter in R?
The specific problem I'm trying to solve is I want to capture the string value of the incoming parameter, so I can paste it as a concatenation with a list of column_names I want to iterate through.
Here's the head of the function call, just so you can see the incoming parameter:
function(df_in)
So here's an example of the code I am writing and I want the string value of the dataframe_in, not the object that it references.
col_name <-paste(df_in,varnames[i],sep="$")
if df_in contained "my_df" and the current column_name is my_col, I'm trying to have col_name in the example above set to my_df$my_col.
I was thinking of using the get() function but am not quite sure how to apply it in this situation.
Thanks
Try something along these lines:
fn1 <- function(df_in){
  in_nam <- deparse(substitute(df_in))
  col_names <- paste(in_nam, names(df_in), sep="$")
  cat(col_names)
}
> dfrm <- data.frame(a=1:10, b=letters[1:10])
> fn1(dfrm)
#dfrm$a dfrm$b
You didn't say what varnames was supposed to be so I'm guessing you want the column names from the object. BTW, don't expect to be able to reference the column values with those character values. They are no longer language objects.
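Since the pasted strings are only labels, one way to still get at the column values is to index the data frame by column name with [[ and use the strings as names. This is a sketch under that assumption; col_labels is a hypothetical helper, not part of the answer above.

```r
col_labels <- function(df_in) {
  in_nam <- deparse(substitute(df_in))             # name the caller used
  labels <- paste(in_nam, names(df_in), sep = "$")
  # Fetch each column by name rather than by evaluating the label string.
  values <- lapply(names(df_in), function(nm) df_in[[nm]])
  names(values) <- labels
  values
}

dfrm <- data.frame(a = 1:3, b = letters[1:3])
res <- col_labels(dfrm)
names(res)        # the labels, built from the data frame's name
res[["dfrm$a"]]   # the values of column a
```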
