User defined function export multiple data frames into global environment - r

I have created a function which computes statistics on data from various patients. As well as outputting plots, it generates data frames containing summary statistics for each patient.
If I copy and run the body of the function within R, the outputs are available to me. However, I am now calling the function from a separate R script, and the data frames are no longer available.
Is there any way to correct this?
For example,
test=function(a){
A=a
B=2*a
C=3*a
D=4*a
DF=data.frame(A,B,C,D)
}
a=c(1,2,3,4)
test(a)
This does not return DF, yet if I were to type:
a=c(1,2,3,4)
A=a
B=2*a
C=3*a
D=4*a
DF=data.frame(A,B,C,D)
Then clearly DF is returned. Is there a simple way to fix this so that DF becomes available from the test function?

Try:
test=function(a){
A=a
B=2*a
C=3*a
D=4*a
DF=data.frame(A,B,C,D)
}
a=c(1,2,3,4)
df<-test(a)
print(df)
By assigning the function's returned value to a new variable it is now accessible in the global space.
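Since the question is about several data frames, one possible extension of this idea is to return them together in a named list and pick them apart at the call site. This is only a minimal sketch; the names summarise_patient, stats, and scaled are made up for illustration and are not from the original post.

```r
# Hypothetical helper: builds two data frames and returns both in a named list.
summarise_patient <- function(a) {
  stats  <- data.frame(A = a, B = 2 * a)
  scaled <- data.frame(C = 3 * a, D = 4 * a)
  list(stats = stats, scaled = scaled)   # last expression is the return value
}

out <- summarise_patient(c(1, 2, 3, 4))
out$stats    # first data frame
out$scaled   # second data frame
```

Returning a list keeps the function free of side effects, and the caller decides what the results are called.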

If you want to assign an object from within a function to the global environment for easy retrieval, then your operators are "<<-" or "->>". For more info see:
?assignOps
i.e.
test <- function(a) {
A=a
B=2*a
C=3*a
D=4*a
DF <<- data.frame(A,B,C,D)
}
# trial your dummy data
a=c(1,2,3,4)
test(a)
DF
Hey presto ... it works! Writing return(DF) within the function will not, on its own, deliver your data frame to the global environment; the caller still has to assign the returned value to a name.
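If several data frames need to land in the workspace at once, a possible alternative to repeating <<- for each one is to return a named list and copy its elements over with base R's list2env(). This is a sketch under that assumption; make_frames, DF1, and DF2 are illustrative names only.

```r
# Hypothetical function returning two data frames in a named list.
make_frames <- function(a) {
  list(DF1 = data.frame(A = a, B = 2 * a),
       DF2 = data.frame(C = 3 * a, D = 4 * a))
}

# Copy each list element into the global environment under its own name.
# invisible() only suppresses printing of the environment that list2env() returns.
invisible(list2env(make_frames(c(1, 2, 3, 4)), envir = globalenv()))

exists("DF1")  # the data frames are now ordinary global objects
exists("DF2")
```

The function itself stays side-effect free here; the export to the global environment happens in one visible place in the calling script.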

Related

get() not working for column in a data frame in a list in R (phew)

I have a list of data frames. I want to use lapply on a specific column for each of those data frames, but I keep throwing errors when I tried methods from similar answers:
The setup is something like this:
a <- list(*a series of data frames that each have a column named DIM*)
dim_loc <- lapply(1:length(a), function(x){paste0("a[[", x, "]]$DIM")})
Eventually, I'll want to write something like results <- lapply(dim_loc, *some function on the DIMs*)
However, when I try get(dim_loc[[1]]), say, I get an error: Error in get(dim_loc[[1]]) : object 'a[[1]]$DIM' not found
But I can return values from function(a[[1]]$DIM) all day long. It's there.
I've tried working around this by using as.name() in the dim_loc assignment, but that doesn't seem to do the trick either.
I'm curious 1. what's up with get(), and 2. if there's a better solution. I'm constraining myself to the apply family of functions because I want to try to get out of the for-loop habit, and this name-as-list method seems to be preferred based on something like R- how to dynamically name data frames?, but I'd be interested in other, more elegant solutions, too.
I'd say that if you want to modify an object in place you are better off using a for loop, since lapply would require the <<- assignment operator (<- doesn't work inside lapply). Like so:
set.seed(1)
aList <- list(cars = mtcars, iris = iris)
for(i in seq_along(aList)){
aList[[i]][["newcol"]] <- runif(nrow(aList[[i]]))
}
As opposed to...
invisible(
lapply(seq_along(aList), function(x){
aList[[x]][["newcol"]] <<- runif(nrow(aList[[x]]))
})
)
You have to use invisible(), otherwise lapply would print its output on the console. The <<- assigns the vector runif(...) to the newly created column.
If you want to produce another set of data.frames using lapply then you do:
lapply(seq_along(aList), function(x){
aList[[x]][["newcol"]] <- runif(nrow(aList[[x]]))
return(aList[[x]])
})
Also, may I suggest the use of seq_along(list) in lapply and for loops as opposed to 1:length(list) since it avoids unexpected behavior such as:
# no length list
seq_along(list()) # prints integer(0)
1:length(list()) # prints 1 0.
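On the get() part of the question: get() looks up an object by its name, and "a[[1]]$DIM" is an expression rather than a name, which is why it is reported as not found. A minimal sketch of extracting the column directly instead, assuming a list of data frames that each have a DIM column (the toy data and the use of mean are invented for illustration):

```r
# Toy stand-in for the list of data frames described in the question.
a <- list(data.frame(DIM = 1:3, other = 4:6),
          data.frame(DIM = 7:9, other = 10:12))

# Pull the DIM column out of every data frame in the list.
dims <- lapply(a, function(df) df$DIM)

# Apply some function to each DIM column (mean is only a placeholder).
results <- lapply(dims, mean)
```

Iterating over the list itself, rather than over constructed strings, avoids get() altogether.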

How do I assign the argument of one of my functions as a variable name in R?

I wrote the following function:
rename.fun <- function(rai,pred){
assign('pred',rai)
return(pred) }
I called it with the arguments rename.fun(k2e,k2e_cat2) and it returns the object I want but it is named pred.
The point of this function is to assign the object I define as rai to the object I define as pred. So rename k2e to k2e_cat2.
I am new to R but I am a SAS programmer. This is a very simple task with the SAS macro processor, but I can't seem to figure it out in R.
EDIT:
In SAS I would do the following:
%macro rename_fun(rai=) ;
data output (rename=(&rai.=&rai._cat2));
set input;
run;
%mend;
Essentially, I want to add the suffix _cat2 to a bunch of variables, but they need to be in a function call. I know this seems odd but it's for a specific project at work. I am new to R so I apologize if this seems silly.
Since you say that you want to rename several columns in a data.frame, you could simply do this by using a function that takes a data.frame and a vector of column names to rename:
add_suffix_cat2 <- function(df, vars){
names(df)[match(vars, names(df))] <- paste0(vars, "_cat2")
return(df)
}
Then you can call the function like:
mydf <- mtcars
res <- add_suffix_cat2(mydf, c("hp","mpg"))
If you wanted to make the suffix customizable, that's simple enough to do by adding another parameter to the function.
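A minimal sketch of that customizable variant. The name add_suffix, the suffix parameter, and the early check for missing columns are assumptions added for illustration, not part of the original answer.

```r
# Same idea as add_suffix_cat2, with the suffix passed in (defaulting to "_cat2").
add_suffix <- function(df, vars, suffix = "_cat2") {
  idx <- match(vars, names(df))
  stopifnot(!anyNA(idx))            # fail early if a requested column is missing
  names(df)[idx] <- paste0(vars, suffix)
  df
}

res <- add_suffix(mtcars, c("hp", "mpg"), suffix = "_scaled")
names(res)[1:4]
```

Calling add_suffix(mtcars, c("hp", "mpg")) without the third argument would behave like the fixed-suffix version.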

How to write a testthat unit test for a function that returns a data frame

I am writing a script that ultimately returns a data frame. My question is around if there are any good practices on how to use a unit test package to make sure that the data frame that is returned is correct. (I'm a beginning R programmer, plus new to the concept of unit testing)
My script effectively looks like the following:
# initialize data frame
df.out <- data.frame(...)
# function set
function1 <- function(x) {...}
function2 <- function(x) {...}
# do something to this data frame
df.out$new.column <- function1(df.out)
# do something else
df.out$other.new.column <- function2(df.out)
# etc ....
... and I ultimately end up with a data frame with many new columns. However, what is the best approach to test that the data frame that is produced is what is anticipated, using unit tests?
So far I have created unit tests that check the results of each function, but I want to make sure that running all of these together produces what is intended. I've looked at Hadley Wickham's page on testing but can't see anything obvious regarding what to do when returning data frames.
My thoughts to date are:
Create an expected data frame by hand
Check that the output equals this data frame, using expect_that or similar
Any thoughts / pointers on where to look for guidance? My Google-fu has let me down considerably on this one to date.
Your intuition seems correct. Construct a data.frame manually based on the expected output of the function and then compare that against the function's output.
library(testthat)
# manually created data
dat <- iris[1:5, c("Species", "Sepal.Length")]
# function
myfun <- function(row, col, data) {
data[row, col]
}
# result of applying function
outdat <- myfun(1:5, c("Species", "Sepal.Length"), iris)
# two versions of the same test
expect_true(identical(dat, outdat))
expect_identical(dat, outdat)
If your data.frame may not be identical, you could also run tests in parts of the data.frame, including:
dim(outdat), to check if the size is correct
attributes(outdat) or attributes of columns
sapply(outdat, class), to check variable classes
summary statistics for variables, if applicable
and so forth
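The partial checks listed above could be sketched with testthat roughly as follows. This assumes the same iris-based outdat as in the example and that the testthat package is installed; the test description is made up.

```r
library(testthat)

# Same output as in the example above, rebuilt so the sketch is self-contained.
outdat <- iris[1:5, c("Species", "Sepal.Length")]

test_that("output has the expected shape and column types", {
  expect_equal(dim(outdat), c(5L, 2L))                 # size
  expect_named(outdat, c("Species", "Sepal.Length"))   # column names
  expect_s3_class(outdat$Species, "factor")            # variable classes
  expect_type(outdat$Sepal.Length, "double")
})
```

Checks like these are looser than expect_identical, which can be useful when row names or attributes legitimately differ.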
If you would like to test this at runtime, you should check out the excellent ensurer package, see here. At the bottom of the page you can see how to construct a template that you can test your dataframe against, you can make it as detailed and specific as you like.
I'm just using something like this
d1 <- iris
d2 <- iris
expect_that(d1, equals(d2)) # passes
d3 <- iris
d3[141,3] <- 5
expect_that(d1, equals(d3)) # fails

Variables created within a function not getting stored in data frame

I am writing a function to create some predicted variables within an existing data set that I am using to run some ML models. My function looks like this:
doall <- function(x1, x2){
J48 <- J48(ML, data=df1)
#summary(J48)
X1 <- predict(J48, df1, type="class")
X2 <- predict(J48, df2, type="class")
#return(X1)
}
doall(df1$DT_predict, df2$DT_predict1)
J48 is a decision tree model (via RWeka). The code works (doall(df1$DT_predict1, df2$DT_predict1)) properly, I believe, because when I include the return function, it returns the values of X1. However, the predicted variables are not getting generated/stored in the data frames (df1 and df2). Ideally, I would like to have the dataframe names within the function, but that's the next step.
Can someone show how I can store the variables X1 and X2 within data frames df1 and df2 respectively?
Ideally your question would have a bit more information about what your data frames look like, what X1 and X2 look like, and where your data frames are stored. For my answer I am assuming your data frames are stored in the global environment, and you want to modify them through a function.
This question has to do with scoping. For an in-depth description of scoping check out this article http://adv-r.had.co.nz/Functions.html#lexical-scoping
First, by assigning your variables within a function you are assigning them in a local environment. This means that the variables you are assigning do not carry over into the global environment (what you see when you type ls()).
I believe you want to change a 'global variable' from within a function. This is done with the <<- operator, for instance:
a <- 2
print(a)
returns 2
change_a<-function(x){
x<-x*4
}
change_a(a)
print(a)
still returns 2
while
change_a<-function(x){
# <<- must target the global name a; x<<-x*4 would create a global x and leave a unchanged
a<<-x*4
}
change_a(a)
print(a)
would return 8
I think you want to use the <<- operator instead of <- to accomplish what you want.
On a related note, it is not generally considered best practice to assign and change global variables from within a function.
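In line with that note, one way to avoid <<- is to have the function return the modified data frame and reassign it at the call site. The sketch below is hypothetical: lm() on mtcars stands in for the J48 model so that it runs without RWeka, and add_predictions and predicted are invented names.

```r
# Hypothetical stand-in: lm() replaces the J48 model so the sketch needs no RWeka.
add_predictions <- function(train, new_data) {
  fit <- lm(mpg ~ wt, data = train)
  new_data$predicted <- predict(fit, newdata = new_data)
  new_data                      # return the modified copy
}

df1 <- mtcars[1:20, ]
df2 <- mtcars[21:32, ]

# Reassign at the call site so the new column is kept.
df1 <- add_predictions(df1, df1)
df2 <- add_predictions(df1, df2)
```

Because R copies function arguments on modification, the column only persists if the returned data frame is assigned back to df1 or df2.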

Custom function in R

I have created some simple functions in R that take data and return variables. I'd appreciate instruction on taking this to another level, using the code below as illustration. There's a data frame 'df' which contains variables DOFW and CDFME (there are others, but two will suffice for this example). I'd like to pass the name of the variable to the function and have it build the barplot. Basically I want to create a function that performs the actions from z<- through abline, and call that function for each variable. Thank you.
df<-data.frame(y.train,x.train)
z<-ddply(df,~DOFW,summarise,median=median(PCTCHG))
row.names(z)<-z$DOFW
z$DOFW<-NULL
barplot(t(as.matrix(z)),main="Median Return by DOFW",xlab="DOFW")
abline(h=median(y.train))
z<-ddply(df,~CDFME,summarise,median=median(PCTCHG))
row.names(z)<-z$CDFME
z$CDFME<-NULL
barplot(t(as.matrix(z)),main="Median Return by CDFME",xlab="CDFME")
abline(h=median(y.train))
Actually you don't really need non-standard evaluation since ddply allows you to pass a string for the variable name rather than a formula. Here's how you could do that.
#sample data
dd<-data.frame(a=rep(1:5, 2), b=2:11, c=runif(10))
#define function
library(plyr)
myplot<-function(coln) {
z<-ddply(dd, coln, summarise, median=median(c))
barplot(z[,2], main=paste("Median Return By", coln))
abline(h=median(dd$c))
}
#make plots
myplot("a")
myplot("b")
and the easiest way to get the names of the columns as a character vector is names(dd).
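If plyr is not available, the same pattern can be sketched in base R with tapply(), looping over a character vector of column names. This is an alternative under stated assumptions, not the answer's method: median_by and its value argument are hypothetical names, and dd is the same kind of sample data as above.

```r
# Sample data shaped like the answer's example.
dd <- data.frame(a = rep(1:5, 2), b = 2:11, c = runif(10))

# Base-R variant of the plotting helper: group medians via tapply().
median_by <- function(data, coln, value = "c") {
  med <- tapply(data[[value]], data[[coln]], median)
  barplot(med, main = paste("Median Return By", coln), xlab = coln)
  abline(h = median(data[[value]]))
  invisible(med)                 # hand the medians back for inspection
}

# Draw one plot per grouping column.
for (coln in c("a", "b")) median_by(dd, coln)
```

Selecting the column with data[[coln]] is what lets the variable name be passed as an ordinary string.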
