I'm trying to package some code I use for data analysis so that other workers can use it. Currently, I'm stuck trying to write a simple function that imports data from a specific file type generated by a datalogger and trims it for use by other functions. Here's the code:
import<-function(filename,type="campbell",nprobes){
if (filename==TRUE){
if (type=="campbell"){
message("File import type is from Campbell CR1000")
flux.data<<-read.table(filename,sep=",",header=T,skip=1)
flux.data<<-flux.data[,-c(1,2)];flux.data<<-flux.data[-c(1,2),]
if (nprobes=="missing"){
nprobes<-32
}
flux.data<<-flux.data[,c(1:nprobes)]
flux.data.names<<-colnames(flux.data) #Saves column names
}
}
}
Ideally, the result would be a dataframe/matrix flux.data and a concomittant vector/list of the preserved column headers flux.data.names. The code runs and the function executes without errors, but the outputs aren't preserved. I usually use <<- to get around the function enclosure but its not working in this case - any suggestions?
I think the real problem is that I don't quite understand how enclosures work, despite a lot of reading... should I be using environment to assign environments within the function?
User joran answered my question in the comments above:
The critical issue was just in how the function was written: the conditional at the start (if filename==TRUE) was intended to see if filename was specified, and instead was checking to see if it literally equaled TRUE. The result was the conditional never being met, and no function output. Here's what fixed it:
import<-function(filename,type="campbell",nprobes){
if (exists(filename){
if (type=="campbell"){
#etc....
Another cool thing he pointed out was that I didn't need the <<- operator to utilize the function output and instead could write return(flux.data). This is a much more flexible approach, and helped me understand function enclosures a lot better.
Related
I have a command with six lines that I want to use several times. Therfore, I want to assign a name to this command and use it as a procedure instead of writing the whole command lines over and over.
In this case it is a <-rbind() command, but the issue is also more general.
modelcoeff<-rbind(modelcoeff,c(as.character((summary(mymodel)$terms[[2]])[[3]]),
as.character((((((summary(mymodel)$terms[[2]])[[2]])[[3]])[[3]])[[2]])[[3]]),
summary(mymodel)$coefficients[2,1],
summary(mymodel)$coefficients[2,4],
summary(mymodel)$coefficients[2,2],
summary(mymodel)$r.squared*100))
I would like to call something like rbindmodelcoeff and execute these command lines. How can I achieve this?
I tried to write a function, but it didn't seem to be the right approach.
A literal wrapping of your code into a function:
rbindmodelcoeff <- function(modelcoeff, mymodel) {
rbind(modelcoeff,
c(as.character((summary(mymodel)$terms[[2]])[[3]]),
as.character((((((summary(mymodel)$terms[[2]])[[2]])[[3]])[[3]])[[2]])[[3]]),
summary(mymodel)$coefficients[2,1],
summary(mymodel)$coefficients[2,4],
summary(mymodel)$coefficients[2,2],
summary(mymodel)$r.squared*100))
}
However, there are a couple changes I recommend:
call summary(mymodel) once, then re-use the results
you are using as.character on some of the objects but not all within the enclosing c(.), so everything is being converted to a character; to see what I mean, try c(as.character(1), 2); we can use a list instead to preserve string-vs-number
rbindmodelcoeff <- function(modelcoeff, mymodel) {
summ <- summary(mymodel)
rbind(modelcoeff,
list(as.character((summ$terms[[2]])[[3]]),
as.character((((((summ$terms[[2]])[[2]])[[3]])[[3]])[[2]])[[3]]),
summ$coefficients[2,1],
summ$coefficients[2,4],
summ$coefficients[2,2],
summ$r.squared*100))
}
But there are still some problems with this. I can't get it to work at the moment since I don't know the model parameters you're using, so as.character((summ$terms[[2]])[[3]]) for me will fail. With that, I'm always hesitant to hard-code so many brackets without a firm understanding of what is being used. It's out of scope for this question (which is being converting your basic code into a function), but you might want to find out how to generalize that portion a bit.
I was going through swirl() again as a refresher, and I've noticed that the author of swirl says the command ?matrix is the correct form to calling for a help screen. But, when I run ?matrix(), it still works? Is there a difference between having and not having a pair of parenthesis?
It's not specific to the swirl environment (about which I was entirely unaware until 5 minutes ago) That is standard for R. The help page for the ? shortcut says:
Arguments
topic
Usually, a name or character string specifying the topic for which help is sought.
Alternatively, a function call to ask for documentation on a corresponding S4 method: see the section on S4 method documentation. The calls pkg::topic and pkg:::topic are treated specially, and look for help on topic in package pkg.
It something like the second option that is being invoked with the command:
?matrix()
Since ?? is actually a different shortcut one needs to use this code to bring up that page, just as one needs to use quoted strings for help with for, if, next or any of the other reserved words in R:
?'?' # See ?Reserved
This is not based on a "fuzzy logic" search in hte help system. Using help instead of ? gets a different response:
> help("str()")
No documentation for βstr()β in specified packages and libraries:
you could try β??str()β
You can see the full code for the ? function by typing ? at the command line, but I am just showing how it starts the language level processing of the expressions given to it:
`?`
function (e1, e2)
{
if (missing(e2)) {
type <- NULL
topicExpr <- substitute(e1)
}
#further output omitted
By running matrix and in general any_function you get the source code of it.
I'm working on a non-linear optimization, with the constrOptim.nl from the alabama package. However, my problem is more related to passing arguments (and the dot-dot-dot (ellipis/"...") and maybe do.call)- so I give first a general example and later refer to the constrOptim.nl function.
Suppose, I have the following functions - from which I only can edit the second and third but not the first.
first<-function (abc, second, third, ...){
second(abc,...)
third(abc,...)
}
second<- function(abc, ttt='nothing special'){
print(abc)
print(ttt)
}
third<- function(abc, zzz="default"){
print(abc)
print(zzz)
}
The output I want is the same I would get when I just run
second("test", ttt='something special')
third("test", zzz="non-default")
This is
"test"
"something special"
"test"
"non-default"
However, the code below doesn't work to do this.
first("test",second=second, third=third, ttt='something special',zzz="non-default")
How can I change the call or the second and third function to make it work?
http://www.r-bloggers.com/r-three-dots-ellipsis/
here I found some advice that do.call could help me but at the moment I'm not capable of understanding how it should work.
I cannot change the first function since this is the constrOptim.nl in my particular problem - and it is designed to be capable of passing more arguments to different functions. However, I can change the second and third function - as they are the restrictions and the function that I want to minimize. Obviously I can also change the call of the function.
So to be more particular, here is my specific problem:
I perform a maximum likelihood estimation with non-linear restrictions:
minimize <- function(Param,VARresiduals){
#Blahblah
for (index in 1:nrow(VARreisduals)){
#Likelihood Blahbla
}
return(LogL)
}
heq<-function(Param,W){
B<-Param[1:16]
restriction[1]<-Lrestriction%*%(diag(4)%x%(solve(W))%*%as.vector(B))
restriction[2:6]<-#BlablaMoreRestrictions
return(restriction)
}
Now I call the constrOptim.nl...
constrOptim.nl(par=rnorm(20), fn=minimize,hin=NULL heq=heq,VARresiduals,W)
...but get the same error, as I receive when I call the first function above - something like: "Error in second(abc, ...) : unused argument (zzz = "non-default")".
How can I change minimize and heq or the call? :) Thanks in Advance
Update after the post got marked as a duplicate:
The answer to the related post changes the first function in my example - as it implements a do.call there, that calls the other functions. However, I cannot change the first function in my example as I want to keep the constrOptim.nl working a variety of different functions. Is there another way?
The solution I came up with is not very elegant but it works.
second_2<- function(abc, extras){
a<-extras[[1]]
print(abc)
print(a)
}
third_2<- function(abc, extras){
a<-extras[[2]]
print(abc)
print(a)
}
extras<-list()
extras[[1]]<-'something special'
extras[[2]]<-"non-default"
extras
first("test",second=second_2, third=third_2, extras)
surprisingly also the following code works, but with a slightly different outcome
first("test",second=second, third=third, extras)
after all, setting default values is now a little clumsy but not infeasible.
For small function, it is trivial to just write conditional statement based on the argument value. For example, I have a function that extracts variable label from an ex-STATA dataframe. There are two options for output-type, environment and df.
f_extract_stata_label <- function(df, output="environment") {
if (output=="env") {
lab_env <- new.env()
for (i in seq_along(names(df))) {
lab_env[[names(df)[i]]] <- attr(df, "var.labels")[i]
}
return(lab_env)
} else if (output=="df") {
lab_df <- data.frame(var.name = names(d_tmp),
var.label = attr(d_tmp, "var.labels"))
return(lab_df)
}
}
However, I suspect that this is not good R idiom. First, how the function depends on output is not clear -- the reader has to read half way through the code to find out. Second, adding options to output in the future makes the function very hard to read.
So how should I rewrite this function?
R uses this kind of pattern in its core stats libraries where "label" strings make sense. These are functions where R's dispatch system is not that useful. That said, what you want is still dispatch-like.
You could refactor it to use a switch that calls a function dedicated to a specific output type. Two things happen then. First, the extra function call makes it clear what context you're in when using the traceback. Second, it makes the functions smaller and easier to read.
I would question whether you really want to use a dispatch function though, and why separate direct functions are not appropriate.
Folks -
I'm going to keep my code here brief, as I think to those more familiar with R, it will be obvious. I am trying to use a function (not my own) that requires I feed it a list of named lists of parameters. I am having trouble naming the lists via a function I wrote to create each list element. Here is my function:
# for invoking grts
stratumdesign<- function(ns, points, oversamp) {
stratumname<-as.character(ns)
print("from function")
print(stratumname)
designlist<-list(ns=c(panel=points, seltype="Equal", over=oversamp))
return(designlist)
}
.. I have tried both having the function call have ns be the integer it is in the originating code, or be passed as a character. Neither work. What I'm illustrating here to myself w/in the function is that ns gets properly passed to the function, but the resulting list returned is always named "$ns" when I want it to be the value passed AS ns! What on Earth am I doing wrong, here?
Since this deserves an actual answer, not just a comment...
Try something more like this:
stratumdesign<- function(ns, points, oversamp) {
print("from function")
print(stratumname)
designlist<-list(c(panel=points, seltype="Equal", over=oversamp))
names(designlist) <- as.character(ns)
return(designlist)
}