How to view results in a file using a function and *apply?

I like to pop out results in a window so that they're easier to see and find (e.g., they don't get lost as the console continues to scroll). One way to do this is to use sink() and file.show(). For example:
y <- rnorm(100); x <- rnorm(100); mod <- lm(y~x)
sink("tempSink", type="output")
summary(mod)
sink()
file.show("tempSink", delete.file=T, title="Model summary")
I commonly do this to examine model fits, as above, but also for a wide variety of other functions and objects, such as: summary(data.frame), anova(model1, model2), table(factor1, factor2). These are common, but other situations can arise as well. The point here is that both the nature of the function and the object can vary.
It is somewhat tedious to type out all of the above every time. I would like to write a simpler function that I can call, something like the following would be nice:
sinkShow <- function(obj, fun, title="output") {
  sink("tempSink", type="output")
  apply(obj, ?, fun)
  sink()
  file.show("tempSink", delete.file=T, title=title)
}
Clearly, this doesn't work. There are several issues. First, how would you write this so that it won't crash with the wrong type of object or function, without needing a chain of conditionals (i.e., if(is.list(obj)) { lapply... )? Second, I'm not sure how to handle the margin argument. Lastly, this doesn't work even when I try simple, contrived examples where I know everything is set appropriately, so there seems to be something fundamentally wrong.
Does anyone know how such a situation can be handled simply and easily? I'm not new to R, but I've never been formally taught it; I've picked up tricks in an ad-hoc manner, i.e., I'm not a very sophisticated R programmer. Thanks.

Rather than use apply I think you want do.call. Make sure to wrap it in a print since it is inside of a function.
Here is one possible implementation:
sinkShow <- function(obj, fun, title='Output', ...) {
  file <- tempfile()
  args <- c(list(obj), list(...))
  capture.output(do.call(fun, args), file=file)
  file.show(file, delete.file=TRUE, title=title)
}
Though it should probably be renamed since I skipped using sink as well. I might modify this a bit and put it into the TeachingDemos package.
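For instance, with the mod object defined in the question, a call might look like this (a quick usage sketch, not part of the original answer; any extra arguments after fun are passed through ...):
sinkShow(mod, summary, title="Model summary")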

Use page. Here's some sample models:
d <- data.frame(y1=rnorm(100), y2=rnorm(100), x=rnorm(100))
mod <- lm(y1~x, data=d)
mods <- list(mod1=lm(y1~x, data=d), mod2=lm(y2~x, data=d))
And here's how you'd use page:
page(summary(mod), method="print")
page(lapply(mods, summary), method="print")
For my original post, which had code that turned out to be a near-reimplementation of page, see the edit history.

Related

R not remembering objects written within functions

I'm struggling to clearly explain this problem.
Essentially, something seems to have happened within the R environment: none of the code I write inside my functions is working, and no data is being saved. If I type a command directly into the console it works (e.g. Monkey <- 0), but if I put the same command inside a function, the object isn't stored when I run the function.
It could be I'm missing a glaring error in the code, but I noticed the problem when I accidentally clicked on the debugger and tried to exit out of the Browse[1]> prompt that appeared.
Any ideas? This is driving me nuts.
corr <- function(directory, threshold=0) {
  directory <- paste(getwd(), "/", directory, "/", sep="")
  file.list <- list.files(directory)
  number <- 1:length(file.list)
  monkey <- c()
  for (i in number) {
    x <- paste(directory, file.list[i], sep="")
    y <- read.csv(x)
    t <- sum(complete.cases(y))
    if (t >= threshold) {
      correl <- cor(y$sulfate, y$nitrate, use='pairwise.complete.obs')
      monkey <- append(monkey, correl)
    }
  }
  #correl <- cor(newdata$sulfate, newdata$nitrate, use='pairwise.complete.obs')
  #summary(correl)
}
corr('specdata', 150)
monkey
It's a scoping issue. Functions create their own environment, which is separate from the global environment.
Using <- inside a function assigns in that local environment. To save an object to the global environment, use <<-.
Here's some information on R environments.
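A tiny illustration of the difference (a sketch; it assumes no object called x already exists in your workspace):
f <- function() { x <- 1 }   # assigns x only inside f's own environment
g <- function() { x <<- 1 }  # searches enclosing environments; here it ends up in the global environment
f()
exists("x")  # FALSE
g()
exists("x")  # TRUE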
I suggest you give a look at some tutorial on using functions in R.
Briefly (and sorry for my horrible explanation): objects that you define within a function exist ONLY inside that function, unless you explicitly export them, for example with the return() function.
browser() is indeed used for debugging; it keeps you inside the function and allows you to access objects created inside the function.
In addition, to increase the chance of getting useful answers, I suggest you post a self-contained, working piece of code that lets others reproduce the issue quickly. Here you are reading files that we have no access to.
It seems to me you have to store the output yourself when you run your script:
corr_out <- corr('specdata', 150)
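For that to work, the function itself has to return the vector it builds; here is a minimal sketch of the question's function with a return value (assuming the same directory layout and column names as in the question):
corr <- function(directory, threshold=0) {
  directory <- paste(getwd(), "/", directory, "/", sep="")
  file.list <- list.files(directory)
  monkey <- c()
  for (i in seq_along(file.list)) {
    y <- read.csv(paste(directory, file.list[i], sep=""))
    if (sum(complete.cases(y)) >= threshold) {
      monkey <- append(monkey, cor(y$sulfate, y$nitrate, use='pairwise.complete.obs'))
    }
  }
  monkey  # the last evaluated expression is the function's return value
}
With this version, corr_out <- corr('specdata', 150) actually holds the vector of correlations.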

Calling files with c(x:y)

I have a large number of files (GBs in size). I want to run a for loop in which I read in some files, do some processing that creates some files, bind them together, and save the result.
AA <- c(1,6)
BB <- c(5,10)
for(i in length(AA)){
  listofnames <- list.files(pattern="*eng")
  listofnames <- listofnames[c(paste(AA[i],BB[i],sep=":"))]
  listoffiles <- lapply(listofnames, readRDS)
}
But listofnames contains NA. What am I doing wrong?
It took me a while looking at your code to realize that you were actually trying to construct a character representation of the expression 1:5 that was supposed to index a vector by position. This is very wrong; you just can't paste together arbitrary R commands/expressions and expect to drop them into your code wherever you like. (Technically, there are tools that do that sort of thing, but they are discouraged.)
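To see why the NA appears, compare positional indexing with indexing by a pasted-together string (a minimal made-up example):
x <- letters[1:10]
x[1:5]                     # positions 1 to 5: "a" "b" "c" "d" "e"
x[paste(1, 5, sep=":")]    # the string "1:5" is treated as a name; x has no names, so NA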
Probably you're looking to do something closer to:
listofnames <- list.files(pattern="*eng")
ind <- rep(1:5,each = 5,length.out = length(listofnames))
listofnames_split <- split(listofnames,ind)
for (i in seq_along(listofnames_split)){
  my_data <- lapply(listofnames_split[[i]], readRDS)
  # Do processing here
  # ...
  rm(my_data) # Assuming memory really is a problem
}
But I'm just sketching out hypothetical code here, I can't really match it to your exact situation since your example isn't really fully fleshed out.

Why does R store the loop variable/index/dummy in memory?

I've noticed that R keeps the index from for loops stored in the global environment, e.g.:
for (ii in 1:5){ }
print(ii)
# [1] 5
Is it common for people to have any need for this index after running the loop?
I never use it, and am forced to remember to add rm(ii) after every loop I run (first, because I'm anal about keeping my namespace clean and second, for memory, because I sometimes loop over lists of data.tables--in my code right now, I have 357MB-worth of dummy variables wasting space).
Is there an easy way to get around this annoyance?
Perfect would be a global option to set (a la options(keep_for_index = FALSE)); something like for(ii in 1:5, keep_index = FALSE) could be acceptable as well.
In order to do what you suggest, R would have to change the scoping rules for for loops. This will likely never happen, because I'm sure there is code out there in packages that relies on the current behaviour. You may not use the index after the for loop, but given that loops can break() at any time, the final iteration value isn't always known ahead of time. And having this as a global option would again cause problems with existing code in working packages.
As pointed out, it's far more common to use sapply or lapply loops in R. Something like
for(i in 1:4) {
  lm(data[, 1] ~ data[, i])
}
becomes
sapply(1:4, function(i) {
  lm(data[, 1] ~ data[, i])
})
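The functional form also hands the results back to you directly; for instance (a small sketch with made-up data, since data is not defined above):
# made-up data frame with four numeric columns
data <- as.data.frame(matrix(rnorm(400), ncol=4))
# lapply returns the fitted models, and 'i' never appears in the global environment
models <- lapply(2:4, function(i) lm(data[, 1] ~ data[, i]))
summary(models[[1]])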
You shouldn't be afraid of functions in R. After all, R is a functional language.
It's fine to use for loops for more control, but you will have to take care of removing the indexing variable with rm(), as you've pointed out. Unless you're using a different indexing variable in each loop, I'm surprised that they are piling up. I'm also surprised that, in your case, if they are data.tables, they are adding additional memory, since data.tables don't make deep copies by default as far as I know. The only memory "price" you would pay is a simple pointer.
I agree with the comments above. Even if you have to use a for loop (relying on side effects rather than functions' return values), it would be a good idea to structure your code into several functions and store your data in lists.
However, there is a way to "hide" the index and all temporary variables inside the loop - by calling the for function in a separate environment:
do.call(`for`, alist(i, 1:3, {
  # ...
  print(i)
  # ...
}), envir = new.env())
But ... if you could put your code in a function, the solution is more elegant:
for_each <- function(x, FUN) {
  for(i in x) {
    FUN(i)
  }
}
for_each(1:3, print)
Note that with a "for_each"-like construct you don't even see the index variable.

How to import only functions from .R file without executing the whole file

Let's say I have a R script, testScript.R
test <- function(){cat('Hello world')}
cat('Bye world')
In the R-console, I understand I can import the function, test() by
source('testScript.R')
However, at the same time it will also execute cat('Bye world'). Assuming I am not allowed to create or modify files, is there a way to import only the function test() without executing cat('Bye world')?
First of all, let me say that this really isn't a good idea. R is a functional programming language, so functions are just like regular objects. There's no strong separation between calling a function and assigning a function. These are all pretty much the same thing:
a <- function(a) a+1
a(6)
# [1] 7
assign("a", function(i) i+1)
a(6)
# [1] 7
`<-`(a, function(i) i+1)
a(6)
# [1] 7
There's no difference between defining a function and calling an assignment function. You never know what a piece of code will do unless you run it; therefore it's not easy to tell which code creates "functions" and which does not. As @mdsumner pointed out, you would be better off manually separating the code you use to define functions from the code you use to run them.
That said, if you wanted to extract all the variable assignments where you use <- from a code file, you could do
cmds <- parse("fakeload.R")
assign.funs <- sapply(cmds, function(x) {
  if(x[[1]]=="<-") {
    if(x[[3]][[1]]=="function") {
      return(TRUE)
    }
  }
  return(FALSE)
})
eval(cmds[assign.funs])
This will evaluate all the function assignments of the "standard" form.
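If you want this as a reusable helper, here is one way it could be packaged up (a sketch; the name sourceFunctionsOnly is made up, and it only catches assignments of the form name <- function(...)):
sourceFunctionsOnly <- function(file, envir = parent.frame()) {
  cmds <- parse(file)
  # keep only expressions of the form: name <- function(...) ...
  is.fun <- sapply(cmds, function(x)
    is.call(x) && identical(x[[1]], as.name("<-")) &&
      is.call(x[[3]]) && identical(x[[3]][[1]], as.name("function")))
  for (cmd in cmds[is.fun]) eval(cmd, envir = envir)
  invisible(NULL)
}
sourceFunctionsOnly("testScript.R")
test()  # Hello world, without "Bye world" being printed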
Oh man... that's interesting. I don't know of any way to do that without some atrocity like this:
# assume your two-line script is stored in testScript.R
a <- readLines("testScript.R")
a <- paste(a, collapse="\n")
library(stringr)
func_string <- str_extract(a, "[a-z]+ <- function.+}")
test <- eval(parse(text=func_string))
> test()
Hello world
You will certainly need to work on the regex to extract your functions. And str_extract_all() will be helpful if there's more than one function. Good luck.

R issue "object not found"

I am a newcomer to R. Last week I had a long and complicated function working perfectly. The program let me pick a subset of columns and do various manipulations on that subset. The function must be callable as function(arg1=first_header_name, arg2=second_header_name, ....). I have cleared the console and removed the old history file. I have read the manual again, and I have checked the .csv file to make sure everything there is still the same. I have gone back and reworked it all step by step, and I have found the place where this new problem occurs. As it is a very long function, I am only going to reproduce a simplified version of the part that is suddenly not working.
elbow <- function(arg1,arg2) {
  my_data <- read.csv("data.csv", header=TRUE, sep=",")
  average_A <- (arg1 + arg2)
  average_A
}
elbow(A3,A5)
# Error in elbow(A3, A5) : object 'A3' not found
Column headers are A3,A4,A5,A7,A8,A9,B2,B3,B5,B6,B7,B9
What stupid little error am I making? This is driving me batty. It has to be something trivial.
Here's my guess at what might work the way you wanted:
elbow <- function(arg1,arg2) {
  my_data <- read.csv("data.csv", header=TRUE, sep=",")
  average_A <- my_data[[arg1]] + my_data[[arg2]] # "[[" evaluates args
  average_A
}
elbow('A3','A5') # arguments entered as character literals
You should realize that the rest of my_data will have evaporated and been garbage collected after the return from the elbow call. I could have shown you how to use your original expression following attach(), which would have been arguably safe within that function, but that would have violated my religious principles.
Probably during your last session you had objects named A3 or A5 in your workspace (either defined explicitly, or perhaps you had loaded and attached the data). The function was working because those objects were there, but it wasn't actually doing what you thought it was doing, so in a new session with a new workspace--without those objects--it's not working. Your function as written doesn't actually do anything with the dataset (my_data) which you are reading in inside of it; I suspect you want something like this:
elbow <- function(arg1, arg2) {
  my_data <- read.csv("data.csv", header=TRUE, sep=",")
  average_A <- my_data[,arg1] + my_data[,arg2]
  return(average_A)
}
You will also need to use quotes when calling the function, e.g.
elbow('A3','A5')
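If calling elbow(A3, A5) without quotes is a hard requirement, one option that neither answer above uses is to capture the argument names with substitute() and deparse() (a sketch, assuming the same data.csv as above):
elbow <- function(arg1, arg2) {
  my_data <- read.csv("data.csv", header=TRUE, sep=",")
  a1 <- deparse(substitute(arg1))  # turns the unevaluated symbol A3 into the string "A3"
  a2 <- deparse(substitute(arg2))
  my_data[[a1]] + my_data[[a2]]
}
elbow(A3, A5)  # now works without quotes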

Resources