I am a newcomer to R. Last week I had a long and complicated function working perfectly. The program was letting me pick a subset of columns and doing various manipulations on that subset. The function must work 'function(arg1=first_header_name, arg2=second_header_name,....)'. I have cleared the console, removed the old history file. I have read the manual again, I have checked the .csv file to make sure everything there is still the same. I have gone back and reworked it all step by step and I have the place where this new problem occurs. As it is a very long function, I am only going to reproduce it in a simplified version of the part that is suddenly not working.
elbow <- function(arg1,arg2) {
my_data <- read.csv("data.csv", header=TRUE, sep=",")
average_A <- (arg1 + arg2)
average_A
}
elbow(A3,A5)
# Error in elbow(A3, A5) : object 'A3' not found
Column headers are A3,A4,A5,A7,A8,A9,B2,B3,B5,B6,B7,B9
What stupid little error am I making? This is driving me batty. It has to be something trivial.
Here's my guess at what might work the way you wanted:
elbow <- function(arg1,arg2) {
my_data <- read.csv("data.csv", header=TRUE, sep=",")
average_A <- my_data[[arg1]] + my_data[[arg2]] # "[[" evaluates args
average_A
}
elbow('A3','A5') # entered a character literals
You should realize that the rest of my_data will have evaporated and be garbage collected after return from the elbow call. I could have showed you how to use your original expression following attach(), which would have been arguably safe within that function, but that would have violated my religious principles.
Probably during your last session you had objects named A3 or A5 in your workspace (either defined explicitly, or perhaps you had loaded and attached the data). The function was working because those objects were there, but it wasn't actually doing what you thought it was doing, so in a new session with a new workspace--without those objects--it's not working. Your function as written doesn't actually do anything with the dataset (my_data) which you are reading in inside of it; I suspect you want something like this:
elbow <- function(arg1, arg2) {
my_data <- read.csv("data.csv",header=TRUE,sep=",")
average_A <- my_data[,arg1] + my_data[,arg2]
return(average_A)
}
You will also need to use quotes when calling the function, e.g.
elbow('A3','A5')
Related
I'm struggling to clearly explain this problem.
Essentially, something has seemed to have happened within the R environment and none of the code I write inside my functions are working and not data is being saved. If I type a command line directly into the console it works (i.e. Monkey <- 0), but if I type it within a function, it doesn't store it when I run the function.
It could be I'm missing a glaring error in the code, but I noticed the problem when I accidentally clicked on the debugger and tried to excite out of the browser[1] prompt which appeared.
Any ideas? This is driving me nuts.
corr <- function(directory, threshold=0) {
directory <- paste(getwd(),"/",directory,"/",sep="")
file.list <- list.files(directory)
number <- 1:length(file.list)
monkey <- c()
for (i in number) {
x <- paste(directory,file.list[i],sep="")
y <- read.csv(x)
t <- sum(complete.cases(y))
if (t >= threshold) {
correl <- cor(y$sulfate, y$nitrate, use='pairwise.complete.obs')
monkey <- append(monkey,correl)}
}
#correl <- cor(newdata$sulfate, newdata$nitrate, use='pairwise.complete.obs')
#summary(correl)
}
corr('specdata', 150)
monkey```
It's a namespace issue. Functions create their own 'environment', that isn't necessarily in the global environment.
Using <- will assign in the local environment. To save an object to the global environment, use <<-
Here's some information on R environments.
I suggest you give a look at some tutorial on using functions in R.
Briefly (and sorry for my horrible explanation) objects that you define within functions will ONLY be defined within functions, unless you explicitly export them using (one of the possible approaches) the return() function.
browser() is indeed used for debugging, keeps you inside the function, and allows you accessing objects created inside the function.
In addition, to increase the probability to have useful answers, I suggest that you try to post a self-contained, working piece of code allowing quickly reproducing the issue. Here you are reading some files we have no access to.
It seems to me you have to store the output yourself when you run your script:
corr_out <- corr('specdata', 150)
I have a large number of files (in GB size).I want to run a for loop in which I call some files, do so processing that creates some files, bind them together, and save it.
AA<-c(1,6)
BB<-c(5,10)
for(i in length(AA)){
listofnames<-list.files(pattern="*eng")
listofnames<- listofnames[c(paste(AA[i],BB[i],sep=":"))]
listoffiles <- lapply( listofnames, readRDS)
}
But listofnames has NA. What I am doing wrong?
It took me a while looking at your code to realize that you were actually trying to construct a character representation of the expression 1:5 that was supposed to index a vector by position. This is very wrong; you just can't paste together arbitrary R commands/expressions and expect to drop them in to you code wherever. (Technically, there are tools that do that sort of thing, but they are discouraged.)
Probably you're looking to do something closer to:
listofnames <- list.files(pattern="*eng")
ind <- rep(1:5,each = 5,length.out = length(listofnames))
listofnames_split <- split(listofnames,ind)
for (i in seq_along(listofnames_split)){
my_data <- lapply(listofnames_split[[i]], readRDS)
#Do processing here
#...
rm(my_data) #Assuming memory really is a problem
}
But I'm just sketching out hypothetical code here, I can't really match it to your exact situation since your example isn't really fully fleshed out.
I'm new to R and programming and taking a Coursera course. I've asked in their forums, but nobody can seem to provide an answer in the forums. To be clear, I'm trying to determine why this does not output.
When I first wrote the program, I was getting accurate outputs, but after I tried to upload, something went wonky. Rather than producing any output with [1], [2], etc. when I run the program from RStudio, I only get the the blue +++, but no errors and anything I change still does not produce an output.
I tried with a previous version of R, and reinstalled the most recent version 3.2.1 for Windows.
What I've done:
Set the correct working directory through RStudio
pol <- function(directory, pol, id = 1:332) {
files <- list.files("specdata", full.names = TRUE);
data <- data.frame();
for (i in ID) {
data <- rbind(data, read.csv(files_list[i]))
}
subset <- subset(data, ID %in% id);
polmean <- mean(subset[pol], na.rm = TRUE);
polmean("specdata", "sulfate", 1:10)
polmean("specdata", "nitrate", 70:72)
polmean("specdata", "nitrate", 23)
}
Can someone please provide some direction - debug help?
when I adjust the code the following errors tend to appear:
ID not found
Missing or unexpected } (although I've matched them all).
The updated code is as follow, if I'm understanding:
data <- data.frame();
files <- files[grepl(".csv",files)]
pollutantmean <- function(directory, pollutant, id = 1:332) {
pollutantmean <- mean(subset1[[pollutant]], na.rm = TRUE);
}
Looks like you haven't declared what ID is (I assume: a vector of numbers)?
Also, using 'subset' as a variable name while it's also a function, and pol as both a function name and the name of one of the arguments of that same function is just asking for trouble...
And I think there is a missing ")" in your for-loop.
EDIT
So the way I understand it now, you want to do a couple of things.
Read in a bunch of files, which you'll use multiple times without changing them.
Get some mean value out of those files, under different conditions.
Here's how I would do it.
Since you only want to read in the data once, you don't really need a function to do this (you can have one, but I think it's overkill for now). You correctly have code that makes a vector with the file names, and then loop over over them, rbinding them to each other. The problem is that this can become very slow. Check here. Make sure your directory only contains files that you want to read in, so no Rscripts or other stuff. A way (not 100% foolproof) to do this is using files <- files[grepl(".csv",files)], which makes sure you only have the csv's (grepl checks whether a certain string is a substring of another, and returns a boolean the [] then only keeps the elements for which a TRUE was returned).
Next, there is 'a thing you want to do multiple times', namely getting out mean values. This is where you'd use a function. Apparently you want to get the mean for different types of pollution, and you want this in restricted IDs.
Let's assume that 1. has given you a dataframe df with a column named Type for the type of pollution and a column called Id that somehow represents a sort of ID (substitute with the actual names in your script - if you don't have a column for ID, I'll edit the answer later on). Now you want a function
polmean <- function(type, id) {
# some code that returns the mean of a restricted version of df
}
This is all you need. You write the code that generates df, you then write a function that will get you what you want from that dataframe, and then you call it for the circumstances you want to use it in (the three polmean calls at the end of your original code, but now without the first argument as you no longer need this).
Ok - I finally solved this. Thanks for the help.
I didn't need to call "specdata" in line 2. the directory in line 1 referred to the correct directory.
My for/in statement needed to refer the the id in the first line not the ID in the dataset. The for/in statement doesn't appear to need to be indented (but it looks cleaner)
I did not need a subset
The last 3 lines for pollutantmean did not need to be a part of the program. These are used in the R console to call the results one by one.
First off, I'm an R beginner taking an R programming course at the moment. It is extremely lacking in teaching the fundamentals of R so I'm trying to learn myself via you wonderful contributors on Stack Overflow. I'm trying to figure out how nested functions work, which means I also need to learn about how lexical scoping works. I've got a function that computes the complete cases in multiple CSV files and spits out a nice table right now.
Here's the CSV files:
https://d396qusza40orc.cloudfront.net/rprog%2Fdata%2Fspecdata.zip
And here's my code, I realize it'd be cleaner if I used the apply stuff but it works as is:
complete<- function(directory, id = 1:332){
data <- NULL
for (i in 1:length(id)) {
data[[i]]<- c(paste(directory, "/", formatC(id[i], width=3, flag=0),
".csv", sep=""))
}
cases <- NULL
for (d in 1:length(data)) {
cases[[d]]<-c(read.csv(data[d]))
}
df <- NULL
for (c in 1:length(cases)){
df[[c]] <- (data.frame(cases[c]))
}
dt <- do.call(rbind, df)
ok <- (complete.cases(dt))
finally <- as.data.frame(table(dt[ok, "ID"]), colnames=c("id", "nobs"))
colnames(finally) <- c('id', 'nobs')
return(finally)
}
I am now trying to call the different variables in the dataframe finally that is the output of the above function within this new function:
corr<-function(directory, threshold = 0){
complete(directory, id = 1:332)
finally$nobs
}
corr('specdata')
Without finally$nobs this function spits out the data frame, as it should, but when I try to call the variable nobs in object finally, it says object finally is not found. I realize this problem is due to my lack of understanding in the subject of lexical scoping, my professor hasn't really made lexical scoping very clear so I'm not totally sure how to find the object within the nested function environment... any help would be great.
The object finally is only in scope within the function complete(). If you want to do something further with the object you are returning, you need to store it in a variable in the environment you are working in (in this instance, the environment you are working in is the function corr(). If we weren't working inside any function, the environment would be the "global environment"). In other words, this code should work:
corr<-function(directory, threshold=0){
this.finally <- complete(directory, id=1:332)
this.finally$nobs
}
I am calling the object that is returned by complete() this.finally to help distinguish it from the object finally that is now out of scope. Of course, you can call it anything you like!
I like to pop out results in a window so that they're easier to see and find (e.g., they don't get lost as the console continues to scroll). One way to do this is to use sink() and file.show(). For example:
y <- rnorm(100); x <- rnorm(100); mod <- lm(y~x)
sink("tempSink", type="output")
summary(mod)
sink()
file.show("tempSink", delete.file=T, title="Model summary")
I commonly do this to examine model fits, as above, but also for a wide variety of other functions and objects, such as: summary(data.frame), anova(model1, model2), table(factor1, factor2). These are common, but other situations can arise as well. The point here is that both the nature of the function and the object can vary.
It is somewhat tedious to type out all of the above every time. I would like to write a simpler function that I can call, something like the following would be nice:
sinkShow <- function(obj, fun, title="output") {
sink("tempSink", type="output")
apply(obj, ?, fun)
sink()
file.show("tempSink", delete.file=T, title=title)
}
Clearly, this doesn't work. There are several issues. First, how would you do this so that it won't crash with the wrong type of object or function without having to have a list of conditional executions (i.e., if(is.list(obj) { lapply...). Second, I'm not sure how to handle the margin argument. Lastly, this doesn't work even when I try simple, contrived examples where I know that everything is set appropriately, so there seems to be something fundamentally wrong.
Does anyone know how such a situation can be handled simply and easily? I'm not new to R, but I've never been formally taught it; I've picked up tricks in an ad-hoc manner, i.e., I'm not a very sophisticated R programmer. Thanks.
Rather than use apply I think you want do.call. Make sure to wrap it in a print since it is inside of a function.
Here is one possible implementation:
sinkShow <- function( obj, fun, title='Output', ...) {
file <- tempfile()
args <- c(list(obj),list(...))
capture.output( do.call( fun, args ), file=file )
file.show(file, delete.file=TRUE, title=title)
}
Though it should probably be renamed since I skipped using sink as well. I might modify this a bit and put it into the TeachingDemos package.
Use page. Here's some sample models:
d <- data.frame(y1=rnorm(100), y2=rnorm(100), x=rnorm(100))
mod <- lm(y1~x, data=d)
mods <- list(mod1=lm(y1~x, data=d), mod2=lm(y2~x, data=d))
And here's how you'd use page:
page(summary(mod), method="print")
page(lapply(mods, summary), method="print")
For my original post, which had code that turned out to be a near-reimplementation of page, see the edit history.