QQline command works only on the last plot - r

The code below applies lm to each dataset in the list 'dsets' and stores the fitted models in the object 'models'.
models<-lapply(dsets,function(data){
lm(reformulate(termlabels=".",response=names(data)[1]),data)
})
I created this function to plot qqplot for each of the model outputs stored in object 'models' but it won't work
rstest<-function(x){
for (i in 1:length(x))
qqnorm(residuals(x[[i]]))
qqline(residuals(x[[i]]))
}
rstest(models)
I get the plots, but qqline works only on the last plot, not on all of the plots being generated. What am I missing in my function that's keeping qqline from iterating?

Only the first expression after for(...) is looped. Wrap the body of the for loop in {} to combine the two expressions.
rstest <- function(x) {
  for (i in seq_along(x)) {
    qqnorm(residuals(x[[i]]))
    qqline(residuals(x[[i]]))
  }
}
rstest(models)
Using an editor/IDE that indents your code would have helped you to recognize this yourself.
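To see the rule in isolation, here is a minimal sketch that uses counters in place of the plotting calls. The variable names are made up for illustration.

```r
# Without braces: only the first expression belongs to the loop.
loop_only <- 0
after_loop <- 0
for (i in 1:3)
  loop_only <- loop_only + 1
  after_loop <- after_loop + 1  # despite the indentation, this runs once, after the loop

# With braces: both expressions run on every iteration.
both_a <- 0
both_b <- 0
for (i in 1:3) {
  both_a <- both_a + 1
  both_b <- both_b + 1
}

print(c(loop_only, after_loop, both_a, both_b))
```

The second counter ends at 1 while the others end at 3, which mirrors qqline being drawn only once, on the last plot.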

Related

Encountering errors when writing a quicksort function in R using a partition function

I've been writing this quicksort function in R trying to incorporate a partition function I've created as well. However, I've been encountering bugs when comparing p and r. It keeps telling me my argument is of length 0, however, I thought I declared the p and r objects when I initially called the quicksort function.
partition <- function(input,p, r){
pivot = input[r]
while(p<r){
while(input[p]<pivot) {p<-p+1}
while(input[r]>pivot) {r<-r-1}
if(input[p]==input[r]) {p<-p+1}
else if (p<r){
tmp <- input[p]
input[p] = input[r]
input[r] = tmp
}
}
return(r)
}
quicksort<- function(input,p,r){
if(p<r){
j<- partition(input,p,r)
input <- quicksort(input,p,j-1)
input <- quicksort(input,j+1,r)
}
}
input <- c(500,700,800,100,300,200,900,400,1000,600)
print("Input:")
print(input)
quicksort(input,1,10)
print("Output:")
print(input)
The error in question is caused because input[p] is of length zero. Why? Because in this instance input is NULL. input isn't NULL for the first few goes, so what would make it NULL?
Your quicksort function is designed to take an input, change it (if p<r), and then output it. But, you've left out the output step. If p<r then this is taken care of implicitly by the last input <- ... line, but if not then the function doesn't do anything and just returns NULL.
The output from one call to quicksort is the input to the next, and so this NULL propagates and breaks the next call.
Recursive functions are beautiful but often frustrating to debug. I recommend liberally sprinkling print() statements around while you're still developing it so you can see what it's doing more easily.
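A minimal sketch of the fix described above, under two assumptions that go beyond the answer. First, quicksort returns input on every path. Second, because R passes vectors by value, swaps made inside partition are lost unless the rearranged vector is returned too, so the sketch swaps in a Lomuto-style partition (not the asker's scheme) that returns both the vector and the pivot position.

```r
# Lomuto partition (an assumption, not the asker's scheme):
# returns the rearranged vector together with the pivot's final index.
partition <- function(input, p, r) {
  pivot <- input[r]
  i <- p - 1
  for (j in seq(p, r - 1)) {   # only called with p < r, so the range is valid
    if (input[j] <= pivot) {
      i <- i + 1
      tmp <- input[i]; input[i] <- input[j]; input[j] <- tmp
    }
  }
  tmp <- input[i + 1]; input[i + 1] <- input[r]; input[r] <- tmp
  list(input = input, pivot = i + 1)
}

quicksort <- function(input, p, r) {
  if (p < r) {
    part <- partition(input, p, r)
    input <- quicksort(part$input, p, part$pivot - 1)
    input <- quicksort(input, part$pivot + 1, r)
  }
  input  # always return the vector, even when p >= r
}

input <- c(500, 700, 800, 100, 300, 200, 900, 400, 1000, 600)
sorted <- quicksort(input, 1, length(input))
print(sorted)
```

The result has to be captured (sorted <- ...) because the original input is not modified in place.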

How do I pass a function argument into map_df()?

I am trying to create a function to clean data and return as a data.frame in R.
I'm using the map_df() function to return the cleaned data as a data.frame, and have a function written to clean the data.
The first thing I do is pull a list of files from a folder, then iterate through them and clean each file. I have a pre-defined set specifying which column names to pull (stored in selectCols) in case of variation between files:
files <- list.files(filepath,full.names=F)
colInd <- which(names(fread(files[i],nrows=0)) %in% gsub("_","",selectCols))
I also have a function to clean my data, which uses fread() to read in the .csv files. It takes colInd and i as arguments to clean files iteratively.
cleanData <- function(files,i,colInd) {
addData <- fread(files[i],select=c(colInd))
[...]
}
Overall it looks like this (as a recursive function):
i <- 1
files <- list.files(filepath,full.names=F)
iterateCleaning <- function(files,i) {
colInd <- which(names(fread(files[i],nrows=0)) %in% gsub("_","",selectCols))
if (length(colInd)==length(selectCols)) {
newData <- map_df(files,cleanData)
saveToFolder(newData,i,files)
}
else {}
i=i+1
if (i<=length(files)){
iterateCleaning(files,i)
}
else {}
}
When I try to run without specifying the arguments for my function I get this error:
Error in fread(files,select=c(colInd)):
argument "colInd" is missing, with no default.
When I insert it into my map_df() I do it like so:
newData <- map_df(files,i,colInd,cleanData)
Then I get this error:
Error in as_mapper(.f,...): object 'colInd' not found.
Any suggestions for resolving this error? As I understand it, map_df() applies to each element in the function, but I don't need it applied to the i and colInd inputs, I just need them for the function I am calling in map_df(). How can I call map_df() on a function that requires additional arguments?
I read the documentation, but found it a bit confusing. It says to use "." for a single-argument function and .x and .y for two-argument functions, but I'm not sure what that means. My initial guess was something like the following, but neither line works:
newData <- map_df(files,cleanData,.i,.colInd)
newData <- map_df(files,cleanData,.x=i,.y=colInd)
Any recommendations? Will I have the same output if I just call map_df() afterwards on the output of my function?
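One possible approach, sketched under assumptions: purrr's mapping functions forward any extra arguments placed after the function to every call, so they can be passed by name. The cleanOne function below is a hypothetical stand-in for cleanData that takes a single file name as its first argument (which is what map_df supplies) and does not read any files. The sketch assumes the purrr and dplyr packages are installed.

```r
library(purrr)

# Hypothetical stand-in for cleanData(): one file name in, one-row data frame out.
cleanOne <- function(file, colInd) {
  data.frame(file = file, nCols = length(colInd), stringsAsFactors = FALSE)
}

files <- c("a.csv", "b.csv")   # made-up file names, never opened
colInd <- c(1, 3, 4)

# Extra named arguments after the function are passed along to each call.
viaDots <- map_df(files, cleanOne, colInd = colInd)

# Equivalent form with an anonymous function that fixes colInd explicitly.
viaLambda <- map_df(files, function(f) cleanOne(f, colInd = colInd))

print(viaDots)
```

Only the first argument varies per element; colInd is the same object in every call.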

Subsetting data as generic function in R

I am trying to create a function that plots graphs for either an entire dataset, or a subset of the data. The function needs to be able to do both so that you can plot the subset if you so wish. I am struggling with just coming up with the generic subset function.
I currently have this code (I am more of a SAS user so R is confusing me a bit):
subset<-function(dat, varname, val)
if(dat$varname==val) {
data<-subset(dat, dat$varname==val)
}
But R keeps returning this error message:
Error in if (dat$varname == val) { : argument is of length zero
Could someone help me to resolve this? Thanks so much! I figure it may have to do with the way I wrote it.
First of all, the $ operator cannot take a variable. In your code you are always looking up a column literally named varname. No such column exists, so dat$varname is NULL, the comparison has length zero, and if() fails with the error you see.
Replace dat$varname with dat[[varname]] instead, and pass the column name as a string.
The next error is that you are conditioning on a vector: dat[[varname]]==val is a logical vector with one element per row, while if() expects a single TRUE or FALSE.
A third error in your code is that you are naming your function subset and thus masking the subset function in the base package. So the inner call to subset will be a recursive call to your own function. To fix this, rename your function, or specify that it is the subset function in the base package you are calling with base::subset(dat, dat[[varname]]==val).
The final error in the code is that your function does not return anything. Do not assign the result to the variable data but return it instead.
Here is how the code should look:
mySubset <- function(dat, varname, val) {
  if (any(dat[[varname]] == val)) {
    subset(dat, dat[[varname]] == val)
  } else {
    NA
  }
}
Or even better:
mySubset <- function(dat, varname, val) dat[dat[[varname]] == val, ]
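The difference between looking a column up with $ and with double brackets can be checked on a toy data frame. This is a small sketch; the data and the helper name subsetByValue are hypothetical.

```r
dat <- data.frame(group = c("a", "b", "a"), score = c(10, 20, 30))
varname <- "group"

is.null(dat$varname)          # TRUE: there is no column literally called "varname"
length(dat$varname == "a")    # 0, the "argument is of length zero" case
dat[[varname]] == "a"         # one logical value per row

# Hypothetical helper: keep the rows where the named column equals val.
subsetByValue <- function(dat, varname, val) {
  dat[dat[[varname]] == val, , drop = FALSE]
}

picked <- subsetByValue(dat, "group", "a")
print(picked)
```

The trailing comma in the row index matters: it selects rows and keeps all columns.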

"could not find function" when using functions as arguments

I have two .R files, plotDataSet(..) and plotAllDataSets(). plotDataSet(..) makes a call to curve(..) (in the R graphics library), while plotAllDataSets() makes a call to plotDataSet(..). plotDataSet(..) takes a function as a parameter, and passes it to curve(..).
I want to pass in my function argument for curve(..) into plotDataSet(..) from a list of functions, such as:
v <- c(function(x){x}, function(x){x*x}, function(x){x*x}, function(x){x*x*x},
function(x){x*x}, function(x){x*x*x}, function(x){x*x*x})
for (i in 1:7) {
plotSaveData(data, v[i], i)
}
I get the following output: Error in eval(expr, envir, enclos) :
could not find function "expectedOrderEquation".
Interestingly, when I call plotDataSet(..) and pass in a function like function(x){x*x}, it works fine:
for (i in 1:7) {
plotSaveData(data, function(x) {x}, i)
}
But this won't let me call plotSaveData(..) while cycling through a list of functions.
Can someone please explain why this does not work?
I hope this is sufficient, but I am happy to provide more context as needed. Also, I am a bit new to R, so any corrections to my descriptions would be helpful.
Use double brackets instead of single brackets:
v[[i]] instead of v[i]
Have a look at the difference between these two:
v[[i]](3)
v[i](3) # error
Single brackets return a list whose only element is a function.
Double brackets return the function itself.
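The distinction can be verified directly. This is a short sketch with a made-up three-element list of functions; the error from the single-bracket call is caught so the script runs to the end.

```r
# c() on functions produces a list of functions.
v <- c(function(x) x, function(x) x * x, function(x) x * x * x)

is.list(v[2])        # TRUE: a one-element list
is.function(v[2])    # FALSE: a list cannot be called
is.function(v[[2]])  # TRUE: the function itself

squared <- v[[2]](3) # calls the second function with x = 3

# Calling the one-element list fails; capture the message instead of stopping.
msg <- tryCatch(v[2](3), error = function(e) conditionMessage(e))

print(squared)
print(msg)
```

The same applies inside the loop: passing v[[i]] hands curve an actual function rather than a list wrapped around one.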

output from "for" loop

Based on Roland's suggestion from Plot titles in R using sapply(), I have created the following loop to make boxplots out of every selected variable in my dataset.
all.box=function(x) {
for (i in seq_along(x)) {
boxplot(x[,i], main = names(x)[i])
}
}
It does the job nicely in that it provides the graphs. Could someone point out to me how to make the loop to return some output, say the $out from the boxplot to be able to see the number of outliers calculated by it?
Thanx a lot!
Using lapply is better here, since it returns a value instead of relying on the side effects of a for loop:
all.box=function(x) {
res <- lapply(seq_along(x),function(i){
boxplot(x[,i], main = names(x)[i])$out
})
res
}
PS: you can continue to use for, but you will need either to append each result to a list inside the loop or to allocate the output object before the loop. So I think it is simpler to use a function from the *apply family here.
If you want to return something from a for loop, it's very important to pre-allocate the return object if it's not a list. Otherwise, for loops with many iterations will be slow. I suggest reading The R Inferno, Circle 2 in particular.
all.box=function(x) {
result <- list()
for (i in seq_along(x)) {
result[[i]] <- boxplot(x[,i], main = names(x)[i])$out
}
result
}
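To get the number of outliers per variable, the returned list can be summarised with lengths(). This is a sketch on made-up data; it passes plot = FALSE so that no graphics device is needed, which is an assumption for the example rather than part of the answers above.

```r
# Made-up data: column a has one extreme value, column b has none.
df <- data.frame(a = c(1, 2, 3, 4, 100), b = c(5, 6, 7, 8, 9))

# Collect the outliers of every column; names are kept from the data frame.
outliers <- lapply(df, function(col) boxplot(col, plot = FALSE)$out)

# Number of outliers per variable.
nOut <- lengths(outliers)
print(nOut)
```

Dropping plot = FALSE draws the boxplots as well, as in the functions above.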
