complete.cases() working outside but not inside an R function?

I'm wondering why when I run: iris[complete.cases(iris), ] it works perfectly fine. But when I do the same thing from the function below, it gives me the error: colMeans(x, na.rm = TRUE) : 'x' must be numeric?
p.s. scale() works well with data.frames ==> scale(mtcars).
Can this be fixed?
Here is the function:
standard <- function(data, scale = TRUE, center = TRUE, na.rm = TRUE){
  data <- if(na.rm) data[complete.cases(data), ]
  data[paste0(names(data), ".s")] <- scale(data, center = center, scale = scale)
  return(data)
}
# EXAMPLE:
standard(iris)

EDIT:
Yes, the error is thrown by scale(), and not earlier. If you want to scale all the numeric columns and leave the other columns as is, you'll need to add a step that extracts the numeric columns, scales them, and then puts them back in. Incidentally, scale can handle NA values, so you can put the complete.cases() call after the scale.
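The extract, scale, and put-back step described above could look roughly like the sketch below. The name standard_numeric is hypothetical, and the sketch assumes the data frame has at least one numeric column. It also replaces data <- if(na.rm) ..., which would set data to NULL when na.rm = FALSE, with a plain if statement.

```r
# Sketch: scale only the numeric columns and append them as "<name>.s".
# standard_numeric is a hypothetical name, not part of the original post.
standard_numeric <- function(data, scale = TRUE, center = TRUE, na.rm = TRUE) {
  # Drop incomplete rows only when asked; otherwise leave data untouched.
  if (na.rm) data <- data[complete.cases(data), , drop = FALSE]
  # Logical vector marking which columns are numeric.
  num <- vapply(data, is.numeric, logical(1))
  # scale() receives numeric columns only, so colMeans() no longer fails.
  scaled <- scale(data[num], center = center, scale = scale)
  data[paste0(names(data)[num], ".s")] <- as.data.frame(scaled)
  data
}

out <- standard_numeric(iris)
names(out)
```

Species is left as is, and the four numeric columns each get a scaled ".s" companion.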
Original Answer:
You can step through this by adding a call to browser() inside your function, but I suspect you'll find the error is thrown here:
scale(data, center = center, scale = scale)
Note from the documentation on scale()
Arguments
x a numeric matrix(like object).
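A quick non-interactive check of this, as a sketch: iris contains a factor column, so scale() fails on the whole data frame but succeeds once the numeric columns are selected. The variable names here are only illustrative.

```r
# Which columns of iris are numeric? Species is a factor, so it is not.
num_cols <- vapply(iris, is.numeric, logical(1))

# Capture the error message instead of stopping.
err <- tryCatch(scale(iris), error = function(e) conditionMessage(e))

# Scaling only the numeric columns works.
ok <- scale(iris[num_cols])
dim(ok)
```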
Here's how you'd debug this.
Change your function to this:
standard <- function(data, scale = TRUE, center = TRUE, na.rm = TRUE){
  browser()
  data <- if(na.rm) data[complete.cases(data), ]
  data[paste0(names(data), ".s")] <- scale(data, center = center, scale = scale)
  return(data)
}
Then try to call it with standard(immer)
It will open a browser for you to step through each statement in the function. If you do this in RStudio you can see the environment changes in the Environment tab in the upper right window. Use the command help to see how to navigate the browser, but in general, you'll use n and/or s to step through each statement. Q gets you out of the browser, and removing the browser() call from your function lets you run it as you would usually.

Related

Error when using 'which' to find position in a vector in R

Update 2.0: Now with data such that the errors should be reproducible:
Data for the different functions:
z <- seq(0,2,length=1000)
t <- grid <- c(0.1,0.55,0.9)
parA <- c(0.21,-0.93)
parB <- c(0.21,1.008)
p <- c(1,2,1,2)
## for plotting ##
f_func <- function(x) exp(-x^3+x)
envARS1 <- function(x){ exp(parA[1]*x+parB[1])}
envARS2 <- function(x){ exp(parA[2]*x+parB[2])}
plot(x=z,y=envARS1(z), type = "l", col = "blue", ylim = c(0,2), xlim = c(0,2))
lines(x=z,y=envARS2(z), type = "l", col = "red")
lines(x = z,(f_func(z)), type = "l", col = "black")
I'm trying to implement an adaptive rejection sampler using a derivative-free approach. As part of this, I have to implement a dynamic envelope function that adjusts depending on the values and number of some Zt's.
I have managed to write a dynamic envelope function which seems to work fine, but when I try to integrate the envelope, with the final aim of drawing from it, I get errors.
DynamicEnv <- function(x){
  exp(parA[p[max(which(x>=grid))]]*x +
      parB[p[max(which(x>=grid))]])
}
The envelope function is an exponential of a linear function, and the parameters a and b depend on where x, its input, is located relative to the Zt's.
The variable 'grid' contains the Zt's and is therefore a vector; p is a dynamic position variable, which essentially tells the function which parameters to use.
So the first problem I had was that, when I gave my dynamic envelope a vector as input, I got trouble from the 'which' function, which as far as I understand can only handle numeric values.
Updated with the error I receive from 'which'
I get the below error with which:
Error in which(x > grid) :
dims [product 3] do not match the length of object [1000]
Which I believe occurs because 'which' tries to compare both vectors to each other, and not the n'th element in x with the entire vector of grid.
Then I try to incorporate a loop, to loop over all the values in the x-vector, and return a vector with the output values, but then I got the error message 'non-finite function values' when I tried to integrate my dynamic envelope.
The dynamic envelope with a loop inside is:
DynamicEnv1 <- function(x){
  Draws <- matrix(0,length(x),1)
  for (i in 1:length(x)) {
    Draws[i,1] <- exp(parA[p[max(which(x[i]>=grid))]]*x[i] + parB[p[max(which(x[i]>=grid))]])
  }
  return(Draws)
}
I have written this 'static' envelope function, which works fine for making draws from it (and thereby integrating it).
envARSup <- function(x){
  ifelse((x <= t[1] | t[2] < x & x <= t[3]),
         exp(parA[1]*x+parB[1]),
         exp(parA[2]*x+parB[2])) * 1 * (x > 0)
}
Here the t's are the Zt's mentioned above. The idea of the dynamic envelope should be clear from this function, since they ideally should be able to return the same for the same grid (Zt's/t's).
The above function checks which interval the value of x belongs to, and based on the interval it uses a specific exponential linear line.
I would really appreciate it if someone could suggest an alternative to the 'which' function for locating a position in a vector, or help me understand why I get the error message with the loop-based dynamic envelope.
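One vectorized alternative to which() for locating a position in a sorted vector is base R's findInterval(). The sketch below is an assumption about the intended behaviour, not a confirmed fix: it uses left.open = TRUE and adds 1 to the interval index so that the result matches the static envelope envARSup for the same grid. The name DynamicEnv2 is hypothetical. For comparison, in the loop version an x below grid[1] makes which() return an empty vector, so max() returns -Inf with a warning.

```r
# Data copied from the question.
grid <- c(0.1, 0.55, 0.9)
parA <- c(0.21, -0.93)
parB <- c(0.21, 1.008)
p    <- c(1, 2, 1, 2)

# Hypothetical vectorized envelope (assumed to mirror envARSup).
DynamicEnv2 <- function(x) {
  # findInterval() returns 0..length(grid); +1 maps that onto p[1..4].
  # left.open = TRUE makes the intervals (grid[i], grid[i+1]], as in envARSup.
  idx <- p[findInterval(x, grid, left.open = TRUE) + 1]
  exp(parA[idx] * x + parB[idx]) * (x > 0)
}

z    <- seq(0, 2, length = 1000)
vals <- DynamicEnv2(z)                               # works on a whole vector
area <- integrate(DynamicEnv2, lower = 0, upper = 2)$value
```

Because the function accepts and returns vectors of equal length, it can be passed to integrate() directly.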

Example in the 'rasterize' documentation does not work?

I use the rasterize function from the raster package quite often. As indicated in its documentation, any custom function being used through the fun argument needs to accept an na.rm argument. This generally means that custom functions are written with the 'dots', i.e.:
funA <- function(x,...)length(x)
However, a second proposed approach is to write a custom function with an explicit na.rm argument. The example that is given in the documentation is:
funB <- function(x, na.rm) if (na.rm) length(na.omit(x))
However, this does not seem to work! This example, in which some random points are distributed across a grid fails:
# Create a grid
grid <- raster(ncols=36, nrows=18)
# Scatter some random points within the grid
pts <- spsample(as(extent(grid), "SpatialPolygons"), 100, type = "random")
# Give them a random data field
pts <- SpatialPointsDataFrame(pts, data.frame(field1 = runif(length(pts))))
# Try rasterize
rasterize(pts, grid, field = "field1", fun = funB)
Is there something I'm missing here?
Thanks!
Andrew
You were close.
Function B should look like:
funB <- function(x, na.rm=T) if (na.rm) length(na.omit(x))
rasterize(pts, grid, field = "field1", fun = funB)
The na.rm argument has to be TRUE or FALSE; adding a default value deals with the problem.
What still annoys me is that this:
funB <- function(x, na.rm) if (na.rm) length(na.omit(x))
rasterize(pts, grid, field = "field1", fun = funB, na.rm=TRUE)
should work but it doesn't. It's maybe something with the raster package.
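The effect of the default value can be seen in plain R, without the raster package. This is only a sketch of the general mechanism: it shows what happens when a function is called without its na.rm argument, and does not verify how rasterize calls fun internally. The names funB_nodefault and funB_default are hypothetical, and the else branch is an addition so that the function also returns something when na.rm is FALSE.

```r
# No default: evaluating `if (na.rm)` fails when na.rm is not supplied.
funB_nodefault <- function(x, na.rm) if (na.rm) length(na.omit(x))

# With a default (and an else branch added for the na.rm = FALSE case).
funB_default <- function(x, na.rm = TRUE) {
  if (na.rm) length(na.omit(x)) else length(x)
}

x <- c(1, NA, 3)
msg <- tryCatch(funB_nodefault(x), error = function(e) conditionMessage(e))
n_default <- funB_default(x)                  # counts non-missing values
n_all     <- funB_default(x, na.rm = FALSE)   # counts all values
```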

R gplots heatmap.2 - key is unstable using breaks parameter (warning: unsorted 'breaks' will be sorted before use)

I'm visualizing a data set with the heatmap.2 function from the gplots package in R. Basically I'm performing a hierarchical clustering analysis on the original data, while forcing the heatmap to display a limited version of the data (between -3 and +3) to limit the effect of outliers on the appearance of the heatmap, while still retaining the original clustering. When I use the full data set (fullmousedatamat), it works just fine. However, when I use a partial data set (partialmousedatamat), and want to plot it using the same key as the full data set, a couple colors are dropped out of the key and I can't figure out why.
Here is a gist containing the relevant data sets and analyses:
https://gist.github.com/jeffbruce/7412f567ac57fe1721a3
Notice how the 4th color on either side of the centre white color is dropped. This feels like it may be a bug. I get the following warning message, which I'm not sure how to interpret:
Warning message:
In image.default(z = matrix(z, ncol = 1), col = col, breaks = tmpbreaks, :
unsorted 'breaks' will be sorted before use
Thanks for your help!
I came across the same issue and I had to go through the code for heatmap.2 to figure it out.
It turns out that symkey=T, which is the default when the data contain negative values, overwrites the first and last elements of breaks with the extreme values of the data, which can leave breaks unsorted:
tmpbreaks <- breaks
if (symkey) {
  max.raw <- max(abs(c(x, breaks)), na.rm = TRUE)
  min.raw <- -max.raw
  tmpbreaks[1] <- -max(abs(x), na.rm = TRUE)
  tmpbreaks[length(tmpbreaks)] <- max(abs(x), na.rm = TRUE)
}
Therefore, the simple way to solve this is adding symkey=F if you are providing your own breaks.
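The effect of those two assignments can be reproduced in base R with made-up numbers. In this sketch, breaks and x are illustrative values, not the data from the gist; x stands for a partial data set whose largest absolute value is smaller than an interior break.

```r
# Illustrative breaks from -3 to 3 and a "partial" data set with max |x| = 2.
breaks <- seq(-3, 3, by = 0.75)
x <- c(-2, -0.5, 0.3, 1.9)

# Same overwriting of the end points as in the heatmap.2 snippet above.
tmpbreaks <- breaks
tmpbreaks[1] <- -max(abs(x), na.rm = TRUE)
tmpbreaks[length(tmpbreaks)] <- max(abs(x), na.rm = TRUE)

unsorted_with_symkey    <- is.unsorted(tmpbreaks)  # -2 now precedes -2.25
unsorted_without_symkey <- is.unsorted(breaks)     # original breaks untouched

# With gplots loaded, the corresponding call would be along the lines of:
# heatmap.2(partialmousedatamat, breaks = breaks, symkey = FALSE)
```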

R: custom ggplot2 color-transform gives error in labels

Basically, I have a data frame with 3 numeric vectors (x, y, z), and let's say I want to make a scatter plot of x, y colored by z. I want to transform the color scale with a square root that respects sign, so I made my own with trans_new. Here is a simple dataset, but with the actual transform.
library(ggplot2)
library(scales)
set.seed(1)
plot<-data.frame(x=rnorm(100),y=rnorm(100),z=rnorm(100))
super_trans <- function(){
  trans_new('super',
    function(X) sapply(X, function(x) {if(x>0){x^0.5} else{-(- x)^0.5}}),
    function(X) sapply(X, function(x){ if(x>0){x^2} else{-x^2}}))
}
ggplot(plot,aes(x,y))+geom_point(aes(colour=z))+scale_colour_gradient(trans="super")
It gives an error,
Error in if (x > 0) { : missing value where TRUE/FALSE needed
I don't understand it. I tried to backtrack the mistake, and my guess is that the error happens when trans_new tries to make breaks.
However, I do not understand how the "breaks" parameter works in trans_new.
Is there a ggplot2/scales hero out there who can help me transform my color scale correctly?
It may be relevant that only some datasets give errors.
There is a vectorized if, called ifelse. It also seems you are missing an extra minus.
super_trans <- function() {
  trans_new('super',
    function(x) ifelse(x>0, x^0.5, -(-x)^0.5),
    function(x) ifelse(x>0, x^2, -(-x)^2))
}
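As a variation on the ifelse() version above, the same signed transform can be written with sign() and abs(). This is an alternative sketch, not the answerer's code. It avoids taking the square root of negative numbers inside ifelse(), and it passes NA through, which the original if() could not handle. The names signed_sqrt and signed_sq are hypothetical.

```r
# Signed square root and its inverse, both vectorized and NA-safe.
signed_sqrt <- function(x) sign(x) * sqrt(abs(x))
signed_sq   <- function(x) sign(x) * x^2

x <- c(-4, -1, 0, 2.25, NA)
fwd  <- signed_sqrt(x)     # -2.0 -1.0  0.0  1.5   NA
back <- signed_sq(fwd)     # recovers x, with NA kept as NA

# With scales loaded, these could be plugged into the same call shape:
# trans_new('super', signed_sqrt, signed_sq)
```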

tryCatch() apparently ignoring a warning

I'm writing a function that uses kmeans to determine bin widths to convert a continuous measurement (a predicted probability) into an integer (one of 3 bins). I've stumbled upon an edge case in which it's possible for my algorithm to (correctly) predict the same probability for a whole set, and I want to handle that situation. I'm using the rattle package's binning() function in the following way:
btsKmeansBin <- function(x, k = 3, default = c(0, 0.3, 0.5, 1)) {
  result <- binning(x, bins = k, method = "kmeans", ordered = T)
  bins <- attr(result, "breaks")
  attr(bins, "names") <- NULL
  bins <- bins[order(bins)]
  bins[1] <- 0
  bins[length(bins)] <- 1
  return(bins)
}
Run this function on x <- c(.5,.5,.5,.5,.5,.5), and you'll get an error at the order(bins) step, because bins will be NULL and therefore not a vector.
Obviously, if x has only one distinct value, kmeans shouldn't work. In this case, I'd like to return the default bin divisions. When this happens, binning issues "Warning: the variable is not considered." So I'd like to use tryCatch to handle this warning, but surrounding the result <- ... line with the following code doesn't work the way I expect:
...
tryCatch({
  result <- binning(x, bins = k, method = "kmeans", ordered = T)
}, warning = function(w) {
  warn(sprintf("%s. Using default values", w))
  return(default)
}, error = function(e) {
  stop(e)
})
...
The warning gets printed as though I hadn't used tryCatch, and the code progresses past the return statement and throws the error from order again. I have tried a bunch of variations to no avail. What am I missing, here??
If you look in binning I think you'll find that the "warning" you see is not generated via warning() but with cat(), which is why tryCatch isn't picking it up. The author of binning probably deserves a few lashings with a wet noodle for this oversight. ;) (Or it could be on purpose due to the particular way that rattle works, I'm not sure.)
It appears to return NULL when this happens, so you could simply handle it manually. Not ideal, but possibly the only way to go.
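The sketch below illustrates both points without needing rattle. The function noisy_bin is a hypothetical stand-in that behaves the way the answer describes binning(): it prints its message with cat() and returns NULL. It is not rattle's actual code.

```r
# Hypothetical stand-in for binning(): cat()s a message and returns NULL
# when x has a single distinct value.
noisy_bin <- function(x) {
  if (length(unique(x)) < 2) {
    cat("Warning: the variable is not considered.\n")
    return(NULL)
  }
  structure(x, breaks = range(x))
}

default <- c(0, 0.3, 0.5, 1)
caught  <- FALSE

# The warning handler never runs, because cat() does not signal a condition.
result <- tryCatch(noisy_bin(rep(0.5, 6)),
                   warning = function(w) { caught <<- TRUE; NULL })

# Handle the NULL manually instead.
bins <- if (is.null(result)) default else sort(unname(attr(result, "breaks")))
```

In btsKmeansBin the equivalent would be a line such as if (is.null(result)) return(default) right after the binning() call. Note also that return(default) inside a tryCatch handler only returns from the handler, not from the enclosing function.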
