I use the rasterize function from the raster package quite often. As indicated in its documentation, any custom function being used through the fun argument needs to accept an na.rm argument. This generally means that custom functions are written with the 'dots', i.e.:
funA <- function(x,...)length(x)
However, a second proposed approach is to write a custom function with an explicit na.rm argument. The example that is given in the documentation is:
funB <- function(x, na.rm) if (na.rm) length(na.omit(x))
However, this does not seem to work. The following example, in which some random points are distributed across a grid, fails:
library(raster)  # also attaches sp, which provides spsample()
# Create a grid
grid <- raster(ncols=36, nrows=18)
# Scatter some random points within the grid
pts <- spsample(as(extent(grid), "SpatialPolygons"), 100, type = "random")
# Give them a random data field
pts <- SpatialPointsDataFrame(pts, data.frame(field1 = runif(length(pts))))
# Try rasterize
rasterize(pts, grid, field = "field1", fun = funB)
Is there something I'm missing here?
Thanks!
Andrew
You were close.
Function B should look like:
funB <- function(x, na.rm = TRUE) if (na.rm) length(na.omit(x))
rasterize(pts, grid, field = "field1", fun = funB)
The na.rm argument has to be TRUE or FALSE; adding a default value deals with the problem.
What still annoys me is that this:
funB <- function(x, na.rm) if (na.rm) length(na.omit(x))
rasterize(pts, grid, field = "field1", fun = funB, na.rm=TRUE)
should work, but it doesn't. It may be something in the raster package.
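The effect of the default value can be seen without raster at all. The sketch below assumes (this is not confirmed by the raster documentation) that the function is at some point called without na.rm being supplied; the names funNoDefault and funDefault are hypothetical, and an else branch is added so both functions always return a value.

```r
# Hypothetical variants of funB, with and without a default for na.rm
funNoDefault <- function(x, na.rm) if (na.rm) length(na.omit(x)) else length(x)
funDefault <- function(x, na.rm = TRUE) if (na.rm) length(na.omit(x)) else length(x)

x <- c(1, NA, 3)

# Calling without na.rm fails when there is no default to fall back on
res <- tryCatch(funNoDefault(x), error = function(e) conditionMessage(e))
print(res)

# With a default, the same call succeeds and counts the non-NA values
print(funDefault(x))
```

If a caller omits na.rm, only the version with a default can evaluate `if (na.rm)`.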
I have been trying to make a scatter plot whose colour intensity indicates the density of points plotted in each area (it's a big data set with lots of overlap). I found these lines of code, which allow me to do this, but I want to make sure I understand what each line is actually doing.
Thanks in advance :)
get_density <- function(x, y, ...){
dens <- MASS::kde2d(x, y, ...)
ix <- findInterval(x, dens$x)
iy <- findInterval(y, dens$y)
ii <- cbind(ix, iy)
return(dens$z[ii])
}
set.seed(1)
dat <- data.frame(x = subset2$conservation.phyloP, y = subset2$gene.expression.RPKM)
dat$density <- get_density(dat$x, dat$y, n = 100)
Below is the function with some explanatory comments. Let me know if anything is still confusing:
# The function "get_density" takes two arguments, called x and y
# The "..." allows you to pass other arguments
get_density <- function(x, y, ...){
# The "MASS::" means it comes from the MASS package, but makes it so you don't have to load the whole MASS package and can just pull out this one function to use.
# This is where the arguments passed as "..." (above) would get passed along to the kde2d function
dens <- MASS::kde2d(x, y, ...)
# These lines use the base R function "findInterval" to find, for each x and y value, the index of the kde2d grid interval it falls into
ix <- findInterval(x, dens$x)
iy <- findInterval(y, dens$y)
# This command "cbind" pastes the two sets of values together, each as one column
ii <- cbind(ix, iy)
# This line uses the two-column matrix to index the density matrix "dens$z": each (row, column) pair picks out the density of the grid cell containing that point
return(dens$z[ii])
}
# The "set.seed()" function makes sure that any randomness used by a function is the same if it is re-run (as long as the same number is used), so it makes code more reproducible
set.seed(1)
dat <- data.frame(x = subset2$conservation.phyloP, y = subset2$gene.expression.RPKM)
dat$density <- get_density(dat$x, dat$y, n = 100)
If your question is about the MASS::kde2d function itself, it might be better to rewrite this StackOverflow question to reflect that!
It looks like the same function is wrapped into a ggplot2 method described here, so if you switch to making your plot with ggplot2 you could give it a try.
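To make the findInterval and cbind steps concrete, here is a small sketch on a made-up grid. The values of grid_x, grid_y and z are invented for illustration and stand in for dens$x, dens$y and dens$z.

```r
# Hypothetical grid breakpoints and a 4 x 3 matrix of "densities"
grid_x <- c(0, 1, 2, 3)
grid_y <- c(0, 10, 20)
z <- matrix(1:12, nrow = 4, ncol = 3)

# Two points to look up
x <- c(0.5, 2.2)
y <- c(12, 3)

# Index of the grid interval each coordinate falls into
ix <- findInterval(x, grid_x)
iy <- findInterval(y, grid_y)

# A two-column matrix used as an index selects one cell per row:
# here z[1, 2] and z[3, 1]
ii <- cbind(ix, iy)
print(z[ii])
```

Indexing with a two-column matrix returns one value per point, which is why get_density gives back a vector as long as x.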
I'm wondering why when I run: iris[complete.cases(iris), ] it works perfectly fine. But when I do the same thing from the function below, it gives me the error: colMeans(x, na.rm = TRUE) : 'x' must be numeric?
p.s. scale() works well with data.frames ==> scale(mtcars).
Can this be fixed?
Here is the function:
standard <- function(data, scale = TRUE, center = TRUE, na.rm = TRUE){
data <- if(na.rm) data[complete.cases(data), ]
data[paste0(names(data), ".s")] <- scale(data, center = center, scale = scale)
return(data)
}
# EXAMPLE:
standard(iris)
EDIT:
Yes, the error is thrown by scale(), and not earlier. If you want to scale all the numeric columns and leave the other columns as is, you'll need to add a step that extracts the numeric columns, scales them, and then puts them back in. Incidentally, scale can handle NA values, so you can put the complete.cases() call after the scale.
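A minimal sketch of that approach, assuming the goal is to append scaled copies of the numeric columns with a ".s" suffix and leave columns such as Species untouched. The intermediate names num and scaled are illustrative only.

```r
standard <- function(data, scale = TRUE, center = TRUE, na.rm = TRUE) {
  # Flag the numeric columns; everything else is left alone
  num <- vapply(data, is.numeric, logical(1))
  # scale() returns a matrix, so convert back before assigning
  scaled <- scale(data[num], center = center, scale = scale)
  data[paste0(names(data)[num], ".s")] <- as.data.frame(scaled)
  # Drop incomplete rows after scaling, only if requested
  if (na.rm) data <- data[complete.cases(data), , drop = FALSE]
  data
}

out <- standard(iris)
print(names(out))
```

Unlike the original, this version also returns the data unchanged in length when na.rm = FALSE, instead of assigning NULL.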
Original Answer:
You can step through this by adding a call to browser() inside your function, but I suspect you'll find the error is thrown here:
scale(data, center = center, scale = scale)
Note from the documentation on scale()
Arguments
x a numeric matrix(like object).
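As a quick check that scale() is where the error comes from, the sketch below calls it directly on iris (which contains the factor column Species) and then on the four numeric columns only:

```r
# The factor column makes scale() fail on the full data frame
msg <- tryCatch(scale(iris), error = function(e) conditionMessage(e))
print(msg)

# Restricting to the numeric columns works
ok <- scale(iris[1:4])
print(dim(ok))
```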
Here's how you'd debug this. Change your function to this:
standard <- function(data, scale = TRUE, center = TRUE, na.rm = TRUE){
browser()
data <- if(na.rm) data[complete.cases(data), ]
data[paste0(names(data), ".s")] <- scale(data, center = center, scale = scale)
return(data)
}
Then try calling it with standard(iris).
It will open a browser for you to step through each statement in the function. If you do this in RStudio you can see the environment changes in the Environment tab in the upper right window. Use the command help to see how to navigate the browser, but in general, you'll use n and/or s to step through each statement. Q gets you out of the browser, and removing the browser() call from your function lets you run it as you would usually.
I am prototyping an application in R. I'm using the parallel library and parApply to run a function on the columns of a data frame. I understand this question also applies to the non-parallel apply functions. I have a line similar to:
myBigList <- parApply(myCluster, myInputData, 2, myFunction)
where myFunction is one that I have written and takes a vector as input. The function itself performs quite a few operations that I can't go into. It returns a list of variables of various classes. For the purposes of a MWE, say:
myFunction <- function(vectorIn){
# CODE GOES HERE
# sumUserFunction and aPlotGeneratingFunction are placeholders
return(list(
mean = mean(vectorIn),
sd = sd(vectorIn),
vectorOut = sumUserFunction(vectorIn),
plot1 = aPlotGeneratingFunction(vectorIn)
))
}
What is returned to me is a list containing the results from the function. I can address elements of the list, e.g.:
myBigList$Column1$mean
But that isn't really helpful for my purposes. I'd like to know how to unpack the list so that I can look at all the mean values, e.g.:
listOfMeans <- myBigList$*ALL_ITEMS*$mean
so that listOfMeans is a vector with row.names, or data.frame with col.names.
Is this possible? I can think of a solution using a for loop, but that doesn't seem very elegant.
I'd also like to do something similar with the plots that I return so that I can automatically build a PDF containing all of them. I'm guessing that learning the above will help.
tl;dr: What is the best method of extracting commonly named elements from a list of lists?
EDIT: An actual MWE
library('ggplot2')
exampleData <- data.frame(Col1 = rnorm(100), Col2 = rnorm(100), Col3 = rnorm(100))
myFunction <- function(xIn){
meanX <- mean(xIn)
sdX <- sd(xIn)
vecX <- xIn^2 + xIn
plotX <-
ggplot(data.frame(xIn, vecX), aes(x = xIn, y = vecX)) +
geom_point()
return(list(
mean = meanX,
sd = sdX,
vect = vecX,
plot = plotX
))
}
myBigList <- apply(exampleData,
2,
myFunction)
From @docendo discimus's comment:
mymeans <- sapply(myBigList, '[[', 'mean')
returns a vector of all the values stored in mean. To return a list instead, which is what you need for storing the plot objects, the command should be:
myplots <- lapply(myBigList, '[[', 'plot')
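The same extraction can be used to build the summary data frame asked for in the question. The sketch below is self-contained and assumes a simplified stand-in for myBigList (called results here, without the plot element); vapply is used as a stricter alternative to sapply.

```r
set.seed(42)
exampleData <- data.frame(Col1 = rnorm(100), Col2 = rnorm(100), Col3 = rnorm(100))

# Hypothetical stand-in for myBigList: one named list per column
results <- lapply(exampleData, function(x) {
  list(mean = mean(x), sd = sd(x), vect = x^2 + x)
})

# vapply errors unless every extracted element is a single numeric value
means <- vapply(results, `[[`, numeric(1), "mean")
sds <- vapply(results, `[[`, numeric(1), "sd")

# One row per input column, one column per statistic
summaryTable <- data.frame(mean = means, sd = sds)
print(summaryTable)
```

For the plots, one common pattern is to open a device with pdf(), print each element of myplots inside a loop or lapply, and then call dev.off(), which puts one plot on each page.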
I want to plot two different data sets in a scatterplot matrix.
I know that I can use upper.panel and lower.panel to use a different plot function in each half. However, I haven't succeeded in putting my data into a suitable format to make use of this.
Assume I have two tissues (“brain” and “heart”) and four conditions (1–4). Now I can use e.g. pairs(data$heart) to get a scatterplot matrix for one of the data sets. Assume I have the following data:
conditions <- 1 : 4
noise <- rnorm(100)
data <- list(brain = sapply(conditions, function (x) noise + 0.1 * rnorm(100)),
heart = sapply(conditions, function (x) noise + 0.3 * rnorm(100)))
How do I get this into a format so that pairs(data, …) plots one data set above and one below the diagonal, as shown here (green = brain, violet = heart):
Just using
pairs(data, upper.panel = something, lower.panel = somethingElse)
doesn’t work because that will plot all conditions versus all conditions without regard for the different tissues. It essentially ignores the list, and the same happens when reordering the hierarchy (i.e. having data = (A=list(brain=…, heart=…), B=list(brain=…, heart=…), …)).
This is the best I seem to be able to do via passing arguments (dat is built as in the first attempt further down):
foo.upper <- function(x,y,ind.upper,col.upper,ind.lower,col.lower,...){
points(x[ind.upper],y[ind.upper],col = col.upper,...)
}
foo.lower <- function(x,y,ind.lower,col.lower,ind.upper,col.upper,...){
points(x[ind.lower],y[ind.lower],col = col.lower,...)
}
pairs(dat[,-5],
lower.panel = foo.lower,
upper.panel = foo.upper,
ind.upper = dat$type == 'brain',
ind.lower = dat$type == 'heart',
col.upper = 'blue',
col.lower = 'red')
Note that each panel needs all arguments. ... is a cruel mistress. If you include only the panel specific arguments in each function, it appears to work, but you get lots and lots of warnings from R trying to pass these arguments on to regular plotting functions and obviously they won't exist.
This was my quick first attempt, but it seems ugly:
dat <- as.data.frame(do.call(rbind,data))
dat$type <- rep(c('brain','heart'),each = 100)
foo.upper <- function(x,y,...){
points(x[dat$type == 'brain'],y[dat$type == 'brain'],col = 'red',...)
}
foo.lower <- function(x,y,...){
points(x[dat$type == 'heart'],y[dat$type == 'heart'],col = 'blue',...)
}
pairs(dat[,-5],lower.panel = foo.lower,upper.panel = foo.upper)
I'm abusing R's scoping in this second version in a somewhat ugly way. (Of course, you could probably do this more cleanly in lattice, but you probably knew that.)
The only other option I can think of is to design your own scatter plot matrix using layout, but that's probably quite a bit of work.
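One further variation, sketched here under the assumption that the row order of the stacked data is known in advance: a small factory function (make_panel, a hypothetical name) can capture the row selection and colour in a closure, which avoids both the global lookup and the extra arguments passed through pairs.

```r
set.seed(1)
conditions <- 1:4
noise <- rnorm(100)
data <- list(brain = sapply(conditions, function(i) noise + 0.1 * rnorm(100)),
             heart = sapply(conditions, function(i) noise + 0.3 * rnorm(100)))

# Stack both tissues and keep the tissue label in a separate vector
dat <- as.data.frame(do.call(rbind, data))
type <- rep(names(data), each = 100)

# Returns a panel function that plots only the selected rows
make_panel <- function(keep, col) {
  force(keep)
  force(col)
  function(x, y, ...) points(x[keep], y[keep], col = col, ...)
}

pairs(dat,
      upper.panel = make_panel(type == "brain", "green"),
      lower.panel = make_panel(type == "heart", "violet"))
```

Because the selection lives inside each panel function, nothing extra has to travel through the ... of pairs.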
Lattice Edit
Here's at least a start on a lattice solution. It should handle varying x,y axis ranges better, but I haven't tested that.
library(lattice)
dat <- do.call(rbind,data)
dat <- as.data.frame(dat)
dat$grp <- rep(letters[1:2],each = 100)
plower <- function(x,y,grp,...){
panel.xyplot(x[grp == 'a'],y[grp == 'a'],col = 'red',...)
}
pupper <- function(x,y,grp,...){
panel.xyplot(x[grp == 'b'],y[grp == 'b'],...)
}
splom(~dat[,1:4],
data = dat,
lower.panel = plower,
upper.panel = pupper,
grp = dat$grp)
I would like to write a function that takes any user-provided mathematical function (e.g., x^2) and does different things with it, for example:
#-----------------nonworking code---------------------
foo <- function(FUN, var){
math_fun <- function(x){
FUN
}
curve(math_fun, -5, 5) #plot the mathematical function
y = math_fun(var) #compute the function based on a user provided x value.
points(x=var, y=y) #plot the value from the last step.
}
#A user can use the function defined above in a way as shown below:
Function <- x^2 + x
foo(FUN=Function, var = 2)
But obviously this function doesn't work:
First of all, if I run this function, I get Error in math_fun(x) : object 'x' not found.
Second of all, even if the function did work, I am assuming that the variable is x, but the user can make use of any letter.
For this second problem, one potential solution is to also ask the user to specify the letter they use as the variable.
foo <- function(FUN, var, variable){
math_fun <- function(variable){
FUN
}
curve(math_fun, -5, 5)
y = math_fun(var)
points(x=var, y=y)
}
But I am at a loss as to how exactly I can implement this... If someone can help me solve at least part of the problem, that would be great. Thanks!
It is a lot simpler than that. The user defined function should contain the arguments in its definition, e.g. function(x) x^2 + x instead of x^2 + x. Then it can be passed and called directly:
foo <- function(math_fun, var){
curve(math_fun, -5, 5) #plot the mathematical function
y = math_fun(var) #compute the function based on a user provided x value
points(x=var, y=y) #plot the value from the last step.
}
#A user can use the function defined above in a way as shown below:
Function <- function(x) x^2 + x
foo(Function, var = 2)
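Passing a function object also takes care of the second concern, since the name of the argument no longer matters. The sketch below is a slight variation on the answer, not a replacement for it: the from and to parameters and the match.fun check are additions made for illustration.

```r
foo <- function(math_fun, var, from = -5, to = 5) {
  # Fail early with a clear error if something other than a function is passed
  math_fun <- match.fun(math_fun)
  # curve() evaluates the expression over a grid of x values
  curve(math_fun(x), from, to)
  y <- math_fun(var)
  points(x = var, y = y)
  invisible(y)
}

# The user's function may name its argument anything, here t
y <- foo(function(t) t^2 + t, var = 2)
print(y)
```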