I am trying to extract two regions ex1 and ex2 (in exlist) from a list of rasters (rasterlist) using the apply family and extract from the raster package. I could use a nested for loop but was wondering if there is a way to achieve this in with one of the apply family members, since nested for loops are considered more or less bad practice in R. Here the dummy code:
library(raster)
ras1 <- raster(matrix(runif(20), nrow = 5, ncol = 5))
ras2 <- ras1 * 2
ras3 <- ras1 * 0.5
rasterlist <- list(ras1, ras2, ras3)
ex1 <- extent(0, 0.4, 0, 0.4)
ex2 <- extent(0.6, 1, 0.4, 1)
exlist <- list(ex1, ex2)
At the moment I've got this as a (rather unsatisfying) solution:
out1 <- lapply(rasterlist, function(i) extract(i, ex1))
out2 <- lapply(rasterlist, function(i) extract(i, ex2))
N.B. The solution it does not need to be a member of the apply family (although that was the task I set myself) if there is a better, faster, more elegant way please share.
You could start with combining the regions into a single SpatialPolygons object (maybe they are to begin with?). With your example data that can be done like this:
ex <- do.call(bind, sapply(exlist, function(x) as(x, 'SpatialPolygons')))
In this example (with RasterLayer objects that can be stacked) you can then do
s <- stack(rasterlist)
extract(s, ex)
Related
My problem is the following
I want to create variables e_1, e_2, e_3, ... , e_50 which are all composed of 100 draws from the uniform[-1,1]
This means e_1 is a vector of 100 draws from U[-1.1], e_2, .., e_50 as well.
Here is what I thought I could do :
periods <- c(1:50)
people <- c(1:100)
for (t in periods){
sprint('e_', t) <- runif(100, -1,1)
}
This did not work, and i am really not sure how to change it to obtain what I want.
Thank you so much for your help!!
It is better not to create objects in the global environment. Regarding the issue in the code, the assignment should be based on assign
for(t in periods) {
assign(sprintf('e_%d', t), runif(100, -1, 1))
}
An approach that wouldn't create multiple objects in the global env, would be to create a list with replicate
lst1 <- replicate(length(periods), runif(100, -1, 1), simplify = FALSE)
names(lst1) <- sprintf('e_%d', periods)
I try to create a function to remove multiple outliers via cooks distance from a list of data frames.
There are some problems at the moment:
Can I formulate part 1 as function? I tried several things that did not work out. I want to use several different variables for the lm - so it would be great if I could use colnumbers and the regular expression syntax of data frames as input argument.
Part 2 - the filename of the plots are not correct. It takes the first observation in each data frame from the list as filename. How can I correct this?
Part 3: data frames without the outliers are not created. Function comes to an end after the message is printed. I can't find my mistake.
data(iris)
iris.lst <- split(iris[, 1:2], iris$Species)
new_names <- c(paste0(unlist(levels(iris$Species)),"_data"))
for (i in 1:length(iris.lst)) {
assign(new_names[i], iris.lst[[i]])
}
# Part 1: Then cooks distances
fit <- lapply(mget(ls(pattern = "_data")),
function(x) lm(x[,1] ~ x[,3], data = x))
cooksd <-lapply(fit,cooks.distance)
# Part 2: Plot each data frame with suspected outlier
plots <- function(x){
jpeg(file=paste0(names(x),".jpeg")) # file names are numbers
#par(mfrow=c(2,1))
plot(x, pch="*", cex=2, main="Influential cases by Cooks distance") # plot cook's distance
abline(h = 3*mean(x, na.rm=T), col="red") # add cutoff line
text(x=1:length(x)+1, y=x, labels=ifelse(x > 3*mean(x, na.rm=T),
names(x),""), col="red")
dev.off()
}
myplots <- lapply(cooksd, plots)
# Part 3: give me new data frames without influential cases
show_influential_cases <- function(x){
# invisible(cooksd[["n_OG"]] <- lapply(cooksd, length)
influential <- lapply(x,function(x) names(x)[x > 3*mean(x, na.rm=T)])
test <- as.data.frame(unlist(influential))[,1]
test <- as.numeric(test)
}
tested <- show_influential_cases(result)
cleaned_data <- add_new[-tested,] # removing outliers by indexing
Could someone please help me to improve my code?
Many thanks,
Nadine
In general, it is not a good practice to create multiple dataframes in global environment. Lists always are a better option, they are easy to manage.
Part 1 -
You can combine multiple steps in one lapply function. Here in part 1 we apply lm and cooks.distance function together in the same lapply call.
master_data <- split(iris[, 1:2], iris$Species)
data <- lapply(master_data, function(x) {
cooks.distance(lm(Sepal.Length ~ Sepal.Width, data = x))
})
new_names <- paste0(levels(iris$Species),"_data")
names(data) <- new_names
Part 2 -
lapply does not have access to names of the list, pass them separately and use Map to call plots function.
plots <- function(x, y){
jpeg(file=paste0(y,".jpeg"))
plot(x, pch="*", cex=2, main="Influential cases by Cooks distance")
abline(h = 3*mean(x, na.rm=T), col="red") # add cutoff line
text(x=1:length(x)+1,y=x,labels=ifelse(x > 3*mean(x, na.rm=T),y,""), col="red")
dev.off()
}
Map(plots, data, names(data))
Part 3 -
I am not exactly clear about how you want to perform Part3 but for now I am showing outlier and data separately.
remove_influential_cases <- function(x, y){
inds <- x > 3*mean(x, na.rm=TRUE)
y[!inds, ]
}
result <- Map(remove_influential_cases, data, master_data)
I am prototyping an application in R. I'm using the parallel library and parApply to run a function on columns of a data frame. I understand this will also be applicable to non-parallel/Apply application as well. I have a line similar to:
myBigList <- parApply(myCluster, myInputData, 2, myFunction)
where myFunction is a one that I have written, takes a vector as an input. The function itself performs quite a few operations that I can't go in to. It returns a list of variables of various classes. For the purposes of a MWE, say:
myFunction <- function(vectorIn){
# CODE GOES HERE
return(list(
mean = mean(vectorIn),
sd = mean(vectorIn),
vectorOut = sumUserFunction(vectorIn),
plot1 = aPlotGeneratingFunction(vectorIn),
))
What is returned to me is a list containing the results from the function. I can address elements from the list, eg:
myBigList$Column1$mean
But that isnt really helpful for my purposes. I'd like to know how to unpack the list so that I can look at all the mean values. eg:
listOfMeans <- myBigList$*ALL_ITEMS*$mean
so that listOfMeans is a vector with row.names, or data.frame with col.names.
Is this possible? I can think of a solution using a for loop but that doesnt seem very elegant.
I'd also like to do something similiar with the plots that I return so that I can automatically build a pdf containing all of them. I'm guessing learning the above will help.
tl;dr: What is the best methods of extracting common data names from a list?
EDIT: An actual MWE
library('ggplot2')
exampleData <- data.frame(Col1 = rnorm(100), Col2 = rnorm(100), Col3 = rnorm(100))
myFunction <- function(xIn){
meanX <- mean(xIn)
sdX <- sd(xIn)
vecX <- xIn^2 + xIn
plotX <-
ggplot(data.frame(xIn, vecX), aes(x = xIn, y = vecX)) +
geom_point()
return(list(
mean = meanX,
sd = sdX,
vect = vecX,
plot = plotX
))
}
myBigList <- apply(exampleData,
2,
myFunction)
from #docendo discusimus comment
mymeans <- sapply(myBigList, '[[', 'mean')
returns a vector of all the values stores in mean. To return a list, which is useful for storing the plot class the command should be:
myplots <- lapply(myBigList, '[[', 'plot')
Let's say I generate 9 groups of data in a list data and plot them each with a for loop. I could use *apply here too, whichever you prefer.
data = list()
layout(mat = matrix(1:9, nrow = 3))
for(i in 1:9){
data[[i]] = rnorm(n = 100, mean = i, sd = 1)
plot(data[[i]])
}
After creating all the data, I want to decide which one is best:
best_data = which.min(sapply(data, sd))
Now I want to highlight that best data on the plot to distinguish it. Is there a plotting function that lets me go back to a specified sub-plot in the active device and add an element (maybe a title)?
I know I could make a second for loop: for loop 1 generates the data, then I assess which is best, then for loop 2 creates the plots, but this seems less efficient and more verbose.
Does such a plotting function exist for base R graphics?
#rawr's answer is simple and easy. But I thought I'd point out another option that allows you to select the "best" data set before you plot, in case you want more flexibility to plot the "best" data set differently from the rest.
For example:
# Create the data
data = lapply(1:9, function(i) rnorm(n = 100, mean = i, sd = 1))
par(mar=c(4,4,1,1))
layout(mat = matrix(1:9, nrow = 3))
rng = range(data)
# Plot each data frame
lapply(1:9, function(i) {
# Select data frame with lowest SD
best = which.min(sapply(data, sd))
# Highlight data frame with lowest SD by coloring points red
plot(data[[i]], col=ifelse(best==i,"red","black"), pch=ifelse(best==i, 3, 1), ylim=rng)
})
i have a raster stack of 7 rasters with quite varying data ranges and not all of the rasters adhere to quite the same range. (some are low value ranges, some much higher). Using the levelplot function with the stack, it plots nicely enough, eg:
r <- raster(ncol=10,nrow=10)
r[] <- sample(c(1:3),size=100,replace=T)
r1 <- raster(ncol=10,nrow=10)
r1[] <- sample(c(1:9),size=100,replace=T)
r2 <- raster(ncol=10,nrow=10)
r2[] <- sample(c(5:15),size=100,replace=T)
r3 <- raster(ncol=10,nrow=10)
r3[] <- sample(c(3:35),size=100,replace=T)
s <- stack(r,r1,r2,r3)
breaks <- 7
my.at <- round(seq(min(minValue(s)), max(maxValue(s)), length.out = breaks),digits=2)
myColorkey <- list(at=my.at,height=0.95, width=1, labels=list(at=my.at,cex=1.1))
cols <- (brewer.pal(length(my.at)-1, "YlGnBu"))
levelplot(s,at=my.at,col.regions=cols,colorkey = myColorkey)
As you can see, the images with the lower value data are one colour (Actually in my real data most of the plots are one colour as the data range is dominated by two latter rasters). Using the levelplot function, i would like to reclassify the entire raster stack, teasing out some patterns in the lower value rasters with some classes that i define and simply assign anything over value x (perhaps 10 in the sample data above) to be one colour.
the usual method of ratifying and setting levels will not work for a stack and any workaround i have tried (using a matrix and reclassify) will not force more levels than there are classes for a raster
this is my workaround, using the standard legend, but i'd like to use ratify etc if possible;
# using s from above
m <- c(0,1,1, 1,3,2, 3,6,3, 6,10,4, 10,35,5)
mat <- matrix(m, ncol=3, byrow=TRUE)
src <- reclassify(s, mat)
breaks <- nrow(mat)
my.at <- (0:breaks)
myColorkey <- list(at=my.at,height=0.95, width=1, labels=list(at=my.at+0.5,labels=c("0-1","1-3","3-6","6-10","10-35"),cex=1.1))
cols <- (brewer.pal(length(my.at)-1, "YlGnBu"))
levelplot(src,at=my.at,col.regions=cols,colorkey = myColorkey)