get an empty SpatialPolygonsDataFrame via subset?

I'm looking to subset a SpatialPolygonsDataFrame by an attribute, but I want to allow it to return an empty SpatialPolygonsDataFrame.
If we are to treat objects of type SpatialPolygonsDataFrame like data.frames, as discussed here, we should be able to get and work with empty objects.
I'm interested because I want to incorporate this into a function that may try to subset by an attribute that will essentially pick no features.
owd <- getwd()
setwd(system.file("shapes", package = "maptools"))
library(maptools)
nc90 <- readShapeSpatial("co37_d90")
setwd(owd)
nc90@data[nc90@data$AREA > 0.15, ]           # returns a data.frame
bigctys <- nc90[nc90@data$AREA > 0.15, ]     # SpatialPolygonsDataFrame
nc90@data[nc90@data$AREA > 0.25, ]           # returns an empty data.frame
bigestctys <- nc90[nc90@data$AREA > 0.25, ]  # ERROR
Is there a way to make this work? If not, is there a way to initialize an empty SpatialPolygonsDataFrame object? What I want to do with such an object later is overplot it on an existing map, so I'd like the image to be produced anyway, even if blank.

Right now you can't. This is somewhat inconsistent, as for SpatialPointsDataFrame objects you can:
library(sp)
demo(meuse, ask = FALSE)
x = meuse[FALSE, ]
This works, although with warnings; moreover, validObject(x) reports the object as invalid, so such objects are evidently not meant to be allowed!
It's a bit abstract what such objects should represent, but I can see the analogy with data.frame objects with zero rows: it is useful that they can exist.
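Until empty subsets are supported, one way to keep a function working is to test for an empty selection before subsetting and skip the overplotting step. The sketch below uses a hypothetical helper, subset_or_null(), and a plain data.frame as a stand-in for nc90@data so it runs without sp; it relies only on the same x[i, ] call the question uses on the SpatialPolygonsDataFrame.

```r
# Hypothetical helper: subset any object that supports x[i, ]
# (a data.frame here; a Spatial*DataFrame in the question) and return
# NULL instead of attempting a zero-row subset.
subset_or_null <- function(x, keep) {
  keep <- which(keep)           # drops FALSE and NA entries
  if (length(keep) == 0L) {
    return(NULL)                # caller checks with is.null()
  }
  x[keep, ]
}

# Made-up stand-in for nc90@data
ctys <- data.frame(NAME = c("A", "B", "C"), AREA = c(0.10, 0.20, 0.12),
                   stringsAsFactors = FALSE)

big     <- subset_or_null(ctys, ctys$AREA > 0.15)
biggest <- subset_or_null(ctys, ctys$AREA > 0.25)

# The base map would be drawn unconditionally; the overplot only if non-empty.
if (!is.null(biggest)) {
  print(biggest)  # assumed overplotting step, e.g. plot(biggest, add = TRUE)
}
```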

Related

Retrieve Seurat object name in R during a for loop

I'm working on single-cell RNA-seq data in Seurat and I'm trying to write a for() loop over Seurat objects to draw several heatmaps of average gene expression.
for(i in c(seuratobject1, seuratobject2, seuratobject3)){
cluster.averages <- data.frame(AverageExpression(i, features = genelist))
cluster.averages$rowmeans <- rowMeans(cluster.averages)
genelist.new <- as.list(rownames(cluster.averages))
cluster.averages <- cluster.averages[order(cluster.averages$rowmeans),]
HMP.ordered <- DoHeatmap(i, features = genelist.new, size = 3, draw.lines = T)
ggsave(HMP.ordered, file=paste0(i, ".HMP.ordered.png"), width=7, height=30)
}
The ggsave line does not work because i holds the Seurat object itself, not its name. Hence my question: how do I get ggsave() to use the name of the Seurat object stored in i?
I tried substitute(i) and deparse(substitute(i)) without success.

Short answer: you can’t.
Long answer: using substitute or similar to try to get i’s name will give you … i. (This is different for function arguments, where substitute(arg) gives you the call’s argument expression.)
You need to use a named vector instead. Ideally you’d have your Seurat objects inside a list to begin with. But if you only have their names, you can look the objects up on the fly with get:
names = c('seuratobject1', 'seuratobject2', 'seuratobject3')
for(i in names) {
cluster.averages <- data.frame(AverageExpression(get(i), features = genelist))
# … rest is identical, except that the object itself is now get(i) …
}
That said, I generally advocate strongly against the use of get and for treating the local environment as a data structure. Lists and vectors are designed to be used in this situation instead.
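Following that advice, here is a sketch with a named list instead of get(): base R's mget() collects existing objects into a list named after them, and the loop then iterates over names(objs). Integer vectors stand in for the Seurat objects, and the DoHeatmap()/ggsave() calls are left as comments, so the object names and file pattern are illustrative only.

```r
# Stand-ins for the Seurat objects (any R objects behave the same way)
seuratobject1 <- 1:3
seuratobject2 <- 4:6
seuratobject3 <- 7:9

# Build the named list once; the names travel with the objects
objs <- mget(c("seuratobject1", "seuratobject2", "seuratobject3"))

out_files <- character(0)
for (nm in names(objs)) {
  obj <- objs[[nm]]  # the object itself, e.g. DoHeatmap(obj, ...)
  file <- paste0(nm, ".HMP.ordered.png")  # the name, e.g. for ggsave(file = file)
  out_files <- c(out_files, file)
}
```

Writing list(seuratobject1 = seuratobject1, ...) by hand gives the same named list; mget() just saves the repetition.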

Why does dput()/dput2() not work with Polygons / SpatialPolygons

I would like to ask another question that involves SpatialPolygons. To make it reproducible I wanted to use dput() for the SpatialPolygons object, but it doesn't output a reproducible structure.
Why can I use dput() with SpatialPoints, but not with Lines/SpatialLines, Polygons/SpatialPolygons?
Is the only workaround to export the coordinates and recreate the SpatialPolygons in the example?
Test Data:
library(sp)
df = data.frame(lon=runif(10, 15,19), lat=runif(10,40,45))
dput(SpatialPoints(coordinates(df)))
dput(Lines(list(Line(coordinates(df))), 1))
dput(SpatialLines(list(Lines(list(Line(coordinates(df))), 1))))
dput(Polygons(list(Polygon(df)), 1))
dput(SpatialPolygons(list(Polygons(list(Polygon(df)), 1))))
dput(SpatialPolygons(list(Polygons(list(Polygon(df)), 1))), control="all")
The dput2() method from this answer works for Lines/SpatialLines but not for Polygons/SpatialPolygons, where this error occurs:
Error in validityMethod(object) : object 'Polygons_validate_c' not found
So how to make a SpatialPolygons-object reproducible?
A workaround would be to convert the objects to simple features (sf) and then use dput(); sf objects can be deparsed without problems.
Example using LINESTRING and POLYGON:
library(sp)
library(sf)
df = data.frame(lon=runif(10, 15,19), lat=runif(10,40,45))
SLi = SpatialLines(list(Lines(list(Line(coordinates(df))), 1)))
SPo = SpatialPolygons(list(Polygons(list(Polygon(df)), 1)))
dput(st_as_sf(SLi))
dput(st_as_sf(SPo))

After running the code I mentioned in the comments, I decided I would offer a tentative solution and see if you a) have the same results on your system, and b) whether it addressed the issues you were having.
newSpPa <- dput(SpatialPolygons(list(Polygons(list(Polygon(df)), 1))), control="all")
oldSpPa <- SpatialPolygons(list(Polygons(list(Polygon(df)), 1)))
identical(oldSpPa, newSpPa)
#[1] TRUE
It wasn't clear from my reading of your question whether the return of a call to new("SpatialPolygons", ...) was deemed unsatisfactory. I think my assignment step differs from your code, and it's possible that my assignment would only succeed if previously defined objects are present in the workspace at the time of creation. If that's the case, then I think the typical suggestion would be to do this in the setting of package creation.

Fill an object with the results of every loop iteration in R

First-time asker here. I have recently started working with R and I hope I can get some help with an issue. The problem is probably easy to solve, but I haven't been able to find an answer by myself and my research hasn't been successful either.
Basically I need to create a single object based on the input of a loop. I have 7 simulated asset returns; these objects contain the results from a simulation I ran. I want to combine the corresponding columns from every object into one (i.e. the first column of every object forms one object), which will be used for some calculations.
Finally, the result from each iteration should be stored in a single object that has to be available outside the loop for further analysis.
I have created the following loop; the problem is that only the result from the last iteration is being written to the final object.
# Initial xts object definition
iteration_returns_combined <- iteration_returns_draft_1
for (i in 2:10){
# Compose object by extracting the i-th column of every simulated series
matrix_daily_return_iteration <- cbind(xts_simulated_return_asset_1[,i],
xts_simulated_return_asset_2[,i],
xts_simulated_return_asset_3[,i],
xts_simulated_return_asset_4[,i],
xts_simulated_return_asset_5[,i],
xts_simulated_return_asset_6[,i],
xts_simulated_return_asset_7[,i])
# Transform the matrix to an xts object
daily_return_iteration_xts <- as.xts(matrix_daily_return_iteration,
order.by = index(optimization_returns))
# Calculate the daily portfolio returns using the iteration return object
iteration_returns <- Return.portfolio(daily_return_iteration_xts,
extractWeights(portfolio_optimization))
# Create a combined object for each iteration of portfolio return
# This is the object that is needed in the end
iteration_returns_combined <<- cbind(iteration_returns_draft_combined,
iteration_returns_draft)
}
iteration_returns_combined_after_loop_view
Could somebody please help me fix this issue? I would be extremely grateful for any information anyone can provide.
Thanks,
R-Rookie

By looking at the code, I surmise that the error is in the last line of your for loop. It binds
iteration_returns_draft_combined
and iteration_returns_draft, neither of which changes inside the loop, so every iteration overwrites iteration_returns_combined with the same result and nothing accumulates. (If iteration_returns_draft_combined did not exist at all, R would stop with an "object not found" error rather than treat it as NULL.) Bind each new result onto the accumulating object instead; a plain <- is enough, because a for loop does not create a new scope:
iteration_returns_combined <- cbind(iteration_returns_combined,
                                    iteration_returns)
This should work, hopefully!
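A toy version of that fix, with a hypothetical iteration_result() standing in for the Return.portfolio() step, shows the accumulator growing by one column per iteration:

```r
# Hypothetical stand-in: each iteration returns one named column of results
iteration_result <- function(i) {
  matrix(i * 1:3, ncol = 1, dimnames = list(NULL, paste0("run", i)))
}

combined <- iteration_result(1)     # initial object, like iteration_returns_draft_1
for (i in 2:4) {
  res <- iteration_result(i)
  combined <- cbind(combined, res)  # bind onto the accumulator itself
}
```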

Consider sapply and avoid expanding an object within a loop:
iteration_returns_combined <- sapply(2:10, function(i) {
# Compose object by extracting the i-th column of every simulated series
matrix_daily_return_iteration <- cbind(xts_simulated_return_asset_1[,i],
xts_simulated_return_asset_2[,i],
xts_simulated_return_asset_3[,i],
xts_simulated_return_asset_4[,i],
xts_simulated_return_asset_5[,i],
xts_simulated_return_asset_6[,i],
xts_simulated_return_asset_7[,i])
# Transform the matrix to an xts object
daily_return_iteration_xts <- as.xts(matrix_daily_return_iteration,
order.by = index(optimization_returns))
# Calculate the daily portfolio returns using the iteration return object
iteration_returns <- Return.portfolio(daily_return_iteration_xts,
extractWeights(portfolio_optimization))
})
If you also need to column-bind the initial object, do so afterwards:
# CBIND INITIAL RUN
iteration_returns_combined <- cbind(iteration_returns_draft_1, iteration_returns_combined)
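One caveat, stated as an assumption about the xts case: sapply() tries to simplify the per-iteration results into a plain matrix, which drops the xts time index. lapply() keeps each result as is, and a single do.call(cbind, ...) after the loop binds them; with xts elements, cbind() should dispatch to the xts method. The sketch uses plain one-column matrices from a hypothetical iteration_result() so it runs without xts or PerformanceAnalytics.

```r
# Hypothetical stand-in for one iteration's portfolio returns
iteration_result <- function(i) {
  matrix(i * 1:3, ncol = 1, dimnames = list(NULL, paste0("run", i)))
}

results  <- lapply(2:10, iteration_result)         # list of 9 one-column matrices
combined <- do.call(cbind, results)                # bind once, after the loop
combined <- cbind(iteration_result(1), combined)   # prepend the initial run if needed
```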

Model R code for an S3 class structured as a data frame with columns identified by attribute(s)

I am venturing into the world of creating an R S3 class for the first time. My basic object is just going to be a data frame or tibble with certain columns identified by the user, so that my class-specific functions know how to find what they need. The constructor is also going to add a few columns computed from the others, and impose an ordering based on particular columns and parameter values.
I am guessing that there is canonical code for this, but I am not sure where to find it. My thought was just to have a series of attributes that each contain the column name(s) of the appropriate column(s), but it would be nice if I could supply the alternative of names or numbers. I don't need fancy name creation features because I am starting with data frames that should already have them, but I do need to be able to access each column in my object by either its actual name or its attribute name.
I am not at all confident that I have the basic idea of how to do this down properly. For example, I am unsure if there is any advantage to having each column name or group of names be its own attribute, as opposed to having one attribute consisting of a list of named character vectors of column names. I am a little fuzzy on the structure R imposes on multiple attributes, actually. But I am hoping to make this a package, so I want to do it right.
Does anyone have a similar class handy that they would recommend as a model? A pointer to a well-implemented base class of similar structure would also do the job (if implemented exclusively in R code).
Here is my basic idea of how I am doing the constructor:
distr <- function(X, inc, comp, AdultEq="sqrt", ..., major=NULL, minor=NULL,
wt){
attr(as.tbl(X), "class") <- "distr"
attr(X, "income") <- inc
attr(X, "incomeComponents") <- comp
attr(X, "adultEquiv") <- AdultEq
attr(X, "majorGroup") <- major
attr(X, "minorGroup") <- minor
attr(X, "weight") <- wt
# etc.
# adjust income and components for household composition
X <- mutate(X, adjInc = X[, income] / if(is.function(adultEquiv)) {
adultEquiv(...)} else {equivLst[[adultEquiv]](...)},
adjIncComp <- X[, incomeComponents] / if(is.function(adultEquiv)) {
adultEquiv(...)} else {equivLst[[adultEquiv]](...)})
X <- arrange(X, c(majorGroup, adjInc))
X <- group_by(majorGroup)
X <- mutate(X, cdf <- cumsum(weight/sum(weight)) )
# etc.
}
Then I will have methods for weighted sums and quantiles, conditional means, summary statistics, a print method, and so forth.
I tried to create a new tag for R's s3-classes, but I guess I don't have enough rep yet.

With what you're describing, you could consider creating a new S4 class. S4 is a stricter version of S3, but it makes sense with your data structure. So, instead of attributes, you could use slots. This means you can verify the object [and each column], but it also means you'd give up data frame properties for something that's more like a general list. You could then set the generics (show/print, plot, summary, etc) for that class. Hadley's book is okay on S4 classes. I also found the R manual to be very useful.
http://adv-r.had.co.nz/S4.html
https://cran.r-project.org/doc/manuals/r-release/R-ints.html
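If you stay with S3, a common pattern (described in the S3 chapter of the second edition of Advanced R) is a low-level new_*() constructor, a validator, and a user-facing helper. The sketch below stores all column roles in a single attribute, a named list, which is one possible answer to the one-attribute-versus-many question; the names new_distr(), validate_distr(), role_cols() and the "roles" attribute are hypothetical, and the income-adjustment and ordering steps from the question are omitted.

```r
# Low-level constructor: a data.frame plus one "roles" attribute,
# a named list mapping role names to column names.
new_distr <- function(x, roles) {
  stopifnot(is.data.frame(x), is.list(roles))
  structure(x, roles = roles, class = c("distr", class(x)))
}

# Validator: every column named in a role must exist
validate_distr <- function(x) {
  cols <- unlist(attr(x, "roles"), use.names = FALSE)
  absent <- setdiff(cols, names(x))
  if (length(absent) > 0) {
    stop("columns not found: ", paste(absent, collapse = ", "), call. = FALSE)
  }
  x
}

# Accessor: look up the column name(s) behind a role
role_cols <- function(x, role) attr(x, "roles")[[role]]

# User-facing helper: accepts column names or numbers, stores names
distr <- function(X, inc, comp, wt) {
  to_names <- function(v) if (is.numeric(v)) names(X)[v] else v
  roles <- list(income = to_names(inc),
                incomeComponents = to_names(comp),
                weight = to_names(wt))
  validate_distr(new_distr(X, roles))
}

# Made-up example data
hh <- data.frame(y = c(10, 20, 30), wage = c(8, 15, 25),
                 other = c(2, 5, 5), w = c(1, 2, 1))
d <- distr(hh, inc = "y", comp = 2:3, wt = "w")
d[[role_cols(d, "income")]]  # access a column through its role
```

Keeping all roles in one list means a method only has to read, or copy back, a single attribute.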

How to attach a simple data.frame to a SpatialPolygonsDataFrame in R?

I have (again) a problem with combining data frames in R. But this time, one is a SpatialPolygonsDataFrame (SPDF) and the other one is an ordinary data.frame (DF). The SPDF has around 1000 rows, the DF only 400. Both have a common column, QDGC.
Now, I tried
oo <- merge(SPDF,DF, by="QDGC", all=T)
but this only results in a normal data.frame, not a spatial polygon data frame any more.
I read elsewhere that this does not work, but I did not understand what to do in such a case (it has something to do with the ID columns that merge uses).
Oh, such a hard question, I guess...
Thanks!
Jens

Let df = the data frame, sp = the spatial polygon object and by = the name or column number of the common column. You can then merge the data frame into the sp object using the following line of code:
sp@data = data.frame(sp@data, df[match(sp@data[,by], df[,by]),])
Here is how the code works. match() returns, for each row of sp@data, the position of the matching row in df (or NA if there is none), so indexing df with it reorders df's rows to line up with the polygons. When the result is combined with sp@data, the order is therefore preserved. A quick check that the code has worked is to compare the two copies of the common column and see whether they are identical (the common column gets duplicated; it is easy to remove the copy, but I keep it because it is a good check).
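A small base-R illustration of that match() step, with poly_data standing in for sp@data (rows in polygon order) and a shorter, differently ordered df; the values are made up:

```r
# Stand-in for sp@data: one row per polygon, in polygon order
poly_data <- data.frame(QDGC = c("A1", "B2", "C3", "D4"), area = c(1, 2, 3, 4),
                        stringsAsFactors = FALSE)
# Stand-in for the attribute table: fewer rows, different order
df <- data.frame(QDGC = c("C3", "A1"), count = c(30, 10),
                 stringsAsFactors = FALSE)

idx <- match(poly_data$QDGC, df$QDGC)  # row of df for each polygon (NA if absent)
joined <- data.frame(poly_data, df[idx, , drop = FALSE],
                     stringsAsFactors = FALSE)

# The duplicated key column (renamed QDGC.1) serves as the check
all(joined$QDGC == joined$QDGC.1, na.rm = TRUE)
```

Polygons without a match (B2, D4) get NA rather than being dropped, so the row count still equals the number of polygons.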

It is as easy as this:
require(sp) # the trick is that this package must be loaded!
oo <- merge(SPDF,DF, by="QDGC")
I've tested it myself. But it only works if you use the merge method from the sp package, which is what you get by default once sp is loaded: sp defines a merge method that is used whenever the first argument is a Spatial object.

merge can produce a data frame with more rows than the originals if there's not a simple one-to-one mapping between the two data frames. In that case, it would have to copy the geometry and create multiple polygons, which is probably not a good thing.
If you have a data frame with the same number of rows as a SpatialPointsDataFrame, you can just directly replace the @data slot.
library(sp)
example(overlay) # to get the srdf object
srdf@data
spplot(srdf)
srdf@data=data.frame(x=runif(3),xx=rep(0,3))
spplot(srdf)
If you get the number of rows wrong:
srdf@data=data.frame(x=runif(2),xx=rep(0,2))
spplot(srdf)
Error in data.frame(..., check.names = FALSE) :
arguments imply differing number of rows: 3, 2

Maybe the function joinCountryData2Map in the rworldmap package can give inspiration. (But I may be wrong, as I was last time.)

One more solution is to use the append_data function from the tmaptools package. It is called with these arguments:
append_data(shp, data, key.shp = NULL, key.data = NULL,
ignore.duplicates = FALSE, ignore.na = FALSE,
fixed.order = is.null(key.data) && is.null(key.shp))
It's a bit unfortunate that it's called append, since I'd understand append more in the sense of rbind, whereas we want something like join or merge here.
Ignoring that, the function is really useful for making sure your joins are correct and for finding rows that are present on only one side of the join. From the docs:
Under coverage (shape items that do not correspond to data records),
over coverage (data records that do not correspond to shape items
respectively) as well as the existence of duplicated key values are
automatically checked and reported via console messages. With
under_coverage and over_coverage the under and over coverage key
values from the last append_data call can be retrieved,
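The same kind of check can be done by hand before any join. This is not tmaptools' implementation, just a hypothetical coverage_report() helper built from setdiff() and duplicated() to show what under coverage, over coverage and duplicated keys mean:

```r
# Hypothetical helper: compare the keys on the shape side and the data side
coverage_report <- function(shape_keys, data_keys) {
  list(
    under_coverage = setdiff(shape_keys, data_keys),  # shapes with no data record
    over_coverage  = setdiff(data_keys, shape_keys),  # data records with no shape
    duplicated_data_keys = unique(data_keys[duplicated(data_keys)])
  )
}

report <- coverage_report(shape_keys = c("A1", "B2", "C3"),
                          data_keys  = c("C3", "A1", "A1", "Z9"))
```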

If you need to combine two shapefiles into a single object, just use rbind().
When using rbind(), make sure that both arguments are sf objects; you can check this with class(). If one of them is an sp Spatial*DataFrame instead, convert it with st_as_sf() before you rbind them.
Note: You can also use this to append to NULL, which is handy when you accumulate results from a loop.
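The NULL starting value works because rbind() ignores NULL arguments. A minimal sketch with plain data frames (the note describes the same pattern for sf objects; the columns here are made up):

```r
combined <- NULL
for (i in 1:3) {
  batch <- data.frame(id = i, value = i^2)
  combined <- rbind(combined, batch)  # rbind(NULL, batch) simply returns batch
}
```

For many iterations, collecting the pieces in a list and calling do.call(rbind, pieces) once avoids copying the growing object on every pass.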