Combine two lists of sfc polygons in R

Hi,
I have two lists of polygons.
The first one is a list of 1 polygon (circle)
The second is a list of 260 polygons (260 rectangles).
See the first picture (two lists of polygons).
Now I want to keep all the rectangles that are touched by the circle.
See picture 2 (merge) and picture 3 (result).
Does somebody have any idea? There are several candidates - st_combine, st_intersection - but they are not usable for this problem.

Suppose your blocks are in a, and your circle in b; have you tried
a[lengths(st_intersects(a, b)) > 0]
?

Without a reprex it's hard to give a full answer, but I think you want to use st_intersects. It can take two sf objects and return either a list of index vectors of intersecting features (sparse = TRUE) or a full logical matrix saying which pairs intersect (sparse = FALSE). In this case, I would use the latter and then filter appropriately to keep only the rows you want.
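Without your data this is only a sketch, but assuming the rectangles are in a and the circle is in b (both sfc/sf objects), the filtering could look like the following; the objects built here are hypothetical stand-ins:

library(sf)

# hypothetical stand-ins: b is one circle, a is a grid of rectangles around it
b <- st_buffer(st_sfc(st_point(c(5, 5))), dist = 3)
a <- st_make_grid(st_buffer(b, 5), n = c(10, 10))

# sparse form: keep rectangles that intersect at least one feature of b
touched <- a[lengths(st_intersects(a, b)) > 0]

# dense form: logical matrix with one row per rectangle and one column per circle
hits <- st_intersects(a, b, sparse = FALSE)
touched2 <- a[apply(hits, 1, any)]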

Related

Create a list where each element is a pair of contiguous members from given vector

Or how to split a vector into pairs of contiguous members and combine them in a list?
Suppose you are given the vector
map <- seq(from = 1, to = 20, by = 4)
which is
1 5 9 13 17
My goal is to create the following list
path <- list(c(1,5), c(5,9), c(9,13), c(13,17))
This is supposed to represent the several path segments that the map is suggesting we follow. In order to go from 1 to 17, we must first take the first path (path[1]), then the second path (path[2]), and so on all the way to the end.
My first attempt led me to:
path <- split(aux <- data.frame(S = map[-length(map)], E = map[-1]), row(aux))
But I think it should be possible without creating this auxiliary data frame,
avoiding the performance decrease when the initial vector (the map) is too big. Also, it returns a warning message, which is quite alright, but I like to avoid them.
Then I found this on Stack Overflow (not exactly like this; this is the version adapted to my problem):
mod_map <- c(map, map[c(-1,-length(map))])
mod_map <- sort(mod_map)
split(mod_map, ceiling(seq_along(mod_map)/2))
which is a simpler solution, but I have to use this modified version of my map.
Perhaps I'm asking too much, as I already have two solutions. But could there be a third one, so that I don't have to use data frames as in my first solution and can use the original map, unlike in my second solution?
We can use Map on the vector ('map' - better not to use function names, as map is a function from purrr), once with the last element removed and once with the first element removed, and concatenate elementwise:
Map(c, map[-length(map)], map[-1])
Or, as @Sotos mentioned, split can be used, which would be faster:
split(cbind(map[-length(map)], map[-1]), seq(length(map)-1))
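For reference, a quick check of the Map() version against the list asked for in the question:

map <- seq(from = 1, to = 20, by = 4)
path <- Map(c, map[-length(map)], map[-1])
# should be TRUE
identical(path, list(c(1, 5), c(5, 9), c(9, 13), c(13, 17)))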

Concise way to generate ordered sets of line segment coordinates

I wrote a quick hack to generate the coordinates of the endpoints of all "cell walls" in a plain old array of squares on integer coordinates.
dimx <- 4
dimy <- 5
xvert<-rep(1:(dimx+1),each=dimy)
yvert<-1:dimy
yvert<-rep(yvert,times=dimx+1)
vertwall<-cbind(xvert, xvert,yvert,yvert+1)
And similarly for the horizontal walls. It feels like I just reinvented some basic function, so: Faster, Better, Cleaner?
EDIT: consider a grid of cells. The bottom-left cell's two walls of interest have the coordinate x,y pairs (1,1),(1,2) and (1,1),(2,1) . Similar to the definition of crystal unit cells in solid-state physics, that's all that is required, as the next cell "up" has walls (1,2),(1,3) and (1,2),(2,2) and so on. Thus the reason for repeating the "xvert" data in my sample.
I am not sure I understand what you are trying to do (your column names are duplicated, and this is confusing). You can try this, for example:
df = expand.grid(yvert = seq_len(dimy), xvert = seq_len(dimx))
transform(df, xvert1 = xvert, yvert1 = yvert + 1)
CGW added for completeness' sake: generate both horizontal and vertical walls:
df = expand.grid( xvert= seq_len(dimx),yvert= seq_len(dimy))
transform(df,xvert1=xvert,yvert1=yvert+1) ->dfv
df2 <- expand.grid(yvert= seq_len(dimy), xvert= seq_len(dimx))
transform(df2,yvert1=yvert,xvert1=xvert+1) ->dfh
# make x,y same order in both arrays
dfh[] <- dfh[,c(2,1,4,3)]
The expand.grid function creates Cartesian products of arrays, which provides most of what you need to do.
expand.grid(x=1:5,y=1:5)
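Putting that together for the question's grid, a compact sketch (the names vert and horiz are just illustrative; following the question's convention, the vertical walls run x = 1..dimx+1 and, for the full boundary, the horizontal walls run y = 1..dimy+1):

dimx <- 4
dimy <- 5

# vertical walls: segment from (x, y) to (x, y + 1)
vert <- transform(expand.grid(x = seq_len(dimx + 1), y = seq_len(dimy)),
                  x1 = x, y1 = y + 1)

# horizontal walls: segment from (x, y) to (x + 1, y)
horiz <- transform(expand.grid(x = seq_len(dimx), y = seq_len(dimy + 1)),
                   x1 = x + 1, y1 = y)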

Efficient function to return varying length vector from lookup table

I have three data sources:
types<-c(1,3,3)
places<-list(c(1,2,3),1,c(2,3))
lookup.counts<-as.data.frame(matrix(runif(9,min=0,max=10),nrow=3,ncol=3))
assigned.places<-rep.int(0,length(types))
The numbers in the "types" vector tell me what 'type' a given observation is. The vectors in the places list tell me which places the observation can be found in (some observations are found in only one place, others in all places). By definition there is one entry in types and one list in places for each observation. lookup.counts tells me how many observations of each type are located in each place (generated from another data source).
I want to randomly assign each observation to a place based on a probability generated from lookup.counts. Using for loops it looks something like:
for (i in 1:length(types)){
  row <- types[i]
  columns <- places[[i]]
  this.obs <- lookup.counts[row, columns] # the counts of this type in each place
  total <- sum(this.obs)
  this.obs <- this.obs / total # the share of observations of this type in these places
  pick <- runif(1, min = 0, max = 1)
  # the following should really be a 'while' loop, but regardless it needs help
  for (j in 1:length(this.obs)){
    if (this.obs[j] > pick){
      # pick is less than this county so assign
      pick <- 100 # just a way of making sure an observation doesn't get assigned twice
      assigned.places[i] <- colnames(lookup.counts)[j]
    } else {
      # pick is greater, move to the next category
      pick <- pick - this.obs[j]
    }
  }
}
I have been trying to vectorize this somehow, but am getting hung up on the variable length of 'places' and of 'this.obs'
In practice, of course, the lookup.counts table is quite a bit bigger (500 x 40) and I have some 900K observations with places lists of length 1 through length 39.
To vectorize the inner loop, you can use sample or sample.int to choose from several alternatives with prescribed probabilities. Unless I read your code incorrectly, you want something like this:
assigned.places[i] <- sample(colnames(this.obs), 1, prob = this.obs)
I'm a bit surprised that you're using colnames(lookup.counts) instead. Shouldn't this be subset by columns as well? It seems that either I missed something, or there is a bug in your code.
The different lengths of your lists are a severe obstacle to vectorizing your outer loop. Perhaps you could use the Matrix package to store that information as a sparse indicator matrix. Then you could simply multiply the probabilities by that indicator to exclude the columns which are not in the places list of a given observation. But as you'd probably still use apply for the above sampling code, you might as well keep the list and use some form of apply to iterate over it.
The overall result might look somewhat like this:
assigned.places <- colnames(lookup.counts)[
apply(cbind(types, places), 1, function(x) {
sample(x[[2]], 1, prob=lookup.counts[x[[1]],x[[2]]])
})
]
The use of cbind and apply isn't particularly beautiful, but seems to work. Each x is a list of two items, x[[1]] being the type and x[[2]] being the corresponding places. We use these to index lookup.counts just as you did. Then we use the found counts as relative probabilities when choosing the index of one of the columns we used in the subscript. Only after all these numbers have been assembled into a single vector by apply will the indices be turned into names based on colnames.
You can check whether things are faster if you don't cbind stuff together, but instead iterate over the indices only:
assigned.places <- colnames(lookup.counts)[
sapply(1:length(types), function(i) {
sample(places[[i]], 1, prob=lookup.counts[types[i],places[[i]]])
})
]
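If you want to try the sparse-matrix idea from above, a rough sketch could look like the following; the names ind and probs are hypothetical, and the indicator is densified at the end just to keep the sampling step simple:

library(Matrix)

n.obs <- length(types)
n.places <- ncol(lookup.counts)

# sparse 0/1 indicator: row i has ones in the columns listed in places[[i]]
ind <- sparseMatrix(
  i = rep(seq_len(n.obs), lengths(places)),
  j = unlist(places),
  x = 1,
  dims = c(n.obs, n.places)
)

# counts for each observation's type, with impossible places zeroed out
probs <- as.matrix(lookup.counts)[types, , drop = FALSE] * as.matrix(ind)

# each row can then be fed to sample() as before
assigned.places <- apply(probs, 1, function(p) sample(colnames(lookup.counts), 1, prob = p))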
This appears to work as well:
# More convenient if lookup.counts is a matrix.
lookup.counts<-matrix(runif(9,min=0,max=10),nrow=3,ncol=3)
colnames(lookup.counts)<-paste0('V',1:ncol(lookup.counts))
# A function that does what the for loop does for each i
test<-function(i) {
this.places<-colnames(lookup.counts)[places[[i]]]
this.obs<-lookup.counts[types[i],this.places]
sample(this.places,size=1,prob=this.obs)
}
# Applies the function for all i
sapply(1:length(types),test)

Adding a vector to matrix rows in numpy

Is there a fast way in numpy to add a vector to every row or column of a matrix?
Lately, I have been tiling the vector to the size of the matrix, which can use a lot of memory. For example
mat=np.arange(15)
mat.shape=(5,3)
vec=np.ones(3)
mat+=np.tile(vec, (5,1))
The other way I can think of is using a python loop, but loops are slow:
for i in xrange(len(mat)):
mat[i,:]+=vec
Is there a fast way to do this in numpy without resorting to C extensions?
It would be nice to be able to virtually tile a vector, like a more flexible version of broadcasting. Or to be able to iterate an operation row-wise or column-wise, which you may almost be able to do with some of the ufunc methods.
For adding a 1d array to every row, broadcasting already takes care of things for you:
mat += vec
However more generally you can use np.newaxis to coerce the array into a broadcastable form. For example:
mat + np.ones(3)[np.newaxis,:]
While not necessary for adding the array to every row, this is necessary to do the same for column-wise addition:
mat + np.ones(5)[:,np.newaxis]
EDIT: as Sebastian mentions, for row addition, mat + vec already handles the broadcasting correctly. It is also faster than using np.newaxis. I've edited my original answer to make this clear.
Numpy broadcasting will automatically add a compatible size vector (1D array) to a matrix (2D array, not numpy matrix). It does this by matching shapes dimension by dimension from right to left, "stretching" missing or size-1 dimensions to match the other. This is explained in https://numpy.org/doc/stable/user/basics.broadcasting.html:
mat: 5 x 3
vec: 3
vec (broadcasted): 5 x 3
By default, numpy arrays are row-major ("C order"), with axis 0 being the matrix row and axis 1 the matrix column, so broadcasting replicates the vector along axis 0, i.e. as matrix rows.

How to attach a simple data.frame to a SpatialPolygonsDataFrame in R?

I have (again) a problem with combining data frames in R. But this time, one is a SpatialPolygonsDataFrame (SPDF) and the other one is a usual data.frame (DF). The SPDF has around 1000 rows, the DF only 400. Both have a common column, QDGC.
Now, I tried
oo <- merge(SPDF,DF, by="QDGC", all=T)
but this only results in a normal data.frame, not a spatial polygon data frame any more.
I read somewhere else that this does not work, but I did not understand what to do in such a case (it has something to do with the ID columns merge uses).
Oooh, such a hard question, I guess...
Thanks!
Jens
Let df = data frame, sp = spatial polygon object and by = name or column number of common column. You can then merge the data frame into the sp object using the following line of code
sp@data = data.frame(sp@data, df[match(sp@data[, by], df[, by]), ])
Here is how the code works. The match function inside aligns the columns so that order is preserved. So when we merge it with sp@data, order is correctly preserved. A quick check to see if the code has worked is to inspect the two columns corresponding to the common column and see if they are identical (the common column gets duplicated, and it is easy to remove the copy, but I keep it as it is a good check).
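For example, with by <- "QDGC" as in the question, that check is a one-liner (sketch; QDGC.1 is the name data.frame() gives the duplicated key column):

by <- "QDGC"
sp@data <- data.frame(sp@data, df[match(sp@data[, by], df[, by]), ])
# the key column now appears twice; rows should agree wherever df had a match
all(sp@data$QDGC == sp@data$QDGC.1, na.rm = TRUE)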
It is as easy as this:
require(sp) # the trick is that this package must be loaded!
oo <- merge(SPDF,DF, by="QDGC")
I've tested it myself, but it only works if you use merge from the sp package. This is the default when the sp package is loaded: merge is then overloaded, and sp::merge is used if the first argument is a spatial structure.
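A quick sanity check on the dispatch (sketch):

library(sp)
oo <- merge(SPDF, DF, by = "QDGC")
class(oo) # should report "SpatialPolygonsDataFrame", not "data.frame"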
merge can produce a data frame with more rows than the originals if there is not a simple 1-1 mapping between the two data frames. In that case, it would have to copy all the geometry and create multiple polygons, which is probably not a good thing.
If you have a data frame with the same number of rows as a SpatialPointsDataFrame, then you can just directly replace the @data slot.
library(sp)
example(overlay) # to get the srdf object
srdf@data
spplot(srdf)
srdf@data=data.frame(x=runif(3),xx=rep(0,3))
spplot(srdf)
if you get the number of rows wrong:
srdf@data=data.frame(x=runif(2),xx=rep(0,2))
spplot(srdf)
Error in data.frame(..., check.names = FALSE) :
arguments imply differing number of rows: 3, 2
Maybe the function joinCountryData2Map in the rworldmap package can give inspiration. (But I may be wrong, as I was last time.)
One more solution is to use append_data function from the tmaptools package. It is called with these arguments:
append_data(shp, data, key.shp = NULL, key.data = NULL,
ignore.duplicates = FALSE, ignore.na = FALSE,
fixed.order = is.null(key.data) && is.null(key.shp))
It's a bit unfortunate that it's called append, since I'd understand append more in the sense of rbind, and what we want here is something like join or merge.
Ignoring that, the function is really useful for making sure you got your join correct and for seeing whether some rows are present on only one side of the join. From the docs:
Under coverage (shape items that do not correspond to data records),
over coverage (data records that do not correspond to shape items
respectively) as well as the existence of duplicated key values are
automatically checked and reported via console messages. With
under_coverage and over_coverage the under and over coverage key
values from the last append_data call can be retrieved.
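For the question's data, a hypothetical call (reusing the SPDF and DF objects and the QDGC key from above) would look something like:

library(tmaptools)
oo <- append_data(SPDF, DF, key.shp = "QDGC", key.data = "QDGC")
# key values that did not match on either side of the last call
under_coverage()
over_coverage()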
If it is two shapefiles that need to be merged into a single object, just use rbind().
When using rbind(), just make sure that both arguments are spatial data frames of the same kind. You can check this using class(). If one is not, use st_as_sf() to convert it to an sf object before you rbind them.
Note: You can also use this to append to NULL, especially when you are collecting results from a loop and want to accumulate them.
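A small sketch of that pattern with sf objects (the file names are hypothetical):

library(sf)
a <- st_read("area1.shp") # st_read returns an sf data frame
b <- st_read("area2.shp")
combined <- rbind(a, b) # works when both are sf objects with the same columns

# accumulating pieces from a loop: collect them in a list, then bind once
pieces <- lapply(c("area1.shp", "area2.shp"), st_read)
combined <- do.call(rbind, pieces)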
