I have a dataset of XY points that looks like this:
x<-c(2,4,6,3,7,9,1)
y<-c(6,4,8,2,9,6,1)
id<-c("a","b","c","d","e","f","g")
dataset<-data.frame(cbind(x,y,id))
I would like to connect all combinations of points with spatial lines, with each line named after the pair of points it connects.
In the attribute table that results from the output, the names of the spatial lines might look like this:
a_b
a_c
a_d
a_e
a_f
a_g
b_a
b_c
b_d
b_e
b_f
b_g
c_a
etc.
I'm speculating a bit here as to what exactly you wanted, but I think you want to visualize the connections from any point to the others. If that's the case, then this might work.
But first, some assumptions:
Your x and y coordinates are starting points, so id becomes id.origin.
All other points need to become "destinations", and their coordinates will become x_destination (x2 below) and so on.
<disclaimer> There should be a better, more elegant way to do this. I'd appreciate it if someone more experienced could jump in and show me any of the *ply ways to do it. </disclaimer>
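For the record, here is one loop-free sketch in that spirit, using expand.grid and merge (it assumes the vectors from the question and keeps the coordinates numeric; the coords and od names are just illustrative):
pairs <- expand.grid(id.origin=id, id.dest=id, stringsAsFactors=FALSE) # every ordered pair of ids
pairs <- pairs[pairs$id.origin != pairs$id.dest, ] # drop self-pairs like a_a
coords <- data.frame(id, x, y, stringsAsFactors=FALSE) # numeric coordinates keyed by id
od <- merge(pairs, coords, by.x="id.origin", by.y="id") # attach origin coordinates
od <- merge(od, coords, by.x="id.dest", by.y="id", suffixes=c("","2")) # x2/y2 = destination
od$name <- paste(od$id.origin, od$id.dest, sep="_") # e.g. "a_b"
The step-by-step version follows.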
Replicate the dataframe to cover all possible combinations:
dataset<-do.call(rbind, replicate(7, dataset, simplify=FALSE))
Now create a matrix of destination points, each original point repeated seven times:
nm=matrix(ncol=3)
for (i in 1:7){
nm<-rbind(nm,do.call(rbind,replicate(7,as.matrix(dataset[i,]),simplify=FALSE)))
}
nm<-nm[-1,]
Rename the columns of the matrix so they make sense, and bind the existing data frame to the new matrix:
colnames(nm)<-c("x2","y2","id.dest")
newds<-cbind(dataset,as.data.frame(nm))
Remove duplicated trajectories:
newds<-newds[-which(newds$id==newds$id.dest),]
and plot the result using geom_segment (x2 and y2 must first be converted from factor to numeric, as in the full solution below):
library(ggplot2)
p<-ggplot(newds,aes(x=x,y=y))+geom_segment(aes(xend=x2,yend=y2))
There is a way to name the segments, but from observing the plot I wouldn't suggest doing it. Instead you might consider naming the points using geom_text (other options are available; see ?annotate for one).
p<-p + geom_text(aes(x=1.8,y=6.1,label="a"))
That will produce a plot of the segments with the point labelled.
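Rather than hard-coding one text label per point, a sketch that labels every origin point in a single layer (using the id column already present in newds):
ggplot(newds, aes(x=x, y=y)) + geom_segment(aes(xend=x2, yend=y2)) + geom_text(aes(label=id), vjust=-0.6)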
The whole solution looks like this. Note that data.frame(cbind(x,y,id)) in the question turns every column into a factor, so x and y are converted back to numeric first:
dataset$x<-as.numeric(as.character(dataset$x))
dataset$y<-as.numeric(as.character(dataset$y))
plot(dataset$x,dataset$y)
Replicate the dataframe to cover all possible combinations:
dataset<-do.call(rbind, replicate(7, dataset, simplify=FALSE))
Now create a matrix of destination points, each original point repeated seven times:
nm=matrix(ncol=3)
for (i in 1:7){
nm<-rbind(nm,do.call(rbind,replicate(7,as.matrix(dataset[i,]),simplify=FALSE)))
}
nm<-nm[-1,]
Rename the columns of the matrix so they make sense, and bind the existing data frame to the new matrix:
colnames(nm)<-c("x2","y2","id.dest")
newds<-cbind(dataset,as.data.frame(nm))
Remove duplicated trajectories:
newds1<-newds[-which(newds$id==newds$id.dest),]
library(ggplot2)
Converting destination x & y to numeric from factor
newds1$x2<-as.numeric(as.character(newds1$x2)) #converting from factor to numeric
newds1$y2<-as.numeric(as.character(newds1$y2))
Plotting the destination points (the same set of locations as the origin points):
plot(newds1$x, newds1$y)
plot(newds1$x2, newds1$y2, col="red")
Now use code from this answer:
Convert Begin and End Coordinates into Spatial Lines in R
Raw list to store Lines objects:
l <- vector("list", nrow(newds1))
This l is an empty list whose length is the number of rows (nrow) of newds1.
Splitting origin and destination coordinates so I can run this script:
origins<-data.frame(cbind(newds1$x, newds1$y))
destinations<-data.frame(cbind(newds1$x2, newds1$y2))
library(sp)
for (i in seq_along(l)) {
l[[i]] <- Lines(list(Line(rbind(origins[i, ], destinations[i,]))), as.character(i))
}
l.spatial<-SpatialLines(l)
plot(l.spatial, add=T)
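To get the named attribute table the question asked for (a_b, a_c, ...), a sketch that builds an id for each line from the origin and destination ids and attaches a data frame via SpatialLinesDataFrame (object names follow the code above):
ids <- paste(newds1$id, newds1$id.dest, sep="_") # "a_b", "a_c", ...
l2 <- vector("list", nrow(newds1))
for (i in seq_along(l2)) {
  l2[[i]] <- Lines(list(Line(rbind(origins[i, ], destinations[i, ]))), ids[i])
}
l2.spatial <- SpatialLinesDataFrame(SpatialLines(l2), data.frame(name=ids, row.names=ids))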
Related
Trying to get this done via mapply or something similar, without iterating: I have a spatial dataframe in R and would like to subset all the more complicated shapes, i.e. shapes with 10 or more coordinates. The shapefile is substantial (10k shapes) and the method that is fine for a small sample is very slow for a big one. The iterative method is:
Street$cc <-0
i <- 1
while(i <= nrow(Street)){
Street$cc[i] <-length(coordinates(Street)[[i]][[1]])/2
i<-i+1
}
How can I get the same effect in an array-oriented way? I am having trouble accessing the levels a few steps down from the top (Shapefile/lines/Lines/coords).
I tried:
Street$cc <- lapply(slot(Street, "lines"),
function(x) lapply(slot(x, "Lines"),
function(y) length(slot(y, "coords"))/2))
(division by 2 because each coordinate is a pair of two values)
but it still returns a list with the number of items per row, not an integer telling me how many items there are. How can I get the number of coordinates for each shape in a spatial dataframe? Sorry, I do not have a reproducible example, but you can check on any spatial file; it is more about accessing a low-level property than a very specific issue.
EDIT: I resolved the issue using the function tail().
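For reference, a loop-free version along the lines of the attempts above (a sketch; not necessarily what the tail() fix looked like) uses sapply so each feature collapses to a single integer, with nrow() on the coords matrix replacing the division by 2:
Street$cc <- sapply(slot(Street, "lines"), function(x)
  sum(sapply(slot(x, "Lines"), function(y) nrow(slot(y, "coords"))))) # vertices per feature, summed over parts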
Here is a reproducible example. Slightly different to yours, because you did not provide data, but the principle is the same. The 'principle' when drilling down into complex S4 structures is to pay attention to whether each level is a list or a slot, using [[]] to access lists, and @ for slots.
First let's get a spatial polygon. I'll use the US state boundaries:
library(maps)
library(maptools) # provides map2SpatialPolygons
local.map = map(database = "state", fill = TRUE, plot = FALSE)
IDs = sapply(strsplit(local.map$names, ":"), function(x) x[1])
states = map2SpatialPolygons(map = local.map, ID = IDs)
Now we can subset the polygons with fewer than 200 vertices like this:
# Note: next line assumes that only interested in one Polygon per top level polygon.
# I.e. assumes that we have only single part polygons
# If you need to extend this to work with multipart polygons, it will be
# necessary to also loop over values of lower level Polygons
lengths = sapply(1:length(states), function(i)
  NROW(states@polygons[[i]]@Polygons[[1]]@coords))
simple.states = states[which(lengths < 200)]
plot(simple.states)
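For the multipart case flagged in the comment above, a sketch that sums the vertices over every lower-level Polygon instead of looking only at the first; subset with which(lengths < 200) exactly as before:
lengths = sapply(states@polygons, function(p)
  sum(sapply(p@Polygons, function(pp) NROW(pp@coords)))) # total vertices across all parts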
I wrote a quick hack to generate the coordinates of the endpoints of all "cell walls" in a plain old array of squares on integer coordinates.
dimx <- 4
dimy <- 5
xvert<-rep(1:(dimx+1),each=dimy)
yvert<-1:dimy
yvert<-rep(yvert,times=dimx+1)
vertwall<-cbind(xvert, xvert,yvert,yvert+1)
And similarly for the horizontal walls. It feels like I just reinvented some basic function, so: Faster, Better, Cleaner?
EDIT: consider a grid of cells. The bottom-left cell's two walls of interest have the coordinate x,y pairs (1,1),(1,2) and (1,1),(2,1) . Similar to the definition of crystal unit cells in solid-state physics, that's all that is required, as the next cell "up" has walls (1,2),(1,3) and (1,2),(2,2) and so on. Thus the reason for repeating the "xvert" data in my sample.
I am not sure I understand what you are trying to do (your column names are duplicated, and that is confusing). You can try this, for example:
df = expand.grid(yvert= seq_len(dimy), xvert= seq_len(dimx))
transform(df,xvert1=xvert,yvert1=yvert+1)
CGW added for completeness' sake: generate both horizontal and vertical walls:
df = expand.grid( xvert= seq_len(dimx),yvert= seq_len(dimy))
transform(df,xvert1=xvert,yvert1=yvert+1) ->dfv
df2 <- expand.grid(yvert= seq_len(dimy), xvert= seq_len(dimx))
transform(df2,yvert1=yvert,xvert1=xvert+1) ->dfh
# make x,y same order in both arrays
dfh[] <- dfh[,c(2,1,4,3)]
The expand.grid function creates the Cartesian product of vectors, which provides most of what you need:
expand.grid(x=1:5,y=1:5)
long-time reader, first time poster.
I'm attempting to perform a gIntersection() on two very large SpatialPolygonsDataFrame objects. The first is all US counties; the second is a 240-row by 279-column grid, as a series of 66,960 polygons.
I successfully ran this by just using Pennsylvania and the piece of the grid that overlaps PA:
gIntersection(PA, grid, byid=TRUE)
I tried to run this overnight for the whole U.S. and it was still running this morning with a 10 GB(!) swap file on my hard drive and no evidence of progress. Am I doing something wrong, or is this normal behavior, and I should just do a state-by-state loop?
Thanks!
A little later than I hoped, but here's the function I ended up using for my task related to this. It could probably be adapted to other applications.
@mdsumner was right that a high-level operation to discard non-intersects sped this up greatly. Hopefully this is useful!
library("sp")
library("rgeos")
library("plyr")
# I originally wrote this function to total the population that lies within each
# polygon in a SpatialPolygons object.
# AdminBounds: a SpatialPolygons for whatever administrative area you're working with
# poly: the SpatialPolygons you want to total population (or another variable) across
# Admindf: a dataframe with data for each polygon inside the AdminBounds SpatialPolygons
ApportionPopulation <- function(AdminBounds, poly, Admindf) {
# the AdminBounds have the administrative ID code as feature IDs. I set that up using spChFID()
# start by trimming out areas that don't intersect
AdminBounds.sub <- gIntersects(AdminBounds, poly, byid=TRUE) # test for areas that don't intersect
AdminBounds.sub2 <- apply(AdminBounds.sub, 2, function(x) {sum(x)}) # test across all polygons in the SpatialPolygon whether it intersects or not
AdminBounds.sub3 <- AdminBounds[AdminBounds.sub2 > 0] # keep only the ones that actually intersect
# perform the intersection. This takes a while since it also calculates area and other things, which is why we trimmed out irrelevant areas first
int <- gIntersection(AdminBounds.sub3, poly, byid=TRUE) # intersect the polygon and your administrative boundaries
intdf <- data.frame(intname=names(int)) # make a data frame for the intersected SpatialPolygon, using names from the output list from int
intdf$intname <- as.character(intdf$intname) # convert the name to character
splitid <- strsplit(intdf$intname, " ", fixed=TRUE) # split the names
splitid <- do.call("rbind", splitid) # rbind those back together
colnames(splitid) <- c("adminID", "donutshpid") # now you have the administrative area ID and the polygonID as separate variables in a dataframe that correspond to the int SpatialPolygon.
intdf <- data.frame(intdf, splitid) # make that into a dataframe
intdf$adminID <- as.character(intdf$adminID) # convert to character
intdf$donutshpid <- as.character(intdf$donutshpid) # convert to character. In my application the shape I'm using is a series of half-circles
# now you have a dataframe corresponding to the intersected SpatialPolygon object
intdf$polyarea <- sapply(int@polygons, function(x) {x@area}) # get area from the polygon slot and put it in the df
intdf2 <- join(intdf, Admindf, by="adminID") # join together the two dataframes by the administrative ID
intdf2$popinpoly <- intdf2$pop * (intdf2$polyarea / intdf2$admin_area) # calculate the proportion of the population in the intersected area that is within the bounds of the polygon (assuming the population is evenly distributed within the administrative area)
intpop <- ddply(intdf2, .(donutshpid), summarize, popinpoly=sum(popinpoly)) # sum population lying within each polygon
# maybe do other final processing to get the output in the form you want
return(intpop) # done!
}
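A hypothetical call might look like this (every object name below is illustrative; Admindf needs adminID, pop, and admin_area columns for the join and the apportioning above to work):
# counties: SpatialPolygons of admin areas, with admin IDs as feature IDs (see note above)
# us.grid: SpatialPolygons to apportion population onto
# county.df: data.frame with adminID, pop, and admin_area columns
gridpop <- ApportionPopulation(counties, us.grid, county.df)
head(gridpop) # one row per grid polygon: donutshpid, popinpoly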
I found the sf package is superior for this:
out <- st_intersection(grid, polygons)
gIntersection was locking up my computer for hours and, as a result, required trimming or cycling through individual polygons; st_intersection from the sf package runs my data in seconds.
st_intersection also automatically merges the dataframes of both inputs.
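For sp inputs, a sketch of the full round trip (st_as_sf converts; the object names are illustrative):
library(sf)
grid_sf <- st_as_sf(grid) # convert sp objects to sf
polys_sf <- st_as_sf(polygons)
out <- st_intersection(grid_sf, polys_sf) # attributes from both layers are carried along
out$piece_area <- st_area(out) # per-piece area, if needed for apportioning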
Thanks to Grant Williamson at University of Tasmania for the vignette: https://atriplex.info/blog/index.php/2017/05/24/polygon-intersection-and-summary-with-sf/
You could probably get your answer faster using rasterize in the raster package, with your grid as a raster. It has an argument for finding the amount of polygon overlap to a cell.
?rasterize
getCover: logical. If 'TRUE', the fraction of each grid cell that is covered by the polygons is returned (and the values of 'field', 'fun', 'mask', and 'update' are ignored). The fraction covered is estimated by dividing each cell into 100 subcells and determining presence/absence of the polygon in the center of each subcell.
It doesn't look like you get to control the number of subcells, though that probably wouldn't be hard to open up.
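A minimal sketch, assuming a SpatialPolygons object polys and the 240 x 279 grid from the question:
library(raster)
r <- raster(extent(polys), nrows=240, ncols=279) # grid aligned to the polygons' extent
cover <- rasterize(polys, r, getCover=TRUE) # fraction of each cell covered by the polygons
plot(cover)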
I currently have a dataset which has a format of: (x, y, type)
I've used the code found in the example of plotting with Postgres through R.
My question is: How would I get R to generate multiple graphs for each unique "type" column?
I'm new to R, so my apologies if this is something extremely easy and I just lack the understanding of loops in R.
So let's say we have this data:
(1,1,T), (1,2,T), (1,3,T), (1,4,T), (1,5,T), (1,6,T),
(1,1,A), (1,2,B), (1,3,B), (1,4,B), (1,5,A), (1,6,A),
(1,1,B), (1,2,B), (1,3,C), (1,4,C), (1,5,C), (1,6,C),
It would plot 4 individual graphs on the page, one for each of the types T, A, B, and C [plotting x,y].
How would I do that with R when the data coming in may look like the data above?
While the other post has some good info, there's a faster way to do all that. So assuming your data frame or matrix is called DF and is in the form above (where each (1,2,B) or whatever is a row), then:
by(DF, DF[,3], function(x) plot(x[,1], x[,2], main=unique(x[,3])))
And that's it.
If you'd like all four plots to be on the same page, you can first change the graphics parameter option:
par(mfrow=c(2,2))
And back to the default with par(mfrow=c(1,1)) when you're done.
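Putting the two together (assuming the DF from above, which has four types):
par(mfrow=c(2,2)) # 2 x 2 grid of plots on one page
by(DF, DF[,3], function(x) plot(x[,1], x[,2], main=unique(x[,3])))
par(mfrow=c(1,1)) # restore the default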
I'm quite fond of the ggplot2 package, which does the same thing that user1717913 suggests, but with slightly different syntax (it does a lot of other things very nicely, which is why I like it.)
test <- data.frame(x=rep(1,18),y=rep(1:6,3),type=c("T","T","T","T","T","T","A","B","B","B","A","A","B","B","C","C","C","C"))
require(ggplot2)
ggplot(test, aes(x=x, y=y)) + #define the data that the plot will use, and which variables go where
geom_point() + #plot it with points
facet_wrap(~type) #facet it by the type variable
R is really cool in that there are a bazillion (that's a technical term) different ways to do most things. The way I would do it is to split the data along the groups, and then plot by group.
To do that, the split command is what you want (I'll assume your data is in an object called data):
data.splitted <- split(data, data$type)
Now the data will have this form (let's assume you have 3 types, A, B, and C):
data.splitted
$A
  x y type
  1 4 A
  3 6 A
$B
  x y type
  3 3 B
  2 1 B
$C
  x y type
  4 5 C
  5 2 C
and so on. You would reference the "4" in the y column of group A like so:
data.splitted$A$y[1] or data.splitted[[1]][[2]][1]. Hopefully seeing them both together makes enough sense.
Now that we have the data split, we're getting closer.
We still need to tell R that we want to plot a bunch of graphs to the same window. Now, this is just one way to go about it. You could also tell it to write each graph to an image file, or a pdf, or whatever you want.
groups <- names(data.splitted) puts your different types into a variable for reference later.
par(mfcol=c(length(groups),1))
Using mfcol fills the graphs in vertically; the mfrow option fills them in horizontally. The c() just combines its inputs, and length(groups) returns the total number of groups.
Now we can work on the for-loop.
for(i in 1:length(data.splitted)){ # This tells it what i is iterating from and to.
# It can start and stop wherever, or be a
# sequence, ascending or descending,
# the sky is the limit.
tempx <- data.splitted[[i]][["x"]] # This just saves us
tempy <- data.splitted[[i]][["y"]] # a bunch of typing.
plot(tempx, tempy, main=groups[i]) # Plot it and make the title the type.
rm(tempx, tempy) # Remove our temporary variables for the next run through.
}
So you see, it's not too bad when you break it down into its components. You can do pretty much anything this way. I have a project I'm working on right now, where I'm doing this for 18 lidar metrics that I calculated using another for loop.
Commands to read up on:
split, plot, data.frame, "[",
par(mfrow=___) and par(mfcol=___)
The most helpful resource of all is built right into R, though: a ? followed by a command will bring up the html help for that command in your browser.
Good luck!
So I have some lidar data that I want to calculate some metrics for (I'll attach a link to the data in a comment).
I also have ground plots that I have extracted the lidar points around, so that I have a couple hundred points per plot (19 plots). Each point has X, Y, Z, height above ground, and the associated plot.
I need to calculate a bunch of metrics on the plot level, so I created plotsgrouped with split(plotpts, plotpts$AssocPlot).
So now I have a data frame with a "page" for each plot, so I can calculate all my metrics by the "plot page". This works just dandy for individual plots, but I want to automate it. (yes, I know there's only 19 plots, but it's the principle of it, darn it! :-P)
So far, I've got a for loop going that calculates the metrics and puts the results in a data frame called Results. I pulled the names of the groups into a list called groups as well.
for(i in 1:length(groups)){
Results$Plot[i] <- groups[i]
Results$Mean[i] <- mean(plotsgrouped$PLT01$Z)
Results$Std.Dev.[i] <- sd(plotsgrouped$PLT01$Z)
Results$Max[i] <- max(plotsgrouped$PLT01$Z)
Results$`75%Avg.`[i] <- mean(plotsgrouped$PLT01$Z[plotsgrouped$PLT01$Z <= quantile(plotsgrouped$PLT01$Z, .75)])
Results$`50%Avg.`[i] <- mean(plotsgrouped$PLT01$Z[plotsgrouped$PLT01$Z <= quantile(plotsgrouped$PLT01$Z, .50)])
...
and so on.
The problem arises when I try to do something like:
Results$Mean[i] <- mean(paste("plotsgrouped", groups[i], "Z", sep="$")). mean() doesn't recognize the pasted string as a reference to the vector plotsgrouped$PLT27$Z, and instead fails. I've deduced that it's because it sees the quotes and thinks, "Oh, you're just some text, I can't take the mean of you," or something to that effect.
Btw, groups is a list of the 19 plot names: PLT01-PLT27 (non-consecutive sometimes) and FTWR, so I can't simply put a sequence for the numeric part of the name.
Anyone have an easier way to iterate across my test plots and get arbitrary metrics?
I feel like I have all the right pieces, but just don't know how they go together to give me what I want.
Also, if anyone can come up with a better title for the question, feel free to post it or change it or whatever.
Try with:
for(i in seq_along(groups)) {
Results$Plot[i] <- groups[i] # character names of the groups
tempZ = plotsgrouped[[groups[i]]][["Z"]]
Results$Mean[i] <- mean(tempZ)
Results$Std.Dev.[i] <- sd(tempZ)
Results$Max[i] <- max(tempZ)
Results$`75%Avg.`[i] <- mean(tempZ[tempZ <= quantile(tempZ, .75)])
Results$`50%Avg.`[i] <- mean(tempZ[tempZ <= quantile(tempZ, .50)])
}
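If you prefer to skip the loop entirely, the same [[ ]] lookup works inside sapply (a sketch, one line per metric):
Results$Plot <- groups
Results$Mean <- sapply(groups, function(g) mean(plotsgrouped[[g]][["Z"]])) # one value per plot
Results$Std.Dev. <- sapply(groups, function(g) sd(plotsgrouped[[g]][["Z"]]))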