As I understand it, R lacks a method to buffer polygons in a spatially exclusive way that preserves the topology of adjacent polygons. So I'm experimenting with an approach that generates Voronoi polygons from the original polygon vertices. Results seem quite promising, except for apparent errors in the Voronoi generation.
Fairly old school R, so it's possible a tidier alternative may work better. This reproducible example uses US/Canada, but note the problem is one of mathematical geometry so marine boundaries are not relevant:
require(rworldmap)
require(rgeos)
require(dismo)
require(purrr)
require(dplyr)
par(mai = rep(0,4))
p = rworldmap::countriesCoarse[,'ADMIN']
p = p[p$ADMIN %in% c('United States of America', 'Canada'),]
p$ADMIN = as.character(p$ADMIN)
p = rgeos::gBuffer(p, byid=T, width = 0) # precaution to ensure no badly-formed polygon nonsense
# Not critical to the problem, but consider we have points we want to assign to enclosing or nearest polygon
set.seed(42)
pts = data.frame(x = runif(1000, min = p@bbox[1,1], max = p@bbox[1,2]),
                 y = runif(1000, min = p@bbox[2,1], max = p@bbox[2,2]))
coordinates(pts) = pts
pts@proj4string = p@proj4string
# point in polygon classification.
pts$admin = sp::over(pts, p)$ADMIN
pts$admin = replace(pts$admin, is.na(pts$admin), 'unclass')
plot(p)
plot(pts, pch=16, cex=.4, col = c('red','grey','blue')[factor(pts$admin)], add=T)
Let's say we want to bin the grey points to the nearest polygon. I think the most elegant approach would be to create a new, expanded set of polygons, which avoids lots of n-squared nearest-neighbour calculations. Next we try a Voronoi tessellation of the original polygon vertices:
vertices1 = map_df(p@polygons, ~ map2_df(.x@Polygons, rep(.x@ID, length(.x@Polygons)),
~ as.data.frame(..1@coords) %>% `names<-`(c('x','y')) %>% mutate(id = ..2)))
print(head(vertices1))
#> x y id
#> 1 -56.13404 50.68701 Canada
#> 2 -56.79588 49.81231 Canada
#> 3 -56.14311 50.15012 Canada
#> 4 -55.47149 49.93582 Canada
#> 5 -55.82240 49.58713 Canada
#> 6 -54.93514 49.31301 Canada
coordinates(vertices1) = vertices1[,1:2]
# voronois
vor1 = dismo::voronoi(vertices1)
# visualise
plot(p)
plot(vertices1, add=T, pch=16, cex=.5, col = c('red','blue')[factor(vertices1$id)])
plot(vor1, add=T, border='#00000010', col = c('#FF000040','#0000FF40')[factor(vor1$id)])
Lots of errors in here, maybe due to different polygons sharing some vertices. Let's try a small negative buffer to help the algorithm:
p_buff2 = rgeos::gBuffer(p, byid=T, width = -.00002) # order of 1 metre
vertices2 = map_df(p_buff2@polygons, ~ map2_df(.x@Polygons, rep(.x@ID, length(.x@Polygons)),
~ as.data.frame(..1@coords) %>% `names<-`(c('x','y')) %>% mutate(id = ..2)))
coordinates(vertices2) = vertices2[,1:2]
vor2 = dismo::voronoi(vertices2)
plot(p_buff2)
plot(vertices2, add=T, pch=16, cex=.4, col = c('red','blue')[factor(vertices2$id)])
plot(vor2, add=T, border='#00000010', col = c('#FF000040','#0000FF40')[factor(vor2$id)])
Some improvements, almost validating the approach I think. But we still have some errors, e.g. a blue chunk of British Columbia and a thin pink strip along the eastern border of Alaska. Lastly, I plot with a bigger buffer to help show what is happening with individual vertices:
p_buff3 = rgeos::gBuffer(p, byid=T, width = -.5) # 0.5 degrees, i.e. tens of km
vertices3 = map_df(p_buff3@polygons, ~ map2_df(.x@Polygons, rep(.x@ID, length(.x@Polygons)),
~ as.data.frame(..1@coords) %>% `names<-`(c('x','y')) %>% mutate(id = ..2)))
coordinates(vertices3) = vertices3[,1:2]
vor3 = dismo::voronoi(vertices3)
plot(p_buff3)
plot(vertices3, add=T, pch=16, cex=.4, col = c('red','blue')[factor(vertices3$id)])
plot(vor3, add=T, border='#00000010', col = c('#FF000040','#0000FF40')[factor(vor3$id)])
Is anyone able to shed light on the problem, or possibly suggest an alternative Voronoi method that works? I've tried ggvoronoi but struggled to get it working. Any assistance appreciated.
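One way to sanity-check the Voronoi output might be to compare it against a brute-force nearest-vertex assignment, since a Voronoi cell is by definition the region closer to its generating vertex than to any other. Below is a minimal base R sketch of that check. The toy vertices and the helper name `nearest_vertex_id()` are made up for illustration, and distances are planar on raw coordinates, as in the tessellation above.

```r
# Hypothetical toy vertices, labelled by the polygon they came from
verts <- data.frame(x = c(0, 1, 0, 5, 6, 5),
                    y = c(0, 0, 1, 0, 0, 1),
                    id = c("A", "A", "A", "B", "B", "B"),
                    stringsAsFactors = FALSE)

# For each query point, return the id of the closest vertex (planar distance)
nearest_vertex_id <- function(px, py, verts) {
  vapply(seq_along(px), function(i) {
    d2 <- (verts$x - px[i])^2 + (verts$y - py[i])^2  # squared distances to all vertices
    verts$id[which.min(d2)]
  }, character(1))
}

query <- data.frame(x = c(2, 4, 0.5), y = c(0.5, 0.5, 3))
query$id <- nearest_vertex_id(query$x, query$y, verts)
print(query)
```

If the label of the Voronoi cell a point falls in agrees with this brute-force label, the tessellation itself is behaving as defined, and the odd-looking regions would then come from how the vertices are distributed rather than from a bug.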
That is an interesting and important problem, and I think it is a good idea to use Voronoi polygons. The apparent errors arise from the distribution of the vertices. For example, the border between Canada and the USA has hardly any vertices in the west. This leads to undesired results, but they are not wrong. A step in the right direction might be to add vertices, using geosphere::makePoly:
library(dismo)
library(geosphere)
library(rworldmap)
library(rgeos)
w <- rworldmap::countriesCoarse[,'ADMIN']
w <- w[w$ADMIN %in% c('United States of America', 'Canada'),]
p <- geosphere::makePoly(w, 25000)
p$ADMIN = as.character(p$ADMIN)
p <- buffer(p, width = 0, dissolve=FALSE)
p_buff <- buffer(p, width = -.00002, dissolve=FALSE) # order of 1 metre
g <- geom(p_buff)
g <- unique(g)
vor <- dismo::voronoi(g[,c("x", "y")])
plot(p_buff)
points(g[,c("x", "y")], pch=16, cex=.4, col= c('red','blue')[g[,"object"]])
plot(vor, add=T, border='#00000010', col = c('#FF000040','#0000FF40')[g[,"object"]])
Dissolve the polygons by country and remove holes:
v <- aggregate(vor, list(g[,"object"]), FUN=length)
gg <- data.frame(geom(v))
v <- as(gg[gg$hole==0, ], "SpatialPolygons")
lines(v, col="yellow", lwd=4)
Now use this to cut the buffer by country:
pp <- buffer(p, width = 10)
buf <- v * (pp - p) # intersect(v, erase(pp, p))
buf <- SpatialPolygonsDataFrame(buf, data=data.frame(p), match.ID = FALSE)
x <- bind(p, buf)
z <- aggregate(x, "ADMIN")
lines(z, lwd=2, col="dark green")
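To illustrate why adding vertices helps, here is a base R sketch of the densification idea: insert points along each edge so that no gap between consecutive vertices exceeds a given spacing. The helper `densify()` is hypothetical and works on planar coordinates in a two-column matrix. geosphere::makePoly adds its intermediate points along great circles with the interval given in metres, so this only mimics the idea.

```r
# Insert intermediate vertices so that consecutive points are at most max_gap apart.
# xy: two-column numeric matrix of vertices, in order along the line.
densify <- function(xy, max_gap) {
  out <- xy[1, , drop = FALSE]
  for (i in seq_len(nrow(xy) - 1)) {
    a <- xy[i, ]
    b <- xy[i + 1, ]
    n <- max(1, ceiling(sqrt(sum((b - a)^2)) / max_gap))  # number of pieces for this edge
    f <- seq_len(n) / n                                   # fractions along the edge, ending at b
    out <- rbind(out, cbind(a[1] + f * (b[1] - a[1]), a[2] + f * (b[2] - a[2])))
  }
  unname(out)
}

# A long straight border with vertices only at its two ends (made-up coordinates)
border <- rbind(c(0, 49), c(10, 49))
dense <- densify(border, max_gap = 2)
print(dense)
```

With vertices only at the two ends, the Voronoi cells along such a border are dominated by whatever other vertices happen to be nearby; with evenly spaced vertices, both countries contribute sites all along the shared line.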
And now for something more focused. The code below does essentially the same as the above, but focuses on just the regions that matter (coastal borders), making it computationally less intensive, although not by much for this example with its rather large buffer.
library(dismo)
library(rworldmap)
library(rgeos)
w <- rworldmap::countriesCoarse[,'ADMIN']
w <- w[w$ADMIN %in% c('United States of America', 'Canada', 'Mexico'),]
p <- geosphere::makePoly(w, 25000)
p$ADMIN = as.character(p$ADMIN)
p <- buffer(p, width = 0, dissolve=FALSE)
#p <- buffer(p, width = -.00002, dissolve=FALSE) # order of 1 metre
bsz <- 10
mbuf <- buffer(p, width = bsz, dissolve=FALSE)
# e <- mbuf[1,] * mbuf[2,]
# -----------
# general solution for e?
poly_combs = expand.grid(p1 = seq_along(mbuf), p2 = seq_along(mbuf))
poly_combs = poly_combs[poly_combs$p1 < poly_combs$p2,]
# pairwise overlaps
e_pw = plyr::compact(lapply(1:nrow(poly_combs), FUN = function(i){
pair = poly_combs[i,]
pairing = suppressWarnings(mbuf[pair$p1,] * mbuf[pair$p2,])
return(pairing)
}))
e = e_pw[[1]]
for(i in seq_along(e_pw)[-1]) e = e + e_pw[[i]] # safe when there is only one overlap
# -----------
f <- e - p
b <- buffer(f, bsz)
# bp is the area that matters
bp <- b * p
g <- data.frame(geom(bp))
# getting rid of duplicated and shared vertices
g <- aggregate(g[,1,drop=FALSE], g[,5:6], min)
v <- dismo::voronoi(g[,c("x", "y")], extent(p)+ 2 * bsz)
v <- aggregate(v, list(g[,"object"]), FUN=length)
v <- v- p
buf1 <- buffer(p, width = bsz, dissolve=TRUE)
v <- v * buf1
v@data <- p@data
plot(v, col=c("red", "blue", "green"))
A slight adaptation of Robert's answer, for discussion.
library(dismo)
library(rworldmap)
library(rgeos)
w <- rworldmap::countriesCoarse[,'ADMIN']
# w <- w[w$ADMIN %in% c('United States of America', 'Canada'),]
w <- w[w$ADMIN %in% c('Guyana', 'Suriname','French Guiana'),]
p <- geosphere::makePoly(w, 25000)
p$ADMIN = as.character(p$ADMIN)
p <- buffer(p, width = 0, dissolve=FALSE)
#p <- buffer(p, width = -.00002, dissolve=FALSE) # order of 1 metre
bsz <- .5
# outward buffer
mbuf = buffer(p, width = bsz, dissolve=F)
# overlay between two country buffers
# e <- mbuf[1,] * mbuf[2,]
poly_combs = expand.grid(p1 = seq_along(mbuf), p2 = seq_along(mbuf))
poly_combs = poly_combs[poly_combs$p1 < poly_combs$p2,]
# pairwise overlaps
e_pw = plyr::compact(lapply(1:nrow(poly_combs), FUN = function(i){
pair = poly_combs[i,]
pairing = suppressWarnings(mbuf[pair$p1,] * mbuf[pair$p2,])
return(pairing)
}))
e = e_pw[[1]]
for(i in seq_along(e_pw)[-1]) e = e + e_pw[[i]] # safe when there is only one overlap
# contested buffer zones - overlap minus original polys
f <- e - p
f@data = data.frame(id = seq_along(f))
# buffer the contested zones
b <- buffer(f, bsz)
# bp is the area that matters
bp <- b * p
# vertices
bp = buffer(bp, width = -0.00002, dissolve=F)
g0 <- data.frame(geom(bp))
# getting rid of duplicated and shared vertices
# g <- aggregate(g0[,'object', drop=FALSE], g0[,c('x','y')], min)
g = unique(g0)
v0 <- dismo::voronoi(g[,c("x", "y")], extend(extent(p), 2 * bsz))
v0$id = g$object
v <- raster::aggregate(v0, list(g[,"object"]), FUN=length)
v@proj4string = p@proj4string
v = v * f
v@data = data.frame(ADMIN = p$ADMIN[v$Group.1])
# full buffer
fb = raster::bind(mbuf - p - f, v, p)
fb = raster::aggregate(fb, list(fb$ADMIN), FUN = function(x)x[1])[,'ADMIN']
fb@proj4string = p@proj4string
#----------------------------------
par(mai=c(0,0,0,0))
plot(p, border='grey')
plot(mbuf, add=T, border='pink')
plot(e, add=T, col='#00000010', border=NA)
plot(f, add=T, border='purple', lwd=1.5)
plot(b, add=T, border='red')
plot(bp, add=T, col='#ffff0040', border=NA)
# plot(v, add=T, col=c("#ff770020", "#0077ff20"), border=c("#ff7700", "#0077ff"))
plot(fb, add=T, col=c("#ff000020", "#00ff0020", "#0000ff20"), border=NA)
Using the example at https://www.datacamp.com/community/tutorials/hierarchical-clustering-R and the data at https://archive.ics.uci.edu/ml/datasets/seeds#, I am trying to remove the labels at the bottom of the dendrogram when using color_branches.
With plot(hclust_avg, labels=FALSE) it works, but not later when using color_branches. Is there a way to remove them?
set.seed(786)
seeds_df <- read.csv("seeds_dataset.txt",sep = '\t',header = FALSE)
feature_name <- c('area','perimeter','compactness','length.of.kernel','width.of.kernal','asymmetry.coefficient','length.of.kernel.groove','type.of.seed')
colnames(seeds_df) <- feature_name
seeds_df<- seeds_df[complete.cases(seeds_df), ]
seeds_label <- seeds_df$type.of.seed
seeds_df$type.of.seed <- NULL
seeds_df_sc <- as.data.frame(scale(seeds_df))
dist_mat <- dist(seeds_df_sc, method = 'euclidean')
hclust_avg <- hclust(dist_mat, method = 'average')
cut_avg <- cutree(hclust_avg, k = 3)
suppressPackageStartupMessages(library(dendextend))
avg_dend_obj <- as.dendrogram(hclust_avg)
avg_col_dend <- color_branches(avg_dend_obj, h = 3)
plot(avg_col_dend)
Figured this out by colouring the labels white to match the background:
avg_dend_obj <- as.dendrogram(hclust_avg)
avg_col_dend <- color_branches(avg_dend_obj, h = 3)
labels_colors(avg_col_dend) <- "white" # labels blend into a white background
plot(avg_col_dend)
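An alternative that may be worth trying, rather than hiding the labels by colour: the plot method for dendrogram objects in base R takes a `leaflab` argument, and `leaflab = "none"` suppresses the leaf labels. Since color_branches returns a dendrogram, the same argument should carry over to avg_col_dend. A minimal sketch on the built-in USArrests data (an assumption for illustration, not the seeds data), using base R only:

```r
# Build a dendrogram from a built-in dataset
hc <- hclust(dist(scale(USArrests)), method = "average")
dend <- as.dendrogram(hc)

# leaflab = "none" drops the leaf labels; branch styling is left untouched
plot(dend, leaflab = "none")
```

This also works on non-white backgrounds and leaves no invisible text in exported figures.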
I'd like to automatically derive transects perpendicular to the coastline. I need to be able to control their length and spacing, and they need to be oriented on the "correct" side of the line. I came up with a way to do that, but selecting the "correct" side in particular (it needs to point to the ocean) can be done better. General approach:
1. For each line segment in a SpatialLinesDataFrame, define the transect locations.
2. Define the transect: in both directions perpendicular to the coastline, create the points that determine the transect.
3. Create a polygon based on the coastline, add extra points to grow the polygon in a known direction, and use that to clip the points that fall inside (considered land, and therefore not of interest).
4. Create the transect based on the remaining point.
Part 3 is of particular interest. I'd like a more robust method to determine the correct orientation of the transect. This is what I'm using now:
library(rgdal)
library(raster)
library(sf)
library(ggplot2)
library(rgeos) # create lines and spatial objects
# create testing lines
l1 <- cbind(c(1, 2, 3), c(3, 2, 2))
l2 <- cbind(c(1, 2, 3), c(1, 1.5, 1))
Sl1 <- Line(l1)
Sl2 <- Line(l2)
S1 <- Lines(list(Sl1), ID = "a")
S2 <- Lines(list(Sl2), ID = "b")
line <- SpatialLines(list(S1, S2))
plot(line)
# for testing:
sep <- 0.1
start <- 0
AllTransects <- NULL # accumulates the transects of all lines via rbind()
for (i in 1: length(line)){
# i <- 2
###### Define transect locations
# Define geometry subset
subset_geometry <- data.frame(geom(line[i,]))[, c('x', 'y')]
# plot(SpatialPoints(data.frame(x = subset_geometry[,'x'], y = subset_geometry[,'y'])), axes = T, add = T)
dx <- c(0, diff(subset_geometry[,'x'])) # Difference in x between each vertex and the previous one
dy <- c(0, diff(subset_geometry[,'y']))
dseg <- sqrt(dx^2+dy^2) # length of each segment (Pythagoras)
dtotal <- cumsum(dseg) # cumulative distance along the line
linelength = sum(dseg) # total line length
pos = seq(start,linelength, by=sep) # Array with positions in meters
whichseg = unlist(lapply(pos, function(x){sum(dtotal<=x)})) # Segments corresponding to distance
pos=data.frame(pos=pos, # Position in meters on line
whichseg=whichseg, # Index of the segment the position falls in
x0=subset_geometry[whichseg,1], # x-coordinate on line
y0=subset_geometry[whichseg,2], # y-coordinate on line
dseg = dseg[whichseg+1], # segment length selected (sum of all dseg in that segment)
dtotal = dtotal[whichseg], # Accumulated length
x1=subset_geometry[whichseg+1,1], # Get X coordinate on line for next point
y1=subset_geometry[whichseg+1,2] # Get Y coordinate on line for next point
)
pos$further = pos$pos - pos$dtotal # which is the next position (in meters)
pos$f = pos$further/pos$dseg # fraction next segment of its distance
pos$x = pos$x0 + pos$f * (pos$x1-pos$x0) # X Position of point on line which is x meters away from x0
pos$y = pos$y0 + pos$f * (pos$y1-pos$y0) # Y Position of point on line which is x meters away from y0
pos$theta = atan2(pos$y0-pos$y1,pos$x0-pos$x1) # Angle between points on the line in radians
pos$object = i
###### Define transects
tlen <- 0.5
pos$thetaT = pos$theta+pi/2 # Get the angle
dx_poi <- tlen*cos(pos$thetaT) # coordinates of point of interest as defined by position length (sep)
dy_poi <- tlen*sin(pos$thetaT)
# table with only the POI information
# transect is defined by x0,y0 and x1,y1 with x,y the coordinate on the line
output <- data.frame(pos = pos$pos,
x0 = pos$x + dx_poi, # X coordinate away from line
y0 = pos$y + dy_poi, # Y coordinate away from line
x1 = pos$x - dx_poi, # X coordinate away from line
y1 = pos$y - dy_poi, # Y coordinate away from line
theta = pos$thetaT, # angle
x = pos$x, # Line coordinate X
y = pos$y, # Line coordinate Y
object = pos$object,
nextx = pos$x1,
nexty = pos$y1)
# create polygon from object to select correct segment of the transect (coastal side only)
points_for_polygon <- rbind(output[,c('x', 'y','nextx', 'nexty')])# select points
pol_for_intersect <- SpatialPolygons( list( Polygons(list(Polygon(points_for_polygon[,1:2])),1)))
# plot(pol_for_intersect, axes = T, add = T)
# Find a way to increase the polygon - should depend on the shape&direction of the polygon
# for the purpose of cropping the transects
firstForPlot <- data.frame(x = points_for_polygon$x[1], y = points_for_polygon$y[1])
lastForPlot <- data.frame(x = points_for_polygon$x[length(points_for_polygon$x)],
y = points_for_polygon$y[length(points_for_polygon$y)])
plot_first <- SpatialPoints(firstForPlot)
plot_last <- SpatialPoints(lastForPlot)
# plot(plot_first, add = T, col = 'red')
# plot(plot_last, add = T, col = 'blue')
## Corners of shape dependent bounding box
## absolute values should be depended on the shape beginning and end point relative to each other??
LX <- min(subset_geometry$x)
UX <- max(subset_geometry$x)
LY <- min(subset_geometry$y)
UY <- max(subset_geometry$y)
# polygon(x = c(LX, UX, UX, LX), y = c(LY, LY, UY, UY), lty = 2)
# polygon(x = c(LX, UX, LX), y = c(LY, LY, UY), lty = 2)
# if the corners are moved too much, the $near lookup becomes a problem: the new points are too far away
# and different points are selected
LL_corner <- data.frame(x = LX-0.5, y = LY - 1)
LR_corner <- data.frame(x = UX + 0.5 , y = LY - 1)
UR_corner <- data.frame(x = LX, y = UY)
corners <- rbind(LL_corner, LR_corner)
bbox_add <- SpatialPoints(rbind(LL_corner, LR_corner))
# plot(bbox_add ,col = 'green', axes = T, add = T)
# Select nearest point for drawing order to avoid weird shapes
firstForPlot$near <-apply(gDistance(bbox_add,plot_last, byid = T), 1, which.min)
lastForPlot$near <- apply(gDistance(bbox_add,plot_first, byid = T), 1, which.min)
# increase polygon with corresponding points
points_for_polygon_incr <- rbind(points_for_polygon[1:2], corners[firstForPlot$near,], corners[lastForPlot$near,])
pol_for_intersect_incr <- SpatialPolygons( list( Polygons(list(Polygon(points_for_polygon_incr)),1)))
plot(pol_for_intersect_incr, col = 'blue', axes = T)
# Coordinates of points first side
coordsx1y1 <- data.frame(x = output$x1, y = output$y1)
plotx1y1 <- SpatialPoints(coordsx1y1)
plot(plotx1y1, add = T)
coordsx0y0 <- data.frame(x = output$x0, y = output$y0)
plotx0y0 <- SpatialPoints(coordsx0y0)
plot(plotx0y0, add = T, col = 'red')
# Intersect
output[, "x1y1"] <- over(plotx1y1, pol_for_intersect_incr)
output[, "x0y0"] <- over(plotx0y0, pol_for_intersect_incr)
x1y1NA <- sum(is.na(output$x1y1)) # Count Na
x0y0NA <- sum(is.na(output$x0y0)) # Count NA
# inefficient way of selecting the correct end point
# e.g. either left or right, depending on intersect
indexx0y0 <- with(output, !is.na(output$x0y0))
output[indexx0y0, 'endx'] <- output[indexx0y0, 'x1']
output[indexx0y0, 'endy'] <- output[indexx0y0, 'y1']
index <- with(output, is.na(output$x0y0))
output[index, 'endx'] <- output[index, 'x0']
output[index, 'endy'] <- output[index, 'y0']
AllTransects = rbind(AllTransects, output)
}
# Create the transects
lines <- vector('list', nrow(AllTransects))
for(n in 1: nrow(AllTransects)){
# n = 30
begin_coords <- data.frame(lon = AllTransects$x, lat = AllTransects$y) # Coordinates on the original line
end_coords <- data.frame(lon = AllTransects$endx, lat = AllTransects$endy) # end coordinates as selected by the over() step above
col_names <- list('lon', 'lat')
row_names <- list('begin', 'end')
# dimnames < list(row_names, col_names)
x <- as.matrix(rbind(begin_coords[n,], end_coords[n,]))
dimnames(x) <- list(row_names, col_names)
lines[[n]] <- Lines(list(Line(x)), ID = as.character(n))
}
lines_sf <- SpatialLines(lines)
# plot(lines_sf)
df <- SpatialLinesDataFrame(lines_sf, data.frame(AllTransects))
plot(df, axes = T)
As long as I'm able to define the bounding box and grow the polygon correctly, this works. But I'd like to try this on multiple coastlines and parts of coastlines, each with its own orientation. In the example, the polygon is grown for the bottom coastline segment; as a result, the top one has transects pointing in the wrong direction.
Does anybody have an idea in which direction to look? I was considering using external data, but I'd like to avoid that if possible.
I used your code for my own question (measuring a line inside a polygon), but maybe this works for you:
1. Take a spatial polygon or line.
2. Extract the coordinates of the element.
3. Combine pairs of coordinates to create straight lines, from which you can derive perpendicular lines (e.g. ((x1,x3)(y1,y3)) or ((x2,x4)(y2,y4))).
4. Iterate over all the pairs of coordinates.
5. Apply the code you wrote, especially the part that builds the 'output' table.
I did this for a polygon, so I could generate perpendicular lines based on the straight line I create from an arbitrary (1, 3) pair of coordinates.
#Define a polygon
pol <- rip[1, 1] # I took the first polygon from my Shapefile
polcoords <- pol@polygons[[1]]@Polygons[[1]]@coords
# define how to create your coords pairing. My case: 1st with 3rd, 2nd with 4th, ...
pairs <- data.frame(a = 1:( nrow(polcoords) - 1),
b = c(2:(nrow(polcoords)-1)+1, 1) )
# Empty list to store the lines
lnDfls <- list()
for (j in 1:nrow(pairs)){ # j = 1
# Select the pairs
pp <- polcoords[c(pairs$a[j], pairs$b[j]), ]
#Extract mean coord, from where the perp. line will start
midpt <- apply(pp, 2, mean)
# points(pp, col = 3, pch = 20 )
# points(midpt[1], midpt[2], col = 4, pch = 20)
x <- midpt[1]
y <- midpt[2]
theta = atan2(y = pp[2, 2] - pp[1, 2], pp[2, 1] - pp[1, 1]) # Angle between points on the line in radians
# pos$theta = atan2(y = pos$y0-pos$y1 , pos$x0-pos$x1) # Angle between points on the line in radians
###### Define transects
tlen <- 1000 # distance in m
thetaT = theta+pi/2 # Get the angle
dx_poi <- tlen*cos(thetaT) # coordinates of point of interest as defined by position length (sep)
dy_poi <- tlen*sin(thetaT)
# table with only the POI information
# transect is defined by x0,y0 and x1,y1 with x,y the coordinate on the line
output2 <- data.frame(#pos = pos,
x0 = x + dx_poi, # X coordinate away from line
y0 = y + dy_poi, # Y coordinate away from line
x1 = x - dx_poi, # X coordinate away from line
y1 = y - dy_poi # Y coordinate away from line
#theta = thetaT, # angle
#x = x, # Line coordinate X
#y = y # Line coordinate Y
)
# points(output2$x1, output2$y1, col = 2)
#segments(x, y, output2$x1[1], output2$y1[1], col = 2)
mat <- as.matrix(cbind( c( x, output2$x1[1] ) , c( y, output2$y1[1] ) ))
LL <- Lines(list(Line( mat )), ID = as.character(j))
# plot(SpatialLinesDataFrame(LL, data.frame (a = 1)), add = TRUE, col = 2)
# plot(SpatialLines(list(LL)), add = TRUE, col = 2)
#lnList[[j]] <- LL
lnDfls[[j]] <- SpatialLinesDataFrame( SpatialLines(LinesList = list(LL)) ,
match.ID = FALSE,
data.frame(id = as.character(j ) ) )
# line = st_sfc(st_linestring(mat))
# st_length(line)
# ln <- (SpatialLines(LinesList = list(LL)))
# lndf <- SpatialLinesDataFrame( lndf , data.frame(id = j ))
# sf::st_length(ln)
# # plot(lines_sf)
}
compDf <- do.call(what = sp::rbind.SpatialLines, args = lnDfls)
plot(pol)
plot(compDf, add = TRUE, col = 2)
# plot(inDfLn, add = TRUE, col = 3) # inDfLn is not defined in this snippet
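The perpendicular construction, plus one possible way to pick the seaward side, can be sketched without any spatial packages. This assumes that a point known to lie on the ocean side is available (the made-up `sea_ref` here); the helper names `side_of()` and `transect_end()` are hypothetical as well. The sign of a 2D cross product tells which side of a directed segment a point lies on, so the transect end on the same side as `sea_ref` is kept.

```r
# Which side of the directed segment a -> b is point p on? (+1 left, -1 right, 0 on the line)
side_of <- function(a, b, p) {
  sign((b[1] - a[1]) * (p[2] - a[2]) - (b[2] - a[2]) * (p[1] - a[1]))
}

# End point of a transect of length tlen, starting at the segment midpoint,
# perpendicular to the segment, on the same side as sea_ref
transect_end <- function(a, b, tlen, sea_ref) {
  mid <- (a + b) / 2
  theta <- atan2(b[2] - a[2], b[1] - a[1]) + pi / 2  # direction perpendicular to the segment
  d <- tlen * c(cos(theta), sin(theta))
  cand <- list(mid + d, mid - d)                     # one candidate on each side
  keep <- vapply(cand, function(p) side_of(a, b, p) == side_of(a, b, sea_ref), logical(1))
  cand[[which(keep)[1]]]
}

a <- c(0, 0)
b <- c(2, 0)
end_pt <- transect_end(a, b, tlen = 1, sea_ref = c(1, -5))  # ocean assumed to the south
print(end_pt)
```

The sketch fails if `sea_ref` lies exactly on the line through the segment, and a single reference point only makes sense for a stretch of coast that does not wrap around it; for several coastlines a reference point per line (or a land polygon test) would be needed.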
I wish to visualize how well a clustering algorithm is doing (with a certain distance metric). I have samples and their corresponding classes.
To visualize this, I cluster and then want to colour the branches of a dendrogram by the items in the cluster. The colour will be that of the class most items in the hierarchical cluster belong to (given by the data/classes).
Example: if my clustering algorithm chose indexes 1, 21, 24 to be a certain cluster (at a certain level) and I have a csv file containing a class number in each row, corresponding to, let's say, 1, 2, 1, I want this edge to be coloured 1.
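To make the rule concrete, here is a small base R sketch of the majority vote for a single cluster. The class vector is made up to match the example, and `majority_class()` is a hypothetical helper name, not part of any package.

```r
# Return the most frequent class among the given cluster members
# (ties go to the class that sorts first)
majority_class <- function(members, classes) {
  counts <- table(classes[members])
  names(counts)[which.max(counts)]
}

classes <- rep(2, 24)      # hypothetical class labels, one per sample
classes[c(1, 24)] <- 1     # samples 1 and 24 belong to class 1
result <- majority_class(c(1, 21, 24), classes)
print(result)
```

The edge above the cluster {1, 21, 24} would then get the colour assigned to the returned class.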
Example Code:
require(cluster)
suppressPackageStartupMessages(library(dendextend))
dir <- 'distance_metrics/'
filename <- 'aligned.csv'
my.data <- read.csv(paste(dir, filename, sep=""), header = T, row.names = 1)
my.dist <- as.dist(my.data)
real.clusters <-read.csv("clusters", header = T, row.names = 1)
clustered <- diana(my.dist)
# dend <- colour_branches(???dend, max(real.clusters)???)
plot(dend)
EDIT:
another example partial code
dir <- 'distance_metrics/' # csv in here contains a symmetric matrix
clust.dir <- "clusters/" #csv in here contains a column vector with classes
my.data <- read.csv(paste(dir, filename, sep=""), header = T, row.names = 1)
filename <- 'table.csv'
my.dist <- as.dist(my.data)
real.clusters <-read.csv(paste(clust.dir, filename, sep=""), header = T, row.names = 1)
clustered <- diana(my.dist)
dnd <- as.dendrogram(clustered)
Both node and edge color attributes can be set recursively on "dendrogram" objects (which are just deeply nested lists) using dendrapply. The cluster package also features an as.dendrogram method for "diana" class objects, so conversion between the object types is seamless. Using your diana clustering and borrowing some code from @Edvardoss's iris example, you can create the colored dendrogram as follows:
library(cluster)
set.seed(999)
iris2 <- iris[sample(x = 1:150,size = 50,replace = F),]
clust <- diana(iris2)
dnd <- as.dendrogram(clust)
## Duplicate rownames aren't allowed, so we need to set the "labels"
## attributes recursively. We also label inner nodes here.
rectify_labels <- function(node, df){
newlab <- df$Species[unlist(node, use.names = FALSE)]
attr(node, "label") <- (newlab)
return(node)
}
dnd <- dendrapply(dnd, rectify_labels, df = iris2)
## Create a color palette as a data.frame with one row for each spp
uniqspp <- as.character(unique(iris$Species))
colormap <- data.frame(Species = uniqspp, color = rainbow(n = length(uniqspp)))
colormap[, 2] <- c("red", "blue", "green")
colormap
## Now color the inner dendrogram edges
color_dendro <- function(node, colormap){
if(is.leaf(node)){
nodecol <- colormap$color[match(attr(node, "label"), colormap$Species)]
attr(node, "nodePar") <- list(pch = NA, lab.col = nodecol)
attr(node, "edgePar") <- list(col = nodecol)
}else{
spp <- attr(node, "label")
dominantspp <- levels(spp)[which.max(tabulate(spp))]
edgecol <- colormap$color[match(dominantspp, colormap$Species)]
attr(node, "edgePar") <- list(col = edgecol)
}
return(node)
}
dnd <- dendrapply(dnd, color_dendro, colormap = colormap)
## Plot the dendrogram
plot(dnd)
The function you are looking for is color_branches from the dendextend R package, using the arguments clusters and col. Here is an example (based on Shaun Wilkinson's example):
library(cluster)
set.seed(999)
iris2 <- iris[sample(x = 1:150,size = 50,replace = F),]
clust <- diana(iris2)
dend <- as.dendrogram(clust)
temp_col <- c("red", "blue", "green")[as.numeric(iris2$Species)]
temp_col <- temp_col[order.dendrogram(dend)]
temp_col <- factor(temp_col, unique(temp_col))
library(dendextend)
dend %>% color_branches(clusters = as.numeric(temp_col), col = levels(temp_col)) %>%
set("labels_colors", as.character(temp_col)) %>%
plot
I suspect I may have misunderstood the question, but I'll try to answer anyway.
I have rewritten code from one of my earlier tasks using the iris example:
clrs <- rainbow(n = 3) # create palette
clrs <- clrs[iris$Species] # assign colors
plot(x = iris$Sepal.Length,y = iris$Sepal.Width,col=clrs) # simple test colors
# cluster
dt <- cbind(iris,clrs)
dt <- dt[sample(x = 1:150,size = 50,replace = F),] # create short dataset for visualization convenience
empty.labl <- gsub("."," ",dt$Species) # create a space vector with length of names intended for reserve place to future text labels
dst <- dist(x = scale(dt[,1:4]),method = "manhattan")
hcl <- hclust(d = dst,method = "complete")
plot(hcl,hang=-1,cex=1,labels = empty.labl, xlab = NA,sub=NA)
dt <- dt[hcl$order,] # sort rows to match the order of objects in the dendrogram
text(x = seq(nrow(dt)), y=-.5,labels = dt$Species,srt=90,cex=.8,xpd=NA,adj=c(1,0.7),col=as.character(dt$clrs))
I'm using rasterVis and levelplot to make a trellis plot of some rasters. Most things work, but I would like to change the header of each panel from the filename to a chosen string (the filename is long and convoluted; I want to use just a year, for example '2004').
Looking at the levelplot help page, it seems that, depending on the argument 'useRaster', levelplot goes to either panel.levelplot or panel.levelplot.raster for some settings, but I'm struggling to use these latter functions.
Any help much appreciated. Here's some sample code:
require(rasterVis)
layers <- c(1:4)
s2 <- stack()
for (i in layers) {
r <- raster(nrows=100, ncols=100)
r[] <- sample(seq(from = 1, to = 6, by = 1), size = 10000, replace = TRUE)
rasc <- ratify(r)
rat <- levels(rasc)[[1]]
rat$legend <- c("A","B","C","D","E","F")
levels(rasc) <- rat
s2 <- stack(s2, rasc)
}
levelplot(s2, col.regions=rev(terrain.colors(6)),main = "example")
In the above example, I would like "layer.1.1" to be "2004", and so on through to 2007.
require(rasterVis)
layers <- c(1:4)
s2 <- stack()
for (i in layers) {
r <- raster(nrows=100, ncols=100)
r[] <- sample(seq(from = 1, to = 6, by = 1), size = 10000, replace = TRUE)
rasc <- ratify(r)
rat <- levels(rasc)[[1]]
rat$legend <- c("A","B","C","D","E","F")
levels(rasc) <- rat
s2 <- stack(s2, rasc)
}
levelplot(s2, col.regions=rev(terrain.colors(6)),main = "example", names.attr=2004:2007)
p.strip <- list(cex=1.5, lines=1, col="blue", fontfamily='Serif')
levelplot(s2, col.regions=rev(terrain.colors(6)), main = "example",
names.attr=2004:2007, par.strip.text=p.strip)