k means clustering limit?

I'm doing a k-means clustering to analyze my data. So far it's working perfectly.
This is my code so far:
library(Ckmeans.1d.dp)
file <- read.csv(file.choose(), header = TRUE)
sortfile <- file[order(file$normalized), ]
# cluster the 'normalized' column into 3 groups
results <- Ckmeans.1d.dp(file$normalized, 3)
plot(results)
Now I'm able to get the clusters and the centers, but I'm more interested in getting the "limits" of each cluster: not the maximum value in a cluster among the data I used, but the boundaries of the cluster itself. Is that possible? How can I do it?

K-means labels each point according to its closest centroid (cluster center). So the "limits" between clusters (called the decision boundary) are the points that have at least two different centroids as their closest centroids (i.e. points that are at exactly the same distance from them).
For example, in 2D, compute the closest centroids for each point in the plane. If a point has more than one (i.e. at least two centroids are at the same minimal distance from it), then it is part of the decision boundary.
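For one-dimensional data such as the normalized column in the question, this rule reduces to something simple: the boundary between two neighbouring clusters is the midpoint between their centers. Below is a minimal sketch in base R, assuming the centers have already been extracted (for Ckmeans.1d.dp they should be available as results$centers). The helper name cluster_limits and the example centers are made up for illustration.

```r
# Boundaries between 1D k-means clusters: midpoints of adjacent sorted centers.
# `cluster_limits` is a hypothetical helper, not part of any package.
cluster_limits <- function(centers) {
  centers <- sort(centers)
  # average each center with its right-hand neighbour
  (head(centers, -1) + tail(centers, -1)) / 2
}

centers <- c(0.8, 0.2, 0.5)        # made-up centers, e.g. results$centers
limits <- cluster_limits(centers)  # boundaries near 0.35 and 0.65

# Interval covered by each cluster, from the lowest center to the highest
intervals <- data.frame(
  center = sort(centers),
  lower  = c(-Inf, limits),
  upper  = c(limits, Inf)
)
print(intervals)
```

Any new value can then be assigned to a cluster just by checking which interval it falls into, without re-running the clustering.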

Related

Is there a way to specify the number of segments crossings when creating random pattern of line segments?

I have been using the spatstat package to determine if a point pattern is clustered, random or regular by comparing it to relative frequency distribution of nearest-neighbor distances generated under complete spatial randomness (CSR). Code is as follows (Y is a ppp object of x,y coordinates):
Y<-rpoint(60, 2, fmax=NULL, win=unit.square(),giveup=1000, verbose=FALSE,
nsim=1, drop=TRUE)
envelope(Y, Gest, nsim = 999, nrank = 25, global=FALSE, fix.n=TRUE)
However, I just realized that the random points have to be distributed at the intersections of a polygon network within the window; they cannot be compared against complete spatial randomness, since they are restricted to crossings of segments of a polygon network. Is there any way to simulate spatial randomness on a network pattern?
I managed to create random line segments and add points at each crossing:
l<-rpoisline(4, win=owin())
i<-selfcrossing.psp(l)
par(mfrow=c(1,2))
plot(l)
plot(i)
However, I am not sure how to integrate this with the envelope() function.
I also need the number of crossings to be constant. So is there any way to specify the number of crossings in the rpoisline() function?

Is it possible to use a metric (distance) unit in igraph cutoff argument?

Newbie igraph user here. I'm trying to calculate betweenness values for every segment (edge) in a street network. I would ideally like to restrict the calculations so that only paths of less than X metres be considered. The igraph::edge.betweenness.estimate function has a cutoff argument which restricts this for steps (turns) but I would like to know if it is possible to use a metric distance instead.
So far the closest question I have been able to find is http://lists.nongnu.org/archive/html/igraph-help/2012-11/msg00083.html on the igraph help, and this suggests that it might not be possible.
I have been using my network aspatially, as a simple graph, but have an attribute of street segment length - LnkLength. From reading other StackOverflow posts it is possible to use spatial networks with igraph (with the help of spatial packages). If LnkLength could be used as a weight for the network would this solve my problem?
If anyone has any ideas I'd be very grateful to hear them.
library(igraph)
data <- data.frame(
Node1 = as.factor(c("AA", "AB", "AC", "AD", "AE", "AF", "AG", "AH", "AI", "AJ")),
Node2 = as.factor(c("BA", "BB", "BC", "AA", "AB", "AC", "BA", "BB", "BC", "AA")),
LnkLength = as.numeric(c(23.05, 42.81, 77.08, 39.63, 147.87, 56.46, 13.43, 25.53, 197.19, 34.9)))
data.graph <- graph.data.frame(data, directed=FALSE, vertices=NULL)
# attempt to limit the betweenness estimates to 800m
btw.trunc <- edge.betweenness.estimate(data.graph, e=E(data.graph), directed = FALSE, cutoff=20, weights = NULL)

I'm assuming you are computing geodesic betweenness scores (i.e. counts of the number of times a shortest path uses a given edge) as opposed to, say, current-flow betweenness (which would treat the graph like a network of electrical resistors).
I think one solution would be to first use the 'distances' function to obtain the matrix of all pairwise distances
e.g.
distMat=distances(g,weights=E(g)$LnkLength) #LnkLength is an edge attribute
Now assuming you have a cutoff dCut, you can then sparsify the matrix
distMat[which(distMat>dCut)]=0
distMat=as(distMat,'sparseMatrix') #needs library(Matrix)
This would allow you to select which pairs of points to consider...
fromInds=row(distMat)[which(!distMat==0)]
toInds=col(distMat)[which(!distMat==0)]
Assuming there are no one-way streets and LnkLength is not a directed property, the distance matrix is symmetric, so you only have to consider either its upper or its lower triangular portion. Since the indices are listed column by column, you can accomplish this by keeping only the pairs whose row index is smaller than their column index (the upper triangle)
upperTri=fromInds<toInds
fromInds=fromInds[upperTri]
toInds=toInds[upperTri]
If your graph is directed (e.g. LnkLength is not the same in both directions or there are one way streets) then you would just leave fromInds and toInds as is.
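The pair selection described above can be tried on a small hand-made distance matrix without igraph. This is only a sketch: the matrix values, the node names and the cutoff dCut are invented, and which(..., arr.ind = TRUE) is used here in place of the row()/col() indexing.

```r
# Toy symmetric distance matrix for 4 nodes (values are invented)
distMat <- matrix(c(  0,  30, 120,  80,
                     30,   0,  90,  50,
                    120,  90,   0,  40,
                     80,  50,  40,   0),
                  nrow = 4, byrow = TRUE,
                  dimnames = list(LETTERS[1:4], LETTERS[1:4]))
dCut <- 85  # hypothetical cutoff in metres

# Keep pairs that are distinct (distance > 0) and within the cutoff
withinCut <- distMat > 0 & distMat <= dCut

# Undirected case: the upper triangle lists every pair exactly once
withinCut[lower.tri(withinCut, diag = TRUE)] <- FALSE

pairs <- which(withinCut, arr.ind = TRUE)
fromInds <- unname(pairs[, "row"])
toInds   <- unname(pairs[, "col"])
print(cbind(from = fromInds, to = toInds))
```

For a directed graph, the lower.tri() line would simply be left out so that both orderings of each pair are kept.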
You would now be ready to start assigning betweenness scores iteratively, by looping over the index list, computing a shortest path for each pair with shortest_paths, and using the returned path to increment the betweenness scores of the appropriate edges
E(g)$betweenness=0
for(iPair in seq_along(fromInds)){
  #one shortest path (by LnkLength) between the current pair of vertices
  vpath=as.numeric(
    shortest_paths(g,from=fromInds[iPair],to=toInds[iPair],
      weights=E(g)$LnkLength,output="vpath")$vpath[[1]])
  #need to convert the list of path vertices into a list of edge indices;
  #get.edge.ids expects the endpoints interleaved as v1,v2,v2,v3,...
  pathlist=as.vector(rbind(vpath[-length(vpath)],vpath[-1]))
  eList=get.edge.ids(g,pathlist)
  #increment betweenness scores of edges in eList
  E(g)$betweenness[eList]=E(g)$betweenness[eList]+1
}
Your graph should now have the property 'betweenness' assigned to its edges, using a geodesic betweenness metric computed over the subset of paths with a distance of less than 'dCut'.
If your graph is quite large, it may be worthwhile to port the above for loop over to Rcpp, since it could take quite a long time otherwise. But that is a whole new can of worms...

How to create a legend for edge colors when plotting networks in R?

I have generated a connectivity matrix representing a network of geographical points connected by ocean currents. Each point releases particles that are received by the others. The number of particles released and received by each point is summarized in this square matrix. For example, an element Aij of the matrix corresponds to the number of particles emitted by the ith point and received by the jth.
My purpose is to be able to plot this as a network such that each point constitutes a vertex and the connections between two points constitute an edge. I would like those edges to be of different colors according to the amount of particles exchanged. Those have to be marked by an arrow.
I could plot those points according to their geographic coordinates and I could plot those edges the way I wanted. My only concern is now how to add a legend relating the color of the edges with the amount of particles they represent.
Can anyone help me with that? Here is my code so far:
library(ggplot2)
library(plyr)
library(sp)
library(statnet)
connectivityMatrix <- as.matrix(read.table(file='settlementMatrix001920.dat'))
coordinates <- as.matrix(read.table(file='NoTakeReefs_center_LonLat.dat'))
net <- as.network(connectivityMatrix, matrix.type = "adjacency", directed = TRUE)
minX<-min(coordinates[,1])#-0.5
maxX<-max(coordinates[,1])#+0.5
minY<-min(coordinates[,2])#-0.5
maxY<-max(coordinates[,2])#+0.5
p<-plot(net, coord=coordinates,xlim=c(minX,maxX),ylim=c(minY,maxY),edge.col=connectivityMatrix,object.scale=0.01)

Without having your real data, here is a simple example:
matrixValues<-matrix(c(0,1,2,3,
0,0,0,0,
0,0,0,0,
0,0,0,0),ncol=4)
net<-as.network(matrixValues)
plot(net,edge.col=matrixValues)
# plot legend using non-zero values from matrix
legend(1,1,fill = unique(as.vector(matrixValues[matrixValues>0])),
legend=unique(as.vector(matrixValues[matrixValues>0])))
You may have to adjust the first two coordinate values in legend() to draw it where you need it on the plot. You could also construct your network slightly differently so that the values are loaded in from the matrix (see the ignore.eval argument to as.network()). In that case you would use edge.col='myValueName' for the plot command and get.edge.attribute(net,'myValueName') to feed the values into legend().
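When the matrix holds large particle counts rather than a handful of small integers, passing the raw values as colours relies on R recycling its default palette, and the legend gets one entry per distinct count. One possible variation, sketched here in base R, is to bin the counts into a few classes first. The counts, break points, palette and object names below are all arbitrary choices for illustration, and the resulting colour matrix is assumed to be usable as edge.col in the same way as matrixValues above.

```r
# Invented particle counts: element [i, j] = particles sent from i to j
counts <- matrix(c(   0, 12, 250, 0,
                      3,  0,   0, 0,
                      0, 90,   0, 7,
                   1500,  0,   0, 0), ncol = 4, byrow = TRUE)

# Arbitrary class breaks, with one label and one colour per class
breaks <- c(0, 10, 100, 1000, Inf)
labels <- c("1-10", "11-100", "101-1000", "> 1000")
cols   <- c("grey70", "gold", "orange", "red")

# Class index (1..4) for every non-zero count; zeros become NA (no edge)
classes <- cut(counts, breaks = breaks, labels = FALSE)
edgeCols <- matrix(cols[classes], nrow = nrow(counts))

# After plotting the network with edge.col = edgeCols, a matching legend
# could be drawn with:
#   legend("topright", legend = labels, col = cols, lty = 1, title = "Particles")
print(edgeCols)
```

The legend then shows one line per class instead of one entry per distinct count.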

SpatialLinesDataFrame (R): how to limit number of polygons in calculating distance between points and polygons

I am looking to calculate the distance between points (about 47K) and the closest X countries (of all world countries). I have imported the lat/long of points as SpatialPoints, and loaded a world map as a SpatialPolygons. I think I could build off of the advice given here:
SpatialLinesDataFrame: how to calculate the min. distance between a point and a line
It looks like I have to calculate the distance between all countries and all points and then extract the X closest, which is a bit intense with so many points.
In short, is there a way to impose a polygon limit? If not, what would you suggest? My only thought is to import a smaller number of points and then loop through this code (I am a new R user).
Thanks!

Trying to find networking metrics with R

I have created a directed network in R. I have to find the average degree, which I think I have, the diameter and the maximum/minimum clustering. The diameter is the longest of the shortest distances between two nodes. If this makes sense to anyone, please point me in the right direction. I have what I have coded below so far.
library(igraph)
ghw <- graph.formula(1-+4:5:9:12:14, 2-+11:16:17, 3-+4:5:7,
4-+1:3:6:7:8, 5-+1:3:6:7, 6-+4:5:8,
7-+3:4:5:8:13, 8-+4:6:7, 9-+10:12:14:15,
10-+9:12:14, 11-+2:16:17, 12-+1:9:10:14,
13-+7:15:18, 14-+1:9:10:12, 15-+13:16:18,
16-+2:11:15:17:18, 17-+2:11:16:18, 18-+13:15:16:17)
plot(ghw)
get.adjacency(ghw)
# Total number of directed edges
numdeg <- ecount(ghw)
# Average number of edges per node
avgdeg <- numdeg / 18

How about looking at the documentation?
diameter(ghw)
I am not sure what you mean by maximum/minimum clustering, but maybe this:
range(transitivity(ghw, type="local"))
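Btw., your average number of edges per node counts each directed edge only once, so it is the mean out-degree (or, equivalently, the mean in-degree). If you want the mean total degree, counting incoming and outgoing edges together, it is twice that, because every edge belongs to two nodes; mean(degree(ghw)) gives it directly.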
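Whether the average is edges/nodes or twice that depends on whether each node counts only its outgoing edges or edges in both directions. This can be checked by hand on a tiny directed edge list. The sketch below uses base R only and an invented three-node graph; it is meant to illustrate the counting, not to replace igraph's degree().

```r
# Invented directed edge list: each row is one edge, from -> to
edges <- data.frame(from = c(1, 1, 2, 3),
                    to   = c(2, 3, 3, 1))
nodes <- sort(unique(c(edges$from, edges$to)))

# Per-node out-degree and in-degree (nodes without such edges count as 0)
outDeg <- as.vector(table(factor(edges$from, levels = nodes)))
inDeg  <- as.vector(table(factor(edges$to,   levels = nodes)))

meanOut   <- mean(outDeg)          # edges / nodes = 4 / 3
meanTotal <- mean(outDeg + inDeg)  # 2 * edges / nodes = 8 / 3
print(c(meanOut = meanOut, meanTotal = meanTotal))
```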
