Currently, I am working on a project that involves NYC Taxi data, in which I am given where a person is picked up and dropped off in a network.
I am working with an ESRI shapefile, which I can load into R as an igraph object with the shp2graph package; I need to utilize Dijkstra's algorithm (or a similar shortest-path algorithm) to find the single shortest path between two given vertices. I thought that the get.shortest.paths() method of the igraph package would be my solution, but to my surprise, this calculates all shortest paths from a vertex to all others in a network.
To me, this seems like overkill, because I need only a single path between two specified nodes. I did some poking around online and in the igraph documentation, but all I can find are methods for calculating many shortest paths from a given vertex to all others.
Due to how computationally expensive it would be to calculate every single shortest path from a vertex, and then just select one from the behemoth of a list, I'm looking for a way to utilize Dijkstra's algorithm between two specified vertices in a graph. Is there a way to do this in the igraph package, or if not, is there a good way to do this with a different package in R?
EDIT: In the end, I am hoping to look for a function that will take in the graph object and the ID of two vertices I wish to find the shortest path between, then return a list of paths/edges (or IDs) along that shortest path. This would help me to inspect each individual street along the shortest path between the two vertices.
EDIT: As an example of how I am currently using the function:
path <- get.shortest.paths(NYCgraph, from=32, mode="out"). What I hope to find is something like path <- shortestPathFunction(NYCgraph, from=32, to=37), which would calculate a shortest path between vertex ID 32 and vertex ID 37 (two arbitrary street intersections in the network).
I found my issue, which occurred before I called get.shortest.paths(). For those who are curious about how to read in an ESRI shapefile and find a single shortest path between two points (which was my dilemma):
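library(rgdal)     # readOGR()
library(shp2graph) # readshpnw(), nel2igraph()
library(igraph)    # get.shortest.paths(), get.edge.ids()
myShapefile <- readOGR(dsn=".", layer="MyShapefileName") # i.e. "MyShapefileName.shp"
shpData <- readshpnw(myShapefile, ELComputed=TRUE) # ELComputed=TRUE also computes edge lengths
igraphShpObject <- nel2igraph(shpData[[2]], shpData[[3]], weight=shpData[[4]]) # node list, edge list, edge lengths
testPath <- get.shortest.paths(igraphShpObject, from=42, to=52) # arbitrary nodes
testPath[1] # print the node IDs to the console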
Furthermore, if you are interested in getting the ID of the edge connecting two nodes (perhaps two consecutive nodes in testPath):
get.edge.ids(igraphShpObject, c(42, 45)) # arbitrary nodes 42 and 45
This indexing is the same as the indexing in shpData; for example, if you want to get the length of edge ID x, as found in get.edge.ids(), you may type shpData[[4]][x].
I hope these tidbits may be helpful to somebody in the future encountering the same problems! This method utilizes the shp2graph, rgdal, and igraph packages in R.
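For readers who want both the nodes and the edges along one route, here is a minimal sketch on a made-up five-node network (the edges data frame and its weights are invented for illustration, not taken from the NYC data). It assumes a reasonably recent igraph, where shortest_paths() is the newer name for get.shortest.paths() and accepts output = "both":

```r
library(igraph)

# Hypothetical toy street network: 'weight' plays the role of street length.
edges <- data.frame(
  from   = c("a", "a", "b", "c", "d"),
  to     = c("b", "c", "d", "d", "e"),
  weight = c(4, 1, 1, 5, 1),
  stringsAsFactors = FALSE
)
g <- graph_from_data_frame(edges, directed = FALSE)

# Giving both 'from' and 'to' returns only the path between those two vertices.
# The 'weight' edge attribute is used automatically when it is present.
sp <- shortest_paths(g, from = "a", to = "e", output = "both")

path_vertices <- as_ids(sp$vpath[[1]])      # vertex names along the path
path_edges    <- as.integer(sp$epath[[1]])  # edge IDs along the path
path_length   <- sum(sp$epath[[1]]$weight)  # total weight of the path
```

The edge IDs in path_edges can then be used to look up each individual street, much like the shpData[[4]][x] lookup described above.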
I am trying to solve a trip assignment problem (transport planning). The available data are trips between nodes and a links shapefile with 'from' and 'to' codes (which match those in the trips data). The approach I am adopting is this:
1. Take each origin-destination (OD) pair from the trip data.
2. Find all the possible paths between that OD pair.
3. Sort those paths by length.
4. Start with the shortest path and assign trips to it until its capacity is reached.
5. Then take the second-shortest path and assign trips to it, and so on.
The problem I am facing is at the 2nd step. I am using all_simple_paths() to get all possible paths between two nodes, but it is taking too long. Here is that line:
paths_between <- all_simple_paths(g_2, from = "240", to = "14")
How can I work around this? Is there an algorithm that I can use to get all possible paths? Any help would be appreciated. Thank you.
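One thing that may be worth trying, assuming routes with many more hops than the shortest one are of no practical interest: the number of simple paths grows very quickly with network size, and all_simple_paths() has a cutoff argument (in reasonably recent igraph versions) that limits the path length in edges. A small sketch on a made-up graph, where g and the vertex names are purely illustrative:

```r
library(igraph)

# Hypothetical toy network with several alternative routes from A to D.
g <- graph_from_literal(A-B, B-C, A-C, C-D, B-D)

# Every simple path from A to D (this is what blows up on large networks).
every_path <- all_simple_paths(g, from = "A", to = "D")

# Only paths with at most 2 edges.
short_paths <- all_simple_paths(g, from = "A", to = "D", cutoff = 2)

# Order the candidate paths by number of hops (edges = vertices - 1).
hops <- vapply(short_paths, length, integer(1)) - 1L
short_paths <- short_paths[order(hops)]
```

Whether a hop limit is acceptable depends on the assignment model; it is only a way to keep the enumeration tractable, not a replacement for a proper k-shortest-paths method.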
I'm trying to run K-means algorithm with predefined centroids. I have had a look at the following posts:
1. R k-means algorithm custom centers
2. Set static centers for kmeans in R
However, every time I run the command:
km = kmeans(df_std[,c(10:13)], centers = centroids)
I get the following error:
**Error: empty cluster: try a better set of initial centers**
I have defined the centroids as:
centroids = matrix(c(140.12774, 258.62615, 239.36800, 77.43235,
33.37736, 58.73077, 68.80000, 12.11765,
0.8937264, 0.8118462, 0.8380000, 0.8052941,
11.989858, 12.000000, 8.970000, 1.588235),
ncol = 4, byrow = T)
My data is a subset of a data frame, say df_std, which has already been scaled:
df_std[,c(10:13)]
I'm wondering why the system gives the above error.
Any help on this would be highly appreciated!
Use a nearest neighbor classifier using the centers only, do not recluster.
That means every point is labeled just as the nearest center. This is similar to k-means but you do not change the centers, you do not need to iterate, and every new data point can be processed independently and in any order. No problem arises when processing just a single point at a time (in your case, k-means failed because one cluster became empty!)
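A minimal base-R sketch of that nearest-center labelling, assuming the data and the centers are on the same scale. The function name assign_to_nearest and the toy points are made up for illustration:

```r
# Label each row of x with the index of the closest row of centers
# (squared Euclidean distance). Centers are never updated.
assign_to_nearest <- function(x, centers) {
  x <- as.matrix(x)
  centers <- as.matrix(centers)
  # d2[i, j] = squared distance from point i to center j
  d2 <- sapply(seq_len(nrow(centers)), function(j) {
    rowSums(sweep(x, 2, centers[j, ])^2)
  })
  d2 <- matrix(d2, nrow = nrow(x))
  # max.col on the negated distances picks the column with the minimum
  max.col(-d2, ties.method = "first")
}

# Hypothetical example: two fixed centers, three points.
centers <- rbind(c(0, 0), c(10, 10))
points  <- rbind(c(0, 0), c(10, 10), c(0.5, 0.2))
labels  <- assign_to_nearest(points, centers)
```

Because nothing is re-estimated, a center that attracts no points simply gets no labels instead of raising an error.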
While browsing for the specific error that I posted above:
Error: empty cluster: try a better set of initial centers
I found the following link to a conversation:
http://r.789695.n4.nabble.com/Empty-clusters-in-k-means-possible-solution-td4667114.html
Broadly speaking, the above error is generated when the centroids don't match with the data.
It can happen when k is a number: due to the random starts of the k-means algorithm, there is a possibility that the centres do not match the data.
It may also happen when k represents the centroids (my case). The problem was that my data was scaled but my centroids were unscaled.
The above shared link made me realise that there is a bug in my code. Hope it will help someone in a similar situation as mine!
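To illustrate the scaling mismatch, here is a sketch that assumes the data were standardised with scale(), which stores the column means and standard deviations as the attributes "scaled:center" and "scaled:scale". The toy data and centroid values are invented, not the ones from my question:

```r
# Hypothetical raw data: two well-separated groups in two variables.
raw <- cbind(
  a = c(seq(95, 105, length.out = 20), seq(245, 255, length.out = 20)),
  b = c(seq(28, 32, length.out = 20), seq(58, 62, length.out = 20))
)
scaled <- scale(raw)

# Centroids expressed in the ORIGINAL (unscaled) units.
raw_centroids <- rbind(c(100, 30), c(250, 60))

# Using them directly against scaled data should reproduce the
# empty-cluster problem, so the call is wrapped in tryCatch().
unscaled_attempt <- tryCatch(
  kmeans(scaled, centers = raw_centroids),
  error = function(e) conditionMessage(e)
)

# Apply the same centering and scaling to the centroids as to the data.
scaled_centroids <- scale(
  raw_centroids,
  center = attr(scaled, "scaled:center"),
  scale  = attr(scaled, "scaled:scale")
)
km <- kmeans(scaled, centers = scaled_centroids)
```

If the data were scaled in some other way, the centroids need that same transformation instead.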
In R I'm trying to map all Madrid tube stations using igraph and then calculate the shortest route between two stations (just the number of stations, not the distance). I'm following this syntax: "An undirected graph with two vertices called ‘A’ and ‘B’ and one edge only:
graph.formula(A-B)"
Below I just copy two tube lines for clarity's sake.
library("igraph")
metro<- graph.formula(PinardeChamartin-Bambu-Chamartin-PlazadeCastilla-Valdeacederas-Tetuan-Estrecho-Alvarado-CuatroCaminos-RiosRosas-Iglesia-Bilbao-Tribunal-GranVia-Sol-TirsodeMolina-AntonMartin-Atocha-AtochaRenfe-MenendezPelayo-Pacifico-PuentedeVallecas-NuevaNumancia-Portazgo,LasRosas-AvenidadeGuadalajara-Alsacia-LaAlmudena-LaElipa-Ventas-ManuelBecerra-Goya-PrincipedeVergara-Retiro-BancodeEspana-Sevilla-Sol-Opera-SantoDomingo-Noviciado-SanBernardo-Quevedo-Canal-CuatroCaminos)
sp <- get.shortest.paths(metro,from="Canal",to="Chamartin")
V(metro)[sp[[1]]]
It seems to work, but I have two questions:
1. How can I input the tube stations (nodes) and their relationships A-B for long lists into the graph more efficiently, reading a csv for instance?
2. How can I rename those nodes to include tildes, spaces and "ñ"? I tried putting double quotes before and after each node's name, but I get an error: a + sign. I have checked the long string many times and I cannot see the error; no parentheses are missing.
Sorry if they're very basic questions. I'm a very novice user.
Thank you very much
For the first question, see ?graph.data.frame and ?read.csv.
I am not quite sure what you are asking in the second question. What is the error you are getting? Your code works fine for me, with the modification required for igraph 0.7.x:
V(metro)[sp$vpath[[1]]]
# Vertex sequence:
# [1] "Canal" "CuatroCaminos" "Alvarado" "Estrecho"
# [5] "Tetuan" "Valdeacederas" "PlazadeCastilla" "Chamartin"
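As a rough sketch of the data-frame route for the first question: the CSV content below is an invented fragment of the two lines (inlined as text so the example is self-contained; with a real file you would pass its path to read.csv() instead). graph_from_data_frame() is the newer name for graph.data.frame(). Since the station names are ordinary strings here, spaces, accents and "ñ" should be fine, assuming the file is read as UTF-8:

```r
library(igraph)

# Hypothetical edge list: one row per pair of adjacent stations.
csv_text <- "from,to
Chamartín,Plaza de Castilla
Plaza de Castilla,Valdeacederas
Valdeacederas,Tetuán
Tetuán,Estrecho
Estrecho,Alvarado
Alvarado,Cuatro Caminos
Quevedo,Canal
Canal,Cuatro Caminos"

edges <- read.csv(text = csv_text, stringsAsFactors = FALSE, encoding = "UTF-8")

# The first two columns are taken as the endpoints of each edge.
metro <- graph_from_data_frame(edges, directed = FALSE)

# Fewest-stations route between two named stations.
sp <- shortest_paths(metro, from = "Canal", to = "Chamartín")
route <- as_ids(sp$vpath[[1]])
```

Here route holds the station names in travel order, so length(route) - 1 is the number of hops.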
I’m looking for a data structure that would help me find the smallest interval (the (low, high) pair) that encloses a given point. Intervals may nest properly. For example:
Looking for point 3 in (2,7), (2,3), (4,5), (8,12), (9,10) should yield (2,3).
During the construction of the data structure, intervals are added in no particular order and, specifically, not according to their nesting. Is there a good way to map this problem to a search tree data structure?
A segment tree should do the job. In each node of the segment tree you keep the length of the shortest interval that covers this node, as well as a reference to the interval itself. When processing a query for a given point, you walk from the root to the leaf for that point and return the shortest interval referenced along that path (or just the leaf's entry, if you push the minima down to the leaves).
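A sketch of this idea, under a few assumptions of mine: intervals are closed (so 3 lies in (2,3), as in the question), all intervals are known before the tree is built, and the tree is an array-based, bottom-up segment tree over the sorted endpoints and the gaps between them. The name build_interval_index is made up:

```r
build_interval_index <- function(lows, highs) {
  stopifnot(length(lows) == length(highs), all(lows <= highs))
  pts <- sort(unique(c(lows, highs)))
  # Slots (0-based): endpoint 1, gap, endpoint 2, gap, ..., endpoint n
  m <- 2L * length(pts) - 1L
  len <- highs - lows
  # best[node] = index of the shortest interval stored at that tree node
  best <- rep(NA_integer_, 2L * m)
  shorter <- function(a, b) if (is.na(a) || len[b] < len[a]) b else a

  for (i in seq_along(lows)) {
    l <- 2L * match(lows[i], pts) - 2L + m        # first leaf covered
    r <- 2L * match(highs[i], pts) - 2L + m + 1L  # one past the last leaf
    while (l < r) {
      if (l %% 2L == 1L) { best[l] <- shorter(best[l], i); l <- l + 1L }
      if (r %% 2L == 1L) { r <- r - 1L; best[r] <- shorter(best[r], i) }
      l <- l %/% 2L
      r <- r %/% 2L
    }
  }

  # Returned query function: smallest enclosing interval, or NULL if none.
  function(x) {
    k <- findInterval(x, pts)  # number of endpoints <= x
    if (k == 0L || x > pts[length(pts)]) return(NULL)
    slot <- if (pts[k] == x) 2L * k - 2L else 2L * k - 1L
    node <- slot + m
    ans <- NA_integer_
    while (node >= 1L) {       # walk from the leaf up to the root
      if (!is.na(best[node])) ans <- shorter(ans, best[node])
      node <- node %/% 2L
    }
    if (is.na(ans)) NULL else c(low = lows[ans], high = highs[ans])
  }
}

# The example from the question.
smallest_enclosing <- build_interval_index(
  lows  = c(2, 2, 4, 8, 9),
  highs = c(7, 3, 5, 12, 10)
)
hit <- smallest_enclosing(3)
```

Each insertion touches O(log n) nodes and each query walks one leaf-to-root path. Supporting insertions after construction would need a different layout (for example, a balanced interval tree), which this sketch does not attempt.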
I have data representing the paths people take across a fixed set of points (discrete, e.g., nodes and edges). So far I have been using igraph.
I haven't found a good way yet (in igraph or another package) to create canonical paths summarizing what significant sub-groups of respondents are doing.
A canonical path can be operationalized in any reasonable way and is just meant to represent a typical path or sub-path for a significant portion of the population.
Does there already exist a function to create these within igraph or another package?
One option: represent each person's movement as a directed edge. Create an aggregate graph such that each edge has a weight corresponding to the number of times that edge occurred. Those edges with large weights will be "typical" 1-paths.
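A base-R sketch of that aggregation, assuming each respondent's path is stored as a character vector of node names in visiting order. The paths list is invented toy data:

```r
# Hypothetical observed paths, one vector per respondent.
paths <- list(
  c("a", "b", "c"),
  c("a", "b", "d"),
  c("b", "c"),
  c("a", "b", "c")
)

# Break every path into its consecutive (from, to) steps.
steps <- do.call(rbind, lapply(paths, function(p) {
  if (length(p) < 2) return(NULL)
  data.frame(from = head(p, -1), to = tail(p, -1), stringsAsFactors = FALSE)
}))

# Count how often each directed step occurs.
steps$weight <- 1L
edge_weights <- aggregate(weight ~ from + to, data = steps, FUN = sum)
edge_weights <- edge_weights[order(-edge_weights$weight), ]
```

The resulting data frame has from, to and weight columns, so it could be handed to igraph's graph_from_data_frame() to build the weighted aggregate graph.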
Of course, it gets more interesting to find common k-paths or explore how paths vary among individuals. The naive approach for 2-paths would be to create N additional nodes that correspond to nodes when visited in the middle of the 2-path. For example, if you have nodes a_1, ..., a_N you would create nodes b_1, ..., b_N. The aggregate network might have an edge (a_3, b_5, 10) and an edge (b_5, a_7, 10); this would represent the two-path (a_3, b_5, a_7) occurring 10 times. The task you're interested in corresponds to finding those two-paths with large weights.
Both the igraph and network packages would suffice for this sort of analysis.
If you have some bound on k (e.g., nothing longer than a 6-path occurs in your dataset), I might also suggest enumerating all the paths that are taken and computing a histogram over the unique paths. I don't know of any functions that do this automagically for you.
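The histogram idea can be sketched in base R as follows, again assuming each path is a character vector of node names. count_subpaths is a hypothetical helper, and k counts edges, so a k-path has k + 1 nodes:

```r
# Count every contiguous sub-path with k edges across all observed paths.
count_subpaths <- function(paths, k) {
  keys <- unlist(lapply(paths, function(p) {
    n <- length(p) - k  # number of k-edge windows in this path
    if (n < 1) return(character(0))
    vapply(seq_len(n), function(i) {
      paste(p[i:(i + k)], collapse = " > ")
    }, character(1))
  }))
  sort(table(keys), decreasing = TRUE)
}

# Hypothetical observed paths, one vector per respondent.
paths <- list(
  c("a", "b", "c"),
  c("a", "b", "d"),
  c("b", "c"),
  c("a", "b", "c")
)
two_paths <- count_subpaths(paths, k = 2)
```

The most frequent entries of two_paths are candidates for "typical" 2-paths; running the helper for each k up to the bound gives one histogram per path length.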