Is the problem described below related to the travelling salesman problem? (graph)

I came across this problem, and the first thing that comes to mind is to use TSP.
A person is visiting a new country which has several states. Each state has cities connected via bidirectional roads.
The country is divided into states such that for any two cities A and B, if it is possible to go from A to B by road and then return to A, then A and B belong to the same state.
Furthermore, every time the person enters a new state, they need to pay a cost of $1. Travelling on roads that belong to the same state costs nothing.
Given an image where cities are represented by colored dots and roads by straight lines, what is the minimum cost a person needs to pay so that they visit every city in every state?
For the image part, I think we can convert it to a graph by using some online library (recommendations on how to do this would be appreciated).
Also, if anyone could give me some ideas/suggestions on how to go about solving the problem, or if they have seen something similar, it would be appreciated.
Enclosed are some images that illustrate the graph.
I also tried using OpenCV flood fill to compute the result, as mentioned in the comments, but it seems I am getting an incorrect result.
import cv2
import numpy as np

img = cv2.imread('graph1.png', cv2.IMREAD_GRAYSCALE)
M, N = img.shape
n_objects = 0
for i in range(M):
    for j in range(N):
        if img[i, j] == 255:
            n_objects += 1
            # Relabel the whole white component so it is counted only once;
            # note that floodFill takes the seed point as (x, y) = (j, i).
            cv2.floodFill(img, None, (j, i), n_objects)
print(n_objects)
For the first image the expected output is 6, but this returns 3. Any ideas what can be done to improve the result?

The entire prose about cities etc. is just a red herring. It's not even a graph problem.
It's a basic flooding algorithm on the image which you need:
N = 0
While any pixel != White
    Flood with white from pixel
    N++
If N > 1
    Return N - 1
Else
    Return 0
Effectively you only count the number of connected regions which don't correspond to the background color.
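A minimal Python version of this loop (a sketch, assuming a grayscale image with a white background; the file name is taken from the question):

import cv2
import numpy as np

img = cv2.imread('graph1.png', cv2.IMREAD_GRAYSCALE)
n = 0
ys, xs = np.where(img != 255)  # all remaining non-background pixels
while len(xs) > 0:
    # Flood the component containing the first remaining non-white pixel;
    # floodFill takes the seed as (x, y).
    cv2.floodFill(img, None, (int(xs[0]), int(ys[0])), 255)
    n += 1
    ys, xs = np.where(img != 255)
print(max(n - 1, 0))  # you pay $1 for each state entered after the first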

Related

A large amount of points to create separate polygons (ArcGIS/QGIS)

Visual example of the data
I used a drone to create a DOF of a small area. During the flight, it takes a photo every 20-ish seconds (40-ish meters of flight). I created a CSV file, which I converted to a point shapefile. In total, I flew 10 so-called "missions" with the drone, each with 100-200 points which are "shaped" as squares on the map. What I want now is to create a polygon shapefile from the point shapefile.
Because those points sometimes overlap, I cannot use the "Aggregate Points" task, as it is only distance-based. I want to create the polygons automatically, using some kind of script. What could help is the fact that the maximum time between two points (i.e. photos taken) within a mission is 10-20 seconds, so if the time gap is over 3 minutes, it's another "mission". Can you help with such a script, one that would quickly and automatically create as many polygons as there are missions?
Okay, I think I understand what you are trying to accomplish. Since no one has replied, I am going to give it a quick shot so you have something to try.
I think the best strategy would be to:
Clustering algorithm: Try running a clustering algorithm such as DBSCAN on the timestamp dimension to classify the points into time-based groups, instead of distance-based ones (since, as you said, distance-based separation is not enough to properly identify and separate the points). After that, all the points should be classified into different groups, with a group-id column. The maximum-distance parameter of the algorithm should be around 20 seconds, or even a minute (since you said the missions are separated by at least about 3 minutes).
Feature-based points to polygon: At that point, you run a generic Polygon_from_points(...) function that transforms the clustered points into polygon shapes based on a specific discriminant feature (which in your case is the group id).
How does this work?: This properly separates the groups first (time-based), and then you should be able to find a generic point-to-polygon tool driven by a feature (ArcGIS should have some).
I don't have an example dataset, nor any production code, but based on what you described I think it would work; a rough sketch follows. Hope it helps.
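A minimal sketch of those two steps in Python (the column names timestamp, x, y, the eps of 60 seconds, and the convex hull as the point-to-polygon step are all assumptions, not taken from the original data):

import pandas as pd
from sklearn.cluster import DBSCAN
from shapely.geometry import MultiPoint

# Hypothetical input: one row per photo with a timestamp and coordinates.
df = pd.read_csv("missions.csv")  # assumed columns: timestamp, x, y
df["t"] = pd.to_datetime(df["timestamp"]).astype("int64") // 10**9  # seconds

# 1-D DBSCAN on time: eps=60 s keeps the 10-20 s photo intervals together,
# while the >3 min gaps between missions break the clusters apart.
df["mission"] = DBSCAN(eps=60, min_samples=3).fit_predict(df[["t"]])

# One polygon (here: convex hull) per mission; label -1 is DBSCAN noise.
polygons = {
    mission: MultiPoint(list(zip(group["x"], group["y"]))).convex_hull
    for mission, group in df.groupby("mission")
    if mission != -1
}
print(len(polygons), "mission polygons")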

Determine minimum number of "invisible squares" to cover all markers on a map?

I have a map full of markers corresponding to GPS coordinates, represented as a PostgreSQL + PostGIS database table using "geography" type for the GPS column.
Imagine, if you will, one semi-transparent 1x1 mile square on top of each of these points, centred on the point, often intersecting with the others.
I'm trying to determine the minimum number of such "squares" and their GPS coordinates, so that they "cover" all of the markers with a minimum of 25 meters to the nearest border.
If it makes it any easier, the positions of these "squares" don't have to match the positions of any of the markers.
The purpose of this is to attempt to cut down the number of API requests to a "houses for sale" service significantly, since most of the positions are close to each other and the API takes a 1x1 mile square "bounding box" as the input for each call. It would be insanely wasteful to call the API many times for basically the same area when maybe 1 or 2 times would do it if I can first figure out where these imaginary "squares" go.
I get the feeling that this is considered a "known, common and solved" problem (it looks like a geometric covering problem), but so far I've not been able to figure out how to do it beyond the naive grid-snapping upper bound sketched below.
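For concreteness, a grid-snapping cover gives an upper bound rather than the true minimum: shrink each square by the 25 m margin, snap every point to a grid of that effective size, and place one square per occupied cell. A minimal Python sketch, assuming coordinates already projected to meters (e.g. via PostGIS):

# Naive grid cover: an upper bound on the number of squares, not a minimum.
MILE = 1609.344
MARGIN = 25.0
CELL = MILE - 2 * MARGIN  # usable width once the 25 m margin is enforced

def cover(points):
    # Snap each point to a CELL x CELL grid; one square per occupied cell.
    cells = {(int(x // CELL), int(y // CELL)) for x, y in points}
    # The centre of each occupied cell becomes the centre of one square.
    return [((cx + 0.5) * CELL, (cy + 0.5) * CELL) for cx, cy in cells]

squares = cover([(100.0, 200.0), (500.0, 250.0), (3000.0, 4000.0)])
print(len(squares), "squares (upper bound)")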
Sorry, but it seems like you have no idea what you are doing and are just being rude, both here and on the PostGIS IRC channel.
You give no information about your API.
What is creating your maps?
Is it a WMS service or what?
What most people would do is set up a map service with a tile cache. Then the map service will pick the tiles needed for each house you want to show (or multiple houses).
The tiles will either be prepared in advance or created on the fly, but they will be cached for next time.
So, I think you should read up on things like
MapServer
Mapnik
MapCache
Mapproxy
GeoServer
That is not a complete list, but it might give you some ideas about what it is that you want.

What direction do the DICOM instance numbers grow along?

By direction I mean, for example, from the patient's head to feet or from feet to head. The chest CT scans I have seen so far indicate that the Instance Number 1 slice is usually the topmost one in the body, but I don't know whether this is part of the standard, or whether there are other tags I should inspect to determine it.
There is no rule in DICOM that requires the Instance Number to be related to the slice position in a particular way. The link from Bartloimiej shows that there is a rule for how the slice coordinates defined by Image Position (Patient) (0020,0032) and Image Orientation (Patient) (0020,0037) relate to directions in the patient's body (head, feet, etc.).
So if you want to apply spatial ordering, these attributes are what you want to use. Slice Location (0020,1041) will not help you either:
C.7.6.2.1.2 [...] This information is relative to an unspecified implementation specific reference point.
For original CT slices (i.e. Image Type (0008,0008) is ORIGINAL\PRIMARY\...), it is quite safe to assume that some growth in the z-direction is always present in a volumetric dataset. But for MRI or for reconstructed CT slices (MPR), you may find datasets in which the slices are parallel to the xz or yz plane. If your application is supposed to handle such images, make sure to avoid division by zero...
Yes, the standard defines it. DICOM PS3.3, part C.7.6.2:
The direction of the axes is defined fully by the patient's orientation.
If Anatomical Orientation Type (0010,2210) is absent or has a value of BIPED, the x-axis is increasing to the left hand side of the patient. The y-axis is increasing to the posterior side of the patient. The z-axis is increasing toward the head of the patient.
There is also a tag (0020,0037), Image Orientation (Patient), which relates the actual orientation of the image rows and columns to that patient coordinate frame. In trunk CT it is almost always 1\0\0\0\1\0 (no rotation) and you don't need to deal with it. Otherwise, see the comments under the link above.
You are correct. The chest CT series are sorted from head to feet. The slice closest to the head should have the lowest Instance Number.
I don't know if this is defined by the DICOM standard or not, but I have seen a lot of DICOM images and the convention is this:
AXIAL - sorted by Z axis high to low (head to feet)
CORONAL - sorted by Y axis high to low (back to front)
SAGITTAL - sorted by X axis low to high (right to left)
Notice in all cases, the first slice in the series will be farthest from the observer.
If you need to generate Instance Numbers, you should sort the images by the dot product of Image Position (Patient) and (1,-1,-1), from low to high. In the rare degenerate case where all dot products are the same, pick another direction to sort by; (0,-1,-1) would probably be a good choice.
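A minimal sketch of that sorting rule using pydicom (the file list is a placeholder; it assumes Image Position (Patient) is present in every file):

import pydicom

files = ["slice1.dcm", "slice2.dcm", "slice3.dcm"]  # placeholder paths
datasets = [pydicom.dcmread(f) for f in files]

def sort_key(ds, direction=(1, -1, -1)):
    # Dot product of Image Position (Patient) with the chosen direction;
    # (1, -1, -1) reproduces the head-to-feet convention described above.
    pos = [float(v) for v in ds.ImagePositionPatient]
    return sum(p * d for p, d in zip(pos, direction))

datasets.sort(key=sort_key)
for number, ds in enumerate(datasets, start=1):
    ds.InstanceNumber = number  # regenerate Instance Number in sorted order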
EDIT: I just discussed this with a friend who is more experienced. He said it varies: some departments prefer back-to-front order, some prefer front-to-back. Also, some DICOM viewers give users the choice of how the slices are sorted (by Instance Number, Slice Location, IPP, Content Time, etc.).

Feature engineering of X,Y coordinates in neighborhoods of San Francisco

I am participating in a starter Kaggle competition (Crimes in San Francisco) in which I want to predict the category of a crime using a bunch of predictor variables, including the X and Y coordinates of the crime. As I doubt the predictive power of the raw coordinates, I want to transform these variables into something more relevant to the crime category.
So I am thinking that if I had the neighbourhood of San Francisco in which the crime took place, it would be more informative than the actual coordinates of the crime. I can find the neighbourhoods online, but of course I can't simply use each neighbourhood's borders to classify the corresponding crime, because their shapes are not rectangular or anything like that.
Does anyone have any idea how I could solve this one?
Thanks guys
Well, that's interesting, AntoniosK, and it's getting close to what I want to accomplish. The problem is that information like "south-east and 2 km from the city center" can map to more than one neighbourhood.
I still think the partition of the city into neighbourhoods is valuable, because the socio-economic and structural differences between them (there is a reason why the neighbourhoods of each city are separated as they are, right?) can lead to a higher probability for one crime category and a lower one for another.
That said, your idea got me thinking of using the south-east etc. mapping and then using the angle between the segment (point to city center) and the x-axis to map the point to the appropriate neighbourhood. I am on it right now. Thanks
After some time on the problem, I found that the procedure I want to perform is called "reverse geocoding". It also turns out that there are some APIs for this. The best, in my opinion, is the revgeocode() function contained in the ggmap package (Google's edition). This one, though, has a limit of 2500 queries per day unless you pay for more.
The one that I turned to instead is the geonames package and its GNneighbourhood function, which turns coordinates into neighbourhoods. It is free, though I have experienced some errors (keep in mind that this one works only for US and Canadian cities).
revgeocode function - ggmap package
GNneighbourhood - geonames package
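If the neighbourhood borders are available as polygons (e.g. a shapefile), a plain point-in-polygon lookup is another route that avoids query limits entirely. A minimal Python sketch with geopandas; the file name and the "name" column are assumptions:

import geopandas as gpd
from shapely.geometry import Point

# Assumed polygon layer with one row per neighbourhood and a "name" column.
hoods = gpd.read_file("sf_neighborhoods.shp")

def neighbourhood_of(lon, lat):
    # Return the neighbourhood containing the point, or None if outside all.
    matches = hoods[hoods.contains(Point(lon, lat))]
    return matches.iloc[0]["name"] if not matches.empty else None

print(neighbourhood_of(-122.4194, 37.7749))  # roughly central San Francisco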

Finding shortest paths to all desired points in a graph

I was asked the following question in an interview
You are given a 4 X 4 grid. Some locations on the grid contain
treasure. Your task is to visit all the locations that contain
treasure and collect it. You are allowed to move to the four adjacent
cells (up, down, left, right). Each movement and the action of
"treasure collection" has a single unit cost. You need to traverse
the grid and collect all the treasure on it, minimizing the total
cost.
If I recall correctly, here is the sample grid that was given:
U..X
..X.
X..X
..X.
where U is my current position and each X marks a treasure.
The solution I presented was to use breadth-first search, traversing the grid and "collecting the treasure" along the way. However, the interviewer insisted that there was a better way to minimize the cost. I hope you can help me figure it out.
You should have recognized that this is the Traveling Salesman Problem in a small disguise. Using breadth-first search, you can determine the shortest path between each pair of vertices you have to visit, which gives you a derived graph containing just those paths as weighted edges between the vertices. From there on, it's a classic TSP.
BFS alone is not going to solve it for you, because you cannot move in all directions at the same time. It is not a single-source shortest-path problem, because once you collect a treasure, you start your path to the next one from your current spot, not from the original spot.
The cost of collecting all the treasure depends on the order in which you visit the cells marked X. Since there are only five of them, you can try all 5! = 120 orderings, compute the cost of each, and pick the best one.
Note that once the order is fixed, the cost is trivial to compute: add up the Manhattan distances between consecutive cells in that order. Do this for every ordering and pick the smallest result.
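A minimal sketch of that brute force for the sample grid (Manhattan distance is valid as the pairwise cost here because the grid has no obstacles):

from itertools import permutations

start = (0, 0)  # U in the sample grid
treasures = [(0, 3), (1, 2), (2, 0), (2, 3), (3, 2)]  # the five X cells

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

# Try all 5! = 120 visiting orders and keep the cheapest walking cost.
best = min(
    sum(manhattan(p, q) for p, q in zip((start,) + order, order))
    for order in permutations(treasures)
)
# Each "collect" action costs one unit as well, per the problem statement.
print(best + len(treasures))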
