Feature engineering of X,Y coordinates in neighborhoods of San Francisco - r

I am participating in a starter Kaggle competition (Crimes in San Francisco) in which I want to predict the category of a crime from a set of predictor variables, including the X and Y coordinates of the crime. Since I doubt the predictive power of the raw coordinates, I want to transform them into something more relevant to the crime category.
My thinking is that the San Francisco neighbourhood in which the crime took place would be more informative than the actual coordinates. I can find the neighbourhoods online, but of course I can't simply use the borders of each neighbourhood to classify a crime, because their shapes are not rectangular or anything like that.
Does anyone have any idea about how I could solve this one?
Thanks guys
Well, that's interesting, AntoniosK, and it's getting close to what I want to accomplish. The problem is that a description such as "south-east and 2 km from the city center" can match more than one neighborhood.
I still think that partitioning the city into neighborhoods is valuable, because the socio-economic and structural differences between them (there is a reason why the neighborhoods of each city are separated as they are, right?) can lead to a higher probability for one crime category and a lower one for another.
That said, your idea got me thinking about using the south-east etc. mapping and then using the angle between the segment (point to city center) and the x axis to map the point to the appropriate neighborhood. I am on it right now. Thanks
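
For what it's worth, the distance-and-angle idea above could be sketched roughly like this in base R. This is only an illustration, not the poster's actual code: the city-centre coordinates, the function name polar_features and the flat-earth scaling are all assumptions, and it assumes X is longitude and Y is latitude.

```r
# Hypothetical city-centre reference point (lon, lat); choose whatever suits the data.
centre <- c(lon = -122.4194, lat = 37.7749)

# Hypothetical helper: distance (km) and angle (degrees) of each point
# relative to the centre, using a local flat-earth approximation.
polar_features <- function(lon, lat, centre) {
  km_per_deg_lat <- 111.32
  km_per_deg_lon <- 111.32 * cos(centre[["lat"]] * pi / 180)
  dx <- (lon - centre[["lon"]]) * km_per_deg_lon  # east-west offset in km
  dy <- (lat - centre[["lat"]]) * km_per_deg_lat  # north-south offset in km
  data.frame(
    dist_km = sqrt(dx^2 + dy^2),
    # Angle measured counter-clockwise from the positive x axis (east), in [0, 360).
    angle_deg = (atan2(dy, dx) * 180 / pi) %% 360
  )
}

# Three made-up points: due north, due east and due south of the centre.
polar_features(
  lon = c(-122.4194, -122.4094, -122.4194),
  lat = c(37.7849, 37.7749, 37.7649),
  centre = centre
)
```

The two columns can be fed to a model directly or binned into sectors, but as noted above a sector plus a distance band can still straddle several neighborhoods.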

After some time on the problem I found that the procedure I want to perform is called "reverse geocoding". It also turns out that there are some APIs that do this. The best, in my opinion, is the revgeocode() function in the ggmap package (which uses Google's service). It has a limit of 2,500 queries per day, though, unless you pay for more.
The one I actually turned to is the GNneighbourhood function in the geonames package, which turns coordinates into neighbourhoods. It is free, though I have experienced some errors (keep in mind that it only covers US and Canadian cities).
revgeocode function - ggmap package
GNneighbourhood function - geonames package

Related

Voronoi approach to making districts while also using a multilinestring

I have a spatial object with multiple points (buildings) on a map. What I want to do is divide an area based on the nearest point. The calculated Voronoi diagram looks quite promising, but it has some strange aspects if you know the "real world". For example, a small part of a district ends up on the other side of a river, simply because it is close (surprise).
What I want to do is combine this with a multilinestring that contains rivers and railroads, and either end the district at such a line OR add a penalty for 'crossing' it. Is anyone able to shed light on the problem, or possibly suggest an alternative Voronoi method that works?

Compare Survey Results Across Regions

I have results from a survey of nurse practitioners asking to what degree (Likert scale, values from 1 to 5) they feel certain barriers prevent them from practicing adequately (e.g. time constraints, location restrictions, etc.). They were also asked to indicate where in the state they practice (fill in a bubble). I was wondering if there is a way to draw a map of a U.S. state (say Texas) and superimpose the survey results onto the map by region?
For example: say one nurse indicated a 1 for feeling time constrained, and she was from the Southern region of Texas. Then I would like to show that, out of a sample of say 100, the 1% who responded with a value of 1 came from the Southern region, and have that appear on a map of Texas. Does that make sense?

The unit of area in R package {UScensus2010}

I am using the {UScensus2010} package in R and trying to get the area of each county. I found the areaPoly() function in the package. Does anyone know the unit of the area? Is it square miles?
Thank you.
Assuming you are using US Census data, this is from the explanation of the dataset that UScensus2010 links to:
Land area measurement in square meters. The accuracy of the area
measurement is limited by the inaccuracy inherent in the mapping of
the various boundary features in the Census Bureau’s geographic
database. Land area includes areas classified as intermittent water,
swamps, and glaciers, which appear on census maps and in the Census
Bureau’s geographic database as hydrographic features. Square miles
can be derived by dividing square meters by 2,589,988. See Appendix A,
“Geographic Terms and Concepts,” for definition of this field.
http://www.census.gov/prod/cen2010/doc/sf1.pdf
If you are still unsure, pick your home county and check its area against the figure given by Wikipedia or the official county website.
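
The conversion quoted above is a one-liner in R. A minimal sketch, assuming the area really is in square meters as the documentation says; the function name and the example area are made up for illustration.

```r
# Conversion factor quoted in the Census documentation above.
SQ_M_PER_SQ_MI <- 2589988

# Hypothetical helper: convert an area in square meters to square miles.
sq_m_to_sq_mi <- function(area_sq_m) {
  area_sq_m / SQ_M_PER_SQ_MI
}

# Made-up example area, not a real county.
example_area_sq_m <- 1.5e9
sq_m_to_sq_mi(example_area_sq_m)
```

If the converted value for a known county is close to the published figure in square miles, the unit is confirmed.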

How to detect "MINIMAL" cycles in a graph

I have a question regarding graphs.
I have a street network (imagine Manhattan, NY as an example, but it could be any street network) represented as a graph (where junctions are represented as nodes and the streets are links between the nodes).
The problem now is that I somehow have to get the "city blocks" (think of the blocks in Manhattan for instance), i.e. the set of arcs that define a city block. I thought of cycle detection algorithms but that obviously won't give me the real blocks only, but also all the other cycles that I don't really need. Of course I could filter those out probably quite easily but this can't be the real solution.
Do you have any (simple) idea how I can get the "real" city block from a graph?
Thanks in advance!

Convert lat/lon to zipcode / neighborhood name

I have a large collection of pictures with GPS locations, encoded as lat/lon coordinates, mostly in Los Angeles. I would like to convert these to (1) zipcodes, and (2) neighborhood names. Are there any free web services or databases to do so?
The best I can come up with so far is to scrape the neighborhood polygons from the LA Times page and work out which polygon each coordinate falls in. However, this might be quite a lot of work, and not all of my coordinates are in LA. As for the zipcodes, this 2004 database is the best I can find, but each zipcode is encoded as a single coordinate instead of a polygon. So the best I can do is find the zipcode coordinate at minimum distance from a given coordinate, which is not optimal.
I was under the impression that Google Maps or OpenStreetMap should be able to do this (as they seem to 'know' exactly where every neighborhood and zipcode is), but I cannot find any APIs to do the lookups / queries.
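
For reference, the minimum-distance fallback described in the question could look something like the base R sketch below. Everything here is hypothetical: the function names, the zip labels and the centroid coordinates are placeholders, not real data, and great-circle (haversine) distance is just one reasonable choice of metric.

```r
# Great-circle distance in km between (lon1, lat1) and (lon2, lat2), in degrees.
haversine_km <- function(lon1, lat1, lon2, lat2) {
  r <- 6371  # mean Earth radius in km
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Placeholder centroid table; real rows would come from a zipcode database.
zip_centroids <- data.frame(
  zip = c("zip_A", "zip_B", "zip_C"),
  lon = c(-118.325, -118.240, -118.493),
  lat = c(34.100, 34.061, 34.017),
  stringsAsFactors = FALSE
)

# Return the label of the centroid closest to a single point.
nearest_zip <- function(lon, lat, centroids) {
  d <- haversine_km(lon, lat, centroids$lon, centroids$lat)
  centroids$zip[which.min(d)]
}

nearest_zip(-118.33, 34.11, zip_centroids)
```

As the question says, this is only an approximation: a point can be closest to one centroid while lying inside a neighbouring zipcode's boundary.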
You can now do this directly within R itself thanks to the rather awesome ggmap package.
Like others mention, you'll be reverse geocoding using the google maps API (and therefore limited to 2,500 queries daily), but it's as simple as:
library("ggmap")
# generate a single example address
lonlat_sample <- as.numeric(geocode("the hollywood bowl"))
lonlat_sample # note the order is longitude, latitude
res <- revgeocode(lonlat_sample, output="more")
# can then access zip and neighborhood where populated
res$postal_code
res$neighborhood
Use Reverse Geocoding to convert your lat/lon to addresses. It has some limit on the number of queries per day though.
Here is a nice blog post with examples of how to geocode and reverse geocode using Google Maps.
Try this one:
http://www.usnaviguide.com/zip.htm
There is some limit as to how many queries per day you can do on the site, but they also sell the complete database, which changes every few months.
Sorry that I don't know of any free resources.
As others suggested, reverse geocoding the coordinates into street addresses should work fine for zip codes. I am not so sure about neighborhoods, because you may have to check whether the street number is odd or even to see which side of the road the address is on, which can determine the neighborhood.
An alternative is to prepare GIS polygon features (an ESRI shapefile, for example) and test each point against this set of polygons to see which one it intersects.
Zip codes are very straightforward: you can download a shapefile from the Census.
http://www.census.gov/cgi-bin/geo/shapefiles2010/main
Neighborhoods are harder, I'd guess. In another part of the US I had to create my own shapefile by combining definitions from the municipal government, real-estate websites, newspapers, etc., so that it matches what people think the neighborhoods of the city are, without any overlaps or gaps. It can take some time to compose such a set of polygons. You may grab the census "block groups", or even census "blocks", from the page above and merge them.
Once you have prepared the polygon features, there are a couple of GIS tools in different environments (stand-alone executables, GUI programs, C/Python/SQL APIs, and probably R as well) that can do the intersection of polygons and points.
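
To illustrate the point-against-polygon test described above, here is a minimal base R sketch using the standard ray-casting rule. It is not taken from any particular GIS tool: the function names and the two toy polygons are hypothetical, and R's spatial packages (sf, for example) do this properly for real shapefiles.

```r
# Ray casting: count how many polygon edges a horizontal ray from the
# point crosses; an odd count means the point is inside.
point_in_polygon <- function(px, py, poly_x, poly_y) {
  n <- length(poly_x)
  inside <- FALSE
  j <- n
  for (i in seq_len(n)) {
    xi <- poly_x[i]; yi <- poly_y[i]
    xj <- poly_x[j]; yj <- poly_y[j]
    # && short-circuits, so horizontal edges never reach the division.
    crosses <- ((yi > py) != (yj > py)) &&
      (px < (xj - xi) * (py - yi) / (yj - yi) + xi)
    if (crosses) inside <- !inside
    j <- i
  }
  inside
}

# Toy, non-rectangular "neighborhoods" (made-up coordinates).
neighbourhoods <- list(
  l_shaped = list(x = c(0, 4, 4, 2, 2, 0), y = c(0, 0, 2, 2, 4, 4)),
  notch    = list(x = c(2, 4, 4, 2),       y = c(2, 2, 4, 4))
)

# Return the name of the first polygon containing the point, or NA.
assign_neighbourhood <- function(px, py, polygons) {
  for (name in names(polygons)) {
    p <- polygons[[name]]
    if (point_in_polygon(px, py, p$x, p$y)) return(name)
  }
  NA_character_
}

assign_neighbourhood(3, 3, neighbourhoods)
```

Points that fall exactly on a shared border are ambiguous under this rule, which is one reason the polygons should have no overlaps or gaps.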
