I have a dataset consisting of zip codes along with their latitude and longitude. I want to find the list of hospitals/banks within 2 km of each latitude/longitude.
How do I do it?
The lon/lat data look like this:
store_zip lon lat
410710 73.8248981 18.5154681
410209 73.0907 19.0218215
400034 72.8148177 18.9724162
400001 72.836334 18.9385352
400102 72.834424 19.1418961
400066 72.8635299 19.2313448
400078 72.9327444 19.1570343
400078 72.9327444 19.1570343
400007 72.8133825 18.9618411
400050 72.8299518 19.0551695
400062 72.8426858 19.1593396
400083 72.9374227 19.1166191
400603 72.9781047 19.1834148
401107 72.8929 19.2762702
401105 72.8663173 19.3053477
400703 72.9992013 19.0793547
401209 NA NA
401203 72.7983705 19.4166761
400612 73.0287209 19.1799265
400612 73.0287209 19.1799265
400612 73.0287209 19.1799265
If your Points of Interest are unknown and you need to find them, you can use Google's API through my googleway package (as you suggested in the comments). You will need a valid API key for this to work.
As the API can only accept one request at a time, you'll need to iterate over your data one row at a time. For that you can use whatever looping method you're most comfortable with.
library(googleway)  ## using v2.4.0 on CRAN
set_key("your_api_key")

lst <- lapply(1:nrow(df), function(x){
  google_places(search_string = "Hospital",
                location = c(df[x, 'lat'], df[x, 'lon']),
                radius = 2000)
})
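Note: your data contains a row with missing coordinates (store_zip 401209), which would send an invalid request to the API. A minimal guard, assuming you're happy to simply drop such rows before running the loop above:
df <- df[!is.na(df$lat) & !is.na(df$lon), ]  ## drop rows that can't be queried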
lst is now a list that contains the results of the queries. For example, the names of the hospitals it returned for the first row of your data are:
place_name(lst[[1]])
# [1] "Jadhav Hospital"
# [2] "Poona Hospital Medical Store"
# [3] "Sanjeevan Hospital"
# [4] "Suyash Hospital"
# [5] "Mehta Hospital"
# [6] "Deenanath Mangeshkar Hospital"
# [7] "Sushrut Hospital"
# [8] "Deenanath Mangeshkar Hospital and Research Centre"
# [9] "MMF Ratna Memorial Hospital"
# [10] "Maharashtra Medical Foundation's Joshi Multispeciality Hospital"
# [11] "Sahyadri Hospitals"
# [12] "Deendayal Memorial Hospital"
# [13] "Jehangir Specialty Hospital"
# [14] "Global Hospital And Research Institute"
# [15] "Prayag Hospital"
# [16] "Apex Superspeciality Hospital"
# [17] "Deoyani Multi Speciality Hospital"
# [18] "Shashwat Hospital"
# [19] "Deccan Multispeciality Hardikar Hospital"
# [20] "City Hospital"
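If you want everything in a single data frame of store zips and nearby hospital names, one possible sketch (assuming every query in lst returned results):
df_all <- do.call(rbind, lapply(seq_along(lst), function(i){
  data.frame(store_zip = df$store_zip[i],
             hospital  = place_name(lst[[i]]),
             stringsAsFactors = FALSE)
}))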
You can also view them on a map:
set_key("map_api_key", api = "map")

## the lat/lon of the returned results are found through `place_location()`
df_hospitals <- place_location(lst[[1]])
df_hospitals$name <- place_name(lst[[1]])

google_map() %>%
  add_circles(data = df[1, ], radius = 2000) %>%
  add_markers(data = df_hospitals, info_window = "name")
Note:
Google's API is limited to 2,500 queries per key per day, unless you pay for a premium account.
I'm trying to scrape some data from the following Wikipedia table:
Link: https://en.wikipedia.org/wiki/Aire-la-Ville
I am using this code to scrape Area, Elevation, and Density using CSS selectors. I am storing the data in canton_table, but I am only getting the elevation data and not the other variables.
My code:
# Get labels and data
labels <- current_html %>% html_elements(css = ".infobox-label") %>% html_text()
data <- current_html %>% html_elements(css = ".infobox-data") %>% html_text()
Output for labels and data variables:
> labels
[1] "Country" "Canton" "District"
[4] " • Mayor" " • Total" "Elevation"
[7] " • Total" " • Density" "Time zone"
[10] " • Summer (DST)" "Postal code(s)" "SFOS number"
[13] "Surrounded by" "Website"
> data
[1] "Switzerland"
[2] "Geneva"
[3] "n.a."
[4] "MaireRaymond Gavillet"
[5] "6.50 km2 (2.51 sq mi)"
[6] "428 m (1,404 ft)"
[7] "11,609"
[8] "1,800/km2 (4,600/sq mi)"
[9] "UTC+01:00 (Central European Time)"
[10] "UTC+02:00 (Central European Summer Time)"
[11] "1234,1255"
[12] "6645"
[13] "Bossey (FR-74), Carouge, Chêne-Bougeries, Étrembières (FR-74), Gaillard (FR-74), Geneva (Genève), Plan-les-Ouates, Thônex, Troinex"
[14] "www.veyrier.ch SFSO statistics"
I am able to populate the table with only elevation data and not area and density. Please help. Thanks!
# Clean text and store in data frame
canton_table[canton_table$name == current_name, "area"] <- helper_function(" • Total", labels, data)
canton_table[canton_table$name == current_name, "elevation"] <- helper_function("Elevation", labels, data)
canton_table[canton_table$name == current_name, "density"] <- helper_function(" • Density", labels, data)
My output table:
You have to change the label names you pass in from " • Total" to "Total", etc.
Names like " • Total" are probably causing matching problems (the leading " • " is unlikely to match exactly).
Then populate the table:
canton_table[canton_table$name == current_name, "area"] <- helper_function("Total", labels, data)
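helper_function itself isn't shown in the question, so here is one hypothetical implementation consistent with that advice: strip the bullet and surrounding whitespace before matching, then return the corresponding data entry.
helper_function <- function(label, labels, data) {
  clean <- trimws(gsub("\u2022", "", labels))  ## drop the " • " prefix
  idx <- which(clean == label)
  if (length(idx) == 0) return(NA_character_)
  ## note: "Total" matches both the area and population rows; this takes the first
  data[idx[1]]
}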
I'm trying to write a function which has to find the lowest number in one of these columns and return the hospital name.
"Hospital.30.Day.Death..Mortality..Rates.from.Heart.Attack"
"Hospital.30.Day.Death..Mortality..Rates.from.Heart.Failure"
"Hospital.30.Day.Death..Mortality..Rates.from.Pneumonia"
I cannot understand why my results are not the same as in the samples from the PDF. Please bear in mind that I'm a fresh R programmer.
Example
For best("TX", "heart attack") the function should return "CYPRESS FAIRBANKS MEDICAL CENTER", while my function returns the following (note that the correct result isn't even in this vector):
[1] "HEREFORD REGIONAL MEDICAL CENTER"
[2] "EAST TEXAS MEDICAL CENTER MOUNT VERNON"
[3] "ALLEGIANCE SPECIALTY HOSPITAL OF KILGORE"
[4] "KNOX COUNTY HOSPITAL"
[5] "EAST TEXAS MEDICAL CENTER TRINITY"
[6] "COMMUNITY GENERAL HOSPITAL"
[7] "KELL WEST REGIONAL HOSPITAL"
[8] "GOOD SHEPHARD MEDICAL CENTER-LINDEN"
[9] "BURLESON ST JOSEPH HEALTH CENTER"
[10] "MCCAMEY HOSPITAL"
[11] "FISHER COUNTY HOSPITAL DISTRICT"
[12] "HANSFORD COUNTY HOSPITAL"
[13] "ST LUKES LAKESIDE HOSPITAL"
Code
best <- function(state, outcome) {
  file <- read.csv("outcome-of-care-measures.csv")
  data <- file[file$State == state, ]
  if (outcome == "heart attack") {
    number <- 15   # column number
  } else if (outcome == "heart failure") {
    number <- 21
  } else if (outcome == "pneumonia") {
    number <- 27
  }
  col <- as.numeric(data[, number])        # column data
  lowest <- min(col, na.rm = TRUE)         # lowest number
  data$Hospital.Name[data[, number] == lowest]   # result
}
Sources
Data I work with
PDF with instructions (check point 2)
I'm going to post the solution; after an hour of searching I found it! In the first steps I accidentally wrote down the wrong column numbers from the documentation.
The column numbers were incorrect.
Solution
Simply change the wrong column numbers (15, 21, 27) to (11, 17, 23).
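Applied to the function above, the fix is just:
if (outcome == "heart attack") {
  number <- 11
} else if (outcome == "heart failure") {
  number <- 17
} else if (outcome == "pneumonia") {
  number <- 23
}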
Thanks
Thank you for your answers; they increased my knowledge. Have a nice weekend.
I'm attempting to use the googleway package in R to list retirement villages within a specified radius of a location. I get "The radar argument is now deprecated" as an error message, and NULL results as a consequence.
library(googleway)
a <- google_places(location = c(-36.796578, 174.768836),
                   search_string = "Retirement Village",
                   radius = 10000,
                   key = "key")
a$results$name

I would expect this to give me retirement villages within a 10 km radius; instead I get:

> library(googleway)
> a <- google_places(location = c(-36.796578,174.768836), search_string = "Retirement Village", radius = 10000, key = "key")
The radar argument is now deprecated
> a$results$name
NULL
There's nothing wrong with the code you've written, and that 'message' you get is not an error, it's just a message, but it probably should be removed - I've made an issue to remove it here.
a <- google_places(
location = c(-36.796578,174.768836)
, search_string = "Retirement Village"
, radius = 10000
, key = "key"
)
place_name( a )
# [1] "Fairview Lifestyle Village" "Eastcliffe Retirement Village"
# [3] "Meadowbank Retirement Village" "The Poynton - Metlifecare Retirement Village"
# [5] "The Orchards - Metlifecare Retirement Village" "Bupa Hugh Green Care Home & Retirement Village"
# [7] "Bert Sutcliffe Retirement Village" "Grace Joel Retirement Village"
# [9] "Bupa Remuera Retirement Village and Care Home" "7 Saint Vincent - Metlifecare Retirement Village"
# [11] "Remuera Gardens Retirement Village" "William Sanders Retirement Village"
# [13] "Puriri Park Retirement Village" "Selwyn Village"
# [15] "Aria Bay Retirement Village" "Highgrove Village & Patrick Ferry House"
# [17] "Settlers Albany Retirement Village" "Knightsbridge Village"
# [19] "Remuera Rise" "Northbridge Residential Village"
Are you sure the API key you're using is enabled on the Places API?
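If in doubt, the parsed response carries Google's status field, so a quick diagnostic is:
a$status
## "OK" means the request succeeded; "REQUEST_DENIED" usually means the key isn't enabled for the Places API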
I'm trying to reverse geocode with R. I first used ggmap but couldn't get it to work with my API key. Now I'm trying it with googleway.
newframe[,c("Front.lat","Front.long")]
Front.lat Front.long
1 -37.82681 144.9592
2 -37.82681 145.9592
newframe$address <- apply(newframe, 1, function(x){
  google_reverse_geocode(location = as.numeric(c(x["Front.lat"],
                                                 x["Front.long"])),
                         key = "xxxx")
})
This extracts the variables as a list but I can't figure out the structure.
I'm struggling to figure out how to extract the address components listed below as variables in newframe
postal_code, administrative_area_level_1, administrative_area_level_2, locality, route, street_number
I would prefer each address component as a separate variable.
Google's API returns the response in JSON which, when translated into R, naturally forms nested lists. Internally in googleway this is done through jsonlite::fromJSON().
In googleway I've given you the choice of returning the raw JSON or a list, through the simplify argument.
I've deliberately returned ALL the data from Google's response and left it up to the user to extract the elements they're interested in through usual list-subsetting operations.
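For example, a small sketch (apiKey stands in for your key):
js <- google_reverse_geocode(location = c(-37.82681, 144.9592),
                             key = apiKey,
                             simplify = FALSE)  ## returns the raw JSON string instead of a list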
Having said all that, in the development version of googleway I've written a few functions to help accessing elements of various API calls. Here are three of them that may be useful to you
## Install the development version
# devtools::install_github("SymbolixAU/googleway")

res <- google_reverse_geocode(
  location = c(df[1, 'Front.lat'], df[1, 'Front.long']),
  key = apiKey
)
geocode_address(res)
# [1] "45 Clarke St, Southbank VIC 3006, Australia"
# [2] "Bank Apartments, 275-283 City Rd, Southbank VIC 3006, Australia"
# [3] "Southbank VIC 3006, Australia"
# [4] "Melbourne VIC, Australia"
# [5] "South Wharf VIC 3006, Australia"
# [6] "Melbourne, VIC, Australia"
# [7] "CBD & South Melbourne, VIC, Australia"
# [8] "Melbourne Metropolitan Area, VIC, Australia"
# [9] "Victoria, Australia"
# [10] "Australia"
geocode_address_components(res)
# long_name short_name types
# 1 45 45 street_number
# 2 Clarke Street Clarke St route
# 3 Southbank Southbank locality, political
# 4 Melbourne City Melbourne administrative_area_level_2, political
# 5 Victoria VIC administrative_area_level_1, political
# 6 Australia AU country, political
# 7 3006 3006 postal_code
geocode_type(res)
# [[1]]
# [1] "street_address"
#
# [[2]]
# [1] "establishment" "general_contractor" "point_of_interest"
#
# [[3]]
# [1] "locality" "political"
#
# [[4]]
# [1] "colloquial_area" "locality" "political"
After reverse geocoding into newframe$address the address components could be extracted further as follows:
# Make a boolean array of the valid ("OK" status) responses (other statuses may be "NO_RESULTS", "REQUEST_DENIED", etc.).
sel <- sapply(1:nrow(newframe), function(x){
  newframe$address[[x]]$status == 'OK'
})

# Get the address_components of the first result (i.e. best match) returned per geocoded coordinate.
# Iterate over the indices of the valid responses so list positions stay aligned.
address.components <- lapply(which(sel), function(x){
  newframe$address[[x]]$results[1, ]$address_components
})

# Get all possible component types.
all.types <- unique(unlist(sapply(1:length(address.components), function(x){
  unlist(lapply(address.components[[x]]$types, function(l) l[[1]]))
})))

# Get "long_name" values of the address_components for each type present (the other option is "short_name").
all.values <- lapply(1:length(address.components), function(x){
  types <- unlist(lapply(address.components[[x]]$types, function(l) l[[1]]))
  matches <- match(all.types, types)
  values <- address.components[[x]]$long_name[matches]
})

# Bind results into a data frame.
all.values <- do.call("rbind", all.values)
all.values <- as.data.frame(all.values)
names(all.values) <- all.types

# Add columns and update the original data frame.
newframe[, all.types] <- NA
newframe[sel, ][, all.types] <- all.values
Note that I've only kept the first type given per component, effectively skipping the "political" type as it appears in multiple components and is likely superfluous e.g. "administrative_area_level_1, political".
You can use ggmap::revgeocode easily; see below:
library(ggmap)
## df[i, 2:1] passes c(lon, lat), the coordinate order revgeocode() expects
df <- cbind(df,
            do.call(rbind,
                    lapply(1:nrow(df), function(i)
                      revgeocode(as.numeric(df[i, 2:1]), output = "more")[
                        c("administrative_area_level_1", "locality",
                          "postal_code", "address")])))
#output:
df
# Front.lat Front.long administrative_area_level_1 locality
# 1 -37.82681 144.9592 Victoria Southbank
# 2 -37.82681 145.9592 Victoria Noojee
# postal_code address
# 1 3006 45 Clarke St, Southbank VIC 3006, Australia
# 2 3833 Cec Dunns Track, Noojee VIC 3833, Australia
You can add "route" and "street_number" to the variables that you want to extract, but as you can see the second address does not have a street number, and that would cause an error.
Note: you may also use sub to extract the information from the address string.
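For instance, a minimal sketch of that sub approach, pulling the four-digit postcode out of the addresses above (it assumes the "VIC ####" pattern seen in this output):
sub(".*VIC ([0-9]{4}).*", "\\1", df$address)
# [1] "3006" "3833"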
Data:
df <- structure(list(Front.lat = c(-37.82681, -37.82681),
                     Front.long = c(144.9592, 145.9592)),
                .Names = c("Front.lat", "Front.long"),
                class = "data.frame", row.names = c(NA, -2L))
When I extract content from the following URL using XPath 1.0, the cities that are returned contain duplicates, starting with Birmingham. (The complete set of values returned is more than 140, so I have truncated it.) Is there a way, with the XPath expression, to avoid the duplicates?
require(XML)
doc <- htmlTreeParse("http://www.littler.com/locations", useInternal = TRUE)
xpathSApply(doc, "//div[#class = 'mm-location-usa']//a[position() < 12]", xmlValue, trim = TRUE)
[1] "Birmingham" "Mobile" "Anchorage" "Phoenix" "Fayetteville" "Fresno"
[7] "Irvine" "L.A. - Century City" "L.A. - Downtown" "Sacramento" "San Diego" "Birmingham"
[13] "Mobile" "Anchorage" "Phoenix" "Fayetteville" "Fresno" "Irvine"
[19] "L.A. - Century City" "L.A. - Downtown" "Sacramento" "San Diego"
Is there an XPath expression or work around along the lines of [not-duplicate()]?
Also, various [position() < X] permutations don't produce only the cities and only one instance of each. In fact, it's hard to figure out how positions are counted.
I would appreciate any guidance or finding out that the best I can do is limit the number of duplicates returned.
BTW XPath result with duplicates is not the same problem nor are the questions that pertain to duplicate nodes, e.g., How do I identify duplicate nodes in XPath 1.0 using an XPathNavigator to evaluate?
There is a function for this called distinct-values(), but unfortunately it is only available in XPath 2.0; in R you are limited to XPath 1.0.
What you can do is
//div[@class = 'mm-location-usa']//a[position() < 12 and not(normalize-space(.) = normalize-space(following::a))]
What it does, in plain English:
Look for div elements, but only if their class attribute value equals "mm-location-usa". Then look for descendant a elements of those div elements, but only if the a element's position is less than 12 and the normalized text content of that a element is not equal to the text content of an a element that follows it.
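Plugged into the question's xpathSApply call, that looks like:
xpathSApply(doc,
            "//div[@class = 'mm-location-usa']//a[position() < 12 and not(normalize-space(.) = normalize-space(following::a))]",
            xmlValue, trim = TRUE)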
But it is a computationally intensive approach and not the most elegant one. I recommend you take jlhoward's solution.
Can't you just do it this way??
require(XML)
doc <- htmlTreeParse("http://www.littler.com/locations", useInternal = TRUE)
xPath <- "//div[#class = 'mm-location-usa']//a[position() < 12]"
unique(xpathSApply(doc, xPath, xmlValue, trim = TRUE))
# [1] "Birmingham" "Mobile" "Anchorage"
# [4] "Phoenix" "Fayetteville" "Fresno"
# [7] "Irvine" "L.A. - Century City" "L.A. - Downtown"
# [10] "Sacramento" "San Diego"
Or, you can just create an XPath to process the li tags in the first div (since they are duplicate divs):
xpathSApply(doc, "//div[#id='lmblocks-mega-menu---locations'][1]/
div[#class='mm-location-usa']/
ul/
li[#class='mm-list-item']", xmlValue, trim = TRUE)
## [1] "Birmingham" "Mobile" "Anchorage"
## [4] "Phoenix" "Fayetteville" "Fresno"
## [7] "Irvine" "L.A. - Century City" "L.A. - Downtown"
## [10] "Sacramento" "San Diego" "San Francisco"
## [13] "San Jose" "Santa Maria" "Walnut Creek"
## [16] "Denver" "New Haven" "Washington, DC"
## [19] "Miami" "Orlando" "Atlanta"
## [22] "Chicago" "Indianapolis" "Overland Park"
## [25] "Lexington" "Boston" "Detroit"
## [28] "Minneapolis" "Kansas City" "St. Louis"
## [31] "Las Vegas" "Reno" "Newark"
## [34] "Albuquerque" "Long Island" "New York"
## [37] "Rochester" "Charlotte" "Cleveland"
## [40] "Columbus" "Portland" "Philadelphia"
## [43] "Pittsburgh" "San Juan" "Providence"
## [46] "Columbia" "Memphis" "Nashville"
## [49] "Dallas" "Houston" "Tysons Corner"
## [52] "Seattle" "Morgantown" "Milwaukee"
I made an assumption here that you're going after US locations.