Incorrect NYC Subway data from HERE - here-api

I'm trying to get the specific subway lines that serve a subway station in NYC, given a lat/long. HERE is returning some data, but it's incomplete.
I'm using the following endpoint:
https://transit.hereapi.com/v8/stations?apiKey=xxxxxxxxxxxxxx&in=40.734376,-73.990714&return=transport
It returns bus stations as well, but the only result I care about is:
{"place":{"name":"Union Sq - 14 St","type":"station","location":{"lat":40.734789,"lng":-73.99073},"id":"717081137"},"transports":[{"mode":"subway","name":"L","color":"#A7A9AC","textColor":"#000000","headsign":"8 Av"},{"mode":"subway","name":"L","color":"#A7A9AC","textColor":"#000000","headsign":"Canarsie - Rockaway Pkwy"},{"mode":"subway","name":"L","color":"#A7A9AC","textColor":"#000000","headsign":"Myrtle - Wyckoff Avs"}]}
The Union Square - 14th St subway station is served by the L/N/Q/R/W/4/5/6 lines. Is this an error in HERE's data, or am I missing something in my query?

The coordinates you're using for the Union Square 14th St subway station seem to be slightly off. You can get all of the L/N/Q/R/W/4/5/6 subway lines with the following query:
https://transit.hereapi.com/v8/stations?apiKey=YOUR_API_KEY&return=transport&in=40.735088,-73.989952
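A minimal R sketch of how the query above could be built and how the subway lines could be pulled out of one station entry. It assumes the jsonlite package is installed and that each station entry has the same shape as the Union Sq - 14 St object quoted in the question. here_stations_url() and subway_lines() are hypothetical helper names, and the bus entry in the sample JSON is made up to show the filtering.

```r
library(jsonlite)

# Build the /v8/stations request URL used in the question and the answer.
# The API key is a placeholder; coordinates are formatted to 6 decimals.
here_stations_url <- function(lat, lng, api_key) {
  sprintf(
    "https://transit.hereapi.com/v8/stations?apiKey=%s&return=transport&in=%.6f,%.6f",
    api_key, lat, lng
  )
}

# Return the distinct subway line names in one station entry, assuming it
# has the same shape as the Union Sq - 14 St object quoted in the question.
subway_lines <- function(station) {
  modes <- vapply(station$transports, function(t) t$mode, character(1))
  line_names <- vapply(station$transports, function(t) t$name, character(1))
  unique(line_names[modes == "subway"])
}

# Trimmed copy of the station object from the question, plus one
# hypothetical bus entry to show that non-subway transports are dropped.
station_json <- '{"place":{"name":"Union Sq - 14 St","type":"station"},
  "transports":[
    {"mode":"subway","name":"L","headsign":"8 Av"},
    {"mode":"subway","name":"L","headsign":"Canarsie - Rockaway Pkwy"},
    {"mode":"bus","name":"BUS-1","headsign":"Hypothetical"}]}'
station <- fromJSON(station_json, simplifyVector = FALSE)

here_stations_url(40.735088, -73.989952, "YOUR_API_KEY")
subway_lines(station)
```

unique() matters here because the response lists the same line once per headsign, as the L train does in the question.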

Related

Changing the value in one column based on a subset of values of another column in R

I have a dataset that contains the columns city and country. Some of the country values are incorrectly labelled as 'Other'. I know this because some of the city values contain labels like saddle lake (Canada). Is there a way I can search for a substring of the city value and use it to change the value in country? I.e., search for any city value containing the word 'Canada' and change country to 'Canada'. I'd like to do this for multiple countries, including the USA and UK, which might mean my search would need an 'or' element to match usa, US, USA, etc.
Current dataset:
City - Country
Saddle(Canada) - Other
Dublin - Other
Detroit - USA
Vancouver - Canada
NYC: US - Other
Output:
Saddle(Canada) - Canada
Dublin - Other
Detroit - USA
Vancouver - Canada
NYC: US - USA
I've played around with if statements using grep(), but with no success.
Edit: some code I have tried:
for (i in Data$city) {
  if (Data$city == '.*canada*.') {
    Data$country = Canada
  }
}
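One way to avoid the loop and the == comparison (which tests exact equality, not a pattern) is a vectorised grepl() per target country. This is a sketch on a toy copy of the example data; the regex alternatives for each country are assumptions you would extend, and it assumes country is a character column rather than a factor.

```r
# Toy copy of the example data from the question
Data <- data.frame(
  city = c("Saddle(Canada)", "Dublin", "Detroit", "Vancouver", "NYC: US"),
  country = c("Other", "Other", "USA", "Canada", "Other"),
  stringsAsFactors = FALSE
)

# One regex per target country; the alternatives are assumptions to extend.
# \\b (word boundary) stops "us" from matching inside words like "Columbus".
patterns <- c(
  Canada = "\\bcanada\\b",
  USA    = "\\b(usa|us|united states)\\b",
  UK     = "\\b(uk|united kingdom)\\b"
)

for (ctry in names(patterns)) {
  # Only relabel rows currently marked 'Other' whose city matches the pattern
  hit <- Data$country == "Other" &
    grepl(patterns[[ctry]], Data$city, ignore.case = TRUE, perl = TRUE)
  Data$country[hit] <- ctry
}
Data
```

The loop runs once per country, not once per row; each grepl() call checks every city at once.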

How to process all elements of a text file in R

I have a single text file, NPFile, that is 3,523 lines long and contains 100 different newspaper articles. I am trying to pick out and parse different data fields from each article for text processing. These fields are "Full text:", "Publication date:", "Publication title:", etc.
I am using grep to pick out the different lines that contain the data fields I want. Although I can get the line numbers (start and end positions of the fields), I am getting an error when I try to use the line numbers to extract the actual text and put it into a vector:
#Find full text of article, clean and store in a variable
findft<-grep ('Full text:', NPFile, ignore.case=TRUE)
endft<-grep ('Publication date:', NPFile)
ftfield<-(NPFile[findft:endft])
The last line, ftfield<-(NPFile[findft:endft]), gives this warning message:
1: In findft:endft :
numerical expression has 100 elements: only the first used
The start points (findft) and end points (endft) each contain 100 elements, but, as the warning indicates, ftfield only contains the first element (which is 11 lines long). I had (wrongly) assumed that the lines for each of the 100 instances of the full text field would be extracted and stored in ftfield, but obviously I have not coded this correctly. Any help would be appreciated.
Example of Data (These are the fields and data associated with one of the 100 in the text file):
Waiting for the 500-year flood; Red River rampage: Severe weather events, new records are more frequent than expected.
Full text: AS THE RED River raged over makeshift dikes futilely erected against its wrath in North Dakota, drowning cities beneath a column of water 26 feet above flood level, meteorologists were hard pressed to describe its magnitude in human chronology.
A 500-year flood, some call it, a catastrophic weather event that would have occurred only once since Christopher Columbus arrived on the shores of the New World. Whether it could be termed a 700-year flood or a 300-year flood is open to question.
The flood's size and power are unprecedented. While the Red River has ravaged the upper Midwest before, the height of the flood crest in Fargo and Grand Forks has been almost incomprehensible.
But climatological records are being broken more rapidly than ever. A 100-year-storm may as likely repeat within a few years as waiting another century. It is simply a way of classifying severity, not the frequency. "There isn't really a hundred-year event anymore," states climatologist Tom Karl of the National Oceanic and Atmospheric Administration.
Reliable, consistent weather records in the U.S. go back only 150 years or so. Human development has altered the Earth's surface and atmosphere, promoting greater weather changes and effects than an untouched environment would generate by itself.
What might be a 500-year event in the Chesapeake Bay is uncertain. Last year was the record for freshwater gushing into the bay. The January 1996 torrent of melted snowfall into the estuary recorded a daily average that exceeded the flow during Tropical Storm Agnes in 1972, a benchmark for 100-year meteorological events in these parts. But, according to the U.S. Geological Survey, the impact on the bay's ecosystem was not as damaging as in 1972.
Sea level in the Bay has risen nearly a foot in the past century, three times the rate of the past 5,000 years, which University of Maryland scientist Stephen Leatherman ties to global climate warming. Estuarine islands and upland shoreline are eroding at an accelerated pace.
The topography of the bay watershed is, of course, different from that of the Red River. It's not just flow rates and rainfall, but how the water is directed and where it can escape without intruding too far onto dry land. We can only hope that another 500 years really passes before the Chesapeake region is so tested.
Pub Date: 4/22/97
Publication date: Apr 22, 1997
Publication title: The Sun; Baltimore, Md.
Title: Waiting for the 500-year flood; Red River rampage: Severe weather events, new records are more frequent than expected.:   [FINAL Edition ]
For the example above, ftfield has 11 lines when I examine it:
[1] "Full text: AS THE RED River raged over makeshift dikes futilely erected against its wrath in North Dakota, drowning cities beneath a column of water 26 feet above flood level, meteorologists were hard pressed to describe its magnitude in human chronology."
[2] "A 500-year flood, some call it, a catastrophic weather event that would have occurred only once since Christopher Columbus arrived on the shores of the New World. Whether it could be termed a 700-year flood or a 300-year flood is open to question."
[3] "The flood's size and power are unprecedented. While the Red River has ravaged the upper Midwest before, the height of the flood crest in Fargo and Grand Forks has been almost incomprehensible."
[4] "But climatological records are being broken more rapidly than ever. A 100-year-storm may as likely repeat within a few years as waiting another century. It is simply a way of classifying severity, not the frequency. \"There isn't really a hundred-year event anymore,\" states climatologist Tom Karl of the National Oceanic and Atmospheric Administration."
[5] "Reliable, consistent weather records in the U.S. go back only 150 years or so. Human development has altered the Earth's surface and atmosphere, promoting greater weather changes and effects than an untouched environment would generate by itself."
[6] "What might be a 500-year event in the Chesapeake Bay is uncertain. Last year was the record for freshwater gushing into the bay. The January 1996 torrent of melted snowfall into the estuary recorded a daily average that exceeded the flow during Tropical Storm Agnes in 1972, a benchmark for 100-year meteorological events in these parts. But, according to the U.S. Geological Survey, the impact on the bay's ecosystem was not as damaging as in 1972."
[7] "Sea level in the Bay has risen nearly a foot in the past century, three times the rate of the past 5,000 years, which University of Maryland scientist Stephen Leatherman ties to global climate warming. Estuarine islands and upland shoreline are eroding at an accelerated pace."
[8] "The topography of the bay watershed is, of course, different from that of the Red River. It's not just flow rates and rainfall, but how the water is directed and where it can escape without intruding too far onto dry land. We can only hope that another 500 years really passes before the Chesapeake region is so tested."
[9] "Pub Date: 4/22/97"
[10] ""
[11] "Publication date: Apr 22, 1997"
Lastly, findft[1] corresponds to endft[1], and so on through findft[100] and endft[100].
I'll assume that findft and endft each contain several indexes, that both have the same length, that they are paired by position (e.g. findft[5] corresponds to endft[5]), and that you want all NPFile elements between each pair of indexes.
If this is so, try:
ftfield <- lapply(seq_along(findft), function(x) NPFile[findft[x]:endft[x]])
This will return a list. I can't guarantee that this will work because there is no data example to work with.
We can do this with Map: for each corresponding pair of elements in 'findft' and 'endft', build the sequence of indexes between them, then subset 'NPFile' with that sequence.
Map(function(x, y) NPFile[x:y], findft, endft)
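To see the Map approach end to end, here is a sketch on a small made-up stand-in for NPFile (two hypothetical articles). It stops one line before each "Publication date:" line and then collapses each article into a single string with the "Full text:" label removed; both of those steps are choices added for illustration, not part of the answers above.

```r
# Toy stand-in for NPFile: two hypothetical "articles"
NPFile <- c(
  "Title one",
  "Full text: First paragraph of article one.",
  "Second paragraph of article one.",
  "Publication date: Apr 22, 1997",
  "Title two",
  "Full text: Only paragraph of article two.",
  "Publication date: Apr 23, 1997"
)

findft <- grep("Full text:", NPFile, ignore.case = TRUE)
endft  <- grep("Publication date:", NPFile)
# The pairing only makes sense if both vectors line up one-to-one
stopifnot(length(findft) == length(endft), all(findft < endft))

# One character vector per article, excluding the "Publication date:" line
ftfield <- Map(function(from, to) NPFile[from:(to - 1)], findft, endft)

# Collapse each article into a single string and drop the field label
fulltext <- vapply(ftfield, paste, character(1), collapse = " ")
fulltext <- sub("^Full text:\\s*", "", fulltext)
fulltext
```

With the real file, the "Pub Date:" line and the blank line before "Publication date:" would still be included, so you may want to filter those out before collapsing.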

How do I convert city names to time zones?

Sorry if this is repetitive, but I've looked everywhere and can't seem to find anything that addresses my specific problem in R. I have a column with city names:
cities <-data.frame(c("Sydney", "Dusseldorf", "LidCombe", "Portland"))
colnames(cities)[1]<-"CityName"
Ideally I'd like to attach a column with either the lat/long for each city or the time zone. I have tried using the "ggmap" package in R, but my request exceeds the maximum number of requests they allow per day. I found the "geonames" package that converts lat/long to timezones, so if I get the lat/long for the city I should be able to take it from there.
Edit to address potential duplicate question: I would like to do this without using the ggmap package, as I have too many rows and it has a maximum number of requests per day.
You can get many of the major cities, at least, from the world.cities data in the maps package.
## Changing your data to a vector
cities <- c("Sydney", "Dusseldorf", "LidCombe", "Portland")
## Load up data
library(maps)
data(world.cities)
world.cities[match(cities, world.cities$name), ]
name country.etc pop lat long capital
36817 Sydney Australia 4444513 -33.87 151.21 0
10026 Dusseldorf Germany 573521 51.24 6.79 0
NA <NA> <NA> NA NA NA NA
29625 Portland Australia 8757 -38.34 141.59 0
Note: LidCombe was not included.
Warning: many names match more than one city in world.cities. For example:
world.cities[grep("Portland", world.cities$name), ]
name country.etc pop lat long capital
29625 Portland Australia 8757 -38.34 141.59 0
29626 Portland USA 542751 45.54 -122.66 0
29627 Portland USA 62882 43.66 -70.28 0
Of course, the two in the USA are Portland, Oregon and Portland, Maine.
match() just returns the first one in the list, so you may need to use more information than just the name (such as the country) to get a good result.
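A sketch of matching on name plus country instead of name alone. To keep it self-contained, it uses a small stand-in built only from the world.cities rows printed above; with the real data you would use world.cities from the maps package. The Country column in the input is a hypothetical extra you would need to supply, and keeping the most populous match when a name/country pair is still ambiguous is an assumed tie-break, not something the package does for you.

```r
# Stand-in for maps::world.cities, using only the rows printed above
world_cities <- data.frame(
  name        = c("Sydney", "Dusseldorf", "Portland", "Portland", "Portland"),
  country.etc = c("Australia", "Germany", "Australia", "USA", "USA"),
  pop         = c(4444513, 573521, 8757, 542751, 62882),
  lat         = c(-33.87, 51.24, -38.34, 45.54, 43.66),
  long        = c(151.21, 6.79, 141.59, -122.66, -70.28),
  stringsAsFactors = FALSE
)

# Input with a (hypothetical) Country column to disambiguate the names
cities <- data.frame(
  CityName = c("Sydney", "Dusseldorf", "LidCombe", "Portland"),
  Country  = c("Australia", "Germany", "Australia", "USA"),
  stringsAsFactors = FALSE
)

# Left join on name + country; unmatched cities (LidCombe) get NA coordinates
hits <- merge(cities, world_cities,
              by.x = c("CityName", "Country"),
              by.y = c("name", "country.etc"),
              all.x = TRUE)

# Portland, USA still matches two rows; keep the most populous (assumption)
hits <- hits[order(hits$CityName, -hits$pop), ]
hits <- hits[!duplicated(hits[, c("CityName", "Country")]), ]
hits
```

The resulting lat and long columns can then be passed to the geonames package, as planned in the question, to look up time zones.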

How do I preserve preexisting identifiers when geocoding a list of addresses in R?

I'm currently working with an R script set up to use RDSTK, a wrapper for the Data Science Toolkit API based on this, to geocode a list of addresses from a CSV.
The script appears to work, but the list of addresses has a preexisting unique identifier that isn't preserved in the process: the input file has two columns, id and address. The id column is meaningless for the geocoding itself, but I'd like the output to retain it. That is, I'd like the output, which currently has three columns (address, long, and lat), to have four, with id as the first.
The issues are:
1. The output is not in the same order as the input addresses (or doesn't appear to be), so I cannot simply tack the id column onto the end of it.
2. The output does not include nulls, so the two would not have the same number of rows in any case, even if the order matched.
3. I am not sure how to tie the id column into the geocoding process itself, which would obviously be the ideal solution.
Here is the script:
require("RDSTK")
library(httr)
library(rjson)
# Input CSV with two columns: id and address
dff = read.csv("C:/Users/name/Documents/batchtestv2.csv")
# Build a JSON array of quoted address strings for one batch request
data <- paste0("[",paste(paste0("\"",dff$address,"\""),collapse=","),"]")
url <- "http://www.datasciencetoolkit.org/street2coordinates"
response <- POST(url,body=data)
# Parse the JSON response into an R list
json <- fromJSON(content(response,type="text"))
# One (long, lat) row per element of the parsed response
geocode <- do.call(rbind,lapply(json, function(x) c(long=x$longitude,lat=x$latitude)))
geocode
write.csv(geocode, file = "C:/Users/name/Documents/geocodetest.csv")
And here is a sample of the output:
2633 Camino Ramon Suite 500 San Ramon California 94583 United States -121.96208 37.77027
555 Lordship Boulevard Stratford Connecticut 6615 United States -73.14098 41.16542
500 West 13th Street Fort Worth Texas 76102 United States -97.33288 32.74782
50 North Laura Street Suite 2500 Jacksonville Florida 32202 United States -81.65923 30.32733
7781 South Little Egypt Road Stanley North Carolina 28164 United States -81.00597 35.44482
Maybe the solution is extraordinarily simple and I'm just being dense. That's entirely possible (I don't have extensive experience with any particular language, so I sometimes miss obvious things), but I haven't been able to solve it.
Thanks in advance!
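Since write.csv() puts the address in the first column of the sample output, the rows of geocode appear to be named by address, and a failed lookup would make c(long=..., lat=...) return NULL, which rbind() silently drops. Under those assumptions, one option is to turn the row names back into a column and left-join onto the input by address. The data below is a hypothetical stand-in for dff and geocode; the join only works if the address strings in the response match the input strings exactly.

```r
# Hypothetical stand-ins: dff is the input CSV, geocode is the rbind() result
dff <- data.frame(
  id      = c(101, 102, 103),
  address = c("500 West 13th Street Fort Worth Texas 76102 United States",
              "not a real address",
              "50 North Laura Street Suite 2500 Jacksonville Florida 32202 United States"),
  stringsAsFactors = FALSE
)
geocode <- rbind(
  "50 North Laura Street Suite 2500 Jacksonville Florida 32202 United States" =
    c(long = -81.65923, lat = 30.32733),
  "500 West 13th Street Fort Worth Texas 76102 United States" =
    c(long = -97.33288, lat = 32.74782)
)

# Turn the row names (assumed to be the addresses sent) into a column
coords <- data.frame(address = rownames(geocode), geocode,
                     row.names = NULL, stringsAsFactors = FALSE)

# Left join on address: every id is kept, failed lookups get NA coordinates
out <- merge(dff, coords, by = "address", all.x = TRUE)
out <- out[order(out$id), c("id", "address", "long", "lat")]
out
```

If the service rewrites addresses in its response, or the input contains duplicate addresses, geocoding one address per request and attaching the id as you go would be the safer design.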

Extracting String in R

I want to extract strings from elements in a data frame. Having gone through numerous previous questions, I am still unable to understand what to do! This is what I have tried so far:
unlist(strsplit(pcode2$Postcode,"'"))
I get the following error:
Error in strsplit(pcode2$Postcode, "'") : non-character argument
I think I understand why: I am trying to reference the data rather than putting the text in the code itself. I have 16,000 cases in a data frame, so I'm also not sure how to vectorise the operation.
Any help would be greatly appreciated.
Data:
  Postcode Locality                       State Latitude Longitude
1 ('200',  Australian National University ACT   -35.280, 149.120),
2 ('221',  Barton                         ACT   -35.200, 149.100),
3 ('3030', Werribee                       VIC   -12.800, 130.960),
4 ('3030', Point Cook                     VIC   -12.800, 130.960),
I want to get rid of the commas, quotes, parentheses, etc. so that I am left with the numeric part of column 1 (Postcode) and the numeric parts of Latitude and Longitude. This is how I hope the final result will look:
  Postcode Locality                       State Latitude Longitude
1 200      Australian National University ACT   -35.280  149.120
2 221      Barton                         ACT   -35.200  149.100
3 3030     Werribee                       VIC   -12.800  130.960
4 3030     Point Cook                     VIC   -12.800  130.960
Lastly, I would like to understand how to format data nicely in questions.
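The "non-character argument" error from strsplit() usually means the column is a factor (or numeric) rather than a character vector; read.csv() produced factors by default before R 4.0.0 (stringsAsFactors = TRUE). A sketch, assuming the three messy columns are factors as in the toy copy below: convert with as.character(), strip everything except digits, the minus sign and the decimal point with a vectorised gsub(), then convert to numbers. to_number() is a hypothetical helper name.

```r
# Toy copy of the question's data, with the messy columns stored as factors
pcode2 <- data.frame(
  Postcode  = factor(c("('200',", "('221',", "('3030',", "('3030',")),
  Locality  = c("Australian National University", "Barton", "Werribee", "Point Cook"),
  State     = c("ACT", "ACT", "VIC", "VIC"),
  Latitude  = factor(c("-35.280,", "-35.200,", "-12.800,", "-12.800,")),
  Longitude = factor(c("149.120),", "149.100),", "130.960),", "130.960),")),
  stringsAsFactors = FALSE
)

# Keep digits, minus sign and decimal point; drop quotes, commas, parentheses.
# as.character() avoids the "non-character argument" problem with factors.
to_number <- function(x) as.numeric(gsub("[^0-9.-]", "", as.character(x)))

# gsub() is vectorised, so this handles all 16,000 rows without a loop
num_cols <- c("Postcode", "Latitude", "Longitude")
pcode2[num_cols] <- lapply(pcode2[num_cols], to_number)
pcode2
```

Storing Postcode as a number drops any leading zeros; if those matter, skip as.numeric() for that column and keep the cleaned strings.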
