We use the HERE Geocoder Autocomplete API. We'd like to show house numbers first and, only if there are no matching house numbers, streets. Is there a way to do so?
For example, for the query 2625 I get two results with "matchLevel": "houseNumber" and all other results with "matchLevel": "street", but the here.com database contains many more addresses with 2625, for example:
USA, CA, 95134, San Jose, 2625 Zanker Rd
USA, CA, Santa Clara, 95054, Santa Clara, 2625 Augustine Dr
USA, CA, Santa Clara, 95051, Santa Clara, 2625 Keystone Ave
Can you please try adding the country filter parameter to your request, as follows:
http://autocomplete.geocoder.api.here.com/6.2/suggest.json?app_id=XXX&app_code=YYY&query=2625&beginHighlight=<b>&endHighlight=</b>&country=USA
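If it helps, here is a minimal R sketch of that same request using the httr and jsonlite packages (both assumed, not part of HERE's own tooling), with a client-side fallback from houseNumber matches to street matches based on the matchLevel field you quoted:
library(httr)      # assumed: for the HTTP request
library(jsonlite)  # assumed: for parsing the JSON response
resp <- GET("http://autocomplete.geocoder.api.here.com/6.2/suggest.json",
            query = list(app_id = "XXX", app_code = "YYY",
                         query = "2625", country = "USA"))
suggestions <- fromJSON(content(resp, as = "text"))$suggestions
# Show houseNumber matches first; fall back to street matches only if there are none
house <- suggestions[suggestions$matchLevel == "houseNumber", ]
if (nrow(house) > 0) house else suggestions[suggestions$matchLevel == "street", ]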
I'm doing a project related to graph databases, and I'm using ArangoDB to create a graph and try some queries with it. I have two JSON files, which I have already imported into ArangoDB to create the graph (airports: document collection, flights: edge collection).
I have two example AQL queries for the graph, but I'm struggling to convert them to Gremlin.
ex1 (flights that leave JFK airport):
FOR v, e, p IN 1..1 OUTBOUND
  'airports/JFK'
  GRAPH 'flights'
  RETURN p
ex2 (flights from SF to KOA international airport that have VIP lounges):
FOR airport IN airports
  FILTER airport.city == "San Francisco"
  FILTER airport.VIP == true
  FOR v, e, p IN 1..1 OUTBOUND
    airport flights
    FILTER v._id == 'airports/KOA'
    RETURN p
Can you help me with this? Thank you!
You may find the examples here [1] of interest as they are specific to Gremlin and air route use cases. Without having your data model or some sample data it is not possible to give you a 100% accurate translation of your queries. However, the Gremlin will look something like this:
// Flights that leave JFK airport
g.V().has('airport','code','JFK').
out().
path()
// Flights from SFO to KOA that have a VIP lounge
g.V().has('airport','city','San Francisco').
has('VIP',true).
out().
has('code','KOA').
path()
[1] http://www.kelvinlawrence.net/book/PracticalGremlin.html
I am working on a sentence-level LDA in R and am currently trying to split my text data into individual sentences with the sent_detect() function from the openNLP package.
However, my text data contains a lot of abbreviations that have a "period symbol" but do not mark the end of a sentence. Here are some examples: "st. patricks day", "oxford st.", "blue rd.", "e.g."
Is there a way to write a gsub() call that accounts for such two-character abbreviations and removes their "." so that it is not wrongly detected by sent_detect()? Unfortunately, these abbreviations do not always sit between two words; sometimes they can indeed also mark the end of a sentence:
Example:
"I really liked Oxford st." - the "st." marks the end of a sentence and the "." should remain.
vs
"Oxford st. was very busy." - the "st." does not stand at the end of a sentence, thus, the "."-symbol should be replaced.
I am not sure whether there is a solution for this, but maybe someone who is more familiar with sentence-level analysis knows how to deal with such issues.
Thank you!
Looking at your previously asked questions, I would suggest looking into the textclean package. A lot of what you want is already included in that package, and any missing functionality can be reused or expanded upon.
Just replacing "st." with something is going to lead to problems, as it could mean street or saint, but "st. patricks day" is easy to find. The problem you will have is building a list of possible occurrences and finding alternatives for them. The easiest tools to use are translation tables. Below I create a table for a few abbreviations and their expected long forms. It is up to you (or your client) to specify what you want as an end result. The best way is to create the table in Excel or a database and load it into a data.frame (and store it somewhere for easy access). Depending on your text this might be a lot of work, but it will improve the quality of your outcome.
Example:
library(textclean)
text <- c("I really liked Oxford st.", "Oxford st. was very busy.",
"e.g. st. Patricks day was on oxford st. and blue rd.")
# Create an abbreviations table, making sure we look for "rd\\." (with the period) and not just "rd".
# Also, should it be road, or could it mean something else?
abbreviations <- data.frame(abbreviation = c("st. patricks day", "oxford st.", "rd\\.", "e.g."),
                            replacement  = c("saint patricks day", "oxford street", "road", "eg"))
# I use the replace_contraction function since you can replace the default contraction table with your own table.
text <- replace_contraction(text, abbreviations)
text
[1] "I really liked oxford street" "oxford street was very busy."
[3] "eg saint patricks day was on oxford street and blue road"
# As the results above are missing end marks, we use the following function to add them again.
text <- add_missing_endmark(text, ".")
text
[1] "I really liked oxford street." "oxford street was very busy."
[3] "eg saint patricks day was on oxford street and blue road."
textclean has a range of replace_* functions, most of which are based on the mgsub function included in the package. Check the documentation of all the functions to get an idea of what they do.
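As a small illustration (just a sketch; note I lower-cased the input here because mgsub() matches fixed strings case-sensitively), the same kind of table can be applied directly with mgsub():
library(textclean)
# Replace several fixed strings at once; longer patterns are applied before shorter ones
mgsub(c("i really liked oxford st.", "e.g. st. patricks day was busy"),
      pattern     = c("st. patricks day", "oxford st.", "e.g."),
      replacement = c("saint patricks day", "oxford street", "eg"))
# Expected (roughly): "i really liked oxford street"  "eg saint patricks day was busy"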
I have a large dataset of addresses that I plan to geocode in ArcGIS (Google geolocating is too expensive). Examples of the addresses are below.
9999 ST PAUL ST BSMT
GARRISON BL & BOARMAN AVENUE REAR
1234 MAIN STREET 123
1234 MAIN ST UNIT1
ArcGIS doesn't recognize addresses that include units and other extra words at the end, so I want to remove these words so that the addresses look like the ones below.
9999 ST PAUL ST
GARRISON BL & BOARMAN AVENUE
1234 MAIN STREET
1234 MAIN ST
The key challenges include:
ST is used both to abbreviate streets and indicate "SAINT" in street names.
Addresses end in many different indicators such as STREET and AVENUE
There are intersections (indicated with &) that might include indicators like ST and AVENUE twice.
Using R, I'm attempting to apply the sub() function to solve the problem but I have not had success. Below is my latest attempt.
sub("(.*)ST","\\1",df$Address,perl=T)
I know that many existing questions ask something similar, but none address this problem directly, and I suspect it is relevant to other users.
Although I feel removing the last word should work for you, just to be a little safer you can use this regex to retain what you want and discard what you don't.
(.*(?:ST|AVENUE|STREET)\b).*
Here, .*(?:ST|AVENUE|STREET)\b captures your intended data: it matches everything from the start in a greedy manner and only stops at the last occurrence of ST, AVENUE, or STREET, so whatever comes after that is discarded, which is what you wanted. In your current case only one word follows, but it would discard more than one word, or indeed anything that occurs after those specific words. The intended data is captured in group 1, so just replace the whole match with \1.
So instead of this,
sub("(.*)ST","\\1",df$Address,perl=T)
try this (note the doubled backslash needed to write \b inside an R string),
sub("(.*(?:ST|AVENUE|STREET)\\b).*","\\1",df$Address,perl=T)
See this demo
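As a quick check, applying it to the sample addresses from the question should reproduce the cleaned strings you listed:
addresses <- c("9999 ST PAUL ST BSMT",
               "GARRISON BL & BOARMAN AVENUE REAR",
               "1234 MAIN STREET 123",
               "1234 MAIN ST UNIT1")
sub("(.*(?:ST|AVENUE|STREET)\\b).*", "\\1", addresses, perl = TRUE)
# [1] "9999 ST PAUL ST"              "GARRISON BL & BOARMAN AVENUE"
# [3] "1234 MAIN STREET"             "1234 MAIN ST"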
I'm looking for an R package that helps me to find the respective Metropolitan Statistical Areas (MSA) for input data in the form of [Cityname, State Abbreviation]. For instance: "New York, NY", "San Francisco, CA". I do not have: County name, ZIP Code, FIPS, or anything else.
What I found:
MSA-to-county relationships (2015) are provided by the U.S. Census as "Core based statistical areas (CBSAs), metropolitan divisions, and combined statistical areas (CSAs)". A "Principal cities of metropolitan and micropolitan statistical areas" (2015) list is available on the same page. In the worst case, I imagine, one could take the second list, attach the state code to the "Principal City Name" field, and then match the corresponding string with an MSA.
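A rough R sketch of that workaround would look like the following (the file name and column names are hypothetical placeholders for whatever the downloaded Census delineation file actually contains):
# Hypothetical sketch: match "City, ST" strings against the Census principal-cities list
principal <- read.csv("principal_cities_2015.csv", stringsAsFactors = FALSE)          # placeholder file name
principal$city_state <- paste0(principal$Principal.City.Name, ", ", principal$State)  # placeholder columns
my_data <- data.frame(city_state = c("New York, NY", "San Francisco, CA"),
                      stringsAsFactors = FALSE)
merge(my_data, principal[, c("city_state", "CBSA.Title")],
      by = "city_state", all.x = TRUE)  # CBSA.Title = MSA name (placeholder column)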
Before trying that, it would be good to know whether this problem has already been solved. I did not find the desired function in the noncensus or USCensus2010 packages.
Question therefore: Do you know a package that matches (Principal) City with MSA?
Thanks!
What's the exchange suffix for German and Australian stocks in the Google Finance API? For London stocks, it's .L (e.g. VOD.L). I'm just wondering what the suffix is for Germany and Australia.
I tried something like .DE for Germany, but it didn't work (that's the exchange suffix for Yahoo Finance anyway).
By the way, below is my code to call the Google Finance API from R:
library(quantmod)  # provides getSymbols()
ticker <- "VOD.L"
a <- getSymbols(ticker, src = "google",
                from = as.Date("2010-01-01"), to = as.Date("2017-05-16"))
Here in Australia, our main exchange is the Australian Securities Exchange (ASX).
Personally, when I query Google Finance manually (i.e. through the web interface), I write my queries as ASX:WOW. Note that some vendors treat this differently; e.g. Yahoo Finance prefers the WOW.AX convention (I believe Bloomberg does also, from memory).
Example for Germany (Software AG): ETR:SOW or FRA:SOW (ETR refers to the Xetra electronic exchange, where the large majority of the volume is traded nowadays. It is also the exchange most commonly used for reference data. FRA, on the other hand, refers to the "manual" trading floor. The main reason you might sometimes want to use FRA is that it has longer trading hours than ETR. See here for more details.)
Example for Australia (Australia and New Zealand Banking Group): ASX:ANZ
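For what it's worth, here is an untested sketch plugging those symbols into your getSymbols() call; whether quantmod's Google source accepts the EXCHANGE:TICKER form directly (rather than a suffix) is an assumption you would need to verify:
library(quantmod)
# Assumption: src = "google" accepts the same EXCHANGE:TICKER form as the web interface
anz <- getSymbols("ASX:ANZ", src = "google",
                  from = as.Date("2010-01-01"), to = as.Date("2017-05-16"),
                  auto.assign = FALSE)
sow <- getSymbols("ETR:SOW", src = "google",
                  from = as.Date("2010-01-01"), to = as.Date("2017-05-16"),
                  auto.assign = FALSE)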