I built all the Google Maps API request URLs, but when I try to loop over them to call the API and parse each response, I get an error. If I skip the loop and make each call manually, it works.
a <- c("1780 N Washington Ave Scranton PA 18509", "1858 Hunt Ave Bronx NY 10462", "140 N Warren St Trenton NJ 08608-1308")
# An API key needs to be added to run this:
w <- c("https://maps.googleapis.com/maps/api/distancematrix/json?units=imperial&origins=19+East+34th+Street+New York+NY+10016&destinations=1780+N+Washington+Ave+Scranton+PA+18509&mode=transit&language=fr-FR&key=API_KEY_HERE",
"https://maps.googleapis.com/maps/api/distancematrix/json?units=imperial&origins=19+East+34th+Street+New York+NY+10016&destinations=1858+Hunt+Ave+Bronx+NY+10462&mode=transit&language=fr-FR&key=API_KEY_HERE",
"https://maps.googleapis.com/maps/api/distancematrix/json?units=imperial&origins=19+East+34th+Street+New York+NY+10016&destinations=140+N+Warren+St+Trenton+NJ+08608-1308&mode=transit&language=fr-FR&key=API_KEY_HERE")
df <- data.frame(a,w)
for (i in cpghq) {
url <- df$w
testdf <- jsonlite::fromJSON(url, simplifyDataFrame = TRUE)
list <- unlist(testdf$rows)
transit_time <- as.data.frame(t(as.data.frame(list)))
cpghq$transit_time <- transit_time
}
The error I get is:
Error: lexical error: invalid char in json text.
https://maps.googleapis.com/map
(right here) ------^
My API call was wrong because "New York" contains a space. I fixed it with gsub("[[:space:]]", "+", a), but utils::URLencode() would also have worked.
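To make the difference between the two encodings concrete, here is a minimal sketch using only base R. The variable names plus_encoded and pct_encoded are illustrative and not part of the original code.

```r
a <- c("1780 N Washington Ave Scranton PA 18509",
       "140 N Warren St Trenton NJ 08608-1308")

# Option 1: replace every whitespace character with "+"
plus_encoded <- gsub("[[:space:]]", "+", a)

# Option 2: percent-encode each address (a space becomes %20)
pct_encoded <- vapply(a, utils::URLencode, character(1),
                      reserved = TRUE, USE.NAMES = FALSE)

plus_encoded
pct_encoded
```

Either form removes the raw space from the URL. The origin address has to be encoded the same way as the destinations.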
Build the API call
a <- c("1780 N Washington Ave Scranton PA 18509", "1858 Hunt Ave Bronx NY 10462", "140 N Warren St Trenton NJ 08608-1308")
fix_address <- gsub("[[:space:]]", "+", a)
key <- "YOUR_GOOGLE_API_KEY_HERE"
travel_mode <- "transit"
root <- "https://maps.googleapis.com/maps/api/distancematrix/json?units=imperial&origins="
api_call <- paste0(root,"350+5th+Ave+New+York+NY+10118",
"&destinations=",
fix_address,
"&mode=",
travel_mode,
"&language=en-EN",
"&key=", key)
My problem with the loop was very simple: I wasn't using lapply().
I now use RCurl::getURL to send each call to the Google Maps API and RJSONIO::fromJSON to parse the response.
library(RJSONIO)
# Get the JSON returned by Google, one request per URL
doc <- lapply(api_call, RCurl::getURL)
# Parse each JSON string into an R list
parsed <- lapply(doc, RJSONIO::fromJSON)
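The loop in the question passed the whole URL vector to the parser at once, whereas lapply() hands it one URL at a time. A minimal sketch of that pattern follows. It runs offline because fake_fetch is a hypothetical stand-in for the real download-and-parse step, and fetch_each is an illustrative helper name, not a function from any package.

```r
# Apply a fetch function to each URL separately; a failing URL yields NULL
# instead of aborting the whole run.
fetch_each <- function(urls, fetch_one) {
  lapply(urls, function(u) {
    tryCatch(fetch_one(u), error = function(e) NULL)
  })
}

# Hypothetical fetcher used only for demonstration: it rejects URLs that
# contain a raw space, mimicking the failure described in the question.
fake_fetch <- function(u) {
  if (grepl(" ", u)) stop("invalid URL: ", u)
  list(url = u, status = "OK")
}

res <- fetch_each(c("https://example.com/a", "https://example.com/b c"),
                  fake_fetch)
```

In real use, fake_fetch would be replaced by a function that downloads one URL and parses its JSON.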
As pointed out in my other answer to you, you can also use my googleway package to do the work for you.
library(googleway)
key <- "your_api_key"
a <- c("1780 N Washington Ave Scranton PA 18509",
"1858 Hunt Ave Bronx NY 10462",
"140 N Warren St Trenton NJ 08608-1308")
google_distance(origins = "350 5th Ave New York NY 10118",
                destinations = as.list(a),
                mode = "transit",
                key = key,
                simplify = TRUE)
# $destination_addresses
# [1] "1780 N Washington Ave, Scranton, PA 18509, USA" "1858 Hunt Ave, Bronx, NY 10462, USA"
# [3] "140 N Warren St, Trenton, NJ 08608, USA"
#
# $origin_addresses
# [1] "Empire State Building, 350 5th Ave, New York, NY 10118, USA"
#
# $rows
# elements
# 1 ZERO_RESULTS, OK, OK, NA, 19.0 km, 95.8 km, NA, 18954, 95773, NA, 54 mins, 1 hour 44 mins, NA, 3242, 6260
#
# $status
# [1] "OK"
Goal: To change a column of NAs in one dataframe based on a "key" in another dataframe (something like a VLookUp, except only in R)
Given df1 here (For Simplicity's sake, I just have 6 rows. The key I have is 50 rows for 50 states):
Index  State_Name  Abbreviation
1      California  CA
2      Maryland    MD
3      New York    NY
4      Texas       TX
5      Virginia    VA
6      Washington  WA
And given df2 here (This is just an example. The real dataframe I'm working with has a lot more rows) :
Index  State  Article
1      NA     Texas governor, Abbott, signs new abortion bill
2      NA     Effort to recall California governor Newsome loses steam
3      NA     New York governor, Cuomo, accused of manipulating Covid-19 nursing home data
4      NA     Hogan (Maryland, R) announces plans to lift statewide Covid restrictions
5      NA     DC statehood unlikely as Manchin opposes
6      NA     Amazon HQ2 causing housing prices to soar in northern Virginia
Task: To create an R function that reads the state mentioned in each df2$Article row, cross-references it with df1$State_Name, and replaces the NA in df2$State with the matching df1$Abbreviation. I know it's quite a mouthful. I'm stuck on how to start and finish this puzzle. Hard-coding is not an option, as the real datasheet has thousands of rows like this and will keep growing as we add more articles to text-scrape.
The output should look like:
Index  State  Article
1      TX     Texas governor, Abbott, signs new abortion bill
2      CA     Effort to recall California governor Newsome loses steam
3      NY     New York governor, Cuomo, accused of manipulating Covid-19 nursing home data
4      MD     Hogan (Maryland, R) announces plans to lift statewide Covid restrictions
5      NA     DC statehood unlikely as Manchin opposes
6      VA     Amazon HQ2 causing housing prices to soar in northern Virginia
Note: The fifth entry with DC is intended to be NA.
Any links to guides, and/or any advice on how to code this is most appreciated. Thank you!
You can create a regex pattern from State_Name and use str_extract to extract the first matching state from Article. Then use match to get the corresponding Abbreviation from df1.
library(stringr)
df2$State <- df1$Abbreviation[match(str_extract(df2$Article,
str_c(df1$State_Name, collapse = '|')), df1$State_Name)]
df2$State
#[1] "TX" "CA" "NY" "MD" NA "VA"
You can also use inbuilt state.name and state.abb instead of df1 to get state name and abbreviations.
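As a rough illustration of the built-in vectors, the sketch below does the same lookup in base R with state.name and state.abb, so no key data frame is needed. The articles vector is a shortened copy of the example data, and found and abbr are illustrative names.

```r
articles <- c("Texas governor, Abbott, signs new abortion bill",
              "DC statehood unlikely as Manchin opposes",
              "Amazon HQ2 causing housing prices to soar in northern Virginia")

# One alternation pattern covering all 50 state names
pattern <- paste(state.name, collapse = "|")

# First match per article; regexpr returns -1 where nothing matched
m <- regexpr(pattern, articles)
found <- rep(NA_character_, length(articles))
found[m > 0] <- regmatches(articles, m)

# Translate each matched name to its abbreviation
abbr <- state.abb[match(found, state.name)]
abbr
```

Only the first state mentioned in each article is used. Text such as "Washington DC" would still match the state name Washington, so real data may need extra rules.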
Here's a way to do this in a for loop:
for(i in seq(nrow(df1))) {
  inds <- grep(df1$State_Name[i], df2$Article)
  if(length(inds)) df2$State[inds] <- df1$Abbreviation[i]
}
df2
# Index State Article
#1 1 TX Texas governor, Abbott, signs new abortion bill
#2 2 CA Effort to recall California governor Newsome loses steam
#3 3 NY New York governor, Cuomo, accused of manipulating Covid-19 nursing home data
#4 4 MD Hogan (Maryland, R) announces plans to lift statewide Covid restrictions
#5 5 <NA> DC statehood unlikely as Manchin opposes
#6 6 VA Amazon HQ2 causing housing prices to soar in northern Virginia
Not as concise as above but a Base R approach:
# Unlist handling 0 length vectors: list_2_vec => function()
list_2_vec <- function(lst){
  # Coerce 0 length vectors to na values of the appropriate type:
  # .zero_to_nas => function()
  .zero_to_nas <- function(x){
    if(identical(x, character(0))){
      NA_character_
    }else if(identical(x, integer(0))){
      NA_integer_
    }else if(identical(x, numeric(0))){
      NA_real_
    }else if(identical(x, complex(0))){
      NA_complex_
    }else if(identical(x, logical(0))){
      NA
    }else{
      x
    }
  }
  # Unlist cleaned list: res => vector
  res <- unlist(lapply(lst, .zero_to_nas))
  # Explicitly define return object: vector => GlobalEnv()
  return(res)
}
# Classify each article as belonging to the appropriate state:
# clean_df => data.frame
clean_df <- transform(
  df2,
  State = df1$Abbreviation[
    match(
      list_2_vec(
        regmatches(
          Article,
          gregexpr(
            paste0(df1$State_Name, collapse = "|"), Article
          )
        )
      ),
      df1$State_Name
    )
  ]
)
# Data:
df1 <- structure(list(Index = 1:6, State_Name = c("California", "Maryland",
"New York", "Texas", "Virginia", "Washington"), Abbreviation = c("CA",
"MD", "NY", "TX", "VA", "WA")), class = "data.frame", row.names = c(NA, -6L))
df2 <- structure(list(Index = 1:6, State = c(NA, NA, NA, NA, NA, NA),
Article = c("Texas governor, Abbott, signs new abortion bill",
"Effort to recall California governor Newsome loses steam",
"New York governor, Cuomo, accused of manipulating Covid-19 nursing home data",
"Hogan (Maryland, R) announces plans to lift statewide Covid restrictions",
"DC statehood unlikely as Manchin opposes", "Amazon HQ2 causing housing prices to soar in northern Virginia"
)), class = "data.frame", row.names = c(NA, -6L))
I have a long list of addresses. Some of them contain only CA, USA, or both.
I need to convert those to NA and leave the others intact.
For example, I have a vector like this:
loc = c('CA, USA',
'USA',
'2 main st CA',
'35 1st ave CA, USA',
'CA')
What I need is:
loc = c( NA, NA, '2 main st CA',
'35 1st ave CA, USA', NA)
This is just an example. The actual list is very long.
Thanks a lot in advance.
nchar counts the characters in each element of a character vector. "CA, USA" is 7 characters long including the comma and the space, so anything longer is treated as a real address:
ifelse(nchar(string) > 7, string, NA)
string<-c('CA, USA',
'USA',
'2 main st CA',
'35 1st ave CA, USA',
'CA')
string
[1] "CA, USA" "USA" "2 main st CA"
[4] "35 1st ave CA, USA" "CA"
ifelse(nchar(string) > 7, string, NA)
[1] NA NA "2 main st CA"
[4] "35 1st ave CA, USA" NA
Or you can strip the spaces and commas from every string first:
st <- gsub(" ", "", gsub(",", "", string))
st
[1] "CAUSA" "USA" "2mainstCA" "351staveCAUSA"
[5] "CA"
replace(string, nchar(st) < 6, NA)
[1] NA NA "2 main st CA"
[4] "35 1st ave CA, USA" NA
Or if you know exactly your criteria:
ifelse((grepl("^USA$", st) | grepl("^CA$", st) |
grepl("^USACA$", st) | grepl("^CAUSA$", st)), NA, string)
[1] NA NA "2 main st CA"
[4] "35 1st ave CA, USA" NA
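If the criteria are exactly "CA", "USA", or a combination of the two, a single anchored pattern is another option. This is a sketch under that assumption. The name only_region is illustrative, and the pattern would need extending for other state or country codes.

```r
loc <- c("CA, USA", "USA", "2 main st CA", "35 1st ave CA, USA", "CA")

# TRUE when the whole string consists only of CA and/or USA tokens,
# optionally separated by a comma and spaces
only_region <- grepl("^(CA|USA)(,? *(CA|USA))*$", trimws(loc))

loc[only_region] <- NA
loc
```

Unlike the length-based checks, this does not depend on how long a genuine address happens to be.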
If the pattern you want to retain always starts with a number, then you can use this
> loc[grep("^\\d", loc, invert = T)] <- NA
> loc
[1] NA NA "2 main st CA" "35 1st ave CA, USA" NA
I have some property sale data downloaded from Internet. It is a PDF file. When I copy and paste the data into a text file, it looks like this:
> a
[1] "Airport West 1/26 Cameron St 3 br t $830000 S Nelson Alexander" "Albert Park 106 Graham St 2 br h $0 SP RT Edgar"
Let's take the first line as an example. Every row is a record of a property, including suburb (Airport West), address (1/26 Cameron St), the count of bedrooms (3), property type (t), price ($830000), sale type (S). The last one (Nelson) is about the agent, which I do not need here.
I want to analyse this data. I need to extract the information first. I hope I can get the data like this: (b is a data frame)
> b
Suburb Address Bedroom PropertyType Price SoldType
1 Airport West 1/26 Cameron St 3 t 830000 S
2 Albert Park 106 Graham St 2 h 0 SP
Could anyone please tell me how to use stringr package or other methods to split the long string into the sub strings that I need?
1) gsubfn::read.pattern
read.pattern in the gsubfn package takes a regular expression whose capture groups (the parts within parentheses) are taken to be the fields of the input, and it assembles them into a data frame.
library(gsubfn)
pat <- "^(.*?) (\\d.*?) (\\d) br (.) [$](\\d+) (\\w+) .*"
cn <- c("Suburb", "Address", "Bedroom", "PropertyType", "Price", "SoldType")
read.pattern(text = a, pattern = pat, col.names = cn, as.is = TRUE)
giving this data.frame:
Suburb Address Bedroom PropertyType Price SoldType
1 Airport West 1/26 Cameron St 3 t 830000 S
2 Albert Park 106 Graham St 2 h 0 SP
2) no packages
This could also be done without any packages like this (pat and cn are from above):
replacement <- "\\1,\\2,\\3,\\4,\\5,\\6"
read.table(text = sub(pat, replacement, a), col.names = cn, as.is = TRUE, sep = ",")
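For comparison, here is a sketch of a third base R route that pulls the capture groups out directly with regexec and regmatches. It reuses the same pattern and column names and assumes every line matches the pattern; a non-matching line would need to be filtered out first.

```r
a <- c("Airport West 1/26 Cameron St 3 br t $830000 S Nelson Alexander",
       "Albert Park 106 Graham St 2 br h $0 SP RT Edgar")
pat <- "^(.*?) (\\d.*?) (\\d) br (.) [$](\\d+) (\\w+) .*"
cn <- c("Suburb", "Address", "Bedroom", "PropertyType", "Price", "SoldType")

# Each list element holds the full match followed by the six capture groups
m <- regmatches(a, regexec(pat, a, perl = TRUE))

# Drop the full match, stack the groups row by row, and name the columns
b <- as.data.frame(do.call(rbind, lapply(m, `[`, -1)),
                   stringsAsFactors = FALSE)
names(b) <- cn

# The groups come back as text, so convert the numeric fields explicitly
b$Bedroom <- as.integer(b$Bedroom)
b$Price <- as.numeric(b$Price)
b
```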
Note: The input a in reproducible form is:
a <- c("Airport West 1/26 Cameron St 3 br t $830000 S Nelson Alexander",
"Albert Park 106 Graham St 2 br h $0 SP RT Edgar")
I need to read the text file at
https://raw.githubusercontent.com/fonnesbeck/Bios6301/master/datasets/addr.txt
and convert it into an R data frame with the column names LastName, FirstName, streetno, streetname, city, state, and zip.
I tried to use the sep argument to separate the fields, but it failed.
Expanding on my comments, here's another approach. You may need to tweak some of the code if your full data set has a wider range of patterns to account for.
library(stringr) # For str_trim
# Read string data and split into data frame
dat = readLines("addr.txt")
dat = as.data.frame(do.call(rbind, strsplit(dat, split=" {2,10}")), stringsAsFactors=FALSE)
names(dat) = c("LastName", "FirstName", "address", "city", "state", "zip")
# Separate address into number and street (if streetno isn't always numeric,
# or if you don't want it to be numeric, then just remove the as.numeric wrapper).
dat$streetno = as.numeric(gsub("([0-9]{1,4}).*","\\1", dat$address))
dat$streetname = gsub("[0-9]{1,4} (.*)","\\1", dat$address)
# Clean up zip
dat$zip = gsub("O","0", dat$zip)
dat$zip = str_trim(dat$zip)
dat = dat[,c(1:2,7:8,4:6)]
dat
LastName FirstName streetno streetname city state zip
1 Bania Thomas M. 725 Commonwealth Ave. Boston MA 02215
2 Barnaby David 373 W. Geneva St. Wms. Bay WI 53191
3 Bausch Judy 373 W. Geneva St. Wms. Bay WI 53191
...
41 Wright Greg 791 Holmdel-Keyport Rd. Holmdel NY 07733-1988
42 Zingale Michael 5640 S. Ellis Ave. Chicago IL 60637
Try this.
x<-scan("https://raw.githubusercontent.com/fonnesbeck/Bios6301/master/datasets/addr.txt" ,
what = list(LastName="", FirstName="", streetno="", streetname="", city="", state="",zip=""))
data<-as.data.frame(x)
I found it easiest to fix up the file into a csv by adding the commas where they belong, then read it.
## get the page as text
txt <- RCurl::getURL(
"https://raw.githubusercontent.com/fonnesbeck/Bios6301/master/datasets/addr.txt"
)
## fix the EOL (end-of-line) markers
g1 <- gsub(" \n", "\n", txt, fixed = TRUE)
## read it
df <- read.csv(
## add most comma-separators, then the last for the house number
text = gsub("(\\d+) (\\D+)", "\\1,\\2", gsub("\\s{2,}", ",", g1)),
header = FALSE,
## set the column names
col.names = c("LastName", "FirstName", "streetno", "streetname", "city", "state", "zip")
)
## result
head(df)
# LastName FirstName streetno streetname city state zip
# 1 Bania Thomas M. 725 Commonwealth Ave. Boston MA O2215
# 2 Barnaby David 373 W. Geneva St. Wms. Bay WI 53191
# 3 Bausch Judy 373 W. Geneva St. Wms. Bay WI 53191
# 4 Bolatto Alberto 725 Commonwealth Ave. Boston MA O2215
# 5 Carlstrom John 933 E. 56th St. Chicago IL 60637
# 6 Chamberlin Richard A. 111 Nowelo St. Hilo HI 96720
Here your problem is not how to use R to read in this data, but rather it's that your data is not sufficiently structured using regular delimiters between the variable-length fields you have as inputs. In addition, the zip code field contains some alpha "O" characters that should be "0".
So here is a way to use regular expression substitution to add in delimiters, and then parse the delimited text using read.csv(). Note that depending on exceptions in your full set of text, you may need to adjust the regular expressions. I have done them step by step here to make it clear what is being done and so that you can adjust them as you find exceptions in your input text. (For instance, some city names like "Wms. Bay" are two words.)
addr.txt <- readLines("https://raw.githubusercontent.com/fonnesbeck/Bios6301/master/datasets/addr.txt")
addr.txt <- gsub("\\s+O(\\d{4})", " 0\\1", addr.txt) # replace O with 0 in zip
addr.txt <- gsub("(\\s+)([A-Z]{2})", ", \\2", addr.txt) # state
addr.txt <- gsub("\\s+(\\d{5}(\\-\\d{4}){0,1})\\s*", ", \\1", addr.txt) # zip
addr.txt <- gsub("\\s+(\\d{1,4})\\s", ", \\1, ", addr.txt) # streetno
addr.txt <- gsub("(^\\w*)(\\s+)", "\\1, ", addr.txt) # LastName (FirstName)
addr.txt <- gsub("\\s{2,}", ", ", addr.txt) # city, by elimination
addr <- read.csv(textConnection(addr.txt), header = FALSE,
col.names = c("LastName", "FirstName", "streetno", "streetname", "city", "state", "zip"),
stringsAsFactors = FALSE)
head(addr)
## LastName FirstName streetno streetname city state zip
## 1 Bania Thomas M. 725 Commonwealth Ave. Boston MA 02215
## 2 Barnaby David 373 W. Geneva St. Wms. Bay WI 53191
## 3 Bausch Judy 373 W. Geneva St. Wms. Bay WI 53191
## 4 Bolatto Alberto 725 Commonwealth Ave. Boston MA 02215
## 5 Carlstrom John 933 E. 56th St. Chicago IL 60637
## 6 Chamberlin Richard A. 111 Nowelo St. Hilo HI 96720
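Zip codes deserve a quick check after parsing. The sketch below illustrates two points with made-up sample values: read.csv converts a column that looks numeric, which drops a leading zero unless the column is read as character, and a simple pattern can flag zips that are still malformed after cleaning. The names csv_text, guessed, kept, clean, and valid are illustrative.

```r
# A purely numeric-looking zip column loses its leading zero by default
csv_text <- "zip\n02215\n53191"
guessed <- read.csv(text = csv_text)$zip
kept <- read.csv(text = csv_text, colClasses = "character")$zip

# Clean the known problems (letter O for zero, stray whitespace), then
# validate against the 5-digit or ZIP+4 format
zip <- c("O2215", "07733-1988", "53191 ")
clean <- trimws(sub("^O", "0", zip))
valid <- grepl("^[0-9]{5}(-[0-9]{4})?$", clean)
```

In the output above the zip column likely stays character because some entries contain a hyphen, but that would depend on the data, so an explicit colClasses is the safer choice.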
I want to split a street address into street name and street number in R.
My input data has a column that reads for example
Street.Addresses
205 Cape Road
32 Albany Street
cnr Kempston/Durban Roads
I want to split the street number and street name into two separate columns, so that it reads:
Street Number    Street Name
205              Cape Road
32               Albany Street
                 cnr Kempston/Durban Roads
Is it in any way possible to split the numeric value from the non-numeric entries in a factor/string in R?
Thank you
you can try:
y <- lapply(strsplit(x, "(?<=\\d)\\b ", perl=T), function(x) if (length(x)<2) c("", x) else x)
y <- do.call(rbind, y)
colnames(y) <- c("Street Number", "Street Name")
hth
I'm sure that someone is going to come along with a cool regex solution with lookaheads and so on, but this might work for you:
X <- c("205 Cape Road", "32 Albany Street", "cnr Kempston/Durban Roads")
nonum <- grepl("^[^0-9]", X)
X[nonum] <- paste0(" \t", X[nonum])
X[!nonum] <- gsub("(^[0-9]+ )(.*)", "\\1\t\\2", X[!nonum])
read.delim(text = X, header = FALSE)
# V1 V2
# 1 205 Cape Road
# 2 32 Albany Street
# 3 NA cnr Kempston/Durban Roads
Here is another way:
df <- data.frame (Street.Addresses = c ("205 Cape Road", "32 Albany Street", "cnr Kempston/Durban Roads"),
stringsAsFactors = F)
new_df <- data.frame ("Street.Number" = character(),
"Street.Name" = character(),
stringsAsFactors = F)
for (i in 1:nrow (df)) {
new_df [i,"Street.Number"] <- unlist(strsplit (df[["Street.Addresses"]], " ")[i])[1]
new_df [i,"Street.Name"] <- paste (unlist(strsplit (df[["Street.Addresses"]], " ")[i])[-1], collapse = " ")
}
> new_df
Street.Number Street.Name
1 205 Cape Road
2 32 Albany Street
3 cnr Kempston/Durban Roads
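A vectorised variant of the same idea, sketched with base R only, avoids the row-by-row loop. It assumes a street number is a run of digits at the start of the string followed by a space; entries without one keep their full text as the street name and get NA as the number. The names has_num and out are illustrative.

```r
X <- c("205 Cape Road", "32 Albany Street", "cnr Kempston/Durban Roads")

# TRUE where the address starts with digits followed by a space
has_num <- grepl("^[0-9]+ ", X)

out <- data.frame(
  Street.Number = ifelse(has_num, sub("^([0-9]+) .*$", "\\1", X), NA_character_),
  Street.Name   = ifelse(has_num, sub("^[0-9]+ ", "", X), X),
  stringsAsFactors = FALSE
)
out
```

Unlike splitting on the first space, this keeps "cnr" as part of the street name instead of treating it as a number.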