I am pulling drive times and distances using the Distance Matrix API in R. Currently I am using the mapdist() function, which returns the time and distance for an origin and destination.
However, I also found the gmapsdistance() function in R. This function also allows further parameters, such as eliminating ferry/toll routes etc. (which may become important in the future).
I was curious as to whether one is better than the other, so I used the same origin and destination to see whether the results are the same, using the code below:
mapdist(from = "02081", to = "89801", mode = "driving")
gmapsdistance(origin = "02081", destination = "89801", mode = "driving",
key = "XXXXXXXXXXXXXXXXXXX")
And the following outputs were returned:
# A tibble: 1 x 9
from to m km miles seconds minutes hours mode
<chr> <chr> <int> <dbl> <dbl> <int> <dbl> <dbl> <chr>
1 02081 89801 4185463 4185. 2601. 141708 2362. 39.4 driving
> gmapsdistance(origin = "02081", destination = "89801", mode = "driving",
+ key = "XXXXXXXXXXXXXXXX")
$Time
[1] 141630
$Distance
[1] 4185370
$Status
[1] "OK"
I thought the distances and times would return the same values, but they are slightly off. Has anybody encountered this, and do you have any idea why they might return different values?
Also, is it possible to exclude toll routes/ferry routes using the mapdist function?
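For reference, this is the kind of restriction I mean with gmapsdistance: a sketch using its avoid argument (I haven't found an equivalent argument in mapdist, which is why I'm asking):
gmapsdistance(origin = "02081", destination = "89801", mode = "driving",
              avoid = "tolls",  # "ferries" and "highways" are other values accepted by the API
              key = "XXXXXXXXXXXXXXXXXXX")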
I need help merging two data frames with R. I'm a little desperate, since I have tried everything I could. Any help would be appreciated.
The thing is that I'm doing some daily web scraping, and I need to compare today's results with yesterday's results in order to detect if there have been any changes.
I only have two variables (the title of the page and the url) in two data frames (one for today and one for yesterday), and I want to merge them into one.
The possible changes are:
Changes in the name.
Changes in the url.
New programs (new name and new url).
Deleted programs.
I've tried with merge, cast & melt, ifelse, etc., and I can't solve the problem. For example:
yesterday <- read.csv2("Yesterday.csv")
today <- read.csv2("Today.csv")
new <- merge(x = today, y = yesterday, all = TRUE, sort = TRUE)
But without the desired result. I'm attaching three files:
Today.csv, with the results of today's scraping
Yesterday.csv, with the results of yesterday's scraping
Results.xlsx with the desired output. A VLOOKUP in Excel, highlighting the changes I want to detect (in this case name changes).
I would need a solution for the four change options. The output could be different, I don't care about that, but I need the comparison to be correct. Even if you find that this question is a duplicate, I would need the link to the other one, because I haven't been able to find it.
Thanks in advance.
The answer has been updated in response to the comments below:
library(tidyverse)
bind_rows(
  anti_join(today, yest) %>%
    mutate(
      label = ifelse(programa %in% yest$programa, 'changed', 'added')
    ),
  anti_join(yest, select(today, programa)) %>% mutate(label = "deleted")
)
Which, when applied to the whole data sets, returns the following results:
# # A tibble: 6 x 3
# programa url label
# <chr> <chr> <chr>
# 1 Carrera de Derecho a distancia |~ https://universidadeuropea.es/onlin~ added
# 2 "Carrera de Criminolog\xeda a di~ https://universidadeuropea.es/onlin~ added
# 3 "Carrera Ingenier\xeda Inform\xe~ https://universidadeuropea.es/onlin~ added
# 4 Grado en Derecho a distancia | U~ https://universidadeuropea.es/onlin~ dele~
# 5 "Grado en Criminolog\xeda a dist~ https://universidadeuropea.es/onlin~ dele~
# 6 "Grado Ingenier\xeda Inform\xe1t~ https://universidadeuropea.es/onlin~ dele~
In order to check whether it is able to register changes in a program, we can do the following:
yest[22, 2] <- yest[23, 2]
Piping the changed data into the code above returns a table with an additional record, labelled as changed:
# # A tibble: 7 x 3
# programa url label
# <chr> <chr> <chr>
# 1 "M\xe1ster en Direcci\xf3n Hotel~ https://universidadeuropea.es/onlin~ chan~
# 2 Carrera de Derecho a distancia |~ https://universidadeuropea.es/onlin~ added
# 3 "Carrera de Criminolog\xeda a di~ https://universidadeuropea.es/onlin~ added
# 4 "Carrera Ingenier\xeda Inform\xe~ https://universidadeuropea.es/onlin~ added
# 5 Grado en Derecho a distancia | U~ https://universidadeuropea.es/onlin~ dele~
# 6 "Grado en Criminolog\xeda a dist~ https://universidadeuropea.es/onlin~ dele~
# 7 "Grado Ingenier\xeda Inform\xe1t~ https://universidadeuropea.es/onlin~ dele~
Explanation:
Everything enclosed inside bind_rows() is combined into a single tibble. Since we have two separate anti_join() statements here, and each of them returns its own tibble, we have to bind them into one;
anti_join() is a set operation which, given two sets A and B, returns a set C containing the elements of A that are not in B. In other words, C is the difference A minus B.
When we call anti_join(today, yest), we obtain the subset of today with records either not present in yest at all, or with the program name or url changed compared to yest. We pipe those results into a mutate() call and assign the value changed to label if the value of programa is the same as yesterday (programa %in% yest$programa) while the url has changed. If programa %in% yest$programa is FALSE, the program name wasn't present in yest, so it is a new program and we label it as added.
When we call anti_join() a second time, we are looking for the difference between yest and today in program names. In other words: 'Which programs present in yest are not present in today?' We achieve this by looking for the subset of yest with program names that are not among the program names of today (that's why you need select(today, programa)). Any such records are labelled deleted.
Sorry if this explanation is somewhat clumsy, but I hope it will help you to navigate the code.
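As a minimal illustration of the anti_join() behaviour described above (toy data, not the actual files):
library(dplyr)
a <- tibble(programa = c("p1", "p2", "p3"), url = c("u1", "u2", "u3"))
b <- tibble(programa = c("p1", "p2"), url = c("u1", "u2-old"))
# Rows of `a` with no exact match in `b`: p2 (url changed) and p3 (brand new)
anti_join(a, b)
# Rows of `b` whose programa does not appear in `a`: none in this toy example
anti_join(b, select(a, programa))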
Data:
tmp <- tempfile()
download.file(
"https://drive.google.com/uc?authuser=0&id=1scYdZrGYaSDr-TE8IZsy1tKSdLjMn7jt&export=download",
tmp
)
today <- read_delim(tmp, delim = ";")
download.file(
"https://drive.google.com/uc?authuser=0&id=1uJ-ThiKykTjoY1gc3jlBHoab8WAJD-wP&export=download",
tmp
)
yest <- read_delim(tmp, delim = ";")
file.remove(tmp)
Thanks for helping me with my first Stack Overflow question. I am trying to retrieve all the data from several fields in an Object called "Applied Questionnaire"; however, I do not want to retrieve any records that have the name "Training Site".
Currently, this is my code, which works:
quarterly_site_scores = rforcecom.retrieve(session, "AppliedQuestionnaire__c",
                                            c("Site__c", "Site_Name__c", "Total_Score__c")) %>%
  rename(site_id = Site__c, site_name = Site_Name__c)

quarterly_site_scores = quarterly_site_scores[!(quarterly_site_scores$site_name == "TRAINING PARK SITE" |
                                                quarterly_site_scores$status != "Completed"), ]
However, I'm wondering if there's a more elegant, streamlined solution here. Can I filter at the same time I retrieve? Or is there a better way to filter here?
(I've simplified the code here - I'm actually pulling in about ten fields and filtering on about five or six criteria, just in this one example).
Thank you.
Adding what the OP discovered as an answer: using the salesforcer package, which returns the SOQL result set as a tbl_df.
library(salesforcer)
library(tidyverse)
sf_auth(username, password, security_token)
# list all object names in a Salesforce org
ped_objects <- sf_list_objects() %>% .$sobjects %>% map_chr(~pluck(., "name"))
# list all the fields on a particular object
fields <- sf_describe_object_fields('AppliedQuestionnaireBundle2__c')
# write a query to retrieve certain records from that object
site_scores_soql <- "SELECT Site__c,
Site_Name__c,
Total_Score__c
FROM AppliedQuestionnaireBundle2__c
WHERE Site_Name__c != 'GENERIC SITE'
AND Site_Name__c != 'TRAINING PARK SITE'
AND Status__c = 'Completed'"
# run the query
quarterly_site_scores <- sf_query(site_scores_soql)
quarterly_site_scores
#> # A tibble: 3 x 3
#> Site__c Site_Name__c Total_Score__c
#> <chr> <chr> <dbl>
#> 1 A Site Name1 78
#> 2 B Site Name2 52
#> 3 C Site Name3 83
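If you stay with rforcecom instead, the same post-retrieval filtering can be written as a single pipe. A sketch, assuming a Status__c field is added to the retrieved fields:
library(dplyr)
quarterly_site_scores <- rforcecom.retrieve(session, "AppliedQuestionnaire__c",
                                            c("Site__c", "Site_Name__c", "Total_Score__c", "Status__c")) %>%
  rename(site_id = Site__c, site_name = Site_Name__c) %>%
  filter(site_name != "TRAINING PARK SITE", Status__c == "Completed")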
I am trying to retrieve information from an API that returns the name of a product given its barcode.
I am using httr::GET().
The URL needed for the API contains the barcode itself, but I do not know how to automate the process so that it reads the barcode contained in every entry and plugs it into the URL, without me copying and pasting the code manually into the script.
one_code <- GET("api.upcdatabase.org/json/aebfed7a26f24a05efd7f77749dc2fcc/…")
result <- content(one_code)
result$description
A couple of extra things to consider.
First, the site provides https for the API, so you should use it; otherwise you're exposing your API key on any network you make requests from.
Test the core HTTP status code and halt on major HTTP errors (not API errors).
You should also put your API key in something like an environment variable so it never ends up in scripts or GitHub repo commits. Use ~/.Renviron (make a single line entry for UPCDATABASE_API_KEY=your_key and then restart R).
You should handle error and success conditions and consider returning a data frame so you can have all the fields in a tidy, accessible fashion.
Finally, do some basic type conversion prior to returning the values to make return field values easier to use.
library(httr)
library(jsonlite)
library(purrr)
get_upc_code_info <- function(code, api_key=Sys.getenv("UPCDATABASE_API_KEY")) {
URL <- sprintf("https://api.upcdatabase.org/json/%s/%s", api_key, code)
res <- GET(URL)
stop_for_status(res)
res <- content(res, as="text", encoding="UTF-8")
res <- fromJSON(res, flatten=TRUE)
if (res$valid == "true") {
res <- flatten_df(res)
res$valid <- TRUE
res$avg_price <- as.numeric(res$avg_price)
res$rate_up <- as.numeric(res$rate_up)
res$rate_down <- as.numeric(res$rate_down)
return(res)
} else {
message(res$reason)
return(data.frame(number = code, valid = FALSE, stringsAsFactors=FALSE))
}
}
xdf <- get_upc_code_info("0111222333446")
dplyr::glimpse(xdf)
## Observations: 1
## Variables: 8
## $ valid <lgl> TRUE
## $ number <chr> "0111222333446"
## $ itemname <chr> "UPC Database Testing Code"
## $ alias <chr> "Testing Code"
## $ description <chr> "http://upcdatabase.org/code/0111222333446"
## $ avg_price <dbl> 123.45
## $ rate_up <dbl> 14
## $ rate_down <dbl> 3
Similar to what Aurèle suggested, you can use the function to make it easier to get multiple codes. Since this function returns a data frame, you can easily get a larger, complete data frame from individual lookups with purrr::map_df():
codes <- c("0057000006976", "3228881010711", "0817346023170", "44xx4444444")
xdf <- map_df(codes, get_upc_code_info)
dplyr::glimpse(xdf)
## Observations: 4
## Variables: 8
## $ valid <lgl> TRUE, TRUE, TRUE, FALSE
## $ number <chr> "0057000006976", "3228881010711", "0817346023170",...
## $ itemname <chr> "Heinz Original Beans (Pork & Molasses)", "Lip...
## $ alias <chr> "", "", "", NA
## $ description <chr> "", "Boîte de 20 sachets", "", NA
## $ avg_price <dbl> NA, NA, 39.99, NA
## $ rate_up <dbl> 0, 0, 1, NA
## $ rate_down <dbl> 0, 0, 0, NA
Consider putting finishing touches on this: adding a function to POST to the API, possibly making a Shiny app so folks can submit new entries through R, and turning it into a package. You might even get extra free credits on the site if you do so.
Store your barcodes in one data structure (list or vector):
barcodes <- c(
"aebfed7a26f24a05efd7f77749dc2fcc",
"xyz1234567f24a05efd7f77749dc2fcc",
"pqr9876543f24a05efd7f77749dc2fcc"
)
Write a function:
library(httr)

scrape <- function(barcode) {
  res <- GET(paste0("api.upcdatabase.org/json/", barcode, "/rest/of/the/url"))
  content(res)$description
}
And apply:
res <- lapply(barcodes, scrape)
The results are stored in a single list, so that they're easier to manipulate.
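If every barcode returns a description, the list can also be collapsed into a plain character vector, e.g. descriptions <- unlist(res).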
I'm trying to process multiple operations in GRASS 7.2.0 using R 3.3.2 with the rgrass7 library.
My main goal is to calculate road network distances between multiple locations with GRASS network tools.
My problem is that the locations in my database are organized by individual, and I want to calculate network distances for the locations nested within each individual, but not across all locations in the database.
Here's an example of my data structure:
ID | Locations
--------------
A | 1
A | 2
A | 3
B | 4
B | 5
I was able to write this GRASS script, which calculates all distances for one ID. I omit the beginning of the script, where I load the location (my_locations) and road (street_vector) layers and set the region.
# Select locations for ID == 'A'
v.extract --overwrite --verbose input=my_locations output=my_locations_sub where="ID='A'"
# Prepare network: connect my_locations to street vector map
v.net input=street_vectors points=my_locations_sub output=network operation=connect thresh=500
## The option operation=connect allows to create the links between lines and
## points and to create a single vector map containing lines, nodes and
## links.
## The parameter thresh=500 means that only points within the distance of
## 500 metres are connected to the lines.
# verify result
v.category input=network option=report
# shortest path between all points
v.net.allpairs input=network out=network_dist --overwrite
# output results
v.db.select network_dist
Next, I used the rgrass7 package to execute GRASS 7.2.0 commands from R 3.3.2.
The objective was to use a for loop to generate all my network distance tables with one script.
Here's my code:
library(rgrass7)
IDs <- read.table("./path/to/my/ID_list.txt", header = TRUE)
# initialisation of GRASS7
initGRASS(gisBase = "C:/Program Files/GRASS GIS 7.2.0",
gisDbase = "C:/Users/Me/GRASSdata",
location = "My_project", mapset = "R", override = TRUE)
# For loop to calculate road network distance by IDs
for (i in 1: length(IDs)){
ID <- IDs[i]
condition <- paste0("ID=\'", as.character(ID), "\'")
execGRASS('v.extract', parameters = list(input='my_locations',
output='my_locations_sub', where=condition))
execGRASS('v.net', parameters = list(input = street_vectors',
points = 'my_locations_sub', output = 'network',
operation = 'connect', thresh = 500))
execGRASS('v.net.allpairs', parameters = list(input='network',
out='netword_dist'),
flags = 'overwrite')
# Set the path to write files
path <- paste0("./data/", ID, ".csv")
# Write file
execGRASS('db.out.ogr', parameters = list(input = 'network_dist',
output = path))
}
But when I execute the execGRASS() function with v.net from GRASS, I get the following error:
Error in doGRASS(cmd, flags = flags, ..., parameters = parameters,
echoCmd = echoCmd, :
Invalid parameter name: thresh
It seems like doGRASS does not recognize thresh as a valid v.net parameter. I'm a bit stuck here, so if someone has a clue about what I did wrong, that would really help.
I just realized that I made a mistake while copying my code. I missed a ' in input = 'street_vectors' in the following section:
execGRASS('v.net', parameters = list(input = 'street_vectors',
points = 'my_locations_sub', output = 'network',
operation = 'connect', thresh = 500))
Here http://www.bom.gov.au/climate/data/ I can enter a substation number, say 009572, choose the variable (say Temperature) and its type (say Maximum). Clicking "get data" brings me to a page with a link "All years of data". Click it, and you get a zip file. I am aware of this question, but here I don't have a direct link to a zip file. Can something be done to automate weather data extraction from the Australian Bureau of Meteorology website with R?
I had the same question, and this S.O. question was one of the first pages to come up. After further searching I found the R package bomrang (https://github.com/ropensci/bomrang), which:
Provides functions to interface with Australian Government Bureau of
Meteorology (BOM) data, fetching data and returning a tidy data frame
of précis forecasts, current weather data from stations, ag
information bulletins, historical weather data and downloading and
importing radar or satellite imagery.
bomrang is part of rOpenSci and is actively developed. It has a good set of functions:
Several functions are provided by bomrang to retrieve Australian
Bureau of Meteorology (BOM) data. A family of functions retrieve
weather data and return tidy data frames;
get_precis_forecast(), which retrieves the précis (short) forecast;
get_current_weather(), which fetches the current weather for a given station;
get_ag_bulletin(), which retrieves the agriculture bulletin;
get_weather_bulletin(), which retrieves the BOM 0900 or 1500 bulletins;
get_coastal_forecast(), which returns coastal waters forecasts; and
get_historical(), which retrieves historical daily observations for a given station.
A second group of functions retrieve information pertaining to
satellite and radar imagery,
get_available_imagery();
the satellite imagery itself, get_satellite_imagery();
get_available_radar(); and
the radar imagery itself, get_radar_imagery().
The function get_historical() seems to do what the OP needs. For example, getting the historical daily rainfall from a weather station in Sydney is as easy as:
> rain_066062 <- bomrang::get_historical(stationid = 066062,
+ type = 'rain',
+ meta = T)
> head(rain_066062)
$`meta`
# A tibble: 1 x 10
site name lat lon start end years percent AWS ncc_obs_code
<int> <chr> <dbl> <dbl> <date> <date> <dbl> <int> <chr> <chr>
1 66062 SYDNEY (OBSERVATORY HILL) -33.9 151. 1858-07-01 2018-11-01 160. 100 Y 136
$historical_data
Product_code Station_number Year Month Day Rainfall Period Quality
1 IDCJAC0009 66062 1858 1 1 NA NA
2 IDCJAC0009 66062 1858 1 2 NA NA
3 IDCJAC0009 66062 1858 1 3 NA NA
4 IDCJAC0009 66062 1858 1 4 NA NA
5 IDCJAC0009 66062 1858 1 5 NA NA
<<SNIP>>
Another nice feature is that if you have the longitude and latitude of a place of interest, get_historical() will find the nearest weather station to that location.
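For example, a sketch of that lookup (assuming the latlon argument documented in ?get_historical):
rain_sydney <- bomrang::get_historical(latlon = c(-33.87, 151.21), type = "rain")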
To install from CRAN:
install.packages("bomrang")
Or install the development version from Github:
if (!require("remotes")) {
install.packages("remotes", repos = "http://cran.rstudio.com/")
library("remotes")
}
install_github("ropensci/bomrang", build_vignettes = TRUE)
Here's the code that I wrote to download the data directly; it also resolves your p_c problem. You can improve the function if you want and post it.
# daily code = 136
# monthly code = 139
bomdata <- function(station, code) {
  for (i in 1:length(station)) {
    # Request the "available years" page for the station; its first line contains the p_c value
    p.url <- paste0("http://www.bom.gov.au/jsp/ncc/cdio/weatherData/av?p_stn_num=",
                    station[i], "&p_display_type=availableYears&p_nccObsCode=", code)
    download.file(p.url, "test.txt")
    filelist <- list.files(pattern = ".txt")
    foo <- file(filelist, "r")
    text <- suppressWarnings(readLines(foo))
    close(foo)
    # Parse the p_c value out of the first line of the response
    l <- regexpr(":", text[1])
    m <- unlist(gregexpr(",", text[1], perl = TRUE))
    pc <- substr(text[1], l[[1]] + 1, l[[1]] + (m[2] - (l[[1]] + 1)))
    # Build the zipped-data URL using the extracted p_c and download the zip
    url <- paste0("http://www.bom.gov.au/jsp/ncc/cdio/weatherData/av?p_display_type=dailyZippedDataFile&p_stn_num=",
                  station[i], "&p_c=", pc, "&p_nccObsCode=", code, "&p_startYear=2013")
    suppressWarnings(download.file(url, paste0(station[i], ".zip"), mode = "wb"))
    unlink("test.txt")
  }
}
Example
bomdata(073137,136)
You can try this; it is a code sequence used by the metvurst package.
## SET URL FOR DATA DOWNLOAD
url <- "http://www.bom.gov.au/ntc/IDO70004/IDO70004_"
## YEARS TO BE DOWNLOADED
yr <- 1993:2012
## READ DATA FOR ALL YEARS FROM URL INTO LIST
fijilst <- lapply(seq(yr), function(i) {
  read.csv(paste(url, yr[i], ".csv", sep = ""), na.strings = c(-9999, 999))
})
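Assuming the yearly files share the same columns, the list can then be combined into a single data frame with fiji <- do.call(rbind, fijilst).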
While I still can't see how to do this with download.file(), the following almost does the job provided Chrome's "Ask where to save each file before downloading" is unticked.
system(paste('"C:/Documents and Settings/UserName/Local Settings/Application Data/Google/Chrome/Application/chrome.exe"',
'-url http://www.bom.gov.au/jsp/ncc/cdio/weatherData/av?p_display_type=dailyZippedDataFile&p_stn_num=009572&p_c=-18465084&p_nccObsCode=136'), wait = FALSE)
Then I could use paste0() and loop through various station numbers if I knew what p_c=-18465084 means and how it changes from station to station.