I am currently trying to use the rnoaa library to connect city/state data with a weather station and then output ANNUAL weather data, namely temperature. I have included a hardcoded input for reference, but I intend to feed in hundreds of geocoded cities eventually. That part isn't the issue so much as retrieving the data is.
require(rnoaa)
require(ggmap)
city <- geocode("birmingham, alabama", output = "all")
bounds <- city$results[[1]]$geometry$bounds
# corners of the geocoder's bounding box
sw_lat <- bounds$southwest$lat
sw_lng <- bounds$southwest$lng
ne_lat <- bounds$northeast$lat
ne_lng <- bounds$northeast$lng
# extent is c(min latitude, min longitude, max latitude, max longitude)
stations <- ncdc_stations(extent = c(sw_lat, sw_lng, ne_lat, ne_lng), token = noaakey)
I am calculating an MBR (minimum bounding rectangle) around the geographic area, in this case Birmingham, and then getting a list of stations. I then pull out the station id and attempt to retrieve results, but no combination of parameters has worked. I'm looking to associate annual temperatures with each city.
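For reference, this is roughly how topStation is derived from that list; the id and datacoverage columns are what I see in the $data slot returned by ncdc_stations (a sketch, adjust if your return differs):
# keep the station ids, most complete data coverage first
topStation <- stations$data$id[order(stations$data$datacoverage, decreasing = TRUE)]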
test <- ncdc(datasetid = "ANNUAL", locationid = topStation[1],
datatypeid = "DSNW",startdate = "2000-01-01", enddate = "2010-01-01",
limit = 1000, token = noaakey)
Warning message:
Sorry, no data found
Looks like the location ID is causing the issue. Try without it (it is an optional field):
ncdc_locs(datasetid = "ANNUAL", datatypeid = "DSNW", startdate = "2000-01-01", enddate = "2010-01-01", limit = 1000, token = <your token key>)
and then with a valid location ID:
ncdc_locs(datasetid = "ANNUAL", datatypeid = "DSNW", startdate = "2000-01-01", enddate = "2010-01-01", limit = 1000, locationid = 'CITY:US000001', token = <your token>)
returns
$meta
NULL
$data
mindate maxdate name datacoverage id
1 1872-01-01 2016-04-16 Washington D.C., US 1 CITY:US000001
attr(,"class")
[1] "ncdc_locs"
Related
Using the following code, I am able to extract daily temperature data from the NOAA database for a specific weather station at latitude 62.1925 and longitude -150.5033.
install.packages("pacman")
pacman::p_load(rgdal, ggplot2, patchwork, rnoaa)

# find GHCND stations within ~0.01 degrees of the given coordinate
stns <- meteo_distance(station_data = ghcnd_stations(), lat = 62.1925,
                       long = -150.5033, units = "deg", radius = 0.01,
                       limit = NULL)

# pull daily temperature variables for the nearest station
WXData <- meteo_pull_monitors(
  monitors = stns[1, 1],
  keep_flags = FALSE,
  date_min = "1990-01-01",
  date_max = "2022-01-01",
  var = c("TMAX", "TMIN", "TAVG", "TOBS")
)
The output of the above code is a table, meaning that for every weather station we get such a table. I have a CSV file called "station" from which I should import the latitude and longitude of each station. My question is: how can I attach the generated temperature table to its coordinates in the "station" file?
I did not try anything, since I am very new to R.
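A rough sketch of one way to do this, assuming the "station" CSV has columns named lat and lon (hypothetical names), is to loop over its rows, pull the monitors near each coordinate, and tag each temperature table with the coordinates it belongs to before binding everything together:
library(rnoaa)
library(dplyr)

stations_csv <- read.csv("station.csv")  # assumed columns: lat, lon
all_stations <- ghcnd_stations()         # download the station list once

WXData_all <- lapply(seq_len(nrow(stations_csv)), function(i) {
  # nearest GHCND station(s) to this row's coordinate
  stns <- meteo_distance(station_data = all_stations,
                         lat = stations_csv$lat[i],
                         long = stations_csv$lon[i],
                         units = "deg", radius = 0.01)
  wx <- meteo_pull_monitors(monitors = stns$id[1],
                            date_min = "1990-01-01",
                            date_max = "2022-01-01",
                            var = c("TMAX", "TMIN", "TAVG", "TOBS"))
  # attach the coordinates so each temperature table sits next to its station
  mutate(wx, station_lat = stations_csv$lat[i], station_lon = stations_csv$lon[i])
})
WXData_all <- bind_rows(WXData_all)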
I am currently trying to download a particular series from the Direction Of Trade Statistics at the IMF for a calculation of trade volumes between countries. There is an R package, imfr, that does a fantastic job at this. However, when going for a particular set, I run into problems.
This code works just fine and gets me the full data series I am interested in for the given countries:
library(imfr)
# get the list of imf datasets
imf_ids()
# I am interested in direction of trade "DOT", so check the list of codes that are in the datastructure
imf_codelist(database_id = "DOT")
# I want the export and import data between countries FOB so "TXG_FOB_USD" and "TMG_FOB_USD"
imf_codes("CL_INDICATOR_DOT")
# works nicely for exports:
data_list_exports <- imf_data(database_id = "DOT", indicator = c("TXG_FOB_USD"),
country = c("US","JP","KR"),
start = "1995",
return_raw = TRUE,
freq = "A")
# however the same code does not work for imports
data_list_imports <- imf_data(database_id = "DOT", indicator = c("TMG_FOB_USD"),
country = c("US","JP","KR"),
start = "1995",
return_raw = TRUE,
freq = "A")
This returns an empty series, and I did not understand why. So I thought: maybe the US is not in the dataset (although that seemed unlikely).
library(httr)
library(jsonlite)
library(magrittr)  # for the %>% pipe

# look at the API endpoint that provides the data structure behind a dataset
result <- httr::GET("http://dataservices.imf.org/REST/SDMX_JSON.svc/DataStructure/DOT") %>%
  httr::content(as = "parsed")
structure_url <- "http://dataservices.imf.org/REST/SDMX_JSON.svc/DataStructure/DOT"
raw_data <- jsonlite::fromJSON(structure_url)
test <- raw_data$Structure$CodeLists
However, the result indicates that the US is indeed in the data. So what if I just don't specify a country? That finally does download data, but only for the first 60 countries because of rate limits. When doing the same with an httr::GET I hit the rate limit directly and get an error back.
data_list_imports <- imf_data(database_id = "DOT", indicator = c("TMG_FOB_USD"),
start = "1995",
return_raw = TRUE,
freq = "A")
Does anybody have an idea what I am doing wrong? I am really at a loss and just hope it is a typo somewhere...
Thanks and all the best!
This kind of answers the question:
cjyetman over at GitHub gave me the following hint:
You can use the print_url = TRUE argument to see the actual API call.
With...
imf_data(database_id = "DOT", indicator = c("TMG_FOB_USD"),
country = c("US","JP","KR"),
start = "1995",
return_raw = TRUE,
freq = "A",
print_url = TRUE)
you get...
http://dataservices.imf.org/REST/SDMX_JSON.svc/CompactData/DOT/.US+JP+KR.TMG_FOB_USD?startPeriod=1995&endPeriod=2021
which does not return any data.
But if you add "AU" as a country to that list, you do get data with...
http://dataservices.imf.org/REST/SDMX_JSON.svc/CompactData/DOT/.AU+US+JP+KR.TMG_FOB_USD?startPeriod=1995&endPeriod=2021
So I guess either there is something wrong currently with their API,
or they actually do not have data for specifically that indicator for
those countries with that frequency, etc.
This does indeed work, and it makes apparent that either there is genuinely missing data in the API, or I am simply looking for data where there is none. Since the original goal was to look at trade volumes, I have since found out that the import value is usually used, and that imports are reported CIF (cost, insurance and freight) rather than FOB.
Hence the correct indicator for the API call would have been the following:
library(imfr)
data_list_imports <- imf_data(database_id = "DOT", indicator = c("TMG_CIF_USD"),
country = c("US","JP","KR"),
start = "1995",
return_raw = TRUE,
freq = "A")
I am working on a research assignment on COVID and am using the datalake API to fetch the different kinds of datasets available to us.
I am wondering if it's possible to fetch all outbreak countries.
ids = list("Australia"), this works with individual country, it doesnt seem to accept wildcard or all.
Can anyone give me any insights on this please.
# Total number of confirmed cases in Australia and proportion of getting infected.
today <- Sys.Date()
casecounts <- evalmetrics(
"outbreaklocation",
list(
spec = list(
**ids = list("Australia"),**
expressions = list("JHU_ConfirmedCasesInterpolated","JHU_ConfirmedDeathsInterpolated"),
start = "2019-12-20",
end = today-1,
interval = "DAY"
)
)
)
casecounts
The easiest way to access a list of countries is in the Excel file linked at https://c3.ai/covid-19-api-documentation/#tag/OutbreakLocation. It has a list of countries in the first sheet, and shows which of those have data from JHU.
You could also fetch an approximate list of country-level locations with:
locations <- fetch(
"outbreaklocation",
list(
spec = list(
filter = "not(contains(id, '_'))"
)
)
)
That should contain all of the countries, but could have some non-countries like World Bank regions.
Then, you'd use this code to get the time series data for all of those locations:
library(dplyr)
library(tidyr)

location_ids <-
  locations %>%
  dplyr::select(-location) %>%
  unnest_wider(fips, names_sep = ".") %>%
  sample_n(15) %>% # include this to test on a smaller set
  pull(id)
today <- Sys.Date()
casecounts <- evalmetrics(
"outbreaklocation",
list(
spec = list(
ids = location_ids,
expressions = list("JHU_ConfirmedCasesInterpolated","JHU_ConfirmedDeathsInterpolated"),
start = "2019-12-20",
end = today-1,
interval = "DAY"
)
),
get_all = TRUE
)
casecounts
I am attempting to extract unsampled data for the past nine months. The website is pretty active, and as such I'm unable to get the data in its entirety (over 3 million rows) unsampled. I'm currently attempting to break up the filtering so that I'm only returning under 10k rows at a time (the API response limit). Is there a way I can loop over a number of days? I tried using the batch function with no success. I have included my code for reference; I was thinking of writing a loop and doing it in 10-day intervals (a rough sketch of that idea follows the code below). I appreciate any input.
Thanks!
library(RGA)
gaData <- get_ga(id, start.date = start_date,
end.date= "today" , metrics = "ga:sessions",
dimensions = "ga:date, ga:medium, ga:country, ga:hour, ga:minute",
filters = "ga:country==United States;ga:medium==organic",
max.results = NULL,
batch = TRUE,
sort = "ga:date")
The get_ga function doesn't have a batch parameter (see ?get_ga). Try it with the fetch.by option instead. You can test different variants: "month", "week", "day".
library(RGA)
authorize()
gaData <- get_ga(id, start.date = start_date,
end.date= "today" , metrics = "ga:sessions",
dimensions = "ga:date, ga:medium, ga:country, ga:hour, ga:minute",
filters = "ga:country==United States;ga:medium==organic",
sort = "ga:date", fetch.by = "week")
I'm using skardhamar's rga ga$getData to query GA and get all data in an unsampled manner. The data is based on more than 500k sessions per day.
At https://github.com/skardhamar/rga, the paragraph 'extracting more observations than 10,000' mentions this is possible by using batch = TRUE, and the paragraph 'Get the data unsampled' mentions that by walking over the days you can get unsampled data. I'm trying to combine the two, but I cannot get it to work. E.g.
ga$getData(xxx,
start.date = "2015-03-30",
end.date = "2015-03-31",
metrics = "ga:totalEvents",
dimensions = "ga:date,ga:customVarValue4,ga:eventCategory,ga:eventAction,ga:eventLabel",
sort = "",
filters = "",
segment = "",
batch = TRUE, walk = TRUE
)
.. indeed gets unsampled data, but not all of the data: I get a dataframe with only 20k rows (10k per day). So it is limited to chunks of 10k per day, contrary to what I expected from the batch = TRUE setting. For 30-31 March I get a dataframe of 20k rows after seeing this output:
Run (1/2): for date 2015-03-30
Pulling 10000 observations in batches of 10000
Run (1/1): observations [1;10000]. Batch size: 10000
Received: 10000 observations
Received: 10000 observations
Run (2/2): for date 2015-03-31
Pulling 10000 observations in batches of 10000
Run (1/1): observations [1;10000]. Batch size: 10000
Received: 10000 observations
Received: 10000 observations
When I leave out the walk = TRUE setting, I do get all observations (771k rows, around 335k per day), but only in a sampled manner:
ga$getData(xxx,
start.date = "2015-03-30",
end.date = "2015-03-31",
metrics = "ga:totalEvents",
dimensions = "ga:date,ga:customVarValue4,ga:eventCategory,ga:eventAction,ga:eventLabel",
sort = "",
filters = "",
segment = "",
batch = TRUE
)
Notice: Data set contains sampled data
Pulling 771501 observations in batches of 10000
Run (1/78): observations [1;10000]. Batch size: 10000
Notice: Data set contains sampled data
...
Is my data just too big to get all observations unsampled?
You could try querying by device with filters = "ga:deviceCategory==desktop" (and filters = "ga:deviceCategory!=desktop" respectively) and then merging the resulting dataframes (see the small combining sketch after the two queries below).
I'm assuming that your users use different devices to access your site. The underlying logic is that when you filter, the Google Analytics servers filter the data before you get it, so you can "divide" your query and still get unsampled data. I think it is the same methodology as the "walk" option.
Desktop only
ga$getData(xxx,
start.date = "2015-03-30",
end.date = "2015-03-31",
metrics = "ga:totalEvents",
dimensions = "ga:date,ga:customVarValue4,ga:eventCategory,ga:eventAction,ga:eventLabel",
sort = "",
filters = "ga:deviceCategory==desktop",
segment = "",
batch = TRUE, walk = TRUE
)
Mobile and Tablet
ga$getData(xxx,
start.date = "2015-03-30",
end.date = "2015-03-31",
metrics = "ga:totalEvents",
dimensions = "ga:date,ga:customVarValue4,ga:eventCategory,ga:eventAction,ga:eventLabel",
sort = "",
filters = "ga:deviceCategory!=desktop",
segment = "",
batch = TRUE, walk = TRUE
)
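Assuming you assign the two results above to, say, desktopData and otherData (hypothetical names), you would then simply stack them into one unsampled dataset:
# combine the device-filtered queries into a single dataframe
allData <- rbind(desktopData, otherData)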