Ausbeer lag plot using gg_lag() fails - tsibble

ausbeer %>%
  as_tsibble() %>%
  filter(year(index) >= 1992) %>%
  mutate(Year = year(index), Quarter = quarter(index)) %>%
  as_tibble() %>%
  select(Year, Quarter, value) %>%
  pivot_wider(names_from = Quarter, values_from = value) %>%
  as_tsibble(index = Year) %>%
  gg_lag(, 2:5)
I'm trying to make a lag plot of the 'ausbeer' dataset using verbs from the 'fpp3' package. Of course, the easy way would be to use the older gglagplot() function, but I want to keep using fpp3 verbs.
When I run the above code, it shows 5 seasons (0-5) instead of 4 (Q1-Q4).
Can anyone help fix this?

A lag plot of fpp2::ausbeer from 1992 onwards can be produced with:
library(fpp3)
#> ── Attaching packages ──────────────────────────────────────────── fpp3 0.4.0 ──
#> ✓ tibble 3.1.2 ✓ tsibble 1.0.1
#> ✓ dplyr 1.0.6 ✓ tsibbledata 0.3.0
#> ✓ tidyr 1.1.3 ✓ feasts 0.2.1.9000
#> ✓ lubridate 1.7.10 ✓ fable 0.3.1
#> ✓ ggplot2 3.3.3.9000
#> ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
#> x lubridate::date() masks base::date()
#> x dplyr::filter() masks stats::filter()
#> x tsibble::intersect() masks base::intersect()
#> x tsibble::interval() masks lubridate::interval()
#> x dplyr::lag() masks stats::lag()
#> x tsibble::setdiff() masks base::setdiff()
#> x tsibble::union() masks base::union()
as_tsibble(fpp2::ausbeer) %>%
  filter(year(index) >= 1992) %>%
  gg_lag(value)
#> Registered S3 method overwritten by 'quantmod':
#> method from
#> as.zoo.data.frame zoo
This gives the same output as the older gglagplot() function.
library(fpp2)
#> ── Attaching packages ────────────────────────────────────────────── fpp2 2.4 ──
#> ✓ forecast 8.14 ✓ expsmooth 2.3
#> ✓ fma 2.4
#>
#>
#> Attaching package: 'fpp2'
#> The following object is masked from 'package:fpp3':
#>
#> insurance
gglagplot(window(ausbeer, start = 1992))
There is no need to pivot the quarters into columns of the data. In the tsibble
format each column is a separate variable (in this case the amount of beer
produced in Australia).
The y argument specifies which column to plot, and the separation of seasonal
periods is controlled with the period argument. By default a common seasonal
period is chosen, which in this case is period = "1 year", so the quarters
within each year are shown separately (see the sketch below).
Created on 2021-06-15 by the reprex package (v2.0.0)
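To make those arguments explicit, here is a minimal sketch (the period value and the lag choice are illustrative, not taken from the answer above):
as_tsibble(fpp2::ausbeer) %>%
  filter(year(index) >= 1992) %>%
  # y picks the column to plot; period = "1 year" matches the default
  # seasonal window for quarterly data; lags = 1:4 is just an illustrative choice
  gg_lag(y = value, period = "1 year", lags = 1:4)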

Related

Why does the forecast() function from the fabletools package (R) back-transform log(.) but not box_cox(., lambda)?

It was my impression that the forecast() function from the R package fabletools automatically back-transforms forecasts: "If the response variable has been transformed in the model formula, the transformation will be automatically back-transformed". It does so for the log transform, but not for box_cox. Am I missing something obvious?
library(fpp3)
#> ── Attaching packages ──────────────────────────────────────────── fpp3 0.4.0 ──
#> ✓ tibble 3.1.6 ✓ tsibble 1.1.0
#> ✓ dplyr 1.0.7 ✓ tsibbledata 0.3.0
#> ✓ tidyr 1.1.4 ✓ feasts 0.2.2
#> ✓ lubridate 1.8.0 ✓ fable 0.3.1
#> ✓ ggplot2 3.3.5
#> ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
#> x lubridate::date() masks base::date()
#> x dplyr::filter() masks stats::filter()
#> x tsibble::intersect() masks base::intersect()
#> x tsibble::interval() masks lubridate::interval()
#> x dplyr::lag() masks stats::lag()
#> x tsibble::setdiff() masks base::setdiff()
#> x tsibble::union() masks base::union()
library(dplyr)
lambda <- us_employment %>%
  features(Employed, features = guerrero) %>%
  filter(!is.na(lambda_guerrero)) %>%
  head(n = 2)
#> Warning: 126 errors (1 unique) encountered for feature 1
#> [126] missing value where TRUE/FALSE needed
with_lambda <- left_join(lambda, us_employment) %>%
  as_tsibble(key = Series_ID, index = Month)
#> Joining, by = "Series_ID"
box_fit <- with_lambda %>%
  model(box_model = ARIMA(box_cox(Employed, lambda_guerrero)))
box_fcast <- box_fit %>%
  forecast(h = "3 years")
log_fit <- with_lambda %>%
  model(log_model = ARIMA(log(Employed)))
log_fcast <- log_fit %>%
  forecast(h = "3 years")
autoplot(filter(us_employment, Series_ID == "CEU0500000001")) +
  autolayer(filter(box_fcast, Series_ID == "CEU0500000001")) +
  autolayer(filter(log_fcast, Series_ID == "CEU0500000001"))
#> Plot variable not specified, automatically selected `.vars = Employed`
Created on 2022-01-03 by the reprex package (v2.0.1)
Found the solution here: https://github.com/tidyverts/fabletools/issues/103 (hope this helps someone else). The crux of the issue is that you need to supply the value of lambda for the forecast period.
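A minimal sketch of one way to do this, assuming a single series for simplicity (the Series_ID is the one plotted above): compute lambda as a plain number rather than a column of the data, so the same value is available when forecast() back-transforms.
library(fpp3)

one_series <- us_employment %>%
  filter(Series_ID == "CEU0500000001")

# lambda is now a single number, not a column of the data
lambda <- one_series %>%
  features(Employed, features = guerrero) %>%
  pull(lambda_guerrero)

one_series %>%
  model(box_model = ARIMA(box_cox(Employed, lambda))) %>%
  forecast(h = "3 years") %>%
  autoplot(one_series)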

Extracting a list of links from a webpage by using its class

I am trying to extract from this website a list of four links that are clearly named as:
PNADC_012018_20190729.zip
PNADC_022018_20190729.zip
PNADC_032018_20190729.zip
PNADC_042018_20190729.zip
I've seen that they are all part of a class called 'jstree-wholerow'. I'm not very experienced with scraping, but I've tried to capture these links based on that:
x <- rvest::read_html('https://www.ibge.gov.br/estatisticas/downloads-estatisticas.html?caminho=Trabalho_e_Rendimento/Pesquisa_Nacional_por_Amostra_de_Domicilios_continua/Trimestral/Microdados/2018') %>%
  rvest::html_nodes("jstree-wholerow") %>%
  rvest::html_text()
However, I get an empty character vector as output.
Can someone help fix this?
Although the webpage uses JavaScript, the files themselves are stored on an FTP server with very predictable directory names.
library(tidyverse)
library(stringr)
library(rvest)
#>
#> Attaching package: 'rvest'
#> The following object is masked from 'package:readr':
#>
#> guess_encoding
library(RCurl)
#>
#> Attaching package: 'RCurl'
#> The following object is masked from 'package:tidyr':
#>
#> complete
link <- 'https://ftp.ibge.gov.br/Trabalho_e_Rendimento/Pesquisa_Nacional_por_Amostra_de_Domicilios_continua/Trimestral/Microdados/2018/PNADC_042018_20190729.zip'
zip_names <- c('PNADC_012018_20190729.zip', 'PNADC_022018_20190729.zip', 'PNADC_032018_20190729.zip', 'PNADC_042018_20190729.zip')
links <- str_replace(link, '/2018.*\\.zip$', str_c('/2018/', zip_names))
links
#> [1] "https://ftp.ibge.gov.br/Trabalho_e_Rendimento/Pesquisa_Nacional_por_Amostra_de_Domicilios_continua/Trimestral/Microdados/2018/PNADC_012018_20190729.zip"
#> [2] "https://ftp.ibge.gov.br/Trabalho_e_Rendimento/Pesquisa_Nacional_por_Amostra_de_Domicilios_continua/Trimestral/Microdados/2018/PNADC_022018_20190729.zip"
#> [3] "https://ftp.ibge.gov.br/Trabalho_e_Rendimento/Pesquisa_Nacional_por_Amostra_de_Domicilios_continua/Trimestral/Microdados/2018/PNADC_032018_20190729.zip"
#> [4] "https://ftp.ibge.gov.br/Trabalho_e_Rendimento/Pesquisa_Nacional_por_Amostra_de_Domicilios_continua/Trimestral/Microdados/2018/PNADC_042018_20190729.zip"
# option 2
links <- RCurl::getURL(url = 'https://ftp.ibge.gov.br/Trabalho_e_Rendimento/Pesquisa_Nacional_por_Amostra_de_Domicilios_continua/Trimestral/Microdados/2018/') %>%
  read_html() %>%
  html_nodes(xpath = '//td/a[@href]') %>%
  html_attr('href')
links <- links[-1]  # drop the first entry (not one of the zip files)
links
#> [1] "PNADC_012018_20190729.zip" "PNADC_022018_20190729.zip"
#> [3] "PNADC_032018_20190729.zip" "PNADC_042018_20190729.zip"
str_c('https://ftp.ibge.gov.br/Trabalho_e_Rendimento/Pesquisa_Nacional_por_Amostra_de_Domicilios_continua/Trimestral/Microdados/2018/', links)
#> [1] "https://ftp.ibge.gov.br/Trabalho_e_Rendimento/Pesquisa_Nacional_por_Amostra_de_Domicilios_continua/Trimestral/Microdados/2018/PNADC_012018_20190729.zip"
#> [2] "https://ftp.ibge.gov.br/Trabalho_e_Rendimento/Pesquisa_Nacional_por_Amostra_de_Domicilios_continua/Trimestral/Microdados/2018/PNADC_022018_20190729.zip"
#> [3] "https://ftp.ibge.gov.br/Trabalho_e_Rendimento/Pesquisa_Nacional_por_Amostra_de_Domicilios_continua/Trimestral/Microdados/2018/PNADC_032018_20190729.zip"
#> [4] "https://ftp.ibge.gov.br/Trabalho_e_Rendimento/Pesquisa_Nacional_por_Amostra_de_Domicilios_continua/Trimestral/Microdados/2018/PNADC_042018_20190729.zip"
Created on 2021-06-11 by the reprex package (v2.0.0)
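As a small follow-on sketch (not part of the original answer), once you have the full URLs you could download the files directly:
# file names are taken from the URLs themselves; mode = "wb" keeps the zips binary
for (url in links) {
  download.file(url, destfile = basename(url), mode = "wb")
}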

Wrap function passing NULL to lower-level haven::read_dta function in R

I am trying to build a function that wraps haven::read_dta(), similar to the wrap_function() defined in the code below.
My wrap_function() has a default variables = NULL, which should pass NULL to haven::read_dta()'s col_select argument if no values are specified. However, passing the NULL from variables to col_select throws an error ('Error: Can't find any columns matching col_select in data.').
Can someone help me understand why this happens, and how I could build a wrap_function() capable of passing a NULL default value to the lower-level function?
Thanks!
library(reprex)
library(haven)
df_ <- data.frame(a = 1:5,
                  b = letters[1:5])
haven::write_dta(df_,
                 path = "file.dta")
# works well:
haven::read_dta(file = "file.dta",
                col_select = NULL)
#> # A tibble: 5 x 2
#> a b
#> <dbl> <chr>
#> 1 1 a
#> 2 2 b
#> 3 3 c
#> 4 4 d
#> 5 5 e
# does not work:
wrap_function <- function(file, variables = NULL){
  haven::read_dta(file = file,
                  col_select = variables)
}
wrap_function("file.dta")
#> Note: Using an external vector in selections is ambiguous.
#> ℹ Use `all_of(variables)` instead of `variables` to silence this message.
#> ℹ See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
#> This message is displayed once per session.
#> Error: Can't find any columns matching `col_select` in data.
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.0.3 (2020-10-10)
#> os CentOS Linux 8
#> system x86_64, linux-gnu
#> ui X11
#> language (EN)
#> collate en_US.UTF-8
#> ctype en_US.UTF-8
#> date 2021-05-14
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date lib source
#> cli 2.4.0 2021-04-05 [1] CRAN (R 4.0.3)
#> crayon 1.4.1.9000 2021-04-16 [1] Github (r-lib/crayon#965d1dc)
#> digest 0.6.27 2020-10-24 [1] CRAN (R 4.0.3)
#> ellipsis 0.3.1 2020-05-15 [1] CRAN (R 4.0.2)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 4.0.2)
#> fansi 0.4.2 2021-01-15 [1] CRAN (R 4.0.3)
#> forcats 0.5.1 2021-01-27 [1] CRAN (R 4.0.3)
#> fs 1.5.0 2020-07-31 [1] CRAN (R 4.0.2)
#> glue 1.4.2 2020-08-27 [1] CRAN (R 4.0.2)
#> haven * 2.3.1 2020-06-01 [1] CRAN (R 4.0.2)
TLDR: You just need to embrace the argument by wrapping it in double curly braces {{ }}, often called "curly-curly". This passes the variable properly. See the programming with dplyr vignette for more info.
wrap_function <- function(file, variables = NULL){
  haven::read_dta(file = file,
                  col_select = {{ variables }})
}
wrap_function("file.dta")
#> # A tibble: 5 x 2
#> a b
#> <dbl> <chr>
#> 1 1 a
#> 2 2 b
#> 3 3 c
#> 4 4 d
#> 5 5 e
Unfortunately it's a little hard to see that this is necessary without looking at the source. If you look at the haven repository, you can see that read_dta uses the double-curly around col_select as well, which is a pretty good indication that you need it in your wrapper function.
Looking further, read_dta uses it to pass the argument to a function skip_cols, which in turn uses it inside tidyselect::vars_select. The reason this is needed is to delay evaluation of the argument until the point where it is actually used. In other words, it lets you call the function like this:
wrap_function("file.dta", variables = a)
instead of forcing you to do something like
wrap_function("file.dta", variables = "a")
and saves you a lot of typed quotes, especially with many columns. You see this pattern a lot in dplyr and other tidyverse functions, especially whenever an argument refers to a data frame column rather than an ordinary variable.
In other words, you don't want the code to check exactly what a is until it reaches skip_cols, which knows that a refers to a column inside the file you're reading. If you don't use the curly braces, R will think that a is some object in your working environment.
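If you would rather pass the column names as character strings, a hedged alternative sketch (the wrapper name wrap_function_chr is just illustrative) is to wrap the character vector in tidyselect::all_of():
wrap_function_chr <- function(file, variables = NULL){
  if (is.null(variables)) {
    # no selection supplied: read every column
    haven::read_dta(file = file)
  } else {
    # variables is a character vector of column names
    haven::read_dta(file = file,
                    col_select = tidyselect::all_of(variables))
  }
}

wrap_function_chr("file.dta", variables = "a")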

Polynomial Function Expansion in R

I am currently reviewing this question on SO, where the OP stated that by adding more for loops you can expand the polynomials. How exactly would you do so? I am trying to expand to polyorder 5.
Polynomial feature expansion in R
Here is the code:
polyexp = function(df){
  df.polyexp = df
  colnames = colnames(df)
  for (i in 1:ncol(df)){
    for (j in i:ncol(df)){
      colnames = c(colnames, paste0(names(df)[i], '.', names(df)[j]))
      df.polyexp = cbind(df.polyexp, df[, i] * df[, j])
    }
  }
  names(df.polyexp) = colnames
  return(df.polyexp)
}
Ultimately, I'd like to order the matrix so that it expands in order of degree. I tried using the poly function but I'm not sure if you can order the result so that it returns a matrix that starts with degree 1 then moves to degree 2, then 3, 4, and 5.
To "sort by degree" is a little ambiguous. x^2 and x*y both have degree 2. I'll assume you want to sort by total degree, and then within each of those, by degree of the 1st column; within that, by degree of the second column, etc. (I believe the default is to ignore total degree and sort by degree of the last column, within that the second last, and so on, but this is not documented so I won't count on it.)
Here's how to use polym to do this. The columns are named things like "2.0" or "1.1". You could sort these alphabetically and it would be fine up to degree 9, but if you convert those names using as.numeric_version, there's no limit. So convert the column names to version names, get the sort order, and use that plus degree to re-order the columns of the result. For example,
df <- data.frame(x = 1:6, y = 0:5, z = -(1:6))
expanded <- polym(as.matrix(df), degree = 5)
o <- order(attr(expanded, "degree"),
           as.numeric_version(colnames(expanded)))
sorted <- expanded[, o]
# That lost the attributes, so put them back
attr(sorted, "degree") <- attr(expanded, "degree")[o]
attr(sorted, "coefs") <- attr(expanded, "coefs")
class(sorted) <- class(expanded)
# If you call predict(), it comes out in the default order,
# so it will need sorting too:
predict(sorted, newdata = as.matrix(df[1, ]))[, o]
#> 0.0.1 0.1.0 1.0.0 0.0.2 0.1.1 0.2.0
#> 0.59761430 -0.59761430 -0.59761430 0.54554473 -0.35714286 0.54554473
#> 1.0.1 1.1.0 2.0.0 0.0.3 0.1.2 0.2.1
#> -0.35714286 0.35714286 0.54554473 0.37267800 -0.32602533 0.32602533
#> 0.3.0 1.0.2 1.1.1 1.2.0 2.0.1 2.1.0
#> -0.37267800 -0.32602533 0.21343368 -0.32602533 0.32602533 -0.32602533
#> 3.0.0 0.0.4 0.1.3 0.2.2 0.3.1 0.4.0
#> -0.37267800 0.18898224 -0.22271770 0.29761905 -0.22271770 0.18898224
#> 1.0.3 1.1.2 1.2.1 1.3.0 2.0.2 2.1.1
#> -0.22271770 0.19483740 -0.19483740 0.22271770 0.29761905 -0.19483740
#> 2.2.0 3.0.1 3.1.0 4.0.0 0.0.5 0.1.4
#> 0.29761905 -0.22271770 0.22271770 0.18898224 0.06299408 -0.11293849
#> 0.2.3 0.3.2 0.4.1 0.5.0 1.0.4 1.1.3
#> 0.20331252 -0.20331252 0.11293849 -0.06299408 -0.11293849 0.13309928
#> 1.2.2 1.3.1 1.4.0 2.0.3 2.1.2 2.2.1
#> -0.17786140 0.13309928 -0.11293849 0.20331252 -0.17786140 0.17786140
#> 2.3.0 3.0.2 3.1.1 3.2.0 4.0.1 4.1.0
#> -0.20331252 -0.20331252 0.13309928 -0.20331252 0.11293849 -0.11293849
#> 5.0.0
#> -0.06299408
Created on 2020-03-21 by the reprex package (v0.3.0)

Crayon's string concatenation (%+%) gives NULL

I am using
R version 3.4.4 (2018-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Linux Mint 18.3
and tidyverse_1.2.1. Using the %+% operator provided by the crayon package (loaded by tidyverse) gives NULL. Why? Is this a bug?
E.g. reproducing the example from the manual gives:
> "foo" %+% "bar" %>% print
NULL
instead of "foobar".
ggplot2 has its own version of %+%, which can mask the one from crayon. If I make sure that I load ggplot2/tidyverse first, before loading crayon, I get the expected results:
> library(tidyverse)
-- Attaching packages ---------------------- tidyverse 1.2.1 --
v ggplot2 3.1.0 v purrr 0.2.5
v tibble 1.4.2 v dplyr 0.7.8
v tidyr 0.8.2 v stringr 1.3.1
v readr 1.2.1 v forcats 0.3.0
-- Conflicts ------------------------- tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag() masks stats::lag()
> library(crayon)
Attaching package: ‘crayon’
The following object is masked from ‘package:ggplot2’:
%+%
> "foo" %+% "bar" %>% print
[1] "foobar"
This is indeed just because both ggplot2 and crayon define a %+% function! Which one gets used then depends on the order in which the packages are loaded, making your code fragile.
To avoid any conflict, you can give these operators an alias, for example (see the linked Stack Overflow post):
library(tidyverse)
`%+c%` <- crayon::`%+%`
"foo" %+% "bar" %>% print
#> NULL
"foo" %+c% "bar" %>% print
#> [1] "foobar"
Created on 2021-08-13 by the reprex package (v0.3.0)
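Another option, shown here as a small sketch, is to call crayon's operator through its namespace, which does not depend on package load order:
crayon::`%+%`("foo", "bar")
# should return "foobar" regardless of which packages are attached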
