Unable to execute STL decomposition properly - r

From this time series:
australia_data <- tourism %>%
select(Quarter, Trips) %>%
summarise(TotalTrips = sum(Trips))
> head(australia_data)
# A tsibble: 6 x 4 [1D]
# Key: Region, Purpose [1]
# Groups: Region [1]
Region Purpose Quarter TotalTrips
<chr> <chr> <date> <dbl>
1 Adelaide Business 1998-01-01 135.
2 Adelaide Business 1998-04-01 110.
3 Adelaide Business 1998-07-01 166.
4 Adelaide Business 1998-10-01 127.
5 Adelaide Business 1999-01-01 137.
6 Adelaide Business 1999-04-01 200.
I want to do an STL decomposition in order to get seasonally adjusted data:
australia_data_dcmp <- australia_data %>%
model(STL(TotalTrips))
but I'm unable to get the components:
> components(australia_data_dcmp)
Error: Problem with `mutate()` column `cmp`.
i `cmp = map(.fit, components)`.
x no applicable method for 'components' applied to an object of class "null_mdl"
> head(augment(australia_data_dcmp))
# A tsibble: 6 x 8 [1D]
# Key: Region, Purpose, .model [1]
Region Purpose .model Quarter TotalTrips .fitted .resid .innov
<chr> <chr> <chr> <date> <dbl> <dbl> <dbl> <dbl>
1 Adelaide Business STL(TotalTrips) 1998-01-01 135. NA NA NA
2 Adelaide Business STL(TotalTrips) 1998-04-01 110. NA NA NA
3 Adelaide Business STL(TotalTrips) 1998-07-01 166. NA NA NA
4 Adelaide Business STL(TotalTrips) 1998-10-01 127. NA NA NA
5 Adelaide Business STL(TotalTrips) 1999-01-01 137. NA NA NA
6 Adelaide Business STL(TotalTrips) 1999-04-01 200. NA NA NA
Can someone explain the mistake I'm committing, please?
Best regards

The tourism object you show is not what you get when using the latest versions of the various packages loaded by fpp3. This is what I get.
library(fpp3)
#> ── Attaching packages ──────────────────────────────────────────── fpp3 0.4.0 ──
#> ✓ tibble 3.1.5 ✓ tsibble 1.1.0
#> ✓ dplyr 1.0.7 ✓ tsibbledata 0.3.0
#> ✓ tidyr 1.1.4 ✓ feasts 0.2.2
#> ✓ lubridate 1.8.0 ✓ fable 0.3.1
#> ✓ ggplot2 3.3.5
#> ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
#> x lubridate::date() masks base::date()
#> x dplyr::filter() masks stats::filter()
#> x tsibble::intersect() masks base::intersect()
#> x tsibble::interval() masks lubridate::interval()
#> x dplyr::lag() masks stats::lag()
#> x tsibble::setdiff() masks base::setdiff()
#> x tsibble::union() masks base::union()
australia_data <- tourism %>%
select(Quarter, Trips) %>%
summarise(TotalTrips = sum(Trips))
australia_data
#> # A tsibble: 80 x 2 [1Q]
#> Quarter TotalTrips
#> <qtr> <dbl>
#> 1 1998 Q1 23182.
#> 2 1998 Q2 20323.
#> 3 1998 Q3 19827.
#> 4 1998 Q4 20830.
#> 5 1999 Q1 22087.
#> 6 1999 Q2 21458.
#> 7 1999 Q3 19914.
#> 8 1999 Q4 20028.
#> 9 2000 Q1 22339.
#> 10 2000 Q2 19941.
#> # … with 70 more rows
Created on 2021-11-01 by the reprex package (v2.0.1)
Perhaps you are overwriting the tourism object with a grouped version. Or perhaps you are using an old version of the tsibble package, where keys were not dropped by summarise().
In any case, without a reproducible example it is hard to provide more substantial help.
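For reference, here is a minimal sketch of the intended workflow on an unmodified tourism object (the component column names, such as season_adjust, are those produced by feasts' STL):

```r
library(fpp3)

# Aggregate every series to a single quarterly total;
# with current tsibble versions, summarise() drops the keys
australia_data <- tourism %>%
  summarise(TotalTrips = sum(Trips))

# Fit the STL decomposition and extract its components
australia_data_dcmp <- australia_data %>%
  model(STL(TotalTrips))

components(australia_data_dcmp) %>%
  select(Quarter, TotalTrips, trend, season_adjust)
```

The season_adjust column is the seasonally adjusted series the question asks for.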


How to forecast with lagged external regressors using fable::VAR

I'd like to use lagged external regressors in my VAR forecast. Using the VAR() function from the fable package, I am able to fit a model, but I can't use it to forecast, as it returns NAs for the dependent variables. My reprex follows examples from Forecasting: Principles and Practice v3.
Thanks in advance for any guidance.
require(fpp3)
#> Loading required package: fpp3
#> ── Attaching packages ──────────────────────────────────────────── fpp3 0.4.0 ──
#> ✔ tibble 3.1.7 ✔ tsibble 1.0.1
#> ✔ dplyr 1.0.9 ✔ tsibbledata 0.3.0
#> ✔ tidyr 1.1.3 ✔ feasts 0.2.2
#> ✔ lubridate 1.7.10 ✔ fable 0.3.1
#> ✔ ggplot2 3.3.5
#> ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
#> ✖ lubridate::date() masks base::date()
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ tsibble::intersect() masks base::intersect()
#> ✖ tsibble::interval() masks lubridate::interval()
#> ✖ dplyr::lag() masks stats::lag()
#> ✖ tsibble::setdiff() masks base::setdiff()
#> ✖ tsibble::union() masks base::union()
us_change <- fpp3::us_change
fit <- us_change %>%
model(
xregs_lag1 = VAR(vars(Consumption, Income) ~ xreg(Unemployment, lag(Unemployment, 1)))
)
fit
#> # A mable: 1 x 1
#> xregs_lag1
#> <model>
#> 1 <VAR(5) w/ mean>
new_data_ex <- new_data(us_change, 4) %>%
mutate(Unemployment = mean(us_change$Unemployment))
#############
# Here I tried creating a new_data frame that included one lag of Unemployment, and pass that to the new_data argument of forecast, but it doesn't work either
#
# new_data_ex_lags <- us_change %>%
# tail(1) %>%
# bind_rows(new_data_ex) %>%
# select(colnames(new_data_ex))
#############
fit %>%
select(xregs_lag1) %>%
forecast(new_data = new_data_ex)
#> # A fable: 4 x 6 [1Q]
#> # Key: .model [1]
#> .model Quarter .distribution .mean_Consumption .mean_Income Unemployment
#> <chr> <qtr> <dist> <dbl> <dbl> <dbl>
#> 1 xregs_lag1 2019 Q3 MVN[2] NA NA 0.00101
#> 2 xregs_lag1 2019 Q4 MVN[2] NA NA 0.00101
#> 3 xregs_lag1 2020 Q1 MVN[2] NA NA 0.00101
#> 4 xregs_lag1 2020 Q2 MVN[2] NA NA 0.00101
fit %>%
select(xregs_lag1) %>%
report()
#> Series: Consumption, Income
#> Model: VAR(5) w/ mean
#>
#> Coefficients for Consumption:
#> lag(Consumption,1) lag(Income,1) lag(Consumption,2) lag(Income,2)
#> 0.1156 0.1062 0.1479 0.0079
#> s.e. 0.0772 0.0483 0.0753 0.0509
#> lag(Consumption,3) lag(Income,3) lag(Consumption,4) lag(Income,4)
#> 0.2248 -0.0207 -0.0729 -0.0544
#> s.e. 0.0730 0.0499 0.0746 0.0500
#> lag(Consumption,5) lag(Income,5) constant Unemployment
#> -0.0217 0.0327 0.3923 -0.8602
#> s.e. 0.0708 0.0491 0.0923 0.1331
#> lag(Unemployment, 1)
#> 0.4563
#> s.e. 0.1402
#>
#> Coefficients for Income:
#> lag(Consumption,1) lag(Income,1) lag(Consumption,2) lag(Income,2)
#> 0.3715 -0.2991 0.0836 -0.0410
#> s.e. 0.1212 0.0758 0.1182 0.0799
#> lag(Consumption,3) lag(Income,3) lag(Consumption,4) lag(Income,4)
#> 0.4531 -0.1445 0.2481 -0.2475
#> s.e. 0.1145 0.0783 0.1170 0.0785
#> lag(Consumption,5) lag(Income,5) constant Unemployment
#> -0.1270 -0.1878 0.6142 -0.1100
#> s.e. 0.1111 0.0771 0.1449 0.2089
#> lag(Unemployment, 1)
#> -0.0401
#> s.e. 0.2201
#>
#> Residual covariance matrix:
#> Consumption Income
#> Consumption 0.2602 0.1341
#> Income 0.1341 0.6410
#>
#> log likelihood = -350.43
#> AIC = 760.86 AICc = 772.34 BIC = 858.74
Created on 2022-07-22 by the reprex package (v2.0.0)
Using lag() with VAR() models was not fully implemented, but I have added support for this in the development version of the fable package (https://github.com/tidyverts/fable/commit/bb15c9462b80850565aee13d8f9b33e49dfd0f33).
There are some other changes not yet pushed to CRAN such as how forecast means are represented in the fable, but the code is otherwise the same.
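Until that commit reaches CRAN, the development version can be installed from GitHub with a standard remotes call (assuming the remotes package is available):

```r
# install the development version of fable, which includes the lag() fix
remotes::install_github("tidyverts/fable")
```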
require(fpp3)
#> Loading required package: fpp3
#> ── Attaching packages ──────────────────────────────────────────── fpp3 0.4.0 ──
#> ✔ tibble 3.1.7 ✔ tsibble 1.1.1
#> ✔ dplyr 1.0.9 ✔ tsibbledata 0.4.0
#> ✔ tidyr 1.2.0 ✔ feasts 0.2.2
#> ✔ lubridate 1.8.0 ✔ fable 0.3.1.9000
#> ✔ ggplot2 3.3.6
#> ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
#> ✖ lubridate::date() masks base::date()
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ tsibble::intersect() masks base::intersect()
#> ✖ tsibble::interval() masks lubridate::interval()
#> ✖ dplyr::lag() masks stats::lag()
#> ✖ tsibble::setdiff() masks base::setdiff()
#> ✖ tsibble::union() masks base::union()
us_change <- fpp3::us_change
fit <- us_change %>%
model(
xregs_lag1 = VAR(vars(Consumption, Income) ~ xreg(Unemployment, lag(Unemployment, 1)))
)
fit
#> # A mable: 1 x 1
#> xregs_lag1
#> <model>
#> 1 <VAR(5) w/ mean>
new_data_ex <- new_data(us_change, 4) %>%
mutate(Unemployment = mean(us_change$Unemployment))
#############
# Here I tried creating a new_data frame that included one lag of Unemployment, and pass that to the new_data argument of forecast, but it doesn't work either
#
# new_data_ex_lags <- us_change %>%
# tail(1) %>%
# bind_rows(new_data_ex) %>%
# select(colnames(new_data_ex))
#############
fit %>%
select(xregs_lag1) %>%
forecast(new_data = new_data_ex)
#> Warning in if (is_transformed) {: the condition has length > 1 and only the
#> first element will be used
#> # A fable: 4 x 5 [1Q]
#> # Key: .model [1]
#> .model Quarter .distribution .mean[,"Consumption… [,"Income"] Unemployment
#> <chr> <qtr> <dist> <dbl> <dbl> <dbl>
#> 1 xregs_lag1 2019 Q3 MVN[2] 0.548 0.657 0.00101
#> 2 xregs_lag1 2019 Q4 MVN[2] 0.679 0.316 0.00101
#> 3 xregs_lag1 2020 Q1 MVN[2] 0.763 0.832 0.00101
#> 4 xregs_lag1 2020 Q2 MVN[2] 0.697 0.733 0.00101
fit %>%
select(xregs_lag1) %>%
report()
#> Series: Consumption, Income
#> Model: VAR(5) w/ mean
#>
#> Coefficients for Consumption:
#> lag(Consumption,1) lag(Income,1) lag(Consumption,2) lag(Income,2)
#> 0.1156 0.1062 0.1479 0.0079
#> s.e. 0.0772 0.0483 0.0753 0.0509
#> lag(Consumption,3) lag(Income,3) lag(Consumption,4) lag(Income,4)
#> 0.2248 -0.0207 -0.0729 -0.0544
#> s.e. 0.0730 0.0499 0.0746 0.0500
#> lag(Consumption,5) lag(Income,5) constant Unemployment
#> -0.0217 0.0327 0.3923 -0.8602
#> s.e. 0.0708 0.0491 0.0923 0.1331
#> lag(Unemployment, 1)
#> 0.4563
#> s.e. 0.1402
#>
#> Coefficients for Income:
#> lag(Consumption,1) lag(Income,1) lag(Consumption,2) lag(Income,2)
#> 0.3715 -0.2991 0.0836 -0.0410
#> s.e. 0.1212 0.0758 0.1182 0.0799
#> lag(Consumption,3) lag(Income,3) lag(Consumption,4) lag(Income,4)
#> 0.4531 -0.1445 0.2481 -0.2475
#> s.e. 0.1145 0.0783 0.1170 0.0785
#> lag(Consumption,5) lag(Income,5) constant Unemployment
#> -0.1270 -0.1878 0.6142 -0.1100
#> s.e. 0.1111 0.0771 0.1449 0.2089
#> lag(Unemployment, 1)
#> -0.0401
#> s.e. 0.2201
#>
#> Residual covariance matrix:
#> Consumption Income
#> Consumption 0.2602 0.1341
#> Income 0.1341 0.6410
#>
#> log likelihood = -350.43
#> AIC = 760.86 AICc = 772.34 BIC = 858.74
Created on 2022-07-23 by the reprex package (v2.0.1)

read file from google drive

I have a spreadsheet uploaded as a CSV file in Google Drive, unlocked so users can read from it.
This is the link to the csv file:
https://docs.google.com/spreadsheets/d/170235QwbmgQvr0GWmT-8yBsC7Vk6p_dmvYxrZNfsKqk/edit?usp=sharing
I am trying to read it from R but I am getting a long list of error messages. I am using:
id = "170235QwbmgQvr0GWmT-8yBsC7Vk6p_dmvYxrZNfsKqk"
read.csv(sprintf("https://docs.google.com/spreadsheets/d/uc?id=%s&export=download", id))
Could someone suggest how to read files from google drive directly into R?
I would try to publish the sheet as a CSV file (doc), and then read it from there.
It seems like your file is already published as a CSV. So, this should work. (Note that the URL ends with /pub?output=csv)
read.csv("https://docs.google.com/spreadsheets/d/170235QwbmgQvr0GWmT-8yBsC7Vk6p_dmvYxrZNfsKqk/pub?output=csv")
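If you prefer to keep the file id in a variable, as in the question, the same published-CSV URL can be built with sprintf() (note the corrected function name and URL pattern):

```r
id <- "170235QwbmgQvr0GWmT-8yBsC7Vk6p_dmvYxrZNfsKqk"
url <- sprintf("https://docs.google.com/spreadsheets/d/%s/pub?output=csv", id)
dat <- read.csv(url)
```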
To read the CSV file faster, you can use vroom, which is even faster than fread(). See here.
Now using vroom,
library(vroom)
vroom("https://docs.google.com/spreadsheets/d/170235QwbmgQvr0GWmT-8yBsC7Vk6p_dmvYxrZNfsKqk/pub?output=csv")
#> Rows: 387048 Columns: 14
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (6): StationCode, SampleID, WeatherCode, OrganismCode, race, race2
#> dbl (7): WaterTemperature, Turbidity, Velocity, ForkLength, Weight, Count, ...
#> date (1): SampleDate
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> # A tibble: 387,048 × 14
#> StationCode SampleDate SampleID WeatherCode WaterTemperature Turbidity
#> <chr> <date> <chr> <chr> <dbl> <dbl>
#> 1 Gate 11 2000-04-25 116_00 CLD 13.1 2
#> 2 Gate 5 1995-04-26 117_95 CLR NA 2
#> 3 Gate 2 1995-04-21 111_95 W 10.4 12
#> 4 Gate 6 2008-12-13 348_08 CLR 49.9 1.82
#> 5 Gate 5 1999-12-10 344_99 CLR 7.30 1.5
#> 6 Gate 6 2012-05-25 146_12 CLR 55.5 1.60
#> 7 Gate 10 2011-06-28 179_11 RAN 57.3 3.99
#> 8 Gate 11 1996-04-25 116_96 CLR 13.8 21
#> 9 Gate 9 2007-07-02 183_07 CLR 56.6 2.09
#> 10 Gate 6 2009-06-04 155_09 CLR 58.6 3.08
#> # … with 387,038 more rows, and 8 more variables: Velocity <dbl>,
#> # OrganismCode <chr>, ForkLength <dbl>, Weight <dbl>, Count <dbl>,
#> # race <chr>, year <dbl>, race2 <chr>
Created on 2022-07-08 by the reprex package (v2.0.1)

How to merge columns with the same names from two different data frames and print the entries they have in common

I currently have this code:
install.packages(c("httr", "jsonlite", "tidyverse"))
library(httr)
library(jsonlite)
library(tidyverse)
res1<-GET("https://rss.applemarketingtools.com/api/v2/us/music/most-played/100/songs.json")
res1
rawToChar(res1$content)
data1 = fromJSON(rawToChar(res1$content))
us100<-data1$feed$results
res2 <- GET("https://rss.applemarketingtools.com/api/v2/gb/music/most-played/100/songs.json")
data2<-fromJSON(rawToChar(res2$content))
uk100<-data2$feed$results
I want to compare the two data frames and make a new one containing the artist names and song names that both data frames have in common. How do I do this?
I think you're just looking for an inner_join
us100 %>% inner_join(uk100, by = "id") %>% as_tibble()
#> # A tibble: 16 x 21
#> artistName.x id name.x releaseDate.x kind.x artistId.x artistUrl.x
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 Jack Harlow 1618~ First~ 2022-04-08 songs 1047679432 https://mu~
#> 2 Harry Styles 1615~ As It~ 2022-03-31 songs 471260289 https://mu~
#> 3 Lil Baby 1618~ In A ~ 2022-04-08 songs 1276656483 https://mu~
#> 4 Lauren Spencer-Smith 1618~ Flowe~ 2022-04-14 songs 1462708784 https://mu~
#> 5 Glass Animals 1508~ Heat ~ 2020-06-29 songs 528928008 https://mu~
#> 6 Carolina Gaitán - ~ 1594~ We Do~ 2021-11-19 songs 1227636438 https://mu~
#> 7 The Kid LAROI & Jus~ 1574~ STAY 2021-07-09 songs 1435848034 https://mu~
#> 8 Frank Ocean 1440~ Lost 2012-07-10 songs 442122051 https://mu~
#> 9 Elton John & Dua Li~ 1578~ Cold ~ 2021-08-13 songs 54657 https://mu~
#> 10 Tate McRae 1606~ she's~ 2022-02-04 songs 1446365464 https://mu~
#> 11 Adele 1590~ Easy ~ 2021-10-14 songs 262836961 https://mu~
#> 12 Lil Tjay 1613~ In My~ 2022-04-01 songs 1436446949 https://mu~
#> 13 Lizzo 1619~ About~ 2022-04-14 songs 472949623 https://mu~
#> 14 Tiësto & Ava Max 1590~ The M~ 2021-11-04 songs 4091218 https://mu~
#> 15 Ed Sheeran 1581~ Shive~ 2021-09-09 songs 183313439 https://mu~
#> 16 The Weeknd 1488~ Blind~ 2019-11-29 songs 479756766 https://mu~
#> # ... with 14 more variables: contentAdvisoryRating.x <chr>,
#> # artworkUrl100.x <chr>, genres.x <list>, url.x <chr>, artistName.y <chr>,
#> # name.y <chr>, releaseDate.y <chr>, kind.y <chr>, artistId.y <chr>,
#> # artistUrl.y <chr>, contentAdvisoryRating.y <chr>, artworkUrl100.y <chr>,
#> # genres.y <list>, url.y <chr>
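If you only want the shared artist and song names rather than all 21 columns, one option (a sketch; the .x suffix comes from the join above) is to select them after joining:

```r
us100 %>%
  inner_join(uk100, by = "id") %>%
  transmute(artistName = artistName.x, name = name.x) %>%
  as_tibble()
```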

split char currency into two separate columns in data frame

I have this data frame df
Items Item Code Prices
1 Beds 1630 $135.60
2 Big Shelve 1229 89.5USD
3 Small Shelve 1229 ¥3680.03
4 Chair 445 92.63€
5 Desk 802 206.43 euro
6 Lamp 832 25307.1 JPY
I want to split the Prices column into three columns: Prices, Currency, and Exchange rate from USD, like this:
Items Item Code Prices Currency Exchange rates
1 Beds 1630 135.60 USD 1.00
2 Big Shelve 1229 89.50 USD 1.00
3 Small Shelve 1229 3680.03 JPY 115.71
4 Chair 445 92.63 EUR 0.90
5 Desk 802 206.43 EUR 0.90
6 Lamp 832 25307.10 JPY 115.71
I tried using tidyr::separate(), but it would split at the comma instead.
If I try using gsub(), it gives me this error:
> df2 <- df %>%
+ mutate(price = as.numeric(gsub'[$,€,¥,]','', df$Col3))
Error: unexpected string constant in:
"df2 <- df %>%
mutate(price = as.numeric(gsub'[$,€,¥,]'"
Any ideas what to do? Also, how would I be able to match the currency to the correct items?
This should solve the problem. Using the quantmod package, you can get the current exchange rate and add that into the data:
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(stringr)
library(tidyr)
library(quantmod)
#> Loading required package: xts
#> Loading required package: zoo
#>
#> Attaching package: 'zoo'
#> The following objects are masked from 'package:base':
#>
#> as.Date, as.Date.numeric
#>
#> Attaching package: 'xts'
#> The following objects are masked from 'package:dplyr':
#>
#> first, last
#> Loading required package: TTR
#> Registered S3 method overwritten by 'quantmod':
#> method from
#> as.zoo.data.frame zoo
dat <- tibble::tribble(
~Items, ~"Item Code", ~Prices,
"Beds", 1630, "$135.60",
"Big Shelve", 1229, "89.5USD",
"Small Shelve", 1229, "¥3680.03",
"Chair", 445, "92.63€",
"Desk", 802, "206.43 euro",
"Lamp", 832, "25307.1 JPY")
dat <- dat %>%
mutate(currency = c(trimws(str_extract_all(Prices, "[^\\d\\.]+", simplify = TRUE))),
currency = case_when(currency %in% c("€", "euro") ~ "EUR",
currency == "$" ~ "USD",
currency == "¥" ~ "JPY",
TRUE ~ currency),
Prices = as.numeric(str_extract_all(Prices, "\\d+\\.\\d+", simplify=TRUE)),
xr = paste0("USD", currency, "=X")) %>%
left_join(getQuote(unique(.$xr)) %>% as_tibble(rownames = "xr") %>% select(xr, Last)) %>%
select(-xr) %>%
rename("Exchange rates" = "Last")
#> Joining, by = "xr"
dat
#> # A tibble: 6 × 5
#> Items `Item Code` Prices currency `Exchange rates`
#> <chr> <dbl> <dbl> <chr> <dbl>
#> 1 Beds 1630 136. USD 1
#> 2 Big Shelve 1229 89.5 USD 1
#> 3 Small Shelve 1229 3680. JPY 116.
#> 4 Chair 445 92.6 EUR 0.902
#> 5 Desk 802 206. EUR 0.902
#> 6 Lamp 832 25307. JPY 116.
Created on 2022-03-03 by the reprex package (v2.0.1)
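As an aside, the "unexpected string constant" error in the question comes from calling gsub without parentheses. A minimal corrected version, stripping every character that is not a digit or a decimal point, looks like this:

```r
prices <- c("$135.60", "89.5USD", "¥3680.03", "25307.1 JPY")
# keep only digits and the decimal point, then convert
as.numeric(gsub("[^0-9.]", "", prices))
# returns c(135.6, 89.5, 3680.03, 25307.1)
```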

step_pca() arguments are not being applied

I'm new to tidymodels, but apparently the step_pca() arguments such as num_comp or threshold are not being applied when the recipe is trained. As in the example below, I'm still getting 4 components despite setting num_comp = 2.
library(tidyverse)
library(tidymodels)
#> Registered S3 method overwritten by 'tune':
#> method from
#> required_pkgs.model_spec parsnip
rec <- recipe( ~ ., data = USArrests) %>%
step_normalize(all_numeric()) %>%
step_pca(all_numeric(), num_comp = 2)
prep(rec) %>% tidy(number = 2, type = "coef") %>%
pivot_wider(names_from = component, values_from = value, id_cols = terms)
#> # A tibble: 4 x 5
#> terms PC1 PC2 PC3 PC4
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 Murder -0.536 0.418 -0.341 0.649
#> 2 Assault -0.583 0.188 -0.268 -0.743
#> 3 UrbanPop -0.278 -0.873 -0.378 0.134
#> 4 Rape -0.543 -0.167 0.818 0.0890
The full PCA is determined (so you can still compute the variances of each term) and num_comp only specifies how many of the components are retained as predictors. If you want to specify the maximal rank, you can pass that through options:
library(recipes)
#> Loading required package: dplyr
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
#>
#> Attaching package: 'recipes'
#> The following object is masked from 'package:stats':
#>
#> step
rec <- recipe( ~ ., data = USArrests) %>%
step_normalize(all_numeric()) %>%
step_pca(all_numeric(), num_comp = 2, options = list(rank. = 2))
prep(rec) %>% tidy(number = 2, type = "coef")
#> # A tibble: 8 × 4
#> terms value component id
#> <chr> <dbl> <chr> <chr>
#> 1 Murder -0.536 PC1 pca_AoFOm
#> 2 Assault -0.583 PC1 pca_AoFOm
#> 3 UrbanPop -0.278 PC1 pca_AoFOm
#> 4 Rape -0.543 PC1 pca_AoFOm
#> 5 Murder 0.418 PC2 pca_AoFOm
#> 6 Assault 0.188 PC2 pca_AoFOm
#> 7 UrbanPop -0.873 PC2 pca_AoFOm
#> 8 Rape -0.167 PC2 pca_AoFOm
Created on 2022-01-12 by the reprex package (v2.0.1)
You could also control this via the tol argument from stats::prcomp(), also passed in as an option.
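For instance, a sketch passing tol through options (the 0.5 cutoff is illustrative; prcomp() omits components whose standard deviation falls below tol times that of the first component):

```r
library(recipes)

rec <- recipe(~ ., data = USArrests) %>%
  step_normalize(all_numeric()) %>%
  step_pca(all_numeric(), num_comp = 2, options = list(tol = 0.5))

# only components passing the tolerance are computed
prep(rec) %>% tidy(number = 2, type = "coef")
```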
If you bake the recipe, it seems to work as intended, but I don't know what you aim to achieve afterwards.
library(tidyverse)
library(tidymodels)
USArrests <- USArrests %>%
rownames_to_column("Countries")
rec <-
recipe( ~ ., data = USArrests) %>%
step_normalize(all_numeric()) %>%
step_pca(all_numeric(), num_comp = 2)
prep(rec) %>%
bake(new_data = NULL)
#> # A tibble: 50 x 3
#> Countries PC1 PC2
#> <fct> <dbl> <dbl>
#> 1 Alabama -0.976 1.12
#> 2 Alaska -1.93 1.06
#> 3 Arizona -1.75 -0.738
#> 4 Arkansas 0.140 1.11
#> 5 California -2.50 -1.53
#> 6 Colorado -1.50 -0.978
#> 7 Connecticut 1.34 -1.08
#> 8 Delaware -0.0472 -0.322
#> 9 Florida -2.98 0.0388
#> 10 Georgia -1.62 1.27
#> # ... with 40 more rows
Created on 2022-01-11 by the reprex package (v2.0.1)
