Regular expression in R: splitting a sentence at keywords

Hi,
I would like to split each sentence into two portions: everything from keyword_1 up to (but not including) keyword_2, and everything from keyword_2 to the end of the sentence, preferably using regular expressions.
The data set I made and my intended output are shown below.
Data set
library(tibble)
keyword_1 <- c("coffee", "apple", "rainbow", "strawberry shortcake")
keyword_2 <- c("life", "new york", "seven colours", "sweet and yummy")
raw <-
tibble(
sentence = c(
"coffee is keyword_1_1 life is keyword_2_1",
"apple is keyword_1_2 new york is keyword_2_2",
"rainbow is keyword_1_3 seven colours is keyword_2_3",
"strawberry shortcake is keyword_1_4 sweet and yummy is keyword 2_4"
))
raw
#> # A tibble: 4 x 1
#> sentence
#> <chr>
#> 1 coffee is keyword_1_1 life is keyword_2_1
#> 2 apple is keyword_1_2 new york is keyword_2_2
#> 3 rainbow is keyword_1_3 seven colours is keyword_2_3
#> 4 strawberry shortcake is keyword_1_4 sweet and yummy is keyword 2_4
Intended Output
library(tibble)
output = tibble(
output1 = c(
"coffee is keyword_1_1",
"apple is keyword_1_2",
"rainbow is keyword_1_3",
"strawberry shortcake is keyword_1_4"
),
output2 = c("life is keyword_2_1", "new york is keyword_2_2",
"seven colours is keyword_2_3", "sweet and yummy is keyword 2_4")
)
output
#> # A tibble: 4 x 2
#> output1 output2
#> <chr> <chr>
#> 1 coffee is keyword_1_1 life is keyword_2_1
#> 2 apple is keyword_1_2 new york is keyword_2_2
#> 3 rainbow is keyword_1_3 seven colours is keyword_2_3
#> 4 strawberry shortcake is keyword_1_4 sweet and yummy is keyword 2_4
Created on 2021-03-18 by the reprex package (v0.3.0)
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.0.2 (2020-06-22)
#> os macOS 10.16
#> system x86_64, darwin17.0
#> ui X11
#> language (EN)
#> collate en_AU.UTF-8
#> ctype en_AU.UTF-8
#> tz Australia/Melbourne
#> date 2021-03-18
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date lib source
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.0.2)
#> callr 3.5.1 2020-10-13 [1] CRAN (R 4.0.2)
#> cli 2.3.1 2021-02-23 [1] CRAN (R 4.0.2)
#> crayon 1.3.4 2017-09-16 [1] CRAN (R 4.0.2)
#> debugme 1.1.0 2017-10-22 [1] CRAN (R 4.0.2)
#> desc 1.2.0 2018-05-01 [1] CRAN (R 4.0.2)
#> devtools 2.3.2 2020-09-18 [1] CRAN (R 4.0.2)
#> digest 0.6.27 2020-10-24 [1] CRAN (R 4.0.2)
#> ellipsis 0.3.1 2020-05-15 [1] CRAN (R 4.0.2)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 4.0.1)
#> fansi 0.4.1 2020-01-08 [1] CRAN (R 4.0.2)
#> fs 1.5.0 2020-07-31 [1] CRAN (R 4.0.2)
#> glue 1.4.2 2020-08-27 [1] CRAN (R 4.0.2)
#> highr 0.8 2019-03-20 [1] CRAN (R 4.0.2)
#> htmltools 0.5.1.1 2021-01-22 [1] CRAN (R 4.0.2)
#> knitr 1.31 2021-01-27 [1] CRAN (R 4.0.2)
#> lifecycle 0.2.0 2020-03-06 [1] CRAN (R 4.0.2)
#> magrittr 2.0.1 2020-11-17 [1] CRAN (R 4.0.2)
#> memoise 1.1.0 2017-04-21 [1] CRAN (R 4.0.2)
#> pillar 1.5.0 2021-02-22 [1] CRAN (R 4.0.2)
#> pkgbuild 1.1.0 2020-07-13 [1] CRAN (R 4.0.2)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.0.2)
#> pkgload 1.1.0 2020-05-29 [1] CRAN (R 4.0.2)
#> prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.0.2)
#> processx 3.4.4 2020-09-03 [1] CRAN (R 4.0.2)
#> ps 1.4.0 2020-10-07 [1] CRAN (R 4.0.2)
#> R6 2.5.0 2020-10-28 [1] CRAN (R 4.0.2)
#> remotes 2.2.0 2020-07-21 [1] CRAN (R 4.0.2)
#> rlang 0.4.10 2020-12-30 [1] CRAN (R 4.0.2)
#> rmarkdown 2.5 2020-10-21 [1] CRAN (R 4.0.2)
#> rprojroot 2.0.2 2020-11-15 [1] CRAN (R 4.0.2)
#> rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.0.2)
#> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.0.2)
#> stringi 1.5.3 2020-09-09 [1] CRAN (R 4.0.2)
#> stringr 1.4.0 2019-02-10 [1] CRAN (R 4.0.2)
#> testthat 3.0.0 2020-10-31 [1] CRAN (R 4.0.2)
#> tibble * 3.1.0 2021-02-25 [1] CRAN (R 4.0.2)
#> usethis 1.6.3 2020-09-17 [1] CRAN (R 4.0.2)
#> utf8 1.1.4 2018-05-24 [1] CRAN (R 4.0.2)
#> vctrs 0.3.4 2020-08-29 [1] CRAN (R 4.0.2)
#> withr 2.3.0 2020-09-22 [1] CRAN (R 4.0.2)
#> xfun 0.19.3 2020-11-06 [1] Github (yihui/xfun#12e77f5)
#> yaml 2.2.1 2020-02-01 [1] CRAN (R 4.0.2)
#>
#> [1] /Library/Frameworks/R.framework/Versions/4.0/Resources/library

Assuming the pattern is always "keyword_number_number", the fourth entry is missing a "_" and should be:
raw[4,1] = "strawberry shortcake is keyword_1_4 sweet and yummy is keyword_2_4"
Then we can write:
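# two capture groups: lowercase words followed by keyword_<digit>_<digit>
pattern = "([a-z ]+ keyword_[0-9]_[0-9]) ([a-z ]+ keyword_[0-9]_[0-9])"
a = matrix(NA, nrow(raw), 2)
for(i in 1:nrow(raw)){
for(j in 1:2){
# keep only capture group j of sentence i
a[i,j] = gsub(pattern, paste0("\\",j), raw$sentence[i])}}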
Output:
> a
[,1] [,2]
[1,] "coffee is keyword_1_1" "life is keyword_2_1"
[2,] "apple is keyword_1_2" "new york is keyword_2_2"
[3,] "rainbow is keyword_1_3" "seven colours is keyword_2_3"
[4,] "strawberry shortcake is keyword_1_4" "sweet and yummy is keyword_2_4"
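The same capture-group idea can be sketched without the loop, since sub() is vectorised over its input. This is a minimal sketch under the same assumption as above (every sentence follows the "keyword_number_number" pattern); the two sample sentences are copied from the question, and the anchors ^ and $ are my addition.

```r
# Two sample sentences from the question
sentence <- c(
  "coffee is keyword_1_1 life is keyword_2_1",
  "apple is keyword_1_2 new york is keyword_2_2"
)

# Same two capture groups as above, anchored to the whole string
pattern <- "^([a-z ]+ keyword_[0-9]_[0-9]) ([a-z ]+ keyword_[0-9]_[0-9])$"

# sub() works on the whole vector at once, one call per capture group
out <- data.frame(
  output1 = sub(pattern, "\\1", sentence),
  output2 = sub(pattern, "\\2", sentence),
  stringsAsFactors = FALSE
)
out
```

A sentence that does not match the pattern is returned unchanged by sub(), so it is worth checking for rows where output1 and output2 are identical.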

Here is a data.table approach, using a look-behind regex pattern for splitting:
library( data.table )
setDT(raw)[, paste0( "output", 1:2 ) :=
lapply( tstrsplit(sentence, "(?<=_[0-9]{1}_[0-9]{1})", perl = TRUE ),
trimws ) ][, sentence := NULL][]
# output1 output2
# 1: coffee is keyword_1_1 life is keyword_2_1
# 2: apple is keyword_1_2 new york is keyword_2_2
# 3: rainbow is keyword_1_3 seven colours is keyword_2_3
# 4: strawberry shortcake is keyword_1_4 sweet and yummy is keyword 2_4
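Since the question already provides the keyword_2 vector, another option is to split at the position where each row's own keyword starts. The sketch below uses base R only and a literal (fixed) match rather than a regular expression, so it does not depend on the "keyword_number_number" suffix being well formed. It assumes that every sentence contains its keyword_2 and that the keyword does not also occur earlier in the sentence.

```r
keyword_2 <- c("life", "new york", "seven colours", "sweet and yummy")
sentence <- c(
  "coffee is keyword_1_1 life is keyword_2_1",
  "apple is keyword_1_2 new york is keyword_2_2",
  "rainbow is keyword_1_3 seven colours is keyword_2_3",
  "strawberry shortcake is keyword_1_4 sweet and yummy is keyword 2_4"
)

# Start position of each row's own keyword_2 (first occurrence, literal match)
pos <- mapply(
  function(k, s) as.integer(regexpr(k, s, fixed = TRUE)),
  keyword_2, sentence, USE.NAMES = FALSE
)
stopifnot(all(pos > 0))  # assumption: every sentence contains its keyword_2

# Everything before the keyword, and everything from the keyword onwards
output1 <- trimws(substr(sentence, 1, pos - 1))
output2 <- substr(sentence, pos, nchar(sentence))
data.frame(output1, output2)
```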

Related

strange behavior using rbind with data.table (>= 1.13.0) in combination with data.frame

Trying to rbind a data.table containing an IDate (result of fread) to a data.frame containing a character converts the IDate to its internal integer representation. Probably this is by design, but if not it's a bug. fread supports IDate since data.table 1.13.0 (see https://github.com/Rdatatable/data.table/blob/master/NEWS.md).
The example below shows that the data.table method of rbind can deal with it correctly (throw an error), but the data.frame method of rbind does not.
I don't know how and where this can/should be fixed.
library(data.table)
df1 <- data.frame(date = "2020-11-05")
dt1 <- data.table(date = "2020-11-05")
dt2 <- fread("date\n2020-11-05")
rbind(dt1, dt2) # ok -- throws error: rbind.data.table
#> Error in rbindlist(l, use.names, fill, idcol): Class attribute on column 1 of item 2 does not match with column 1 of item 1.
## not ok -- converts int representation of IDate to character: rbind.data.frame
rbind(df1, dt2)
#> date
#> 1 2020-11-05
#> 2 18571
## the other way round: ok -- throws an error: rbind.data.table
rbind(dt2, df1)
#> Error in rbindlist(l, use.names, fill, idcol): Class attribute on column 1 of item 2 does not match with column 1 of item 1.
### solution
dt3 <- fread("date\n2020-11-05", colClasses = "character")
rbind(dt1, dt3)
#> date
#> 1: 2020-11-05
#> 2: 2020-11-05
Created on 2020-11-05 by the reprex package (v0.3.0)
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.0.3 (2020-10-10)
#> os Debian GNU/Linux 10 (buster)
#> system x86_64, linux-gnu
#> ui X11
#> language (EN)
#> collate de_AT.UTF-8
#> ctype de_AT.UTF-8
#> tz Europe/Vienna
#> date 2020-11-05
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date lib source
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.0.2)
#> backports 1.2.0 2020-11-02 [1] CRAN (R 4.0.3)
#> callr 3.5.1 2020-10-13 [1] CRAN (R 4.0.3)
#> cli 2.1.0 2020-10-12 [1] CRAN (R 4.0.3)
#> crayon 1.3.4 2017-09-16 [1] CRAN (R 4.0.2)
#> data.table * 1.13.2 2020-10-19 [1] CRAN (R 4.0.3)
#> desc 1.2.0 2018-05-01 [1] CRAN (R 4.0.2)
#> devtools 2.3.2 2020-09-18 [1] CRAN (R 4.0.3)
#> digest 0.6.27 2020-10-24 [1] CRAN (R 4.0.3)
#> ellipsis 0.3.1 2020-05-15 [1] CRAN (R 4.0.2)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 4.0.2)
#> fansi 0.4.1 2020-01-08 [1] CRAN (R 4.0.2)
#> fs 1.5.0 2020-07-31 [1] CRAN (R 4.0.3)
#> glue 1.4.2 2020-08-27 [1] CRAN (R 4.0.2)
#> highr 0.8 2019-03-20 [1] CRAN (R 4.0.2)
#> htmltools 0.5.0 2020-06-16 [1] CRAN (R 4.0.2)
#> knitr 1.30 2020-09-22 [1] CRAN (R 4.0.3)
#> magrittr 1.5 2014-11-22 [1] CRAN (R 4.0.2)
#> memoise 1.1.0 2017-04-21 [1] CRAN (R 4.0.2)
#> pkgbuild 1.1.0 2020-07-13 [1] CRAN (R 4.0.3)
#> pkgload 1.1.0 2020-05-29 [1] CRAN (R 4.0.2)
#> prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.0.2)
#> processx 3.4.4 2020-09-03 [1] CRAN (R 4.0.3)
#> ps 1.4.0 2020-10-07 [1] CRAN (R 4.0.3)
#> R6 2.5.0 2020-10-28 [1] CRAN (R 4.0.3)
#> remotes 2.2.0 2020-07-21 [1] CRAN (R 4.0.3)
#> rlang 0.4.8 2020-10-08 [1] CRAN (R 4.0.3)
#> rmarkdown 2.5 2020-10-21 [1] CRAN (R 4.0.3)
#> rprojroot 1.3-2 2018-01-03 [1] CRAN (R 4.0.2)
#> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.0.2)
#> stringi 1.5.3 2020-09-09 [1] CRAN (R 4.0.3)
#> stringr 1.4.0 2019-02-10 [1] CRAN (R 4.0.2)
#> testthat 3.0.0 2020-10-31 [1] CRAN (R 4.0.3)
#> usethis 1.6.3 2020-09-17 [1] CRAN (R 4.0.3)
#> withr 2.3.0 2020-09-22 [1] CRAN (R 4.0.3)
#> xfun 0.19 2020-10-30 [1] CRAN (R 4.0.3)
#> yaml 2.2.1 2020-02-01 [1] CRAN (R 4.0.2)
#>
#> [1] /usr/local/lib/R/site-library
#> [2] /usr/lib/R/site-library
#> [3] /usr/lib/R/library
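For the rbind mismatch described in this question, the opposite fix to reading the date as character is to convert the character column to a date class before binding, so that both inputs agree. The sketch below uses base R's Date class as a stand-in for IDate (an assumption made so the example runs without data.table); df2 and its value are hypothetical.

```r
df1 <- data.frame(date = "2020-11-05", stringsAsFactors = FALSE)  # character column
df2 <- data.frame(date = as.Date("2020-11-06"))                    # Date column (stand-in for IDate)

# Make the classes agree before binding
df1$date <- as.Date(df1$date)

combined <- rbind(df1, df2)
class(combined$date)
format(combined$date)
```

Checking class() on the relevant columns before calling rbind() is a cheap way to catch this kind of silent coercion.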

Column not being recognised as variable in R [duplicate]

This question already has answers here:
Convert row names into first column
(9 answers)
Closed 2 years ago.
Hi,
I just transposed a large data set and realised that the first column doesn't have a column name. I have included an extract of the dataset. I tried names(df)[1] <- "Year", but it changed the variable name of the second column instead of the first. Is there a way to give the first column a variable name?
df <- structure(list(Construction = c("3209.4", "3307.0", "3519.3", "3693.0",
"3545.1", "3620.2"), Manufacturing = c(" 654.9", " 692.9", " 785.1",
" 810.1", " 744.8", " 793.6")), row.names = c("1975 1Q", "1975 2Q",
"1975 3Q", "1975 4Q", "1976 1Q", "1976 2Q"), class = "data.frame")
df
#> Construction Manufacturing
#> 1975 1Q 3209.4 654.9
#> 1975 2Q 3307.0 692.9
#> 1975 3Q 3519.3 785.1
#> 1975 4Q 3693.0 810.1
#> 1976 1Q 3545.1 744.8
#> 1976 2Q 3620.2 793.6
Created on 2020-09-03 by the reprex package (v0.3.0)
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.0.2 (2020-06-22)
#> os macOS Catalina 10.15.5
#> system x86_64, darwin17.0
#> ui X11
#> language (EN)
#> collate en_AU.UTF-8
#> ctype en_AU.UTF-8
#> tz Australia/Melbourne
#> date 2020-09-03
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date lib source
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.0.2)
#> backports 1.1.9 2020-08-24 [1] CRAN (R 4.0.2)
#> callr 3.4.3 2020-03-28 [1] CRAN (R 4.0.2)
#> cli 2.0.2 2020-02-28 [1] CRAN (R 4.0.2)
#> crayon 1.3.4 2017-09-16 [1] CRAN (R 4.0.2)
#> desc 1.2.0 2018-05-01 [1] CRAN (R 4.0.2)
#> devtools 2.3.1 2020-07-21 [1] CRAN (R 4.0.2)
#> digest 0.6.25 2020-02-23 [1] CRAN (R 4.0.2)
#> ellipsis 0.3.1 2020-05-15 [1] CRAN (R 4.0.2)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 4.0.1)
#> fansi 0.4.1 2020-01-08 [1] CRAN (R 4.0.2)
#> fs 1.5.0 2020-07-31 [1] CRAN (R 4.0.2)
#> glue 1.4.2 2020-08-27 [1] CRAN (R 4.0.2)
#> highr 0.8 2019-03-20 [1] CRAN (R 4.0.2)
#> htmltools 0.5.0 2020-06-16 [1] CRAN (R 4.0.2)
#> knitr 1.29 2020-06-23 [1] CRAN (R 4.0.2)
#> magrittr 1.5 2014-11-22 [1] CRAN (R 4.0.2)
#> memoise 1.1.0 2017-04-21 [1] CRAN (R 4.0.2)
#> pkgbuild 1.1.0 2020-07-13 [1] CRAN (R 4.0.2)
#> pkgload 1.1.0 2020-05-29 [1] CRAN (R 4.0.2)
#> prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.0.2)
#> processx 3.4.3 2020-07-05 [1] CRAN (R 4.0.2)
#> ps 1.3.4 2020-08-11 [1] CRAN (R 4.0.2)
#> R6 2.4.1 2019-11-12 [1] CRAN (R 4.0.2)
#> remotes 2.2.0 2020-07-21 [1] CRAN (R 4.0.2)
#> rlang 0.4.7 2020-07-09 [1] CRAN (R 4.0.2)
#> rmarkdown 2.3 2020-06-18 [1] CRAN (R 4.0.2)
#> rprojroot 1.3-2 2018-01-03 [1] CRAN (R 4.0.2)
#> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.0.2)
#> stringi 1.4.6 2020-02-17 [1] CRAN (R 4.0.2)
#> stringr 1.4.0 2019-02-10 [1] CRAN (R 4.0.2)
#> testthat 2.3.2 2020-03-02 [1] CRAN (R 4.0.2)
#> usethis 1.6.1 2020-04-29 [1] CRAN (R 4.0.2)
#> withr 2.2.0 2020-04-20 [1] CRAN (R 4.0.2)
#> xfun 0.16 2020-07-24 [1] CRAN (R 4.0.2)
#> yaml 2.2.1 2020-02-01 [1] CRAN (R 4.0.2)
#>
#> [1] /Library/Frameworks/R.framework/Versions/4.0/Resources/library
Those values are the row.names, not a column. To turn the row names into a column, use rownames_to_column from tibble:
library(tibble)
library(dplyr)
df <- df %>%
rownames_to_column('Year')
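If you prefer not to load tibble, the same result can be sketched in base R. The example also shows why names(df)[1] <- "Year" renamed Construction: names() only covers real columns, and the row names are not one of them. The two-row data frame is a shortened copy of the extract in the question.

```r
df <- data.frame(
  Construction  = c("3209.4", "3307.0"),
  Manufacturing = c(" 654.9", " 692.9"),
  row.names     = c("1975 1Q", "1975 2Q")
)

names(df)  # only the two real columns; the row names are not listed

# Prepend the row names as a proper column, then drop the row names
df2 <- cbind(Year = rownames(df), df)
rownames(df2) <- NULL
df2
```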

R: How to match or filter variables with same character strings but different sequence?

I've a data set with two variables consisting of full names (name and surname). However, these two variables are ordered in a different sequence:
variable1 is ordered by name, then surname
variable2 is ordered by surname, then name
How do I filter the rows such that variable1 = variable2? Or can I modify the order of variable2 to match that of variable1?
I created a small sample to replicate the dataset (note that some full names contain 3 or more words):
library(tidyverse)
name_surname <- c("John Smith One", "Jane Smith Two", "John Doe", "Nick Doe", "Chris Froome", "Van den Broeck", "Lance", "Van Dae Le Phillipe")
surname_name <- c("Smith One John", "Smith Two Jane", "Doe John", "Nick Doe", "Froome Chris", "Broeck Van den", "Lance", "Phillipe Van Dae Le")
tibble <- tibble(variable1 = name_surname, variable2 = surname_name)
tibble
#> # A tibble: 8 x 2
#> variable1 variable2
#> <chr> <chr>
#> 1 John Smith One Smith One John
#> 2 Jane Smith Two Smith Two Jane
#> 3 John Doe Doe John
#> 4 Nick Doe Nick Doe
#> 5 Chris Froome Froome Chris
#> 6 Van den Broeck Broeck Van den
#> 7 Lance Lance
#> 8 Van Dae Le Phillipe Phillipe Van Dae Le
Created on 2020-08-25 by the reprex package (v0.3.0)
Session info
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.0.2 (2020-06-22)
#> os macOS Catalina 10.15.5
#> system x86_64, darwin17.0
#> ui X11
#> language (EN)
#> collate en_AU.UTF-8
#> ctype en_AU.UTF-8
#> tz Australia/Melbourne
#> date 2020-08-25
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date lib source
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.0.2)
#> backports 1.1.8 2020-06-17 [1] CRAN (R 4.0.2)
#> blob 1.2.1 2020-01-20 [1] CRAN (R 4.0.2)
#> broom 0.7.0 2020-07-09 [1] CRAN (R 4.0.2)
#> callr 3.4.3 2020-03-28 [1] CRAN (R 4.0.2)
#> cellranger 1.1.0 2016-07-27 [1] CRAN (R 4.0.2)
#> cli 2.0.2 2020-02-28 [1] CRAN (R 4.0.2)
#> colorspace 1.4-1 2019-03-18 [1] CRAN (R 4.0.2)
#> crayon 1.3.4 2017-09-16 [1] CRAN (R 4.0.2)
#> DBI 1.1.0 2019-12-15 [1] CRAN (R 4.0.2)
#> dbplyr 1.4.4 2020-05-27 [1] CRAN (R 4.0.2)
#> desc 1.2.0 2018-05-01 [1] CRAN (R 4.0.2)
#> devtools 2.3.1 2020-07-21 [1] CRAN (R 4.0.2)
#> digest 0.6.25 2020-02-23 [1] CRAN (R 4.0.2)
#> dplyr * 1.0.1 2020-07-31 [1] CRAN (R 4.0.2)
#> ellipsis 0.3.1 2020-05-15 [1] CRAN (R 4.0.2)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 4.0.1)
#> fansi 0.4.1 2020-01-08 [1] CRAN (R 4.0.2)
#> forcats * 0.5.0 2020-03-01 [1] CRAN (R 4.0.2)
#> fs 1.5.0 2020-07-31 [1] CRAN (R 4.0.2)
#> generics 0.0.2 2018-11-29 [1] CRAN (R 4.0.2)
#> ggplot2 * 3.3.2 2020-06-19 [1] CRAN (R 4.0.2)
#> glue 1.4.1 2020-05-13 [1] CRAN (R 4.0.2)
#> gtable 0.3.0 2019-03-25 [1] CRAN (R 4.0.2)
#> haven 2.3.1 2020-06-01 [1] CRAN (R 4.0.2)
#> highr 0.8 2019-03-20 [1] CRAN (R 4.0.2)
#> hms 0.5.3 2020-01-08 [1] CRAN (R 4.0.2)
#> htmltools 0.5.0 2020-06-16 [1] CRAN (R 4.0.2)
#> httr 1.4.2 2020-07-20 [1] CRAN (R 4.0.2)
#> jsonlite 1.7.0 2020-06-25 [1] CRAN (R 4.0.2)
#> knitr 1.29 2020-06-23 [1] CRAN (R 4.0.2)
#> lifecycle 0.2.0 2020-03-06 [1] CRAN (R 4.0.2)
#> lubridate 1.7.9 2020-06-08 [1] CRAN (R 4.0.2)
#> magrittr 1.5 2014-11-22 [1] CRAN (R 4.0.2)
#> memoise 1.1.0 2017-04-21 [1] CRAN (R 4.0.2)
#> modelr 0.1.8 2020-05-19 [1] CRAN (R 4.0.2)
#> munsell 0.5.0 2018-06-12 [1] CRAN (R 4.0.2)
#> pillar 1.4.6 2020-07-10 [1] CRAN (R 4.0.2)
#> pkgbuild 1.1.0 2020-07-13 [1] CRAN (R 4.0.2)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.0.2)
#> pkgload 1.1.0 2020-05-29 [1] CRAN (R 4.0.2)
#> prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.0.2)
#> processx 3.4.3 2020-07-05 [1] CRAN (R 4.0.2)
#> ps 1.3.3 2020-05-08 [1] CRAN (R 4.0.2)
#> purrr * 0.3.4 2020-04-17 [1] CRAN (R 4.0.2)
#> R6 2.4.1 2019-11-12 [1] CRAN (R 4.0.2)
#> Rcpp 1.0.5 2020-07-06 [1] CRAN (R 4.0.2)
#> readr * 1.3.1 2018-12-21 [1] CRAN (R 4.0.2)
#> readxl 1.3.1 2019-03-13 [1] CRAN (R 4.0.2)
#> remotes 2.2.0 2020-07-21 [1] CRAN (R 4.0.2)
#> reprex 0.3.0 2019-05-16 [1] CRAN (R 4.0.2)
#> rlang 0.4.7 2020-07-09 [1] CRAN (R 4.0.2)
#> rmarkdown 2.3 2020-06-18 [1] CRAN (R 4.0.2)
#> rprojroot 1.3-2 2018-01-03 [1] CRAN (R 4.0.2)
#> rvest 0.3.6 2020-07-25 [1] CRAN (R 4.0.2)
#> scales 1.1.1 2020-05-11 [1] CRAN (R 4.0.2)
#> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.0.2)
#> stringi 1.4.6 2020-02-17 [1] CRAN (R 4.0.2)
#> stringr * 1.4.0 2019-02-10 [1] CRAN (R 4.0.2)
#> testthat 2.3.2 2020-03-02 [1] CRAN (R 4.0.2)
#> tibble * 3.0.3 2020-07-10 [1] CRAN (R 4.0.2)
#> tidyr * 1.1.1 2020-07-31 [1] CRAN (R 4.0.2)
#> tidyselect 1.1.0 2020-05-11 [1] CRAN (R 4.0.2)
#> tidyverse * 1.3.0 2019-11-21 [1] CRAN (R 4.0.2)
#> usethis 1.6.1 2020-04-29 [1] CRAN (R 4.0.2)
#> utf8 1.1.4 2018-05-24 [1] CRAN (R 4.0.2)
#> vctrs 0.3.2 2020-07-15 [1] CRAN (R 4.0.2)
#> withr 2.2.0 2020-04-20 [1] CRAN (R 4.0.2)
#> xfun 0.16 2020-07-24 [1] CRAN (R 4.0.2)
#> xml2 1.3.2 2020-04-23 [1] CRAN (R 4.0.2)
#> yaml 2.2.1 2020-02-01 [1] CRAN (R 4.0.2)
#>
#> [1] /Library/Frameworks/R.framework/Versions/4.0/Resources/library
Split the variables on space and order variable2 based on variable1.
tibble$variable3 <- mapply(function(x, y) paste(y[match(x, y)], collapse = " "),
strsplit(tibble$variable1, '\\s+'), strsplit(tibble$variable2, '\\s+'))
tibble
# A tibble: 8 x 3
# variable1 variable2 variable3
# <chr> <chr> <chr>
#1 John Smith One Smith One John John Smith One
#2 Jane Smith Two Smith Two Jane Jane Smith Two
#3 John Doe Doe John John Doe
#4 Nick Doe Nick Doe Nick Doe
#5 Chris Froome Froome Chris Chris Froome
#6 Van den Broeck Broeck Van den Van den Broeck
#7 Lance Lance Lance
#8 Van Dae Le Phillipe Phillipe Van Dae Le Van Dae Le Phillipe
I created a new variable (variable3) for comparison purposes; if needed, you can overwrite variable2 in the tibble instead.
A similar logic to @Ronak Shah's answer, but using dplyr and tidyr:
tibble %>%
rowid_to_column() %>%
separate_rows(variable1, variable2) %>%
group_by(rowid) %>%
mutate(variable2 = variable2[match(variable1, variable2)]) %>%
summarise(across(starts_with("variable"), paste, collapse = " "))
rowid variable1 variable2
<int> <chr> <chr>
1 1 John Smith One John Smith One
2 2 Jane Smith Two Jane Smith Two
3 3 John Doe John Doe
4 4 Nick Doe Nick Doe
5 5 Chris Froome Chris Froome
6 6 Van den Broeck Van den Broeck
7 7 Lance Lance
8 8 Van Dae Le Phillipe Van Dae Le Phillipe
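For the filtering half of the question (keeping rows where both variables hold the same name), one option is to compare an order-independent key: split each name into words, sort the words, and paste them back together. This is a minimal base R sketch; sort_words is a hypothetical helper, and the third row is an invented mismatch added to show a FALSE result.

```r
variable1 <- c("John Smith One", "Nick Doe", "Lance")
variable2 <- c("Smith One John", "Nick Doe", "Lance B")  # last row deliberately differs

# Hypothetical helper: order-independent key made of the sorted words
sort_words <- function(x) {
  vapply(
    strsplit(x, "\\s+"),
    function(w) paste(sort(w), collapse = " "),
    character(1)
  )
}

# TRUE where both variables contain exactly the same words
same <- sort_words(variable1) == sort_words(variable2)
same

# Keep only the matching rows
data.frame(variable1, variable2)[same, ]
```

Unlike the match()-based reordering, this comparison does not produce NA when a word is missing from one side; such rows simply come out as FALSE.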

Why does dplyr::mutate_at() on the first element in a rowwise-tibble also take effect on the rest of the elements?

In the following code, I define a tibble df with two columns: the name column contains the character vector c("a", "b", "c"), and the data column contains a list of tibbles, each with a single column value. I'd like to rename each tibble's value column to the character in the corresponding row, i.e. "a", "b" and "c". To manipulate the tibble row by row, I used dplyr::rowwise(), but I found that the change made to the first element (renaming the column to "a") also took effect on the rest of the elements: after the first row, the tibble printed before the rename already shows the column name "a". As a result, the rename fails for the following elements, since they no longer have a column named "value" (all have become "a"). Do I have to use purrr::map() here instead of the tidier row-wise tibble manipulation?
Could you please give me an answer using the rowwise-mutate_at method? Thanks.
library(tidyverse)
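#> Warning: package 'tidyverse' was built under R version 3.6.3
#> Warning: package 'ggplot2' was built under R version 3.6.1
#> Warning: package 'tibble' was built under R version 3.6.3
#> Warning: package 'tidyr' was built under R version 3.6.1
#> Warning: package 'readr' was built under R version 3.6.1
#> Warning: package 'purrr' was built under R version 3.6.1
#> Warning: package 'dplyr' was built under R version 3.6.3
#> Warning: package 'stringr' was built under R version 3.6.1
#> Warning: package 'forcats' was built under R version 3.6.3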
df <- tibble::tibble(name = c("a", "b", "c"),
data = list(tibble::tibble(value = 1:10)))
df_mutate <- df %>%
dplyr::rowwise() %>%
dplyr::mutate_at("data", ~ {
print(.x)
colnames(.x)[colnames(.x) %in% "value"] <- name
list(.x)
}) %>%
dplyr::ungroup()
#> # A tibble: 10 x 1
#> value
#> <int>
#> 1 1
#> 2 2
#> 3 3
#> 4 4
#> 5 5
#> 6 6
#> 7 7
#> 8 8
#> 9 9
#> 10 10
#> # A tibble: 10 x 1
#> a
#> <int>
#> 1 1
#> 2 2
#> 3 3
#> 4 4
#> 5 5
#> 6 6
#> 7 7
#> 8 8
#> 9 9
#> 10 10
#> # A tibble: 10 x 1
#> a
#> <int>
#> 1 1
#> 2 2
#> 3 3
#> 4 4
#> 5 5
#> 6 6
#> 7 7
#> 8 8
#> 9 9
#> 10 10
Created on 2020-06-19 by the reprex package (v0.3.0)
devtools::session_info()
#> - Session info ---------------------------------------------------------------
#> setting value
#> version R version 3.6.0 (2019-04-26)
#> os Windows Server x64
#> system x86_64, mingw32
#> ui RTerm
#> language (EN)
#> collate Chinese (Simplified)_China.936
#> ctype Chinese (Simplified)_China.936
#> tz Asia/Taipei
#> date 2020-06-19
#>
#> - Packages -------------------------------------------------------------------
#> package * version date lib source
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.6.1)
#> backports 1.1.5 2019-10-02 [1] CRAN (R 3.6.1)
#> broom 0.5.6 2020-04-20 [1] CRAN (R 3.6.3)
#> callr 3.4.0 2019-12-09 [1] CRAN (R 3.6.2)
#> cellranger 1.1.0 2016-07-27 [1] CRAN (R 3.6.1)
#> cli 2.0.2 2020-02-28 [1] CRAN (R 3.6.3)
#> colorspace 1.4-1 2019-03-18 [1] CRAN (R 3.6.1)
#> crayon 1.3.4 2017-09-16 [1] CRAN (R 3.6.1)
#> DBI 1.1.0 2019-12-15 [1] CRAN (R 3.6.2)
#> dbplyr 1.4.2 2019-06-17 [1] CRAN (R 3.6.3)
#> desc 1.2.0 2018-05-01 [1] CRAN (R 3.6.1)
#> devtools 2.2.1 2019-09-24 [1] CRAN (R 3.6.1)
#> digest 0.6.23 2019-11-23 [1] CRAN (R 3.6.2)
#> dplyr * 1.0.0 2020-05-29 [1] CRAN (R 3.6.3)
#> ellipsis 0.3.0 2019-09-20 [1] CRAN (R 3.6.1)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 3.6.1)
#> fansi 0.4.0 2018-10-05 [1] CRAN (R 3.6.1)
#> forcats * 0.5.0 2020-03-01 [1] CRAN (R 3.6.3)
#> fs 1.3.1 2019-05-06 [1] CRAN (R 3.6.1)
#> generics 0.0.2 2018-11-29 [1] CRAN (R 3.6.1)
#> ggplot2 * 3.2.1 2019-08-10 [1] CRAN (R 3.6.1)
#> glue 1.4.1 2020-05-13 [1] CRAN (R 3.6.3)
#> gtable 0.3.0 2019-03-25 [1] CRAN (R 3.6.1)
#> haven 2.2.0 2019-11-08 [1] CRAN (R 3.6.3)
#> highr 0.8 2019-03-20 [1] CRAN (R 3.6.1)
#> hms 0.5.2 2019-10-30 [1] CRAN (R 3.6.2)
#> htmltools 0.4.0 2019-10-04 [1] CRAN (R 3.6.1)
#> httr 1.4.1 2019-08-05 [1] CRAN (R 3.6.1)
#> jsonlite 1.6 2018-12-07 [1] CRAN (R 3.6.1)
#> knitr 1.26 2019-11-12 [1] CRAN (R 3.6.2)
#> lattice 0.20-38 2018-11-04 [2] CRAN (R 3.6.0)
#> lazyeval 0.2.2 2019-03-15 [1] CRAN (R 3.6.1)
#> lifecycle 0.2.0 2020-03-06 [1] CRAN (R 3.6.3)
#> lubridate 1.7.4 2018-04-11 [1] CRAN (R 3.6.2)
#> magrittr 1.5 2014-11-22 [1] CRAN (R 3.6.1)
#> memoise 1.1.0 2017-04-21 [1] CRAN (R 3.6.1)
#> modelr 0.1.6 2020-02-22 [1] CRAN (R 3.6.3)
#> munsell 0.5.0 2018-06-12 [1] CRAN (R 3.6.1)
#> nlme 3.1-143 2019-12-10 [1] CRAN (R 3.6.2)
#> pillar 1.4.3 2019-12-20 [1] CRAN (R 3.6.2)
#> pkgbuild 1.0.6 2019-10-09 [1] CRAN (R 3.6.0)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 3.6.0)
#> pkgload 1.0.2 2018-10-29 [1] CRAN (R 3.6.1)
#> prettyunits 1.0.2 2015-07-13 [1] CRAN (R 3.6.1)
#> processx 3.4.1 2019-07-18 [1] CRAN (R 3.6.1)
#> ps 1.3.0 2018-12-21 [1] CRAN (R 3.6.1)
#> purrr * 0.3.3 2019-10-18 [1] CRAN (R 3.6.1)
#> R6 2.4.1 2019-11-12 [1] CRAN (R 3.6.2)
#> Rcpp 1.0.3 2019-11-08 [1] CRAN (R 3.6.2)
#> readr * 1.3.1 2018-12-21 [1] CRAN (R 3.6.1)
#> readxl 1.3.1 2019-03-13 [1] CRAN (R 3.6.1)
#> remotes 2.1.0 2019-06-24 [1] CRAN (R 3.6.1)
#> reprex 0.3.0 2019-05-16 [1] CRAN (R 3.6.3)
#> rlang 0.4.6 2020-05-02 [1] CRAN (R 3.6.3)
#> rmarkdown 2.0 2019-12-12 [1] CRAN (R 3.6.2)
#> rprojroot 1.3-2 2018-01-03 [1] CRAN (R 3.6.1)
#> rvest 0.3.5 2019-11-08 [1] CRAN (R 3.6.3)
#> scales 1.1.0 2019-11-18 [1] CRAN (R 3.6.2)
#> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.6.1)
#> stringi 1.4.3 2019-03-12 [1] CRAN (R 3.6.0)
#> stringr * 1.4.0 2019-02-10 [1] CRAN (R 3.6.1)
#> testthat 2.3.1 2019-12-01 [1] CRAN (R 3.6.2)
#> tibble * 3.0.1 2020-04-20 [1] CRAN (R 3.6.3)
#> tidyr * 1.0.0 2019-09-11 [1] CRAN (R 3.6.1)
#> tidyselect 1.1.0 2020-05-11 [1] CRAN (R 3.6.3)
#> tidyverse * 1.3.0 2019-11-21 [1] CRAN (R 3.6.3)
#> usethis 1.5.1 2019-07-04 [1] CRAN (R 3.6.1)
#> utf8 1.1.4 2018-05-24 [1] CRAN (R 3.6.1)
#> vctrs 0.3.0 2020-05-11 [1] CRAN (R 3.6.3)
#> withr 2.1.2 2018-03-15 [1] CRAN (R 3.6.1)
#> xfun 0.11 2019-11-12 [1] CRAN (R 3.6.2)
#> xml2 1.2.2 2019-08-09 [1] CRAN (R 3.6.1)
#> yaml 2.2.0 2018-07-25 [1] CRAN (R 3.6.0)
#>
#> [1] C:/Users/xzhu/Documents/R/win-library/3.6
#> [2] C:/Program Files/R/R-3.6.0/library
Yes, you can use map2:
library(dplyr)
df %>% mutate(data = purrr::map2(name, data, ~{names(.y) <- .x;.y}))
Or Map in base R:
df$data <- Map(function(x, y) {names(y) <- x;y}, df$name, df$data)
If you want to use rowwise, a similar approach would be:
df %>% rowwise() %>% mutate(data = {names(data) <- name;list(data)})
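To check the result without the tidyverse, the Map idea can be sketched on plain data frames. Note one difference from the snippets above: this version renames only the column called "value" (as the question's colnames(.x) %in% "value" intended), whereas names(y) <- x would rename every column. The objects name, data and renamed are illustrative stand-ins for the tibble's columns.

```r
name <- c("a", "b", "c")
data <- replicate(3, data.frame(value = 1:3), simplify = FALSE)

# Rename only the "value" column of each data frame to the matching name
renamed <- Map(
  function(nm, d) {
    names(d)[names(d) == "value"] <- nm
    d
  },
  name, data
)

# Each element now carries its own column name
vapply(renamed, names, character(1))
```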

How can I tell what arima model this code is running?

I'm reading over some R code, and I've come across a line where the function prototype doesn't seem to match what I've seen in the library's API (fabletools).
fitted_model = a_time_series %>%
filter(date <= tsibble::year(someyear)) %>%
fabletools::model(arima = ARIMA(time))
...where time is a column of a_time_series. How do I tell which ARIMA model this is using?
(e.g. ARIMA(1,1,1) or ARIMA(0,1,1), etc.)
I've checked this documentation; however, the function prototypes don't seem to match.
You can identify the fitted ARIMA specification from the formatted output printed in the console. If you need to obtain this display as text, you can use the format() function.
library(fable)
#> Loading required package: fabletools
library(tsibble)
library(dplyr)
tourism %>%
group_by(Purpose) %>%
summarise(Trips = sum(Trips)) %>%
model(auto_arima = ARIMA(Trips)) %>%
mutate(format(auto_arima))
#> # A mable: 4 x 3
#> # Key: Purpose [4]
#> Purpose auto_arima `format(auto_arima)`
#> <chr> <model> <chr>
#> 1 Business <ARIMA(0,1,1)(0,1,1)[4]> <ARIMA(0,1,1)(0,1,1)[4]>
#> 2 Holiday <ARIMA(0,1,1)(0,1,1)[4]> <ARIMA(0,1,1)(0,1,1)[4]>
#> 3 Other <ARIMA(0,1,1)(1,0,0)[4]> <ARIMA(0,1,1)(1,0,0)[4]>
#> 4 Visiting <ARIMA(1,0,1)(2,1,0)[4]> <ARIMA(1,0,1)(2,1,0)[4]>
Created on 2020-06-12 by the reprex package (v0.3.0)
Session info
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 3.6.2 (2019-12-12)
#> os Ubuntu 18.04.4 LTS
#> system x86_64, linux-gnu
#> ui X11
#> language en_AU:en
#> collate en_AU.UTF-8
#> ctype en_AU.UTF-8
#> tz Australia/Melbourne
#> date 2020-06-12
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date lib source
#> anytime 0.3.7 2020-01-20 [1] CRAN (R 3.6.1)
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.6.1)
#> backports 1.1.7 2020-05-13 [1] RSPM (R 3.6.3)
#> callr 3.4.3 2020-03-28 [1] CRAN (R 3.6.2)
#> cli 2.0.2 2020-02-28 [1] RSPM (R 3.6.2)
#> colorspace 1.4-1 2019-03-18 [1] CRAN (R 3.6.1)
#> crayon 1.3.4 2017-09-16 [1] CRAN (R 3.6.1)
#> desc 1.2.0 2018-05-01 [1] CRAN (R 3.6.1)
#> devtools 2.2.2 2020-02-17 [1] RSPM (R 3.6.2)
#> digest 0.6.25 2020-02-23 [1] RSPM (R 3.6.2)
#> distributional 0.1.0.9000 2020-06-10 [1] local
#> dplyr * 1.0.0 2020-05-29 [1] CRAN (R 3.6.2)
#> ellipsis 0.3.1 2020-05-15 [1] CRAN (R 3.6.2)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 3.6.1)
#> fable * 0.2.1 2020-06-11 [1] local
#> fabletools * 0.2.0 2020-06-11 [1] local
#> fansi 0.4.1 2020-01-08 [1] RSPM (R 3.6.2)
#> farver 2.0.3 2020-01-16 [1] CRAN (R 3.6.1)
#> feasts 0.1.4 2020-06-04 [1] local
#> fs 1.4.1 2020-04-04 [1] RSPM (R 3.6.3)
#> generics 0.0.2 2018-11-29 [1] CRAN (R 3.6.1)
#> ggplot2 3.3.1 2020-05-28 [1] CRAN (R 3.6.2)
#> glue 1.4.1.9000 2020-05-26 [1] Github (tidyverse/glue#a605000)
#> gtable 0.3.0 2019-03-25 [1] CRAN (R 3.6.1)
#> highr 0.8 2019-03-20 [1] CRAN (R 3.6.1)
#> htmltools 0.4.0 2019-10-04 [1] CRAN (R 3.6.1)
#> knitr 1.28 2020-02-06 [1] RSPM (R 3.6.2)
#> lattice 0.20-38 2018-11-04 [2] CRAN (R 3.6.2)
#> lifecycle 0.2.0.9000 2020-03-19 [1] Github (r-lib/lifecycle#355dcba)
#> lubridate 1.7.8 2020-04-06 [1] RSPM (R 3.6.3)
#> magrittr 1.5 2014-11-22 [1] CRAN (R 3.6.1)
#> memoise 1.1.0 2017-04-21 [1] CRAN (R 3.6.1)
#> munsell 0.5.0 2018-06-12 [1] CRAN (R 3.6.1)
#> nlme 3.1-142 2019-11-07 [2] CRAN (R 3.6.2)
#> pillar 1.4.4 2020-05-25 [1] Github (r-lib/pillar#2f5ad11)
#> pkgbuild 1.0.8 2020-05-07 [1] RSPM (R 3.6.3)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 3.6.1)
#> pkgload 1.1.0 2020-05-29 [1] CRAN (R 3.6.2)
#> prettyunits 1.1.1 2020-01-24 [1] CRAN (R 3.6.2)
#> processx 3.4.2 2020-02-09 [1] RSPM (R 3.6.2)
#> progressr 0.6.0 2020-05-19 [1] CRAN (R 3.6.2)
#> ps 1.3.3 2020-05-08 [1] RSPM (R 3.6.3)
#> purrr 0.3.4 2020-04-17 [1] RSPM (R 3.6.3)
#> R6 2.4.1 2019-11-12 [1] CRAN (R 3.6.1)
#> Rcpp 1.0.4.6 2020-04-09 [1] CRAN (R 3.6.2)
#> remotes 2.1.1 2020-02-15 [1] RSPM (R 3.6.2)
#> rlang 0.4.6.9000 2020-05-20 [1] Github (r-lib/rlang#691b5a8)
#> rmarkdown 2.1 2020-01-20 [1] CRAN (R 3.6.2)
#> rprojroot 1.3-2 2018-01-03 [1] CRAN (R 3.6.1)
#> scales 1.1.1 2020-05-11 [1] RSPM (R 3.6.3)
#> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.6.1)
#> stringi 1.4.6 2020-02-17 [1] CRAN (R 3.6.2)
#> stringr 1.4.0 2019-02-10 [1] CRAN (R 3.6.1)
#> testthat 2.3.2 2020-03-02 [1] RSPM (R 3.6.3)
#> tibble 3.0.1 2020-04-20 [1] RSPM (R 3.6.3)
#> tidyr 1.1.0 2020-05-20 [1] RSPM (R 3.6.3)
#> tidyselect 1.1.0 2020-05-11 [1] RSPM (R 3.6.3)
#> tsibble * 0.9.0 2020-06-02 [1] Github (tidyverts/tsibble#c837e83)
#> urca 1.3-0 2016-09-06 [1] CRAN (R 3.6.1)
#> usethis 1.5.1.9000 2020-01-31 [1] Github (r-lib/usethis#7d8b066)
#> utf8 1.1.4 2018-05-24 [1] CRAN (R 3.6.1)
#> vctrs 0.3.0.9000 2020-05-28 [1] Github (r-lib/vctrs#373e1ce)
#> withr 2.2.0 2020-04-20 [1] RSPM (R 3.6.3)
#> xfun 0.13 2020-04-13 [1] RSPM (R 3.6.3)
#> yaml 2.2.1 2020-02-01 [1] RSPM (R 3.6.2)
#>
#> [1] /home/mitchell/R/x86_64-pc-linux-gnu-library/3.6
#> [2] /opt/R/3.6.2/lib/R/library
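Once the specification is available as text (for example from format(), as in the answer above), the individual orders can be pulled out with a regular expression. This is a rough base R sketch, not part of fable: it assumes the text looks like the strings printed in the mable above, and the names p, d, q, P, D, Q and period are labels I chose for illustration.

```r
spec <- "<ARIMA(1,0,1)(2,1,0)[4]>"  # copied from the printed mable above

# Non-seasonal orders, then an optional seasonal part with its period
pattern <- "ARIMA\\((\\d+),(\\d+),(\\d+)\\)(?:\\((\\d+),(\\d+),(\\d+)\\)\\[(\\d+)\\])?"
m <- regmatches(spec, regexec(pattern, spec, perl = TRUE))[[1]]

# Drop the full match and keep the seven captured numbers
orders <- setNames(
  as.integer(m[-1]),
  c("p", "d", "q", "P", "D", "Q", "period")
)
orders
```

If the seasonal part is absent from the text, the last four entries come back as NA.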
