Add a column to function with fixed variable - r

I have this code as a function which generates the table of a Premier League season from Wikipedia.
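library(rvest)  # read_html(), html_table(); rvest also re-exports %>%
read_prem_league <- function(year) {
  "https://en.wikipedia.org/wiki/" %>%
    paste0(year - 1, "-", substr(as.character(year), 3, 4), "_Premier_League") %>%
    read_html() %>%
    html_table() %>%
    getElement(5)  # the league table is the 5th table on the page
}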
read_prem_league(2021)
Which creates the following tibble:
#> # A tibble: 20 x 11
#> Pos Team Pld W D L GF GA GD Pts
#> <int> <chr> <int> <int> <int> <int> <int> <int> <chr> <int>
#> 1 1 Manchester City (C) 38 27 5 6 83 32 +51 86
#> 2 2 Manchester United 38 21 11 6 73 44 +29 74
#> 3 3 Liverpool 38 20 9 9 68 42 +26 69
#> 4 4 Chelsea 38 19 10 9 58 36 +22 67
#> 5 5 Leicester City 38 20 6 12 68 50 +18 66
#> 6 6 West Ham United 38 19 8 11 62 47 +15 65
#> 7 7 Tottenham Hotspur 38 18 8 12 68 45 +23 62
#> 8 8 Arsenal 38 18 7 13 55 39 +16 61
#> 9 9 Leeds United 38 18 5 15 62 54 +8 59
#> 10 10 Everton 38 17 8 13 47 48 -1 59
#> 11 11 Aston Villa 38 16 7 15 55 46 +9 55
#> 12 12 Newcastle United 38 12 9 17 46 62 -16 45
#> 13 13 Wolverhampton Wande~ 38 12 9 17 36 52 -16 45
#> 14 14 Crystal Palace 38 12 8 18 41 66 -25 44
#> 15 15 Southampton 38 12 7 19 47 68 -21 43
#> 16 16 Brighton & Hove Alb~ 38 9 14 15 40 46 -6 41
#> 17 17 Burnley 38 10 9 19 33 55 -22 39
#> 18 18 Fulham (R) 38 5 13 20 27 53 -26 28
#> 19 19 West Bromwich Albio~ 38 5 11 22 35 76 -41 26
#> 20 20 Sheffield United (R) 38 7 2 29 20 63 -43 23
#> # ... with 1 more variable: `Qualification or relegation` <chr>
What I would like to do is add a column called Season to the left of Pos which shows the current season, so if it's the season ending in 2020 I want it to say 2019-20.
read_prem_league$Season <- (year)
I expected the code above to work, and I want to put it inside the function. However, I get the error: Error in View : object of type 'closure' is not subsettable
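The error occurs because read_prem_league is a function (in R terms, a closure), not a data frame, so read_prem_league$Season <- ... tries to assign a column to the function itself, which R does not allow. Only the table the function returns can get a new column. A minimal base-R sketch of the difference, where make_table() is a hypothetical offline stand-in for read_prem_league():

```r
# Hypothetical stand-in for read_prem_league(): returns a small data frame
make_table <- function(year) {
  data.frame(Pos = 1:2, Team = c("Manchester City", "Manchester United"))
}

# Assigning a column to the function itself fails: a closure cannot be
# subset or modified with $<-
err <- tryCatch({
  make_table$Season <- 2021
  NULL
}, error = function(e) conditionMessage(e))
print(err)

# Assigning a column to the data frame the function returns works
tab <- make_table(2021)
tab$Season <- 2021
print(tab)
```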

We may use mutate, with .before to place the new column before Pos:
library(dplyr)
library(rvest)
read_prem_league <- function(year) {
"https://en.wikipedia.org/wiki/" %>%
paste0(year - 1, "-", substr(as.character(year), 3, 4),
"_Premier_League") %>%
read_html() %>%
html_table() %>%
getElement(5) %>%
dplyr::mutate(Season = year, .before = Pos)
}
Testing:
> dat <- read_prem_league(2021)
> dat
# A tibble: 20 × 12
Season Pos Team Pld W D L GF GA GD Pts `Qualification or relegation`
<dbl> <int> <chr> <int> <int> <int> <int> <int> <int> <chr> <int> <chr>
1 2021 1 Manchester City (C) 38 27 5 6 83 32 +51 86 "Qualification for the Champions League group stage"
2 2021 2 Manchester United 38 21 11 6 73 44 +29 74 "Qualification for the Champions League group stage"
3 2021 3 Liverpool 38 20 9 9 68 42 +26 69 "Qualification for the Champions League group stage"
4 2021 4 Chelsea 38 19 10 9 58 36 +22 67 "Qualification for the Champions League group stage"
5 2021 5 Leicester City 38 20 6 12 68 50 +18 66 "Qualification for the Europa League group stage[a]"
6 2021 6 West Ham United 38 19 8 11 62 47 +15 65 "Qualification for the Europa League group stage[a]"
7 2021 7 Tottenham Hotspur 38 18 8 12 68 45 +23 62 "Qualification for the Europa Conference League play-off round[b]"
8 2021 8 Arsenal 38 18 7 13 55 39 +16 61 ""
9 2021 9 Leeds United 38 18 5 15 62 54 +8 59 ""
10 2021 10 Everton 38 17 8 13 47 48 −1 59 ""
11 2021 11 Aston Villa 38 16 7 15 55 46 +9 55 ""
12 2021 12 Newcastle United 38 12 9 17 46 62 −16 45 ""
13 2021 13 Wolverhampton Wanderers 38 12 9 17 36 52 −16 45 ""
14 2021 14 Crystal Palace 38 12 8 18 41 66 −25 44 ""
15 2021 15 Southampton 38 12 7 19 47 68 −21 43 ""
16 2021 16 Brighton & Hove Albion 38 9 14 15 40 46 −6 41 ""
17 2021 17 Burnley 38 10 9 19 33 55 −22 39 ""
18 2021 18 Fulham (R) 38 5 13 20 27 53 −26 28 "Relegation to the EFL Championship"
19 2021 19 West Bromwich Albion (R) 38 5 11 22 35 76 −41 26 "Relegation to the EFL Championship"
20 2021 20 Sheffield United (R) 38 7 2 29 20 63 −43 23 "Relegation to the EFL Championship"
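The answer above stores the ending year (2021) in Season. If you want the "2019-20" style label from the question instead, one option is to build it with the same paste0()/substr() logic the function already uses for the URL. A minimal sketch; season_label() is a hypothetical helper name:

```r
# Hypothetical helper: season ending in `year` -> "YYYY-YY" label,
# reusing the paste0()/substr() logic from the Wikipedia URL
season_label <- function(year) {
  paste0(year - 1, "-", substr(as.character(year), 3, 4))
}

season_label(2020)  # "2019-20"
season_label(2021)  # "2020-21"
```

Inside read_prem_league(), the last step would then become dplyr::mutate(Season = season_label(year), .before = Pos), which gives a character column rather than a number.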

Related

How to change name on column within a function? [duplicate]

This question already has answers here: How to dplyr rename a column, by column index? (4 answers)
Closed 7 months ago.
This is a generic question related to functions:
Let's say I have the following function with some code inside the braces. I found this code in an earlier thread from today: Add a column to function with fixed variable
library(rvest)  # read_html(), html_table(); rvest also re-exports %>%
read_prem_league <- function(year) {
  "https://en.wikipedia.org/wiki/" %>%
    paste0(year - 1, "-", substr(as.character(year), 3, 4), "_Premier_League") %>%
    read_html() %>%
    html_table() %>%
    getElement(5)
}
read_prem_league(2015)
Which generates a tibble like the following (this output is the 2020-21 table, i.e. from read_prem_league(2021)):
#> # A tibble: 20 x 11
#> Pos Team Pld W D L GF GA GD Pts
#> <int> <chr> <int> <int> <int> <int> <int> <int> <chr> <int>
#> 1 1 Manchester City (C) 38 27 5 6 83 32 +51 86
#> 2 2 Manchester United 38 21 11 6 73 44 +29 74
#> 3 3 Liverpool 38 20 9 9 68 42 +26 69
#> 4 4 Chelsea 38 19 10 9 58 36 +22 67
#> 5 5 Leicester City 38 20 6 12 68 50 +18 66
#> 6 6 West Ham United 38 19 8 11 62 47 +15 65
#> 7 7 Tottenham Hotspur 38 18 8 12 68 45 +23 62
#> 8 8 Arsenal 38 18 7 13 55 39 +16 61
#> 9 9 Leeds United 38 18 5 15 62 54 +8 59
#> 10 10 Everton 38 17 8 13 47 48 -1 59
#> 11 11 Aston Villa 38 16 7 15 55 46 +9 55
#> 12 12 Newcastle United 38 12 9 17 46 62 -16 45
#> 13 13 Wolverhampton Wande~ 38 12 9 17 36 52 -16 45
#> 14 14 Crystal Palace 38 12 8 18 41 66 -25 44
#> 15 15 Southampton 38 12 7 19 47 68 -21 43
#> 16 16 Brighton & Hove Alb~ 38 9 14 15 40 46 -6 41
#> 17 17 Burnley 38 10 9 19 33 55 -22 39
#> 18 18 Fulham (R) 38 5 13 20 27 53 -26 28
#> 19 19 West Bromwich Albio~ 38 5 11 22 35 76 -41 26
#> 20 20 Sheffield United (R) 38 7 2 29 20 63 -43 23
#> # ... with 1 more variable: `Qualification or relegation` <chr>
I would like to rename the Team column to Club so that it always has the name Club. I want a general approach that renames column 2, so that it also works in other functions where the data are the same but the column names differ (I want a single column name).
Something similar to the code below, which was given as an answer to a previous question, is what I'm looking for:
dat <- read.csv(url)
names(dat)[2] <- "year"
dat
You can rename by index with dplyr::rename():
library(rvest)
library(dplyr)
read_prem_league <- function(year) {
  "https://en.wikipedia.org/wiki/" %>%
    paste0(year - 1, "-", substr(as.character(year), 3, 4), "_Premier_League") %>%
    read_html() %>%
    html_table() %>%
    getElement(5) %>%
    rename(Club = 2)  # rename the 2nd column, whatever it is called
}
It can also be done with names(), as in your example:
library(rvest)
library(dplyr)
read_prem_league <- function(year) {
dat <- "https://en.wikipedia.org/wiki/" %>%
paste0(year - 1, "-", substr(as.character(year), 3, 4), "_Premier_League") %>%
read_html() %>%
html_table() %>%
getElement(5)
names(dat)[2] <- "Club"
dat
}
Testing:
> read_prem_league(2015)
# A tibble: 20 × 11
Pos Club Pld W D L GF GA GD Pts `Qualification or relegation`
<int> <chr> <int> <int> <int> <int> <int> <int> <chr> <int> <chr>
1 1 Chelsea (C) 38 26 9 3 73 32 +41 87 "Qualification for the Champions League group stage"
2 2 Manchester City 38 24 7 7 83 38 +45 79 "Qualification for the Champions League group stage"
3 3 Arsenal 38 22 9 7 71 36 +35 75 "Qualification for the Champions League group stage"
4 4 Manchester United 38 20 10 8 62 37 +25 70 "Qualification for the Champions League play-off round"
5 5 Tottenham Hotspur 38 19 7 12 58 53 +5 64 "Qualification for the Europa League group stage[a]"
6 6 Liverpool 38 18 8 12 52 48 +4 62 "Qualification for the Europa League group stage[a]"
7 7 Southampton 38 18 6 14 54 33 +21 60 "Qualification for the Europa League third qualifying round[a]"
8 8 Swansea City 38 16 8 14 46 49 −3 56 ""
9 9 Stoke City 38 15 9 14 48 45 +3 54 ""
10 10 Crystal Palace 38 13 9 16 47 51 −4 48 ""
11 11 Everton 38 12 11 15 48 50 −2 47 ""
12 12 West Ham United 38 12 11 15 44 47 −3 47 "Qualification for the Europa League first qualifying round[b]"
13 13 West Bromwich Albion 38 11 11 16 38 51 −13 44 ""
14 14 Leicester City 38 11 8 19 46 55 −9 41 ""
15 15 Newcastle United 38 10 9 19 40 63 −23 39 ""
16 16 Sunderland 38 7 17 14 31 53 −22 38 ""
17 17 Aston Villa 38 10 8 20 31 57 −26 38 ""
18 18 Hull City (R) 38 8 11 19 33 51 −18 35 "Relegation to the Football League Championship"
19 19 Burnley (R) 38 7 12 19 28 53 −25 33 "Relegation to the Football League Championship"
20 20 Queens Park Rangers (R) 38 8 6 24 42 73 −31 30 "Relegation to the Football League Championship"
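To reuse the names(dat)[2] <- "Club" idea across several functions, one option is to put the renaming in a small helper and wrap any table-reading function with it. A minimal base-R sketch; rename_col(), with_club_column() and the offline stand-in read_other_league() (with a made-up column name "Squad") are all hypothetical:

```r
# Hypothetical helper: rename the column at position `index` to `new_name`,
# whatever its current name is
rename_col <- function(dat, index, new_name) {
  stopifnot(index >= 1, index <= ncol(dat))
  names(dat)[index] <- new_name
  dat
}

# Hypothetical wrapper: turn any table-reading function into one whose
# second column is always called "Club"
with_club_column <- function(f) {
  function(...) rename_col(f(...), 2, "Club")
}

# Offline stand-in for another reader whose team column has a different name
read_other_league <- function() {
  data.frame(Pos = 1:2, Squad = c("Chelsea (C)", "Manchester City"))
}

read_other_league_club <- with_club_column(read_other_league)
names(read_other_league_club())  # "Pos" "Club"
```

With the function from the question, the same wrapper would be used as read_prem_league_club <- with_club_column(read_prem_league).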

Sum Columns in a dataframe where the names match a vector list

I have a data frame made up largely of integer columns named after communities.
I have made lists of the community names grouped by region, like so:
RegionA <- c(a,c,d)
RegionB <- c(b,e,f)
RegionC <- c(g,h,i)
Year a b c d e f g h i `5`
<dbl> <int> <int> <int> <int> <int> <int> <int> <int> <int> <dbl>
1 2021 61 44 1 78 37 46 33 16 57 5
2 2020 60 54 60 2 72 59 60 34 60 5
3 2019 53 77 39 66 85 82 65 95 50 5
4 2018 78 20 63 26 41 29 19 82 46 5
5 2017 62 38 22 23 6 11 20 51 65 5
6 2021 39 15 38 74 90 83 73 12 71 5
7 2020 28 23 76 57 100 89 62 14 56 5
8 2019 82 48 40 45 93 72 40 45 29 5
9 2018 13 69 100 13 5 52 99 52 47 5
10 2017 92 13 13 96 98 17 46 49 74 5
I am trying to select the columns whose names are in each Region vector and sum them into new columns.
I have tried using
df <- df %>%
mutate(Region_A = rowSums(select(., colnames %in% RegionA)))
and
df <- df %>%
rowwise %>%
mutate(Region_A = sum(c_across(where(colnames %in% RegionA))))
with no success, getting this error:
Caused by error in `match()`:
! 'match' requires vector arguments
What could be the proper solution?
A possible solution:
library(dplyr)
RegionA <- c("a","c","d")
RegionB <- c("b","e","f")
RegionC <- c("g","h","i")
df %>%
rowwise %>%
mutate(RegionA = sum(c_across(all_of(RegionA))),
RegionB = sum(c_across(all_of(RegionB))),
RegionC = sum(c_across(all_of(RegionC)))) %>%
ungroup
#> # A tibble: 10 × 13
#> Year a b c d e f g h i RegionA RegionB
#> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
#> 1 2021 61 44 1 78 37 46 33 16 57 140 127
#> 2 2020 60 54 60 2 72 59 60 34 60 122 185
#> 3 2019 53 77 39 66 85 82 65 95 50 158 244
#> 4 2018 78 20 63 26 41 29 19 82 46 167 90
#> 5 2017 62 38 22 23 6 11 20 51 65 107 55
#> 6 2021 39 15 38 74 90 83 73 12 71 151 188
#> 7 2020 28 23 76 57 100 89 62 14 56 161 212
#> 8 2019 82 48 40 45 93 72 40 45 29 167 213
#> 9 2018 13 69 100 13 5 52 99 52 47 126 126
#> 10 2017 92 13 13 96 98 17 46 49 74 201 128
#> # … with 1 more variable: RegionC <int>
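A vectorised alternative that avoids rowwise() is rowSums() over the columns named in each region vector. A minimal base-R sketch, assuming the region vectors hold column names as strings (as in the answer above) and using the first two rows of the question's data; grouping the vectors in a named list called regions is a hypothetical choice:

```r
# First two rows of the question's data
df <- data.frame(
  Year = c(2021, 2020),
  a = c(61, 60), b = c(44, 54), c = c(1, 60),
  d = c(78, 2),  e = c(37, 72), f = c(46, 59),
  g = c(33, 60), h = c(16, 34), i = c(57, 60)
)

# Hypothetical grouping of column names by region
regions <- list(
  RegionA = c("a", "c", "d"),
  RegionB = c("b", "e", "f"),
  RegionC = c("g", "h", "i")
)

# df[cols] selects the named columns; rowSums() adds them row by row
for (region in names(regions)) {
  df[[region]] <- rowSums(df[regions[[region]]])
}
df[, c("Year", names(regions))]
```

In dplyr the same idea is mutate(RegionA = rowSums(across(all_of(RegionA)))), which also avoids the row-by-row evaluation of rowwise().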

R dataset not found?

I'm trying to load the dataset life.expectancy.1971, but I'm having trouble loading it. I'm entering:
data(life.expectancy.1971)
life.expectancy.1971
and keep getting the following error:
data set ‘life.expectancy.1971’ not found
Error: object 'life.expectancy.1971' not found
I'm still pretty new to R, so it could be a simple error on my part, but I haven't been able to figure out what's wrong, since the same commands have worked for loading other datasets. Can anyone help me figure out what I'm missing?
Pasting the answers from the comments to an answer so that the question can be closed.
Install the cluster.datasets package and its dependencies:
install.packages("cluster.datasets", dependencies = TRUE)
Load cluster.datasets:
library(cluster.datasets)
Load the dataset life.expectancy.1971:
data(life.expectancy.1971)
Look at the dataset:
life.expectancy.1971
#> country year m0 m25 m50 m75 f0 f25 f50 f75
#> 1 Algeria 1965 63 51 30 13 67 54 34 15
#> 2 Cameroon 1964 34 29 13 5 38 32 17 6
#> 3 Madagascar 1966 38 30 17 7 38 34 20 7
#> 4 Mauritius 1966 59 42 20 6 64 46 25 8
#> 5 Reunion 1963 56 38 18 7 62 46 25 10
#> 6 Seychelles 1960 62 44 24 7 69 50 28 14
#> 7 South Africa (Nonwhite) 1961 50 39 20 7 55 43 23 8
#> 8 South Africa (White) 1961 65 44 22 7 72 50 27 9
#> 9 Tunisia 1960 56 46 24 11 63 54 33 19
#> 10 Canada 1966 69 47 24 8 75 53 29 10
#> 11 Costa Rica 1966 65 48 26 9 68 50 27 10
#> 12 Dominican Republic 1966 64 50 28 11 66 51 29 11
#> 13 El Salvador 1961 56 44 25 10 61 48 27 12
#> 14 Greenland 1960 60 44 22 6 65 45 25 9
#> 15 Grenada 1961 61 45 22 8 65 49 27 10
#> 16 Guatemala 1964 49 40 22 9 51 41 23 8
#> 17 Honduras 1966 59 42 22 6 61 43 22 7
#> 18 Jamaica 1963 63 44 23 8 67 48 26 9
#> 19 Mexico 1966 59 44 24 8 63 46 25 8
#> 20 Nicaragua 1965 65 48 28 14 68 51 29 13
#> 21 Panama 1966 65 48 26 9 67 49 27 10
#> 22 Trinidad 1962 64 43 21 7 68 47 25 9
#> 23 Trinidad 1967 64 43 21 6 68 47 24 8
#> 24 US 1966 67 45 23 8 74 51 28 10
#> 25 US (Nonwhite) 1966 61 40 21 10 67 46 25 11
#> 26 US (White) 1966 68 46 23 8 75 52 29 10
#> 27 US 1967 67 45 23 8 74 51 28 10
#> 28 Argentina 1964 65 46 24 9 71 51 28 10
#> 29 Chile 1967 59 43 23 10 66 49 27 12
#> 30 Columbia 1965 58 44 24 9 62 47 25 10
#> 31 Ecuador 1965 57 46 25 9 60 49 28 11
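The underlying cause is that data() by default only searches packages on the search path (plus a data directory in the working directory), so a dataset from a package that is not installed or not attached is "not found". data() also accepts a package argument. A minimal sketch with a hypothetical helper load_dataset() that checks the package is installed and then loads the dataset into a fresh environment; it is demonstrated on iris from the base datasets package so it runs without installing anything:

```r
# Hypothetical helper: load dataset `name` from package `pkg`
# without attaching the package with library()
load_dataset <- function(name, pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop("package '", pkg, "' is not installed; try install.packages(\"", pkg, "\")")
  }
  env <- new.env()
  # With `package =`, data() looks in that package even if it is not attached
  data(list = name, package = pkg, envir = env)
  get(name, envir = env)
}

iris_df <- load_dataset("iris", "datasets")
dim(iris_df)  # 150 5

# For the question, assuming cluster.datasets is installed:
# life <- load_dataset("life.expectancy.1971", "cluster.datasets")
```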

What is the name and reason for the [1] at the output prompt?

What's the name for the [1] below?
What is its significance?
Is it always only [1]? If not, then under what conditions is it something else? (example please)
> bb <- c(5,6,7)
> bb
[1] 5 6 7
It is the index (position) of the first element printed on that line, not a count. In your case, the line starts with element 1:
> bb <- c(5,6,7)
> bb
# [1] 5 6 7
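Try a longer vector. The second line starts with [35] because element 35 is the first one printed on that line (where the line breaks depends on your console width):
1:50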
#[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34
#[35] 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
You can avoid the indices being displayed by using cat():
cat(1:50)
#1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
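To check that the bracketed number is a position rather than a count, you can print a vector whose values differ from their positions in a narrow console and compare each line's label with the first value on that line. A minimal sketch; the width of 40 is an arbitrary choice to force several lines:

```r
x <- 101:150

# Narrow the console so the vector wraps onto several lines, then restore it
old <- options(width = 40)
out <- capture.output(print(x))
options(old)
cat(out, sep = "\n")

# Each line starts with "[k]": k is the position of the first value on that line
label <- as.integer(sub("^ *\\[([0-9]+)\\].*$", "\\1", out))
first_value <- as.integer(sub("^ *\\[[0-9]+\\] +([0-9]+).*$", "\\1", out))
data.frame(label, first_value, x_at_label = x[label])
```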

Converting weekly data to yearly data

I am new to R and can't work out how to aggregate my weekly data to yearly data:
Week obs
1 2004-01-04 23
2 2004-01-11 36
3 2004-01-18 18
4 2004-01-25 26
5 2004-02-01 17
6 2004-02-08 17
7 2004-02-15 26
8 2004-02-22 21
9 2004-02-29 34
10 2004-03-07 21
11 2004-03-14 30
12 2004-03-21 31
13 2004-03-28 31
14 2004-04-04 38
15 2004-04-11 14
16 2004-04-18 16
17 2004-04-25 44
18 2004-05-02 17
19 2004-05-09 43
20 2004-05-16 31
21 2004-05-23 31
22 2004-05-30 33
23 2004-06-06 13
24 2004-06-13 13
25 2004-06-20 46
26 2004-06-27 34
27 2004-07-04 27
28 2004-07-11 24
29 2004-07-18 20
30 2004-07-25 29
31 2004-08-01 29
32 2004-08-08 12
33 2004-08-15 16
34 2004-08-22 26
35 2004-08-29 29
36 2004-09-05 27
37 2004-09-12 8
38 2004-09-19 18
39 2004-09-26 14
40 2004-10-03 25
41 2004-10-10 26
42 2004-10-17 11
43 2004-10-24 24
44 2004-10-31 17
45 2004-11-07 11
46 2004-11-14 19
47 2004-11-21 8
48 2004-11-28 16
49 2004-12-05 19
50 2004-12-12 14
51 2004-12-19 13
52 2004-12-26 29
I just want to keep the yearly total:
2004 1215
Using data.table, assuming df$Week is of class Date:
library(data.table)
setDT(df)[, .(obs = sum(obs)), by = .(year = year(Week))]
# year obs
#1: 2004 1215
In base R:
aggregate(df$obs, list(year = format(df$Week, '%Y')), sum)
# year x
# 1 2004 1215
Or with lubridate:
library(lubridate)
aggregate(df$obs, list(year = year(df$Week)), sum)
# year x
# 1 2004 1215
Or with lubridate and dplyr:
library(dplyr)
df %>% group_by(year = year(Week)) %>% summarise(obs = sum(obs))
# Source: local data frame [1 x 2]
#
# year obs
# (dbl) (int)
# 1 2004 1215
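All of these answers assume Week is already a Date. If it was read in as text (read.csv() returns character columns by default since R 4.0.0), convert it with as.Date() first. A minimal base-R sketch using the first four weeks of the question's data:

```r
# First four rows of the question's data, with Week stored as text
df <- data.frame(
  Week = c("2004-01-04", "2004-01-11", "2004-01-18", "2004-01-25"),
  obs  = c(23, 36, 18, 26)
)

# Convert to Date so the year can be extracted reliably
df$Week <- as.Date(df$Week)
df$year <- as.integer(format(df$Week, "%Y"))

# Sum obs within each year
yearly <- aggregate(obs ~ year, data = df, FUN = sum)
yearly
```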
