How to change the name of a column within a function? [duplicate] - r

This is a generic question about functions.
Say I have the following function. I found the code in an earlier thread from today: Add a column to function with fixed variable
read_prem_league <- function(year) {
"https://en.wikipedia.org/wiki/" %>%
paste0(year - 1, "-", substr(as.character(year), 3, 4), "_Premier_League") %>%
read_html() %>%
html_table() %>%
getElement(5)
}
read_prem_league(2015)
Which generates the following tibble:
#> # A tibble: 20 x 11
#> Pos Team Pld W D L GF GA GD Pts
#> <int> <chr> <int> <int> <int> <int> <int> <int> <chr> <int>
#> 1 1 Manchester City (C) 38 27 5 6 83 32 +51 86
#> 2 2 Manchester United 38 21 11 6 73 44 +29 74
#> 3 3 Liverpool 38 20 9 9 68 42 +26 69
#> 4 4 Chelsea 38 19 10 9 58 36 +22 67
#> 5 5 Leicester City 38 20 6 12 68 50 +18 66
#> 6 6 West Ham United 38 19 8 11 62 47 +15 65
#> 7 7 Tottenham Hotspur 38 18 8 12 68 45 +23 62
#> 8 8 Arsenal 38 18 7 13 55 39 +16 61
#> 9 9 Leeds United 38 18 5 15 62 54 +8 59
#> 10 10 Everton 38 17 8 13 47 48 -1 59
#> 11 11 Aston Villa 38 16 7 15 55 46 +9 55
#> 12 12 Newcastle United 38 12 9 17 46 62 -16 45
#> 13 13 Wolverhampton Wande~ 38 12 9 17 36 52 -16 45
#> 14 14 Crystal Palace 38 12 8 18 41 66 -25 44
#> 15 15 Southampton 38 12 7 19 47 68 -21 43
#> 16 16 Brighton & Hove Alb~ 38 9 14 15 40 46 -6 41
#> 17 17 Burnley 38 10 9 19 33 55 -22 39
#> 18 18 Fulham (R) 38 5 13 20 27 53 -26 28
#> 19 19 West Bromwich Albio~ 38 5 11 22 35 76 -41 26
#> 20 20 Sheffield United (R) 38 7 2 29 20 63 -43 23
#> # ... with 1 more variable: `Qualification or relegation` <chr>
I would like to rename the Team column to Club so that it is always called Club. I want general code that works on column 2 in other functions as well, because there are functions where the data are the same but the column names differ (and I want a single column name).
Something similar to the code below, which was given in a previous answer, is what I'm looking for:
dat <- read.csv(url)
names(dat)[2] <- "year"
dat

You can rename by index:
library(rvest)
library(dplyr)
read_prem_league <- function(year) {
  "https://en.wikipedia.org/wiki/" %>%
    paste0(year - 1, "-", substr(as.character(year), 3, 4), "_Premier_League") %>%
    read_html() %>%
    html_table() %>%
    getElement(5) %>%
    rename(Club = 2)  # rename whatever is in column 2 to Club
}

It can be done as
library(rvest)
library(dplyr)
read_prem_league <- function(year) {
dat <- "https://en.wikipedia.org/wiki/" %>%
paste0(year - 1, "-", substr(as.character(year), 3, 4), "_Premier_League") %>%
read_html() %>%
html_table() %>%
getElement(5)
names(dat)[2] <- "Club"
dat
}
-testing
> read_prem_league(2015)
# A tibble: 20 × 11
Pos Club Pld W D L GF GA GD Pts `Qualification or relegation`
<int> <chr> <int> <int> <int> <int> <int> <int> <chr> <int> <chr>
1 1 Chelsea (C) 38 26 9 3 73 32 +41 87 "Qualification for the Champions League group stage"
2 2 Manchester City 38 24 7 7 83 38 +45 79 "Qualification for the Champions League group stage"
3 3 Arsenal 38 22 9 7 71 36 +35 75 "Qualification for the Champions League group stage"
4 4 Manchester United 38 20 10 8 62 37 +25 70 "Qualification for the Champions League play-off round"
5 5 Tottenham Hotspur 38 19 7 12 58 53 +5 64 "Qualification for the Europa League group stage[a]"
6 6 Liverpool 38 18 8 12 52 48 +4 62 "Qualification for the Europa League group stage[a]"
7 7 Southampton 38 18 6 14 54 33 +21 60 "Qualification for the Europa League third qualifying round[a]"
8 8 Swansea City 38 16 8 14 46 49 −3 56 ""
9 9 Stoke City 38 15 9 14 48 45 +3 54 ""
10 10 Crystal Palace 38 13 9 16 47 51 −4 48 ""
11 11 Everton 38 12 11 15 48 50 −2 47 ""
12 12 West Ham United 38 12 11 15 44 47 −3 47 "Qualification for the Europa League first qualifying round[b]"
13 13 West Bromwich Albion 38 11 11 16 38 51 −13 44 ""
14 14 Leicester City 38 11 8 19 46 55 −9 41 ""
15 15 Newcastle United 38 10 9 19 40 63 −23 39 ""
16 16 Sunderland 38 7 17 14 31 53 −22 38 ""
17 17 Aston Villa 38 10 8 20 31 57 −26 38 ""
18 18 Hull City (R) 38 8 11 19 33 51 −18 35 "Relegation to the Football League Championship"
19 19 Burnley (R) 38 7 12 19 28 53 −25 33 "Relegation to the Football League Championship"
20 20 Queens Park Rangers (R) 38 8 6 24 42 73 −31 30 "Relegation to the Football League Championship"
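Since the goal is one piece of code that works for column 2 in several scraping functions, the rename step can be pulled into a small helper. The sketch below uses base R only; `rename_col_at` and the toy `standings` data frame are hypothetical names used for illustration and are not part of the answers above.

```r
# Hypothetical helper: rename the column at position `index` to `new_name`.
rename_col_at <- function(dat, index, new_name) {
  # Fail early if the table has fewer columns than expected
  stopifnot(index >= 1, index <= ncol(dat))
  names(dat)[index] <- new_name
  dat
}

# Toy stand-in for a scraped league table (no network access needed)
standings <- data.frame(Pos = 1:2, Team = c("Chelsea", "Manchester City"))
renamed <- rename_col_at(standings, 2, "Club")
names(renamed)
```

Because the data frame is the first argument, the helper can also sit at the end of a pipe, for example `getElement(5) %>% rename_col_at(2, "Club")`.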

Related

Add a column to function with fixed variable

I have this code as a function which generates the table of a Premier League season from Wiki.
read_prem_league <- function(year) {
"https://en.wikipedia.org/wiki/" %>%
paste0(year - 1, "-", substr(as.character(year), 3, 4), "_Premier_League") %>%
read_html() %>%
html_table() %>%
getElement(5)
}
read_prem_league(2021)
Which creates the following tibble:
#> # A tibble: 20 x 11
#> Pos Team Pld W D L GF GA GD Pts
#> <int> <chr> <int> <int> <int> <int> <int> <int> <chr> <int>
#> 1 1 Manchester City (C) 38 27 5 6 83 32 +51 86
#> 2 2 Manchester United 38 21 11 6 73 44 +29 74
#> 3 3 Liverpool 38 20 9 9 68 42 +26 69
#> 4 4 Chelsea 38 19 10 9 58 36 +22 67
#> 5 5 Leicester City 38 20 6 12 68 50 +18 66
#> 6 6 West Ham United 38 19 8 11 62 47 +15 65
#> 7 7 Tottenham Hotspur 38 18 8 12 68 45 +23 62
#> 8 8 Arsenal 38 18 7 13 55 39 +16 61
#> 9 9 Leeds United 38 18 5 15 62 54 +8 59
#> 10 10 Everton 38 17 8 13 47 48 -1 59
#> 11 11 Aston Villa 38 16 7 15 55 46 +9 55
#> 12 12 Newcastle United 38 12 9 17 46 62 -16 45
#> 13 13 Wolverhampton Wande~ 38 12 9 17 36 52 -16 45
#> 14 14 Crystal Palace 38 12 8 18 41 66 -25 44
#> 15 15 Southampton 38 12 7 19 47 68 -21 43
#> 16 16 Brighton & Hove Alb~ 38 9 14 15 40 46 -6 41
#> 17 17 Burnley 38 10 9 19 33 55 -22 39
#> 18 18 Fulham (R) 38 5 13 20 27 53 -26 28
#> 19 19 West Bromwich Albio~ 38 5 11 22 35 76 -41 26
#> 20 20 Sheffield United (R) 38 7 2 29 20 63 -43 23
#> # ... with 1 more variable: `Qualification or relegation` <chr>
What I would like to do is add a column called Season to the left of Pos which shows the season, so if it's the season ending in 2020 I want it to say 2019-20.
read_prem_league$Season <- (year)
The above code should work and I want to put it within the function. However, I get the error: Error in View : object of type 'closure' is not subsettable
We may use mutate
library(dplyr)
library(rvest)
read_prem_league <- function(year) {
"https://en.wikipedia.org/wiki/" %>%
paste0(year - 1, "-", substr(as.character(year), 3, 4),
"_Premier_League") %>%
read_html() %>%
html_table() %>%
getElement(5) %>%
dplyr::mutate(Season = year, .before = Pos)
}
-testing
> dat <- read_prem_league(2021)
> dat
# A tibble: 20 × 12
Season Pos Team Pld W D L GF GA GD Pts `Qualification or relegation`
<dbl> <int> <chr> <int> <int> <int> <int> <int> <int> <chr> <int> <chr>
1 2021 1 Manchester City (C) 38 27 5 6 83 32 +51 86 "Qualification for the Champions League group stage"
2 2021 2 Manchester United 38 21 11 6 73 44 +29 74 "Qualification for the Champions League group stage"
3 2021 3 Liverpool 38 20 9 9 68 42 +26 69 "Qualification for the Champions League group stage"
4 2021 4 Chelsea 38 19 10 9 58 36 +22 67 "Qualification for the Champions League group stage"
5 2021 5 Leicester City 38 20 6 12 68 50 +18 66 "Qualification for the Europa League group stage[a]"
6 2021 6 West Ham United 38 19 8 11 62 47 +15 65 "Qualification for the Europa League group stage[a]"
7 2021 7 Tottenham Hotspur 38 18 8 12 68 45 +23 62 "Qualification for the Europa Conference League play-off round[b]"
8 2021 8 Arsenal 38 18 7 13 55 39 +16 61 ""
9 2021 9 Leeds United 38 18 5 15 62 54 +8 59 ""
10 2021 10 Everton 38 17 8 13 47 48 −1 59 ""
11 2021 11 Aston Villa 38 16 7 15 55 46 +9 55 ""
12 2021 12 Newcastle United 38 12 9 17 46 62 −16 45 ""
13 2021 13 Wolverhampton Wanderers 38 12 9 17 36 52 −16 45 ""
14 2021 14 Crystal Palace 38 12 8 18 41 66 −25 44 ""
15 2021 15 Southampton 38 12 7 19 47 68 −21 43 ""
16 2021 16 Brighton & Hove Albion 38 9 14 15 40 46 −6 41 ""
17 2021 17 Burnley 38 10 9 19 33 55 −22 39 ""
18 2021 18 Fulham (R) 38 5 13 20 27 53 −26 28 "Relegation to the EFL Championship"
19 2021 19 West Bromwich Albion (R) 38 5 11 22 35 76 −41 26 "Relegation to the EFL Championship"
20 2021 20 Sheffield United (R) 38 7 2 29 20 63 −43 23 "Relegation to the EFL Championship"
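The output above stores the plain year (2021) in Season, whereas the question asked for a label such as 2019-20. The error in the question arises because `read_prem_league` is a function (a closure), so `$` cannot be applied to it; the column has to be added to the data frame the function builds. A minimal base R sketch of both points, where `season_label`, `add_season` and the toy `standings` table are hypothetical names used only for illustration:

```r
# Hypothetical helper: build a label like "2020-21" from the end year.
season_label <- function(year) {
  paste0(year - 1, "-", substr(as.character(year), 3, 4))
}

# Hypothetical helper: add Season as the first column of a data frame.
add_season <- function(dat, year) {
  dat$Season <- season_label(year)
  # Move Season to the front, keep the other columns in their order
  dat[c("Season", setdiff(names(dat), "Season"))]
}

# Toy stand-in for the scraped table (no network access needed)
standings <- data.frame(Pos = 1:2,
                        Team = c("Manchester City", "Manchester United"))
with_season <- add_season(standings, 2021)
with_season
```

With dplyr, the same label could be passed to the `mutate()` call shown above, for example `mutate(Season = season_label(year), .before = Pos)`.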

Sum Columns in a dataframe where the names match a vector list

I have a dataframe made up largely of integers and community names.
I have made a list of the community names grouped by their regions like so;
RegionA <- c(a,c,d)
RegionB <- c(b,e,f)
RegionC <- c(g,h,i)
Year a b c d e f g h i `5`
<dbl> <int> <int> <int> <int> <int> <int> <int> <int> <int> <dbl>
1 2021 61 44 1 78 37 46 33 16 57 5
2 2020 60 54 60 2 72 59 60 34 60 5
3 2019 53 77 39 66 85 82 65 95 50 5
4 2018 78 20 63 26 41 29 19 82 46 5
5 2017 62 38 22 23 6 11 20 51 65 5
6 2021 39 15 38 74 90 83 73 12 71 5
7 2020 28 23 76 57 100 89 62 14 56 5
8 2019 82 48 40 45 93 72 40 45 29 5
9 2018 13 69 100 13 5 52 99 52 47 5
10 2017 92 13 13 96 98 17 46 49 74 5
I am trying to select the columns named in each Region vector and sum them in a new column.
I have tried using
df <- df %>%
mutate(Region_A = rowSums(select(., colnames %in% RegionA)))
and
df <- df %>%
rowwise %>%
mutate(Region_A = sum(c_across(where(colnames %in% RegionA))))
with no success, getting this error
Caused by error in `match()`:
! 'match' requires vector arguments
What could be the proper solution?
A possible solution:
library(dplyr)
RegionA <- c("a","c","d")
RegionB <- c("b","e","f")
RegionC <- c("g","h","i")
df %>%
rowwise %>%
mutate(RegionA = sum(c_across(all_of(RegionA))),
RegionB = sum(c_across(all_of(RegionB))),
RegionC = sum(c_across(all_of(RegionC)))) %>%
ungroup
#> # A tibble: 10 × 13
#> Year a b c d e f g h i RegionA RegionB
#> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
#> 1 2021 61 44 1 78 37 46 33 16 57 140 127
#> 2 2020 60 54 60 2 72 59 60 34 60 122 185
#> 3 2019 53 77 39 66 85 82 65 95 50 158 244
#> 4 2018 78 20 63 26 41 29 19 82 46 167 90
#> 5 2017 62 38 22 23 6 11 20 51 65 107 55
#> 6 2021 39 15 38 74 90 83 73 12 71 151 188
#> 7 2020 28 23 76 57 100 89 62 14 56 161 212
#> 8 2019 82 48 40 45 93 72 40 45 29 167 213
#> 9 2018 13 69 100 13 5 52 99 52 47 126 126
#> 10 2017 92 13 13 96 98 17 46 49 74 201 128
#> # … with 1 more variable: RegionC <int>
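For comparison, the same row sums can be computed in base R with `rowSums()`, which avoids the row-by-row evaluation of `rowwise()`. This is a sketch under assumptions: the data frame below holds only the first two rows of the question's data, and keeping the regions in a named list called `regions` is an illustrative choice, not something from the original post.

```r
# First two rows of the question's data, retyped for a self-contained example
df <- data.frame(
  Year = c(2021, 2020),
  a = c(61, 60), b = c(44, 54), c = c(1, 60),
  d = c(78, 2),  e = c(37, 72), f = c(46, 59),
  g = c(33, 60), h = c(16, 34), i = c(57, 60)
)

# Region definitions as character vectors of column names
regions <- list(
  RegionA = c("a", "c", "d"),
  RegionB = c("b", "e", "f"),
  RegionC = c("g", "h", "i")
)

# One new column per region, holding the sum of that region's columns
for (region in names(regions)) {
  df[[region]] <- rowSums(df[regions[[region]]])
}
df[c("Year", names(regions))]
```

Keeping the regions in a list means a new region only needs a new list entry rather than another `mutate()` line.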

R plot numbers of factor levels having n, n+1, .... counts

I have a very large dataset (> 200000 lines) with 6 variables (only the first two shown)
>head(gt7)
ChromKey POS
1 2447 25
2 2447 183
3 26341 75
4 26341 2213
5 26341 2617
6 54011 1868
I have converted the ChromKey variable to a factor variable made up of > 55000 levels.
> gt7[1] <- lapply(gt7[1], factor)
> is.factor(gt7$ChromKey)
[1] TRUE
I can further make a table with counts of ChromKey levels
> table(gt7$ChromKey)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
88 88 44 33 11 11 33 22 121 11 22 11 11 11 22 11 33
18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34
22 22 44 55 22 11 22 66 11 11 11 22 11 11 11 187 77
35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51
77 11 44 11 11 11 11 11 11 22 66 11 22 11 44 22 22
... output cropped
Which I can save in table format
> table <- table(gt7$ChromKey)
> head(table)
1 2 3 4 5 6
88 88 44 33 11 11
I would like to know whether it is possible to get a table (and histogram) of the number of levels that have each specific count. From the example above, I would expect
88 44 33 11
2 1 1 2
I would very much appreciate any hint.
We can apply table again on the output to get the frequency count of the frequency
table(table(gt7$ChromKey))
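A small self-contained sketch of the nested `table()` call, using a made-up `keys` vector that reproduces the six counts shown in the question (88, 88, 44, 33, 11, 11) rather than the real `gt7` data. The `barplot()` call is one assumed way to draw the requested histogram-style plot.

```r
# Made-up key column: level 1 occurs 88 times, level 2 88 times, and so on
keys <- rep(1:6, times = c(88, 88, 44, 33, 11, 11))

# First table: how many rows each level has
level_counts <- table(keys)

# Second table: how many levels share each row count
count_freq <- table(level_counts)
count_freq

# Bar chart of the frequency of frequencies
barplot(count_freq,
        xlab = "Rows per ChromKey level",
        ylab = "Number of levels")
```

The result is ordered by ascending count (11, 33, 44, 88), so it can be wrapped in `rev()` if the descending order from the question is preferred.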

R dataset not found?

I'm trying to load the dataset life.expectancy.1971, but seem to have trouble loading it. I'm inputting
data(life.expectancy.1971)
life.expectancy.1971
and keep getting the following error:
data set 'life.expectancy.1971' not found
Error: object 'life.expectancy.1971' not found
I'm still pretty new to R so it could be a simple error on my part, but I haven't been able to figure out what's wrong since that has worked for loading other datasets. Can anyone help me figure out what I'm missing?
Pasting the answers from the comments to an answer so that the question can be closed.
The dataset is not part of base R; it ships with the cluster.datasets package. Install that package and its dependencies:
install.packages(c("cluster.datasets"), dependencies = TRUE)
Load cluster.datasets:
library(cluster.datasets)
Load the dataset life.expectancy.1971:
data(life.expectancy.1971)
Look at the dataset life.expectancy.1971:
life.expectancy.1971
#> country year m0 m25 m50 m75 f0 f25 f50 f75
#> 1 Algeria 1965 63 51 30 13 67 54 34 15
#> 2 Cameroon 1964 34 29 13 5 38 32 17 6
#> 3 Madagascar 1966 38 30 17 7 38 34 20 7
#> 4 Mauritius 1966 59 42 20 6 64 46 25 8
#> 5 Reunion 1963 56 38 18 7 62 46 25 10
#> 6 Seychelles 1960 62 44 24 7 69 50 28 14
#> 7 South Africa (Nonwhite) 1961 50 39 20 7 55 43 23 8
#> 8 South Africa (White) 1961 65 44 22 7 72 50 27 9
#> 9 Tunisia 1960 56 46 24 11 63 54 33 19
#> 10 Canada 1966 69 47 24 8 75 53 29 10
#> 11 Costa Rica 1966 65 48 26 9 68 50 27 10
#> 12 Dominican Republic 1966 64 50 28 11 66 51 29 11
#> 13 El Salvador 1961 56 44 25 10 61 48 27 12
#> 14 Greenland 1960 60 44 22 6 65 45 25 9
#> 15 Grenada 1961 61 45 22 8 65 49 27 10
#> 16 Guatemala 1964 49 40 22 9 51 41 23 8
#> 17 Honduras 1966 59 42 22 6 61 43 22 7
#> 18 Jamaica 1963 63 44 23 8 67 48 26 9
#> 19 Mexico 1966 59 44 24 8 63 46 25 8
#> 20 Nicaragua 1965 65 48 28 14 68 51 29 13
#> 21 Panama 1966 65 48 26 9 67 49 27 10
#> 22 Trinidad 1962 64 43 21 7 68 47 25 9
#> 23 Trinidad 1967 64 43 21 6 68 47 24 8
#> 24 US 1966 67 45 23 8 74 51 28 10
#> 25 US (Nonwhite) 1966 61 40 21 10 67 46 25 11
#> 26 US (White) 1966 68 46 23 8 75 52 29 10
#> 27 US 1967 67 45 23 8 74 51 28 10
#> 28 Argentina 1964 65 46 24 9 71 51 28 10
#> 29 Chile 1967 59 43 23 10 66 49 27 12
#> 30 Columbia 1965 58 44 24 9 62 47 25 10
#> 31 Ecuador 1965 57 46 25 9 60 49 28 11
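When `data()` reports that a data set is not found, it can help to check which package actually provides it. The sketch below wraps that check in a hypothetical helper, `dataset_available`, and demonstrates it with the built-in datasets package so it runs without installing anything. It matches names exactly, so entries that `data()` lists in the form "name (file)" would not be matched.

```r
# Hypothetical helper: is a data set called `name` listed by `package`?
dataset_available <- function(name, package) {
  # A package that is not installed cannot provide the data set
  if (!requireNamespace(package, quietly = TRUE)) {
    return(FALSE)
  }
  # data(package = ...) returns a matrix with an "Item" column of names
  items <- data(package = package)$results[, "Item"]
  name %in% items
}

dataset_available("mtcars", "datasets")                # a base data set
dataset_available("life.expectancy.1971", "datasets")  # not in base R
```

Once cluster.datasets is installed, the same call with `package = "cluster.datasets"` can be used to check for life.expectancy.1971 before calling `data()`.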

Converting weekly data to yearly data

I am new to R and I can't work out how to aggregate my weekly data to yearly data.
Week obs
1 2004-01-04 23
2 2004-01-11 36
3 2004-01-18 18
4 2004-01-25 26
5 2004-02-01 17
6 2004-02-08 17
7 2004-02-15 26
8 2004-02-22 21
9 2004-02-29 34
10 2004-03-07 21
11 2004-03-14 30
12 2004-03-21 31
13 2004-03-28 31
14 2004-04-04 38
15 2004-04-11 14
16 2004-04-18 16
17 2004-04-25 44
18 2004-05-02 17
19 2004-05-09 43
20 2004-05-16 31
21 2004-05-23 31
22 2004-05-30 33
23 2004-06-06 13
24 2004-06-13 13
25 2004-06-20 46
26 2004-06-27 34
27 2004-07-04 27
28 2004-07-11 24
29 2004-07-18 20
30 2004-07-25 29
31 2004-08-01 29
32 2004-08-08 12
33 2004-08-15 16
34 2004-08-22 26
35 2004-08-29 29
36 2004-09-05 27
37 2004-09-12 8
38 2004-09-19 18
39 2004-09-26 14
40 2004-10-03 25
41 2004-10-10 26
42 2004-10-17 11
43 2004-10-24 24
44 2004-10-31 17
45 2004-11-07 11
46 2004-11-14 19
47 2004-11-21 8
48 2004-11-28 16
49 2004-12-05 19
50 2004-12-12 14
51 2004-12-19 13
52 2004-12-26 29
I want to just retain
2004 1215
Using data.table, given that df$Week is of class Date:
library(data.table)
setDT(df)[, .(abs = sum(obs)), by = .(year = year(Week))]
# year abs
#1: 2004 1215
In base R,
aggregate(df$obs, list(year = format(df$Week, '%Y')), sum)
# year x
# 1 2004 1215
or with lubridate
library(lubridate)
aggregate(df$obs, list(year = year(df$Week)), sum)
# year x
# 1 2004 1215
or with lubridate and dplyr
library(dplyr)
df %>% group_by(year = year(Week)) %>% summarise(obs = sum(obs))
# Source: local data frame [1 x 2]
#
# year obs
# (dbl) (int)
# 1 2004 1215
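All of the approaches above assume that Week is already a Date. If it was read in as text, it needs converting first. A minimal base R sketch, using the first two weeks from the question plus one made-up 2005 row (an assumption added so that more than one year appears in the result):

```r
# Week stored as text, as it often is after reading a CSV
df <- data.frame(
  Week = c("2004-01-04", "2004-01-11", "2005-01-02"),  # last row is made up
  obs  = c(23, 36, 10)
)

# Convert to Date, then derive the year used for grouping
df$Week <- as.Date(df$Week)
df$year <- format(df$Week, "%Y")

# Sum the observations within each year
yearly <- aggregate(obs ~ year, data = df, FUN = sum)
yearly
```

The formula interface keeps the original column name (obs) in the result, instead of the generic x produced by `aggregate(df$obs, ...)`.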
