Webscraping in R - commented-out table [duplicate]

This question already has an answer here:
Not able to scrape a second table within a page using rvest
(1 answer)
Closed 4 years ago.
I'm trying to webscrape the final table in https://www.baseball-reference.com/leagues/MLB/2015-standings.shtml
i.e. the "MLB Detailed Standings"
My R code is as follows:
library(XML)
library(httr)
library(plyr)
library(stringr)
url <- paste0("http://www.baseball-reference.com/leagues/MLB/", 2015, "-standings.shtml")
tab <- GET(url)
data <- readHTMLTable(rawToChar(tab$content))
However, it does not seem to pick up the table I want. Looking at the source code, it seems as though the table is commented out somehow?
Any help would be great

From the answer MrFlick linked:
library(xml2)
library(tidyverse)
library(rvest)
page <- xml2::read_html("https://www.baseball-reference.com/leagues/MLB/2015-standings.shtml")
alt_tables <- xml2::xml_find_all(page, "//comment()") %>% {
  # Find only commented nodes that contain the regex for html table markup
  raw_parts <- as.character(.[grep("\\</?table", as.character(.))])
  # Remove the comment begin and end tags
  strip_html <- stringi::stri_replace_all_regex(raw_parts, c("<\\!--", "-->"), c("", ""),
                                                vectorize_all = FALSE)
  # Loop through the pieces that have tables within markup and
  # apply the same functions
  lapply(grep("<table", strip_html, value = TRUE), function(i){
    rvest::html_table(xml_find_all(read_html(i), "//table")) %>%
      .[[1]]
  })
}
tbl <- alt_tables[[2]]
tbl <- as_tibble(tbl)
tbl
# A tibble: 31 x 23
Rk Tm Lg G W L `W-L%` R RA Rdiff SOS SRS pythWL Luck Inter Home Road ExInn
<int> <chr> <chr> <int> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <int> <chr> <chr> <chr> <chr>
1 1 STL NL 162 100 62 0.617 4 3.2 0.8 -0.3 0.5 96-66 4 11-9 55-26 45-36 8-8
2 2 PIT NL 162 98 64 0.605 4.3 3.7 0.6 -0.3 0.3 93-69 5 13-7 53-28 45-36 12-9
3 3 CHC NL 162 97 65 0.599 4.3 3.8 0.5 -0.3 0.2 90-72 7 10-10 49-32 48-33 13-5
4 4 KCR AL 162 95 67 0.586 4.5 4 0.5 0.2 0.7 90-72 5 13-7 51-30 44-37 10-6
5 5 TOR AL 162 93 69 0.574 5.5 4.1 1.4 0.2 1.6 102-60 -9 12-8 53-28 40-41 8-6
6 6 LAD NL 162 92 70 0.568 4.1 3.7 0.4 -0.3 0.1 89-73 3 10-10 55-26 37-44 6-9
7 7 NYM NL 162 90 72 0.556 4.2 3.8 0.4 -0.4 0 89-73 1 9-11 49-32 41-40 9-6
8 8 TEX AL 162 88 74 0.543 4.6 4.5 0.1 0.2 0.4 83-79 5 11-9 43-38 45-36 5-4
9 9 NYY AL 162 87 75 0.537 4.7 4.3 0.4 0.3 0.8 88-74 -1 11-9 45-36 42-39 4-9
10 10 HOU AL 162 86 76 0.531 4.5 3.8 0.7 0.2 0.9 93-69 -7 16-4 53-28 33-48 8-6
# ... with 21 more rows, and 5 more variables: `1Run` <chr>, vRHP <chr>, vLHP <chr>, `≥.500` <chr>, `<.500` <chr>
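A shorter way to get at the same commented-out tables (a sketch of the same idea, not taken verbatim from the linked answer) is to grab every comment node, re-parse its text as HTML, and let html_table() collect whatever tables it finds; you then pick the "MLB Detailed Standings" element out of the resulting list:
library(rvest)

page <- read_html("https://www.baseball-reference.com/leagues/MLB/2015-standings.shtml")

# re-parse the text of all comment nodes as HTML and collect any tables inside
hidden_tables <- page %>%
  html_nodes(xpath = "//comment()") %>%
  html_text() %>%
  paste(collapse = "") %>%
  read_html() %>%
  html_table()

# the "MLB Detailed Standings" table is one of the elements of this list
str(hidden_tables, max.level = 1)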

Related

Iterate over an xpath (string) in R for data scraping

I've got some (pretty simple) code to download a table of data:
library(rvest)
link = "https://hosted.dcd.shared.geniussports.com/fubb/es/competition/34409/team/2442/statistics"
aguada = read_html(link)
stats = aguada %>% html_nodes("tbody")
stats = aguada %>% html_nodes(xpath="/html/body/div[1]/div[6]/div/div/div/div[4]/table") %>% html_table()
my_df <- as.data.frame(stats)
And now I'm trying to do the same, but for the URL of each player in the same table:
for (i in 1:17){
url_path="/html/body/div[1]/div[6]/div/div/div/div[4]/table/tbody/tr[i]/td[1]/a"
jugador[i] = aguada %>% html_nodes(xpath=url_path)%>% html_attr("href")
}
I've tried the code above; while it doesn't crash, it doesn't work as intended either. I want to create a vector with the URLs (or something like that) so I can then get the stats for each player easily. While we're at it, I'd also like to know whether, instead of writing 1:17 in the for and manually counting the players, there's a way to automate that too, so I can do something like for i in 1:table_length.
You need to initialise the vector jugador to be able to append the links to it. Also, when you build a path that involves changing a character within it, paste() concatenates the strings with the number i to create the path, as shown below:
jugador <- vector()
for (i in 1:17) {
  url_path <- paste("/html/body/div[1]/div[6]/div/div/div/div[4]/table/tbody/tr[", i, "]/td[1]/a", sep = "")
  jugador[i] <- aguada %>% html_nodes(xpath = url_path) %>% html_attr("href")
}
Result:
> jugador
[1] "https://hosted.dcd.shared.geniussports.com/fubb/es/competition/34409/person/15257?"
[2] "https://hosted.dcd.shared.geniussports.com/fubb/es/competition/34409/person/17101?"
[3] "https://hosted.dcd.shared.geniussports.com/fubb/es/competition/34409/person/17554?"
[4] "https://hosted.dcd.shared.geniussports.com/fubb/es/competition/34409/person/43225?"
[5] "https://hosted.dcd.shared.geniussports.com/fubb/es/competition/34409/person/262286?"
[6] "https://hosted.dcd.shared.geniussports.com/fubb/es/competition/34409/person/623893?"
[7] "https://hosted.dcd.shared.geniussports.com/fubb/es/competition/34409/person/725720?"
[8] "https://hosted.dcd.shared.geniussports.com/fubb/es/competition/34409/person/858052?"
[9] "https://hosted.dcd.shared.geniussports.com/fubb/es/competition/34409/person/1645559?"
[10] "https://hosted.dcd.shared.geniussports.com/fubb/es/competition/34409/person/1651515?"
[11] "https://hosted.dcd.shared.geniussports.com/fubb/es/competition/34409/person/1717089?"
[12] "https://hosted.dcd.shared.geniussports.com/fubb/es/competition/34409/person/1924883?"
[13] "https://hosted.dcd.shared.geniussports.com/fubb/es/competition/34409/person/1924884?"
[14] "https://hosted.dcd.shared.geniussports.com/fubb/es/competition/34409/person/1931124?"
[15] "https://hosted.dcd.shared.geniussports.com/fubb/es/competition/34409/person/1950388?"
[16] "https://hosted.dcd.shared.geniussports.com/fubb/es/competition/34409/person/1971299?"
[17] "https://hosted.dcd.shared.geniussports.com/fubb/es/competition/34409/person/1991297?"
The links end up in the last column. No loop needed:
library(tidyverse)
library(rvest)
page <-
  "https://hosted.dcd.shared.geniussports.com/fubb/es/competition/34409/team/2442/statistics" %>%
  read_html()

df <- page %>%
  html_table() %>%
  pluck(1) %>%
  janitor::clean_names() %>%
  mutate(link = page %>%
           html_elements("td a") %>%
           html_attr("href") %>%
           unique())
# A tibble: 17 x 21
jugador p i pts_pr pts as_pr as ro_pr rd_pr rt_pr rt bl_prom bl re_pr re min_pr tc_percent x2p_percent x3p_percent tl_percent link$value
<chr> <int> <int> <dbl> <int> <dbl> <int> <dbl> <dbl> <dbl> <int> <dbl> <int> <dbl> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1 F. MEDINA 22 9 6 131 1.3 29 0.5 0.8 1.3 28 0 0 0.6 13 22 37 55.6 26.8 60 https://hosted.dcd.share~
2 J. SANTISO 23 23 12 277 5.6 128 0.4 2.9 3.3 75 0 0 0.7 15 31 43.1 43.2 43 75 https://hosted.dcd.share~
3 A. ZUVICH 17 1 8.2 139 0.7 11 2 2.9 4.9 83 0.5 8 1.1 19 15.9 59.8 67.1 16.7 76.5 https://hosted.dcd.share~
4 A. YOUNG 15 14 12.5 187 1.3 20 0.4 3.3 3.7 55 0.5 7 0.6 9 30.5 36.2 41.9 32 78.8 https://hosted.dcd.share~
5 E. VARGAS 23 23 16.1 370 1.9 44 3.5 8.4 11.9 273 1.6 37 1.1 25 30.3 53.3 53.5 0 62.6 https://hosted.dcd.share~
6 L. PLANELLS 23 0 3.6 83 1.6 37 0.5 1.1 1.6 37 0.1 2 0.7 17 15.1 35.4 35.1 35.6 90 https://hosted.dcd.share~
7 T. METZGER 11 9 6.8 75 0.6 7 1.7 3.3 5 55 0.4 4 0.5 5 23.1 37 44.2 28.9 40 https://hosted.dcd.share~
8 L. SILVA 19 0 1.1 21 0.1 2 0.2 0.2 0.3 6 0.1 1 0 0 4 35 71.4 15.4 100 https://hosted.dcd.share~
9 J. STOLL 2 0 0 0 0 0 0 0 0 0 0 0 0 0 1.2 0 0 0 0 https://hosted.dcd.share~
10 G. BRUN 4 0 0.8 3 0 0 0.3 0 0.3 1 0 0 0 0 0.6 50 0 50 0 https://hosted.dcd.share~
11 A. GENTILE 3 0 0 0 0 0 0.3 0.3 0.7 2 0 0 0 0 1 0 0 0 0 https://hosted.dcd.share~
12 L. CERMINATO 19 5 8.6 163 1.7 33 1.3 3.6 4.9 93 0.7 14 0.9 17 20.9 44.1 51.9 27.1 57.1 https://hosted.dcd.share~
13 J. ADAMS 8 8 16.6 133 1.9 15 1 2.5 3.5 28 0.3 2 1.9 15 28.9 46.2 53.9 26.7 81.8 https://hosted.dcd.share~
14 K. FULLER 5 5 4.6 23 1.8 9 0.6 0.6 1.2 6 0 0 0.4 2 20.1 17.1 0 28.6 83.3 https://hosted.dcd.share~
15 S. MAC 4 4 12.5 50 2 8 0 3 3 12 0.5 2 1.8 7 29.9 37.8 35.5 42.9 76.9 https://hosted.dcd.share~
16 O. JOHNSON 12 12 15.4 185 3.4 41 1 3.2 4.2 50 0.3 4 0.8 9 31.8 47.3 53.6 34.7 75 https://hosted.dcd.share~
17 G. SOLANO 2 2 15.5 31 6.5 13 0.5 5.5 6 12 0 0 1 2 32.4 41.4 55.6 18.2 71.4 https://hosted.dcd.share~
Inside the string, i is just a regular character, and XPath doesn’t know it: it has no connection to the variables in your R session.
However, if you want to select all elements with a given XPath, you don’t need the index at all. That is, the following XPath expression works (I’ve simply removed the [i] part):
/html/body/div[1]/div[6]/div/div/div/div[4]/table/tbody/tr/td[1]/a
Here’s the corresponding ‘rvest’ code. Note that it uses no loop:
library(rvest)
link = "https://hosted.dcd.shared.geniussports.com/fubb/es/competition/34409/team/2442/statistics"
aguada = read_html(link)
jugador = aguada %>%
  html_nodes(xpath = "/html/body/div[1]/div[6]/div/div/div/div[4]/table/tbody/tr/td[1]/a/@href")
Or, alternatively:
jugador = aguada %>%
  html_nodes(xpath = "/html/body/div[1]/div[6]/div/div/div/div[4]/table/tbody/tr/td[1]/a") %>%
  html_attr("href")
Both return the hrefs. The first solution has a slightly different return type (an xml_nodeset of attribute nodes rather than a character vector), but for most purposes they behave similarly.
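If you do use the @href variant and want a plain character vector rather than an xml_nodeset, you can extract the attribute values with html_text(); a small sketch:
jugador = aguada %>%
  html_nodes(xpath = "/html/body/div[1]/div[6]/div/div/div/div[4]/table/tbody/tr/td[1]/a/@href") %>%
  html_text()  # attribute nodes print as href="..."; html_text() returns just the values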

making a loop creating new vectors in R

I have a dataset of 70 patients. At 6 different timepoints, 2 laboratory values were obtained that probably correlate with each other. Here is some of the extracted data:
id w2_crp w2_alb w6_crp w6_alb w10_crp w10_alb
001 1.2 35 1.1 38 0.5 39
002 10 27 0.5 42.5 0.5 40
003 2.4 30 1.7 30 1.2 32
004 0.5 37.4 0.7 38.2 0.5 35.5
For each patient I want to plot CRP values on the x-axis and albumin values on the y-axis at the corresponding timepoints.
I made these vectors for 10 first IDs:
vec1 <- pull(df, w2_crp)
vec2 <- pull(df, w2_alb)
...
crp1 <- c(first(vec1), first(vec3), first(vec5))
and similar vectors for albumin and plotted them normally with
plot_ly(df, x = ~crp1, y = ~alb1, type = "scatter", mode = "lines")
but this is obviously very tedious. Do you have any ideas on how to automate creating the vectors and plotting them against each other with a for loop? I tried but constantly got errors... I would be grateful for your help!
I assume that the number in the column name is the timepoint. If so, you could do this:
library(tidyverse)
#example data
dat <- read_table("id w2_crp w2_alb w6_crp w6_alb w10_crp w10_alb
001 1.2 35 1.1 38 0.5 39
002 10 27 0.5 42.5 0.5 40
003 2.4 30 1.7 30 1.2 32
004 0.5 37.4 0.7 38.2 0.5 35.5")
#make each timepoint a row and each group a column
new_dat <- dat |>
  pivot_longer(-id, names_pattern = "\\w(\\d+)_(\\w+)",
               names_to = c("time", "type")) |>
  pivot_wider(names_from = type, values_from = value)
new_dat
#> # A tibble: 12 x 4
#> id time crp alb
#> <chr> <chr> <dbl> <dbl>
#> 1 001 2 1.2 35
#> 2 001 6 1.1 38
#> 3 001 10 0.5 39
#> 4 002 2 10 27
#> 5 002 6 0.5 42.5
#> 6 002 10 0.5 40
#> 7 003 2 2.4 30
#> 8 003 6 1.7 30
#> 9 003 10 1.2 32
#> 10 004 2 0.5 37.4
#> 11 004 6 0.7 38.2
#> 12 004 10 0.5 35.5
#plot data
new_dat |>
  ggplot(aes(crp, alb, color = time)) +
  geom_point() +
  facet_wrap(~id, scales = "free")
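Since the question used plot_ly, the same long-format data can also be fed to plotly directly; a minimal sketch, one trace per patient (assumes the plotly package is installed):
library(plotly)

new_dat |>
  arrange(id, as.numeric(time)) |>   # order each patient's points by timepoint
  plot_ly(x = ~crp, y = ~alb, color = ~id, type = "scatter", mode = "lines+markers")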

Summarizing using function requiring multiple parameters in R

I'm trying to get the area under the curve of some data for each run of a set of simulation runs. My data is of the form:
run year data1 data2 data3
--- ---- ----- ----- -----
1 2001 2.3 45.6 30.2
1 2002 2.4 35.4 23.4
1 2003 2.6 45.6 23.6
2 2001 2.3 45.6 30.2
2 2002 2.4 35.4 23.4
2 2003 2.6 45.6 23.6
3 2001 ... and so on
So, I'd like to get the area under the curve for each data trace for run 1, run 2, ... where the x axis is always the year column and the y axis is each data column. So, as output I want something like:
run Data1_auc Data2_auc Data3_auc
--- --------- --------- ---------
1 4.5 6.7 27.5
2 3.4 6.8 35.4
3 4.5 7.8 45.6
(These are not the actual areas for the data above.)
I want to use the pracma package 'trapz' function to compute the area which takes x and y values: trapz(x, y) where x=year in my case and y=Data column.
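For reference, trapz() just applies the trapezoidal rule to the (x, y) pairs, so you can sanity-check it by hand on run 1's data1 values from the table above:
library(pracma)
# (2.3 + 2.4)/2 * 1 + (2.4 + 2.6)/2 * 1 = 2.35 + 2.5
trapz(x = c(2001, 2002, 2003), y = c(2.3, 2.4, 2.6))
#> [1] 4.85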
I've tried
dataCols <- colnames(myData %>% select(-c("run","year"))
myData <- group_by(run) %>% summarize_at(vars(dataCols), list(auc = trapz(year,.)))
but I can't get it to work without error. I've tried different variations on this, but can't seem it get it right.
Is this possible? If so, how do I do it?
library(dplyr)
library(pracma)
set.seed(1)
df <- tibble(
  run = rep(1:3, each = 3),
  year = rep(2001:2003, 3),
  data1 = runif(9, 2, 3),
  data2 = runif(9, 30, 50),
  data3 = runif(9, 20, 40)
)
df
#> # A tibble: 9 x 5
#> run year data1 data2 data3
#> <int> <int> <dbl> <dbl> <dbl>
#> 1 1 2001 2.27 31.2 27.6
#> 2 1 2002 2.37 34.1 35.5
#> 3 1 2003 2.57 33.5 38.7
#> 4 2 2001 2.91 43.7 24.2
#> 5 2 2002 2.20 37.7 33.0
#> 6 2 2003 2.90 45.4 22.5
#> 7 3 2001 2.94 40.0 25.3
#> 8 3 2002 2.66 44.4 27.7
#> 9 3 2003 2.63 49.8 20.3
df %>%
  group_by(run) %>%
  summarise_at(vars(starts_with("data")), list(auc = ~ trapz(year, .)))
#> # A tibble: 3 x 4
#> run data1_auc data2_auc data3_auc
#> <int> <dbl> <dbl> <dbl>
#> 1 1 4.79 66.5 68.7
#> 2 2 5.10 82.3 56.4
#> 3 3 5.45 89.2 50.5
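summarise_at() is superseded in dplyr 1.0+, so the same result can be written with across(); a sketch of the equivalent call:
df %>%
  group_by(run) %>%
  summarise(across(starts_with("data"), ~ trapz(year, .x), .names = "{.col}_auc"))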

Replace all NAs with -1 in r with dplyr

I'm currently working with the tidyverse in R. After using mice to impute NAs, some of the columns still have NAs because they were poorly populated to begin with (I believe). As a final check I want to replace all of the remaining NAs with -1. It usually happens in a single column, depending on the dataset. Long story short, I'm doing the same process in multiple locations, and sometimes Col1 is populated wonderfully in region A but badly in region B.
Currently I'm doing the following.
Clean.df <- df %>% mutate(
  Col1 = coalesce(Col1, -1),
  Col2 = coalesce(Col2, -1),
  ....)
And I'm doing that for 31 columns, which makes me think there must be an easier way. I attempted to read the coalesce documentation and tried to replace it with the name of the data frame, with no luck.
Thanks for the insight.
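If you want to keep the coalesce() approach but avoid spelling out all 31 columns, across() (dplyr 1.0+) can apply it to every numeric column in one go; a minimal sketch, assuming the columns that still contain NAs are numeric (coalesce() is strict about types):
library(dplyr)

Clean.df <- df %>%
  mutate(across(where(is.numeric), ~ coalesce(.x, -1)))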
Since you didn't provide any data, I am using a sample data frame to show how every NA in a data frame can be replaced with a given value (-1):
library(tidyverse)
# creating example dataset
example_df <- ggplot2::msleep
# looking at NAs
example_df
#> # A tibble: 83 x 11
#> name genus vore order conservation sleep_total sleep_rem sleep_cycle
#> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 Chee~ Acin~ carni Carn~ lc 12.1 NA NA
#> 2 Owl ~ Aotus omni Prim~ <NA> 17 1.8 NA
#> 3 Moun~ Aplo~ herbi Rode~ nt 14.4 2.4 NA
#> 4 Grea~ Blar~ omni Sori~ lc 14.9 2.3 0.133
#> 5 Cow Bos herbi Arti~ domesticated 4 0.7 0.667
#> 6 Thre~ Brad~ herbi Pilo~ <NA> 14.4 2.2 0.767
#> 7 Nort~ Call~ carni Carn~ vu 8.7 1.4 0.383
#> 8 Vesp~ Calo~ <NA> Rode~ <NA> 7 NA NA
#> 9 Dog Canis carni Carn~ domesticated 10.1 2.9 0.333
#> 10 Roe ~ Capr~ herbi Arti~ lc 3 NA NA
#> # ... with 73 more rows, and 3 more variables: awake <dbl>, brainwt <dbl>,
#> # bodywt <dbl>
# replacing NAs with -1
purrr::map_dfr(.x = example_df,
.f = ~ tidyr::replace_na(data = ., -1))
#> # A tibble: 83 x 11
#> name genus vore order conservation sleep_total sleep_rem sleep_cycle
#> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 Chee~ Acin~ carni Carn~ lc 12.1 -1 -1
#> 2 Owl ~ Aotus omni Prim~ -1 17 1.8 -1
#> 3 Moun~ Aplo~ herbi Rode~ nt 14.4 2.4 -1
#> 4 Grea~ Blar~ omni Sori~ lc 14.9 2.3 0.133
#> 5 Cow Bos herbi Arti~ domesticated 4 0.7 0.667
#> 6 Thre~ Brad~ herbi Pilo~ -1 14.4 2.2 0.767
#> 7 Nort~ Call~ carni Carn~ vu 8.7 1.4 0.383
#> 8 Vesp~ Calo~ -1 Rode~ -1 7 -1 -1
#> 9 Dog Canis carni Carn~ domesticated 10.1 2.9 0.333
#> 10 Roe ~ Capr~ herbi Arti~ lc 3 -1 -1
#> # ... with 73 more rows, and 3 more variables: awake <dbl>, brainwt <dbl>,
#> # bodywt <dbl>
Created on 2018-10-10 by the reprex package (v0.2.1)
An alternative to Indrajeet's answer that is pure dplyr. Using Indrajeet's recommendation of ggplot2::msleep:
library(dplyr)
ggplot2::msleep %>%
mutate_at(vars(sleep_rem, sleep_cycle), ~ if_else(is.na(.), -1, .))
# # A tibble: 83 x 11
# name genus vore order conservation sleep_total sleep_rem sleep_cycle awake
# <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
# 1 Chee~ Acin~ carni Carn~ lc 12.1 -1 -1 11.9
# 2 Owl ~ Aotus omni Prim~ <NA> 17 1.8 -1 7
# 3 Moun~ Aplo~ herbi Rode~ nt 14.4 2.4 -1 9.6
# 4 Grea~ Blar~ omni Sori~ lc 14.9 2.3 0.133 9.1
# 5 Cow Bos herbi Arti~ domesticated 4 0.7 0.667 20
# 6 Thre~ Brad~ herbi Pilo~ <NA> 14.4 2.2 0.767 9.6
# 7 Nort~ Call~ carni Carn~ vu 8.7 1.4 0.383 15.3
# 8 Vesp~ Calo~ <NA> Rode~ <NA> 7 -1 -1 17
# 9 Dog Canis carni Carn~ domesticated 10.1 2.9 0.333 13.9
# 10 Roe ~ Capr~ herbi Arti~ lc 3 -1 -1 21
# # ... with 73 more rows, and 2 more variables: brainwt <dbl>, bodywt <dbl>
If you want the nuclear option over all columns (numeric and character), then use:
ggplot2::msleep %>%
mutate_all(~ ifelse(is.na(.), -1, .))
# # A tibble: 83 x 11
# name genus vore order conservation sleep_total sleep_rem sleep_cycle awake
# <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
# 1 Chee~ Acin~ carni Carn~ lc 12.1 -1 -1 11.9
# 2 Owl ~ Aotus omni Prim~ -1 17 1.8 -1 7
# 3 Moun~ Aplo~ herbi Rode~ nt 14.4 2.4 -1 9.6
# 4 Grea~ Blar~ omni Sori~ lc 14.9 2.3 0.133 9.1
# 5 Cow Bos herbi Arti~ domesticated 4 0.7 0.667 20
# 6 Thre~ Brad~ herbi Pilo~ -1 14.4 2.2 0.767 9.6
# 7 Nort~ Call~ carni Carn~ vu 8.7 1.4 0.383 15.3
# 8 Vesp~ Calo~ -1 Rode~ -1 7 -1 -1 17
# 9 Dog Canis carni Carn~ domesticated 10.1 2.9 0.333 13.9
# 10 Roe ~ Capr~ herbi Arti~ lc 3 -1 -1 21
# # ... with 73 more rows, and 2 more variables: brainwt <dbl>, bodywt <dbl>
Note that I'm no longer using dplyr::if_else, since the function needs to be versatile with (or ignorant of) the different types. Since base::ifelse will happily/silently(/sloppily?) convert, we're good.
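mutate_at() and mutate_all() are superseded in dplyr 1.0+; the same two variants written with across(), as a sketch:
library(dplyr)

# typed version, only the named numeric columns
ggplot2::msleep %>%
  mutate(across(c(sleep_rem, sleep_cycle), ~ if_else(is.na(.x), -1, .x)))

# "nuclear option" over every column, relying on base::ifelse's silent coercion
ggplot2::msleep %>%
  mutate(across(everything(), ~ ifelse(is.na(.x), -1, .x)))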

Selecting rows with rowSums and mutate with dplyr error

data:
head(well_being_df2)
# A tibble: 6 x 70
Age Gender EmploymentStatus PWI1 PWI2 PWI3 PWI4 PWI5 PWI6 PWI7 Personality1 Personality2 Personality3
<dbl> <dbl+l> <dbl+lbl> <dbl+> <dbl+> <dbl+> <dbl+> <dbl> <dbl> <dbl> <dbl+lbl> <dbl+lbl>
I am selecting a subset of columns and trying to mutate them. I have played
around with the solution provided here but I am getting various errors. I am trying to select the PWI columns, then mutate with rowSums to a new variable called PWI_Index.
This works:
rowSums(select(well_being_df2, contains("PWI")))
[1] 50 32 48 32 58 52 41 51 49 37 50 53 58 47....
[38] 58 60 63 60 63 56 43 30 45 53 45 44 57 55....
[75] 50 55 57 58 57 58 58 58 62 62 44 59 58....
But then when I try to mutate:
mutate(well_being_df2, x = rowSums(select(well_being_df2,
contains("PWI"))))
This outputs/selects the entire set of columns, not just the "PWI" columns. Example:
# A tibble: 169 x 71
Age Gender EmploymentStatus PWI1 PWI2 PWI3 PWI4 PWI5 PWI6 PWI7 Personality1 Personality2 Personality3
<dbl> <dbl+l> <dbl+lbl> <dbl+> <dbl+> <dbl+> <dbl> <dbl> <dbl> <dbl> <dbl+lbl> <dbl+lbl> <dbl+lbl>
1 22 2 3 8 8 6 8 8 6 6 1 1 1
2 20 2 1 4 6 1 8 8 4 1 4 5 4
It selects the entire data frame instead of just the rowSums of the "PWI" columns. Using [.4:10] doesn't work either. With any other solution I get the following error:
select(well_being_df2[.4:10]) %>%
mutate(PWI_Index = rowSums(.)) %>% left_join(well_being_df2)
Error: Column indexes must be integer, not 0.11, 1.11,...
Plus working through previous examples with:
well_being_df2 %>%
mutate(x = rowSums(select(., contains("PWI")))) %>%
head()
And it takes the entire set of columns like before.
I'm not sure I understand (or can reproduce) your issue.
Here is an example using the iris data that works just fine.
iris %>%
mutate(x = rowSums(select(., contains("Width")))) %>%
head()
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species x
#1 5.1 3.5 1.4 0.2 setosa 3.7
#2 4.9 3.0 1.4 0.2 setosa 3.2
#3 4.7 3.2 1.3 0.2 setosa 3.4
#4 4.6 3.1 1.5 0.2 setosa 3.3
#5 5.0 3.6 1.4 0.2 setosa 3.8
#6 5.4 3.9 1.7 0.4 setosa 4.3
As you can see x is the sum of columns Sepal.Width and Petal.Width, and is the same as
rowSums(select(iris, contains("Width"))) %>% head()
#[1] 3.7 3.2 3.4 3.3 3.8 4.3
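In current dplyr you can also skip the select(.) step and select the columns directly inside mutate() with across() (or pick() in dplyr 1.1+); a sketch on the same iris example:
iris %>%
  mutate(x = rowSums(across(contains("Width")))) %>%  # dplyr 1.1+ prefers rowSums(pick(contains("Width")))
  head()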
