Separating values into existing column in R

I'm tidying some data that I read into R from a PDF using tabulizer. Unfortunately some cells haven't been read properly. In column 9 (Split 5 at 37.1km) rows 3 and 4 contain information that should have ended up in column 10 (Final Time).
How do I separate that column (9) just for these rows and paste the necessary data into an already existing column (10)?
I know how to use the tidyr::separate function but can't figure out how (and if) to apply it here. Any help and guidance will be appreciated.
structure(list(Rank = c("23", "24", "25", "26"), `Race Number` = c("13",
"11", "29", "30"), Name = c("FOSS Tobias S.", "McNULTY Brandon",
"BENNETT George", "KUKRLE Michael"), `NOC Code` = c("NOR", "USA",
"NZL", "CZE"), `Split 1 at 9.7km` = c("13:47.65(22)", "13:28.23(15)",
"14:05.46(30)", "14:05.81(32)"), `Split 2 at 15.0km` = c("19:21.16(22)",
"19:04.80(18)", "19:47.53(31)", "19:48.77(32)"), `Split 3 at 22.1km` = c("29:17.44(24)",
"29:01.94(20)", "29:58.88(28)", "29:58.09(27)"), `Split 4 at 31.8km` = c("44:06.82(24)",
"43:51.67(23)", "44:40.28(25)", "44:42.74(26)"), `Split 5 at 37.1km` = c("49:49.65(24)",
"49:40.49(23)", "50:21.82(25)1:00:28.39 (25)", "50:30.02(26)1:00:41.55 (26)"
), `Final Time` = c("59:51.68 (23)", "59:57.73 (24)", "", ""),
`Time Behind` = c("+4:47.49", "+4:53.54", "+5:24.20", "+5:37.36"
), `Average Speed` = c("44.302", "44.228", "43.854", "43.696"
)), class = "data.frame", row.names = c(NA, -4L))

My answer is not really fancy, but it does the job for any number in the final time column. It works as long as there are always numbers in brackets at the end.
# dummy df
df <- data.frame("split" = c("49:49.65(24)", "49:40.49(23)", "50:21.82(25)1:00:28.39 (25)", "50:30.02(26)1:00:41.55 (26)"),
"final" = c("59:51.68 (23)", "59:57.73 (24)", "", ""))
# combining & splitting strings
merge_strings <- paste0(df$split, df$final)
split_strings <- strsplit(merge_strings, ")")
df$split <- paste0(unlist(lapply(split_strings, "[[", 1)),")")
df$final <- paste0(unlist(lapply(split_strings, "[[", 2)),")")
This gives:
split final
1 49:49.65(24) 59:51.68 (23)
2 49:40.49(23) 59:57.73 (24)
3 50:21.82(25) 1:00:28.39 (25)
4 50:30.02(26) 1:00:41.55 (26)
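A base R variant of the same idea, as a minimal sketch: instead of pasting the two columns together and splitting on ")", it uses sub() with two capture groups to cut the split column after its first closing bracket. Like the answer above, it assumes every time ends with a rank in brackets. The pattern and the helper names (pattern, first_part, rest) are illustrative choices, not part of the original answer.

```r
# dummy df, same values as above
df <- data.frame(
  split = c("49:49.65(24)", "49:40.49(23)",
            "50:21.82(25)1:00:28.39 (25)", "50:30.02(26)1:00:41.55 (26)"),
  final = c("59:51.68 (23)", "59:57.73 (24)", "", ""),
  stringsAsFactors = FALSE
)

# group 1: everything up to and including the first ")"
# group 2: whatever follows (empty when the cell holds a single time)
pattern <- "^([^)]*\\))(.*)$"
first_part <- sub(pattern, "\\1", df$split)
rest <- sub(pattern, "\\2", df$split)

# only overwrite the final time where something was left over
df$final <- ifelse(rest == "", df$final, rest)
df$split <- first_part
df
```

Rows that were read correctly keep their existing final time, because the second group is empty for them.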

Calling your data frame df:
library(tidyr)
library(dplyr)
df %>%
separate(`Split 5 at 37.1km`, into = c("Split 5 at 37.1km","aux"), sep = "\\)") %>%
mutate(`Final Time` = coalesce(if_else(`Final Time`!="",`Final Time`, NA_character_), paste0(aux, ")")),
aux = NULL,
`Split 5 at 37.1km` = paste0(`Split 5 at 37.1km`, ")"))
Rank Race Number Name NOC Code Split 1 at 9.7km Split 2 at 15.0km Split 3 at 22.1km Split 4 at 31.8km Split 5 at 37.1km Final Time
1 23 13 FOSS Tobias S. NOR 13:47.65(22) 19:21.16(22) 29:17.44(24) 44:06.82(24) 49:49.65(24) 59:51.68 (23)
2 24 11 McNULTY Brandon USA 13:28.23(15) 19:04.80(18) 29:01.94(20) 43:51.67(23) 49:40.49(23) 59:57.73 (24)
3 25 29 BENNETT George NZL 14:05.46(30) 19:47.53(31) 29:58.88(28) 44:40.28(25) 50:21.82(25) 1:00:28.39 (25)
4 26 30 KUKRLE Michael CZE 14:05.81(32) 19:48.77(32) 29:58.09(27) 44:42.74(26) 50:30.02(26) 1:00:41.55 (26)
Time Behind Average Speed
1 +4:47.49 44.302
2 +4:53.54 44.228
3 +5:24.20 43.854
4 +5:37.36 43.696

You could use dplyr and stringr:
library(dplyr)
library(stringr)
data %>%
mutate(`Final Time` = ifelse(`Final Time` == "", str_remove(`Split 5 at 37.1km`, "\\d+:\\d+\\.\\d+\\(\\d+\\)"), `Final Time`),
`Split 5 at 37.1km` = str_extract(`Split 5 at 37.1km`, "\\d+:\\d+\\.\\d+\\(\\d+\\)"))
which returns
Rank Race Number Name NOC Code Split 1 at 9.7km Split 2 at 15.0km Split 3 at 22.1km Split 4 at 31.8km
1 23 13 FOSS Tobias S. NOR 13:47.65(22) 19:21.16(22) 29:17.44(24) 44:06.82(24)
2 24 11 McNULTY Brandon USA 13:28.23(15) 19:04.80(18) 29:01.94(20) 43:51.67(23)
3 25 29 BENNETT George NZL 14:05.46(30) 19:47.53(31) 29:58.88(28) 44:40.28(25)
4 26 30 KUKRLE Michael CZE 14:05.81(32) 19:48.77(32) 29:58.09(27) 44:42.74(26)
Split 5 at 37.1km Final Time Time Behind Average Speed
1 49:49.65(24) 59:51.68 (23) +4:47.49 44.302
2 49:40.49(23) 59:57.73 (24) +4:53.54 44.228
3 50:21.82(25) 1:00:28.39 (25) +5:24.20 43.854
4 50:30.02(26) 1:00:41.55 (26) +5:37.36 43.696

I like to use regex and stringr. Whilst there's some suboptimal code here, the key step is str_match(). Using this we can select the two substrings we want: the first time and the second time. If the cell does not contain both times, the pattern does not match and every group comes back as NA, so we can then fill in the columns based on where the missing values occur.
The regex string is as follows: ^((\\d+:)?\\d{2}:\\d{2}.\\d{2}\\(\\d+\\))((\\d+:)?\\d{2}:\\d{2}.\\d{2} ?\\(\\d+\\))$. Here we have 4 capture groups. The first and third capture the two whole times respectively. The second and fourth are the optional groups containing the hour (this ensures that times over an hour are completely captured). Additionally, we allow an optional space before the rank of the second time.
My code is as follows:
library(tidyverse)
data <- structure(list(Rank = c("23", "24", "25", "26"), `Race Number` = c("13",
"11", "29", "30"), Name = c("FOSS Tobias S.", "McNULTY Brandon",
"BENNETT George", "KUKRLE Michael"), `NOC Code` = c("NOR", "USA",
"NZL", "CZE"), `Split 1 at 9.7km` = c("13:47.65(22)", "13:28.23(15)",
"14:05.46(30)", "14:05.81(32)"), `Split 2 at 15.0km` = c("19:21.16(22)",
"19:04.80(18)", "19:47.53(31)", "19:48.77(32)"), `Split 3 at 22.1km` = c("29:17.44(24)",
"29:01.94(20)", "29:58.88(28)", "29:58.09(27)"), `Split 4 at 31.8km` = c("44:06.82(24)",
"43:51.67(23)", "44:40.28(25)", "44:42.74(26)"), `Split 5 at 37.1km` = c("49:49.65(24)",
"49:40.49(23)", "50:21.82(25)1:00:28.39 (25)", "50:30.02(26)1:00:41.55 (26)"
), `Final Time` = c("59:51.68 (23)", "59:57.73 (24)", "", ""),
`Time Behind` = c("+4:47.49", "+4:53.54", "+5:24.20", "+5:37.36"
), `Average Speed` = c("44.302", "44.228", "43.854", "43.696"
)), class = "data.frame", row.names = c(NA, -4L))
# Take data and use a matching string to the regex pattern
data |>
mutate(match = map(`Split 5 at 37.1km`, ~unlist(str_match(., "^((\\d+:)?\\d{2}:\\d{2}.\\d{2}\\(\\d+\\))((\\d+:)?\\d{2}:\\d{2}.\\d{2} ?\\(\\d+\\))$")))) |>
# Grab the strings that match the whole first and second/final times
mutate(match1 = map_chr(match, ~.[[2]]), match2 = map_chr(match, ~.[[4]]), .keep = "unused") |>
# Check where the NAs are and put into the dataframe accordingly
mutate(`Split 5 at 37.1km`= ifelse(is.na(match1), `Split 5 at 37.1km`, match1),
`Final Time` = ifelse(is.na(match2), `Final Time`, match2), .keep = "unused")
#> Rank Race Number Name NOC Code Split 1 at 9.7km Split 2 at 15.0km
#> 1 23 13 FOSS Tobias S. NOR 13:47.65(22) 19:21.16(22)
#> 2 24 11 McNULTY Brandon USA 13:28.23(15) 19:04.80(18)
#> 3 25 29 BENNETT George NZL 14:05.46(30) 19:47.53(31)
#> 4 26 30 KUKRLE Michael CZE 14:05.81(32) 19:48.77(32)
#> Split 3 at 22.1km Split 4 at 31.8km Split 5 at 37.1km Final Time
#> 1 29:17.44(24) 44:06.82(24) 49:49.65(24) 59:51.68 (23)
#> 2 29:01.94(20) 43:51.67(23) 49:40.49(23) 59:57.73 (24)
#> 3 29:58.88(28) 44:40.28(25) 50:21.82(25) 1:00:28.39 (25)
#> 4 29:58.09(27) 44:42.74(26) 50:30.02(26) 1:00:41.55 (26)
#> Time Behind Average Speed
#> 1 +4:47.49 44.302
#> 2 +4:53.54 44.228
#> 3 +5:24.20 43.854
#> 4 +5:37.36 43.696
Created on 2021-07-28 by the reprex package (v2.0.0)
Note: in the above I use the base pipe |>, which is available from R 4.1 onwards. It can simply be replaced with the magrittr pipe %>% if you are on an earlier R version.
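To make the capture groups easier to inspect, here is a small base R sketch that applies the same pattern with regexec() and regmatches() instead of str_match(). The helper pick() and the two-element test vector are assumptions made for illustration; element 1 of each result is the whole match and elements 2 to 5 are the four groups.

```r
pattern <- "^((\\d+:)?\\d{2}:\\d{2}.\\d{2}\\(\\d+\\))((\\d+:)?\\d{2}:\\d{2}.\\d{2} ?\\(\\d+\\))$"

# one correctly read cell and one cell holding both times
cells <- c("49:49.65(24)", "50:21.82(25)1:00:28.39 (25)")

# list with one character vector per cell; empty when the pattern does not match
groups <- regmatches(cells, regexec(pattern, cells))

# hypothetical helper: element i of a match, or NA when there was no match
pick <- function(m, i) if (length(m) >= i) m[[i]] else NA_character_

split_time <- vapply(groups, pick, character(1), i = 2)  # capture group 1
final_time <- vapply(groups, pick, character(1), i = 4)  # capture group 3

split_time
final_time
```

The first cell yields NA for both groups because the anchored pattern requires two times, which is what lets the later ifelse() calls leave correctly read rows alone.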

Related

Is there a way to write this in a single Dplyr statement / more efficiently?

My (simplified) dataset consists of donor occupation and contribution amounts. I'm trying to determine what the average contribution amount by occupation is (note: donor occupations are often repeated in the column, so I use that as a grouping variable). Right now, I'm using two dplyr statements -- one to get a sum of contributions amount by each occupation and another to get a count of the number of donations from that specific occupation. I am then binding the dataframes with cbind and creating a new column with mutate, where I can divide the sum by the count.
Data example:
contributor_occupation contribution_receipt_amount
1 LISTING COORDINATOR 5.00
2 NOT EMPLOYED 2.70
3 TEACHER 2.70
4 ELECTRICAL DESIGNER 2.00
5 STUDENT 50.00
6 SOFTWARE ENGINEER 10.00
7 TRUCK DRIVER 2.70
8 NOT EMPLOYED 50.00
9 CONTRACTOR 5.00
10 ENGINEER 6.00
11 FARMER 2.70
12 ARTIST 50.00
13 CIRCUS ARTIST 100.00
14 CIRCUS ARTIST 27.00
15 INFORMATION SECURITY ANALYST 2.00
16 LAWYER 5.00
occupation2 <- b %>%
select(contributor_occupation, contribution_receipt_amount) %>%
group_by(contributor_occupation) %>%
summarise(total = sum(contribution_receipt_amount)) %>%
arrange(desc(contributor_occupation))
occupation3 <- b %>%
select(contributor_occupation) %>%
count(contributor_occupation) %>%
group_by(contributor_occupation) %>%
arrange(desc(contributor_occupation))
final_occ <- cbind(occupation2, occupation3[, 2]) # remove duplicate column
occ_avg <- final_occ %>%
select(contributor_occupation:n) %>%
mutate("Average Donation" = total/n) %>%
rename("Number of Donations"= n, "Occupation" = contributor_occupation, "Total Donated" = total)
occ_avg %>%
arrange(desc(`Average Donation`))
This gives me the result I want but seems like a very cumbersome process. It seems I get the same result by using the following code; however, I am confused as to why it works:
avg_donation_occupation <- b %>%
select(contributor_occupation, contribution_receipt_amount) %>%
group_by(contributor_occupation) %>%
summarize(avg_donation_by_occupation = sum(contribution_receipt_amount)/n()) %>%
arrange(desc(avg_donation_by_occupation))
Wouldn't dividing by n divide by the number of rows (i.e., number of occupations) as opposed to the number of people in that occupation (which is what I used the count function for previously)?
Thanks for the help clearing up any confusion!
We may need both sum and mean along with n(), which gives the number of observations in the current group rather than the number of rows in the whole data. According to ?context
n() gives the current group size.
and ?mean
mean - Generic function for the (trimmed) arithmetic mean.
which is basically the sum of observations divided by the number of observations, so sum(x)/n() within a group equals mean(x) for that group.
library(dplyr)
out <- b %>%
group_by(Occupation = contributor_occupation) %>%
summarise(`Total Donated` = sum(contribution_receipt_amount),
`Number of Donations` = n(),
`Average Donation` = mean(contribution_receipt_amount),
#or
#`Average Donation` = `Total Donated`/`Number of Donations`,
.groups = 'drop') %>%
arrange(desc(`Average Donation`))
-output
out
# A tibble: 14 × 4
Occupation `Total Donated` `Number of Donations` `Average Donation`
<chr> <dbl> <int> <dbl>
1 CIRCUS ARTIST 127 2 63.5
2 ARTIST 50 1 50
3 STUDENT 50 1 50
4 NOT EMPLOYED 52.7 2 26.4
5 SOFTWARE ENGINEER 10 1 10
6 ENGINEER 6 1 6
7 CONTRACTOR 5 1 5
8 LAWYER 5 1 5
9 LISTING COORDINATOR 5 1 5
10 FARMER 2.7 1 2.7
11 TEACHER 2.7 1 2.7
12 TRUCK DRIVER 2.7 1 2.7
13 ELECTRICAL DESIGNER 2 1 2
14 INFORMATION SECURITY ANALYST 2 1 2
data
b <- structure(list(contributor_occupation = c("LISTING COORDINATOR",
"NOT EMPLOYED", "TEACHER", "ELECTRICAL DESIGNER", "STUDENT",
"SOFTWARE ENGINEER", "TRUCK DRIVER", "NOT EMPLOYED", "CONTRACTOR",
"ENGINEER", "FARMER", "ARTIST", "CIRCUS ARTIST", "CIRCUS ARTIST",
"INFORMATION SECURITY ANALYST", "LAWYER"), contribution_receipt_amount = c(5,
2.7, 2.7, 2, 50, 10, 2.7, 50, 5, 6, 2.7, 50, 100, 27, 2, 5)),
class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",
"14", "15", "16"))
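To see why dividing by the group size gives the per-occupation average, here is a small base R check on a five-row subset of the data above (the subset is chosen only for illustration). tapply() stands in for group_by() plus summarise(), and length() stands in for n(); this is a sketch of the idea, not how dplyr implements it.

```r
# five rows taken from the example data
b <- data.frame(
  contributor_occupation = c("CIRCUS ARTIST", "CIRCUS ARTIST",
                             "NOT EMPLOYED", "NOT EMPLOYED", "STUDENT"),
  contribution_receipt_amount = c(100, 27, 2.7, 50, 50),
  stringsAsFactors = FALSE
)

amount <- b$contribution_receipt_amount
occupation <- b$contributor_occupation

group_total <- tapply(amount, occupation, sum)     # like sum() inside summarise()
group_size  <- tapply(amount, occupation, length)  # like n(): rows in each group
group_mean  <- tapply(amount, occupation, mean)

group_size                                   # 2, 2, 1: not nrow(b), which is 5
all.equal(group_total / group_size, group_mean)
```

The divisor differs from group to group, which is why the single summarise() call in the question gives the same answer as the longer version.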

R: How do I substitute some old strings in a dataframe with new strings?

I have a data frame with a list of students:
No name class
1 Isaac Physics
2 Napoleon History
3 Sigmund Psychology
4 Ludwig Music
5 LeBron Sport
6 Jeff Economy
I want to change the names of some students. The new names are in a second data frame:
No Old New
1 Isaac Newton
2 Sigmund Freud
3 LeBron James
So the student data will look like this:
No name class
1 Newton Physics
2 Napoleon History
3 Freud Psychology
4 Ludwig Music
5 James Sport
6 Jeff Economy
I can use substitute, but it takes too much time. I want to do it quickly by making use of the second data frame, which contains the new names. How can I do that?
Using tidyverse:
library(tidyverse)
df$name <- recode(df$name, !!!deframe(new[c("Old","New")]))
Output
No name class
1 1 Newton Physics
2 2 Napoleon History
3 3 Freud Psychology
4 4 Ludwig Music
5 5 James Sport
6 6 Jeff Economy
How it works
deframe will turn a two column dataframe into a named vector.
!!! is the splice operator: it unpacks the named vector into individual old = new arguments, which recode then applies to df$name.
Note: tidyverse is a collection of very useful packages for data science/manipulation, and library(tidyverse) loads several of them. deframe is from the tibble package and recode is from dplyr.
Data
df <- structure(list(No = 1:6, name = c("Isaac", "Napoleon", "Sigmund",
"Ludwig", "LeBron", "Jeff"), class = c("Physics", "History", "Psychology",
"Music", "Sport", "Economy")), row.names = c(NA, -6L), class = "data.frame")
new <- structure(list(No = 1:3, Old = c("Isaac", "Sigmund", "LeBron"
), New = c("Newton", "Freud", "James")), class = "data.frame", row.names = c(NA,
-3L))
We can use a join on the 'name' and the 'Old' column from the first and second dataset and assign the 'New' from the second to the 'name' column
library(data.table)
setDT(df1)[df2, name := New, on = .(name = Old)]
-output
df1
No name class
1: 1 Newton Physics
2: 2 Napoleon History
3: 3 Freud Psychology
4: 4 Ludwig Music
5: 5 James Sport
6: 6 Jeff Economy
NOTE: Using data.table, we can do this much more efficiently, because the column is updated by reference
Or use coalesce
library(dplyr)
df1$name <- coalesce(setNames(df2$New, df2$Old)[df1$name], df1$name)
data
df1 <- structure(list(No = 1:6, name = c("Isaac", "Napoleon", "Sigmund",
"Ludwig", "LeBron", "Jeff"), class = c("Physics", "History",
"Psychology", "Music", "Sport", "Economy")), class = "data.frame",
row.names = c(NA,
-6L))
df2 <- structure(list(No = 1:3, Old = c("Isaac", "Sigmund", "LeBron"
), New = c("Newton", "Freud", "James")), class = "data.frame", row.names = c(NA,
-3L))
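For comparison, the same lookup can be sketched in base R with match(), which returns the row of df2 holding each old name (or NA when a name has no replacement). The variable idx is an illustrative name; the data is the same as above.

```r
df1 <- data.frame(
  No = 1:6,
  name = c("Isaac", "Napoleon", "Sigmund", "Ludwig", "LeBron", "Jeff"),
  class = c("Physics", "History", "Psychology", "Music", "Sport", "Economy"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  No = 1:3,
  Old = c("Isaac", "Sigmund", "LeBron"),
  New = c("Newton", "Freud", "James"),
  stringsAsFactors = FALSE
)

# position of each name in the lookup table, NA when it is not listed
idx <- match(df1$name, df2$Old)

# keep the existing name where there is no replacement
df1$name <- ifelse(is.na(idx), df1$name, df2$New[idx])
df1
```

This is the same idea as the coalesce() line above: look up the new name and fall back to the old one where the lookup returns NA.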

Look up value based on partial string match in R

I have a table (table 1) with a bunch of cities (punctuation, capitalization and spaces have been removed).
I want to scan through the 2nd table (table 2) and pull out any record (the first) that exactly matches or contains the string anywhere within it.
# Table 1
city1
1 waterloo
2 kitchener
3 toronto
4 guelph
5 ottawa
# Table 2
city2
1 waterlookitchener
2 toronto
3 hamilton
4 cityofottawa
This would give the 3rd table seen below.
# Table 3
city1 city2
1 waterloo waterlookitchener
2 kitchener waterlookitchener
3 toronto toronto
4 guelph <NA>
5 ottawa cityofottawa
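I believe there are more sophisticated ways of completing your task, but here is a simple approach using tidyverse.
library(tidyverse)
# read_table2() is deprecated in readr 2.0 and later; read_table() is its replacement
df <- read_table2("city1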
waterloo
kitchener
toronto
guelph
ottawa")
df2 <- read_table2("city2
waterlookitchener
toronto
hamilton
cityofottawa")
df3 <- df$city1 %>%
lapply(grep, df2$city2, value=TRUE) %>%
# keep only the first match; use NA when nothing matched
lapply(function(x) if(length(x) == 0) NA_character_ else x[1]) %>%
unlist
df3 <- cbind(df, city2 = df3)
Search for every element of df$city1 in df2$city2 (partial or complete match) and return this element of df2$city2. See ?grep for more information.
Replace the character(0) (element not found) with NA. See How to convert character(0) to NA in a list with R language? for details.
Convert list into a vector (unlist).
Attach result to list of cities (cbind).
You can also try using fuzzyjoin. In this case, you can use the function stri_detect_fixed from stringi package to identify at least one occurrence of a fixed pattern in a string.
library(fuzzyjoin)
library(stringi)
library(dplyr)
fuzzy_right_join(table2, table1, by = c("city2" = "city1"), match_fun = stri_detect_fixed) %>%
select(city1, city2)
Output
city1 city2
1 waterloo waterlookitchener
2 kitchener waterlookitchener
3 toronto toronto
4 guelph <NA>
5 ottawa cityofottawa
Data
table1 <- structure(list(city1 = c("waterloo", "kitchener", "toronto",
"guelph", "ottawa")), class = "data.frame", row.names = c(NA,
-5L))
table2 <- structure(list(city2 = c("waterlookitchener", "toronto", "hamilton",
"cityofottawa")), class = "data.frame", row.names = c(NA, -4L
))
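If neither tidyverse nor fuzzyjoin is available, the lookup can be sketched in base R. This version treats each city as a fixed string rather than a regular expression (fixed = TRUE) and keeps only the first hit, as the question asks. The names first_match and table3 are illustrative.

```r
table1 <- data.frame(
  city1 = c("waterloo", "kitchener", "toronto", "guelph", "ottawa"),
  stringsAsFactors = FALSE
)
table2 <- data.frame(
  city2 = c("waterlookitchener", "toronto", "hamilton", "cityofottawa"),
  stringsAsFactors = FALSE
)

# for every city1, the first city2 that contains it, or NA when none does
first_match <- vapply(table1$city1, function(p) {
  hits <- table2$city2[grepl(p, table2$city2, fixed = TRUE)]
  if (length(hits) == 0) NA_character_ else hits[1]
}, character(1), USE.NAMES = FALSE)

table3 <- data.frame(city1 = table1$city1, city2 = first_match,
                     stringsAsFactors = FALSE)
table3
```

Using fixed = TRUE matters little for these cleaned names, but it avoids surprises if a city name ever contains a character that is special in a regular expression.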

How to use gather in R instead of unite

I'm trying to use the gather function instead of unite to get the output. Is it possible to do so?
This is my data:
Description Temp
<fctr> <dbl>
1 location1:48:2018-10-23 -0.9381736
2 location2:83:2018-01-05 1.1714643
3 location3:73:2018-11-05 -0.7064954
4 location4:27:2018-07-26 0.4420571
5 location5:33:2018-02-03 0.9060360
6 location6:88:2018-04-27 1.9407284
I've used separate to split the data with the following command:
library(tidyr)
sepData <- separate(data, Description, c("Location", "ID", "Date"), sep = ":")
Location ID Date Temp
<chr> <chr> <chr> <dbl>
1 location1 48 2018-10-23 -0.9381736
2 location2 83 2018-01-05 1.1714643
3 location3 73 2018-11-05 -0.7064954
4 location4 27 2018-07-26 0.4420571
5 location5 33 2018-02-03 0.9060360
6 location6 88 2018-04-27 1.9407284
Now I want to get the data back to its original form using gather.
Please help if possible.
If we check ?separate, it also has an argument remove, which is TRUE by default. Changing it to FALSE keeps the original column in the dataset alongside the new ones
separate(data, Description, c("Location", "ID", "Date"), sep = ":", remove = FALSE)
# Description Location ID Date Temp
#1 location1:48:2018-10-23 location1 48 2018-10-23 -0.9381736
#2 location2:83:2018-01-05 location2 83 2018-01-05 1.1714643
#3 location3:73:2018-11-05 location3 73 2018-11-05 -0.7064954
#4 location4:27:2018-07-26 location4 27 2018-07-26 0.4420571
#5 location5:33:2018-02-03 location5 33 2018-02-03 0.9060360
#6 location6:88:2018-04-27 location6 88 2018-04-27 1.9407284
data
data <- structure(list(Description = c("location1:48:2018-10-23",
"location2:83:2018-01-05",
"location3:73:2018-11-05", "location4:27:2018-07-26", "location5:33:2018-02-03",
"location6:88:2018-04-27"), Temp = c(-0.9381736, 1.1714643, -0.7064954,
0.4420571, 0.906036, 1.9407284)), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6"))
You can use paste:
library(dplyr)
togetherDate <- sepData %>%
mutate(Description = as.factor(paste(Location, ID, Date, sep = ':'))) %>%
select(Description, Temp)
This should return the same data.frame as before.
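On the original question: gather reshapes several columns into key/value rows, so it does not concatenate values. Pasting the pieces back together (which is also what tidyr::unite does) is the operation that reverses separate. The round trip can be checked with a base R sketch; it uses only the first two rows of the data and illustrative names (parts, rebuilt), and stores Description as character rather than factor.

```r
data <- data.frame(
  Description = c("location1:48:2018-10-23", "location2:83:2018-01-05"),
  Temp = c(-0.9381736, 1.1714643),
  stringsAsFactors = FALSE
)

# split on ":" (the dates use "-", so each row yields exactly three pieces)
parts <- do.call(rbind, strsplit(data$Description, ":", fixed = TRUE))
sepData <- data.frame(Location = parts[, 1], ID = parts[, 2], Date = parts[, 3],
                      Temp = data$Temp, stringsAsFactors = FALSE)

# paste the pieces back together with the same separator
rebuilt <- data.frame(
  Description = paste(sepData$Location, sepData$ID, sepData$Date, sep = ":"),
  Temp = sepData$Temp,
  stringsAsFactors = FALSE
)

identical(rebuilt$Description, data$Description)
```

The check only holds when the separator never appears inside one of the pieces, which is the case for this data.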

Construct a new data frame by merging two rows of an existing data frame using common column values in both the row

https://www.dropbox.com/s/prqiojwzpax339z/Test123.xlsx?dl=0
The link contains an xlsx file with one sheet holding a batsman's batting details, where the runs he scored in each innings of a Test match are recorded. Because a batsman can bat in two innings of a Test match, the two rows for the same match have identical values in columns such as Opposition, Ground, Start.DateAscending, Match.Number and Result.
Question: how can we combine the rows that share these values and create a new data frame with the merged rows?
Example: from the data shared through the link, I take the first two rows as a sample to show what I want to achieve. Below is the text representation of the R object for this sample, derived using an R function:
structure(list(Runs = c("10", "27"), Mins = c("30", "93"), BF = c("19",
"65"), X4s = c("1", "4"), X6s = c("0", "0"), SR = c("52.63",
"41.53"), Pos = c("6", "6"), Dismissal = c("bowled", "caught"
), Inns = c(2, 4), Opposition = c("v England", "v England"),
Ground = c("Lord's", "Lord's"), Start.DateAscending = structure(c(648930600,
648930600), class = c("POSIXct", "POSIXt"), tzone = ""),
Match.Number = c("Test # 1148", "Test # 1148"), Result = c("Loss",
"Loss")), .Names = c("Runs", "Mins", "BF", "X4s", "X6s",
"SR", "Pos", "Dismissal", "Inns", "Opposition", "Ground", "Start.DateAscending",
"Match.Number", "Result"), row.names = 1:2, class = "data.frame")
The data derived from the above block will be something like below:
Runs Mins BF X4s X6s SR Pos Dismissal Inns Opposition Ground
1 10 30 19 1 0 52.63 6 bowled 2 v England Lord's
2 27 93 65 4 0 41.53 6 caught 4 v England Lord's
Start.DateAscending Match.Number Result
1 1990-07-26 Test # 1148 Loss
2 1990-07-26 Test # 1148 Loss
So what I want to achieve is to sum the Runs column values over rows that share common column values such as Match.Number, Opposition, Ground and Start.DateAscending.
I expect values like the ones below, stored in a new data frame:
Runs Opposition Ground Start.DateAscending Match.Number Result
1 37 v England Lord's 1990-07-26 Test # 1148 Loss
We subset the columns of the dataset and use aggregate after converting 'Runs' to a numeric class
df1$Runs <- as.integer(df1$Runs) # 'Runs' is stored as character in the sample data
colsofinterest <- names(df1)[c(1, 10:ncol(df1))]
aggregate(Runs~., df1[colsofinterest], sum)
# Opposition Ground Start.DateAscending Match.Number Result Runs
#1 v England Lord's 1990-07-26 Test # 1148 Loss 37
Or we can use tidyverse
colsofinterest2 <- names(df1)[10:ncol(df1)]
library(dplyr)
df1 %>%
# group_by_() is deprecated; across(all_of()) groups by a character vector of names
group_by(across(all_of(colsofinterest2))) %>%
summarise(Runs = sum(Runs))
# A tibble: 1 x 6
# Groups: Opposition, Ground, Start.DateAscending, Match.Number [?]
# Opposition Ground Start.DateAscending Match.Number Result Runs
# <chr> <chr> <dttm> <chr> <chr> <int>
#1 v England Lord's 1990-07-26 Test # 1148 Loss 37
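A self-contained version of the aggregate approach, as a minimal sketch: it rebuilds only some of the sample columns (the date column is left out here purely to keep the example short) and shows the conversion of Runs from character to integer, which sum() needs.

```r
# reduced copy of the two sample rows; Runs is character, as in the question
df1 <- data.frame(
  Runs = c("10", "27"),
  Opposition = c("v England", "v England"),
  Ground = c("Lord's", "Lord's"),
  Match.Number = c("Test # 1148", "Test # 1148"),
  Result = c("Loss", "Loss"),
  stringsAsFactors = FALSE
)

# sum() does not accept character input, so convert first
df1$Runs <- as.integer(df1$Runs)

# "." on the right-hand side means: group by every other column
totals <- aggregate(Runs ~ ., data = df1, FUN = sum)
totals
```

Both innings share the same values in every grouping column, so they collapse into a single row with the runs added together.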
