right_join and mutate do not preserve the row order in R

I am mapping column_data to master: if a column value is present in master, the output stores the corresponding key,
e.g. Parent for P and Child for C.
The problem is that I do get the output, but its rows come out in a different order.
DATA
column_data <- c("", "", "", "", "", "", "", "", "", "", "", "", "", "", "",
"", "P", "C", "C")
master <- list("Parent" = c("P"),
"Child" = c("C")
)
CODE
library(dplyr)
df <- data.frame("column" = column_data)
df <-stack(master) %>%
type.convert(as.is = TRUE) %>%
right_join(df, by = c('values' = 'column')) %>%
mutate(output = coalesce(ind, values))
This should be the output:
structure(list(values = c("", "", "", "", "", "", "", "", "",
"", "", "", "", "", "", "", "P", "C", "C"), ind = c(NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "Parent",
"Child", "Child"), output = c("", "", "", "", "", "", "", "",
"", "", "", "", "", "", "", "", "Parent", "Child", "Child")), class = "data.frame", row.names = c(NA,
-19L))
But instead I get this output:
structure(list(values = c("P", "C", "C", "", "", "", "", "",
"", "", "", "", "", "", "", "", "", "", ""), ind = c("Parent",
"Child", "Child", NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA), output = c("Parent", "Child", "Child", "",
"", "", "", "", "", "", "", "", "", "", "", "", "", "", "")), row.names = c(NA,
-19L), class = "data.frame")

With dplyr, right_join(x, y) returns the rows of x that have a match in y first, followed by the rows of y that have no match in x.
From the dplyr documentation on mutating joins, the value returned is:
An object of the same type as x. The order of the rows and columns of
x is preserved as much as possible. The output has the following
properties:
For inner_join(), a subset of x rows. For left_join(), all x rows. For
right_join(), a subset of x rows, followed by unmatched y rows. For
full_join(), all x rows, followed by unmatched y rows.
That is why you have the 3 matched rows at the beginning of your resulting data.frame.
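If you want to keep right_join semantics instead, one common workaround is to record each row's original position before the join and sort on it afterwards. The sketch below does this in base R, with merge(all.y = TRUE) standing in for right_join(); row_id is a hypothetical helper column, not part of the question. In dplyr the same idea would be mutate(row_id = row_number()) before the join and arrange(row_id) after it.

```r
column_data <- c(rep("", 16), "P", "C", "C")
master <- list(Parent = c("P"), Child = c("C"))

df <- data.frame(column = column_data)
df$row_id <- seq_len(nrow(df))           # hypothetical helper: original position of each row

lookup <- stack(master)                   # columns: values (the codes), ind (the keys)
lookup$ind <- as.character(lookup$ind)    # stack() returns ind as a factor

# all.y = TRUE keeps every row of df, like right_join(lookup, df); merge() reorders rows
joined <- merge(lookup, df, by.x = "values", by.y = "column", all.y = TRUE)
joined <- joined[order(joined$row_id), ]  # put the rows back in df's original order
joined$output <- ifelse(is.na(joined$ind), joined$values, joined$ind)
joined
```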
To get the desired result preserving the row order of df, try a left_join as follows:
df2 <- stack(master) %>%
type.convert(as.is = TRUE)
df %>%
left_join(df2, by = c('column' = 'values')) %>%
mutate(output = coalesce(ind, column))
Output
   column    ind output
1           <NA>
2           <NA>
3           <NA>
4           <NA>
5           <NA>
6           <NA>
7           <NA>
8           <NA>
9           <NA>
10          <NA>
11          <NA>
12          <NA>
13          <NA>
14          <NA>
15          <NA>
16          <NA>
17      P Parent Parent
18      C  Child  Child
19      C  Child  Child
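Without any join at all, the same lookup can be sketched in base R with match(), which returns, for each element of its first argument, the position of its first match in the second (NA if absent). Indexing the keys by that result is therefore aligned with df's rows by construction. This is an alternative to the left_join answer above, not a replacement for it.

```r
column_data <- c(rep("", 16), "P", "C", "C")
master <- list(Parent = c("P"), Child = c("C"))

lookup <- stack(master)                     # data frame with columns `values` and `ind`
lookup$ind <- as.character(lookup$ind)      # ind is a factor; use plain strings

df <- data.frame(column = column_data)
# Position of each df$column value in lookup$values (NA when absent),
# so the looked-up keys stay in df's original row order.
df$ind <- lookup$ind[match(df$column, lookup$values)]
df$output <- ifelse(is.na(df$ind), df$column, df$ind)   # same role as coalesce()
df
```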

Related

Need help merging string data from a column that runs into the rows below. The problem affects multiple columns, leaving empty cells in the other columns

In a nutshell: I have multiple columns in my data frame, and some of them contain string data that spills into the rows below, so those near-empty rows only have values in the spillover columns. I would like to merge those rows and combine all the string data into the corresponding cell of the column with the spillover (I need to do all of this in R). The problem occurs in several columns, and it does not happen in every row. This is hard to explain in words, but the data below shows the problem best. I figured a dput output would be better than a pasted table, so that people can use it directly in code. This is a very simplified version of my data frame and the problem.
structure(list(SECTION = c(10207L, NA, 14097L, NA, NA, NA, NA,
21290L, NA, 3359L, NA, NA, NA, NA, 50903L, NA), SCHOOL = c("ACAD",
"", "ACCT", "", "", "", "", "ANSC", "", "LAW", "", "", "", "",
"XPPD", "PPD"), COURSE_CODE = c("ACAD-181", "", "ACCT-410", "",
"", "", "", "PR-463", "", "LAW-680A", "", "", "", "", "PPDE-630",
""), COURSE_TITLE = c("Disruptive Innovation", "", "Foundations of Accounting",
"", "", "", "", "Strategic Public Relations Research, Analysis",
"and Insights", "Review of Law and Social Justice Editing", "",
"", "", "", "Community Health Planning", ""), INSTRUCTOR_NAME = c("Smith, Tim",
"Bob, Scott", "Gem, Silvia", "", "", "", "", "OBrien, James",
"", "Harvey, Tony", "", "", "", "", "Sloth, Ryan", ""), ASSIGNED_ROOM = c("IYH210/211",
"", "ONLINE", "", "", "", "", "ONLINE", "", "ONLINE", "", "",
"", "", "ONLINE", ""), TOTAL_ENR = c(32L, NA, 55L, NA, NA, NA,
NA, 17L, NA, 13L, NA, NA, NA, NA, 16L, NA), COURSE_DESCRIPTION = c("Critical approaches to social and cultural changes.",
"", "Non-technical presentation of accounting for users of accounting",
"information; introduction to financial and managerial accounting.",
"Not open to students with course credits in accounting. Not",
"available for unit or course credit toward a degree in accounting",
"or business administration.", "Identification of key strategic insights.",
"", "Supervision of research and writing, and final editing of articles",
"and comments for publication in the Review of Law and Social",
"Justice. For officers of the Review. Open to law students only.",
"Graded IP to CR/D/F.", "", "The role of planning in sustaining community health.",
"")), class = "data.frame", row.names = c(NA, -16L))
I think this will work.
library(tidyverse)
X <- structure(list(SECTION = c(10207L, NA, 14097L, NA, NA, NA, NA, 21290L, NA, 3359L, NA, NA, NA, NA, 50903L, NA),
SCHOOL = c("ACAD", "", "ACCT", "", "", "", "", "ANSC", "", "LAW", "", "", "", "", "XPPD", "PPD"),
COURSE_CODE = c("ACAD-181", "", "ACCT-410", "", "", "", "", "PR-463", "", "LAW-680A", "", "", "", "", "PPDE-630", ""),
COURSE_TITLE = c("Disruptive Innovation", "", "Foundations of Accounting", "", "", "", "", "Strategic Public Relations Research, Analysis", "and Insights", "Review of Law and Social Justice Editing", "", "", "", "", "Community Health Planning", ""),
INSTRUCTOR_NAME = c("Smith, Tim", "Bob, Scott", "Gem, Silvia", "", "", "", "", "OBrien, James", "", "Harvey, Tony", "", "", "", "", "Sloth, Ryan", ""),
ASSIGNED_ROOM = c("IYH210/211", "", "ONLINE", "", "", "", "", "ONLINE", "", "ONLINE", "", "", "", "", "ONLINE", ""),
TOTAL_ENR = c(32L, NA, 55L, NA, NA, NA, NA, 17L, NA, 13L, NA, NA, NA, NA, 16L, NA),
COURSE_DESCRIPTION = c("Critical approaches to social and cultural changes.", "", "Non-technical presentation of accounting for users of accounting", "information; introduction to financial and managerial accounting.", "Not open to students with course credits in accounting. Not", "available for unit or course credit toward a degree in accounting", "or business administration.", "Identification of key strategic insights.", "", "Supervision of research and writing, and final editing of articles", "and comments for publication in the Review of Law and Social", "Justice. For officers of the Review. Open to law students only.", "Graded IP to CR/D/F.", "", "The role of planning in sustaining community health.", "")),
class = "data.frame", row.names = c(NA, -16L))
X_collapsed <- X
for(i in seq(nrow(X), 2, -1)) { # Work on the table from bottom to top so values can be merged upwards
if(is.na(X_collapsed[i, "SECTION"])) { # only work on rows with NA in the SECTION column
# Work on the current row and the previous row.
# Don't modify numeric columns (don't want to merge NA values).
# Use lead() to look at the next row, turning NA values into an empty string.
# Use paste() to combine the row values.
# Use trimws() to get rid of any excess white space produced.
X_collapsed[i-1,] <- (X_collapsed[c(i-1,i),] %>%
mutate(across(everything(), ~ if (is.numeric(.x)) .x else trimws(paste(.x, ifelse(is.na(lead(.x)), "", lead(.x))))))
)[1, ]
}
}
X_collapsed <- X_collapsed %>%
filter(!is.na(SECTION)) # remove the rows that were merged upwards
X_collapsed
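This also removes extra whitespace thanks to trimws() (base R has no trim() function); without it you may end up with trailing spaces. Because is.numeric() returns a single TRUE/FALSE for the whole column, a plain if/else is used rather than the vectorised ifelse().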
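A loop-free alternative, sketched in base R: give every record a number with cumsum(!is.na(SECTION)), so each spill-over row inherits the number of the record above it, then split the data frame by that number and collapse each piece to one row. The data is a hand-picked subset of the question's rows and columns, and collapse_col is a hypothetical helper. It assumes the first row always has a non-NA SECTION; otherwise leading spill rows would form their own group 0.

```r
X <- data.frame(
  SECTION = c(21290L, NA, 3359L, NA, NA),
  COURSE_TITLE = c("Strategic Public Relations Research, Analysis", "and Insights",
                   "Review of Law and Social Justice Editing", "", ""),
  TOTAL_ENR = c(17L, NA, 13L, NA, NA),
  COURSE_DESCRIPTION = c("Identification of key strategic insights.", "",
                         "Supervision of research and writing, and final editing of articles",
                         "and comments for publication in the Review of Law and Social",
                         "Justice. For officers of the Review. Open to law students only.")
)

# Each non-NA SECTION starts a new record; spill-over rows inherit its number.
grp <- cumsum(!is.na(X$SECTION))

# Numeric columns keep the record's first value; character columns are
# pasted together with single spaces, skipping empty strings and NAs.
collapse_col <- function(v) {
  if (is.numeric(v)) v[1] else paste(v[!is.na(v) & v != ""], collapse = " ")
}

pieces <- lapply(split(X, grp),
                 function(d) as.data.frame(lapply(d, collapse_col), stringsAsFactors = FALSE))
X_collapsed <- do.call(rbind, pieces)
rownames(X_collapsed) <- NULL
X_collapsed
```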

Conditional str_remove based on data frame column

I have a data frame (pasted below) in which I am trying to blank the value of one column based on the values of other columns. The idea is that if X6 equals Nbre CV, or X6 equals Nbre BVD, then X6 should be blank for that row.
Unfortunately, with the following code the entire X6 column turns to NA.
extractstack <- extractstack %>%
mutate(across(everything(), as.character) %>%
mutate(X6 = if_else(X6 == `Nbre CV`, str_remove(X6, `Nbre CV`), X6)) %>%
mutate(X6 = if_else(X6 == `Nbre CV`, str_remove(X6, `Nbre BVD`), X6)))
structure(list(X1 = c("", "", "40", "", "", "41", "", "", "42",
"", "", "43", "", "", "44", ""), X2 = c("", "", "EP. KAPALA",
"", "", "INST. MOTULE", "", "", "CABANE BABOA", "", "", "CABANE BANANGI",
"", "", "E.P.BINZI", ""), X3 = c("", "", "MOBATI-BOYELE", "",
"", "MOBATI-BOYELE", "", "", "MOBATI-BOYELE", "", "", "AVURU-GATANGA",
"", "", "AVURU-GATANGA", ""), X4 = c("", "", "BOGBASA", "", "",
"BOSOBEA", "", "", "BOSOBEA", "", "", "BANANGI", "", "", "GURUZA",
""), X5 = c("", "", "", "", "", "MOBENGE", "", "", "BABOA", "",
"", "DIFONGO", "", "", "DULIA", ""), X6 = c("", "", "BOGBASA",
"", "", "", "1", "", "", "1", "", "", "1", "", "", "1"), X7 = c("1",
"", "", "1", "", "", "4", "", "", "1", "", "", "1", "", "", "5"
), X8 = c("2", "", "", "2", "", "", "510 110", "", "", "510 111",
"", "", "510 112", "", "", "510 113"), X9 = c("510 108", "",
"", "510 109", "", "", "A - D", "", "", "A", "", "", "A", "",
"", "A - E"), page = c("4", "4", "4", "4", "5", "5", "5", "5",
"5", "5", "5", "5", "5", "5", "5", "5"), Plage = c("A - B", NA,
NA, "A - B", NA, NA, "A - D", NA, NA, "A", NA, NA, "A", NA, NA,
"A - E"), `Code SV` = c("510 108", NA, NA, "510 109", NA, NA,
"510 110", NA, NA, "510 111", NA, NA, "510 112", NA, NA, "510 113"
), `Nbre BVD` = c("2", NA, NA, "2", NA, NA, "4", NA, NA, "1",
NA, NA, "1", NA, NA, "5"), `Nbre CV` = c("1", NA, NA, "1", NA,
NA, "1", NA, NA, "1", NA, NA, "1", NA, NA, "1")), class = "data.frame", row.names = c(NA,
-16L))
This is essentially Chris Ruehlemann's answer (I don't know why he deleted it; I would happily remove this one in favor of the original):
library(dplyr)
extractstack %>%
mutate(across(everything(), as.character),
X6 = coalesce(ifelse(X6 == `Nbre BVD` | X6 == `Nbre CV`, "", X6), X6))
This compares X6 with the columns Nbre BVD and Nbre CV. If either matches, X6 is changed to an empty string ""; otherwise X6 stays unchanged. Where Nbre BVD or Nbre CV is NA, the comparison is NA, ifelse() returns NA and coalesce() falls back to the original X6, so NA values never overwrite X6. For the data given, this blanks X6 in rows 7, 10, 13 and 16, where X6 is "1" and equals Nbre CV; every other row is left unchanged.
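To check which rows the comparison actually hits, here is a base R sketch using the X6, Nbre BVD and Nbre CV columns copied from the dput above (renamed to nbre_bvd and nbre_cv for brevity). Instead of relying on coalesce(), it makes the NA handling explicit: == returns NA when either side is NA, so each comparison is guarded with !is.na(), and FALSE & NA is FALSE in R.

```r
X6       <- c("", "", "BOGBASA", "", "", "", "1", "", "", "1", "", "", "1", "", "", "1")
nbre_bvd <- c("2", NA, NA, "2", NA, NA, "4", NA, NA, "1", NA, NA, "1", NA, NA, "5")
nbre_cv  <- c("1", NA, NA, "1", NA, NA, "1", NA, NA, "1", NA, NA, "1", NA, NA, "1")

# TRUE only where X6 equals a non-missing Nbre BVD or Nbre CV value
hit <- (!is.na(nbre_bvd) & X6 == nbre_bvd) | (!is.na(nbre_cv) & X6 == nbre_cv)
which(hit)

X6_blanked <- ifelse(hit, "", X6)
X6_blanked
```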

Combining 2 columns using ifelse drops a variable in R

I am trying to combine the male and female columns I have created into one column. I tried some answers I found on Stack Overflow, but the second sex I queried was excluded.
Build Data Frame:
ID <- 1:10
SPAYDT <- c("", "2011-12-01", "", "2006-05-01", "", "", "", "", "", "")
SPAYDTU <- c(1, NA, NA, NA, NA, NA, NA, NA, NA, NA)
NEUTDT <- c("", "", "", "", "", "", "2013-03-01", "", "", "")
NEUTDTU <- c(NA, NA, NA, NA, NA, NA, NA, 1, NA, NA)
df <- as.data.frame(cbind(ID, SPAYDT, SPAYDTU, NEUTDT, NEUTDTU))
df
The goal is to have a sex column, formatted as a factor with 2 levels: Male and Female.
It should say female if the SPAYDT or SPAYDTU have a value in them, and male if the NEUTDT or NEUTDTU have a value in them.
What I have tried:
using a nested if-else statement to build one sex column
making two columns then combining using
df$male <- ifelse(NEUTDT!="", "Male",
ifelse(NEUTDTU=1, "Male", NA))
df$female <- ifelse(SPAYDT!="", "Female",
ifelse(SPAYDTU==1, "Female", NA))
df$sex <- ifelse(!is.na(df$female), df$female, df$male)
and
df$sex <- ifelse(SPAYDT!="", "Female",
ifelse(SPAYDTU==1, "Female",
ifelse(NEUTDT!="", "Male",
ifelse(NEUTDTU=1, "Male", NA))))
However, no matter what I do, the sex column at the end only has one sex. I made sure my df was attached for use of column names as variables. I tried restarting R and running the setup code again. I just don't know why the ifelse statement is ignoring the second sex input.
Any help is greatly appreciated!
Clarifications:
In the larger dataframe I am working with I have done data clean up so that each ID only corresponds to 1 sex. Sorry about the mistake in the code.
Desired output:
ID <- 1:10
SPAYDT <- c("", "2011-12-01", "", "2006-05-01", "", "", "", "", "", "")
SPAYDTU <- c(1, NA, NA, NA, NA, NA, NA, NA, NA, NA)
NEUTDT <- c("", "", "", "", "", "", "2013-03-01", "", "", "")
NEUTDTU <- c(NA, NA, NA, NA, NA, NA, NA, 1, NA, NA)
SEX <- c("Female", "Female", NA, "Female", NA, NA, "Male", "Male", NA, NA)
df <- as.data.frame(cbind(ID, SPAYDT, SPAYDTU, NEUTDT, NEUTDTU, SEX))
df
Is this what you are after?
library(dplyr)
ID <- 1:10
SPAYDT <- c("", "2011-12-01", "", "2006-05-01", "", "", "", "", "", "")
SPAYDTU <- c(1,NA,NA,NA,NA,NA,NA,NA,NA,NA)
NEUTDT <- c("", "", "", "", "", "", "2013-03-01", "", "", "")
NEUTDTU <- c(NA,NA,NA,NA,NA,NA,NA,1,NA,NA)
df <- data.frame(ID, SPAYDT, SPAYDTU, NEUTDT, NEUTDTU)
df %>%
mutate(
sex = case_when(
NEUTDT!="" | NEUTDTU==1 ~ "Male",
SPAYDT!="" | SPAYDTU==1 ~ "Female",
TRUE ~ NA_character_))
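A likely reason the nested ifelse() drops the second sex: SPAYDTU == 1 evaluates to NA on every row where SPAYDTU is NA, and ifelse() returns NA for an NA test instead of falling through to its "no" branch, so the Male test is never reached. case_when() treats an NA condition as FALSE, which is why the answer above works. A base R sketch of the same fix, guarding each comparison with !is.na() and returning the two-level factor the question asks for:

```r
SPAYDT  <- c("", "2011-12-01", "", "2006-05-01", "", "", "", "", "", "")
SPAYDTU <- c(1, NA, NA, NA, NA, NA, NA, NA, NA, NA)
NEUTDT  <- c("", "", "", "", "", "", "2013-03-01", "", "", "")
NEUTDTU <- c(NA, NA, NA, NA, NA, NA, NA, 1, NA, NA)

# The pitfall: an NA test makes ifelse() return NA rather than its "no" value
ifelse(c(NA, FALSE), "Female", "other")   # returns c(NA, "other")

# Guard each comparison so that NA counts as "no match"
is_female <- SPAYDT != "" | (!is.na(SPAYDTU) & SPAYDTU == 1)
is_male   <- NEUTDT != "" | (!is.na(NEUTDTU) & NEUTDTU == 1)

sex <- ifelse(is_female, "Female", ifelse(is_male, "Male", NA))
sex <- factor(sex, levels = c("Male", "Female"))
sex
```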

Loop through an R data frame and pass rows as parameters to a function

I want to loop through a data frame and pass its rows as arguments to a function that summarises the totals from a data frame named df3.
I have tried code using a traditional for loop, but there are no results.
I have looked at pmap in https://adv-r.hadley.nz/functionals.html#pmap
but I don't see how to apply that example to my code.
Here is some data from the original data:
dput(head(df3,n=3))
structure(list(id = c("81", "83", "85"), look_work = c("yes",
"yes", "yes"), current_work = c("no", "yes", "no"), hf_l5k = c("",
"", ""), ac_l5k = c("", "", ""), hf_5_10k = c("", "1", "1"),
ac_5_10k = c("", "1", "1"), hf_11_20k = c("", "", ""), ac_11_20k = c("",
"", ""), hf_21_50k = c("", "", ""), ac_21_50k = c("", "",
""), hf_51_100k = c("", "", ""), ac_51_100k = c("", "", ""
), hf_m100k = c("", "", ""), ac_m100k = c("", "", ""), s_l1000 = c("",
"", ""), se_l1000 = c("", "", "1"), s_1001_1500 = c("", "1",
"1"), se_1001_1500 = c("", "", ""), s_2001_3000 = c("", "",
""), se_2001_3000 = c("", "1", ""), s_3001_4000 = c("", "",
""), se_3001_4000 = c("", "", ""), s_4001_5000 = c("", "",
""), se_4001_5000 = c("", "", ""), s_5001_6000 = c("", "",
""), se_5001_6000 = c("", "", ""), s_m6000 = c("", "", ""
), se_m6000 = c("", "", ""), s_n_ans = c("", "", ""), se_n_ans = c("",
"", ""), before_work = c("no", "NULL", "yes"), keen_move = c("yes",
"yes", "no"), city_size = c("village", "more than 500k inhabitants",
"more than 500k inhabitants"), gender = c("male", "female",
"female"), age = c("18 - 24 years", "18 - 24 years", "more than 50 years"
), education = c("secondary", "vocational", "secondary")), row.names = c(NA,
3L), class = "data.frame")
Here is the dataframe hf_names for the parameters:
structure(list(hf_names = c("hf_l5k", "hf_5_10k", "hf_11_20k",
"hf_21_50k", "hf_51_100k", "hf_m100k"), job = c("hf_l5k_job",
"hf_5_10k_job", "hf_11_20k_job", "hf_21_50k_job", "hf_51_100k_job",
"hf_m100k_job"), tot = c("hf_l5k_tot", "hf_5_10k_tot", "hf_11_20k_tot",
"hf_21_50k_tot", "hf_51_100k_tot", "hf_m100k_tot")), class = "data.frame", row.names = c(NA,
-6L))
Here is the code I have tried with a traditional for loop:
library(dplyr)
tot_function <- function(df, filter_tot, col_name1, col_name2) {
# filter desired columns for all jobs
filter_tot <- df %>% filter(col_name1=="1") %>%
summarise(col_name2 = n())
}
for (i in seq_along(hf_names3)) {
tot_function(df3, hf_names3$tot[i], hf_names3$hf_names[i], hf_names3$job[i])
}
The expected results would be dataframes or vectors:
hf_l5k_jobs hf_l5_10k_jobs
10 193
but nothing is generated by this code; the pmap examples I looked at only use simple functions such as trim and runif.
I don't think you need to overcomplicate this. You can take each name from hf_names, extract that column from df3 and count the number of 1s in it.
sapply(hf_names$hf_names, function(x) sum(df3[[x]] == 1))
# hf_l5k hf_5_10k hf_11_20k hf_21_50k hf_51_100k hf_m100k
# 0 2 0 0 0 0
If you prefer the tidyverse, you can replace sapply with one of the purrr map_*() variants:
purrr::map_int(hf_names$hf_names, ~sum(df3[[.]] == 1))
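For comparison with the original for loop, here is a base R sketch that keeps the loop but fixes three things that stop it from producing anything: it iterates over rows with seq_len(nrow(...)) (seq_along() on a data frame iterates over its columns), it looks the column up by name with df[[col]] (inside filter(), col_name1 == "1" compares the string "hf_l5k" itself to "1", which is never TRUE; dplyr's .data[[col_name1]] is the by-name lookup there), and it stores each result instead of discarding it. count_ones is a hypothetical helper, and df3 and hf_names are cut down to two columns of the question's data.

```r
df3 <- data.frame(hf_l5k = c("", "", ""), hf_5_10k = c("", "1", "1"))
hf_names <- data.frame(hf_names = c("hf_l5k", "hf_5_10k"),
                       job = c("hf_l5k_job", "hf_5_10k_job"))

# Hypothetical helper: number of rows of df whose column `col` equals "1"
count_ones <- function(df, col) sum(df[[col]] == "1", na.rm = TRUE)

totals <- integer(nrow(hf_names))          # pre-allocate one slot per parameter row
for (i in seq_len(nrow(hf_names))) {       # loop over rows, not columns
  totals[i] <- count_ones(df3, hf_names$hf_names[i])   # keep the result
}
names(totals) <- hf_names$job
totals
```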

Conditional means based on other columns in R with dplyr

Let's say I have the following data:
structure(list(political_spectrum = c(5L, 15L, 12L, 30L, 100L,
0L, 27L, 52L, 38L, 64L, 0L, 0L, 76L, 50L, 16L, 16L, 0L, 23L,
0L, 25L, 68L, 50L, 4L, 0L, 50L), politics_today = c("Independent",
"Strong Democrat", "Weak Democrat", "Weak Democrat", "Weak Republican",
"Strong Democrat", "Weak Democrat", "Weak Democrat", "Independent",
"Weak Democrat", "Strong Democrat", "Independent", "Weak Republican",
"Weak Democrat", "Weak Democrat", "Strong Democrat", "Strong Democrat",
"Strong Democrat", "Strong Democrat", "Strong Democrat", "Independent",
"Independent", "Strong Democrat", "Strong Democrat", "Independent"
), stranger_things_universe_mc = c("The Demagorgon", "", "",
"", "", "", "", "", "", "The Stranger Land", "The Demagorgon",
"The Upside Down", "", "", "", "", "", "The Upside Down", "The Shadowland",
"", "", "", "", "", "The Shadowland"), stranger_things_universe_answer = c("The Upside Down",
"", "", "", "", "", "", "", "", "The Upside Down", "The Upside Down",
"The Upside Down", "", "", "", "", "", "The Upside Down", "The Upside Down",
"", "", "", "", "", "The Upside Down"), stranger_things_universe_confidence = c(32L,
NA, NA, NA, NA, NA, NA, NA, NA, 67L, 94L, 89L, NA, NA, NA, NA,
NA, 51L, 10L, NA, NA, NA, NA, NA, 0L), stranger_things_universe_importance = c("Don't care at all",
"", "", "", "", "", "", "", "", "Care somewhat strongly", "Care a little",
"Care somewhat strongly", "", "", "", "", "", "Care somewhat",
"Don't care at all", "", "", "", "", "", "Don't care at all"),
tupac_mc = c("", "Biggie Smalls", "", "", "", "", "", "Biggie Smalls",
"Biggie Smalls", "", "", "Biggie Smalls", "", "", "", "",
"", "", "Biggie Smalls", "", "", "Ice Cube", "", "", ""),
tupac_answer = c("", "Biggie Smalls", "", "", "", "", "",
"Biggie Smalls", "Biggie Smalls", "", "", "Biggie Smalls",
"", "", "", "", "", "", "Biggie Smalls", "", "", "Biggie Smalls",
"", "", ""), tupac_confidence = c(NA, 70L, NA, NA, NA, NA,
NA, 71L, 76L, NA, NA, 100L, NA, NA, NA, NA, NA, NA, 100L,
NA, NA, 32L, NA, NA, NA), tupac_importance = c("", "Don't care at all",
"", "", "", "", "", "Care somewhat", "Don't care at all",
"", "", "Care strongly", "", "", "", "", "", "", "Care a little",
"", "", "Don't care at all", "", "", ""), uber_ceo_mc = c("John Zimmer",
"", "", "", "", "Travis Kalanick", "", "", "", "Travis Kalanick",
"", "", "", "", "", "", "", "John Zimmer", "Travis Kalanick",
"Travis Kalanick", "", "", "", "", ""), uber_ceo_answer = c("Travis Kalanick",
"", "", "", "", "Travis Kalanick", "", "", "", "Travis Kalanick",
"", "", "", "", "", "", "", "Travis Kalanick", "Travis Kalanick",
"Travis Kalanick", "", "", "", "", ""), uber_ceo_confidence = c(0L,
NA, NA, NA, NA, 94L, NA, NA, NA, 69L, NA, NA, NA, NA, NA,
NA, NA, 5L, 13L, 17L, NA, NA, NA, NA, NA), uber_ceo_importance = c("Don't care at all",
"", "", "", "", "Care strongly", "", "", "", "Care somewhat",
"", "", "", "", "", "", "", "Don't care at all", "Don't care at all",
"Care somewhat", "", "", "", "", ""), black_panther_mc = c("",
"T'Chaka", "", "", "", "", "", "", "", "", "", "", "", "",
"", "", "", "", "", "", "", "", "T'Chaka", "", ""), black_panther_answer = c("",
"T'Challa", "", "", "", "", "", "", "", "", "", "", "", "",
"", "", "", "", "", "", "", "", "T'Challa", "", ""), black_panther_confidence = c(NA,
63L, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, 34L, NA, NA), black_panther_importance = c("",
"Don't care at all", "", "", "", "", "", "", "", "", "",
"", "", "", "", "", "", "", "", "", "", "", "Care a little",
"", ""), the_office_mc = c("The Mindy Project", "", "", "",
"", "", "", "", "", "", "", "", "", "", "The Office", "",
"", "The Mindy Project", "", "", "", "", "The Office", "",
""), the_office_answer = c("The Office", "", "", "", "",
"", "", "", "", "", "", "", "", "", "The Office", "", "",
"The Office", "", "", "", "", "The Office", "", ""), the_office_confidence = c(43L,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 2L, NA,
NA, 11L, NA, NA, NA, NA, 100L, NA, NA), the_office_importance = c("Don't care at all",
"", "", "", "", "", "", "", "", "", "", "", "", "", "Don't care at all",
"", "", "Care a little", "", "", "", "", "Care a little",
"", ""), arms_manufacturing_company_mc = c("J. Brockton & Sons",
"", "", "O.F. Mossberg & Sons", "", "", "", "", "", "", "",
"", "J. Brockton & Sons", "", "", "", "", "", "", "", "",
"", "", "", "J. Brockton & Sons"), arms_manufacturing_company_answer = c("J. Brockton & Sons",
"", "", "J. Brockton & Sons", "", "", "", "", "", "", "",
"", "J. Brockton & Sons", "", "", "", "", "", "", "", "",
"", "", "", "J. Brockton & Sons"), arms_manufacturing_company_confidence = c(91L,
NA, NA, 24L, NA, NA, NA, NA, NA, NA, NA, NA, 37L, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, 100L), arms_manufacturing_company_importance = c("Don't care at all",
"", "", "Don't care at all", "", "", "", "", "", "", "",
"", "Don't care at all", "", "", "", "", "", "", "", "",
"", "", "", "Don't care at all")), class = c("data.table",
"data.frame"), row.names = c(NA, -25L))
I'm trying to do something like the following:
test %>%
gather(name, value, -c('political_spectrum', 'politics_today')) %>%
filter(value != "") %>%
mutate(question_id = sub("_[^_]+$", "", name)) %>%
mutate(confidence = grepl("_confidence", name)) %>%
group_by(politics_today, question_id) %>%
summarize(mean_confidence = mean(value[confidence == "TRUE"]))
With this I want the mean_confidence for each political affiliation, computed only over specific rows of the value column. To take the mean only over the "confidence" rows, I am trying to filter via mean(value[confidence == "TRUE"]), but I am not sure of the correct way to do this.
I think you need to change your code to
library(tidyverse)
test %>%
gather(name, value, -c('political_spectrum', 'politics_today')) %>%
filter(value != "") %>%
mutate(question_id = sub("_[^_]+$", "", name),
confidence = grepl("_confidence", name)) %>%
group_by(politics_today, question_id) %>%
summarize(mean_confidence = mean(as.numeric(value[confidence])))
# politics_today question_id mean_confidence
# <chr> <chr> <dbl>
# 1 Independent arms_manufacturing_company 95.5
# 2 Independent stranger_things_universe 40.3
# 3 Independent the_office 43
# 4 Independent tupac 69.3
# 5 Independent uber_ceo 0
# 6 Strong Democrat black_panther 48.5
# 7 Strong Democrat stranger_things_universe 51.7
# 8 Strong Democrat the_office 55.5
# 9 Strong Democrat tupac 85
#10 Strong Democrat uber_ceo 32.2
#11 Weak Democrat arms_manufacturing_company 24
#12 Weak Democrat stranger_things_universe 67
#13 Weak Democrat the_office 2
#14 Weak Democrat tupac 71
#15 Weak Democrat uber_ceo 69
#16 Weak Republican arms_manufacturing_company 37
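Because your value column holds both numeric and character values, gather() converts it to a character column, so you need to convert the values where confidence is TRUE back to numeric with as.numeric(). Since confidence is already logical, value[confidence] selects those rows directly; comparing it to the string "TRUE" only works through coercion.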
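The same grouped, conditional mean can be sketched in base R: keep only the *_confidence rows of the long data, convert value to numeric, then use aggregate(). The small long-format data frame below is hand-built from the question's tupac columns (the Strong Democrat and Independent rows), so its means should agree with the tupac lines of the output above (85 and about 69.3).

```r
long <- data.frame(
  politics_today = c("Strong Democrat", "Strong Democrat", "Strong Democrat",
                     "Independent", "Independent", "Independent", "Independent"),
  name  = c("tupac_confidence", "tupac_mc", "tupac_confidence",
            "tupac_confidence", "tupac_confidence", "tupac_importance", "tupac_confidence"),
  value = c("70", "Biggie Smalls", "100", "76", "100", "Care strongly", "32")  # mixed, so character
)
long$question_id <- sub("_[^_]+$", "", long$name)   # "tupac_confidence" -> "tupac"

conf <- long[grepl("_confidence$", long$name), ]    # keep only the confidence rows
conf$value <- as.numeric(conf$value)                 # now safe to convert

res <- aggregate(value ~ politics_today + question_id, data = conf, FUN = mean)
res
```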
