i would like to index by column name within the sum command using the sequence operator.
library(dbplyr)
library(tidyverse)
df=data.frame(
X=c("A","B","C"),
X.1=c(1,2,3),X.2=c(1,2,3),X.3=c(1,2,3),X.4=c(1,2,3),X.5=c(1,2,3),X.6=c(1,2,3),X.7=c(1,2,3),X.8=c(1,2,3),X.9=c(1,2,3),X.10=c(1,2,3),
X.11=c(1,2,3),X.12=c(1,2,3),X.13=c(1,2,3),X.14=c(1,2,3),X.15=c(1,2,3),X.16=c(1,2,3),X.17=c(1,2,3),X.18=c(1,2,3),X.19=c(1,2,3),X.20=c(1,2,3),
X.21=c(1,2,3),X.22=c(1,2,3),X.23=c(1,2,3),X.24=c(1,2,3),X.25=c(1,2,3),X.26=c(1,2,3),X.27=c(1,2,3),X.28=c(1,2,3),X.29=c(1,2,3),X.30=c(1,2,3),
X.31=c(1,2,3),X.32=c(1,2,3),X.33=c(1,2,3),X.34=c(1,2,3),X.35=c(1,2,3),X.36=c(1,2,3),X.37=c(1,2,3),X.38=c(1,2,3),X.39=c(1,2,3),X.40=c(1,2,3),
X.41=c(1,2,3),X.42=c(1,2,3),X.43=c(1,2,3),X.44=c(1,2,3),X.45=c(1,2,3),X.46=c(1,2,3),X.47=c(1,2,3),X.48=c(1,2,3),X.49=c(1,2,3),X.50=c(1,2,3),
X.51=c(1,2,3),X.52=c(1,2,3),X.53=c(1,2,3),X.54=c(1,2,3),X.55=c(1,2,3),X.56=c(1,2,3))
Is there a quicker way todo this. The following provides the correct result. However, for large datasets (larger than this one ) it becomes vary laborious to deal with especially when pivot_wider is used and the columns are not created before hand (like above)
df %>% rowwise() %>% mutate(
Result_column=case_when(
X=="A"~ sum(c(X.1,X.2,X.3,X.4,X.5)),
X=="B"~ sum(c(X.4,X.5)),
X=="C" ~ sum(c( X.3, X.4, X.5, X.6, X.7, X.8, X.9, X.10, X.11, X.12, X.13, X.14, X.15, X.16,
X.17, X.18, X.19, X.20, X.21, X.22, X.23, X.24, X.25, X.26, X.27, X.28, X.29, X.30,
X.31, X.32, X.33, X.34, X.35, X.36, X.37, X.38, X.39, X.40, X.41, X.42,X.43, X.44,
X.45, X.46, X.47, X.48, X.49, X.50, X.51, X.52, X.53, X.54, X.55, X.56)))) %>% dplyr::select(Result_column)
The following is the how it would be used when using "select" syntax, which is that i would like to use. However, does not provide correct numerical solution. One can shorter the code by ~50 entries, by using a sequence operator ":".
df %>% rowwise() %>% mutate(
Result_column=case_when(
X=="A"~ sum(c(X.1:X.5)),
X=="B"~ sum(c(X.4:X.5)),
X=="C" ~ sum(c(X.3:X.56)))) %>% dplyr::select(Result_column)
below is a related question, however, not the same because what is needed is not a column that starts with "X" but rather a sequence.
Using mutate rowwise over a subset of columns
EDIT:
the provided code (below) from cnbrowlie is correct.
df %>% mutate(
Result_column=case_when(
X=="A"~ sum(c(X.1:X.5)),
X=="B"~ sum(c(X.4:X.5)),
X=="C" ~ sum(c(X.3:X.56)))) %>% dplyr::select(Result_column)
This can be done with dplyr>=1.0.0 using rowSums() (which computes the sum for a row across multiple columns) and across() (which superceded vars() as a method for specifying columns in a dataframe, allowing the use of : to select sequences of columns):
df %>% rowwise() %>% mutate(
Result_column=case_when(
X=="A"~ rowSums(across(X.1:X.5)),
X=="B"~ rowSums(across(X.4:X.5)),
X=="C" ~ rowSums(across(X.3:X.56))
)
) %>% dplyr::select(Result_column)
library(tidyverse)
library(ggmosaic) for "happy" dataset.
I feel like this should be a somewhat simple thing to achieve, but I'm having difficulty with percentages when using purrr::map together with table(). Using the "happy" dataset, I want to create a list of frequency tables for each factor variable. I would also like to have rounded percentages instead of counts, or both if possible.
I can create frequency precentages for each factor variable separately with the code below.
with(happy,round(prop.table(table(marital)),2))
However I can't seem to get the percentages to work correctly when using table() with purrr::map. The code below doesn't work...
happy%>%select_if(is.factor)%>%map(round(prop.table(table)),2)
The second method I tried was using tidyr::gather, and calculating the percentage with dplyr::mutate and then splitting the data and spreading with tidyr::spread.
TABLE<-happy%>%select_if(is.factor)%>%gather()%>%group_by(key,value)%>%summarise(count=n())%>%mutate(perc=count/sum(count))
However, since there are different factor variables, I would have to split the data by "key" before spreading using purrr::map and tidyr::spread, which came close to producing some useful output except for the repeating "key" values in the rows and the NA's.
TABLE%>%split(TABLE$key)%>%map(~spread(.x,value,perc))
So any help on how to make both of the above methods work would be greatly appreciated...
You can use an anonymous function or a formula to get your first option to work. Here's the formula option.
happy %>%
select_if(is.factor) %>%
map(~round(prop.table(table(.x)), 2))
In your second option, removing the NA values and then removing the count variable prior to spreading helps. The order in the result has changed, however.
TABLE = happy %>%
select_if(is.factor) %>%
gather() %>%
filter(!is.na(value)) %>%
group_by(key, value) %>%
summarise(count = n()) %>%
mutate(perc = round(count/sum(count), 2), count = NULL)
TABLE %>%
split(.$key) %>%
map(~spread(.x, value, perc))