Labels not parsed in Expss for loop - r

I'm new to R and trying to explore my variables by group. I'm using a for loop to pass each relevant variable name to expss.
Here is a reproducible example:
require(expss)
require(dplyr)
colnoms <- as.data.frame(HairEyeColor) %>% names(.)
expss_digits(2)
for (i in colnoms){
as.data.frame(HairEyeColor) %>%
tab_cells(get(i)) %>%
tab_cols(Eye) %>%
tab_stat_cpct() %>%
tab_last_sig_cpct() %>%
tab_pivot() %>%
set_caption(i) %>%
htmlTable() %>%
print()
}
I expect the name of the variable in the output (Hair, Eye, Sex, Freq), but instead I only get "get(i)".
Thanks for any advice

After get(), the original variable name is no longer available to expss. The simplest way to show the original name is to set the variable name as its label:
require(expss)
data(HairEyeColor)
HairEyeColor <- as.data.frame(HairEyeColor)
colnoms <- names(HairEyeColor)
expss_digits(2)
for (i in colnoms){
# if we don't have label we assign name as label
if(is.null(var_lab(HairEyeColor[[i]]))) var_lab(HairEyeColor[[i]]) = i
HairEyeColor %>%
tab_cells(get(i)) %>%
tab_cols(Eye) %>%
tab_stat_cpct() %>%
tab_last_sig_cpct() %>%
tab_pivot() %>%
set_caption(i) %>%
htmlTable() %>%
print()
}

Related

How to hide columns in r dataframe

I am looking to hide 2 columns in my R dataframe. I have tried to download the "gt" package, but I am unable to. Is there another function to do so?
Thanks in advance.
library("palmerpenguins") # provides the penguins dataset used below
library("dplyr")
library("gt")
tab_1 <-
penguins %>% # table name
dplyr::filter(species == "Adelie") %>%
head(10) %>%
gt() %>%
cols_hide(columns = c(sex, year)) # columns you want to hide
print(tab_1)
Here is an example that you can try with gt:
library(gapminder)
library(dplyr)
library(gt)
gapminder %>%
filter(country == "Germany") %>%
head(10) %>%
gt() %>%
cols_hide(columns = c(lifeExp, pop))
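If gt cannot be installed, one alternative is to drop the columns before printing. The following is a minimal sketch using dplyr::select() with a negative selection; it uses the built-in mtcars data as a stand-in, and the object name visible is only illustrative.

```r
library(dplyr)

# Stand-in data: the first rows of the built-in mtcars data set
visible <- mtcars %>%
  head(10) %>%
  select(-c(vs, am)) # drop the two columns that should not be shown

print(visible)
```

Unlike cols_hide(), this removes the columns from the resulting data frame rather than only hiding them in a rendered table, so keep the original object if the columns are needed later.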

How to apply multiple functions to a list of data frames?

I have a list of more than 50 csv files with the same numbers of columns and rows.
I want to find the percentage of missing values for each of the data frames and I have found the code that works fine with a single file which is the following:
missing.values <- estaciones2 %>%
gather(key = "key", value = "val") %>%
mutate(is.missing = is.na(val)) %>%
group_by(key, is.missing) %>%
summarise(num.missing = n()) %>%
filter(is.missing==T) %>%
select(-is.missing) %>%
arrange(desc(num.missing))
Now I want to apply these functions to each of my data frames in my list.
I read that I can use the map function to loop over the list and run the code for each of my files. However, I am not sure how to insert map into the code shown above. I have tried the following, but it doesn't seem right:
missing.values <- map(estaciones2, ~ map(estaciones2, ~ estaciones2 %>%
gather(key = "key", value = "val") %>%
mutate(is.missing = is.na(val)) %>%
group_by(key, is.missing) %>%
summarise(num.missing = n()) %>%
filter(is.missing==T) %>%
select(-is.missing) %>%
arrange(desc(num.missing)))
We need a lambda function (~) to loop over the list (assuming estaciones2 is a list object). Inside the lambda, .x refers to the current data.frame element of the list:
library(purrr)
library(tidyr)
library(dplyr)
map(estaciones2, ~ .x %>%
gather(key = "key", value = "val") %>%
mutate(is.missing = is.na(val)) %>%
group_by(key, is.missing) %>%
summarise(num.missing = n()) %>%
filter(is.missing==T) %>%
select(-is.missing) %>%
arrange(desc(num.missing)))
In the OP's code, nested map calls operate on the whole list estaciones2 again and again instead of on each of its elements.
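Since the question asks for the percentage of missing values rather than the count, a shorter variant may be enough. This is a sketch under the assumption that every list element is a data frame; estaciones_demo is a made-up stand-in for the real list of CSV contents.

```r
library(purrr)

# Hypothetical stand-in for the list of data frames read from the CSV files
estaciones_demo <- list(
  a = data.frame(x = c(1, NA, 3, NA), y = c(NA, 2, 3, 4)),
  b = data.frame(x = c(1, 2, 3, 4), y = c(NA, NA, NA, 4))
)

# For each data frame, the mean of is.na() per column is the share of
# missing values; multiplying by 100 turns it into a percentage
pct_missing <- map(estaciones_demo, ~ colMeans(is.na(.x)) * 100)

pct_missing
```

The result is a list with one named numeric vector per data frame, which can be inspected element by element.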

using dplyr::group_by in a function within apply

I'd like to produce nice summaries for a selection of grouping variables in my dataset, where for each grouping variable I would show the top 6 frequencies and their associated proportions. I can get this for a single grouping variable using the syntax:
my_db %>%
group_by(my_var) %>%
summarise(n=n()) %>%
mutate(pc=scales::percent(n/sum(n))) %>%
arrange(desc(n)) %>%
head()
How do I modify this expression so it can be used in an apply function?
For example using mtcars, I've tried something like this:
apply(mtcars[c(2:4,11)], 2,
function(x) {
group_by(!!x) %>%
summarise(n=n()) %>%
mutate(pc=scales::percent(n/sum(n))) %>%
arrange(desc(n)) %>% head()
}
)
but it doesn't work. Any idea how I can achieve this?
You should apply over colnames(dat), so each column name is turned into a grouping variable:
dat <- mtcars[c(2:4,11)]
grp <- function(x) {
group_by(dat,!!as.name(x)) %>%
summarise(n=n()) %>%
mutate(pc=scales::percent(n/sum(n))) %>%
arrange(desc(n)) %>% head()
}
lapply(colnames(dat), grp)
apply(mtcars[c(2:4,11)], 2,
function(x) {
mtcars %>%
group_by(x= !!x) %>%
summarise(n=n()) %>%
mutate(pc=scales::percent(n/sum(n))) %>%
arrange(desc(n)) %>% head()
}
)
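In the apply() version above, the only change needed is to pipe the parent data frame into group_by(), so the expression has data to be evaluated against.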
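The same idea can also be written without !!, by selecting the grouping column by name. The sketch below assumes dplyr 1.0 or later (for across() and all_of()); the helper name top_counts is hypothetical, and pc is kept as a numeric proportion rather than a formatted string.

```r
library(dplyr)

# Hypothetical helper: top 6 frequencies and proportions for one column,
# where the column is given as a string
top_counts <- function(data, var) {
  data %>%
    group_by(across(all_of(var))) %>%
    summarise(n = n(), .groups = "drop") %>%
    mutate(pc = n / sum(n)) %>%
    arrange(desc(n)) %>%
    head()
}

# One summary table per grouping variable
summaries <- lapply(c("cyl", "gear", "carb"), top_counts, data = mtcars)
summaries[[1]]
```

Passing column names as strings keeps the function usable with lapply() over any character vector of names.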

R: dynamic variable name comparisons

I recoded a bunch of variables in a dataset and gave the newly recoded variables the prefix "r_".
I'd like to run table on the pairs to ensure the recoding was correct. Something like table(v1, r_v1), but I need to do it for lots of variables. They are not in any particular order, so I couldn't use indexing.
Here is a reproducible example of data one can use (also any tips on optimizing that code are appreciated!).
mtcars %>% select(c(disp,hp)) %>%
mutate_all(funs(if_else(.>100,1,0))) %>%
rename_(.dots=setNames(names(.), paste0('r_', names(.)))) %>%
cbind(mtcars,.)
Any ideas?
I would just use variable names and a simple for loop. Calling your modified data dd:
orig = c("disp", "hp")
trans = paste0("r_", orig)
check_list = list()
for (i in seq_along(orig)) {
check_list[[i]] = table(dd[[orig[i]]], dd[[trans[i]]])
# or whatever other check you want to do
}
check_list
You can then examine the check_list contents one at a time.
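For reference, here is one way the data frame dd and the checks could be produced with current dplyr. It is a sketch that assumes dplyr 1.0 or later, where across() with a .names pattern replaces the mutate_all(funs()) and rename_() combination; the recoding rule is the one from the question.

```r
library(dplyr)

# Append recoded copies of disp and hp with the "r_" prefix
dd <- mtcars %>%
  mutate(across(c(disp, hp), ~ if_else(.x > 100, 1, 0), .names = "r_{.col}"))

orig <- c("disp", "hp")

# One cross-tabulation per original/recoded pair, named by the original variable
check_list <- lapply(setNames(orig, orig), function(v) {
  table(dd[[v]], dd[[paste0("r_", v)]])
})

check_list$hp
```

Naming the list elements makes it easier to look up the check for a given variable than indexing by position.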
To keep things in the tidy format with which you started:
library(dplyr)
library(purrr)
library(tidyr)
mtcars %>%
select(disp,hp) %>%
mutate_all(funs(r = if_else(.>100,1,0))) %>%
mutate(index = row_number()) %>%
gather(key = key, value = value, -index) %>%
separate(key, c("Variable", "Type")) %>%
mutate(Type = ifelse(is.na(Type), "Original", "Recode")) %>%
spread(key = Type, value = value) %>%
select(-index) %>%
split(.$Variable) %>%
map(~ select(.,-Variable)) %>%
map(~ table(.))

passing column name as variable in dplyr

Variants of this question have been asked a lot, and I have also read about NSE.
Still I cannot figure this out.
This is easy:
library(dplyr)
data(cars)
cars %>%
group_by(speed) %>%
summarise(d = mean(dist))
Now I want to use a variable x to pass the dist column to mean:
x <- "dist"
Of course this does not work:
cars %>%
group_by(speed) %>%
summarise(d = mean(x))
So I use the SE version of summarise:
cars %>%
group_by(speed) %>%
summarise_(d = mean(x))
OK, that does not work, so I have to add ~ as well:
cars %>%
group_by(speed) %>%
summarise_(d = ~mean(x))
This still does not work, but it does if I use dist instead of x:
cars %>%
group_by(speed) %>%
summarise_(d = ~mean(dist))
This works, but doesn't use x.
cars %>%
group_by(speed) %>%
summarise_(d = ~mean(~x))
This also doesn't work.
I'm basically monkeying around without any idea how to make this work, or why it fails.
cars %>%
group_by(speed) %>%
summarise_each_(funs(mean), vars(matches(x)))
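In current dplyr releases, one approach is the .data pronoun, which looks a column up by the string stored in x. The following is a minimal sketch, assuming dplyr 1.0 or later; it is not taken from the original thread.

```r
library(dplyr)

x <- "dist"

# .data[[x]] refers to the column whose name is stored in x
res <- cars %>%
  group_by(speed) %>%
  summarise(d = mean(.data[[x]]))

res
```

The earlier attempts fail because mean(x) is evaluated on the string "dist" itself, not on the column it names.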
