I want to hide two columns in my R data frame. I tried to install the "gt" package but was unable to. Is there another function that does this?
Thanks in advance.
library("palmerpenguins") # provides the penguins dataset used below
library("dplyr")
library("gt")
tab_1 <-
penguins %>% # table name
dplyr::filter(species == "Adelie") %>%
head(10) %>%
gt() %>%
cols_hide(columns = c(sex, year)) # columns you want to hide
print(tab_1)
Here is an example that you can try with gt:
library(gapminder)
library(dplyr)
library(gt)
gapminder %>%
filter(country == "Germany") %>%
head(10) %>%
gt() %>%
cols_hide(columns = c(lifeExp, pop))
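If gt cannot be installed, one alternative is to drop the columns with dplyr::select() before printing. This is only a sketch of that idea: unlike cols_hide(), which keeps the columns in the underlying table, select() removes them from the data frame. The name germany is just an illustrative variable.

```r
library(gapminder)
library(dplyr)

# Keep the first 10 rows for Germany and drop the two unwanted columns.
germany <- gapminder %>%
  filter(country == "Germany") %>%
  head(10) %>%
  select(-c(lifeExp, pop))

print(germany)
```

The original gapminder data frame is left untouched, so the dropped columns are still available if you need them later.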
Related
I would like to use a tidy approach to produce correlograms by group.
My attempt with iris and libraries dplyr and corrplot:
library(corrplot)
library(dplyr)
par(mfrow=c(2,2))
iris %>%
group_by(Species) %>%
group_map(~ corrplot::corrplot(cor(.x,use = "complete.obs"),tl.cex=0.7,title =""))
It works but I would like to add the Species name on each plot.
Also, any other tidy approaches/ functions are very welcome!
We could use cur_group()
library(dplyr)
library(corrplot)
out <- iris %>%
group_by(Species) %>%
summarise(outr = list( corrplot::corrplot(cor(cur_data(),
use = "complete.obs"),tl.cex=0.7,title = cur_group()[[1]])))
Or, if we are using group_map, note that .keep is FALSE by default, so the grouping column is dropped from .x. Set it to TRUE and extract the group value:
iris %>%
group_by(Species) %>%
group_map(~ corrplot::corrplot(cor(select(.x, where(is.numeric)),
use = "complete.obs"),tl.cex=0.7,title = first(.x$Species)), .keep = TRUE)
You can use a split-and-map approach with imap:
library(dplyr)
library(purrr)
iris %>%
split(.$Species) %>%
imap(~corrplot::corrplot(cor(.x[-5],use ="complete.obs"),tl.cex=0.7,title =.y))
Below is my data
library(gapminder)
library(tidyverse)
lst <- unique(gapminder$continent)
ylst = c(2007, 1952)
map2_dfr(lst,ylst, ~gapminder %>% filter(continent == .x & year == .y) %>%
arrange(desc(gdpPercap))
%>% slice(1) %>% select(continent, country,gdpPercap,year))
The data is the gapminder data from the R library 'gapminder'.
I want to find the country with the highest gdpPercap for each year for each continent using purrr.
However, this code gives an error saying that the lengths of my two lists are not the same.
What is the map syntax to iterate over two lists, when the lengths are not the same? And how should I use that to fix the code and achieve my objective?
I would do this by grouping and nesting:
gapminder %>%
filter(year %in% ylst) %>%
group_by(continent, year) %>%
nest() %>%
mutate(data=map(data, ~top_n(., 1, gdpPercap))) %>%
unnest(c(data)) %>%
select(continent, country,gdpPercap,year)
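If you do want to stay with map2, one way to handle inputs of different lengths is to build every continent/year combination first, so that both vectors passed to map2_dfr() have the same length. The following is a rough sketch of that idea, not the only way to do it. It assumes tidyr's expand_grid() and dplyr's slice_max() are available; combos and top_gdp are illustrative names.

```r
library(gapminder)
library(dplyr)
library(purrr)
library(tidyr)

ylst <- c(2007, 1952)

# One row per continent/year pair: 5 continents x 2 years = 10 rows.
combos <- expand_grid(
  continent = as.character(unique(gapminder$continent)),
  year = ylst
)

# map2_dfr() now walks two vectors of equal length.
top_gdp <- map2_dfr(combos$continent, combos$year, function(cont, yr) {
  gapminder %>%
    filter(continent == cont, year == yr) %>%
    slice_max(gdpPercap, n = 1, with_ties = FALSE) %>%
    select(continent, country, gdpPercap, year)
})

print(top_gdp)
```

map2() always pairs elements position by position, so it cannot cross two vectors by itself. Expanding the grid up front makes the pairing explicit.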
I'm new to R and trying to explore my variables by group. I'm using a for loop to pass all suitable variable names to expss.
Here is a reproducible example:
require(expss)
require(dplyr)
colnoms <- as.data.frame(HairEyeColor) %>% names(.)
expss_digits(2)
for (i in colnoms){
as.data.frame(HairEyeColor) %>%
tab_cells(get(i)) %>%
tab_cols(Eye) %>%
tab_stat_cpct() %>%
tab_last_sig_cpct() %>%
tab_pivot() %>%
set_caption(i) %>%
htmlTable() %>%
print()
}
I expect the variable name (Hair, Eye, Sex, Freq) to appear in the output, but instead I only get "get(i)".
Thanks for any advice
After get() is called, there is no way to recover the original variable name. The simplest way to show the original name is to set the variable name as its label:
require(expss)
data(HairEyeColor)
HairEyeColor <- as.data.frame(HairEyeColor)
colnoms <- names(HairEyeColor)
expss_digits(2)
for (i in colnoms){
# if we don't have label we assign name as label
if(is.null(var_lab(HairEyeColor[[i]]))) var_lab(HairEyeColor[[i]]) = i
HairEyeColor %>%
tab_cells(get(i)) %>%
tab_cols(Eye) %>%
tab_stat_cpct() %>%
tab_last_sig_cpct() %>%
tab_pivot() %>%
set_caption(i) %>%
htmlTable() %>%
print()
}
I'd like to produce nice summaries for a selection of grouping variables in my dataset, where for each group I would show the top 6 frequencies and their associated proportions. I can get this for a single grouping variable using the syntax:
my_db %>%
group_by(my_var) %>%
summarise(n=n()) %>%
mutate(pc=scales::percent(n/sum(n))) %>%
arrange(desc(n)) %>%
head()
How do I modify this expression so it can be used in an apply function?
For example using mtcars, I've tried something like this:
apply(mtcars[c(2:4,11)], 2,
function(x) {
group_by(!!x) %>%
summarise(n=n()) %>%
mutate(pc=scales::percent(n/sum(n))) %>%
arrange(desc(n)) %>% head()
}
)
but it doesn't work. Any idea how I can achieve this?
You should iterate over colnames(dat) rather than the column values, so that group_by receives a column name it can turn into a grouping variable:
dat <- mtcars[c(2:4,11)]
grp <- function(x) {
group_by(dat,!!as.name(x)) %>%
summarise(n=n()) %>%
mutate(pc=scales::percent(n/sum(n))) %>%
arrange(desc(n)) %>% head()
}
lapply(colnames(dat), grp)
apply(mtcars[c(2:4,11)], 2,
function(x) {
mtcars %>%
group_by(x= !!x) %>%
summarise(n=n()) %>%
mutate(pc=scales::percent(n/sum(n))) %>%
arrange(desc(n)) %>% head()
}
)
You just need to supply the parent data frame so that group_by has something to evaluate against.
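Another option, sketched here on the assumption that your dplyr version supports the .data pronoun, is to look the column up by name inside count(). The helper top_freq and the output column value are hypothetical names chosen for illustration. The proportion is left as a plain number to avoid depending on scales.

```r
library(dplyr)

dat <- mtcars[c(2:4, 11)]

# Return the six most frequent values of one column, with proportions.
top_freq <- function(col) {
  dat %>%
    count(value = .data[[col]], sort = TRUE) %>%  # sort = TRUE orders by n, descending
    mutate(pc = n / sum(n)) %>%
    head()
}

# A named list with one summary table per column.
res <- lapply(setNames(colnames(dat), colnames(dat)), top_freq)
res$cyl
```

Naming the input vector with setNames() means the result can be indexed by column name, for example res$hp.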
Apply function table() to each column of a data.frame using dplyr
I often apply the table-function on each column of a data frame using plyr, like this:
library(plyr)
ldply( mtcars, function(x) data.frame( table(x), prop.table( table(x) ) ) )
Is it possible to do this in dplyr also?
My attempts fail:
mtcars %>% do( table %>% data.frame() )
melt( mtcars ) %>% do( table %>% data.frame() )
You can try the following, which does not rely on the tidyr package:
mtcars %>%
lapply(table) %>%
lapply(as.data.frame) %>%
Map(cbind,var = names(mtcars),.) %>%
bind_rows() %>% # rbind_all() has been removed from dplyr; bind_rows() replaces it
group_by(var) %>%
mutate(pct = Freq / sum(Freq))
Using tidyverse (dplyr and purrr):
library(tidyverse)
mtcars %>%
map( function(x) table(x) )
Or:
mtcars %>%
map(~ table(.x) )
Or simply:
library(tidyverse)
mtcars %>%
map( table )
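The map() calls above return a list of table objects. If you would rather have a single data frame with counts and proportions, closer to what ldply() gave, something along these lines may work. It is a sketch that assumes purrr's map_dfr() is available; freq_tbl, value and prop are illustrative names.

```r
library(dplyr)
library(purrr)

# One row per (column, distinct value) pair, with count and within-column proportion.
freq_tbl <- mtcars %>%
  map_dfr(
    ~ as.data.frame(table(value = .x), stringsAsFactors = FALSE),
    .id = "var"  # records which column each row came from
  ) %>%
  group_by(var) %>%
  mutate(prop = Freq / sum(Freq)) %>%
  ungroup()

head(freq_tbl)
```

Because the values pass through table(), they come back as character strings in the value column.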
In general you probably would not want to run table() on every column of a data frame, because often at least one variable (such as an id field) has all-unique values and produces very long output. However, you can use group_by() and tally() to obtain frequency tables in a dplyr chain. Or you can use count(), which does the group_by() for you.
> mtcars %>%
group_by(cyl) %>%
tally()
> # mtcars %>% count(cyl)
Source: local data frame [3 x 2]
cyl n
1 4 11
2 6 7
3 8 14
If you want to do a two-way frequency table, group by more than one variable.
> mtcars %>%
group_by(gear, cyl) %>%
tally()
> # mtcars %>% count(gear, cyl)
You can use spread() from the tidyr package to reshape that long two-way output into the wide layout that table() produces when given two variables.
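As a rough illustration of that reshaping step, the sketch below uses pivot_wider(), which is tidyr's successor to spread(), so it assumes a tidyr version that includes it. The name two_way and the cyl_ prefix are illustrative choices.

```r
library(dplyr)
library(tidyr)

# Count gear/cyl combinations, then spread cyl across columns.
two_way <- mtcars %>%
  count(gear, cyl) %>%
  pivot_wider(
    names_from = cyl,
    values_from = n,
    values_fill = 0,      # combinations that never occur become 0
    names_prefix = "cyl_"
  )

print(two_way)
```

The values_fill argument matters here because count() omits combinations with no rows, whereas table() reports them as zero.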
The solution by Caner did not work for me, but this one from commenter akrun (credit goes to him) worked great. I am also using a much larger tibble to demonstrate it, and I added ordering by count in descending order.
library(nycflights13)
library(dplyr)
library(tidyr)
dim(flights)
tte <- gather(flights, Var, Val) %>%
  group_by(Var) %>% dplyr::mutate(n = n()) %>%
  group_by(Var, Val) %>% dplyr::mutate(n1 = n(), Percent = n1 / n) %>%
  arrange(Var, desc(n1)) %>% unique()