dplyr to output class data.frame - r

I can summarise a data frame with dplyr like this:
mtcars %>%
group_by(cyl) %>%
summarise(mean(mpg))
To convert the output back to class data.frame, my current approach is this:
as.data.frame(mtcars %>%
group_by(cyl) %>%
summarise(mean(mpg)))
Is there any way to get dplyr to output a class data.frame without having to use as.data.frame?

As was pointed out in the comments, you might not need to convert it at all, since the result already inherits from data.frame and that may be good enough. If it is not, this still uses as.data.frame() but is slightly more elegant:
mtcars %>%
group_by(cyl) %>%
summarise(mean(mpg)) %>%
ungroup %>%
as.data.frame()
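To see what "inherits from data.frame" means in practice, here is a minimal sketch. It assumes a reasonably recent dplyr, and the column name mean_mpg is just an illustrative choice, not something from the question:

```r
library(dplyr)

# Summarise as in the question; mean_mpg is an illustrative column name
res <- mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg))

class(res)                    # the tibble classes, with "data.frame" among them
inherits(res, "data.frame")   # TRUE: most data.frame code accepts it as-is

# Explicit conversion strips the tibble classes
plain <- as.data.frame(res)
class(plain)                  # "data.frame"
```

If a function only checks is.data.frame() or inherits(), the summarised tibble should pass without conversion.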
ADDED I just read in the comments that the reason you want this is to avoid the truncation of printed output. In that case just define this option, possibly in your .Rprofile file:
options(dplyr.print_max = Inf)
(Note that you can still hit the maximum defined by the "max.print" option associated with print so you would need to set that one too if it's also too low for you.)
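If you would rather not change a global option, one alternative (a sketch, assuming a reasonably recent tibble version) is to ask for all rows in a single print() call. The object name cars_tbl is made up for the example:

```r
library(dplyr)
library(tibble)

cars_tbl <- as_tibble(mtcars)   # 32 rows, more than the default print limit

# Default printing truncates the tibble to its first rows
print(cars_tbl)

# n = Inf asks the tibble print method to show every row, for this call only
print(cars_tbl, n = Inf)
```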
Update: Changed %.% to %>% to reflect changes in dplyr.

In addition to what G. Grothendieck mentioned above, you can convert it into a new dataframe:
new_summary <- mtcars %>%
group_by(cyl) %>%
summarise(mean(mpg)) %>%
as.data.frame()

Related

Tidyverse: Unnest tibble by group into separate df/tibble

I am currently trying to find a short & tidy way to unnest a nested tibble with 2 grouping variables and a tibble/df as data for each observation into a tibble having only one of the grouping variables and the respective data in a df (or tibble). I will illustrate my sample by using the starwars dataset provided by tidyverse and show the 3 solutions I came up with so far.
library(tidyverse)
#Set example data: 2 grouping variables, name & sex, and one data column with a tibble/df for each observation
tbl_1 <- starwars %>% group_by(name, sex) %>% nest() %>% ungroup()
#1st Solution: Neither short nor tidy but gets me the result I would like to have in the end
tbl_2 <- tbl_1 %>%
group_by(sex) %>%
nest() %>%
ungroup()%>%
mutate(data = map(.$data, ~.x %>% group_by(name) %>% unnest(c(data))))
#2nd Solution: A lot shorter and more neat but still not what I have in mind
tbl_2 <- tbl_1 %>%
nest(-sex) %>%
mutate(data = map(.$data, ~.x %>% unnest(cols = c(data))))
#3rd Solution: The best so far, short and readable
tbl_2 <- tbl_1 %>%
unnest(data) %>%
group_by(name) %>%
nest(-sex)
##Solution as I have it in mind / I think should be somehow possible.
tbl_2 <- tbl_1 %>% group_by(sex) %>% unnest() #This however gives one large tibble grouped by sex, not two separate tibbles in a nested tibble
Is such a solution I am looking for even possible in the first place or is the 3rd solution as close as it gets in terms of being both short, readable and tidy?
In terms of my actual workflow, tbl_1 is the "work horse" of my analysis and not subject to change. I use it to apply analyses or ggplot via map for figures etc., which are sometimes on the level of "name" and sometimes on the level of "sex".
I appreciate any input!
Update:
User @caldwellst has given an answer good enough for me to mark this question as answered, unfortunately only as a comment. After waiting a bit, I would now accept any other answer with the same suggestion as the solution to mark this question as solved.
As @caldwellst has pointed out in a comment, the group_by is unnecessary, and the provided solution is short and tidy enough for me in that case:
tbl_1 %>% unnest(data) %>% nest(data = -sex)
I will remove my answer and accept a different one if @caldwellst posts the comment as an answer or somebody else provides a different but equally suitable one.
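For anyone who wants to check what that one-liner produces, here is a small sketch built on the example data from the question (it assumes dplyr and tidyr 1.0 or later, where nest() takes the data = -sex form):

```r
library(dplyr)
library(tidyr)

# Example data from the question: one nested tibble per (name, sex) pair
tbl_1 <- starwars %>% group_by(name, sex) %>% nest() %>% ungroup()

# Flatten, then re-nest everything except sex
tbl_2 <- tbl_1 %>%
  unnest(data) %>%
  nest(data = -sex)

tbl_2                      # one row per value of sex, with a list-column `data`
names(tbl_2$data[[1]])     # `name` is now a regular column inside each nested tibble
```

Since tbl_1 itself is not modified, it can stay the "work horse" and tbl_2 can be rebuilt whenever the sex-level view is needed.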

arrange delete the data frame row names

I have this dataframe
When I try to arrange it to create a ranking variable
df_tablaCruzada<-df_tablaCruzada%>%
arrange(desc(Total)) %>%
mutate(Ranking=1:nrow(df_tablaCruzada))
I get the data frame arranged and the ranking variable is fine but I have lost the original row names
Any idea, please?
regards
dplyr doesn't support row.names. You may want to use tibble::rownames_to_column()
Example
mtcars %>%
tibble::rownames_to_column()
In your case this should work
df_tablaCruzada<-df_tablaCruzada%>%
tibble::rownames_to_column() %>%
arrange(desc(Total)) %>%
mutate(Ranking=1:nrow(df_tablaCruzada))
You can also use dplyr's row_number() inside mutate() in place of 1:nrow(df_tablaCruzada) in this case.
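Since df_tablaCruzada isn't available here, this is a sketch of the same idea on mtcars, with mpg standing in for Total and "car" as an arbitrary name for the new column. It uses row_number() for the ranking, which avoids referring back to the data frame by name:

```r
library(dplyr)
library(tibble)

ranked <- mtcars %>%
  rownames_to_column("car") %>%     # keep the row names as a regular column
  arrange(desc(mpg)) %>%            # mpg stands in for Total
  mutate(Ranking = row_number())    # 1, 2, 3, ... in the sorted order

head(ranked[, c("car", "mpg", "Ranking")])
```

If you need real row names again at the end, tibble::column_to_rownames("car") should convert the column back.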

Better output with dplyr -- breaking functions and results

This is a long-standing question, but now I really want to solve this puzzle. I'm using dplyr all the time and I think it is great for summarising variables. However, I'm trying to display a pivot table with only partial success. dplyr always reports one single row with all results, which is annoying. I have to copy-paste the results to Excel to organize everything...
I got the code here
and it is almost working.
This result
Should be like the following one:
Because I always report my results using this style
Use this code to get the same results:
library(tidyverse)
set.seed(123)
ds <- data.frame(group=c("american", "canadian"),
iq=rnorm(n=50,mean=100,sd=15),
income=rnorm(n=50, mean=1500, sd=300),
math=rnorm(n=50, mean=5, sd=2))
ds %>%
group_by(group) %>%
summarise_at(vars(iq, income, math),funs(mean, sd)) %>%
t %>%
as.data.frame %>%
rownames_to_column %>%
separate(rowname, into = c("feature", "fun"), sep = "_")
To clarify, I've tried this code, but spread works with only one summary (mean or sd, etc). Some people use gather(), but it's complicated to work with group_by and gather().
Thanks for any help.
Instead of transposing (t) and changing the class types, after the summarise step, do a gather to change it to 'long' format and then spread it back after doing some modifications with separate and unite
library(tidyverse)
ds %>%
group_by(group) %>%
# funs() is deprecated since dplyr 0.8.0; a named list gives the same column names
summarise_at(vars(iq, income, math), list(mean = mean, sd = sd)) %>%
gather(key, val, iq_mean:math_sd) %>%
separate(key, into = c('key1', 'key2')) %>%
unite(group, group, key2) %>%
spread(group, val)
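With newer versions of the packages (assuming dplyr 1.0+ and tidyr 1.0+), the same reshaping can be sketched with across(), pivot_longer() and pivot_wider(). This is an alternative to the gather/spread code above, not a correction of it; the column names feature and stat are my own choice:

```r
library(dplyr)
library(tidyr)

set.seed(123)
ds <- data.frame(group = c("american", "canadian"),
                 iq = rnorm(n = 50, mean = 100, sd = 15),
                 income = rnorm(n = 50, mean = 1500, sd = 300),
                 math = rnorm(n = 50, mean = 5, sd = 2))

res <- ds %>%
  group_by(group) %>%
  # yields columns iq_mean, iq_sd, income_mean, ...
  summarise(across(c(iq, income, math), list(mean = mean, sd = sd))) %>%
  # split "iq_mean" into feature = "iq" and stat = "mean"
  pivot_longer(-group, names_to = c("feature", "stat"), names_sep = "_") %>%
  # one column per group/statistic pair, e.g. american_mean
  pivot_wider(names_from = c(group, stat), values_from = value)

res   # one row per feature (iq, income, math)
```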

How to pipe an output tibble into further calculations without saving the tibble as a separate object in R?

I am having a hard time manipulating a tibble output that I receive after piping (using dplyr pipe %>%) a data frame through a series of steps. This code below returns a 2 x 3 tibble output:
sr_df %>% group_by(ResolutionViolated) %>% tally() %>% arrange(desc(n)) %>% mutate(total = sum(n))
This gives me a count of service requests that are and aren't violated (or simply put, late). This is well and good, but I want to be able to manipulate this same tibble further without having to save the tibble as an object.
Why? Because this way, I can filter my data frame (sr_df) before these piping operations, by company/account, priority, and other factors. I am able to filter with an if function, but this filter will not have an impact on the newly created tibble object. So I am looking to do something like this:
sr_df %>% group_by(ResolutionViolated) %>% tally() %>% arrange(desc(n)) %>% mutate(total = sum(n)) %>% round(tibble[1,2]/tibble$total*100, digits = 2)
I am an R and Coding Noob. Don't hold back - I just want to learn; learn quick and learn right. Any answers are appreciated. Thank you!
I have looked at this: R: Further subset a selection using the pipe %>% and placeholder
but I don't think I get it.
In your case, you can further manipulate the tibble you have generated using dplyr functions.
Note the existence of mutate_at and summarize_at, which let you transform a set of columns, with the option to select them by column position.
Using . as a placeholder for the tibble you are currently manipulating, and calling an anonymous function inside mutate_at, will give you the result you expect.
sr_df %>%
group_by(ResolutionViolated) %>%
tally() %>%
arrange(desc(n)) %>%
mutate(total = sum(n)) %>%
# mutate_at() takes its column selection in .vars; position 2 is the count column n
mutate_at(.vars = 2,
.funs = function(column) round(column / .$total * 100, digits = 2))
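If you would rather keep the counts and add the percentage as its own column, a plain mutate() also works, because later expressions in a mutate() call can use columns created earlier in the same call. A minimal sketch with made-up data (sr_df here is a stand-in for your real data frame, and pct is an illustrative name):

```r
library(dplyr)

# Made-up stand-in for the real sr_df
sr_df <- data.frame(ResolutionViolated = c(TRUE, FALSE, FALSE, TRUE, FALSE))

res <- sr_df %>%
  count(ResolutionViolated) %>%      # shorthand for group_by() + tally()
  arrange(desc(n)) %>%
  mutate(total = sum(n),
         pct = round(n / total * 100, digits = 2))

res
```

Any filter() on sr_df can go at the top of the pipe, and the percentages follow the filtered data without saving an intermediate object.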

summarise vs. summarise_each function in dplyr package

I am trying to summarise the values of one variable after splitting the data with group_by from the dplyr package. The following code works fine and the output is listed below, but I cannot replace summarise_each with summarise, even though only one column needs to be calculated. I wonder why?
iris %>% group_by(Species) %>% select(one_of('Sepal.Length')) %>%
summarise_each(funs(mean(.)))
Otherwise I get output like "S3:lazy".
summarize and summarize_each work quite differently. summarize is in fact simpler — just specify the expression directly:
iris %>%
group_by(Species) %>%
select(Sepal.Length) %>%
summarize(Sepal.Length = mean(Sepal.Length))
You can choose any name for the output column; it doesn’t need to be the same as the input.
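As a quick sketch of both points: the first pipeline uses a different output name (mean_sepal_length is just an example), and the second shows across(), which in dplyr 1.0 and later covers the several-columns case that summarise_each was used for:

```r
library(dplyr)

# One column, with an output name of your choosing
by_species <- iris %>%
  group_by(Species) %>%
  summarise(mean_sepal_length = mean(Sepal.Length))

# Several columns at once, assuming dplyr >= 1.0 for across()
by_species_multi <- iris %>%
  group_by(Species) %>%
  summarise(across(c(Sepal.Length, Sepal.Width), mean))

by_species
by_species_multi
```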
