arrange() deletes the data frame row names - R

I have this dataframe
When I try to arrange it to create a ranking variable
df_tablaCruzada<-df_tablaCruzada%>%
arrange(desc(Total)) %>%
mutate(Ranking=1:nrow(df_tablaCruzada))
I get the data frame arranged and the ranking variable is fine but I have lost the original row names
Any idea, please?
regards

dplyr doesn't preserve row names, so you may want to move them into a regular column first with tibble::rownames_to_column().
Example:
mtcars %>%
tibble::rownames_to_column()
In your case this should work
df_tablaCruzada<-df_tablaCruzada%>%
tibble::rownames_to_column() %>%
arrange(desc(Total)) %>%
mutate(Ranking=1:nrow(df_tablaCruzada))
You can also use dplyr's row_number() inside the mutate, i.e. mutate(Ranking = row_number()), instead of 1:nrow(df_tablaCruzada).
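As a minimal sketch of the same idea on the built-in mtcars data: the column name "model" and the object name ranked are assumptions chosen for illustration, and mpg stands in for the Total column from the question.

```r
library(dplyr)
library(tibble)

# Move the row names into a regular column so arrange() cannot drop them,
# then sort and number the rows.
ranked <- mtcars %>%
  rownames_to_column("model") %>%   # "model" is an illustrative column name
  arrange(desc(mpg)) %>%            # mpg plays the role of Total here
  mutate(Ranking = row_number())

head(ranked[, c("model", "mpg", "Ranking")])
```

If real row names are needed again afterwards, tibble::column_to_rownames("model") converts the column back.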

Related

Filter the first group after group_by

Sometimes it is handy to take a test case out of your data when working with group_by() from the dplyr library. I was wondering if there is any fast way to just grab the first group of a grouped dataframe and cast it to a new dataframe.
All I could come up with was this workaround:
library(dplyr)
smalldf <- mtcars %>% group_by(gear) %>% group_split(.) %>% .[[1]]

Using the R syntax sequence operator ":" within the sum command with more than 50 columns

I would like to index by column name within the sum command using the sequence operator.
library(dplyr)
library(tidyverse)
df=data.frame(
X=c("A","B","C"),
X.1=c(1,2,3),X.2=c(1,2,3),X.3=c(1,2,3),X.4=c(1,2,3),X.5=c(1,2,3),X.6=c(1,2,3),X.7=c(1,2,3),X.8=c(1,2,3),X.9=c(1,2,3),X.10=c(1,2,3),
X.11=c(1,2,3),X.12=c(1,2,3),X.13=c(1,2,3),X.14=c(1,2,3),X.15=c(1,2,3),X.16=c(1,2,3),X.17=c(1,2,3),X.18=c(1,2,3),X.19=c(1,2,3),X.20=c(1,2,3),
X.21=c(1,2,3),X.22=c(1,2,3),X.23=c(1,2,3),X.24=c(1,2,3),X.25=c(1,2,3),X.26=c(1,2,3),X.27=c(1,2,3),X.28=c(1,2,3),X.29=c(1,2,3),X.30=c(1,2,3),
X.31=c(1,2,3),X.32=c(1,2,3),X.33=c(1,2,3),X.34=c(1,2,3),X.35=c(1,2,3),X.36=c(1,2,3),X.37=c(1,2,3),X.38=c(1,2,3),X.39=c(1,2,3),X.40=c(1,2,3),
X.41=c(1,2,3),X.42=c(1,2,3),X.43=c(1,2,3),X.44=c(1,2,3),X.45=c(1,2,3),X.46=c(1,2,3),X.47=c(1,2,3),X.48=c(1,2,3),X.49=c(1,2,3),X.50=c(1,2,3),
X.51=c(1,2,3),X.52=c(1,2,3),X.53=c(1,2,3),X.54=c(1,2,3),X.55=c(1,2,3),X.56=c(1,2,3))
Is there a quicker way to do this? The following gives the correct result. However, for large datasets (larger than this one) it becomes very laborious, especially when pivot_wider is used and the columns are not created beforehand (as they are above).
df %>% rowwise() %>% mutate(
Result_column=case_when(
X=="A"~ sum(c(X.1,X.2,X.3,X.4,X.5)),
X=="B"~ sum(c(X.4,X.5)),
X=="C" ~ sum(c( X.3, X.4, X.5, X.6, X.7, X.8, X.9, X.10, X.11, X.12, X.13, X.14, X.15, X.16,
X.17, X.18, X.19, X.20, X.21, X.22, X.23, X.24, X.25, X.26, X.27, X.28, X.29, X.30,
X.31, X.32, X.33, X.34, X.35, X.36, X.37, X.38, X.39, X.40, X.41, X.42,X.43, X.44,
X.45, X.46, X.47, X.48, X.49, X.50, X.51, X.52, X.53, X.54, X.55, X.56)))) %>% dplyr::select(Result_column)
The following shows how it would look with "select"-style syntax, which is what I would like to use. However, it does not give the correct numerical result. Using the sequence operator ":" would shorten the code by ~50 entries.
df %>% rowwise() %>% mutate(
Result_column=case_when(
X=="A"~ sum(c(X.1:X.5)),
X=="B"~ sum(c(X.4:X.5)),
X=="C" ~ sum(c(X.3:X.56)))) %>% dplyr::select(Result_column)
Below is a related question. It is not the same, however, because what is needed is not every column that starts with "X" but rather a sequence of columns.
Using mutate rowwise over a subset of columns
EDIT:
The code provided by cnbrowlie (below) is correct.
df %>% mutate(
Result_column=case_when(
X=="A"~ sum(c(X.1:X.5)),
X=="B"~ sum(c(X.4:X.5)),
X=="C" ~ sum(c(X.3:X.56)))) %>% dplyr::select(Result_column)
This can be done with dplyr >= 1.0.0 using rowSums() (which computes the sum of each row across multiple columns) and across() (which superseded vars() as the way to specify columns in a data frame, and allows : to select sequences of columns):
df %>% rowwise() %>% mutate(
Result_column=case_when(
X=="A"~ rowSums(across(X.1:X.5)),
X=="B"~ rowSums(across(X.4:X.5)),
X=="C" ~ rowSums(across(X.3:X.56))
)
) %>% dplyr::select(Result_column)
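Because rowSums() is already vectorized over rows, the same pattern also appears to work without rowwise(). The sketch below checks this on a cut-down stand-in for the question's data: the data frame small_df with only X.1 to X.6 is an assumption made to keep the example short, and the ranges are adjusted to match.

```r
library(dplyr)

# Cut-down stand-in for the question's df: six value columns instead of 56
small_df <- data.frame(
  X   = c("A", "B", "C"),
  X.1 = c(1, 2, 3), X.2 = c(1, 2, 3), X.3 = c(1, 2, 3),
  X.4 = c(1, 2, 3), X.5 = c(1, 2, 3), X.6 = c(1, 2, 3)
)

result <- small_df %>%
  mutate(
    Result_column = case_when(
      X == "A" ~ rowSums(across(X.1:X.5)),  # five columns
      X == "B" ~ rowSums(across(X.4:X.5)),  # two columns
      X == "C" ~ rowSums(across(X.3:X.6))   # four columns
    )
  ) %>%
  select(Result_column)

result
```

Each row here holds a constant value (1, 2 and 3), so the sums are easy to verify by hand: 5 × 1, 2 × 2 and 4 × 3.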

Counting number of levels within another variable level in data frame

I have a data frame with two character variables, state and birds. I'm trying to see how many bird types are within each state. I have tried:
data.frame %>%
group_by(state) %>%
n_distinct(data.frame$bird)
data.frame %>%
group_by(state) %>%
n_distinct(unique(data.frame$bird))
However, I am very stuck. Thank you in advance for your help; let me know if I need to add more clarification.
How about this?
data.frame %>%
group_by(state) %>%
summarize(distinct_birds = n_distinct(birds))
This is also very simple using data.table:
library(data.table)
dt <- data.table(data.frame)
dt[, .(distinct_birds = uniqueN(birds)), by = state]
uniqueN() is the data.table function that counts distinct values.
by specifies the grouping variables.
Also using tidyverse (drop duplicate state/bird pairs first, then count the rows per state):
library(tidyverse)
data.frame %>% distinct(state, birds) %>% count(state)
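As a minimal sketch of counting distinct bird types per state: the toy data frame birds_df and its values are made up for illustration, and only the column names state and birds come from the question.

```r
library(dplyr)

# Hypothetical toy data: TX has two distinct birds, CA has one
birds_df <- data.frame(
  state = c("TX", "TX", "TX", "CA", "CA"),
  birds = c("robin", "robin", "wren", "jay", "jay"),
  stringsAsFactors = FALSE
)

per_state <- birds_df %>%
  group_by(state) %>%
  # Refer to the bare column name so n_distinct() is evaluated per group;
  # birds_df$birds would always see the whole column.
  summarise(distinct_birds = n_distinct(birds))

per_state
```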

How to Create Multiple Frequency Tables with Percentages Across Factor Variables using Purrr::map

library(tidyverse)
library(ggmosaic) # for the "happy" dataset
I feel like this should be a somewhat simple thing to achieve, but I'm having difficulty with percentages when using purrr::map together with table(). Using the "happy" dataset, I want to create a list of frequency tables for each factor variable. I would also like to have rounded percentages instead of counts, or both if possible.
I can create frequency percentages for each factor variable separately with the code below.
with(happy,round(prop.table(table(marital)),2))
However I can't seem to get the percentages to work correctly when using table() with purrr::map. The code below doesn't work...
happy%>%select_if(is.factor)%>%map(round(prop.table(table)),2)
The second method I tried was using tidyr::gather, and calculating the percentage with dplyr::mutate and then splitting the data and spreading with tidyr::spread.
TABLE<-happy%>%select_if(is.factor)%>%gather()%>%group_by(key,value)%>%summarise(count=n())%>%mutate(perc=count/sum(count))
However, since there are different factor variables, I would have to split the data by "key" before spreading using purrr::map and tidyr::spread, which came close to producing some useful output except for the repeating "key" values in the rows and the NA's.
TABLE%>%split(TABLE$key)%>%map(~spread(.x,value,perc))
So any help on how to make both of the above methods work would be greatly appreciated...
You can use an anonymous function or a formula to get your first option to work. Here's the formula option.
happy %>%
select_if(is.factor) %>%
map(~round(prop.table(table(.x)), 2))
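The formula approach can also be tried without ggmosaic. The sketch below swaps in the built-in warpbreaks data, whose factor columns are wool and tension, and uses select(where(is.factor)) in place of select_if(); both the dataset and the name freq_tables are assumptions made for illustration.

```r
library(dplyr)
library(purrr)

# One table of rounded proportions per factor column
freq_tables <- warpbreaks %>%
  select(where(is.factor)) %>%
  map(~ round(prop.table(table(.x)), 2))

freq_tables$wool
freq_tables$tension
```

The result is a named list, so each table can be pulled out by the name of its factor variable.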
In your second option, removing the NA values and then removing the count variable prior to spreading helps. The order in the result has changed, however.
TABLE = happy %>%
select_if(is.factor) %>%
gather() %>%
filter(!is.na(value)) %>%
group_by(key, value) %>%
summarise(count = n()) %>%
mutate(perc = round(count/sum(count), 2), count = NULL)
TABLE %>%
split(.$key) %>%
map(~spread(.x, value, perc))

dplyr to output class data.frame

I can summarise a data frame with dplyr like this:
mtcars %>%
group_by(cyl) %>%
summarise(mean(mpg))
To convert the output back to class data.frame, my current approach is this:
as.data.frame(mtcars %>%
group_by(cyl) %>%
summarise(mean(mpg)))
Is there any way to get dplyr to output a class data.frame without having to use as.data.frame?
As was pointed out in the comments you might not need to convert it since it might be good enough that it inherits from data frame. If that is not good enough then this still uses as.data.frame but is slightly more elegant:
mtcars %>%
group_by(cyl) %>%
summarise(mean(mpg)) %>%
ungroup %>%
as.data.frame()
ADDED I just read in the comments that the reason you want this is to avoid the truncation of printed output. In that case just define this option, possibly in your .Rprofile file:
options(dplyr.print_max = Inf)
(Note that you can still hit the maximum defined by the "max.print" option associated with print so you would need to set that one too if it's also too low for you.)
Update: Changed %.% to %>% to reflect changes in dplyr.
In addition to what G. Grothendieck mentioned above, you can convert it into a new dataframe:
new_summary <- mtcars %>%
group_by(cyl) %>%
summarise(mean(mpg)) %>%
as.data.frame()
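To see what the conversion changes, a small check of the class before and after can help. This is only a sketch: the names as_tibble_out and as_df_out and the column name mean_mpg are chosen for illustration.

```r
library(dplyr)

# summarise() returns a tibble, which still inherits from data.frame
as_tibble_out <- mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg))

# as.data.frame() strips the tibble classes
as_df_out <- as.data.frame(as_tibble_out)

class(as_tibble_out)
class(as_df_out)
inherits(as_tibble_out, "data.frame")
```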
