How to Transpose (t) in the Tidyverse Using tidyr

Using the sample data (bottom), I want to use the code below to group and summarise the data. After this, I want to transpose the result, but I'm stuck on how to do that with tidyr.
For context, I'm using knitr::kable to recreate an existing table that was originally built in Excel, so the final output of my code is expected to break tidy principles.
For example:
library(tidyverse)
Df <- Df %>% group_by(Code1, Code2, Level) %>%
  summarise_all(funs(count = sum(!is.na(.))))
I can add t(.) to the pipe...
Df <- Df %>% group_by(Code1, Code2, Level) %>%
  summarise_all(funs(count = sum(!is.na(.)))) %>%
  t(.)
or I can add...
Df <- as.data.frame(t(Df))
Both of these options transpose the data, but I'm wondering whether there's a tidyverse way to do this with tidyr's gather() and spread() functions. I want more control over the process, and I also want to get rid of the "V1", "V2", etc. column names that appear when transposing with t().
How can I achieve this using tidyverse?
Sample data:
Code1 <- c("H200","H350","H250","T400","T240","T600")
Code2 <- c("4A","4A","4A","2B","2B","2B")
Level <- c(1,2,3,1,2,3)
Q1 <- c(30,40,40,50,60,80)
Q2 <- c(50,30,50,40,80,30)
Q3 <- c(30,45,70,42,81,34)
Df <- data.frame(Code1, Code2, Level, Q1, Q2, Q3)

The general idiom in the tidyverse is to gather() your data to the maximal extent, forming a "long" data frame with one measurement per row. Then spread() can turn this long data frame into whichever "wide" format you like best. This procedure can effectively transpose the data: gather() every column except the row names, and then spread() by the row names.
For example, here is how to effectively transpose mtcars:
library(tidyverse)
mtcars %>%
  rownames_to_column() %>%
  gather(variable, value, -rowname) %>%
  spread(rowname, value)
Your data does not have "row names" as understood in R, but Code1 effectively serves as a row name because it uniquely identifies each (original) row of your data.
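Df1 <- Df %>%
  group_by(Code1, Code2, Level) %>%
  summarise_all(funs(count = sum(!is.na(.)))) %>%
  ungroup() %>%  # drop the leftover grouping before reshaping
  # Code2 is character, so gather() coerces all values to character
  gather(column, value, -Code1) %>%
  spread(Code1, value)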
UPDATE for tidyr 1.0 or higher (late 2019 onwards)
The newer pivot_wider() and pivot_longer() functions are now preferred over the older (superseded, but still supported) gather() and spread(). Thus the preferred way to transpose mtcars is probably:
library(tidyverse)
mtcars %>%
  rownames_to_column() %>%
  # name the arguments: recent tidyr versions no longer accept them positionally
  pivot_longer(-rowname, names_to = "variable", values_to = "value") %>%
  pivot_wider(names_from = rowname, values_from = value)
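Applied to the question's sample data, the pivot approach might look like the sketch below. It assumes dplyr 1.0 or later (for across() and .groups) and tidyr 1.0 or later. Converting every column except Code1 to character first is my own choice: pivot_longer() will not silently merge the character Code2 column with the numeric counts the way gather() does. Because the Code1 values become the column names, no V1, V2, etc. columns appear.

```r
library(dplyr)
library(tidyr)

# Sample data from the question
Df <- data.frame(
  Code1 = c("H200", "H350", "H250", "T400", "T240", "T600"),
  Code2 = c("4A", "4A", "4A", "2B", "2B", "2B"),
  Level = c(1, 2, 3, 1, 2, 3),
  Q1 = c(30, 40, 40, 50, 60, 80),
  Q2 = c(50, 30, 50, 40, 80, 30),
  Q3 = c(30, 45, 70, 42, 81, 34)
)

transposed <- Df %>%
  group_by(Code1, Code2, Level) %>%
  # count non-missing values per group; across() replaces the deprecated funs()
  summarise(across(everything(), ~ sum(!is.na(.x))), .groups = "drop") %>%
  # give every value column one type so they can share a single "value" column
  mutate(across(-Code1, as.character)) %>%
  # one row per (Code1, original column) pair
  pivot_longer(-Code1, names_to = "column", values_to = "value") %>%
  # the Code1 values become the new column names
  pivot_wider(names_from = Code1, values_from = value)

transposed
```

Since summarise() sorts the groups, the new columns come out in alphabetical Code1 order; reorder them with select() if the Excel layout needs a different order.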

library(tidyr)
library(dplyr)
Df <- Df %>% group_by(Code1, Code2, Level) %>%
  summarise_all(funs(count = sum(!is.na(.)))) %>%
  ungroup() %>%
  # gather every column except Code1 (ncol(Df) would refer to the input, not the summary)
  gather(var, val, -Code1) %>%
  spread(Code1, val)

Related

R filter or subset for finding a specific repeat count for data.frame

I want to use dplyr's filter() (or base R's subset()) to create a new data frame containing only the rows whose value in a selected column occurs exactly 2 times in the original data frame.
I tried this:
df2 <- df %>%
  group_by(x) %>%
  mutate(duplicate = n()) %>%
  filter(duplicate == 2)
and this:
df2 <- subset(df, duplicated(x))
but neither option works.
In group_by(), just use the unquoted column name. Also, there's no need to create a column with mutate() before filtering; the count can be computed on the fly inside filter():
library(dplyr)
df %>%
  group_by(x) %>%
  filter(n() == 2) %>%
  ungroup()
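Since the question also mentions subset(), here is a base R alternative (a different technique from the dplyr answer above): ave() attaches each row's group size, which can then be compared with 2. The toy data is invented purely for illustration.

```r
# Hypothetical toy data: "a" occurs twice, "b" once, "c" three times
df <- data.frame(x = c("a", "a", "b", "c", "c", "c"), y = 1:6)

# ave() applies length() within each level of x and returns one value per row,
# i.e. the size of the group that row belongs to
group_size <- ave(seq_along(df$x), df$x, FUN = length)

# keep only the rows whose x value occurs exactly twice
df2 <- df[group_size == 2, ]
# equivalently: df2 <- subset(df, group_size == 2)
df2
```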

Filter the first group after group_by

Sometimes it is handy to take a test case out of your data when working with group_by() from the dplyr library. I was wondering if there is any fast way to just grab the first group of a grouped dataframe and cast it to a new dataframe.
All I could come up with was this workaround:
library(dplyr)
smalldf <- mtcars %>% group_by(gear) %>% group_split(.) %>% .[[1]]
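One possible alternative, assuming dplyr 1.0 or later: cur_group_id() returns the index of the current group inside a grouped verb, so filtering on it keeps just the first group without splitting the data into a list first. This is a suggested sketch, not part of the original question.

```r
library(dplyr)

# cur_group_id() numbers the groups 1, 2, 3, ... in sorted order of gear
first_group <- mtcars %>%
  group_by(gear) %>%
  filter(cur_group_id() == 1) %>%
  ungroup()

first_group
```

Changing the `== 1` comparison picks any other group by position.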

Data frame in wide format to data frame of time series

I am currently struggling to reshape my dataset into the result I want. Let's say I start with the following dataset:
library(tsbox)
library(dplyr)
library(tidyr)
# create df that matches my format
df1 <- ts_wide(ts_df(ts_c(mdeaths)))
df1$id <- 1
df2 <- ts_wide(ts_df(ts_c(mdeaths)))
df2$id <- 2
df <- rbind(df1, df2)
Now this dataset has a date column, a value column and an "id" column, which specifies which date/value points belong to the same observed object. I would now like to reshape my dataset into a 2x2 data frame, where the first column is the id and the second column is a time series object (built from the date/value pairs for that id). To do so, I tried the following:
# create a new df, with two cols (id and ts)
df_ts <- df %>%
group_by(id) %>%
nest()
The nest() command creates "a list-column of data frames", which is not exactly what I wanted. I know that a ts can be defined via ts(data$value, data$date), but I do not know how to integrate it after the group_by(id) call. Can anyone help me turn this column into a ts object instead of a data frame? I am new to R and grateful for any form of help.
Thanks in advance
If you have a non-atomic data type it will have to be a list column of something.
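If you want a list-column of ts objects, you can:
df %>%
  group_by(id) %>%
  # ts(value, time) would pass the dates as ts()'s start argument;
  # tsbox::ts_ts() infers start and frequency from the time column instead
  summarize(ts = list(tsbox::ts_ts(data.frame(time = time, value = value))))
Continuing your pipe, you could:
df %>%
  group_by(id) %>%
  nest() %>%
  mutate(data = purrr::map(data, tsbox::ts_ts))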
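For comparison, the same id/ts structure can be sketched in base R with split() and lapply(), without tsbox. The sketch rebuilds hypothetical long data in the question's shape and assumes each id is a complete, regularly spaced monthly series (true for mdeaths), so start and frequency are passed to ts() explicitly instead of handing it the date vector.

```r
# Hypothetical long data in the question's shape: two ids sharing the
# monthly mdeaths series from the datasets package
n <- length(mdeaths)
dates <- seq(as.Date("1974-01-01"), by = "month", length.out = n)
df <- data.frame(
  id = rep(1:2, each = n),
  time = rep(dates, 2),
  value = rep(as.numeric(mdeaths), 2)
)

# Assumption: every id is a complete monthly series, so a start date plus
# frequency = 12 fully describes its time index
ts_list <- lapply(split(df, df$id), function(d) {
  d <- d[order(d$time), ]
  first <- as.POSIXlt(d$time[1])
  ts(d$value, start = c(first$year + 1900, first$mon + 1), frequency = 12)
})

# Two-row data frame: an id column and a list-column of ts objects
out <- data.frame(id = as.integer(names(ts_list)))
out$ts <- ts_list
str(out$ts[[1]])
```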

How to subset a data frame with R pipeline

I am trying to subset/filter a data frame, keeping only the rows whose col1 value appears in a column of another data frame (df2$col2).
Here is what I used to do this:
df <- df1[df1$col1 %in% df2$col2,]
Then I set that column as the row names:
df <- df %>% remove_rownames %>% column_to_rownames('col1')
However, I don't know how to combine these two steps into one pipeline using %>%.
library(dplyr)   # filter()
library(tibble)  # remove_rownames(), column_to_rownames()
df <- df1 %>%
  filter(col1 %in% df2$col2) %>%
  remove_rownames() %>%
  column_to_rownames('col1')
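A small self-contained check of that pipeline, using invented df1/df2 values. It also shows dplyr::semi_join() (a swapped-in alternative, not from the original answer), which expresses the same "keep rows of df1 whose col1 appears in df2$col2" filter as a join.

```r
library(dplyr)
library(tibble)

# Hypothetical example data
df1 <- data.frame(col1 = c("a", "b", "c", "d"), val = 1:4)
df2 <- data.frame(col2 = c("b", "d", "z"))

# Filter with %in%, then move col1 into the row names
via_filter <- df1 %>%
  filter(col1 %in% df2$col2) %>%
  remove_rownames() %>%
  column_to_rownames("col1")

# Same filtering step written as a semi-join matching col1 to col2
via_join <- df1 %>%
  semi_join(df2, by = c("col1" = "col2")) %>%
  remove_rownames() %>%
  column_to_rownames("col1")

via_filter
```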

Better output with dplyr -- breaking functions and results

This is a long-standing question for me, but now I really need to solve this puzzle. I use dplyr all the time and I think it is great for summarising variables. However, I'm trying to display a pivot table and only partly succeeding: dplyr always reports all the results in a single row, which is annoying. I have to copy and paste the results into Excel to organize everything...
I got the code here, and it is almost working.
This result [image not preserved]
should look like the following one [image not preserved],
because I always report my results in this style.
Use this code to get the same results:
library(tidyverse)
set.seed(123)
ds <- data.frame(group = c("american", "canadian"),
                 iq = rnorm(n = 50, mean = 100, sd = 15),
                 income = rnorm(n = 50, mean = 1500, sd = 300),
                 math = rnorm(n = 50, mean = 5, sd = 2))
ds %>%
  group_by(group) %>%
  summarise_at(vars(iq, income, math), funs(mean, sd)) %>%
  t %>%
  as.data.frame %>%
  rownames_to_column %>%
  separate(rowname, into = c("feature", "fun"), sep = "_")
To clarify, I've tried this code, but spread() works with only one summary (mean or sd, etc.). Some people use gather(), but it's complicated to combine group_by() with gather().
Thanks for any help.
Instead of transposing (t) and changing the class types, gather() the data into 'long' format after the summarise step, then spread() it back after some modifications with separate() and unite():
library(tidyverse)
ds %>%
  group_by(group) %>%
  summarise_at(vars(iq, income, math), funs(mean, sd)) %>%
  gather(key, val, iq_mean:math_sd) %>%
  separate(key, into = c('key1', 'key2')) %>%
  unite(group, group, key2) %>%
  spread(group, val)
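With tidyr 1.0 or later, the same reshaping can be sketched with pivot_longer() and pivot_wider(), which split and rebuild the column names in one step (names_sep, names_from with two columns) instead of separate() and unite(). This assumes dplyr 1.0 or later for across(); the "column_function" names such as iq_mean come from passing across() a named list of functions.

```r
library(dplyr)
library(tidyr)

set.seed(123)
ds <- data.frame(group = c("american", "canadian"),
                 iq = rnorm(n = 50, mean = 100, sd = 15),
                 income = rnorm(n = 50, mean = 1500, sd = 300),
                 math = rnorm(n = 50, mean = 5, sd = 2))

summary_table <- ds %>%
  group_by(group) %>%
  # one row per group with columns iq_mean, iq_sd, income_mean, ...
  summarise(across(c(iq, income, math), list(mean = mean, sd = sd))) %>%
  # long: one row per group / feature / statistic
  pivot_longer(-group, names_to = c("feature", "fun"), names_sep = "_") %>%
  # wide again: one row per feature, columns such as american_mean
  pivot_wider(names_from = c(group, fun), values_from = value)

summary_table
```

The values stay numeric throughout, which is convenient if the table is later passed to knitr::kable() with rounding.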
