Formatting output in summarise_each with dplyr

Greetings: I am new to dplyr and having some challenges formatting my output. Here is a code snippet that produces some reproducible data, using melt to get it into the shape I need.
set.seed(1234)
library(reshape2)
library(dplyr)
val <- c(0:1)
a <- sample(val, 99, replace = TRUE)
b <- sample(val, 99, replace = TRUE)
c <- sample(val, 99, replace = TRUE)
d <- sample(val, 99, replace = TRUE)
dat <- data.frame(a, b, c, d)
melt.dat <- melt(dat)  # long format: one "variable" column and one "value" column
Now, I can perform the desired summary:
SummaryTable <- melt.dat %>%
group_by(variable) %>%
summarise_each(funs(sum, sum/n()))
Here is my output:
variable sum *
1 a 50 50.50505
2 b 58 58.58586
3 c 46 46.46465
4 d 46 46.46465
My ideal output would look like the following. I am unable to figure out how to specify my column names in the summarise_each or melt functions, set the number of decimal places, and suppress the row numbers. I've spent a long time getting this far and just can't seem to figure out the rest!
Letter Count Percent
a 50 50.5
b 58 58.6
c 46 46.5
d 46 46.5

Not sure whether it's possible within dplyr to suppress rownames (numbering), but here's how you could get the names and formatting right:
options(digits = 3)
melt.dat %>%
group_by(Letter = variable) %>%
summarise_each(funs(Count = sum(.), Percent = sum(.)/n()*100), -variable)
#Source: local data frame [4 x 3]
#
# Letter Count Percent
#1 a 45 45.5
#2 b 51 51.5
#3 c 52 52.5
#4 d 48 48.5
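For readers on a current dplyr release, note that summarise_each() and funs() have since been deprecated. A minimal sketch of the same summary with plain summarise() might look like the following. It assumes only dplyr and base R: stack() stands in for reshape2::melt() to build the long data (giving columns values and ind), round() replaces the global digits option, and the row numbers are dropped by printing a plain data frame with row.names = FALSE.

```r
library(dplyr)

set.seed(1234)
val <- 0:1
dat <- data.frame(
  a = sample(val, 99, replace = TRUE),
  b = sample(val, 99, replace = TRUE),
  c = sample(val, 99, replace = TRUE),
  d = sample(val, 99, replace = TRUE)
)

# stack() is a base R stand-in for melt(): one row per value,
# with the original column name in the factor column "ind"
long <- stack(dat)

SummaryTable <- long %>%
  group_by(Letter = ind) %>%
  summarise(
    Count   = sum(values),                  # number of 1s per letter
    Percent = round(100 * mean(values), 1)  # share of 1s, one decimal place
  )

# Printing as a plain data frame with row.names = FALSE hides the row numbers
print(as.data.frame(SummaryTable), row.names = FALSE)
```

Rounding inside summarise() keeps the stored values at one decimal place, whereas options(digits = 3) only changes how every number in the session is printed.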

Related

Add a row with calculations in R

This feels like a very basic question, but I've been struggling with it. A simplified version follows. I have the following dataset:
a <- c('I' , 'E')
ja <- c(30 , 20)
fe <- c(50, 40)
ma <- c(35 , 22)
x <- data.frame(a, ja , fe , ma)
x
#> a ja fe ma
#> 1 I 30 50 35
#> 2 E 20 40 22
Created on 2020-12-04 by the reprex package (v0.3.0)
I want to add a third row containing row I minus row E for ja, fe and ma, so that row 3 looks like:
a ja fe ma
I_E 10 10 13
There will be 12 columns, one for each month, so ideally I'd like to be able to refer to them concisely, for instance, as (in this example) ja:ma or the like. Thanks for any help!
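Does this work?
library(dplyr)
library(stringr)
# Each summarise() returns two rows (the second is NA because lead() runs off the end),
# so na.omit() keeps only the I_E row. In dplyr >= 1.1.0, a multi-row summarise()
# warns and reframe() is the suggested replacement.
x %>%
  bind_rows(x %>% summarise(a = str_c(a, lead(a), sep = '_')) %>% na.omit() %>%
              bind_cols(x %>% summarise(across(2:4, ~ . - lead(.))) %>% na.omit()))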
a ja fe ma
1 I 30 50 35
2 E 20 40 22
3 I_E 10 10 13
Don't do this.
Data.frames are not Excel tables.
The R way is to have one row per month. You would start with three columns: month_name, I, and E. Then you can create a fourth column that is I minus E.
df <- data.frame(
month_name = c("ja", "fe", "ma"),
I = c(30, 50, 35),
E = c(20, 40, 22)
)
df$I_E <- df$I - df$E
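A minimal base R sketch of getting from the question's x to that one-row-per-month layout, assuming the first column (a) holds the row labels and every other column is a month. t() is applied to the numeric columns only, so the values stay numeric.

```r
x <- data.frame(
  a  = c("I", "E"),
  ja = c(30, 20),
  fe = c(50, 40),
  ma = c(35, 22),
  stringsAsFactors = FALSE
)

# Transpose only the numeric month columns: months become rows
m <- t(x[, -1])
colnames(m) <- x$a  # name the new columns after the row labels ("I", "E")

df <- data.frame(month_name = rownames(m), m,
                 row.names = NULL, stringsAsFactors = FALSE)

# With one row per month, the difference is an ordinary vectorised column
df$I_E <- df$I - df$E
df
```

With twelve month columns the same code works unchanged, since it never names the months explicitly.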
We can also use a data.table approach:
library(data.table)
rbind(setDT(x), x[, c(list(a = 'I_E'),
lapply(.SD, function(x) diff(rev(x)))), .SDcols = 2:4])
# a ja fe ma
#1: I 30 50 35
#2: E 20 40 22
#3: I_E 10 10 13
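Since the question asks to refer to the month columns concisely as ja:ma, here is a minimal dplyr sketch (assuming dplyr >= 1.0.0 for across() and mutate(.before =)) that selects the rows by their label rather than by position, so it does not depend on I and E being in a particular row order.

```r
library(dplyr)

x <- data.frame(
  a  = c("I", "E"),
  ja = c(30, 20),
  fe = c(50, 40),
  ma = c(35, 22),
  stringsAsFactors = FALSE
)

# One-row summary: for every column from ja to ma, value in row "I" minus row "E".
# Inside across(), `a` still refers to the original label column.
diff_row <- x %>%
  summarise(across(ja:ma, ~ .x[a == "I"] - .x[a == "E"])) %>%
  mutate(a = "I_E", .before = 1)

result <- bind_rows(x, diff_row)
result
```

With all twelve month columns present, only the range inside across() needs to change.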

How to switch these columns and rows in R data frame?

Suppose we have this data frame:
avg_1 avg_2 avg_3 avg_4
132 123 23 214
DF DM RF RM
How can I convert this in R so that the output is a new data frame that looks like:
avg key
132 DF
123 DM
23 RF
214 RM
I have tried using pivot_longer from tidyverse, but the trouble is that I'm also trying to rename the columns to avg and key. Can anyone help?
In base R I would try:
setNames(data.frame(t(df), row.names = NULL), c("avg", "key"))
Output
avg key
1 132 DF
2 123 DM
3 23 RF
4 214 RM
Does this work?
library(dplyr)
library(purrr)
library(tibble)
t(df) %>% as_tibble() %>% set_names(c('avg', 'key')) %>% type.convert(as.is = TRUE)
# A tibble: 4 x 2
avg key
<int> <chr>
1 132 DF
2 123 DM
3 23 RF
4 214 RM
And here is a solution with built-in R functions:
x <- as.data.frame(t(df), stringsAsFactors = FALSE)
names(x) <- c("avg", "key")
Because t() turns this mixed-type data frame into a character matrix, you will probably also want to convert the columns to numeric and factor, e.g.
x$avg <- as.numeric(x$avg)
x$key <- as.factor(x$key)
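An alternative base R sketch that avoids the character-matrix round trip through t(). It assumes, as in the question, that the first row holds the averages and the second row holds the keys; the df below is a hypothetical reconstruction of the question's data stored as character columns.

```r
# Hypothetical reconstruction of the question's data: one column per average,
# row 1 = value, row 2 = key (character, because each column mixes types)
df <- data.frame(
  avg_1 = c("132", "DF"),
  avg_2 = c("123", "DM"),
  avg_3 = c("23", "RF"),
  avg_4 = c("214", "RM"),
  stringsAsFactors = FALSE
)

out <- data.frame(
  avg = as.numeric(unlist(df[1, ], use.names = FALSE)),  # first row -> numbers
  key = unlist(df[2, ], use.names = FALSE),              # second row -> labels
  stringsAsFactors = FALSE
)
out
```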

Boxcox transformation on multiple variables with mutate_at

Say I want to apply the Box-Cox transformation from the caret package to the following data (not the data I am working with, just an example to explain my problem):
library(caret); library(tidyverse)
set.seed(001)
d <- tibble(a = rpois(20, 10), b = rnorm(20, 40, 10))
head(d)
# A tibble: 6 x 2
a b
<int> <dbl>
1 8 20.1
2 10 46.2
3 7 39.4
4 11 38.4
5 14 25.3
6 12 35.2
I can achieve this by running
d1 <- BoxCoxTrans(d$a) %>% predict(d$a)
I can repeat the same process to transform b. Is there a way to apply the Box-Cox transformation to both a and b at the same time with dplyr? I tried the following, but I can't figure out how to write the .funs argument:
d %>% mutate_at(c("a", "b"), BoxCoxTrans %>% predict(d))
I have never used caret, but is there any reason these solutions would not work in your particular case? (They run fine for me.)
library(tidyverse)
library(caret)
library(e1071)
set.seed(001)
d <- tibble(a = rpois(20, 10), b = rnorm(20, 40, 10))
head(d)
# On selected columns (a ~ lambda replaces funs(), which is deprecated)
d %>%
  mutate_at(vars(a, b), ~ predict(BoxCoxTrans(.), .))
# Or on all columns
d %>%
  mutate_all(~ predict(BoxCoxTrans(.), .))
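In dplyr 1.0.0 and later, mutate_at() and mutate_all() are superseded by across(). A minimal sketch of the same transformation in that style, assuming caret and e1071 (which the answer above also loads) are installed. The lambda spells out predict(BoxCoxTrans(.x), .x): the transformation is estimated on each column and then applied to that same column.

```r
library(dplyr)
library(caret)
library(e1071)

set.seed(001)
d <- tibble(a = rpois(20, 10), b = rnorm(20, 40, 10))

# Estimate a Box-Cox transformation per column, then apply it to that column
d_bc <- d %>%
  mutate(across(c(a, b), ~ predict(BoxCoxTrans(.x), .x)))

# across(everything(), ...) would transform every column instead
d_bc
```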

reshape dataframe from columns to rows and collapse cell values

Here's the challenge I am facing. I am trying to transform this dataset
a b c
100 0 111
0 137 17
78 117 91
into this (columns to rows):
col1 col2
a 100,78
b 137,117
c 111,17,91
I know I can do this using the reshape or melt functions, but I am not sure how to collapse and paste the cell values. Any suggestions or pointers are appreciated, folks.
Here is a lightweight option that uses toString() to collapse the non-zero values of each column into a single string and stack() to reshape the resulting list into your desired output:
stack(lapply(df, function(col) toString(col[col!=0])))
# values ind
#1 100, 78 a
#2 137, 117 b
#3 111, 17, 91 c
I would use dplyr rather than reshape.
library(dplyr)
library(tidyr)
Data <- data.frame(a=c(100,0,78),b=c(0,137,117),c=c(111,17,91))
Data %>%
gather(Column, Value) %>%
filter(Value != 0) %>%
group_by(Column) %>%
summarize(Value=paste0(Value,collapse=', '))
The gather function is similar to melt in reshape. The filter step drops the zeros. The group_by function tells later functions that you want to separate the data based on the values in Column. Finally, summarize calculates whatever summary we want for each of the groups; in this case, it pastes all the values together.
Which should give you:
# A tibble: 3 × 2
Column Value
<chr> <chr>
1 a 100, 78
2 b 137, 117
3 c 111, 17, 91
With library(data.table)
melt(dt)[, .(value = paste(value[value !=0], collapse=', ')), by=variable]
# variable value
# 1: a 100, 78
# 2: b 137, 117
# 3: c 111, 17, 91
The data:
dt = fread("a b c
100 0 111
0 137 17
78 117 91")
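If you want exactly the col1/col2 layout from the question (comma without a space), a base R sketch along the lines of the toString() answer could drop the zeros in each column, paste the rest, and put the column names alongside. It reuses the Data example from the dplyr answer.

```r
Data <- data.frame(a = c(100, 0, 78), b = c(0, 137, 117), c = c(111, 17, 91))

# For each column, keep the non-zero values and collapse them into one string
collapsed <- vapply(Data, function(col) paste(col[col != 0], collapse = ","),
                    character(1))

result <- data.frame(col1 = names(Data), col2 = unname(collapsed),
                     stringsAsFactors = FALSE)
result
```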

Pivot rows into a single column and index them using column names in R

I need to flip row values into a single column and create an index based on the column name and row number. I checked a lot of pivot solutions in R but none seem to simply flip things around without creating means, sums, etc. Help would be appreciated.
df1 <- read.table(textConnection("a1,a2,a3
23,34,4
34,44,98"), sep=",", header=TRUE)
df2 <- read.table(textConnection("id,val
1_a1,23
2_a2,34
3_a3,4
4_a1,34
5_a2,44
6_a3,98"), sep=",", header=TRUE)
I need to go from a data frame looking like this:
a1 a2 a3
1 23 34 4
2 34 44 98
To this:
id val
1 1_a1 23
2 2_a2 34
3 3_a3 4
4 4_a1 34
5 5_a2 44
6 6_a3 98
Many thanks!!
This can easily be done with gather from the tidyr package:
library(tidyr)
df2 <- gather(df1, id, val)
Note that this requires the latest development version of tidyr (after this commit); you can install it with devtools::install_github("hadley/tidyr"). Otherwise, you can change the line to gather(df1, id, val, a1:a3).
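To add the 1_, 2_, etc. prefixes, you can do the following. Note that gather() stacks the data column by column, so the rows come out in the order a1, a1, a2, a2, a3, a3 rather than row by row as in the question:
df2$id <- paste(seq_len(nrow(df2)), df2$id, sep = "_")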
If you use the dplyr package as well, you could do this as:
library(dplyr)
library(tidyr)
df2 <- df1 %>% gather(id, val) %>% mutate(id = paste(seq_len(n()), id, sep = "_"))
You could try
m1 <- t(df1)
d1 <- data.frame(id=paste(seq_along(m1),
rownames(m1)[row(m1)], sep="_"), val=c(m1))
d1
# id val
#1 1_a1 23
#2 2_a2 34
#3 3_a3 4
#4 4_a1 34
#5 5_a2 44
#6 6_a3 98
require(dplyr) # for mutate()
require(tidyr) # for gather()
d <- data.frame(
a1 = c(23, 34),
a2 = c(34, 44),
a3 = c(4, 98)
)
gather(d, id, val, a1:a3) %>%
mutate(id = paste(row_number(), id, sep = "_"))
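In current tidyr, gather() is superseded by pivot_longer(), which keeps the original row order: it emits all pivoted columns of row 1 before those of row 2. A minimal sketch, assuming tidyr >= 1.0.0 and dplyr, that reproduces the question's target order (1_a1, 2_a2, 3_a3, 4_a1, ...) directly:

```r
library(dplyr)
library(tidyr)

df1 <- data.frame(a1 = c(23L, 34L), a2 = c(34L, 44L), a3 = c(4L, 98L))

df2 <- df1 %>%
  # one row per original cell; column names go to "id", values to "val"
  pivot_longer(everything(), names_to = "id", values_to = "val") %>%
  # prefix each id with its running row number, e.g. "1_a1"
  mutate(id = paste(row_number(), id, sep = "_"))
df2
```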
