How to output twice in R pipe? - r

library(psych)
library(mokken)
bfi[1:3] %>%
na.omit() %>%
mokken::check.monotonicity() %T>%
summary %>%
{.$Hi[.$Hi<0]}
A1
-0.3873723
The script above works well. I get the final output, but I still want to review the output of summary.
How can I make the pipe print the summary output as well?

If we want the summary as well, place both results in a list:
library(psych)
library(mokken)
library(magrittr)
out <- bfi[1:3] %>%
na.omit() %>%
mokken::check.monotonicity() %>%
{list(summary(.), .$Hi[.$Hi < 0])}
out
#[[1]]
# ItemH #ac #vi #vi/#ac maxvi sum sum/#ac zmax #zsig crit
#A1 -0.39 75 54 0.72 0.52 9.79 0.1305 16.75 51 550
#A2 0.06 50 8 0.16 0.14 0.63 0.0126 4.76 7 128
#A3 0.09 30 6 0.20 0.12 0.45 0.0149 4.63 6 134
#[[2]]
# A1
#-0.3873723

You can use %T>% print() to show the result of summary() but not return it.
bfi[1:3] %>%
na.omit() %>%
mokken::check.monotonicity() %T>%
{print(summary(.))} %>%
{.$Hi[.$Hi<0]}
# ItemH #ac #vi #vi/#ac maxvi sum sum/#ac zmax #zsig crit
# A1 -0.39 75 54 0.72 0.52 9.79 0.1305 16.75 51 550
# A2 0.06 50 8 0.16 0.14 0.63 0.0126 4.76 7 128
# A3 0.09 30 6 0.20 0.12 0.45 0.0149 4.63 6 134
#
# A1
# -0.3873723
If you assign it to a variable, it doesn't store the result of summary().
out <- ...
out
# A1
# -0.3873723
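The behaviour of the tee operator can be checked on a small vector. This is a minimal sketch, assuming magrittr is installed; the vector `values` and the name `res` are made up for illustration and are not part of the question.

```r
library(magrittr)

# Hypothetical toy input, not the bfi data from the question
values <- c(3, 1, 2)

res <- values %T>%
  {print(summary(.))} %>%   # side effect only: the summary is printed, then discarded
  sort()                    # receives `values` itself, not the summary

res
```

Because `%T>%` passes its left-hand side on unchanged, `res` holds only the sorted vector; the summary appears on the console but is not stored.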

Related

How to convert a list into a data.frame in R?

I've created a frequency table in R with the fdth package using this code
fdt(x, breaks = "Sturges")
The specific result was:
Class limits f rf rf(%) cf cf(%)
[-15.907,-11.817) 12 0.00 0.10 12 0.10
[-11.817,-7.7265) 8 0.00 0.07 20 0.16
[-7.7265,-3.636) 6 0.00 0.05 26 0.21
[-3.636,0.4545) 70 0.01 0.58 96 0.79
[0.4545,4.545) 58 0.00 0.48 154 1.27
[4.545,8.6355) 91 0.01 0.75 245 2.01
[8.6355,12.726) 311 0.03 2.55 556 4.57
[12.726,16.817) 648 0.05 5.32 1204 9.89
[16.817,20.907) 857 0.07 7.04 2061 16.93
[20.907,24.998) 1136 0.09 9.33 3197 26.26
[24.998,29.088) 1295 0.11 10.64 4492 36.90
[29.088,33.179) 1661 0.14 13.64 6153 50.55
[33.179,37.269) 2146 0.18 17.63 8299 68.18
[37.269,41.36) 2525 0.21 20.74 10824 88.92
[41.36,45.45) 1349 0.11 11.08 12173 100.00
It was given as a list:
> class(x)
[1] "fdt.multiple" "fdt" "list"
I need to convert it into a data frame object, so I can have a table. How can I do it?
I'm a beginner at using R :(
Since you did not provide a reproducible example of your data, I have used an example from the help page ?fdt, which is close to what you have.
library(fdth)
mdf <- data.frame(c1=sample(LETTERS[1:3], 1e2, TRUE),
c2=as.factor(sample(1:10, 1e2, TRUE)),
n1=c(NA, NA, rnorm(96, 10, 1), NA, NA),
n2=rnorm(100, 60, 4),
n3=rnorm(100, 50, 4),
stringsAsFactors=TRUE)
fdt <- fdt(mdf,breaks='FD',by='c1')
class(fdt)
#[1] "fdt.multiple" "fdt" "list"
You can extract the table part from each list and bind them together.
result <- purrr::map_df(fdt, `[[`, 'table')
#In base R
#result <- do.call(rbind, lapply(fdt, `[[`, 'table'))
result
# Class limits f rf rf(%) cf cf(%)
#1 [8.1781,9.1041) 5 0.20833333 20.833333 5 20.833333
#2 [9.1041,10.03) 6 0.25000000 25.000000 11 45.833333
#3 [10.03,10.956) 10 0.41666667 41.666667 21 87.500000
#4 [10.956,11.882) 3 0.12500000 12.500000 24 100.000000
#5 [53.135,56.121) 4 0.16000000 16.000000 4 16.000000
#6 [56.121,59.107) 8 0.32000000 32.000000 12 48.000000
#7 [59.107,62.092) 8 0.32000000 32.000000 20 80.000000
#....
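The same extract-and-bind idea can be tried without fdth. The sketch below uses a hand-made list whose elements each carry a `table` data frame; the list `tables` and its contents are assumptions for illustration and only mimic the shape described above.

```r
# Hypothetical list: each element holds a 'table' data frame plus other parts
tables <- list(
  a = list(table = data.frame(class = c("[0,1)", "[1,2)"), f = c(3L, 5L)),
           breaks = c(0, 1, 2)),
  b = list(table = data.frame(class = "[10,20)", f = 7L),
           breaks = c(10, 20))
)

# Pull the 'table' element out of every list entry, then stack the rows
result <- do.call(rbind, lapply(tables, `[[`, "table"))
result
```

The row names of `result` keep the list names as a prefix, which helps to see which element each row came from.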

Create matrix from dataset in R

I want to create a matrix from my data. My data consists of two columns, date and my observations for each date. I want the matrix to have year as rows and days as columns, e.g. :
17 18 19 20 ... 31
1904 x11 x12 ...
1905
1906
.
.
.
2019
The days in this case are those of December each year. I would like missing values to equal NA.
Here's a sample of my data:
> head(cdata)
# A tibble: 6 x 2
Datum Snödjup
<dttm> <dbl>
1 1904-12-01 00:00:00 0.02
2 1904-12-02 00:00:00 0.02
3 1904-12-03 00:00:00 0.01
4 1904-12-04 00:00:00 0.01
5 1904-12-12 00:00:00 0.02
6 1904-12-13 00:00:00 0.02
I figured that the first thing I need to do is to split the date into year, month and day (European formatting, YYYY-MM-DD), so I did that, got rid of the date column (the one called Datum), and also dropped the irrelevant days, namely the ones < 17.
cd <- cdata %>%
dplyr::mutate(year = lubridate::year(Datum),
month = lubridate::month(Datum),
day = lubridate::day(Datum)) %>%
dplyr::select(-Datum)
cu <- cd[which(cd$day > 16
& cd$day < 32
& cd$month == 12),]
and now it looks like this:
> cu
# A tibble: 1,284 x 4
Snödjup year month day
<dbl> <dbl> <dbl> <int>
1 0.01 1904 12 26
2 0.01 1904 12 27
3 0.01 1904 12 28
4 0.12 1904 12 29
5 0.12 1904 12 30
6 0.15 1904 12 31
7 0.07 1906 12 17
8 0.05 1906 12 18
9 0.05 1906 12 19
10 0.04 1906 12 20
# … with 1,274 more rows
Now I need to fit my data into a matrix with missing values as NA. Is there any way to do this?
Base R approach, using by.
r <- `colnames<-`(do.call(rbind, by(dat, substr(dat$date, 1, 4), function(x) x[[2]])), 1:31)
r[,17:31]
# 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
# 1904 -0.28 -2.66 -2.44 1.32 -0.31 -1.78 -0.17 1.21 1.90 -0.43 -0.26 -1.76 0.46 -0.64 0.46
# 1905 1.44 -0.43 0.66 0.32 -0.78 1.58 0.64 0.09 0.28 0.68 0.09 -2.99 0.28 -0.37 0.19
# 1906 -0.89 -1.10 1.51 0.26 0.09 -0.12 -1.19 0.61 -0.22 -0.18 0.93 0.82 1.39 -0.48 0.65
Toy data
set.seed(42)
dat <- do.call(rbind, lapply(1904:1906, function(x)
data.frame(date=seq(ISOdate(x, 12, 1, 0), ISOdate(x, 12, 31, 0), "day" ),
value=round(rnorm(31), 2))))
You can try:
library(dplyr)
library(tidyr)
cdata %>%
mutate(year = lubridate::year(Datum),
month = lubridate::month(Datum),
day = lubridate::day(Datum)) %>%
filter(month == 12, day >= 17) %>%
select(year, day, Snödjup) %>%
# complete() adds the missing year/day combinations with NA
complete(year, day = 17:31) %>%
pivot_wider(names_from = day, values_from = Snödjup)
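If a true matrix is needed rather than a tibble, base R's tapply() can build one directly. This is only a sketch on made-up data: the data frame `toy`, its column `depth`, and the reduced day range 17:19 are assumptions for illustration.

```r
# Hypothetical toy data: two years, a few December days each
toy <- data.frame(
  date  = as.Date(c("1904-12-17", "1904-12-19", "1905-12-18", "1905-12-19")),
  depth = c(0.02, 0.01, 0.05, 0.04)
)

toy$year <- as.integer(format(toy$date, "%Y"))
# A factor with explicit levels keeps a column for every day, observed or not
toy$day <- factor(as.integer(format(toy$date, "%d")), levels = 17:19)

# One row per year, one column per day; empty cells become NA
m <- tapply(toy$depth, list(toy$year, toy$day), mean)
m
```

Declaring the days as factor levels is what guarantees a column for every day in the range, even when no year has an observation on that day.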

tidyr - spread multiple columns

I'm preparing data for a network meta-analysis and I am having difficulty tidying the columns.
If I have this initial dataset:
Study Trt y sd n
1 1 -1.22 3.70 54
1 3 -1.53 4.28 95
2 1 -0.30 4.40 76
2 2 -2.60 4.30 71
2 4 -1.2 4.3 81
How can I finish with this other one?
Study Treatment1 y1 sd1 n1 Treatment2 y2 sd2 n2 Treatment3 y3 sd3 n3
1 1 1 -1.22 3.70 54 3 -1.53 4.28 95 NA NA NA NA
2 2 1 -0.30 4.40 76 2 -2.60 4.30 71 4 -1.2 4.3 81
I'm really stuck in this step, and I'd really appreciate some help...
We can gather to 'long' format, unite the variable name and the within-study row number into a single column, and then spread that column to 'wide' format:
library(tidyverse)
gather(df1, Var, Val, Trt:n) %>%
group_by(Study, Var) %>%
mutate(n = row_number()) %>%
unite(VarT, Var, n, sep="") %>%
spread(VarT, Val, fill=0)
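A base R alternative, not the tidyr approach above, is reshape(), which leaves missing arms as NA. The sketch below retypes the question's data; the helper column `arm` is a name introduced here for illustration.

```r
# Data retyped from the question
df1 <- data.frame(
  Study = c(1, 1, 2, 2, 2),
  Trt   = c(1, 3, 1, 2, 4),
  y     = c(-1.22, -1.53, -0.30, -2.60, -1.2),
  sd    = c(3.70, 4.28, 4.40, 4.30, 4.3),
  n     = c(54, 95, 76, 71, 81)
)

# Number the arms within each study: 1, 2, (3)
df1$arm <- ave(df1$Study, df1$Study, FUN = seq_along)

# One row per study; Trt, y, sd and n get the arm number as a suffix
wide <- reshape(df1, idvar = "Study", timevar = "arm",
                direction = "wide", sep = "")
wide
```

The columns come out as Trt1, y1, sd1, n1, Trt2 and so on, so renaming Trt to Treatment beforehand would give the exact headers asked for.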

Create table with nested header from pre-summarized data

How do I create a nested table from a data.frame that has already been summarized? By nested I mean that the table has headers and subheaders.
My input data looks like this:
library(dplyr)
library(ggplot2)
library(reshape2)
df <- ggplot2::diamonds
count(df, cut, color) %>%
group_by(cut) %>% # pct is the share within each cut
mutate(pct = round(n / sum(n), 2)) %>%
ungroup() %>%
reshape2::melt() -> df2
> head(df2)
cut color variable value
1 Fair D n 163
2 Fair E n 224
3 Fair F n 312
4 Fair G n 314
5 Fair H n 303
6 Fair I n 175
I would like to have something like this:
Color
D E F G H I J
cut n pct n pct n pct n pct n pct n pct n pct
Fair 163 0.10 224 0.14 312 0.19 314 0.20 303 0.19 175 0.11 119 0.07
Good 662 0.13 933 0.19 909 0.19 871 0.18 702 0.14 522 0.11 307 0.06
Very Good 1513 0.13 2400 0.20 2164 0.18 2299 0.19 1824 0.15 1204 0.10 678 0.06
Premium 1603 0.12 2337 0.17 2331 0.17 2924 0.21 2360 0.17 1428 0.10 808 0.06
Ideal 2834 0.13 3903 0.18 3826 0.18 4884 0.23 3115 0.14 2093 0.10 896 0.04
Below is an example of the closest I can get. The problem with this table below is that there is only one header. I would like 3 rows/headers: One which says the name of the variable: Color, one which lists the individual categories inside color, and one which lists type of summary (coming from df2$variable):
reshape2::dcast(df2, cut ~ color + variable , value.var = c("value") )
cut D_n D_pct E_n E_pct F_n F_pct G_n G_pct H_n H_pct I_n I_pct J_n J_pct
1 Fair 163 0.10 224 0.14 312 0.19 314 0.20 303 0.19 175 0.11 119 0.07
2 Good 662 0.13 933 0.19 909 0.19 871 0.18 702 0.14 522 0.11 307 0.06
3 Very Good 1513 0.13 2400 0.20 2164 0.18 2299 0.19 1824 0.15 1204 0.10 678 0.06
4 Premium 1603 0.12 2337 0.17 2331 0.17 2924 0.21 2360 0.17 1428 0.10 808 0.06
5 Ideal 2834 0.13 3903 0.18 3826 0.18 4884 0.23 3115 0.14 2093 0.10 896 0.04
I hope there is some function/package which can do this. I think it should be possible because the packages etable and tables, and the function ftable, can create the output I want, but not for pre-summarized data.
This link does what I need (I think), but I only have access to CRAN-packages on the server I use.
https://www.r-statistics.com/2012/01/printing-nested-tables-in-r-bridging-between-the-reshape-and-tables-packages/
Solution based on comments. Thanks!
# data
library(tidyr)
library(dplyr)
library(ggplot2)
library(reshape2)
library(tables) # provides tabular() and Heading()
df <- ggplot2::diamonds
count(df, cut, color) %>%
group_by(cut) %>% # pct is the share within each cut, as in the desired table
mutate(pct = round(n / sum(n), 2)) %>%
ungroup() %>%
reshape2::melt() -> df2
head(df2)
# Solution
spread(data = df2, key = variable, value = value) -> df2_spread
tabular(Heading() * cut ~ color * (n + pct) * Heading() * (identity), data = df2_spread)
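For servers where only base R is available, xtabs() with the value column on the left-hand side accepts pre-summarized long data, and ftable() then prints nested column headers. This is a sketch on a cut-down toy version of df2; the data frame `toy` is an assumption for illustration, with values copied from the table above.

```r
# Hypothetical subset of df2: two cuts, two colors, both summary types
toy <- expand.grid(cut = c("Fair", "Good"),
                   color = c("D", "E"),
                   variable = c("n", "pct"))
toy$value <- c(163, 662, 224, 933, 0.10, 0.13, 0.14, 0.19)

# Each cell occurs once, so summing 'value' just places it in the array
tab <- xtabs(value ~ cut + color + variable, data = toy)

# cut as rows; color and variable become stacked column headers
nested <- ftable(tab, row.vars = "cut")
nested
```

Since counts and proportions share one numeric table, the printed formatting is less polished than what tabular() produces.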

Break matrix into averaged time bins using R

I need to convert a matrix x like this:
head(x)
Age d18O d13C
1 0.000 3.28 0.880
2 0.000 3.58 0.150
3 0.002 3.16 0.960
4 0.002 2.91 3.228
5 0.004 3.33 0.880
6 0.004 3.16 3.328
tail(x)
Age d18O d13C
14883 66.3037 1.00 2.03
14884 66.3159 1.02 1.70
14885 66.3800 0.62 2.01
14886 67.0073 1.30 1.23
14887 67.2391 1.31 1.30
14888 67.5173 1.36 1.35
into a matrix containing time bins of width 0.5, with the count and mean value of each variable per bin, such as:
Age count(x$d18O) mean(x$d18O)
1 0 500 4.1003
2 0.5 522 4.079464
3 1 412 4.032743
4 1.5 366 3.810601
5 2 498 3.749257
6 2.5 608 3.649063
. . . .
. . . .
Age is given in millions of years.
This should do the trick:
library(dplyr)
x %>%
mutate(age_bucket = cut(Age, seq(0, max(Age) + 0.5, by = 0.5), right = FALSE)) %>%
group_by(age_bucket) %>%
summarise(n = n(),
mean_d18O = mean(d18O))
Try this:
sdf <- split(x, cut(x$Age, seq(0, max(x$Age) * 1.01, by = 0.5), include.lowest = TRUE))
do.call(rbind, lapply(sdf, function(sx) c(length(sx$d18O), mean(sx$d18O))))
you will get something similar to:
(23,23.5] 0 NaN
(23.5,24] 4 2.9500345
(24,24.5] 1 6.9320712
(24.5,25] 2 3.0219788
(25,25.5] 2 3.7149871
(25.5,26] 1 1.9051732
(26,26.5] 2 3.1865066
(26.5,27] 1 3.9982569
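To get bins labelled by their lower edge (0, 0.5, 1, ...) as in the question, the bin can also be computed arithmetically. This is a minimal base R sketch on made-up numbers; the data frame `toy` and the result name `binned` are assumptions for illustration.

```r
# Hypothetical toy data with the same column names as x
toy <- data.frame(Age  = c(0, 0.1, 0.4, 0.5, 0.9, 1.2),
                  d18O = c(3, 4, 5, 2, 4, 1))

# Lower edge of the 0.5-wide bin each age falls into
toy$bin <- floor(toy$Age / 0.5) * 0.5

counts <- tapply(toy$d18O, toy$bin, length)
means  <- tapply(toy$d18O, toy$bin, mean)

binned <- data.frame(Age       = as.numeric(names(means)),
                     n         = as.vector(counts),
                     mean_d18O = as.vector(means))
binned
```

Bins that contain no observations do not appear in `binned`, unlike the split() approach, which reports them with a count of 0.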
