tidyr - spread multiple columns - r

I'm preparing data for a network meta-analysis and I am having difficulty tidying the columns.
If I have this initial dataset:
Study Trt y sd n
1 1 -1.22 3.70 54
1 3 -1.53 4.28 95
2 1 -0.30 4.40 76
2 2 -2.60 4.30 71
2 4 -1.2 4.3 81
How can I finish with this other one?
Study Treatment1 y1 sd1 n1 Treatment2 y2 sd2 n2 Treatment3 y3 sd3 n3
1 1 1 -1.22 3.70 54 3 -1.53 4.28 95 NA NA NA NA
2 3 1 -0.30 4.40 76 2 -2.60 4.30 71 4 -1.2 4.3 81
I'm really stuck in this step, and I'd really appreciate some help...

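We can gather to 'long' format, unite each variable name with a within-study row index into a single key column, and then spread back to 'wide' format:
library(tidyverse)
gather(df1, Var, Val, Trt:n) %>%
  group_by(Study, Var) %>%
  mutate(n = row_number()) %>%        # index of each arm within a study
  unite(VarT, Var, n, sep = "") %>%   # e.g. "y" and 1 become "y1"
  spread(VarT, Val, fill = 0)         # missing arms become 0; drop fill = 0 to keep NA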
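The same reshaping can probably be written with pivot_wider(), which superseded spread() in tidyr 1.0.0. This is a minimal sketch, not the answerer's code: the data frame df1 is rebuilt here from the question's table, and the helper column arm is a name chosen only for illustration. The output columns are named Trt1, Trt2, ... rather than Treatment1, Treatment2, ..., and missing arms are left as NA.

```r
library(dplyr)
library(tidyr)

# Data rebuilt from the table in the question
df1 <- data.frame(
  Study = c(1, 1, 2, 2, 2),
  Trt   = c(1, 3, 1, 2, 4),
  y     = c(-1.22, -1.53, -0.30, -2.60, -1.2),
  sd    = c(3.70, 4.28, 4.40, 4.30, 4.3),
  n     = c(54, 95, 76, 71, 81)
)

wide <- df1 %>%
  group_by(Study) %>%
  mutate(arm = row_number()) %>%   # 'arm' is a hypothetical helper: position within a study
  ungroup() %>%
  pivot_wider(
    names_from  = arm,
    values_from = c(Trt, y, sd, n),
    names_sep   = ""               # gives Trt1, y1, sd1, n1, Trt2, ...
  )

wide
```

Studies with fewer arms than the largest study get NA in the unused columns, which matches the NA entries in the desired output.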

Related

How to split column with double information (abs. value and %) into 2 separated columns?

In this data frame, absolute values and percentages are combined in one column, and I want to split them into two separate columns:
df <- data.frame (Sales = c("74(2.08%)",
"71(2.00%)",
"58(1.63%)",
"42(1.18%)"))
Sales
1 74(2.08%)
2 71(2.00%)
3 58(1.63%)
4 42(1.18%)
Expected output
Sales Share
1 74 2.08
2 71 2.00
3 58 1.63
4 42 1.18
In base R:
read.table(text=gsub("[()%]", ' ', df$Sales), col.names = c("Sales", "Share"))
Sales Share
1 74 2.08
2 71 2.00
3 58 1.63
4 42 1.18
With tidyr::separate:
library(tidyr)
df %>%
  separate(Sales, c("Sales", "Share"), sep = '[()%]', extra = 'drop', convert = TRUE)
Sales Share
1 74 2.08
2 71 2.00
3 58 1.63
4 42 1.18
Using tidyr::extract you could split your column into separate columns using a regex:
library(tidyr)
df |>
extract(Sales, into = c("Sales", "Share"), regex = "^(\\d+)\\((\\d+\\.\\d+)\\%\\)$", convert = TRUE)
#> Sales Share
#> 1 74 2.08
#> 2 71 2.00
#> 3 58 1.63
#> 4 42 1.18

Particular ratio using dplyr and tidyr

I'd like to create a new velocity variable. In my data set:
library(dplyr)
library(tidyr)
day <- c(0,47,76,118,160,193,227,262,306,355,396,450)
AT <- c(0.14,0.48,0.83,0.83,0.94,0.94,0.94,0.94,0.94,11.93,12.81,29.36)
ClassType <- c("Class_0_1","Class_0_1","Class_0_1","Class_0_1","Class_0_1","Class_0_1",
"Class_0_1","Class_0_1","Class_0_1","Class_9_25","Class_9_25","Class_25_50")
ClassMax <-c(1,1,1,1,1,1,1,1,1,25,25,50)
my.ds <- data.frame(day,AT,ClassType,ClassMax)
my.ds
# day AT ClassType ClassMax
# 1 0 0.14 Class_0_1 1
# 2 47 0.48 Class_0_1 1
# 3 76 0.83 Class_0_1 1
# 4 118 0.83 Class_0_1 1
# 5 160 0.94 Class_0_1 1
# 6 193 0.94 Class_0_1 1
# 7 227 0.94 Class_0_1 1
# 8 262 0.94 Class_0_1 1
# 9 306 0.94 Class_0_1 1
# 10 355 11.93 Class_9_25 25
# 11 396 12.81 Class_9_25 25
# 12 450 29.36 Class_25_50 50
If ClassType changes, take the next AT value minus the current AT value and divide by the difference between the two corresponding days. In my case:
(11.93 - 0.94) / (355-306)
#[1] 0.2242857
(12.81-11.93) / (396-355)
#[1] 0.02146341
(29.36-12.81) / (450-396)
#[1] 0.3064815
But if AT falls in a new ClassType while ClassMax does not change, ignore it.
I have a custom min-to-max ordering: complete.cases <- c("Class_0_1","Class_1_3","Class_3_9", "Class_9_25","Class_25_50","Class_50").
I'd like to repeat the last velocity value for the intermediate ClassType levels that are absent from the data.
I tried the following, without success:
my.ds$velocity <- c(0, diff(my.ds$AT)) / c(0, diff(my.ds$day))
final.ds <- my.ds %>%
  group_by(ClassType) %>%
  summarize(velocity = mean(velocity)) %>%
  complete(ClassType, fill = list(velocity = NA)) %>%
  fill(velocity, .direction = "downup")
My desired output is:
final.ds
# ClassType velocity
# Class_ 0_1 0.224285714
# Class_ 1_3 0.224285714
# Class_ 3_9 0.224285714
# Class_ 9_25 0.224285714
# Class_ 9_25 0.021463415
# Class_ 9_25 0.306481481
Can anyone help with this?
How about this:
my.ds %>%
group_by(ClassType) %>%
mutate(velocity = c(NA, diff(AT) / diff(day))) %>%
ungroup()
# # A tibble: 12 x 5
# day AT ClassType ClassMax velocity
# <dbl> <dbl> <chr> <dbl> <dbl>
# 1 0 0.14 Class_0_1 1 NA
# 2 47 0.48 Class_0_1 1 0.00723
# 3 76 0.83 Class_0_1 1 0.0121
# 4 118 0.83 Class_0_1 1 0
# 5 160 0.94 Class_0_1 1 0.00262
# 6 193 0.94 Class_0_1 1 0
# 7 227 0.94 Class_0_1 1 0
# 8 262 0.94 Class_0_1 1 0
# 9 306 0.94 Class_0_1 1 0
# 10 355 11.9 Class_9_25 25 NA
# 11 396 12.8 Class_9_25 25 0.0215
# 12 450 29.4 Class_25_50 50 NA
complete.cases <- c("Class_0_1", "Class_1_3", "Class_3_9", "Class_9_25", "Class_25_50")
my.ds %>%
  group_by(ClassType = factor(ClassType, levels = complete.cases),
           grp = lag(match(ClassType, unique(ClassType)), default = 1)) %>%
  slice_tail(n = 1) %>%
  ungroup() %>%
  summarise(ClassType, velocity = c(NA, diff(AT)) / c(NA, diff(day))) %>%
  complete(ClassType) %>%
  fill(velocity, .direction = "updown")
# ClassType velocity
# <fct> <dbl>
# 1 Class_0_1 0.224
# 2 Class_1_3 0.224
# 3 Class_3_9 0.224
# 4 Class_9_25 0.224
# 5 Class_9_25 0.0215
# 6 Class_25_50 0.306

Data manipulation in R - creating a new data frame from an existing one

I have the below data frame in R
id <- c(112, 112,112)
case <- c("up","down","worse")
c1 <- c(0.12,0.24,0.09)
c2 <- c(0.11,0.14,0.06)
c3 <- c(0.15,0.34,0.04)
c4 <- c(0.16,0.44,0.03)
c5 <- c(0.17,0.94,0.01)
df3 <- data.frame(id,case,c1,c2,c3,c4,c5)
I am trying to create a new data frame with the columns id, case, value_in_period, and period.
For each id, period will take the values 0-9. The value_in_period column will take the values of c1, c2, c3, c4, and c5 for periods 0-4, and the rest of the values will be 0. A sample of desired output is attached below
I tried using inner join and transpose, but it doesn't seem to work. Any help will be appreciated.
We reshape to 'long' format and then use complete to expand the data:
library(dplyr)
library(tidyr)
out <- df3 %>%
  pivot_longer(cols = c1:c5, names_to = NULL,
               values_to = 'value_in_period') %>%
  # factor() keeps the cases in their original order (up, down, worse)
  group_by(id, case = factor(case, levels = unique(case))) %>%
  mutate(period = row_number() - 1) %>%
  # add the missing periods 5-9 and set their value to 0
  complete(period = 0:9, fill = list(value_in_period = 0)) %>%
  ungroup %>%
  relocate(period, .after = 'value_in_period')
Output:
> as.data.frame(out)
id case value_in_period period
1 112 up 0.12 0
2 112 up 0.11 1
3 112 up 0.15 2
4 112 up 0.16 3
5 112 up 0.17 4
6 112 up 0.00 5
7 112 up 0.00 6
8 112 up 0.00 7
9 112 up 0.00 8
10 112 up 0.00 9
11 112 down 0.24 0
12 112 down 0.14 1
13 112 down 0.34 2
14 112 down 0.44 3
15 112 down 0.94 4
16 112 down 0.00 5
17 112 down 0.00 6
18 112 down 0.00 7
19 112 down 0.00 8
20 112 down 0.00 9
21 112 worse 0.09 0
22 112 worse 0.06 1
23 112 worse 0.04 2
24 112 worse 0.03 3
25 112 worse 0.01 4
26 112 worse 0.00 5
27 112 worse 0.00 6
28 112 worse 0.00 7
29 112 worse 0.00 8
30 112 worse 0.00 9

How to output twice in R pipe?

library(psych)
library(mokken)
bfi[1:3] %>%
na.omit() %>%
mokken::check.monotonicity() %T>%
summary %>%
{.$Hi[.$Hi<0]}
A1
-0.3873723
The above script works well. I get the final output, but I still want to review the output of summary.
How can I print the summary output from this pipe as well?
If we want the summary as well, place it in a list
library(psych)
library(mokken)
library(magrittr)
out <- bfi[1:3] %>%
na.omit() %>%
mokken::check.monotonicity() %>%
{list(summary(.), .$Hi[.$Hi < 0])}
out
#[[1]]
# ItemH #ac #vi #vi/#ac maxvi sum sum/#ac zmax #zsig crit
#A1 -0.39 75 54 0.72 0.52 9.79 0.1305 16.75 51 550
#A2 0.06 50 8 0.16 0.14 0.63 0.0126 4.76 7 128
#A3 0.09 30 6 0.20 0.12 0.45 0.0149 4.63 6 134
#[[2]]
# A1
#-0.3873723
You can use %T>% print() to show the result of summary() but not return it.
bfi[1:3] %>%
na.omit() %>%
mokken::check.monotonicity() %T>%
{print(summary(.))} %>%
{.$Hi[.$Hi<0]}
# ItemH #ac #vi #vi/#ac maxvi sum sum/#ac zmax #zsig crit
# A1 -0.39 75 54 0.72 0.52 9.79 0.1305 16.75 51 550
# A2 0.06 50 8 0.16 0.14 0.63 0.0126 4.76 7 128
# A3 0.09 30 6 0.20 0.12 0.45 0.0149 4.63 6 134
#
# A1
# -0.3873723
If you assign the pipeline to a variable, only the final value is stored, not the result of summary():
out <- ...
out
# A1
# -0.3873723
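To see the tee pipe in isolation, here is a small sketch that does not need psych or mokken. The vector x and the name total are made up for illustration. The braces print the summary as a side effect, and %T>% passes the original x on to the next step.

```r
library(magrittr)

x <- c(4, 8, 15, 16, 23, 42)   # illustrative data, not from the question

total <- x %T>%
  { print(summary(.)) } %>%    # side effect only; x itself is passed on
  sum()

total
#> [1] 108
```

The summary is printed while the pipeline runs, and total holds only the sum.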

How to keep other columns when using aggregate in R?

I have a dataframe, p4p5, that contains the following columns:
p4p5 <- c("SampleID", "expr", "Gene", "Period", "Consequence", "isPTV")
I've used the aggregate function here to find the median expression per Gene:
p4p5_med <- aggregate(expr ~ Gene, p4p5, median)
However, this results in a dataframe with the columns "expr" and "Gene" only. How can I still retain all the original columns when applying the aggregate function?
UPDATE:
Input (p4p5):
SampleID expr Gene Period Consequence isPTV
HSB430 -1.23 ENSG000098 4 upstream_gene_variant 0
HSB321 -0.02 ENSG000098 5 stop_gained 1
HSB296 3.12 ENSG000027 4 upstream_gene_variant 0
HSB201 1.22 ENSG000027 4 intron_variant 0
HSB220 0.13 ENSG000013 6 intron_variant 0
Expected output:
SampleID expr Gene Period Consequence isPTV Median
HSB430 -1.23 ENSG000098 4 upstream_gene_variant 0 -0.625
HSB321 -0.02 ENSG000098 5 stop_gained 1 -0.625
HSB296 3.12 ENSG000027 4 upstream_gene_variant 0 2.17
HSB201 1.22 ENSG000027 4 intron_variant 0 2.17
HSB220 0.13 ENSG000013 6 intron_variant 0 0.13
I'd use dplyr for this:
library(dplyr)
p4p5 %>%
group_by(Gene) %>%
mutate(Median = median(expr, na.rm = TRUE)) %>%
ungroup()
SampleID expr Gene Period Consequence isPTV Median
<chr> <dbl> <chr> <int> <chr> <int> <dbl>
1 HSB430 -1.23 ENSG000098 4 upstream_gene_variant 0 -0.625
2 HSB321 -0.02 ENSG000098 5 stop_gained 1 -0.625
3 HSB296 3.12 ENSG000027 4 upstream_gene_variant 0 2.17
4 HSB201 1.22 ENSG000027 4 intron_variant 0 2.17
5 HSB220 0.13 ENSG000013 6 intron_variant 0 0.13
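For readers who prefer to stay in base R, two alternatives seem to give the same Median column. This is a sketch under the assumption that p4p5 looks like the input shown in the question (it is rebuilt here by hand); the names with_ave, p4p5_med and merged are chosen only for illustration.

```r
# Data rebuilt from the "Input (p4p5)" table in the question
p4p5 <- data.frame(
  SampleID    = c("HSB430", "HSB321", "HSB296", "HSB201", "HSB220"),
  expr        = c(-1.23, -0.02, 3.12, 1.22, 0.13),
  Gene        = c("ENSG000098", "ENSG000098", "ENSG000027", "ENSG000027", "ENSG000013"),
  Period      = c(4L, 5L, 4L, 4L, 6L),
  Consequence = c("upstream_gene_variant", "stop_gained", "upstream_gene_variant",
                  "intron_variant", "intron_variant"),
  isPTV       = c(0L, 1L, 0L, 0L, 0L)
)

# Option 1: ave() returns one value per row, so all columns and the row order are kept
with_ave <- transform(p4p5, Median = ave(expr, Gene, FUN = median))

# Option 2: keep using aggregate(), then merge the per-gene medians back in
p4p5_med <- aggregate(expr ~ Gene, p4p5, median)
names(p4p5_med)[names(p4p5_med) == "expr"] <- "Median"
merged <- merge(p4p5, p4p5_med, by = "Gene")

with_ave
```

Note that merge() sorts by Gene and moves that column first, so ave() is the closer match to the expected output if row and column order matter.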
