ggplot2 - how to create a clustered timeline? - r

How would you go about creating the graph below in R? I want to show the duration of different treatments for different patients.
Mock data here:
                    Start Day   Stop Day
Patient 1   Drug 1  1           3
            Drug 2  2           5
            Drug 3  3           8
Patient 2   Drug 1  2           4
            Drug 2  2           5
            Drug 3  1           6
Patient 3   Drug 1  4           7
            Drug 2  3           8
            Drug 3  5           6

Your graph can be generated using geom_segment in the ggplot2 package:
df <- structure(list(Patient = structure(c(1L, 1L, 1L, 2L, 2L, 2L,
3L, 3L, 3L), .Label = c("Patient1", "Patient2", "Patient3"), class = "factor"),
Drug = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), .Label = c("Drug1",
"Drug2", "Drug3"), class = "factor"), StartDay = c(1L, 2L,
3L, 2L, 2L, 1L, 4L, 3L, 5L), StopDay = c(3L, 5L, 8L, 4L,
5L, 6L, 7L, 8L, 6L)), .Names = c("Patient", "Drug", "StartDay",
"StopDay"), class = "data.frame", row.names = c(NA, -9L))
df$Drug <- factor(df$Drug, levels(df$Drug)[c(3,2,1)])
library(ggplot2)
ggplot(data=df, aes(color=Drug))+
geom_segment(aes(x=StartDay, xend=StopDay, y=Drug, yend=Drug),lwd=12)+
facet_grid(Patient~.)+xlab("Days")
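For readers who find the dput() output hard to follow, here is a minimal sketch of the same idea with the mock data typed in by hand. The data.frame() construction and the requireNamespace() guard are additions for illustration, not part of the original answer; the plotting calls mirror the ones above.

```r
# Mock data from the question, one row per patient/drug combination.
df <- data.frame(
  Patient  = rep(c("Patient 1", "Patient 2", "Patient 3"), each = 3),
  Drug     = rep(c("Drug 1", "Drug 2", "Drug 3"), times = 3),
  StartDay = c(1, 2, 3, 2, 2, 1, 4, 3, 5),
  StopDay  = c(3, 5, 8, 4, 5, 6, 7, 8, 6)
)

# Reverse the drug order so that Drug 1 is drawn at the top of each panel.
df$Drug <- factor(df$Drug, levels = rev(sort(unique(df$Drug))))

# Only attempt the plot if ggplot2 is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df, aes(colour = Drug)) +
    # One thick horizontal bar per treatment, from start day to stop day.
    geom_segment(aes(x = StartDay, xend = StopDay, y = Drug, yend = Drug),
                 lwd = 12) +
    # One panel (row) per patient.
    facet_grid(Patient ~ .) +
    xlab("Days")
  print(p)
}
```

Each row of the data frame becomes one bar, so adding a patient or a drug only means adding rows.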

Related

Calculating and looping summaries for individual participants into a table

I have data from several hundred participants who each provided between 1 and 6 sentences. They then rated their sentence(s) on 4 dimensions, as did two external raters.
I'd like to create a table, grouped by participant, with columns showing these values:
Participants' rate of agreement with rater 1 (par1), with rater 2 (par2) and overall (paro)
Participants' rate of agreement for each dimension with rater 1 (pad1.1, pad2.1 etc.), with rater 2 (pad1.2, pad2.2 etc.) and overall (pad1.o, pad2.o etc.)
Mean difference in rating between participant and rater 1 (mdrp1), rater 2 (mdrp2) and both raters (mdrpo)
Mean difference in rating for each dimension between participant and rater 1 (mdr1p1, mdr2p1 etc.), rater 2 (mdr1p2, mdr2p2 etc.) and both raters (mdr1po, mdr2po etc.)
(So with 4 dimensions there should be 30 values per participant)
Due to the size and structure of the data, I'm not sure where to start on this. I'm guessing that a loop would be necessary, but I've struggled to get my head around how to do that as well.
For agreement I'm considering adding TRUE/FALSE variables and then replacing them with 1 and 0 to eventually calculate agreement:
df <- df %>% mutate(par1 = (df$d1 == df$r1.1))
df <- df %>% mutate(par2 = (df$d1 == df$r2.1))
df <- df %>% mutate(paro = (df$d1 == df$r1.1 & df$d1 == df$r2.1))
And similarly for mean differences, adding variables with rating difference for each dimension...
df <- df %>% mutate(mdr1p1 = (df$d1 - df$r1.1))
df <- df %>% mutate(mdr1p2 = (df$d1 - df$r2.1))
df <- df %>% mutate(mdr1po = (df$d1 - ((df$r1.1 + df$r2.1)/2)))
...But these seem to be quite inefficient approaches!
My data looks like this:
ID Ans d1 d2 d3 d4 r1.1 r1.2 r1.3 r1.4 r2.1 r2.2 r2.3 r2.4
1 53 abc 3 3 3 3 3 2 4 3 3 2 4 3
2 a4 def 3 3 3 3 3 1 2 3 3 1 3 3
3 a4 ghi 4 4 4 4 3 2 5 1 3 1 5 2
4 hj jkl 3 3 3 3 3 1 3 3 3 1 5 3
5 32 mno 2 3 3 3 3 1 3 2 3 1 3 3
6 32 pqr 3 3 3 2 3 2 5 3 4 2 3 3
ID = participant
Ans = participants' written answer
d = dimension rated by participant
r1 = dimensions rated by external rater 1
r2 = dimensions rated by external rater 2
Example data:
structure(list(ID = c(1L, 2L, 2L, 3L, 4L, 4L, 5L),
Ans = c("abc", "def", "ghi", "jkl", "mno", "pqr", "stu"),
d1 = c(3L, 3L, 4L, 3L, 2L, 3L, 3L), d2 = c(3L, 3L, 4L, 3L, 3L, 3L, 1L),
d3 = c(3L, 3L, 4L, 3L, 3L, 3L, 1L), d4 = c(3L, 3L, 4L, 3L, 3L, 2L, 3L),
r1.1 = c(3L, 3L, 3L, 3L, 3L, 3L, 3L), r1.2 = c(2L, 1L, 2L, 1L, 1L, 2L, 3L),
r1.3 = c(4L, 2L, 5L, 3L, 3L, 5L, 3L), r1.4 = c(3L, 3L, 1L, 3L, 2L, 3L, 2L),
r2.1 = c(3L, 3L, 3L, 3L, 3L, 4L, 3L), r2.2 = c(2L, 1L, 1L, 1L, 1L, 2L, 1L),
r2.3 = c(4L, 3L, 5L, 5L, 3L, 3L, 5L), r2.4 = c(3L, 3L, 2L, 3L, 3L, 3L, 2L)),
row.names = c(NA, -7L), class = "data.frame")
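No answer to this question appears here, so the following is only a rough sketch of one possible approach in base R, not a method from the thread. It assumes the column naming shown above (d1, d2, ... for the participant; r1.1, r1.2, ... for rater 1; r2.1, r2.2, ... for rater 2) and uses just two dimensions and a few made-up rows to stay short. Comparing whole matrices avoids both the explicit loop and the one-mutate-per-column pattern.

```r
# Small illustrative data set: two dimensions only, same naming as the question.
df <- data.frame(
  ID   = c(1L, 2L, 2L, 3L),
  d1   = c(3L, 3L, 4L, 3L), d2   = c(3L, 3L, 4L, 3L),
  r1.1 = c(3L, 3L, 3L, 3L), r1.2 = c(2L, 1L, 2L, 1L),
  r2.1 = c(3L, 3L, 3L, 3L), r2.2 = c(2L, 1L, 1L, 1L)
)

dims <- 1:2  # would be 1:4 for the full data
d  <- as.matrix(df[paste0("d",   dims)])  # participant ratings
r1 <- as.matrix(df[paste0("r1.", dims)])  # rater 1 ratings
r2 <- as.matrix(df[paste0("r2.", dims)])  # rater 2 ratings

# Per-sentence agreement rates and mean differences across dimensions.
per_row <- data.frame(
  ID    = df$ID,
  par1  = rowMeans(d == r1),
  par2  = rowMeans(d == r2),
  paro  = rowMeans(d == r1 & d == r2),
  mdrp1 = rowMeans(d - r1),
  mdrp2 = rowMeans(d - r2),
  mdrpo = rowMeans(d - (r1 + r2) / 2)
)

# Average the per-sentence values within each participant.
summary_tbl <- aggregate(. ~ ID, data = per_row, FUN = mean)
print(summary_tbl)
```

The per-dimension columns (pad1.1, mdr1p1 and so on) could be built the same way by binding the columns of `d == r1` or `d - r1` to `per_row` before aggregating.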

How can I categorize based on several columns? [duplicate]

I have data like this:
df<- structure(list(V1 = structure(c(10L, 4L, 7L, 5L, 3L, 1L, 8L,
11L, 12L, 9L, 2L, 6L), .Label = c("BRA_AC_A6IX", "BRA_BH_A18F",
"BRA_BH_A18V", "BRA_BH_A1ES", "BRA_BH_A1FE", "BRA_BH_A6R8", "BRA_E2_A15A",
"BRA_E2_A15K", "BRA_E2_A1B4", "BRA_EM_A15E", "BRA_LQ_A4E4", "BRA_OK_A5Q2"
), class = "factor"), V2 = structure(c(2L, 3L, 5L, 3L, 3L, 5L,
3L, 4L, 1L, 4L, 2L, 2L), .Label = c("Level ii", "Level iia",
"Level iib", "Level iiia", "Level iiic"), class = "factor"),
V3 = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L,
3L, 4L), .Label = c("amira", "boro", "car", "dim"), class = "factor")), class = "data.frame", row.names = c(NA,
-12L))
I am trying to categorize them based on two columns.
I can do the following:
library(dplyr)
df %>%
  group_by(V2) %>%
  summarise(no_rows = length(V2))
# A tibble: 5 x 2
V2 no_rows
<fct> <int>
1 Level ii 1
2 Level iia 3
3 Level iib 4
4 Level iiia 2
5 Level iiic 2
but I want to have an output like this
            Amira  Boro  Car  dim
Level ii                 1
Level iia   1            1    1
Level iib   1      1     1
Level iiia               1
Level iiic  1      1
How about
library(reshape2)
df1 <- df[,-1]
table(melt(df1, id.var="V2")[-2])
Here is a tidyverse method. I am assuming that you actually want the counts, but if you want just the presence/absence, that is easy to add.
df <- structure(list(V1 = structure(c(10L, 4L, 7L, 5L, 3L, 1L, 8L, 11L, 12L, 9L, 2L, 6L), .Label = c("BRA_AC_A6IX", "BRA_BH_A18F", "BRA_BH_A18V", "BRA_BH_A1ES", "BRA_BH_A1FE", "BRA_BH_A6R8", "BRA_E2_A15A", "BRA_E2_A15K", "BRA_E2_A1B4", "BRA_EM_A15E", "BRA_LQ_A4E4", "BRA_OK_A5Q2"), class = "factor"), V2 = structure(c(2L, 3L, 5L, 3L, 3L, 5L, 3L, 4L, 1L, 4L, 2L, 2L), .Label = c("Level ii", "Level iia", "Level iib", "Level iiia", "Level iiic"), class = "factor"), V3 = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L), .Label = c("amira", "boro", "car", "dim"), class = "factor")), class = "data.frame", row.names = c(NA, -12L))
library(tidyverse)
df %>%
select(-V1) %>%
count(V2, V3) %>%
spread(V3, n, fill = 0L)
#> # A tibble: 5 x 5
#> V2 amira boro car dim
#> <fct> <int> <int> <int> <int>
#> 1 Level ii 0 0 1 0
#> 2 Level iia 1 0 1 1
#> 3 Level iib 1 2 1 0
#> 4 Level iiia 0 0 2 0
#> 5 Level iiic 1 1 0 0
Created on 2018-05-23 by the reprex package (v0.2.0).
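The answer mentions that presence/absence is easy to add but does not show it. A minimal base R sketch follows, assuming "presence" means a count greater than zero. It swaps in table() for the tidyverse pipeline, and the data frame is rebuilt from the factor codes in the question, keeping only V2 and V3.

```r
# Rebuild V2 and V3 from the factor codes given in the question.
lev <- c("Level ii", "Level iia", "Level iib", "Level iiia", "Level iiic")
grp <- c("amira", "boro", "car", "dim")
df <- data.frame(
  V2 = factor(lev[c(2, 3, 5, 3, 3, 5, 3, 4, 1, 4, 2, 2)], levels = lev),
  V3 = factor(grp[c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 4)], levels = grp)
)

# Cross-tabulate: one row per level, one column per group, cells are counts.
counts <- table(df$V2, df$V3)

# Collapse counts to 0/1 flags: 1 if the combination occurs at all.
presence <- (counts > 0) * 1L

print(counts)
print(presence)
```

The counts agree with the spread() output above; the presence table matches the 1s in the desired output from the question, with 0 where the question shows a blank.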

Get sum of unique rows in table function in R

Suppose I have data which looks like this
Id Name Price sales Profit Month Category Mode Supplier
1 A 2 5 8 1 X K John
1 A 2 6 9 2 X K John
1 A 2 5 8 3 X K John
2 B 2 4 6 1 X L Sam
2 B 2 3 4 2 X L Sam
2 B 2 5 7 3 X L Sam
3 C 2 5 11 1 X M John
3 C 2 5 11 2 X L John
3 C 2 5 11 3 X K John
4 D 2 8 10 1 Y M John
4 D 2 8 10 2 Y K John
4 D 2 5 7 3 Y K John
5 E 2 5 9 1 Y M Sam
5 E 2 5 9 2 Y L Sam
5 E 2 5 9 3 Y M Sam
6 F 2 4 7 1 Z M Kyle
6 F 2 5 8 2 Z L Kyle
6 F 2 5 8 3 Z M Kyle
If I apply the table function, it simply counts all the rows, and the result is:
K L M
X 4 4 1
Y 2 1 3
Z 0 1 2
Now, what if I want to count each Id only once per Category/Mode combination instead of counting every row,
so that the result looks like this:
K L M
X 2 2 1
Y 1 1 2
Z 0 1 1
Thanks
If df is your data.frame:
# Subset original data.frame to keep columns of interest
df1 <- df[,c("Id", "Category", "Mode")]
# Remove duplicated rows
df1 <- df1[!duplicated(df1),]
# Create table
with(df1, table(Category, Mode))
# Mode
# Category K L M
# X 2 2 1
# Y 1 1 2
# Z 0 1 1
Or in one line using unique
table(unique(df[c("Id", "Category", "Mode")])[-1])
df <- structure(list(Id = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L,
4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L), Name = structure(c(1L, 1L, 1L,
2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L), .Label = c("A",
"B", "C", "D", "E", "F"), class = "factor"), Price = c(2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L
), sales = c(5L, 6L, 5L, 4L, 3L, 5L, 5L, 5L, 5L, 8L, 8L, 5L,
5L, 5L, 5L, 4L, 5L, 5L), Profit = c(8L, 9L, 8L, 6L, 4L, 7L, 11L,
11L, 11L, 10L, 10L, 7L, 9L, 9L, 9L, 7L, 8L, 8L), Month = c(1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L,
3L), Category = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L), .Label = c("X", "Y", "Z"
), class = "factor"), Mode = structure(c(1L, 1L, 1L, 2L, 2L,
2L, 3L, 2L, 1L, 3L, 1L, 1L, 3L, 2L, 3L, 3L, 2L, 3L), .Label = c("K",
"L", "M"), class = "factor"), Supplier = structure(c(1L, 1L,
1L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 2L, 2L, 2L
), .Label = c("John", "Kyle", "Sam"), class = "factor")), .Names = c("Id",
"Name", "Price", "sales", "Profit", "Month", "Category", "Mode",
"Supplier"), class = "data.frame", row.names = c(NA, -18L))
We can try
library(data.table)
dcast(unique(setDT(df1[c('Category', 'Mode', 'Id')])),
Category~Mode, value.var='Id', length)
# Category K L M
#1: X 2 2 1
#2: Y 1 1 2
#3: Z 0 1 1
Or with dplyr
library(dplyr)
library(tidyr)  # provides spread()
df1 %>%
distinct(Id, Category, Mode) %>%
group_by(Category, Mode) %>%
tally() %>%
spread(Mode, n, fill=0)
# Category K L M
# (chr) (dbl) (dbl) (dbl)
#1 X 2 2 1
#2 Y 1 1 2
#3 Z 0 1 1
Or, as @David Arenburg suggested, a variant of the above is
df1 %>%
distinct(Id, Category, Mode) %>%
select(Category, Mode) %>%
table()

Rescaling by group across data frames

I have two data frames
df1 <- structure(list(g1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A", "B"), class = "factor"), g2 = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L), .Label = c("a", "b", "c"), class = "factor"), val1 = 1:20, val2 = c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 4L, 1L, 2L, 3L)), .Names = c("g1", "g2", "val1", "val2"), row.names = c(NA, -20L), class = "data.frame")
df2 <- structure(list(g1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A", "B"), class = "factor"), g2 = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L), .Label = c("a", "b", "c"), class = "factor"), val3 = c(5L, 6L, 7L, 3L, 4L, 5L, 2L, 3L, 4L, 8L, 9L, 10L, 4L, 5L, 6L, 5L, 6L)), .Names = c("g1", "g2", "val3"), row.names = c(NA, -17L), class = "data.frame")
> df1
g1 g2 val1 val2
1 A a 1 1
2 A a 2 2
3 A a 3 3
4 A a 4 4
5 A b 5 1
6 A b 6 2
7 A b 7 3
8 A c 8 1
9 A c 9 2
10 A c 10 3
11 B a 11 1
12 B a 12 2
13 B a 13 3
14 B b 14 1
15 B b 15 2
16 B b 16 3
17 B b 17 4
18 B c 18 1
19 B c 19 2
20 B c 20 3
> df2
g1 g2 val3
1 A a 5
2 A a 6
3 A a 7
4 A b 3
5 A b 4
6 A b 5
7 A c 2
8 A c 3
9 B c 4
10 B a 8
11 B a 9
12 B a 10
13 B b 4
14 B b 5
15 B b 6
16 B c 5
17 B c 6
My aim is to rescale df1$val2 to take values between the min and max values of df2$val3 within the respective groups.
I tried this:
library(dplyr)
df1 <- df1 %>% group_by(g1, g2) %>% mutate(rescaled=(max(df2$val3)-min(df2$val3))*(val2-min(val2))/(max(val2)-min(val2))+min(df2$val3))
But the output is different from what I expect. The problem is that I can neither cbind nor merge the two data frames due to their different lengths. Any hints?
Does this work?
library(plyr)
df3 <- ddply(df2, .(g1, g2), summarize, max.val=max(val3), min.val=min(val3))
merged.df <- merge(df1, df3, by=c("g1", "g2"), all.x=TRUE)
## Now rescale merged.df$val2 as desired
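The last step of that answer is left as a comment. Below is one way it could be completed, sketched in base R (aggregate() and ave() stand in for plyr here, and the toy data are made up to keep the example short). The idea is the same: compute the target range per group from df2, merge it onto df1, then rescale val2 within each group.

```r
# Toy data with the same columns as in the question (values are illustrative).
df1 <- data.frame(
  g1   = c("A", "A", "A", "A", "B", "B", "B", "B"),
  g2   = c("a", "a", "b", "b", "a", "a", "b", "b"),
  val2 = c(1, 2, 1, 3, 1, 2, 1, 3)
)
df2 <- data.frame(
  g1   = c("A", "A", "A", "A", "B", "B", "B", "B"),
  g2   = c("a", "a", "b", "b", "a", "a", "b", "b"),
  val3 = c(5, 7, 3, 5, 8, 10, 4, 6)
)

# Per-group minimum and maximum of val3: the target range for each group.
lo <- aggregate(val3 ~ g1 + g2, data = df2, FUN = min)
hi <- aggregate(val3 ~ g1 + g2, data = df2, FUN = max)
names(lo)[3] <- "min.val"
names(hi)[3] <- "max.val"
rng <- merge(lo, hi, by = c("g1", "g2"))

# Attach the target range to every row of df1.
merged.df <- merge(df1, rng, by = c("g1", "g2"), all.x = TRUE)

# Scale val2 to [0, 1] within each group, then stretch to [min.val, max.val].
# Note: a group whose val2 values are all equal would divide by zero here.
to_unit <- function(x) (x - min(x)) / (max(x) - min(x))
unit <- ave(merged.df$val2, merged.df$g1, merged.df$g2, FUN = to_unit)
merged.df$rescaled <- merged.df$min.val +
  unit * (merged.df$max.val - merged.df$min.val)

print(merged.df)
```

Because the range comes from the merged min.val and max.val columns rather than from df2$val3 as a whole, each group is rescaled to its own range, which is what the attempt in the question was missing.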

ddply with only certain values of splitting variable

Is it possible to return ddply results for only certain values of the splitting variable? For example, with the dataframe example:
example <- structure(list(shape = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 3L, 3L, 3L, 3L, 3L), .Label = c("circle", "square", "triangle"
), class = "factor"), property = structure(c(1L, 3L, 2L, 1L,
2L, 3L, 1L, 1L, 1L, 1L, 2L, 3L, 1L, 1L), .Label = c("color",
"intensity", "size"), class = "factor"), value = structure(c(5L,
2L, 1L, 5L, 4L, 1L, 5L, 6L, 6L, 7L, 4L, 3L, 6L, 5L), .Label = c("3",
"5", "6", "7", "blue", "green", "red"), class = "factor")), .Names = c("shape",
"property", "value"), class = "data.frame", row.names = c(NA,
-14L))
which looks like this
shape property value
1 circle color blue
2 circle size 5
3 circle intensity 3
4 circle color blue
5 square intensity 7
6 square size 3
7 square color blue
8 square color green
9 square color green
10 triangle color red
11 triangle intensity 7
12 triangle size 6
13 triangle color green
14 triangle color blue
I want to return a dataframe containing the number of each shape that has a certain color, which would be something like this:
shape property blue green red
1 circle color 2 0 0
2 square color 1 2 0
3 triangle color 1 1 1
However, I can't seem to get this to return properly! I've gotten part of the way using something like this:
ColorSummary <- ddply(example,.(shape,property="color"), function(example) summary(example$value))
But this is returning a dataframe with columns for all of the other unique value (from the properties size and intensity, which I do not want):
shape property 3 5 6 7 blue green red
1 circle color 1 1 0 0 2 0 0
2 square NA 1 0 0 1 1 2 0
3 triangle NA 0 0 1 1 1 1 1
What am I doing wrong - is there a way to return a dataframe like the first result that I showed?
Also, while this is a small and fast example, my "real" data are much bigger and take a long time to calculate. Does the speed of ddply improve by limiting to only property="color"?
EDIT: Thanks for the answers so far! Unfortunately for me, I oversimplified the situation and I'm not sure if the dcast solution will work for me. Let me explain - I am actually working with a dataframe example2:
example2 <- structure(list(factory = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L), .Label = c("A",
"B"), class = "factor"), shape = structure(c(1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L), .Label = c("circle",
"square", "triangle"), class = "factor"), property = structure(c(1L,
3L, 2L, 1L, 2L, 3L, 1L, 1L, 1L, 1L, 2L, 3L, 1L, 1L, 1L, 3L, 2L
), .Label = c("color", "intensity", "size"), class = "factor"),
value = structure(c(5L, 2L, 1L, 5L, 4L, 1L, 5L, 6L, 6L, 7L,
4L, 3L, 6L, 5L, 5L, 2L, 1L), .Label = c("3", "5", "6", "7",
"blue", "green", "red"), class = "factor")), .Names = c("factory",
"shape", "property", "value"), class = "data.frame", row.names = c(NA,
-17L))
and I am trying to split by both factory and shape. I have a messy solution using ddply:
ColorSummary2 <- ddply(example2,.(factory,shape,property="color"), function(example2) summary(example2$value))
which gives
factory shape property 3 5 6 7 blue green red
1 A circle color 1 1 0 0 2 0 0
2 A square NA 1 0 0 1 1 2 0
3 A triangle NA 0 0 1 1 1 1 1
4 B circle NA 1 1 0 0 1 0 0
but what I would like to return is this (sorry for the messy table, I have trouble formatting tables on here):
factory shape property blue green red
1 A circle color 2 0 0
2 A square NA 1 2 0
3 A triangle NA 1 1 1
4 B circle NA 1 0 0
Is this possible?
EDIT 2: Sorry for all of the edits, I oversimplified my situation way too much. Here is a more complex dataframe that is closer to my real example. This one has a column state, which I do not want to use for splitting. I can do this (messily) with ddply, but can I ignore state using dcast?
example3 <- structure(list(state = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L), .Label = c("CA", "FL"
), class = "factor"), factory = structure(c(1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L), .Label = c("A",
"B"), class = "factor"), shape = structure(c(1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L), .Label = c("circle",
"square", "triangle"), class = "factor"), property = structure(c(1L,
3L, 2L, 1L, 2L, 3L, 1L, 1L, 1L, 1L, 2L, 3L, 1L, 1L, 1L, 3L, 2L
), .Label = c("color", "intensity", "size"), class = "factor"),
value = structure(c(5L, 2L, 1L, 5L, 4L, 1L, 5L, 6L, 6L, 7L,
4L, 3L, 6L, 5L, 5L, 2L, 1L), .Label = c("3", "5", "6", "7",
"blue", "green", "red"), class = "factor")), .Names = c("state",
"factory", "shape", "property", "value"), class = "data.frame", row.names = c(NA,
-17L))
Using dcast from reshape2:
library(reshape2)
dcast(...~value,data=subset(example,property=='color'))
Aggregation function missing: defaulting to length
shape property blue green red
1 circle color 2 0 0
2 square color 1 2 0
3 triangle color 1 1 1
EDIT
Using the second data set, example2:
dcast(...~value,data=subset(example2,property=='color'))
Aggregation function missing: defaulting to length
factory shape property blue green red
1 A circle color 2 0 0
2 A square color 1 2 0
3 A triangle color 1 1 1
4 B circle color 1 0 0
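The EDIT 2 question (ignoring state) is not covered by the answer shown. One approach that should work, sketched here under the assumption that reshape2 is installed: write out the left-hand side of the formula instead of using `...`, so that any column not named in the formula (here state) is aggregated over. The small data frame below is a made-up stand-in for example3.

```r
# Made-up stand-in for example3, with the same column names.
example3 <- data.frame(
  state    = c("CA", "FL", "CA", "FL", "CA"),
  factory  = c("A", "A", "A", "A", "B"),
  shape    = c("circle", "circle", "square", "square", "circle"),
  property = c("color", "color", "color", "size", "color"),
  value    = c("blue", "blue", "green", "3", "blue"),
  stringsAsFactors = FALSE
)

# Keep only the rows that describe a color.
colors <- subset(example3, property == "color")

if (requireNamespace("reshape2", quietly = TRUE)) {
  # state is not named in the formula, so rows that differ only in state
  # are counted together by the aggregation function.
  wide <- reshape2::dcast(colors, factory + shape + property ~ value,
                          value.var = "value", fun.aggregate = length)
  print(wide)
}
```

Subsetting to property == "color" before casting also means the size and intensity values never become columns.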
