How to plot a graph with four lines? - r

I am running an eye-tracking experiment to find out how two languages influence participants' fixation proportions on two different areas of interest (AOIs) over time.
My independent variables are Language (L1 vs. L2), AOI (AOI1 vs. AOI2), and time (already divided into 50 time bins). I want to plot a graph with four lines, one each for the fixation percentage of "L1 AOI1", "L1 AOI2", "L2 AOI1" and "L2 AOI2". An example of my data.frame is as follows:
Stimulus Bin Language AOI percentage
1 1 L1 AOI1 0.75
1 1 L1 AOI2 0.12
1 1 L2 AOI1 0.54
1 1 L2 AOI2 0.36
...
10 1 L1 AOI1 0.85
10 1 L1 AOI2 0.10
10 1 L2 AOI1 0.60
10 1 L2 AOI2 0.23
...
10 7 L1 AOI1 0.64
10 7 L1 AOI2 0.14
10 7 L2 AOI1 0.66
10 7 L2 AOI2 0.21
...
I think I do not need to melt my data, right? It is already in long format.
I have drawn two graphs with facet_wrap as follows, but how can I get ONE graph with all of that information?
ggplot(data,aes(Bin, percentage, linetype = Language)) +
facet_wrap(~ AOI)+
stat_summary(fun.y = mean,geom = "line")+
stat_summary(fun.data = mean_se,geom = "ribbon",
color = NA, alpha = 0.3) +
theme_bw(base_size = 10) +
labs(x = "2000 ms since picture onset (50 time bins)",
y = "fixation proportion") +
scale_linetype_manual(values = c("solid","dashed"))
Any ideas would be of great help to me.
Thanks!

facet_wrap is not the function you need for that. Instead, you can add color = AOI inside aes() in the ggplot() call, in addition to linetype = Language. This gives a different color for each AOI and a different linetype for each Language, so four distinct lines on the same graph.
This post may interest you : https://stackoverflow.com/a/3777592/10580543
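As a minimal sketch of that suggestion: the toy data frame below is made up (random values in the same long format as the question), and fun = mean assumes ggplot2 3.3.0 or later, where it replaced fun.y. Mapping fill = AOI on the ribbon is an extra assumption so that the error bands match the line colours.

```r
library(ggplot2)

# Hypothetical toy data in the same long format as the question
set.seed(1)
toy <- expand.grid(Stimulus = 1:10, Bin = 1:50,
                   Language = c("L1", "L2"), AOI = c("AOI1", "AOI2"))
toy$percentage <- runif(nrow(toy))

# linetype separates the languages, colour separates the AOIs:
# 2 x 2 = 4 lines in a single panel, no facet_wrap needed
p <- ggplot(toy, aes(Bin, percentage, linetype = Language, colour = AOI)) +
  stat_summary(fun = mean, geom = "line") +
  stat_summary(aes(fill = AOI), fun.data = mean_se, geom = "ribbon",
               colour = NA, alpha = 0.3) +
  theme_bw(base_size = 10) +
  labs(x = "2000 ms since picture onset (50 time bins)",
       y = "fixation proportion") +
  scale_linetype_manual(values = c("solid", "dashed"))
p
```

If colour alone is hard to read in print, the same idea should also work with a single aesthetic such as colour = interaction(Language, AOI).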

Related

How to overplot geom_histogram with stat_bin or geom_line with multiple groupings?

I'm trying to create a plot which has:
histogram of values in "historic" time period, created from method "A"
histogram of values in "future" time period, created from method "A"
either stat_bin or geom_line of values in both "historic" and "future" time period, created from method "B"
example data:
draw method Parameter Value
1 A historic 0.99
1 A future 0.98
1 B historic 0.97
1 B future 0.96
2 A historic 0.9
2 A future 0.88
2 B historic 0.95
2 B future 0.94
3 A historic 0.97
3 A future 0.94
3 B historic 0.91
3 B future 0.89
ggplot(df,aes(x=Value,color=Parameter,fill=Parameter)) +
scale_color_discrete(name="Period",labels=c("historic","future")) +
scale_fill_discrete(name="Period",labels=c("historic","future"),guide="none") +
geom_histogram(aes(y=..density..),
breaks=seq(.8,1.0,by=0.01),
alpha=0.4,position="identity") +
theme(axis.title.x=element_blank(),axis.text.x=element_blank(),
axis.title.y=element_blank()) +
scale_x_continuous(breaks=seq(.8,1.00,by=0.01)) +coord_flip() +
theme(legend.position = "bottom") +
geom_line(data=subset(df,method == "B"),
aes(x=Value),stat="density")
In the image, it looks like the histogram is plotting all of the "method" values. But in the histograms, I only want method == "A" (with Parameter == "historic" and "future"). Is there any way to create different types of plots based on two types of groupings? geom_line should only plot method == "B" (Parameter == "historic" and "future"), and geom_histogram should only plot method == "A" (Parameter == "historic" and "future").
I'd like the final result to look like this: (either the left, with geom_line, or the right, with stat_bin)
Plotting what you requested, i.e. histogram bars from method "A" and line from method "B".
Reading in example data:
x <- '
draw method Parameter Value
1 A historic 0.99
1 A future 0.98
1 B historic 0.97
1 B future 0.96
2 A historic 0.9
2 A future 0.88
2 B historic 0.95
2 B future 0.94
3 A historic 0.97
3 A future 0.94
3 B historic 0.91
3 B future 0.89
'
df <- read.table(textConnection(x), header = TRUE)
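Plotting (dplyr is needed for %>% and filter()):
library(ggplot2)
library(dplyr)
ggplot() +
  # histogram bars: method "A" only
  geom_histogram(data=df %>% filter(method=="A"),
                 aes(x=Value, y=..density.., fill=Parameter),
                 breaks=seq(.8, 1.0, by=0.01),
                 alpha=0.4, position="identity") +
  # density lines: method "B" only
  geom_line(data=df %>% filter(method=="B"),
            aes(x=Value, colour=Parameter), stat="density") +
  # Parameter sorts alphabetically (future, historic), so the labels follow that order
  scale_fill_discrete(name="", labels=c("Future, A","Historic, A")) +
  scale_colour_discrete(name="", labels=c("Future, B","Historic, B")) +
  coord_flip()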
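The question also mentions a stat_bin version instead of the density line. A rough sketch of that variant follows, assuming ggplot2 3.3.0 or later for after_stat(). Using geom = "step" is my guess at the intended look, not something the original post confirms.

```r
library(ggplot2)

# Example data from the question
df <- read.table(header = TRUE, text = "
draw method Parameter Value
1 A historic 0.99
1 A future 0.98
1 B historic 0.97
1 B future 0.96
2 A historic 0.9
2 A future 0.88
2 B historic 0.95
2 B future 0.94
3 A historic 0.97
3 A future 0.94
3 B historic 0.91
3 B future 0.89
")

brks <- seq(0.8, 1.0, by = 0.01)

# Each layer gets its own subset, so the two methods never mix
p <- ggplot() +
  geom_histogram(data = subset(df, method == "A"),
                 aes(x = Value, y = after_stat(density), fill = Parameter),
                 breaks = brks, alpha = 0.4, position = "identity") +
  stat_bin(data = subset(df, method == "B"),
           aes(x = Value, y = after_stat(density), colour = Parameter),
           breaks = brks, geom = "step", position = "identity") +
  coord_flip()
p
```

Sharing one brks vector keeps the bars and the step outline on the same bins.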

Boxplot Integration three information levels

I have a question on how to plot my data using a boxplot and integrating 3 different information types. In particular, I have a data frame that looks like this:
Exp_number Condition Cell_Type Gene1 Gene2 Gene3
1 2 Cancer 0.33 0.2 1.2
1 2 Cancer 0.12 1.12 2.5
1 4 Fibro 3.4 2.2 0.8
2 4 Cancer 0.12 0.4 0.11
2 4 Normal 0.001 0.01 0.001
3 1 Cancer 0.22 1.2 3.2
2 1 Normal 0.001 0.00003 0.00045
for a total of 20,000 columns and 110 rows (rows are samples).
I would like to plot a boxplot in which data are grouped first by a condition. Then, in each condition, I would like to highlight, for example using different colors, the exp_number and finally, I don't know how but I would like to highlight the cell type. The aim is to highlight the differences between exp_number between conditions in terms of gene expression and also differences of cell types between Exp_numbers.
Is there a simple way to integrate all this information in a single plot?
Thank you in advance
What about this approach
dat <- data.frame(Exp_number=factor(sample(1:3,100,replace = T)),
condition=factor(sample(1:4,100,T)),
Cell_type=factor(sample(c("Normal", "Cancer", "Fibro"), 100, replace=T)),
Gene1=abs(rnorm(100, 5, 1)),
Gene2=abs(rnorm(100, 6, 0.5)),
Gene3=abs(rnorm(100, 4, 3)))
library(reshape2)
library(ggplot2)
dat2 <- melt(dat, id=c("Exp_number", "condition", "Cell_type"))
ggplot(dat2, aes(x=Exp_number, y=value, col=Cell_type)) +
geom_boxplot() +
facet_grid(~ condition) +
theme_bw() +
ylab("Expression")
That gives the following result
Similar to @storaged's answer, but leveraging the two dimensions of facet_grid to represent 2 of your variables:
ggplot(dat2, aes(x=Cell_type, y=Expression)) +
geom_boxplot() +
facet_grid(Exp_number ~ condition) +
theme_bw()
The data:
library(reshape2)
dat <- data.frame(Exp_number=factor(sample(1:3,100,replace = T)),
condition=factor(sample(1:4,100,T)),
Cell_type=factor(sample(c("Normal", "Cancer", "Fibro"), 100, replace=T)),
Gene1=abs(rnorm(100, 5, 1)),
Gene2=abs(rnorm(100, 6, 0.5)),
Gene3=abs(rnorm(100, 4, 3)))
dat2 <- melt(dat, id=c("Exp_number", "condition", "Cell_type"), value.name = 'Expression')
dat2$Exp_number <- paste('Exp.', dat2$Exp_number)
dat2$condition <- paste('Condition', dat2$condition)

ggplot2 - include one level of a factor in all facets

I have some time series data that is facet-wrapped by a variable 'treatment'. One of the levels of this 'treatment' factor is a negative control, and I want to include it in every facet.
For example using R dataset 'Theoph':
data("Theoph")
head(Theoph)
Subject Wt Dose Time conc
1 1 79.6 4.02 0.00 0.74
2 1 79.6 4.02 0.25 2.84
3 1 79.6 4.02 0.57 6.57
4 1 79.6 4.02 1.12 10.50
5 1 79.6 4.02 2.02 9.66
6 1 79.6 4.02 3.82 8.58
Theoph$Subject <- factor(Theoph$Subject, levels = unique(Theoph$Subject)) # set factor order
ggplot(Theoph, aes(x=Time, y=conc, colour=Subject)) +
geom_line() +
geom_point() +
facet_wrap(~ Subject)
How could I include the data for Subject '1' (the control) in each facet? (And ideally remove the facet that contains Subject 1's data alone.)
Thank you!
To have a certain subject appear in every facet, we need to replicate its data for every facet. We'll create a new column called facet, replicate the Subject 1 data for each other value of Subject, and for Subject != 1, set facet equal to Subject.
every_facet_data = subset(Theoph, Subject == 1)
individual_facet_data = subset(Theoph, Subject != 1)
individual_facet_data$facet = individual_facet_data$Subject
every_facet_data = merge(every_facet_data,
data.frame(Subject = 1, facet = unique(individual_facet_data$facet)))
plot_data = rbind(every_facet_data, individual_facet_data)
library(ggplot2)
ggplot(plot_data, aes(x=Time, y=conc, colour=Subject)) +
geom_line() +
geom_point() +
facet_wrap(~ facet)

ggplot2 geom_bar position failure

I am using the ..count.. transformation in geom_bar and get the warning
position_stack requires non-overlapping x intervals when some of my categories have few counts.
This is best explained using some mock data (my data involves direction and windspeed and I retain names relating to that)
#make data
set.seed(12345)
FF=rweibull(100,1.7,1)*20 #mock speeds
FF[FF>60]=59
dir=sample.int(10,size=100,replace=TRUE) # mock directions
#group into speed classes
FFcut=cut(FF,breaks=seq(0,60,by=20),ordered_result=TRUE,right=FALSE,drop=FALSE)
# stuff into data frame & plot
df=data.frame(dir=dir,grp=FFcut)
ggplot(data=df,aes(x=dir,y=(..count..)/sum(..count..),fill=grp)) + geom_bar()
This works fine, and the resulting plot shows the frequency of directions grouped according to speed. It is of relevance that the velocity class with the fewest counts (here "[40,60)") will have 5 counts.
However, using more velocity classes leads to a warning. For instance, with
FFcut=cut(FF,breaks=seq(0,60,by=15),ordered_result=TRUE,right=FALSE,drop=FALSE)
the velocity class with the fewest counts (now "[45,60)") will have only 3 counts and ggplot2 will warn that
position_stack requires non-overlapping x intervals
and the plot will show data in this category spread out along the x axis.
It seems that 5 is the minimum size for a group to have for this to work correctly.
I would appreciate knowing if this is a feature or a bug in stat_bin (which geom_bar is using) or if I am simply abusing geom_bar.
Also, any suggestions how to get around this would be appreciated.
Sincerely
This occurs because df$dir is numeric, so the ggplot object assumes a continuous x-axis, and aesthetic parameter group is based on the only known discrete variable (fill = grp).
As a result, when there simply aren't that many dir values in grp = [45,60), ggplot gets confused over how wide each bar should be. This becomes more visually obvious if we split the plot into different facets:
ggplot(data=df,
aes(x=dir,y=(..count..)/sum(..count..),
fill = grp)) +
geom_bar() +
facet_wrap(~ grp)
> for(l in levels(df$grp)) print(sort(unique(df$dir[df$grp == l])))
[1] 1 2 3 4 6 7 8 9 10
[1] 1 2 3 4 5 6 7 8 9 10
[1] 2 3 4 5 7 9 10
[1] 2 4 7
We can also check manually that the minimum difference between sorted df$dir values is 1 for the first three grp values, but 2 for the last one. The default bar width is thus wider.
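That manual check can be sketched in base R. The direction values below are copied from the printed output above rather than regenerated, since the random draws may differ between R versions. The group names assume the four levels produced by breaks = seq(0, 60, by = 15).

```r
# Direction values observed in each speed class, copied from the output above
dirs_by_grp <- list(
  "[0,15)"  = c(1, 2, 3, 4, 6, 7, 8, 9, 10),
  "[15,30)" = 1:10,
  "[30,45)" = c(2, 3, 4, 5, 7, 9, 10),
  "[45,60)" = c(2, 4, 7)
)

# Smallest gap between neighbouring x values within each class
min_gap <- sapply(dirs_by_grp, function(d) min(diff(sort(unique(d)))))
print(min_gap)
#  [0,15) [15,30) [30,45) [45,60)
#       1       1       1       2
```

The last class is the only one whose smallest gap is 2, which is why its bars come out wider than the rest.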
The following solutions should all achieve the same result:
1. Explicitly specify the same bar width for all groups in geom_bar():
ggplot(data=df,
aes(x=dir,y=(..count..)/sum(..count..),
fill = grp)) +
geom_bar(width = 0.9)
2. Convert dir to a categorical variable before passing it to aes(x = ...):
ggplot(data=df,
aes(x=factor(dir), y=(..count..)/sum(..count..),
fill = grp)) +
geom_bar()
3. Specify that the group parameter should be based on both df$dir & df$grp:
ggplot(data=df,
aes(x=dir,
y=(..count..)/sum(..count..),
group = interaction(dir, grp),
fill = grp)) +
geom_bar()
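As a side note, newer ggplot2 releases (3.3.0 and later, as far as I know) provide after_stat() as the replacement for the ..count.. notation. Here is a sketch of solution 2 in that style, rebuilding the mock data from the question. The drop argument to cut() is left out here because cut() has no such parameter.

```r
library(ggplot2)

# Mock data as in the question, with the finer speed classes
set.seed(12345)
FF <- rweibull(100, 1.7, 1) * 20
FF[FF > 60] <- 59
dir <- sample.int(10, size = 100, replace = TRUE)
grp <- cut(FF, breaks = seq(0, 60, by = 15),
           ordered_result = TRUE, right = FALSE)
df <- data.frame(dir = dir, grp = grp)

# A discrete x means every bar gets the same width,
# whatever directions happen to occur in each class
p <- ggplot(df, aes(x = factor(dir),
                    y = after_stat(count / sum(count)),
                    fill = grp)) +
  geom_bar()
p
```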
This doesn't directly solve the issue, because I also don't get what's going on with the overlapping values, but it's a dplyr-powered workaround, and might turn out to be more flexible anyway.
Instead of relying on geom_bar to take the cut factor and give you shares via ..count../sum(..count..), you can easily enough just calculate those shares yourself up front, and then plot your bars. I personally like having this type of control over my data and exactly what I'm plotting.
First, I put dir and FF into a data frame/tbl_df, and cut FF. Then count() groups the data by dir and grp and counts the number of observations for each combination of those two variables, and mutate() calculates the share of each n over the sum of n. I'm using geom_col, which is like geom_bar but for when you already have a y value in your aes.
library(tidyverse)
set.seed(12345)
FF <- rweibull(100,1.7,1) * 20 #mock speeds
FF[FF > 60] <- 59
dir <- sample.int(10, size = 100, replace = TRUE) # mock directions
shares <- tibble(dir = dir, FF = FF) %>%
mutate(grp = cut(FF, breaks = seq(0, 60, by = 15), ordered_result = T, right = F, drop = F)) %>%
count(dir, grp) %>%
mutate(share = n / sum(n))
shares
#> # A tibble: 29 x 4
#> dir grp n share
#> <int> <ord> <int> <dbl>
#> 1 1 [0,15) 3 0.03
#> 2 1 [15,30) 2 0.02
#> 3 2 [0,15) 4 0.04
#> 4 2 [15,30) 3 0.03
#> 5 2 [30,45) 1 0.01
#> 6 2 [45,60) 1 0.01
#> 7 3 [0,15) 6 0.06
#> 8 3 [15,30) 1 0.01
#> 9 3 [30,45) 2 0.02
#> 10 4 [0,15) 6 0.06
#> # ... with 19 more rows
ggplot(shares, aes(x = dir, y = share, fill = grp)) +
geom_col()

how to put percentage label in ggplot when geom_text is not suitable?

Here is my simplified data :
company <-c(rep(c(rep("company1",4),rep("company2",4),rep("company3",4)),3))
product<-c(rep(c(rep(c("product1","product2","product3","product4"),3)),3))
week<-c( c(rep("w1",12),rep("w2",12),rep("w3",12)))
mydata<-data.frame(company=company,product=product,week=week)
mydata$rank<-c(rep(c(1,3,2,3,2,1,3,2,3,2,1,1),3))
mydata=mydata[mydata$company=="company1",]
And the R code I used:
library(ggplot2)
library(scales) # provides percent_format()
ggplot(mydata,aes(x = week,fill = as.factor(rank))) +
geom_bar(position = "fill")+
scale_y_continuous(labels = percent_format())
In the bar plot, I want to label the percentage by week, by rank.
The problem is the fact that the data doesn't have percentage of rank. And the structure of this data is not suitable to having one.
(of course, the original data has much more observations than the example)
Can anyone show me how to label the percentages in this graph?
I'm not sure I understand why geom_text is not suitable. Here is an answer using it, but if you specify why is it not suitable, perhaps someone might come up with an answer you are looking for.
library(ggplot2)
library(plyr)
library(reshape2) # provides melt()
mydata = mydata[,c(3,4)] #drop unnecessary variables
data.m = melt(table(mydata)) #get counts and melt it
#calculate percentage:
m1 = ddply(data.m, .(week), summarize, ratio=value/sum(value))
#order data frame (needed to comply with percentage column):
m2 = data.m[order(data.m$week),]
#combine them:
mydf = data.frame(m2,ratio=m1$ratio)
Which gives us the following data structure. The ratio column contains the relative frequency of given rank within specified week (so one can see that rank == 3 is twice as abundant as the other two).
> mydf
week rank value ratio
1 w1 1 1 0.25
4 w1 2 1 0.25
7 w1 3 2 0.50
2 w2 1 1 0.25
5 w2 2 1 0.25
8 w2 3 2 0.50
3 w3 1 1 0.25
6 w3 2 1 0.25
9 w3 3 2 0.50
Next, we have to calculate the position of the percentage labels and plot it.
#get positions of percentage labels:
mydf = ddply(mydf, .(week), transform, position = cumsum(value) - 0.5*value)
#make plot
p =
ggplot(mydf,aes(x = week, y = value, fill = as.factor(rank))) +
geom_bar(stat = "identity")
#add percentage labels using positions defined previously
p + geom_text(aes(label = sprintf("%1.2f%%", 100*ratio), y = position))
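For comparison, here is a rough sketch of the same idea with dplyr, letting position_stack(vjust = 0.5) centre the labels instead of computing positions by hand. This is an alternative I am suggesting, not the original answer's method. The data frame rebuilds only the company1 rows from the question.

```r
library(ggplot2)
library(dplyr)

# The company1 subset from the question, built directly
mydata <- data.frame(
  week = rep(c("w1", "w2", "w3"), each = 4),
  rank = rep(c(1, 3, 2, 3), times = 3)
)

# Count each rank within each week and convert counts to shares
shares <- mydata %>%
  count(week, rank) %>%
  group_by(week) %>%
  mutate(ratio = n / sum(n)) %>%
  ungroup()

# Labels use the same stacking as the bars, so they land mid-segment
p <- ggplot(shares, aes(x = week, y = ratio, fill = factor(rank))) +
  geom_col() +
  geom_text(aes(label = sprintf("%.0f%%", 100 * ratio)),
            position = position_stack(vjust = 0.5)) +
  scale_y_continuous(labels = scales::percent)
p
```

Because the labels reuse the bars' own stacking, they should stay aligned even if the stacking order differs between ggplot2 versions.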
Is this what you wanted?
