I just started learning R. I melted my dataframe and used ggplot to get this graph. There are supposed to be two lines on the same graph, but the lines connecting the points seem random.
Correct points plotted, but wrong lines.
# Melted my data to create new dataframe
AvgSleep2_DF <- melt(AvgSleep_DF , id.vars = 'SleepDay_Date',
variable.name = 'series')
# Plotting
ggplot(AvgSleep2_DF, aes(SleepDay_Date, value, colour = series)) +
geom_point(aes(colour = series)) +
geom_line(aes(colour = series))
With or without the aes(colour = series) in the geom_line results in the same graph. What am I doing wrong here?
The following might explain what geom_line() does when you specify aesthetics in the ggplot() call.
I assign a deliberate colour column that differs from the series specification!
library(ggplot2)
df <- data.frame(
x = c(1,2,3,4,5)
, y = c(2,2,3,4,2)
, colour = factor(c(rep(1,3), rep(2,2)))
, series = c(1,1,2,3,3)
)
df
x y colour series
1 1 2 1 1
2 2 2 1 1
3 3 3 1 2
4 4 4 2 3
5 5 2 2 3
Each layer inherits the aesthetics defined in the ggplot() call unless the layer overrides them.
ggplot(data = df, aes(x = x, y = y, colour = colour)) +
geom_point(size = 3) + # setting the size to stress point layer call
geom_line() # geom_line will "inherit" a "grouping" from the colour set above
This gives you
While we can control the "grouping" associated to each line(segment) as follows:
ggplot(data = df, aes(x = x, y = y, colour = colour)) +
geom_point(size = 3) +
geom_line(aes(group = series)) # defining a specific grouping
Note: As I defined a separate "group" in the series column for the 3rd point, it is depicted - in this case - as a single point "line".
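Applied to melted data like that in the question, the same idea can be sketched as follows. The toy data frame and its series names are hypothetical (the real AvgSleep_DF is not shown). The sketch also assumes the random-looking lines come from SleepDay_Date being stored as text rather than as a Date, which is only a guess.

```r
library(ggplot2)

# Hypothetical stand-in for the melted data; column names follow the question
sleep_long <- data.frame(
  SleepDay_Date = rep(c("4/12/2016", "4/13/2016", "4/14/2016"), times = 2),
  series = rep(c("AvgMinutesAsleep", "AvgTimeInBed"), each = 3),
  value = c(420, 390, 450, 460, 430, 480)
)

# Assumption: dates arrive as m/d/Y text; convert them so the x axis is continuous
sleep_long$SleepDay_Date <- as.Date(sleep_long$SleepDay_Date, format = "%m/%d/%Y")

p <- ggplot(sleep_long,
            aes(SleepDay_Date, value, colour = series, group = series)) +
  geom_point() +
  geom_line() # one line per series because group = series
p
```

If the x variable stays a factor or character column, ggplot2 treats every x value as its own group, so setting group explicitly is what tells geom_line() which points belong together.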
I have a df like this:
library(ggplot2)
library(ggthemes) # theme_hc()
library(ggsci) # scale_color_jco()
set.seed(123)
df <- data.frame(Delay=rep(-5:6, times=8, each=1),
ID= rep(c("A","B","C","D"), times=1, each=24),
variable=rep(c("R2","SE"), times=4, each=12),
value=c(0.3,0.4,0.51,0.58,0.64,0.78,0.68,0.63,0.54,0.45,0.32,0.22,0.78,0.68,0.59,0.55,0.47,0.35,0.28,0.41,0.50,0.58,0.63,0.73,0.3,0.4,0.51,0.58,0.64,0.78,0.68,0.63,0.54,0.45,0.32,0.22,0.78,0.68,0.59,0.55,0.47,0.35,0.28,0.41,0.50,0.58,0.63,0.73,0.3,0.4,0.51,0.58,0.64,0.78,0.68,0.63,0.54,0.45,0.32,0.22,0.78,0.68,0.59,0.55,0.47,0.35,0.28,0.41,0.50,0.58,0.63,0.73,0.3,0.4,0.51,0.58,0.64,0.78,0.68,0.63,0.54,0.45,0.32,0.22,0.78,0.68,0.59,0.55,0.47,0.35,0.28,0.41,0.50,0.58,0.63,0.73))
df$ID <- as.factor(df$ID)
df$variable <- as.factor(df$variable)
Plot<- ggplot(df[df$ID=="B",], aes(x=Delay, y=value, group=variable, colour=variable)) +
geom_point(size=1) +
geom_line () +
theme_hc() +
theme(legend.position="right") +
labs(x= '\nDelay',y=expression(R^{2})) +
guides(color=guide_legend(override.aes=list(fill=NA))) +
scale_x_continuous(breaks=seq(-5,5,1)) +
scale_color_jco()
Plot
I am plotting just data of B.
I would like to add a vertical line at the minimum value of SE and another at the maximum value of R2, each in the same colour as its variable. However, I don't know how to do it. The vertical lines are drawn in black, as you can see below, and I don't know how to tell ggplot to reuse the colours from the plot.
Plot <- Plot + geom_vline(xintercept = 0)
Plot
Does anyone know how to add both vertical lines using the same colours as the variables?
You don't need to find the color to instruct ggplot2 to reuse it: you can supply "new data" with your desired x-intercept lines, and identify each v-line as belonging to a particular variable to use that variable's color.
I don't have your original Plot object or call, so my colors/theme will be different.
library(ggplot2)
ggplot(df, aes(Delay, value, color = variable)) +
geom_line() +
geom_vline(aes(xintercept = Delay, color = variable),
data = data.frame(Delay = 0, variable = "R2"))
Or with multiple v-lines:
ggplot(df, aes(Delay, value, color = variable)) +
geom_line() +
geom_vline(aes(xintercept = Delay, color = variable),
data = data.frame(Delay = c(-1, 1, 2), variable = c("R2", "SE", "R2")))
This edit might answer this and your other question:
mins <- do.call(rbind, by(df, df[,c("ID", "variable")], function(z) z[which.min(z$value),]))
mins
# Delay ID variable value
# 12 6 A R2 0.22
# 36 6 B R2 0.22
# 60 6 C R2 0.22
# 84 6 D R2 0.22
# 19 1 A SE 0.28
# 43 1 B SE 0.28
# 67 1 C SE 0.28
# 91 1 D SE 0.28
ggplot(df[df$ID == "B",], aes(Delay, value, color = variable)) +
geom_line() +
geom_vline(aes(xintercept = Delay, color = variable), data = mins)
Or if you want to see multiple IDs, you can facet,
ggplot(df, aes(Delay, value, color = variable)) +
geom_line() +
geom_vline(aes(xintercept = Delay, color = variable), data = mins) +
facet_wrap("ID")
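The question asks for the maximum of R2 and the minimum of SE, whereas mins holds the minimum of both. A minimal sketch of one way to pick the extreme per variable, reusing the by() pattern above. The helper pick_extreme and the compact rebuild of df are illustrative names, not part of the original post.

```r
library(ggplot2)

# Compact rebuild of the question's data (same values, repeated for each ID)
r2 <- c(0.3, 0.4, 0.51, 0.58, 0.64, 0.78, 0.68, 0.63, 0.54, 0.45, 0.32, 0.22)
se <- c(0.78, 0.68, 0.59, 0.55, 0.47, 0.35, 0.28, 0.41, 0.50, 0.58, 0.63, 0.73)
df <- data.frame(
  Delay = rep(-5:6, times = 8),
  ID = rep(c("A", "B", "C", "D"), each = 24),
  variable = rep(c("R2", "SE"), times = 4, each = 12),
  value = rep(c(r2, se), times = 4)
)

# Hypothetical helper: largest value for R2, smallest value for SE
pick_extreme <- function(z) {
  if (as.character(z$variable[1]) == "R2") {
    z[which.max(z$value), ]
  } else {
    z[which.min(z$value), ]
  }
}
extremes <- do.call(rbind, by(df, df[, c("ID", "variable")], pick_extreme))

# Keep only ID "B" in both the curves and the vertical lines
p <- ggplot(df[df$ID == "B", ], aes(Delay, value, color = variable)) +
  geom_line() +
  geom_vline(aes(xintercept = Delay, color = variable),
             data = extremes[extremes$ID == "B", ])
p
```

Filtering extremes to the plotted ID avoids drawing the other IDs' lines on top of each other when the plot is not faceted.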
I think @r2evans's approach to your specific problem is the correct one. However, to answer the more general question of how to retrieve the colours from an applied colour scale (e.g. if you want to modify the colour), you can get them without going through ggplot_build(), using the following:
Plot$scales$get_scales("colour")$palette(2)
[1] "#0073C2FF" "#EFC000FF"
So we could do:
# Get colours
my_blue <- Plot$scales$get_scales("colour")$palette(2)[1]
my_yellow <- Plot$scales$get_scales("colour")$palette(2)[2]
# Get index of max R2 and min SE
maxR2 <- which.max(df$value[df$ID == "B" & df$variable == "R2"])
minSE <- which.min(df$value[df$ID == "B" & df$variable == "SE"])
# Get value of Delay at maxR2 and minSE
D_R2 <- df$Delay[df$ID == "B" & df$variable == "R2"][maxR2]
D_SE <- df$Delay[df$ID == "B" & df$variable == "SE"][minSE]
# Plot lines at the correct positions and with the desired colours
Plot + geom_vline(aes(xintercept = D_R2), colour = my_blue) +
geom_vline(aes(xintercept = D_SE), colour = my_yellow)
I need to plot two grouped barplots from two data frames that have different numbers of rows: 6 and 5.
I tried many approaches in R but I don't know how to fix it.
Here are my data frames. The Freq column must be on the y axis, and the inter and intra columns on the x axis.
> freqinter
inter Freq
1 0.293040975264367 17
2 0.296736775990729 2
3 0.297619926364764 4
4 0.587377012109561 1
5 0.595245125315916 4
6 0.597022018595893 2
> freqintra
intra Freq
1 0 3
2 0.293040975264367 15
3 0.597022018595893 4
4 0.598809552335782 2
5 0.898227748764939 6
I expect to plot the barplots in the same plot, with the inter and intra values distinguished by colour.
I want a picture like this one:
You probably want a histogram. Use the raw data if possible. For example:
library(tidyverse)
freqinter <- data.frame(x = c(
0.293040975264367,
0.296736775990729,
0.297619926364764,
0.587377012109561,
0.595245125315916,
0.597022018595893), Freq = c(17,2,4,1,4,2))
freqintra <- data.frame(x = c(
0 ,
0.293040975264367,
0.597022018595893,
0.598809552335782,
0.898227748764939), Freq = c(3,15,4,2,6))
df <- bind_rows(freqinter, freqintra, .id = "id") %>%
uncount(Freq)
ggplot(df, aes(x, fill = id)) +
geom_histogram(binwidth = 0.1, position = 'dodge', col = 1) +
scale_fill_grey() +
theme_minimal()
With the data you posted I don't think you can have this graph to look good. You can't have bars thin enough to differentiate 0.293 and 0.296 when your data ranges from 0 to 0.9.
Maybe you could try to treat it as a factor just to illustrate what you want to do:
library(tidyverse) # bind_rows() comes from dplyr
freqinter <- data.frame(x = c(
0.293040975264367,
0.296736775990729,
0.297619926364764,
0.587377012109561,
0.595245125315916,
0.597022018595893), Freq = c(17,2,4,1,4,2))
freqintra <- data.frame(x = c(
0 ,
0.293040975264367,
0.597022018595893,
0.598809552335782,
0.898227748764939), Freq = c(3,15,4,2,6))
df <- bind_rows(freqinter, freqintra, .id = "id")
ggplot(df, aes(x = as.factor(x), y = Freq, fill = id)) +
geom_bar(stat = "identity", position = position_dodge2(preserve = "single")) +
theme(axis.text.x = element_text(angle = 90)) +
scale_fill_discrete(labels = c("inter", "intra"))
You can also check the problem by not treating your x variable as a factor:
ggplot(df, aes(x = x, y = Freq, fill = id)) +
geom_bar(stat = "identity", width = 0.05, position = "dodge") +
theme(axis.text.x = element_text(angle = 90)) +
scale_fill_discrete(labels = c("inter", "intra"))
Either the bars must be very thin (small width), or you'll get overlapping x intervals breaking the plot.
I have a set of paired data, and I'm using ggplot2.boxplot (of the easyGgplot2 package) with added (jittered) individual data points:
ggplot2.boxplot(data=INdata,xName='condition',yName='vicarious_pain',groupName='condition',showLegend=FALSE,
position="dodge",
addDot=TRUE,dotSize=3,dotPosition=c("jitter", "jitter"),jitter=0.2,
ylim=c(0,100),
backgroundColor="white",xtitle="",ytitle="Pain intensity",mainTitle="Pain intensity",
brewerPalette="Paired")
INdata:
ID,condition,pain
1,Treatment,4.5
3,Treatment,12.5
4,Treatment,16
5,Treatment,61.75
6,Treatment,23.25
7,Treatment,5.75
8,Treatment,5.75
9,Treatment,5.75
10,Treatment,44.5
11,Treatment,7.25
12,Treatment,40.75
13,Treatment,17.25
14,Treatment,2.75
15,Treatment,15.5
16,Treatment,15
17,Treatment,25.75
18,Treatment,17
19,Treatment,26.5
20,Treatment,27
21,Treatment,37.75
22,Treatment,26.5
23,Treatment,15.5
25,Treatment,1.25
26,Treatment,5.75
27,Treatment,25
29,Treatment,7.5
1,No Treatment,34.5
3,No Treatment,46.5
4,No Treatment,34.5
5,No Treatment,34
6,No Treatment,65
7,No Treatment,35.5
8,No Treatment,48.5
9,No Treatment,35.5
10,No Treatment,54.5
11,No Treatment,7
12,No Treatment,39.5
13,No Treatment,23
14,No Treatment,11
15,No Treatment,34
16,No Treatment,15
17,No Treatment,43.5
18,No Treatment,39.5
19,No Treatment,73.5
20,No Treatment,28
21,No Treatment,12
22,No Treatment,30.5
23,No Treatment,33.5
25,No Treatment,20.5
26,No Treatment,14
27,No Treatment,49.5
29,No Treatment,7
The resulting plot looks like this:
However, since this is paired data, I want to represent this in the plot - specifically to add lines between paired datapoints. I've tried adding
... + geom_line(aes(group = ID))
..but I am not able to implement this into the ggplot2.boxplot code. Instead, I get this error:
Error in if (addMean) p <- p + stat_summary(fun.y = mean, geom = "point", :
argument is not interpretable as logical
In addition: Warning message:
In if (addMean) p <- p + stat_summary(fun.y = mean, geom = "point", :
the condition has length > 1 and only the first element will be used
Grateful for any input on this!
I do not know which package ggplot2.boxplot comes from, but I will show you how to perform the requested operation in ggplot2.
The requested output is a bit problematic for ggplot2, since you want both the points and the lines connecting them to be jittered by the same amount. One way to do that is to jitter the points before making the plot. Because the x axis is discrete, this needs a workaround:
df <- INdata # data from the question; condition must be a factor
b <- runif(nrow(df), -0.1, 0.1) # one horizontal offset per row
ggplot(df) +
geom_boxplot(aes(x = as.numeric(condition), y = pain, group = condition))+
geom_point(aes(x = as.numeric(condition) + b, y = pain)) +
geom_line(aes(x = as.numeric(condition) + b, y = pain, group = ID)) +
scale_x_continuous(breaks = c(1,2), labels = c("No Treatment", "Treatment"))+
xlab("condition")
First I made a vector of jitter offsets called b, and converted the x axis to numeric so I could add b to the x coordinates. Later I relabeled the x axis.
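A variation on the same workaround, sketched on a small hypothetical data set shaped like INdata: the offsets are stored as a column (x_jit, a name chosen here for illustration), so every layer reads the same jittered position and set.seed() makes the plot reproducible.

```r
library(ggplot2)

set.seed(42) # make the random offsets reproducible

# Small hypothetical paired data set in the same shape as INdata
paired <- data.frame(
  ID = rep(1:4, times = 2),
  condition = factor(rep(c("No Treatment", "Treatment"), each = 4)),
  pain = c(34.5, 46.5, 34.5, 34, 4.5, 12.5, 16, 61.75)
)

# One offset per row, stored next to the data so points and lines share it
paired$x_jit <- as.numeric(paired$condition) +
  runif(nrow(paired), -0.1, 0.1)

p <- ggplot(paired, aes(y = pain)) +
  geom_boxplot(aes(x = as.numeric(condition), group = condition)) +
  geom_point(aes(x = x_jit)) +
  geom_line(aes(x = x_jit, group = ID)) + # connects each ID's two rows
  scale_x_continuous(breaks = 1:2, labels = levels(paired$condition)) +
  xlab("condition")
p
```

Taking the axis labels from levels(paired$condition) keeps them in step with the numeric codes if the factor levels are ever reordered.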
I do agree with eipi10's comment that the plot works better without jitter:
ggplot(df, aes(condition, pain)) +
geom_boxplot(width=0.3, size=1.5, fatten=1.5, colour="grey70") +
geom_point(colour="red", size=2, alpha=0.5) +
geom_line(aes(group=ID), colour="red", linetype="11") +
theme_classic()
And here is the updated plot with jittered points, in eipi10's style:
ggplot(df) +
geom_boxplot(aes(x = as.numeric(condition),
y = pain,
group = condition),
width=0.3,
size=1.5,
fatten=1.5,
colour="grey70")+
geom_point(aes(x = as.numeric(condition) + b,
y = pain),
colour="red",
size=2,
alpha=0.5) +
geom_line(aes(x = as.numeric(condition) + b,
y = pain,
group = ID),
colour="red",
linetype="11") +
scale_x_continuous(breaks = c(1,2),
labels = c("No Treatment", "Treatment"),
expand = c(0.2,0.2))+
xlab("condition") +
theme_classic()
Although I like the old-school way of plotting with ggplot as shown in @missuse's answer, I wanted to check whether this was also possible using your ggplot2.boxplot-based code.
I loaded your data:
'data.frame': 52 obs. of 3 variables:
$ ID : int 1 3 4 5 6 7 8 9 10 11 ...
$ condition: Factor w/ 2 levels "No Treatment",..: 2 2 2 2 2 2 2 2 2 2 ...
$ pain : num 4.5 12.5 16 61.8 23.2 ...
And called your code, adding geom_line at the end as you suggested yourself:
ggplot2.boxplot(data = INdata,xName = 'condition', yName = 'pain', groupName = 'condition',showLegend = FALSE,
position = "dodge",
addDot = TRUE, dotSize = 3, dotPosition = c("jitter", "jitter"), jitter = 0,
ylim = c(0,100),
backgroundColor = "white",xtitle = "",ytitle = "Pain intensity", mainTitle = "Pain intensity",
brewerPalette = "Paired") + geom_line(aes(group = ID))
Note that I set jitter to 0. The resulting graph looks like this:
If you don't set jitter to 0, the lines still run from the middle of each boxplot, ignoring the horizontal location of the dots.
Not sure why your call gives an error. I thought it might be a factor issue, but I see that my ID variable is not factor class.
I implemented missuse's jitter solution in the ggplot2.boxplot approach in order to align the dots and lines. Instead of using "addDot", I had to add the dots with geom_point (and the lines with geom_line) afterwards, so I could apply the same jitter vector to both.
b <- runif(nrow(df), -0.2, 0.2)
ggplot2.boxplot(data=df,xName='condition',yName='pain',groupName='condition',showLegend=FALSE,
ylim=c(0,100),
backgroundColor="white",xtitle="",ytitle="Pain intensity",mainTitle="Pain intensity",
brewerPalette="Paired") +
geom_point(aes(x=as.numeric(condition) + b, y=pain),colour="black",size=3, alpha=0.7) +
geom_line(aes(x=as.numeric(condition) + b, y=pain, group=ID), colour="grey30", linetype="11", alpha=0.7)
This is my first question on SO; I hope someone can help me answer it.
I'm reading data from a csv with R with data<-read.csv("/data.csv") and get something like:
Group x y size Color
Medium 1 2 2000 yellow
Small -1 2 1000 red
Large 2 -1 4000 green
Other -1 -1 2500 blue
Each group color may vary, they are assigned by a formula when the csv file is generated, but those are all the possible colors (the number of groups may also vary).
I've been trying to use ggplot() like so:
data<-read.csv("data.csv")
xlim<-max(c(abs(min(data$x)),abs(max(data$x))))
ylim<-max(c(abs(min(data$y)),abs(max(data$y))))
data$Color<-as.character(data$Color)
print(data)
ggplot(data, aes(x = x, y = y, label = Group)) +
geom_point(aes(size = size, colour = Group), show.legend = TRUE) +
scale_color_manual(values=c(data$Color)) +
geom_text(size = 4) +
scale_size(range = c(5,15)) +
scale_x_continuous(name="x", limits=c(xlim*-1-1,xlim+1))+
scale_y_continuous(name="y", limits=c(ylim*-1-1,ylim+1))+
theme_bw()
Everything is correct except for the colors:
Small is drawn blue
Medium is drawn red
Other is drawn green
Large is drawn yellow
I noticed the legend at the right orders the Groups alphabetically (Large, Medium, Other, Small), but the colors stay in the csv file order.
Here is a screenshot of the plot.
Can anyone tell me what's missing in my code to fix this? Other approaches to achieve the same result are welcome.
One way to do this, as suggested by help("scale_colour_manual") is to use a named character vector:
col <- as.character(data$Color)
names(col) <- as.character(data$Group)
And then map the values argument of the scale to this vector:
# just showing the relevant line
scale_color_manual(values=col) +
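Why the names matter can be checked without plotting. The sketch below rebuilds only the Group and Color columns from the question; lv and wrong are illustrative names. An unnamed vector is matched to the groups in level order (alphabetical by default), while a named one is looked up by group name.

```r
# Same four rows as the question's CSV (only the two relevant columns)
data <- data.frame(
  Group = c("Medium", "Small", "Large", "Other"),
  Color = c("yellow", "red", "green", "blue"),
  stringsAsFactors = FALSE
)

# Level order used for the legend: Large, Medium, Other, Small
lv <- sort(unique(data$Group))

# Unnamed values are paired with the levels by position,
# which reproduces the wrong colours reported in the question
wrong <- setNames(data$Color, lv)
wrong

# Named values are matched by group name, whatever the row order
col <- setNames(data$Color, data$Group)
col[lv]
```

Here wrong pairs Large with yellow and Small with blue, which is exactly the mismatch described in the question, while col[lv] gives each group its own colour.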
Full code:
xlim<-max(c(abs(min(data$x)),abs(max(data$x))))
ylim<-max(c(abs(min(data$y)),abs(max(data$y))))
col <- as.character(data$Color)
names(col) <- as.character(data$Group)
ggplot(data, aes(x = x, y = y, label = Group)) +
geom_point(aes(size = size, colour = Group), show.legend = TRUE) +
scale_color_manual(values=col) +
geom_text(size = 4) +
scale_size(range = c(5,15)) +
scale_x_continuous(name="x", limits=c(xlim*-1-1,xlim+1))+
scale_y_continuous(name="y", limits=c(ylim*-1-1,ylim+1))+
theme_bw()
Output:
Data
data <- read.table(text = "Group x y size Color
Medium 1 2 2000 yellow
Small -1 2 1000 red
Large 2 -1 4000 green
Other -1 -1 2500 blue", header = TRUE)
A Slightly Better Solution...
I had never heard of R back when this question was answered by @scoa, and I don't know if my solution was available then, but you can do what the OP asks with slightly less work using scale_color_identity().
library(tidyverse)
data <- tribble(
~Group,~x,~y,~size,~Color,
"Medium",1,2,2000,"yellow",
"Small",-1, 2,1000,"red",
"Large",2,-1,4000,"green",
"Other",-1,-1,2500,"blue")
xlim<-max(c(abs(min(data$x)),abs(max(data$x))))
ylim<-max(c(abs(min(data$y)),abs(max(data$y))))
ggplot(data, aes(x = x, y = y, label = Group)) +
geom_point(aes(size = size, colour = Color), show.legend = TRUE) + # Set aes(colour = Color) (the column in the dataframe)
scale_color_identity() + # This tells ggplot to use the values explicit in the 'Color' column
geom_text(size = 4) +
scale_size(range = c(5,15)) +
scale_x_continuous(name="x", limits=c(xlim*-1-1,xlim+1))+
scale_y_continuous(name="y", limits=c(ylim*-1-1,ylim+1))+
theme_bw()
scale_color_identity()
By using this, you don't need to create the separate named vector that you do with scale_color_manual(), and you can use the 'Color' column directly (note the change from geom_point(aes(colour = Group, ...)) to geom_point(aes(colour = Color, ...))).
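One thing to be aware of: as far as I know, scale_color_identity() draws no colour legend by default (its guide argument defaults to "none"), so show.legend = TRUE alone will not bring one back. A minimal sketch of how the legend might be restored with group names, assuming each group has exactly one colour:

```r
library(ggplot2)

data <- data.frame(
  Group = c("Medium", "Small", "Large", "Other"),
  x = c(1, -1, 2, -1),
  y = c(2, 2, -1, -1),
  size = c(2000, 1000, 4000, 2500),
  Color = c("yellow", "red", "green", "blue")
)

p <- ggplot(data, aes(x = x, y = y, label = Group)) +
  geom_point(aes(size = size, colour = Color)) +
  scale_color_identity(
    guide = "legend",    # identity scales hide the legend unless asked
    breaks = data$Color, # the literal colour values in the column
    labels = data$Group  # shown in the legend instead of the colour names
  ) +
  geom_text(size = 4) +
  scale_size(range = c(5, 15)) +
  theme_bw()
p
```

If several groups share a colour this mapping breaks down, and the named-vector approach with scale_color_manual() is the safer choice.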