I just started learning R. I melted my data frame and used ggplot to get this graph. There are supposed to be two lines on the same graph, but the lines connecting the points seem random.
The points are plotted correctly, but the lines are wrong.
# Melted my data to create new dataframe
AvgSleep2_DF <- melt(AvgSleep_DF , id.vars = 'SleepDay_Date',
variable.name = 'series')
# Plotting
ggplot(AvgSleep2_DF, aes(SleepDay_Date, value, colour = series)) +
geom_point(aes(colour = series)) +
geom_line(aes(colour = series))
With or without the aes(colour = series) in the geom_line results in the same graph. What am I doing wrong here?
The following might explain what geom_line() does when you specify aesthetics in the ggplot() call.
I assign a deliberate colour column that differs from the series specification!
df <- data.frame(
x = c(1,2,3,4,5)
, y = c(2,2,3,4,2)
, colour = factor(c(rep(1,3), rep(2,2)))
, series = c(1,1,2,3,3)
)
df
x y colour series
1 1 2 1 1
2 2 2 1 1
3 3 3 1 2
4 4 4 2 3
5 5 2 2 3
Inheritance in ggplot will look for aesthetics defined in an upper layer.
ggplot(data = df, aes(x = x, y = y, colour = colour)) +
geom_point(size = 3) + # setting the size to stress point layer call
geom_line() # geom_line will "inherit" a "grouping" from the colour set above
This gives you
While we can control the "grouping" associated to each line(segment) as follows:
ggplot(data = df, aes(x = x, y = y, colour = colour)) +
geom_point(size = 3) +
geom_line(aes(group = series)) # defining specific grouping
Note: As I defined a separate "group" in the series column for the 3rd point, it is depicted - in this case - as a single point "line".
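Applied to long-format data like the question's, the same idea might look like the sketch below. The data frame `sleep_long` and its values are made up for illustration; only the column names `SleepDay_Date`, `series`, and `value` come from the question. Treat it as an assumption about the shape of the melted data, not a diagnosis of the original plot.

```r
library(ggplot2)

# Hypothetical stand-in for the melted sleep data: column names mirror the
# question, the series labels and values are invented.
sleep_long <- data.frame(
  SleepDay_Date = rep(as.Date("2016-04-12") + 0:4, times = 2),
  series = rep(c("AvgMinutesAsleep", "AvgTimeInBed"), each = 5),
  value = c(420, 395, 440, 410, 430, 455, 430, 470, 450, 465)
)

# Stating the group explicitly makes the intent "one line per series" visible
p <- ggplot(sleep_long,
            aes(SleepDay_Date, value, colour = series, group = series)) +
  geom_point() +
  geom_line()

# Layer 2 is geom_line(); it should contain exactly one group per series
line_groups <- unique(layer_data(p, 2)$group)
print(line_groups)
```

If the built line layer shows more groups than series, some other discrete variable is probably splitting the lines.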
In ggplot, is there a simple way to override the line attributes of one or more groups without having to specify the entire color/line palette via scale_*_manual()?
In the example below, I basically want to make all the boot_* lines gray and skinny, while all other lines retain the default colors/widths otherwise being used.
I know there are a lot of brute-force ways of doing this with some combination of a) creating auxiliary variables in the data frame, based on the string pattern, that will serve as my color/size group, then b) generating the plot below, extracting all the color-layer info, and filling out an entire scale_color_manual() and scale_size_manual() map, and c) replacing the 'boot_*' values with "grey."
Are there any versatile shortcuts here?
library(dplyr)
library(tidyr)   # provides pivot_longer()
library(ggplot2)
set.seed(231)
df=tibble(time=c(1:5), actual=2*time+3, estimate = actual+rnorm(length(actual)))
for(i in 1:8){
df[paste('boot_', i, sep='')] = df$estimate + rnorm(nrow(df))
}
> head(df) %>% data.frame
# time actual estimate boot_1 boot_2 boot_3 boot_4 boot_5 boot_6 boot_7
# 1 1 5 4.466898 4.684295 4.240585 4.786520 5.904332 4.862498 2.092772 4.595850
# 2 2 7 4.688336 4.751258 6.074914 5.694181 3.445036 4.639329 4.548511 5.453597
# 3 3 9 8.045802 7.167972 6.858666 7.519752 7.721405 7.801243 10.156436 9.521482
# 4 4 11 11.262516 11.826206 10.682760 11.137814 11.252465 11.452442 11.925339 11.754248
# 5 5 13 12.526643 12.492315 13.927974 14.176896 11.924183 12.950479 11.257865 13.430229
# boot_8
# 1 3.987001
# 2 3.813539
# 3 7.549984
# 4 11.482360
# 5 11.645106
# Melt for ggplot compatibility
df_long = df %>%
pivot_longer(cols=(-time))
head(df_long) %>% data.frame
# time name value
# 1 1 actual 5.000000
# 2 1 estimate 4.466898
# 3 1 boot_1 4.684295
# 4 1 boot_2 4.240585
# 5 1 boot_3 4.786520
# 6 1 boot_4 5.904332
## The basic ggplot
df_long %>%
ggplot(aes(x=time, y=value, color=name)) + geom_line()
You could just use the first four characters of name for the colour aesthetic (using substr), and the full name as a group aesthetic. It's a bit hacky but it's short, effective, and all gets done in the plotting code without extra data wrangling, post-hoc changes or a long vector of colour mappings.
df_long %>%
ggplot(aes(x = time, y = value, color = substr(name, 1, 4), group = name)) +
geom_line() +
scale_color_manual(labels = c("actual", "boot", "estimate"),
values = c("orange", "gray", "blue3"), name = "name")
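To see why three labels and three colours are enough here, it can help to look at what `substr()` does to the series names on its own. The vector below is only a subset of the names in `df_long`, chosen for illustration.

```r
# A few of the names that appear in df_long$name
name <- c("actual", "estimate", "boot_1", "boot_2", "boot_8")

# The colour aesthetic only sees the first four characters ...
colour_key <- substr(name, 1, 4)

# ... so every boot_* series collapses into a single colour level, while
# group = name still keeps each series as its own line.
colour_levels <- sort(unique(colour_key))
print(colour_levels)
```

The sorted levels come out as "actu", "boot", "esti", which is the order the `labels` and `values` vectors in `scale_color_manual()` have to follow.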
An alternative is using filtering to have two sets of lines: one coloured, and one merely grouped. This has the benefit that you don't need to add any scale calls at all:
df_long %>%
filter(!grepl("boot", name)) %>%
ggplot(aes(x = time, y = value, color = name)) +
geom_line(data = filter(df_long, grepl("boot", name)),
aes(group = name), color = "gray", size = 0.3) +
geom_line()
EDIT
It's pretty difficult to only specify an aesthetic mapping for a single (multiple) group, while leaving the others at default values. However, it is possible using ggnewscale. Here we only have to specify the color of the boot group:
library(ggnewscale)
df_long %>%
filter(!grepl("boot", name)) %>%
ggplot(aes(x = time, y = value)) +
new_scale_color() +
geom_line(aes(color = name)) +
scale_color_discrete(name = "Variable") +
new_scale_color() +
geom_line(data = filter(df_long, grepl("boot", name)),
aes(group = name, color = "boot"), size = 0.3) +
scale_color_manual(values = "gray", name = "") +
theme(legend.margin = margin(-28, 10, 0, 0))
I am using ggplot to put a boxplot and a line in the same plot. I have two data frames; here are snippets of both:
TMA.core variable value
1 I-5 H&E 356642.6
2 B-1 H&E 490276.9
3 B-13 H&E 460831.8
4 L-11 H&E 551614.2
5 B-6 H&E 663711.8
6 F-10 H&E 596832.8
(there are many variables.)
TMA.core Mean CoV
I-5 390829.7 0.15181577
B-1 414909.9 0.21738852
B-13 500829.8 0.39049256
L-11 537229.7 0.07387486
B-6 575698.9 0.44764127
F-10 589245.2 0.15382864
What I want to do is draw a boxplot using the first data frame, then plot the CoV for the corresponding TMA core and connect the points using geom_line.
My code is:
ggplot() +
geom_boxplot(data = Merge_stats_melt, aes(x = reorder(TMA.core, value, FUN = mean), y = value)) +
geom_line(data = Merge_stas_mean_order, aes(x = reorder(TMA.core, Mean), y = CoV, group = 1)) +
scale_y_continuous(
# Add a second axis and specify its features
sec.axis = sec_axis(~./1000000, name = 'CoV')
)
With this code I can draw the boxplot, but the line is always horizontal at y = 0.
How can I solve this issue?
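Using one or two data frames doesn't really matter. Just remember to rescale the y aesthetic to match the secondary axis: the line's y values must be multiplied by the same factor that sec_axis() divides by, which your code does not do.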
library(ggplot2)
library(scales)
# Find the ideal scaling factor for the dual axis
ratio <- max(Merge_stats_melt$value) / max(Merge_stas_mean_order$CoV)
ggplot() +
geom_boxplot(data = Merge_stats_melt, aes(x = reorder(TMA.core, value, FUN = mean), y = value)) +
geom_line(data = Merge_stas_mean_order, aes(x = reorder(TMA.core, Mean), y = CoV*ratio, group = 1)) +
scale_y_continuous(labels=comma,
sec.axis = sec_axis(~./ratio, name = 'CoV')
)
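The arithmetic behind the fix can be checked without plotting anything. The sketch below uses only the six rows shown in each snippet of the question (the real data has more variables), so the actual ratio will differ; it just illustrates the forward and inverse transforms.

```r
# Values copied from the two snippets in the question
value <- c(356642.6, 490276.9, 460831.8, 551614.2, 663711.8, 596832.8)
CoV   <- c(0.15181577, 0.21738852, 0.39049256, 0.07387486, 0.44764127, 0.15382864)

ratio <- max(value) / max(CoV)

# Forward transform: what geom_line() draws against the primary axis
CoV_scaled <- CoV * ratio

# Inverse transform: what sec_axis(~ . / ratio) shows as tick labels
CoV_back <- CoV_scaled / ratio

range(CoV)         # unscaled values are all below 1, so they hug y = 0
range(CoV_scaled)  # scaled values now reach the top of the boxplot range
```

Because the unscaled CoV values are all below 1 while the boxplot axis runs into the hundreds of thousands, the original line is indistinguishable from y = 0.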
I'm using ggplot2 to create a simple dot plot of -1 to +1 correlation values using the following R code:
ggplot(dataframe, aes(x = exit)) +
geom_point(aes(y= row.names(dataframe))) +
geom_text(aes(y=exit, label=samplesize))
The y-axis has text labels, and I believe those text labels may be the reason that my geom_text() data point labels are squished down into the bottom of the plot as pictured here:
How can I change my plotting so that the data point labels appear on the dots themselves?
I understand that you would like to have the samplesize appear above each data point in the plot. Here is a sample plot with a sample data frame that does this:
EDIT: Per note by Gregor, changed the geom_text() call to utilize aes() when referencing the data. Thanks for the heads up!
top10_rank <- data.frame(
  String = c("h", "a", "w", "z", "z", "b", "q", "k", "r", "x", "l"),
  Number = c(0, 1, 1, 3, 3, 4, 5, 6, 9, 10, 11)
)
x<-ggplot(data=top10_rank, aes(x = Number,
y = String)) + geom_point(size=3) + scale_y_discrete(limits=top10_rank$String)
x + geom_text(data=top10_rank, size=5, color = 'blue',
aes(x = Number,label = Number), hjust=0, vjust=0)
Not sure if this is what you wanted though.
Your problem is simply that you switched the y variables:
# your code
ggplot(dataframe, aes(x = exit)) +
geom_point(aes(y = row.names(dataframe))) + # here y is the row names
geom_text(aes(y =exit, label = samplesize)) # here y is the exit column
Since you want the same y-values for both you can define this in the initial ggplot() call and not worry about repeating it later
# working version
ggplot(dataframe, aes(x = exit, y = row.names(dataframe))) +
geom_point() +
geom_text(aes(label = samplesize))
Using row names is a little fragile. It's safer and more robust to create an actual data column with what you want for the y values:
# nicer code
dataframe$y = row.names(dataframe)
ggplot(dataframe, aes(x = exit, y = y)) +
geom_point() +
geom_text(aes(label = samplesize))
Having done this, you probably don't want the labels right on top of the points, maybe a little offset would be better:
# best of all?
ggplot(dataframe, aes(x = exit, y = y)) +
geom_point() +
geom_text(aes(x = exit + .05, label = samplesize), vjust = 0)
In the last case, you'll have to play with the adjustment to the x aesthetic; what looks right will depend on the dimensions of your final plot.
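One alternative to editing the x mapping (not what the answer above uses) is the `nudge_x` argument of `geom_text()`, which shifts labels in data units. The sketch below assumes a small made-up data frame; `exit` and `samplesize` follow the question's column names, while the values and the `y` labels are invented.

```r
library(ggplot2)

# Hypothetical data: column names follow the question, values are invented
dataframe <- data.frame(
  y = c("groupA", "groupB", "groupC"),
  exit = c(-0.4, 0.1, 0.7),
  samplesize = c(12, 30, 25)
)

p <- ggplot(dataframe, aes(x = exit, y = y)) +
  geom_point() +
  # nudge_x moves each label to the right without changing the x mapping;
  # hjust = 0 left-aligns the text at the nudged position
  geom_text(aes(label = samplesize), nudge_x = 0.05, hjust = 0)

# Layer 2 is geom_text(); its labels should be the sample sizes
labels_drawn <- layer_data(p, 2)$label
print(labels_drawn)
```

The amount of nudge still needs tuning by eye, for the same reason as the `exit + .05` offset.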
I've got a problem interacting with the labels in ggplot2.
I have two data sets (Temperature vs. Time) from two experiments, recorded at different timesteps. I've managed to merge the data frames and put them into long format to plot them in the same graph, using the melt function from the reshape2 library.
So, the initial data frames look something like this:
> d1
step Temp
1 512.5 301.16
2 525.0 299.89
3 537.5 299.39
4 550.0 300.58
5 562.5 300.20
6 575.0 300.17
7 587.5 300.62
8 600.0 300.51
9 612.5 300.96
10 625.0 300.21
> d2
step Temp
1 520 299.19
2 540 300.39
3 560 299.67
4 580 299.43
5 600 299.78
6 620 300.74
7 640 301.03
8 660 300.39
9 680 300.54
10 700 300.25
I combine it like this:
> mrgd <- merge(d1, d2, by = "step", all = T)
step Temp.x Temp.y
1 512.5 301.16 NA
2 520.0 NA 299.19
...
And put it into long format for ggplot2 with this:
> melt1 <- melt(mrgd, id = "step")
> melt1
step variable value
1 512.5 Temp.x 301.16
2 520.0 Temp.x NA
...
Now, I want to for example do a histogram of the distribution of values. I do it like this:
p <- ggplot(data = melt1, aes(x = value, color = variable, fill = variable)) + geom_histogram(alpha = 0.4)
My problem is when I try to modify the Legend of this graph, I don't know how to! I've followed what is suggested in the R Graphics Cookbook book, but I've had no luck.
I've tried to do this, for example (to change the labels of the Legend):
> p + scale_fill_discrete(labels = c("d1", "d2"))
But I just create a "new" Legend box, like so
Or even removing the Legend completely
> p + scale_fill_discrete(guide = F)
I just get this
Finally, doing this also doesn't help
> p + scale_fill_discrete("")
Again, it just adds a new Legend box
Does anyone know what's happening here? It looks as if I'm actually modifying another label object, if that makes any sense. I've looked into other related questions on this site, but I haven't found anyone having the same problem as me.
Get rid of the aes(color = variable...) to remove the scale that belongs to aes(color = ...).
ggplot(data = melt1, aes(x = value, fill = variable)) +
geom_histogram(alpha = 0.4) +
scale_fill_discrete(labels = c("d1", "d2")) # Change the labels for `fill` scale
This second plot contains aes(color = variable...). Color in this case will draw colored outlines around the histogram bins. You can turn off the color scale so that you only have one legend, the one created from fill.
ggplot(data = melt1, aes(x = value, color = variable, fill = variable)) +
geom_histogram(alpha = 0.4) +
scale_fill_discrete(labels = c("d1", "d2")) +
scale_color_discrete(guide = "none") # Turn off the color (outline) legend; guide = F is deprecated in recent ggplot2
The most straightforward thing to do would be to not use reshape2 or merge at all, but instead to rbind your data frames:
dfNew <- rbind(data.frame(d1, Group = "d1"),
data.frame(d2, Group = "d2"))
ggplot(dfNew, aes(x = Temp, color = Group, fill = Group)) +
geom_histogram(alpha = 0.4) +
labs(fill = "", color = "")
If you wanted to vary alpha by group:
ggplot(dfNew, aes(x = Temp, color = Group, fill = Group, alpha = Group)) +
geom_histogram() +
labs(fill = "", color = "") +
scale_alpha_manual("", values = c(d1 = 0.4, d2 = 0.8))
Note also that the default position for geom_histogram is "stack". The bars won't overlap unless you use geom_histogram(position = "identity").
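A quick way to confirm the difference is to inspect the built layer data. The sketch below rebuilds `dfNew` from the `Temp` values shown in the question (the `step` column is left out because the histogram does not use it); `bins = 10` is an arbitrary choice for this small sample.

```r
library(ggplot2)

# Temp values copied from d1 and d2 in the question
d1 <- data.frame(Temp = c(301.16, 299.89, 299.39, 300.58, 300.20,
                          300.17, 300.62, 300.51, 300.96, 300.21))
d2 <- data.frame(Temp = c(299.19, 300.39, 299.67, 299.43, 299.78,
                          300.74, 301.03, 300.39, 300.54, 300.25))
dfNew <- rbind(data.frame(d1, Group = "d1"),
               data.frame(d2, Group = "d2"))

p <- ggplot(dfNew, aes(x = Temp, fill = Group)) +
  geom_histogram(alpha = 0.4, position = "identity", bins = 10)

# With position = "identity" every bar starts at zero instead of sitting
# on top of the other group's bar
bar_bottoms <- layer_data(p, 1)$ymin
print(unique(bar_bottoms))
```

With the default stacking, some of these `ymin` values would be greater than zero wherever both groups fall into the same bin.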
How can I control the colour of the dots in the scatter plot by ggplot2? I need the first 20 points to have a colour, then the next 20 to have a different colour. At the moment I am using base R plot output. The matrix looks like this
1 4
1 3
2 9
-1 8
9 9
and I have a colour vector which looks like
cols<-c("#B8DBD3","#FFB933","#FF6600","#0000FF")
then
plot(mat[,1],mat[,2],col=cols)
works.
How could I do this in ggplot?
Regarding the colours
my cols vector looks like this
100->n
colours<-c(rep("#B8DBD3",n),rep("#FFB933",n),rep("#FF6600",n),rep("#0000FF",n),rep("#00008B",n),rep("#ADCD00",n),rep("#008B00",n),rep("#9400D3",n))
when I then do
d<-ggplot(new,aes(x=PC1,y=PC2,col=rr))
d+theme_bw() +
scale_color_identity(breaks = rep(colours, each = 1)) +
geom_point(pch=21,size=7)
the colours look completely different from
plot(new[,1],new[,2],col=colours)
this looks like
http://fs2.directupload.net/images/150417/2wwvq9u2.jpg
while ggplot with the same colours looks like
http://fs1.directupload.net/images/150417/bwc5wn7b.jpg
I would recommend creating a column that designates which group each point belongs to.
library(ggplot2)
xy <- data.frame(x = rnorm(80), y = rnorm(80), col = as.factor(rep(1:4, each = 20)))
cols<-c("#B8DBD3","#FFB933","#FF6600","#0000FF")
ggplot(xy, aes(x = x, y = y, col = col)) +
theme_bw() +
scale_colour_manual(values = cols) +
geom_point()
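If the data starts out as a plain matrix, as in the question, the grouping column can be derived from the row position. The sketch below assumes a made-up 80-row matrix standing in for `mat` and a block size of 20; both are illustrative choices.

```r
# Hypothetical 80-row matrix standing in for `mat`
set.seed(1)
mat <- matrix(rnorm(160), ncol = 2)

# Integer division of the zero-based row index by the block size gives
# 0 for rows 1-20, 1 for rows 21-40, and so on; adding 1 makes it 1-based
block_size <- 20
grp <- factor((seq_len(nrow(mat)) - 1) %/% block_size + 1)

# Data frame in the shape the ggplot call above expects
xy <- data.frame(x = mat[, 1], y = mat[, 2], col = grp)
table(xy$col)
```

The resulting `xy` can be passed to the same `ggplot()` call, with `scale_colour_manual(values = cols)` assigning one colour per block.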