I'm trying to add a legend to a plot that I've created using ggplot. I load the data in from two csv files, each of which has two columns of 8 rows (not including the header).
I construct a data frame from each file which includes a cumulative total, so each data frame has three columns (bv, bin_count and bin_cumulative) of 8 rows, and every value is an integer.
The two data sets are then plotted as follows. The display is fine, but I can't figure out how to add a legend to the resulting plot. It seems the ggplot object itself should have a data source, but I'm not sure how to build one where there are multiple columns with the same name.
library(ggplot2)
i2d <- data.frame(bv=c(0,1,2,3,4,5,6,7), bin_count=c(0,0,0,2,1,2,2,3), bin_cumulative=cumsum(c(0,0,0,2,1,2,2,3)))
i1d <- data.frame(bv=c(0,1,2,3,4,5,6,7), bin_count=c(0,1,1,2,3,2,0,1), bin_cumulative=cumsum(c(0,1,1,2,3,2,0,1)))
c_data_plot <- ggplot() +
geom_line(data = i1d, aes(x=i1d$bv, y=i1d$bin_cumulative), size=2, color="turquoise") +
geom_point(data = i1d, aes(x=i1d$bv, y=i1d$bin_cumulative), color="royalblue1", size=3) +
geom_line(data = i2d, aes(x=i2d$bv, y=i2d$bin_cumulative), size=2, color="tan1") +
geom_point(data = i2d, aes(x=i2d$bv, y=i2d$bin_cumulative), color="royalblue3", size=3) +
scale_x_continuous(name="Brightness", breaks=seq(0,8,1)) +
scale_y_continuous(name="Count", breaks=seq(0,12,1)) +
ggtitle("Combine plot of BV cumulative counts")
c_data_plot
I'm fairly new to R and would much appreciate any help.
Per comments, I've edited the code to reproduce the dataset after it's loaded into the dataframes.
Regarding producing a single data frame, I'd welcome advice on how to achieve that - I'm still struggling with how data frames work.
First, we organize the data by combining i1d and i2d. I've added a column data which stores the name of the original dataset.
# restructure data
i1d$data <- 'i1d'
i2d$data <- 'i2d'
i12d <- rbind.data.frame(i1d, i2d)
Then, we create the plot, using syntax that is more common to ggplot2:
# create plot
ggplot(i12d, aes(x = bv, y = bin_cumulative))+
geom_line(aes(colour = data), size = 2)+
geom_point(colour = 'royalblue', size = 3)+
scale_x_continuous(name="Brightness", breaks=seq(0,8,1)) +
scale_y_continuous(name="Count", breaks=seq(0,12,1)) +
ggtitle("Combine plot of BV cumulative counts")+
theme_bw()
If we specify x and y within the ggplot function, we do not need to keep rewriting them in the various geoms we want to add to the plot. After the first three lines I copied and pasted what you had so that the formatting would match your expectation. I also added theme_bw, because I think it's more visually appealing. We also specify colour inside aes using a variable (data) from our data.frame, which is what makes ggplot2 draw the legend.
If we want to take this a step further, we can use the scale_colour_manual function to specify the colors attributed to the different values of the data column in the data.frame i12d:
ggplot(i12d, aes(x = bv, y = bin_cumulative))+
geom_line(aes(colour = data), size = 2)+
geom_point(colour = 'royalblue', size = 3)+
scale_x_continuous(name="Brightness", breaks=seq(0,8,1)) +
scale_y_continuous(name="Count", breaks=seq(0,12,1)) +
ggtitle("Combine plot of BV cumulative counts")+
theme_bw()+
scale_colour_manual(values = c('i1d' = 'turquoise',
'i2d' = 'tan1'))
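If you would rather keep the two data frames separate, a legend can still be produced by mapping colour to a constant label inside aes() in each layer. The following is only a minimal sketch of that alternative, not the approach used above; the labels "i1d" and "i2d" and the legend title "Data set" are arbitrary choices made for illustration.

```r
library(ggplot2)

# Same data as in the question
i1d <- data.frame(bv = 0:7, bin_count = c(0, 1, 1, 2, 3, 2, 0, 1))
i1d$bin_cumulative <- cumsum(i1d$bin_count)
i2d <- data.frame(bv = 0:7, bin_count = c(0, 0, 0, 2, 1, 2, 2, 3))
i2d$bin_cumulative <- cumsum(i2d$bin_count)

# x and y are shared by every layer; each layer supplies its own data.
# The string inside aes() is a label for the legend, not a colour.
p <- ggplot(mapping = aes(x = bv, y = bin_cumulative)) +
  geom_line(data = i1d, aes(colour = "i1d")) +
  geom_line(data = i2d, aes(colour = "i2d")) +
  geom_point(data = i1d, colour = "royalblue1", size = 3) +
  geom_point(data = i2d, colour = "royalblue3", size = 3) +
  # The named vector translates each label into an actual colour.
  scale_colour_manual(name = "Data set",
                      values = c(i1d = "turquoise", i2d = "tan1"))
```

Printing p draws the plot. The stacked data frame is usually easier to extend to a third or fourth data set, since this version needs one extra layer per data set.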
I have two data frames containing results from epigenetic analysis.
The column from df1 which is important to the plot is labelled beta_ADHD.
The column from df2 which is important to the plot is also labelled beta_ADHD.
I would like to make the column from df1 the x axis and the column from df2 the y axis.
I would also like to label the points on the graph according to the data set they are from.
This is what I've tried so far, but nothing has worked yet:
ggp <- ggplot(NULL, aes(Beta_ADHD, Beta_ADHD)) + # Draw ggplot2 plot based on two data frames
geom_point(data = df1, col = "red") +
geom_point(data = df2, col = "blue")
ggp # Draw plot
And I also tried this:
ggplot(data=data.frame(x=df1$Beta_ADHD, y=df2$Beta_ADHD), aes(x=x, y=y)) + geom_point()
I'm at a complete loss here and any help would be greatly appreciated.
I think you need to combine the inputs into a single data frame in order to use them as co-ordinates for a scatter plot. (Also, the 2 data sets must have the same number of values.)
I don't believe it makes sense to label or colour the points according to which data set they are from. As we are taking the x-coordinate from df1 and the y-coordinate from df2, that means that every point comes from both data sets. It is the labels on the x-axis beta_ADHD1 and y-axis beta_ADHD2 that show which data set the value came from. You can change the text and color of the axis titles using xlab(), ylab() and theme().
# create some sample data
df1 <- data.frame(beta_ADHD=runif(100,0,10))
df2 <- data.frame(beta_ADHD=rnorm(100,0,10))
# create a new data frame containing the required co-ordinates
# the values from df1 are named beta_ADHD1 and the values from df2 are named beta_ADHD2
new_df <- data.frame(beta_ADHD1 = df1$beta_ADHD, beta_ADHD2 = df2$beta_ADHD)
# plot this data using ggplot
ggplot(new_df, aes(x = beta_ADHD1, y = beta_ADHD2)) + geom_point() +
xlab('beta_ADHD from df1') + ylab('beta_ADHD from df2') +
theme(axis.title.x = element_text(color ='red'), axis.title.y = element_text(color = 'blue'))
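Pairing the two columns by row position only works if row i in df1 and row i in df2 describe the same item. If both data frames carry an identifier, merging on it is safer. The sketch below assumes a hypothetical probe_id column and invented values, neither of which appears in the question.

```r
library(ggplot2)

# Invented example data; probe_id is an assumed identifier column.
df1 <- data.frame(probe_id  = c("cg01", "cg02", "cg03", "cg04"),
                  beta_ADHD = c(0.10, 0.25, 0.40, 0.55))
df2 <- data.frame(probe_id  = c("cg03", "cg01", "cg04", "cg05"),
                  beta_ADHD = c(0.42, 0.12, 0.50, 0.90))

# Keep only identifiers present in both data frames; the suffixes
# distinguish the two columns that share the name beta_ADHD.
paired <- merge(df1, df2, by = "probe_id", suffixes = c("_df1", "_df2"))

p <- ggplot(paired, aes(x = beta_ADHD_df1, y = beta_ADHD_df2)) +
  geom_point() +
  xlab("beta_ADHD from df1") + ylab("beta_ADHD from df2")
```

Rows that appear in only one data frame are dropped by merge() by default, which also takes care of the requirement that both inputs have the same number of values.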
I am using ggplot to create two overlapping densities from two different data frames. I need to create a legend for each of the densities.
I have been trying to follow these two posts, but still cannot get it to work:
How to add legend to plot with data from multiple data frames
ggplot legends when plot is built from two data frames
Sample code of what I am trying to do:
df1 = data.frame(x=rnorm(1000,0))
df2 = data.frame(y=rnorm(2500,0.5))
ggplot() +
geom_density(data=df1, aes(x=x), color='darkblue', fill='lightblue', alpha=0.5) +
geom_density(data=df2, aes(x=y), color='darkred', fill='indianred1', alpha=0.5) +
scale_color_manual('Legend Title', limits=c('x', 'y'), values = c('darkblue','darkred')) +
guides(colour = guide_legend(override.aes = list(pch = c(21, 21), fill = c('darkblue','darkred')))) +
theme(legend.position = 'bottom')
Is it possible to manually create a legend?
Or do I need to restructure the data as per this post?
Adding legend to ggplot made from multiple data frames with controlled colors
I'm newish to R so hoping to avoid stacking the data into a single dataframe if I can avoid it as they are weighted densities so have to multiply by different weights as well.
ggplot2 only draws a legend for aesthetics that are mapped inside aes(); a colour set as a constant outside aes() never produces one. To accomplish what you are looking for, move the color aesthetic into aes() in each geom_density() call, which enables you to use scale_color_manual. Within that, you can change the values= to whatever you like.
library(tidyverse)
ggplot() +
# the strings inside aes() are legend labels, not colours
geom_density(data=df1, aes(x=x, color='x'), fill='lightblue', alpha=0.5) +
geom_density(data=df2, aes(x=y, color='y'), fill='indianred1', alpha=0.5) +
# a single colour scale maps each label to its colour
scale_color_manual('Legend Title', values = c('x' = 'darkblue', 'y' = 'darkred')) +
guides(colour = guide_legend(override.aes = list(fill = c('lightblue','indianred1')))) +
theme(legend.position = 'bottom')
Created on 2020-08-09 by the reprex package (v0.3.0)
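Stacking the data does not rule out weighted densities, because geom_density() accepts a weight aesthetic. The following is a rough sketch under stated assumptions: the column names value, w and source are invented here, and the uniform weights are placeholders that should be replaced with the real ones, normalised to sum to 1 within each data set.

```r
library(ggplot2)

set.seed(1)
df1 <- data.frame(x = rnorm(1000, 0))
df2 <- data.frame(y = rnorm(2500, 0.5))

# Rename x and y to one shared column and record where each row came from.
# w holds placeholder weights that sum to 1 within each data set.
stacked <- rbind(
  data.frame(value = df1$x, w = 1 / nrow(df1), source = "df1"),
  data.frame(value = df2$y, w = 1 / nrow(df2), source = "df2")
)

p <- ggplot(stacked, aes(x = value, weight = w,
                         colour = source, fill = source)) +
  geom_density(alpha = 0.5) +
  # Giving both scales the same title merges them into one legend.
  scale_colour_manual("Legend Title",
                      values = c(df1 = "darkblue", df2 = "darkred")) +
  scale_fill_manual("Legend Title",
                    values = c(df1 = "lightblue", df2 = "indianred1")) +
  theme(legend.position = "bottom")
```

Because the weights live in a column, each data set keeps its own weighting even after the rows are combined.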
I am trying to simply add a legend to my Nyquist plot where I am plotting 2 sets of data: 1 is an experimental set (~600 points), and 2 is a data frame calculated using a transfer function (~1000 points).
I need to plot both and label them. Currently I have them both plotted okay, but when I try to add the label using scale_colour_manual no label appears. Also, a way to move this label around would be appreciated!! Code below.
pdf("nyq_2elc.pdf")
nq2 <- ggplot() + geom_point(data = treat, aes(treat$V1,treat$V2), color = "red") +
geom_point(data = circuit, aes(circuit$realTF,circuit$V2), color = "blue") +
xlab("Real Z") + ylab("-Imaginary Z") +
scale_colour_manual(name = 'hell0',
values =c('red'='red','blue'='blue'), labels = c('Treatment','EQ')) +
ggtitle("Nyquist Plot and Equivilent Circuit for 2 Electrode Treatment Setup at 0 Minutes") +
xlim(0,700) + ylim(0,700)
print(nq2)
dev.off()
ggplot works best with long data frames, so I would combine the datasets like this:
treat$Cat <- "treat"
circuit$Cat <- "circuit"
CombData <- data.frame(rbind(treat, circuit))
ggplot(CombData, aes(x=V1, y=V2, col=Cat))+geom_point()
This should give you the legend you want.
You probably have to change the names/order of the columns of dataframes treat and circuit so they can be combined, but it's hard to tell because you're not giving us a reproducible example.
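As a sketch of what that renaming could look like, the example below uses small invented data frames that only mimic the column names from the question (V1 and V2 in treat, realTF and V2 in circuit). The legend title "Data" is an arbitrary choice, and legend.position is one way to move the legend around.

```r
library(ggplot2)

# Invented stand-ins for the real data
treat   <- data.frame(V1 = c(100, 200, 300), V2 = c(50, 120, 60))
circuit <- data.frame(realTF = c(110, 210, 310, 400), V2 = c(55, 115, 65, 20))

# rbind() needs identical column names, so rename realTF to V1
names(circuit)[names(circuit) == "realTF"] <- "V1"

treat$Cat   <- "Treatment"
circuit$Cat <- "EQ"
CombData <- rbind(treat, circuit)

p <- ggplot(CombData, aes(x = V1, y = V2, colour = Cat)) +
  geom_point() +
  scale_colour_manual(name = "Data",
                      values = c(Treatment = "red", EQ = "blue")) +
  xlab("Real Z") + ylab("-Imaginary Z") +
  # "top", "left", "right" and "none" are also accepted
  theme(legend.position = "bottom")
```

With the colour mapped inside aes(), scale_colour_manual now has something to act on, which is why no legend appeared in the original code.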
I'm struggling with facet_wrap in R. It should be simple however the facet variable is not being picked up? Here is what I'm running:
plot = ggplot(data = item.household.descr.count, mapping = aes(x=item.household.descr.count$freq, y = item.household.descr.count$descr, color = item.household.descr.count$age.cat)) + geom_point()
plot = plot + facet_wrap(~ age.cat, ncol = 2)
plot
I colored the faceting variable to try to help illustrate what is going on. The plot should have only one color in each facet instead of what you see here. Does anyone know what is going on?
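This happens because you are using $ and the data frame name to refer to your variables inside aes(). Vectors pulled out with $ bypass the data that ggplot splits into facets, so every panel receives the full column. With ggplot() you should use only the variable names in aes(), as the data frame is already named in data=.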
plot = ggplot(data = item.household.descr.count,
mapping = aes(x=freq, y = descr, color = age.cat)) + geom_point()
plot = plot + facet_wrap(~ age.cat, ncol = 2)
plot
Here is an example using the diamonds dataset. The first call reproduces the problem and the second one is the corrected version.
diamonds2<-diamonds[sample(nrow(diamonds),1000),]
ggplot(diamonds2,aes(diamonds2$carat,diamonds2$price,color=diamonds2$color))+geom_point()+
facet_wrap(~color)
ggplot(diamonds2,aes(carat,price,color=color))+geom_point()+
facet_wrap(~color)
I want to plot a ggplot2 boxplot using all columns of a data.frame, and I want to reorder the columns by the median for each column, rotate the x-axis labels, and fill each box with the colour corresponding to the same median. I can't figure out how to do the last part. There are plenty of examples where the fill colour corresponds to a factor variable, but I haven't seen a clear example of using a continuous variable to control fill colour. (The reason I'm trying to do this is that the resultant plot will provide context for a force-directed network graph with nodes that will be colour-coded in the same way as the boxplot -- the colour will then provide a mapping between the two plots.) It would be nice if I could re-use the value-to-colour mapping for later plots so that colours are consistent between plots. So, for example, the box corresponding to the column variable with a high median value will have a colour that denotes this mapping and matches perfectly the colour for the same column variable in other plots (such as the corresponding node in a force-directed network graph).
So far, I have something like this:
# Melt the data.frame:
DT.m <- melt(results, id.vars = NULL) # using reshape2
# I can now make a boxplot for every column in the data.frame:
g <- ggplot(DT.m, aes(x = reorder(variable, value, FUN=median), y = value)) +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
stat_summary(fun.y=mean, colour="darkred", geom="point") +
geom_boxplot(???, alpha=0.5)
The colour fill information is what I'm stuck on. "value" is a continuous variable in the range [0,1] and there are 55 columns in my data.frame. Various approaches I've tried seem to result in the boxes being split vertically down the middle, and I haven't got any further. Any ideas?
You can do this by adding the median-by-group to your data frame and then mapping the new median variable to the fill aesthetic. Here's an example with the built-in mtcars data frame. By using this same mapping across different plots, you should get the same colors:
library(ggplot2)
library(dplyr)
ggplot(mtcars %>% group_by(carb) %>%
mutate(medMPG = median(mpg)),
aes(x = reorder(carb, mpg, FUN=median), y = mpg)) +
geom_boxplot(aes(fill=medMPG)) +
stat_summary(fun=mean, colour="darkred", geom="point") + # fun.y was renamed to fun in ggplot2 3.3.0
scale_fill_gradient(low=hcl(15,100,75), high=hcl(195,100,75))
If you have various data frames with different ranges of medians, you can still use the method above, but to get a consistent mapping of color to median across all your plots, you'll need to also set the same limits for scale_fill_gradient in each plot. In this example, the median of mpg (by carb grouping) varies from 15.0 to 22.8. But let's say across all my data sets, it varies from 13.3 to 39.8. Then I could add this to all my plots:
scale_fill_gradient(limits=c(13.3, 39.8),
low=hcl(15,100,75), high=hcl(195,100,75))
This is just for illustration. For ease of maintenance if your data might change, you'll want to set the actual limits programmatically.
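One possible way to set the limits programmatically is to compute the group medians for every data set and take their overall range. This is only a sketch: the data frame other is invented here so that there is a second data set to work with, and group_medians and shared_fill are hypothetical helper names.

```r
library(ggplot2)

# Invented second data set, used only to widen the range of medians
other <- data.frame(carb = rep(c(1, 2), each = 3),
                    mpg  = c(12, 13.3, 14, 38, 39.8, 41))

# Median of mpg within each carb group of one data frame
group_medians <- function(df) tapply(df$mpg, df$carb, median)

# Overall range of the group medians across all data sets
fill_limits <- range(unlist(lapply(list(mtcars, other), group_medians)))

# Build the same fill scale for every plot
shared_fill <- function() {
  scale_fill_gradient(limits = fill_limits,
                      low = hcl(15, 100, 75), high = hcl(195, 100, 75))
}

# Add the per-group median as a column without modifying mtcars itself
cars <- transform(mtcars, medMPG = ave(mpg, carb, FUN = median))

p <- ggplot(cars, aes(x = reorder(carb, mpg, FUN = median), y = mpg)) +
  geom_boxplot(aes(fill = medMPG)) +
  shared_fill()
```

Adding shared_fill() to each plot keeps the colour-to-median mapping identical across plots, and the limits update automatically if the data change.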
I built on eipi10's solution and obtained the following code which does what I want:
# "results" is a 55-column data.frame containing
# bootstrapped estimates of the Gini impurity for each column variable
# (But can synthesize fake data for testing with a bunch of rnorms)
DT.m <- melt(results, id.vars = NULL) # using reshape2
g <- ggplot(DT.m %>% group_by(variable) %>%
mutate(median.gini = median(value)),
aes(x = reorder(variable, value, FUN=median), y = value)) +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
geom_boxplot(aes(fill=median.gini)) +
stat_summary(fun.y=mean, colour="darkred", geom="point") + # newer ggplot2 (3.3.0+) uses fun=mean
scale_fill_gradientn(colours = heat.colors(9)) +
ylab("Gini impurity") +
xlab("Feature") +
guides(fill=guide_colourbar(title="Median\nGini\nimpurity"))
plot(g)
Later, for the second plot:
medians <- lapply(results, median)
color <- colorRampPalette(colors =
heat.colors(9))(1000)[cut(unlist(medians),1000,labels = F)]
color is then a character vector containing the colours of the nodes in my subsequent network graph, and these colours match those in the boxplot. Job done!
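An alternative that avoids recomputing the palette is to read the fill colours back from the built boxplot, so the second plot uses exactly the colours ggplot2 chose. The sketch below uses mtcars rather than the results data frame, and node_colours is a hypothetical name.

```r
library(ggplot2)

# Per-group median added as a column, as in the answer above
cars <- transform(mtcars, medMPG = ave(mpg, carb, FUN = median))

p <- ggplot(cars, aes(x = reorder(carb, mpg, FUN = median), y = mpg)) +
  geom_boxplot(aes(fill = medMPG)) +
  scale_fill_gradientn(colours = heat.colors(9))

# layer_data() returns one row per box, including its computed fill colour
built <- layer_data(p, 1)

# The x positions 1, 2, ... follow the level order of the reordered factor
box_levels <- levels(reorder(cars$carb, cars$mpg, FUN = median))
node_colours <- setNames(built$fill[order(as.numeric(built$x))], box_levels)
```

node_colours is a named character vector of colours, one per box, which can be passed to whatever draws the network graph.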