I'm using a data frame in R with 3 variables. I want to plot (with ggplot) 2 variables (CMod4X and CMod5X) as two distinct lines, as a function of the third variable (AmtX). In the end I succeeded in creating a graph that suits me, but I fail to include a legend. I have already consulted some other threads here, but the answers don't seem to work for me.
The (artificial) data set looks like this
AmtX <- seq(from = 1, to = 10001, by = 50)
CMod4X <- rnorm(201, mean = 0.87, sd = 0.01)
CMod5X <- rnorm(201, mean = 0.84, sd = 0.01)
EvalAmtX <- as.data.frame(cbind(AmtX,CMod4X,CMod5X))
I have made the plot like this
pltX <- ggplot(data = EvalAmtX, aes (x = AmtX)) +
geom_line(aes(y = CMod4X), color = "red", show.legend = TRUE) +
geom_line(aes(y = CMod5X), color = "blue", show.legend = TRUE) +
geom_smooth(aes(y = CMod4X), color = "red", se = FALSE, show.legend = TRUE) +
geom_smooth(aes(y = CMod5X), color = "blue", se = FALSE, show.legend = TRUE) +
labs(y = "C-index", x = "Amount (Tau)", title = "model 4 and model 5") +
scale_colour_manual(name = "Models", values = c("CMod4" = "red", "CMod5" = "blue"))
pltX
But this plot won't include a legend. I've included my plot below:
What am I doing wrong and what must I do to obtain a plot telling me the red line is CMod4 and the blue line is CMod5?
Thx for your help!!
Leonard
I guess you need to dive a little deeper into how ggplot2 works, since your question is related to the basic set up of your data frame. There are a lot of great resources around on this topic, e.g. this one. Anyway, here are two solutions for putting the legend into your graph.
Solution 1: Rearrange data frame to long format
library(reshape2)
df <- melt(data = EvalAmtX, id.vars = "AmtX")
The data frame now looks like this:
head(df)
# AmtX variable value
# 1 1 CMod4X 0.8772716
# 2 51 CMod4X 0.8524197
# 3 101 CMod4X 0.8686019
# 4 151 CMod4X 0.8638835
# 5 201 CMod4X 0.8674627
# 6 251 CMod4X 0.8729925
Now, plotting is easy. Instead of telling ggplot2 the color of each individual line, you simply give it the information which column in your data frame contains the factor that should determine the color of the lines. So you add another aesthetic (col = variable). This also automatically adds a legend for color.
ggplot(df, aes(x=AmtX, y=value, col = variable)) +
geom_line()
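The same long-format reshaping can also be sketched with tidyr instead of reshape2. This is only an alternative illustration, assuming the tidyr package is installed; the name df_long and the seed are made up for the example, and the manual scale simply reuses the red/blue colours from the question.

```r
library(tidyr)
library(ggplot2)

# Rebuild the artificial data from the question (the seed is arbitrary)
set.seed(1)
EvalAmtX <- data.frame(
  AmtX   = seq(from = 1, to = 10001, by = 50),
  CMod4X = rnorm(201, mean = 0.87, sd = 0.01),
  CMod5X = rnorm(201, mean = 0.84, sd = 0.01)
)

# One row per (AmtX, model) pair; "variable" holds the old column names
df_long <- pivot_longer(EvalAmtX,
                        cols = c(CMod4X, CMod5X),
                        names_to = "variable",
                        values_to = "value")

# Mapping colour to "variable" creates the legend; the manual scale picks the colours
p <- ggplot(df_long, aes(x = AmtX, y = value, colour = variable)) +
  geom_line() +
  scale_colour_manual(name = "Models",
                      values = c(CMod4X = "red", CMod5X = "blue"))
p
```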
Solution 2: Use a manual color scale
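You almost got it right in your code. A legend is only drawn for aesthetics that are mapped inside aes(); a color set outside aes() is a fixed value and never reaches the scale. So move color into aes(), give each line a label there, and let scale_colour_manual() map each label to a color.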
pltX <- ggplot(data = EvalAmtX, aes (x = AmtX)) +
geom_line(aes(y = CMod4X, color = "CMod4")) +
geom_line(aes(y = CMod5X, color = "CMod5")) +
geom_smooth(aes(y = CMod4X, color = "CMod4"), se = FALSE) +
geom_smooth(aes(y = CMod5X, color = "CMod5"), se = FALSE) +
labs(y = "C-index", x = "Amount (Tau)", title = "model 4 and model 5") +
scale_colour_manual(name = "Models", values = c(CMod4 = "red", CMod5 = "blue"))
pltX
Reproduced from this code:
library(haven)
library(survey)
library(dplyr)
nhanesDemo <- read_xpt(url("https://wwwn.cdc.gov/Nchs/Nhanes/2015-2016/DEMO_I.XPT"))
# Rename variables into something more readable
nhanesDemo$fpl <- nhanesDemo$INDFMPIR
nhanesDemo$age <- nhanesDemo$RIDAGEYR
nhanesDemo$gender <- nhanesDemo$RIAGENDR
nhanesDemo$persWeight <- nhanesDemo$WTINT2YR
nhanesDemo$psu <- nhanesDemo$SDMVPSU
nhanesDemo$strata <- nhanesDemo$SDMVSTRA
nhanesAnalysis <- nhanesDemo %>%
mutate(LowIncome = case_when(
INDFMIN2 < 40 ~ T,
T ~ F
)) %>%
# Select the necessary columns
select(INDFMIN2, LowIncome, persWeight, psu, strata)
# Set up the design
nhanesDesign <- svydesign(id = ~psu,
strata = ~strata,
weights = ~persWeight,
nest = TRUE,
data = nhanesAnalysis)
svyhist(~log10(INDFMIN2), design=nhanesDesign, main = '')
How do I color the histogram by independent variable, say, LowIncome? I want to have two separate histograms, one for each value of LowIncome. Unfortunately I picked a bad example, but I want them to be see-through in case their values overlap.
If you want to plot a histogram from your model, you can get its data from model.frame (this is what svyhist does under the hood). To get the histogram filled by group, you could use this data frame inside ggplot:
library(ggplot2)
ggplot(model.frame(nhanesDesign), aes(log10(INDFMIN2), fill = LowIncome)) +
geom_histogram(alpha = 0.5, color = "gray60", breaks = 0:20 / 10) +
theme_classic()
Edit
As Thomas Lumley points out, this does not incorporate sampling weights, so if you wanted this you could do:
ggplot(model.frame(nhanesDesign), aes(log10(INDFMIN2), fill = LowIncome)) +
geom_histogram(aes(weight = persWeight), alpha = 0.5,
color = "gray60", breaks = 0:20 / 10) +
theme_classic()
To demonstrate this approach works, we can replicate Thomas's approach in ggplot using the data example from svyhist. To get the uneven bin sizes (if this is desired), we need two histogram layers, though I'm guessing this would not be required for most use-cases.
ggplot(model.frame(dstrat), aes(enroll)) +
geom_histogram(aes(fill = "E", weight = pw, y = after_stat(density)),
data = subset(model.frame(dstrat), stype == "E"),
breaks = 0:35 * 100,
position = "identity", col = "gray50") +
geom_histogram(aes(fill = "Not E", weight = pw, y = after_stat(density)),
data = subset(model.frame(dstrat), stype != "E"),
position = "identity", col = "gray50",
breaks = 0:7 * 500) +
scale_fill_manual(NULL, values = c("#00880020", "#88000020")) +
theme_classic()
You can't just extract the data and use ggplot, because that won't use the weights and so misses the whole point of svyhist. You can use the add=TRUE argument, though. You do need to set the x and y axis ranges correctly to make sure the whole plot is visible.
Using the data example from ?svyhist:
svyhist(~enroll, subset(dstrat,stype=="E"), col="#00880020",ylim=c(0,0.003),xlim=c(0,3500))
svyhist(~enroll, subset(dstrat,stype!="E"), col="#88000020",add=TRUE)
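The snippets that use dstrat do not show how that object is created. A minimal setup, assuming the stratified design built from the api data set that ships with the survey package (as in the package's own examples), might look like this:

```r
library(survey)

# The api data ships with the survey package; apistrat is a stratified sample
data(api)

# Stratified design: strata by school type, sampling weights in pw,
# finite population correction in fpc
dstrat <- svydesign(id = ~1, strata = ~stype, weights = ~pw,
                    data = apistrat, fpc = ~fpc)

# The weights column is then available for plotting via model.frame()
head(model.frame(dstrat)[, c("stype", "enroll", "pw")])
```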
I have this image that is the result of this ggplot command:
causative_snp = filter(LOD_table, str_detect(`Causative SNP Chromosome: Calculation Method`, "Chr. 23")) %>%
pull(`Causative SNP`) %>%
extract(1)
ggplot(data = LOD_table, mapping = aes(x = `Locus (Mb)`, y = `LOD Score`)) +
facet_grid(`Test Type` ~ `Causative SNP Chromosome: Calculation Method`) +
geom_vline(xintercept = causative_snp, color = "blue", size = .5, alpha = .5, linetype = "dashed") +
geom_point(alpha = .01, size = .1, na.rm = TRUE) +
labs(title = "LOD Score Tests",
subtitle = paste("Causative SNP (on Chromosome 23): ", causative_snp, " Mb", sep = ""),
caption = "The causative SNP line is plotted on non-causative SNP chromosomes to show the lack of correlation.")
So basically, what I want to happen is for the 2 leftmost columns (the ones that start with Chr. 19) to not have a vertical line, while the 2 rightmost columns (the ones that start with Chr. 23) do have a vertical line.
The Causative SNP Chromosome: Calculation Method column of my LOD_table dataset is of type factor. There are four values: Chr. 19: Haplotype-Based, Chr. 19: SNP-Based, Chr. 23: Haplotype-Based, and Chr. 23: SNP-Based, and these values become the column names in my plot as a result of facet_grid(). What I was wondering is whether I could have some sort of if statement that checks whether the column value starts with "Chr. 23": if so, it plots the vertical line; otherwise, it plots no vertical line. I guess my problem, then, is that I am unsure how to do this inside a facet_grid().
I am currently dealing with my problem with the caption, but ideally, I would like for the caption to not have to be there. Any help appreciated, thanks.
It's easiest if you provide a data frame that specifies the facet variables where you want the line to be plotted, along with the xintercept. In the future, please provide the data necessary to reproduce your plot; below I use an example dataset:
set.seed(111)
LOD_table = data.frame('Locus (Mb)' = runif(360,1,40),
`LOD Score` = rnbinom(360,mu=2,size=0.1),
`Test Type` = rep(1:3,each = 120),
`Causative SNP Chromosome: Calculation Method` = rep(c("chr19 A","chr19 B","chr23 A","chr23 B")),check.names=FALSE)
da = expand.grid(`Test Type` = 1:3,
`Causative SNP Chromosome: Calculation Method` = c("chr23 A","chr23 B"),
causative_snp = 20)
ggplot(data = LOD_table, mapping = aes(x = `Locus (Mb)`, y = `LOD Score`)) +
geom_point() +
facet_grid(`Test Type` ~ `Causative SNP Chromosome: Calculation Method`) +
geom_vline(data = da,aes(xintercept = causative_snp), color = "blue", size = .5, alpha = .5, linetype = "dashed")
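The "starts with Chr. 23" check from the question can be applied when building that helper data frame. The sketch below is only an illustration: the toy LOD_table, the names facet_col, chr23_levels, vline_data and snp_position, and the position of 20 Mb are all made up. It also leaves `Test Type` out of the helper data frame, since a layer whose data lacks a faceting variable is drawn in every facet along that dimension.

```r
library(ggplot2)

# Toy stand-in for LOD_table (hypothetical values; only the column names follow the question)
set.seed(111)
facet_col <- "Causative SNP Chromosome: Calculation Method"
LOD_table <- data.frame(
  `Locus (Mb)` = runif(360, 1, 40),
  `LOD Score` = rnbinom(360, mu = 2, size = 0.1),
  `Test Type` = rep(1:3, each = 120),
  `Causative SNP Chromosome: Calculation Method` = rep(
    c("Chr. 19: Haplotype-Based", "Chr. 19: SNP-Based",
      "Chr. 23: Haplotype-Based", "Chr. 23: SNP-Based"),
    times = 90),
  check.names = FALSE
)
snp_position <- 20  # hypothetical SNP position in Mb

# Keep only the facet levels that start with "Chr. 23"
chr23_levels <- grep("^Chr\\. 23", unique(LOD_table[[facet_col]]), value = TRUE)

# Helper data frame: one row per facet column that should get a line
vline_data <- data.frame(chr23_levels, snp_position)
names(vline_data) <- c(facet_col, "snp_position")

p <- ggplot(LOD_table, aes(x = `Locus (Mb)`, y = `LOD Score`)) +
  geom_point() +
  facet_grid(`Test Type` ~ `Causative SNP Chromosome: Calculation Method`) +
  geom_vline(data = vline_data, aes(xintercept = snp_position),
             color = "blue", alpha = 0.5, linetype = "dashed")
p
```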
I am generating density plots for observations. The observations belong to a species and some are also connected to an individual ID.
With the data below, I want to generate a line for each level of IndID for species One and Two, and only a single line for Species Three, which does not include IndID. There are related questions on SO, but not with reproducible data and looking for different results.
library(ggplot2)
set.seed(1)
dat <- data.frame(Species = c(rep(c("One", "Two"), each = 2, length = 30), rep("Three",50)),
IndID = c(rep(letters[1:5],each = 6),rep(NA,50) ),
Value = sample(1:20, replace = T))
Keeping the color aesthetic on the Species level, I want to create multiple lines for Species One and Two (green and red) and a single blue line for Species Three.
ggplot(dat, aes(Value)) + geom_density(aes(color = Species), size = 1.25) +
scale_colour_manual(values = c("darkgreen","blue", "red"))
If you want to be able to tell them apart, you can set the linetype to IndID. Note, however, that you will need to change the NA to some other value to (easily) get it to plot.
I also expanded your data a little bit to give enough values per individual to show meaningful lines. I also used geom_line(stat = "density") instead of geom_density() because it omits the line along the bottom and gives legends with lines instead of boxes.
set.seed(1)
dat <- data.frame(Species = c(rep(c("One", "Two"), each = 2, length = 60), rep("Three",50)),
IndID = c(rep(letters[1:5],each = 12),rep("NA",50) ),
Value = sample(1:20, 110, replace = T))
ggplot(dat
, aes(x = Value
, color = Species
, linetype = IndID)) +
geom_line(stat = "density"
, size = 1.25) +
scale_colour_manual(values = c("darkgreen","blue", "red"))
If you want the lines to all be solid, you can run:
ggplot(dat
, aes(x = Value
, color = Species
, linetype = IndID)) +
geom_line(stat = "density"
, size = 1.25) +
scale_colour_manual(values = c("darkgreen","blue", "red")) +
scale_linetype_manual(values = rep("solid", 6)) +
guides(linetype = "none")
(or use group as @Henrik suggested in zir comment)
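The comment itself is not reproduced here, so the following is only a guess at what a group-based version could look like. It assumes the grouping should be the combination of Species and IndID, which is the same grouping ggplot2 derives by default when both color and linetype are mapped. All lines stay solid and no linetype legend is created.

```r
library(ggplot2)

# Same expanded data as in the answer above
set.seed(1)
dat <- data.frame(Species = c(rep(c("One", "Two"), each = 2, length = 60), rep("Three", 50)),
                  IndID = c(rep(letters[1:5], each = 12), rep("NA", 50)),
                  Value = sample(1:20, 110, replace = TRUE))

# One density line per Species/IndID combination, colored by Species only
p <- ggplot(dat, aes(x = Value, color = Species,
                     group = interaction(Species, IndID))) +
  geom_line(stat = "density") +
  scale_colour_manual(values = c("darkgreen", "blue", "red"))
p
```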
I'm trying to plot 2 sets of data points and a single line in R using ggplot.
The issue I'm having is with the legend.
As can be seen in the attached image, the legend applies the lines to all 3 data sets even though only one of them is plotted with a line.
I have melted the data into one long frame, but this still requires me to filter the data sets for each individual call to geom_line() and geom_path().
I want to graph the melted data, plotting a line based on one data set, and points on the remaining two, with a complete legend.
Here is the sample script I wrote to produce the plot:
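library(ggplot2)
library(reshape2) # melt()
library(dplyr) # filter()
library(grid) # pushViewport(), viewport()
xseq <- 1:100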
x <- rnorm(n = 100, mean = 0.5, sd = 2)
x2 <- rnorm(n = 100, mean = 1, sd = 0.5)
x.lm <- lm(formula = x ~ xseq)
x.fit <- predict(x.lm, newdata = data.frame(xseq = 1:100), type = "response", se.fit = TRUE)
my_data <- data.frame(x = xseq, ypoints = x, ylines = x.fit$fit, ypoints2 = x2)
## Now try and plot it
melted_data <- melt(data = my_data, id.vars = "x")
p <- ggplot(data = melted_data, aes(x = x, y = value, color = variable, shape = variable, linetype = variable)) +
geom_point(data = filter(melted_data, variable == "ypoints")) +
geom_point(data = filter(melted_data, variable == "ypoints2")) +
geom_path(data = filter(melted_data, variable == "ylines"))
pushViewport(viewport(layout = grid.layout(1, 1))) # One on top of the other
print(p, vp = viewport(layout.pos.row = 1, layout.pos.col = 1))
You can set them manually like this:
We set linetype = "solid" for the first item and "blank" for others (no line).
Similarly for first item we set no shape (NA) and for others we will set whatever shape we need (I just put 7 and 8 there for an example). See e.g. http://www.r-bloggers.com/how-to-remember-point-shape-codes-in-r/ to help you to choose correct shapes for your needs.
If you are happy with dots then you can use my_shapes = c(NA,16,16) and scale_shape_manual(...) is not needed.
my_shapes = c(NA,7,8)
ggplot(data = melted_data, aes(x = x, y = value, color=variable, shape=variable )) +
geom_path(data = filter(melted_data, variable == "ylines") ) +
geom_point(data = filter(melted_data, variable %in% c("ypoints", "ypoints2"))) +
scale_colour_manual(values = c("red", "green", "blue"),
guide = guide_legend(override.aes = list(
linetype = c("solid", "blank","blank"),
shape = my_shapes))) +
scale_shape_manual(values = my_shapes)
But I am very curious whether there is a more automated way. Hopefully someone can post a better answer.
This post relied quite heavily on this answer: ggplot2: Different legend symbols for points and lines
Here is some example code using the ggplot2 library:
theme_set(theme_bw())
dat = data.frame(value = rnorm(100,sd=2.5))
dat = within(dat, {
value_scaled = scale(value, scale = sd(value))
obs_idx = 1:length(value)
})
ggplot(aes(x = obs_idx, y = value_scaled), data = dat) +
geom_ribbon(ymin = -1, ymax = 1, alpha = 0.1) +
geom_line() + geom_point()
My question: based on this example, how can I draw the first 10 observations of the line in red and the rest in blue in ggplot2? I tried to use some kind of layer syntax, but it doesn't work.
First, add another column to your data frame dat. It has value 0 for the first 10 rows and 1 for the rest.
dat$group <- factor(rep.int(c(0, 1), c(10, nrow(dat)-10)))
Generate the plot:
library(ggplot2)
ggplot(aes(x = obs_idx, y = value_scaled), data = dat) +
geom_ribbon(ymin = -1, ymax = 1, alpha = 0.1) +
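geom_line(aes(colour = group), show.legend = FALSE) +
scale_colour_manual(values = c("red", "blue")) +
geom_point()
The parameter show.legend = FALSE suppresses the legend for the red and blue lines. (Older ggplot2 versions called this parameter show_guide; it has since been deprecated in favor of show.legend.)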
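Mapping colour to a factor splits the line into two groups, so the segment between observations 10 and 11 is not drawn. If a continuous line is preferred, one possible variation (not part of the original answer) is to force a single group with group = 1; the path then changes colour along its length. The seed and the simplified data construction below are made up for the sketch.

```r
library(ggplot2)

# Example data in the spirit of the question (the seed is arbitrary)
set.seed(42)
value <- rnorm(100, sd = 2.5)
dat <- data.frame(obs_idx = seq_along(value),
                  value_scaled = as.numeric(scale(value)))
dat$group <- factor(rep.int(c(0, 1), c(10, nrow(dat) - 10)))

# group = 1 keeps all observations on one path; colour still follows the factor
p <- ggplot(dat, aes(x = obs_idx, y = value_scaled)) +
  geom_ribbon(ymin = -1, ymax = 1, alpha = 0.1) +
  geom_line(aes(colour = group, group = 1), show.legend = FALSE) +
  scale_colour_manual(values = c("red", "blue")) +
  geom_point()
p
```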
OK, I could manage layers, the code is (not elegant, but works):
require(ggplot2)
value=round(rnorm(50,200,50),0)
nmbrs<-length(value) ## length of vector
obrv<-1:length(value) ## list of observations
#create data frame from the values
data_lj<-data.frame(obrv,value)
data_lj20<-data.frame(data_lj[1:20,1:2])
data_lj21v<-data.frame(data_lj[20:nmbrs,1:2])
#plot with ggplot
rr<-ggplot()+
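# recent ggplot2 versions need stat and position spelled out in layer(),
# and fixed aesthetics such as colour go in params
layer(mapping=aes(obrv,value),geom="line",stat="identity",position="identity",data=data_lj20,params=list(colour="red"))+
layer(mapping=aes(obrv,value),geom="line",stat="identity",position="identity",data=data_lj21v,params=list(colour="blue"))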
print(rr)