I have four series that I would like to plot.
There are 2 models : xg and algo30.
There are two types of data: predicted and observed.
This means we have the following 4 series: "predicted xg","observed xg", "predicted 30", "observed 30".
I want "xg" to be blue, "algo30" to be red.
I also want predicted to be a solid line and observed to be points.
Here is what I mean, using base plot:
library(magrittr)
library(ggplot2)
library(dplyr)
set.seed(123)
gr <- 1:10
obs.xg <- sort(runif(10, 0.5, 1))
obs.30 <- sort(runif(10, 0.5, 1))
pred.xg <- lm(obs.xg~gr) %>% predict() %>% add(rnorm(10,0,.01))
pred.30 <- lm(obs.30~gr) %>% predict() %>% add(rnorm(10,0,.01))
plot(gr, obs.xg, col="darkblue", ylim=range(c(obs.xg,obs.30)), pch=20)
lines(gr, pred.xg, col="darkblue", lwd=2)
points(gr, obs.30, col="firebrick", pch=20)
lines(gr, pred.30, col="firebrick", lwd=2)
legend("bottomright",
pch=c(20,NA,NA,NA,NA),
lty=c(NA,1,NA,1,1),
lwd=c(NA,1,NA,2,2),
col = c("black","black",NA, "darkblue","firebrick"),
legend=c("observé","prédit",NA,"xgboost","algo30"),
bty='n')
Here is my best attempt using ggplot. Notice that the legend doesn't work as I want.
xg.data <- data.frame(model= "xg", decile = seq(1:10), observed = obs.xg, predicted = pred.xg)
algo30.data <- data.frame(model = "algo30",decile = seq(1:10), observed = obs.30, predicted = pred.30)
ggplotdata <- bind_rows(xg.data, algo30.data)
ggplotdata %>%
ggplot( aes(x=decile, y= predicted, color= model))+ geom_line()+
geom_point(aes(x=decile, y= observed, color = model))
Most of the time when making a legend like this I look to override.aes in guide_legend().
The idea here is to make a legend using an additional aesthetic that you don't want mapped onto the plot itself and then using constants instead of a variable for that aesthetic. I used alpha, since both points and lines use that aesthetic.
Then the heavy lifting is done in scale_alpha_manual: removing the legend name, making sure the plot still looks right by setting the values, and then, finally, picking the correct point type and lines along with blanks for the legend.
ggplot(ggplotdata, aes(x=decile, y= predicted, color= model))+
geom_line( aes(alpha = "prédit") )+
geom_point(aes(x=decile, y= observed, alpha = "observé")) +
scale_alpha_manual(name = NULL, values = c(1, 1),
guide = guide_legend(override.aes = list(linetype = c(0, 1), shape = c(16, NA)))) +
scale_color_manual(name = NULL, values = c("firebrick", "darkblue"))
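The vectors passed to override.aes are matched to the legend keys by position, and character values on a discrete scale appear in sorted order. A minimal base R sketch (the `legend_keys` data frame is purely illustrative, not part of the answer) shows why `linetype = c(0, 1)` and `shape = c(16, NA)` line up with "observé" and "prédit" in that order:

```r
# Labels used as constant alpha "values" in the two layers
labels <- c("prédit", "observé")

# Character values on a discrete scale are shown in sorted order,
# so "observé" comes before "prédit" in the legend
keys <- sort(labels)

# Pair each key with the override values used in the answer
legend_keys <- data.frame(
  key      = keys,
  linetype = c(0, 1),    # 0 = blank (points only), 1 = solid line
  shape    = c(16, NA),  # 16 = filled circle, NA = no point
  stringsAsFactors = FALSE
)
print(legend_keys)
```

If the labels are changed so that their sorted order flips, the override vectors would need to be flipped as well.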
I am trying to make an x-y scatter-plot. I don't mind if it's in plot or ggplot2. I don't know much about each, but I would like an example in both if you don't mind. I would like a label on the points.
Below is code and dput:
tickers <- rownames(x2)
library(zoo)
plot(x2,
main= "Vol vs Div",
xlab= "Vol (in %)",
ylab= "Div",
col= "blue", pch = 19, cex = 1, lty = "solid", lwd = 2)
text(x=x2$Volatility101,y=x2$`12m yield`, labels=tickers,cex= 0.7, pos= 3)
x2:
structure(list(Volatility101 = c(25.25353177644, 42.1628734949414,
28.527736824123), `12m yield` = c("3.08", "7.07", "4.72")), class = "data.frame", row.names = c("EUN",
"HRUB", "HUKX"))
Here is a tidyverse solution.
library(ggplot2)
library(tibble) # provides rownames_to_column()
library(dplyr)
library(ggrepel)
x2 %>%
rownames_to_column(var = "tickers") %>%
ggplot(aes(x = Volatility101, y = `12m yield`)) +
geom_point(color = "blue") +
geom_text_repel(aes(label = tickers)) +
ggtitle("Vol vs Div") +
xlab("Vol (in %)") +
ylab("Div") +
theme_classic()
I was surprised that the plot function worked at all: the y-values are character strings. Converting them to numeric in the text call places the labels in the expected locations:
text(x=x2$Volatility101,y=as.numeric(x2$`12m yield`)+.1, labels=tickers,
cex= 0.7, col='black')
A couple of notes about the question presentation: It's unclear (and misleading) why ggplot2 is a tag. The plot function is generic and in this case it uses base-graphics rather than either ggplot2 specifically or grid graphics more generally. I also think that the library(zoo) call is probably unnecessary. There is a plot.zoo function, but it would not be called in this case.
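Since `12m yield` is stored as character in the dput output, one option is to convert it once, up front, so that every later plotting call sees a numeric column. A minimal sketch, with the data frame rebuilt from the question's dput and nothing else assumed:

```r
# Rebuild x2 from the dput in the question
x2 <- data.frame(
  Volatility101 = c(25.25353177644, 42.1628734949414, 28.527736824123),
  `12m yield`   = c("3.08", "7.07", "4.72"),
  row.names     = c("EUN", "HRUB", "HUKX"),
  check.names   = FALSE,       # keep the non-syntactic column name as-is
  stringsAsFactors = FALSE     # keep the yields as character, not factor
)

# Convert the yield column from character to numeric
x2$`12m yield` <- as.numeric(x2$`12m yield`)
str(x2)
```

After this conversion the `as.numeric()` wrapper inside the text call is no longer needed.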
I am trying to customize a plot for competing risks using R and the package cmprsk. Specifically, I want to override the default, which uses colors for the competing events and linetypes for the groups.
Here is my reproducible example:
library(ggplot2)
library(cmprsk)
library(survminer)
# some simulated data to get started
comp.risk.data <- data.frame("tfs.days" = rweibull(n = 100, shape = 1, scale = 1)*100,
"status.tfs" = c(sample(c(0,1,1,1,1,2), size=50, replace=T)),
"Typing" = sample(c("A","B","C","D"), size=50, replace=T))
# fitting a competing risks model
CR <- cuminc(ftime = comp.risk.data$tfs.days,
fstatus = comp.risk.data$status.tfs,
cencode = 0,
group = comp.risk.data$Typing)
# the default plot makes it impossible to identify the groups
ggcompetingrisks(fit = CR, multiple_panels = F, xlab = "Days", ylab = "Cumulative incidence of event",title = "Competing Risks Analysis")+
scale_color_manual(name="", values=c("blue","red"), labels=c("Tumor", "Death without tumor"))
Using ggplot_build() I managed to change the default regarding linetype and color, but I cannot find a way to add a legend.
p2 <- ggcompetingrisks(fit = CR, multiple_panels = FALSE, xlab = "Days", ylab = "Cumulative incidence of event",title = "Death by TCR", ylim = c(0, 1)) +
scale_color_manual(name="", values=c("blue","red"), labels=c("Tumor", "Death without tumor"))
q <- ggplot_build(p2)
q$data[[1]]$colour2 <- ifelse(q$data[[1]]$linetype=="solid","blue", ifelse(q$data[[1]]$linetype==22,"red", ifelse(q$data[[1]]$linetype==42,"green", ifelse(q$data[[1]]$linetype==44,"black", NA))))
q$data[[1]]$linetype <- ifelse(q$data[[1]]$colour=="blue","solid", ifelse(q$data[[1]]$colour=="red","dashed", NA))
q$data[[1]]$colour <- q$data[[1]]$colour2
q$plot <- q$plot + ggtitle("Competing Risks Analysis") + guides(col = guide_legend()) + theme(legend.position = "right")
p2 <- ggplot_gtable(q)
plot(p2)
Does anyone know how to add the legend to a plot manipulated by ggplot_build()? Or is there an alternative way to plot the competing risks such that color indicates group and linetype indicates event?
You don't need to go down the ggplot_build route. The function ggcompetingrisks returns a ggplot object, which itself contains the aesthetic mappings. You can overwrite these with aes:
p <- ggcompetingrisks(fit = CR,
multiple_panels = F,
xlab = "Days",
ylab = "Cumulative incidence of event",
title = "Competing Risks Analysis")
p$mapping <- aes(x = time, y = est, colour = group, linetype = event)
Now that we have reversed the linetype and color aesthetic mappings, we just need to swap the legend labels and we're good to go:
p + labs(linetype = "event", colour = "group")
Note that you can also add color scales, themes, coordinate transforms to p like any other ggplot object.
I have a fairly simple and probably common task: plotting a raster dataset with contour lines and adding country borders, all in one plot. However, I did not find a solution anywhere. There are a few hints available (such as this one), but no raster dataset is used there and I can't get it to work.
The dataset I am using is actually in netcdf format and available here (15 MB in size) and contains about 40 years of gridded precipitation data.
Here is my code:
setwd("...netcdf Data/GPCP")
library("raster")
library("maps")
nc_brick79_17 <- brick("precip.mon.mean.nc") # load in the ncdf data as a raster brick
newextent <- c(85, 125, -20, 20) # specify region of interest
SEA_brick <- crop(nc_brick79_17, newextent) # crop the region
day1 <- SEA_brick[[1]] # select very first day as example
colfunc<-colorRampPalette(c("white","lightblue","yellow","red","purple")) # colorscale for plotting
So it works of course when I just plot the raster data together with a map overlaid:
plot(day1, col=(colfunc(100)), interpolate=F, main="day1",legend.args=list(text='mm/hr', side=4,font=1, line=2.5, cex=1.1))
map("world", add=TRUE, lwd=0.5, interior = FALSE, col = "black")
We get this plot (Raster Plot with country borders added)
Now the code I use to generate the contour plot is the following:
filledContour(day1,zlim=c(0,20),color=colorRampPalette(c("white","lightblue","yellow","red","purple")),
xlab = "Longitude (°)", ylab = "Latitude (°)")
map("world", add=TRUE, lwd=0.5, interior = FALSE, col = "black") # add map overlay
I end up with a plot where obviously the country borders do not align and are even covering the colorbar.
Contour plot with map overlay shifted
In this last part I am trying to add the country boundaries to the contour plot, but it does not work, even though it should I assume. The map is simply not there, no error though:
filledContour(day1, zlim=c(0,20),
color.palette = colorRampPalette(c("white","lightblue","yellow","red","purple")),
xlab = "Longitude (°)", ylab = "Latitude (°)",
xlim = c(90, 120), ylim = c(-20, 20), nlevels = 25,
plot.axes = {axis(1); axis(2);
map('world', xlim = c(90, 120), ylim = c(-20, 20), add = TRUE, lwd=0.5, col = "black")})
From that line of code I get this plot.
Contour plot but no country borders added
What could I improve or is there any mistake somewhere? Thank you!
I chose to use ggplot here and leave two maps for you. The first replicates, in ggplot, the plot you already created. The second is the one you could not produce. There is a lot to explain and I do not have time to cover it all, so I left comments in the code below. Please check this question to learn more about the second graphic. Finally, I'd like to give credit to hrbrmstr, who wrote a great answer in the linked question.
library(maptools)
library(akima)
library(raster)
library(ggplot2)
# This is a data set from the maptools package
data(wrld_simpl)
# Create a data.frame object for ggplot. ggplot requires a data frame.
mymap <- fortify(wrld_simpl)
# This part is your code.
nc_brick79_17 <- brick("precip.mon.mean.nc")
newextent <- c(85, 125, -20, 20)
SEA_brick <- crop(nc_brick79_17, newextent)
day1 <- SEA_brick[[1]]
# Create a data frame with a raster object. This is a spatial class
# data frame, not a regular data frame. Then, convert it to a data frame.
spdf <- as(day1, "SpatialPixelsDataFrame")
mydf <- as.data.frame(spdf)
colnames(mydf) <- c("value", "x", "y")
# This part creates the first graphic that you drew. You draw a map.
# Then, you add tiles on it. Then, you add colors as you wish.
# Since we have a world map data set, we trim it at the end.
ggplot() +
geom_map(data = mymap, map = mymap, aes(x = long, y = lat, map_id = id), fill = "white", color = "black") +
geom_tile(data = mydf, aes(x = x, y = y, fill = value), alpha = 0.4) +
scale_fill_gradientn(colors = c("white", "lightblue", "yellow", "red", "purple")) +
scale_x_continuous(limits = c(85, 125), expand = c(0, 0)) +
scale_y_continuous(limits = c( -20, 20), expand = c(0, 0)) +
coord_equal()
ggplot version of filled.contour()
# As I mentioned above, you want to study the linked question for this part.
mydf2 <- with(mydf, interp(x = x,
y = y,
z = value,
xo = seq(min(x), max(x), length = 400),
duplicate = "mean"))
gdat <- interp2xyz(mydf2, data.frame = TRUE)
# You need to draw countries as lines here. You gotta do that after you draw
# the contours. Otherwise, you will not see the map.
ggplot(data = gdat, aes(x = x, y = y, z = z)) +
geom_tile(aes(fill = z)) +
stat_contour(aes(fill = ..level..), geom = "polygon", binwidth = 0.007) +
geom_contour(color = "white") +
geom_path(data = mymap, aes(x = long, y = lat, group = group), inherit.aes = FALSE) +
scale_x_continuous(limits = c(85, 125), expand = c(0, 0)) +
scale_y_continuous(limits = c(-20, 20), expand = c(0, 0)) +
scale_fill_gradientn(colors = c("white", "lightblue", "yellow", "red", "purple")) +
coord_equal() +
theme_bw()
I'm trying to create a figure similar to the one below (taken from Ro, Russell, & Lavie, 2001). In their graph, they are plotting bars for the errors (i.e., accuracy) within the reaction time bars. Basically, what I am looking for is a way to plot bars within bars.
I know there are several challenges with creating a graph like this. First, Hadley points out that it is not possible to create a graph with two scales in ggplot2 because those graphs are fundamentally flawed (see Plot with 2 y axes, one y axis on the left, and another y axis on the right)
Nonetheless, the graph with superimposed bars seems to solve this dual scaling problem, and I'm trying to figure out a way to create it in R. Any help would be appreciated.
It's fairly easy in base R, by using par(new = T) to add to an existing graph:
set.seed(54321) # for reproducibility
data.1 <- sample(1000:2000, 10)
data.2 <- sample(seq(0, 5, 0.1), 10)
# Use xpd = F to avoid plotting the bars below the axis
barplot(data.1, las = 1, col = "black", ylim = c(500, 3000), xpd = F)
par(new = T)
# Plot the new data with a different ylim, but don't plot the axis
barplot(data.2, las = 1, col = "white", ylim = c(0, 30), yaxt = "n")
# Add the axis on the right
axis(4, las = 1)
It is pretty easy to make the bars in ggplot. Here is some example code. No two y-axes though (although look here for a way to do that too).
library(ggplot2)
data.1 <- sample(1000:2000, 10)
data.2 <- sample(500:1000, 10)
ggplot(mapping = aes(x, y)) +
geom_bar(data = data.frame(x = 1:10, y = data.1), width = 0.8, stat = 'identity') +
geom_bar(data = data.frame(x = 1:10, y = data.2), width = 0.4, stat = 'identity', fill = 'white') +
theme_classic() + scale_y_continuous(expand = c(0, 0))
In R, with ecdf I can plot an empirical cumulative distribution function
plot(ecdf(mydata))
and with hist I can plot a histogram of my data
hist(mydata)
How can I plot the histogram and the ecdf in the same plot?
EDIT
I am trying to make something like this:
https://mathematica.stackexchange.com/questions/18723/how-do-i-overlay-a-histogram-with-a-plot-of-cdf
Also a bit late, here's another solution that extends @Christoph's solution with a second y-axis.
par(mar = c(5,5,2,5))
set.seed(15)
dt <- rnorm(500, 50, 10)
h <- hist(
dt,
breaks = seq(0, 100, 1),
xlim = c(0,100))
par(new = T)
ec <- ecdf(dt)
plot(x = h$mids, y=ec(h$mids)*max(h$counts), col = rgb(0,0,0,alpha=0), axes=F, xlab=NA, ylab=NA)
lines(x = h$mids, y=ec(h$mids)*max(h$counts), col ='red')
axis(4, at=seq(from = 0, to = max(h$counts), length.out = 11), labels=seq(0, 1, 0.1), col = 'red', col.axis = 'red')
mtext(side = 4, line = 3, 'Cumulative Density', col = 'red')
The trick is the following: You don't add a line to your plot, but plot another plot on top, that's why we need par(new = T). Then you have to add the y-axis later on (otherwise it will be plotted over the y-axis on the left).
Credits go here (@tim_yates's answer) and there.
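The rescaling behind the second axis can be checked without drawing anything. A minimal base R sketch, reusing the same simulated data; the names `ecdf_scaled`, `tick_pos` and `tick_labels` are introduced here only for illustration:

```r
set.seed(15)
dt <- rnorm(500, 50, 10)

# Bin the data without drawing the histogram
h  <- hist(dt, breaks = seq(0, 100, 1), plot = FALSE)
ec <- ecdf(dt)

# Left-axis units: the ECDF (0..1) stretched to the height of the tallest bar
ecdf_scaled <- ec(h$mids) * max(h$counts)

# Right-axis ticks: 11 evenly spaced positions in count units,
# labelled with the matching cumulative proportions
tick_pos    <- seq(0, max(h$counts), length.out = 11)
tick_labels <- tick_pos / max(h$counts)

print(data.frame(tick_pos, tick_labels))
```

The red curve is drawn in the histogram's count units, and the right-hand axis simply relabels those positions as proportions between 0 and 1.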
There are two ways to go about this. One is to ignore the different scales and use relative frequency in your histogram. This results in a harder to read histogram. The second way is to alter the scale of one or the other element.
I suspect this question will soon become interesting to you, particularly @hadley's answer.
ggplot2 single scale
Here is a solution in ggplot2. I am not sure you will be satisfied with the outcome though because the CDF and histograms (count or relative) are on quite different visual scales. Note this solution has the data in a dataframe called mydata with the desired variable in x.
library(ggplot2)
set.seed(27272)
mydata <- data.frame(x= rexp(333, rate=4) + rnorm(333))
ggplot(mydata, aes(x)) +
stat_ecdf(color="red") +
geom_histogram(aes(y = after_stat(count / sum(count))), bins = 30) # bin the continuous x; geom_bar would count each unique value
base R multi scale
Here I will rescale the empirical CDF so that instead of a max value of 1, its maximum value is whatever bin has the highest relative frequency.
h <- hist(mydata$x, freq=F)
ec <- ecdf(mydata$x)
lines(x = knots(ec),
y=(1:length(mydata$x))/length(mydata$x) * max(h$density),
col ='red')
You can try a ggplot approach with a second axis:
library(dplyr) # mutate(), bind_cols(), %>%
library(tibble) # tibble()
library(ggplot2)
set.seed(15)
a <- rnorm(500, 50, 10)
# calculate ecdf with binsize 30
binsize=30
df <- tibble(x=seq(min(a), max(a), diff(range(a))/binsize)) %>%
bind_cols(Ecdf=with(.,ecdf(a)(x))) %>%
mutate(Ecdf_scaled=Ecdf*max(a))
# plot
ggplot() +
geom_histogram(aes(a), bins = binsize) +
geom_line(data = df, aes(x=x, y=Ecdf_scaled), color=2, size = 2) +
scale_y_continuous(name = "Density",sec.axis = sec_axis(trans = ~./max(a), name = "Ecdf"))
Edit
Since the scaling was wrong, I added a second solution, calculating everything in advance:
binsize=30
a_range= floor(range(a)) +c(0,1)
b <- seq(a_range[1], a_range[2], round(diff(a_range)/binsize)) %>% floor()
df_hist <- tibble(a) %>%
mutate(gr = cut(a,b, labels = floor(b[-1]), include.lowest = T, right = T)) %>%
count(gr) %>%
mutate(gr = as.character(gr) %>% as.numeric())
# calculate ecdf with binsize 30
df <- tibble(x=b) %>%
bind_cols(Ecdf=with(.,ecdf(a)(x))) %>%
mutate(Ecdf_scaled=Ecdf*max(df_hist$n))
ggplot(df_hist, aes(gr, n)) +
geom_col(width = 2, color = "white") +
geom_line(data = df, aes(x=x, y=Ecdf*max(df_hist$n)), color=2, size = 2) +
scale_y_continuous(name = "Density",sec.axis = sec_axis(trans = ~./max(df_hist$n), name = "Ecdf"))
As already pointed out, this is problematic because the plots you want to merge have such different y-scales. You can try
set.seed(15)
mydata<-runif(50)
hist(mydata, freq=F)
lines(ecdf(mydata))
to get
Although a bit late... Another version which is working with preset bins:
set.seed(15)
dt <- rnorm(500, 50, 10)
h <- hist(
dt,
breaks = seq(0, 100, 1),
xlim = c(0,100))
ec <- ecdf(dt)
lines(x = h$mids, y=ec(h$mids)*max(h$counts), col ='red')
lines(x = c(0,100), y=c(1,1)*max(h$counts), col ='red', lty = 3) # indicates 100%
x90 <- h$mids[which.min(abs(ec(h$mids) - 0.9))] # bin midpoint where the ECDF is closest to 90%
lines(x = c(x90, x90), # indicates where 90% is reached
y = c(0, max(h$counts)), col ='black', lty = 3)
(Only the second y-axis is not working yet...)
In addition to the previous answers, I wanted ggplot to do the tedious calculation (in contrast to @Roman's solution, which was kindly updated upon my request), i.e., calculate and draw the histogram, then calculate and overlay the ECDF. I came up with the following (pseudocode):
# 1. Prepare the plot
plot <- ggplot() + geom_histogram(...)
# 2. Get the max value of Y axis as calculated in the previous step
maxPlotY <- max(ggplot_build(plot)$data[[1]]$y)
# 3. Overlay scaled ECDF and add secondary axis
plot +
stat_ecdf(aes(y=..y..*maxPlotY)) +
scale_y_continuous(name = "Density", sec.axis = sec_axis(trans = ~./maxPlotY, name = "ECDF"))
This way you don't need to calculate everything beforehand and feed the results to ggplot. Just sit back and let it do everything for you!
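A runnable sketch of the three steps, under some assumptions that are not in the pseudocode: the data frame `mydata` and its simulated values are hypothetical, `layer_data()` is used as a shortcut for `ggplot_build(plot)$data[[1]]`, and the ECDF is overlaid with `stat_function()` and `stats::ecdf()` in place of `stat_ecdf()`, so the curve is evaluated on a grid rather than drawn as exact steps.

```r
library(ggplot2)

set.seed(15)
mydata <- data.frame(x = rnorm(500, 50, 10))  # hypothetical example data

# 1. Prepare the histogram
p_hist <- ggplot(mydata, aes(x)) + geom_histogram(bins = 30)

# 2. Height of the tallest bar, as computed by ggplot2 itself
#    (layer_data(p, 1) returns the same table as ggplot_build(p)$data[[1]])
maxPlotY <- max(layer_data(p_hist, 1)$y)

# 3. Overlay the ECDF scaled to count units and add a secondary axis.
#    stat_function() + stats::ecdf() is a substitution for stat_ecdf().
ec <- ecdf(mydata$x)
p <- p_hist +
  stat_function(fun = function(v) ec(v) * maxPlotY, colour = "red") +
  scale_y_continuous(name = "Count",
                     sec.axis = sec_axis(~ . / maxPlotY, name = "ECDF"))
print(p)
```

The only value computed outside the plot is `maxPlotY`, which ties the left (count) axis to the right (0 to 1) axis.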