how to make a cdf plot smoother and label y axis - r

I read parameters "data1" and "data2" from files and use this code to plot cdf but I have two problems:
make the figure smoother
label Y axis to CDF
Please notice that this code is correct but I need some modifications.
df <- data.frame(x = c(data1, data2), ggg=factor(rep(1:2, c(19365,19365))))
ggplot(df, aes(x, colour = ggg)) +
stat_ecdf() +
labs(x='Time (ms)', ggg='CDF', fill='') +
theme_bw()+
theme(panel.grid.major = element_line(colour = 'grey'),
panel.border = element_rect(colour = 'black'),
axis.line = element_blank(),
panel.background = element_blank(),
legend.direction='vertical',
legend.position = c(1, 0.5),
legend.justification = c(1, 0.5),
legend.background = element_rect(colour = NA)) +
scale_colour_hue(name='', labels=c('IEEE 802.11p','Our protocol'))

The empirical distribution function is always a step function and you should not smooth it in any way. Having said that, you can get the values for the empirical distribution function using ecdf. If you want to do any smoothing on the result (and this is not suggested), you can.
library(dplyr)
library(ggplot2)
res <- df %>%
group_by(ggg) %>%
do(data.frame(x = sort(.$x),
ecdf = ecdf(.$x)(sort(.$x))))
ggplot(res, aes(x, ecdf, colour = ggg)) + geom_step()
To relabel the y axis, you can use
labs(x='Time (ms)', y='CDF')
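For completeness, here is a minimal sketch of the original stat_ecdf() plot with the y axis relabelled. The data are hypothetical stand-ins (random draws), since the real data1 and data2 come from files that are not shown; the point is only that labs() takes aesthetic names, so y = "CDF" sets the axis title where ggg = "CDF" did not.

```r
library(ggplot2)

# Hypothetical stand-in data; the real data1/data2 are read from files.
set.seed(1)
data1 <- rexp(200, rate = 1 / 20)
data2 <- rexp(200, rate = 1 / 12)

df <- data.frame(
  x   = c(data1, data2),
  # Group sizes are taken from the data instead of being hard-coded.
  ggg = factor(rep(1:2, c(length(data1), length(data2))))
)

p <- ggplot(df, aes(x, colour = ggg)) +
  stat_ecdf() +                            # step-function ECDF per group
  labs(x = "Time (ms)", y = "CDF") +       # y = ... labels the y axis
  scale_colour_hue(name = "", labels = c("IEEE 802.11p", "Our protocol")) +
  theme_bw()
p
```

Deriving the group sizes from length() avoids a mismatch if the two files do not both contain 19365 values.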

Related

Customised Bubble plot

I am trying to do a bubble plot. My data are:
Year<-rep(2001:2005, each = 5)
name<-c("John","Ellen","Mark","Randy","Luisa")
Name<-c(rep(name,5))
Value<-sample(seq(0,25,by=1),25)
mydata<-data.frame(Year,Name,Value)
So far I have got to this point:
ggplot(mydata, aes(x=Year, y=Name, size = Value)) +
geom_point() +
theme(axis.line = element_blank(),
axis.text.x=element_text(size=11,margin=margin(b=10),colour="black"),
axis.text.y=element_text(size=13,margin=margin(l=10),colour="black",
face="italic"),
axis.ticks = element_blank(),
axis.title=element_text(size=18,face="bold"),
panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
panel.background = element_blank(),
legend.text = element_text(size=14),
legend.title = element_text(size=18))
I need many modifications but I couldn't understand how to do that (I am not very familiar with ggplot2).
First, I would like to use the viridis scale, but neither scale_color_viridis nor scale_fill_viridis are working (I have also tried setting the discrete=T argument).
Second, I would like to avoid plotting the 0 values (i.e., leave a blank space where a 0 value would be plotted), but neither using na.omit (e.g. as ggplot(na.omit(mydata), aes(x=Year, y=Name, size = Value)) or as ggplot(mydata, aes(x=Year, y=Name, size = na.omit(Value)))) nor removing the 0 from the Value object works.
Third, I'd like the legend to be a continuous scale: the plotted values of Value are in a range from 1 to 25 (as I would like to remove the zeros) but the default legend is discrete with 5 points break.
I would like the plot to look more or less like this (with the bubble sizes depending on the value of Value):
Any suggestions? Sorry for the many questions but I have some real difficulties in understanding how ggplot works. Thanks!
In order to map a variable in your data to some scale, you use the aes() function to couple what ggplot2 calls an 'aesthetic' to an expression (typically a symbol for a column in your data). Thus, to make a colour scale, you have to specify a colour aesthetic inside the aes() function. In the code below, I also specify an alpha aesthetic, which is 1 if Value > 0 and 0 otherwise, making the 0-value points completely transparent. I specify I() to let ggplot2 know that it should take this value literally instead of mapping it to a scale.
library(ggplot2)
#> Warning: package 'ggplot2' was built under R version 4.0.3
Year<-rep(2001:2005, each = 5)
name<-c("John","Ellen","Mark","Randy","Luisa")
Name<-c(rep(name,5))
Value<-sample(seq(0,25,by=1),25)
mydata<-data.frame(Year,Name,Value)
g <- ggplot(mydata, aes(x=Year, y=Name, size = Value)) +
geom_point(aes(colour = Value,
alpha = I(as.numeric(Value > 0))))
Once we have specified the aesthetics, we can begin customising the scales. The typical pattern is scale_{the aesthetic}_{type of scale}, so we need to add scale_colour_viridis_c() if we want to map the colour values to the viridis scale (the *_c is for continuous scales). In the scales, we can specify for example the limits, which you've indicated should be between 1 and 25. Also, I added a scale_size_area() where we say that we do not want a legend for the size of the points by setting guide = "none".
g + scale_colour_viridis_c(option = "C", direction = -1,
limits = c(1, 25)) +
scale_size_area(guide = "none") +
theme(axis.line = element_blank(),
axis.text.x=element_text(size=11,margin=margin(b=10),colour="black"),
axis.text.y=element_text(size=13,margin=margin(l=10),colour="black",
face="italic"),
axis.ticks = element_blank(),
axis.title=element_text(size=18,face="bold"),
panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
panel.background = element_blank(),
legend.text = element_text(size=14),
legend.title = element_text(size=18))
Created on 2021-02-24 by the reprex package (v1.0.0)
Is that what you are looking for?
library(ggplot2)
Year<-rep(2001:2005, each = 5)
name<-c("John","Ellen","Mark","Randy","Luisa")
Name<-c(rep(name,5))
Value<-sample(seq(0,25,by=1),25)
Value <- ifelse(Value == 0, NA, Value)
mydata<-data.frame(Year,Name,Value)
ggplot(mydata, aes(x=Year, y=Name, size = Value, colour = Value)) +
geom_point() +
scale_colour_viridis_c() +
scale_size(guide = "none") +
theme(axis.line = element_blank(),
axis.text.x=element_text(size=11,margin=margin(b=10),colour="black"),
axis.text.y=element_text(size=13,margin=margin(l=10),colour="black",
face="italic"),
axis.ticks = element_blank(),
axis.title=element_text(size=18,face="bold"),
panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
panel.background = element_blank(),
legend.text = element_text(size=14),
legend.title = element_text(size=18))
#> Warning: Removed 1 rows containing missing values (geom_point).
Concerning your points:
I only found the scale_colour_viridis_c and scale_colour_viridis_b functions, which differ in the colours as far as I could see. Maybe I am missing some package?
Secondly, regarding the NAs: you just need to replace the 0s with NAs.
And lastly, regarding the scale: the colour scale is automatically continuous. Depicting sizes continuously is a bit tricky, so the size legend will always be discrete. I removed it from the legend so that only the colour is shown, as in your example.
Just as an alternative way to think about this... maybe it's helpful. :-)
library(tidyverse)
set.seed(123)
df <- tibble(
year = rep(2001:2005, each = 5),
name = rep(c("John","Ellen","Mark","Randy","Luisa"),5),
value = sample(seq(0,25,by=1),25)
)
df %>%
mutate(name_2 = ifelse(year>2001 & year<2005, NA, name)) %>%
ggplot(aes(year, value, group = name, label = name_2, color = name)) +
geom_line() +
geom_point() +
geom_text(vjust = -1) +
scale_color_brewer(palette = "Set1") +
theme_minimal(base_family = "serif") +
theme(legend.position = "none") +
xlab("")

order y-axis of geom_tile plot by variable

I am using geom_tile to visualize random draws
Generate data:
library(dplyr)
library(tidyr)    # for crossing()
library(ggplot2)
set.seed(1)
df= crossing(sim=1:10,part= 1:10)
df$result = sample(c(1,0),size = nrow(df), replace=T)
df = df %>%
group_by(sim)%>%
# find out how many successful (1) pilots there were in the first 4 participants
summarize(good_pilots = sum(result[1:4])) %>%
arrange(good_pilots) %>%
ungroup() %>%
# add this back into full dataframe
full_join(df)
# plot data
plot = ggplot(df, aes( y=factor(sim), x=part)) +
geom_tile(aes(fill = factor(result)), colour = "black",
show.legend = T)+
scale_fill_manual(values=c("lightgrey", "darkblue"))+# c(0,1)
theme(panel.border = element_rect(size = 2),
plot.title = element_text(size = rel(1.2)),
axis.text = element_blank(),
axis.title = element_blank(),
axis.ticks = element_blank(),
legend.title = element_blank(),
legend.position = "right")+ theme_classic()+ coord_fixed(ratio=1)
This results in:
What I actually want is the y axis to be ordered by the # of blue (ie 1's) in the first four columns of the block (which is calculated in good_pilots).
I tried scale_y_discrete but that cannot be what is intended:
plot + scale_y_discrete(limits=df$sim[order(df$good_pilots)])
resulting in:
From what I can tell it seems like the ordering worked correctly, but using scale_y_discrete caused the plot to be messed up.
You can use reorder here
ggplot(df, aes(y = reorder(sim, good_pilots), x = part)) +
...
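To see what reorder() does on its own, here is a small base-R sketch with hypothetical toy values (not the simulated data above). ggplot2 draws a discrete axis in factor-level order, and reorder() returns a factor whose levels are sorted by a summary (by default the mean) of the second argument.

```r
# Toy data: three simulations, each with a constant score (hypothetical values).
sim   <- c(1, 1, 2, 2, 3, 3)
score <- c(4, 4, 1, 1, 2, 2)

# Levels are sorted by the mean score within each sim: 2 (1), 3 (2), 1 (4).
ordered_sim <- reorder(sim, score)
levels(ordered_sim)
```

Because good_pilots is constant within each sim, the default mean is simply that value, so no FUN argument is needed in the plot call.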

Dealing with factors in geom_pointrange in ggplot

I am trying to visualize some data that consist of odds ratios and confidence intervals for regions nested in countries. I am using the geom_pointrange option for that and in general it works very well.
My problem is that since the odds ratios (and upper confidence intervals) can get quite high values, the axes of the plot are stretched to accommodate for that. That has as a result that confidence intervals that lie between 0 and 1 do not appear clearly enough. One option I found through this community is to change the values into factors and the distance between them will be considered the same for every measurement. This works for the odds ratios (still need to tweak the axis tick marks) but when the values of lower and upper confidence intervals are involved, the position is totally wrong and the confidence intervals do not include the point estimate. I tried to solve this by including all values as levels of the factor, but this did not seem to solve the issue.
What I am trying to do is either to "magnify" the area between 0 and 1 in the graph while leaving the rest of the plot area unchanged, or to make ggplot place the confidence intervals correctly around the odds ratios.
Below I include a simplified version of my data and the code I have been using, for reproducibility.
dat <- data.frame(region = rep(LETTERS[1:5], 2),
country = rep(c("A1", "A2"), each = 5),
or = c(6.459578, 1.696221, 0.895115, 3.393235, 2.325510,
4.457805, 0.407111, 22.760861, 3.354883, 2.214915),
lower = c(5.768999699, 0.237062909, 0.347443105, 0.369881529,
0.010233696, 1.020315696, 0.004419494, 3.87391259,
0.808667764, 0.874415935),
upper = c(7.2328221, 12.1367207, 2.3060778, 31.1290104,
28.4497981, 19.4763489, 0.750188, 337.2960785,
13.9182469, 5.610429))
library(ggplot2)
ggplot(dat, aes(x = region, y = or, ymin = lower, ymax = upper))+
geom_pointrange() +
geom_hline(yintercept = 1, linetype = 2) +
theme_bw() +
theme(plot.margin = unit(c(1, 1, 1, 4), "lines"),
axis.title = element_blank(),
axis.ticks.y = element_blank(),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
legend.position="none") +
facet_wrap(~ country) +
coord_flip(ylim = c(0, 100))
# Change numeric variable into factors
f.levels <- c(dat$or, dat$lower, dat$upper)
f.levels <- unique(f.levels)
f.levels <- as.character(f.levels[order(f.levels)])
dat$or <- factor(dat$or, levels = f.levels)
dat$lower <- factor(dat$lower, levels = f.levels)
dat$upper <- factor(dat$upper, levels = f.levels)
ggplot(dat, aes(x = region, y = or, ymin = lower, ymax = upper))+
geom_pointrange() +
geom_hline(yintercept = 1, linetype = 2) +
theme_bw() +
theme(plot.margin = unit(c(1, 1, 1, 4), "lines"),
axis.title = element_blank(),
axis.ticks.y = element_blank(),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
legend.position="none") +
facet_wrap(~ country) +
coord_flip(ylim = c(0, 30))
I am relatively new to ggplot so please excuse any newbie mistakes.
Any suggestions on this problem are highly appreciated.
Thank you!
I think the standard solution for this problem is plotting the ORs on a log10 scale. For a neat explanation see https://blogs.sas.com/content/iml/2015/07/29/or-plots-log-scale.html
ggplot(dat, aes(x = region, y = or, ymin = lower, ymax = upper)) +
geom_pointrange() +
geom_hline(yintercept = 1, linetype = 2) +
scale_y_log10() + ### This is the line that makes the transformation
theme_bw() +
theme(plot.margin = unit(c(1, 1, 1, 4), "lines"),
axis.title = element_blank(),
axis.ticks.y = element_blank(),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
legend.position="none") +
facet_wrap(~ country) +
coord_flip()
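A short base-R sketch of why the log scale helps here, using hypothetical odds ratios: on a log10 axis, an odds ratio and its reciprocal sit at equal distances on either side of 1, so the region between 0 and 1 gets as much room as the region above 1.

```r
# Hypothetical odds ratios: each value below 1 is the reciprocal of one above 1.
or <- c(0.25, 0.5, 1, 2, 4)

# Positions on a log10 axis are symmetric around log10(1) = 0.
log_or <- log10(or)
round(log_or, 3)
```

On the linear scale the same values occupy 0.75 units below 1 and 3 units above it, which is why the intervals below 1 look squashed.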

Using Factor Variables to Facet a Histogram Beside a Scatter Plot in R

Is it possible to use a factor variable to facet a histogram below or beside a scatter plot in ggplot2 in R (such that the histograms are of the x- and y-components of the data)?
The reason I ask whether this could be done with a factor variable is because faceting seems to be more general than the available packages that address this issue, where, for example with faceting, facet labels can be turned on or off, and also faceting has a more standard appearance where publication may be a concern. (Faceting also by default preserves the use of the same axes).
So far I haven't been able to get this to work because it seems like all faceted data have to be of the same number of dimensions (e.g., the scatterplot data is 2D, the histogram data are 1D).
I am not sure if I fully understand the question since a histogram of factor variables doesn't quite make sense to me. Also, without sample data, I will just have to use mtcars. Something like this might help. I use gridExtra in addition to ggplot2 in order to make the plots have a custom grid arrangement.
library(gridExtra)
library(ggplot2)
s_plot <- ggplot(data = mtcars, aes(x = hp, y = mpg)) + geom_point()
h1 <- ggplot(data = mtcars, aes(x = hp)) + geom_histogram()
h2 <- ggplot(data = mtcars, aes(x = mpg)) + geom_histogram()
grid.arrange(s_plot, h1, h2, layout_matrix = cbind(c(1, 1), c(2, 3)))
Note that in the layout_matrix argument in grid.arrange, I use cbind(c(1,1), c(2, 3)) because I want the first plot to be in a column all by itself and then I want the other two plots to occupy individual rows in the second column of the grid.
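Printing the layout matrix makes the arrangement easier to read. This is only a base-R sketch of the matrix itself: each cell holds the index of the plot (in the order passed to grid.arrange) that occupies that cell of the grid.

```r
# Column 1 is plot 1 in both rows; column 2 is plot 2 on top and plot 3 below.
lm <- cbind(c(1, 1), c(2, 3))
print(lm)
#>      [,1] [,2]
#> [1,]    1    2
#> [2,]    1    3
```

Repeating an index across cells, as with the 1s in the first column, makes that plot span those cells.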
Consider the use of geom_rug.
ggplot(mtcars, aes(wt, mpg)) +
geom_point() + geom_rug()
Nick and Brian,
Thanks for your help with the code. I asked around and was able to get the set-up I was looking for. Basically it goes like this, as shown below. (Hopefully this might be useful to you and others in the future, as I think this is a common type of graph):
rm(list = ls())
library(ggplot2)
library(gridExtra)
df <- data.frame(
x = rnorm(100),
y = rnorm(100)
)
xrange <- range(pretty(df$x))
yrange <- range(pretty(df$y))
p.left <- ggplot(df, aes(y)) +
geom_histogram() +
lims(x = yrange) +
coord_flip() +
theme_light() +
theme(
axis.title.x = element_blank(),
axis.text.x = element_blank(),
axis.ticks.x = element_blank(),
panel.grid.major.x = element_blank(),
plot.margin = unit(c(1, 0.05, 0.05, 1), "lines")
)
p.blank <- ggplot() +
theme_void() +
theme(plot.margin = unit(rep(0, 4), "lines"))
p.main <- ggplot(df, aes(x, y)) +
geom_point() +
lims(x = xrange, y = yrange) +
theme_light() +
theme(
axis.title = element_blank(),
axis.text = element_blank(),
axis.ticks = element_blank(),
plot.margin = unit(c(1, 1, 0.05, 0.05), "lines")
)
p.bottom <- ggplot(df, aes(x)) +
geom_histogram() +
lims(x = xrange) +
theme_light() +
theme(
axis.title.y = element_blank(),
axis.text.y = element_blank(),
axis.ticks.y = element_blank(),
panel.grid.major.y = element_blank(),
plot.margin = unit(c(0.05, 1, 1, 0.05), "lines")
)
lm <- matrix(1:4, nrow = 2)
grid.arrange(
p.left, p.blank, p.main, p.bottom,
layout_matrix = lm,
widths = c(1, 5),
heights = c(5, 1),
padding = unit(0.1, "line")
)

Remove grid, background color, and top and right borders from ggplot2

I would like to reproduce the plot immediately below by using ggplot2. I can come close, but cannot remove the top and right borders. Below I present several attempts using ggplot2, including several suggestions found on or via Stackoverflow. Unfortunately I have not been able to get those suggestions to work.
I am hoping someone may be able to correct one or more of the code snippets below.
Thank you for any suggestions.
# desired plot
a <- seq(1,20)
b <- a^0.25
plot(a,b, bty = "l")
library(ggplot2)
df <- as.data.frame(cbind(a,b))
# 1. ggplot2 default
ggplot(df, aes(x = a, y = b)) + geom_point()
# 2. removes background color
ggplot(df, aes(x = a, y = b)) + geom_point() + opts(panel.background = theme_rect(fill='white', colour='black'))
# 3. also removes gridlines
none <- theme_blank()
ggplot(df, aes(x = a, y = b)) + geom_point() + opts(panel.background = theme_rect(fill='white', colour='black')) + opts(panel.grid.major = none, panel.grid.minor = none)
# 4. does not remove top and right border
ggplot(df, aes(x = a, y = b)) + geom_point() + opts(panel.background = theme_rect(fill='white', colour='black')) + opts(panel.grid.major = none, panel.grid.minor = none) + opts(panel.border = none)
# 5. does not remove top and right border
ggplot(df, aes(x = a, y = b)) + geom_point() + opts(panel.background = theme_rect(fill='white', colour='black')) + opts(panel.grid.major = none, panel.grid.minor = none) + opts(axis.line = theme_segment())
# 6. removes x and y axis in addition to top and right border
# http://stackoverflow.com/questions/5458409/remove-top-and-right-border-from-ggplot2
ggplot(df, aes(x = a, y = b)) + geom_point() + opts(panel.background = theme_rect(fill='white', colour='black')) + opts(panel.grid.major = none, panel.grid.minor = none) + opts(panel.background=theme_rect(colour=NA))
# 7. returns error when attempting to remove top and right border
# https://groups.google.com/group/ggplot2/browse_thread/thread/f998d113638bf251
#
# Error in el(...) : could not find function "polylineGrob"
#
theme_L_border <- function(colour = "black", size = 1, linetype = 1) {
structure(
function(x = 0, y = 0, width = 1, height = 1, ...) {
polylineGrob(
x=c(x+width, x, x), y=c(y,y,y+height), ..., default.units = "npc",
gp=gpar(lwd=size, col=colour, lty=linetype),
)
},
class = "theme",
type = "box",
call = match.call()
)
}
ggplot(df, aes(x = a, y = b)) + geom_point() + opts(panel.background = theme_rect(fill='white', colour='black')) + opts(panel.grid.major = none, panel.grid.minor = none) + opts( panel.border = theme_L_border())
EDIT Ignore this answer. There are now better answers. See the comments. Use + theme_classic()
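A minimal sketch of the theme_classic() suggestion, using the same a and b as in the question. theme_classic() ships with ggplot2 and draws x and y axis lines without gridlines or a panel border, which is close to the base plot with bty = "l".

```r
library(ggplot2)

a <- seq(1, 20)
b <- a^0.25
df <- data.frame(a, b)

# theme_classic(): axis lines on the left and bottom, no grid, no border.
p <- ggplot(df, aes(x = a, y = b)) +
  geom_point() +
  theme_classic()
p
```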
EDIT
This is a better version. The bug mentioned below in the original post remains (I think). But the axis line is drawn under the panel. Therefore, remove both the panel.border and panel.background to see the axis lines.
library(ggplot2)
a <- seq(1,20)
b <- a^0.25
df <- data.frame(a,b)
ggplot(df, aes(x = a, y = b)) + geom_point() +
theme_bw() +
theme(axis.line = element_line(colour = "black"),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.border = element_blank(),
panel.background = element_blank())
Original post
This gets close. There was a bug with axis.line not working on the y-axis (see here), that appears not to be fixed yet. Therefore, after removing the panel border, the y-axis has to be drawn in separately using geom_vline.
library(ggplot2)
library(grid)
a <- seq(1,20)
b <- a^0.25
df <- data.frame(a,b)
p = ggplot(df, aes(x = a, y = b)) + geom_point() +
scale_y_continuous(expand = c(0,0)) +
scale_x_continuous(expand = c(0,0)) +
theme_bw() +
opts(axis.line = theme_segment(colour = "black"),
panel.grid.major = theme_blank(),
panel.grid.minor = theme_blank(),
panel.border = theme_blank()) +
geom_vline(xintercept = 0)
p
The extreme points are clipped, but the clipping can be undone using code by baptiste.
gt <- ggplot_gtable(ggplot_build(p))
gt$layout$clip[gt$layout$name=="panel"] <- "off"
grid.draw(gt)
Or use limits to move the boundaries of the panel.
ggplot(df, aes(x = a, y = b)) + geom_point() +
xlim(0,22) + ylim(.95, 2.1) +
scale_x_continuous(expand = c(0,0), limits = c(0,22)) +
scale_y_continuous(expand = c(0,0), limits = c(.95, 2.2)) +
theme_bw() +
opts(axis.line = theme_segment(colour = "black"),
panel.grid.major = theme_blank(),
panel.grid.minor = theme_blank(),
panel.border = theme_blank()) +
geom_vline(xintercept = 0)
Recent updates to ggplot (0.9.2+) have overhauled the syntax for themes. Most notably, opts() is now deprecated, having been replaced by theme(). Sandy's answer will still (as of Jan '12) generate a chart, but causes R to throw a bunch of warnings.
Here's updated code reflecting current ggplot syntax:
library(ggplot2)
a <- seq(1,20)
b <- a^0.25
df <- as.data.frame(cbind(a,b))
#base ggplot object
p <- ggplot(df, aes(x = a, y = b))
p +
#plots the points
geom_point() +
#theme with white background
theme_bw() +
#eliminates background, gridlines, and chart border
theme(
plot.background = element_blank(),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.border = element_blank()
) +
#draws x and y axis line
theme(axis.line = element_line(color = 'black'))
generates:
An alternative to theme_classic() is the theme that comes with the cowplot package, theme_cowplot() (loaded automatically with the package). It looks similar to theme_classic(), with a few subtle differences. Most importantly, the default label sizes are larger, so the resulting figures can be used in publications without further modifications needed (in particular if you save them with save_plot() instead of ggsave()). Also, the background is transparent, not white, which may be useful if you want to edit the figure in illustrator. Finally, faceted plots look better, in my opinion.
Example:
library(cowplot)
a <- seq(1,20)
b <- a^0.25
df <- as.data.frame(cbind(a,b))
p <- ggplot(df, aes(x = a, y = b)) + geom_point()
save_plot('plot.png', p) # alternative to ggsave, with default settings that work well with the theme
This is what the file plot.png produced by this code looks like:
Disclaimer: I'm the package author.
I followed Andrew's answer, but I also had to follow https://stackoverflow.com/a/35833548 and set the x and y axes separately due to a bug in my version of ggplot (v2.1.0).
Instead of
theme(axis.line = element_line(color = 'black'))
I used
theme(axis.line.x = element_line(color="black", size = 2),
axis.line.y = element_line(color="black", size = 2))
The above options do not work for maps created with sf and geom_sf(). Hence, I want to add the relevant ndiscr parameter here. This will create a nice clean map showing only the features.
library(sf)
library(ggplot2)
ggplot() +
geom_sf(data = some_shp) +
theme_minimal() + # white background
theme(axis.text = element_blank(), # remove geographic coordinates
axis.ticks = element_blank()) + # remove ticks
coord_sf(ndiscr = 0) # remove grid in the background
Simplifying Andrew's answer above, these are the two key theme settings that produce the half border.
theme (panel.border = element_blank(),
axis.line = element_line(color='black'))
Here's an extremely simple answer
yourPlot +
theme(
panel.border = element_blank(),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
axis.line = element_line(colour = "black")
)
It's that easy. Source: the end of this article
You may also want to check panel.background.
theme(
panel.background = element_rect(fill = "black"),
panel.grid.major = element_blank(), panel.grid.minor = element_blank()
)
