I have produced the following box-and-whisker plot to display a dataset with ggplot2 in R:
As you may notice, however, the figure looks very "tall". Is there any way to compress the height of the y-axis without changing the scale, so that none of my data gets cut off?
My code is as follows:
library(ggplot2)
# median() takes a single vector, so each set of values is wrapped in c();
# median(a, b, ...) would silently return only the first value
healthy.control <- c(96.8,96.2,94.3,94.0,95.5,94.7)
healthy.exp <- c(median(c(79.64,79.13,79.04,79.49,79.51,79.90)),
median(c(78.98,78.35,78.57,78.78,78.45,78.63)),
median(c(77.12,77.90,77.43,77.07,77.85,77.81)),
median(c(76.59,76.82,76.64,77.13,77.16,76.66)),
median(c(78.00,78.26,78.08,77.79,78.35,78.34)),
median(c(76.96,76.83,77.88,77.93,77.69,77.30)))
adhd.control <- c(58.4,59.1,53.7,56.3,53.1,54.3)
adhd.exp <- c(median(c(49.12,48.39,48.68,48.50,48.00,48.32)),
median(c(48.96,48.94,49.24,49.30,48.78,49.15)),
median(c(44.97,45.24,45.26,45.00,44.87,45.02)),
median(c(46.95,47.05,47.04,46.80,47.70,46.97)),
median(c(44.28,44.20,44.42,44.37,44.43,44.67)),
median(c(45.04,45.56,44.76,45.56,45.50,45.02)))
fig.data <- c(adhd.control,adhd.exp,healthy.control,healthy.exp)
group <- c(rep("Deficient WM",12),rep("Healthy WM",12))
Condition <- c(rep("Non-Impulsive",6),rep("Impulsive",6),rep("Non-Impulsive",6),rep("Impulsive",6))
data.summary <- data.frame(group,Condition,fig.data)
plot <- ggplot(data.summary, aes(x=group, y=fig.data,fill=Condition)) +
geom_boxplot(outlier.colour="red", outlier.shape=8,outlier.size=4) +
scale_y_continuous(limits = c(40,100))
plot+labs(x="", y="MNIST TestSet Accuracy (%)\n")+
theme_classic() +
scale_fill_manual(values=c('#999999','#E69F00'))
Thank you kindly!
Try a log scale. It will make the groups appear a bit closer together:
scale_y_continuous(limits = c(40,100), trans = "log10")
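If the scale itself should stay linear, another option is to change the shape of the plot rather than the axis. The following is only a minimal sketch: it reuses the two control vectors from the question, and the output file name and the chosen sizes are assumptions. theme(aspect.ratio = ...) fixes the panel's height-to-width ratio, and the width and height passed to ggsave() control the saved figure.

```r
library(ggplot2)

# Only the two control groups from the question, to keep the sketch short
acc <- data.frame(
  group    = rep(c("Deficient WM", "Healthy WM"), each = 6),
  accuracy = c(58.4, 59.1, 53.7, 56.3, 53.1, 54.3,
               96.8, 96.2, 94.3, 94.0, 95.5, 94.7)
)

p <- ggplot(acc, aes(x = group, y = accuracy)) +
  geom_boxplot() +
  scale_y_continuous(limits = c(40, 100)) +  # same linear limits as in the question
  theme_classic() +
  theme(aspect.ratio = 0.5)                  # panel height is half the panel width

# The saved dimensions also decide how tall the figure looks
# (hypothetical file name; a PDF in the session's temp directory)
out_file <- file.path(tempdir(), "boxplot_wide.pdf")
ggsave(out_file, p, width = 8, height = 4)
```

Smaller aspect.ratio values give a flatter panel; the axis limits and the data are left untouched.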
You can find my dataset here.
From this data, I wish to plot (one line for each):
x$y[,1]
x$y[,5]
x$y[,1]+x$y[,5]
Therefore, more clearly, in the end, each of the following will be represented by one line:
y0,
z0,
y0+z0
My x-axis (time-series) will be from x$t.
I have tried the following, but the time-series variable is problematic and I cannot figure out how I can exactly plot it. My code is:
Time <- x$t
X0 <- x$y[,1]
Z0 <- x$y[,5]
X0.plus.Z0 <- X0 + Z0
xdf0 <- cbind(Time,X0,Z0,X0.plus.Z0)
xdf0.melt <- melt(xdf0, id.vars="Time")
ggplot(data = xdf0.melt, aes(x=Time, y=value)) + geom_line(aes(colour=Var2))
The error in your code comes from applying melt to an object that is not a data.frame: cbind() returns a matrix here. You should modify it like this:
xdf0 <- cbind.data.frame(Time,X0,Z0,X0.plus.Z0)
xdf0.melt <- reshape2::melt(xdf0, id.vars="Time")
ggplot(data = xdf0.melt, aes(x=Time, y=value)) + geom_line(aes(colour=variable))
You don't have to go through the melt process: since you have just 3 lines to plot, it's fine to add them separately:
ggplot(data=xdf0) + aes(x=Time) +
geom_line(aes(y=X0), col="red") +
geom_line(aes(y=Z0), col="blue") +
geom_line(aes(y=X0.plus.Z0))
However, you don't get the legend.
A remark about your example: you are trying to plot values of very different orders of magnitude, so you can't really see anything.
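One way to cope with series of very different magnitude is to give each series its own panel with a free y-axis. This is a sketch on synthetic data, since the real x$t and x$y are not reproduced here; the decay curves below are made up purely for illustration.

```r
library(ggplot2)

# Synthetic stand-ins for x$t, x$y[,1] and x$y[,5] (assumed shapes, not the real data)
Time <- seq(0, 10, by = 0.1)
X0 <- 1000 * exp(-0.3 * Time)   # values in the hundreds
Z0 <- 1 - exp(-0.3 * Time)      # values between 0 and 1

# Long format built by hand: one row per (Time, series) pair
long <- data.frame(
  Time   = rep(Time, times = 3),
  series = rep(c("X0", "Z0", "X0.plus.Z0"), each = length(Time)),
  value  = c(X0, Z0, X0 + Z0)
)

p <- ggplot(long, aes(x = Time, y = value, colour = series)) +
  geom_line() +
  facet_wrap(~ series, ncol = 1, scales = "free_y")  # one panel and y-scale per series
p
```

A log-scaled y-axis would be an alternative when all values are strictly positive.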
How about
matplot(xdf0[, 1], xdf0[, -1], type = 'l')  # Time on the x-axis, one line per remaining column
?
I would like to change the color of coefficient lines based on whether the point estimate is negative or positive in a ggplot2 coefficient plot in R. For example:
require(coefplot)
set.seed(123)
dat <- data.frame(x = rnorm(100), y1 = rnorm(100), z = rnorm(100))
mod1 <- lm(y1 ~ x + z, data = dat)
coefplot.lm(mod1)
Which produces the following plot:
In this plot, I would like to change the "x" variable to red when plotted. Any ideas? Thanks.
I don't think you can do this with a plot produced by coefplot.lm. The coefplot package uses ggplot2 as its plotting system, which is good in itself, but it does not let you control colors as easily as you would like. To get the desired colors, you need a variable in your dataset that color-codes the values, and you need to map that variable to color inside aes() in the layer that draws the dots. Apparently this is not possible with the output of the coefplot.lm function. You might be able to change the colors afterwards using ggplot2's ggplot_build() function, but I would say it's easier to write your own function for this task.
I've done this once to plot odds. If you want, you may use my code. Feel free to change it. The idea is the same as in coefplot. First, we extract coefficients from a model object and prepare the data set for plotting; second, actually plot.
The code for extracting coefficients and data set preparation
df_plot_odds <- function(x){
  # exponentiated coefficients with Wald confidence intervals
  tmp <- data.frame(cbind(exp(coef(x)), exp(confint.default(x))))
  odds <- tmp[-1,]  # drop the intercept
  names(odds) <- c('OR', 'lower', 'upper')
  odds$vars <- row.names(odds)
  # colour by direction of the effect: OR > 1 is positive, otherwise negative
  odds$col <- ifelse(odds$OR > 1, 'blue', 'red')
  odds$pvalue <- summary(x)$coef[-1, "Pr(>|t|)"]
  return(odds)
}
Plot the output of the extract function
plot_odds <- function(df_plot_odds, xlab="Odds Ratio", ylab="", asp=1){
  require(ggplot2)
  p <- ggplot(df_plot_odds, aes(x=vars, y=OR, ymin=lower, ymax=upper)) +
    geom_errorbar(aes(color=col), width=0.1) +
    geom_point(aes(color=col), size=3) +
    geom_hline(yintercept = 1, linetype=2) +
    # named values keep blue/red correct even if only one of them occurs
    scale_color_manual('Effect', values=c(blue='blue', red='red')) +
    coord_flip() +
    theme_bw() +
    theme(legend.position="none", aspect.ratio = asp) +
    ylab(xlab) +
    xlab(ylab) # switched because of the coord_flip() above
  return(p)
}
Plotting your example
set.seed(123)
dat <- data.frame(x = rnorm(100),y = rnorm(100), z = rnorm(100))
mod1 <- lm(y ~ x + z, data = dat)
df <- df_plot_odds(mod1)
plot <- plot_odds(df)
plot
Which yields
Note that I chose theme_bw() as the default. The output is a ggplot2 object, so you can change it quite a lot.
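For a linear model it may be more natural to stay on the raw coefficient scale and colour by the sign of the estimate, which is what the question asked for. The following is a sketch of that variant, not part of coefplot; the column names (term, estimate, direction) are made up for the example.

```r
library(ggplot2)

set.seed(123)
dat <- data.frame(x = rnorm(100), y = rnorm(100), z = rnorm(100))
mod1 <- lm(y ~ x + z, data = dat)

# Estimates and Wald confidence intervals, without the intercept
ci <- confint.default(mod1)[-1, , drop = FALSE]
coefs <- data.frame(
  term     = rownames(ci),
  estimate = coef(mod1)[-1],
  lower    = ci[, 1],
  upper    = ci[, 2]
)
# The sign of the point estimate drives the colour
coefs$direction <- ifelse(coefs$estimate < 0, "Negative", "Positive")

p <- ggplot(coefs, aes(x = term, y = estimate, ymin = lower, ymax = upper,
                       colour = direction)) +
  geom_hline(yintercept = 0, linetype = 2) +
  geom_errorbar(width = 0.1) +
  geom_point(size = 3) +
  scale_colour_manual("Effect", values = c(Negative = "red", Positive = "blue")) +
  coord_flip() +
  labs(x = NULL, y = "Coefficient") +
  theme_bw()
p
```

Because the colours are given as a named vector, a negative estimate is drawn in red regardless of which signs happen to occur in the model.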
I am trying to plot a simple bar chart with labels in ggplot2. However, when I use position="dodge", it puts the wrong labels in the resulting graphic, e.g. 17.6% instead of 77.7% for Truck. My data and code are below.
library(ggplot2)
mode <- factor(c("Truck", "Rail","Water","Air","Other"), levels=c("Truck", "Rail","Water","Air","Other"))
Year <- factor(c("2011","2011","2011","2011","2011","2040","2040","2040","2040","2040"))
share <- c(0.709946085, 0.175582806, 0.11392987, 0.000534132, 0.00000710797, 0.777162621, 0.133121584, 0.088818658, 0.000880041, 0.000017097)
modeshares <- data.frame(Year, mode, share)
theme_set(theme_grey(base_size = 18))
modeshares$lab <- as.character(round(100 * share,1))
modeshares$lab <- paste(modeshares$lab,"%",sep="")
ggplot(data=modeshares, aes(x=mode, y=share*100, fill=Year, ymax=(share*100))) + geom_bar(stat="identity", position="dodge") + labs(y="Percent",x="Mode") +geom_text(label=modeshares$lab,position=position_dodge(width=1),vjust=-0.5)
The resulting graph is shown below.
Any insights into how to ensure that the correct label values are displayed would be much appreciated.
Thanks!
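One likely fix, sketched under the assumption that the mismatch comes from passing the labels outside aes(): a vector given as label=modeshares$lab is applied in the original row order, while the dodged layer data may be ordered differently. Mapping the label inside aes(), with an explicit group and the same dodge width for bars and text, keeps each label attached to its own bar. geom_col() is used here as the shorthand for geom_bar(stat="identity").

```r
library(ggplot2)

mode_levels <- c("Truck", "Rail", "Water", "Air", "Other")
modeshares <- data.frame(
  Year  = factor(rep(c("2011", "2040"), each = 5)),
  mode  = factor(rep(mode_levels, times = 2), levels = mode_levels),
  share = c(0.709946085, 0.175582806, 0.11392987, 0.000534132, 0.00000710797,
            0.777162621, 0.133121584, 0.088818658, 0.000880041, 0.000017097)
)
modeshares$lab <- paste0(round(100 * modeshares$share, 1), "%")

dodge <- position_dodge(width = 0.9)  # same dodge for bars and labels

p <- ggplot(modeshares, aes(x = mode, y = share * 100, fill = Year)) +
  geom_col(position = dodge) +
  geom_text(aes(label = lab, group = Year),  # label mapped per row, not passed as a vector
            position = dodge, vjust = -0.5) +
  labs(y = "Percent", x = "Mode")
p
```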
What's the ggplot2 equivalent of "dotplot" histograms? With stacked points instead of bars? Similar to this solution in R:
Plot Histogram with Points Instead of Bars
Is it possible to do this in ggplot2? Ideally with the points shown as stacks and a faint line showing the smoothed line "fit" to these points (which would make a histogram shape.)
ggplot2 does dotplots; see geom_dotplot in the manual.
Here is an example:
library(ggplot2)
set.seed(789); x <- data.frame(y = sample(1:20, 100, replace = TRUE))
ggplot(x, aes(y)) + geom_dotplot()
In order to make it behave like a simple dotplot, we should do this:
ggplot(x, aes(y)) + geom_dotplot(binwidth=1, method='histodot')
You should get this:
To address the density issue, you'll have to add another term, ylim(), so that your plot call has the form ggplot() + geom_dotplot() + ylim()
More specifically, you'll write ylim(0, A), where A is the number of stacked dots needed to reach a density of 1.00. In the example above, the best you can do is see that 7.5 dots reach the 0.50 density mark. From there, you can infer that 15 dots will reach 1.00.
So your new call looks like this:
ggplot(x, aes(y)) + geom_dotplot(binwidth=1, method='histodot') + ylim(0, 15)
Which will give you this:
Usually, this kind of eyeball estimate will work for dotplots, but of course you can try other values to fine-tune your scale.
Notice that changing the ylim values doesn't affect how the data is displayed; it just changes the labels on the y-axis.
As @joran pointed out, we can use geom_dotplot:
require(ggplot2)
ggplot(mtcars, aes(x = mpg)) + geom_dotplot()
Edit: (moved useful comments into the post):
The label "count" is misleading because this is actually a density estimate; maybe you could suggest changing this label to "density" by default. The ggplot implementation of dotplot follows the original one by Leland Wilkinson, so if you want to understand clearly how it works, take a look at this paper.
As for an easy transformation to make the y axis actually show counts, i.e. "number of observations", the help page says:
When binning along the x axis and stacking along the y axis, the numbers on y axis are not meaningful, due to technical limitations of ggplot2. You can hide the y axis, as in one of the examples, or manually scale it to match the number of dots.
So you can use this code to hide y axis:
ggplot(mtcars, aes(x = mpg)) +
geom_dotplot(binwidth = 1.5) +
scale_y_continuous(name = "", breaks = NULL)
Here is an exact approach that builds on @Waldir Leoncio's latter method.
library(ggplot2); library(grid)
set.seed(789)
x <- data.frame(y = sample(1:20, 100, replace = TRUE))
g <- ggplot(x, aes(y)) + geom_dotplot(binwidth=0.8)
g # output to read parameter
### calculation of width and height of panel
grid.ls(view=TRUE, grob=FALSE)
real_width <- convertWidth(unit(1,'npc'), 'inch', TRUE)
real_height <- convertHeight(unit(1,'npc'), 'inch', TRUE)
### calculation of other values
width_coordinate_range <- diff(ggplot_build(g)$panel$ranges[[1]]$x.range)
real_binwidth <- real_width / width_coordinate_range * 0.8 # 0.8 is the argument binwidth
num_balls <- real_height / 1.1 / real_binwidth # the number of stacked balls. 1.1 is expanding value.
# num_balls is the value of A
g + ylim(0, num_balls)
Apologies : I don't have enough reputation to 'comment'.
I like cuttlefish44's "exact approach", but to make it work (with ggplot2 [2.2.1]) I had to change the following line from :
### calculation of other values
width_coordinate_range <- diff(ggplot_build(g)$panel$ranges[[1]]$x.range)
to
### calculation of other values
width_coordinate_range <- diff(ggplot_build(g)$layout$panel_ranges[[1]]$x.range)
I am trying to produce something similar to densityplot() from the lattice package, using ggplot2 after using multiple imputation with the mice package. Here is a reproducible example:
require(mice)
dt <- nhanes
impute <- mice(dt, seed = 23109)
x11()
densityplot(impute)
Which produces:
I would like to have some more control over the output (and I am also using this as a learning exercise for ggplot). So, for the bmi variable, I tried this:
bar <- NULL
for (i in 1:impute$m) {
foo <- complete(impute,i)
foo$imp <- rep(i,nrow(foo))
foo$col <- rep("#000000",nrow(foo))
bar <- rbind(bar,foo)
}
imp <-rep(0,nrow(impute$data))
col <- rep("#D55E00", nrow(impute$data))
bar <- rbind(bar,cbind(impute$data,imp,col))
bar$imp <- as.factor(bar$imp)
x11()
ggplot(bar, aes(x=bmi, group=imp, colour=col)) + geom_density()
+ scale_fill_manual(labels=c("Observed", "Imputed"))
which produces this:
So there are several problems with it:
1. The colours are wrong. It seems my attempt to control the colours is completely wrong or ignored.
2. There are unwanted horizontal and vertical lines.
3. I would like the legend to show Imputed and Observed, but my code gives the error invalid argument to unary operator.
Moreover, it seems like quite a lot of work to do what is accomplished in one line with densityplot(impute), so I wondered if I might be going about this in the wrong way entirely?
Edit: I should add the fourth problem, as noted by @ROLO:
4. The range of the plots seems to be incorrect.
The reason it is more complicated using ggplot2 is that you are using densityplot from the mice package (mice::densityplot.mids to be precise; check out its code), not from lattice itself. This function has all the functionality for plotting mids result classes from mice built in. If you tried the same using lattice::densityplot, you would find it to be at least as much work as using ggplot2.
But without further ado, here is how to do it with ggplot2:
require(reshape2)
# Obtain the imputed data, together with the original data
imp <- complete(impute,"long", include=TRUE)
# Melt into long format
imp <- melt(imp, c(".imp",".id","age"))
# Add a variable for the plot legend
imp$Imputed<-ifelse(imp$".imp"==0,"Observed","Imputed")
# Plot. Be sure to use stat_density instead of geom_density in order
# to prevent what you call "unwanted horizontal and vertical lines"
ggplot(imp, aes(x=value, group=.imp, colour=Imputed)) +
stat_density(geom = "path",position = "identity") +
facet_wrap(~variable, ncol=2, scales="free")
But as you can see the ranges of these plots are smaller than those from densityplot. This behaviour should be controlled by parameter trim of stat_density, but this seems not to work. After fixing the code of stat_density I got the following plot:
Still not exactly the same as the densityplot original, but much closer.
Edit: for a true fix we'll need to wait for the next major version of ggplot2, see github.
You can ask Hadley to add a fortify method for this mids class. E.g.
fortify.mids <- function(x){
imps <- do.call(rbind, lapply(seq_len(x$m), function(i){
data.frame(complete(x, i), Imputation = i, Imputed = "Imputed")
}))
orig <- cbind(x$data, Imputation = NA, Imputed = "Observed")
rbind(imps, orig)
}
ggplot 'fortifies' non-data.frame objects prior to plotting
ggplot(fortify.mids(impute), aes(x = bmi, colour = Imputed,
group = Imputation)) +
geom_density() +
scale_colour_manual(values = c(Imputed = "#000000", Observed = "#D55E00"))
Note that each line ends with a '+'; otherwise R treats the command as complete. This is why the legend did not change, and the line starting with a '+' is what produced the error.
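The rule about the trailing '+' can be seen on a small example. This sketch uses the built-in mtcars data rather than the imputed data, so the variables and colours are only illustrative.

```r
library(ggplot2)

# Works: every line that continues the plot ends with '+'
p <- ggplot(mtcars, aes(x = mpg, colour = factor(am), group = am)) +
  geom_density() +
  scale_colour_manual(name = "am",
                      values = c("0" = "#000000", "1" = "#D55E00"))
p

# Fails: the first line is already a complete command, so R evaluates it and
# then tries to run `+ scale_colour_manual(...)` as a statement of its own:
# ggplot(mtcars, aes(x = mpg, colour = factor(am), group = am)) + geom_density()
# + scale_colour_manual(values = c("0" = "#000000", "1" = "#D55E00"))
```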
You can melt the result of fortify.mids to plot all variables in one graph
library(reshape)
Molten <- melt(fortify.mids(impute), id.vars = c("Imputation", "Imputed"))
ggplot(Molten, aes(x = value, colour = Imputed, group = Imputation)) +
geom_density() +
scale_colour_manual(values = c(Imputed = "#000000", Observed = "#D55E00")) +
facet_wrap(~variable, scales = "free")