I have produced the following box-and-whisker plot to display a dataset with ggplot2 in R:
As you may notice, however, the figure looks very "tall". Is there any way to compress the length of the y-axis without changing the scale, so that none of my data gets cut off?
My code is as follows:
library(ggplot2)
healthy.control <- c(96.8,96.2,94.3,94.0,95.5,94.7)
# median() expects a single vector, so each set of six runs is wrapped in c()
healthy.exp <- c(median(c(79.64,79.13,79.04,79.49,79.51,79.90)),
median(c(78.98,78.35,78.57,78.78,78.45,78.63)),
median(c(77.12,77.90,77.43,77.07,77.85,77.81)),
median(c(76.59,76.82,76.64,77.13,77.16,76.66)),
median(c(78.00,78.26,78.08,77.79,78.35,78.34)),
median(c(76.96,76.83,77.88,77.93,77.69,77.30)))
adhd.control <- c(58.4,59.1,53.7,56.3,53.1,54.3)
adhd.exp <- c(median(c(49.12,48.39,48.68,48.50,48.00,48.32)),
median(c(48.96,48.94,49.24,49.30,48.78,49.15)),
median(c(44.97,45.24,45.26,45.00,44.87,45.02)),
median(c(46.95,47.05,47.04,46.80,47.70,46.97)),
median(c(44.28,44.20,44.42,44.37,44.43,44.67)),
median(c(45.04,45.56,44.76,45.56,45.50,45.02)))
fig.data <- c(adhd.control,adhd.exp,healthy.control,healthy.exp)
group <- c(rep("Deficient WM",12),rep("Healthy WM",12))
Condition <- c(rep("Non-Impulsive",6),rep("Impulsive",6),rep("Non-Impulsive",6),rep("Impulsive",6))
data.summary <- data.frame(group,Condition,fig.data)
plot <- ggplot(data.summary, aes(x=group, y=fig.data,fill=Condition)) +
geom_boxplot(outlier.colour="red", outlier.shape=8,outlier.size=4) +
scale_y_continuous(limits = c(40,100))
plot+labs(x="", y="MNIST TestSet Accuracy (%)\n")+
theme_classic() +
scale_fill_manual(values=c('#999999','#E69F00'))
Thank you kindly!
Try a log scale. It will make the groups appear a bit closer together, as shown below.
scale_y_continuous(limits = c(40,100), trans = "log10")
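A minimal sketch of the log-scale suggestion applied to the question's plot, assuming ggplot2 is installed. Here `scale_y_log10()` stands in for `trans = "log10"`. As a separate option that the answer does not mention, `theme(aspect.ratio = ...)` fixes the panel's height relative to its width, so the plot gets shorter without cutting off any data; the value 0.6 is an arbitrary choice for illustration.

```r
library(ggplot2)

# Data from the question; each experimental value is the median of six runs
healthy.control <- c(96.8, 96.2, 94.3, 94.0, 95.5, 94.7)
healthy.exp <- sapply(list(
  c(79.64, 79.13, 79.04, 79.49, 79.51, 79.90),
  c(78.98, 78.35, 78.57, 78.78, 78.45, 78.63),
  c(77.12, 77.90, 77.43, 77.07, 77.85, 77.81),
  c(76.59, 76.82, 76.64, 77.13, 77.16, 76.66),
  c(78.00, 78.26, 78.08, 77.79, 78.35, 78.34),
  c(76.96, 76.83, 77.88, 77.93, 77.69, 77.30)), median)
adhd.control <- c(58.4, 59.1, 53.7, 56.3, 53.1, 54.3)
adhd.exp <- sapply(list(
  c(49.12, 48.39, 48.68, 48.50, 48.00, 48.32),
  c(48.96, 48.94, 49.24, 49.30, 48.78, 49.15),
  c(44.97, 45.24, 45.26, 45.00, 44.87, 45.02),
  c(46.95, 47.05, 47.04, 46.80, 47.70, 46.97),
  c(44.28, 44.20, 44.42, 44.37, 44.43, 44.67),
  c(45.04, 45.56, 44.76, 45.56, 45.50, 45.02)), median)

data.summary <- data.frame(
  group = rep(c("Deficient WM", "Healthy WM"), each = 12),
  Condition = rep(rep(c("Non-Impulsive", "Impulsive"), each = 6), times = 2),
  fig.data = c(adhd.control, adhd.exp, healthy.control, healthy.exp)
)

p <- ggplot(data.summary, aes(x = group, y = fig.data, fill = Condition)) +
  geom_boxplot(outlier.colour = "red", outlier.shape = 8, outlier.size = 4) +
  # log10 y-axis with the same limits as the original linear axis
  scale_y_log10(limits = c(40, 100)) +
  labs(x = "", y = "MNIST TestSet Accuracy (%)\n") +
  theme_classic() +
  scale_fill_manual(values = c("#999999", "#E69F00")) +
  # panel height = 0.6 x panel width (arbitrary value, adjust to taste)
  theme(aspect.ratio = 0.6)

print(p)
```

Because `aspect.ratio` acts on the panel rather than the scale, it can be combined with either the linear or the log axis.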
I am trying to arrange n consecutive plots into a single matrix of plots. I create the plots in a for-loop, but I can't figure out how to arrange them into a 'plot of plots'. I have used par(mfrow=c(num.row,num.col)), but it does not work. I have also tried multiplot(plotlist = p, cols = 4) and plot_grid(plotlist = p), without success.
library(readxl)
library(dplyr)
library(ggplot2)
#import dataset
Survey<-read_excel('datasets/Survey_Key_and_Complete_Responses_excel.xlsx',
sheet = 2)
#Investigate how the dataset looks like
glimpse(Survey)#library dplyr
#change data types
Survey$brand <- as.factor(Survey$brand)
Survey$zipcode <- as.factor(Survey$zipcode)
Survey$elevel <- as.factor(Survey$elevel)
Survey$car <- as.numeric(Survey$car)
#Relation brand-variables
p = list()
for(i in 1:ncol(Survey)) {
if ((names(Survey[i])) == "brand"){
p[[i]]<-ggplot(Survey, aes(x = brand)) + geom_bar() +
labs(x="Brand")
} else if (is.numeric(Survey[[i]]) == "TRUE"){
p[[i]]<-ggplot(Survey, aes(x = Survey[[i]], fill=brand)) + geom_histogram() +
labs(x=colnames(Survey[i]))
} else {
p[[i]]<-ggplot(Survey, aes(x = Survey[[i]], fill = brand)) + geom_bar() +
labs(x=colnames(Survey[i]))
}
}
I think the plots are appended to the list correctly, but I cannot arrange them in matrix form.
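The problem does not appear to be with arranging multiple plots, but with how you refer to the variable inside your plot.
You've already put "Survey" into ggplot as the first argument (the data slot). In the mapping argument (the second slot), you put in aes(...), and inside that you should be specifying variable names, not the data itself. aes() also evaluates Survey[[i]] only when the plot is drawn, which happens after the loop has finished, so every plot would end up showing the column for the last value of i. So try this:
Where you have aes(x = Survey[[i]], fill=brand) in two places,
put aes(x = .data[[!!names(Survey)[i]]], fill = brand) instead. The .data pronoun looks the column up by name inside the data, and !! inserts the name immediately instead of when the plot is printed.
Regarding plotting multiple plots: par(mfrow = ...) is for base R graphics and cannot be used with ggplots. gridExtra::grid.arrange, multiplot, and cowplot::plot_grid should all work once you fix the error in your plots.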
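A minimal sketch of the whole workflow, assuming ggplot2 and gridExtra are installed. The Survey spreadsheet is not available, so the built-in iris data stands in for it, with Species playing the role of brand (an assumption for illustration). Building the plots with lapply() instead of a for-loop gives each plot its own function environment, so the column name is fixed per plot without needing !!.

```r
library(ggplot2)
library(gridExtra)

# iris stands in for the Survey data; Species plays the role of 'brand'
df <- iris

# One plot per column; 'col' is a function argument, so each plot keeps its own value
p <- lapply(names(df), function(col) {
  if (col == "Species") {
    ggplot(df, aes(x = Species)) + geom_bar() + labs(x = col)
  } else if (is.numeric(df[[col]])) {
    # .data[[col]] looks the column up by name inside df
    ggplot(df, aes(x = .data[[col]], fill = Species)) +
      geom_histogram(bins = 20) + labs(x = col)
  } else {
    ggplot(df, aes(x = .data[[col]], fill = Species)) + geom_bar() + labs(x = col)
  }
})

# Arrange the list of ggplots in a grid with three columns
grid.arrange(grobs = p, ncol = 3)
```

The same list can be passed to cowplot::plot_grid(plotlist = p) if you prefer cowplot.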
I have some R code to draw a plot. It looks like the image below:
Now I want to set the line colours and styles (in shades of grey), add the two horizontal lines to the legend, and remove the legend title.
This code works (with fake data):
library(ggplot2)
load ("vns.data")
load ("rw.data")
load("gibbs.data")
If you don't load the data, create it like this:
vtm1<- c(1,3,4)
vmean<-c(3.9,3.8,3)
vmax<-c(4.1,4.2,4)
vmin<-c(3,2.5,2)
rtm1<- c(1,2,4,5)
rmean<-c(3.9,3.85,3.7,3.1)
rmax<-c(4.1,4.2,4,3.9)
rmin<-c(3,2.5,2,1.9)
gtm1<- c(2,4,5)
gmean<-c(4.1,3.9,3)
gmax<-c(4.1,4,3.9)
gmin<-c(3,2.5,1.5)
vns <- data.frame(vtm1, vmean, vmax, vmin)
gibbs <- data.frame(gtm1, gmean, gmax, gmin)
rw <- data.frame(rtm1, rmean, rmax, rmin)
Then continue with:
names(vns) <- c("tm1", "mean", "max", "min")
names(gibbs) <- c("tm1", "mean", "max", "min")
names(rw) <- c("tm1", "mean", "max", "min")
vns$what <- "VNS"
gibbs$what <- "GS"
rw$what <- "RW"
DF <- do.call(rbind, list(vns, gibbs, rw))
plt <- ggplot(DF, aes(x= tm1, ymin= min, ymax= max, y=mean)) +
xlab("Number of TM lookups") +
ylab("Cross-entropy")+
geom_hline(yintercept=3.2240952381, linetype = "dotted", color="#A8A8A8")+
geom_hline(yintercept=3.44366666666667, linetype = "dotted") +
geom_ribbon(aes(fill=what), alpha=0.3) +
geom_line(aes(linetype = what))+
theme_bw( )+
scale_colour_manual(name="what", values=c("vns"="#A8A8A8", "gibbs"="#E8E8E8", "rw"="#C8C8C8")) +
scale_fill_manual(name="what", values=c("vns"="#A8A8A8", "gibbs"="#E8E8E8", "rw"="#C8C8C8"))+
scale_linetype_manual(name="what", values=c("vns"="solid", "gibbs"="dotdash", "rw"="dashed"))
print(plt)
This code works when these lines are removed:
scale_colour_manual(name="what", values=c("vns"="#A8A8A8", "gibbs"="#E8E8E8", "rw"="#C8C8C8")) +
scale_fill_manual(name="what", values=c("vns"="#A8A8A8", "gibbs"="#E8E8E8", "rw"="#C8C8C8"))+
scale_linetype_manual(name="what", values=c("vns"="solid", "gibbs"="dotdash", "rw"="dashed"))
With my real data I get the following error, even though the toy data works fine:
Error in grid.Call.graphics(L_lines, x$x, x$y, index, x$arrow) :
invalid line type
Even with the toy data, the horizontal lines still need to be added to the legend and the legend title removed. I have made my real data available here: https://dl.dropboxusercontent.com/u/13564139/vns.saved and https://dl.dropboxusercontent.com/u/13564139/rw.saved and https://dl.dropboxusercontent.com/u/13564139/gibbs.saved
Any idea what can cause this error, and why only with the real data, not the toy data? I have already rebooted as suggested in some other posts here.
In your toy data, the what column has values "vns", "gibbs", and "rw". In your real data, it is "VNS", "GS", and "RW". The mappings you have in the scale_* calls work for the toy data, but fail for the real data because the names are not the same. You would need
scale_linetype_manual(name="what", values=c("VNS"="solid", "GS"="dotdash", "RW"="dashed"))
and the same names in scale_fill_manual and scale_colour_manual.
Linetype was the one to throw an error because colour and fill can both cope with a missing value (which is what a data value gets when it has no match among the names in values), but a missing linetype is an error.
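A sketch of the corrected scales using the question's toy data, assuming ggplot2 is installed. The names in values match the what column ("VNS", "GS", "RW"), and name = NULL drops the legend title. To get the two reference lines into the legend, this sketch maps linetype inside aes() of geom_hline(); the labels "Reference 1" and "Reference 2" and the "twodash" style are hypothetical choices, not from the question.

```r
library(ggplot2)

# Toy data from the question, with 'what' set to "VNS", "GS" and "RW"
vns   <- data.frame(tm1 = c(1, 3, 4), mean = c(3.9, 3.8, 3),
                    max = c(4.1, 4.2, 4), min = c(3, 2.5, 2), what = "VNS")
rw    <- data.frame(tm1 = c(1, 2, 4, 5), mean = c(3.9, 3.85, 3.7, 3.1),
                    max = c(4.1, 4.2, 4, 3.9), min = c(3, 2.5, 2, 1.9), what = "RW")
gibbs <- data.frame(tm1 = c(2, 4, 5), mean = c(4.1, 3.9, 3),
                    max = c(4.1, 4, 3.9), min = c(3, 2.5, 1.5), what = "GS")
DF <- rbind(vns, gibbs, rw)

# Reference lines with hypothetical legend labels
refs <- data.frame(y = c(3.2240952381, 3.44366666666667),
                   ref = c("Reference 1", "Reference 2"))

plt <- ggplot(DF, aes(x = tm1, ymin = min, ymax = max, y = mean)) +
  geom_ribbon(aes(fill = what), alpha = 0.3) +
  geom_line(aes(linetype = what)) +
  # mapping linetype inside aes() gives the reference lines legend keys
  geom_hline(data = refs, aes(yintercept = y, linetype = ref)) +
  labs(x = "Number of TM lookups", y = "Cross-entropy") +
  theme_bw() +
  # names must match the values of 'what' exactly; name = NULL removes the title
  scale_fill_manual(name = NULL,
                    values = c(VNS = "#A8A8A8", GS = "#E8E8E8", RW = "#C8C8C8")) +
  scale_linetype_manual(name = NULL,
                        values = c(VNS = "solid", GS = "dotdash", RW = "dashed",
                                   "Reference 1" = "dotted", "Reference 2" = "twodash"))
print(plt)
```

Any value of what (or ref) that is missing from the names in values would map to NA, which is exactly what triggers "invalid line type".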
I am trying to produce something similar to densityplot() from the lattice package, using ggplot2 after using multiple imputation with the mice package. Here is a reproducible example:
require(mice)
dt <- nhanes
impute <- mice(dt, seed = 23109)
x11()
densityplot(impute)
Which produces:
I would like to have some more control over the output (and I am also using this as a learning exercise for ggplot). So, for the bmi variable, I tried this:
bar <- NULL
for (i in 1:impute$m) {
foo <- complete(impute,i)
foo$imp <- rep(i,nrow(foo))
foo$col <- rep("#000000",nrow(foo))
bar <- rbind(bar,foo)
}
imp <-rep(0,nrow(impute$data))
col <- rep("#D55E00", nrow(impute$data))
bar <- rbind(bar,cbind(impute$data,imp,col))
bar$imp <- as.factor(bar$imp)
x11()
ggplot(bar, aes(x=bmi, group=imp, colour=col)) + geom_density()
+ scale_fill_manual(labels=c("Observed", "Imputed"))
which produces this:
So there are several problems with it:
1. The colours are wrong. It seems my attempt to control the colours is completely wrong or ignored.
2. There are unwanted horizontal and vertical lines.
3. I would like the legend to show Imputed and Observed, but my code gives the error invalid argument to unary operator.
Moreover, it seems like quite a lot of work to do what densityplot(impute) accomplishes in one line, so I wonder whether I am going about this the wrong way entirely.
Edit: I should add a fourth problem, as noted by @ROLO:
4. The range of the plots seems to be incorrect.
The reason it is more complicated using ggplot2 is that you are using densityplot from the mice package (mice::densityplot.mids to be precise; check out its code), not from lattice itself. This function has all the functionality for plotting mids result classes from mice built in. If you tried the same using lattice::densityplot, you would find it to be at least as much work as using ggplot2.
But without further ado, here is how to do it with ggplot2:
require(reshape2)
# Obtain the imputed data, together with the original data
imp <- complete(impute,"long", include=TRUE)
# Melt into long format
imp <- melt(imp, c(".imp",".id","age"))
# Add a variable for the plot legend
imp$Imputed<-ifelse(imp$".imp"==0,"Observed","Imputed")
# Plot. Be sure to use stat_density instead of geom_density in order
# to prevent what you call "unwanted horizontal and vertical lines"
ggplot(imp, aes(x=value, group=.imp, colour=Imputed)) +
stat_density(geom = "path",position = "identity") +
facet_wrap(~variable, ncol=2, scales="free")
But as you can see, the ranges of these plots are narrower than those from densityplot. This behaviour should be controlled by the trim parameter of stat_density, but this seems not to work. After fixing the code of stat_density, I got the following plot:
Still not exactly the same as the densityplot original, but much closer.
Edit: for a true fix, we'll need to wait for the next major version of ggplot2; see GitHub.
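Another way to get ranges closer to densityplot(), which is not part of the answer above, is to compute the curves yourself. This sketch assumes that densityplot() relies on stats::density() with its default cut = 3, which extends each curve three bandwidths beyond the data, whereas stat_density evaluates densities only over the range of the data. It shows the bmi variable of nhanes with the seed from the question, assuming mice and ggplot2 are installed.

```r
library(mice)
library(ggplot2)

# Same data and seed as the question; printFlag = FALSE silences the iteration log
impute <- mice(nhanes, seed = 23109, printFlag = FALSE)

# Long format: .imp == 0 holds the observed data (with NAs), 1..m the imputations
long <- complete(impute, action = "long", include = TRUE)

# One density curve per imputation, using stats::density() defaults (cut = 3)
dens <- do.call(rbind, lapply(unique(long$.imp), function(k) {
  d <- density(long$bmi[long$.imp == k], na.rm = TRUE)
  data.frame(.imp = k, x = d$x, y = d$y)
}))
dens$Imputed <- ifelse(dens$.imp == 0, "Observed", "Imputed")

p_bmi <- ggplot(dens, aes(x = x, y = y, group = .imp, colour = Imputed)) +
  geom_line() +
  scale_colour_manual(values = c(Imputed = "#000000", Observed = "#D55E00")) +
  labs(x = "bmi", y = "Density")
print(p_bmi)
```

Because the curves are precomputed, geom_line() simply draws them, so nothing depends on the trim behaviour of stat_density.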
You could ask Hadley to add a fortify method for the mids class, or define one yourself, e.g.:
fortify.mids <- function(x){
imps <- do.call(rbind, lapply(seq_len(x$m), function(i){
data.frame(complete(x, i), Imputation = i, Imputed = "Imputed")
}))
orig <- cbind(x$data, Imputation = NA, Imputed = "Observed")
rbind(imps, orig)
}
ggplot 'fortifies' non-data.frame objects prior to plotting
ggplot(fortify.mids(impute), aes(x = bmi, colour = Imputed,
group = Imputation)) +
geom_density() +
scale_colour_manual(values = c(Imputed = "#000000", Observed = "#D55E00"))
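Note that each line of the ggplot call ends with a '+'. Otherwise R treats the line as a complete command, so the plot is printed before scale_fill_manual() is added; this is why the legend did not change. The next line, which starts with a '+', is then evaluated on its own as a unary plus, which resulted in the error.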
You can melt the result of fortify.mids to plot all variables in one graph
library(reshape)
Molten <- melt(fortify.mids(impute), id.vars = c("Imputation", "Imputed"))
ggplot(Molten, aes(x = value, colour = Imputed, group = Imputation)) +
geom_density() +
scale_colour_manual(values = c(Imputed = "#000000", Observed = "#D55E00")) +
facet_wrap(~variable, scales = "free")
I am having some trouble creating a facet grid of a back-to-back histogram created with ggplot.
# create data frame with latency values
latc_sorted <- data.frame(
subject=c(1,1,1,1,1,2,2,2,2,2),
grp=c("K_N","K_I","K_N","K_I","K_N","K_I","K_N","K_I","K_N","K_I"),
lat=c(22,45,18,55,94,11,67,22,64,44)
)
# subset and order data
x.sub_ki<-subset(latc_sorted, grp=="K_I")
x.sub_kn<-subset(latc_sorted, grp=="K_N")
x.sub_k<-rbind(x.sub_ki,x.sub_kn)
x=x.sub_ki$lat
y=x.sub_kn$lat
nm<-list("x","y")
# make absolute values on x axis
my.abs<-function(x){abs(x)}
# plot back-to-back histogram
hist_K<-qplot(x, geom="histogram", fill="inverted", binwidth=20) +
geom_histogram(data=data.frame(x=y), aes(fill="non-inverted", y=-..count..),
binwidth= 20) + scale_y_continuous(formatter='my.abs') + coord_flip() +
scale_fill_hue("variable")
hist_K
This plots fine, but if I try the following, I get this error:
Error: Casting formula contains variables not found in molten data: x.sub_k$subject
hist_K_sub<-qplot(x, geom="histogram", fill="inverted", binwidth=20) +
geom_histogram(data=data.frame(x=y), aes(fill="non-inverted", y=-..count..),
binwidth= 20) + scale_y_continuous(formatter='my.abs') + coord_flip() +
scale_fill_hue("variable")+
facet_grid(x.sub_k$subject ~ .)
hist_K_sub
Any ideas what is causing this to fail?
The problem is that the variables referenced in facet_grid are looked for in the data.frames that are passed to the various layers. You have created (implicitly and explicitly) data.frames which have only the lat data and do not have the subject information. If you use x.sub_ki and x.sub_kn instead, they do have the subject variable associated with the lat values.
hist_K_sub <-
ggplot() +
geom_histogram(data=x.sub_ki, aes(x=lat, fill="inverted", y= ..count..), binwidth=20) +
geom_histogram(data=x.sub_kn, aes(x=lat, fill="not inverted", y=-..count..), binwidth=20) +
facet_grid(subject ~ .) +
scale_y_continuous(formatter="my.abs") +
scale_fill_hue("variable") +
coord_flip()
hist_K_sub
I also converted from qplot to full ggplot syntax; that shows the parallel structure of ki and kn better.
The formatter syntax above doesn't work with newer versions of ggplot2. Drop formatter= from scale_y_continuous() and format the axis labels like this instead:
abs_format <- function() {
function(x) abs(x)
}
hist_K_sub <- hist_K_sub+ scale_y_continuous(labels=abs_format())
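For ggplot2 3.3.0 or later (which provides after_stat()), the answer's layered approach can be written in one piece as below. This is a sketch: ..count.. is replaced by after_stat(count), and the base abs function is passed directly as labels, which does the same job as abs_format().

```r
library(ggplot2)

# Data and subsets from the question
latc_sorted <- data.frame(
  subject = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  grp = c("K_N", "K_I", "K_N", "K_I", "K_N", "K_I", "K_N", "K_I", "K_N", "K_I"),
  lat = c(22, 45, 18, 55, 94, 11, 67, 22, 64, 44)
)
x.sub_ki <- subset(latc_sorted, grp == "K_I")
x.sub_kn <- subset(latc_sorted, grp == "K_N")

hist_K_sub <- ggplot() +
  # K_I counts point one way ...
  geom_histogram(data = x.sub_ki,
                 aes(x = lat, y = after_stat(count), fill = "inverted"),
                 binwidth = 20) +
  # ... and K_N counts are negated so they point the other way
  geom_histogram(data = x.sub_kn,
                 aes(x = lat, y = -after_stat(count), fill = "not inverted"),
                 binwidth = 20) +
  facet_grid(subject ~ .) +
  scale_y_continuous(labels = abs) +  # show counts without the minus sign
  scale_fill_hue("variable") +
  coord_flip()
print(hist_K_sub)
```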