I generated the histogram with the following code:
library(ggplot2)
# Load Data
file <- "SharedData.csv"
data <- read.csv(file,header = TRUE,sep = ",")
## Bin Levels
data$xLevel <- cut(data$xLevel,
breaks = quantile(data$xLevel,(0:5)/5),
labels = paste("Quant",1:5,sep = "."),
include.lowest = TRUE,ordered_result = TRUE)
# Histogram
g <- ggplot(data, aes(x=xTime,color = xLevel)) +
geom_histogram(aes(y=..density..),
binwidth=100)
g
How do I create the above histogram with an x axis that goes from 0-300 and from 1500-2400, but not include 300-1500? The unit here is military time.
Data: https://www.dropbox.com/s/e5gaym7dhefs04e/SharedData.csv?dl=0
According to https://groups.google.com/forum/#!topic/ggplot2/jSrL_FnS8kc and the question "Using ggplot2, can I insert a break in the axis?", this does not seem to be possible in a single panel. You can draw two panels instead.
First, create a new column that splits the data in two:
data$Block <- ifelse(data$xTime <=500, "A", "B")
Then plot the data with one facet per block:
library(scales)
# Histogram
g <- ggplot(data, aes(x=xTime,color = xLevel)) +
geom_histogram(aes(y=..density..),
binwidth=100) + facet_grid(.~Block, scales = "free_x")
g
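Since the linked CSV may not be available to every reader, here is a minimal sketch of the same idea on simulated data. The data frame `sim` and its values are assumptions for illustration only. `space = "free_x"` is an optional addition that makes each panel's width proportional to its range, and `after_stat(density)` is the ggplot2 >= 3.3.0 spelling of `..density..`.

```r
library(ggplot2)

set.seed(42)
# Simulated stand-in for SharedData.csv: times fall in 0-300 and 1500-2400 only
sim <- data.frame(
  xTime = c(runif(200, 0, 300), runif(600, 1500, 2400))
)
# Same split as above: block "A" for early times, "B" for late times
sim$Block <- ifelse(sim$xTime <= 500, "A", "B")

g_sim <- ggplot(sim, aes(x = xTime)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 100) +
  # With free x scales, each panel only spans its own data,
  # so the empty 300-1500 range is not drawn
  facet_grid(. ~ Block, scales = "free_x", space = "free_x")
g_sim
```

Note that the density is computed within each panel, so bar heights are not directly comparable across the two blocks.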
I am trying to create a plot that combines 2 separate legends and a grid of multiple plots. The issue is that I'm finding it difficult to align the legends so that they are visible and do not overlap. Hopefully the example below will explain what I mean.
To begin I am going to create 2 plots. In these two plots I am only interested in the legends, and I am discarding the actual plot (so please ignore the actual plots in these two plots). To get just the legend I am using the cowplot package.
library(ggplot2)
library(cowplot)
# -------------------------------------------------------------------------
# plot 1 ------------------------------------------------------------------
# create fake data
dfLegend_1 <- data.frame(x = LETTERS[1:10], y = c(1:10))
# set colours
pointColours <- c(A = "#F5736A", B = "#D58D00", C = "#A0A300",
D = "#36B300", E = "#00BC7B", F = "#00BCC2",
G = "#00ADF4", H = "#928DFF", I = "#E568F0",
J = "#808080")
# plot
ggLegend_1 <- ggplot(dfLegend_1, aes(x=x, y=y))+
geom_point(aes(fill = pointColours), shape = 22, size = 10) +
scale_fill_manual(values = unname(pointColours),
label = names(pointColours),
name = 'Variable') +
theme(legend.key.size = unit(0.5, "cm")) +
theme_void()
# get legend
legend_1 <- get_legend(ggLegend_1)
# -------------------------------------------------------------------------
# plot 2 ------------------------------------------------------------------
# Create fake data
dflegend_2 <- data.frame(
x = runif(100),
y = runif(100),
z2 = abs(rnorm(100))
)
# plot
ggLegend_2 <- ggplot(dflegend_2, aes(x=x, y = y))+
geom_point(aes(color = z2), shape = 22, size = 10) +
scale_color_gradientn(
colours = rev(colorRampPalette(c('steelblue', '#f7fcfd', 'orange'))(5)),
limits = c(0,10),
name = 'Gradient',
guide = guide_colorbar(
frame.colour = "black",
ticks.colour = "black"
))
# get legend
legend_2 <- get_legend(ggLegend_2)
Then I am creating many plots (in this example, I am creating 20 individual plots) and plotting them on a grid:
# create data
dfGrid <- data.frame(x = rnorm(10), y = rnorm(10))
# make a list of plots
plotList <- list()
for(i in 1:20){
plotList[[i]] <- ggplot(dfGrid) +
geom_ribbon(aes(x = x, ymin = min(y), ymax = 0), fill = "red", alpha = .5) +
geom_ribbon(aes(x = x, ymin = min(0), ymax = max(y)), fill = "blue", alpha = .5) +
theme_void()
}
# plot them on a grid
gridFinal <- cowplot::plot_grid(plotlist = plotList)
Finally, I am joining the two legends together and adding them to the grid of many plots:
# add legends together into one single plot
legendFinal <- plot_grid(legend_2, legend_1, ncol = 1)
# plot everything on the same plot
plot_grid(gridFinal, legendFinal, rel_widths = c(3, 1))
This results in something that looks like this:
As you can see, the legends overlap and are not very well spaced. I was wondering if there is any way to fit everything in whilst having the legends appropriately spaced and readable?
I should also note, that, in general, there can be any number of variables and any number of gridded plots.
One option to fix your issue would be to switch to patchwork to glue your plots and the legends together. In particular, I use the design argument to assign more space to the Variable legend. However, you should be aware that legends are much less flexible than plots: the size of a legend is in absolute units and will not adjust to the available space. Hence, I'm not sure whether my solution will meet your need for a "one-size-fits-all" approach.
library(patchwork)
design <-
"
ABCDEU
FGHIJV
KLMNOV
PQRSTV
"
plotList2 <- c(plotList, list(legend_2, legend_1))
wrap_plots(plotList2) +
plot_layout(design = design)
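Because the number of gridded plots can vary, the design string could also be generated rather than typed by hand. The helper below is a hypothetical sketch (`make_design` is not part of patchwork); it assumes at most 24 plots so that every area fits in a single capital letter, and it reserves one extra column for the two legends, with the first legend in the top row and the second spanning the rows below.

```r
# Hypothetical helper: build a patchwork design string for `n_plots` panels
# laid out in `ncol` columns, plus one extra column holding two legends.
make_design <- function(n_plots, ncol) {
  # two letters are reserved for the legends; at least two rows are needed
  stopifnot(n_plots + 2 <= 26, n_plots > ncol)
  nrow <- ceiling(n_plots / ncol)
  # one letter per plot, "#" for empty cells in the last row
  ids <- c(LETTERS[seq_len(n_plots)], rep("#", nrow * ncol - n_plots))
  grid <- matrix(ids, nrow = nrow, ncol = ncol, byrow = TRUE)
  # first legend in row 1, second legend spanning the remaining rows
  legend_col <- c(LETTERS[n_plots + 1], rep(LETTERS[n_plots + 2], nrow - 1))
  paste(apply(cbind(grid, legend_col), 1, paste, collapse = ""),
        collapse = "\n")
}

cat(make_design(20, 5))
```

For 20 plots in 5 columns this reproduces the hand-written layout above, so it could be passed on as `plot_layout(design = make_design(20, 5))`.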
I'm having trouble displaying multiple graphs on the same page. I have a data frame with 18 numerical columns. For each column, I need to show its histogram and boxplot on the same page in a 4*9 grid. The following is what I tried, but I also need to show the boxplots, using a for loop if possible. Can someone please help me do it?
library(gridExtra)
library(ggplot2)
p <- list()
for(i in 1:18){
x <- my_data[,i]
p[[i]] <- ggplot(gather(x), aes(value)) +
geom_histogram(bins = 10) +
facet_wrap(~key, scales = 'free_x')
}
do.call(grid.arrange,p)
I received the following graph.
When I try the following, I get the graphs on separate pages:
library(dplyr)
dat2 <- my_data %>% mutate_all(scale)
# Boxplot from the R trees dataset
boxplot(dat2, col = rainbow(ncol(dat2)))
par(mfrow = c(2, 2)) # Set up a 2 x 2 plotting space
# Create the loop.vector (all the columns)
loop.vector <- 1:4
p <- list()
for (i in loop.vector) { # Loop over loop.vector
# store data in column.i as x
x <- my_data[,i]
# Plot histogram of x
p[[i]] <-hist(x,
main = paste("Question", i),
xlab = "Scores",
xlim = c(0, 100))
plot_grid(p, label_size = 12)
}
You can assemble the base R boxplot and the ggplot object generated with facet_wrap together using the R package patchwork:
library(ggplot2)
library(patchwork)
p <- ggplot(mtcars, aes(x = mpg)) +
geom_histogram() +
facet_wrap(~gear)
wrap_elements(~boxplot(split(mtcars$mpg, mtcars$gear))) / p
ggsave('test.png', width = 6, height = 8, units = 'in')
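For the loop-based version asked about in the question, one possible sketch is to build a histogram and a boxplot per column as ggplot objects and hand the whole list to patchwork. Here the first four columns of `mtcars` stand in for `my_data` (an assumption; the real data has 18 columns, which would give 36 panels in a 4-wide, 9-tall grid).

```r
library(ggplot2)
library(patchwork)

# Stand-in for the 18-column data frame from the question
my_data <- mtcars[, 1:4]

plots <- list()
for (nm in names(my_data)) {
  # histogram of the current column
  plots[[paste0(nm, "_hist")]] <- ggplot(my_data, aes(x = .data[[nm]])) +
    geom_histogram(bins = 10) +
    labs(x = nm)
  # boxplot of the same column, placed right after its histogram
  plots[[paste0(nm, "_box")]] <- ggplot(my_data, aes(y = .data[[nm]])) +
    geom_boxplot() +
    labs(y = nm)
}

# All panels on one page, four per row
combined <- wrap_plots(plots, ncol = 4)
combined
```

Each histogram sits next to its boxplot because the list alternates between the two plot types.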
Hi there: I need to plot a factor with 81 different categories with different frequency counts each. Each factor name is a 4-letter category. It looks like this. As you can see, it is pretty tough to read the factor labels. I'd like to stagger the y-axis according to this suggestion. However, this issue on github suggests that something has changed in ggplot2 and that the hjust and vjust options no longer work. Does anyone have any suggestions to make this plot look better, in particular to make the factor levels readable.
#libraries
# install.packages('stringi')
library(ggplot2)
library(stringi)
#fake data
var<-stri_rand_strings(81, 4, pattern='[HrhEgeIdiFtf]')
var1<-rnorm(81, mean=175, sd=75)
#data frame
out<-data.frame(var, var1)
#set levels for plotting
out$var<-factor(out$var, levels=out$var[order(out$var1, decreasing=FALSE)])
#PLot
out.plot<-ggplot(out, aes(x=var, y=var1))+geom_point()+coord_flip()
#Add staggered axis option
out.plot+theme(axis.text.y = element_text(hjust = grid::unit(c(-2, 0, 2), "points")))
To stagger the labels, you could add spaces to the labels in the dataframe.
# Libraries
library(ggplot2)
library(stringi)
# fake data
set.seed(12345)
var <- stri_rand_strings(81, 4, pattern = '[HrhEgeIdiFtf]')
var1 <- rnorm(81, mean = 175, sd = 75)
out <- data.frame(var, var1)
# Add spacing, and set levels for plotting
out = out[order(out$var1), ]
# pad the labels with 0, 6, and 12 trailing spaces in turn (widths are adjustable)
out$var = paste0(out$var, strrep(" ", c(0, 6, 12)))
out$var <- factor(out$var, levels = out$var[order(out$var1, decreasing = FALSE)])
# Plot
out.plot <- ggplot(out, aes(x = var, y = var1)) +
geom_point() + coord_flip()
out.plot
Alternatively, draw the original plot, then edit. Here, I use the grid function, editGrob() to do the editing.
# Libraries
library(ggplot2)
library(gtable)
library(grid)
library(stringi)
# fake data
set.seed(12345)
var <- stri_rand_strings(81, 4, pattern = '[HrhEgeIdiFtf]')
var1 <- rnorm(81, mean = 175, sd = 75)
out <- data.frame(var, var1)
# Set levels for plotting
out$var <- factor(out$var, levels = out$var[order(out$var1, decreasing = FALSE)])
# Plot
out.plot <- ggplot(out, aes(x = var, y = var1)) +
geom_point() + coord_flip()
# Get the ggplot grob
g = ggplotGrob(out.plot)
# Get a hierarchical list of component grobs
grid.ls(grid.force(g))
Look through the list to find the section referring to the left axis. The relevant bit is:
axis-l.6-3-6-3
axis.line.y..zeroGrob.232
axis
axis.1-1-1-1
GRID.text.229
axis.1-2-1-2
You will need to set up a path from 'axis-l', through 'axis', through 'axis', through to 'GRID.text'.
# make the relevant column a little wider
g$widths[3] = unit(2.5, "cm")
# The edit
g = editGrob(grid.force(g),
gPath("axis-l", "axis", "axis", "GRID.text"),
x = unit(c(-1, 0, 1), "npc"),
grep = TRUE)
# Draw the plot
grid.newpage()
grid.draw(g)
Another option is to find your way through the structure to the relevant grob to make the edit.
# Get the grob
g <- ggplotGrob(out.plot)
# Get the y axis
index <- which(g$layout$name == "axis-l") # Which grob
yaxis <- g$grobs[[index]]
# Get the ticks (labels and marks)
ticks <- yaxis$children[[2]]
# Get the labels
ticksL <- ticks$grobs[[1]]
# Make the edit
ticksL$children[[1]]$x <- rep(unit.c(unit(c(1,0,-1),"npc")), 27)
# Put the edited labels back into the plot
ticks$grobs[[1]] <- ticksL
yaxis$children[[2]] <- ticks
g$grobs[[index]] <- yaxis
# Make the relevant column a little wider
g$widths[3] <- unit(2.5, "cm")
# Draw the plot
grid.newpage()
grid.draw(g)
Sandy mentions adding spaces to the labels.
With a discrete axis, you can also simply add line breaks to alternate cases. In my case I wanted to stagger alternate ones:
scale_x_discrete(labels=paste0(c("","\n"),net_change$TZ_t))
Where net_change$TZ_t is my ordered factor. It extends to 'triple' levels easily with c("","\n","\n\n").
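A self-contained sketch of the line-break idea follows. The data frame `df` and the helper `stagger_labels` are made up for illustration; passing a function to `labels` avoids depending on the order of the factor levels. As an alternative, ggplot2 3.3.0 and later provide `guide_axis(n.dodge = ...)`, which dodges the labels without changing their text.

```r
library(ggplot2)

# Hypothetical helper: prefix every second label with a line break
stagger_labels <- function(x) paste0(rep_len(c("", "\n"), length(x)), x)

# Toy data standing in for net_change
df <- data.frame(TZ_t = factor(c("AAAA", "BBBB", "CCCC", "DDDD")), n = 1:4)

# Stagger by inserting line breaks into alternate labels
p_break <- ggplot(df, aes(x = TZ_t, y = n)) +
  geom_point() +
  scale_x_discrete(labels = stagger_labels)

# Alternative (ggplot2 >= 3.3.0): let the axis guide dodge the labels
p_dodge <- ggplot(df, aes(x = TZ_t, y = n)) +
  geom_point() +
  scale_x_discrete(guide = guide_axis(n.dodge = 2))

p_break
```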
I'm trying to write a custom scatterplot matrix function in ggplot2 using facet_grid. My data have two categorical variables and one numeric variable.
I'd like to facet (make the scatterplot rows/cols) according to one of the categorical variables and change the plotting symbol according to the other categorical.
I do so by first constructing a larger dataset that includes all combinations (combs) of the categorical variable from which I'm creating the scatterplot panels.
My questions are:
How to use geom_rect to white-out the diagonal and upper panels in facet_grid (I can only make the middle ones black so far)?
How can you move the titles of the facets to the bottom and left hand sides respectively?
How does one remove tick axes and labels for the top left and bottom right facets?
Thanks in advance.
require(ggplot2)
# Data
nC <- 5
nM <- 4
dat <- data.frame(
Control = rep(LETTERS[1:nC], nM),
measure = rep(letters[1:nM], each = nC),
value = runif(nC*nM))
# Change factors to characters
dat <- within(dat, {
Control <- as.character(Control)
measure <- as.character(measure)
})
# Check, lapply(dat, class)
# Define scatterplot() function
scatterplotmatrix <- function(data,...){
controls <- with(data, unique(Control))
measures <- with(data, unique(measure))
combs <- expand.grid(1:length(controls), 1:length(measures), 1:length(measures))
# Add columns for values
combs$value1 = 1
combs$value2 = 0
for ( i in 1:NROW(combs)){
combs[i, "value1"] <- subset(data, subset = Control==controls[combs[i,1]] & measure == measures[combs[i,2]], select = value)
combs[i, "value2"] <- subset(data, subset = Control==controls[combs[i,1]] & measure == measures[combs[i,3]], select = value)
}
for ( i in 1:NROW(combs)){
combs[i,"Control"] <- controls[combs[i,1]]
combs[i,"Measure1"] <- measures[combs[i,2]]
combs[i,"Measure2"] <- measures[combs[i,3]]
}
# Final pairs plot
plt <- ggplot(combs, aes(x = value1, y = value2, shape = Control)) +
geom_point(size = 8, colour = "#F8766D") +
facet_grid(Measure2 ~ Measure1) +
ylab("") +
xlab("") +
scale_x_continuous(breaks = c(0,0.5,1), labels = c("0", "0.5", "1"), limits = c(-0.05, 1.05)) +
scale_y_continuous(breaks = c(0,0.5,1), labels = c("0", "0.5", "1"), limits = c(-0.05, 1.05)) +
geom_rect(data = subset(combs, subset = Measure1 == Measure2), colour='white', xmin = -Inf, xmax = Inf,ymin = -Inf,ymax = Inf)
return(plt)
}
# Call
plt1 <- scatterplotmatrix(dat)
plt1
I'm not aware of a way to move the panel strips (the labels) to the bottom or left. Also, it's not possible to format the individual panels separately (e.g., turn off the tick marks for just one facet). So if you really need these features, you will probably have to use something other than, or in addition to ggplot. You should really look into GGally, although I've never had much success with it.
As far as leaving some of the panels blank, here is a way.
nC <- 5; nM <- 4
set.seed(1) # for reproducible example
dat <- data.frame(Control = rep(LETTERS[1:nC], nM),
measure = rep(letters[1:nM], each = nC),
value = runif(nC*nM))
scatterplotmatrix <- function(data,...){
require(ggplot2)
require(data.table)
require(plyr) # for .(...)
DT <- data.table(data,key="Control")
gg <- DT[DT,allow.cartesian=T]
setnames(gg,c("Control","H","x","V","y"))
fmt <- function(x) format(x,nsmall=1)
plt <- ggplot(gg, aes(x,y,shape = Control)) +
geom_point(subset=.(as.numeric(H)<as.numeric(V)),size=5, colour="#F8766D") +
facet_grid(V ~ H) +
ylab("") + xlab("") +
scale_x_continuous(breaks=c(0,0.5,1), labels=fmt, limits=c(-0.05, 1.05)) +
scale_y_continuous(breaks=c(0,0.5,1), labels=fmt, limits=c(-0.05, 1.05))
return(plt)
}
scatterplotmatrix(dat)
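The main feature of this is the use of subset=.(as.numeric(H)<as.numeric(V)) in the call to geom_point(...). This subsets the dataset so you only get a point layer when the condition is met, i.e. in facets where as.numeric(H)<as.numeric(V). This works because I've left the H and V columns as factors, and as.numeric(...) applied to a factor returns the integer level codes, not the labels. Note that this relies on data.frame() converting strings to factors (the default before R 4.0.0) and on the subset argument to layers, which is no longer supported in ggplot2 2.0.0 and later.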
The rest is just a more compact (and much faster) way of creating what you called comb.
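For current versions of R and ggplot2, the same lower-triangle idea can be sketched by filtering the data passed to the layer instead of using `subset`. This is an adaptation under stated assumptions, not the code above: it uses base `merge()` for the self-join, makes `measure` a factor explicitly, and uses `facet_grid(switch = "both")` (available since ggplot2 2.0.0) to move the strips to the bottom and left.

```r
library(ggplot2)

set.seed(1)  # for a reproducible example
nC <- 5; nM <- 4
dat <- data.frame(Control = rep(LETTERS[1:nC], nM),
                  measure = factor(rep(letters[1:nM], each = nC)),
                  value   = runif(nC * nM))

# Self-join on Control: every pair of measures within each control
gg <- merge(dat, dat, by = "Control", suffixes = c(".h", ".v"))
names(gg) <- c("Control", "H", "x", "V", "y")

# Keep only the pairs that belong below the diagonal of the facet grid
lower <- gg[as.integer(gg$H) < as.integer(gg$V), ]

plt <- ggplot(gg, aes(x, y, shape = Control)) +
  geom_blank() +                                    # keeps all 16 panels
  geom_point(data = lower, size = 5, colour = "#F8766D") +
  facet_grid(V ~ H, switch = "both") +              # strips at bottom and left
  labs(x = NULL, y = NULL)
plt
```

The diagonal and upper panels stay empty because no points are supplied for them, while `geom_blank()` on the full data keeps the grid complete.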
I'm running an R script that generates plots of a PCA using FactoMineR.
I'd like to output the coordinates for the generated PCA plots, but I'm having trouble finding the right ones. I found result1$ind$coord and result1$var$coord, but neither looks like the default plot.
I found
http://www.statistik.tuwien.ac.at/public/filz/students/seminar/ws1011/hoffmann_ausarbeitung.pdf
and
http://factominer.free.fr/classical-methods/principal-components-analysis.html
but neither describes the contents of the object created by PCA().
library(FactoMineR)
data1 <- read.table(file=args[1], sep='\t', header=T, row.names=1)
result1 <- PCA(data1,ncp = 4, graph=TRUE) # graphs generated automatically
plot(result1)
I found that $ind$coord[,1] and $ind$coord[,2] are the first two pca coords in the PCA object. Here's a worked example that includes a few other things you might want to do with the PCA output...
# Plotting the output of FactoMineR's PCA using ggplot2
#
# load libraries
library(FactoMineR)
library(ggplot2)
library(scales)
library(grid)
library(plyr)
library(gridExtra)
#
# start with a clean slate
rm(list=ls(all=TRUE))
#
# load example data
data(decathlon)
#
# compute PCA
res.pca <- PCA(decathlon, quanti.sup = 11:12, quali.sup=13, graph = FALSE)
#
# extract some parts for plotting
PC1 <- res.pca$ind$coord[,1]
PC2 <- res.pca$ind$coord[,2]
labs <- rownames(res.pca$ind$coord)
PCs <- data.frame(cbind(PC1,PC2))
rownames(PCs) <- labs
#
# Just showing the individual samples...
ggplot(PCs, aes(PC1,PC2, label=rownames(PCs))) +
geom_text()
# Now get supplementary categorical variables
cPC1 <- res.pca$quali.sup$coor[,1]
cPC2 <- res.pca$quali.sup$coor[,2]
clabs <- rownames(res.pca$quali.sup$coor)
cPCs <- data.frame(cbind(cPC1,cPC2))
rownames(cPCs) <- clabs
colnames(cPCs) <- colnames(PCs)
#
# Put samples and categorical variables (ie. grouping
# of samples) all together
# set the complete theme first so it does not overwrite aspect.ratio
p <- ggplot() + theme_bw(base_size = 20) + theme(aspect.ratio=1)
# no data so there's nothing to plot...
# add on data
p <- p + geom_text(data=PCs, aes(x=PC1,y=PC2,label=rownames(PCs)), size=4)
p <- p + geom_text(data=cPCs, aes(x=cPC1,y=cPC2,label=rownames(cPCs)),size=10)
p # show plot with both layers
# Now extract the variables
#
vPC1 <- res.pca$var$coord[,1]
vPC2 <- res.pca$var$coord[,2]
vlabs <- rownames(res.pca$var$coord)
vPCs <- data.frame(cbind(vPC1,vPC2))
rownames(vPCs) <- vlabs
colnames(vPCs) <- colnames(PCs)
#
# and plot them
#
# set the complete theme first so it does not overwrite aspect.ratio
pv <- ggplot() + theme_bw(base_size = 20) + theme(aspect.ratio=1)
# no data so there's nothing to plot
# put a faint circle there, as is customary
angle <- seq(-pi, pi, length = 50)
df <- data.frame(x = sin(angle), y = cos(angle))
pv <- pv + geom_path(aes(x, y), data = df, colour="grey70")
#
# add on arrows and variable labels
pv <- pv + geom_text(data=vPCs, aes(x=vPC1,y=vPC2,label=rownames(vPCs)), size=4) + xlab("PC1") + ylab("PC2")
pv <- pv + geom_segment(data=vPCs, aes(x = 0, y = 0, xend = vPC1*0.9, yend = vPC2*0.9), arrow = arrow(length = unit(1/2, 'picas')), color = "grey30")
pv # show plot
# Now put them side by side in a single image
#
grid.arrange(p,pv,nrow=1)
#
# Now they can be saved or exported...
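To address the original request of writing the coordinates out, a minimal sketch is shown below. It assumes the same decathlon example; the output path is a temporary file chosen for illustration, and only the first two dimensions are kept.

```r
library(FactoMineR)

data(decathlon)
res.pca <- PCA(decathlon, quanti.sup = 11:12, quali.sup = 13, graph = FALSE)

# Coordinates of the individuals on the first two components
ind_coords <- as.data.frame(res.pca$ind$coord[, 1:2])
# Coordinates of the active variables on the first two components
var_coords <- as.data.frame(res.pca$var$coord[, 1:2])

# Write the individual coordinates to a CSV file (row names = athlete labels)
out_file <- tempfile(fileext = ".csv")
write.csv(ind_coords, out_file)
head(ind_coords)
```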
Adding something extra to Ben's answer. You'll note in the first chart in Ben's response that the labels overlap somewhat. The pointLabel() function in the maptools package attempts to find locations for the labels without overlap. It's not perfect, but you can adjust the positions in the new dataframe (see below) to fine tune if you want. (Also, when you load maptools you get a note about gpclibPermit(). You can ignore it if you're concerned about the restricted licence). The first part of the script below is Ben's script.
# load libraries
library(FactoMineR)
library(ggplot2)
library(scales)
library(grid)
library(plyr)
library(gridExtra)
#
# start with a clean slate
# rm(list=ls(all=TRUE))
#
# load example data
data(decathlon)
#
# compute PCA
res.pca <- PCA(decathlon, quanti.sup = 11:12, quali.sup=13, graph = FALSE)
#
# extract some parts for plotting
PC1 <- res.pca$ind$coord[,1]
PC2 <- res.pca$ind$coord[,2]
labs <- rownames(res.pca$ind$coord)
PCs <- data.frame(cbind(PC1,PC2))
rownames(PCs) <- labs
#
# Now, the code to produce Ben's first chart but with less overlap of the labels.
library(maptools)
PCs$label=rownames(PCs)
# Base plot first for pointLabels() to get locations
plot(PCs$PC1, PCs$PC2, pch = 20, col = "red")
new = pointLabel(PCs$PC1, PCs$PC2, PCs$label, cex = .7)
new = as.data.frame(new)
new$label = PCs$label
# Then plot using ggplot2
(p = ggplot(data = PCs) +
geom_hline(yintercept = 0, linetype = 3, colour = "grey20") +
geom_vline(xintercept = 0, linetype = 3, colour = "grey20") +
geom_point(aes(PC1, PC2), shape = 20, col = "red") +
theme_bw())
(p = p + geom_text(data = new, aes(x, y, label = label), size = 3))
The result is:
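As far as I know, maptools has since been retired from CRAN, so the script above may no longer install cleanly. A comparable result can be sketched with the ggrepel package, which is a substitute technique and not what the answer above uses: `geom_text_repel()` moves the labels apart inside ggplot2 itself, so no base plot is needed first.

```r
library(FactoMineR)
library(ggplot2)
library(ggrepel)

data(decathlon)
res.pca <- PCA(decathlon, quanti.sup = 11:12, quali.sup = 13, graph = FALSE)

# Individual coordinates on the first two components, plus their labels
PCs <- data.frame(PC1 = res.pca$ind$coord[, 1],
                  PC2 = res.pca$ind$coord[, 2])
PCs$label <- rownames(res.pca$ind$coord)

p_repel <- ggplot(PCs, aes(PC1, PC2, label = label)) +
  geom_hline(yintercept = 0, linetype = 3, colour = "grey20") +
  geom_vline(xintercept = 0, linetype = 3, colour = "grey20") +
  geom_point(shape = 20, colour = "red") +
  geom_text_repel(size = 3) +   # labels are nudged to reduce overlap
  theme_bw()
p_repel
```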
An alternative is to use the biplot function from base R (the stats package) or biplot.psych from the psych package. This will put the components and the data onto the same figure.
For the decathlon data set, use principal and biplot from the psych package:
library(FactoMineR) #needed to get the example data
library(psych) #needed for principal
data(decathlon) #the data set
pc2 <- principal(decathlon[1:10],2) #just the first 10 columns
biplot(pc2,labels = rownames(decathlon),cex=.5, main="Biplot of Decathlon results")
#this is a call to biplot.psych which in turn calls biplot.
#adjust the cex parameter to change the type size of the labels.
This looks like:
A biplot: http://personality-project.org/r/images/olympic.biplot.pdf
Bill