ggplot only printing gray boxes to file - r

So, Stack Overflow nailed getting my graph to work, but now I can't get it to print! The end goal is to automate updating these plots, so the ggplot and print calls need to be inside a function. When I run this code, each file just contains a gray square.
toyfn <- function(plotdata){
  library(ggplot2)
  plotS1 <- ggplot(plotdata)
  plotS1 + geom_bar(aes(x=year,y=value,factor=variable,fill=variable,
                        order=-as.numeric(variable)), stat="identity") +
    geom_line(data=linedata, aes(x=year,y=production))
  ggsave('testprint.png',plotS1)
  png(filename='testprint2.png')
  print(plotS1)
  dev.off()
}
library(ggplot2)
library(reshape)
# First let's make a toy dataset for our stacked plot/line plot example.
year = c(1,2,3,4,5,6)
stocks = c(2,4,3,2,4,3)
exports = stocks*2
domestic = stocks*3
production = c(15,16,15,16,15,16)
# Make 2 df's: alldata is for stacked bar chart, linedata is for plotting a line on top of it.
alldata = data.frame(year,stocks,exports,domestic)
linedata = data.frame(year,production)
# Make alldata 'long' for the stacking
melteddata = melt(alldata,id.vars="year")
toyfn(melteddata)

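You are saving a plot with no geoms. Inside a function, plotS1 + geom_bar(...) + geom_line(...) builds a new plot object that is neither assigned nor printed, so plotS1 itself still has no layers, and both ggsave() and print() render an empty gray panel. (At the top level the same expression auto-prints, which is why the plot with geoms showed up on screen there.)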
Try this:
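toyfn <- function(plotdata){
  # Build the complete plot in one expression and assign it,
  # so plotS1 actually contains the geoms
  plotS1 <- ggplot(plotdata, aes(year, value, factor = variable, fill = variable)) +
    geom_bar(stat="identity", aes(order = -as.numeric(variable))) +
    # linedata has no 'variable' column, so don't inherit the global aes
    geom_line(data=linedata, aes(x=year,y=production), inherit.aes = FALSE)
  ggsave('testprint.png', plot = plotS1)
}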
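A minimal sketch of the difference, assuming a current ggplot2 (geom_col() stands in for geom_bar(stat="identity")). The toy values, the buggy() helper, and the extra linedata/file arguments of toyfn are hypothetical additions for illustration, not part of the original answer:

```r
library(ggplot2)

# Hypothetical toy data shaped like melteddata and linedata in the question
plotdata <- data.frame(
  year = rep(1:3, times = 2),
  variable = factor(rep(c("stocks", "exports"), each = 3)),
  value = c(2, 4, 3, 4, 8, 6)
)
linedata <- data.frame(year = 1:3, production = c(15, 16, 15))

# Buggy pattern: inside a function, `p + geom_col(...)` builds a new plot
# that is neither stored nor printed, so `p` still has no layers
buggy <- function(plotdata) {
  p <- ggplot(plotdata)
  p + geom_col(aes(x = year, y = value, fill = variable))
  p
}

# Fixed pattern: keep the sum, and pass linedata in instead of reading a global
toyfn <- function(plotdata, linedata, file) {
  p <- ggplot(plotdata) +
    geom_col(aes(x = year, y = value, fill = variable)) +
    geom_line(data = linedata, aes(x = year, y = production))
  ggsave(file, plot = p, width = 5, height = 4, dpi = 100)
  invisible(p)
}

out <- tempfile(fileext = ".png")
p_good <- toyfn(plotdata, linedata, out)
length(buggy(plotdata)$layers)  # 0 layers: an empty gray panel
length(p_good$layers)           # 2 layers: bars plus line
```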

Related

Something wrong with my segmented bar plot in ggplot2

I want to plot a segmented bar plot in ggplot2. Here is part of my data frame. I want to plot the proportion of output (0 and 1) for each value of x1 (0 and 1). But when I use the following code, all I get is black bars without any segmentation. What's the problem here?
fig = ggplot(data=df, mapping=aes(x=x1, fill=output)) + geom_bar(stat="count", width=0.5, position='fill')
The output plot is here
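You need factor variables for your task: a numeric fill such as output gets a continuous color scale and does not split the bars into groups, so convert both columns to factors: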
library(ggplot2)
df <- data.frame(x1=sample(0:1,100,replace = TRUE),output=sample(0:1,100,replace = TRUE))
ggplot(data = df, aes(x = as.factor(x1), fill = as.factor(output))) +
  geom_bar(stat = "count") +
  labs(x="x1")
which gives me:
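Since the question asks for proportions rather than counts, here is a sketch that adds position = "fill" (as in the question's own code) to the factor fix, assuming random data like the answer's. prop.table() is used only to cross-check the segment heights:

```r
library(ggplot2)

set.seed(1)  # hypothetical random data, as in the answer above
df <- data.frame(x1 = sample(0:1, 100, replace = TRUE),
                 output = sample(0:1, 100, replace = TRUE))

# Factors give a discrete fill scale, so each bar is split by output;
# position = "fill" rescales every bar to a total height of 1
p <- ggplot(df, aes(x = factor(x1), fill = factor(output))) +
  geom_bar(width = 0.5, position = "fill") +
  labs(x = "x1", y = "proportion", fill = "output")

# Cross-check: the segment heights should match these row proportions
props <- prop.table(table(df$x1, df$output), margin = 1)
print(props)
```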

Line plot with bars in secondary axis with different scales in ggplot2

I'm trying to plot a line graph (data points between 0 and 2.5, at intervals of 0.5). I also want to plot some bars in the same chart against the right-hand axis (between 0 and 60, at intervals of 10). I'm making some mistake in my code, because the bars get plotted against the left-hand axis.
Here's some sample data and code:
Month <- c("J","F","M","A")
Line <- c(2.5,2,0.5,3.4)
Bar <- c(30,33,21,40)
df <- data.frame(Month,Line,Bar)
ggplot(df, aes(x=Month)) +
geom_line(aes(y = Line,group = 1)) +
geom_col(aes(y=Bar))+
scale_y_continuous("Line",
sec.axis = sec_axis(trans= ~. /50, name = "Bar"))
Here's the output
Thanks in advance.
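Try this approach with a scaling factor. A secondary axis in ggplot2 is only a transformed relabeling of the primary axis; it does not rescale the data drawn against it. So it is better to compute a scaling factor between your variables, multiply the bar values by it, and use its inverse for the second y-axis. I have made slight changes to your code: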
library(tidyverse)
#Data
Month <- c("J","F","M","A")
Line <- c(2.5,2,0.5,3.4)
Bar <- c(30,33,21,40)
df <- data.frame(Month,Line,Bar)
#Scale factor
sfactor <- max(df$Line)/max(df$Bar)
#Plot
ggplot(df, aes(x=Month)) +
geom_line(aes(y = Line,group = 1)) +
geom_col(aes(y=Bar*sfactor))+
scale_y_continuous("Line",
sec.axis = sec_axis(trans= ~. /sfactor, name = "Bar"))
Output:
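A sketch of the arithmetic behind the scaling factor, using the question's data: the bars are drawn multiplied by sfactor, and the secondary axis divides by sfactor to show the original Bar values again. This version draws the bars before the line so the line stays visible, and passes the transformation to sec_axis() positionally, because the argument was renamed from trans to transform in ggplot2 3.5.0 (worth checking against your installed version):

```r
library(ggplot2)

df <- data.frame(Month = c("J", "F", "M", "A"),
                 Line = c(2.5, 2, 0.5, 3.4),
                 Bar = c(30, 33, 21, 40))

sfactor <- max(df$Line) / max(df$Bar)  # 3.4 / 40 = 0.085
scaled_bar <- df$Bar * sfactor         # bar heights expressed in Line units
recovered <- scaled_bar / sfactor      # what the secondary axis labels show

p <- ggplot(df, aes(x = Month)) +
  geom_col(aes(y = Bar * sfactor)) +
  geom_line(aes(y = Line, group = 1)) +
  # The first argument of sec_axis() is the transformation formula
  scale_y_continuous("Line", sec.axis = sec_axis(~ . / sfactor, name = "Bar"))
```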

How to plot histograms of raw data on the margins of a plot of interpolated data

I would like to show, in the same plot, interpolated data and a histogram of the raw data of each predictor. In other threads, like this one, people explain how to draw marginal histograms of the same data shown in a scatter plot. In my case, however, the histogram is based on other data (the raw data).
Suppose we see how price is related to carat and table in the diamonds dataset:
library(ggplot2)
p = ggplot(diamonds, aes(x = carat, y = table, color = price)) + geom_point()
We can add a marginal frequency plot, e.g. with ggMarginal:
library(ggExtra)
ggMarginal(p)
How do we add something similar to a tile plot of predicted diamond prices?
library(mgcv)
model = gam(price ~ s(table, carat), data = diamonds)
newdat = expand.grid(seq(55,75, 5), c(1:4))
names(newdat) = c("table", "carat")
newdat$predicted_price = predict(model, newdat)
ggplot(newdat,aes(x = carat, y = table, fill = predicted_price)) +
geom_tile()
Ideally, the histograms would extend even beyond the margins of the tile plot, since those data points also influence the predictions. I would, however, already be very happy to know how to plot a histogram for the range shown in the tile plot. (Maybe the values outside the range could just be added to the extreme values in a different color.)
PS. I managed to more or less align histograms with the sides of a tile plot, using the method of the accepted answer in the linked thread, but only if I removed all kinds of labels. It would be particularly good to keep the color legend, if possible.
EDIT:
eipi10 provided an excellent solution. I tried to modify it slightly to add the sample size as numbers and to show values outside the plotted range graphically, since they also affect the interpolated values.
I intended to include them in a different color in the side histograms, counting them towards the lower and upper ends of the plotted range. I also tried to print the sample size as numbers somewhere on the plot. However, I failed at both.
This was my attempt to graphically illustrate the sample size beyond the plotted area:
plot_data = diamonds
plot_data <- transform(plot_data, carat_range = ifelse(carat < 1 | carat > 4, "outside", "within"))
plot_data <- within(plot_data, carat[carat < 1] <- 1)
plot_data <- within(plot_data, carat[carat > 4] <- 4)
plot_data$carat_range = as.factor(plot_data$carat_range)
p2 = ggplot(plot_data, aes(carat, fill = carat_range)) +
geom_histogram() +
thm +
coord_cartesian(xlim=xrng)
I tried to add the sample size as numbers with geom_text. I tried fitting it into the far-right panel, but it was difficult (impossible, for me) to adjust. I tried to put it on the main graph (which would probably not be the best solution anyway), but that didn't work either: it removed the histogram and legend on the right side, and it did not plot all the geom_texts. I also tried adding a third row of plots and writing it there. My attempt:
n_table_above = nrow(subset(diamonds, table > 75))
n_table_below = nrow(subset(diamonds, table < 55))
n_table_within = nrow(subset(diamonds, table >= 55 & table <= 75))
text_p = ggplot()+
geom_text(aes(x = 0.9, y = 2, label = paste0("N(>75) = ", n_table_above)))+
geom_text(aes(x = 1, y = 2, label = paste0("N = ", n_table_within)))+
geom_text(aes(x = 1.1, y = 2, label = paste0("N(<55) = ", n_table_below)))+
thm
library(egg)
pobj = ggarrange(p2, ggplot(), p1, p3,
ncol=2, widths=c(4,1), heights=c(1,4))
grid.arrange(pobj, leg, text_p, ggplot(), widths=c(6,1), heights =c(6,1))
I would be very happy to receive help on either or both tasks (adding sample size as text & adding values outside plotted range in a different color).
Based on your comment, maybe the best approach is to roll your own layout. Below is an example. We create the marginal plots as separate ggplot objects and lay them out with the main plot. We also extract the legend and put it outside the marginal plots.
Set-up
library(ggplot2)
library(cowplot)
# Function to extract legend
#https://github.com/hadley/ggplot2/wiki/Share-a-legend-between-two-ggplot2-graphs
g_legend<-function(a.gplot){
tmp <- ggplot_gtable(ggplot_build(a.gplot))
leg <- which(sapply(tmp$grobs, function(x) x$name) == "guide-box")
legend <- tmp$grobs[[leg]]
return(legend) }
thm = list(theme_void(),
           guides(fill="none"),
           theme(plot.margin=unit(rep(0,4), "lines")))
xrng = c(0.6,4.4)
yrng = c(53,77)
Plots
p1 = ggplot(newdat, aes(x = carat, y = table, fill = predicted_price)) +
geom_tile() +
theme_classic() +
coord_cartesian(xlim=xrng, ylim=yrng)
leg = g_legend(p1)
p1 = p1 + thm[-1]
p2 = ggplot(diamonds, aes(carat)) +
geom_line(stat="density") +
thm +
coord_cartesian(xlim=xrng)
p3 = ggplot(diamonds, aes(table)) +
geom_line(stat="density") +
thm +
coord_flip(xlim=yrng)
plot_grid(
plot_grid(plotlist=list(p2, ggplot(), p1, p3), ncol=2,
rel_widths=c(4,1), rel_heights=c(1,4), align="hv", scale=1.1),
leg, rel_widths=c(5,1))
UPDATE: Regarding your comment about the space between the plots: This is an Achilles heel of plot_grid and I don't know if there's a way to fix it. Another option is ggarrange from the experimental egg package, which doesn't add so much space between plots. Also, you need to save the output of ggarrange first and then lay out the saved object with the legend. If you run ggarrange inside grid.arrange you get two overlapping copies of the plot:
# devtools::install_github('baptiste/egg')
library(egg)
pobj = ggarrange(p2, ggplot(), p1, p3,
ncol=2, widths=c(4,1), heights=c(1,4))
grid.arrange(pobj, leg, widths=c(6,1))
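For the two tasks in the EDIT, one possible sketch (not eipi10's method) is to clamp the out-of-range values onto the edges of the plotted range with pmin()/pmax(), color them by a hypothetical where column, and put the sample sizes inside the panel with annotate(). It assumes the 55 to 75 range of table used for newdat; to drop it into the layout above you would still add thm and coord_flip(xlim = yrng), keeping in mind that thm hides the fill legend:

```r
library(ggplot2)

rng <- c(55, 75)  # plotted range of `table`, as used for newdat

d <- data.frame(table = diamonds$table)
# Label raw values by where they fall relative to the plotted range
d$where <- factor(
  ifelse(d$table < rng[1], "below", ifelse(d$table > rng[2], "above", "within")),
  levels = c("below", "within", "above")
)
# Clamp out-of-range values onto the nearest edge so they pile up there
d$table_clamped <- pmin(pmax(d$table, rng[1]), rng[2])

# Sample sizes below, within and above the plotted range
counts <- table(d$where)
lab <- sprintf("N(<%g) = %d   N = %d   N(>%g) = %d",
               rng[1], counts[["below"]], counts[["within"]],
               rng[2], counts[["above"]])

p3_alt <- ggplot(d, aes(x = table_clamped, fill = where)) +
  geom_histogram(binwidth = 1) +
  annotate("text", x = mean(rng), y = Inf, label = lab, vjust = 1.5, size = 3) +
  labs(x = "table", fill = NULL)
```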

arbitrary number of plots for grid.arrange

I'm trying to plot an arbitrary number of bar plots with rmarkdown, arranged in 2 columns. In my example there will be 20 plots in total, so I was hoping to get 10 plots in each column; however, I can't get this to work with grid.arrange.
plot.categoric = function(df, feature){
df = data.frame(x=df[,feature])
plot.feature = ggplot(df, aes(x=x, fill = x)) +
geom_bar() +
geom_text(aes(label=scales::percent(..count../1460)), stat='count', vjust=-.4) +
labs(x=feature, fill=feature) +
ggtitle(paste0(length(df$x))) +
theme_minimal()
return(plot.feature)
}
plist = list()
for (i in 1:20){
plist = c(plist, list(plot.categoric(train, cat_features[i])))
}
args.list = c(plist, list(ncol=2))
do.call("grid.arrange", args.list)
When I knit this to html I'm getting the following output:
I was hoping I would get something along the lines of:
but even with this, the figure sizes are still off. I've tried playing with heights and widths, but still no luck. Apologies if this is a long question.
If you have all the ggplot objects in a list, then you can easily build the two-column graphic via gridExtra::grid.arrange. Here is a simple example that puts eight graphics into a 4x2 grid.
library(ggplot2)
library(gridExtra)
# Build a set of plots
plots <-
lapply(unique(diamonds$clarity),
function(cl) {
ggplot(subset(diamonds, clarity %in% cl)) +
aes(x = carat, y = price, color = color) +
geom_point()
})
length(plots)
# [1] 8
grid.arrange(grobs = plots, ncol = 2)
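A likely cause of the squashed figures is that the total figure height does not grow with the number of rows. Here is a sketch, assuming 20 hypothetical categorical columns as a stand-in for train and cat_features, that builds the grid with arrangeGrob() and sizes the saved file by row count. In R Markdown the corresponding knob is the chunk's fig.height option (e.g. fig.height = 25 for 10 rows). The 2.5 inches per row is an arbitrary choice:

```r
library(ggplot2)
library(gridExtra)

set.seed(42)
# Hypothetical stand-in for `train`: 20 categorical columns of 100 rows
train <- as.data.frame(
  replicate(20, sample(c("a", "b", "c"), 100, replace = TRUE)),
  stringsAsFactors = TRUE
)

# One bar plot per column, selected by name with the .data pronoun
plots <- lapply(names(train), function(f) {
  ggplot(train, aes(x = .data[[f]])) + geom_bar() + labs(x = f)
})

n_col <- 2
n_row <- ceiling(length(plots) / n_col)  # 10 rows for 20 plots
g <- arrangeGrob(grobs = plots, ncol = n_col)

# Give each row a fixed height so the plots don't get squashed
out <- tempfile(fileext = ".png")
ggsave(out, g, width = 8, height = 2.5 * n_row, dpi = 100)
```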

Plot line on top of stacked bar chart in ggplot2

I have created a stacked bar chart, and I would now like to plot a line on the same graphic, but I can't figure it out. I've added the geom_line() to the ggplot call, but I only end up with the line, not the bar chart.
library(ggplot2)
library(reshape)
# First let's make a toy dataset for our stacked plot/line plot example.
year = c(1,2,3,4,5,6)
stocks = c(2,4,3,2,4,3)
exports = stocks*2
domestic = stocks*3
production = c(15,16,15,16,15,16)
# Make 2 df's: alldata is for stacked bar chart, linedata is for plotting a line on top of it.
alldata = data.frame(year,stocks,exports,domestic)
linedata = data.frame(year,production)
# Make alldata 'long' for the stacking
melteddata = melt(alldata,id.vars="year")
# This works fine: (but hooboy was tricky to figure out the ordering w/ stat="identity" )
plotS1 <- ggplot(melteddata, aes(x=year,y=value,factor=variable,fill=variable,order=-as.numeric(variable)))
plotS1 + geom_bar(stat="identity")
# This plots only the line, not the stacked bar chart :
plotS1 <- ggplot(melteddata)
plotS1 + geom_bar(aes(x=year,y=value,factor=variable,fill=variable,order=-as.numeric(variable)), stat="identity")
plotS1 + geom_line(data=linedata, aes(x=year,y=production))
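You were close: chain all the layers with + in one expression. In your version, the second call adds geom_line() to the bare plotS1, so the bars from the first call are lost: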
plotS1 <- ggplot(melteddata)
plotS1 + geom_bar(aes(x=year,y=value,factor=variable,fill=variable,
order=-as.numeric(variable)), stat="identity") +
geom_line(data=linedata, aes(x=year,y=production))
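For current ggplot2 versions, here is a sketch of the same fix. The order aesthetic has been deprecated since ggplot2 2.0.0, so the stacking order here comes from the factor levels instead, and base R replaces reshape::melt. Variable names follow the question; long and vars are hypothetical names for the reshaped data:

```r
library(ggplot2)

year <- 1:6
stocks <- c(2, 4, 3, 2, 4, 3)
alldata <- data.frame(year, stocks, exports = stocks * 2, domestic = stocks * 3)
linedata <- data.frame(year, production = c(15, 16, 15, 16, 15, 16))

# Long format in base R (the question uses reshape::melt for this step);
# the factor levels, not an `order` aesthetic, control the stacking order
vars <- c("stocks", "exports", "domestic")
long <- data.frame(
  year = rep(alldata$year, times = length(vars)),
  variable = factor(rep(vars, each = nrow(alldata)), levels = vars),
  value = unlist(alldata[vars], use.names = FALSE)
)

# One expression: every layer is added with `+` and the result is stored
p <- ggplot(long, aes(x = year)) +
  geom_col(aes(y = value, fill = variable)) +
  geom_line(data = linedata, aes(y = production))
```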
