I'm trying to recreate a graph I made in excel, using ggplot2 in R and I'm having some trouble.
My variable 1 is a continuous variable (prices) and my variable two is a discrete one (cashflows)- both plotted against the same time steps.
As you noticed one variable is plotted using bars and the other using a line.
Is there any way someone could give me some help using random values? I was only able to plot them as lines.
In the sample code below v1 is the prices, v2 is the cashflows, and time is a seq(1:270)
gdata = data.frame(num = time, prices=v2, cashflows = v3)
test_data <- melt(gdata, id="num")
ggplot(data=test_data, aes(x=num, y=value, colour=variable)) +
geom_line() +
ggtitle("Prices") +
labs(x="Time",y="Prices") + theme_grey(base_size = 14) + theme(legend.title=element_blank())
Related
I want to compare the distribution of several variables (here X1 and X2) with a single value (here bm). The issue is that these variables are too many (about a dozen) to use a single boxplot.
Additionaly the levels are too different to use one plot. I need to use facets to make things more organised:
However with this plot my benchmark category (bm), which is a single value in X1 and X2, does not appear in X1 and seems to have several values in X2. I want it to be only this green line, which it is in the first plot. Any ideas why it changes? Is there any good workaround? I tried the options of facet_wrap/facet_grid, but nothing there delivered the right result.
I also tried combining a bar plot with bm and three empty categories with the boxplot. But firstly it looked terrible and secondly it got similarly screwed up in the facetting. Basically any work around would help.
Below the code to create the minimal example displayed here:
# Creating some sample data & loading libraries
library(ggplot2)
library(RColorBrewer)
set.seed(10111)
x=matrix(rnorm(40),20,2)
y=rep(c(-1,1),c(10,10))
x[y==1,]=x[y==1,]+1
x[,2]=x[,2]+20
df=data.frame(x,y)
# creating a benchmark point
benchmark=data.frame(y=rep("bm",2),key=c("X1","X2"),value=c(-0.216936,20.526312))
# melting the data frame, rbinding it with the benchmark
test_dat=rbind(tidyr::gather(df,key,value,-y),benchmark)
# Creating a plot
p_box <- ggplot(data = test_dat, aes(x=key, y=value,color=as.factor(test_dat$y))) +
geom_boxplot() + scale_color_manual(name="Cluster",values=brewer.pal(8,"Set1"))
# The first line delivers the first plot, the second line the second plot
p_box
p_box + facet_wrap(~key,scales = "free",drop = FALSE) + theme(legend.position = "bottom")
The problem only lies int the use of test_dat$y inside the color aes. Never use $ in aes, ggplot will mess up.
Anyway, I think you plot would improve if you use a geom_hline for the benchmark, instead of hacking in a single value boxplot:
library(ggplot2)
library(RColorBrewer)
ggplot(tidyr::gather(df,key,value,-y)) +
geom_boxplot(aes(x=key, y=value, color=as.factor(y))) +
geom_hline(data = benchmark, aes(yintercept = value), color = '#4DAF4A', size = 1) +
scale_color_manual(name="Cluster",values=brewer.pal(8,"Set1")) +
facet_wrap(~key,scales = "free",drop = FALSE) +
theme(legend.position = "bottom")
I'm sure this is a very simple question for most of you, but I'm new and can't figure it out. How do you create a side by side box plot grouped by time? For example, I have 24 months of data. I want to make one box plot for the first 12 months, and another for the second 12 months. My data can be seen below.
Month,Revenue
1,94000
2,81000
3,117000
4,105000
5,117000
6,89000
7,101000
8,118000
9,105000
10,123000
11,109000
12,89000
13,106000
14,159000
15,121000
16,135000
17,116000
18,133000
19,144000
20,130000
21,142000
22,124000
23,140000
24,104000
Since your data has a time ordering, it might be illuminating to plot line plots by month for each year separately. Here is code for both a line plot and a boxplot. I just made up the year values in the code below, but you can make those whatever is appropriate:
library(ggplot2)
# Assuming your data frame is called "dat"
dat$Month.abb = month.abb[rep(1:12,2)]
dat$Month.abb = factor(dat$Month.abb, levels=month.abb)
dat$Year = rep(2014:2015, each=12)
ggplot(dat, aes(Month.abb, Revenue, colour=factor(Year))) +
geom_line(aes(group=Year)) + geom_point() +
scale_y_continuous(limits=c(0,max(dat$Revenue))) +
theme_bw() +
labs(colour="Year", x="Month")
ggplot(dat, aes(factor(Year), Revenue)) +
geom_boxplot() +
scale_y_continuous(limits=c(0,max(dat$Revenue))) +
theme_bw() +
labs(x="Year")
In the data that I am attempting to plot, each sample belongs in one of several groups, that will be plotted on their own grids. I am plotting stacked bar plots for each sample that will be ordered in increasing number of sequences, which is an id attribute of each sample.
Currently, the plot (with some random data) looks like this:
(Since I don't have the required 10 rep for images, I am linking it here)
There are couple things I need to accomplish. And I don't know where to start.
I would like the bars not to be placed at its corresponding nseqs value, rather placed next to each other in ascending nseqs order.
I don't want each grid to have the same scale. Everything needs to fit snugly.
I have tried to set scales and size to for facet_grid to free_x, but this results in an unused argument error. I think this is related to the fact that I have not been able to get the scales library loaded properly (it keeps saying not available).
Code that deals with plotting:
ggfdata <- melt(fdata, id.var=c('group','nseqs','sample'))
p <- ggplot(ggfdata, aes(x=nseqs, y=value, fill = variable)) +
geom_bar(stat='identity') +
facet_grid(~group) +
scale_y_continuous() +
opts(title=paste('Taxonomic Distribution - grouped by',colnames(meta.frame)[i]))
Try this:
update.packages()
## I'm assuming your ggplot2 is out of date because you use opts()
## If the scales library is unavailable, you might need to update R
ggfdata <- melt(fdata, id.var=c('group','nseqs','sample'))
ggfdata$nseqs <- factor(ggfdata$nseqs)
## Making nseqs a factor will stop ggplot from treating it as a numeric,
## which sounds like what you want
p <- ggplot(ggfdata, aes(x=nseqs, y=value, fill = variable)) +
geom_bar(stat='identity') +
facet_wrap(~group, scales="free_x") + ## No need for facet_grid with only one variable
labs(title = paste('Taxonomic Distribution - grouped by',colnames(meta.frame)[i]))
I'm an undergrad researcher and I've been teaching myself R over the past few months. I just started trying ggplot, and have run into some trouble. I've made a series of boxplots looking at the depth of fish at different acoustic receiver stations. I'd like to add a scatterplot that shows the depths of the receiver stations. This is what I have so far:
data <- read.csv(".....MPS.csv", header=TRUE)
df <- data.frame(f1=factor(data$Tagging.location), #$
f2=factor(data$Station),data$Detection.depth)
df2 <- data.frame(f2=factor(data$Station), data$depth)
df$f1f2 <- interaction(df$f1, df$f2) #$
plot1 <- ggplot(aes(y = data$Detection.depth, x = f2, fill = f1), data = df) + #$
geom_boxplot() + stat_summary(fun.data = give.n, geom = "text",
position = position_dodge(height = 0, width = 0.75), size = 3)
plot1+xlab("MPS Station") + ylab("Depth(m)") +
theme(legend.title=element_blank()) + scale_y_reverse() +
coord_cartesian(ylim=c(150, -10))
plot2 <- ggplot(aes(y=data$depth, x=f2), data=df2) + geom_point()
plot2+scale_y_reverse() + coord_cartesian(ylim=c(150,-10)) +
xlab("MPS Station") + ylab("Depth (m)")
Unfortunately, since I'm a new user in this forum, I'm not allowed to upload images of these two plots. My x-axis is "Stations" (which has 12 options) and my y-axis is "Depth" (0-150 m). The boxplots are colour-coded by tagging site (which has 2 options). The depths are coming from two different columns in my spreadsheet, and they cannot be combined into one.
My goal is to to combine those two plots, by adding "plot2" (Station depth scatterplot) to "plot1" boxplots (Detection depths). They are both looking at the same variables (depth and station), and must be the same y-axis scale.
I think I could figure out a messy workaround if I were using the R base program, but I would like to learn ggplot properly, if possible. Any help is greatly appreciated!
Update: I was confused by the language used in the original post, and wrote a slightly more complicated answer than necessary. Here is the cleaned up version.
Step 1: Setting up. Here, we make sure the depth values in both data frames have the same variable name (for readability).
df <- data.frame(f1=factor(data$Tagging.location), f2=factor(data$Station), depth=data$Detection.depth)
df2 <- data.frame(f2=factor(data$Station), depth=data$depth)
Step 2: Now you can plot this with the 'ggplot' function and split the data by using the `col=f1`` argument. We'll plot the detection data separately, since that requires a boxplot, and then we'll plot the depths of the stations with colored points (assuming each station only has one depth). We specify the two different plots by referencing the data from within the 'geom' functions, instead of specifying the data inside the main 'ggplot' function. It should look something like this:
ggplot()+geom_boxplot(data=df, aes(x=f2, y=depth, col=f1)) + geom_point(data=df2, aes(x=f2, y=depth), colour="blue") + scale_y_reverse()
In this plot example, we use boxplots to represent the detection data and color those boxplots by the site label. The stations, however, we plot separately using a specific color of points, so we will be able to see them clearly in relation to the boxplots.
You should be able to adjust the plot from here to suit your needs.
I've created some dummy data and loaded into the chart to show you what it would look like. Keep in mind that this is purely random data and doesn't really make sense.
I would like to produce a scatter plot with ggplot2, which contains both a regression line through all data points (regardless which group they are from), but at the same time varies the shape of the markers by the grouping variable. The code below produces the group markers, but comes up with TWO regression lines, one for each group.
#model=lm(df, ParamY~ParamX)
p1<-ggplot(df,aes(x=ParamX,y=ParamY,shape=group)) + geom_point() + stat_smooth(method=lm)
How can I program that?
you shouldn't have to redo your full aes in the geom_point and add another layer, just move the shape aes to the geom_point call:
df <- data.frame(x=1:10,y=1:100+5,grouping = c(rep("a",10),rep("b",10)))
ggplot(df,aes(x=x,y=y)) +
geom_point(aes(shape=grouping)) +
stat_smooth(method=lm)
EDIT:
To help with your comment:
because annotate can end up, for me anyway, with the same labels on each facet. I like to make a mini data.frame that has my variable for faceting and the facet levels with another column representing the labels I want to use. In this case the label data frame is called dfalbs.
Then use this to label data frame to label the facets individually e.g.
df <- data.frame(x=1:10,y=1:10,grouping =
c(rep("a",5),rep("b",5)),faceting=c(rep(c("oneR2","twoR2"),5)))
dflabs <- data.frame(faceting=c("oneR2","twoR2"),posx=c(7.5,7.5),posy=c(2.5,2.5))
ggplot(df,aes(x=x,y=y,group=faceting)) +
geom_point(aes(shape=grouping),size=5) +
stat_smooth(method=lm) +
facet_wrap( ~ faceting) +
geom_text(data=dflabs,aes(x=posx,y=posy,label=faceting))