introducing a gap in continuous x axis using ggplot - r

This builds on my previous post, creating a stacked area/bar plot with missing values (all the scripts I run can be found there). In this post, however, I'm asking whether it's possible to leave a gap in a continuous x axis. I have a month-by-month time series over a year, but for one sample one month is missing, and I would like to show this month as a complete gap in the plot. It would be almost like plotting one graph for Jan-Aug (Sep is missing) and one for Oct-Dec, then merging them with a gap for Sep.
The only things I have come up with are treating the missing month as zero or NA, which creates a huge drop in the area chart for Sep, or excluding it, which gives an x axis ranging from 1-11 (see the plots in the dropbox folder).
The data set I'm working on can be found in my dropbox folder and is named r_class.txt; you can also see the two different plots there (Rplots1 and 2).
Any ideas would really be appreciated!

Plot the series as two separate data frames:
#Load libraries
require(ggplot2)
require(reshape)
#Code copied from your linked post:
wa=read.table('wa_class.txt', sep="", header=FALSE, na.strings="0")
names(wa)=c("Class","Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec")
wam=melt(wa)
wam$variablen=as.numeric(wam$variable)
#For readability, split the melted data frame into two separate data frames
wam1 <- wam[wam$variablen %in% 1:6,]
wam2 <- wam[wam$variablen %in% 8:12, ]
ggplot() +
geom_area(data=wam1, aes(x=variablen, y=value, fill=Class)) +
geom_area(data=wam2, aes(x=variablen, y=value, fill=Class))
#and add lineranges, etc., accordingly
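The same idea can be tried without the original file. This is only a minimal sketch: the data frame `demo`, its three classes and the random values are made up for illustration, and September (month 9) is assumed to be the missing month as described in the question.

```r
library(ggplot2)

# Hypothetical stand-in for the melted data: 3 classes x 12 months
set.seed(1)
demo <- expand.grid(month = 1:12, Class = c("A", "B", "C"))
demo$value <- runif(nrow(demo), min = 1, max = 10)

# Split the data around the missing month (assumed here: September)
missing_month <- 9
before <- demo[demo$month < missing_month, ]
after  <- demo[demo$month > missing_month, ]

# One geom_area() layer per segment, so nothing is drawn across the gap
p <- ggplot(mapping = aes(x = month, y = value, fill = Class)) +
  geom_area(data = before) +
  geom_area(data = after) +
  scale_x_continuous(breaks = 1:12, labels = month.abb)
print(p)
```

Because the x axis stays continuous and keeps a tick for every month, the empty slot for the missing month remains visible between the two stacked areas.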

Related

geom_area doesn't show data, supposedly because of x-axis data

I want to create a stacked area plot based on a data frame.
Time <- c("W37/19","W38/19","W39/19","W40/19","W41/19")
Basis <- c(20.07,20.07,20.07,20.07,20.07)
AdStock <- c(5.88,5.60,5.34,5.09,4.86)
TV <- c(0,0,0.54,0.93,1.14)
Display <- c(0.07,0.21,0.33,0.35,0.36)
df_graph <- data.frame(Time, Basis, AdStock, TV, Display)
The data is a time series; "Time" holds German calendar weeks and should stay in this order.
First I transform the data into long format.
library(tidyr)
df_graph <- pivot_longer(df_graph[,c("Time","Basis","AdStock","TV","Display")],-Time)
Second, I convert df_graph$name to a factor and reverse the order, because I want to keep the original order for the stacking.
library(forcats)
df_graph$name <-factor(df_graph$name, levels = c("Basis","AdStock","TV","Display"))
df_graph$name <- fct_rev(df_graph$name)
Then I want to plot my data.
library(ggplot2)
p <- ggplot(df_graph, aes(x=Time, y=value, fill=name))
p <- p + geom_area()
p
The plot shows both axes as well as the legend but no data.
If I replace the calendar weeks in "Time" with just an ascending series of numbers
df_graph$Time <- seq(1:5)
it works, but not with my X-Axis values.
Also, I don't think the conversion of "name" to a factor is the problem, because I still don't get data in my plot even if I remove these two lines.
I tried different methods for the Long-Format (e.g. gather) and also tried using the ascending series of numbers(1:5) as x-values and then replacing it with scale_x_discrete but my areas always disappear.
What am I missing?
Many thanks in advance.
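One likely explanation, offered as a sketch rather than a confirmed diagnosis: with a discrete x variable, ggplot2 forms its default groups from all discrete variables in the plot, including x, so every group ends up with a single point and no area can be drawn. Setting the group aesthetic explicitly (here `group = name`) usually brings the areas back. The object name `long` and the factor conversion of `Time` are my own additions for illustration.

```r
library(ggplot2)
library(tidyr)

Time    <- c("W37/19", "W38/19", "W39/19", "W40/19", "W41/19")
Basis   <- c(20.07, 20.07, 20.07, 20.07, 20.07)
AdStock <- c(5.88, 5.60, 5.34, 5.09, 4.86)
TV      <- c(0, 0, 0.54, 0.93, 1.14)
Display <- c(0.07, 0.21, 0.33, 0.35, 0.36)
df_graph <- data.frame(Time, Basis, AdStock, TV, Display)

long <- pivot_longer(df_graph, -Time)

# Keep the calendar weeks in their original order on the x axis
long$Time <- factor(long$Time, levels = Time)
# Reverse the series order so the stacking follows the original column order
long$name <- factor(long$name, levels = rev(c("Basis", "AdStock", "TV", "Display")))

# group = name connects the points of each series across the discrete x values
p <- ggplot(long, aes(x = Time, y = value, fill = name, group = name)) +
  geom_area()
print(p)
```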

Overlapping R ggplots with two different y axis (different time period) and x axis (different scale)

I currently have two data sets with two columns each (date and value). They differ in that they cover different time periods (y axis), roughly twenty years apart, and on the x axis (value) the ratio is about 1:10. Since they cover different time periods, is there a way to overlay the two plots using ggplot and have the two different y axes (different time periods) placed one above and the other below? For example, one data set runs from 1994-2002 and the second from 2017-2020. The reason is that both exhibit the same pattern, and I would like to place them together to show the pattern clearly.
Example of such a chart is as attached.
It's very doable. For ggplot2, you can easily add a secondary axis, with a simple transform. I'll use the EuStockMarkets data as an example.
library(ggplot2)
library(dplyr)
library(tidyr)
data(EuStockMarkets)
StockMarkets <- EuStockMarkets[, c("DAX","FTSE")]
plot(StockMarkets, plot.type="single", col=c(2,3),
main="European Stock Markets (1991 - 1998)",
ylab="Closing price (value)")
legend("topleft", inset=0.02, legend=colnames(StockMarkets),
lwd=2, lty=1, col=c(2,3))
You'll get a nice plot like this:
The ggplot2 version is also quite simple:
StockMarkets %>%
as.data.frame() %>%
mutate(sDate=as.Date(seq(1,1860,1), origin="1991-05-10")) %>%
pivot_longer(-sDate) %>%
ggplot(aes(x=sDate, y=value, color=name)) +
geom_line()
Now suppose the FTSE series is from 20 years earlier. I will change the dates manually, so you can see the results.
DAX <- tibble(Stock=as.vector(EuStockMarkets[,c("DAX")])) %>%
mutate(stDate=as.Date(seq(1,1860,1), origin="1991-05-10"), name="DAX")
FTSE <- tibble(Stock=as.vector(EuStockMarkets[,c("FTSE")])) %>%
mutate(stDate=as.Date(seq(1,1860,1), origin="1971-05-10"), name="FTSE")
Now combine them into one data frame. Imagine you start with this data. And imagine it has NASDAQ and Bitcoin instead of FTSE and DAX.
DAX_FTSE <- bind_rows(DAX, FTSE)
If you try to plot this data, you get the following, which is correct, but not what the OP wanted:
DAX_FTSE %>%
ggplot(aes(x=stDate, y=Stock, color=name)) +
geom_line()
The trick here is to add a secondary axis with a simple transform:
DAX_FTSE %>%
mutate(st2Date=if_else(name=="FTSE", stDate+20*365.25, stDate)) %>%
ggplot(aes(x=st2Date, y=Stock, color=name)) +
geom_line() +
scale_x_date("DAX", sec.axis=sec_axis(~ . -20*365.25, name="FTSE"))
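To see why the secondary axis shows the original FTSE dates, it helps to check that the shift and the axis transform cancel each other. A small sketch of that arithmetic, using a made-up example date (`old_date` is purely illustrative):

```r
# The offset used above: 20 years expressed in days
shift_days <- 20 * 365.25   # 7305

# Hypothetical FTSE-era date
old_date <- as.Date("1971-05-11")

# Forward shift, as in the mutate() call: moves the date into the DAX period
shifted <- old_date + shift_days

# Inverse shift, as in the sec_axis() formula: recovers the original date
restored <- shifted - shift_days

print(shifted)
print(restored)
```

The fixed offset of 7305 days matches 20 calendar years only when exactly five leap days fall inside the interval, so for other spans the two axes can be off by a day.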

simple boxplot using qplot/ggplot2

This is my first post, so go easy. Up until now (the past ~5 years?) I've been able to either tweak my R code the right way or find an answer on this or various other sites. Trust me when I say that I've looked for an answer!
I have a working script to create the attached boxplot in basic R.
http://i.stack.imgur.com/NaATo.jpg
This is fine, but I really just want to "jazz" it up in ggplot, for vain reasons.
I've looked at the following questions and they are close, but not complete:
Why does a boxplot in ggplot requires axis x and y?
How do you draw a boxplot without specifying x axis?
My data is basically like "mtcars" if all the numerical variables were on the same scale.
All I want to do is plot each variable on the same boxplot, like the basic R boxplot I made above. My y axis is the same continuous scale (0 to 1) for each box and the x axis simply labels each month plus a yearly average (think all the mtcars values the same on the y axis and the x axis is each vehicle model). Each box of my data represents 75 observations (kind of like if mtcars had 75 different vehicle models), again all the boxes are on the same scale.
What am I missing?
Though I don't think mtcars makes a great example for this, here it is:
First, we make the data (hopefully) more similar to yours by using a column instead of rownames.
mt = mtcars
mt$car = row.names(mtcars)
Then we reshape to long format:
mt_long = reshape2::melt(mt, id.vars = "car")
Then the plot is easy:
library(ggplot2)
ggplot(mt_long, aes(x = variable, y = value)) +
geom_boxplot()
Using ggplot all but requires data in "long" format rather than "wide" format. If you want something to be mapped to a graphical dimension (x-axis, y-axis, color, shape, etc.), then it should be a column in your data. Luckily, it's usually quite easy to get data into the right format with reshape2::melt or tidyr::gather (or tidyr::pivot_longer, which supersedes gather in current versions of tidyr). I'd recommend reading the Tidy Data paper for more on this topic.
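For data shaped more like the question describes (75 observations per box, twelve months plus a yearly average, all on a 0 to 1 scale), a sketch could look like the following. The data here is random and the names `wide`, `long`, `obs` and `period` are invented for the example; only the reshaping pattern is the point.

```r
library(ggplot2)
library(tidyr)

# Made-up wide data: 75 observations, one column per month plus a yearly column
set.seed(1)
cols <- c(month.abb, "Year")
wide <- as.data.frame(
  matrix(runif(75 * length(cols)), nrow = 75, dimnames = list(NULL, cols))
)
wide$obs <- seq_len(75)   # hypothetical observation id

# Wide -> long: one row per observation and period
long <- pivot_longer(wide, cols = -obs, names_to = "period", values_to = "value")

# Keep the boxes in calendar order instead of alphabetical order
long$period <- factor(long$period, levels = cols)

p <- ggplot(long, aes(x = period, y = value)) +
  geom_boxplot() +
  ylim(0, 1)
print(p)
```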

graphing multiple data series in R ggplot

I am trying to plot (on the same graph) two sets of data versus date from two different data frames. Both data frames have the same exact dates for each of the two measurements. I would like to plot these two sets of data on the same graph, with different colors. However, I can't get them on the same graph at all. R is already reading the date as date. I tried this:
qplot( date , NO3, data=qual.arn)
+ qplot( qual.arn$date , qual.arn$DIS.O2, "O2(aq)" , add=T)
and received this error.
Error in add_ggplot(e1, e2, e2name) :
argument "e2" is missing, with no default
I tried using the ggplot function instead of qplot, but I couldn't even plot one graph this way.
ggplot(date=qual.no3.s, aes(date,NO3))
Error: ggplot2 doesn't know how to deal with data of class uneval
PLEASE HELP. Thank you!
Since you didn't provide any data (please do so in future), here's a made-up dataset to demonstrate a solution. There are (at least) two ways to do this: the right way and the wrong way. Both yield equivalent results in this very simple case.
# set up minimum reproducible example
set.seed(1) # for reproducible example
dates <- seq(as.Date("2015-01-01"),as.Date("2015-06-01"), by=1)
df1 <- data.frame(date=dates, NO3=rpois(length(dates),25))
df2 <- data.frame(date=dates, DIS.O2=rnorm(length(dates),50,10))
ggplot is designed to use data in "long" format. This means that all the y-values (the concentrations) are in a single column, and there is a separate column which identifies the corresponding category ("NO3" or "DIS.O2" in your case). So first we merge the two data sets based on date, then use melt(...) to convert from "wide" (categories in separate columns) to "long" format. Then we let ggplot worry about legends, colors, etc.
library(ggplot2)
library(reshape2) # for melt(...)
# The right way: combine the data-sets, then plot
df.mrg <- merge(df1,df2, by="date", all=TRUE)
gg.df <- melt(df.mrg, id="date", variable.name="Component", value.name="Concentration")
ggplot(gg.df, aes(x=date, y=Concentration, color=Component)) +
geom_point() + labs(x=NULL)
The "wrong" way to do this is by making separate calls to geom_point(...) for each layer. In your particular case this might be simpler, but in the long run it's better to use the other method.
# The wrong way: plot two sets of points
ggplot() +
geom_point(data=df1, aes(x=date, y=NO3, color="NO3")) +
geom_point(data=df2, aes(x=date, y=DIS.O2, color="DIS.O2")) +
scale_color_manual(name="Component",values=c("red", "blue")) +
labs(x=NULL, y="Concentration")
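One caveat with the layered approach: an unnamed `values` vector in scale_color_manual is matched to the legend entries in their sorted order, so it is easy to attach a colour to the wrong series. A sketch with a named vector makes the mapping explicit (the name `component_colors` is just an illustrative choice, and the data is the same made-up set as above):

```r
library(ggplot2)

set.seed(1)
dates <- seq(as.Date("2015-01-01"), as.Date("2015-06-01"), by = 1)
df1 <- data.frame(date = dates, NO3 = rpois(length(dates), 25))
df2 <- data.frame(date = dates, DIS.O2 = rnorm(length(dates), 50, 10))

# Names must match the strings used inside aes(color = ...)
component_colors <- c("NO3" = "red", "DIS.O2" = "blue")

p <- ggplot() +
  geom_point(data = df1, aes(x = date, y = NO3, color = "NO3")) +
  geom_point(data = df2, aes(x = date, y = DIS.O2, color = "DIS.O2")) +
  scale_color_manual(name = "Component", values = component_colors) +
  labs(x = NULL, y = "Concentration")
print(p)
```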

Create Lollipop-like plot with R

I have a .csv file that looks like this:
Pos,ReadsME_016,ReadsME_017,ReadsME_018,ReadsME_019,ReadsME_020,ReadsME_021
95952794,62.36,62.06,55.56,51,60.35,44.27
95952795,100,100,100,100,100,100
95952833,0,0,-,0,-,-
95952846,0,0,-,0,0,-
95952876,0,-,0,0,0,0
95952877,38.89,28.98,25.67,36.99,37.91,16.86
95952878,100,100,100,100,100,100
95952884,0,-,0,-,-,0
95952897,18.7,20.52,20.94,16.43,22.68,12.55
95952898,100,100,75,80,-,100
95952902,10.88,8.93,10.22,10.63,13.51,6.06
95952903,100,100,100,75,-,100
95952915,10.75,8.7,7.91,8.35,15.12,8.88
What I want is to create a plot that is similar to this one:
http://www.scfbm.org/content/9/1/11/figure/F2
However, all my attempts have failed. Unfortunately, the tool is not yet available and I cannot read the source code.
I've thought of ggplot and melt, but I can't get close to this graph. How can I get all read samples (ReadsME_016, ReadsME_017, ...) listed on the x-axis and the positions listed on the y-axis? I don't know how to deal with both the x- and y-axes being categorical while the plotted values should show percentages.
dataset <- melt(dataset, id.vars="Pos")
ggplot(dataset, aes(x=value, y=Pos, colour=variable)) + geom_point()
Here is the complete .csv file:
Pos,ReadsME_016,ReadsME_017,ReadsME_018,ReadsME_019,ReadsME_020,ReadsME_021,ReadsME_022,ReadsME_023,ReadsME_024,ReadsME_025,ReadsME_026,ReadsME_027,ReadsME_028,ReadsME_030,ReadsME_031,ReadsME_032
95952794,62.36,62.06,55.56,51.0,60.35,44.27,53.73,61.69,57.04,64.16,61.48,59.42,66.93,49.71,55.23,66.67
95952795,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,-,100.0,100.0,100.0,100.0,-
95952833,0.0,0.0,-,0.0,-,-,100.0,-,-,-,-,0.0,-,-,0.0,-
95952846,0.0,0.0,-,0.0,0.0,-,0.0,0.0,-,-,-,0.0,-,-,-,-
95952876,0.0,-,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-
95952877,38.89,28.98,25.67,36.99,37.91,16.86,29.65,35.38,35.43,36.87,34.04,33.91,35.04,19.09,38.35,0.0
95952878,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,-,100.0,100.0,100.0,100.0,-
95952884,0.0,-,0.0,-,-,0.0,-,-,100.0,-,-,0.0,-,-,-,-
95952897,18.7,20.52,20.94,16.43,22.68,12.55,18.3,22.28,21.05,22.55,24.81,20.63,22.05,13.06,22.8,0.0
95952898,100.0,100.0,75.0,80.0,-,100.0,80.0,100.0,100.0,-,-,-,100.0,-,100.0,-
95952902,10.88,8.93,10.22,10.63,13.51,6.06,9.62,15.73,14.08,18.65,13.28,16.44,15.02,8.92,11.11,100.0
95952903,100.0,100.0,100.0,75.0,-,100.0,100.0,100.0,100.0,-,-,100.0,100.0,100.0,100.0,-
95952915,10.75,8.7,7.91,8.35,15.12,8.88,7.32,9.76,11.45,8.99,10.57,14.07,10.36,6.35,10.04,0.0
95952916,100.0,100.0,100.0,100.0,-,100.0,100.0,100.0,100.0,-,-,100.0,100.0,-,100.0,-
95952925,10.39,8.33,8.59,10.51,14.19,10.99,6.98,11.56,13.93,15.0,14.29,16.26,9.76,5.86,12.96,0.0
95952926,100.0,100.0,100.0,100.0,-,100.0,100.0,100.0,100.0,-,-,-,100.0,-,100.0,-
95952937,19.53,14.97,11.97,14.43,19.26,17.18,19.48,12.31,21.17,21.57,23.08,26.24,16.38,13.47,21.82,0.0
95952938,100.0,100.0,100.0,100.0,-,100.0,100.0,-,-,-,-,-,-,-,100.0,-
95952825,-,0.0,-,-,-,-,-,-,-,-,0.0,-,-,0.0,0.0,-
95952975,-,0.0,-,-,-,-,-,-,0.0,-,-,-,-,-,-,-
95952669,-,-,0.0,-,-,0.0,0.0,-,-,-,-,-,-,-,0.0,-
95952718,-,-,0.0,0.0,0.0,-,0.0,-,-,-,0.0,-,-,0.0,0.0,-
95952868,-,-,0.0,-,0.0,-,-,0.0,-,-,0.0,-,-,-,-,-
95952957,-,-,0.0,-,-,-,-,0.0,0.0,0.0,-,0.0,-,-,-,-
95952976,-,-,0.0,-,0.0,0.0,0.0,100.0,-,0.0,-,-,-,-,0.0,-
95952681,-,-,-,0.0,-,0.0,-,0.0,-,-,-,-,-,0.0,-,-
95952779,-,-,-,0.0,-,-,-,-,-,-,-,-,-,-,-,-
95952811,-,-,-,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-,-,-,0.0,-
95952821,-,-,-,0.0,-,-,-,-,-,-,-,-,-,-,-,-
95952823,-,-,-,0.0,-,-,-,-,-,-,-,-,-,-,-,-
95952859,-,-,-,0.0,0.0,-,-,0.0,0.0,-,0.0,-,-,0.0,0.0,-
95952882,-,-,-,0.0,-,-,-,-,-,-,0.0,-,-,-,-,-
95953023,-,-,-,0.0,-,0.0,-,-,-,-,-,-,-,-,-,-
95953058,-,-,-,0.0,-,0.0,-,-,-,-,-,-,-,-,-,-
95952664,-,-,-,-,-,0.0,0.0,-,-,0.0,-,-,-,-,0.0,-
95952801,-,-,-,-,-,0.0,-,-,-,-,-,-,-,-,-,-
95952968,-,-,-,-,-,-,0.0,-,-,0.0,-,-,-,-,-,-
95952797,-,-,-,-,-,-,-,-,0.0,-,-,-,-,-,-,-
95952851,-,-,-,-,-,-,-,-,-,-,0.0,-,-,-,-,-
95952894,-,-,-,-,-,-,-,-,-,-,0.0,-,-,-,-,-
95952807,-,-,-,-,-,-,-,-,-,-,-,-,-,0.0,-,-
95952712,-,-,-,-,-,-,-,-,-,-,-,-,-,-,0.0,-
First, you want to make sure you are reading in your data properly. You have non-numeric values (specifically "-") mixed in with numeric values. I'm assuming those are missing values. Make sure you let R know that with na.strings="-". Then, to get something more consistent with the example plot, I swapped your variables around:
library(reshape2) # for melt()
library(ggplot2) # for ggplot()
dataset <- read.table("file.txt", header=TRUE, sep=",", na.strings="-")
ggplot(melt(dataset, id.vars="Pos"),
aes(x=Pos, y=variable, colour=cut(value, breaks=5))) +
geom_point()
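A variation that may get a little closer to the lollipop look, sketched on a few rows of the data above: treat Pos as a category so the positions are evenly spaced, and fill each circle by percentage. It uses tidyr::pivot_longer in place of melt; the column names `sample` and `percent` and the white-to-black fill scale are my own assumptions, not taken from the linked figure.

```r
library(ggplot2)
library(tidyr)

# A few rows from the question, inlined so the example is self-contained
csv_text <- "Pos,ReadsME_016,ReadsME_017,ReadsME_018
95952794,62.36,62.06,55.56
95952795,100,100,100
95952833,0,0,-
95952877,38.89,28.98,25.67"

dataset <- read.csv(text = csv_text, na.strings = "-")

# Wide -> long, dropping the missing ("-") measurements
long <- pivot_longer(dataset, cols = -Pos, names_to = "sample",
                     values_to = "percent", values_drop_na = TRUE)

# factor(Pos) makes the positions categorical; shape 21 is a fillable circle
p <- ggplot(long, aes(x = factor(Pos), y = sample, fill = percent)) +
  geom_point(shape = 21, size = 5) +
  scale_fill_gradient(low = "white", high = "black", limits = c(0, 100)) +
  labs(x = "Position", y = "Sample", fill = "Percent")
print(p)
```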

Resources