geom_area doesn't show data, supposedly because of x-axis data - r

I want to create a stacked area plot based on a data frame.
Time <- c("W37/19","W38/19","W39/19","W40/19","W41/19")
Basis <- c(20.07,20.07,20.07,20.07,20.07)
AdStock <- c(5.88,5.60,5.34,5.09,4.86)
TV <- c(0,0,0.54,0.93,1.14)
Display <- c(0.07,0.21,0.33,0.35,0.36)
df_graph <- data.frame(Time, Basis, AdStock, TV, Display)
The data is a time series; "Time" holds German calendar weeks and should stay in this order.
First, I transform the data into long format.
library(tidyr)
df_graph <- pivot_longer(df_graph[,c("Time","Basis","AdStock","TV","Display")],-Time)
Second, I convert df_graph$name to a factor and reverse the level order, because I want to keep the original order for the stacking.
library(forcats)
df_graph$name <-factor(df_graph$name, levels = c("Basis","AdStock","TV","Display"))
df_graph$name <- fct_rev(df_graph$name)
Then I want to plot my data.
library(ggplot2)
p <- ggplot(df_graph, aes(x=Time, y=value, fill=name))
p <- p + geom_area()
p
The plot shows both axes as well as the legend but no data.
If I replace the calendar weeks in "Time" with an ascending series of numbers
df_graph$Time <- seq(1:5)
it works, but not with my x-axis values.
Also, I don't think that the conversion of "name" to a factor is the problem, because I still get no data in the plot even if I remove these two lines.
I tried different methods for the long format (e.g. gather) and also tried using the ascending series of numbers (1:5) as x-values and then relabelling them with scale_x_discrete, but my areas always disappear.
What am I missing?
Many thanks in advance.
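One likely cause, offered as a sketch rather than a confirmed diagnosis: when x is discrete (character or factor), ggplot2 builds its default groups from the combination of all discrete variables in the plot. Every Time/name pair then becomes its own group with a single point, and geom_area has nothing to connect. Mapping group = name explicitly restores one group per series. The names df_wide and df_long are introduced here only for illustration.

```r
library(ggplot2)
library(tidyr)

Time <- c("W37/19", "W38/19", "W39/19", "W40/19", "W41/19")
df_wide <- data.frame(
  Time    = Time,
  Basis   = c(20.07, 20.07, 20.07, 20.07, 20.07),
  AdStock = c(5.88, 5.60, 5.34, 5.09, 4.86),
  TV      = c(0, 0, 0.54, 0.93, 1.14),
  Display = c(0.07, 0.21, 0.33, 0.35, 0.36)
)
df_long <- pivot_longer(df_wide, -Time)

# Fix the week order explicitly instead of relying on alphabetical sorting
df_long$Time <- factor(df_long$Time, levels = Time)
# Reverse the series order so that "Basis" ends up at the bottom of the stack
df_long$name <- factor(df_long$name,
                       levels = rev(c("Basis", "AdStock", "TV", "Display")))

# group = name overrides the default grouping, which would otherwise
# split the data by Time as well
p <- ggplot(df_long, aes(x = Time, y = value, fill = name, group = name)) +
  geom_area()
p
```

With the group mapping in place, the x axis can stay discrete, so the calendar-week labels are kept without any scale_x_discrete workaround.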

Related

Differentiate missing values from main data in a plot using R

I create a dummy timeseries xts object with missing data on date 2-09-2015 as:
library(xts)
library(ggplot2)
library(scales)
set.seed(123)
seq <- seq(as.POSIXct("2015-09-01"),as.POSIXct("2015-09-02"), by = "1 hour")
ob1 <- xts(rnorm(length(seq),150,5),seq)
seq2 <- seq(as.POSIXct("2015-09-03"),as.POSIXct("2015-09-05"), by = "1 hour")
ob2 <- xts(rnorm(length(seq2),170,5),seq2)
final_ob <- rbind(ob1,ob2)
plot(final_ob)
# with ggplot
df <- data.frame(time = index(final_ob), val = coredata(final_ob) )
ggplot(df, aes(time, val)) + geom_line()+ scale_x_datetime(labels = date_format("%Y-%m-%d"))
After plotting my data looks like this:
The red coloured rectangular portion represents the date on which data is missing. How should I show that data was missing on this day in the main plot?
I think I should show this missing data with a different colour, but I don't know how I should process the data to reflect the missing-data behaviour in the main plot.
Thanks for the great reproducible example.
I think you are best off to omit that line in your "missing" portion. If you have a straight line (even in a different colour) it suggests that data was gathered in that interval, that happened to fall on that straight line. If you omit the line in that interval then it is clear that there is no data there.
The problem is that you want the hourly data to be connected by lines, and then no lines in the "missing data section" - so you need some way to detect that missing data section.
You have not given a criterion for this in your question, so based on your example I will say that each line on the plot should consist of data at hourly intervals; if there's a break of more than an hour then there should be a new line. You will have to adjust this criterion to your specific problem. All we're doing is splitting up your dataframe into pieces that each get plotted as one line.
So first create a variable that says which "group" (i.e. line) each data point is in:
df$grp <- factor(c(0, cumsum(diff(df$time) > 1)))
Then you can use the group= aesthetic which geom_line uses to split up lines:
ggplot(df, aes(time, val)) + geom_line(aes(group=grp)) + # <-- only change
scale_x_datetime(labels = date_format("%Y-%m-%d"))
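To see what the grouping expression does, here is a small sketch on made-up timestamps (the vector below is hypothetical, not the data from the question). It also converts the differences to hours explicitly, since diff() on POSIXct picks its units automatically and the threshold of 1 only means "one hour" when those units happen to be hours.

```r
# Hypothetical timestamps: three hourly readings, a long gap, then two more
time <- as.POSIXct("2015-09-01 00:00", tz = "UTC") + 3600 * c(0, 1, 2, 30, 31)

# Express the differences in hours so the threshold does not depend on
# the units that diff() chooses
gap_hours <- as.numeric(diff(time), units = "hours")

# TRUE marks the first point after a gap; cumsum turns that into a run counter
grp <- factor(c(0, cumsum(gap_hours > 1)))
print(grp)
```

The first three points fall into group 0 and the last two into group 1, so geom_line(aes(group = grp)) would draw two separate lines.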

graphing multiple data series in R ggplot

I am trying to plot (on the same graph) two sets of data versus date from two different data frames. Both data frames have the same exact dates for each of the two measurements. I would like to plot these two sets of data on the same graph, with different colors. However, I can't get them on the same graph at all. R is already reading the date as date. I tried this:
qplot( date , NO3, data=qual.arn)
+ qplot( qual.arn$date , qual.arn$DIS.O2, "O2(aq)" , add=T)
and received this error.
Error in add_ggplot(e1, e2, e2name) :
argument "e2" is missing, with no default
I tried using the ggplot function instead of qplot, but I couldn't even plot one graph this way.
ggplot(date=qual.no3.s, aes(date,NO3))
Error: ggplot2 doesn't know how to deal with data of class uneval
PLEASE HELP. Thank you!
Since you didn't provide any data (please do so in future), here's a made-up dataset to demonstrate a solution. There are (at least) two ways to do this: the right way and the wrong way. Both yield equivalent results in this very simple case.
# set up minimum reproducible example
set.seed(1) # for reproducible example
dates <- seq(as.Date("2015-01-01"),as.Date("2015-06-01"), by=1)
df1 <- data.frame(date=dates, NO3=rpois(length(dates),25))
df2 <- data.frame(date=dates, DIS.O2=rnorm(length(dates),50,10))
ggplot is designed to use data in "long" format. This means that all the y-values (the concentrations) are in a single column, and there is a separate column which identifies the corresponding category ("NO3" or "DIS.O2" in your case). So first we merge the two data-sets based on date, then use melt(...) to convert from "wide" (categories in separate columns) to "long" format. Then we let ggplot worry about legends, colors, etc.
library(ggplot2)
library(reshape2) # for melt(...)
# The right way: combine the data-sets, then plot
df.mrg <- merge(df1,df2, by="date", all=TRUE)
gg.df <- melt(df.mrg, id="date", variable.name="Component", value.name="Concentration")
ggplot(gg.df, aes(x=date, y=Concentration, color=Component)) +
geom_point() + labs(x=NULL)
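The same reshape can also be done with tidyr instead of reshape2. This is a sketch of an alternative, assuming tidyr is installed; it is not what the answer above used, and the rest of the plot code stays the same.

```r
library(ggplot2)
library(tidyr)

set.seed(1) # for a reproducible example
dates <- seq(as.Date("2015-01-01"), as.Date("2015-06-01"), by = 1)
df1 <- data.frame(date = dates, NO3 = rpois(length(dates), 25))
df2 <- data.frame(date = dates, DIS.O2 = rnorm(length(dates), 50, 10))

# Merge on date, then stack every non-date column into name/value pairs
df.mrg <- merge(df1, df2, by = "date", all = TRUE)
gg.df <- pivot_longer(df.mrg, cols = -date,
                      names_to = "Component", values_to = "Concentration")

ggplot(gg.df, aes(x = date, y = Concentration, color = Component)) +
  geom_point() + labs(x = NULL)
```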
The "wrong" way to do this is by making separate calls to geom_point(...) for each layer. In your particular case this might be simpler, but in the long run it's better to use the other method.
# The wrong way: plot two sets of points
ggplot() +
geom_point(data=df1, aes(x=date, y=NO3, color="NO3")) +
geom_point(data=df2, aes(x=date, y=DIS.O2, color="DIS.O2")) +
scale_color_manual(name="Component",values=c("red", "blue")) +
labs(x=NULL, y="Concentration")

How do I put multiple boxplots in the same graph in R?

Sorry I don't have example code for this question.
All I want to know is if it is possible to create multiple side-by-side boxplots in R representing different columns/variables within my data frame. Each boxplot would also only represent a single variable--I would like to set the y-scale to a range of (0,6).
If this isn't possible, how can I use something like the panel option in ggplot2 if I only want to create a boxplot using a single variable? Thanks!
Ideally, I want something like the image below but without factor grouping like in ggplot2. Again, each boxplot would represent completely separate and single columns.
ggplot2 requires that your data to be plotted on the y-axis are all in one column.
Here is an example:
set.seed(1)
df <- data.frame(
value = runif(810,0,6),
group = 1:9
)
df
library(ggplot2)
ggplot(df, aes(factor(group), value)) + geom_boxplot() + coord_cartesian(ylim = c(0,6))
coord_cartesian(ylim = c(0,6)) restricts the visible y-axis to the range 0 to 6 without dropping any data.
If your data are in separate columns, you can get them into long form using melt from reshape2 or gather from tidyr (other methods are also available).
You can do this if you reshape your data into long format.
## Some sample data
dat <- data.frame(a=rnorm(100), b=rnorm(100), c=rnorm(100))
## Reshape data wide -> long
library(reshape2)
long <- melt(dat)
plot(value ~ variable, data=long)
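If the columns only need to be drawn side by side, base R can also do it without reshaping: boxplot() accepts a data frame and draws one box per column. The sketch below uses made-up columns a, b and c drawn from a uniform distribution on 0 to 6, purely for illustration.

```r
## Hypothetical sample data: three unrelated numeric columns
set.seed(1)
dat <- data.frame(a = runif(100, 0, 6),
                  b = runif(100, 0, 6),
                  c = runif(100, 0, 6))

## One box per column, y-axis fixed to the range 0 to 6
stats <- boxplot(dat, ylim = c(0, 6), ylab = "value")
```

The returned list holds the five-number summaries per column in stats$stats, which can be handy for checking what was drawn.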

Overlay multiple lines from data frame with index column onto existing plot

I have a dataframe with 3 columns, (Id, Lat, Long), you can construct a small section of this with the following data:
df <- data.frame(
Id=c(1,1,2,2,2,2,2,2,3,3,3,3,3,3),
Lat=c(58.12550, 58.17426, 58.46461, 58.45812, 58.45207, 58.44512, 58.43358, 58.42727, 57.77700, 57.76034, 57.73614, 57.72411, 57.70498, 57.68453),
Long=c(-5.098068, -5.314452, -4.914108, -4.899922, -4.887067, -4.873312, -4.852384, -4.840817, -5.666568, -5.648711, -5.617588, -5.594681, -5.557740, -5.509405))
The Id column is an index column. So all the rows with the same Id number have the coordinates for a single line. In my data frame this Id number varies from 1 through to 7696. So I have 7696 lines to plot.
Each Id number relates to an individual separate line of Lat and Long coordinates. What I want to do is overlay onto an existing plot all of these 7696 individual lines.
With the example data above this contains the Lat & Long coordinates for lines 1, 2, 3.
What is the best way to overlay all these lines onto an existing plot? I was thinking of maybe some kind of loop.
Using ggplot2:
#dummy data
df <- data.frame(
Id=c(1,1,2,2,2,2,2,2,3,3,3,3,3,3),
Lat=c(58.12550, 58.17426, 58.46461, 58.45812, 58.45207, 58.44512, 58.43358, 58.42727, 57.77700, 57.76034, 57.73614, 57.72411, 57.70498, 57.68453),
Long=c(-5.098068, -5.314452, -4.914108, -4.899922, -4.887067, -4.873312, -4.852384, -4.840817, -5.666568, -5.648711, -5.617588, -5.594681, -5.557740, -5.509405))
library(ggplot2)
#plot
ggplot(data=df,aes(Lat,Long,colour=as.factor(Id))) +
geom_line()
Using base R:
#plot blank
with(df,plot(Lat,Long,type="n"))
#plot lines
for(i in unique(df$Id))
with(df[ df$Id==i,],lines(Lat,Long,col=i))
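With several thousand Ids, the loop can be replaced by a single lines() call, because lines() breaks the line wherever it meets an NA. The following is a sketch on a shortened version of the example data; the names pieces and path are introduced here for illustration only.

```r
# Shortened example data: three lines identified by Id
df <- data.frame(
  Id   = c(1, 1, 2, 2, 2, 3, 3),
  Lat  = c(58.12550, 58.17426, 58.46461, 58.45812, 58.45207, 57.77700, 57.76034),
  Long = c(-5.098068, -5.314452, -4.914108, -4.899922, -4.887067, -5.666568, -5.648711)
)

# Split by Id and append an NA row to each piece as a separator
pieces <- lapply(split(df[, c("Lat", "Long")], df$Id),
                 function(d) rbind(d, data.frame(Lat = NA, Long = NA)))
path <- do.call(rbind, pieces)

# Blank plot, then all lines in one call; each NA row ends one line
plot(df$Lat, df$Long, type = "n", xlab = "Lat", ylab = "Long")
lines(path$Lat, path$Long)
```

The trade-off is that all lines share one colour in this form; colouring each Id differently still needs the loop or ggplot2.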
To be honest, I think that any approach to take is going to result in a very cluttered plot since you have so many Ids (unless their lines do not overlap much). Either way, I would probably use ggplot2 for this.
##
if( !("ggplot2" %in% installed.packages()[,1]) ){
install.packages("ggplot2",dependencies=TRUE)
}
library(ggplot2)
##
## assumes df is the data frame from the question
D <- data.frame(
Id=df$Id,
Lat=df$Lat,
Long=df$Long
)
##
ggplot(data=D,aes(x=Lat,y=Long,group=Id,color=Id))+
geom_point()+ ## you might want to omit geom_point() in your plot
geom_line()
##
The reason I used group=Id, color=Id in aes(), rather than converting Id to a factor and just using color=Id, is that a factor would give you a legend containing 7000+ levels (the majority of which would not be visible in the plot area).

introducing a gap in continuous x axis using ggplot

This is kind of a follow-up to my previous post, creating an stacked area/bar plot with missing values (all the script I run can be found there). In this post, however, I'm asking whether it's possible to leave a gap in a continuous x axis. I have a time series (month by month) over a year, but for one sample one month is missing, and I would like to show this month as a complete gap in the plot. Almost like plotting one graph for Jan-Aug (Sep is missing) and one for Oct-Dec and merging these with a gap for Sep.
The only things I have come up with are treating the missing month as zero or NA, which creates a huge drop in the area chart for Sep, or excluding it, which leaves an x axis ranging from 1-11 (see plots in dropbox folder).
The data set I'm working on can be found in my dropbox folder and it's named r_class.txt and you can also see the two different plots (Rplots1 and 2).
Any ideas would really be appreciated!
Plot the series as two separate data frames:
#Load libraries
require(ggplot2)
require(reshape)
#Code copied from your linked post:
wa=read.table('wa_class.txt', sep="", header=F, na.string="0")
names(wa)=c("Class","Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec")
wam=melt(wa)
wam$variablen=as.numeric(wam$variable)
#For readability, split the melted data frame into two separate data frames
wam1 <- wam[wam$variablen %in% 1:6,]
wam2 <- wam[wam$variablen %in% 8:12, ]
ggplot() +
geom_area(data=wam1, aes(x=variablen, y=value, fill=Class)) +
geom_area(data=wam2, aes(x=variablen, y=value, fill=Class))
#and add lineranges, etc., accordingly
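Since the code above depends on a file that is not included here, the following self-contained sketch shows the same two-data-frame idea on made-up values. The classes "A" and "B", the random values, and the names before and after are all hypothetical; month 9 (Sep) is the missing one, as in the question.

```r
library(ggplot2)

# Hypothetical monthly values for two classes; month 9 (Sep) is absent
set.seed(42)
dat <- expand.grid(month = c(1:8, 10:12), Class = c("A", "B"))
dat$value <- runif(nrow(dat), 1, 5)

# One data frame per side of the gap
before <- dat[dat$month < 9, ]
after  <- dat[dat$month > 9, ]

# Two area layers on a continuous axis leave month 9 empty
p <- ggplot() +
  geom_area(data = before, aes(x = month, y = value, fill = Class)) +
  geom_area(data = after,  aes(x = month, y = value, fill = Class)) +
  scale_x_continuous(breaks = 1:12, labels = month.abb)
p
```

Because each layer stacks only its own rows, the areas stop at Aug and restart at Oct instead of dropping to zero in between.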

Resources