ggplot2: plotting line behind boxplot

I want to plot a line using geom_line behind my boxplot. I finally managed to combine the line with the boxplot, but not in the right order. This is the dataset I used to create the boxplot:
>head(MdataNa)
1 2 3 4 5 6 7
1 -0.02798634 -0.05740014 -0.02643664 0.02203644 0.02366325 -0.02868668 -0.01278713
2 0.20278229 0.19960302 0.10896017 0.24215229 0.31925211 0.29928739 0.15911725
3 0.06570653 0.08658396 -0.06019098 0.01437147 0.02078022 0.13814853 0.11369999
4 -0.42805441 -0.91945721 -1.05555731 -0.90877542 -0.77493682 -0.90620917 -1.00535742
5 0.39922939 0.12347996 0.06712451 0.07419287 -0.09517628 -0.12056720 -0.40863078
6 0.52821596 0.30827515 0.29733794 0.30555717 0.31636676 0.11592717 0.16957927
I also have glucose concentrations, which should be plotted as a line behind this boxplot:
# glucose curve values
library(scales)   # rescale()
library(reshape2) # melt()
offconc <- c(0,0.4,0.8,1.8,3.5,6.9,7.3)
offtime <- c(9,11.4,12.9,14.9,16.7,18.3,20.5)
# now we have to scale them so they fit in the (boxplot)plot
time <- rescale(offtime, to=c(1,7))
conc <- rescale(offconc, to=c(-1,1))
glucoseConc <- data.frame(time,conc)
glucoseConc2 <- melt(glucoseConc, id = "time")
Then I plotted this data, but I was only able to draw the glucose curve in FRONT of the boxplot instead of behind it. I used this code:
boxNa <- ggplot(stack(MdataNa), aes(x = ind, y = values)) +
geom_boxplot() +
coord_cartesian(y = c(-1.5,1.5)) +
labs(list(title = "After Loess", x = "Timepoint", y = "M")) +
geom_line(data=glucoseConc2,aes(x=time,y=value),group=1)
EDIT: as suggested in the comments, I moved geom_line before geom_boxplot, but this does NOT work:
boxNa <- ggplot(stack(MdataNa), aes(x = ind, y = values)) +
geom_line(data=glucoseConc2,aes(x=time,y=value),group=1) +
geom_boxplot(data=stack(MdataNa), aes(x = ind, y = values)) +
coord_cartesian(y = c(-1.5,1.5)) +
labs(list(title = "After Loess", x = "Timepoint", y = "M"))
This gives the following error:
Error: Discrete value supplied to continuous scale
Am I doing something wrong?

Here's a solution.
The error occurs because geom_line sets up a continuous x scale first, and the boxplot then supplies a factor (ind). The idea is to convert the x axis to continuous values so that both layers share one scale:
ggplot() +
geom_line(data=glucoseConc2,aes(x=time,y=value),group=1)+
geom_boxplot(data=stack(MdataNa), aes(x = as.numeric(ind), y = values, group=ind)) +
coord_cartesian(ylim = c(-1.5,1.5)) +
labs(title = "After Loess", x = "Timepoint", y = "M")+
scale_x_continuous(breaks=1:7)
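For anyone who wants to try this without the original data, here is a minimal self-contained sketch of the same idea. The simulated MdataNa (random values in 7 timepoint columns) and the names long and glucose are assumptions made for illustration; only the glucose values come from the question.

```r
library(ggplot2)
library(scales)

set.seed(1)
# Hypothetical stand-in for MdataNa: 20 rows, 7 timepoint columns named "1".."7"
MdataNa <- as.data.frame(matrix(rnorm(140, sd = 0.4), ncol = 7))
names(MdataNa) <- 1:7
long <- stack(MdataNa)  # columns: values, ind (factor with levels "1".."7")

# Glucose curve from the question, rescaled to the plot's x and y ranges
glucose <- data.frame(
  time = rescale(c(9, 11.4, 12.9, 14.9, 16.7, 18.3, 20.5), to = c(1, 7)),
  conc = rescale(c(0, 0.4, 0.8, 1.8, 3.5, 6.9, 7.3), to = c(-1, 1))
)

# Layers are drawn in the order they are added, so the line goes first
p <- ggplot() +
  geom_line(data = glucose, aes(x = time, y = conc), colour = "steelblue") +
  geom_boxplot(data = long,
               aes(x = as.numeric(ind), y = values, group = ind)) +
  scale_x_continuous(breaks = 1:7) +
  coord_cartesian(ylim = c(-1.5, 1.5)) +
  labs(title = "After Loess", x = "Timepoint", y = "M")
```

Printing p should show the boxes drawn on top of the line. Note that as.numeric(ind) returns the factor's level codes, which happen to equal the timepoints 1 to 7 here.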

Related

Create a graph from a binary column in a dataframe - R

I need to create a point graph using the "ggplot" library based on a binary column of a dataframe.
df <- c(1,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,1)
I need a point to be created every time the value "1" appears in the column, and all points are on the same graph. Thanks.

If the binary column you mention is associated with some other variables, then I think this might work:
(I've just created some random x and y which are the same length as the binary 0, 1s you provided)
x <- rnorm(22)
y <- x^2 + rnorm(22, sd = 0.3)
df <- data.frame("x" = x, "y" = y,
"binary" = c(1,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,1))
library(ggplot2)
# this is the plot with all the points
ggplot(data = df, mapping = aes(x = x, y = y)) + geom_point()
# this is the plot with only the points for which the "binary" variable is 1
ggplot(data = subset(df, binary == 1), mapping = aes(x = x, y = y)) + geom_point()
# this is the plot with all points where they are coloured by whether "binary" is 0 or 1
ggplot(data = df, mapping = aes(x = x, y = y, colour = as.factor(binary))) + geom_point()

Something like this?
library(ggplot2)
y <- df
is.na(y) <- y == 0
ggplot(data = data.frame(x = seq_along(y), y), mapping = aes(x, y)) +
geom_point() +
scale_y_continuous(breaks = c(0, 1),
labels = c("0" = "0", "1" = "1"),
limits = c(0, 1))
It only plots points where df == 1, not the zeros. If you also want those, don't run the code line starting is.na(y).
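If there are no other variables, another option is to plot the positions of the 1s directly. A minimal sketch, assuming the vector from the question and that the position in the vector is an acceptable x axis (the names binary and hits are made up for this example):

```r
library(ggplot2)

binary <- c(1,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,1)

# which() returns the positions at which the condition holds
hits <- data.frame(index = which(binary == 1), value = 1)

p <- ggplot(hits, aes(x = index, y = value)) +
  geom_point(size = 3) +
  # keep the full index range so the gaps between the 1s stay visible
  scale_x_continuous(limits = c(1, length(binary))) +
  labs(x = "Position in vector", y = "Value")
```

This avoids creating NA values, so ggplot2 has no missing rows to drop.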

Not sure exactly what you are asking, but here are a few options. Since your data structure is not a data frame, I've renamed it test. First, dotplot with ggplot:
library(ggplot2)
ggplot(as.data.frame(test), aes(x=test)) + geom_dotplot()
Or you could do the same thing as a bar:
qplot(test, geom="bar")
Or, a primitive base R quick look:
plot(test, pch=16, cex=3)

Scale the x-axes with quarterly date format

I created a plot in R using the ggplot library:
library(ggplot2)
ggplot(df, aes(x = yQ, y = value, group =1)) +
geom_line(aes(color = variable), size = 1) +
scale_color_manual(values = c("#00AFBB", "#E7B800"))
I got the plot that I want, but the only problem is that the yQ values have the format:
1990Q1
1990Q2
1990Q3
1990Q4
......
......
2017Q1
2017Q2
2017Q3
2017Q4
Because there are many years, the x-axis cannot show all the labels clearly (they overlap).
Therefore, I want the x-axis to label only Q1 and Q3 of every fifth year, something like this:
1990Q1 1990Q3 1995Q1 1995Q3 ...... 2015Q1 2015Q3
I tried to use scale_x_date, but my values (e.g. 1990Q1) are not in Date format, so it does not work. How can I fix this?

The question does not provide reproducible input, but using df from the Note below and the autoplot.zoo method of ggplot2's autoplot generic, we can write:
library(ggplot2)
library(zoo)
z <- read.zoo(df, index = "yQ", FUN = as.yearqtr)
autoplot(z) + scale_x_yearqtr()
Note
Test input--
df <- data.frame(yQ = c("1990Q1", "1990Q2", "1990Q3", "1990Q4"), value = 1:4)

The zoo::format.yearqtr() function is quite easy to use with ggplot2.
Try
scale_x_date(labels = function(x) zoo::format.yearqtr(x, "%YQ%q"))

Use function zoo::as.yearqtr (zoo package) to work with quarterly dates.
Generate example data:
year <- 1990:2000
quar <- paste0("Q", 1:4)
foo <- as.vector(outer(year, quar, paste0))
data <- data.frame(dateQ = foo, Y = rnorm(length(foo)))
head(data)
dateQ Y
1 1990Q1 -0.09944705
2 1991Q1 0.14493910
3 1992Q1 0.54856787
4 1993Q1 1.12966224
5 1994Q1 -0.93539302
6 1995Q1 0.24772265
Transform quarterly date to "normal" date:
data$dateNorm <- as.Date(zoo::as.yearqtr(data$dateQ))
head(data)
dateQ Y dateNorm
1 1990Q1 -0.09944705 1990-01-01
2 1991Q1 0.14493910 1991-01-01
3 1992Q1 0.54856787 1992-01-01
4 1993Q1 1.12966224 1993-01-01
5 1994Q1 -0.93539302 1994-01-01
6 1995Q1 0.24772265 1995-01-01
It sets Q1/2/3/4 as the first day of January/April/July/October.
data[grep("1991", data$dateQ), ]
dateQ Y dateNorm
2 1991Q1 0.1449391 1991-01-01
13 1991Q2 1.5878678 1991-04-01
24 1991Q3 -0.1071823 1991-07-01
35 1991Q4 2.2905729 1991-10-01
Now you can plot it or perform other calculations as it's in Date format.
library(ggplot2)
ggplot(data, aes(dateNorm, Y)) +
geom_line()

You can:
- manipulate x-axis breaks and labels with scale_x_discrete(breaks = ..., labels = ...)
- change the angle of text with theme(axis.text.x = element_text(angle = ...))
I generated some data
Combs <- expand.grid(1990:2017, c("Q1", "Q2", "Q3", "Q4"))
df <- data.frame(
yQ = sort(apply(Combs, 1, paste, collapse="")),
value = runif(112)
)
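In the first example, I select the yQ values you want with a logical vector and change the angle of the text:
library(ggplot2)
# TRUE for Q1 and Q3 of the first year, FALSE for the next 18 quarters;
# the 20-element pattern is recycled, so it repeats every 5 years
pattern <- c(TRUE, FALSE, TRUE, FALSE, rep(FALSE, 16))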
ggplot(df, aes(x = yQ, y = value, group =1)) +
geom_line(aes(color = "red"), size = 1) +
scale_x_discrete(breaks = df$yQ[pattern], labels = df$yQ[pattern]) +
theme(axis.text.x = element_text(angle=90))
Notice, however, that tick marks not listed in breaks are not shown. The alternative is to keep every break and blank out the labels you don't want:
xVec <- as.character(df$yQ)
xVec[!pattern] <- ""
ggplot(df, aes(x = yQ, y = value, group =1)) +
geom_line(aes(color = "red"), size = 1) +
scale_x_discrete(breaks = df$yQ, labels = xVec) +
theme(axis.text.x = element_text(angle=90))
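The two ideas above (convert to a real date, then pick the breaks) can also be combined. The following is only a sketch: the data are simulated, and the names quarters, years, and break_dates are invented for this example. It converts the yQ strings with zoo::as.yearqtr, plots on a Date axis, and asks for breaks at Q1 and Q3 of every fifth year.

```r
library(ggplot2)
library(zoo)

set.seed(42)
# Hypothetical data: 112 quarters, 1990Q1 to 2017Q4
quarters <- as.yearqtr(1990 + (0:111) / 4)
df <- data.frame(yQ = format(quarters, "%YQ%q"), value = runif(112))

# Convert the "1990Q1" strings to the first day of each quarter
df$date <- as.Date(as.yearqtr(df$yQ))

# Q1 (year + 0) and Q3 (year + 0.5) of every fifth year
years <- seq(1990, 2015, by = 5)
break_dates <- as.Date(as.yearqtr(sort(c(years, years + 0.5))))

p <- ggplot(df, aes(x = date, y = value)) +
  geom_line() +
  scale_x_date(breaks = break_dates,
               # show the breaks in the original 1990Q1 style
               labels = function(d) format(as.yearqtr(d), "%YQ%q")) +
  theme(axis.text.x = element_text(angle = 90))
```

Because the axis is continuous, the line keeps its true spacing in time and only the twelve requested labels are drawn.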

Having trouble plotting multiple data sets and their confidence intervals on the same GGplot. Data Frame included

First off, here is my data frame:
> df.combined
MLSupr MLSpred MLSlwr BPLupr BPLpred BPLlwr
1 1.681572 1.392213 1.102854 1.046068 0.8326201 0.6191719
2 3.363144 2.784426 2.205708 2.112885 1.6988250 1.2847654
3 5.146645 4.232796 3.318946 3.201504 2.5999694 1.9984346
4 6.930146 5.681165 4.432184 4.368555 3.6146180 2.8606811
5 8.713648 7.129535 5.545422 5.480557 4.5521112 3.6236659
6 10.497149 8.577904 6.658660 6.592558 5.4896044 4.3866506
7 12.280651 10.026274 7.771898 7.681178 6.3907488 5.1003198
8 14.064152 11.474644 8.885136 8.924067 7.4889026 6.0537381
9 15.847653 12.923013 9.998373 10.125539 8.5444783 6.9634176
10 17.740388 14.429805 11.119222 11.327011 9.6000541 7.8730970
11 19.633122 15.936596 12.240071 12.620001 10.7425033 8.8650055
12 21.525857 17.443388 13.360919 13.821473 11.7980790 9.7746850
13 23.535127 19.010958 14.486789 15.064362 12.8962328 10.7281032
14 25.544397 20.578528 15.612659 16.307252 13.9943865 11.6815215
15 27.553667 22.146098 16.738529 17.600241 15.1368357 12.6734300
16 29.562937 23.713668 17.864399 18.893231 16.2792849 13.6653384
17 31.572207 25.281238 18.990268 20.245938 17.4678163 14.6896948
18 33.581477 26.848807 20.116138 21.538928 18.6102655 15.6816033
19 35.590747 28.416377 21.242008 22.891634 19.7987969 16.7059597
20 37.723961 30.047177 22.370394 24.313671 21.0352693 17.7568676
So, as you can see, I have predicted values along with the upper and lower bounds of their 95% CI. I'd like to plot the lines and their ribbons for MLS and BPL in the same plot, but I'm not quite sure how.
Right now, for a single data set, I am using this command:
ggplot(BULISeason, aes(x = 1:length(BULISeason$`Running fit`), y = `Running fit`)) +
geom_line(aes(fill = "black")) +
geom_ribbon(aes(ymin = `Running lwr`, ymax = `Running upr`, fill = "red"),alpha = 0.25)
Note: The variables are different for the independent data frames.

You can, of course, construct your plot as a series of layers, as you imply in your question. Your data frame has no x column, so add a row index first and then use the following code:
# add a row index to use as the x variable
df.combined$x <- seq_len(nrow(df.combined))
ggplot(data = df.combined) +
geom_ribbon(aes(x = x, ymin = MLSlwr, ymax = MLSupr),
fill = "blue", alpha = 0.25) +
geom_line(aes(x = x, y = MLSpred), color = "black") +
geom_ribbon(aes(x = x, ymin = BPLlwr, ymax = BPLupr),
fill = "red", alpha = 0.25) +
geom_line(aes(x = x, y = BPLpred), color = "black")
This draws both prediction lines, each with its ribbon, in one plot.
However, reshaping your dataset to a "tidy", or long, format has some advantages. For example, you can map the origin of the predictions to a color and the type of prediction to a line type, using the following code:
library(dplyr) # mutate()
library(tidyr) # gather(), separate(), spread()
tidy.data <- df.combined %>%
# add id variable
mutate(x = 1:20) %>%
# reshape to long format
gather("variable", "value", 1:6) %>%
# separate variable names at position 3
separate(variable,
into = c("model", "line"),
sep = 3,
remove = TRUE)
# plot
ggplot(data = tidy.data, aes(x = x,
y = value,
linetype = line,
color = model)) +
geom_line() +
scale_linetype_manual(values = c("dashed", "solid", "dashed"))
You can still use ribbons in your plot by spreading your dataframe back to a wide(r) format:
# back to wide
wide.data <- tidy.data %>%
spread(line, value)
# plot with ribbon
ggplot(data = wide.data, aes(x = x, y = pred)) +
geom_ribbon(aes(ymin = lwr, ymax = upr, fill = model), alpha = .5) +
geom_line(aes(group = model))
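With a recent tidyr, the gather/separate/spread round trip can probably be shortened to a single pivot_longer() call. This is a sketch under that assumption, using only the first three rows of df.combined to keep it short; the name long is made up here. The special ".value" entry in names_to tells pivot_longer() to keep upr, pred, and lwr as separate columns, and names_sep = 3 splits each column name after its third character.

```r
library(ggplot2)
library(tidyr)

# First three rows of df.combined from the question
df.combined <- data.frame(
  MLSupr  = c(1.681572, 3.363144, 5.146645),
  MLSpred = c(1.392213, 2.784426, 4.232796),
  MLSlwr  = c(1.102854, 2.205708, 3.318946),
  BPLupr  = c(1.046068, 2.112885, 3.201504),
  BPLpred = c(0.8326201, 1.6988250, 2.5999694),
  BPLlwr  = c(0.6191719, 1.2847654, 1.9984346)
)
df.combined$x <- seq_len(nrow(df.combined))

# "MLSupr" -> model = "MLS", value column = "upr", and so on
long <- pivot_longer(df.combined,
                     cols = -x,
                     names_to = c("model", ".value"),
                     names_sep = 3)

# One ribbon and one line per model
p <- ggplot(long, aes(x = x, y = pred)) +
  geom_ribbon(aes(ymin = lwr, ymax = upr, fill = model), alpha = 0.25) +
  geom_line(aes(colour = model))
```

The result has one row per x and model, which is exactly the shape geom_ribbon needs.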
Hope this helps!

facet_grid() causing crash

I cannot figure out what I'm missing. I keep crashing R or getting very weird plots.
> head(vData)
vix.Close vstoxx vxfxi.Close Date
2011-03-16 29.40 35.2293 35.84 2011-03-16
2011-03-17 26.37 30.6133 31.77 2011-03-17
2011-03-18 24.44 28.5337 29.31 2011-03-18
2011-03-21 20.61 25.2355 25.95 2011-03-21
2011-03-22 20.21 24.3914 24.52 2011-03-22
2011-03-23 19.17 23.9226 24.03 2011-03-23
The below works:
p1.1<-ggplot(data = vData, aes(x = Date, y = vix.Close)) + geom_line(col= "red")
p1.1
p2<-p1.1 + geom_line(data = vData[!is.na(vData$vstoxx),], aes(x = Date, y = vstoxx), col="blue")
p2
p3<-p2 + geom_line(data = vData[!is.na(vData$vxfxi.Close),], aes(x = Date, y = vxfxi.Close), col="green")
p3
p4<-p3 + labs(title = "Volatility Indexes", x = "Time", y = "Index")
p4
But this is the part that is giving me trouble:
p5<- p4 + facet_grid(Date~., scales = Date)
p5

I echo what baptiste said: what is it you're trying to do? The code you've provided suggests that you're trying to create a separate line chart for each date in the dataset, which doesn't make much sense. For this demonstration, I'll show you how to facet the data by year to see the correlations between the different measurements of volatility over time. If you provide more detail as a comment, I'll revisit the code.
First let's take a look at what you've already done.
library(tidyverse)
library(gridExtra)
library(lubridate)
library(reshape2)
#Generate dummy data
vData <- tibble(
vix.Close = rnorm(1000, mean = 12, sd = 5),
vstoxx = rnorm(1000, mean = 12, sd = 5),
vxfxi.Close = rnorm(1000, mean = 12, sd = 5),
Date = as.Date(1:1000, origin = '2011-01-01')
)
# Generate individual plots per your question
p1.1 <-
ggplot(data = vData, aes(x = Date, y = vix.Close)) + geom_line(col = "red")
p1.1
p2 <-
p1.1 + geom_line(data = vData[!is.na(vData$vstoxx), ], aes(x = Date, y = vstoxx), col =
"blue")
p2
p3 <-
p2 + geom_line(data = vData[!is.na(vData$vxfxi.Close), ], aes(x = Date, y = vxfxi.Close), col =
"green")
p3
p4 <-
p3 + labs(title = "Volatility Indexes", x = "Time", y = "Index")
p4
You're creating four different plots and then layering them on top of each other. This approach works here, but it's cumbersome to make changes to each of the calls to ggplot or if you want to add/remove variables. Let's move your data to a "long" format and simplify the ggplot call.
# Melt the data into three columns and remove NAs
vData <- melt(vData, id = "Date") %>%
filter(!is.na(value)) %>%
as_tibble() # replaces the deprecated tbl_df()
# Create one ggplot for all three indexes
ggplot(data = vData, aes(x = Date, y = value, color = variable)) +
geom_line() +
labs(title = "Volatility Indexes", x = "Time", y = "Index")
Now back to the big problem: you shouldn't be faceting by date because that would give you a huge number of tiny unreadable line charts. There are a number of other facets that might make sense. For example, you could look at the distribution of the three indexes by year.
ggplot(data = vData, aes(x = variable, y = value, color = variable)) +
geom_boxplot() +
labs(title = "Volatility Indexes", x = "", y = "") +
facet_grid(year(Date) ~ .)
So put some thought into what exactly you want to show.
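If the goal was one line chart per year, a sketch along the following lines may be closer to it. The data are simulated and the names vLong and Year are assumptions for this example. Note that the scales argument of facet_grid() and facet_wrap() expects one of the strings "fixed", "free_x", "free_y", or "free", so scales = Date in the question is not a valid value.

```r
library(ggplot2)

set.seed(123)
# Hypothetical long-format data: three indexes over 1000 days
dates <- as.Date("2011-01-01") + 1:1000
vLong <- data.frame(
  Date     = rep(dates, times = 3),
  variable = rep(c("vix.Close", "vstoxx", "vxfxi.Close"), each = 1000),
  value    = rnorm(3000, mean = 12, sd = 5)
)

# Facet by a coarse grouping (the year), not by each individual date
vLong$Year <- format(vLong$Date, "%Y")

p <- ggplot(vLong, aes(x = Date, y = value, colour = variable)) +
  geom_line() +
  # free_x lets each panel show only its own year on the x axis
  facet_wrap(~ Year, ncol = 1, scales = "free_x") +
  labs(title = "Volatility Indexes", x = "Time", y = "Index")
```

This gives three stacked panels (2011, 2012, 2013) instead of one panel per day.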

ggplot geom_tile overlay plot with points

severity <- c("Major","Serious","Minor","Negligible")
probability <- c("Highly Probable","Probable","Possible","Remote","Unlikely","Impossible")
df <- expand.grid(x=severity,y=probability)
df$x <- factor(df$x, levels=rev(unique(df$x)))
df$y <- factor(df$y, levels=rev(unique(df$y)))
df$color <- c(1,1,2,2,1,2,2,2,2,2,2,3,2,2,3,3,2,3,3,3,3,3,3,3)
ggplot(df,aes(x,y,fill=factor(color)))+
geom_tile(color="black")+
scale_fill_manual(guide="none",values=c("red","yellow","green"))+
scale_x_discrete(expand=c(0,0))+scale_y_discrete(expand=c(0,0))+
labs(x="",y="")
This produces a risk assessment scorecard chart. I want to add points from a CSV file, one point per record. Each record has three fields: an item name, an x coordinate, and a y coordinate, where x = severity and y = probability.
da <- data.frame(list(name=c("ENVIRONMENTAL","COSTS","SUPPLY","HEALTH"),
severity=c("Major","Serious","Minor","Serious"),
probability=c("Probable","Possible","Probable","Unlikely")))
da
name severity probability
1 ENVIRONMENTAL Major Probable
2 COSTS Serious Possible
3 SUPPLY Minor Probable
4 HEALTH Serious Unlikely
> p1 <- p + data.frame(da, aes(severity, probability)) + geom_point()
Error in as.data.frame.default(x[[i]], optional = TRUE, stringsAsFactors = stringsAsFactors) :
cannot coerce class ""uneval"" to a data.frame
>
> d <- data.frame(list(name=c("ENVIRONMENTAL","COSTS","SUPPLY","HEALTH"),
severity=c(2,3,4,1),probability=c(3,5,4,6)))
> d
name severity probability
1 ENVIRONMENTAL 2 3
2 COSTS 3 5
3 SUPPLY 4 4
4 HEALTH 1 6
> ggplot(d,x=severity, y=probability)+ geom_point()
Error in exists(name, envir = env, mode = mode) :
argument "env" is missing, with no default
How can I add points to the ggplot / geom_tile graph?

You can't add a data.frame to a plot (not like that, at least...). What you can do is add a new layer, geom_point(), and specify the data.frame it comes from. To make this work, the columns mapped to any aesthetics you want to reuse (here, x and y) should have the same names in both data.frames.
# It's better practice to modify your data
# than to convert to factor within the plot
df$color <- factor(c(1,1,2,2,1,2,2,2,2,2,2,3,2,2,3,3,2,3,3,3,3,3,3,3))
# get some meaningful names, that match da and d
names(df)[1:2] <- c("severity", "probability")
p <- ggplot(df, aes(x = severity, y = probability)) +
# moved fill to the geom_tile layer, because it's only used there
geom_tile(color = "black", aes(fill = color)) +
scale_fill_manual(guide = "none", values = c("red", "yellow", "green")) +
scale_x_discrete(expand = c(0, 0)) +
scale_y_discrete(expand = c(0, 0)) +
labs(x = "", y = "")
# alsonoticehowaddingspacesmakesiteasiertoread
# Using the same column names? Yup! Now it's this easy:
p + geom_point(data = da) +
geom_point(data = d, color = "dodgerblue4")
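Put together as one self-contained script, the approach might look like the sketch below. The geom_text() layer with the item names is an addition for illustration, not part of the answer above, and the name grid is made up; the data are taken from the question.

```r
library(ggplot2)

severity    <- c("Major", "Serious", "Minor", "Negligible")
probability <- c("Highly Probable", "Probable", "Possible",
                 "Remote", "Unlikely", "Impossible")

# Background grid: one tile per severity/probability combination
grid <- expand.grid(severity = severity, probability = probability)
grid$severity    <- factor(grid$severity, levels = rev(severity))
grid$probability <- factor(grid$probability, levels = rev(probability))
grid$color <- factor(c(1,1,2,2,1,2,2,2,2,2,2,3,2,2,3,3,2,3,3,3,3,3,3,3))

# Points to overlay; column names match the grid's x and y columns
da <- data.frame(
  name        = c("ENVIRONMENTAL", "COSTS", "SUPPLY", "HEALTH"),
  severity    = c("Major", "Serious", "Minor", "Serious"),
  probability = c("Probable", "Possible", "Probable", "Unlikely")
)

p <- ggplot(grid, aes(x = severity, y = probability)) +
  geom_tile(aes(fill = color), color = "black") +
  scale_fill_manual(guide = "none", values = c("red", "yellow", "green")) +
  scale_x_discrete(expand = c(0, 0)) +
  scale_y_discrete(expand = c(0, 0)) +
  # the point layer inherits x and y because the column names match
  geom_point(data = da, size = 3) +
  geom_text(data = da, aes(label = name), vjust = -1, size = 3) +
  labs(x = "", y = "")
```

Records read from a CSV file with read.csv() should work the same way, as long as the file's columns are named severity and probability and use the same category labels as the grid.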
