ggplot2: arranging multiple boxplots as a time series - r

I would like to create a multivariate boxplot time series with ggplot2 and I need to have an x axis that positions the boxplots based on their associated dates.
I found two posts about this question. One is "Time series plot with groups using ggplot2", but its x axis is not a date scale (scale_x_date), so the spacing between dates is distorted in my case. The other is "ggplot2 : multiple factors boxplot with scale_x_date axis in R", but that person uses the interaction function, which I don't use in my case.
Here is an example file and my code:
dtm <- read.table(text="date ruche mortes trmt
03.10.2013 1 8 P+
04.10.2013 1 7 P+
07.10.2013 1 34 P+
03.10.2013 7 16 P+
04.10.2013 7 68 P+
07.10.2013 7 170 P+
03.10.2013 2 7 P-
04.10.2013 2 7 P-
07.10.2013 2 21 P-
03.10.2013 5 8 P-
04.10.2013 5 27 P-
07.10.2013 5 24 P-
03.10.2013 3 15 T
04.10.2013 3 6 T
07.10.2013 3 13 T
03.10.2013 4 6 T
04.10.2013 4 18 T
07.10.2013 4 19 T ", h=T)
require(ggplot2)
require(visreg)
require(MASS)
require(reshape2)
library(scales)
dtm$asDate = as.Date(dtm[,1], "%d.%m.%Y")
## Plot 1: Nearly what I want but is biased by the x-axis format where date should not be a factor##
p2<-ggplot(data = dtm, aes(x = factor(asDate), y = mortes))
p2 + geom_boxplot(aes(fill = factor(dtm$trmt)))
## Plot 2: Doesn't show me what I need, ggplot apparently needs a factor as x##
p<-ggplot(data = dtm, aes(x = asDate, y = mortes))
p + geom_boxplot(aes(group = asDate, fill = trmt))
Can anyone help me with this issue, please?

Is this what you want?
Code:
p <- ggplot(data = dtm, aes(x = asDate, y = mortes, group = interaction(asDate, trmt)))
p + geom_boxplot(aes(fill = trmt))
The key is to group by interaction(asDate, trmt), so that you get one box per date and treatment, and not to cast asDate to a factor, so that ggplot treats it as a date. Refer to the column as trmt inside aes() rather than dtm$trmt, so the mapping is looked up in the data passed to ggplot(). If you want to adjust anything on the x axis, be sure to do it with + scale_x_date().
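For reference, here is a minimal self-contained sketch of that approach. It rebuilds the question's data with data.frame() instead of read.table(); the axis settings (date_breaks, date_labels) and the legend title are illustrative choices, not part of the original answer.

```r
library(ggplot2)

# The question's data, rebuilt inline so the example is self-contained.
dtm <- data.frame(
  date   = rep(c("03.10.2013", "04.10.2013", "07.10.2013"), times = 6),
  ruche  = rep(c(1, 7, 2, 5, 3, 4), each = 3),
  mortes = c(8, 7, 34, 16, 68, 170, 7, 7, 21, 8, 27, 24, 15, 6, 13, 6, 18, 19),
  trmt   = rep(c("P+", "P-", "T"), each = 6)
)

# Keep the dates as Date objects so the x axis stays continuous.
dtm$asDate <- as.Date(dtm$date, format = "%d.%m.%Y")

# One box per (date, treatment) pair; fill distinguishes the treatments.
p <- ggplot(dtm, aes(x = asDate, y = mortes,
                     group = interaction(asDate, trmt), fill = trmt)) +
  geom_boxplot() +
  # Assumed formatting: one tick per day, labelled day.month.
  scale_x_date(date_breaks = "1 day", date_labels = "%d.%m") +
  labs(x = "Date", y = "mortes", fill = "Treatment")
```

Calling print(p) draws the plot. Because the x axis is a real date scale, the gap between 04.10 and 07.10 is three times as wide as the gap between 03.10 and 04.10.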

Related

line graph with multiple variables on y axis stepwise

I need some help. Here is the data I want to plot. I want to keep Path.ID on the y axis and add the numeric values of all the other columns stepwise. This is a subset of a very large dataset, so I want the Path.ID label attached to each line and, if possible, the values of the other columns shown at each point.
head(table)
Path.ID sc st rc rt
<chr> <dbl> <dbl> <dbl> <dbl>
1 map00230 1 12 5 52
2 map00940 1 20 10 43
3 map01130 NA 15 8 34
4 map00983 NA 14 5 28
5 map00730 NA 5 3 26
6 map00982 NA 16 2 24
somewhat like this
Thank you
Here is the pseudo code.
library(tidyr)
library(dplyr)
library(ggplot2)
# convert your table into a long format - sorry I am more used to this type of data
table_long <- table %>% gather(x_axis, value, sc:rt)
# Plot with ggplot2
ggplot() +
  # draw line
  geom_line(data = table_long, aes(x = x_axis, y = value, group = Path.ID, color = Path.ID)) +
  # draw the label at the last x_axis value, which in this case is rt
  geom_label(data = table_long %>% filter(x_axis == "rt"),
             aes(x = x_axis, y = value, label = Path.ID, fill = Path.ID),
             color = "#FFFFFF")
Note that with this code, a Path.ID that has no rt value will not get a label.
p <- ggplot() +
  # draw line
  geom_line(data = table_long, aes(x = x_axis, y = value, group = Path.ID, color = Path.ID)) +
  geom_text(data = table_long %>% filter(x_axis == "rt"),
            aes(x = x_axis, y = value, label = Path.ID),
            color = "#050505", size = 3, check_overlap = TRUE)
p + labs(title = "title", x = "x-label", y = "y-label")
I had to use geom_text because I had a large dataset, and it gave me a somewhat clearer graph.
Thank you @sinh, it helped a lot.
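One detail the code above does not handle is the order of the steps: a character x_axis is sorted alphabetically (rc, rt, sc, st) rather than in column order. The sketch below is one possible variant, assuming the six rows shown in the question; it uses tidyr::pivot_longer() in place of gather(), and the name paths is a stand-in for the question's table object.

```r
library(tidyr)
library(ggplot2)

# The six rows printed by head(table) in the question.
paths <- data.frame(
  Path.ID = c("map00230", "map00940", "map01130",
              "map00983", "map00730", "map00982"),
  sc = c(1, 1, NA, NA, NA, NA),
  st = c(12, 20, 15, 14, 5, 16),
  rc = c(5, 10, 8, 5, 3, 2),
  rt = c(52, 43, 34, 28, 26, 24)
)

# Reshape to long format, then fix the step order explicitly;
# otherwise ggplot2 sorts the steps alphabetically.
steps <- c("sc", "st", "rc", "rt")
paths_long <- pivot_longer(paths, cols = all_of(steps),
                           names_to = "x_axis", values_to = "value")
paths_long$x_axis <- factor(paths_long$x_axis, levels = steps)

p <- ggplot(paths_long, aes(x = x_axis, y = value,
                            group = Path.ID, colour = Path.ID)) +
  geom_line(na.rm = TRUE) +
  # Print each value just above its point.
  geom_text(aes(label = value), vjust = -0.5, size = 3,
            na.rm = TRUE, show.legend = FALSE) +
  # Label each line with its Path.ID at the last step (rt).
  geom_text(data = subset(paths_long, x_axis == "rt"),
            aes(label = Path.ID), hjust = -0.1, size = 3,
            show.legend = FALSE)
```

With a large dataset the per-point value labels will overlap, so they may need to be dropped or thinned out, for example with check_overlap = TRUE as in the code above.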

R - reshaped data from wide to long format, now want to use created timevar as factor

I am working with longitudinal data and assess the utilization of a policy over 13 months. In order to get some barplots with the different months on my x-axis, I converted my data from wide format to long format.
So now, my dataset looks like this
id month hours
1 1 13
1 2 16
1 3 20
2 1 0
2 2 0
2 3 10
I thought that after reshaping I could easily use my newly created "month" variable as a factor and plot some graphs. However, it does not work and tells me it's a list or an atomic vector. Transforming it into a factor did not work either, and I desperately need it as a factor.
Does anybody know how to turn it into a factor?
Thank you very much for your help!
EDIT.
The OP's graph code was posted in a comment. Here it is.
library(ggplot2)
ggplot(data, aes(x = hours, y = month)) + geom_density() + labs(title = 'Distribution of hours')
# Loading ggplot2
library(ggplot2)
# Placing example in dataframe
data <- read.table(text = "
id month hours
1 1 13
1 2 16
1 3 20
2 1 0
2 2 0
2 3 10
", header = TRUE)
# Converting month to factor (the question mentions 13 months, so allow levels 1 to 13)
data$month <- factor(data$month, levels = 1:13, labels = 1:13)
# Plotting grouping by id
ggplot(data, aes(x = month, y = hours, group = id, color = factor(id))) + geom_line()
# Plotting hour density by month
ggplot(data, aes(hours, color = month)) + geom_density()
The problem seems to be in the aes(). geom_density only needs an x value; if you think about it, a y mapping doesn't make sense here. You want the density of the x values, so the vertical axis shows the values of that density, not some other variable from the dataset.
First, read in the data.
Indirekte_long <- read.table(text = "
id month hours
1 1 13
1 2 16
1 3 20
2 1 0
2 2 0
2 3 10
", header = TRUE)
Now graph it.
library(ggplot2)
g <- ggplot(Indirekte_long, aes(hours))
g + geom_density() + labs(title = 'Distribution of hours')
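Since the original goal was a barplot with the months on the x axis, here is a minimal sketch of that, assuming the six example rows above. The data frame name usage and the choice of plotting the mean hours per month are illustrative assumptions, not something the question specifies.

```r
library(ggplot2)

# The example rows from the question.
usage <- data.frame(
  id    = rep(1:2, each = 3),
  month = rep(1:3, times = 2),
  hours = c(13, 16, 20, 0, 0, 10)
)

# Convert the reshaped time variable to a factor.
usage$month <- factor(usage$month)

# Assumed summary: mean hours per month across ids.
monthly <- aggregate(hours ~ month, data = usage, FUN = mean)

# One bar per month, with the factor on the x axis.
p <- ggplot(monthly, aes(x = month, y = hours)) +
  geom_col() +
  labs(title = "Mean hours per month", x = "Month", y = "Hours")
```

If factor() appears not to work, checking str(usage) usually shows whether month is still a plain vector or has ended up as a list column after reshaping.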

Reasons that ggplot2 legend does not appear [duplicate]

I was attempting (unsuccessfully) to show a legend in my R ggplot2 graph, which involves multiple plots. My data frame df and code are as follows:
Individuals Mod.2 Mod.1 Mod.3
1 2 -0.013473145 0.010859793 -0.08914021
2 3 -0.011109863 0.009503278 -0.09049672
3 4 -0.006465788 0.011304668 -0.08869533
4 5 0.010536718 0.009110458 -0.09088954
5 6 0.015501212 0.005929766 -0.09407023
6 7 0.014565584 0.005530390 -0.09446961
7 8 -0.009712516 0.012234843 -0.08776516
8 9 -0.011282278 0.006569570 -0.09343043
9 10 -0.011330579 0.003505439 -0.09649456
str(df)
'data.frame': 9 obs. of 4 variables:
$ Individuals : num 2 3 4 5 6 7 8 9 10
$ Mod.2 : num -0.01347 -0.01111 -0.00647 0.01054 0.0155 ...
$ Mod.1 : num 0.01086 0.0095 0.0113 0.00911 0.00593 ...
$ Mod.3 : num -0.0891 -0.0905 -0.0887 -0.0909 -0.0941 ...
ggplot(df, aes(df$Individuals)) +
geom_point(aes(y=df[,2]), colour="red") + geom_line(aes(y=df[,2]), colour="red") +
geom_point(aes(y=df[,3]), colour="lightgreen") + geom_line(aes(y=df[,3]), colour="lightgreen") +
geom_point(aes(y=df[,4]), colour="darkgreen") + geom_line(aes(y=df[,4]), colour="darkgreen") +
labs(title = "Modules", x = "Number of individuals", y = "Mode")
I looked up the following Stack Overflow threads, as well as Google searches:
Merging ggplot2 legend
ggplot2 legend not showing
`ggplot2` legend not showing label for added series
ggplot2 legend for geom_area/geom_ribbon not showing
ggplot and R: Two variables over time
ggplot legend not showing up in lift chart
Why ggplot2 legend not show in the graph
ggplot legend not showing up in lift chart.
This one was created 4 days ago
This made me realize that making legends appear is a recurring issue, despite the fact that legends usually appear automatically.
My first question is: what can cause a legend not to appear when using ggplot? The second is how to fix each of these causes. One of the causes appears to be related to multiple plots and the use of aes(), but I suspect there are other reasons.
colour = XYZ should be inside aes(), not outside:
geom_point(aes(data, colour = XYZ)) # ------> legend
geom_point(aes(data), colour = XYZ) # ------> no legend
Hope it helps; it took me a long time to figure this out.
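To make the difference concrete for wide data like the question's, here is a hedged sketch that keeps one layer per column but moves colour inside aes() as a constant label, then assigns the actual colours with scale_colour_manual(). It uses only the first three rows and two of the columns from the question; the legend title "Module" is an assumption.

```r
library(ggplot2)

# First three rows of the question's data, two model columns only.
df <- data.frame(
  Individuals = 2:4,
  Mod.2 = c(-0.013473145, -0.011109863, -0.006465788),
  Mod.1 = c(0.010859793, 0.009503278, 0.011304668)
)

p <- ggplot(df, aes(x = Individuals)) +
  # colour is mapped to a label inside aes(), so a legend entry is created.
  geom_line(aes(y = Mod.2, colour = "Mod.2")) +
  geom_line(aes(y = Mod.1, colour = "Mod.1")) +
  # The labels are translated to the colours used in the question.
  scale_colour_manual(name = "Module",
                      values = c(Mod.1 = "lightgreen", Mod.2 = "red")) +
  labs(title = "Modules", x = "Number of individuals", y = "Mode")
```

This works without reshaping, but it needs one layer per column, so it scales poorly beyond a handful of series.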
You are going about the setting of colour in completely the wrong way. You have set colour to a constant character value in multiple layers, rather than mapping it to the value of a variable in a single layer.
This is largely because your data is not "tidy" (see the following)
head(df)
x a b c
1 1 -0.71149883 2.0886033 0.3468103
2 2 -0.71122304 -2.0777620 -1.0694651
3 3 -0.27155800 0.7772972 0.6080115
4 4 -0.82038851 -1.9212633 -0.8742432
5 5 -0.71397683 1.5796136 -0.1019847
6 6 -0.02283531 -1.2957267 -0.7817367
Instead, you should reshape your data first:
df <- data.frame(x=1:10, a=rnorm(10), b=rnorm(10), c=rnorm(10))
mdf <- reshape2::melt(df, id.vars = "x")
This produces a more suitable format:
head(mdf)
x variable value
1 1 a -0.71149883
2 2 a -0.71122304
3 3 a -0.27155800
4 4 a -0.82038851
5 5 a -0.71397683
6 6 a -0.02283531
This will make it much easier to use with ggplot2 in the intended way, where colour is mapped to the value of a variable:
ggplot(mdf, aes(x = x, y = value, colour = variable)) +
geom_point() +
geom_line()
ind <- 1:10
my.df <- data.frame(ind, sample(-5:5, 10, replace = TRUE),
                    sample(-5:5, 10, replace = TRUE), sample(-5:5, 10, replace = TRUE))
df <- data.frame(rep(ind, 3), c(my.df[,2], my.df[,3], my.df[,4]),
                 c(rep("mod.1", 10), rep("mod.2", 10), rep("mod.3", 10)))
colnames(df) <- c("ind", "value", "mod")
Your data frame should look something like this:
ind value mod
1 5 mod.1
2 -5 mod.1
3 3 mod.1
4 2 mod.1
5 -2 mod.1
6 5 mod.1
Then all you have to do is :
ggplot(df, aes(x = ind, y = value, shape = mod, color = mod)) +
geom_line() + geom_point()
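Applied to the data frame from the question, the reshape-then-map approach might look like the sketch below. It uses tidyr::pivot_longer() rather than reshape2::melt() or manual stacking; the column names module and value and the legend title are assumptions made for the example.

```r
library(tidyr)
library(ggplot2)

# The data frame printed in the question.
df <- data.frame(
  Individuals = 2:10,
  Mod.2 = c(-0.013473145, -0.011109863, -0.006465788, 0.010536718,
            0.015501212, 0.014565584, -0.009712516, -0.011282278,
            -0.011330579),
  Mod.1 = c(0.010859793, 0.009503278, 0.011304668, 0.009110458,
            0.005929766, 0.005530390, 0.012234843, 0.006569570,
            0.003505439),
  Mod.3 = c(-0.08914021, -0.09049672, -0.08869533, -0.09088954,
            -0.09407023, -0.09446961, -0.08776516, -0.09343043,
            -0.09649456)
)

# One row per (individual, module) pair.
long <- pivot_longer(df, cols = -Individuals,
                     names_to = "module", values_to = "value")

# colour is mapped once, so a single legend is drawn automatically.
p <- ggplot(long, aes(x = Individuals, y = value, colour = module)) +
  geom_point() +
  geom_line() +
  # Reuse the colours chosen in the question.
  scale_colour_manual(name = "Module",
                      values = c(Mod.1 = "lightgreen", Mod.2 = "red",
                                 Mod.3 = "darkgreen")) +
  labs(title = "Modules", x = "Number of individuals", y = "Mode")
```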
I had a similar problem with the title; nevertheless, I found a way to show it: you can add a layer using
ggtitle ("Name of the title that you want to show")
example:
ggplot(data=mtcars,
mapping = aes(x=hp,
fill = factor(vs)))+
geom_histogram(bins = 9,
position = 'identity',
alpha = 0.8, show.legend = T)+
labs(title = 'Horse power',
fill = 'Vs Motor',
x = 'HP',
y = 'conteo',
subtitle = 'A',
caption = 'B')+
ggtitle("Horse power")

Stacked Bar Chart in R using ggplot2

I am trying to create a stacked bar graph using ggplot2. Here is what my data looks like:
Date Full.Page.Ads Total.Pages
11/10/2015 3 24
12/10/2015 10 24
13/10/2015 15 24
This is the code which I am trying to use to visualize the data
library("ggplot2")
library("reshape2")
ns <- read.csv("NS.csv")
ggplot(data=ns, aes(x = Date, y = Total.Pages, fill=Full.Page.Ads)) + geom_bar(stat = "identity")
But for some reason, rather than making a proper stacked bar chart, it keeps making something like this: http://imgur.com/rqxEzjF.jpg. Instead, I would like the main bars to show Total.Pages and, inside those bars, the Full.Page.Ads portion in a different colour.
I am not sure what I am doing wrong here.
This is a sample of what I am aspiring to make http://blog.visual.ly/wp-content/uploads/2012/08/StackedPercent.png
You have to melt your data before plotting. You also need to compute the difference between Total.Pages and Full.Page.Ads if you want your graph to represent the proportion of Full.Page.Ads with respect to the total number of pages. Something like:
ns <- data.frame(Date = c("11/10/2015", "12/10/2015", "13/10/2015"),
Full.Page.Ads = c(3, 10, 15),
Total.Pages = c(24, 24, 24))
# Date Full.Page.Ads Total.Pages
# 1 11/10/2015 3 24
# 2 12/10/2015 10 24
# 3 13/10/2015 15 24
ns$not.Full.Page.Ads <- ns$Total.Pages - ns$Full.Page.Ads
ns$Total.Pages <- NULL
ns2 <- reshape2::melt(ns, id.vars = "Date")
# Date variable value
#1 11/10/2015 Full.Page.Ads 3
#2 12/10/2015 Full.Page.Ads 10
#3 13/10/2015 Full.Page.Ads 15
#4 11/10/2015 not.Full.Page.Ads 21
#5 12/10/2015 not.Full.Page.Ads 14
#6 13/10/2015 not.Full.Page.Ads 9
ggplot(data = ns2, aes(x = Date, y = value, fill = variable)) + geom_bar(stat = "identity")
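The target image linked in the question is a percent-stacked chart, so a possible variant is sketched below. It assumes the same three rows, parses Date as a real date so the bars sort chronologically, and uses position = "fill" to rescale each bar to 100%. The names Other.Pages, page_type and pages are introduced for the example only.

```r
library(tidyr)
library(ggplot2)

ns <- data.frame(
  Date = c("11/10/2015", "12/10/2015", "13/10/2015"),
  Full.Page.Ads = c(3, 10, 15),
  Total.Pages = c(24, 24, 24)
)

# Parse the day/month/year strings as dates.
ns$Date <- as.Date(ns$Date, format = "%d/%m/%Y")

# Pages that are not full-page ads make up the rest of each bar.
ns$Other.Pages <- ns$Total.Pages - ns$Full.Page.Ads

ns_long <- pivot_longer(ns, cols = c(Full.Page.Ads, Other.Pages),
                        names_to = "page_type", values_to = "pages")

p <- ggplot(ns_long, aes(x = Date, y = pages, fill = page_type)) +
  # position = "fill" stacks the segments and scales each bar to 1.
  geom_col(position = "fill") +
  scale_y_continuous(labels = scales::percent) +
  scale_x_date(date_breaks = "1 day", date_labels = "%d/%m/%Y") +
  labs(y = "Share of pages", fill = NULL)
```

Dropping position = "fill" gives the absolute stacked version, where every bar reaches Total.Pages.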

Modifying y-axis with ggplot2

I'm trying to plot the number of observations for each instance of a word, both of which are stored in a data frame.
I can generate the plot with ggplot2, but the y-axis displays "1e+05", "2e+05", etc. instead of plain numbers.
How can I modify this code so that the y-axis displays numbers instead?
Here is my code:
> w
p.word p.freq
1 the 294571
2 and 158624
3 you 84152
4 for 77117
5 that 71672
6 with 47987
7 this 42768
8 was 41088
9 have 39835
10 are 36458
11 but 33899
12 not 30370
13 all 27079
14 your 26923
15 just 25507
16 from 24497
17 out 22578
18 like 22501
19 what 22150
20 will 21530
21 they 21435
22 about 21184
23 one 20877
24 its 20109
ggplot(w, aes(x = p.word, y = p.freq))+ geom_bar(stat = "identity")
Here is the plot that is generated:
"1e+05" etc are numerical values (scientific notation).
If you want the long notation (e.g. "100,000") use library(scales) and the comma formatter:
library(scales)
ggplot(w, aes(x = p.word, y = p.freq))+ geom_bar(stat = "identity") +
scale_y_continuous(labels=comma)
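As a self-contained check of what the formatter does, the sketch below uses the first five rows of w and scales::label_comma(), which builds the same kind of labelling function as comma. The variable formatted exists only to show the labels outside a plot.

```r
library(ggplot2)
library(scales)

# First five rows of the data frame from the question.
w <- data.frame(
  p.word = c("the", "and", "you", "for", "that"),
  p.freq = c(294571, 158624, 84152, 77117, 71672)
)

# label_comma() returns a function that turns numbers into strings.
formatted <- label_comma()(c(100000, 294571))

p <- ggplot(w, aes(x = p.word, y = p.freq)) +
  geom_col() +
  scale_y_continuous(labels = label_comma())
```

A global alternative is options(scipen = 999), which discourages scientific notation everywhere in the session but does not add thousands separators.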

Resources