Multiple time series in one plot - r

I have several time series covering several years that I need to plot in one graph. The largest series has a mean of 340, a minimum of 245 and a maximum of 900. The smallest series has a mean of 7, a minimum of -28 and a maximum of 31. The remaining series has values in the range 6 to 700. The series follow a regular annual and seasonal pattern over the years, until a sudden upsurge in temperature lasting a month, which was followed by many more deaths than usual.
I cannot provide any real data, but I have simulated the following data and tried the code below, which is based on an example found here: http://www.r-bloggers.com/multiple-y-axis-in-a-r-plot/. The plot does not show what I want. I have the following questions:
1. In the plot it is difficult to make out any of the series clearly, and important facts are hidden in the detail. How can I better present this data?
2. The Y axes have different lengths. How can I give the axes the same length?
I would appreciate any ideas and suggestions on how to improve this code and produce a better plot. The simulated data do not reflect my real data, as I was unable to simulate the extreme values seen during the extreme weather episode.
Many thanks
temp<- rnorm(365, 5, 10)
mort<- rnorm(365, 300, 45)
poll<- rpois(365, lambda=76)
date<-seq(as.Date('2011-01-01'),as.Date('2011-12-31'),by = 1)
df<-data.frame(date,mort,poll,temp)
windows(600,600)
par(mar=c(5, 12, 4, 4) + 0.1)
with(df, {
plot(date, mort, axes=F, ylim=c(170,max(mort)), xlab="", ylab="",type="l",col="black", main="")
points(date,mort,pch=20,col="black")
axis(2, ylim=c(170,max(mort)),col="black",lwd=2)
mtext(2,text="Mortality",line=2)
})
par(new=T)
plot(date, poll, axes=F, ylim=c(45,max(poll)), xlab="", ylab="",
type="l",col="red",lty=2, main="",lwd=1)
axis(2, ylim=c(45,max(poll)),lwd=1,line=3.5)
points(date, poll,pch=20)
mtext(2,text="PM10",line=5.5)
par(new=T)
plot(date, temp, axes=F, ylim=c(-28,max(temp)), xlab="", ylab="",
type="l",lty=3,col="brown", main="",lwd=1)
axis(2, ylim=c(-28,max(temp)),lwd=1,line=7)
points(date, temp,pch=20)
mtext(2,text="Temperature",line=9)
axis(1,pretty(range(date),10))
mtext("date",side=1,col="black",line=2)

Here are 6 approaches:
library(zoo)
z <- read.zoo(df)
# classic graphics in separate and single plots
plot(z)
plot(z, screen = 1)
# lattice graphics in separate and single plots
library(lattice)
xyplot(z)
xyplot(z, screen = 1)
# ggplot2 graphics in separate and single plots
library(ggplot2)
autoplot(z) + facet_free()
autoplot(z, facet = NULL)

I had the same task and, after some research, came across the ts.plot function (in the stats package) in R, which was very helpful.
The usage of the function is as follows:
ts.plot(..., gpars = list())
gpars is a list of graphical parameters that control the appearance of the plot.
I had data similar to this, stored in a variable called time:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
V3 1951 1100 433 5638 1760 2385 2602 11007 2490 421
V5 433 880 216 4988 220 8241 13229 18704 6289 421
V7 4001 440 433 3686 880 9976 12795 21036 13229 1263
V9 2385 1320 650 8241 440 12795 13229 19518 11711 1474
V11 4771 880 1084 6723 0 17783 17566 27326 11060 210
V13 6940 880 2168 2602 1320 21036 16265 10843 15831 1474
V15 3903 1760 1951 3470 0 18217 14964 0 13663 2465
V17 4771 440 2819 8458 880 25591 24940 1518 17783 1895
V19 7807 1760 5205 2385 0 14096 22771 13880 12578 1263
V21 5205 880 5205 6506 880 28410 18217 13229 19952 1474
V23 6506 1760 5638 7590 880 14747 26675 11928 12795 1474
V25 7373 440 5855 10626 0 19301 21470 15398 19952 1895
V27 5638 2640 6289 0 880 16482 20603 30796 14313 2316
V29 8241 440 6506 6723 880 11277 35784 25157 23205 4423
V31 7373 2640 6072 8891 220 17133 27109 31013 27287 4001
V33 6723 660 5855 14313 660 6940 26892 17566 24111 4844
V35 9325 2420 9325 12578 0 6506 30796 34483 23422 5476
V37 4771 440 6872 12361 880 9325 36218 25808 30362 4844
V39 9976 2640 7658 12361 440 11277 36001 31013 40555 4633
V41 10410 880 6506 12795 440 26241 33398 27976 24940 5686
V43 5638 2200 7590 14313 0 9976 34483 29928 33832 6108
V45 10843 440 8675 11711 440 7807 29278 24940 43375 4633
V47 8675 1760 8891 13663 0 9108 38386 31230 33398 4633
V49 10410 1760 9542 13880 440 8675 39051 31446 42507 5476
. . . . . . . . .
I needed a time series plot of every column on the same plot.
The code is as follows:
ts.plot(time,gpars= list(col=rainbow(10)))
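As a minimal sketch of the call above, assuming the data sit in a numeric matrix with one column per series: the simulated matrix `m`, its dimensions and its column names are hypothetical, not the data shown above. The sketch also avoids the name `time`, which is already the name of a function in the stats package.

```r
# Hypothetical stand-in for the matrix above: 24 time points, 10 series,
# each column an independent random walk.
set.seed(42)
m <- apply(matrix(rnorm(24 * 10), nrow = 24, ncol = 10), 2, cumsum)
colnames(m) <- paste0("s", 1:10)

# ts() treats each column as one series on a shared time index.
series <- ts(m)

# ts.plot() draws all series on a single set of axes; gpars passes
# graphical parameters (here one colour per column) to the plot.
cols <- rainbow(ncol(series))
ts.plot(series, gpars = list(col = cols))
legend("topleft", legend = colnames(series), col = cols,
       lty = 1, cex = 0.6, ncol = 2)
```

If the real data have one series per row rather than per column, transposing with t() before calling ts() would be needed.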

I'd use a separate plot for each variable, each with its own y-axis. I like this better than introducing multiple y-axes in one plot. I will use ggplot2 to do this, and more specifically the concept of faceting:
library(ggplot2)
library(reshape2)
df_melt = melt(df, id.vars = 'date')
ggplot(df_melt, aes(x = date, y = value)) +
geom_line() +
facet_wrap(~ variable, scales = 'free_y', ncol = 1)
Notice that I stack the facets on top of each other. This makes it easy to compare the timing of events across the series. Alternatively, you could put them next to each other (using nrow = 1 in facet_wrap), which makes it easy to compare the y-values.
We can also introduce some extremes:
df = within(df, {
temp[61:90] = temp[61:90] + runif(30, 30, 50)
mort[61:90] = mort[61:90] + runif(30, 300, 500)
})
df_melt = melt(df, id.vars = 'date')
ggplot(df_melt, aes(x = date, y = value)) +
geom_line() +
facet_wrap(~ variable, scales = 'free_y', ncol = 1)
Here you can see easily that the increase in temp is correlated with the increase in mortality.
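If a single panel is still required, one option not used in the answer above is to standardise each series (subtract its mean, divide by its standard deviation) so that all three can share one y-axis. A rough base R sketch, rebuilding a data frame laid out like the simulated `df` from the question; the names `z` and `cols` are illustrative only:

```r
set.seed(1)
date <- seq(as.Date("2011-01-01"), as.Date("2011-12-31"), by = 1)
df <- data.frame(date,
                 mort = rnorm(365, 300, 45),
                 poll = rpois(365, lambda = 76),
                 temp = rnorm(365, 5, 10))

# scale() centres each column on 0 and rescales it to unit standard
# deviation, so the series become comparable on one axis.
z <- scale(df[, c("mort", "poll", "temp")])

cols <- c("black", "red", "brown")
plot(df$date, z[, 1], type = "l", col = cols[1], ylim = range(z),
     xlab = "date", ylab = "standardised value")
for (j in 2:ncol(z)) lines(df$date, z[, j], col = cols[j])
legend("topright", legend = colnames(z), col = cols, lty = 1)
```

The trade-off is that the y-axis no longer shows the original units, so this suits comparing the timing and relative size of peaks rather than reading off values.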

Related

Barplot using ggplot2 for 4 variables

I have data like this:
ID height S1 S2 S3
1 927 0.90695438 0.28872194 0.67114294
2 777 0.20981677 0.71783084 0.74498220
3 1659 0.35813799 0.92339744 0.44001698
4 174 0.44829914 0.67493949 0.11503942
5 1408 0.90642643 0.18593999 0.67564278
6 1454 0.38943930 0.34806716 0.73155952
7 2438 0.51745975 0.12351953 0.48398490
8 1114 0.12523909 0.10811622 0.17104804
9 1642 0.03014575 0.29795320 0.67584853
10 515 0.77180549 0.83819990 0.26298995
11 1877 0.32741508 0.99277109 0.34148083
12 2647 0.38947869 0.43713441 0.21024554
13 845 0.04105275 0.20256457 0.01631959
14 1198 0.36139663 0.96387150 0.37676288
15 2289 0.57097808 0.66038711 0.56230740
16 2009 0.68488024 0.29811683 0.67998461
17 618 0.97111675 0.11926219 0.74538877
18 1076 0.70195881 0.59975160 0.95007272
19 1082 0.01154550 0.12019055 0.16309071
20 2072 0.53553213 0.78843202 0.32475690
21 1610 0.83657146 0.36959607 0.13271604
22 2134 0.80686674 0.95632284 0.63729744
23 1617 0.08093264 0.91357666 0.33092961
24 2248 0.23890930 0.82333634 0.64907957
25 1263 0.96598986 0.31948216 0.30288836
26 518 0.03767233 0.87770033 0.07123327
27 2312 0.91640643 0.80035100 0.66239047
28 2646 0.72622658 0.61135664 0.75960356
29 1650 0.20077621 0.07242114 0.55336017
30 837 0.84020075 0.42158771 0.53927210
31 1467 0.39666235 0.34446560 0.84959232
32 2786 0.39270226 0.75173569 0.65322596
33 1049 0.47255689 0.21875132 0.95088576
34 2863 0.58365691 0.29213397 0.61722305
35 2087 0.35238717 0.35595337 0.49284063
36 2669 0.02847401 0.63196192 0.97600657
37 545 0.99508793 0.89253107 0.49034522
38 1890 0.95755846 0.74403278 0.65517230
39 2969 0.55165118 0.45722242 0.59880179
40 395 0.10195396 0.03609544 0.94756902
41 995 0.23791515 0.56851452 0.36801151
42 2596 0.86009766 0.43901589 0.87818701
43 2334 0.73826129 0.60048445 0.45487507
44 2483 0.49731226 0.95138276 0.49646702
45 1812 0.57992109 0.26943131 0.46061562
46 1476 0.01618339 0.65883839 0.61790820
47 2342 0.47212988 0.07647121 0.60414349
48 2653 0.04238973 0.07128521 0.78587960
49 627 0.46315442 0.37033152 0.55526847
50 925 0.62999477 0.29710220 0.76897834
51 995 0.67324929 0.55107827 0.40428567
52 600 0.08703467 0.36989059 0.51071981
53 711 0.14358380 0.84568953 0.52353644
54 828 0.90847850 0.62079070 0.99279921
55 1776 0.12253259 0.39914002 0.42964742
56 764 0.72886279 0.29966153 0.99601125
57 375 0.95037718 0.38111984 0.78660025
58 694 0.04335591 0.70113494 0.51591063
59 1795 0.01959930 0.94686529 0.50268797
60 638 0.19907246 0.77282832 0.91163748
61 1394 0.50508626 0.21955016 0.26441590
62 1943 0.92638876 0.71611036 0.17385687
63 2882 0.13840169 0.66421796 0.40033126
64 2031 0.16919458 0.70625020 0.53835738
65 1338 0.60662738 0.27962799 0.24496437
66 1077 0.81587669 0.71225050 0.37585096
67 1370 0.84338121 0.66094211 0.58025355
68 1339 0.78807719 0.04101269 0.20895531
69 739 0.01902087 0.06114149 0.80133001
70 2085 0.69808750 0.27976169 0.63880242
71 1240 0.81509312 0.30196772 0.73633076
72 987 0.56840006 0.95661083 0.43881241
73 1720 0.48006288 0.38981872 0.57981238
74 2901 0.16137012 0.37178879 0.25604401
75 1987 0.08925623 0.84314249 0.46371823
76 1876 0.16268237 0.84723500 0.16861486
77 2571 0.02672845 0.31933115 0.61389453
78 2325 0.70962948 0.13250605 0.95810262
79 2503 0.76101818 0.61710912 0.47819473
80 279 0.85747478 0.79130451 0.75115933
81 1381 0.43726582 0.33804871 0.02058322
82 1800 0.41713645 0.90544760 0.17096903
83 2760 0.58564949 0.19755671 0.63996650
84 2949 0.82496758 0.79408518 0.16497848
85 118 0.79313923 0.75460289 0.35472278
86 1736 0.32615257 0.91139485 0.18642647
87 2201 0.95793194 0.32268770 0.89765616
88 750 0.65301961 0.08616947 0.23778386
89 906 0.45867582 0.91120045 0.98494348
90 2202 0.60602188 0.95517383 0.02133074
I want to make a barplot using ggplot2 like this:
In the above-mentioned dataset height should be on the y-axis and S1, S2, S3 should be representing colors of each sample.
I have tried the base R function barplot which gave me the following output. Please give me any suggestion.
barplot(t(as.matrix(examp[,3:5])),col=rainbow(3))
It's not clear to me exactly what you want to plot. You say you want height on the y axis, but the examples you show are all 'filled to the top', implying the same height for each ID. Also, it is not clear what the numbers associated with each sample represent. I am guessing they should be relative weightings for the bar heights.
Assuming you actually want a filled bar plot as in the examples, with the relative sizes of the bars dictated by the sample values, you can do:
library(tidyr)
library(dplyr)
library(ggplot2)
df %>%
mutate(ID = reorder(ID, S3/(S3 + S2 + S1))) %>%
pivot_longer(3:5, names_to = "Sample", values_to = "Value") %>%
ggplot(aes(ID, Value * height, fill = Sample)) +
geom_col(position = "fill", color = NA) +
labs(y = "Height") +
theme_classic() +
scale_fill_manual(values = c("red", "green", "blue"))
Alternative
df %>%
arrange(height) %>%
group_by(height) %>%
summarize(across(everything(), mean)) %>%
pivot_longer(3:5, names_to = "Sample", values_to = "Value") %>%
ggplot(aes(height, Value, fill = Sample, colour = Sample)) +
geom_smooth(method = loess, formula = y ~ x, linetype = 2, alpha = 0.2) +
theme_bw()
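One point about the first approach may be worth checking: with position = "fill" every bar is rescaled to total 1, so multiplying Value by height should not change the picture, because height is constant within an ID and cancels out. A small base R check of that arithmetic, using the first three rows of the data above (the object names are illustrative):

```r
examp <- data.frame(ID = 1:3,
                    height = c(927, 777, 1659),
                    S1 = c(0.90695438, 0.20981677, 0.35813799),
                    S2 = c(0.28872194, 0.71783084, 0.92339744),
                    S3 = c(0.67114294, 0.74498220, 0.44001698))

s <- as.matrix(examp[, c("S1", "S2", "S3")])

# Share of each sample within an ID, which is what a "fill" bar shows.
share_raw <- prop.table(s, margin = 1)
# The same shares after weighting every sample by that ID's height.
share_scaled <- prop.table(s * examp$height, margin = 1)

# Height cancels out, so both sets of shares agree.
same_shares <- isTRUE(all.equal(share_raw, share_scaled))

# If the bars should reach the actual height instead, split each
# height according to the shares; the row totals are then the heights.
segments <- share_raw * examp$height
```

Under that assumption, plotting `segments` as ordinary stacked bars (for example geom_col() with its default stacking) would give bars whose total length is the height.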

Change axes' scale in a plot without creating a new variable

I have a dataset like below (this is only the first 20 rows and the first 3 columns of data):
row fitted measured
1 1866 1950
2 2489 2500
3 1486 1530
4 1682 1720
5 1393 1402
6 2524 2645
7 2676 2789
8 3200 3400
9 1455 1456
10 1685 1765
11 2587 2597
12 3040 3050
13 2767 2769
14 3300 3310
15 4001 4050
16 1918 2001
17 2889 2907
18 2063 2150
19 1591 1640
20 3578 3601
I plotted this data
plot(data$measured~data$fitted, ylab = expression("Measured Length (" * mu ~ "m)"),
xlab = expression("NIR Fitted Length (" * mu ~ "m)"), cex.lab=1.5, cex.axis=1.5)
and got the following:
As you can see, the axis scales are in micrometers; I need the axes to be in millimeters.
How can I plot the data with the axes in millimeters, WITHOUT creating a new variable?
Like this:
If I create a new variable, I will have to change the whole 2000 lines of code that I've already written, and that's not a road I want to go down! :|
Thanks much :)
I used @bdemarest's method for the plot and @IukeA's method for abline:
plot(y=data$measured/1000,x=data$fitted/1000, ylab = expression("Measured Length (mm)"),
xlab = expression("NIR Fitted Length (mm)"), cex.lab=1.5, cex.axis=1.5)
a = lm(I(data$measured/1000)~I(data$fitted/1000), data=data)
abline(a)
Here is the final plot;
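Another way to avoid a new variable, offered as an assumption rather than as the method either answer used, is to plot in micrometers and only relabel the ticks in millimeters. The data frame below holds the first five rows from the question:

```r
data <- data.frame(fitted   = c(1866, 2489, 1486, 1682, 1393),
                   measured = c(1950, 2500, 1530, 1720, 1402))

# Draw the plot in the original units but suppress the default axes.
plot(measured ~ fitted, data = data, xaxt = "n", yaxt = "n",
     xlab = "NIR Fitted Length (mm)", ylab = "Measured Length (mm)")

# axTicks() returns the tick positions R would have used (micrometers);
# label those positions with their values divided by 1000.
xt <- axTicks(1)
yt <- axTicks(2)
axis(1, at = xt, labels = xt / 1000)
axis(2, at = yt, labels = yt / 1000)

# The regression line can stay in the original units as well.
abline(lm(measured ~ fitted, data = data))
```

Because only the labels change, any later lines(), points() or abline() calls can keep using the untransformed columns.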

Customizing x-axis labels on plot

I plotted my data and also suppressed the auto x-axis labeling successfully.
Now I'm using the following command to customize my x-axis labels:
axis(
1,
at = min(LoopVariable[ ,"TW"]) - 1 : max(LoopVariable[ ,"TW"]) + 1,
labels = min(LoopVariable[ ,"TW"]) - 1 : max(LoopVariable[ ,"TW"]) + 1,
las = 2
)
And I'm getting:
This is correct in the sense that I have 28 data points, but when I do:
LoopVariable[ ,"TW"]
Then I get:
[1] 2801 2808 2813 2825 2833 2835 2839 2840 2844 2856 2858 2863 2865 2868 2870 2871 2873 2879 2881 2903 2904 2914 2918 2947 2970 2974 2977 2986
These are the values I want as x-axis labels rather than 1:28. There is obviously a small piece missing in my command that I cannot figure out.
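Two things may be going on here, though neither is confirmed by the question. First, `:` binds more tightly than `-` and `+` in R, so `min(x) - 1 : max(x) + 1` is evaluated as `min(x) - (1:max(x)) + 1`. Second, if the points were plotted against their index (1 to 28), the tick positions have to be those indices and only the labels should come from the TW column. A hedged sketch with the values printed above; `tw` and `y` are stand-ins for the real objects:

```r
tw <- c(2801, 2808, 2813, 2825, 2833, 2835, 2839, 2840, 2844, 2856,
        2858, 2863, 2865, 2868, 2870, 2871, 2873, 2879, 2881, 2903,
        2904, 2914, 2918, 2947, 2970, 2974, 2977, 2986)
y <- seq_along(tw)  # placeholder y-values, not taken from the question

# Precedence: parentheses are needed to get the intended range.
wrong <- min(tw) - 1 : max(tw) + 1     # parsed as min(tw) - (1:max(tw)) + 1
right <- (min(tw) - 1):(max(tw) + 1)   # 2800, 2801, ..., 2987

# Plot against the index and put the TW values on the ticks.
plot(seq_along(tw), y, xaxt = "n", xlab = "TW")
axis(1, at = seq_along(tw), labels = tw, las = 2)
```

Note that axis() may silently skip labels that would overlap, so not all 28 values are guaranteed to be printed at the default text size.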

Binning a dataframe with equal frequency of samples

I have binned my data using the cut function
breaks<-seq(0, 250, by=5)
data<-split(df2, cut(df2$val, breaks))
My split dataframe looks like
... ...
$`(15,20]`
val ks_Result c
15 60 237
18 70 247
... ...
$`(20,25]`
val ks_Result c
21 20 317
24 10 140
... ...
My bins look like
> table(data)
data
(0,5] (5,10] (10,15] (15,20] (20,25] (25,30] (30,35]
0 0 0 7 128 2748 2307
(35,40] (40,45] (45,50] (50,55] (55,60] (60,65] (65,70]
1404 11472 1064 536 7389 1008 1714
(70,75] (75,80] (80,85] (85,90] (90,95] (95,100] (100,105]
2047 700 329 1107 399 376 323
(105,110] (110,115] (115,120] (120,125] (125,130] (130,135] (135,140]
314 79 1008 77 474 158 381
(140,145] (145,150] (150,155] (155,160] (160,165] (165,170] (170,175]
89 660 15 1090 109 824 247
(175,180] (180,185] (185,190] (190,195] (195,200] (200,205] (205,210]
1226 139 531 174 1041 107 257
(210,215] (215,220] (220,225] (225,230] (230,235] (235,240] (240,245]
72 671 98 212 70 95 25
(245,250]
494
When I average the bin counts, I get roughly 900 samples per bin:
> mean(table(data))
[1] 915.9
I want R to make irregular bins such that each bin contains about 900 samples (e.g. (0, 27] = 900, (27, 28.5] = 900, and so on). I found something similar here, but it deals with only one variable, not the whole data frame.
I also tried the Hmisc package, but unfortunately the bins do not contain equal frequencies:
library(Hmisc)
data<-split(df2, cut2(df2$val, g=30, oneval=TRUE))
data<-split(df2, cut2(df2$val, m=1000, oneval=TRUE))
Assuming you want 50 equal-sized buckets (based on your seq statement), you can use something like:
df <- data.frame(var=runif(500, 0, 100)) # make data
cut.vec <- cut(
df$var,
breaks=quantile(df$var, 0:50/50), # breaks along 1/50 quantiles
include.lowest=TRUE
)
df.split <- split(df, cut.vec)
Hmisc::cut2 has this option built in as well.
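To check that quantile breaks really give equal counts, here is a short sketch on the same kind of simulated data (the seed and the names `n_bins`, `breaks` and `counts` are additions for illustration):

```r
set.seed(1)
df <- data.frame(var = runif(500, 0, 100))

n_bins <- 50
breaks <- quantile(df$var, probs = (0:n_bins) / n_bins)
cut.vec <- cut(df$var, breaks = breaks, include.lowest = TRUE)

# With 500 distinct values and 50 bins, every bin holds 10 observations.
counts <- table(cut.vec)
range(counts)

# split() acts on the whole data frame, so any other columns
# travel with var into the same bin.
df.split <- split(df, cut.vec)
length(df.split)
```

One caveat: if the data contain many tied values, some quantiles can coincide and cut() will stop with an error about non-unique breaks, so heavily tied data may need the breaks passed through unique() and will then give unequal counts.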
This can be done with the function provided here by Joris Meys:
EqualFreq2 <- function(x,n){
nx <- length(x)
nrepl <- floor(nx/n)
nplus <- sample(1:n,nx - nrepl*n)
nrep <- rep(nrepl,n)
nrep[nplus] <- nrepl+1
x[order(x)] <- rep(seq.int(n),nrep)
x
}
data<-split(df2, EqualFreq2(df2$val, 25))

How to plot overlaying time series in R?

I would like to compare the values of two different variables in time.
For example, having two datasets:
dataset1(Date, value)
and
dataset2(Date, value)
To plot just the first, we can run the following:
library(zoo)
x.Date <- as.Date(dataset1$Date)
x <- zoo(dataset1$value, x.Date)
plot(x)
In the same window I would like to add (dataset2$Date, dataset2$value), if possible in a different color.
The values of dataset1$Date and dataset2$Date are not necessarily the same (some days might overlap and some not). For example, dataset1$Date might contain (dec01, dec02, dec03, dec05) and dataset2$Date (dec02, dec03, dec06).
Does anyone know how to plot two (or several) time series in the same window?
There are several options. Here are three options working with zoo objects.
library(zoo)
set.seed(1)
xz = zoo(ts(rnorm(20), frequency = 4, start = c(1959, 2)))
yz = zoo(ts(rnorm(20), frequency = 4, start = c(1959, 2)))
# Basic approach
plot(xz)
lines(yz, col = "red")
# Panels
plot.zoo(cbind(xz, yz))
# Overplotted
plot.zoo(cbind(xz, yz),
plot.type = "single",
col = c("red", "blue"))
If you are plotting regular ts objects, you can also explore ts.plot:
set.seed(1)
x = ts(rnorm(20), frequency = 4, start = c(1959, 2))
y = ts(rnorm(20), frequency = 4, start = c(1959, 2))
ts.plot(x, y, gpars = list(col = c("black", "red")))
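The examples above use two series on an identical time index. For the case in the question, where the two date vectors only partly overlap, a base R sketch could look like the following. The values are made up and the year 2012 is an assumption, since the question gives only the days:

```r
# Hypothetical data matching the dates mentioned in the question.
dataset1 <- data.frame(
  Date  = c("2012-12-01", "2012-12-02", "2012-12-03", "2012-12-05"),
  value = c(1, 3, 2, 4))
dataset2 <- data.frame(
  Date  = c("2012-12-02", "2012-12-03", "2012-12-06"),
  value = c(2, 5, 3))
dataset1$Date <- as.Date(dataset1$Date)
dataset2$Date <- as.Date(dataset2$Date)

# The axis limits must cover both series, otherwise points of the
# second series that fall outside the first one's range are clipped.
xlim <- range(c(dataset1$Date, dataset2$Date))
ylim <- range(c(dataset1$value, dataset2$value))

plot(dataset1$Date, dataset1$value, type = "o", xlim = xlim, ylim = ylim,
     xlab = "Date", ylab = "value")
lines(dataset2$Date, dataset2$value, type = "o", col = "red")
legend("topleft", legend = c("dataset1", "dataset2"),
       col = c("black", "red"), lty = 1)
```

Each series is drawn against its own dates, so no alignment or merging step is needed just for plotting.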
What about using ggplot2?
library(ggplot2)
dta <- data.frame(year = 2012,
month = rep(seq(1,9),each=10),
day = sample(seq(1,20),90,replace=T),
Group = sample(c('A','B'),90,replace=T),
Value = seq(1,90)
)
dta$Date <- apply(dta[,c('year','month','day')],1,paste,sep='',collapse='-')
dta$Date <- as.Date(dta$Date)
qplot(Date,Value,data=dta,geom='line',color=Group)
In the case you describe, I would assemble dta in the following way:
dataset1$Group <- 'dataset1'
dataset2$Group <- 'dataset2'
dta <- rbind(dataset1,dataset2)
dta$Date <- as.Date(dta$Date)
The following worked for me. Say, you have two time-series vectors x1 and x2 and you want to overlay them on a single plot.
library("tseries")
comb_ts <- cbind(x1, x2) # make sure both time series have the same length
plot.ts(comb_ts, plot.type = "single")
And I get the following plot -
Hope this helps!
If you are working with xts objects (like stock data), you can use the highcharter package.
library(quantmod)
sbi <- getSymbols("SBIN.BO", src="yahoo",auto.assign = F)
rel <- getSymbols("RELIANCE.NS", src="yahoo",auto.assign = F)
infy <- getSymbols("INFY.NS", src="yahoo",auto.assign = F)
library(highcharter)
highchart(type = "stock") %>%
hc_add_series(sbi) %>%
hc_add_series(rel) %>%
hc_add_series(infy) %>% hc_title(text = "infy/sbi/rel")
