I would like to compare the values of two different variables over time.
For example, having two datasets:
dataset1(Date, value)
and
dataset2(Date, value)
To plot just the first one, we can execute the following:
x.Date <- as.Date(dataset1$Date)
x <- zoo(dataset1$Value, x.Date)
plot(x)
I would like to add (dataset2$value, dataset2$Date) to the same window and, if possible, give it a different color.
The values dataset1$Date and dataset2$Date are not necessarily the same (some days may overlap and some may not); for example, dataset1$Date might contain (dec01, dec02, dec03, dec05) and dataset2$Date (dec02, dec03, dec06).
Does anyone know how to plot two (or several) time plots in the same window?
There are several options. Here are three that work with zoo objects.
set.seed(1)
xz = zoo(ts(rnorm(20), frequency = 4, start = c(1959, 2)))
yz = zoo(ts(rnorm(20), frequency = 4, start = c(1959, 2)))
# Basic approach
plot(xz)
lines(yz, col = "red")
# Panels
plot.zoo(cbind(xz, yz))
# Overplotted
plot.zoo(cbind(xz, yz),
         plot.type = "single",
         col = c("red", "blue"))
If you are plotting regular ts objects, you can also explore ts.plot:
set.seed(1)
x = ts(rnorm(20), frequency = 4, start = c(1959, 2))
y = ts(rnorm(20), frequency = 4, start = c(1959, 2))
ts.plot(x, y, gpars = list(col = c("black", "red")))
I had the same task in hand, and after some research I came across the ts.plot {stats} function in R, which was very helpful.
The usage of the function is as follows:
ts.plot(..., gpars = list())
gpars is a list of graphical parameters in which you can specify the graphical components of the plot.
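For instance, colors, line types, and line widths can all be passed through gpars. A minimal sketch with two made-up quarterly series (not from the original data):
ts1 <- ts(cumsum(rnorm(40)), frequency = 4, start = c(2000, 1))
ts2 <- ts(cumsum(rnorm(40)), frequency = 4, start = c(2000, 1))
ts.plot(ts1, ts2, gpars = list(col = c("blue", "red"), lty = 1:2, lwd = 2))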
I had data similar to this, stored in a variable called time:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
V3 1951 1100 433 5638 1760 2385 2602 11007 2490 421
V5 433 880 216 4988 220 8241 13229 18704 6289 421
V7 4001 440 433 3686 880 9976 12795 21036 13229 1263
V9 2385 1320 650 8241 440 12795 13229 19518 11711 1474
V11 4771 880 1084 6723 0 17783 17566 27326 11060 210
V13 6940 880 2168 2602 1320 21036 16265 10843 15831 1474
V15 3903 1760 1951 3470 0 18217 14964 0 13663 2465
V17 4771 440 2819 8458 880 25591 24940 1518 17783 1895
V19 7807 1760 5205 2385 0 14096 22771 13880 12578 1263
V21 5205 880 5205 6506 880 28410 18217 13229 19952 1474
V23 6506 1760 5638 7590 880 14747 26675 11928 12795 1474
V25 7373 440 5855 10626 0 19301 21470 15398 19952 1895
V27 5638 2640 6289 0 880 16482 20603 30796 14313 2316
V29 8241 440 6506 6723 880 11277 35784 25157 23205 4423
V31 7373 2640 6072 8891 220 17133 27109 31013 27287 4001
V33 6723 660 5855 14313 660 6940 26892 17566 24111 4844
V35 9325 2420 9325 12578 0 6506 30796 34483 23422 5476
V37 4771 440 6872 12361 880 9325 36218 25808 30362 4844
V39 9976 2640 7658 12361 440 11277 36001 31013 40555 4633
V41 10410 880 6506 12795 440 26241 33398 27976 24940 5686
V43 5638 2200 7590 14313 0 9976 34483 29928 33832 6108
V45 10843 440 8675 11711 440 7807 29278 24940 43375 4633
V47 8675 1760 8891 13663 0 9108 38386 31230 33398 4633
V49 10410 1760 9542 13880 440 8675 39051 31446 42507 5476
. . . . . . . . .
I had to get a time series plot for each column, all on the same plot.
The code is as follows:
ts.plot(time, gpars = list(col = rainbow(10)))
The resulting graph shows each column as a separate colored line on the same plot.
What about using ggplot2?
library(ggplot2)
dta <- data.frame(year = 2012,
                  month = rep(seq(1,9), each = 10),
                  day = sample(seq(1,20), 90, replace = T),
                  Group = sample(c('A','B'), 90, replace = T),
                  Value = seq(1,90))
dta$Date <- apply(dta[,c('year','month','day')],1,paste,sep='',collapse='-')
dta$Date <- as.Date(dta$Date)
qplot(Date,Value,data=dta,geom='line',color=Group)
In the case you describe, I would assemble dta in the following way:
dataset1$Group <- 'dataset1'
dataset2$Group <- 'dataset2'
dta <- rbind(dataset1,dataset2)
dta$Date <- as.Date(dta$Date)
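and then plot it in the same way (this assumes dataset1 and dataset2 share the column names Date and Value; adjust the names to match your actual data):
qplot(Date, Value, data = dta, geom = 'line', color = Group)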
The following worked for me. Say you have two time-series vectors x1 and x2 and you want to overlay them on a single plot.
library("tseries")
comb_ts <- cbind(x1, x2) # please make sure both time series have the same length
plot.ts(comb_ts, plot.type = "single")
This gives a single plot with both series overlaid.
Hope this helps!
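To tell the two series apart by color, a col argument can also be passed through (a small addition, not part of the original answer):
plot.ts(comb_ts, plot.type = "single", col = c("black", "red"))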
If you are working with xts objects (like stock data), you can use the highcharter package.
library(quantmod)
sbi <- getSymbols("SBIN.BO", src="yahoo",auto.assign = F)
rel <- getSymbols("RELIANCE.NS", src="yahoo",auto.assign = F)
infy <- getSymbols("INFY.NS", src="yahoo",auto.assign = F)
library(highcharter)
highchart(type = "stock") %>%
hc_add_series(sbi) %>%
hc_add_series(rel) %>%
hc_add_series(infy) %>% hc_title(text = "infy/sbi/rel")
Related
I have a dataset with 50 observations over 10 variables, and I would like to apply the following function over all variable permutations.
new_fun <- function(data, x, y) {
  x <- data[ , x]
  y <- data[ , y]
  value <- (x - y) / (x + y)
  colnames(value) <- paste(names(x), "/", names(y), sep = "")
  return(value)
}
Here is a part of the dataset:
var1 var2 var3 var4 var5 var6 var7 var8 var9 var10
1268 1522 1268 1842 4728 5611 5544 2374 1535 5773
1286 1534 1259 1829 4834 5802 5776 2383 1538 5928
1296 1534 1266 1853 4905 5805 5916 2418 1545 5949
1296 1488 1239 1791 4963 5985 5880 2359 1524 6142
1273 1503 1228 1787 4694 5608 5608 2268 1476 5725
1290 1522 1271 1811 4799 5728 5752 2402 1555 5832
1265 1510 1247 1786 4981 6072 6172 2409 1526 6258
1289 1527 1246 1841 4876 5827 5808 2361 1522 6009
1322 1590 1351 1917 4532 5271 5264 2412 1589 5418
1334 1589 1445 1899 3680 4638 4820 2321 1638 4974
1347 1532 1370 1865 3618 4702 4852 2275 1619 4994
The idea is to have a new dataset with 50 observations on 90 columns (n=10, r=2, no repeats).
var1/var2 var1/var3 var1/var4 ...
1 . . . ...
2 . . . ...
3 . . . ...
. . . . ...
. . . . ...
. . . . ...
I have tried apply functions and loops with no success so far. Any help is greatly appreciated!
You can do this using the tidyverse and the purrr package:
library(tidyverse)
# the data you provided
varst <- as.data.frame(read_csv("var1,var2,var3,var4,var5,var6,var7,var8,var9,var10
1268,1522,1268,1842,4728,5611,5544,2374,1535,5773
1286,1534,1259,1829,4834,5802,5776,2383,1538,5928
1296,1534,1266,1853,4905,5805,5916,2418,1545,5949
1296,1488,1239,1791,4963,5985,5880,2359,1524,6142
1273,1503,1228,1787,4694,5608,5608,2268,1476,5725
1290,1522,1271,1811,4799,5728,5752,2402,1555,5832
1265,1510,1247,1786,4981,6072,6172,2409,1526,6258
1289,1527,1246,1841,4876,5827,5808,2361,1522,6009
1322,1590,1351,1917,4532,5271,5264,2412,1589,5418
1334,1589,1445,1899,3680,4638,4820,2321,1638,4974
1347,1532,1370,1865,3618,4702,4852,2275,1619,4994"))
map_dfc(names(varst), # cycle through each column
        function(x) {
          # fetch all columns beside x to match
          map(setdiff(names(varst), x),
              function(y){ # your function as above
                v_x <- varst[x]
                v_y <- varst[y]
                ret <- (v_x - v_y) / (v_x + v_y)
                names(ret) <- paste0(x, "/", y)
                ret # return the calculated values
              })
        })
To stop repeats
To stop comparing var2/var1 if var1/var2 has already been compared, etc. do the following:
# if you need to put the column names in order, use this
# you'll also need two digit number names, e.g. var01, var07, var10
# varst <- varst %>% select(order(colnames(varst)))
map_dfc(names(varst)[-length(names(varst))], # map each column except the final column as x
        function(x) {
          # fetch all columns after x in the df column names
          map(names(varst)[(match(x, names(varst)) + 1):length(names(varst))],
              function(y){ # your function as above
                if(!is.na(y)){
                  v_x <- varst[x]
                  v_y <- varst[y]
                  ret <- (v_x - v_y) / (v_x + v_y)
                  names(ret) <- paste0(x, "/", y)
                  ret # return the calculated values
                }
              })
        })
I have data like this:
ID height S1 S2 S3
1 927 0.90695438 0.28872194 0.67114294
2 777 0.20981677 0.71783084 0.74498220
3 1659 0.35813799 0.92339744 0.44001698
4 174 0.44829914 0.67493949 0.11503942
5 1408 0.90642643 0.18593999 0.67564278
6 1454 0.38943930 0.34806716 0.73155952
7 2438 0.51745975 0.12351953 0.48398490
8 1114 0.12523909 0.10811622 0.17104804
9 1642 0.03014575 0.29795320 0.67584853
10 515 0.77180549 0.83819990 0.26298995
11 1877 0.32741508 0.99277109 0.34148083
12 2647 0.38947869 0.43713441 0.21024554
13 845 0.04105275 0.20256457 0.01631959
14 1198 0.36139663 0.96387150 0.37676288
15 2289 0.57097808 0.66038711 0.56230740
16 2009 0.68488024 0.29811683 0.67998461
17 618 0.97111675 0.11926219 0.74538877
18 1076 0.70195881 0.59975160 0.95007272
19 1082 0.01154550 0.12019055 0.16309071
20 2072 0.53553213 0.78843202 0.32475690
21 1610 0.83657146 0.36959607 0.13271604
22 2134 0.80686674 0.95632284 0.63729744
23 1617 0.08093264 0.91357666 0.33092961
24 2248 0.23890930 0.82333634 0.64907957
25 1263 0.96598986 0.31948216 0.30288836
26 518 0.03767233 0.87770033 0.07123327
27 2312 0.91640643 0.80035100 0.66239047
28 2646 0.72622658 0.61135664 0.75960356
29 1650 0.20077621 0.07242114 0.55336017
30 837 0.84020075 0.42158771 0.53927210
31 1467 0.39666235 0.34446560 0.84959232
32 2786 0.39270226 0.75173569 0.65322596
33 1049 0.47255689 0.21875132 0.95088576
34 2863 0.58365691 0.29213397 0.61722305
35 2087 0.35238717 0.35595337 0.49284063
36 2669 0.02847401 0.63196192 0.97600657
37 545 0.99508793 0.89253107 0.49034522
38 1890 0.95755846 0.74403278 0.65517230
39 2969 0.55165118 0.45722242 0.59880179
40 395 0.10195396 0.03609544 0.94756902
41 995 0.23791515 0.56851452 0.36801151
42 2596 0.86009766 0.43901589 0.87818701
43 2334 0.73826129 0.60048445 0.45487507
44 2483 0.49731226 0.95138276 0.49646702
45 1812 0.57992109 0.26943131 0.46061562
46 1476 0.01618339 0.65883839 0.61790820
47 2342 0.47212988 0.07647121 0.60414349
48 2653 0.04238973 0.07128521 0.78587960
49 627 0.46315442 0.37033152 0.55526847
50 925 0.62999477 0.29710220 0.76897834
51 995 0.67324929 0.55107827 0.40428567
52 600 0.08703467 0.36989059 0.51071981
53 711 0.14358380 0.84568953 0.52353644
54 828 0.90847850 0.62079070 0.99279921
55 1776 0.12253259 0.39914002 0.42964742
56 764 0.72886279 0.29966153 0.99601125
57 375 0.95037718 0.38111984 0.78660025
58 694 0.04335591 0.70113494 0.51591063
59 1795 0.01959930 0.94686529 0.50268797
60 638 0.19907246 0.77282832 0.91163748
61 1394 0.50508626 0.21955016 0.26441590
62 1943 0.92638876 0.71611036 0.17385687
63 2882 0.13840169 0.66421796 0.40033126
64 2031 0.16919458 0.70625020 0.53835738
65 1338 0.60662738 0.27962799 0.24496437
66 1077 0.81587669 0.71225050 0.37585096
67 1370 0.84338121 0.66094211 0.58025355
68 1339 0.78807719 0.04101269 0.20895531
69 739 0.01902087 0.06114149 0.80133001
70 2085 0.69808750 0.27976169 0.63880242
71 1240 0.81509312 0.30196772 0.73633076
72 987 0.56840006 0.95661083 0.43881241
73 1720 0.48006288 0.38981872 0.57981238
74 2901 0.16137012 0.37178879 0.25604401
75 1987 0.08925623 0.84314249 0.46371823
76 1876 0.16268237 0.84723500 0.16861486
77 2571 0.02672845 0.31933115 0.61389453
78 2325 0.70962948 0.13250605 0.95810262
79 2503 0.76101818 0.61710912 0.47819473
80 279 0.85747478 0.79130451 0.75115933
81 1381 0.43726582 0.33804871 0.02058322
82 1800 0.41713645 0.90544760 0.17096903
83 2760 0.58564949 0.19755671 0.63996650
84 2949 0.82496758 0.79408518 0.16497848
85 118 0.79313923 0.75460289 0.35472278
86 1736 0.32615257 0.91139485 0.18642647
87 2201 0.95793194 0.32268770 0.89765616
88 750 0.65301961 0.08616947 0.23778386
89 906 0.45867582 0.91120045 0.98494348
90 2202 0.60602188 0.95517383 0.02133074
I want to make a barplot using ggplot2 like this:
In the above-mentioned dataset, height should be on the y-axis, and S1, S2, S3 should represent the colors of each sample.
I have tried the base R function barplot, which gave me the following output. Please give me any suggestions.
barplot(t(as.matrix(examp[,3:5])),col=rainbow(3))
It's not clear to me exactly what you want to plot. You say you want height on the y axis, but the examples you show are all 'filled to the top', implying the same height for each ID. Also, it is not clear what the numbers associated with each sample represent. I am guessing they should be relative weightings for the bar heights.
Assuming you actually want a filled bar plot as in the examples, with the relative sizes of the bars dictated by the sample values, you can do:
library(tidyr)
library(dplyr)
library(ggplot2)
df %>%
mutate(ID = reorder(ID, S3/(S3 + S2 + S1))) %>%
pivot_longer(3:5, names_to = "Sample", values_to = "Value") %>%
ggplot(aes(ID, Value * height, fill = Sample)) +
geom_col(position = "fill", color = NA) +
labs(y = "Height") +
theme_classic() +
scale_fill_manual(values = c("red", "green", "blue"))
Alternative
df %>%
arrange(order(height)) %>%
group_by(height) %>%
summarize(across(everything(), mean)) %>%
pivot_longer(3:5, names_to = "Sample", values_to = "Value") %>%
ggplot(aes(height, Value, fill = Sample, colour = Sample)) +
geom_smooth(method = loess, formula = y ~ x, linetype = 2, alpha = 0.2) +
theme_bw()
I have a dataframe like the following (this is a reduced example, I have many more rows and columns):
CH1 CH2 CH3
1 3434 282 7622
2 4442 6968 8430
3 4128 6947 478
4 6718 6716 3017
5 3735 9171 1128
6 65 4876 4875
7 9305 6944 3309
8 4283 6060 650
9 5588 2285 203
10 205 2345 9225
11 8634 4840 780
12 6383 0 1257
13 4533 7692 3760
14 9363 9846 4697
15 3892 79 4372
16 6130 5312 9651
17 7880 7386 6239
18 8515 8021 2295
19 1356 74 8467
20 9024 8626 4136
I need to create additional columns by splitting the values. For example, the value 1356 would have to be split into 6, 56, and 356. I do this in a for loop, splitting by string so that leading zeros are kept. So far, so good.
# CREATE ADDITIONAL COLUMNS
library(stringr)  # str_sub() comes from stringr
for(col in 1:3) {
  # Create a temporal variable
  temp <- as.character(data[,col])
  # Save the new column
  for(mod in c(-1, -2, -3)) {
    # Create the column
    temp <- cbind(temp, str_sub(as.character(data[,col]), mod))
  }
  # Merge to the row
  data <- cbind(data, temp)
}
My problem is that not all cells have 4 digits: some may have 1, 2, or 3 digits. Therefore, I get repeated values when I split. For example, for 79 I get: 79 (original), 9, 79, 79.
Problem: I need to remove the repeated values. Of course, I could do unique, but that gives me rows of uneven number of columns. I need to fill those missing (i.e. the removed repeated values) with NA. I can only compare this by row.
I checked CJ Yetman's answer here, but they only replace consecutive numbers. I only need to keep unique values.
Reproducible Example: Here is a fiddle with my code working: http://rextester.com/IKMP73407
Expected outcome: For example, for rows 11 & 12 of the example (see the link for the reproducible example), if this is my original:
8634 4 34 634 4840 0 40 840 780 0 80 780
6383 3 83 383 0 0 0 0 1257 7 57 257
I'd like to get this:
8634 4 34 634 4840 0 40 840 780 NA 80 NA
6383 3 83 383 0 NA NA NA 1257 7 57 257
You can use apply():
The data:
data <- structure(list(CH1 = c(3434L, 4442L, 4128L, 6718L, 3735L, 65L,
9305L, 4283L, 5588L, 205L, 8634L, 6383L, 4533L, 9363L, 3892L,
6130L, 7880L, 8515L, 1356L, 9024L), CH2 = c(282L, 6968L, 6947L,
6716L, 9171L, 4876L, 6944L, 6060L, 2285L, 2345L, 4840L, 0L, 7692L,
9846L, 79L, 5312L, 7386L, 8021L, 74L, 8626L), CH3 = c(7622L,
8430L, 478L, 3017L, 1128L, 4875L, 3309L, 650L, 203L, 9225L, 780L,
1257L, 3760L, 4697L, 4372L, 9651L, 6239L, 2295L, 8467L, 4136L
)), .Names = c("CH1", "CH2", "CH3"), row.names = c(NA, 20L), class = "data.frame")
Select row 11 and 12:
data <- data[11:12, ]
Using your code:
# CREATE ADDITIONAL COLUMNS
for(col in 1:3) {
  # Create a temporal variable
  temp <- data[,col]
  # Save the new column
  for(mod in c(10, 100, 1000)) {
    # Create the column
    temp <- cbind(temp, data[, col] %% mod)
  }
  data <- cbind(data, temp)
}
data[,1:3] <- NULL
The result is:
temp V2 V3 V4 temp V2 V3 V4 temp V2 V3 V4
11 8634 4 34 634 4840 0 40 840 780 0 80 780
12 6383 3 83 383 0 0 0 0 1257 7 57 257
Then go through the data row by row, replace the duplicates with NA, and transpose the outcome:
t(apply(data, 1, function(row) {
  row[duplicated(row)] <- NA
  return(row)
}))
The result is:
temp V2 V3 V4 temp V2 V3 V4 temp V2 V3 V4
11 8634 4 34 634 4840 0 40 840 780 NA 80 NA
12 6383 3 83 383 0 NA NA NA 1257 7 57 257
We know the mutate_at function from dplyr allows us to select multiple columns and apply a function to each of them. I need the opposite: applying multiple functions to the same column, or applying the same function multiple times to the same column. Take the following reproducible example.
> main <- structure(list(PolygonId = c(0L, 1L, 1612L, 3L, 2L, 1698L), Area = c(3.018892,
1.995702, 0.582808, 1.176975, 2.277057, 0.014854), Perimeter = c(10.6415,
8.6314, 4.8478, 6.1484, 9.2226, 0.6503), h0 = c(1000,500,700,1000,200,1200)), .Names = c("PolygonId",
"Area", "Perimeter", "h0"), row.names = c(NA, 6L), class = "data.frame")
> main
PolygonId Area Perimeter h0
1 0 3.018892 10.6415 1000
2 1 1.995702 8.6314 500
3 1612 0.582808 4.8478 700
4 3 1.176975 6.1484 1000
5 2 2.277057 9.2226 200
6 1698 0.014854 0.6503 1200
I am only concerned with the h0 column in the df main.
Expected outcome:
The h10 field is h0 + 10% of h0 and h_10 is h0 - 10% of h0
PolygonId Area Perimeter h0 h10 h20 h_10 h_20
1 0 3.018892 10.6415 1000 1100 1200 900 800
2 1 1.995702 8.6314 500 550 600 450 400
3 1612 0.582808 4.8478 700 770 840 630 560
4 3 1.176975 6.1484 1000 1100 1200 900 800
5 2 2.277057 9.2226 200 220 240 180 160
6 1698 0.014854 0.6503 1200 1320 1440 1080 960
I'd usually do this:
calcH <- function(h, pc){
  h + pc / 100 * h
}
new_main <- mutate(main,
                   h10 = calcH(h0, 10),
                   h20 = calcH(h0, 20),
                   h_10 = calcH(h0, -10),
                   h_20 = calcH(h0, -20)
                   )
But this is going to be hectic and long code since I have to do this calculation for 1%, 2.5%, 5%, 7.5%, 10%, 12.5%, 15%... 30% in both positive and negative ways.
mutate_at can use multiple functions, but they need to exist in the environment as named functions (they can't be anonymous functions). So something like
pcts <- rep(c(1, 2.5*1:12), 2) * c(-1, 1)
for(i in pcts){
  assign(gsub("-", "_", paste0("h", i)),
         eval(parse(text = sprintf("function(x) x*(100+%f)/100", i))))
}
main %>% mutate_at(vars(h0), gsub("-", "_", paste0("h", pcts)))
would work
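To see what the loop creates, note that for i = 10 the assign() call defines a function named h10 in the global environment (roughly function(x) x*(100+10.000000)/100), and for i = -2.5 the name becomes h_2.5; mutate_at then looks the functions up by those same reconstructed names. A quick check, assuming the loop above has been run:
h10(main$h0)
# 1100  550  770 1100  220 1320  (h0 increased by 10%)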
I like to solve these kinds of problems using a long data representation:
library(dplyr)
library(tidyr)
# create data frame with join helper and multiplier-values:
bla <- data.frame(mult = seq(-.1, .1, .01),
                  join = TRUE)
# join, calculate values, create names, transform to wide:
main %>%
  mutate(join = TRUE) %>%
  left_join(bla) %>%
  mutate(h0 = h0*(1+mult),
         mult = sub(x = paste0("h", mult*100), pattern = "-", replacement = "_")) %>%
  select(-join) %>%
  spread(mult, h0)
This is easy in base R. The idea is to create a vector with the required percentages, loop over that vector and calculate your metric, i.e.
v1 <- c(1, seq(2.5, 30, by = 2.5), seq(-30, -2.5, by = 2.5), -1)
sapply(v1, function(i) calcH(main$h0, i))
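If the results should end up as named columns alongside main, one way to finish (a sketch reusing calcH from the question; the naming scheme is just an assumption) is:
res <- sapply(v1, function(i) calcH(main$h0, i))
colnames(res) <- sub("-", "_", paste0("h", v1))  # e.g. "h2.5", "h_30"
main_new <- cbind(main, res)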
Here's another approach similar to #andyyy's, but uses rlang instead:
library(dplyr)
library(rlang)
percent <- c(1, 2.5*1:12)
calc_expr <- function(percent_vec){
  parse_exprs(paste(paste0("h0+(", percent_vec, "/100*h0)"), collapse = ";"))
}
main %>%
  mutate(!!!calc_expr(percent), !!!calc_expr(percent*-1)) %>%
  setNames(c(colnames(main), paste0("h", percent), paste0("h_", percent)))
Result:
PolygonId Area Perimeter h0 h1 h2.5 h5 h7.5 h10 h12.5 h15 h17.5 h20 h22.5 h25 h27.5
1 0 3.018892 10.6415 1000 1010 1025.0 1050 1075.0 1100 1125.0 1150 1175.0 1200 1225.0 1250 1275.0
2 1 1.995702 8.6314 500 505 512.5 525 537.5 550 562.5 575 587.5 600 612.5 625 637.5
3 1612 0.582808 4.8478 700 707 717.5 735 752.5 770 787.5 805 822.5 840 857.5 875 892.5
4 3 1.176975 6.1484 1000 1010 1025.0 1050 1075.0 1100 1125.0 1150 1175.0 1200 1225.0 1250 1275.0
5 2 2.277057 9.2226 200 202 205.0 210 215.0 220 225.0 230 235.0 240 245.0 250 255.0
6 1698 0.014854 0.6503 1200 1212 1230.0 1260 1290.0 1320 1350.0 1380 1410.0 1440 1470.0 1500 1530.0
h30 h_1 h_2.5 h_5 h_7.5 h_10 h_12.5 h_15 h_17.5 h_20 h_22.5 h_25 h_27.5 h_30
1 1300 990 975.0 950 925.0 900 875.0 850 825.0 800 775.0 750 725.0 700
2 650 495 487.5 475 462.5 450 437.5 425 412.5 400 387.5 375 362.5 350
3 910 693 682.5 665 647.5 630 612.5 595 577.5 560 542.5 525 507.5 490
4 1300 990 975.0 950 925.0 900 875.0 850 825.0 800 775.0 750 725.0 700
5 260 198 195.0 190 185.0 180 175.0 170 165.0 160 155.0 150 145.0 140
6 1560 1188 1170.0 1140 1110.0 1080 1050.0 1020 990.0 960 930.0 900 870.0 840
Notes:
Using the vector of percentages, I construct multiple expressions using paste0 and parse_exprs, then unquote and splice them into mutate using !!!. Finally, I rename the columns using setNames.
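To make the mechanics concrete, here is roughly what calc_expr produces for a couple of percentages (a small illustration, not part of the original answer):
calc_expr(c(1, 2.5))
# [[1]]
# h0 + (1/100 * h0)
#
# [[2]]
# h0 + (2.5/100 * h0)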
I have time series spanning several years that I need to plot in one graph. The largest series has a mean of 340, a minimum of 245, and a maximum of 900. The smallest series has a mean of 7, a minimum of -28, and a maximum of 31. The remaining series has values in the range of 6 to 700. The series follow a regular annual and seasonal pattern over the years until, suddenly, there was an upsurge in temperature for a month, which was followed by far more deaths than usual.
I cannot provide any real data, but I have simulated the data below and tried the following code, which was based on an example found here: http://www.r-bloggers.com/multiple-y-axis-in-a-r-plot/. The plot has not produced what I desired. I have the following questions:
In the plot it is difficult to clearly make out any of the series, and important features are hidden in the detail. How can I better present this data?
The y-axes have different lengths. How could I have axes with the same length? I appreciate any ideas and suggestions on how to improve this code and present a better plot. The data I have simulated does not reflect my real data, as I am unable to simulate the extreme values that mirror the period of the extreme weather episode.
Many thanks
temp<- rnorm(365, 5, 10)
mort<- rnorm(365, 300, 45)
poll<- rpois(365, lambda=76)
date<-seq(as.Date('2011-01-01'),as.Date('2011-12-31'),by = 1)
df<-data.frame(date,mort,poll,temp)
windows(600,600)
par(mar=c(5, 12, 4, 4) + 0.1)
with(df, {
  plot(date, mort, axes = F, ylim = c(170, max(mort)), xlab = "", ylab = "", type = "l", col = "black", main = "")
  points(date, mort, pch = 20, col = "black")
  axis(2, ylim = c(170, max(mort)), col = "black", lwd = 2)
  mtext(2, text = "Mortality", line = 2)
})
par(new = T)
plot(date, poll, axes = F, ylim = c(45, max(poll)), xlab = "", ylab = "",
     type = "l", col = "red", lty = 2, main = "", lwd = 1)
axis(2, ylim = c(45, max(poll)), lwd = 1, line = 3.5)
points(date, poll, pch = 20)
mtext(2, text = "PM10", line = 5.5)
par(new = T)
plot(date, temp, axes = F, ylim = c(-28, max(temp)), xlab = "", ylab = "",
     type = "l", lty = 3, col = "brown", main = "", lwd = 1)
axis(2, ylim = c(-28, max(temp)), lwd = 1, line = 7)
points(date, temp, pch = 20)
mtext(2, text = "Temperature", line = 9)
axis(1, pretty(range(date), 10))
mtext("date", side = 1, col = "black", line = 2)
Here are 6 approaches:
library(zoo)
z <- read.zoo(df)
# classic graphics in separate and single plots
plot(z)
plot(z, screen = 1)
# lattice graphics in separate and single plots
library(lattice)
xyplot(z)
xyplot(z, screen = 1)
# ggplot2 graphics in separate and single plots
library(ggplot2)
autoplot(z) + facet_free()
autoplot(z, facet = NULL)
I'd use separate plots for each variable, letting their y-axes differ. I like this better than introducing multiple y-axes in one plot. I will use ggplot2 to do this, and more specifically the concept of faceting:
library(ggplot2)
library(reshape2)
df_melt = melt(df, id.vars = 'date')
ggplot(df_melt, aes(x = date, y = value)) +
geom_line() +
facet_wrap(~ variable, scales = 'free_y', ncol = 1)
Notice that I stack the facets on top of each other. This will enable you to easily compare the timing of events in each of the series. Alternatively, you could put them next to each other (using nrow = 1 in facet_wrap, as sketched below); this will enable you to easily compare the y-values.
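For example, the side-by-side layout would look roughly like this (same data, only the facet arrangement changes):
ggplot(df_melt, aes(x = date, y = value)) +
  geom_line() +
  facet_wrap(~ variable, scales = 'free_y', nrow = 1)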
We can also introduce some extremes:
df = within(df, {
temp[61:90] = temp[61:90] + runif(30, 30, 50)
mort[61:90] = mort[61:90] + runif(30, 300, 500)
})
df_melt = melt(df, id.vars = 'date')
ggplot(df_melt, aes(x = date, y = value)) +
geom_line() +
facet_wrap(~ variable, scales = 'free_y', ncol = 1)
Here you can easily see that the increase in temp is correlated with the increase in mortality.