R: use window function to extract data from csv file

I have a CSV data file (https://github.com/user59036/first/blob/master/dataFraserRiver.csv) and I want to create two data sets from it using the window function.
The first data set runs from January 1990 to December 2008 and the second from January 2009 to December 2010.
datRiver <- read.csv("dataFraserRiver.csv")
datRiverTest <-ts(datRiver)
window(datRiver,start=c(1990,1),end = c(2008,12),frequency=12)
I kept getting an error:
'frequency' not changed
'end' value not changed
Error in window.default(x, ...) : 'start' cannot be after 'end'
How should I change my code to get the data? Thanks for any help.

It works if you remove the year column, convert the data.frame to a transposed matrix, and finally convert that matrix to a vector.
datvec <- c(t(datRiver[-1]))
dat_ts <- ts(datvec, start = c(1912, 1), end = c(2010, 12), frequency = 12)
window(dat_ts, start = c(1990, 1), end = c(2008, 12))
# Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
# 1990 1210 841 926 3000 5050 8760 6270 3340 1790 1520 2110 1190
# 1991 867 1560 1060 2690 5810 6910 6270 4330 2730 1560 1370 1110
# 1992 1090 1180 1770 2950 4800 5940 3870 2550 1960 2190 1740 853
# 1993 662 668 751 1940 5620 4930 3640 2900 1760 1060 1050 828
# 1994 819 709 1160 3600 5970 5960 5200 2900 1580 1350 857 814
# 1995 654 837 765 1900 4450 5880 4030 3500 1870 1730 2080 1560
# 1996 1120 947 1120 3080 4070 6750 6400 3780 2610 2130 1860 1210
# 1997 1040 938 1080 2580 7420 9580 7310 4440 2490 3270 2510 1320
Why remove the first column? Because otherwise ts has no way of knowing that the years are labels rather than observations. Why convert to a matrix? Because a data frame is a list of columns, which is awkward to use as a time series. Why transpose? Because a matrix is flattened into a vector column by column, while this data is organized by row (one row per year). Why convert to a vector? Because a plain vector is the clearest input for creating a time-series object with ts.
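To see why the transpose matters, here is a minimal sketch on a made-up two-year table. The data frame wide and its values are hypothetical; they are only shaped like the CSV, with a year column followed by month columns.

```r
# Hypothetical table: one row per year, one column per month (three months only)
wide <- data.frame(Year = c(2001, 2002),
                   Jan = c(1, 4), Feb = c(2, 5), Mar = c(3, 6))

# Without transposing, the values come out column by column
by_column <- as.vector(as.matrix(wide[-1]))
print(by_column)   # 1 4 2 5 3 6

# Transposing first gives the values year by year, i.e. in time order
by_row <- c(t(wide[-1]))
print(by_row)      # 1 2 3 4 5 6
```

Only the second ordering is suitable as input to ts, since ts assumes consecutive values are consecutive in time.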

It works for me. Make sure that:
1- You remove the Year column with [,-1] when creating the ts object
2- You specify the start, end and frequency in your ts call
3- You use datRiverTest in your window call instead of datRiver
datRiverTest <-ts(c(t(datRiver[,-1])), start=c(1912,1), end=c(2010,12), frequency = 12)
window(datRiverTest,start=c(1990,1),end = c(2008,12),frequency=12)
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1990 1210 841 926 3000 5050 8760 6270 3340 1790 1520 2110 1190
1991 867 1560 1060 2690 5810 6910 6270 4330 2730 1560 1370 1110
1992 1090 1180 1770 2950 4800 5940 3870 2550 1960 2190 1740 853
1993 662 668 751 1940 5620 4930 3640 2900 1760 1060 1050 828
1994 819 709 1160 3600 5970 5960 5200 2900 1580 1350 857 814
1995 654 837 765 1900 4450 5880 4030 3500 1870 1730 2080 1560
1996 1120 947 1120 3080 4070 6750 6400 3780 2610 2130 1860 1210
1997 1040 938 1080 2580 7420 9580 7310 4440 2490 3270 2510 1320
1998 949 922 1050 1790 5730 4990 3910 2680 1490 1390 1180 839
1999 827 738 759 2390 5220 8910 8640 5250 2630 1610 1940 1230
2000 872 751 728 1980 4270 6740 6520 3730 2840 1920 1760 812
2001 754 580 537 1220 3620 5740 5110 3890 2000 1480 1600 924
2002 922 719 576 1800 5490 9300 6740 2960 2330 2160 1300 863
2003 698 665 735 2060 3700 6450 4430 2720 1630 2280 1560 836
2004 752 656 801 2440 4390 5380 3860 2530 3210 2270 2450 1690
2005 1780 2200 1730 2370 5470 6110 5040 2830 1900 2420 1970 1240
2006 1410 985 814 1910 4470 5900 3750 2270 1380 924 1410 860
2007 766 733 1420 2780 5500 8270 6100 3260 2040 2340 2400 1390
2008 935 847 848 1100 5990 7200 4980 3300 2370 1510 1750 1100
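The question also asks for a second data set (January 2009 to December 2010). A sketch of both splits, using a placeholder series in place of the real river data; the names river_ts, first_set and second_set and the values are made up for illustration.

```r
# Placeholder monthly series covering Jan 1912 to Dec 2010 (99 years of made-up values)
river_ts <- ts(seq_len(99 * 12), start = c(1912, 1), frequency = 12)

# First data set: January 1990 to December 2008
first_set <- window(river_ts, start = c(1990, 1), end = c(2008, 12))

# Second data set: January 2009 to December 2010
second_set <- window(river_ts, start = c(2009, 1), end = c(2010, 12))

length(first_set)    # 228 (19 years of 12 months)
length(second_set)   # 24 (2 years of 12 months)
```

Because river_ts already carries its frequency, window does not need a frequency argument here.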

Related

Trying to print leap years as a vector between user given years

I am trying to print leap years between user given years as a vector.
leap_years<-function(V1,V2){
for(i in V1:V2){
if(i%%4==0 && i%%100!=0 ||i%%400==0)
{print(i)}}}
This gives me the right years, but printed one at a time. How can I get them in a single vector?
print only displays each value; to get the results back as a vector you need to collect them.
Start by initializing an empty vector, and then, instead of printing, append each value to that vector.
I modified your code accordingly:
leap_years<-function(V1,V2){
leap_vect=NULL
for(i in V1:V2){
if(i%%4==0 && i%%100!=0 ||i%%400==0){
leap_vect = append(leap_vect,i)
}
}
return(leap_vect)
}
Alternatively, you can vectorize the operations in leap_years, as below
leap_years <- function(V1,V2) {
v <- V1:V2
v[(v%%4==0 & v%%100!=0) | v%%400==0]
}
such that
> leap_years(1000,2000)
[1] 1004 1008 1012 1016 1020 1024 1028 1032 1036 1040 1044 1048 1052 1056 1060
[16] 1064 1068 1072 1076 1080 1084 1088 1092 1096 1104 1108 1112 1116 1120 1124
[31] 1128 1132 1136 1140 1144 1148 1152 1156 1160 1164 1168 1172 1176 1180 1184
[46] 1188 1192 1196 1200 1204 1208 1212 1216 1220 1224 1228 1232 1236 1240 1244
[61] 1248 1252 1256 1260 1264 1268 1272 1276 1280 1284 1288 1292 1296 1304 1308
[76] 1312 1316 1320 1324 1328 1332 1336 1340 1344 1348 1352 1356 1360 1364 1368
[91] 1372 1376 1380 1384 1388 1392 1396 1404 1408 1412 1416 1420 1424 1428 1432
[106] 1436 1440 1444 1448 1452 1456 1460 1464 1468 1472 1476 1480 1484 1488 1492
[121] 1496 1504 1508 1512 1516 1520 1524 1528 1532 1536 1540 1544 1548 1552 1556
[136] 1560 1564 1568 1572 1576 1580 1584 1588 1592 1596 1600 1604 1608 1612 1616
[151] 1620 1624 1628 1632 1636 1640 1644 1648 1652 1656 1660 1664 1668 1672 1676
[166] 1680 1684 1688 1692 1696 1704 1708 1712 1716 1720 1724 1728 1732 1736 1740
[181] 1744 1748 1752 1756 1760 1764 1768 1772 1776 1780 1784 1788 1792 1796 1804
[196] 1808 1812 1816 1820 1824 1828 1832 1836 1840 1844 1848 1852 1856 1860 1864
[211] 1868 1872 1876 1880 1884 1888 1892 1896 1904 1908 1912 1916 1920 1924 1928
[226] 1932 1936 1940 1944 1948 1952 1956 1960 1964 1968 1972 1976 1980 1984 1988
[241] 1992 1996 2000
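As a quick sanity check, the loop version and the vectorized version should return the same vector. A small sketch, with the two functions renamed leap_loop and leap_vec (names chosen here only to keep them apart):

```r
# Loop version: collect matching years one at a time
leap_loop <- function(V1, V2) {
  leap_vect <- NULL
  for (i in V1:V2) {
    if ((i %% 4 == 0 && i %% 100 != 0) || i %% 400 == 0) {
      leap_vect <- append(leap_vect, i)
    }
  }
  leap_vect
}

# Vectorized version: test all years at once with elementwise & and |
leap_vec <- function(V1, V2) {
  v <- V1:V2
  v[(v %% 4 == 0 & v %% 100 != 0) | v %% 400 == 0]
}

leap_vec(1890, 1910)                                     # 1892 1896 1904 1908
identical(leap_loop(1890, 1910), leap_vec(1890, 1910))   # TRUE
```

Note that 1900 is skipped: it is divisible by 100 but not by 400.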

How can I swap default colors in heatmap function in R?

I have created a heatmap of days of the week against hours of the day.
This is the table of values from which the map was created:
0 1 2 3 4 5 6 7 8 9 10 11 12 13
nedeľa 2028 1236 1019 838 607 461 478 483 615 864 884 787 1192 789
piatok 1873 932 743 560 473 602 839 1203 1268 1286 938 822 1207 857
pondelok 1900 825 712 527 415 542 772 1123 1323 1235 971 737 1129 824
sobota 2050 1267 985 836 652 508 541 650 858 1039 946 789 1204 767
streda 1814 790 619 469 396 561 862 1140 1329 1237 947 763 1225 804
štvrtok 1856 816 696 508 400 534 799 1135 1298 1301 932 731 1093 752
utorok 1691 777 603 464 414 520 845 1118 1175 1174 948 786 1108 762
14 15 16 17 18 19 20 21 22 23
nedeľa 959 1037 1083 1160 1389 1342 1706 1696 2079 1584
piatok 937 1140 1165 1318 1623 1652 1736 1881 2308 1921
pondelok 958 1059 1136 1252 1518 1503 1622 1815 2009 1490
sobota 963 1086 1055 1084 1348 1390 1570 1702 2078 1750
streda 863 1075 1076 1289 1580 1507 1718 1748 2093 1511
štvrtok 831 1044 1131 1258 1510 1537 1668 1776 2134 1579
utorok 908 1071 1090 1274 1553 1496 1696 1816 2044 1458
I wonder if there is an easy and elegant way to swap the color range, so that high values are shown in red and low values in the lighter colors.
I used this function:
heatmap (myMatrix, Colv=NA, Rowv=NA)
The default colors for the heatmap function are actually set by the image() function. In R versions before 3.6.0 they are
col=heat.colors(12)
(later versions default to hcl.colors(12, "YlOrRd", rev = TRUE)). If you want to reverse them, just pass
heatmap(..., col=rev(heat.colors(12)))
where ... stands for the rest of the parameters you need to pass.
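A small self-contained sketch of the reversal, using a made-up 3 x 4 matrix in place of myMatrix (the matrix, its labels and the name small are hypothetical):

```r
# Hypothetical stand-in for myMatrix: 3 rows, 4 columns, increasing values
small <- matrix(1:12, nrow = 3, byrow = TRUE,
                dimnames = list(c("a", "b", "c"), as.character(0:3)))

# heat.colors(12) starts at red and fades towards pale yellow;
# image() assigns the first color to the lowest values
reversed <- rev(heat.colors(12))

# With the reversed palette, the highest values are drawn in red
heatmap(small, Colv = NA, Rowv = NA, col = reversed)
```

Keep in mind that heatmap scales each row by default (scale = "row"), so the colors reflect values relative to the rest of the row unless scale = "none" is passed.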

What is the difference between lag and zlag function in r?

While working with time series or any data frame, what is the difference between taking lag of a column or zlag of a column?
lag on a time series shifts the underlying time index without changing the values of the series. For example, take the ldeaths time series:
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1974 3035 2552 2704 2554 2014 1655 1721 1524 1596 2074 2199 2512
1975 2933 2889 2938 2497 1870 1726 1607 1545 1396 1787 2076 2837
1976 2787 3891 3179 2011 1636 1580 1489 1300 1356 1653 2013 2823
1977 3102 2294 2385 2444 1748 1554 1498 1361 1346 1564 1640 2293
1978 2815 3137 2679 1969 1870 1633 1529 1366 1357 1570 1535 2491
1979 3084 2605 2573 2143 1693 1504 1461 1354 1333 1492 1781 1915
After lag(ldeaths, 12), a 1-year shift (12 months), the values of the time series are unchanged. Only the time period changes, from 1974-1979 to 1973-1978:
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1973 3035 2552 2704 2554 2014 1655 1721 1524 1596 2074 2199 2512
1974 2933 2889 2938 2497 1870 1726 1607 1545 1396 1787 2076 2837
1975 2787 3891 3179 2011 1636 1580 1489 1300 1356 1653 2013 2823
1976 3102 2294 2385 2444 1748 1554 1498 1361 1346 1564 1640 2293
1977 2815 3137 2679 1969 1870 1633 1529 1366 1357 1570 1535 2491
1978 3084 2605 2573 2143 1693 1504 1461 1354 1333 1492 1781 1915
After executing library(TSA); zlag(ldeaths, 12), the output is a plain vector in which the last 12 values are dropped and 12 NA values are added at the beginning:
[1] NA NA NA NA NA NA NA NA NA NA NA NA 3035 2552 2704 2554 2014 1655 1721 1524 1596 2074 2199 2512 2933 2889
[27] 2938 2497 1870 1726 1607 1545 1396 1787 2076 2837 2787 3891 3179 2011 1636 1580 1489 1300 1356 1653 2013 2823 3102 2294 2385 2444
[53] 1748 1554 1498 1361 1346 1564 1640 2293 2815 3137 2679 1969 1870 1633 1529 1366 1357 1570 1535 2491
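The two behaviours can be compared without installing TSA. In this sketch, stats::lag is the real base function, while manual_zlag is only a hand-written stand-in that imitates the zlag output described above; it is not the TSA implementation.

```r
# lag(): same values, time index moved back by 12 months
shifted <- stats::lag(ldeaths, 12)
start(ldeaths)   # 1974 1
start(shifted)   # 1973 1
identical(as.numeric(shifted), as.numeric(ldeaths))   # TRUE

# Stand-in for the zlag-style result: pad with 12 NAs, drop the last 12 values
manual_zlag <- c(rep(NA, 12), head(as.numeric(ldeaths), -12))
length(manual_zlag)   # 72, the same length as ldeaths
manual_zlag[11:14]    # NA NA 3035 2552
```

The second form stays aligned with the rows of the original data, which is what you usually want when the lagged values go into a data frame column or a regression.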

I have weekly booking quantities for 2015 and 2016 and need to analyze the data for any pattern

For this requirement I took a time series approach.
The data is below:
bookingdate bookingqty
2014-07-27 202
2014-08-03 564
2014-08-10 359
2014-08-17 638
2014-08-24 487
2014-08-31 491
2014-09-07 364
2014-09-14 762
2014-09-21 419
2014-09-28 642
2014-10-05 723
2014-10-12 579
2014-10-19 1803
2014-10-26 437
2014-11-02 587
2014-11-09 803
2014-11-16 1347
2014-11-23 600
2014-11-30 616
2014-12-07 2242
2014-12-14 1313
2014-12-21 264
2014-12-28 918
2015-01-04 420
2015-01-11 741
2015-01-18 2213
2015-01-25 379
2015-02-01 386
2015-02-08 854
2015-02-15 1235
2015-02-22 726
2015-03-01 774
2015-03-08 1135
2015-03-15 1127
2015-03-22 466
2015-03-29 987
2015-04-05 665
2015-04-12 997
2015-04-19 2800
2015-04-26 594
2015-05-03 715
2015-05-10 2009
2015-05-17 1592
2015-05-24 499
2015-05-31 1906
2015-06-07 1619
2015-06-14 1277
2015-06-21 683
2015-06-28 2132
2015-07-05 1195
2015-07-12 1250
2015-07-19 5001
2015-07-26 320
2015-08-02 577
2015-08-09 825
2015-08-16 885
2015-08-23 1910
2015-08-30 1072
2015-09-06 615
2015-09-13 1809
2015-09-20 1243
2015-09-27 1516
2015-10-04 754
2015-10-11 910
2015-10-18 1766
2015-10-25 599
2015-11-01 536
2015-11-08 1170
2015-11-15 2060
2015-11-22 719
2015-11-29 706
2015-12-06 1129
2015-12-13 1807
2015-12-20 949
2015-12-27 653
2016-01-03 1612
2016-01-10 1058
2016-01-17 2699
2016-01-24 617
2016-01-31 335
2016-02-07 527
2016-02-14 526
2016-02-21 1729
2016-02-28 512
2016-03-06 1026
2016-03-13 824
2016-03-20 1144
2016-03-27 711
2016-04-03 743
2016-04-10 847
2016-04-17 833
2016-04-24 4192
2016-05-01 576
2016-05-08 610
2016-05-15 645
2016-05-22 950
2016-05-29 578
2016-06-05 786
2016-06-12 990
2016-06-19 1804
2016-06-26 853
2016-07-03 767
2016-07-10 1325
2016-07-17 1872
2016-07-24 3002
# I created the time series object as follows:
library(zoo)
pidbookingdata <- zoo(arrangeddata$bookingqty, arrangeddata$bookingdate)
# On the x axis I can see index values for 2015 and 2016, but when I use the decompose function I get the error below:
decompose(pidbookingdata)
Error in decompose(pidbookingdata) :
time series has no or less than 2 periods
# So I tried the approach below, which I found on this site, and it works:
library(xts)
dfts <- as.ts(xts(arrangeddata$bookingqty, order.by = arrangeddata$bookingdate))
dfts
dfts <- ts(dfts, frequency = 52)
dcomp <- decompose(dfts)
# The dfts data is shown below:
> `dfts`
Time Series:
Start = c(1, 1)
End = c(3, 1)
Frequency = 52
[1] 202 564 359 638 487 491 364 762 419 642 723 579 1803 437 587 803 1347 600 616
[20] 2242 1313 264 918 420 741 2213 379 386 854 1235 726 774 1135 1127 466 987 665 997
[39] 2800 594 715 2009 1592 499 1906 1619 1277 683 2132 1195 1250 5001 320 577 825 885 1910
[58] 1072 615 1809 1243 1516 754 910 1766 599 536 1170 2060 719 706 1129 1807 949 653 1612
[77] 1058 2699 617 335 527 526 1729 512 1026 824 1144 711 743 847 833 4192 576 610 645
[96] 950 578 786 990 1804 853 767 1325 1872 3002
# When I do plot(dfts),
# I see the x axis index as 1.0 1.5 2 2.5 3
Am I doing this correctly, or is something wrong here? What do the indices 1.0, 1.5, 2.0 mean in terms of weeks?
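One way to inspect those axis values is time(). The sketch below uses placeholder values instead of the real booking quantities, and the names weekly and weekly_cal are made up. With frequency = 52 and no start given, the series starts at 1 and each week adds 1/52, so the whole numbers mark the start of each 52-week cycle.

```r
# Placeholder for the 105 weekly observations (values are made up)
weekly <- ts(seq_len(105), frequency = 52)

# The time index is 1 + (k - 1) / 52 for observation k
head(time(weekly), 3)   # 1, then 1 + 1/52, then 1 + 2/52
time(weekly)[27]        # 1.5 (up to floating point): week 27 of the first cycle
end(weekly)             # 3 1: first week of the third cycle

# Assumption for illustration: anchoring the series at roughly week 30 of 2014
# makes the axis show calendar years instead of cycle numbers
weekly_cal <- ts(seq_len(105), start = c(2014, 30), frequency = 52)
start(weekly_cal)       # 2014 30
```

So 1.0 is the first observation, 1.5 is half a cycle (26 weeks) later, and 2.0 is the first observation of the second 52-week cycle.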

Can I plot multiple rows on the same plot from a 2-way table

I have the following table, which gives the number of earthquakes in each year (row) by month (column).
> tmp=table(quakes$year,quakes$mon)
> tmp
0 1 2 3 4 5 6 7 8 9 10 11
1973 388 453 451 508 375 533 496 392 349 424 400 406
1974 386 384 385 388 456 414 491 501 385 432 354 420
1975 435 374 397 439 449 629 461 434 386 404 440 470
1976 677 478 474 430 612 514 561 533 600 485 463 481
1977 453 355 508 519 460 477 416 541 449 523 585 489
1978 499 449 730 533 550 578 524 480 535 458 526 566
1979 485 444 771 662 705 661 590 597 514 635 549 549
1980 530 530 668 654 969 779 668 472 452 614 549 463
1981 501 506 545 547 538 524 662 587 690 561 518 650
1982 655 527 632 602 630 658 603 639 640 761 628 772
1983 909 683 775 847 1028 743 823 902 727 770 793 842
1984 798 732 872 943 795 721 782 820 994 947 1056 1033
1985 1016 839 1140 1078 1146 989 1066 1136 1095 1115 1162 1333
1986 1050 867 1217 944 1368 1046 1256 1035 912 1086 1066 871
1987 834 860 1003 884 891 871 959 943 952 1022 1035 1036
1988 990 957 1127 1123 1121 975 1095 1160 929 1079 1092 1063
1989 1133 1106 1144 1297 1235 1060 1175 1312 1200 1458 1137 1305
1990 1247 1176 1404 1489 1431 1321 1713 1496 1160 1277 1307 1569
1991 1476 1226 1369 1388 1387 1380 1327 1378 1253 1530 1301 1469
1992 1362 1292 1622 1715 1915 1649 1941 1722 1518 1501 1653 1634
1993 1435 1428 1821 1691 1970 1767 2502 1957 1903 1852 1628 1522
1994 2095 1409 1466 1520 1760 1702 1473 1494 1625 1889 1673 1265
1995 1656 1590 1444 1798 1931 1691 1445 1574 1640 2005 1917 2316
1996 2297 2310 1513 1290 1335 1675 1545 1450 1615 1604 1690 1614
1997 1441 1570 1890 1919 1618 1269 1582 1463 1463 1645 1892 2120
1998 1905 1592 1773 2021 2068 1786 1971 1776 1724 1749 1761 1562
1999 1752 1740 2093 1713 2145 1891 1679 1628 1487 1799 1584 1321
2000 1451 1340 1587 1702 1710 1941 2221 2125 1724 1863 2735 1857
2001 1945 2007 1856 2091 1724 2091 2039 1915 1817 2124 1917 2008
2002 2101 1996 2291 2202 1981 2126 2001 2091 2733 2411 3316 2205
2003 2053 2139 2604 2475 2526 2950 2655 2841 3030 2794 2709 2643
2004 2680 2861 2866 2692 3157 2767 2090 2274 2313 2168 2449 2883
2005 3253 2096 2842 3028 2562 2492 2340 2215 2347 2887 2176 2245
2006 2086 2007 2509 2739 2738 2445 2548 2405 2157 2399 3128 2407
2007 2822 1954 2361 3206 2351 2257 2566 2779 2682 2324 2072 2311
2008 2333 2666 2732 2595 3303 3024 2743 2795 2096 2726 2337 2427
2009 1512 1266 1223 1171 1124 1158 1162 1355 1112 1623 1085 1034
2010 1371 1630 2032 2120 1402 1419 2747 1885 1548 1550 1651 2186
The two plot commands below then give me two separate plots, the first for the 1973 time series and the second for the 2010 time series:
> dim(tmp)
[1] 38 12
> plot(tmp[1,], type="l")
> plot(tmp[38,], type="l")
I want to combine and show both of these time series on the same plot. Is there a way to plot rows from the table above on the same plot and at the same time identify each time series by the year (row label)?
matplot is good for this sort of thing.
First swap the rows and columns of your table, so that each year becomes a column:
tmp <- table(quakes$mon,quakes$year)
# 1973 1974 1975 1976 1977 1978
#0 388 386 435 677 453 499
#1 453 384 374 478 355 449
#2 451 385 397 474 508 730
#3 508 388 439 430 519 533
#etc
Then use matplot:
vars <- c(1,6)
matplot(tmp[,vars], type="l", lty=1)
legend("topright", colnames(tmp)[vars], lty=1, col=seq_along(vars))
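To pick out the two years from the question by name rather than by position, something along these lines should work. This is a sketch: the matrix counts is built by hand from the first four months of the 1973 and 2010 rows shown in the question, and stands in for the full month-by-year table.

```r
# First four months of 1973 and 2010, copied from the table in the question
counts <- matrix(c(388, 453, 451, 508,
                   1371, 1630, 2032, 2120),
                 ncol = 2,
                 dimnames = list(as.character(0:3), c("1973", "2010")))

# Select columns by year label, so the legend always matches the lines
years <- c("1973", "2010")
matplot(counts[, years], type = "l", lty = 1, col = seq_along(years),
        xaxt = "n", xlab = "month", ylab = "earthquakes")
axis(1, at = seq_len(nrow(counts)), labels = rownames(counts))
legend("topleft", legend = years, lty = 1, col = seq_along(years))
```

Passing the same col vector to matplot and legend keeps the line colors and the legend entries in step.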
As a general rule I try not to plot using tables, even though it makes sense for a person to read the data that way.
library(ggplot2)
ggplot(data.frame(tmp)) +
geom_line(aes(x = Var2, y = Freq, group = Var1, col = Var1))
The ggplot2 library is great for this sort of grouped plotting, though it can take a little effort to get used to.
It's probably a bad idea to rely on the names Var1 and Var2 (which are created automatically when the table is coerced to a data.frame). You can avoid this by aggregating the quakes data frame first, then calling the plot on that.
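A sketch of that aggregation step in base R, assuming quakes has one row per earthquake with year and mon columns as in the question. The tiny quakes data frame and the column name n are made up for illustration.

```r
# Hypothetical stand-in for quakes: one row per earthquake
quakes <- data.frame(year = c(1973, 1973, 1973, 2010, 2010),
                     mon  = c(0, 0, 1, 0, 1))

# Count events per year and month; the columns keep meaningful names
counts <- aggregate(n ~ year + mon, data = transform(quakes, n = 1), FUN = sum)
print(counts)

# Plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(counts, aes(x = mon, y = n, group = year, colour = factor(year))) +
    geom_line()
  print(p)
}
```

The aesthetics now refer to mon, n and year directly, so the plotting code no longer depends on how the table happened to be laid out.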
