warning: non-list contrasts argument ignored

I am running a gamm using the 'mgcv' package in R:
additive.model.saturated <- gamm(log.titer ~ condition +
                                   Age_month_selective + Season.2 +
                                   s(capture.month, bs = "cc", k = 12) +
                                   s(capture.year, bs = "ps", k = 5),
                                 random = list(Animal.ID = ~1), data = data)
However, I keep getting the warning message below. I cannot figure out why I am getting this warning, or how to adjust my analysis to resolve whatever mistake it is pointing to.
Warning message:
In model.matrix.default(~b$groups[[n.levels - i + 1]] - 1, contrasts.arg = c("contr.treatment", :
non-list contrasts argument ignored
A summary and a subset of the data are included below:
#summary:
'data.frame': 1263 obs. of 6 variables:
$ log.titer : num 0 0 0 0 0 ...
$ condition : num 5 3.5 3.75 3.25 4 3.5 3.25 2.5 3.25 2.75 ...
$ Age_month_selective: int 39 57 63 68 75 83 27 44 39 51 ...
$ Season.2 : Factor w/ 2 levels "dry","wet": 1 2 1 2 1 2 1 2 1 1 ...
$ capture.month : int 6 12 6 11 6 2 6 11 6 6 ...
$ capture.year : int 2008 2009 2010 2010 2011 2012 2008 2009 2009 2010 ...
#data subset
log.titer condition Age_month_selective Season.2 capture.month capture.year Animal.ID
1 0.000000 5.00 39 dry 6 2008 B1
2 0.000000 3.50 57 wet 12 2009 B1
3 0.000000 3.75 63 dry 6 2010 B1
4 0.000000 3.25 68 wet 11 2010 B1
5 0.000000 4.00 75 dry 6 2011 B1
6 1.447158 3.50 83 wet 2 2012 B1
7 1.334454 3.25 27 dry 6 2008 B10
8 0.000000 2.50 44 wet 11 2009 B10
9 0.000000 3.25 39 dry 6 2009 B10
10 0.000000 2.75 51 dry 6 2010 B10
11 0.000000 2.50 56 wet 11 2010 B10
12 0.000000 2.00 63 dry 6 2011 B10
13 0.000000 2.50 71 wet 2 2012 B10
14 0.000000 4.50 63 dry 6 2008 B11
15 1.363612 3.75 80 wet 11 2009 B11
16 1.365488 4.75 76 dry 7 2009 B11
17 0.000000 3.75 87 dry 6 2010 B11
18 0.000000 4.00 95 wet 2 2011 B11
19 1.447158 3.25 99 dry 6 2011 B11
20 0.000000 4.75 51 dry 6 2008 B12
21 0.000000 4.25 68 wet 11 2009 B12
22 0.000000 4.25 68 wet 11 2009 B12
23 0.000000 3.50 75 dry 6 2010 B12
24 0.000000 3.75 80 wet 11 2010 B12
25 1.414973 2.00 92 wet 11 2011 B12
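One way I could try to pin down where the warning comes from (a sketch, not a fix) is to catch it with withCallingHandlers() and print the call stack; the warning text itself mentions model.matrix.default() being called on b$groups, which looks like the random-effects setup inside gamm/nlme rather than the fixed-effect formula:
library(mgcv)
withCallingHandlers(
  gamm(log.titer ~ condition + Age_month_selective + Season.2 +
         s(capture.month, bs = "cc", k = 12) +
         s(capture.year, bs = "ps", k = 5),
       random = list(Animal.ID = ~1), data = data),
  warning = function(w) {
    message("warning raised: ", conditionMessage(w))
    print(sys.calls())               # call stack at the moment the warning fired
    invokeRestart("muffleWarning")
  }
)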

Related

merge of 2 data frames based on several columns defining 1 variable in r

I have two data frames. The code (key) columns are year, pd, treatm and rep.
The variable in the first data frame is LAI; the variables in the second are cimer, himv and nőv.
I would like to add the LAI variable to the other variables/columns, but I am not sure how to get the LAI values into the correct order, since each observation is defined by the four code columns.
Could you help me solve this problem, please?
Thank you very much!
Data frames are:
> sample1
year treatm pd rep LAI
1 2020 1 A 1 2.58
2 2020 1 A 2 2.08
3 2020 1 A 3 2.48
4 2020 1 A 4 2.98
5 2020 2 A 1 3.34
6 2020 2 A 2 3.11
7 2020 2 A 3 3.20
8 2020 2 A 4 2.56
9 2020 1 B 1 2.14
10 2020 1 B 2 2.17
11 2020 1 B 3 2.24
12 2020 1 B 4 2.29
13 2020 2 B 1 3.41
14 2020 2 B 2 3.12
15 2020 2 B 3 2.81
16 2020 2 B 4 2.63
17 2021 1 A 1 2.15
18 2021 1 A 2 2.25
19 2021 1 A 3 2.52
20 2021 1 A 4 2.57
21 2021 2 A 1 2.95
22 2021 2 A 2 2.82
23 2021 2 A 3 3.11
24 2021 2 A 4 3.04
25 2021 1 B 1 3.25
26 2021 1 B 2 2.33
27 2021 1 B 3 2.75
28 2021 1 B 4 3.09
29 2021 2 B 1 3.18
30 2021 2 B 2 2.75
31 2021 2 B 3 3.21
32 2021 2 B 4 3.57
> sample2
year.pd.treatm.rep.cimer.himv.nőv
1 2020,A,1,1,92,93,94
2 2020,A,2,1,91,92,93
3 2020,B,1,1,72,73,75
4 2020,B,2,1,73,74,75
5 2020,A,1,2,95,96,100
6 2020,A,2,2,90,91,94
7 2020,B,1,2,74,76,78
8 2020,B,2,2,71,72,74
9 2020,A,1,3,94,95,96
10 2020,A,2,3,92,93,96
11 2020,B,1,3,76,77,77
12 2020,B,2,3,74,75,76
13 2020,A,1,4,90,91,97
14 2020,A,2,4,90,91,94
15 2020,B,1,4,74,75,NA
16 2020,B,2,4,73,75,NA
17 2021,A,1,1,92,93,94
18 2021,A,2,1,91,92,93
19 2021,B,1,1,72,73,75
20 2021,B,2,1,73,74,75
21 2021,A,1,2,95,96,100
22 2021,A,2,2,90,91,94
23 2021,B,1,2,74,76,78
24 2021,B,2,2,71,72,74
25 2021,A,1,3,94,95,96
26 2021,A,2,3,92,93,96
27 2021,B,1,3,76,77,77
28 2021,B,2,3,74,75,76
29 2021,A,1,4,90,91,97
30 2021,A,2,4,90,91,94
31 2021,B,1,4,74,75,NA
32 2021,B,2,4,73,75,NA
You can use inner_join from dplyr:
library(tidyverse)
inner_join(sample2, sample1, by = c("year", "pd", "treatm", "rep"))
Output (first six lines)
year pd treatm rep cimer himv nov LAI
1: 2020 A 1 1 92 93 94 2.58
2: 2020 A 2 1 91 92 93 3.34
3: 2020 B 1 1 72 73 75 2.14
4: 2020 B 2 1 73 74 75 3.41
5: 2020 A 1 2 95 96 100 2.08
6: 2020 A 2 2 90 91 94 3.11
You can also use data.table:
sample2[sample1, on = .(year, pd, treatm, rep)]
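For completeness, here is a fuller version of the data.table route (a sketch, assuming both tables already have separate year, pd, treatm and rep columns; the printed sample2 above looks like it was read in as a single comma-separated column, in which case it would first need to be split, e.g. with tidyr::separate()):
library(data.table)

setDT(sample1)
setDT(sample2)

# For each row of sample1, look up the matching cimer/himv/nőv values in sample2
merged <- sample2[sample1, on = .(year, pd, treatm, rep)]
head(merged)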

vis.gam transparent at top of "persp" graph

I am running a GAMM using package mgcv. The model is running fine and gives an output that makes sense, but when I use vis.gam(plot.type="persp") my graph appears like this:
[persp plot: the top of the surface appears transparent]
Why is this happening? When I use vis.gam(plot.type="contour") there is no area which is transparent.
It does not appear to be simply a problem with the heat color palette; the same thing happens when I change the color scheme of the "persp" plot:
[persp plot, "topo" color: still transparent at the top]
The contour plot is completely filled while the persp plot is still transparent at the top.
Data:
logcpue assnage distkm fsamplingyr
1 -1.5218399 7 3.490 2015
2 -1.6863990 4 3.490 2012
3 -1.4534337 6 3.490 2014
4 -1.5207723 5 3.490 2013
5 -2.4061258 2 3.490 2010
6 -2.5427262 3 3.490 2011
7 -1.6177367 3 3.313 1998
8 -4.4067192 10 3.313 2005
9 -4.3438054 11 3.313 2006
10 -2.8834031 7 3.313 2002
11 -2.3182512 2 3.313 1997
12 -4.1108738 1 3.235 2010
13 -2.0149030 3 3.235 2012
14 -1.4900912 6 3.235 2015
15 -3.7954892 2 3.235 2011
16 -1.6499840 4 3.235 2013
17 -1.9924302 5 3.235 2014
18 -1.2122716 4 3.189 1998
19 -0.6675703 3 3.189 1997
20 -4.7957905 7 3.106 1998
21 -3.8763958 6 3.106 1997
22 -1.2205021 4 3.073 2010
23 -1.9262374 7 3.073 2013
24 -3.3463891 9 3.073 2015
25 -1.7805862 2 3.073 2008
26 -3.2451931 8 3.073 2014
27 -1.4441139 5 3.073 2011
28 -1.4395389 6 3.073 2012
29 -1.6357552 4 2.876 2014
30 -1.3449091 5 2.876 2015
31 -2.3782225 3 2.876 2013
32 -4.4886364 1 2.876 2011
33 -2.6026897 2 2.876 2012
34 -3.5765503 1 2.147 2002
35 -4.8040211 9 2.147 2010
36 -1.3993664 5 2.147 2006
37 -1.2712250 4 2.147 2005
38 -1.8495790 7 2.147 2008
39 -2.5073795 1 2.034 2012
40 -2.0654553 4 2.034 2015
41 -3.6309855 2 2.034 2013
42 -2.2643639 3 2.034 2014
43 -2.2643639 6 1.452 2006
44 -3.3900241 8 1.452 2008
45 -4.9628446 2 1.452 2002
46 -2.0088240 5 1.452 2005
47 -3.9186675 1 1.323 2013
48 -4.3438054 2 1.323 2014
49 -3.5695327 3 1.323 2015
50 -1.6986690 7 1.200 2005
51 -3.2451931 8 1.200 2006
52 -0.9024016 4 1.200 2002
library(mgcv)
f1 <- formula(logcpue ~ s(assnage) + distkm)
m1 <- gamm(f1, random = list(fsamplingyr = ~1),
           method = "REML",
           data = ycsnew)
vis.gam(m1$gam, color = "topo", plot.type = "persp", theta = 180)
vis.gam(m1$gam, color = "heat", plot.type = "persp", theta = 180)
vis.gam(m1$gam, view = c("assnage", "distkm"),
        plot.type = "contour", color = "heat", las = 1)
vis.gam(m1$gam, view = c("assnage", "distkm"),
        plot.type = "contour", color = "terrain", las = 1, contour.col = "black")
The code of vis.gam has this line:
surf.col[surf.col > max.z * 2] <- NA
I am unable to understand exactly what it is doing, and it appears rather ad hoc. NA color values are generally drawn as transparent. If you copy vis.gam to a new function (say vis.gam2), comment out that line, and assign the environment of the new function with
environment(vis.gam2) <- environment(vis.gam)
you get complete coloring of the surface.
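A minimal sketch of that workaround (assuming you delete or comment out the surf.col line in the editor that opens, and that m1 is the fitted gamm from above):
library(mgcv)

vis.gam2 <- vis.gam                              # start from a copy of vis.gam
fix(vis.gam2)                                    # in the editor, remove: surf.col[surf.col > max.z * 2] <- NA
environment(vis.gam2) <- environment(vis.gam)    # so the copy can see mgcv's internal helpers

vis.gam2(m1$gam, view = c("assnage", "distkm"),
         plot.type = "persp", color = "heat", theta = 180)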

Merging complementary rows of a dataframe with R

I have such a data frame
0 weekday day month year hour basal bolus carb period.h
1 Tuesday 01 03 2016 0.0 0.25 NA NA 0
2 Tuesday 01 03 2016 10.9 NA NA 67 10
3 Tuesday 01 03 2016 10.9 NA 4.15 NA 10
4 Tuesday 01 03 2016 12.0 0.30 NA NA 12
5 Tuesday 01 03 2016 17.0 0.50 NA NA 17
6 Tuesday 01 03 2016 17.6 NA NA 33 17
7 Tuesday 01 03 2016 17.6 NA 1.35 NA 17
8 Tuesday 01 03 2016 18.6 NA NA 44 18
9 Tuesday 01 03 2016 18.6 NA 1.80 NA 18
10 Tuesday 01 03 2016 18.9 NA NA 17 18
11 Tuesday 01 03 2016 18.9 NA 0.70 NA 18
12 Tuesday 01 03 2016 22.0 0.40 NA NA 22
13 Wednesday 02 03 2016 0.0 0.25 NA NA 0
14 Wednesday 02 03 2016 9.7 NA NA 39 9
15 Wednesday 02 03 2016 9.7 NA 2.65 NA 9
16 Wednesday 02 03 2016 11.2 NA NA 13 11
17 Wednesday 02 03 2016 11.2 NA 0.30 NA 11
18 Wednesday 02 03 2016 12.0 0.30 NA NA 12
19 Wednesday 02 03 2016 12.0 NA NA 16 12
20 Wednesday 02 03 2016 12.0 NA 0.65 NA 12
If you look at lines 2 and 3, you can see that they correspond to exactly the same day and time: in line 2 the "carb" value is not NA, and in line 3 the "bolus" value is not NA (these are data about diabetes).
I want to merge such lines into a single one:
2 Tuesday 01 03 2016 10.9 NA NA 67 10
3 Tuesday 01 03 2016 10.9 NA 4.15 NA 10
->
2 Tuesday 01 03 2016 10.9 NA 4.15 67 10
I could of course do a brute-force double loop over every line, but I am looking for a cleverer and faster way.
You can group your data frame by the common identifier columns (here weekday, day, month, year, hour and period.h) and then take the first element of the sorted values in each of the remaining columns you want to merge. By default sort() drops NAs from the vector being sorted, so within each group you end up with the non-NA element of each column; if all elements in a column are NA, sort(col)[1] returns NA:
library(dplyr)
df %>%
  group_by(weekday, day, month, year, hour, period.h) %>%
  summarise_all(funs(sort(.)[1]))
# weekday day month year hour period.h basal bolus carb
# <fctr> <int> <int> <int> <dbl> <int> <dbl> <dbl> <int>
# 1 Tuesday 1 3 2016 0.0 0 0.25 NA NA
# 2 Tuesday 1 3 2016 10.9 10 NA 4.15 67
# 3 Tuesday 1 3 2016 12.0 12 0.30 NA NA
# 4 Tuesday 1 3 2016 17.0 17 0.50 NA NA
# 5 Tuesday 1 3 2016 17.6 17 NA 1.35 33
# 6 Tuesday 1 3 2016 18.6 18 NA 1.80 44
# 7 Tuesday 1 3 2016 18.9 18 NA 0.70 17
# 8 Tuesday 1 3 2016 22.0 22 0.40 NA NA
# 9 Wednesday 2 3 2016 0.0 0 0.25 NA NA
# 10 Wednesday 2 3 2016 9.7 9 NA 2.65 39
# 11 Wednesday 2 3 2016 11.2 11 NA 0.30 13
# 12 Wednesday 2 3 2016 12.0 12 0.30 0.65 16
Instead of sort(), maybe a more appropriate function to use here is na.omit():
df %>%
  group_by(weekday, day, month, year, hour, period.h) %>%
  summarise_all(funs(na.omit(.)[1]))
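Note that in current dplyr releases funs() is deprecated and summarise_all() is superseded; an equivalent call using across() (a sketch, assuming dplyr >= 1.0.0) would be:
library(dplyr)

df %>%
  group_by(weekday, day, month, year, hour, period.h) %>%
  summarise(across(everything(), ~ na.omit(.x)[1]), .groups = "drop")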

Numerical Method for SARIMAX Model using R

My friend is currently working on an assignment about parameter estimation for a time series model, SARIMAX (seasonal ARIMA with exogenous variables), using the maximum likelihood estimation (MLE) method. His data are monthly rainfall from 2000-2012, with the Indian Ocean Dipole (IOD) index as the exogenous variable.
Here are the data:
MONTH YEAR RAINFALL IOD
1 1 2000 15.3720526 0.0624
2 2 2000 10.3440804 0.1784
3 3 2000 14.6116392 0.3135
4 4 2000 18.6842179 0.3495
5 5 2000 15.2937896 0.3374
6 6 2000 15.0233152 0.1946
7 7 2000 11.1803399 0.3948
8 8 2000 11.0589330 0.4391
9 9 2000 10.1488916 0.3020
10 10 2000 21.1187121 0.2373
11 11 2000 15.3980518 -0.0324
12 12 2000 18.9393770 -0.0148
13 1 2001 19.1075901 -0.2448
14 2 2001 14.9097284 0.1673
15 3 2001 19.2379833 0.1538
16 4 2001 19.6900990 0.3387
17 5 2001 8.0684571 0.3578
18 6 2001 14.0463518 0.3394
19 7 2001 5.9916609 0.1754
20 8 2001 8.4439327 0.0048
21 9 2001 11.8321596 0.1648
22 10 2001 24.3700636 -0.0653
23 11 2001 22.3584436 0.0291
24 12 2001 23.6114379 0.1731
25 1 2002 17.8409641 0.0404
26 2 2002 14.7377067 0.0914
27 3 2002 21.2226294 0.1766
28 4 2002 16.6403125 -0.1512
29 5 2002 10.8074049 -0.1072
30 6 2002 6.3796552 0.0244
31 7 2002 17.0704423 0.0542
32 8 2002 1.7606817 0.0898
33 9 2002 5.3665631 0.6736
34 10 2002 8.3246622 0.7780
35 11 2002 17.8044938 0.3616
36 12 2002 16.7062862 0.0673
37 1 2003 13.5572859 -0.0628
38 2 2003 17.1113997 0.2038
39 3 2003 14.9899967 0.1239
40 4 2003 14.0996454 0.0997
41 5 2003 11.4017542 0.0581
42 6 2003 6.7749539 0.3490
43 7 2003 7.1484264 0.4410
44 8 2003 10.3004854 0.4063
45 9 2003 10.6630202 0.3289
46 10 2003 20.6518764 0.1394
47 11 2003 20.8638443 0.1077
48 12 2003 20.5548048 0.4093
49 1 2004 16.0436903 0.2257
50 2 2004 17.2568827 0.2978
51 3 2004 20.2361063 0.2523
52 4 2004 11.6619038 0.1212
53 5 2004 12.8296532 -0.3395
54 6 2004 8.4202138 -0.1764
55 7 2004 15.5916644 0.0118
56 8 2004 0.9486833 0.1651
57 9 2004 7.2732386 0.2825
58 10 2004 18.0083314 0.3747
59 11 2004 14.4672043 0.1074
60 12 2004 17.3637554 0.0926
61 1 2005 18.9420168 0.0551
62 2 2005 17.0146995 -0.3716
63 3 2005 23.3002146 -0.2641
64 4 2005 17.8689675 0.2829
65 5 2005 17.2365890 0.1883
66 6 2005 14.0178458 0.0347
67 7 2005 12.6925175 -0.0680
68 8 2005 9.3861600 -0.0420
69 9 2005 11.7132404 -0.1425
70 10 2005 18.5768673 -0.0514
71 11 2005 19.6723156 -0.0008
72 12 2005 18.3248465 -0.0659
73 1 2006 18.6252517 0.0560
74 2 2006 18.7002674 -0.1151
75 3 2006 23.4882950 -0.0562
76 4 2006 19.5652754 0.1862
77 5 2006 13.6857590 0.0105
78 6 2006 11.1265448 0.1504
79 7 2006 11.0227038 0.3490
80 8 2006 7.6550637 0.5267
81 9 2006 1.8708287 0.8089
82 10 2006 5.4129474 0.9479
83 11 2006 15.2249795 0.7625
84 12 2006 14.1703917 0.3941
85 1 2007 22.8691932 0.4027
86 2 2007 14.3317829 0.3353
87 3 2007 13.0766968 0.2792
88 4 2007 23.2335964 0.2960
89 5 2007 12.2474487 0.4899
90 6 2007 11.3357840 0.2445
91 7 2007 9.3112835 0.3629
92 8 2007 1.6431677 0.5396
93 9 2007 6.8483575 0.6252
94 10 2007 13.1529464 0.4540
95 11 2007 14.5120639 0.2489
96 12 2007 18.7909553 0.0054
97 1 2008 17.6493626 0.3037
98 2 2008 13.3828248 0.1166
99 3 2008 19.0525589 0.2730
100 4 2008 17.3262806 0.0467
101 5 2008 5.2345009 0.4020
102 6 2008 3.3166248 0.4263
103 7 2008 10.1094016 0.5558
104 8 2008 11.7260394 0.4236
105 9 2008 10.7470926 0.4762
106 10 2008 15.1591557 0.4127
107 11 2008 25.5558213 0.1474
108 12 2008 18.2455474 0.1755
109 1 2009 14.5430396 0.2185
110 2 2009 12.8569048 0.3521
111 3 2009 24.0707291 0.2680
112 4 2009 16.0374562 0.3234
113 5 2009 7.2387844 0.4757
114 6 2009 13.8021737 0.3078
115 7 2009 7.5232972 0.1179
116 8 2009 6.3403470 0.1999
117 9 2009 4.6583259 0.2814
118 10 2009 13.0958008 0.3646
119 11 2009 15.3329710 0.1914
120 12 2009 19.0394328 0.3836
121 1 2010 15.5080624 0.4732
122 2 2010 17.1551742 0.2134
123 3 2010 23.9729014 0.6320
124 4 2010 18.2537667 0.5644
125 5 2010 18.2236111 0.1881
126 6 2010 14.6082169 0.0680
127 7 2010 13.6161669 0.3111
128 8 2010 11.1220502 0.2472
129 9 2010 20.7870152 0.1259
130 10 2010 19.5371441 -0.0529
131 11 2010 24.8837296 -0.2133
132 12 2010 15.5016128 0.0233
133 1 2011 17.3435867 0.3739
134 2 2011 17.6096564 0.4228
135 3 2011 19.0682983 0.5413
136 4 2011 20.4890214 0.3569
137 5 2011 12.0540450 0.1313
138 6 2011 12.5896783 0.2642
139 7 2011 5.0990195 0.5356
140 8 2011 6.5726707 0.6490
141 9 2011 2.5099801 0.5884
142 10 2011 17.6380271 0.7376
143 11 2011 17.5128524 0.6004
144 12 2011 17.2655727 0.0990
145 1 2012 16.6883193 0.2272
146 2 2012 20.8374663 0.1049
147 3 2012 16.7002994 0.1991
148 4 2012 18.7962762 -0.0596
149 5 2012 16.9292646 -0.1165
150 6 2012 11.6490343 0.2207
151 7 2012 6.2529993 0.8586
152 8 2012 5.8991525 0.9473
153 9 2012 7.8485667 0.8419
154 10 2012 12.5817328 0.4928
155 11 2012 24.7770055 0.1684
156 12 2012 23.2486559 0.4899
He works with R because it has a package for analysing SARIMAX models, and so far he has been doing well with the arimax() function of the TSA package, using seasonal ARIMA order (1,0,1).
Here is his syntax:
# Import the data
data <- read.csv("C:/DATA.csv", header = TRUE)
rainfall <- data$RAINFALL
exo <- data$IOD

# Turn the series into ts objects so R can read them as time series
library(forecast)
rainfall_ts <- ts(rainfall, start = c(2000, 1), end = c(2012, 12), frequency = 12)
exo_ts <- ts(exo, start = c(2000, 1), end = c(2012, 12), frequency = 12)

# Fit the SARIMAX model with seasonal ARIMA order (1,0,1), estimated by ML
library(TSA)
model_ts <- arimax(log(rainfall_ts), order = c(1, 0, 1),
                   seasonal = list(order = c(1, 0, 1), period = 12),
                   xreg = exo_ts, method = "ML")
Below is the result:
> model_ts
Call:
arimax(x = log(rainfall_ts), order = c(1, 0, 1), seasonal = list(order = c(1,
0, 1), period = 12), xreg = exo_ts, method = "ML")
Coefficients:
ar1 ma1 sar1 sma1 intercept xreg
0.5730 -0.4342 0.9996 -0.9764 2.6757 -0.4894
s.e. 0.2348 0.2545 0.0018 0.0508 0.1334 0.1489
sigma^2 estimated as 0.1521: log likelihood = -86.49, aic = 184.99
Although the syntax works, his lecturer expected more.
Theoretically, because he used MLE, he has shown that setting the first derivatives of the log-likelihood function to zero only gives implicit solutions. This means the estimation cannot be completed analytically with MLE, so it has to be finished with a numerical method.
This is what his lecturer expects: at the very least, to demonstrate that the estimation truly has to be done numerically and, if so, to show which numerical method R uses (Newton-Raphson, BFGS, BHHH, etc.).
The problem is that the arimax() function gives no choice of numerical method when the estimation has to be carried out numerically:
model_ts <- arimax(log(rainfall_ts), order = c(1, 0, 1), seasonal = list(order = c(1, 0, 1), period = 12), xreg = exo_ts, method = "ML")
The method argument above selects the estimation criterion, and the available options are ML, CSS and CSS-ML. Clearly the syntax says nothing about the numerical method, and that is the problem.
So is there any way to find out which numerical method R uses? Or does my friend have to write his own program instead of relying on arimax()?
If there are any errors in my code, please let me know. I also apologize for any grammatical or vocabulary mistakes. English is not my native language.
Some suggestions:
Estimate the model with each of the methods ML, CSS and CSS-ML. Do the parameter estimates agree? (A sketch of this comparison follows these suggestions.)
You can view the source code of the arimax() function by typing arimax, View(arimax) or getAnywhere(arimax) in the console.
Or you can debug: place a breakpoint (a debug "bullet" in RStudio) on the line model_ts <- arimax(...) and then source, or debugSource(), your script. You can then step into the arimax function and verify for yourself which optimization method it uses.
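Here is a sketch of the first suggestion, assuming rainfall_ts and exo_ts are defined as in the question: refit the same model under each estimation criterion and put the coefficients side by side. For reference, stats::arima, on which TSA::arimax is modelled, maximises the likelihood numerically with optim() using method = "BFGS", so an optim() call is what to look for in the printed arimax source.
library(TSA)

fits <- lapply(c("ML", "CSS", "CSS-ML"), function(m) {
  arimax(log(rainfall_ts), order = c(1, 0, 1),
         seasonal = list(order = c(1, 0, 1), period = 12),
         xreg = exo_ts, method = m)
})
names(fits) <- c("ML", "CSS", "CSS-ML")

# Near-identical estimates across criteria suggest the optimizer converges
# to the same maximum whichever objective it starts from
sapply(fits, coef)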

R - how to subset rows of data based on column values in a data frame

I would like to plot things like (where C means column):
C4 vs C2 for all rows sharing the same C1, and
C1 vs C4 for all rows sharing the same C2.
The data frame in question is:
C1 C2 C3 C4
1 2012-12-28 0 NA 10773
2 2012-12-28 5 NA 34112
3 2012-12-28 10 NA 30901
4 2012-12-28 0 NA 12421
5 2012-12-30 0 NA 3925
6 2012-12-30 5 NA 17436
7 2012-12-30 10 NA 13717
8 2012-12-30 15 NA 36708
9 2012-12-30 20 NA 28408
10 2012-12-30 NA NA 2880
11 2013-01-02 0 -13.89 9972
12 2013-01-02 5 -13.89 10576
13 2013-01-02 10 -13.89 33280
14 2013-01-02 15 -13.89 28667
15 2013-01-02 20 -13.89 21104
16 2013-01-02 25 -13.89 24771
17 2013-01-02 NA NA 22
18 2013-01-05 0 -3.80 20727
19 2013-01-05 5 -3.80 2033
20 2013-01-05 10 -3.80 16045
21 2013-01-05 15 -3.80 12074
22 2013-01-05 20 -3.80 10095
23 2013-01-05 NA NA 32693
24 2013-01-08 0 -1.70 19579
25 2013-01-08 5 -1.70 20200
26 2013-01-08 10 -1.70 12263
27 2013-01-08 15 -1.70 28797
28 2013-01-08 20 -1.70 23963
29 2013-01-11 0 -2.30 26525
30 2013-01-11 5 -2.30 21472
31 2013-01-11 10 -2.30 9633
32 2013-01-11 15 -2.30 27849
33 2013-01-11 20 -2.30 23950
34 2013-01-17 0 1.40 16271
35 2013-01-17 5 1.40 18581
36 2013-01-19 0 0.10 5910
37 2013-01-19 5 0.10 16890
38 2013-01-19 10 0.10 13078
39 2013-01-19 NA NA 55
40 2013-01-23 0 -9.20 15048
41 2013-01-23 6 -9.20 20792
42 2013-01-26 0 NA 21649
43 2013-01-26 6 NA 24655
44 2013-01-29 0 0.10 9100
45 2013-01-29 5 0.10 27514
46 2013-01-29 10 0.10 19392
47 2013-01-29 15 0.10 21720
48 2013-01-29 NA 0.10 112
49 2013-02-11 0 0.40 13619
50 2013-02-11 5 0.40 2748
51 2013-02-11 10 0.40 1290
52 2013-02-11 15 0.40 762
53 2013-02-11 20 0.40 1125
54 2013-02-11 25 0.40 1709
55 2013-02-11 30 0.40 29459
56 2013-02-11 35 0.40 106474
57 2013-02-13 0 1.30 3355
58 2013-02-13 5 1.30 970
59 2013-02-13 10 1.30 2240
60 2013-02-13 15 1.30 35871
61 2013-02-18 0 -0.60 8564
62 2013-02-20 0 -1.20 12399
63 2013-02-26 0 0.30 2985
64 2013-02-26 5 0.30 9891
65 2013-03-01 0 0.90 5221
66 2013-03-01 5 0.90 9736
67 2013-03-05 0 0.60 3192
68 2013-03-05 5 0.60 4243
69 2013-03-09 0 0.10 45138
70 2013-03-09 5 0.10 55534
71 2013-03-12 0 1.40 7278
72 2013-03-12 NA NA 45
73 2013-03-15 0 0.30 2447
74 2013-03-15 5 0.30 2690
75 2013-03-18 0 -2.30 3008
76 2013-03-22 0 -0.90 11411
77 2013-03-22 5 -0.90 NA
78 2013-03-22 10 -0.90 17675
79 2013-03-22 NA NA 47
80 2013-03-25 0 1.20 9802
81 2013-03-25 5 1.20 15790
There are other posts here about time series subsetting and about merging/matching/pasting subsets, but I think I am missing the point when I try to follow those instructions.
The end goal is a plot of C1 vs C4 for every C2 = 0, C2 = 5, and so on, and likewise a plot of C4 vs C2 for every identical C1. I know there are some duplicate C1 and C2 combinations, but the C4 values for those can be averaged. I can figure out the plots themselves; I just need to know how to subset the data this way. Perhaps creating a new data.frame() with these subsets would be easiest?
Thanks in advance,
It's relatively easy to plot subsets using ggplot2. First you need to reshape your data from "wide" to "long" format, creating a new categorical variable with possible values C4 and C5.
library(reshape2)
library(ggplot2)

# Starting with the data you posted in a data frame called "dat":
# Convert C2 to date format
dat$C2 = as.Date(dat$C2)

# Reshape data to long format
dat.m = melt(dat, id.var = c("C1", "C2", "C3"))

# Plot values of C4 and C5 vs. C2 with separate lines for each level of C3
ggplot(dat.m, aes(x = C2, y = value, group = C3, colour = as.factor(C3))) +
  geom_line() + geom_point() +
  facet_grid(variable ~ ., scales = "free_y")
The C4 lines are the same for every level of C3, so they all overlap each other.
You can also have a separate panel for each level of C3:
ggplot(dat.m, aes(x = C2, y = value, group = variable, colour = variable)) +
  geom_line() + geom_point() +
  facet_grid(variable ~ C3, scales = "free_y") +
  theme(axis.text.x = element_text(angle = -90)) +
  guides(colour = FALSE)
Here's a base graphics method for getting separate plots. I'm using your new column names below:
# Use lapply to create a separate plot for each level of C2
lapply(na.omit(unique(dat$C2)), function(x) {

  # The next line removes NA values so that there will be a line through
  # every point. You can drop this line if you don't care whether all
  # points are connected or not.
  dat = dat[complete.cases(dat[, c("C1", "C2", "C4")]), ]

  # Create a plot of C4 vs. C1 for the current value of C2
  plot(dat$C1[dat$C2 == x], dat$C4[dat$C2 == x],
       type = "o", pch = 16,
       xlab = paste0("C2=", x), ylab = "C4")
})
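If you also want to average the duplicate C4 values the question mentions before plotting, a base-R sketch (assuming the data frame is called dat and C1 holds dates as character strings) is:
# Average C4 over duplicate (C1, C2) combinations
dat.avg <- aggregate(C4 ~ C1 + C2, data = dat, FUN = mean)

# One data frame per level of C2, e.g. by.C2[["0"]], by.C2[["5"]], ...
by.C2 <- split(dat.avg, dat.avg$C2)

# Plot C4 vs. C1 for the subset with C2 == 0, ordered by date
sub0 <- by.C2[["0"]]
sub0 <- sub0[order(as.Date(sub0$C1)), ]
with(sub0, plot(as.Date(C1), C4, type = "o", pch = 16,
                xlab = "C1 (date)", ylab = "C4", main = "C2 = 0"))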
