How to define range of values of a time series? - r

First of all, sorry for any mistakes in my post; I'm new to this site.
I'm getting started with R and I'm trying to do some analysis with time series data.
So, I have a time series at hand and have already loaded it into R.
I can also plot this time series and add labels to the axes and so on. So far so good.
My problem: when I plot the time series, R sets the range of values on the y-axis to roughly [0, 170].
This is strange, since the time series contains the daily EUR/USD exchange rates for this year. That means the values are in a range of about 1.05 to 1.2.
The relative values are correct: if the plot shows a maximum around day 40, the corresponding value in the data set is indeed a maximum. But it is around 1.4, not 170.
I hope my problem is understandable. I would like to have the y-axis on a scale from 1 to 1.2, for example. The ylim = c(1, 1.2) argument will scale the axis to that range, but not the values; it just ignores them. Does anyone know how to fix that? I'd really appreciate it. Thank you very much in advance.
Thanks a lot for the input so far.
The "critical code" is the following:
> FRB <- read.csv("FRB_H10.csv", header=TRUE, sep=",")
> attach(FRB)
> str(FRB)
'data.frame': 212 obs. of 2 variables:
$ Date: Factor w/ 212 levels "2015-01-01","2015-01-02",..: 1 2 3 4 5 6 7 8 9 10 ...
$ Rate: Factor w/ 180 levels "1.0524","1.0575",..: 180 179 177 178 174 173 175 176 171 172 ...
> plot.ts(Rate)
The result of this last plot is the one shown above.
Changing the variable to numeric yields this:
> as.numeric(Rate)
[1] 180 179 177 178 174 173 175 176 171 172 170 166 180 167 169 160 123 128 150 140 132 128 138 165
[25] 161 163 136 134 134 129 159 158 180 156 140 155 151 142 131 148 104 100 96 104 65 53 27 24
[49] 13 3 8 1 2 7 10 9 21 42 36 50 39 33 23 15 19 29 51 54 26 23 11 6
[73] 4 12 5 16 20 18 17 14 22 30 34 49 92 89 98 83 92 141 125 110 81 109 151 149
[97] 162 143 85 69 77 61 180 30 32 38 52 37 78 127 120 73 105 126 131 106 122 119 107 112
[121] 157 137 152 96 93 99 87 94 86 70 71 180 67 43 66 58 84 57 55 47 35 25 26 41
[145] 31 48 48 75 63 59 38 60 46 44 28 40 45 52 62 101 82 74 68 60 64 102 144 168
[169] 159 154 108 91 98 118 111 72 76 180 95 90 117 139 131 116 130 133 145 103 79 88 115 97
[193] 106 113 89 102 121 102 119 114 124 148 180 153 164 161 147 135 146 141 80 56
So, it remains unchanged. This is very strange. The data excerpt shows that "Rate" takes on values between 1.1 and 1.5 approximately, so really not the values that are shown above. :/
The data set can be found under this link:
https://www.dropbox.com/s/ndxstdl1aae5glt/FRB_H10.csv?dl=0
It should be alright; I got it from the Federal Reserve System's database, so quite a decent source.
(I had to remove the link to the data excerpt because my reputation only allows 2 links per post. But the entire data set should be even better, I guess.)

@BlankUsername
Thanks very much for the link. I got it working now using this code:
FRB <- read.csv("FRB_H10.csv", header=TRUE, sep=",")
> attach(FRB)
> as.numeric(paste(Rate))
[1] NA 1.2015 1.1918 1.1936 1.1820 1.1811 1.1830 1.1832 1.1779 1.1806 1.1598 1.1517 NA
[14] 1.1559 1.1584 1.1414 1.1279 1.1290 1.1370 1.1342 1.1308 1.1290 1.1337 1.1462 1.1418 1.1432
[27] 1.1330 1.1316 1.1316 1.1300 1.1410 1.1408 NA 1.1395 1.1342 1.1392 1.1372 1.1346 1.1307
[40] 1.1363 1.1212 1.1197 1.1190 1.1212 1.1070 1.1006 1.0855 1.0846 1.0707 1.0576 1.0615 1.0524
[53] 1.0575 1.0605 1.0643 1.0621 1.0792 1.0928 1.0908 1.0986 1.0919 1.0891 1.0818 1.0741 1.0768
[66] 1.0874 1.0990 1.1008 1.0850 1.0818 1.0671 1.0598 1.0582 1.0672 1.0596 1.0742 1.0780 1.0763
[79] 1.0758 1.0729 1.0803 1.0876 1.0892 1.0979 1.1174 1.1162 1.1194 1.1145 1.1174 1.1345 1.1283
[92] 1.1241 1.1142 1.1240 1.1372 1.1368 1.1428 1.1354 1.1151 1.1079 1.1126 1.1033 NA 1.0876
[105] 1.0888 1.0914 1.0994 1.0913 1.1130 1.1285 1.1271 1.1108 1.1232 1.1284 1.1307 1.1236 1.1278
[118] 1.1266 1.1238 1.1244 1.1404 1.1335 1.1378 1.1190 1.1178 1.1196 1.1156 1.1180 1.1154 1.1084
[131] 1.1090 NA 1.1076 1.0952 1.1072 1.1025 1.1150 1.1020 1.1015 1.0965 1.0898 1.0848 1.0850
[144] 1.0927 1.0884 1.0976 1.0976 1.1112 1.1055 1.1026 1.0914 1.1028 1.0962 1.0953 1.0868 1.0922
[157] 1.0958 1.0994 1.1042 1.1198 1.1144 1.1110 1.1078 1.1028 1.1061 1.1200 1.1356 1.1580 1.1410
[170] 1.1390 1.1239 1.1172 1.1194 1.1263 1.1242 1.1104 1.1117 NA 1.1182 1.1165 1.1262 1.1338
[183] 1.1307 1.1260 1.1304 1.1312 1.1358 1.1204 1.1133 1.1160 1.1252 1.1192 1.1236 1.1246 1.1162
[196] 1.1200 1.1276 1.1200 1.1266 1.1249 1.1282 1.1363 NA 1.1382 1.1437 1.1418 1.1360 1.1320
[209] 1.1359 1.1345 1.1140 1.1016
Warning message:
NAs introduced by coercion
> Rate <- cbind(paste(Rate))
> plot(Rate)
Warning message:
In xy.coords(x, y, xlabel, ylabel, log) : NAs introduced by coercion
> plot.ts(Rate, ylab="EUR/USD")
Despite the warning message, I get the output shown below, plotted the way I intended.
Nevertheless, I do not really understand why it works the way it does: why I have to use the paste() command, and what it does exactly. I get the basic idea of what the classes do, but I am very new to this whole world of R.
One thing I have come to realize already is that R is a powerful program, and yet confusing if you are a beginner. :D
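A brief editorial aside on why this works: as.numeric() applied to a factor returns the internal integer level codes (exactly the 1..180 pattern printed earlier), while paste() or as.character() first converts the factor to its text labels, which can then be parsed as numbers. A minimal sketch; note that na.strings = "ND" is an assumption about how the FRB file marks missing days, not something confirmed by the post:
# A factor stores values as integer level codes plus a table of labels.
rate <- factor(c("1.0524", "1.1200", "1.0524"))
as.numeric(rate)               # level codes: 1 2 1 -- not the rates
as.numeric(as.character(rate)) # 1.0524 1.1200 1.0524
as.numeric(paste(rate))        # same effect; paste() coerces to character
# Cleaner still: keep the factor from being created when reading the file.
FRB <- read.csv("FRB_H10.csv", header = TRUE,
                stringsAsFactors = FALSE, na.strings = "ND")
plot.ts(FRB$Rate, ylab = "EUR/USD")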

Related

How to find out the sequence of value in R

Suppose that I have generated 100 values from -100 to 100, and I have taken the cumulative sum of all of those values:
set.seed(123)
x <- -100:100
z <- sample(x, size = 100, replace = TRUE)
cumsum(z)
and I got
[1] 58 136 49 143 212 161 178 120 33 50 102 91 81 177 167 251 242 278 276 247 172 78 147
[24] 183 246 223 203 145 147 163 138 180 111 119 25 61 129 102 24 78 165 117 151 103 157 222
[47] 155 123 94 69 31 71 67 57 109 46 -34 -94 -20 -31 -72 -157 -142 -149 -244 -145 -160 -175 -237
[70] -179 -162 -213 -280 -377 -465 -497 -471 -419 -468 -547 -559 -500 -576 -642 -575 -564 -635 -596 -538 -518 -509 -452
[93] -489 -448 -350 -384 -334 -313 -335 -351
Now, I would like to find the first value that is greater than 200 or lower than -200.
If I do it by hand, I can see that the 5th value (212) is the first one greater than 200.
However, in R, is there a command to find the first time that cumsum(z) is greater than 200 or lower than -200?
Thank you very much.
A quick hack way to do this might be (if_else() comes from dplyr, and the condition is applied to the cumulative sums the question asks about):
library(dplyr)
cs <- as.data.frame(cumsum(z))
names(cs) <- "cs"
cs$lv <- if_else(cs$cs > 200, TRUE, FALSE)
min(which(cs$lv == TRUE))
The min(which(...)) solutions provided by others don't give a convenient answer in case none of the values meet the condition. For example,
set.seed(123)
x <- -100:100
z <- sample(x, size = 100, replace = TRUE)
min(which(abs(cumsum(z)) > 200))
#> [1] 5
min(which(abs(cumsum(z)) > 1000)) # None meet this condition
#> Warning in min(which(abs(cumsum(z)) > 1000)): no non-missing arguments to min;
#> returning Inf
#> [1] Inf
A better way is given in the R help page for which.max:
match(TRUE, abs(cumsum(z)) > 200)
#> [1] 5
match(TRUE, abs(cumsum(z)) > 1000)
#> [1] NA
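As an editorial aside (not from the original answers), base R's Position() has the same convenient NA-on-no-match behaviour:
# Position() returns the index of the first element satisfying the
# predicate, and NA (the default nomatch) when none does.
Position(function(v) abs(v) > 200, cumsum(z))
#> [1] 5
Position(function(v) abs(v) > 1000, cumsum(z))
#> [1] NA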

How to process multi columns data in data.frame with plyr

I am trying to process DSC (differential scanning calorimetry) data with R, but I have run into some trouble. All of this used to be done tediously in Origin or QtiPlot in my lab, and I wonder if there is a way to do it in batch. But it has not gone well so far. For example, perhaps because I have used the wrong column names in my data.frame, the code
dat$0.5min
Error: unexpected numeric constant in "dat$0.5"
cannot reach my data.
So below is the full description of my purpose. Thank you in advance!
The DSC data looks like this (I store the CSV file in my Google Drive link):
T1 0.5min T2 1min
40.59 -0.2904 40.59 -0.2545
40.81 -0.281 40.81 -0.2455
41.04 -0.2747 41.04 -0.2389
41.29 -0.2728 41.29 -0.2361
41.54 -0.2553 41.54 -0.2239
41.8 -0.07 41.8 -0.0732
42.06 0.1687 42.06 0.1414
42.32 0.3194 42.32 0.2817
42.58 0.3814 42.58 0.3421
42.84 0.3863 42.84 0.3493
43.1 0.3665 43.11 0.3322
43.37 0.3438 43.37 0.3109
43.64 0.3265 43.64 0.2937
43.9 0.3151 43.9 0.2819
44.17 0.3072 44.17 0.2735
44.43 0.2995 44.43 0.2656
44.7 0.2899 44.7 0.2563
44.96 0.2779 44.96 0.245
In fact, I have merged the data into a data.frame and hope I can adjust it and do something further.
The command is:
dat<-read.csv("Book1.csv",header=F)
colnames(dat)<-c('T1','0.5min','T2','1min','T3','2min','T4','4min','T5','8min','T6','10min',
'T7','20min','T8','ascast1','T9','ascast2','T10','ascast3','T11','ascast4',
'T12','ascast5'
)
So dat is actually a data.frame with 1163 obs. of 24 variables.
T1, T2, T3, ..., T12 are the temperatures at which the samples were measured in the DSC; although they cover the same interval, they differ a little due to the instability of the machine.
The column following each of T1~T12 is the heat flow recorded by the machine for a given heat-treatment duration, and ascast1~ascast5 are as-cast samples (nothing done to them) used to check the accuracy of the machine.
Now I need to do something like the following:
First, T1~T12 are in degrees Celsius; I need to convert them to Kelvin, which means adding 273.15 to every value.
Second, two temperatures are chosen for comparing the results: Ts = 180.25 and Te = 240.45 (both in degrees Celsius; I checked in QtiPlot to make sure). To be clear, I list the two temperatures and the first 6 columns of data.
T1 0.5min T2 1min T3 2min T4 4min
180.25 -0.01710000 180.25 -0.01780000 180.25 -0.02120000 180.25 -0.02020000
. . . .
. . . .
240.45 0.05700000 240.45 0.04500000 240.45 0.05780000 240.45 0.05580000
All heat flows at Ts should be made equal, and for convenience can be set to 0. So, for each duration column (0.5min, 1min, 2min, 4min, 8min, 10min, 20min, and ascast1~ascast5), every heat-flow value should have that column's heat-flow value at Ts subtracted from it.
The heat flows at Te should then be adjusted so that they all agree as well. The idea is: (1) calculate the mean of the 12 heat-flow values at Te; call it Hmean, the value all heat flows should take at Te. (2) For the column 0.5min, denoted col("0.5min"), apply the linear transform
col("0.5min") - [([0.05700000 - (-0.01710000)] - Hmean) / (Te - Ts)] * (col(T1) - Ts)
(The term [0.05700000 - (-0.01710000)] is the value at Te after the subtraction step above; I write it out for your reference.) The same formula is used for each of the 12 (temperature, heat-flow) column pairs: (T1, 0.5min), (T2, 1min), (T3, 2min), and so on.
Then the 12 pairs can be plotted on the same plot over the interval 180~240 (also in degrees Celsius) to magnify the differences between the DSC scans.
I have been stuck on this problem for 2 days, so I am turning to Stack Overflow for help.
Thanks!
I am assuming that your question is right at the beginning, where you got the following error,
dat$0.5min
Error: unexpected numeric constant in "dat$0.5"
as I could not find a question in the rest of the steps; they read like a step-by-step description of an experiment.
To fix that error: the problem is that the column name starts with a number, so to reference such a column by name you have to wrap it in backticks (`), the accent-mark symbol.
> dataF <- data.frame("0.5min" = 1:10, "T2" = 11:20, check.names = FALSE)
> dataF$`0.5min`
[1] 1 2 3 4 5 6 7 8 9 10
Based on the additional information in the comments:
You can add a constant to alternate columns in the following manner,
dataF <- data.frame(matrix(1:100,10,10))
const <- 237
> print(dataF)
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
1 1 11 21 31 41 51 61 71 81 91
2 2 12 22 32 42 52 62 72 82 92
3 3 13 23 33 43 53 63 73 83 93
4 4 14 24 34 44 54 64 74 84 94
5 5 15 25 35 45 55 65 75 85 95
6 6 16 26 36 46 56 66 76 86 96
7 7 17 27 37 47 57 67 77 87 97
8 8 18 28 38 48 58 68 78 88 98
9 9 19 29 39 49 59 69 79 89 99
10 10 20 30 40 50 60 70 80 90 100
dataF[,seq(1,ncol(dataF),by = 2)] <- dataF[,seq(1,ncol(dataF),by = 2)] + const
> print(dataF)
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
1 238 11 258 31 278 51 298 71 318 91
2 239 12 259 32 279 52 299 72 319 92
3 240 13 260 33 280 53 300 73 320 93
4 241 14 261 34 281 54 301 74 321 94
5 242 15 262 35 282 55 302 75 322 95
6 243 16 263 36 283 56 303 76 323 96
7 244 17 264 37 284 57 304 77 324 97
8 245 18 265 38 285 58 305 78 325 98
9 246 19 266 39 286 59 306 79 326 99
10 247 20 267 40 287 60 307 80 327 100
To generalize: the columns of a data frame can be referenced with a vector of numbers or of column names, and most operations in R are vectorized, so you can use whichever fits the pattern you are looking for.
For example, if I change the names of my first two columns and want to access just those, I do this:
colnames(dataF)[c(1,2)] <- c("Y1","Y2")
#Reference all column names with "Y" in it. You can do any operation you want on this.
dataF[,grep("Y",colnames(dataF))]
Y1 Y2
1 238 11
2 239 12
3 240 13
4 241 14
5 242 15
6 243 16
7 244 17
8 245 18
9 246 19
10 247 20
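Tying this back to the normalization the question actually describes: below is a hedged editorial sketch (not part of the original answer), assuming dat is laid out as (T1, 0.5min, T2, 1min, ...) with temperatures in the odd columns and heat flows in the even ones:
Ts <- 180.25; Te <- 240.45
temp.cols <- seq(1, ncol(dat), by = 2)  # T1, T2, ..., T12
flow.cols <- temp.cols + 1              # the paired heat-flow columns

# Step 1: shift each heat-flow column so that its value at Ts becomes 0.
# which.min(abs(...)) picks the row whose temperature is closest to Ts,
# since the measured temperatures only approximate Ts.
for (k in seq_along(flow.cols)) {
  i.Ts <- which.min(abs(dat[[temp.cols[k]]] - Ts))
  dat[[flow.cols[k]]] <- dat[[flow.cols[k]]] - dat[[flow.cols[k]]][i.Ts]
}

# Step 2: tilt each column linearly so that all columns equal Hmean at Te
# while staying 0 at Ts (the formula from the question).
H.Te <- sapply(seq_along(flow.cols), function(k) {
  i.Te <- which.min(abs(dat[[temp.cols[k]]] - Te))
  dat[[flow.cols[k]]][i.Te]
})
Hmean <- mean(H.Te)
for (k in seq_along(flow.cols)) {
  dat[[flow.cols[k]]] <- dat[[flow.cols[k]]] -
    (H.Te[k] - Hmean) / (Te - Ts) * (dat[[temp.cols[k]]] - Ts)
}

# Celsius to Kelvin for the temperature columns, if needed downstream:
dat[temp.cols] <- dat[temp.cols] + 273.15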

R calculate the average of one column corresponding to each bin of another column [duplicate]

This question already has an answer here:
R aggregate data in one column based on 2 other columns
(1 answer)
I have a data set with two columns. As you can see in the graph, the data has too much noise. So, I want to discretize column "r" into bins of width 5, assign each row to its corresponding bin, and then calculate the average of f for each bin.
> dr
r f
1 65.06919 21.796
2 62.36986 22.836
3 59.81639 22.980
4 57.42822 22.061
5 55.22681 21.012
6 53.23533 21.274
7 51.47815 21.594
8 49.98000 22.117
9 48.76474 20.366
10 47.85394 18.991
11 47.26521 20.920
12 47.01064 20.161
13 47.09565 22.328
14 47.51842 19.610
15 48.27007 18.615
16 49.33559 21.753
17 50.69517 22.754
18 52.32590 22.096
19 54.20332 22.020
20 56.30275 22.111
21 58.60034 21.395
22 61.07373 22.635
23 63.70243 22.128
24 66.46804 21.698
25 62.24147 21.879
26 59.41380 21.637
27 56.72742 21.991
28 54.20332 21.535
29 51.86521 21.093
30 49.73932 20.496
31 47.85394 21.737
32 46.23851 21.890
33 44.92215 21.236
34 43.93177 19.997
35 43.28972 19.661
36 43.01163 20.692
37 43.10452 19.663
38 43.56604 19.273
39 44.38468 20.743
40 45.54119 22.604
41 47.01064 22.167
42 48.76474 20.427
43 50.77401 21.543
44 53.00943 21.391
45 55.44367 21.313
46 58.05170 22.501
47 60.81118 22.414
48 63.70243 22.920
49 59.54830 21.571
50 56.58622 22.454
51 53.75872 22.643
52 51.08816 20.219
53 48.60041 20.300
54 46.32494 19.832
55 44.29447 20.284
56 42.54409 21.284
57 41.10961 21.350
58 40.02499 20.784
59 39.31921 20.383
60 39.01282 20.508
61 39.11521 19.413
62 39.62323 20.043
63 40.52160 18.583
64 41.78516 19.512
65 43.38202 20.849
66 45.27693 21.349
67 47.43416 20.734
68 49.81967 22.055
69 52.40229 22.108
70 55.15433 23.184
71 58.05170 23.147
72 61.07373 23.207
73 57.00877 21.467
74 53.90733 21.549
75 50.93133 23.035
76 48.10405 20.684
77 45.45327 20.189
78 43.01163 19.304
79 40.81666 19.739
80 38.91015 20.976
81 37.33631 21.305
82 36.13862 21.319
83 35.35534 20.133
84 35.01428 20.179
85 35.12834 20.634
86 35.69314 22.478
87 36.68787 21.608
88 38.07887 20.964
89 39.82462 18.409
90 41.88078 20.627
91 44.20407 20.980
92 46.75468 22.206
93 49.49747 21.828
94 52.40229 20.844
95 55.44367 21.619
96 58.60034 21.498
97 54.64430 19.433
98 51.40039 21.293
99 48.27007 20.687
100 45.27693 21.377
101 42.44997 21.282
102 39.82462 20.910
103 37.44329 18.810
104 35.35534 21.223
105 33.61547 20.197
106 32.28002 20.765
107 31.40064 19.781
108 31.01612 20.536
109 31.14482 21.245
110 31.78050 21.117
111 32.89377 20.303
112 34.43835 20.795
113 36.35932 20.754
114 38.60052 21.025
115 41.10961 20.924
116 43.84062 21.475
117 46.75468 21.435
118 49.81967 20.380
119 53.00943 21.590
120 56.30275 20.743
121 52.47857 20.600
122 49.09175 20.818
123 45.80393 21.514
124 42.63801 21.922
125 39.62323 21.469
126 36.79674 22.186
127 34.20526 19.625
128 31.90611 19.703
129 29.96665 18.793
130 28.46050 18.912
131 27.45906 19.239
132 27.01851 18.467
133 27.16616 18.974
134 27.89265 20.090
135 29.15476 19.155
136 30.88689 20.526
137 33.01515 20.273
138 35.46830 19.956
139 38.18377 21.547
140 41.10961 21.260
141 44.20407 20.802
142 47.43416 19.719
143 50.77401 21.645
144 54.20332 18.957
145 50.53712 21.410
146 47.01064 20.536
147 43.56604 20.963
148 40.22437 20.775
149 37.01351 22.257
150 33.97058 21.868
151 31.14482 18.907
152 28.60070 19.644
153 26.41969 17.694
154 24.69818 17.883
155 23.53720 17.975
156 23.02173 18.778
157 23.19483 18.896
158 24.04163 19.561
159 25.49510 20.137
160 27.45906 19.922
161 29.83287 19.574
162 32.52691 19.029
163 35.46830 20.356
164 38.60052 20.330
165 41.88078 20.005
166 45.27693 20.006
167 48.76474 21.056
168 52.32590 20.143
169 48.84670 22.094
170 45.18849 21.252
171 41.59327 22.023
172 38.07887 21.563
173 34.66987 21.408
174 31.40064 21.334
175 28.31960 19.855
176 25.49510 18.648
177 23.02173 17.397
178 21.02380 17.311
179 19.64688 16.714
180 19.02630 18.152
181 19.23538 18.187
182 20.24846 19.910
183 21.95450 20.451
184 24.20744 19.820
185 26.87006 19.862
186 29.83287 19.987
187 33.01515 19.363
188 36.35932 19.498
189 39.82462 19.121
190 43.38202 20.479
191 47.01064 20.311
192 50.69517 21.666
193 47.43416 21.995
194 43.65776 23.158
195 39.92493 24.632
196 36.24914 23.273
197 32.64966 22.535
198 29.15476 19.933
199 25.80698 18.277
200 22.67157 16.169
So, to walk through the procedure: row 1 would be assigned to bin [65, 70], row 2 to bin [60, 65], and so on.
Then, for the final result, I want the midpoint of each bin and the average of its f values, so that I can draw a line for the mean of f as a function of r.
As @Fernando already mentioned in his comment, you could try cut (binning) and tapply:
tapply(dr$f, cut(dr$r, seq(15, 70, by=5)), mean)
# (15,20] (20,25] (25,30] (30,35] (35,40] (40,45] (45,50] (50,55] (55,60] (60,65] (65,70]
#17.68433 18.55918 19.28683 20.49000 20.87942 20.65430 20.96155 21.35146 21.92259 22.57414 21.74700
Alternatively, you can use the wonderful plyr package:
library(plyr)
ddply(dr, .(cut(dr$r, 5)), colwise(mean))
However, if you have to ask a question like the one above, you are just fine with the tapply solution.
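To also get the bin midpoints the question asks for, a small editorial sketch (assuming the data frame is named dr, as in the question):
breaks <- seq(15, 70, by = 5)
means  <- tapply(dr$f, cut(dr$r, breaks), mean)
mids   <- breaks[-length(breaks)] + diff(breaks) / 2  # 17.5, 22.5, ..., 67.5
plot(mids, means, type = "l", xlab = "r (bin midpoint)", ylab = "mean f")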

plotting multiple variables in ggplot

I have a data table which looks like this:
pos gtt1 gtt2 ftp1 ftp2
8 100 123 49 101
9 85 93 99 110
10 111 102 53 113
11 88 110 59 125
12 120 118 61 133
13 90 136 64 145
14 130 140 104 158
15 78 147 74 167
16 123 161 81 173
17 160 173 88 180
18 117 180 94 191
19 89 188 104 199
20 175 197 107 213
I want to make a line graph with pos (position) on the x-axis using ggplot. I am trying to show the gtt1 and gtt2 lines in one colour and ftp1 and ftp2 in another, because they are separate groups of samples (gtt and ftp). I have successfully created the graph, but all four lines come out in different colours. I would also like to keep only gtt and ftp in the legend (not all four). Bonus: how can I make these lines a little smoother?
Here is what I have done so far:
library(reshape2); library(ggplot2)
data <- read.table("myfile.txt", header = TRUE, sep = "\t")
data.melt <- melt(data, id = "pos")
ggplot(data.melt, aes(x = pos, y = value, colour = variable)) + geom_line()
Thanks in advance
The easiest way is to re-shape your data in a slightly different way:
dd1 = melt(dd[,1:3], id=c("pos"))
dd1$type = "gtt"
dd2 = melt(dd[,c(1, 4:5)], id=c("pos"))
dd2$type = "ftp"
dd.melt = rbind(dd1, dd2)
Now we have a column specifying the variable "type":
R> head(dd.melt, 2)
pos variable value type
1 8 gtt1 100 gtt
2 9 gtt1 85 gtt
Once the data is in this format, the ggplot command is straightforward (assigning the plot to g so we can add to it below):
g <- ggplot(dd.melt, aes(x = pos, y = value)) +
  geom_line(aes(colour = type, group = variable)) +
  scale_colour_manual(values = c(gtt = "blue", ftp = "red"))
g
You can add smoothed lines using stat_smooth:
##span controls the smoothing
g + stat_smooth(se=FALSE, span=0.5)
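An alternative to the two-melt reshaping, as an editorial aside: melt once and derive the group by stripping the trailing digit from the variable name, then use the same ggplot call as above.
dd.melt <- melt(dd, id = "pos")
dd.melt$type <- sub("[0-9]+$", "", dd.melt$variable)  # "gtt1" -> "gtt", "ftp2" -> "ftp"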

Iteratively compute drift coefficient from random walk with drift function in R, compile into list

My objective is to collect the drift coefficients from a random walk with drift forecast function applied to a set of historical data (below). Specifically, I want to fit the random walk with drift model to the first year of data, then cumulatively through to the last, recording the coefficient each time, i.e., once per additional year (into a list, if that is appropriate). To be clear, each new random walk forecast includes all the previous years.
The data is a series of 241 consumption levels, and I am trying to see how the drift coefficient changes as I go from n = 1 to n = 241.
The model is a random walk with drift, Y[t] = c + Y[t-1] + Z[t], where Z[t] is a normal error and c is the coefficient I am looking for. My current attempts involve a for loop and extracting the c coefficient from the rwf() function in the forecast package for R.
To extract it, I do
rwf(x, h = 1, drift = TRUE)$model[[1]]
which returns the drift coefficient.
The problem is that my attempts at subsetting the data within the rwf call have failed, and from trial and error and research I do not believe rwf() supports a subset argument, as an lm model does for example. Consequently my attempts at looping the function have also failed.
An example of such code is
for (i in 1:5){print((rwf(x[1:i], h = 1, drift = TRUE))$model[[1]])}
which gives me the following error
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
0 (non-NA) cases
In addition: Warning message:
In is.na(rows) : is.na() applied to non-(list or vector) of type 'NULL'
Any help would be much appreciated.
I read SO a lot for help but this is my first time asking a question.
The data is as follows
PCE
1 1306.7
2 1309.6
3 1335.3
4 1341.8
5 1389.2
6 1405.7
7 1414.2
8 1411.0
9 1401.6
10 1406.7
11 1425.0
12 1444.4
13 1474.7
14 1507.8
15 1536.6
16 1555.6
17 1575.2
18 1577.8
19 1583.0
20 1586.6
21 1608.4
22 1619.5
23 1622.4
24 1635.3
25 1636.1
26 1613.9
27 1627.1
28 1653.8
29 1675.6
30 1706.7
31 1732.9
32 1751.0
33 1752.9
34 1769.7
35 1792.1
36 1785.0
37 1787.4
38 1786.9
39 1813.4
40 1822.2
41 1858.7
42 1878.5
43 1901.6
44 1917.0
45 1944.2
46 1957.3
47 1976.0
48 2002.9
49 2019.6
50 2059.5
51 2095.8
52 2134.3
53 2140.2
54 2187.8
55 2212.0
56 2250.0
57 2313.2
58 2347.4
59 2353.5
60 2380.4
61 2390.3
62 2404.2
63 2437.0
64 2449.5
65 2464.6
66 2523.4
67 2562.1
68 2610.3
69 2622.3
70 2651.7
71 2668.6
72 2681.5
73 2702.9
74 2719.5
75 2731.9
76 2755.9
77 2748.4
78 2800.9
79 2826.6
80 2849.1
81 2896.5
82 2935.2
83 2991.2
84 3037.4
85 3108.6
86 3165.5
87 3163.9
88 3175.3
89 3166.0
90 3138.3
91 3149.2
92 3162.2
93 3115.8
94 3142.0
95 3194.4
96 3239.9
97 3274.2
98 3339.6
99 3370.3
100 3405.9
101 3450.3
102 3489.7
103 3509.0
104 3542.5
105 3595.9
106 3616.9
107 3694.2
108 3709.7
109 3739.6
110 3758.5
111 3756.3
112 3793.2
113 3803.3
114 3796.7
115 3710.5
116 3750.3
117 3800.3
118 3821.1
119 3821.1
120 3836.6
121 3807.6
122 3832.2
123 3845.9
124 3875.4
125 3946.1
126 3984.8
127 4063.9
128 4135.7
129 4201.3
130 4237.3
131 4297.9
132 4331.1
133 4388.1
134 4462.5
135 4503.2
136 4588.7
137 4598.8
138 4637.2
139 4686.6
140 4768.5
141 4797.2
142 4789.9
143 4854.0
144 4908.2
145 4920.0
146 5002.2
147 5038.5
148 5078.3
149 5138.1
150 5156.9
151 5180.0
152 5233.7
153 5259.3
154 5300.9
155 5318.4
156 5338.6
157 5297.0
158 5282.0
159 5322.2
160 5342.6
161 5340.2
162 5432.0
163 5464.2
164 5524.6
165 5592.0
166 5614.7
167 5668.6
168 5730.1
169 5781.1
170 5845.5
171 5888.8
172 5936.0
173 5994.6
174 6001.6
175 6050.8
176 6104.9
177 6147.8
178 6204.0
179 6274.2
180 6311.8
181 6363.2
182 6427.3
183 6453.3
184 6563.0
185 6638.1
186 6704.1
187 6819.5
188 6909.9
189 7015.9
190 7085.1
191 7196.6
192 7283.1
193 7385.8
194 7497.8
195 7568.3
196 7642.4
197 7710.0
198 7740.8
199 7770.0
200 7804.2
201 7926.4
202 7953.7
203 7994.1
204 8048.3
205 8076.9
206 8117.7
207 8198.1
208 8308.5
209 8353.7
210 8427.6
211 8465.1
212 8539.1
213 8631.3
214 8700.1
215 8786.2
216 8852.9
217 8874.9
218 8965.8
219 9019.8
220 9073.9
221 9158.3
222 9209.2
223 9244.5
224 9285.2
225 9312.6
226 9289.1
227 9285.8
228 9196.0
229 9076.0
230 9040.9
231 8998.5
232 9050.3
233 9060.2
234 9121.2
235 9186.9
236 9247.1
237 9328.4
238 9376.7
239 9392.7
240 9433.5
241 9482.1
You need at least two points to fit the model. Here's how I'd approach the problem after reading your data into a data.frame named x:
library(forecast)
drifts <- sapply(2:nrow(x), function(zz) rwf(x[1:zz,], drift = TRUE)$model$drift)
I'm not sure if this is what you were expecting or not, but here's a plot of your drift values:
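As a cross-check, an editorial aside: for a random walk with drift, the estimated drift is just the mean one-step change, so essentially the same sequence of coefficients can be computed in base R without refitting a model each time (PCE is the column name from the question's data):
# mean(diff(y[1:n])) is the average one-step change over the first n points,
# which is what the drift term of a random walk with drift estimates.
y <- x$PCE
drifts.base <- sapply(2:length(y), function(n) mean(diff(y[1:n])))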
