I am in the process of implementing a few algorithms for cluster analysis, especially cluster validation. There are a few approaches, such as cross-validation, external indices, internal indices, and relative indices. I am trying to implement an algorithm that falls under the internal-index category.
Internal index - based on the intrinsic content of the data; it measures the goodness of a clustering structure without reference to external information.
My interest is the Silhouette Coefficient:

s(i) = (b(i) - a(i)) / max{a(i), b(i)}

where a(i) is the mean distance from point i to the other points of its own cluster and b(i) is the mean distance from i to the points of the nearest other cluster.
To make it more clear, let's assume I have the following multimodal distribution:
library(mixtools)
wait = faithful$waiting                 # Old Faithful waiting times (bimodal)
mixmdl = normalmixEM(wait)              # fit a two-component normal mixture
plot(mixmdl, which=2)                   # plot the fitted mixture density
lines(density(wait), lty=2, lwd=2)      # overlay a kernel density estimate
We see that there are two clusters and the cutoff is around 68. There is no labeled data here, so there is no ground truth for cross-validation (it is unsupervised). So we need a mechanism to evaluate the clusters. In this case we know from the visualization that there are two clusters, but how do we clearly show that the two distributions actually belong to separate clusters? Based on what I read on Wikipedia, the Silhouette gives us that validation.
I want to implement a method (which implements the Silhouette) that takes an R list of values (in my example, wait), the number of clusters (in this case 2), and the model, and returns the average s(i).
I have started but can't really figure out how to go forward:
Silhouette = function(rList, num_clusters, model) {
}
The summary of my list looks like this:
Length Class Mode
clust_A 416014 -none- numeric
clust_B 72737 -none- numeric
clust_C 6078 -none- numeric
myList$clust_A will return the points that belong to that cluster:
[1] 13 880 497 1864 392 55 1130 248 437 37 62 153 60 117
[15] 22 106 71 1026 446 1558 23 56 287 402 46 1506 115 2700
[29] 67 134 48 536 41 506 1098 33 30 280 225 16 25 17
[43] 63 1762 477 174 98 76 157 698 47 312 40 3 198 621
[57] 15 34 226 657 48 110 23 250 14 32 137 272 26 257
[71] 270 133 1734 78 134 8 5 225 187 166 35 15 94 2825
[85] 2 8 94 89 54 91 77 17 106 1397 16 25 16 103
The problem is that I don't think the existing libraries accept this type of data structure.
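One way around that is to flatten the list into a single numeric vector plus an integer label vector, which is the input the cluster package's silhouette() expects. A minimal sketch (rList mirrors the myList structure above; the subsampling threshold is my own assumption, since dist() over several hundred thousand points will not fit in memory):

library(cluster)

Silhouette = function(rList, num_clusters = length(rList), model = NULL) {
  x   <- unlist(rList, use.names = FALSE)        # all points, flattened
  lab <- rep(seq_along(rList), lengths(rList))   # integer cluster labels
  if (length(x) > 5000) {                        # keep dist() tractable
    keep <- sample(length(x), 5000)
    x <- x[keep]; lab <- lab[keep]
  }
  sil <- silhouette(lab, dist(x))                # per-point s(i)
  mean(sil[, "sil_width"])                       # average silhouette width
}

num_clusters and model are kept for the signature described above, but this particular computation only needs the list itself.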
Silhouette assumes that all clusters have the same variance.
IMHO, it does not make sense to use this measure with EM clustering.
The contingency table of my data shows that there is one element with the value 21974. However, the which function cannot locate it, and I am wondering whether my code has an error.
I have the following code:
table(as.numeric(dat[1,2:ncol(dat)]))
# And the result is:
#(Upper: Groups / Bottom: Frequency for each group)
53 58 59 60 65 67 71 72 74 75 78 79 80 81 82 84 88 89 94 21974
143 142 70 226 63 95 89 181 147 344 131 896 480 205 84 159 351 475 364 1
There is one element in the group "21974".
However, if I use the which function to figure out where it is, my code cannot locate it:
which(dat[1,] == "21974", arr.ind=T)
Its result is:
row col
I am not sure how this happens and would like to know if I misused the which function.
I think you can use match: match(21974, df) will give you a position in a vector such as df$colname. I think (I've not used it this way) this would also work to find the position in the row:
df[match(21974,df$colname),]
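A small hypothetical illustration (df and colname are placeholder names, not from the question's data):

df <- data.frame(colname = c(53, 58, 21974, 59))
match(21974, df$colname)          # 3: the position within that column
df[match(21974, df$colname), ]    # the matching row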
First of all, sorry for any mistakes regarding my post; I'm new to this site.
I'm getting started with R now and I'm trying to do some analysis with time series data.
So, I have a time series at hand and have already loaded it into R.
I can also plot this time series and add labels to the axes and so on. So far so good.
My problem: when I plot the time series, R sets the range of values on the y-axis to approximately the interval [0, 170].
This is somehow strange, since the time series contains the daily EUR/USD exchange rates for this year. That means the values are in a range of about 1.05 to 1.2.
The relative values are correct: if the plot shows a maximum around day 40, the corresponding value in the data set is indeed a maximum. But it is around 1.4 and not 170.
I hope one can understand my problem.
I would like to have the y-axis on a scale from 1 to 1.2 for example.
The ylim=c(1, 1.2) command will scale the axis to that range but not the values.
It just ignores them.
Does anyone know how to adjust that?
I'd really appreciate it.
Thank you very much in advance.
Thanks a lot for the input so far.
The "critical code" is the following:
> FRB <- read.csv("FRB_H10.csv", header=TRUE, sep=",")
> attach(FRB)
> str(FRB)
'data.frame': 212 obs. of 2 variables:
$ Date: Factor w/ 212 levels "2015-01-01","2015-01-02",..: 1 2 3 4 5 6 7 8 9 10 ...
$ Rate: Factor w/ 180 levels "1.0524","1.0575",..: 180 179 177 178 174 173 175 176 171 172 ...
> plot.ts(Rate)
The result of this last plot is the one shown above.
Changing the variable to numeric yields this:
> as.numeric(Rate)
[1] 180 179 177 178 174 173 175 176 171 172 170 166 180 167 169 160 123 128 150 140 132 128 138 165
[25] 161 163 136 134 134 129 159 158 180 156 140 155 151 142 131 148 104 100 96 104 65 53 27 24
[49] 13 3 8 1 2 7 10 9 21 42 36 50 39 33 23 15 19 29 51 54 26 23 11 6
[73] 4 12 5 16 20 18 17 14 22 30 34 49 92 89 98 83 92 141 125 110 81 109 151 149
[97] 162 143 85 69 77 61 180 30 32 38 52 37 78 127 120 73 105 126 131 106 122 119 107 112
[121] 157 137 152 96 93 99 87 94 86 70 71 180 67 43 66 58 84 57 55 47 35 25 26 41
[145] 31 48 48 75 63 59 38 60 46 44 28 40 45 52 62 101 82 74 68 60 64 102 144 168
[169] 159 154 108 91 98 118 111 72 76 180 95 90 117 139 131 116 130 133 145 103 79 88 115 97
[193] 106 113 89 102 121 102 119 114 124 148 180 153 164 161 147 135 146 141 80 56
So, it remains unchanged. This is very strange. The data excerpt shows that "Rate" takes on values between 1.1 and 1.5 approximately, so really not the values that are shown above. :/
The data set can be found under this link:
https://www.dropbox.com/s/ndxstdl1aae5glt/FRB_H10.csv?dl=0
It should be alright; I got it from the database of the Federal Reserve System, so quite a decent source.
(I had to remove the link to the data excerpt because my reputation only allows for 2 links to be posted at a time. But the entire data set should be even better, I guess.)
@BlankUsername
Thanks very much for the link. I got it working now using this code:
FRB <- read.csv("FRB_H10.csv", header=TRUE, sep=",")
> attach(FRB)
> as.numeric(paste(Rate))
[1] NA 1.2015 1.1918 1.1936 1.1820 1.1811 1.1830 1.1832 1.1779 1.1806 1.1598 1.1517 NA
[14] 1.1559 1.1584 1.1414 1.1279 1.1290 1.1370 1.1342 1.1308 1.1290 1.1337 1.1462 1.1418 1.1432
[27] 1.1330 1.1316 1.1316 1.1300 1.1410 1.1408 NA 1.1395 1.1342 1.1392 1.1372 1.1346 1.1307
[40] 1.1363 1.1212 1.1197 1.1190 1.1212 1.1070 1.1006 1.0855 1.0846 1.0707 1.0576 1.0615 1.0524
[53] 1.0575 1.0605 1.0643 1.0621 1.0792 1.0928 1.0908 1.0986 1.0919 1.0891 1.0818 1.0741 1.0768
[66] 1.0874 1.0990 1.1008 1.0850 1.0818 1.0671 1.0598 1.0582 1.0672 1.0596 1.0742 1.0780 1.0763
[79] 1.0758 1.0729 1.0803 1.0876 1.0892 1.0979 1.1174 1.1162 1.1194 1.1145 1.1174 1.1345 1.1283
[92] 1.1241 1.1142 1.1240 1.1372 1.1368 1.1428 1.1354 1.1151 1.1079 1.1126 1.1033 NA 1.0876
[105] 1.0888 1.0914 1.0994 1.0913 1.1130 1.1285 1.1271 1.1108 1.1232 1.1284 1.1307 1.1236 1.1278
[118] 1.1266 1.1238 1.1244 1.1404 1.1335 1.1378 1.1190 1.1178 1.1196 1.1156 1.1180 1.1154 1.1084
[131] 1.1090 NA 1.1076 1.0952 1.1072 1.1025 1.1150 1.1020 1.1015 1.0965 1.0898 1.0848 1.0850
[144] 1.0927 1.0884 1.0976 1.0976 1.1112 1.1055 1.1026 1.0914 1.1028 1.0962 1.0953 1.0868 1.0922
[157] 1.0958 1.0994 1.1042 1.1198 1.1144 1.1110 1.1078 1.1028 1.1061 1.1200 1.1356 1.1580 1.1410
[170] 1.1390 1.1239 1.1172 1.1194 1.1263 1.1242 1.1104 1.1117 NA 1.1182 1.1165 1.1262 1.1338
[183] 1.1307 1.1260 1.1304 1.1312 1.1358 1.1204 1.1133 1.1160 1.1252 1.1192 1.1236 1.1246 1.1162
[196] 1.1200 1.1276 1.1200 1.1266 1.1249 1.1282 1.1363 NA 1.1382 1.1437 1.1418 1.1360 1.1320
[209] 1.1359 1.1345 1.1140 1.1016
Warning message:
NAs introduced by coercion
> Rate <- cbind(paste(Rate))
> plot(Rate)
Warning message:
In xy.coords(x, y, xlabel, ylabel, log) : NAs introduced by coercion
> plot.ts(Rate, ylab="EUR/USD")
Despite the warning message, I get the output shown below, just like I intended to plot it.
Nevertheless, I do not really understand why it works the way it does: why I have to use the paste() command and what it does exactly. I get the basic idea of what the classes do, but I am very new to this whole world of R.
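A likely explanation (an inference from the str() output and the warnings above, not something verified against the file): Rate was read in as a factor because the CSV contains a few non-numeric entries, and calling as.numeric() on a factor returns its internal level codes (1 to 180 here), which is why the y-axis ran up to about 170. paste(Rate) converts the factor to its character labels, which as.numeric() can then parse into the actual rates. The more common idiom for this conversion is:

# Convert a factor to the numeric values its labels represent
Rate_num <- as.numeric(as.character(Rate))   # or: as.numeric(levels(Rate))[Rate]
plot.ts(Rate_num, ylab="EUR/USD")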
One thing I came to realize already is that R is such a powerful program. And yet confusing if you are a beginner. :D
I am getting an error that I don't really understand at all. I was just messing around with generating some sequences, and I came across this problem:
This should create a sequence of 50 numbers.
seq.int(from=1,to=1000,by=5,length.out=50)
But if I enter this in the console I get the error message:
Error in seq.int(from = 1, to = 1000, by = 5, length.out = 50) :
too many arguments
If I look at the help (?seq), in the Usage section there is this line in there which makes it seem as though I called the function correctly, and it allows this many number of arguments:
seq.int(from, to, by, length.out, along.with, ...)
So what the heck is going on? Am I missing something fundamental, or are the docs out of date?
NOTE
The arguments I am providing to the function in the code sample are just for the sake of example. I'm not trying to solve a particular problem; I'm just curious as to why I get the error.
It's not clear what you expect as output from this line of code, and you're getting an error because R doesn't want to resolve the contradictions for you.
Here is some valid output, and the line of code you'd use to achieve each. This is a case where you need to decide for yourself which approach to use given the task you have in mind:
Override length.out
[1] 1 6 11 16 21 26 31 36 41 46 51 56 61 66 71 76 81 86
...
[199] 991 996
#via:
seq.int(from=1,to=1000,by=5)
Override by
[1] 1.00000 21.38776 41.77551 62.16327 82.55102 102.93878 123.32653
[8] 143.71429 164.10204 184.48980 204.87755 225.26531 245.65306 266.04082
[15] 286.42857 306.81633 327.20408 347.59184 367.97959 388.36735 408.75510
[22] 429.14286 449.53061 469.91837 490.30612 510.69388 531.08163 551.46939
[29] 571.85714 592.24490 612.63265 633.02041 653.40816 673.79592 694.18367
[36] 714.57143 734.95918 755.34694 775.73469 796.12245 816.51020 836.89796
[43] 857.28571 877.67347 898.06122 918.44898 938.83673 959.22449 979.61224
[50] 1000.00000
#via:
seq.int(from=1,to=1000,length.out=50)
Override to
[1] 1 6 11 16 21 26 31 36 41 46 51 56 61 66 71 76 81 86 91 96 101
[22] 106 111 116 121 126 131 136 141 146 151 156 161 166 171 176 181 186 191 196 201 206
[43] 211 216 221 226 231 236 241 246
#via:
seq.int(from=1,by=5,length.out=50)
Override from
[1] 755 760 765 770 775 780 785 790 795 800 805 810 815 820 825 830 835 840
[19] 845 850 855 860 865 870 875 880 885 890 895 900 905 910 915 920 925 930
[37] 935 940 945 950 955 960 965 970 975 980 985 990 995 1000
#via:
seq.int(to=1000,by=5,length.out=50)
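A quick arithmetic check makes the contradiction concrete: from, by, and length.out already determine the endpoint between them, so to cannot also be honored.

# Last element implied by from=1, by=5, length.out=50:
1 + 5 * (50 - 1)   # 246, which contradicts to=1000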
A priori, R has no way of telling which of the above you'd like, nor should it. You as programmer need to decide which inputs take precedence.
And you're right that this should be documented; for now, take a look at the source of .Primitive("seq.int"), as linked originally by @nongkrong.
No, there is nothing fundamental to the R language that I was missing that caused the problem. The problem is that the docs, at least at the time of writing, are misleading and/or incorrect.
I am trying to process DSC (differential scanning calorimetry) data with R, but it seems I have run into some trouble. All of this used to be done tediously in Origin or QtiPlot in my lab, and I wonder if there is another way to do it in batch. But my attempt did not go well. For example, maybe I have used the wrong colnames for my data.frame; the code
dat$0.5min
Error: unexpected numeric constant in "dat$0.5"
cannot access my data.
So below is the full description of my goal. Thank you in advance!
The DSC data looks like this (I store the CSV file in my GoogleDrive Link):
T1 0.5min T2 1min
40.59 -0.2904 40.59 -0.2545
40.81 -0.281 40.81 -0.2455
41.04 -0.2747 41.04 -0.2389
41.29 -0.2728 41.29 -0.2361
41.54 -0.2553 41.54 -0.2239
41.8 -0.07 41.8 -0.0732
42.06 0.1687 42.06 0.1414
42.32 0.3194 42.32 0.2817
42.58 0.3814 42.58 0.3421
42.84 0.3863 42.84 0.3493
43.1 0.3665 43.11 0.3322
43.37 0.3438 43.37 0.3109
43.64 0.3265 43.64 0.2937
43.9 0.3151 43.9 0.2819
44.17 0.3072 44.17 0.2735
44.43 0.2995 44.43 0.2656
44.7 0.2899 44.7 0.2563
44.96 0.2779 44.96 0.245
In fact I have merged the data into a data.frame and hope I can adjust it and process it further.
The command is:
dat<-read.csv("Book1.csv",header=F)
colnames(dat)<-c('T1','0.5min','T2','1min','T3','2min','T4','4min','T5','8min','T6','10min',
'T7','20min','T8','ascast1','T9','ascast2','T10','ascast3','T11','ascast4',
'T12','ascast5'
)
So dat is a data.frame with 1163 obs. of 24 variables.
T1, T2, T3, ..., T12 are the temperatures at which the samples were measured in the DSC; although they cover the same interval, they differ a little due to the instability of the machine.
The column following each of T1~T12 is the heat flow recorded by the machine for a given heat-treatment duration, and ascast1~ascast5 are untreated samples used to check the accuracy of the machine.
Now I need to do something like the following:
T1~T12 are in degrees Celsius; I need to convert them to Kelvin, which means adding 273.15 to every value.
Two temperatures are chosen for comparing the results: Ts = 180.25 and Te = 240.45 (both in degrees Celsius; I checked in QtiPlot to make sure). To be clear, I list the two temperatures and the first few columns of data:
T1 0.5min T2 1min T3 2min T4 4min
180.25 -0.01710000 180.25 -0.01780000 180.25 -0.02120000 180.25 -0.02020000
. . . .
. . . .
240.45 0.05700000 240.45 0.04500000 240.45 0.05780000 240.45 0.05580000
All heat-flow values at Ts should be made equal, and can be set to 0 for convenience. So, for each of the columns 0.5min, 1min, 2min, 4min, 8min, 10min, 20min, and ascast1~ascast5, the column's heat flow at Ts should be subtracted from every heat-flow value in that column.
The heat flow at Te should then be adjusted so that all curves agree at Te. The purpose is the following: (1) calculate the mean of the 12 heat-flow values at Te; call it Hmean, the value all curves should take there. (2) For the data in the column 0.5min, denoted col("0.5min"), the linear transformation is:

col("0.5min") - [([0.05700000 - (-0.01710000)] - Hmean) / (Te - Ts)] * (col(T1) - Ts)

Actually, [0.05700000 - (-0.01710000)] is what step 2 above already produces, but I write it out for reference. The same formula is applied to each of the 12 (temperature, heat-flow) column pairs: (T1, 0.5min), (T2, 1min), (T3, 2min), and so on.
Then we can plot the 12 adjusted curves on the same plot over the interval 180~240 (also in degrees Celsius) to magnify the differences between the DSC scans.
I have been stuck on this problem for 2 days, so I turn to Stack Overflow for help.
Thanks!
I am assuming that your question is the one right at the beginning, where you got the following error:
dat$0.5min
Error: unexpected numeric constant in "dat$0.5"
As I could not find a question in the rest of the steps; they just seemed like a step-by-step description of an experiment.
To fix that error: the problem is that the column name starts with a number, so to reference the column by name you need to wrap it in backticks (`):
>dataF <- data.frame("0.5min"=1:10,"T2"=11:20,check.names = F)
> dataF$`0.5min`
[1] 1 2 3 4 5 6 7 8 9 10
Based on the additional information in the comments: you can add a constant to alternate columns in the following manner.
dataF <- data.frame(matrix(1:100,10,10))
const <- 237
> print(dataF)
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
1 1 11 21 31 41 51 61 71 81 91
2 2 12 22 32 42 52 62 72 82 92
3 3 13 23 33 43 53 63 73 83 93
4 4 14 24 34 44 54 64 74 84 94
5 5 15 25 35 45 55 65 75 85 95
6 6 16 26 36 46 56 66 76 86 96
7 7 17 27 37 47 57 67 77 87 97
8 8 18 28 38 48 58 68 78 88 98
9 9 19 29 39 49 59 69 79 89 99
10 10 20 30 40 50 60 70 80 90 100
dataF[,seq(1,ncol(dataF),by = 2)] <- dataF[,seq(1,ncol(dataF),by = 2)] + const
> print(dataF)
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
1 238 11 258 31 278 51 298 71 318 91
2 239 12 259 32 279 52 299 72 319 92
3 240 13 260 33 280 53 300 73 320 93
4 241 14 261 34 281 54 301 74 321 94
5 242 15 262 35 282 55 302 75 322 95
6 243 16 263 36 283 56 303 76 323 96
7 244 17 264 37 284 57 304 77 324 97
8 245 18 265 38 285 58 305 78 325 98
9 246 19 266 39 286 59 306 79 326 99
10 247 20 267 40 287 60 307 80 327 100
To generalize: the columns of a data frame can be referenced with a vector of numbers or of column names, and most operations in R are vectorized, so you can use whichever fits the pattern you are looking for.
For example, if I change the names of my first two columns and want to access just those, I do this:
colnames(dataF)[c(1,2)] <- c("Y1","Y2")
#Reference all column names with "Y" in it. You can do any operation you want on this.
dataF[,grep("Y",colnames(dataF))]
Y1 Y2
1 238 11
2 239 12
3 240 13
4 241 14
5 242 15
6 243 16
7 244 17
8 245 18
9 246 19
10 247 20
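And if the goal is the full normalization described in the question, here is a rough sketch under the question's stated setup (dat with alternating temperature/heat-flow columns, Ts = 180.25, Te = 240.45); the nearest-row lookup is my own choice, since the measured temperatures may not hit Ts and Te exactly:

Ts <- 180.25; Te <- 240.45
temp_cols <- seq(1, ncol(dat), by = 2)   # T1, T2, ..., T12
flow_cols <- temp_cols + 1               # 0.5min, 1min, ...

# Step 1: Celsius -> Kelvin for the temperature columns
dat[, temp_cols] <- dat[, temp_cols] + 273.15

# Step 2: zero every heat-flow curve at Ts
iS <- sapply(temp_cols, function(j) which.min(abs(dat[, j] - (Ts + 273.15))))
for (k in seq_along(flow_cols))
  dat[, flow_cols[k]] <- dat[, flow_cols[k]] - dat[iS[k], flow_cols[k]]

# Step 3: tilt each curve linearly so all of them take the mean value at Te
iE <- sapply(temp_cols, function(j) which.min(abs(dat[, j] - (Te + 273.15))))
He <- mapply(function(i, j) dat[i, j], iE, flow_cols)  # heat flow at Te
Hmean <- mean(He)
for (k in seq_along(flow_cols)) {
  Tc <- dat[, temp_cols[k]] - 273.15     # back to Celsius for the formula
  dat[, flow_cols[k]] <- dat[, flow_cols[k]] -
    (He[k] - Hmean) / (Te - Ts) * (Tc - Ts)
}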
My objective is to list the drift coefficients from a random-walk-with-drift forecast function applied to a set of historical data (below). Specifically, I am trying to gather the drift coefficient starting from the random-walk-with-drift model fit to the first year, then cumulatively through to the last, recording the coefficient each time, i.e. once per additional year (recording it into a list, if that is appropriate). To be clear, each new random-walk forecast includes all the previous years.
The data is a list of 241 consumption levels, and I am attempting to discern how the drift coefficient changes while iteratively progressing from n=1 to n=241.
The random-walk-with-drift model is Y[t] = c + Y[t-1] + Z[t], where Z[t] is a normal error and c is the coefficient I am looking for. My current attempts involve a for loop and extracting the c coefficient from the rwf() function of the forecast package in R.
To extract this, I do the following:
rwf(x, h = 1, drift = TRUE)$model[[1]]
which extracts the drift coefficient.
The problem is that my attempts at subsetting the data within the rwf() call have failed, and based on trial and error and research I don't believe rwf() supports a subset argument the way an lm model does, for example. For that reason my attempts at looping the function have also failed.
An example of such code is
for (i in 1:5){print((rwf(x[1:i], h = 1, drift = TRUE))$model[[1]])}
which gives me the following error
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
0 (non-NA) cases
In addition: Warning message:
In is.na(rows) : is.na() applied to non-(list or vector) of type 'NULL'
Any help would be much appreciated.
I read SO a lot for help but this is my first time asking a question.
The data is as follows
PCE
1 1306.7
2 1309.6
3 1335.3
4 1341.8
5 1389.2
6 1405.7
7 1414.2
8 1411.0
9 1401.6
10 1406.7
11 1425.0
12 1444.4
13 1474.7
14 1507.8
15 1536.6
16 1555.6
17 1575.2
18 1577.8
19 1583.0
20 1586.6
21 1608.4
22 1619.5
23 1622.4
24 1635.3
25 1636.1
26 1613.9
27 1627.1
28 1653.8
29 1675.6
30 1706.7
31 1732.9
32 1751.0
33 1752.9
34 1769.7
35 1792.1
36 1785.0
37 1787.4
38 1786.9
39 1813.4
40 1822.2
41 1858.7
42 1878.5
43 1901.6
44 1917.0
45 1944.2
46 1957.3
47 1976.0
48 2002.9
49 2019.6
50 2059.5
51 2095.8
52 2134.3
53 2140.2
54 2187.8
55 2212.0
56 2250.0
57 2313.2
58 2347.4
59 2353.5
60 2380.4
61 2390.3
62 2404.2
63 2437.0
64 2449.5
65 2464.6
66 2523.4
67 2562.1
68 2610.3
69 2622.3
70 2651.7
71 2668.6
72 2681.5
73 2702.9
74 2719.5
75 2731.9
76 2755.9
77 2748.4
78 2800.9
79 2826.6
80 2849.1
81 2896.5
82 2935.2
83 2991.2
84 3037.4
85 3108.6
86 3165.5
87 3163.9
88 3175.3
89 3166.0
90 3138.3
91 3149.2
92 3162.2
93 3115.8
94 3142.0
95 3194.4
96 3239.9
97 3274.2
98 3339.6
99 3370.3
100 3405.9
101 3450.3
102 3489.7
103 3509.0
104 3542.5
105 3595.9
106 3616.9
107 3694.2
108 3709.7
109 3739.6
110 3758.5
111 3756.3
112 3793.2
113 3803.3
114 3796.7
115 3710.5
116 3750.3
117 3800.3
118 3821.1
119 3821.1
120 3836.6
121 3807.6
122 3832.2
123 3845.9
124 3875.4
125 3946.1
126 3984.8
127 4063.9
128 4135.7
129 4201.3
130 4237.3
131 4297.9
132 4331.1
133 4388.1
134 4462.5
135 4503.2
136 4588.7
137 4598.8
138 4637.2
139 4686.6
140 4768.5
141 4797.2
142 4789.9
143 4854.0
144 4908.2
145 4920.0
146 5002.2
147 5038.5
148 5078.3
149 5138.1
150 5156.9
151 5180.0
152 5233.7
153 5259.3
154 5300.9
155 5318.4
156 5338.6
157 5297.0
158 5282.0
159 5322.2
160 5342.6
161 5340.2
162 5432.0
163 5464.2
164 5524.6
165 5592.0
166 5614.7
167 5668.6
168 5730.1
169 5781.1
170 5845.5
171 5888.8
172 5936.0
173 5994.6
174 6001.6
175 6050.8
176 6104.9
177 6147.8
178 6204.0
179 6274.2
180 6311.8
181 6363.2
182 6427.3
183 6453.3
184 6563.0
185 6638.1
186 6704.1
187 6819.5
188 6909.9
189 7015.9
190 7085.1
191 7196.6
192 7283.1
193 7385.8
194 7497.8
195 7568.3
196 7642.4
197 7710.0
198 7740.8
199 7770.0
200 7804.2
201 7926.4
202 7953.7
203 7994.1
204 8048.3
205 8076.9
206 8117.7
207 8198.1
208 8308.5
209 8353.7
210 8427.6
211 8465.1
212 8539.1
213 8631.3
214 8700.1
215 8786.2
216 8852.9
217 8874.9
218 8965.8
219 9019.8
220 9073.9
221 9158.3
222 9209.2
223 9244.5
224 9285.2
225 9312.6
226 9289.1
227 9285.8
228 9196.0
229 9076.0
230 9040.9
231 8998.5
232 9050.3
233 9060.2
234 9121.2
235 9186.9
236 9247.1
237 9328.4
238 9376.7
239 9392.7
240 9433.5
241 9482.1
You need at least two points to fit the model. Here's how I'd approach the problem after reading your data into a data.frame named x:
library(forecast)
drifts <- sapply(2:nrow(x), function(zz) rwf(x[1:zz,], drift = TRUE)$model$drift)
I'm not sure if this is what you were expecting or not, but here's a plot of your drift values:
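Something along these lines draws that plot (the axis labels are my own choice):

plot(2:nrow(x), drifts, type = "l",
     xlab = "number of observations used", ylab = "drift coefficient")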