This is what I am trying to do:
x <- c(1,2,3,3,2,3,4,5,6)
my_acf = acf(x,plot=F)
> my_acf
Autocorrelations of series ‘x’, by lag
     0      1      2      3      4      5      6      7      8 
 1.000  0.497  0.097 -0.047 -0.050 -0.075 -0.231 -0.376 -0.316 
I want to extract only 0.497, the correlation coefficient on the first lag, and I want to have it as a numeric value. How can I do that?
Thank You
The answer is to use my_acf$acf[2]. Here is what led me to the solution:
> attributes(my_acf)
$names
[1] "acf" "type" "n.used" "lag" "series" "snames"
$class
[1] "acf"
> my_acf$acf
, , 1
[,1]
[1,] 1.00000000
[2,] 0.49747475
[3,] 0.09722222
[4,] -0.04734848
[5,] -0.04987374
[6,] -0.07512626
[7,] -0.23106061
[8,] -0.37563131
[9,] -0.31565657
> my_acf$acf[2]
[1] 0.4974747
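If you want to make the extraction explicit, both as a plain numeric value and with the lag it belongs to, here is a small sketch that uses only the object above:
> as.numeric(my_acf$acf[2])   # element 2 of the acf array is lag 1 (element 1 is lag 0)
[1] 0.4974747
> my_acf$lag[2]               # the stored lags confirm the alignment
[1] 1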
You can also inspect the stored autocorrelations directly like this:
my_acf$acf
Related
I'm studying from a textbook on data mining and I can't figure out how the author reads the nn values from the gcv output. The code and output are below:
## cv: grid search over the nearest-neighbour smoothing fraction nn passed to lp()
library(locfit)  # provides gcv() and lp(); ethanol is the data frame read from the csv linked below
alpha <- seq(0.20, 1, by = 0.01)
n1 <- length(alpha)
g <- matrix(nrow = n1, ncol = 4)
for (k in 1:length(alpha)) {
  g[k, ] <- gcv(NOx ~ lp(EquivRatio, nn = alpha[k]), data = ethanol)
}
g
The csv file is here:
https://github.com/jgscott/ECO395M/blob/master/data/ethanol.csv
I'm using the locfit library in R.
How do I find the nn values from the given output?
The nn values are not read from the output - they are given in the input. In the loop, nn is assigned as the kth value of the object alpha.
Let's look at the output of the first 16 rows of g, which is the same as the picture you included in your question:
g[1:16,]
#> [,1] [,2] [,3] [,4]
#> [1,] -3.220084 18.81266 16.426487 0.1183932
#> [2,] -3.249601 17.61614 15.436227 0.1154507
#> [3,] -3.319650 16.77004 14.752039 0.1151542
#> [4,] -3.336464 15.44404 13.889209 0.1115457
#> [5,] -3.373011 14.52391 13.115430 0.1099609
#> [6,] -3.408908 13.96789 12.634934 0.1094681
#> [7,] -3.408908 13.96789 12.634934 0.1094681
#> [8,] -3.469254 12.99316 11.830996 0.1085293
#> [9,] -3.504310 12.38808 11.283837 0.1078784
#> [10,] -3.529167 11.93838 10.928859 0.1073628
#> [11,] -3.546728 11.46960 10.516520 0.1065792
#> [12,] -3.552238 11.26372 10.322329 0.1061728
#> [13,] -3.576083 11.03575 10.135243 0.1062533
#> [14,] -3.679128 10.54096 9.662613 0.1079229
#> [15,] -3.679128 10.54096 9.662613 0.1079229
#> [16,] -3.699044 10.46534 9.578396 0.1082955
Note that rows 11, 12 and 13 were created inside your loop using alpha[11], alpha[12] and alpha[13]. These values were passed to the nn argument of lp. If you want the nn values included in your table, all you need to do is:
cbind(g, nn = alpha)
#> nn
#> [1,] -3.220084 18.812657 16.426487 0.1183932 0.20
#> [2,] -3.249601 17.616143 15.436227 0.1154507 0.21
#> [3,] -3.319650 16.770041 14.752039 0.1151542 0.22
#> [4,] -3.336464 15.444040 13.889209 0.1115457 0.23
#> [5,] -3.373011 14.523910 13.115430 0.1099609 0.24
#> [6,] -3.408908 13.967891 12.634934 0.1094681 0.25
#> [7,] -3.408908 13.967891 12.634934 0.1094681 0.26
#> [8,] -3.469254 12.993165 11.830996 0.1085293 0.27
#> [9,] -3.504310 12.388077 11.283837 0.1078784 0.28
#> [10,] -3.529167 11.938379 10.928859 0.1073628 0.29
#> [11,] -3.546728 11.469598 10.516520 0.1065792 0.30
#> [12,] -3.552238 11.263716 10.322329 0.1061728 0.31
#> [13,] -3.576083 11.035752 10.135243 0.1062533 0.32
#> [14,] -3.679128 10.540964 9.662613 0.1079229 0.33
#> [15,] -3.679128 10.540964 9.662613 0.1079229 0.34
#> [16,] -3.699044 10.465337 9.578396 0.1082955 0.35
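If you also want to pull out the nn value with the best (smallest) GCV score from this table, a minimal sketch is below; it assumes that the fourth column of g holds the GCV criterion itself, which is the usual layout of locfit's gcv() output:
g_tab <- cbind(g, nn = alpha)
g_tab[which.min(g_tab[, 4]), "nn"]   # the nn (alpha) value with the smallest GCV score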
I'm playing around with some data; my matrix looks as shown below in results:
results
[,1] [,2] [,3]
[1,] 1.7 0.0015 -1566.276
[2,] 1.7 0.0016 -1564.695
[3,] 1.7 0.0017 -1564.445
[4,] 1.7 0.0018 -1565.373
[5,] 1.7 0.0019 -1567.352
[6,] 1.7 0.0020 -1570.274
[7,] 1.8 0.0015 -1568.299
[8,] 1.8 0.0016 -1565.428
[9,] 1.8 0.0017 -1563.965
[10,] 1.8 0.0018 -1563.750
[11,] 1.8 0.0019 -1564.647
[12,] 1.8 0.0020 -1566.544
[13,] 1.9 0.0015 -1571.798
[14,] 1.9 0.0016 -1567.635
[15,] 1.9 0.0017 -1564.960
[16,] 1.9 0.0018 -1563.602
[17,] 1.9 0.0019 -1563.418
[18,] 1.9 0.0020 -1564.289
[19,] 2.0 0.0015 -1576.673
[20,] 2.0 0.0016 -1571.220
[21,] 2.0 0.0017 -1567.332
[22,] 2.0 0.0018 -1564.831
[23,] 2.0 0.0019 -1563.566
[24,] 2.0 0.0020 -1563.410
I wanted to print the row where the maximum of the 3rd column occurs, so I broke it down into the following two lines:
max(results[,3])
results[results[,3] == -1563.41,]
The result didn't give me the desired row.
I tried putting the value in inverted commas, like so:
max(results[,3])
results[results[,3] == "-1563.41",]
but this didn't work either.
The only code that worked was when I nested the max() call inside the subsetting line, like so:
results[results[,3] == max(results[,3]),]
Could someone please explain why breaking it down into steps didn't work?
I tried turning it into a data frame, which didn't work either, and using filter() from the tidyverse package didn't work either.
Thanks.
R prints numbers to a maximum of getOption("digits") significant digits. I suspect that -1563.410 is the maximum rounded to seven significant digits, not the exact maximum. If no element of results[, 3] is exactly equal to the rounded number -1563.410, then
results[results[, 3] == -1563.41, ]
returns a 0-by-3 matrix. Meanwhile, max(results[, 3]) is the exact maximum, so
results[results[, 3] == max(results[, 3]), ]
returns the matrix subset that you want.
Note that if there are n rows whose third element is equal to max(results[, 3]), then the subset is a length-3 vector if n = 1 and an n-by-3 matrix if n > 1 (see ?Extract). If, in the n > 1 case, you only want the first of the n rows, then you should do:
results[which.max(results[, 3]), ]
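Here is a small, self-contained illustration of the printing issue (the two numbers are made up; only the behaviour matters):
x <- c(-1563.40957, -1563.41012)
max(x)                       # printed with the default 7 significant digits
#> [1] -1563.41
x == -1563.41                # neither stored value is exactly the rounded -1563.41
#> [1] FALSE FALSE
print(max(x), digits = 10)   # reveals the full stored value
#> [1] -1563.40957
x == max(x)                  # comparing against the exact maximum works
#> [1]  TRUE FALSE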
I have a data.frame containing a vector of numeric values (prcp_log).
waterdate PRCP prcp_log
<date> <dbl> <dbl>
1 2007-10-01 0 0
2 2007-10-02 0.02 0.0198
3 2007-10-03 0.31 0.270
4 2007-10-04 1.8 1.03
5 2007-10-05 0.03 0.0296
6 2007-10-06 0.19 0.174
I then pass this data through the Christiano-Fitzgerald band-pass filter, using the following command from the mFilter package.
library(mFilter)
US1ORLA0076_cffilter <- cffilter(US1ORLA0076$prcp_log, pl = 180, pu = 365,
                                 root = FALSE, drift = FALSE,
                                 type = c("asymmetric"), nfix = NULL, theta = 1)
This creates an S3 object containing, among other things, a vector of "trend" values and a vector of "cycle" values, like so:
head(US1ORLA0076_cffilter$trend)
[,1]
[1,] 0.05439408
[2,] 0.07275321
[3,] 0.32150292
[4,] 1.07958965
[5,] 0.07799329
[6,] 0.22082246
head(US1ORLA0076_cffilter$cycle)
[,1]
[1,] -0.05439408
[2,] -0.05295058
[3,] -0.05147578
[4,] -0.04997023
[5,] -0.04843449
[6,] -0.04686915
Plotted:
plot(US1ORLA0076_cffilter)
I then apply the following mathematical operation in an attempt to remove the trend and seasonal components from the original numeric vector:
US1ORLA0076$decomp <- ((US1ORLA0076$prcp_log - US1ORLA0076_cffilter$trend) - US1ORLA0076_cffilter$cycle)
This creates output that includes unexpected elements such as dashes and letters.
head(US1ORLA0076$decomp)
[,1]
[1,] 0.000000e+00
[2,] 0.000000e+00
[3,] 1.387779e-17
[4,] -2.775558e-17
[5,] 0.000000e+00
[6,] 6.938894e-18
What has happened here? What do these additional characters signify? How can I perform this mathematical operation and get the desired output of simply $prcp_log minus both the $trend and $cycle values?
I am happy to provide any additional info that will help right away, just ask.
I have two matrices, train and test. How do I "fit" a singular value decomposition on train and apply the fitted transformation to test?
For example
library(irlba)
# train
train <- cbind(matrix(runif(16, min=0, max=1), nrow=8),
matrix(runif(16, min=30, max=31), nrow=8))
train[1:4, ] = train[1:4, ] + 50
# test
test <- cbind(matrix(runif(16, min=0, max=1), nrow=8),
matrix(runif(16, min=30, max=31), nrow=8))
test[1:4, ] = test[1:4, ] + 50
# truncated SVD applied to train
S <- irlba(t(train), nv=2)
> train
[,1] [,2] [,3] [,4]
[1,] 50.39686 50.8733 80.57 80.51
[2,] 50.42719 50.2288 80.64 80.17
[3,] 50.87391 50.6059 80.19 80.61
[4,] 50.52439 50.7037 80.59 80.36
[5,] 0.43121 0.4681 30.93 30.76
[6,] 0.69381 0.5647 30.12 30.11
[7,] 0.02068 0.3382 30.37 30.04
[8,] 0.61101 0.5401 30.12 30.86
> S$v
[,1] [,2]
[1,] 0.4819 0.23134
[2,] 0.4805 0.18348
[3,] 0.4816 0.07372
[4,] 0.4816 -0.05819
[5,] 0.1370 -0.59769
[6,] 0.1342 -0.20746
[7,] 0.1335 -0.70946
[8,] 0.1358 -0.01972
Now, how do I reduce the dimensions of test? (Also, please note that my real datasets are large and sparse.)
New R user. I'm trying to split a dataset based on deciles, using cut according to the process in this question. I want to add the decile values as a new column in a dataframe, but when I do this the lowest value is listed as NA for some reason. This happens regardless of whether include.lowest=TRUE or FALSE. Anyone have any idea why?
Happens when I use this sample set, too, so it's not exclusive to my data.
data <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20)
> decile <- cut(data, quantile(data, (0:10)/10, labels=TRUE, include.lowest=FALSE))
> df <- cbind(data, decile)
> df
data decile
[1,] 1 NA
[2,] 2 1
[3,] 3 2
[4,] 4 2
[5,] 5 3
[6,] 6 3
[7,] 7 4
[8,] 8 4
[9,] 9 5
[10,] 10 5
[11,] 11 6
[12,] 12 6
[13,] 13 7
[14,] 14 7
[15,] 15 8
[16,] 16 8
[17,] 17 9
[18,] 18 9
[19,] 19 10
[20,] 20 10
There are two problems. First, a couple of things are wrong with your cut call: the closing parenthesis of quantile() is misplaced, so labels= and include.lowest= are passed to quantile() rather than to cut(). I think you meant
cut(data, quantile(data, (0:10)/10), include.lowest=FALSE)
Also, labels should be FALSE, NULL, or a vector of labels with one entry per interval (i.e. one fewer than the number of breaks).
Second, the main issue is that because you set include.lowest=FALSE, and data[1] is 1, which corresponds to the first break as defined by
> quantile(data, (0:10)/10)
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
1.0 2.9 4.8 6.7 8.6 10.5 12.4 14.3 16.2 18.1 20.0
the value 1 doesn't fall into any category; it is beyond the lower limit of the categories defined by your breaks.
I'm not sure what you want, but you could try one of these two alternatives, depending on which class you want 1 to be in:
> cut(data, quantile(data, (0:10)/10), include.lowest=TRUE)
[1] [1,2.9] [1,2.9] (2.9,4.8] (2.9,4.8] (4.8,6.7] (4.8,6.7]
[7] (6.7,8.6] (6.7,8.6] (8.6,10.5] (8.6,10.5] (10.5,12.4] (10.5,12.4]
[13] (12.4,14.3] (12.4,14.3] (14.3,16.2] (14.3,16.2] (16.2,18.1] (16.2,18.1]
[19] (18.1,20] (18.1,20]
10 Levels: [1,2.9] (2.9,4.8] (4.8,6.7] (6.7,8.6] (8.6,10.5] ... (18.1,20]
> cut(data, c(0, quantile(data, (0:10)/10)), include.lowest=FALSE)
[1] (0,1] (1,2.9] (2.9,4.8] (2.9,4.8] (4.8,6.7] (4.8,6.7]
[7] (6.7,8.6] (6.7,8.6] (8.6,10.5] (8.6,10.5] (10.5,12.4] (10.5,12.4]
[13] (12.4,14.3] (12.4,14.3] (14.3,16.2] (14.3,16.2] (16.2,18.1] (16.2,18.1]
[19] (18.1,20] (18.1,20]
11 Levels: (0,1] (1,2.9] (2.9,4.8] (4.8,6.7] (6.7,8.6] ... (18.1,20]
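If, in the end, you just want the decile number (1 to 10) as a plain integer column next to data, a minimal sketch is: use include.lowest=TRUE so the minimum is kept, and labels=FALSE so cut() returns integer interval codes instead of a factor.
decile <- cut(data, quantile(data, (0:10)/10), include.lowest=TRUE, labels=FALSE)
cbind(data, decile)   # the first row now gets decile 1 instead of NA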