simulate observations and calculate sample autocorrelation - r

Simulating rk (r stands for autocorrelation) for {et} where each et is iid N(0,1).
R code: simulate 100 observations of {et} and calculate r1.
Here is my code so far:
x=rnorm(100,0,1)
x
y=ts(x)
trial_r1=acf(y)[1]
trial_r1
Is my code right? How do I get r1 after running acf()?

(I'll post as an answer, both to close the question, plus to help with searching for answers to similarly-structured questions.)
When looking for what you believe is but one part of a structured return, it's useful to look at the return value in detail. One common way to do this is with str:
set.seed(42)
x <- rnorm(100, mean = 0, sd = 1)
ret <- acf(ts(x))
str(ret)
## List of 6
## $ acf : num [1:21, 1, 1] 1 0.05592 -0.00452 0.03542 0.00278 ...
## $ type : chr "correlation"
## $ n.used: int 100
## $ lag : num [1:21, 1, 1] 0 1 2 3 4 5 6 7 8 9 ...
## $ series: chr "ts(x)"
## $ snames: NULL
## - attr(*, "class")= chr "acf"
In this instance, you'll see two clusters of numbers in $acf and $lag. The latter is clearly just an array of incrementing integers, so it is not that interesting here, but the former looks more promising. Since the result is ultimately just a list, you can use dollar-sign subsetting (or [[, your choice) to extract what you need:
ret$acf
## , , 1
## [,1]
## [1,] 1.000000e+00
## [2,] 5.592310e-02
## [3,] -4.524017e-03
## [4,] 3.541639e-02
## [5,] 2.784590e-03
## ...snip...
In the case of your question, you should notice that the first element of this 3-dimensional array is the lag-0 autocorrelation, which is always a perfectly-autocorrelated 1. Your first real autocorrelation of concern is the second element, 0.0559. So r1 is attainable with ret$acf[2,,] (or more formally ret$acf[2,1,1]).
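As a quick sanity check on the indexing, the lag-1 value pulled out of ret$acf can be compared with a direct calculation. This is only a sketch: it assumes the default acf() estimator (demeaned series, lag-0 sum of squares in the denominator), and the names r1 and r1_manual are purely illustrative.

```r
set.seed(42)
x <- rnorm(100, mean = 0, sd = 1)

# plot = FALSE suppresses the correlogram; the return value is unchanged
ret <- acf(x, plot = FALSE)

# row 1 is lag 0 (always 1), so lag k sits in row k + 1
r1 <- ret$acf[2, 1, 1]

# direct calculation of the lag-1 sample autocorrelation
n  <- length(x)
xc <- x - mean(x)
r1_manual <- sum(xc[-1] * xc[-n]) / sum(xc^2)

all.equal(r1, r1_manual)
```

If the two values agree, the row offset (lag k in row k + 1) is being applied correctly.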

Related

Remove outlier from five-number summary statistics

How can I force the fivenum function to not use outliers as my maximum/minimum values?
I want to be able to see the upper and lower whisker numbers on my boxplot.
My code:
boxplot(data$`Weight(g)`)
text(y=fivenum(data$`Weight(g)`),labels=fivenum(data$`Weight(g)`),x=1.25, title(main = "Weight(g)"))
boxplot returns a named list that includes things you can use to remove outliers in your call to fivenum:
$out includes the literal outliers. It can be tempting to use setdiff(data$`Weight(g)`, bp$out), where bp is the value returned by boxplot, but that may be prone to problems due to R FAQ 7.31 (floating-point equality), so I recommend against it; instead,
$stats includes the numbers used for the boxplot itself, without the outliers. I suggest we work with this.
(BTW, title(.) does its work via side effect and is not used by text(.), so I suggest you move that call out of text(.).)
Reproducible data/code:
vec <- c(1, 10:20, 30)
bp <- boxplot(vec)
str(bp)
# List of 6
# $ stats: num [1:5, 1] 10 12 15 18 20
# $ n : num 13
# $ conf : num [1:2, 1] 12.4 17.6
# $ out : num [1:2] 1 30
# $ group: num [1:2] 1 1
# $ names: chr "1"
five <- fivenum(vec[ vec >= min(bp$stats) & vec <= max(bp$stats)])
text(x=1.25, y=five, labels=five)
title("Weight(g)")
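One detail worth checking in the approach above: fivenum() on the trimmed vector recomputes the hinges from fewer points, so they need not match the box that was drawn. The sketch below uses boxplot.stats(), which performs the same calculation as boxplot() without drawing anything; the names bs, whiskers and five are illustrative only.

```r
vec <- c(1, 10:20, 30)

# boxplot.stats() does the same calculation as boxplot(), without drawing
bs <- boxplot.stats(vec)
bs$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker
bs$out     # points beyond the whiskers

# whisker ends, for labelling
whiskers <- bs$stats[c(1, 5)]

# fivenum() on the trimmed data recomputes the hinges from fewer points
five <- fivenum(vec[vec >= whiskers[1] & vec <= whiskers[2]])
five
```

For this data the drawn hinges are 12 and 18, while the trimmed fivenum() gives 12.5 and 17.5. If the labels are meant to sit exactly on the drawn box and whiskers, labelling with the $stats values directly may be the safer choice.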

How to get labels from hclust result

Let's say I have a dataset like this:
dt<-data.frame(id=1:4,X=sample(4),Y=sample(4))
and then I try to do a hierarchical clustering using the code below:
dis<-dist(dt[,-1])
clusters <- hclust(dis)
plot(clusters)
and it works well.
The point is, when I ask for
clusters$labels
it gives me NULL, whereas I expect to see the labels of the individuals in order, like
1, 4, 2, 3
It is important to have them in the order in which they appear in the plot.
Use clusters$order rather than labels if you have not assigned any labels.
In fact, you can see all the contents of the object by using the summary function:
clusters <- hclust(dis)
plot(clusters)
summary(clusters)
clusters$order
You can compare with the plot I got at my end; it is of course a little different from yours.
My outcome:
> clusters$order
[1] 4 1 2 3
Content of summary command:
> summary(clusters)
Length Class Mode
merge 6 -none- numeric
height 3 -none- numeric
order 4 -none- numeric
labels 0 -none- NULL
method 1 -none- character
call 2 -none- call
dist.method 1 -none- character
You can see that labels is NULL, which is why you are not getting any labels. To get labels, you need to assign them first, using clusters$labels <- c("A","B","C","D") or the row names. Once the labels are assigned, you will see the names/labels instead of the numbers.
In my case I have not assigned any names, hence I get the numbers instead.
You can also pass the labels to the plot function itself.
From the documentation ?hclust
labels
A character vector of labels for the leaves of the tree. By
default the row names or row numbers of the original data are used. If
labels = FALSE no labels at all are plotted.
You could use the following code:
# your data, I changed the id to characters to make it more clear
set.seed(1234) # for reproducibility
dt<-data.frame(id=c("A", "B", "C", "D"),X=sample(4),Y=sample(4))
dt
# your code, no labels
dis<-dist(dt[,-1])
clusters <- hclust(dis)
clusters$labels
# add labels, plot and check labels
clusters$labels <- dt$id
plot(clusters)
## labels in the order plotted
clusters$labels[clusters$order]
## [1] A D B C
## Levels: A B C D
Please let me know whether this is what you want.
Please make sure you use rownames(...) to ensure your data has labels
> rownames(dt) <- dt$id
> dt
id X Y
1 1 2 1
2 2 4 3
3 3 1 2
4 4 3 4
> dis<-dist(dt[,-1])
> clusters <- hclust(dis)
> str(clusters)
List of 7
$ merge : int [1:3, 1:2] -1 -2 1 -3 -4 2
$ height : num [1:3] 1.41 1.41 3.16
$ order : int [1:4] 1 3 2 4
$ labels : chr [1:4] "1" "2" "3" "4"
$ method : chr "complete"
$ call : language hclust(d = dis)
$ dist.method: chr "euclidean"
- attr(*, "class")= chr "hclust"
>
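Putting the answers above together, here is a minimal sketch of reading the leaf labels in plot order. The data values are made up so the result does not depend on sample(), and leaf_labels is just an illustrative name.

```r
# small fixed data set (hypothetical values) so nothing depends on sample()
dt <- data.frame(id = c("A", "B", "C", "D"),
                 X = c(1, 2, 10, 11),
                 Y = c(1, 2, 10, 12),
                 stringsAsFactors = FALSE)

# dist() carries row names through to hclust(), which stores them in $labels
rownames(dt) <- dt$id
clusters <- hclust(dist(dt[, -1]))

clusters$labels                   # labels in data order
clusters$order                    # row indices in left-to-right plot order
leaf_labels <- clusters$labels[clusters$order]
leaf_labels                       # labels in left-to-right plot order
```

Setting the row names before calling dist() avoids having to patch clusters$labels afterwards.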

Why is this matrix not numeric? Then `as.numeric` destroys the matrix and return a vector

I have a data frame called input. The first column refers to an Article ID (ArtID), the subsequent columns will be used to create the matrix.
Based on the ArtID, I want R to generate a 2x2 matrix (more precisely: it needs to be a numeric 2x2 matrix). Specifically, I want to create a matrix for the first row (ArtID == 1), the second row (ArtID == 2) and so on...
What I came up with so far is this:
for(i in 1:3) {stored.matrix = matrix(input[which(ArtID ==i),-1],nrow = 2)}
This gives me a 2x2 matrix, but it is not numeric (which it needs to be).
If I apply as.numeric, the matrix is no longer a 2x2 matrix.
How do I get a 2x2 numerical matrix?
Minimal reproducible example:
ArtID = c(1,2,3)
AC_AC = c(1,1,1)
MKT_AC = c(0.5,0.6,0.2)
AC_MKT = c(0.5,0.6,0.2)
MKT_MKT = c(1,1,1)
input = data.frame(ArtID, AC_AC, MKT_AC, AC_MKT, MKT_MKT)
i = 1 # first article, as in the output below
stored.matrix = matrix(input[which(ArtID ==i),-1],nrow = 2)
# [,1] [,2]
#[1,] 1 0.5
#[2,] 0.5 1
is.numeric(stored.matrix)
# [1] FALSE
as.numeric(stored.matrix)
## [1] 1.0 0.5 0.5 1.0
As you can see after applying as.numeric() the matrix is no longer 2x2.
Can anyone help?
You could use unlist():
matrix(unlist(input[ArtID ==i,-1]),2)
or use
storage.mode(m) <- "numeric"
When you have only numerical values in your data frame, it is more appropriate to use a matrix. Converting your data frame to a matrix with data.matrix() solves the problem:
input <- data.matrix(input)
ArtID = c(1,2,3)
AC_AC = c(1,1,1)
MKT_AC = c(0.5,0.6,0.2)
AC_MKT = c(0.5,0.6,0.2)
MKT_MKT = c(1,1,1)
input = data.frame(ArtID, AC_AC, MKT_AC, AC_MKT, MKT_MKT)
input <- data.matrix(input) ## <- this line
i = 1 # first article
stored.matrix = matrix(input[which(ArtID ==i),-1], 2)
is.numeric(stored.matrix)
# [1] TRUE
So what was the problem?
If input is a data frame, row subsetting with input[which(ArtID == i),-1] still returns a data frame. A data frame is a special type of list, so when you feed it to matrix(), you end up with a list matrix.
If you read ?matrix for what data it can take, you will see:
data: an optional data vector (including a list or ‘expression’
vector). Non-atomic classed R objects are coerced by
‘as.vector’ and all attributes discarded.
Note that a list is also of vector data type (e.g., is.vector(list(a = 1)) gives TRUE), so it is legitimate to feed a list to matrix. You can try
test <- matrix(list(a = 1, b = 2, c = 3, d = 4), 2)
# [,1] [,2]
#[1,] 1 3
#[2,] 2 4
This is indeed a matrix, in the sense that class(test) gives "matrix" (together with "array" in R >= 4.0.0), but
str(test)
#List of 4
# $ : num 1
# $ : num 2
# $ : num 3
# $ : num 4
# - attr(*, "dim")= int [1:2] 2 2
typeof(test)
# [1] "list"
so it is not the usual numerical matrix we refer to.
The input list can be ragged, too.
test <- matrix(list(a = 1, b = 2:3, c = 4:6, d = 7:10), 2)
# [,1] [,2]
#[1,] 1 Integer,3
#[2,] Integer,2 Integer,4
str(test)
#List of 4
# $ : num 1
# $ : int [1:2] 2 3
# $ : int [1:3] 4 5 6
# $ : int [1:4] 7 8 9 10
# - attr(*, "dim")= int [1:2] 2 2
And I was wondering why typeof() gives me list... :)
Yes, so you had noticed something unusual. The storage mode of a matrix is determined by that of its elements. For a list matrix, the underlying vector is a list, hence the matrix has "list" mode.
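The difference between the list matrix and the numeric matrix can be seen side by side in the sketch below. It reuses the question's data; row_i, m_list, m_num and mats are illustrative names, and the lapply() at the end is only one possible way to get one matrix per ArtID.

```r
input <- data.frame(ArtID   = c(1, 2, 3),
                    AC_AC   = c(1, 1, 1),
                    MKT_AC  = c(0.5, 0.6, 0.2),
                    AC_MKT  = c(0.5, 0.6, 0.2),
                    MKT_MKT = c(1, 1, 1))
i <- 1
row_i <- input[input$ArtID == i, -1]      # still a data frame, i.e. a list

m_list <- matrix(row_i, nrow = 2)         # list matrix
typeof(m_list)                            # "list"

m_num <- matrix(unlist(row_i), nrow = 2)  # flatten to an atomic vector first
typeof(m_num)                             # "double"
is.numeric(m_num)                         # TRUE

# one numeric 2x2 matrix per ArtID, kept in a list
mats <- lapply(input$ArtID, function(id) {
  matrix(unlist(input[input$ArtID == id, -1]), nrow = 2)
})
```

Keeping the matrices in a list, rather than overwriting stored.matrix inside a loop, means each article's matrix is still available afterwards as mats[[1]], mats[[2]], and so on.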

mapply basics? - how to create a matrix from two vectors and a function

I am trying to create a data.frame from which to create a graph. I have a function and two vectors that I want to use as the two inputs. This is a bit simplified, but basically all I have is:
relGPA <- seq(-1.5,1.5,.2)
avgGPA <- c(-2,0,2)
f <- function(relGPA, avgGPA) 1/(1+exp(sum(relGPA*pred.model$coef[1],avgGPA*pred.model$coef[2])))
and all I want is a data.frame with 3 columns for the avgGPA values, and 16 rows for the relGPA values with the resulting values in the cells.
I apologize for how basic this is, but I assure you I have tried to make this happen without your assistance. I have tried following the examples on the sapply and mapply man pages, but I'm just a little too new to R to see what I'm trying to do.
Thanks!
Cannot be tested with the information offered, but this should work:
expGPA <- outer(relGPA, avgGPA, FUN=f) # See below for way to make this "work"
Another useful function when you want to generate combinations is expand.grid and this would get you the "long form":
expGPA2 <-expand.grid(relGPA, avgGPA)
expGPA2$fn <- apply(expGPA2, 1, f)
The long form is what lattice and ggplot will expect as input format for higher level plotting.
EDIT: It may be necessary to construct a more specific method for passing column references to the function, as pointed out by djhurio and solved by Sam Swift with the Vectorize strategy. In the case of apply, the sum function would work out of the box as described above, but the division operator would not, so here is a further example that can be generalized to more complex functions with multiple arguments. All the programmer needs is the number of the column for the appropriate argument in the apply()-ed function, because (unfortunately) the column names are not carried through to the x argument:
> expGPA2$fn <- apply(expGPA2, 1, function(x) x[1]/x[2])
> str(expGPA2)
'data.frame': 48 obs. of 3 variables:
$ Var1: num -1.5 -1.3 -1.1 -0.9 -0.7 ...
$ Var2: num -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 ...
$ fn : num 0.75 0.65 0.55 0.45 0.35 ...
- attr(*, "out.attrs")=List of 2
..$ dim : int 16 3
..$ dimnames:List of 2
.. ..$ Var1: chr "Var1=-1.5" "Var1=-1.3" "Var1=-1.1" "Var1=-0.9" ...
.. ..$ Var2: chr "Var2=-2" "Var2= 0" "Var2= 2"
Edit2: (2013-01-05) Looking at this a year later, I realized that Sam Swift's function could be vectorized by making its body use "+" instead of sum:
1/(1+exp( relGPA*pred.model$coef[1] + avgGPA*pred.model$coef[2])) # all vectorized fns
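Since pred.model is not available here, the following sketch uses two made-up coefficients (b) in its place to show both routes to the 16 x 3 result: wrapping the sum() version in Vectorize(), and using the "+" version directly. The names f_sum, f_vec, m1 and m2 are illustrative only.

```r
relGPA <- seq(-1.5, 1.5, 0.2)
avgGPA <- c(-2, 0, 2)

# hypothetical coefficients standing in for pred.model$coef[1:2]
b <- c(0.8, -0.3)

# original form: sum() collapses its arguments to a single number
f_sum <- function(relGPA, avgGPA) 1 / (1 + exp(sum(relGPA * b[1], avgGPA * b[2])))

# vectorized form: "+" works element by element
f_vec <- function(relGPA, avgGPA) 1 / (1 + exp(relGPA * b[1] + avgGPA * b[2]))

# outer() needs a vectorized FUN, so wrap the sum() version in Vectorize()
m1 <- outer(relGPA, avgGPA, FUN = Vectorize(f_sum))
m2 <- outer(relGPA, avgGPA, FUN = f_vec)

dim(m2)            # 16 rows (relGPA) by 3 columns (avgGPA)
all.equal(m1, m2)

# data.frame with one column per avgGPA value
expGPA <- as.data.frame(m2)
names(expGPA) <- paste0("avgGPA_", avgGPA)
```

The "+" version avoids the per-element function calls that Vectorize() makes through mapply, which tends to matter only for much larger grids.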
