Say, I have two vectors of the same length:
A = mtcars$mpg
B = mtcars$cyl
I can calculate the correlation between the whole vectors with
cor(A, B)
and get a single value (-0.852162).
What I need is to calculate a rolling correlation between the two vectors with a window of 10: I start at the first datapoint in A and B, take the 5 values to its right (there are none to the left), calculate a correlation coefficient and write it into a vector C. Then I take the next value in A and B, take the 5 values to the right and the 1 to the left, and write the result into C; then I shift to the next value, and so forth. The resulting vector C must contain the same number of values as A or B (N = 32), and each value in C represents the correlation between A and B around that datapoint (5 values to the left and 5 to the right, where available).
Is there any elegant and simple way to do it in R?
P.S.: Ease of coding is more important than computation time.
The TTR package may provide what you are looking for.
It should be as simple as:
TTR::runCor(A, B)
There is a whole blog post about rolling correlation here.
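Note that runCor() uses a trailing window (with the default n = 10, the first 9 entries are NA), whereas the question asks for a centered window truncated at the edges. A minimal sketch of that centered version, assuming a half-window of 5 points on each side:

roll_cor <- function(a, b, half = 5) {
  n <- length(a)
  sapply(seq_len(n), function(i) {
    idx <- max(1, i - half):min(n, i + half)  # window, truncated at the edges
    cor(a[idx], b[idx])
  })
}

C <- roll_cor(mtcars$mpg, mtcars$cyl)
length(C)  # 32, same as A and B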
I am having some issues interpreting the results from prcomp().
Say I have a centered and scaled data.table called dat, with N columns and M rows, where every column represents a feature and every row a record. I also have an M-dimensional vector of outcomes Y.
I wanted to know what the PCA of this system says. So I just executed:
dat.pca <- prcomp(dat, retx = TRUE)
By the elbow method I decided to retain 5 PCA modes, accounting for 90% of the variance. I then built the following data.table:
dat.pcadata <- as.data.table(dat.pca$x)
dat.pcadata has M rows and N columns, and each column corresponds to a PCA mode.
My question is: do I understand correctly if I say that now my system should be trained to forecast the outcomes Y using the first 5 columns of dat.pcadata as features?
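In other words, a minimal sketch of what I have in mind, using a linear model purely as a placeholder learner:

summary(dat.pca)$importance[3, 1:5]  # cumulative proportion of variance for PC1..PC5
n.modes <- 5
train <- data.frame(dat.pcadata[, 1:n.modes, with = FALSE], Y = Y)
fit <- lm(Y ~ ., data = train)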
I have the following problem. Maybe you can help me!
I have 60 matrices (60 trials). Each of those matrices is 16 × 1000 fields (16 angles and 1000 timestamps). The 16 angles are body angles.
Now I want to calculate the Euclidean distance for each of the pairwise combinations of trials (1770 of them), so that I end up with 1770 distances, one per pair of 16 × 1000 matrices.
I get the list of all combinations with:
comb <- gtools::combinations(n = 60, r = 2, v = 1:60, set = TRUE, repeats.allowed = FALSE)
The formula which I want to apply to each of these combinations is:
dab <- sqrt(sum((a - b)^2))  # a and b are two matrices
I tried to wrap this in a function; my first attempt had the square outside the sum, which is fixed below:
dist.fun <- function(x, y)
{
  z <- sqrt(sum((x - y)^2))
  return(z)
}
Out of those distances I want to create a Euclidean distance matrix to do a cluster analysis:
plot(hclust(as.dist(m), method = "ward.D2"))  # m is the Euclidean distance matrix
I hope someone can help me with this problem. The data is biomechanical data from gymnasts, which I want to investigate in terms of variant and invariant components and prototypes.
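Putting it together, what I am after is something like this sketch, assuming my 60 trial matrices are stored in a list called trials (the name is hypothetical):

library(gtools)

dist.fun <- function(x, y) sqrt(sum((x - y)^2))  # Euclidean (Frobenius) distance

comb <- combinations(n = 60, r = 2)
d <- apply(comb, 1, function(p) dist.fun(trials[[p[1]]], trials[[p[2]]]))

m <- matrix(0, 60, 60)                # assemble the symmetric distance matrix
m[cbind(comb[, 2], comb[, 1])] <- d   # fill the lower triangle, as used by as.dist()
plot(hclust(as.dist(m), method = "ward.D2"))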
I'm trying to learn how to use R for statistics. How can I generate, 20,000 (K) times, a set of two samples, each with 50 points from the same normal distribution (mean 2.5 and variance 9)?
So far I know that this is how I make 50 points from a normal distribution:
rnorm(50, 2.5, 3)
But how do I generate such a set of two samples 20,000 times, so I can perform tests on the K pairs later?
x <- lapply(1:20000, function(i) {
  lapply(1:2, function(j) rnorm(50, 2.5, 3))
})
This produces 20,000 paired samples, where each sample is composed of 50 observations from a N(2.5, 3^2) distribution. Note that x is a list where each slot is a list of two vectors of length 50.
To t-test the samples, you'll need to extract the vectors and pass them to t.test():
t.tests <- lapply(x, function(y) t.test(x = y[[2]], y = y[[1]]))
Something along the lines of
yourresults <- replicate(20000, {yourtest(matrix(rnorm(100, 2.5, 3), ncol = 2), <...>)})
or
yourresults <- replicate(20000, {yourtest(rnorm(50, 2.5, 3), rnorm(50, 2.5, 3), <...>)})
where yourtest is whatever function carries out your test, and <...> stands for whatever other arguments you pass to yourtest. The first form is suitable if it expects a matrix with two columns; the second if it expects two vectors. You can adapt this approach to other forms of input, such as a formula interface, in the obvious way.
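For a concrete instance of the second pattern, assuming a plain two-sample t-test and keeping only the p-values:

p.values <- replicate(20000, t.test(rnorm(50, 2.5, 3), rnorm(50, 2.5, 3))$p.value)
mean(p.values < 0.05)  # empirical type-I error rate; should be close to 0.05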
I'm a novice R user who's learning to use this language to deal with data problems in research. I am trying to understand how knowledge evolves within an industry by looking at patenting in subclasses. So far I managed to get the following:
kn.matrices <- with(patents, table(Class, year, firm))
kn.ind <- with(patents, table(Class, year))
patents is my data file, with Subclass, app.yr, and short.name as three of the 14 columns (the code above refers to them as Class, year, and firm).
for (k in 1:37)
  assign(paste("firm", k, sep = ""), kn.matrices[, , k])
There are 37 different firms in the real dataset (only 5 in the example below).
This has given me 37 firm-specific and 1 industry-specific 2635-by-29 matrices (in the real dataset). All firm-specific matrices are called firmk, with k going from 1 to 37.
I would like to perform many operations on each of the firm-specific matrices (e.g. compare the numbers in app.yr t with the average of the 3 previous years, across all rows), so I am looking for a way to loop these operations over every matrix named firm1, firm2, ..., firm37 and to generate new matrices with consistent naming, e.g. firm1.3yearcomparison.
Hopefully I framed this question in an appropriate way. Any help would be greatly appreciated.
Following the comments, I'm adding a minimal reproducible example:
year <- c(1990, 1991, 1989, 1992, 1993, 1991, 1990, 1990, 1989, 1993, 1991, 1992, 1991, 1991, 1991, 1990, 1989, 1991, 1992, 1992, 1991, 1993)
firm <- c("a", "a", "a", "b", "b", "c", "d", "d", "e", "a", "b", "c", "c", "e", "a", "b", "b", "e", "e", "e", "d", "e")
class <- c(1900, 2000, 3000, 7710, 18000, 19000, 36000, 115000, 212000, 215000, 253600, 383000, 471000, 594000)
These three vectors represent columns of the spreadsheet that forms the "patents" data mentioned before.
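For completeness, a sketch of how the array from the question would be built from such columns, assuming the three vectors have equal length (as they do in the real data):

patents <- data.frame(Class = class, year = year, firm = firm)
kn.matrices <- with(patents, table(Class, year, firm))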
It looks like you already have a 3-dimensional array with all your data. You can basically view this as your 38 matrices piled one on top of the other. You don't want to split this into 38 matrices and use loops. Instead, you can use R's apply() and extraction functions. Have a look at the help for the apply() family; it should show you how to do what you want. Here are a few basic examples, with a fuller sketch of your 3-year comparison after them:
# returns the sums of all columns for all matrices
apply(kn.matrices, 3, colSums)
# extract the 5th row of all matrices
kn.matrices[5, , ]
# extract the 5th column of all matrices
kn.matrices[, 5, ]
# extract the 5th matrix
kn.matrices[, , 5]
# mean of 5th column for all matrices
colMeans(kn.matrices[, 5, ])
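And, assuming the year columns are in chronological order, a hedged sketch of the 3-year comparison from the question, done for all firms at once (the helper name three.year.comp is hypothetical):

# difference between each year's counts and the mean of the 3 previous years
three.year.comp <- function(mat) {
  yrs <- 4:ncol(mat)  # only years with 3 predecessors
  mat[, yrs] - sapply(yrs, function(t) rowMeans(mat[, (t - 3):(t - 1)]))
}

# apply over the firm dimension; one result matrix per firm
comparisons <- lapply(seq_len(dim(kn.matrices)[3]),
                      function(k) three.year.comp(kn.matrices[, , k]))
names(comparisons) <- dimnames(kn.matrices)[[3]]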