Standard Deviation Loop in R

I want to create a loop that takes the standard deviation of positions 1 through 3 in "y", then takes the standard deviation of positions 4 through 6, and so on.
Here is the code I came up with so far, but I am stuck because the index "i" jumps by three each time, so the results land at positions 1, 4, 7, ... of "sd.v" instead of filling it consecutively.
Here is a hypothetical dataset.
x <- rep(1:10, each = 3)
y <- rnorm(30, mean = 4, sd = 1)
data <- cbind(x, y)
sd.v <- NULL
for (i in c(1, 4, 7, 10)) {
  sd.v[i] <- sd(y[c(i, i + 1, i + 2)])
}
I am really more interested in writing this as a loop than in using apply, sapply, tapply, or something else.

If you really want a loop, here is an approach:
set.seed(42)
y <- rnorm(30, mean = 4, sd = 1)
sd.y <- numeric(0)
for (i in 1:10) {
  # block i covers positions (1 + (i - 1) * 3) through (3 + (i - 1) * 3)
  sd.y[i] <- sd(y[(1 + (i - 1) * 3):(3 + (i - 1) * 3)])
}
sd.y
# [1] 0.9681038 0.3783425 1.1031686 1.1799477 0.6867556 1.6987277
# [7] 1.8859794 1.4993717 1.2956209 1.1116502
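If you would rather loop over the start of each block than over the block number, here is an equivalent sketch (not from the original answer; it assumes the blocks always have exactly three elements):
# loop directly over the first position of each block of three
sd.y2 <- numeric(0)
k <- 0
for (start in seq(1, length(y), by = 3)) {
  k <- k + 1
  sd.y2[k] <- sd(y[start:(start + 2)])
}
sd.y2  # same values as sd.y above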

Related

Seeking an lapply-like function for a list of lists

I have a list in R which looks something like this:
b0 <- 5; b1 <- 2
f <- function(x) b0 + b1 * x
Nsim <- 100
my.list <- vector("list", Nsim)
for (i in 1:Nsim) {
  x <- rep(0, 1000)
  y <- x
  y[1] <- f(x[1])
  for (j in 2:1000) {
    x[j] <- x[j - 1] + rnorm(1, 0, 0.1)
    y[j] <- f(x[j])
  }
  my.list[[i]]$x <- x
  my.list[[i]]$y <- y
}
In reality, f is the result of an optimisation routine, x tracks the input value over time, and y holds the function values that are generated. So in essence, I have Nsim time series. I want to plot metrics of these time series over time by averaging over the index i. For instance, the average performance of the algorithm over time.
At the moment I'm doing this with a bespoke function for each metric I want to calculate (e.g. one for the mean squared error of x from the true value of x, another for generating error bars, and so on). I want to use something like lapply to average over i so I can visualise how x and y evolve over time, but that doesn't do the right thing.
What I want to output is a pointwise summary of the results. As an analogy, if my.list[[i]]$x were instead stored as a matrix, I could take colMeans() to see the average value of x over "time".
Is there a function/package which is good for working with lists of lists?
At least for what has been presented, there is no real reason to use a list of lists. Every series is observed at the same time points 1, 2, 3, ..., so the collection could be represented by a matrix, with the time component implicit or stored as row names, or by a ts or zoo object. In the last two cases, if X is the object, time(X) gives the common time index.
mat <- sapply(my.list, "[[", "y")  # 1000 x Nsim matrix, one column per simulation
ts(mat)                            # as a multivariate time series
library(zoo); zoo(mat)             # or as a zoo object
Alternately, get rid of my.list and construct one of these directly in the code.
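Either way, the pointwise summaries the question asks about become row-wise operations on mat. A minimal sketch, assuming my.list was built as above:
rowMeans(mat)      # average y at each time point, across simulations
apply(mat, 1, sd)  # pointwise standard deviation, e.g. for error bars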

Creating a function that determines the impact of an outlier

My big-picture goal is to demonstrate the difference outliers can make to a dataset's average. I'm trying to create a function that takes the size of an outlier "k" as input and outputs the average. Basically, the function needs to take any value "k" (the outlier) and return the average of vector x if the first value of x were replaced with k.
For example, say the dataset is the heights of a population of students. The first value is supposed to be 71.3 cm but the kid accidentally entered 713 cm. In this case, I want my function to tell me what the average of my vector would be if there were an outlier of value 713 (k = 713). So far I have the following, where x is the name of the dataset of heights.
average_err <- function(k) {
  x[1] <- k
  mean(x[1])
}
Then I calculate the average as if there were an outlier of 713:
average_err(713)
However, my output is always identical to my input. Will someone please help me?
I would suggest:
average_err <- function(x, k) {
  mean(c(x, k))
}
In the above, instead of replacing one of the x values with the outlier, you are adding the outlier to the existing x vector. As @SteveM suggested, the function also takes x as an argument:
x <- rnorm(25)
average_err(x, 100)
# [1] 3.627824
You could also build it to print both the mean of the original x, x with k and the difference:
average_err <- function(x, k) {
  m1 <- mean(x)        # mean of the original vector
  m2 <- mean(c(x, k))  # mean with the outlier appended
  d <- m2 - m1         # impact of the outlier
  out <- data.frame(mean = c(m1, m2, d))
  rownames(out) <- c("x", "x,k", "difference")
  out
}
average_err(x,100)
# mean
# x -0.2270631
# x,k 3.6278239
# difference 3.8548870
I'm not sure if I understand correctly, but I would rather replace mean(x[1]) with mean(x) in your case. If you write mean(x[1]), you take the average of one value only, the one you have replaced with the outlier k.
average_err <- function(k) {
  x[1] <- k  # x is found in the enclosing environment and modified as a local copy
  mean(x)
}
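Combining the two answers, here is a sketch (not from either original answer) that both takes x as an argument and replaces the first value:
average_err <- function(x, k) {
  x[1] <- k  # replace the first observation with the outlier
  mean(x)    # average over the whole vector
}
average_err(c(71.3, 65.0, 68.2), 713)  # hypothetical heights from the example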

Computing Spearman's rho for increasing subsets of rows in a for loop

I am trying to write a for loop in R that runs correlations for multiple subsets of a data frame and then stores the results in a vector.
What I have is a data frame with 2 columns, x and y, and 30 rows of different continuous measurement values in each column. The process should be repeated 100 times. The data can be invented.
What I need is to compute Spearman's rho for the first five rows (between x and y) and then for increasing subsets (e.g., the first six rows, the first seven rows, etc.). Then I'd need to store the rho results in a vector that I can use further.
What I had in mind (but does not work):
sortvector <- 1:30
for (i in 1:100) {
  sortvector <- sample(sortvector, replace = FALSE)
  xtemp <- x[sortvector]
  rho <- cor.test(xtemp, y, method = "spearman")$estimate
}
The problem is that the code gives me one value of rho for the whole data frame, but I need it for increasing subsets.
How can I get rho for subsets of increasing size in a for loop? And how can I store the coefficients in a vector that I can use afterwards?
Any help would be much appreciated, thanks.
Cheers
The easiest approach is to convert the for loop into an sapply call, which returns a vector of rho's as the result of your resampling:
sortvector <- 1:30
x <- rnorm(30)
y <- rnorm(30)
rho <- sapply(1:100, function(i) {
  sortvector <- sample(sortvector, replace = FALSE)  # permute the order of x
  xtemp <- x[sortvector]
  cor.test(xtemp, y, method = "spearman")$estimate
})
head(rho)
head(rho)
Output:
rho rho rho rho rho rho
0.014460512 -0.239599555 0.003337041 -0.126585095 0.007341491 0.264516129
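This handles the 100 repetitions; for the increasing-subsets part of the question, here is a minimal for-loop sketch (assuming x and y as above, with subsets starting at the first five rows):
rho.sub <- numeric(0)
for (n in 5:30) {
  # Spearman's rho between x and y over the first n rows
  rho.sub[n - 4] <- cor.test(x[1:n], y[1:n], method = "spearman")$estimate
}
rho.sub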

log- and z-transforming my data in R

I'm preparing my data for a PCA, for which I need to standardize it. I've been following someone else's code in vegan but am not getting a mean of zero and SD of 1, as I should be.
I'm using a data set called musci which has 13 variables, three of which are labels to identify my data.
log.musci <- log(musci[, 4:13], 10)
stand.musci <- decostand(log.musci, method = "standardize", MARGIN = 2)
When I then check for mean=0 and SD=1...
colMeans(stand.musci)
sapply(stand.musci,sd)
I get mean values ranging from -8.9 to 3.8 and SD values are just listed as NA (for every data point in my data set rather than for each variable). If I leave out the last variable in my standardization, i.e.
log.musci <- log(musci[, 4:12], 10)
the means don't change, but the SDs now all have a value of 1.
Any ideas of where I've gone wrong?
Cheers!
Your data is likely a matrix. sapply() on a matrix iterates over its individual elements rather than its columns, and sd() of a single value is NA, which is why you see an NA for every data point instead of one per variable.
## Sample data
dat <- as.matrix(data.frame(a=rnorm(100, 10, 4), b=rexp(100, 0.4)))
So, either convert to a data.frame and use sapply to operate on columns
dat <- data.frame(dat)
scaled <- sapply(dat, scale)
colMeans(scaled)
# a b
# -2.307095e-16 2.164935e-17
apply(scaled, 2, sd)
# a b
# 1 1
or use apply to do columnwise operations
scaled <- apply(dat, 2, scale)
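Alternatively (not part of the original answer), base R's scale() standardizes the columns of a matrix directly:
scaled <- scale(dat)  # centers and scales each column
colMeans(scaled)      # approximately 0
apply(scaled, 2, sd)  # 1 for each column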
A z-transformation is also easy to do manually.
See below, using a simple vector of data.
data <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
data
mean(data)
sd(data)
z <- (data - mean(data)) / sd(data)
z
all.equal(mean(z), 0)  # use all.equal rather than ==; mean(z) is only ~0 up to floating-point error
all.equal(sd(z), 1)
The logarithm transformation (assuming you mean a natural logarithm) is done using the log() function; your original code used base 10, which log() also supports via its base argument.
log(data)  # natural log; use log(data, 10) or log10(data) for base 10
Hope this helps!

Computing linear regressions for every possible permutation of matrix columns

I have a (k x n) matrix. I have initially managed to regress (using the lm function) column 1 on each and every other column, extracting only the coefficients.
fore.choose <- matrix(0, 1, NCOL(assets))
for (i in seq(1, NCOL(assets), 1)) {
  abc <- lm(assets[, 1] ~ assets[, i])$coefficients
  fore.choose[1, i] <- abc[2:length(abc)]
}
The coefficients are placed in the fore.choose matrix.
What I now need to do is regress column 2 on each and every other column, then column 3, and so on and so forth, extracting only the coefficients.
The output will be a square matrix of OLS univariate coefficients. Kind of similar to a correlation matrix, but it is the beta coefficients I am interested in.
fore.choose <- matrix(0, 1, NCOL(assets))
will initially need to become
fore.choose <- matrix(0, NCOL(assets), NCOL(assets))
I'd just compute the coefficients directly from the correlation matrix, using beta = cor(x, y) * sd(x) / sd(y) (the slope from regressing x on y), like this:
# set up some sample data
set.seed(1)
d <- matrix(rnorm(50), ncol = 5)
# get the coefficients: entry [i, j] is the slope from regressing column i on column j
s <- apply(d, 2, sd)
cor(d) * outer(s, s, "/")
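As a quick sanity check (a sketch, reusing d and s from above), the formula reproduces what lm() gives for any single pair:
coef(lm(d[, 1] ~ d[, 2]))[2]       # slope from lm
cor(d[, 1], d[, 2]) * s[1] / s[2]  # slope from the correlation formula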
You could also use lsfit to get the coefficients of one term on all the others at once, and then only have one loop to do:
sapply(1:ncol(d), function(i) {
  # regress every column of d on column i; row 2 of the coefficient matrix holds the slopes
  coef(lsfit(d[, i], d))[2, ]
})
I'm sure there must be a more elegant way than nested loops, but here is one:
fore.choose <- matrix(NA, NCOL(assets), NCOL(assets))
abc <- NULL
for (i in seq_len(ncol(assets))) {    # loop over "dependent" columns
  for (j in seq_len(ncol(assets))) {  # loop over "independent" columns
    abc <- lm(assets[, i] ~ assets[, j])$coefficients
    fore.choose[i, j] <- abc[-1]
  }
}
