For-Loop indexing issue in R [duplicate]

This question already has answers here:
calculating mean for every n values from a vector
(3 answers)
Closed 1 year ago.
I have 28 groups of 48 rows in an R data frame. I'm trying to take the standard deviation of each group. I used the following statements in RStudio:
stddev <- vector();
for (i in 1:28) { stddev[i] <- sd(in.subj[((i * 48) -47):(i * 48), 5]); }
When I check the values of stddev[] afterward, stddev[1] = NA. Likewise, when I check the standard deviations of individual groups, like sd(in.subj[49:96,5]), I get different values from the ones the for loop produced.
What would be the cause of these issues?
Thanks!

You can try:
tapply(in.subj[,5], gl(28,48), sd)
If there are NAs in your data:
tapply(in.subj[,5], gl(28,48), sd, na.rm=TRUE)
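To see how the grouping lines up with the loop's index arithmetic, here is a small self-contained sketch. The data frame is made up (random numbers in the same 28 x 48 layout), not the asker's data. gl(28, 48) builds a factor with 28 levels, each repeated 48 times, so rows 1 to 48 get level 1, rows 49 to 96 get level 2, and so on.

```r
# Hypothetical stand-in for in.subj: 28 groups of 48 rows, 5 numeric columns
set.seed(1)
in.subj <- data.frame(matrix(rnorm(28 * 48 * 5), ncol = 5))

# One factor level per block of 48 consecutive rows
groups <- gl(28, 48)

# Standard deviation of column 5 within each block
by_group <- tapply(in.subj[, 5], groups, sd)

# The same computation with the explicit loop from the question,
# using a preallocated numeric vector instead of vector()
loop_sd <- numeric(28)
for (i in 1:28) {
  loop_sd[i] <- sd(in.subj[((i * 48) - 47):(i * 48), 5])
}

# Both approaches should agree when the data contain no NAs
same <- isTRUE(all.equal(unname(as.vector(by_group)), loop_sd))
```

If the two disagree on real data, or the first group comes back NA, checking anyNA(in.subj[1:48, 5]) is a reasonable first step.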

Related

R Function to extract values greater than x [duplicate]

This question already has answers here:
How can i select values from a vector in R using logical operators?
(3 answers)
Closed 1 year ago.
I am running a simulation model and would like to know how to extract all values greater than 6 in this gamma distribution. Thank you!
cost <- 100
n_samp <- 1000
gamma <- rgamma(n_samp, 2, 0.5)
You can subset the vector gamma with a logical vector:
gt6_values = gamma[gamma > 6]
You can also use subset() from base R to get just the values greater than 6.
subset(gamma, gamma > 6)
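The two approaches behave differently when the vector contains NA, which may matter for other simulation output. A minimal sketch with a made-up vector x (not the gamma draws above):

```r
# Hypothetical vector with one missing value
x <- c(2, 7, NA, 9)

# Logical indexing keeps an NA wherever the comparison itself is NA
by_index <- x[x > 6]

# subset() treats NA in the condition as FALSE and drops it
by_subset <- subset(x, x > 6)

# which() returns only the positions where the condition is TRUE
by_which <- x[which(x > 6)]
```

Here by_index is 7 NA 9, while by_subset and by_which are both 7 9. Values drawn with rgamma() contain no NA, so for the question as asked all three give the same result.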

Why this conditional subsetting from a csv file returns incorrect answer in R? [duplicate]

This question already has answers here:
How to count TRUE values in a logical vector
(8 answers)
Closed 5 years ago.
Suppose I have the following data called D (9 columns, 395 rows):
D = read.csv("https://docs.google.com/uc?id=0B5V8AyEFBTmXQ1QwWVZuS3FXOHc&export=download")
In D, when I try to count the p.values that are less than or equal to .05, I get an erroneous answer:
length(D$p.value <= .05) # Returns "395", which is the total number of rows not those <= .05
What is the correct code to return the number of p.values that are less than or equal to .05 in D?
Try this:
sum(D$p.value <= .05)
I believe your problem is that you are counting the length of the comparison vector, which always has one element per row of the data frame, whether that element is TRUE or FALSE. Instead, my answer counts only the entries for which the inequality is actually true.
@RichScriven edit: Summing the comparison automatically coerces the logical values to numbers (TRUE to 1, FALSE to 0).
Note that if you take a sum of a vector containing even one NA value then the resulting sum will also be NA. One option would be to ignore those NA values by removing them via:
sum(D$p.value <= .05, na.rm = TRUE)
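A small sketch of the difference, using a made-up vector p in place of D$p.value (the real file is not reproduced here):

```r
# Hypothetical p-values, including one missing value
p <- c(0.01, 0.20, 0.04, NA, 0.50)

# length() counts every element of the comparison, TRUE or not
n_all <- length(p <= 0.05)

# sum() counts the TRUEs, but a single NA makes the result NA
n_sig_raw <- sum(p <= 0.05)

# na.rm = TRUE drops the NA before summing
n_sig <- sum(p <= 0.05, na.rm = TRUE)

# mean() gives the proportion of non-missing values that pass
prop_sig <- mean(p <= 0.05, na.rm = TRUE)
```

With this vector, n_all is 5, n_sig_raw is NA, n_sig is 2, and prop_sig is 0.5.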

How to calculate cumulative mean in R? [duplicate]

This question already has answers here:
Calculate cumulative average (mean)
(7 answers)
Closed 5 years ago.
(I am sorry if the term is not correct).
In R, I have a numeric vector x. I want to create a new vector y where:
y[i] = mean(x[1:i])
It is easy to write a function to calculate y, but is there a built-in function in R that does this?
Thank you very much
Try this
y <- cumsum(x) / seq_along(x)
Reference
https://stat.ethz.ch/pipermail/r-help/2008-May/162729.html
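As a quick check that this matches the definition in the question, here is a sketch with a made-up vector x. cumsum(x) gives the running total and seq_along(x) gives the running count, so their ratio is the running mean.

```r
# Hypothetical input vector
x <- c(2, 4, 6, 8)

# Running total divided by running count
y <- cumsum(x) / seq_along(x)

# The direct (slower) definition from the question: y[i] = mean(x[1:i])
y_check <- sapply(seq_along(x), function(i) mean(x[1:i]))

agree <- isTRUE(all.equal(y, y_check))
```

For this input y is 2 3 4 5. Note that a single NA in x makes every later element of y NA, because cumsum() propagates it.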

How to create quintile of variable in R [duplicate]

This question already has answers here:
split a vector by percentile
(5 answers)
Closed 6 years ago.
Is it possible to bin a variable into quintiles (fifths) using R, and select only the values that fall in the 5th bin?
As of now I am using the closest option, the upper quartile (.75), as I could not find a function for quintiles.
Any suggestions, please?
Not completely sure what you mean, but this divides a dataset into 5 equal groups based on value and subsequently selects the fifth group:
obs = rnorm(100)
qq = quantile(obs, probs = seq(0, 1, .2))
obs[obs >= qq[5]]
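If a bin label for every observation is wanted, rather than only the top group, one possible approach is to pass the quantile breaks to cut(). This is a sketch on simulated data; the names bin and top are made up for illustration.

```r
set.seed(42)
obs <- rnorm(100)

# Break points at 0%, 20%, 40%, 60%, 80%, 100%
qq <- quantile(obs, probs = seq(0, 1, 0.2))

# Label each observation 1..5; include.lowest keeps the minimum in bin 1
bin <- cut(obs, breaks = qq, include.lowest = TRUE, labels = FALSE)

# Values in the fifth (highest) quintile
top <- obs[bin == 5]

# Number of observations per bin
counts <- table(bin)
```

With continuous data and no ties this selects the same values as obs[obs >= qq[5]]. With heavily tied data the breaks can coincide, and cut() will then stop with an error about non-unique breaks.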

Calculate cumulative standard deviation [duplicate]

This question already has answers here:
Efficient calculation of matrix cumulative standard deviation in r
(2 answers)
Closed 9 years ago.
I'm trying to calculate the standard deviation of values in a time series, but I'd like to do it incrementally by advancing one day from the initial date value each time. I know there is a way to do this in R (probably using ddply?) that doesn't involve a nasty for-loop. Thanks for any help!
d <- seq(from = as.Date("2013-01-01"), to = as.Date("2013-02-01"), by = "day")
v <- rnorm(32, 10, 5)
test.df <- data.frame(the_date = d, value = v)
Here's the way I'm doing it now.
result <- c()
for(i in 2:nrow(test.df)){ result[i-1] <- sd(test.df[1:i,]$value)}
Use TTR::runSD with cumulative=TRUE.
library(TTR)
library(xts)  # xts() lives in the xts package, which library(TTR) may not attach
x <- xts(test.df[,2], test.df[,1])
runSD(x, n=1, cumulative=TRUE)
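If adding a package is not an option, a base R sketch of the same idea uses running sums and the identity var = (sum(x^2) - sum(x)^2 / n) / (n - 1). This is an alternative technique, not what runSD does internally, and the variable names below are made up for illustration.

```r
set.seed(1)
v <- rnorm(32, 10, 5)
n <- seq_along(v)

# Cumulative sample variance from running sums of v and v^2
cum_var <- (cumsum(v^2) - cumsum(v)^2 / n) / (n - 1)

# First element is NaN (0 / 0), since sd of a single value is undefined
cum_sd <- sqrt(cum_var)

# Reference: the explicit version, one sd() call per prefix
loop_sd <- sapply(n, function(i) sd(v[1:i]))

agree <- isTRUE(all.equal(cum_sd[-1], loop_sd[-1]))
```

The running-sum formula can lose precision when the mean is very large relative to the spread, so for such data the sapply() version or runSD may be the safer choice.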
