I'm sure this is very obvious, but I'm a beginner in R and I spent a good part of the afternoon trying to solve this...
I'm trying to create a loop to sum the observations in my time series in steps of five.
For example:
input:
1
2
3
4
5
5
6
6
7
4
5
5
4
4
5
6
5
6
4
4
output:
15
28
23
25
My time series has only one variable and 7825 observations.
The purpose of the loop is to calculate the weekly realized volatility. My observations are squared returns. Once I have my loop, I'll be able to take the square root of each sum and get my weekly realized volatility.
Thank you very much in advance for any help you can provide.
H.
We can create a grouping variable with gl and use it to get the sums with tapply:
tapply(input, as.integer(gl(length(input), 5, length(input))),
FUN = sum, na.rm = TRUE)
# 1 2 3 4
# 15 28 23 25
data
input <- scan(text = "1 2 3 4 5 5 6 6 7 4 5 5 4 4 5 6 5 6 4 4", what = numeric())
Here is another base R option using sapply + split
> sapply(split(x,ceiling(seq_along(x)/5)),sum)
1 2 3 4
15 28 23 25
Data
x <- c(1, 2, 3, 4, 5, 5, 6, 6, 7, 4, 5, 5, 4, 4, 5, 6, 5, 6, 4, 4)
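Since the series length in the question (7825) is an exact multiple of 5, the sums can also be computed without a grouping variable by reshaping the vector into a 5-row matrix. This is only a minimal sketch: the names sq_returns, window, weekly_sum and weekly_rv are made up for illustration, and it assumes the length is divisible by the window size.

```r
# Example input from the question (squared returns in the real data)
sq_returns <- c(1, 2, 3, 4, 5, 5, 6, 6, 7, 4, 5, 5, 4, 4, 5, 6, 5, 6, 4, 4)

# Assumption: the length is an exact multiple of the window size
window <- 5
stopifnot(length(sq_returns) %% window == 0)

# matrix() fills column by column, so each column holds one block
# of 5 consecutive observations
weekly_sum <- colSums(matrix(sq_returns, nrow = window))
weekly_sum
# [1] 15 28 23 25

# Weekly realized volatility: square root of each weekly sum
weekly_rv <- sqrt(weekly_sum)
```

If the length were not a multiple of 5, the tapply or split approaches above would be the safer choice, since they simply produce a shorter last group.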
Related
I have a dataframe with observations covering three years, with a column df$week that indicates the week of the observation. (The week count of the second year continues from the count of the first, so the data contains 207 weeks.)
I would like to divide the data into longer time periods, in a column df$period that groups all observations from several consecutive weeks.
If a period were three weeks long, and the data included 13 observations over six weeks, the idea would be to divide
weeks <- c(1, 1, 1, 2, 2, 3, 3, 4, 5, 5, 6, 6, 6)
into
periods <- list(c(1, 1, 1, 2, 2, 3, 3), c(4, 5, 5, 6, 6, 6))
periods
[[1]]
[1] 1 1 1 2 2 3 3
[[2]]
[1] 4 5 5 6 6 6
To look something like
> df
week period
1 1 1
2 1 1
3 1 1
4 2 1
5 2 1
6 3 1
7 3 1
8 4 2
9 5 2
10 5 2
11 6 2
12 6 2
13 6 2
>
The data contains more than 13k rows, so I would need some sort of map in the style of
mapPeriod <- function(df, fun) {
  # preallocate one result slot per column of df
  out <- vector("list", length(df))
  for (i in seq_along(df)) {
    out[[i]] <- fun(df[[i]])
  }
  out
}
I just don't know what to put in fun to divide the weeks into the chosen sequence of periods. Could the rep function help here? How?
I would be very grateful for all input and suggestions.
split(weeks, f = (weeks - 1) %/% 3)
$`0`
[1] 1 1 1 2 2 3 3
$`1`
[1] 4 5 5 6 6 6
from comments below
weeks <- c(1, 1, 1, 2, 2, 3, 3, 4, 5, 5, 6, 6, 6)
df <- data.frame(weeks)
library(data.table)
df$period <- data.table::rleid((weeks - 1) %/% 3)
# weeks period
# 1 1 1
# 2 1 1
# 3 1 1
# 4 2 1
# 5 2 1
# 6 3 1
# 7 3 1
# 8 4 2
# 9 5 2
# 10 5 2
# 11 6 2
# 12 6 2
# 13 6 2
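One detail worth noting: rleid numbers runs consecutively, so it matches the calendar period (weeks - 1) %/% 3 + 1 only when every period actually occurs in the data. The sketch below illustrates the difference in base R with made-up data; weeks_gap, period_len, period_fixed and period_run are hypothetical names, and the cumsum line reproduces the run numbering for a single vector without needing data.table.

```r
# Hypothetical data in which weeks 3 to 6 (all of period 2) are missing
weeks_gap <- c(1, 2, 7, 8)
period_len <- 3

# Fixed calendar periods: weeks 1-3 -> 1, weeks 4-6 -> 2, weeks 7-9 -> 3, ...
period_fixed <- (weeks_gap - 1) %/% period_len + 1
period_fixed
# [1] 1 1 3 3

# Consecutive run numbering: increment whenever the group value changes
g <- (weeks_gap - 1) %/% period_len
period_run <- cumsum(c(TRUE, diff(g) != 0))
period_run
# [1] 1 1 2 2
```

Which numbering is preferable depends on whether the period IDs should stay aligned with the calendar or simply count the groups present.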
Is there an efficient way to create an ID column using rep/seq or some other function I'm not thinking of to make a sequence such as the following:
1, 2, 3, 4, 4, 5, 5, 6, 6, 7, 7, 7, 8, 8, 8, 9, 9, 9, 10, 10, 10, 10.....
So after every 3 distinct numbers, each of the next 3 numbers is repeated one more time. My actual data will require a sequence that is:
1:1000- 1 each
1001:2000- 2 each
2001:3000 - 3 each
....
Any ideas/help would be greatly appreciated.
We can use
v2 <- 1:7000
rep(v2, as.integer(gl(length(v2), 1000, length(v2))))
For the first case
v1 <- 1:15
rep(v1, as.integer(gl(length(v1), 3, length(v1))))
[1] 1 2 3 4 4 5 5 6 6 7 7 7 8 8 8 9 9 9 10 10 10 10 11 11 11 11 12 12 12 12 13 13 13 13 13 14 14 14 14 14 15 15 15 15 15
We can use rep inside rep.int.
First case:
rep.int(1:12, rep(1:4, each = 3))
Second case:
rep.int(1:3e3, rep(1:3, each = 1e3))
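Both cases follow the same pattern, so they can be wrapped in a small function. This is a sketch only; make_ids and its arguments n_blocks and block_size are hypothetical names, not part of base R.

```r
# make_ids is a hypothetical helper: IDs in block j are each repeated j times
make_ids <- function(n_blocks, block_size) {
  ids <- seq_len(n_blocks * block_size)
  # repetition count for each ID: 1 for the first block, 2 for the second, ...
  times <- rep(seq_len(n_blocks), each = block_size)
  rep.int(ids, times)
}

make_ids(4, 3)
# 1 2 3 4 4 5 5 6 6 7 7 7 8 8 8 9 9 9 10 10 10 10 11 11 11 11 12 12 12 12
```

For the second case in the question, make_ids(3, 1000) would give the same result as the rep.int call above.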
I have two integer vectors named k and r. I would like to create a new vector containing every number that appears in either of them, with each number listed only once.
See example below:
k:
1 4 8 9 10
r:
1 10 4 12 14
The desired result:
1 4 8 9 10 12 14
You are looking for union(k, r).
We can use unique with sort
sort(unique(c(r, k)))
#[1] 1 4 8 9 10 12 14
data
k <- c(1, 4, 8, 9, 10)
r <- c(1, 10, 4, 12, 14)
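The two answers agree on this data, but they can differ in ordering: union keeps elements in order of first appearance, while sort(unique(...)) always returns them ascending. A short sketch with the vectors from the question:

```r
k <- c(1, 4, 8, 9, 10)
r <- c(1, 10, 4, 12, 14)

# union() keeps first-appearance order, so argument order matters
union(k, r)
# [1]  1  4  8  9 10 12 14
union(r, k)
# [1]  1 10  4 12 14  8  9

# Sorting makes the result independent of argument order
sort(union(r, k))
# [1]  1  4  8  9 10 12 14
```

Here union(k, r) happens to come out sorted only because k is already ascending and the extra values from r (12, 14) are larger than everything in k.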
I want to create the following vector with R commands:
(4, 6, 3, 4, 6, 3, ..., 4, 6, 3, 4, 6) where there are 10 occurrences of 4, 10 occurrences of 6, and 9 occurrences of 3.
Try rep and its length.out argument
x <- rep(c(4, 6, 3), length.out = 29)
x
#[1] 4 6 3 4 6 3 4 6 3 4 6 3 4 6 3 4 6 3 4 6 3 4 6 3 4 6 3 4 6
Count the occurrences of each element
table(x)
#x
# 3 4 6
# 9 10 10
You could also use rep_len as suggested by @snoram
rep_len(c(4, 6, 3), 29)
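For contrast, the sketch below shows why length.out fits this question better than passing the counts to times: a vector of counts produces the right totals but groups the values instead of cycling through them. The names cycled and grouped are made up for illustration.

```r
# Cycle through the pattern and stop after 29 elements
cycled <- rep(c(4, 6, 3), length.out = 29)

# Same result: 10 full cycles with the last element dropped
identical(cycled, head(rep(c(4, 6, 3), times = 10), -1))
# [1] TRUE

# Counts given via times: same totals, but values are grouped, not interleaved
grouped <- rep(c(4, 6, 3), times = c(10, 10, 9))
identical(sort(cycled), sort(grouped))
# [1] TRUE
identical(cycled, grouped)
# [1] FALSE
```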
We have a problem removing two outliers from our dataset. The data comes from an experiment with two independent variables and one dependent variable. We ran a multiple regression and examined the "Normal Q-Q" plot, which showed two outliers (cases 10 and 46). We would now like to remove those two cases and rerun the multiple regression without them.
We have already tried various commands recommended on several R sites, but unfortunately nothing worked.
We would be glad if anyone had an idea that would help us solve our problem.
Thank you very much for your help.
Since no data was provided, I fabricated some:
> x <- data.frame(a = c(10, 12, 14, 6, 10, 8, 11, 9), b = c(1, 2, 3, 24, 4, 1, 2, 4),
c = c(2, 1, 3, 6, 3, 4, 2, 48))
> x
a b c
1 10 1 2
2 12 2 1
3 14 3 3
4 6 24 6
5 10 4 3
6 8 1 4
7 11 2 2
8 9 4 48
If the 4th case in column x$b and the 8th case in column x$c are outliers:
> x1 <- x[-c(4, 8), ]
> x1
a b c
1 10 1 2
2 12 2 1
3 14 3 3
5 10 4 3
6 8 1 4
7 11 2 2
Is this what you need?
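To rerun the regression without those rows, the reduced data frame can be passed to lm, or the rows can be excluded through lm's subset argument. The sketch below reuses the fabricated data and assumes, purely for illustration, that a is the dependent variable and b and c are the independent ones; fit_all, fit_clean and fit_subset are made-up names. It also assumes default row names, so that the case labels on the diagnostic plot correspond to row positions.

```r
x <- data.frame(a = c(10, 12, 14, 6, 10, 8, 11, 9),
                b = c(1, 2, 3, 24, 4, 1, 2, 4),
                c = c(2, 1, 3, 6, 3, 4, 2, 48))

# Original fit on all rows (assumed model: a explained by b and c)
fit_all <- lm(a ~ b + c, data = x)

# Option 1: drop the rows first, then refit
fit_clean <- lm(a ~ b + c, data = x[-c(4, 8), ])

# Option 2: keep the data intact and exclude the rows via subset
fit_subset <- lm(a ~ b + c, data = x, subset = -c(4, 8))

nobs(fit_clean)
# [1] 6
```

For the dataset in the question, the same pattern would use -c(10, 46) in place of -c(4, 8).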