Create special vectors with R commands [duplicate] - r

I want to create the following vector with R commands:
(4, 6, 3, 4, 6, 3, ..., 4, 6, 3, 4, 6) where there are 10 occurrences of 4, 10 occurrences of 6, and 9 occurrences of 3.

Try rep and its length.out argument
x <- rep(c(4, 6, 3), length.out = 29)
x
#[1] 4 6 3 4 6 3 4 6 3 4 6 3 4 6 3 4 6 3 4 6 3 4 6 3 4 6 3 4 6
Count the occurrences of each element
table(x)
#x
# 3 4 6
# 9 10 10
You could also use rep_len, as suggested by @snoram
rep_len(c(4, 6, 3), 29)
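As a quick sanity check, the two calls can be compared directly. This is only a sketch; the names a and b are illustrative and do not come from the answer above.

```r
# Both calls recycle the pattern c(4, 6, 3) until 29 elements are produced.
a <- rep(c(4, 6, 3), length.out = 29)
b <- rep_len(c(4, 6, 3), 29)

identical(a, b)  # TRUE: same values in the same order
table(a)         # 9 threes, 10 fours, 10 sixes
```

The length 29 comes from the counts in the question: 10 + 10 + 9.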

Related

Divide data in to chunks with multiple values in each chunk in R

I have a data frame with observations spanning three years, with a column df$week that indicates the week of each observation. (The week count of the second year continues from the count of the first, so the data contains 207 weeks.)
I would like to group the data into longer time periods, stored in df$period, where each period covers all observations from several weeks.
If a period were three weeks long and the data included 13 observations over six weeks, the idea would be to divide
weeks <- c(1, 1, 1, 2, 2, 3, 3, 4, 5, 5, 6, 6, 6)
into
periods <- list(c(1, 1, 1, 2, 2, 3, 3), c(4, 5, 5, 6, 6, 6))
periods
[[1]]
[1] 1 1 1 2 2 3 3
[[2]]
[1] 4 5 5 6 6 6
To look something like
> df
week period
1 1 1
2 1 1
3 1 1
4 2 1
5 2 1
6 3 1
7 3 1
8 4 2
9 5 2
10 5 2
11 6 2
12 6 2
13 6 2
>
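The data contains more than 13,000 rows, so I would need some sort of map function along the lines of
mapPeriod <- function(df, fun) {
  # Pre-allocate one list slot per column of df
  out <- vector("list", length(df))
  for (i in seq_along(df)) {
    # Apply fun to the i-th column and store the result
    out[[i]] <- fun(df[[i]])
  }
  out
}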
I just don't know what to include in the fun to divide the weeks to the decided sequences of periods. Can function rep be of assistance here? How?
I would be very grateful for all input and suggestions.
split(weeks, f = (weeks - 1) %/% 3)
$`0`
[1] 1 1 1 2 2 3 3
$`1`
[1] 4 5 5 6 6 6
from comments below
weeks <- c(1, 1, 1, 2, 2, 3, 3, 4, 5, 5, 6, 6, 6)
df <- data.frame(weeks)
library(data.table)
df$period <- data.table::rleid((weeks - 1) %/% 3)
# weeks period
# 1 1 1
# 2 1 1
# 3 1 1
# 4 2 1
# 5 2 1
# 6 3 1
# 7 3 1
# 8 4 2
# 9 5 2
# 10 5 2
# 11 6 2
# 12 6 2
# 13 6 2
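If data.table is not available, the same integer-division idea can be written in base R. The sketch below assumes the period number should be derived from the week number alone; period_length is a hypothetical parameter name, not something from the question.

```r
weeks <- c(1, 1, 1, 2, 2, 3, 3, 4, 5, 5, 6, 6, 6)
period_length <- 3  # hypothetical parameter: number of weeks per period

df <- data.frame(week = weeks)
# Weeks 1-3 map to period 1, weeks 4-6 to period 2, and so on.
df$period <- (df$week - 1) %/% period_length + 1
df
```

One difference from rleid is worth keeping in mind: if a whole period has no observations, this version skips that period number, whereas rleid numbers the groups that are present consecutively.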

Loop to sum observation of a time serie in R

I'm sure this is very obvious, but I'm a beginner in R and I spent a good part of the afternoon trying to solve this...
I'm trying to create a loop that sums the observations in my time series in steps of five.
for example :
input:
1
2
3
4
5
5
6
6
7
4
5
5
4
4
5
6
5
6
4
4
output:
15
28
23
25
My time series has only one variable and 7825 observations.
The purpose of the loop is to calculate the weekly realized volatility. My observations are squared returns. Once I have the loop, I'll be able to take the square root of each sum to get the weekly realized volatility.
Thank you very much in advance for any help you can provide.
H.
We can create a grouping variable with gl and use that to get the sum in tapply
tapply(input, as.integer(gl(length(input), 5, length(input))),
FUN = sum, na.rm = TRUE)
# 1 2 3 4
# 15 28 23 25
data
input <- scan(text = "1 2 3 4 5 5 6 6 7 4 5 5 4 4 5 6 5 6 4 4", what = numeric())
Here is another base R option using sapply + split
> sapply(split(x,ceiling(seq_along(x)/5)),sum)
1 2 3 4
15 28 23 25
Data
x <- c(1, 2, 3, 4, 5, 5, 6, 6, 7, 4, 5, 5, 4, 4, 5, 6, 5, 6, 4, 4)
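For the stated goal (weekly realized volatility from squared returns), the block sums can also be taken column-wise on a matrix and then square-rooted. This is a sketch rather than part of either answer; block, padded, weekly_sum and weekly_vol are illustrative names, and the NA padding only matters when the series length is not a multiple of five.

```r
x <- c(1, 2, 3, 4, 5, 5, 6, 6, 7, 4, 5, 5, 4, 4, 5, 6, 5, 6, 4, 4)
block <- 5  # hypothetical: observations per week

# Pad with NA so the length is a multiple of `block`,
# then treat each column of the matrix as one week.
n_blocks <- ceiling(length(x) / block)
padded <- c(x, rep(NA, n_blocks * block - length(x)))
weekly_sum <- colSums(matrix(padded, nrow = block), na.rm = TRUE)

# Realized volatility is the square root of the summed squared returns.
weekly_vol <- sqrt(weekly_sum)
weekly_sum  # 15 28 23 25
```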

How do you efficiently return the order of an increasing index? [duplicate]

I have the following index vector:
TestVec = rep(c(6,8,9,11,18), each = 10)
This reads c(6, 6, ..., 6, 8, 8, ..., 8, 9, 9, ..., 9, ...).
I would like to convert this vector into c(1, 1, ..., 1, 2, 2, ..., 2, 3, 3, ..., 3, ...)
My attempt: I have improvised a quick-and-dirty method, as follows:
sapply(TestVec, function(x) {which(x == unique(TestVec))})
This works fine, but it takes a lot of time on a large dataset.
Is there a more efficient way to do this?
match(TestVec, unique(TestVec))
Another option:
as.numeric(as.factor(TestVec))
# [1] 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 5 5 5 5 5 5 5 5 5 5
With data.table (the package must be installed):
data.table::rleid(TestVec)
Here is another one,
c(1, cumsum(diff(TestVec) != 0) + 1)
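These approaches agree on TestVec because it is sorted and each value forms a single run. On other inputs they can differ, which the following sketch illustrates with a made-up vector v (not from the question); the rle line is a base R stand-in for run-based numbering.

```r
v <- c(8, 8, 6, 6, 8)  # hypothetical unsorted input with a repeated value

# Numbered by order of first appearance
by_first <- match(v, unique(v))                           # 1 1 2 2 1
# Numbered by sorted value (factor levels are sorted)
by_sorted <- as.numeric(as.factor(v))                     # 2 2 1 1 2
# Numbered by consecutive run, so the second run of 8 gets a new id
by_run <- with(rle(v), rep(seq_along(lengths), lengths))  # 1 1 2 2 3
```

Which one is appropriate depends on whether the ids should follow first appearance, sorted value, or consecutive runs.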

How do I duplicate values in R? [duplicate]

Assume I have the following vector:
v1 <- c(1, 2, 3, 4, 5)
If I wanted to expand this vector so that there are 50 1 values, 50 2 values, etc., how would I do this?
Please let me know if you need any clarification.
Have a look at this:
v1 <- c(1, 2, 3, 4, 5)
rep(v1, 2)
# [1] 1 2 3 4 5 1 2 3 4 5
Or with each (after @Rui's comment):
rep(v1, each = 2)
# [1] 1 1 2 2 3 3 4 4 5 5
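Applied to the numbers in the question (50 copies of each value), the each argument would be used as in the sketch below; v50 is just an illustrative name.

```r
v1 <- c(1, 2, 3, 4, 5)

# each = 50 repeats every element 50 times before moving to the next one.
v50 <- rep(v1, each = 50)

length(v50)  # 250
table(v50)   # every value appears 50 times
```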

For loop that only counts unique values [closed]

My data frame consists of these columns: A_NUMBER, B_NUMBER, DURATION. I would like to count how many different B_NUMBERs each A_NUMBER calls (to see how big their network is).
I first created a new column with all values set equal to 0.
df$CFU <- rep (0,nrow(df))
Next, I tried the following for loop:
for (j in 1:nrow(df)){ for (i in 1:nrow(unique(df$B_NUMBER))){
if(df$A_NUMBER[i] == df$A_NUMBER[j]) {df$CFU[j] <- sum(df$CFU[j],1) }}}
Then I get the following error:
Error in 1:nrow(unique(df$B_NUMBER)) : argument of length 0
How should I solve this?
As I understand your question, you are looking for a list of the unique B_NUMBERs for each A_NUMBER.
A_NUMBER = round(runif(100,0,10))
B_NUMBER = round(runif(100,0,10))
df = cbind(A_NUMBER, B_NUMBER)
aggregate(B_NUMBER ~ A_NUMBER, data=df, unique)
A_NUMBER B_NUMBER
1 0 10, 8
2 1 9, 3, 1, 7, 8, 0
3 2 7, 0, 6, 1, 9, 2, 10
4 3 7, 3, 6, 8, 4, 5
5 4 7, 9, 3, 10, 4, 8, 1, 2, 5
6 5 6, 5, 2, 8
7 6 4, 8, 9, 6, 10, 3
8 7 7, 3, 6, 0, 4, 1, 9, 8
9 8 7, 9, 8, 5, 2
10 9 8, 6, 2, 9, 0, 4, 1
11 10 7
and then you can call the length of the vectors as
aggregate(B_NUMBER ~ A_NUMBER, data=df, function(x) length(unique(x)))
A_NUMBER B_NUMBER
1 0 2
2 1 6
3 2 7
4 3 6
5 4 9
6 5 4
7 6 6
8 7 8
9 8 5
10 9 7
11 10 1
and check whether it was correct by
subset(df,A_NUMBER == 8)
A_NUMBER B_NUMBER
[1,] 8 7
[2,] 8 9
[3,] 8 7
[4,] 8 8
[5,] 8 5
[6,] 8 7
[7,] 8 2
[8,] 8 2
[9,] 8 8
Looks good, only 7s, 9s, 8s, 5s and 2s!
Because you did not provide example data, it is difficult to examine further what happened in your for loop. But based on the error message, it is clear that 1:nrow(unique(df$B_NUMBER)) is not working. The function unique returns a vector here, which is one-dimensional. If you pass this vector to nrow, it returns NULL, and 1:NULL fails with "argument of length 0". What you probably need in this case is length, not nrow.
By the way, df$CFU <- rep(0, nrow(df)) can be simplified to df$CFU <- 0
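To make the nrow versus length point concrete, and to show one loop-free way of filling a CFU column, here is a small sketch on made-up call records. The data values are hypothetical, and using ave is an assumption about the desired result (the count repeated on every row of the same caller), not something stated in the question.

```r
# Hypothetical call records
df <- data.frame(
  A_NUMBER = c(1, 1, 1, 2, 2, 3),
  B_NUMBER = c(7, 8, 7, 9, 9, 7)
)

# unique() on a column returns a plain vector, which has no rows.
is.null(nrow(unique(df$B_NUMBER)))  # TRUE
length(unique(df$B_NUMBER))         # 3

# Number of distinct B_NUMBERs per A_NUMBER, repeated on each row of that caller.
df$CFU <- ave(df$B_NUMBER, df$A_NUMBER, FUN = function(x) length(unique(x)))
df$CFU  # 2 2 2 1 1 1
```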
