Keeping all data around "rollmean" output - r

I recently found out that rollmean gives me the moving average around each value in my matrix. The problem is that the matrix shrinks and I also lose the row names when the function is executed. For example, take the matrix MA.Test, which holds quantities per day in the rows (A = Mon, B = Tues, etc.):
> MA.Test
a b c d e f g h i j k l m n o p q r s t
A 49 21 6 27 34 49 21 6 27 34 49 21 6 27 34 49 21 6 27 34
B 35 23 37 47 45 35 23 37 47 45 35 23 37 47 45 35 23 37 47 45
C 40 0 20 10 19 40 0 20 10 19 40 0 20 10 19 40 0 20 10 19
D 8 46 22 3 28 8 46 22 3 28 8 46 22 3 28 8 46 22 3 28
E 30 7 1 42 39 30 7 1 42 39 30 7 1 42 39 30 7 1 42 39
F 9 16 32 14 33 9 16 32 14 33 9 16 32 14 33 9 16 32 14 33
G 48 5 13 15 11 48 5 13 15 11 48 5 13 15 11 48 5 13 15 11
H 12 38 36 18 24 12 38 36 18 24 12 38 36 18 24 12 38 36 18 24
I 43 26 17 44 25 43 26 17 44 25 43 26 17 44 25 43 26 17 44 25
J 41 2 29 31 4 41 2 29 31 4 41 2 29 31 4 41 2 29 31 4
When I apply the function for an average covering 3 days on each side (a window of 7, including the day itself), I use rollmean(MA.Test,7), label the result MA.Test.1, and get the following:
> MA.Test.1 = rollmean(MA.Test,7)
> MA.Test.1
a b c d e f g h i j k l m n o p q r s t
[1,] 31 17 19 23 30 31 17 19 23 30 31 17 19 23 30 31 17 19 23 30
[2,] 26 19 23 21 28 26 19 23 21 28 26 19 23 21 28 26 19 23 21 28
[3,] 27 20 20 21 26 27 20 20 21 26 27 20 20 21 26 27 20 20 21 26
[4,] 27 20 21 24 23 27 20 21 24 23 27 20 21 24 23 27 20 21 24 23
My query is twofold:
I am aware the output begins with the moving average around row D and ends at row G, because rows A/B/C and H/I/J have insufficient surrounding data; how would I still KEEP these rows in the output, simply filled with "NA"?
I am losing the row names. That is simple enough to repair for this small example, but my real data set contains 100+ rows and the row names are dates; how would I keep the original row names in the output?
My desired final output would look as such:
> MA.Test.1 = rollmean(MA.Test,7)
> MA.Test.1
a b c d e f g h i j k l m n o p q r s t
A NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
B NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
C NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
D 31 17 19 23 30 31 17 19 23 30 31 17 19 23 30 31 17 19 23 30
E 26 19 23 21 28 26 19 23 21 28 26 19 23 21 28 26 19 23 21 28
F 27 20 20 21 26 27 20 20 21 26 27 20 20 21 26 27 20 20 21 26
G 27 20 21 24 23 27 20 21 24 23 27 20 21 24 23 27 20 21 24 23
H NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
I NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
J NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Thank you kindly for any solutions offered!

Use fill=NA to pad with NA. Then you can set the rownames of the result to that of the input.
MA.Test.1 <- rollmean(MA.Test,7,fill=NA)
rownames(MA.Test.1) <- rownames(MA.Test)
But if your actual data have Dates as row names, then you could just use zoo (or xts).
library(xts)  # also attaches zoo
ma <- MA.Test
rownames(ma) <- as.character(Sys.Date()-9:0)  # row names are stored as character strings
# zoo
z <- zoo(ma, as.Date(rownames(ma)))
z1 <- rollmean(z, 7, fill=NA)
# xts
x <- as.xts(ma)
x1 <- rollmean(x, 7, fill=NA)
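To see what the NA padding amounts to without zoo, here is a minimal base R sketch. It is not zoo's implementation: it swaps in stats::filter with equal weights, the helper name centered_ma is hypothetical, and the matrix m is made-up example data.

```r
# Hypothetical helper: centered moving average of width k for each column,
# with NA wherever the window does not fit (base R only, not zoo's rollmean).
centered_ma <- function(m, k) {
  out <- apply(m, 2, function(col) {
    as.numeric(stats::filter(col, rep(1 / k, k), sides = 2))
  })
  dimnames(out) <- dimnames(m)  # keep the original row and column names
  out
}

# Made-up example data: 5 rows, 2 columns
m <- matrix(c(1, 2, 3, 4, 5,
              10, 20, 30, 40, 50),
            ncol = 2, dimnames = list(LETTERS[1:5], c("a", "b")))

res <- centered_ma(m, 3)
res
```

With a window of 3, only the first and last rows lack a full window, so rows A and E come back as NA while the row names are preserved.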

Related

Generating a vector with n repetitions of x, then y, then z, with a fixed upper bound

I am trying to create a vector where I have 3 repetitions of the number 1, then 3 repetitions of the number 2, and so on up to, for instance, 3 repetitions of the number 36.
c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5...)
I have tried the following use of rep() but got the following error:
Error in rep(3, seq(1:36)) : argument 'times' incorrect
What formulation do I need to use to properly generate the vector I want?
sort(rep(1:36, 3))
Or even better, as @Wimpel mentioned in the comments, use the each argument of the rep function.
rep(1:36, each = 3)
output
# [1] 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5 6 6 6 7 7 7 8 8 8 9 9 9 10 10 10 11 11 11 12 12 12 13 13 13 14 14 14 15 15 15 16 16 16 17 17 17 18 18 18 19 19 19 20 20 20 21 21 21 22
# [65] 22 22 23 23 23 24 24 24 25 25 25 26 26 26 27 27 27 28 28 28 29 29 29 30 30 30 31 31 31 32 32 32 33 33 33 34 34 34 35 35 35 36 36 36
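The failing call rep(3, seq(1:36)) passes a single value as x and a vector of length 36 as times, whereas times has to be either a single number or a vector with one entry per element of x. A short sketch of how each and times differ (the variable names are just for illustration):

```r
# each: repeat every element 3 times before moving on to the next one
v_each <- rep(1:36, each = 3)

# times with one count per element of x gives the same result
v_times <- rep(1:36, times = rep(3, 36))

# times with a single number repeats the whole sequence instead
v_cycle <- rep(1:3, times = 2)

head(v_each, 9)
```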
This should work, although it is probably not the most elegant approach.
reps = c()
n = 36
for(i in 1:n){
reps = append(reps, rep(i, 3))
}
reps
Alternatively, use the rep function properly (see the documentation, ?rep, for the argument each):
rep(1:36,each = 3)
The rep approach is preferable (see the existing answers).
Here are some other options:
> kronecker(1:36, rep(1, 3))
[1] 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5 6 6 6 7 7 7 8 8 8 9
[26] 9 9 10 10 10 11 11 11 12 12 12 13 13 13 14 14 14 15 15 15 16 16 16 17 17
[51] 17 18 18 18 19 19 19 20 20 20 21 21 21 22 22 22 23 23 23 24 24 24 25 25 25
[76] 26 26 26 27 27 27 28 28 28 29 29 29 30 30 30 31 31 31 32 32 32 33 33 33 34
[101] 34 34 35 35 35 36 36 36
> c(outer(rep(1, 3), 1:36))
[1] 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5 6 6 6 7 7 7 8 8 8 9
[26] 9 9 10 10 10 11 11 11 12 12 12 13 13 13 14 14 14 15 15 15 16 16 16 17 17
[51] 17 18 18 18 19 19 19 20 20 20 21 21 21 22 22 22 23 23 23 24 24 24 25 25 25
[76] 26 26 26 27 27 27 28 28 28 29 29 29 30 30 30 31 31 31 32 32 32 33 33 33 34
[101] 34 34 35 35 35 36 36 36

Changing multiple column names with a vector

I have tried to change column names based on a vector as follows:
library(data.table)
df <- fread(
"radio1 radio2 radio3 radio4 radio5 radio6 radio7
8 12 18 32 40 36 32
6 12 18 24 30 36 30
8 16 18 24 30 36 18
4 12 12 24 30 36 24
6 16 24 32 40 48 24
8 12 18 24 30 36 30
8 12 18 24 30 36 18
8 16 24 32 40 48 40
8 16 24 24 30 48 48",
header = TRUE
)
var <- c("radio1","radio2","radio3","radio4","radio5", "radio6", "radio7")
recode <- c("A","B","C","D","E", "F", "G")
variables <- cbind(var, recode)
variables <- as.data.table(variables)
for (i in seq_len(ncol(df))) {
colnames(df[[i]]) <- variables$recode[match(names(df)[i], variables $var)]
}
However, I get the error:
Error in `colnames<-`(`*tmp*`, value = variables$recode[match(names(df)[i], :
attempt to set 'colnames' on an object with less than two dimensions
What am I doing wrong? Is there a better way to do this?
You can use match directly.
names(df) <- variables$recode[match(names(df), variables$var)]
df
# A B C D E F G
#1: 8 12 18 32 40 36 32
#2: 6 12 18 24 30 36 30
#3: 8 16 18 24 30 36 18
#4: 4 12 12 24 30 36 24
#5: 6 16 24 32 40 48 24
#6: 8 12 18 24 30 36 30
#7: 8 12 18 24 30 36 18
#8: 8 16 24 32 40 48 40
#9: 8 16 24 24 30 48 48
By changing colnames(df[[i]]) to colnames(df)[i], the loop works fine:
for (i in seq_len(ncol(df))) {
  colnames(df)[i] <- variables$recode[match(names(df)[i], variables$var)]
}
> df
A B C D E F G
1: 8 12 18 32 40 36 32
2: 6 12 18 24 30 36 30
3: 8 16 18 24 30 36 18
4: 4 12 12 24 30 36 24
5: 6 16 24 32 40 48 24
6: 8 12 18 24 30 36 30
7: 8 12 18 24 30 36 18
8: 8 16 24 32 40 48 40
9: 8 16 24 24 30 48 48
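The original loop fails because df[[i]] extracts a single column as a plain vector, which has no dimensions and therefore no column names to set. As a further option, a named lookup vector does the renaming in one step. The sketch below uses base R on a small made-up data frame rather than the data.table from the question; with data.table, setnames(df, old, new) is the usual by-reference tool.

```r
# Small made-up data frame standing in for the question's table
df <- data.frame(radio1 = c(8, 6), radio2 = c(12, 12), radio3 = c(18, 18))

# Named vector: names are the old column names, values are the new ones
lookup <- c(radio1 = "A", radio2 = "B", radio3 = "C")

# Index the lookup by the current names; unname() drops the old labels
names(df) <- unname(lookup[names(df)])

names(df)
```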

Build the dataframe with conditional vector in R

I have two vectors
a <- c(18,19,19,19,21,21,22,23,24,25,26,27,28,30,31,35,36,37)
b <- c(19,25,31,37)
I need a data frame in the following format:
a b
18 19
19 19
19 19
19 19
21 25
21 25
22 25
23 25
24 25
25 25
26 31
27 31
28 31
30 31
31 31
35 37
36 37
37 37
Here the value 19 in vector b is repeated up to the value 19 in vector a.
After that, 21 (in a) is greater than 19, so the next value in b, 25, is repeated until 25 is reached in a.
The rest of the data frame is constructed in the same way.
Thank you.
We can get the position index from findInterval and use that to create the times argument for rep:
i1 <- findInterval(b, a)
data.frame(a, b = rep(b, c(i1[1], diff(i1))))
# a b
#1 18 19
#2 19 19
#3 19 19
#4 19 19
#5 21 25
#6 21 25
#7 22 25
#8 23 25
#9 24 25
#10 25 25
#11 26 31
#12 27 31
#13 28 31
#14 30 31
#15 31 31
#16 35 37
#17 36 37
#18 37 37
Alternatively,
data.frame(a, b = sapply(a, function(x) b[x <= b][1]))
# a b
# 1 18 19
# 2 19 19
# 3 19 19
# 4 19 19
# 5 21 25
# 6 21 25
# 7 22 25
# 8 23 25
# 9 24 25
# 10 25 25
# 11 26 31
# 12 27 31
# 13 28 31
# 14 30 31
# 15 31 31
# 16 35 37
# 17 36 37
# 18 37 37
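To make the findInterval answer easier to follow, this sketch spells out its intermediate values and adds a variant that looks up each element of a in b directly. It assumes both vectors are sorted and that no value of a exceeds max(b); the names times, res1 and res2 are only for illustration.

```r
a <- c(18,19,19,19,21,21,22,23,24,25,26,27,28,30,31,35,36,37)
b <- c(19,25,31,37)

# For each value of b, the index of the last element of a that is <= it
i1 <- findInterval(b, a)

# Differences between consecutive indices give the run length per b value
times <- c(i1[1], diff(i1))

res1 <- data.frame(a, b = rep(b, times))

# Variant going the other way: look up each a in the breakpoints b.
# left.open = TRUE makes the intervals (b[i], b[i+1]], so a value of a
# equal to b[i] still maps to b[i].
res2 <- data.frame(a, b = b[findInterval(a, b, left.open = TRUE) + 1])

identical(res1, res2)
```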

R, shifting parts of rows up [closed]

How do I shift the data under ChangeJanAug from row 21 up, so that the NAs are filled with the correct numbers? Since I do not want to shift all the rows, I have no clue what to do.
city latitude JanTemp AprTemp AugTemp ChangeJanAug
1 MiamiFL 26 67 75 83 NA
2 HoustonTX 30 50 68 82 NA
3 MobileAL 31 50 68 82 NA
4 DallasTX 33 43 66 85 NA
5 PhoenixAZ 33 54 70 92 NA
6 LosAngelesCA 34 58 63 75 NA
7 MemphisTN 35 40 63 81 NA
8 NorfolkVA 37 39 57 77 NA
9 SanFranciscoCA 38 49 56 64 NA
10 BaltimoreMD 39 32 53 76 NA
11 KansasCityMO 39 28 55 76 NA
12 WashingtonDC 39 31 53 74 NA
13 PittsburghPA 40 25 50 71 NA
14 ClevelandOH 41 25 48 70 NA
15 NewYorkNY 41 32 53 76 NA
16 BostonMA 42 29 48 72 NA
17 SyracuseNY 43 22 46 68 NA
18 MinneapolisMN 45 12 46 71 NA
19 PortlandOR 46 40 51 69 NA
20 DuluthMN 47 7 39 64 NA
21 <NA> NA NA NA NA 16
22 <NA> NA NA NA NA 32
23 <NA> NA NA NA NA 32
24 <NA> NA NA NA NA 42
25 <NA> NA NA NA NA 38
26 <NA> NA NA NA NA 17
27 <NA> NA NA NA NA 41
28 <NA> NA NA NA NA 38
29 <NA> NA NA NA NA 15
30 <NA> NA NA NA NA 44
31 <NA> NA NA NA NA 48
32 <NA> NA NA NA NA 43
33 <NA> NA NA NA NA 46
34 <NA> NA NA NA NA 45
35 <NA> NA NA NA NA 44
36 <NA> NA NA NA NA 43
37 <NA> NA NA NA NA 46
38 <NA> NA NA NA NA 59
39 <NA> NA NA NA NA 29
40 <NA> NA NA NA NA 57
Thank you so much!
I agree with the comment of @Heroka that it would have been better to avoid such a situation. But now that you have the data in this form, you could use the following line of code to shift up the entries of the column ChangeJanAug by 20 rows:
df$ChangeJanAug <- c(df$ChangeJanAug[21:nrow(df)], rep(NA, 20))
Afterwards you could "clean up" the block of NA entries with
df <- df[1:20,]
If you plan to remove the NA rows like this anyway, you can let R recycle the 20 values into the rows that will be dropped and simply use
df$ChangeJanAug <- df$ChangeJanAug[21:nrow(df)]
in the first step.
This could be an option
data$ChangeJanAug_new = c(data$ChangeJanAug[-(seq(20))], rep(NA, 20))
out = data[colnames(data) != "ChangeJanAug"]
#later if you want to remove NAs you could do this
out[!is.na(out$ChangeJanAug_new),]
Using na.omit and cbind you could do this (given your original data is exactly as you showed in the question):
cbind(na.omit(data[,-6]), ChangeJanAug = na.omit(data$ChangeJanAug))
# city latitude JanTemp AprTemp AugTemp ChangeJanAug
#1 MiamiFL 26 67 75 83 16
#2 HoustonTX 30 50 68 82 32
#3 MobileAL 31 50 68 82 32
#4 DallasTX 33 43 66 85 42
#5 PhoenixAZ 33 54 70 92 38
#6 LosAngelesCA 34 58 63 75 17
#7 MemphisTN 35 40 63 81 41
#8 NorfolkVA 37 39 57 77 38
#9 SanFranciscoCA 38 49 56 64 15
#10 BaltimoreMD 39 32 53 76 44
#11 KansasCityMO 39 28 55 76 48
#12 WashingtonDC 39 31 53 74 43
#13 PittsburghPA 40 25 50 71 46
#14 ClevelandOH 41 25 48 70 45
#15 NewYorkNY 41 32 53 76 44
#16 BostonMA 42 29 48 72 43
#17 SyracuseNY 43 22 46 68 46
#18 MinneapolisMN 45 12 46 71 59
#19 PortlandOR 46 40 51 69 29
#20 DuluthMN 47 7 39 64 57
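The same idea on a miniature, made-up data frame, as a sketch: the shift size n is kept in a variable so the padding always matches the number of rows moved, and the leftover all-NA rows are dropped by content rather than by position. The column names here are hypothetical.

```r
# Made-up miniature of the situation: 3 real rows followed by 3 rows
# that only carry the misplaced column
df <- data.frame(
  city   = c("X", "Y", "Z", NA, NA, NA),
  temp   = c(10, 20, 30, NA, NA, NA),
  change = c(NA, NA, NA, 1, 2, 3)
)

n <- 3  # number of rows the values have to move up

# Drop the first n entries, then pad the tail with n NAs
df$change <- c(tail(df$change, -n), rep(NA, n))

# Remove the rows that are now entirely NA
df <- df[rowSums(!is.na(df)) > 0, ]
df
```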

Splitting and iterative simple regression in r

I am fairly new to R and have a dummy example of a bigger table below. I want to split the table based on id (a, b, c, d) and run a simple linear regression for every subset:
x is my x variable and columns 1:6 are the y variables; I want one result for each id and each of the columns 1:6. It would also be great if I could output the p-values of the slopes into a new data frame.
id x 1 2 3 4 5 6
1 a 74 18 19 NA 23 29 1
2 a 77 16 19 17 22 29 2
3 a 79 16 NA 19 23 29 3
4 a 81 17 20 18 23 29 4
5 b 74 19 20 19 23 28 11
6 b 76 15 19 18 26 28 12
7 b 79 19 21 20 24 28 NA
8 b 81 19 21 20 23 28 14
9 c 68 19 20 20 23 29 8
10 c 70 17 22 22 27 29 9
11 c 73 18 22 21 23 29 10
12 c 75 19 20 19 23 29 11
13 d 65 18 18 19 22 28 5
14 d 68 18 NA 18 20 29 6
15 d 70 18 19 18 23 28 7
16 d 72 19 17 19 22 28 8
I tried to use the plyr package, but it didn't work out:
regression = NULL
for ( i in 3:ncol(dumm)){
regression[i] <- dlply(dumm, .(id), function(z) lm(dumm[,i]~dumm$x, z))
}
coefs <- ldply(regression, coef)
Thanks in advance!
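One thing to note about the attempt above is that lm(dumm[,i]~dumm$x, z) refers to dumm directly, so every fit uses the full table and the subset z is ignored. A possible approach in base R, as a sketch only: it uses a subset of the question's data (id, x and the first two y columns, renamed y1 and y2 here because the original columns are named with bare numbers), and the names y_cols and results are hypothetical.

```r
# Subset of the question's data: id, x and the first two y columns
dumm <- data.frame(
  id = rep(c("a", "b", "c", "d"), each = 4),
  x  = c(74, 77, 79, 81, 74, 76, 79, 81, 68, 70, 73, 75, 65, 68, 70, 72),
  y1 = c(18, 16, 16, 17, 19, 15, 19, 19, 19, 17, 18, 19, 18, 18, 18, 19),
  y2 = c(19, 19, NA, 20, 20, 19, 21, 21, 20, 22, 22, 20, 18, NA, 19, 17)
)

y_cols <- c("y1", "y2")

# One row per (id, response) pair with the slope and its p-value
results <- do.call(rbind, lapply(split(dumm, dumm$id), function(sub) {
  do.call(rbind, lapply(y_cols, function(y) {
    # Build the formula y ~ x and fit it on this id's rows only
    fit <- lm(reformulate("x", response = y), data = sub)
    cf  <- summary(fit)$coefficients
    data.frame(id = sub$id[1], response = y,
               slope = cf["x", "Estimate"],
               p_value = cf["x", "Pr(>|t|)"])
  }))
}))
rownames(results) <- NULL
results
```

Rows with NA in the response are dropped by lm's default NA handling, so some fits here rest on only three points.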
