R, shifting parts of rows up [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 7 years ago.
How do I shift the data in ChangeJanAug from row 21 upward, so that the NAs in rows 1 to 20 are filled with the correct numbers? Since I do not want to shift all the rows, I have no clue what to do.
city latitude JanTemp AprTemp AugTemp ChangeJanAug
1 MiamiFL 26 67 75 83 NA
2 HoustonTX 30 50 68 82 NA
3 MobileAL 31 50 68 82 NA
4 DallasTX 33 43 66 85 NA
5 PhoenixAZ 33 54 70 92 NA
6 LosAngelesCA 34 58 63 75 NA
7 MemphisTN 35 40 63 81 NA
8 NorfolkVA 37 39 57 77 NA
9 SanFranciscoCA 38 49 56 64 NA
10 BaltimoreMD 39 32 53 76 NA
11 KansasCityMO 39 28 55 76 NA
12 WashingtonDC 39 31 53 74 NA
13 PittsburghPA 40 25 50 71 NA
14 ClevelandOH 41 25 48 70 NA
15 NewYorkNY 41 32 53 76 NA
16 BostonMA 42 29 48 72 NA
17 SyracuseNY 43 22 46 68 NA
18 MinneapolisMN 45 12 46 71 NA
19 PortlandOR 46 40 51 69 NA
20 DuluthMN 47 7 39 64 NA
21 <NA> NA NA NA NA 16
22 <NA> NA NA NA NA 32
23 <NA> NA NA NA NA 32
24 <NA> NA NA NA NA 42
25 <NA> NA NA NA NA 38
26 <NA> NA NA NA NA 17
27 <NA> NA NA NA NA 41
28 <NA> NA NA NA NA 38
29 <NA> NA NA NA NA 15
30 <NA> NA NA NA NA 44
31 <NA> NA NA NA NA 48
32 <NA> NA NA NA NA 43
33 <NA> NA NA NA NA 46
34 <NA> NA NA NA NA 45
35 <NA> NA NA NA NA 44
36 <NA> NA NA NA NA 43
37 <NA> NA NA NA NA 46
38 <NA> NA NA NA NA 59
39 <NA> NA NA NA NA 29
40 <NA> NA NA NA NA 57
Thank you so much!

I agree with the comment by @Heroka that it would have been better to avoid such a situation. But now that you have the data in this form, you could use the following line of code to shift the entries of the column ChangeJanAug up by 20 rows:
df$ChangeJanAug <- c(df$ChangeJanAug[21:nrow(df)], rep(NA, 20))
Afterwards you could "clean up" the block of NA entries with
df <- df[1:20,]
If you plan to remove the NAs like this, you may not need to bother about vector recycling and you could use simply
df$ChangeJanAug <- df$ChangeJanAug[21:nrow(df)]
in the first step.
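The hard-coded 20 only fits this particular data frame. Here is a minimal sketch of a more general version. It assumes the misaligned values always sit below a block of leading NAs. shift_up() and leading_na() are hypothetical helper names, and the two-city data frame is toy data:

```r
# Hypothetical helper: number of NAs before the first non-NA value
leading_na <- function(v) {
  first <- match(TRUE, !is.na(v))
  if (is.na(first)) length(v) else first - 1L
}

# Hypothetical helper: move every value up by n positions, padding the end with NA
shift_up <- function(v, n) {
  if (n == 0) return(v)  # v[-integer(0)] would drop everything, so skip
  c(v[-seq_len(n)], rep(NA, n))
}

# Toy data with the same shape problem as the question (2 cities instead of 20)
df <- data.frame(city = c("MiamiFL", "HoustonTX", NA, NA),
                 ChangeJanAug = c(NA, NA, 16, 32))

df$ChangeJanAug <- shift_up(df$ChangeJanAug, leading_na(df$ChangeJanAug))
df <- df[!is.na(df$city), ]  # drop the now-empty trailing rows
df
```

Counting the leading NAs means the shift adapts if the block of misplaced values starts at a different row.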

This could be an option
data$ChangeJanAug_new = c(data$ChangeJanAug[-(seq(20))], rep(NA, 20))
out = data[colnames(data) != "ChangeJanAug"]
# later, if you want to drop the rows that are now NA, you could do this
out[!is.na(out$ChangeJanAug_new),]
Using na.omit and cbind you could do this (given your original data is exactly as shown in the question):
cbind(na.omit(data[,-6]), ChangeJanAug = na.omit(data$ChangeJanAug))
# city latitude JanTemp AprTemp AugTemp ChangeJanAug
#1 MiamiFL 26 67 75 83 16
#2 HoustonTX 30 50 68 82 32
#3 MobileAL 31 50 68 82 32
#4 DallasTX 33 43 66 85 42
#5 PhoenixAZ 33 54 70 92 38
#6 LosAngelesCA 34 58 63 75 17
#7 MemphisTN 35 40 63 81 41
#8 NorfolkVA 37 39 57 77 38
#9 SanFranciscoCA 38 49 56 64 15
#10 BaltimoreMD 39 32 53 76 44
#11 KansasCityMO 39 28 55 76 48
#12 WashingtonDC 39 31 53 74 43
#13 PittsburghPA 40 25 50 71 46
#14 ClevelandOH 41 25 48 70 45
#15 NewYorkNY 41 32 53 76 44
#16 BostonMA 42 29 48 72 43
#17 SyracuseNY 43 22 46 68 46
#18 MinneapolisMN 45 12 46 71 59
#19 PortlandOR 46 40 51 69 29
#20 DuluthMN 47 7 39 64 57

Related

Matrix.utils merge.Matrix and merge different results

I am very confused by the R package Matrix.utils and its implementation of merge.Matrix(). I want to merge two matrices with 0 common values, but merge common column names and fill the rest with zeros.
The results are inconsistent and depend on whether merge() or merge.Matrix() is called. I expected this to behave like dplyr's join functions (e.g. full_join()), but it does not.
Simulating the data I plan to use:
mtx.x <- sample(1:100, 100) ; mtx.x <- matrix(mtx.x, nrow = 10)
mtx.y <- sample(1:100, 100) ; mtx.y <- matrix(mtx.y, nrow = 10)
colnames(mtx.x) <- letters[1:10] ; colnames(mtx.y) <- letters[6:15]
mtx.x ; mtx.y
a b c d e f g h i j
[1,] 82 61 76 36 27 67 85 38 29 87
[2,] 83 89 43 70 81 30 35 17 39 95
[3,] 1 75 69 54 66 3 10 47 93 73
[4,] 52 98 26 88 51 64 31 72 13 92
[5,] 44 74 86 9 63 58 50 56 6 49
[6,] 24 16 77 12 55 97 18 45 14 40
[7,] 11 5 79 94 2 80 37 15 41 42
[8,] 100 84 65 59 34 62 53 60 99 28
[9,] 19 78 8 25 96 21 90 46 68 71
[10,] 32 20 7 4 57 91 22 48 33 23
f g h i j k l m n o
[1,] 24 22 8 94 89 7 50 93 40 4
[2,] 63 80 32 44 64 83 16 96 46 47
[3,] 85 30 81 95 23 91 19 92 99 52
[4,] 21 55 61 58 27 76 67 65 37 14
[5,] 9 66 12 2 41 11 56 84 87 39
[6,] 18 57 88 3 68 100 74 62 82 25
[7,] 70 90 43 54 72 86 69 20 29 51
[8,] 1 59 60 45 79 75 15 5 73 10
[9,] 38 28 26 17 53 36 97 13 77 49
[10,] 6 71 98 35 42 31 78 33 48 34
Case 1: merge() with all.x/all.y set to TRUE does what I want
merge(x = mtx.x, y = mtx.y,
all.x = T, all.y = T)
f g h i j a b c d e k l m n o
1 1 59 60 45 79 NA NA NA NA NA 75 15 5 73 10
2 3 10 47 93 73 1 75 69 54 66 NA NA NA NA NA
3 6 71 98 35 42 NA NA NA NA NA 31 78 33 48 34
4 9 66 12 2 41 NA NA NA NA NA 11 56 84 87 39
5 18 57 88 3 68 NA NA NA NA NA 100 74 62 82 25
6 21 55 61 58 27 NA NA NA NA NA 76 67 65 37 14
7 21 90 46 68 71 19 78 8 25 96 NA NA NA NA NA
8 24 22 8 94 89 NA NA NA NA NA 7 50 93 40 4
9 30 35 17 39 95 83 89 43 70 81 NA NA NA NA NA
10 38 28 26 17 53 NA NA NA NA NA 36 97 13 77 49
11 58 50 56 6 49 44 74 86 9 63 NA NA NA NA NA
12 62 53 60 99 28 100 84 65 59 34 NA NA NA NA NA
13 63 80 32 44 64 NA NA NA NA NA 83 16 96 46 47
14 64 31 72 13 92 52 98 26 88 51 NA NA NA NA NA
15 67 85 38 29 87 82 61 76 36 27 NA NA NA NA NA
16 70 90 43 54 72 NA NA NA NA NA 86 69 20 29 51
17 80 37 15 41 42 11 5 79 94 2 NA NA NA NA NA
18 85 30 81 95 23 NA NA NA NA NA 91 19 92 99 52
19 91 22 48 33 23 32 20 7 4 57 NA NA NA NA NA
20 97 18 45 14 40 24 16 77 12 55 NA NA NA NA NA
Case 2: merge.Matrix() with same arguments wants me to specify by.x/by.y
merge.Matrix(x = mtx.x, y = mtx.y,
all.x = T, all.y = T)
Error in grr::matches(by.x, by.y, all.x, all.y, nomatch = NULL) :
argument "by.x" is missing, with no default
Case 3: specifying by.x/by.y as the respective column names does not merge the common columns. Also, I have no idea why it offsets the matrices by 5 rows and not 10; the matrices have no common values.
merge.Matrix(x = mtx.x, y = mtx.y,
all.x = T, all.y = T,
by.x = colnames(mtx.x), by.y = colnames(mtx.y))
a b c d e f g h i j y.f y.g y.h y.i y.j k l m n o
82 61 76 36 27 67 85 38 29 87 NA NA NA NA NA NA NA NA NA NA
83 89 43 70 81 30 35 17 39 95 NA NA NA NA NA NA NA NA NA NA
1 75 69 54 66 3 10 47 93 73 NA NA NA NA NA NA NA NA NA NA
52 98 26 88 51 64 31 72 13 92 NA NA NA NA NA NA NA NA NA NA
44 74 86 9 63 58 50 56 6 49 NA NA NA NA NA NA NA NA NA NA
24 16 77 12 55 97 18 45 14 40 24 22 8 94 89 7 50 93 40 4
11 5 79 94 2 80 37 15 41 42 63 80 32 44 64 83 16 96 46 47
100 84 65 59 34 62 53 60 99 28 85 30 81 95 23 91 19 92 99 52
19 78 8 25 96 21 90 46 68 71 21 55 61 58 27 76 67 65 37 14
32 20 7 4 57 91 22 48 33 23 9 66 12 2 41 11 56 84 87 39
fill.x NA NA NA NA NA NA NA NA NA NA 18 57 88 3 68 100 74 62 82 25
fill.x NA NA NA NA NA NA NA NA NA NA 70 90 43 54 72 86 69 20 29 51
fill.x NA NA NA NA NA NA NA NA NA NA 1 59 60 45 79 75 15 5 73 10
fill.x NA NA NA NA NA NA NA NA NA NA 38 28 26 17 53 36 97 13 77 49
fill.x NA NA NA NA NA NA NA NA NA NA 6 71 98 35 42 31 78 33 48 34
Case 4: by.x/by.y specified as common column names, all.x/all.y set to TRUE and fill.x/fill.y set to 0 does not do a full join as the documentation claims
common <- intersect(colnames(mtx.x), colnames(mtx.y))
merge.Matrix(x = mtx.x, y = mtx.y,
all.x = T, all.y = T,
by.x = common, by.y = common)
a b c d e f g h i j y.f y.g y.h y.i y.j k l m n o
82 61 76 36 27 67 85 38 29 87 24 22 8 94 89 7 50 93 40 4
83 89 43 70 81 30 35 17 39 95 63 80 32 44 64 83 16 96 46 47
1 75 69 54 66 3 10 47 93 73 85 30 81 95 23 91 19 92 99 52
52 98 26 88 51 64 31 72 13 92 21 55 61 58 27 76 67 65 37 14
44 74 86 9 63 58 50 56 6 49 9 66 12 2 41 11 56 84 87 39
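Case 1 already performs the full join with base merge(). The sketch below shows the zero-filled result the question asks for. It assumes ordinary dense matrices and uses deterministic stand-in data, not the sampled matrices above. It does not use or explain Matrix.utils::merge.Matrix():

```r
# Deterministic stand-ins for mtx.x / mtx.y (hypothetical data, no shared rows)
mtx.x <- matrix(1:100, nrow = 10, dimnames = list(NULL, letters[1:10]))
mtx.y <- matrix(101:200, nrow = 10, dimnames = list(NULL, letters[6:15]))

common <- intersect(colnames(mtx.x), colnames(mtx.y))

# merge() on matrices goes through as.data.frame(); all = TRUE keeps
# unmatched rows from both sides (a full outer join on the common columns)
res <- merge(mtx.x, mtx.y, by = common, all = TRUE)

# Replace the NA padding with zeros and return to a plain matrix
res[is.na(res)] <- 0
res <- as.matrix(res)
dim(res)
```

Passing by = common explicitly is equivalent to merge()'s default of joining on all shared column names, but it makes the join key visible.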

Fill NA while keeping continuous scale [duplicate]

This question already has answers here:
How to replace NA values in a data.table with na.spline
(2 answers)
How to replace NA (missing values) in a data frame with neighbouring values
(3 answers)
Closed 2 years ago.
I'd like to know if there's a way to fill NA values while keeping a continuous scale for a numeric vector.
Suppose I have a vector like this:
set.seed(55)
as.list(missForest::prodNA(data.frame(a=c(1:100)),noNA=0.3))
$a
[1] 1 NA 3 NA 5 NA 7 8 9 10 11 12 13 14 15 16 17 18 19 NA
[21] 21 22 23 24 NA 26 27 28 29 30 31 32 33 NA 35 NA 37 38 39 40
[41] 41 42 43 NA 45 46 47 48 NA 50 51 52 53 54 55 56 57 NA NA 60
[61] 61 62 NA NA 65 66 NA NA NA NA NA NA NA 74 75 NA 77 NA 79 NA
[81] 81 82 NA 84 85 86 NA 88 89 90 91 92 NA 94 95 NA NA NA NA 100
How can I get
> as.list(data.frame(a=c(1:100)))
$a
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
[21] 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
[41] 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
[61] 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80
[81] 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
by filling NA?
You can use zoo's na.spline
x <- missForest::prodNA(data.frame(a=c(1:100)),noNA=0.3)$a
zoo::na.spline(x)
#[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
#[16] 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
#[31] 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45
#[46] 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
#[61] 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75
#[76] 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
#[91] 91 92 93 94 95 96 97 98 99 100
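na.spline() fits a cubic spline through the known points. For data that really lie on a straight line, plain linear interpolation is enough. The base-R sketch below uses approx(), which is a swapped-in technique rather than what the answer above uses. fill_linear() is a hypothetical helper name, and zoo's na.approx() covers the same idea:

```r
# Hypothetical helper: fill NAs by linear interpolation over the index positions
fill_linear <- function(x) {
  ok <- !is.na(x)
  # rule = 2 carries the first/last known value out to leading/trailing NAs
  approx(which(ok), x[ok], xout = seq_along(x), rule = 2)$y
}

fill_linear(c(1, NA, 3, NA, NA, 6, NA))
```

Unlike a spline, rule = 2 does not extrapolate the trend past the last known value. It simply repeats that value, so the trailing NA above becomes 6.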

Simple example of using tryCatch()

I am struggling to figure out how to use tryCatch() to throw an error. I have read several blog posts, Hadley's write-up in Advanced R and several SO posts, but for some reason it just hasn't sunk in yet. My dummy example is this: when a vector has a length that is less than 160, stop executing the function and instead give the user an error message. Pretty simple stuff, but apparently not for me. I feel like this function should do exactly that:
dummy_fun <- function(x) {
tryCatch(length(x) < 160 ,
error = function(e) {
print("An error message")
}
)
return(x*2)
}
But when I run the function, the error is not caught:
>dummy_fun(airquality$Ozone)
[1] 82 72 24 36 NA 56 46 38 16 NA 14 32 22 28 36 28 68 12 60 22 2 22 8 64 NA NA NA 46 90 230 74
[32] NA NA NA NA NA NA 58 NA 142 78 NA NA 46 NA NA 42 74 40 24 26 NA NA NA NA NA NA NA NA NA NA 270
[63] 98 64 NA 128 80 154 194 194 170 NA 20 54 NA 14 96 70 122 158 126 32 NA NA 160 216 40 104 164 100 128 118 78
[94] 18 32 156 70 132 244 178 220 NA NA 88 56 130 NA 44 118 46 62 88 42 18 NA 90 336 146 NA 152 236 168 170 192
[125] 156 146 182 94 64 40 46 42 48 88 42 56 18 26 92 36 26 48 32 26 46 72 14 28 60 NA 28 36 40
Even though length is clearly less than 160.
>length(airquality$Ozone) < 160
[1] TRUE
If I use stop or stopifnot, it stops the code but then automatically opens up the debugging window (at least in RStudio) whereas I'd just like an error, telling the user that there is an error:
dummy_fun2 <- function(x) {
stop(length(x) < 160 ,"An error message")
return(x*2)
}
dummy_fun2(airquality$Ozone)
And stopifnot:
dummy_fun3 <- function(x) {
stopifnot(length(x) < 160 ,"An error message")
return(x*2)
}
dummy_fun3(airquality$Ozone)
So, I am curious if anyone has any idea what I am doing wrong here. I'm sure I'll get this labelled as a duplicate post but I truly am lost with this.
dummy_fun <- function(x) {
if (length(x) < 160) stop("x is not big enough")
return(x*2)
}
dummy_fun(airquality$Ozone)
dummy_fun(rep(1, 50))
dummy_fun(rep(1, 500))
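tryCatch() only catches conditions that are signalled while its first argument is being evaluated. length(x) < 160 simply returns TRUE, so nothing is ever signalled. The minimal sketch below assumes the error should be handled by the caller rather than inside the function itself. safe_dummy() is a hypothetical wrapper name:

```r
dummy_fun <- function(x) {
  if (length(x) < 160) stop("x is not big enough")
  x * 2
}

# Hypothetical wrapper: evaluate dummy_fun() and turn an error into a message
safe_dummy <- function(x) {
  tryCatch(
    dummy_fun(x),
    error = function(e) {
      message("Caught an error: ", conditionMessage(e))
      NULL  # value returned by tryCatch() when the error is caught
    }
  )
}

safe_dummy(rep(1, 50))        # prints the message and returns NULL
safe_dummy(rep(1, 500))[1:3]  # no error, so the doubled values come back
```

Because length(airquality$Ozone) is 153, safe_dummy(airquality$Ozone) would print the message instead of stopping.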
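Is this what you are looking for? (Note that the finally expression is evaluated every time, whether or not an error occurs, so the message below is printed and x*2 is still returned.)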
onError <- function(){
print("An error message")
}
dummy_fun <- function(x) {
tryCatch(length(x) < 160 , finally = onError())
return(x*2)
}
dummy_fun(airquality$Ozone)
[1] "An error message"
[1] 82 72 24 36 NA 56 46 38 16 NA 14 32 22 28 36 28 68 12 60 22 2 22 8 64 NA NA NA 46 90 230 74 NA NA NA
[35] NA NA NA 58 NA 142 78 NA NA 46 NA NA 42 74 40 24 26 NA NA NA NA NA NA NA NA NA NA 270 98 64 NA 128 80 154
[69] 194 194 170 NA 20 54 NA 14 96 70 122 158 126 32 NA NA 160 216 40 104 164 100 128 118 78 18 32 156 70 132 244 178 220 NA
[103] NA 88 56 130 NA 44 118 46 62 88 42 18 NA 90 336 146 NA 152 236 168 170 192 156 146 182 94 64 40 46 42 48 88 42 56
[137] 18 26 92 36 26 48 32 26 46 72 14 28 60 NA 28 36 40

Keeping all data around "rollmean" output

I have recently found out that rollmean will give me the moving average around each value in my matrix. The problem is that my matrix shrinks and I also lose the row names when the function is executed. For example, the matrix MA.Test holds quantities per day in the rows (A = Mon, B = Tues, etc.):
> MA.Test
a b c d e f g h i j k l m n o p q r s t
A 49 21 6 27 34 49 21 6 27 34 49 21 6 27 34 49 21 6 27 34
B 35 23 37 47 45 35 23 37 47 45 35 23 37 47 45 35 23 37 47 45
C 40 0 20 10 19 40 0 20 10 19 40 0 20 10 19 40 0 20 10 19
D 8 46 22 3 28 8 46 22 3 28 8 46 22 3 28 8 46 22 3 28
E 30 7 1 42 39 30 7 1 42 39 30 7 1 42 39 30 7 1 42 39
F 9 16 32 14 33 9 16 32 14 33 9 16 32 14 33 9 16 32 14 33
G 48 5 13 15 11 48 5 13 15 11 48 5 13 15 11 48 5 13 15 11
H 12 38 36 18 24 12 38 36 18 24 12 38 36 18 24 12 38 36 18 24
I 43 26 17 44 25 43 26 17 44 25 43 26 17 44 25 43 26 17 44 25
J 41 2 29 31 4 41 2 29 31 4 41 2 29 31 4 41 2 29 31 4
When I want an average covering 3 days on each side (a window of 7, including the day itself), I use rollmean(MA.Test, 7), store the result as MA.Test.1 and get the following:
> MA.Test.1 = rollmean(MA.Test,7)
> MA.Test.1
a b c d e f g h i j k l m n o p q r s t
[1,] 31 17 19 23 30 31 17 19 23 30 31 17 19 23 30 31 17 19 23 30
[2,] 26 19 23 21 28 26 19 23 21 28 26 19 23 21 28 26 19 23 21 28
[3,] 27 20 20 21 26 27 20 20 21 26 27 20 20 21 26 27 20 20 21 26
[4,] 27 20 21 24 23 27 20 21 24 23 27 20 21 24 23 27 20 21 24 23
My query is twofold:
1. I am aware the output begins with a moving average around row D and ends at row G, as there are no values for rows A/B/C or H/I/J because they have insufficient surrounding data; how would I still KEEP these rows in the output, simply filled with NA?
2. I am losing the row names. That is simple enough to handle in this small example, but my real data set contains 100+ rows and these row names are dates; how would I keep the original row names in the output?
My desired final output would look as such:
> MA.Test.1 = rollmean(MA.Test,7)
> MA.Test.1
a b c d e f g h i j k l m n o p q r s t
A NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
B NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
C NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
D 31 17 19 23 30 31 17 19 23 30 31 17 19 23 30 31 17 19 23 30
E 26 19 23 21 28 26 19 23 21 28 26 19 23 21 28 26 19 23 21 28
F 27 20 20 21 26 27 20 20 21 26 27 20 20 21 26 27 20 20 21 26
G 27 20 21 24 23 27 20 21 24 23 27 20 21 24 23 27 20 21 24 23
H NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
I NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
J NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Thank you kindly for any solutions offered!
Use fill=NA to pad with NA. Then you can set the rownames of the result to that of the input.
MA.Test.1 <- rollmean(MA.Test,7,fill=NA)
rownames(MA.Test.1) <- rownames(MA.Test)
But if your actual data have Dates as row names, then you could just use zoo (or xts).
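library(xts)  # also attaches zoo
ma <- MA.Test
# dimnames are stored as character, so convert the Dates explicitly
# (assigning a Date vector directly would store its numeric values)
rownames(ma) <- as.character(Sys.Date() - 9:0)
# zoo: index the rows by their dates
z <- zoo(ma, as.Date(rownames(ma)))
z1 <- rollmean(z, 7, fill = NA)
# xts: the date row names become the index
x <- as.xts(ma)
x1 <- rollmean(x, 7, fill = NA)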
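If you would rather not depend on zoo, you can also compute a centred 7-point moving average in base R with stats::filter(), which pads the ends with NA on its own. This is a swapped-in technique, not part of the answer above. centred_ma() is a hypothetical helper and the small matrix is toy data:

```r
# Hypothetical helper: centred k-point moving average of each column, NA-padded
# (assumes k is odd so the window is symmetric around each row)
centred_ma <- function(m, k = 7) {
  out <- apply(m, 2, function(col) {
    as.numeric(stats::filter(col, rep(1 / k, k), sides = 2))
  })
  dimnames(out) <- dimnames(m)  # restore row names (e.g. dates) and column names
  out
}

m <- cbind(a = 1:10, b = 11:20)
rownames(m) <- LETTERS[1:10]
centred_ma(m, 7)
```

stats::filter() is written with its namespace because dplyr, if loaded, masks filter().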

Apply NA to the rows that meet a condition in R

I have a data.frame like such:
set.seed(126)
df <- data.frame(a=sample(c(1:100, NA), 10), b=sample(1:100, 10), c=sample(1:100, 10))
a b c
1 65 48 19
2 46 15 80
3 NA 47 84
4 68 34 46
5 23 75 42
6 92 87 68
7 79 28 48
8 84 55 9
9 28 43 38
10 94 99 77
I'd like to write a function that transforms all values in all columns to NA if df$a is NA. However, I don't want to just assign b and c the value NA; rather, I would like a function that turns all columns in the data.frame to NA when the condition is.na(a) is met, no matter how many columns there are.
I think you are just looking for
df[is.na(df$a), ] <- NA
# a b c
# 1 65 48 19
# 2 46 15 80
# 3 NA NA NA
# 4 68 34 46
# 5 23 75 42
# 6 92 87 68
# 7 79 28 48
# 8 84 55 9
# 9 28 43 38
# 10 94 99 77
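The question asks for a function, so the same one-liner can be wrapped to work for any data frame and any key column. na_rows_if() is a hypothetical name and the data frame is toy data:

```r
# Hypothetical helper: set every column to NA in rows where `col` is NA
na_rows_if <- function(df, col) {
  df[is.na(df[[col]]), ] <- NA
  df
}

df <- data.frame(a = c(65, NA, 68), b = c(48, 47, 34), c = c(19, 84, 46))
na_rows_if(df, "a")
```

Because the row selection leaves the column index empty, every column is overwritten, however many the data frame has.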
