fill rows with NA based on vector sequence [duplicate] - r

I have the following data.frame:
df=data.frame(x=c(1:3,8:10,15),y=rnorm(7))
x y
1 0.05976784
2 -1.01992023
3 -1.16075185
8 0.48641141
9 0.54460423
10 -0.59915799
15 -0.60785783
I need to fill in the missing rows so that df$x follows the full sequence from 1 to 17, with y set to NA in the added rows.
Here my expected output:
x y
1 0.05976784
2 -1.01992023
3 -1.16075185
4 NA
5 NA
6 NA
7 NA
8 0.48641141
9 0.54460423
10 -0.59915799
11 NA
12 NA
13 NA
14 NA
15 -0.60785783
16 NA
17 NA
How can I achieve this?
Any suggestion?

Using base::match:
data.frame(x = 1:17, y = df$y[match(1:17, df$x)])

We could use complete from tidyr
tidyr::complete(df, x = 1:17)
# A tibble: 17 x 2
# x y
# <dbl> <dbl>
# 1 1 -0.560
# 2 2 -0.230
# 3 3 1.56
# 4 4 NA
# 5 5 NA
# 6 6 NA
# 7 7 NA
# 8 8 0.0705
# 9 9 0.129
#10 10 1.72
#11 11 NA
#12 12 NA
#13 13 NA
#14 14 NA
#15 15 0.461
#16 16 NA
#17 17 NA
data
set.seed(123)
df=data.frame(x=c(1:3,8:10,15),y=rnorm(7))
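Another base R option is a left join with merge(). This is only a sketch, assuming the same df as in the question and that the target sequence 1:17 is known in advance; the name full is made up for illustration.

```r
# Example data from the question
set.seed(123)
df <- data.frame(x = c(1:3, 8:10, 15), y = rnorm(7))

# Left join the full sequence against df;
# x values that are absent from df get y = NA
full <- merge(data.frame(x = 1:17), df, by = "x", all.x = TRUE)
full
```

Because all.x = TRUE keeps every row of the left-hand table, the result has one row per value in 1:17, sorted by x.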

Related

Match and re-order rows in multiple columns in R (tidyverse)

I have a dataset like this (in the actual dataset, I have more columns like subj01):
# A tibble: 10 x 4
item subj01 subj02 subj03
<int> <dbl> <dbl> <dbl>
1 1 1 1 1
2 2 2 2 6
3 3 5 5 9
4 4 9 6 NA
5 5 10 8 NA
6 6 NA 9 NA
7 7 NA 10 NA
8 8 NA NA NA
9 9 NA NA NA
10 10 NA NA NA
I created the dataset using the code below.
data = tibble(item = 1:10, subj01 = c(1,2,5,9,10,NA,NA,NA,NA,NA), subj02 = c(1,2,5,6,8,9,10,NA,NA,NA), subj03 = c(1,6,9,NA,NA,NA,NA,NA,NA,NA))
I would like to reorder all the columns beginning with "subj" so that the position of the values match that in the item column.
That is, for this example dataset, I would like to end up with this:
# A tibble: 10 x 4
item subj01 subj02 subj03
<int> <dbl> <dbl> <dbl>
1 1 1 1 1
2 2 2 2 NA
3 3 NA NA NA
4 4 NA NA NA
5 5 5 5 NA
6 6 NA 6 6
7 7 NA NA NA
8 8 NA 8 NA
9 9 9 9 9
10 10 10 10 NA
I've figured that I can match and re-order one column by running this:
data$subj01[match(data$item,data$subj01)]
[1] 1 2 NA NA 5 NA NA NA 9 10
But I am struggling to apply this across multiple columns (ideally I'd like to embed the command in a dplyr pipe).
I tried the command below, but this gave me an error "Error in mutate(x. = x.[match(item, x.)]) : object 'x.' not found".
data = data %>% across(mutate(x.=x.[match(item,x.)]))
I'd appreciate any suggestions! Thank you.
library(tidyverse)
data %>%
pivot_longer(-item) %>%
filter(!is.na(value)) %>%
mutate(item = value) %>%
complete(item = 1:10, name) %>%
pivot_wider(names_from = name, values_from = value)
# A tibble: 10 × 4
item subj01 subj02 subj03
<dbl> <dbl> <dbl> <dbl>
1 1 1 1 1
2 2 2 2 NA
3 3 NA NA NA
4 4 NA NA NA
5 5 5 5 NA
6 6 NA 6 6
7 7 NA NA NA
8 8 NA 8 NA
9 9 9 9 9
10 10 10 10 NA
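The one-column match() idea from the question can also be applied to every "subj" column at once in base R. A minimal sketch, assuming the data is held in a plain data.frame rather than a tibble and that each value occurs at most once per column; subj_cols is a name made up for illustration.

```r
# Example data from the question, as a plain data.frame
data <- data.frame(
  item   = 1:10,
  subj01 = c(1, 2, 5, 9, 10, NA, NA, NA, NA, NA),
  subj02 = c(1, 2, 5, 6, 8, 9, 10, NA, NA, NA),
  subj03 = c(1, 6, 9, NA, NA, NA, NA, NA, NA, NA)
)

# Logical index of the columns whose names start with "subj"
subj_cols <- startsWith(names(data), "subj")

# For each such column, move every value to the row whose item equals it;
# rows with no matching value become NA
data[subj_cols] <- lapply(data[subj_cols],
                          function(col) col[match(data$item, col)])
data
```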

Avoid the for loops-R

I have two data frames, x and y. For each value of x[,2], I check whether it equals one of the elements of y[,1]. If so, I put the corresponding value of y[,2] into a third column of the first data frame.
I managed to do that with loops, but how can I do this using vectors?
x=data.frame(1:15,15:1)
y=data.frame(3:5,c(7.2,8.5,0.3))
for ( i in 1:nrow(x)) {
for (j in 1:nrow(y)) {
if (x[i,2]==y[j,1]){
x[i,3]=y[j,2]
}else{
}
}
}
Use a join instead of loops. The loop compares the second column of 'x' with the first column of 'y', so those columns go in the on argument. The assignment operator (:=) then copies the second column of the second dataset (i.col2) into a new column 'col3' in the first dataset.
library(data.table)
setDT(x)[y, col3 := i.col2, on = .(col2 = col1)]
-output
> x
col1 col2 col3
1: 1 15 NA
2: 2 14 NA
3: 3 13 NA
4: 4 12 NA
5: 5 11 NA
6: 6 10 NA
7: 7 9 NA
8: 8 8 NA
9: 9 7 NA
10: 10 6 NA
11: 11 5 0.3
12: 12 4 8.5
13: 13 3 7.2
14: 14 2 NA
15: 15 1 NA
data
x <- data.frame(col1 = 1:15, col2 = 15:1)
y <- data.frame(col1 = 3:5, col2 = c(7.2,8.5,0.3))
Update: Many thanks to @TrainingPizza, who drew my attention to the incorrect output of my first answer and also showed how it could work:
library(dplyr)
x %>%
rowwise() %>%
mutate(col3 = ifelse(col2 %in% y$col1, y$col2[y$col1==col2], NA))
col1 col2 col3
<int> <int> <dbl>
1 1 15 NA
2 2 14 NA
3 3 13 NA
4 4 12 NA
5 5 11 NA
6 6 10 NA
7 7 9 NA
8 8 8 NA
9 9 7 NA
10 10 6 NA
11 11 5 0.3
12 12 4 8.5
13 13 3 7.2
14 14 2 NA
15 15 1 NA
First answer (not correct: ifelse() recycles y$V2 by row position instead of looking up the matching value)
Here is a dplyr way to avoid the for loop:
library(dplyr)
x %>%
mutate(V3 = ifelse(V2 %in% y$V1, y$V2, NA))
V1 V2 V3
1 1 15 NA
2 2 14 NA
3 3 13 NA
4 4 12 NA
5 5 11 NA
6 6 10 NA
7 7 9 NA
8 8 8 NA
9 9 7 NA
10 10 6 NA
11 11 5 8.5
12 12 4 0.3
13 13 3 7.2
14 14 2 NA
15 15 1 NA
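The same lookup can be written in base R with match(), without loops or extra packages. A minimal sketch, assuming the named columns from the data block above. If y$col1 contained duplicates, match() would use the first matching row, whereas the original loop ends up keeping the last.

```r
x <- data.frame(col1 = 1:15, col2 = 15:1)
y <- data.frame(col1 = 3:5, col2 = c(7.2, 8.5, 0.3))

# match() gives, for each x$col2, the row of y whose col1 equals it
# (NA if there is none); indexing y$col2 with NA yields NA
x$col3 <- y$col2[match(x$col2, y$col1)]
x
```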

How can I make some row values NA if other is NA in R?

I have a dataframe with three columns Time, observed value (Obs.Value), and an interpolated value (Interp.Value). If the value of Obs.Value is NA then the value of Interp.Value should also be NA. I can make the whole row NA but I need to keep the Time value.
Here is the reprex:
dat <- data.frame(matrix(ncol = 3, nrow = 10))
x <- c("Time", "Obs.Value", "Interp.Value")
colnames(dat) <- x
dat$Time <- seq(1,10,1)
dat$Obs.Value <- c(5,6,7,NA,NA,5,4,3,NA,2)
interp <- approx(dat$Time,dat$Obs.Value,dat$Time)
dat$Interp.Value <- round(interp$y,1)
Here is the code that makes the whole row NA
dat[with(dat, is.na(Obs.Value)|is.na("Interp.Value")),] <- NA
Here is what the output should look like:
Time Obs.Value Interp.Value
1 1 5 5
2 2 6 6
3 3 7 7
4 4 NA NA
5 5 NA NA
6 6 5 5
7 7 4 4
8 8 3 3
9 9 NA NA
10 10 2 2
dat$Interp.Value[is.na(dat$Obs.Value)] <- NA
dat
# Time Obs.Value Interp.Value
# 1 1 5 5
# 2 2 6 6
# 3 3 7 7
# 4 4 NA NA
# 5 5 NA NA
# 6 6 5 5
# 7 7 4 4
# 8 8 3 3
# 9 9 NA NA
# 10 10 2 2
Or if either column being NA is sufficient, then
dat[!complete.cases(dat[,-1]),-1] <- NA
If there is only one column to change, @r2evans' answer is straightforward and the way to go. If there is more than one column to change, you can use across in dplyr.
library(dplyr)
dat %>%
mutate(across(-c(Time,Obs.Value), ~replace(., is.na(Obs.Value), NA)))
# Time Obs.Value Interp.Value
#1 1 5 5
#2 2 6 6
#3 3 7 7
#4 4 NA NA
#5 5 NA NA
#6 6 5 5
#7 7 4 4
#8 8 3 3
#9 9 NA NA
#10 10 2 2

Column with mean of 7 subsequent rows in R [duplicate]

I have this df:
df <- data.frame(flow = c(1,2,3,4,5,6,7,8,9,10,11))
flow
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
10 10
11 11
and I want the mean of each row together with the 6 rows that follow it (a 7-row, weekly window), like this:
flow flow7mean
1 1 4 (mean of 1,2,3,4,5,6,7)
2 2 5 (mean of 2,3,4,5,6,7,8)
3 3 6 (mean of 3,4,5,6,7,8,9)
4 4 7 (mean of 4,5,6,7,8,9,10)
5 5 8 (mean of 5,6,7,8,9,10,11)
6 6 NA (that's fine, because only 6 flow values remain)
7 7 NA
8 8 NA
9 9 NA
10 10 NA
11 11 NA
I have tried some loop solutions, but I think a vectorized solution would be better.
Try this using rollmean() from the zoo package:
library(zoo)
#Code
df$M <- rollmean(df$flow,k = 7,align = 'left',fill=NA)
Output:
df
flow M
1 1 4
2 2 5
3 3 6
4 4 7
5 5 8
6 6 NA
7 7 NA
8 8 NA
9 9 NA
10 10 NA
11 11 NA
We can use roll_mean from RcppRoll
library(RcppRoll)
df$flow7mean <- roll_mean(df$flow, 7, fill = NA, align = 'left')
-output
df
# flow flow7mean
#1 1 4
#2 2 5
#3 3 6
#4 4 7
#5 5 8
#6 6 NA
#7 7 NA
#8 8 NA
#9 9 NA
#10 10 NA
#11 11 NA
Here is a base R option using embed
within(df,flow7mean <- `length<-`(rowMeans(embed(flow,7)),length(flow)))
which gives
flow flow7mean
1 1 4
2 2 5
3 3 6
4 4 7
5 5 8
6 6 NA
7 7 NA
8 8 NA
9 9 NA
10 10 NA
11 11 NA
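A further base R possibility is to take differences of a cumulative sum, so that each window sum costs one subtraction. A minimal sketch, assuming the column has no NA values and at least k rows; k, n, cs and window_sums are names chosen for illustration.

```r
df <- data.frame(flow = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11))

k <- 7            # window length
n <- nrow(df)

# cs[i + 1] holds the sum of the first i flow values (cs[1] is 0)
cs <- c(0, cumsum(df$flow))

# Sum of rows i..(i + k - 1) is cs[i + k] - cs[i], for i = 1..(n - k + 1)
window_sums <- cs[(k + 1):(n + 1)] - cs[1:(n - k + 1)]

# Left-aligned means, padded with NA where fewer than k rows remain
df$flow7mean <- c(window_sums / k, rep(NA, k - 1))
df
```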

R remove first row of data frame until first row has no NA

I am applying na.approx to a data frame, which will not work if an NA happens to be in the very first or very last row of my data frame.
How do I write a function doing the following:
"While any value of the first row of the data frame is NA, remove the first row"
Example data frame:
x1=x2=c(1,2,3,4,5,6,7,8,9,10,11,12)
x3=x4=c(NA,NA,3,4,5,6,NA,NA,NA,NA,11,12)
df=data.frame(x1,x2,x3,x4)
result for this example data frame should look like this:
result=df[-1:-2,]
My current attempts all look similar to this:
replace_na=function(df){
while(anyNA(df[1,])=TRUE){
df=df[-1,],
return(df)
}
#this is where I would apply the na.approx function to the data frame
}
Any help would be greatly appreciated, thanks!
You can use complete.cases(). Its cumsum() stays at 0 until the first complete row is reached, so keeping only the rows where it is non-zero drops the leading incomplete rows:
df[cumsum(complete.cases(df)) != 0, ]
x1 x2 x3 x4
3 3 3 3 3
4 4 4 4 4
5 5 5 5 5
6 6 6 6 6
7 7 7 NA NA
8 8 8 NA NA
9 9 9 NA NA
10 10 10 NA NA
11 11 11 11 11
12 12 12 12 12
@Psidom's answer is great, but you can also fix your own custom function:
replace_na=function(df){
while(anyNA(df[1,])==TRUE){
df=df[-1,]
}
#this is where I would apply the na.approx function to the data frame
return(df)
}
On the second line, == is the comparison operator you need (a single = is assignment). On the third line, the trailing comma was superfluous. And last, return() needed to be moved out of the while loop.
replace_na(df)
# x1 x2 x3 x4
# 3 3 3 3 3
# 4 4 4 4 4
# 5 5 5 5 5
# 6 6 6 6 6
# 7 7 7 NA NA
# 8 8 8 NA NA
# 9 9 9 NA NA
# 10 10 10 NA NA
# 11 11 11 11 11
# 12 12 12 12 12
We can also use which.max and is.na
df[which.max(!rowSums(is.na(df))):nrow(df),]
# x1 x2 x3 x4
#3 3 3 3 3
#4 4 4 4 4
#5 5 5 5 5
#6 6 6 6 6
#7 7 7 NA NA
#8 8 8 NA NA
#9 9 9 NA NA
#10 10 10 NA NA
#11 11 11 11 11
#12 12 12 12 12
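The question also mentions NAs in the very last row. The answers above trim only the top, so here is a hedged sketch that trims incomplete rows from both ends while keeping interior NAs. The function name trim_incomplete_edges is made up, and the example data is the question's with the last value of x3/x4 changed to NA to show the trailing case.

```r
x1 <- x2 <- 1:12
# Same as the question's data, except the last value is NA
x3 <- x4 <- c(NA, NA, 3, 4, 5, 6, NA, NA, NA, NA, 11, NA)
df <- data.frame(x1, x2, x3, x4)

trim_incomplete_edges <- function(d) {
  ok <- which(complete.cases(d))      # indices of rows without any NA
  if (length(ok) == 0L) {
    return(d[0, , drop = FALSE])      # no complete row: return zero rows
  }
  # Keep everything from the first to the last complete row
  d[min(ok):max(ok), , drop = FALSE]
}

trimmed <- trim_incomplete_edges(df)
trimmed
```

Unlike the while loop, this returns an empty data frame instead of looping when no row is complete.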
