How can I make some row values NA if another is NA in R?

I have a dataframe with three columns: Time, an observed value (Obs.Value), and an interpolated value (Interp.Value). If Obs.Value is NA, then Interp.Value should also be NA. I can make the whole row NA, but I need to keep the Time value.
Here is the reprex:
dat <- data.frame(matrix(ncol = 3, nrow = 10))
x <- c("Time", "Obs.Value", "Interp.Value")
colnames(dat) <- x
dat$Time <- seq(1,10,1)
dat$Obs.Value <- c(5,6,7,NA,NA,5,4,3,NA,2)
interp <- approx(dat$Time,dat$Obs.Value,dat$Time)
dat$Interp.Value <- round(interp$y,1)
Here is the code that makes the whole row NA:
dat[with(dat, is.na(Obs.Value)|is.na("Interp.Value")),] <- NA
Here is what the output should look like:
Time Obs.Value Interp.Value
1 1 5 5
2 2 6 6
3 3 7 7
4 4 NA NA
5 5 NA NA
6 6 5 5
7 7 4 4
8 8 3 3
9 9 NA NA
10 10 2 2

dat$Interp.Value[is.na(dat$Obs.Value)] <- NA
dat
# Time Obs.Value Interp.Value
# 1 1 5 5
# 2 2 6 6
# 3 3 7 7
# 4 4 NA NA
# 5 5 NA NA
# 6 6 5 5
# 7 7 4 4
# 8 8 3 3
# 9 9 NA NA
# 10 10 2 2
Or if either column being NA is sufficient, then
dat[!complete.cases(dat[,-1]),-1] <- NA
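To make the difference between the two variants above concrete, here is a minimal sketch on a small made-up data frame (the values are illustrative and not the asker's data). In the first variant only Interp.Value is masked where Obs.Value is NA. In the second, both value columns are masked whenever either one is NA, and Time is left alone in both cases.

```r
# Illustrative data: row 2 has NA in Obs.Value, row 3 has NA in Interp.Value
dat <- data.frame(Time = 1:4,
                  Obs.Value = c(5, NA, 7, 2),
                  Interp.Value = c(5, 6, NA, 2))

# Variant 1: mask Interp.Value only where Obs.Value is NA
one_way <- dat
one_way$Interp.Value[is.na(one_way$Obs.Value)] <- NA

# Variant 2: mask both value columns where either of them is NA
either <- dat
either[!complete.cases(either[, -1]), -1] <- NA
```

With this input, one_way keeps the observed 7 in row 3, while either sets it to NA because Interp.Value is missing there.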

If there is only one column to change, @r2evans' answer is straightforward and the way to go. If there is more than one column that you want to change, you can use across in dplyr.
library(dplyr)
dat %>%
mutate(across(-c(Time,Obs.Value), ~replace(., is.na(Obs.Value), NA)))
# Time Obs.Value Interp.Value
#1 1 5 5
#2 2 6 6
#3 3 7 7
#4 4 NA NA
#5 5 NA NA
#6 6 5 5
#7 7 4 4
#8 8 3 3
#9 9 NA NA
#10 10 2 2
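If dplyr is not available, the same multi-column masking can probably be done in base R. This is a sketch of an alternative, not the across approach itself. The Smooth.Value column and the target_cols name are hypothetical and only serve to show more than one column being changed.

```r
# Illustrative data with a second derived column (Smooth.Value is hypothetical)
dat <- data.frame(
  Time = 1:5,
  Obs.Value = c(5, NA, 7, NA, 2),
  Interp.Value = c(5, 6, 7, 4.5, 2),
  Smooth.Value = c(5.1, 6.2, 6.9, 4.4, 2.1)
)

# Every column except the key and the reference column gets masked
target_cols <- setdiff(names(dat), c("Time", "Obs.Value"))

# Rows where Obs.Value is NA become NA in all target columns
dat[is.na(dat$Obs.Value), target_cols] <- NA
```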

Related

conditionally adding columns to a list of dataframes

I have a list of dataframes with either 2 or 4 columns.
a <- data.frame(a=1:10,
b=1:10,
c=1:10,
d=1:10)
b <- data.frame(a=1:10,
b=1:10)
list_of_df <- list(a,b)
I want to add 2 empty columns to each dataframe with only 2 columns.
I've tried this lapply approach:
lapply(list_of_df, function(x) ifelse(ncol(x) < 4,x%>%add_column(empty=NA),x <- x))
Which does not work unfortunately. How can I fix this?
I came up with something similar:
add_col <- function(x){
col_to_add <- 4 - ncol(x)
if(col_to_add == 0) return(x)
z <- rep(NA, nrow(x))
for (i in 1:col_to_add){
x <- cbind(x, z)
}
x
}
lapply(list_of_df, add_col)
I would use a for loop to avoid copying the whole list:
for (i in seq_along(list_of_df)) {
n_columns = ncol(list_of_df[[i]])
if (n_columns == 2L) {
list_of_df[[i]][c('empty1', 'empty2')] <- NA
}
}
Result:
> list_of_df
[[1]]
a b c d
1 1 1 1 1
2 2 2 2 2
3 3 3 3 3
4 4 4 4 4
5 5 5 5 5
6 6 6 6 6
7 7 7 7 7
8 8 8 8 8
9 9 9 9 9
10 10 10 10 10
[[2]]
a b empty1 empty2
1 1 1 NA NA
2 2 2 NA NA
3 3 3 NA NA
4 4 4 NA NA
5 5 5 NA NA
6 6 6 NA NA
7 7 7 NA NA
8 8 8 NA NA
9 9 9 NA NA
10 10 10 NA NA
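For completeness, the original lapply attempt most likely fails because ifelse is vectorised and returns a value shaped like its test, so it cannot hand back a whole data frame. A plain if inside the function avoids that. The sketch below assumes the same two-or-four-column input as the question; pad_to_four is a hypothetical helper name.

```r
a <- data.frame(a = 1:10, b = 1:10, c = 1:10, d = 1:10)
b <- data.frame(a = 1:10, b = 1:10)
list_of_df <- list(a, b)

# Hypothetical helper: add two NA columns when fewer than 4 columns exist
pad_to_four <- function(x) {
  if (ncol(x) < 4) {
    x[c("empty1", "empty2")] <- NA
  }
  x
}

padded <- lapply(list_of_df, pad_to_four)
```

Unlike the for loop, this returns a new list and leaves list_of_df unchanged.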
We could use bind_rows and then group_split and map from purrr to remove the id_Group column:
library(dplyr)
library(purrr)
bind_rows(list_of_df) %>%
group_split(id_Group =cumsum(a==1)) %>%
map(., ~ (.x %>% ungroup() %>%
select(-id_Group)))
[[1]]
# A tibble: 10 x 4
a b c d
<int> <int> <int> <int>
1 1 1 1 1
2 2 2 2 2
3 3 3 3 3
4 4 4 4 4
5 5 5 5 5
6 6 6 6 6
7 7 7 7 7
8 8 8 8 8
9 9 9 9 9
10 10 10 10 10
[[2]]
# A tibble: 10 x 4
a b c d
<int> <int> <int> <int>
1 1 1 NA NA
2 2 2 NA NA
3 3 3 NA NA
4 4 4 NA NA
5 5 5 NA NA
6 6 6 NA NA
7 7 7 NA NA
8 8 8 NA NA
9 9 9 NA NA
10 10 10 NA NA

Column with mean of 7 subsequent rows in R [duplicate]

I have this df:
df <- data.frame(flow = c(1,2,3,4,5,6,7,8,9,10,11))
flow
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
10 10
11 11
and I want the 7-row (weekly) mean starting at each row, like this:
flow flow7mean
1 1 4 (mean of 1,2,3,4,5,6,7)
2 2 5 (mean of 2,3,4,5,6,7,8)
3 3 6 (mean of 3,4,5,6,7,8,9)
4 4 7 (mean of 4,5,6,7,8,9,10)
5 5 8 (mean of 5,6,7,8,9,10,11)
6 6 NA (that's OK, because only 6 flow values remain)
7 7 NA
8 8 NA
9 9 NA
10 10 NA
11 11 NA
I have tried some loop solutions, but I think a vectorized solution would be better.
Try this using rollmean() from the zoo package:
library(zoo)
#Code
df$M <- rollmean(df$flow,k = 7,align = 'left',fill=NA)
Output:
df
flow M
1 1 4
2 2 5
3 3 6
4 4 7
5 5 8
6 6 NA
7 7 NA
8 8 NA
9 9 NA
10 10 NA
11 11 NA
We can use roll_mean from RcppRoll
library(RcppRoll)
df$flow7mean <- roll_mean(df$flow, 7, fill = NA, align = 'left')
-output
df
# flow flow7mean
#1 1 4
#2 2 5
#3 3 6
#4 4 7
#5 5 8
#6 6 NA
#7 7 NA
#8 8 NA
#9 9 NA
#10 10 NA
#11 11 NA
Here is a base R option using embed
within(df,flow7mean <- `length<-`(rowMeans(embed(flow,7)),length(flow)))
which gives
flow flow7mean
1 1 4
2 2 5
3 3 6
4 4 7
5 5 8
6 6 NA
7 7 NA
8 8 NA
9 9 NA
10 10 NA
11 11 NA
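The embed one-liner is compact, so here is a step-by-step sketch of what it appears to do, using the same flow values. The intermediate names windows and means are only for illustration.

```r
flow <- 1:11

# embed() builds one row per complete window of 7 values.
# Each row holds the window in reverse order, which does not affect the mean.
windows <- embed(flow, 7)

# One mean per complete window: there are 11 - 7 + 1 = 5 of them
means <- rowMeans(windows)

# Extending the length pads the tail with NA, matching the left alignment
length(means) <- length(flow)

df <- data.frame(flow = flow, flow7mean = means)
```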

R remove first row of data frame until first row has no NA

I am applying na.approx to a data frame, which will not work if an NA happens to be in the very first or very last row of my data frame.
How do I write a function doing the following:
"While any value of the first row of the data frame is NA, remove the first row"
Example data frame:
x1=x2=c(1,2,3,4,5,6,7,8,9,10,11,12)
x3=x4=c(NA,NA,3,4,5,6,NA,NA,NA,NA,11,12)
df=data.frame(x1,x2,x3,x4)
result for this example data frame should look like this:
result=df[-1:-2,]
My current attempts all look similar to this:
replace_na=function(df){
while(anyNA(df[1,])=TRUE){
df=df[-1,],
return(df)
}
#this is where I would apply the na.approx function to the data frame
}
Any help would be greatly appreciated, thanks!
You can use complete.cases. Its cumsum stays at 0 until the first complete row is reached, so the leading incomplete rows are dropped:
df[cumsum(complete.cases(df)) != 0, ]
x1 x2 x3 x4
3 3 3 3 3
4 4 4 4 4
5 5 5 5 5
6 6 6 6 6
7 7 7 NA NA
8 8 8 NA NA
9 9 9 NA NA
10 10 10 NA NA
11 11 11 11 11
12 12 12 12 12
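Since the question also mentions NAs in the very last row, the same cumsum idea can presumably be applied from both ends. This is a sketch on a small made-up data frame, not the asker's data; rows with NA in the middle are kept so that na.approx can still fill them.

```r
# Illustrative data: NA at both ends and one NA in the middle
df <- data.frame(x1 = 1:6, x3 = c(NA, NA, 3, NA, 5, NA))

ok <- complete.cases(df)

# TRUE from the first complete row onwards
after_first <- cumsum(ok) > 0
# TRUE up to and including the last complete row
before_last <- rev(cumsum(rev(ok))) > 0

trimmed <- df[after_first & before_last, ]
```

Here trimmed keeps rows 3 to 5, including the interior NA in row 4.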
@Psidom's answer is great, but you can also fix your own custom function:
replace_na=function(df){
while(anyNA(df[1,])==TRUE){
df=df[-1,]
}
#this is where I would apply the na.approx function to the data frame
return(df)
}
On the second line, == is the comparison operator you need, not =. On the third line, the trailing comma was superfluous. Finally, return() needed to be moved out of the while loop.
replace_na(df)
# x1 x2 x3 x4
# 3 3 3 3 3
# 4 4 4 4 4
# 5 5 5 5 5
# 6 6 6 6 6
# 7 7 7 NA NA
# 8 8 8 NA NA
# 9 9 9 NA NA
# 10 10 10 NA NA
# 11 11 11 11 11
# 12 12 12 12 12
We can also use which.max and is.na
df[which.max(!rowSums(is.na(df))):nrow(df),]
# x1 x2 x3 x4
#3 3 3 3 3
#4 4 4 4 4
#5 5 5 5 5
#6 6 6 6 6
#7 7 7 NA NA
#8 8 8 NA NA
#9 9 9 NA NA
#10 10 10 NA NA
#11 11 11 11 11
#12 12 12 12 12

Removing rows from each dataframe in list with condition in R

I have such a list:
df1 <- data.frame(a=c(NA, NA, 1:10), b=c(NA, 1:11))
df2 <- data.frame(a=1:10, b=c(NA,1:9))
mylist <- list(df1, df2)
> mylist
[[1]]
a b
1 NA NA
2 NA 1
3 1 2
4 2 3
5 3 4
6 4 5
7 5 6
8 6 7
9 7 8
10 8 9
11 9 10
12 10 11
[[2]]
a b
1 1 NA
2 2 1
3 3 2
4 4 3
5 5 4
6 6 5
7 7 6
8 8 7
9 9 8
10 10 9
I'd like to remove all rows with more than 1 NA in a row in each data frame. How can I do that?
I found out how to delete rows
lapply(mylist, `[`, -1,)
and how to calculate the sum of NAs
NAsums <- function(x) {rowSums(is.na(x))}
lapply(mylist, NAsums)
But I can't figure out how to combine the two steps..
We loop through the list (lapply), use rowSums to get the number of NA elements in each row, convert to a logical vector (<2), and use that to subset the rows.
lapply(mylist, function(x) x[rowSums(is.na(x))<2,])
#[[1]]
# a b
#2 NA 1
#3 1 2
#4 2 3
#5 3 4
#6 4 5
#7 5 6
#8 6 7
#9 7 8
#10 8 9
#11 9 10
#12 10 11
#[[2]]
# a b
#1 1 NA
#2 2 1
#3 3 2
#4 4 3
#5 5 4
#6 6 5
#7 7 6
#8 8 7
#9 9 8
#10 10 9
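The two steps from the question can also be combined by reusing the NAsums helper directly. This is a minimal sketch along the same lines as the answer above; drop = FALSE is added as a precaution so that the result stays a data frame even if a list element has only one column.

```r
df1 <- data.frame(a = c(NA, NA, 1:10), b = c(NA, 1:11))
df2 <- data.frame(a = 1:10, b = c(NA, 1:9))
mylist <- list(df1, df2)

# Helper from the question: number of NAs in each row
NAsums <- function(x) rowSums(is.na(x))

# Keep only rows with fewer than 2 NAs in every data frame of the list
cleaned <- lapply(mylist, function(x) x[NAsums(x) < 2, , drop = FALSE])
```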

Selecting values in a dataframe based on a priority list

I am new to R so am still getting my head around the way it works. My problem is as follows: I have a data frame and a prioritised list of columns (pl), and I need:
1. To find the maximum value from the columns in pl for each row and create a new column with this value (df$max)
2. Using the priority list, to subtract this maximum value from the priority value, ignoring NAs and returning the absolute difference
Probably better with an example:
My priority list is
pl <- c("E","D","A","B")
and the data frame is:
A B C D E F G
1 15 5 20 9 NA 6 1
2 3 2 NA 5 1 3 2
3 NA NA 3 NA NA NA NA
4 0 1 0 7 8 NA 6
5 1 2 3 NA NA 1 6
So for the first line the maximum is from column A (15) and the priority value is from column D (9), since E is NA. The answer I want should look like this:
A B C D E F G MAX MAX-PR
1 15 5 20 9 NA 6 1 15 6
2 3 2 NA 5 1 3 2 5 4
3 NA NA 3 NA NA NA NA NA NA
4 0 1 0 7 8 NA 6 8 0
5 1 2 3 NA NA 1 6 2 1
How about this?
df$MAX <- apply(df[,pl], 1, max, na.rm = TRUE)
df$MAX_PR <- df$MAX - apply(df[,pl], 1, function(x) x[!is.na(x)][1])
df$MAX[is.infinite(df$MAX)] <- NA
> df
# A B C D E F G MAX MAX_PR
# 1 15 5 20 9 NA 6 1 15 6
# 2 3 2 NA 5 1 3 2 5 4
# 3 NA NA 3 NA NA NA NA NA NA
# 4 0 1 0 7 8 NA 6 8 0
# 5 1 2 3 NA NA 1 6 2 1
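A variant that avoids the -Inf intermediate (and the warning max gives for an all-NA row) might look like the sketch below. The data frame is rebuilt from the table in the question. The names vals, row_max and first_pr are only illustrative, and abs() is added because the question asks for the absolute difference.

```r
df <- data.frame(A = c(15, 3, NA, 0, 1),
                 B = c(5, 2, NA, 1, 2),
                 C = c(20, NA, 3, 0, 3),
                 D = c(9, 5, NA, 7, NA),
                 E = c(NA, 1, NA, 8, NA),
                 F = c(6, 3, NA, NA, 1),
                 G = c(1, 2, NA, 6, 6))
pl <- c("E", "D", "A", "B")

# Priority columns only, in priority order
vals <- as.matrix(df[, pl])

# Row maximum, returning NA instead of -Inf when the whole row is NA
row_max <- apply(vals, 1, function(x) {
  if (all(is.na(x))) NA_real_ else max(x, na.rm = TRUE)
})

# First non-NA value in priority order (NA when there is none)
first_pr <- apply(vals, 1, function(x) x[!is.na(x)][1])

df$MAX <- row_max
df$MAX_PR <- abs(row_max - first_pr)
```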
Example:
df <- data.frame(A=c(1,NA,2,5,3,1),B=c(3,5,NA,6,NA,10),C=c(NA,3,4,5,1,4))
pl <- c("B","A","C")
#now we find the maximum per row over the priority columns, ignoring NAs
max.per.row <- apply(df[,pl],1,max,na.rm=TRUE)
#and the first element according to the priority list, ignoring NAs
#(there may be a more efficient way to do this)
first.per.row <- apply(df[,pl],1, function(x) as.vector(na.omit(x))[1])
#and finally compute the difference
max.less.first.per.row <- max.per.row - first.per.row
Note that this code does not handle rows that are all NA: for such a row max() returns -Inf with a warning. There is no check against that.
Here is a simple version. First, I take only the pl columns; then, for each row, I remove the NAs and compute the max.
df <- dat[,pl]
cbind(dat, t(apply(df, 1, function(x) {
x <- na.omit(x)
c(max(x),max(x)-x[1])
}
)
)
)
A B C D E F G 1 2
1 15 5 20 9 NA 6 1 15 6
2 3 2 NA 5 1 3 2 5 4
3 NA NA 3 NA NA NA NA -Inf NA
4 0 1 0 7 8 NA 6 8 0
5 1 2 3 NA NA 1 6 2 1
