Replace NA's and delete columns in an efficient way

I've got a data frame that looks as follows:
# Code:
m3 <- c(NA, -3, NA, NA, -3)
m2 <- c(rep(NA, 5))
m1 <- c(rep(NA, 5))
Zero <- c(rep(NA, 5))
p1 <- c(1, NA, NA, 1, NA)
p2 <- c(NA, NA, NA, 2, NA)
p3 <- c(3, NA, 3, 3, NA)
df <- data.frame(m3, m2, m1, Zero, p1, p2, p3)
# Output:
  m3 m2 m1 Zero p1 p2 p3
1 NA NA NA   NA  1 NA  3
2 -3 NA NA   NA NA NA NA
3 NA NA NA   NA NA NA  3
4 NA NA NA   NA  1  2  3
5 -3 NA NA   NA NA NA NA
I need to set the whole row to -3 if there is a -3 in the first column. I also need to delete all columns except p1, p2, and p3. The final result should look as follows:
# Final output:
  p1 p2 p3
1  1 NA  3
2 -3 -3 -3
3 NA NA  3
4  1  2  3
5 -3 -3 -3
I found a solution, but it seems very inefficient to me. I need to perform this operation many times and therefore need code that is as efficient as possible. My inefficient solution looks as follows:
# Inefficient code:
for (i in 1:length(df$m3)) {
  if (is.na(df$m3[i]) == FALSE) {
    df[i, ] <- -3
  }
}
df <- df[ , 5:length(df)]
Is there a more efficient way? Thank you very much in advance!

Update values (%in% returns FALSE rather than NA where m3 is missing, so those rows are left alone):
df[df$m3 %in% -3, ] <- -3
Select columns:
df <- df[, c("p1", "p2", "p3")]
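A self-contained sketch of the base R approach on the question's data. As a small variation (an assumption, not part of the answer above), it writes -3 only into the three columns that are kept, so the columns about to be dropped are never touched. Printing df at the end reproduces the final output shown in the question:

```r
# Question data
m3 <- c(NA, -3, NA, NA, -3)
m2 <- rep(NA, 5)
m1 <- rep(NA, 5)
Zero <- rep(NA, 5)
p1 <- c(1, NA, NA, 1, NA)
p2 <- c(NA, NA, NA, 2, NA)
p3 <- c(3, NA, 3, 3, NA)
df <- data.frame(m3, m2, m1, Zero, p1, p2, p3)

keep <- c("p1", "p2", "p3")
# %in% gives FALSE (never NA) for missing m3 values, so no is.na() check is needed
flag <- df$m3 %in% -3
# Overwrite only the kept columns in the flagged rows, then drop the others
df[flag, keep] <- -3
df <- df[, keep]
df
```

Touching only the kept columns matters little for seven columns, but it avoids writing into columns that are discarded anyway when the real data are wide.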

You can use data.table:
library(data.table)
dt <- data.table(df)
dt[m3 == -3, paste0('p', 1:3) := -3]  # := updates dt in place, without copying it
dt <- dt[, c("p1", "p2", "p3"), with = FALSE]

Related

How to check whether one column is NA where the other is not?

I have a data frame with two columns. I need to check whether, wherever one column is NA, the other is not. Thanks
Edit:
I would like to know, for each row of the data frame, whether both columns are non-NA.

You can use the following code to find the rows that have no NA values:
df <- data.frame(x = c(1, NA),
                 y = c(2, NA))
which(rowSums(is.na(df)) == 0)
Output:
[1] 1
As you can see, the first row has no NA values, so both columns are non-NA there.
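The question as first worded (one column NA where the other is not) maps onto xor() of the two is.na() vectors, and the edited version (both columns present) maps onto complete.cases(). A minimal sketch on made-up data:

```r
# Made-up example data
df <- data.frame(x = c(1, NA, 3, NA),
                 y = c(2, 5, NA, NA))

# TRUE where exactly one of x and y is NA
one_missing <- xor(is.na(df$x), is.na(df$y))
# TRUE where neither column is NA
both_present <- complete.cases(df)

which(one_missing)   # rows 2 and 3
which(both_present)  # row 1
```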

Here's a simple way to add a column with the NA count for each row. The data are random, so your values will differ:
x <- sample(c(1, NA), 25, replace = TRUE)
y <- sample(c(1, NA), 25, replace = TRUE)
df <- data.frame(x, y)
df$NA_Count <- apply(df, 1, function(x) sum(is.na(x)))
head(df, 9)  # first 9 of the 25 rows
   x  y NA_Count
1 NA  1        1
2 NA NA        2
3  1 NA        1
4  1 NA        1
5 NA NA        2
6  1 NA        1
7  1  1        0
8  1  1        0
9  1  1        0
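is.na() on a data frame returns a logical matrix, so rowSums() can count the NAs of every row in one vectorised call instead of calling a function once per row through apply(). A sketch; the set.seed() call is an addition for repeatability only, so the rows it prints will not match the output above:

```r
set.seed(1)  # added for repeatability; not part of the original answer
x <- sample(c(1, NA), 25, replace = TRUE)
y <- sample(c(1, NA), 25, replace = TRUE)
df <- data.frame(x, y)

# is.na(df) is a 25 x 2 logical matrix; rowSums() counts the TRUEs in each row
df$NA_Count <- rowSums(is.na(df))
head(df, 9)
```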

Delete records containing more than 5 null values?

I would like to remove from a dataset the records that have more than 5 missing values across their columns. The following code deletes records with an NA in any column. How can I modify it to do exactly what I ask? Any ideas?
df[complete.cases(df), ]

Here is an example data frame. One of the rows has 6 NA values.
We sum the NA values by row in a new column, filter where the number of NA is less than or equal to 5, then remove the new column.
df <- data.frame(a = c(1, NA, 1, 1),
                 b = c(1, NA, NA, 1),
                 c = c(1, NA, NA, NA),
                 d = c(1, NA, NA, NA),
                 e = c(1, NA, NA, NA),
                 f = c(1, NA, NA, NA))
   a  b  c  d  e  f
1  1  1  1  1  1  1
2 NA NA NA NA NA NA
3  1 NA NA NA NA NA
4  1  1 NA NA NA NA
library(dplyr)
df %>%
  mutate(count = rowSums(is.na(df))) %>%
  filter(count <= 5) %>%
  select(-count)
  a  b  c  d  e  f
1 1  1  1  1  1  1
2 1 NA NA NA NA NA
3 1  1 NA NA NA NA

I'm assuming you mean NA values, which mark missing data in R; NULL is what expressions and functions return when their value is undefined. First create some reproducible data:
set.seed(42)
vals <- sample.int(1000, 250)
idx <- sample.int(250, 100)
vals[idx] <- NA
example <- as.data.frame(matrix(vals, 25))
Now compute the number of missing values by row and exclude the rows with more than 5 missing values:
na.count <- rowSums(is.na(example))
example[na.count<=5, ]
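Both answers boil down to one comparison against a row-wise NA count. A small wrapper makes the threshold a parameter; the function name drop_sparse_rows and its max_na argument are hypothetical, chosen here for illustration:

```r
# Hypothetical helper: keep rows with at most max_na missing values
drop_sparse_rows <- function(df, max_na = 5) {
  na_count <- rowSums(is.na(df))
  # drop = FALSE keeps a data frame even if df has a single column
  df[na_count <= max_na, , drop = FALSE]
}

df <- data.frame(a = c(1, NA, 1, 1),
                 b = c(1, NA, NA, 1),
                 c = c(1, NA, NA, NA),
                 d = c(1, NA, NA, NA),
                 e = c(1, NA, NA, NA),
                 f = c(1, NA, NA, NA))

drop_sparse_rows(df)              # drops row 2, which has six NAs
drop_sparse_rows(df, max_na = 0)  # same rows as df[complete.cases(df), ]
```

Unlike the dplyr version, base subsetting keeps the original row names (1, 3, 4), which makes it easy to see which records were removed.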

Conditional filter with if statements

My data consists of columns and rows. Each column contains NA values and various numbers.
For example, column 1 is:
2
1
1
NA
1
NA
NA
NA
I want to assign a column ID to the numbers in each column, leaving the NAs as they are:
# col holds one column's values and i is that column's ID
for (j in 1:54) {
  if (!is.na(col[j])) {
    col[j] <- i
  }
}
Expected result for column1:
1
1
NA
NA
NA
1
NA
NA
1
Expected result for column 2:
2
2
NA
NA
NA
2
NA
NA
2

You can use:
v <- c(2, 1, NA, NA, 4, 5, NA)
id <- ifelse(!is.na(v), 1, NA)
id
[1]  1  1 NA NA  1  1 NA
This means you don't need the for loop here: ifelse() and is.na() already work on a whole vector at once. When a function can be applied to a whole vector, prefer that over looping through its elements.
Also, please provide your data so that others can actually use it (like in my code above).
EDIT
According to the comments, you have multiple columns. You can use the same code on each column:
df <- data.frame(a = c(2, 1, NA, NA, 4, 5, NA), b = c(3, NA, NA, NA, 5, NA, 6))
id <- sapply(1:ncol(df), function(i) {
  ifelse(!is.na(df[ , i]), i, NA)
})
colnames(id) <- names(df)  # sapply() over 1:ncol(df) returns no column names
id
      a  b
[1,]  1  2
[2,]  1 NA
[3,] NA NA
[4,] NA NA
[5,]  1  2
[6,]  1 NA
[7,] NA  2
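A fully vectorised variant (a sketch, not from the original answer) skips the per-column function: col() returns a matrix of column numbers with the same shape as its argument, and ifelse() keeps those numbers only where the data are not NA:

```r
df <- data.frame(a = c(2, 1, NA, NA, 4, 5, NA),
                 b = c(3, NA, NA, NA, 5, NA, 6))

m <- as.matrix(df)
# col(m) holds each cell's column number; keep it only where m is not NA.
# ifelse() copies dim and dimnames from its test, so the a/b names survive.
id <- ifelse(is.na(m), NA, col(m))
id
```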

How to replace NAs in the middle and on the right side of each row in a data frame with a value?

I have a data frame that has NAs in every row. Some are on the left, some in the middle, and some on the right. Something like this:
a <- c(NA, NA, 1, NA)
b <- c(NA, 1, 1, NA)
c <- c(NA, NA, 1, 1)
d <- c(1, 1, NA, 1)
df <- data.frame(a, b, c, d)
df
# a b c d
# NA NA NA 1
# NA 1 NA 1
# 1 1 1 NA
# NA NA 1 1
I would like to replace all the NAs in the middle and on the right side with 0, but keep the leading NAs before the first 1 on the left as NA. I need an efficient way (my data frame is large) to get this data frame:
# a b c d
# NA NA NA 1
# NA 1 0 1
# 1 1 1 0
# NA NA 1 1

We can use apply to loop over the rows and find the index of the first occurrence of 1, then replace the NAs from that element to the last one with 0 (this assumes every row contains at least one 1; otherwise i1 is NA and i1:length(x) fails):
df[] <- t(apply(df, 1, function(x) {
  i1 <- which(x == 1)[1]  # position of the first 1
  i2 <- i1:length(x)      # from that position to the end of the row
  x[i2][is.na(x[i2])] <- 0
  x
}))
Or another option is:
df[] <- t(apply(df, 1, function(x)
  replace(x, cumsum(x == 1 & !is.na(x)) >= 1 & is.na(x), 0)))
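Both versions call a function once per row, which gets slow when there are many rows. A sketch of an alternative (not from the original answers) loops over the columns instead, carrying a per-row flag that records whether a 1 has already appeared at or to the left of the current column; each step is vectorised across all rows:

```r
df <- data.frame(a = c(NA, NA, 1, NA),
                 b = c(NA, 1, 1, NA),
                 c = c(NA, NA, 1, 1),
                 d = c(1, 1, NA, 1))

out <- df
# seen[r] becomes TRUE once row r has had a 1 in the current or an earlier column
seen <- rep(FALSE, nrow(out))
for (j in seq_along(out)) {
  v <- out[[j]]
  seen <- seen | (!is.na(v) & v == 1)
  # NAs after the first 1 become 0; leading NAs stay NA
  v[seen & is.na(v)] <- 0
  out[[j]] <- v
}
out
```

The loop runs once per column, so its cost grows with the number of columns rather than the (usually much larger) number of rows.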

How to merge matrices in R with different numbers of rows

I would like to merge several matrices using their row names.
These matrices do not have the same number of rows and columns.
For instance:
m1 <- matrix(c(1, 2, 3, 4, 5, 6), 3, 2)
rownames(m1) <- c("a","b","c")
m2 <- matrix(c(1, 2, 3, 5, 4, 5, 6, 2), 4, 2)
rownames(m2) <- c("a", "b", "c", "d")
m3 <- matrix(c(1, 2, 3, 4), 2,2)
rownames(m3) <- c("d", "e")
mlist <- list(m1, m2, m3)
For them I would like to get:
Row.names V1.x V2.x V1.y V2.y V1.z V2.z
a 1 4 1 4 NA NA
b 2 5 2 5 NA NA
c 3 6 3 6 NA NA
d NA NA 5 2 1 3
e NA NA NA NA 2 4
I have tried to use lapply with the function merge:
M <- lapply(mlist, merge, mlist, by = "row.names", all = TRUE)
However, it did not work:
Error in data.frame(c(1, 2, 3, 4, 5, 6), c(1, 2, 3, 5, 4, 5, 6, 2), c(1, :
arguments imply differing number of rows: 3, 4, 2
Is there an elegant way to merge these matrices?

You are trying to apply a reduction (?Reduce) to the list of matrices, where the reduction step is basically merge. The problem is that merge(m1, m2, by = "row.names", all = TRUE) doesn't give you a new merged matrix with row names; instead it returns a data frame whose first column, Row.names, holds the row names. This is why the reduction function needs some additional logic.
Reduce(function(a, b) {
    res <- merge(a, b, by = "row.names", all = TRUE)
    rn <- res[, 1]          # Row.names column of merge
    res <- res[, -1]        # Actual data
    row.names(res) <- rn    # Assign row.names
    return(res)             # Return the merged data with proper row.names
  },
  mlist[-1],                # Reduce (left-to-right) by applying function(a, b) repeatedly
  init = mlist[[1]]         # Start with the first matrix
)
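The Reduce() approach names the last pair of columns V1 and V2 (by the time m3 is merged there is no name clash left to resolve), while the question asked for .x/.y/.z suffixes. One way to get those, as a sketch in which the suffix vector sfx and the helper name merge_rn are assumptions, is to rename each matrix's columns before reducing so merge() never has to disambiguate:

```r
m1 <- matrix(c(1, 2, 3, 4, 5, 6), 3, 2)
rownames(m1) <- c("a", "b", "c")
m2 <- matrix(c(1, 2, 3, 5, 4, 5, 6, 2), 4, 2)
rownames(m2) <- c("a", "b", "c", "d")
m3 <- matrix(c(1, 2, 3, 4), 2, 2)
rownames(m3) <- c("d", "e")
mlist <- list(m1, m2, m3)

# Hypothetical suffixes, one per matrix
sfx <- c("x", "y", "z")
named <- Map(function(m, s) {
  colnames(m) <- paste0("V", seq_len(ncol(m)), ".", s)
  m
}, mlist, sfx)

# Hypothetical helper: full outer join on row names, keeping them as row names
merge_rn <- function(a, b) {
  res <- merge(a, b, by = "row.names", all = TRUE)
  rownames(res) <- res$Row.names
  res$Row.names <- NULL
  res
}

out <- Reduce(merge_rn, named)
out
```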

Or alternatively:
df <- mlist[[1]]
for (i in 2:length(mlist)) {
  df <- merge(df, mlist[[i]], by = "row.names", all = TRUE)
  rownames(df) <- df$Row.names
  df <- df[ , !(names(df) %in% "Row.names")]
}
# V1.x V2.x V1.y V2.y V1 V2
# a 1 4 1 4 NA NA
# b 2 5 2 5 NA NA
# c 3 6 3 6 NA NA
# d NA NA 5 2 1 3
# e NA NA NA NA 2 4

This could also be conceptualised as a reshape operation if the right long-form data.frame is passed to the function:
tmp <- do.call(rbind, mlist)
tmp <- data.frame(tmp, id=rownames(tmp),
time=rep(seq_along(mlist),sapply(mlist,nrow)) )
reshape(tmp, direction="wide")
# id X1.1 X2.1 X1.2 X2.2 X1.3 X2.3
#a a 1 4 1 4 NA NA
#b b 2 5 2 5 NA NA
#c c 3 6 3 6 NA NA
#d d NA NA 5 2 1 3
#e e NA NA NA NA 2 4
