Filling missing values in R [duplicate]

This question already has answers here:
How to implement coalesce efficiently in R
(9 answers)
nested ifelse() is the worst; what's the best? [duplicate]
(3 answers)
Closed 5 years ago.
I need help filling col4 in R.
col4 should equal col1; if col1 is NA, col4 should equal col2; if both col1 and col2 are NA, col4 should equal col3.
id col1 col2 col3
1 10 NA NA
2 NA 12 NA
3 NA NA 13
4 NA NA 1
5 2 3 NA
Expected output:
id col4
1 10
2 12
3 13
4 1
5 2

This is easily done with coalesce from dplyr. The solution works for any number of columns:
library(dplyr)
data %>%
mutate(col4 = coalesce(!!!data[-1]))
Result:
id col1 col2 col3 col4
1 1 10 NA NA 10
2 2 NA 12 NA 12
3 3 NA NA 13 13
4 4 NA NA 1 1
5 5 2 3 NA 2
Data:
data = read.table(text = "id col1 col2 col3
1 10 NA NA
2 NA 12 NA
3 NA NA 13
4 NA NA 1
5 2 3 NA", header = TRUE)
Notes:
!!! should not be confused with the negation operator ! (an understandable confusion). It is the splice operator from rlang, which dplyr supports. It unpacks the elements of a list so that each one is passed as a separate argument.
What this means is that instead of inputting the entire data frame into coalesce (coalesce(data[-1])), I am separating the columns of data[-1] (or elements of the list) and have each element as an input to coalesce. So this:
coalesce(!!!data[-1])
is actually equivalent to this:
coalesce(col1, col2, col3)
The advantage of writing it this way is that you don't have to know the column names nor how many columns there are to begin with.
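For readers without dplyr, the same "first non-NA value, left to right" logic can be sketched in base R. This is only an illustration of the idea, not how dplyr implements coalesce; the helper name coalesce_cols is made up for this example.

```r
# Example data from the question
data <- data.frame(
  id   = 1:5,
  col1 = c(10, NA, NA, NA, 2),
  col2 = c(NA, 12, NA, NA, 3),
  col3 = c(NA, NA, 13, 1, NA)
)

# Hypothetical helper: fold over the columns from left to right, keeping
# values already found and filling only the positions that are still NA
coalesce_cols <- function(cols) {
  Reduce(function(x, y) ifelse(is.na(x), y, x), cols)
}

# data[-1] drops the id column, so any number of value columns works
data$col4 <- coalesce_cols(data[-1])
print(data)
```

As with the spliced call, the column names and the number of columns do not need to be known in advance.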

Using dplyr::coalesce, or any of the answers at How to implement coalesce in R?:
xx$col4 = with(xx, coalesce(col1, col2, col3))

Related

R - Select all rows that have one NA value at most? [duplicate]

This question already has answers here:
How to delete rows from a dataframe that contain n*NA
(4 answers)
Closed 3 days ago.
I'm trying to impute my data and keep as many observations as I can. I want to select observations that have at most 1 NA value from the PimaIndiansDiabetes2 data set in the mlbench package, loaded with data(PimaIndiansDiabetes2, package = "mlbench").
For example:
Var1 Var2 Var3
1 NA NA
2 34 NA
3 NA NA
4 NA 55
5 NA NA
6 40 28
What I would like returned:
Var1 Var2 Var3
2 34 NA
4 NA 55
6 40 28
The code below returns the rows that contain NA values. I know that I could use merge() to join the observations with 1 NA value to the observations without NA values, but I'm not sure how to extract the former.
na_rows <- df[!complete.cases(df), ]
A base R solution:
df[rowSums(is.na(df)) <= 1, ]
Its dplyr equivalent:
library(dplyr)
df %>%
filter(rowSums(is.na(pick(everything()))) <= 1)
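To see why the rowSums approach works, here is a small self-contained run on the example data from the question. The intermediate names na_count and kept are introduced only for this illustration.

```r
# Example data from the question
df <- data.frame(
  Var1 = 1:6,
  Var2 = c(NA, 34, NA, NA, NA, 40),
  Var3 = c(NA, NA, NA, 55, NA, 28)
)

# is.na(df) is a logical matrix; summing each row counts its NA values
na_count <- rowSums(is.na(df))

# Keep the rows with at most one NA
kept <- df[na_count <= 1, ]
print(kept)
```

Changing the threshold (for example `<= 2`) adjusts how many missing values per row are tolerated.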

R - Merging rows with numerous NA values to another column

I would like to ask the R community for help with finding a solution for my data, where any consecutive row with numerous NA values is combined and put into a new column.
For example:
df <- data.frame(A = c(1,2,3,4,5,6), B = c(2, NA, NA, 5, NA, NA), C = c(1,2,NA,4,5,NA), D = c(3,NA,5,NA,NA,NA))
A B C D
1 1 2 1 3
2 2 NA 2 NA
3 3 NA NA 5
4 4 5 4 NA
5 5 NA 5 NA
6 6 NA NA NA
Must be transformed to this:
A B C D E
1 1 2 1 3 2 NA 2 NA 3 NA NA 5
2 4 5 4 NA 5 NA 5 NA 6 NA NA NA
I would like to do the following:
Identify consecutive rows that have more than 1 NA value, and combine the entries from those consecutive rows into a single combined entry.
Place that combined entry in a new column "E" on the prior row.
This is quite complex (for me!) and I am wondering if anyone can offer any help with this. I have searched for some similar problems, but have been unable to find one that produces a similar desired output.
Thank you very much for your thoughts--
Using tidyr and dplyr:
Concatenate values for each row.
Keep the concatenated values only for rows with more than one NA.
Group each “good” row with all following “bad” rows.
Use a grouped summarize() to concatenate “bad” row values to a single string.
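library(dplyr)
library(tidyr)

df %>%
  # one space-separated string per row, keeping the original columns
  unite("E", everything(), remove = FALSE, sep = " ") %>%
  mutate(
    # keep the string only for rows with more than one NA
    E = if_else(
      rowSums(across(!E, is.na)) > 1,
      E,
      ""
    ),
    # every "good" row starts a new group
    new_row = cumsum(E == "")
  ) %>%
  group_by(new_row) %>%
  summarize(
    across(A:D, first),
    E = trimws(paste(E, collapse = " "))
  ) %>%
  select(!new_row)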
# A tibble: 2 × 5
A B C D E
<dbl> <dbl> <dbl> <dbl> <chr>
1 1 2 1 3 2 NA 2 NA 3 NA NA 5
2 4 5 4 NA 5 NA 5 NA 6 NA NA NA
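The four steps above can also be sketched in base R, which makes the grouping trick easier to inspect. This is an illustrative alternative under the assumption that the data uses real NA values; the variable names row_text, bad, group and result are made up for this example.

```r
df <- data.frame(
  A = c(1, 2, 3, 4, 5, 6),
  B = c(2, NA, NA, 5, NA, NA),
  C = c(1, 2, NA, 4, 5, NA),
  D = c(3, NA, 5, NA, NA, NA)
)

# Step 1: concatenate the values of each row into one string
row_text <- do.call(paste, c(df, sep = " "))

# Step 2: "bad" rows are those with more than one NA
bad <- rowSums(is.na(df)) > 1

# Step 3: every "good" row starts a new group that also
# contains the "bad" rows following it
group <- cumsum(!bad)

# Step 4: per group, join the strings of the "bad" rows
E <- vapply(split(row_text[bad], group[bad]), paste,
            character(1), collapse = " ")

# Attach the combined string to the "good" row of each group
result <- df[!bad, ]
result$E <- unname(E[as.character(group[!bad])])
print(result)
```

One difference from the dplyr version: a "good" row with no "bad" rows after it gets NA in E here rather than an empty string.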

R - counting with NA in dataframe [duplicate]

This question already has answers here:
ignore NA in dplyr row sum
(6 answers)
Closed 4 years ago.
Let's say that I have this data frame in R:
df <- read.table(text="
id a b c
1 42 3 2 NA
2 42 NA 6 NA
3 42 1 NA 7", header=TRUE)
I'd like to sum the columns a, b and c into one new column, so the result should look like this:
id a b c d
1 42 3 2 NA 5
2 42 NA 6 NA 6
3 42 1 NA 7 8
My code below doesn't work because of the NA values. Please note that I have to choose which columns to sum, since my real data frame has some columns that I don't want to include.
df %>%
mutate(d = a + b + c)
You can use rowSums for this which has an na.rm parameter to drop NA values.
df %>% mutate(d=rowSums(tibble(a,b,c), na.rm=TRUE))
or without dplyr using just base R.
df$d <- rowSums(subset(df, select=c(a,b,c)), na.rm=TRUE)
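A small self-contained comparison, using the data from the question, shows the difference between plain addition and rowSums with na.rm = TRUE. The names plain_sum and all_na are introduced only for this illustration.

```r
df <- data.frame(
  id = c(42, 42, 42),
  a  = c(3, NA, 1),
  b  = c(2, 6, NA),
  c  = c(NA, NA, 7)
)

# Plain addition propagates NA: any missing value makes the whole sum NA
plain_sum <- df$a + df$b + df$c
print(plain_sum)

# rowSums with na.rm = TRUE skips the missing values instead
df$d <- rowSums(df[, c("a", "b", "c")], na.rm = TRUE)
print(df)

# Caveat: a row in which every selected value is NA sums to 0, not NA
all_na <- rowSums(data.frame(a = NA_real_, b = NA_real_), na.rm = TRUE)
print(all_na)
```

If a row with no data at all should stay NA rather than become 0, that case needs separate handling.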

Data.table: rbind a list of data tables with unequal columns [duplicate]

This question already has answers here:
rbindlist data.tables with different number of columns
(1 answer)
Rbind with new columns and data.table
(5 answers)
Closed 4 years ago.
I have a list of data tables that are of unequal lengths. Some of the data tables have 35 columns and others have 36.
I have this line of code, but it generates an error
> lst <- unlist(full_data.lst, recursive = FALSE)
> model_dat <- do.call("rbind", lst)
Error in rbindlist(l, use.names, fill, idcol) :
Item 1362 has 35 columns, inconsistent with item 1 which has 36 columns. If instead you need to fill missing columns, use set argument 'fill' to TRUE.
Any suggestions on how I can modify this so that it works properly?
Here's a minimal example of what you are trying to do.
No need to use any other package to do this. Just set fill=TRUE in rbindlist.
You can do this:
library(data.table)
df1 <- data.table(m1 = c(1,2,3))
df2 <- data.table(m1 = c(1,2,3), m2 = c(3,4,5))
# fill = TRUE pads columns missing from a table with NA
df3 <- rbindlist(list(df1, df2), fill = TRUE)
print(df3)
m1 m2
1: 1 NA
2: 2 NA
3: 3 NA
4: 1 3
5: 2 4
6: 3 5
If I understood your question correctly, I see only two options for appending your data tables.
Option A: Drop the extra variable from one of the datasets
table$column_Name <- NULL
Option B: Create the variable, filled with missing values, in the incomplete dataset
table$column_Name <- NA
Then call rbind.
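Option B can be applied to a whole list at once. The following base R sketch pads every table with the columns it lacks and then binds the rows; it uses plain data frames and made-up names (tables, all_cols, padded, combined) purely for illustration.

```r
# Two small tables with different numbers of columns
tables <- list(
  data.frame(m1 = c(1, 2, 3)),
  data.frame(m1 = c(1, 2, 3), m2 = c(3, 4, 5))
)

# Union of all column names across the list
all_cols <- unique(unlist(lapply(tables, names)))

# Add each missing column as NA and put the columns in a common order
padded <- lapply(tables, function(tbl) {
  for (col in setdiff(all_cols, names(tbl))) {
    tbl[[col]] <- NA
  }
  tbl[all_cols]
})

# Every table now has the same columns, so rbind works
combined <- do.call(rbind, padded)
print(combined)
```

This does by hand what the fill argument of rbindlist does, so it is mainly useful for understanding the idea or when data.table is not available.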
Try to use rbind.fill from package plyr:
Input data: three data frames with different numbers of columns
df1<-data.frame(a=c(1,2,3,4,5),b=c(1,2,3,4,5))
df2<-data.frame(a=c(1,2,3,4,5,6),b=c(1,2,3,4,5,6),c=c(1,2,3,4,5,6))
df3<-data.frame(a=c(1,2,3),d=c(1,2,3))
full_data.lst<-list(df1,df2,df3)
The solution
library("plyr")
rbind.fill(full_data.lst)
a b c d
1 1 1 NA NA
2 2 2 NA NA
3 3 3 NA NA
4 4 4 NA NA
5 5 5 NA NA
6 1 1 1 NA
7 2 2 2 NA
8 3 3 3 NA
9 4 4 4 NA
10 5 5 5 NA
11 6 6 6 NA
12 1 NA NA 1
13 2 NA NA 2
14 3 NA NA 3

Replace NA's using data from Multiple Columns

I have a data-frame that looks as such:
ID col2 col3 col4
1 5 NA NA
2 NA NA 1
3 5 NA NA
4 19 NA 1
If col2 has a value, that cell should not change (even if columns 3 and 4 contain values). However, if col2 contains an "NA" value, I would like to return any non-NA's from col3 or col4, if they exist.
Desired output shown below, notice how row 2 has the "1" there now.
ID col2 col3 col4
1 5 NA NA
2 1 NA 1
3 5 NA NA
4 19 NA 1
I know this can be done manually by referencing each column using $ or [], but how can this be done using a for-loop or apply?
Thanks
We can do this with ifelse
df1$col2 <- with(df1, ifelse(is.na(col2), pmax(col3, col4, na.rm = TRUE), col2))
df1$col2
#[1] 5 1 5 19
Or create a logical index to replace the values
i1 <- is.na(df1$col2)
df1$col2[i1] <- do.call(pmax, c(df1[i1, 3:4], na.rm = TRUE))
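For a runnable check, here is the logical-index version applied to the data from the question. The data frame definition is reconstructed from the table shown there, with col3 written as NA_real_ so that it is numeric.

```r
df1 <- data.frame(
  ID   = 1:4,
  col2 = c(5, NA, 5, 19),
  col3 = rep(NA_real_, 4),
  col4 = c(NA, 1, NA, 1)
)

# Rows where col2 is missing
i1 <- is.na(df1$col2)

# For those rows only, take the largest non-NA value of col3 and col4;
# rows where col2 already has a value are left untouched
df1$col2[i1] <- do.call(pmax, c(df1[i1, 3:4], na.rm = TRUE))
print(df1)
```

Note that pmax returns the larger of col3 and col4 when both are present, not the first one, and returns NA when both are missing.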
