Convert a named vector to a data frame with row names - r

I have the following vector with names:
myvec <- c(`C1-C` = 3, `C2-C` = 1, `C3-C` = NA, `C4-C` = 5, `C5-C` = NA)
C1-C C2-C C3-C C4-C C5-C
3 1 NA 5 NA
I would like to convert it into a data frame/tibble, keeping the names of the elements as row names.
The best way I found was:
mynames <- names(myvec)
myvec <- myvec %>%
as_tibble() %>%
mutate(rownames = mynames) %>%
column_to_rownames("rownames")
How can I do this in a more efficient way?
Thanks all

as.data.frame(myvec)
myvec
C1-C 3
C2-C 1
C3-C NA
C4-C 5
C5-C NA
Or
data.frame(myvec)
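A minimal sketch of why this works, assuming only base R: data.frame() takes its row names from a named vector, so no extra step is needed. The column name `value` is an illustrative choice, not something from the question.

```r
myvec <- c(`C1-C` = 3, `C2-C` = 1, `C3-C` = NA, `C4-C` = 5, `C5-C` = NA)

# The names of the vector become the row names of the data frame
df <- data.frame(value = myvec)

rownames(df)          # same as names(myvec)
df["C4-C", "value"]   # look a value up by its original name
```

Note that row names are not special characters-checked the way column names are, so names such as "C1-C" survive unchanged.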

Related

Convert NULLs to NAs in a list

Would you suggest a better (briefer or more legible) way of converting NULLs in a list to NAs; and from list to vector?
list(1, 2, 3, numeric(0), 5) %>%
purrr::map_dbl(~ ifelse(length(.) == 0, NA_real_, .))
# [1] 1 2 3 NA 5
I would prefer not using ifelse and instead using if_else.
Is there another way of doing it with purrr?
If the length of each element of a list is 0 or 1 (e.g. lst.1), you could simply use
lst.1 %>% map_dbl(1, .default = NA)
# [1] 1 2 3 NA 5
A general way to deal with a list whose elements have different lengths (e.g. lst.2) is
lst.2 %>%
map_if(~ length(.) == 0, ~ NA) %>%
flatten_dbl()
# [1] 1 2 3 NA 5
Data
lst.1 <- list(1, 2, 3, numeric(0), 5)
lst.2 <- list(1:3, numeric(0), 5)
If L is the list then any of these work:
replace(L, lengths(L) == 0, NA)
ifelse(lengths(L), L, NA)
No packages needed.
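Both base R calls above return a list. As a small sketch of the remaining step (list to vector), assuming every non-empty element has length 1, unlist() can be applied afterwards:

```r
L <- list(1, 2, 3, numeric(0), 5)

# Replace the empty elements with NA, then flatten the list to a vector
filled <- replace(L, lengths(L) == 0, NA)
vec <- unlist(filled)
vec
```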

How to check for the existence of a certain value in a set of variables only when there is no NA?

I have a data frame with hundreds of variables, grouped into different factors ("Happy_", "Sad_", etc.), and I want to create a set of new variables indicating whether a participant gave a rating of 4 on any of the variables in one factor. However, if any variable in that factor is NA, the new variable should also be NA.
I have tried the following, but it didn't work:
library(tidyverse)
df <- data.frame(Subj = c("A", "B", "C", "D"),
Happy_1_Num = c(4,2,2,NA),
Happy_2_Num = c(4,2,2,1),
Happy_3_Num = c(1,NA,2,4),
Sad_1_Num = c(2,1,4,3),
Sad_2_Num = c(NA,1,2,4),
Sad_3_Num = c(4,2,2,1))
# Doesn't work
df <- df %>% mutate(Happy_Any4 = ifelse(if_any(matches("^Happy_") & matches("_Num$"), ~ is.na(.)), NA,
ifelse(if_any(matches("^Happy_") & matches("_Num$"), ~ . == 4),1,0)),
Sad_Any4 = ifelse(if_any(matches("^Sad_") & matches("_Num$"), ~ is.na(.)), NA,
ifelse(if_any(matches("^Sad_") & matches("_Num$"), ~ . == 4),1,0)))
I tried a workaround: first generate a set of variables indicating whether each factor has any NA, and then check whether the participant gave any rating of 4. It works, but since I have many factors, I was wondering if there is a more elegant way of doing it.
# workaround
df <- df %>% mutate(
NA_Happy = ifelse(if_any(matches("^Happy_") & matches("_Num$"), ~ is.na(.)), 1,0),
NA_Sad = ifelse(if_any(matches("^Sad_") & matches("_Num$"), ~ is.na(.)), 1,0))
df <- df %>% mutate(
Happy_Any4 = ifelse(NA_Happy == 1, NA,
ifelse(if_any(matches("^Happy_") & matches("_Num$"), ~ . == 4),1,0)),
Sad_Any4 = ifelse(NA_Sad == 1, NA,
ifelse(if_any(matches("^Sad_") & matches("_Num$"), ~ . == 4),1,0)))
Here is a base R option using split.default -
tmp <- df[-1]
cbind(df, sapply(split.default(tmp, sub('_.*', '', names(tmp))),
function(x) as.integer(rowSums(x== 4) > 0)))
# Subj Happy_1_Num Happy_2_Num Happy_3_Num Sad_1_Num Sad_2_Num Sad_3_Num Happy Sad
#1 A 4 4 1 2 NA 4 1 NA
#2 B 2 2 NA 1 1 2 NA 0
#3 C 2 2 2 4 2 2 0 1
#4 D NA 1 4 3 4 1 NA 1
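sub keeps only the "Happy" or "Sad" part of each name, split.default splits the columns into groups based on that, and sapply computes for each group whether any value of 4 is present in a row. Because rowSums returns NA for a row that contains an NA, those rows come out as NA.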
If you can afford to write each and every factor manually you can do -
library(dplyr)
df %>%
mutate(Happy = as.integer(rowSums(select(., starts_with('Happy')) == 4) > 0),
Sad = as.integer(rowSums(select(., starts_with('Sad')) == 4) > 0))
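If writing each factor out by hand is not practical, the same rowSums idea can be wrapped in a small helper and looped over the factor names. This is only a sketch in base R: the helper name `any4` and the vector `prefixes` are hypothetical, and it assumes the columns follow the `<Factor>_<n>_Num` naming used in the question.

```r
df <- data.frame(Subj = c("A", "B", "C", "D"),
                 Happy_1_Num = c(4, 2, 2, NA),
                 Happy_2_Num = c(4, 2, 2, 1),
                 Happy_3_Num = c(1, NA, 2, 4),
                 Sad_1_Num = c(2, 1, 4, 3),
                 Sad_2_Num = c(NA, 1, 2, 4),
                 Sad_3_Num = c(4, 2, 2, 1))

# Hypothetical helper: 1 if any column of the factor equals `value`,
# 0 if none does, NA if the row has an NA in that factor
any4 <- function(data, prefix, value = 4) {
  cols <- grep(paste0("^", prefix, "_.*_Num$"), names(data), value = TRUE)
  as.integer(rowSums(data[cols] == value) > 0)
}

prefixes <- c("Happy", "Sad")  # extend with the other factor names
for (p in prefixes) {
  df[[paste0(p, "_Any4")]] <- any4(df, p)
}
df[c("Subj", "Happy_Any4", "Sad_Any4")]
```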
Here is another workaround that transposes the data.frame and uses apply on the columns. I'm not sure it's more elegant, but here it is ^^
tmp <- cbind(sub("^((Happy)|(Sad))(_.*_Num)$", "\\1", colnames(df)), t(df))
Happy_Any4 <- apply(tmp[tmp[,1]== "Happy", -1], 2,
function(x) ifelse(any(is.na(x)), NA, length(grep("4", x))) )
Sad_Any4 <- apply(tmp[tmp[,1]== "Sad", -1], 2,
function(x) ifelse(any(is.na(x)), NA, length(grep("4", x))) )
df <- cbind(df, Happy_Any4 = Happy_Any4, Sad_Any4 = Sad_Any4)
EDIT: The above was a strange test, but this version is more elegant.
It works because the sum of anything that contains an NA returns NA.
df <- df %>% mutate(Happy_Any4 = apply(df[,grep("^Happy_.*_Num$", colnames(df))],
1, function(x) 1*(sum(x == 4) > 0)),
Sad_Any4 = apply(df[, grep("^Sad_.*_Num$", colnames(df))],
1, function(x) 1*(sum(x == 4) > 0)))
apply goes through every row, looking only at the columns whose names match the pattern (found with grep). x == 4 marks every occurrence of 4 as a logical vector, and its sum is the number of occurrences. The presence of an NA makes the sum NA. I then just check whether the sum is above 0, and the 1* turns the logical into a numeric.
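The NA propagation this answer relies on can be checked on a single row. A short illustration, with made-up vectors `with_na` and `no_na`:

```r
with_na <- c(2, NA, 4)
no_na   <- c(2, 1, 4)

# Comparison keeps the NA, and sum() without na.rm = TRUE returns NA
flag_with_na <- 1 * (sum(with_na == 4) > 0)
# Without NA the sum is the number of 4s, here 1
flag_no_na <- 1 * (sum(no_na == 4) > 0)

flag_with_na
flag_no_na
```

This is also why the result is NA even when a 4 is present next to the NA, which is the behaviour the question asks for.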

Fill NAs with 0 if the column is numeric and empty string '' if the column is a factor using R [duplicate]

I'm trying to replace all the NAs in columns of integer type with 0 and the NAs in columns of factor type with the empty string "". The code below is what I'm using, but it doesn't seem to work:
for(i in 1:ncol(credits)){
if(sapply(credits[i], class) == 'integer'){
credits[is.na(credits[,i]), i] <- 0
}
else if(sapply(credits[i], class) == 'factor'){
credits[is.na(credits[,i]), i] <- ''
}
}
You can use across in dplyr to replace column values by class :
library(dplyr)
df %>%
mutate(across(where(is.factor), ~replace(as.character(.), is.na(.), '')),
across(where(is.numeric), ~replace(., is.na(.), 0)))
# a b
#1 1 a
#2 2 b
#3 0 c
#4 4 d
#5 5
The b column is now of class "character". If you need it as a factor, wrap replace in factor, like:
across(where(is.factor), ~factor(replace(as.character(.), is.na(.), ''))),
data
df <- data.frame(a = c(1, 2, NA, 4:5), b = c(letters[1:4], NA),
stringsAsFactors = TRUE)
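For comparison, here is a base R sketch of the same idea without dplyr. The helper name `fill_na` is hypothetical. It also hints at why the loop in the question fails for factors: assigning '' to a factor that has no '' level produces NA with a warning, so the column is converted to character first.

```r
df <- data.frame(a = c(1, 2, NA, 4:5), b = c(letters[1:4], NA),
                 stringsAsFactors = TRUE)

# Hypothetical helper: fill NAs according to the class of the column
fill_na <- function(col) {
  if (is.factor(col)) {
    col <- as.character(col)   # '' is not an existing level, so drop the factor first
    col[is.na(col)] <- ""
    factor(col)                # rebuild the factor, now with '' as a level
  } else if (is.numeric(col)) {
    col[is.na(col)] <- 0
    col
  } else {
    col                        # leave other column types untouched
  }
}

df[] <- lapply(df, fill_na)
df
```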
Another way of achieving the same:
library(dplyr)
# Dataframe
df <- data.frame(x = c(1, 2, NA, 4:5), y = c('a',NA, 'd','e','f'),
stringsAsFactors = TRUE)
# Creating new columns
df_final<- df %>%
mutate(new_x = ifelse(is.numeric(x)==TRUE & is.na(x)==TRUE,0,x)) %>%
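# as.character(y) keeps the labels; passing the factor itself would return its integer codes
mutate(new_y = ifelse(is.factor(y)==TRUE & is.na(y)==TRUE,"",as.character(y)))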
# Printing the output
df_final

How to replace NA values with different values based on column in R dataframe?

I am trying to replace NA values by column with values predetermined from a vector. For example, I have vector containing the values (1,5,3) and a dataframe df, and want to replace all NA values from column one of df with 1, column two NA's with 5, and column three NA's with 3.
I tried a formula I saw that took
df[is.na(df)] = vector
but it didn't seem to work due to a "wrong length" error. The vector has the same length as the number of columns in df.
You can use which to get row/column index of NA values and replace it directly.
mat <- which(is.na(df), arr.ind = TRUE)
df[mat] <- vector[mat[, 2]]
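A worked version of this approach on a small example, as a sketch. The names `df`, `vec` and `idx` are illustrative, and `vec` is used instead of `vector` to avoid shadowing the base function of that name.

```r
df <- data.frame(col1 = c(1, 3, NA), col2 = c(NA, 2, NA), col3 = c(2, NA, NA))
vec <- c(1, 5, 3)

# One row per NA cell, with columns "row" and "col"
idx <- which(is.na(df), arr.ind = TRUE)

# Pick the replacement by the column index of each NA cell
df[idx] <- vec[idx[, "col"]]
df
```

The original attempt fails because there are five NA cells but only three replacement values; indexing `vec` by column number yields one value per NA cell.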
We can use Map to replace the NAs in each column of the dataset with the corresponding value in the vector. This works in almost all cases and is a concise, single-step replacement.
df[] <- Map(function(x, y) replace(x, is.na(x), y), df, vec)
df
# col1 col2 col3
#1 1 5 2
#2 3 2 3
#3 1 5 3
Or another option is to make the lengths same, and then use pmax
df[] <- pmax(as.matrix(df), is.na(df) * vec[col(df)], na.rm = TRUE)
or another option with replace
df <- replace(df, is.na(df), rep(vec, colSums(is.na(df))))
NOTE: All the solutions above are one-liners.
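To see why the replace version lines up, it may help to look at the intermediate values. A short sketch using the same data as in the answer; the names `na_per_col` and `fill` are illustrative:

```r
df <- data.frame(col1 = c(1, 3, NA), col2 = c(NA, 2, NA), col3 = c(2, NA, NA))
vec <- c(1, 5, 3)

# Number of NAs in each column
na_per_col <- colSums(is.na(df))

# Repeat each replacement value once per NA in its column;
# is.na(df) is traversed column by column, so the order matches
fill <- rep(vec, times = na_per_col)
fill

res <- replace(df, is.na(df), fill)
res
```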
Or using data.table with set
library(data.table)
setDT(df)
for(j in seq_along(df)) set(df, i = which(is.na(df[[j]])), j = j, value = vec[j])
data
df <- data.frame(col1 = c(1, 3, NA), col2 = c(NA, 2, NA), col3 = c(2, NA, NA))
vec <- c(1, 5, 3)

How to get ONLY columns with NA values and the amount of NAs

I have a dataset and some of the columns have NA values. I need to display only the column names that have NA values as well as the total number of NA values in each of those columns.
I've been able to get different pieces of the problem working but not both things at once.
This gives me only the column names of the columns containing NA values. But I want the NA totals to show under each column name.
nacol<- colnames(df)[colSums(is.na(df)) > 0]
This gives me exactly what I want but it also displays the zero totals of the other columns in the dataframe and I don't want those to be displayed.
df %>% summarise_all(funs(sum(is.na(.))))
I'm obviously a complete beginner. I realize this is an extremely easy problem to fix but I've been trying for hours and I'm just getting frustrated. Please help. Thank you!
We can use Filter with colSums to remove 0 values
Filter(function(x) x > 0, colSums(is.na(df)))
#a c
#2 1
Or select_if in dplyr
library(dplyr)
df %>%
summarise_all(~(sum(is.na(.)))) %>%
select_if(. > 0)
We can also first select column with any NA values and then count them.
df %>%
select_if(~any(is.na(.))) %>%
summarise_all(~(sum(is.na(.))))
data
df <- data.frame(a = c(2, 3, NA, NA, 1), b = 1:5, c = c(1, 3, 4, NA, 1))
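The same result can also be had with plain subsetting, which may be easier to read for a beginner. A minimal base R sketch, with `na_counts` as an illustrative name:

```r
df <- data.frame(a = c(2, 3, NA, NA, 1), b = 1:5, c = c(1, 3, 4, NA, 1))

# Count the NAs per column, then keep only the non-zero counts
na_counts <- colSums(is.na(df))
res <- na_counts[na_counts > 0]
res
```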
A possible alternative using purrr, with dplyr for the pipe (using airquality for reproducibility):
library(dplyr)
library(purrr)
airquality %>%
keep(~anyNA(.x)) %>%
map_dbl(~sum(is.na(.x)))
Ozone Solar.R
37 7
Using the data from @Ronak Shah's answer:
df %>%
keep(~anyNA(.x)) %>%
map_dbl(~sum(is.na(.x)))
a c
2 1
Using data.table (there might be a way to make it more compact):
setDT(df)
df[,Filter(anyNA,.SD)][,lapply(.SD, function(x) sum(is.na(x)))]
a c
1: 2 1
Data:
df <- structure(list(a = c(2, 3, NA, NA, 1), b = 1:5, c = c(1, 3, 4,
NA, 1)), class = "data.frame", row.names = c(NA, -5L))
airquality is built in.
We can do
na.omit(na_if(colSums(is.na(df)), 0))
# a c
# 2 1
Or using summarise_if
library(dplyr)
df %>%
summarise_if(~ any(is.na(.)), ~sum(is.na(.)))
# a c
#1 2 1
data
df <- data.frame(a = c(2, 3, NA, NA, 1), b = 1:5, c = c(1, 3, 4, NA, 1))
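The scoped verbs used in these answers (summarise_all, summarise_if, select_if) are superseded in recent dplyr. A sketch of an equivalent with across() and where(), assuming dplyr 1.0.0 or later is installed:

```r
library(dplyr)

df <- data.frame(a = c(2, 3, NA, NA, 1), b = 1:5, c = c(1, 3, 4, NA, 1))

# Select only the columns that contain an NA, then count the NAs in each
res <- df %>%
  summarise(across(where(anyNA), ~ sum(is.na(.x))))
res
```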
