This question already has answers here:
How to treat NAs like values when comparing elementwise in R
(4 answers)
Closed 1 year ago.
I have a dataframe that contains 2 columns with character strings. The goal is to see how many of them are identical including NA values. If both columns give NA, it should be treated as identical.
class(df$column_1) # returns "character"
length(which(df$column_1 == df$column_2)) # the result excludes rows where either value is NA
Add an is.na check to cover the case where both values are missing:
length(which(x$a == x$b | (is.na(x$a) & is.na(x$b))))
#[1] 2
Data:
x <- data.frame(a=c("a", NA, "b"), b=c("c", NA, "b"))
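As a rough sketch of the same idea, the comparison can be wrapped in a small helper. The name count_matches is made up for illustration, and stringsAsFactors = FALSE is assumed so that both columns are plain character vectors:

```r
# Hypothetical helper (not from the original answer): counts positions where
# two vectors agree, treating a pair of NAs as a match.
count_matches <- function(a, b) {
  both_na <- is.na(a) & is.na(b)
  equal   <- !is.na(a) & !is.na(b) & a == b   # FALSE whenever either side is NA
  sum(both_na | equal)
}

x <- data.frame(a = c("a", NA, "b"), b = c("c", NA, "b"), stringsAsFactors = FALSE)
n_same <- count_matches(x$a, x$b)
n_same
#[1] 2
```

Splitting the test into both_na and equal keeps NA out of the final logical vector, so sum() can be used directly without which().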
Another way is to use identical(), which has the nice property that identical(NA, NA) is TRUE, and compare term by term in a loop:
Dummy data:
a=c("a",NA,"b")
b=c(NA,NA,"d")
df = data.frame(a, b, stringsAsFactors=FALSE)
Code:
count = 0
for (i in seq_len(nrow(df))) {
  # identical() returns TRUE or FALSE, which is added as 1 or 0
  count = count + identical(df[i, 1], df[i, 2])
}
Output:
> count
[1] 1
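If the explicit loop feels heavy, the same term-by-term identical() check can probably be written with mapply. This is only a sketch on the dummy data above, not something from the original answer:

```r
a <- c("a", NA, "b")
b <- c(NA, NA, "d")
df <- data.frame(a, b, stringsAsFactors = FALSE)

# identical() is applied to each pair of elements; summing the logical
# results counts the matching rows.
count <- sum(mapply(identical, df$a, df$b))
count
#[1] 1
```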
This question already has answers here:
How do I extract a single column from a data.frame as a data.frame?
(3 answers)
Closed 3 years ago.
I have a row-NA-removal function called foo. It works great, but ONLY for data.frames with at least 2 columns.
For data.frames with only 1 column, it turns the input data.frame into a plain vector.
I wonder how to fix the function so it preserves the class of the inputted data.frame in its output both for multi-column and single-column data.frame inputs?
X <- data.frame(a = c(1, NA, 2, 3), b = c(1, NA, 4, 5)) # data.frame
foo <- function(X){ # Function `foo`
X[rowSums(is.na(X) | X == "") != ncol(X), ]
}
foo(X[c("a", "b")]) # Outputs a data.frame with the all-NA row removed (as expected)
foo(X["a"]) # Outputs `1 2 3`, a plain numeric vector instead of a data.frame!
# My EXPECTED OUTPUT for `foo(X["a"])` is a data.frame like:
# a
#1 1
#2 2
#3 3
You can use the drop argument of the [ operator:
foo <- function(X){ # Function `foo`
  X[rowSums(is.na(X) | X == "") != ncol(X), , drop = FALSE]
}
With drop = FALSE, the result keeps its data.frame class instead of being simplified to a vector when only one column is left.
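To see the difference, here is a minimal check on the example data. It only compares the class of the result with and without drop = FALSE; keep is just an illustrative name:

```r
X <- data.frame(a = c(1, NA, 2, 3), b = c(1, NA, 4, 5))
keep <- !is.na(X$a)                     # rows of column a that are not NA

class(X["a"][keep, ])                   # simplified to a vector
#[1] "numeric"
class(X["a"][keep, , drop = FALSE])     # stays a data.frame
#[1] "data.frame"
```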
I have a vector containing a combination of NA values and strings:
v <- c(NA, NA, "text", NA)
I also have a separate data frame:
df <- data.frame("Col1" = 1:4, "Col2" = 5:8)
Col1 Col2
1 5
2 6
3 7
4 8
My goal is to remove the rows of df where the corresponding v value is NA. So in this case the output would just be:
Col1 Col2
3 7
Since the third element of v is the only one that's not NA, only the third row of df is kept. I tried to accomplish this using a for loop:
for (i in 1:length(v)) {
  if (is.na(v[i])) {
    df <- df[-i, ]
  }
}
However, for some reason this just outputs a version of df that includes only the 2nd and 4th rows:
Col1 Col2
2 6
4 8
I can't figure out why the loop isn't working. Any suggestions appreciated!
This will do it -
df[!is.na(v), ]
You don't need a loop. You can subset any data frame with a vector of row indices or a logical vector (TRUE/FALSE). Here !is.na(v) produces a logical vector from v, and only the rows of df where it is TRUE are kept.
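As for why the loop misbehaves: each df[-i, ] shrinks df and renumbers its rows, while i keeps counting positions in v, so after the first deletion the two no longer line up. If you prefer to think in terms of row numbers, a sketch that collects them first and drops them in one step could look like this (na_rows and result are illustrative names):

```r
v  <- c(NA, NA, "text", NA)
df <- data.frame(Col1 = 1:4, Col2 = 5:8)

na_rows <- which(is.na(v))   # positions 1, 2 and 4
# Guard against the empty case: df[-integer(0), ] would return zero rows.
result <- if (length(na_rows) > 0) df[-na_rows, ] else df
result
#  Col1 Col2
#3    3    7
```

The logical-vector form df[!is.na(v), ] avoids that empty-index pitfall altogether, which is one reason it is usually the safer choice.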
This question already has an answer here:
dplyr filter : value is contained in a vector
(1 answer)
Closed 4 years ago.
I'd like to use a vector featuring blank ("") and non-blank character strings to subset rows so that I end up with a result like in dfgoal.
I've tried using dplyr::select(), but I get an error message (Error: Strings must match column names. Unknown columns: tooth, , head, foot).
I realise I've got a problem in that I want to keep some "" and get rid of others, but I don't know how to resolve it.
Thanks for any help!
# Data
df <- data.frame(avar=c("tooth","","","head","","foot","",""),bvar=c(1:8))
# Vector
veca <- c("tooth","foot")
vecb <- c("")
vecc <- as.vector(rbind(veca,vecb))
vecc <- unique(vecc)
# Attempt
library(dplyr)
df <- df %>% dplyr::select(vecc)
# Goal
dfgoal <- data.frame(avar=c("tooth","","","foot","",""),bvar=c(1,2,3,6,7,8))
I'm not entirely clear on what you're trying to do. I assume you're asking how to select rows where avar %in% veca including subsequent blank ("") rows.
Perhaps something like this using tidyr::fill?
library(tidyverse)
veca <- c("tooth","foot")
df %>%
  mutate(tmp = ifelse(avar == "", NA, as.character(avar))) %>%
  fill(tmp) %>%
  filter(tmp %in% veca) %>%
  select(-tmp)
#    avar bvar
# 1 tooth    1
# 2          2
# 3          3
# 4  foot    6
# 5          7
# 6          8
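For what it's worth, the carry-forward step can also be sketched in base R without tidyr. This assumes the first value of avar is non-blank and that avar is a character column (hence stringsAsFactors = FALSE); grp, label and res are just illustrative names:

```r
df <- data.frame(avar = c("tooth", "", "", "head", "", "foot", "", ""),
                 bvar = 1:8, stringsAsFactors = FALSE)
veca <- c("tooth", "foot")

grp   <- cumsum(df$avar != "")          # block number for each row
label <- df$avar[df$avar != ""][grp]    # last non-blank value carried forward
res   <- df[label %in% veca, ]
res$bvar
#[1] 1 2 3 6 7 8
```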
I am currently trying to clean a dataframe for further machine learning analysis. I want to replace all instances of -1 with null values. I know how to do this by column, but how do I do this over a lot of columns?
Let's assume a dataframe containing 10 columns with 1 and -1:
DF <- data.frame(matrix(sample(c(1,-1), 1000, replace = TRUE), ncol = 10))
Then you simply replace the -1 values with NA:
DF[DF==-1] <- NA
This should work if your data is in a data frame.
df[df == -1] <- NA
The answer is similar to the ones posted above; I just thought of a small tweak.
I think you mean replace -1 with NAs, since missing values are stored as NAs in R.
Depending upon whether -1 is stored as a factor/character or a numeric variable, you could try -
dfx = data.frame(x = c(0,1,2,-1), y = c("a", "b", "c","-1") )
dfx[dfx == -1 | dfx == "-1"] <- NA
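If only some of the columns should be cleaned, the same idea can be restricted to a subset. A small sketch, where the toy DF and the column names in cols are placeholders for your own:

```r
DF <- data.frame(X1 = c(1, -1, 1), X2 = c(-1, -1, 1), X3 = c(-1, 1, -1))
cols <- c("X1", "X2")            # hypothetical subset of columns to clean

DF[cols][DF[cols] == -1] <- NA   # X3 is left untouched
```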
This question already has answers here:
How do I replace NA values with zeros in an R dataframe?
(29 answers)
Fastest way to replace NAs in a large data.table
(10 answers)
Closed 6 years ago.
I am quite new to R and am trying to select certain columns in order to set their NAs to 0.
So far I have:
col_names1 <- c('a','b','c')
col_names2 <- c('e','f','g')
col_names <- c(col_names1, col_names2)
data = fread('data.tsv', sep="\t", header= FALSE,na.strings="NA",
stringsAsFactors=TRUE,
colClasses=my_col_Classes
)
setnames(data, col_names)
data[col_names2][is.na(data[col_names2])] <- 0
But I keep getting the error
Error in `[.data.table`(`*tmp*`, column_names2): When i is a data.table (or character vector), x must be keyed (i.e. sorted, and, marked as sorted) so data.table knows which columns to join to and take advantage of x being sorted. Call setkey(x,...) first, see ?setkey.
I believe this error is saying my data is in the wrong order, but I am not sure how to fix that?
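You can do it with data.table's assignment operator :=
library(data.table)
data <- data.table(a = c(2, NA, 3, 5), b = c(NA, 2, 3, 4), c = c(2, 5, NA, 6))
fix_columns <- c('a', 'b')
fix_fun <- function(x) ifelse(is.na(x), 0, x) # replace NA with 0, keep other values
data[, (fix_columns) := lapply(.SD, fix_fun), .SDcols = fix_columns]
P.S. You can't select columns from a data.table with data[col_names2]; a character vector in the first position is treated as a row lookup, which is why the error talks about keys. If you want to select columns by a character vector, one approach is: data[, col_names2, with = FALSE]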
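Another common pattern loops over the columns with data.table's set(), which updates by reference. Treat this as a sketch; it needs the data.table package, and the column names are the example ones from above:

```r
library(data.table)

data <- data.table(a = c(2, NA, 3, 5), b = c(NA, 2, 3, 4), c = c(2, 5, NA, 6))
fix_columns <- c("a", "b")

# For each chosen column, overwrite only the NA rows with 0.
for (col in fix_columns) {
  set(data, i = which(is.na(data[[col]])), j = col, value = 0)
}
```

Column c is not in fix_columns, so its NA is left as it is.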