How to create a logical variable based on a logical condition in R?

I have a data frame with factor variables
> a <- c("a", "b", "c")
> b <- c("c", "b", "a")
> df <- as.data.frame(cbind(a,b))
> df$a <- as.factor(df$a)
> df$b <- as.factor(df$b)
> df
a b
1 a c
2 b b
3 c a
I create a new logical variable based on whether var a and var b are equal:
> df$result <- isTRUE(df$a == df$b)
But I get the result:
> df
a b result
1 a c FALSE
2 b b FALSE
3 c a FALSE
When I expected
> df
a b result
1 a c FALSE
2 b b TRUE
3 c a FALSE
(I'm using factors to replicate my real data)
What am I doing wrong? How can I flag the rows where the two variables have the same value? Thanks

Just do
df$result <- with(df, a==b)
df
# a b result
#1 a c FALSE
#2 b b TRUE
#3 c a FALSE
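a==b already returns a logical vector with one value per row, so there is no need to wrap it in isTRUE. In fact isTRUE returns TRUE only for a single, non-missing TRUE; given a vector of length 3 it returns one FALSE, which is then recycled to every row.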
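To see the difference, here is a small sketch with hypothetical factors x and y standing in for df$a and df$b: the comparison is element-wise, while isTRUE collapses the whole vector to a single value.

```r
# Hypothetical factors standing in for df$a and df$b
x <- factor(c("a", "b", "c"))
y <- factor(c("c", "b", "a"))

# Element-wise comparison: FALSE TRUE FALSE
cmp <- x == y
print(cmp)

# isTRUE() is TRUE only for a length-one, non-NA TRUE,
# so a length-3 vector always gives a single FALSE here
print(isTRUE(cmp))
```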
As @Frank mentioned in the comments, it is safer to compare character columns, because comparing two factors whose level sets differ throws an error ("level sets of factors are different"). We can either convert the factors to character before comparing
with(df, as.character(a)==as.character(b))
or give both columns the same set of levels (converting only a and b, so that any other column such as result is left alone)
Un1 <- union(levels(df$a), levels(df$b))
df[c("a", "b")] <- lapply(df[c("a", "b")], factor, levels=Un1)
with(df, a==b)
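A minimal sketch of the failure described in the comments, using two hypothetical factors whose level sets differ: the direct comparison errors, while both workarounds succeed.

```r
# Hypothetical factors with different level sets
x <- factor(c("a", "b"))
y <- factor(c("b", "z"))

# Comparing the factors directly raises an error; capture it
res <- tryCatch(x == y, error = function(e) e)
print(inherits(res, "error"))

# Workaround 1: compare the labels as character vectors
chr <- as.character(x) == as.character(y)

# Workaround 2: give both factors the union of the levels
lev <- union(levels(x), levels(y))
fac <- factor(x, levels = lev) == factor(y, levels = lev)

print(chr)
print(fac)
```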

Related

R, how to replace only the numeric values of a dataframe?

I am working on R 3.4.3 on Windows 10. I have a dataframe with both numeric and character columns.
I would like to replace only the numeric values, but when I do that the character values are replaced as well.
How can I edit my function so that it affects only the numeric values and not the characters?
Here is my function:
dataframeChange <- function(dFrame){
  thresholdVal <- 20
  dFrame[dFrame >= thresholdVal] <- -1
  return(dFrame)
}
Here is a dataframe example:
example_df <- data.frame(
  myNums = 1:5,
  myChars = c("A","B","C","D","E"),
  stringsAsFactors = FALSE
)
Thanks for the help!
As Tim commented, you need to know which columns are numeric. You can find them with ind <- sapply(dFrame, is.numeric) and then apply the threshold only within those columns (this also works when there is more than one numeric column):
dataframeChange <- function(dFrame){
  thresholdVal <- 20
  # logical vector: TRUE for the numeric columns
  ind <- sapply(dFrame, is.numeric)
  # build the comparison mask and replace values only inside the numeric columns
  dFrame[ind][dFrame[ind] >= thresholdVal] <- -1
  return(dFrame)
}
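For comparison, the same idea can be written in base R without indexing by a logical matrix. This is a minimal sketch that applies replace() to each numeric column in turn; the data frame df2 and its columns are hypothetical.

```r
# Hypothetical data: two numeric columns and one character column
df2 <- data.frame(x = c(5, 25, 40), y = c(30, 1, 20), z = c("A", "B", "C"),
                  stringsAsFactors = FALSE)
thresholdVal <- 20

# TRUE for the numeric columns only
num <- vapply(df2, is.numeric, logical(1))

# Update the numeric columns one by one; the character column is never touched
df2[num] <- lapply(df2[num], function(col) replace(col, col >= thresholdVal, -1))
print(df2)
```

Because the predicate selects whole columns first, the comparison with the threshold is never applied to the character column.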
Use mutate_if from dplyr:
library(dplyr)
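example_df %>% mutate_if(is.numeric, ~ if_else(. >= thresh, repl, .))
myNums myChars
1 10 A
2 -1 B
3 -1 C
4 5 D
5 -1 E
Explanation:
The mutate family of functions is for creating or updating variables.
mutate_if applies the function (here the lambda ~ if_else(...)) only to the columns that satisfy its first argument (in this case, is.numeric). Older code wraps the function in funs(), which has been deprecated since dplyr 0.8.0.
The updating function is a simple if_else clause based on the OP's rule. In dplyr 1.0 or later, mutate_if is superseded and the same update can be written as mutate(across(where(is.numeric), ~ if_else(.x >= thresh, repl, .x))).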
Data:
thresh <- 20
repl <- -1.0
example_df <- data.frame(
myNums = c(10,20,30,5,70),
myChars = c("A","B","C","D","E"),
stringsAsFactors = FALSE
)
example_df
myNums myChars
1 10 A
2 20 B
3 30 C
4 5 D
5 70 E
Using data.table, we can update all numeric columns by reference without an explicit loop, which is usually fast on large data. Here I've set the threshold value to 2 (with a strict >):
library(data.table)
# convert to data.table by reference
setDT(example_df)
# get numeric columns
num_cols <- names(example_df)[sapply(example_df, is.numeric)]
# update all numeric columns at once
example_df[, (num_cols) := lapply(.SD, function(x) ifelse(x > 2, -1, x)), .SDcols = num_cols]
print(example_df)
myNums myChars
1: 1 A
2: 2 B
3: -1 C
4: -1 D
5: -1 E
Another data.table solution.
library(data.table)
dataframeChange_dt <- function(dFrame){
  setDT(dFrame)
  # loop over the numeric columns only
  for(j in which(sapply(dFrame, is.numeric))){
    # replace, by reference, the values at or above the threshold
    set(dFrame, i = which(dFrame[[j]] >= 20), j = j, value = -1)
  }
  dFrame
}
example_df <- dataframeChange_dt(example_df)
example_df
# myNums myChars
# 1: 10 A
# 2: -1 B
# 3: -1 C
# 4: 5 D
# 5: -1 E
Because the loop runs only over the numeric columns (which(sapply(dFrame, is.numeric))), the character columns are never modified.

R finding values in a data frame using | operator vs %in%

I'm trying to find all instances of certain values in a data frame and replace them with NA. I tried two different ways that I thought were equivalent, but I get different results. For example:
df <- data.frame(a=c(1,2),b=c(3,4))
df[df == 1 | df == 4] <- NA
gives me the expected result:
df
# a b
# 1 NA 3
# 2 2 NA
whereas
df <- data.frame(a=c(1,2),b=c(3,4))
df[df %in% c(1,4)] <- NA
does nothing:
df
# a b
# 1 1 3
# 2 2 4
This seems to be because if I use the "|" operator, it searches the data frame element by element, whereas if I use %in% it searches the data frame vector by vector (column by column), but I don't understand why.
df <- data.frame(a=c(1,2),b=c(3,4))
df == 1 | df == 4
# a b
# [1,] TRUE FALSE
# [2,] FALSE TRUE
df %in% c(1,4)
# [1] FALSE FALSE
If we look at the code for %in%
function (x, table)
match(x, table, nomatch = 0L) > 0L
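So it is essentially a call to match. A data frame is a list of columns, so match compares each whole column (not each cell) with c(1, 4) and returns one value per column:
match(df, c(1,4), nomatch = 0L) > 0L
#[1] FALSE FALSE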
%in% is meant for vectors, not data frames. So we loop through the columns with lapply and apply %in% to each one:
lapply(df, `%in%`, c(1, 4))
If we need a logical matrix (for example, to use as an index), use sapply instead:
df[sapply(df, `%in%`, c(1, 4))] <- NA
We can check, on the original df, that match works column by column on vectors:
sapply(df, match, table = c(1,4), nomatch = 0L) > 0
# a b
#[1,] TRUE FALSE
#[2,] FALSE TRUE
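Another way to see why the data frame is compared column by column: it is a list, so %in% gets one element per column. A minimal sketch that instead flattens the data frame with unlist() (a swapped-in alternative to sapply, not taken from the answers) and reshapes the result into a logical matrix:

```r
df <- data.frame(a = c(1, 2), b = c(3, 4))

# A data frame is a list, so %in% sees one element per column
print(length(df))
print(df %in% c(1, 4))

# Flatten to one vector (column-major order), test each cell,
# then reshape the result back to the data frame's dimensions
mask <- matrix(unlist(df) %in% c(1, 4), nrow = nrow(df))
df[mask] <- NA
print(df)
```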
%in% is only for vectors. In order to perform it on a dataframe you would have to use sapply to apply a function across each of the columns.
df[sapply(df, function(x) x %in% c(1, 4))] <- NA
a b
1 NA 3
2 2 NA

How to check if a row exists in a big data frame with a different number of columns in R

I am trying to check whether a row of values matches any row of a data frame, but the %in% function does not seem to work correctly.
Here is an example:
> c
a b
1 1 2
> d
a b
1 1 1
> g
a b f
1 1 1 1
2 2 2 2
3 3 3 3
Is there any way I can check whether a row exists in a large data frame g, and print TRUE for row d and FALSE for row c?
For your convenience, here is the sample data code:
c <- data.frame(a = 1, b = 2)
d <- data.frame(a = 1, b = 1)
g <- data.frame(a = c(1,2,3), b = c(1,2,3), f = c(1,2,3))
We can paste each row into a single string and then use %in%:
do.call(paste, c) %in% do.call(paste, g[names(c)])
#[1] FALSE
do.call(paste, d) %in% do.call(paste, g[names(d)])
#[1] TRUE
We can write a function that uses intersect from the dplyr package to compare the data frames.
In this example, dt2 is the data frame with more columns than dt1.
is.match <- function(dt1, dt2){
  # keep only the columns of dt2 that dt1 has, then look for shared rows
  temp <- dplyr::intersect(dt1, dt2[names(dt1)])
  nrow(temp) > 0
}
is.match(c, g)
# [1] FALSE
is.match(d, g)
# [1] TRUE
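A base-R alternative without dplyr, sketched under the assumption that every column of the small data frame also exists in the big one: merge() joins on the shared column names, so a non-empty result means the row occurs in g. The helper name row_in is hypothetical.

```r
c <- data.frame(a = 1, b = 2)
d <- data.frame(a = 1, b = 1)
g <- data.frame(a = c(1, 2, 3), b = c(1, 2, 3), f = c(1, 2, 3))

# Hypothetical helper: TRUE if the row(s) in `row` appear in `big`
row_in <- function(row, big) {
  # merge() joins on the common column names (here a and b)
  nrow(merge(row, big[names(row)])) > 0
}

print(row_in(c, g))
print(row_in(d, g))
```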

How to reference elements to objects (vectors) and compare them with is.element in R?

I have got these vectors:
a <- c('x','y','z')
b <- c('w','v','s')
c <- c('x','y')
d <- c('s')
and a data.frame whose elements are the names of those vectors:
df <- data.frame(P= c('c','d','c','d'),R = c('a','b','b','a'))
P R
1 c a
2 d b
3 c b
4 d a
I want the data frame to look like this:
P R P_R
1 c a True
2 d b True
3 c b False
4 d a False
I tried the following to map the elements of df to the objects a, b, c and d:
df$P_R <- is.element((mget(df$P)),(mget(df$R)))
I got:
Error in mget(df$P) : invalid first argument
Thanks for your help!
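A minimal sketch of one possible approach, assuming "True" should mean that every element of the vector named in P is contained in the vector named in R (this reproduces the desired output). get() looks up each vector by name and mapply() walks the two columns in parallel. as.character() is used because in R versions before 4.0 data.frame() turns these columns into factors, and mget() requires a character vector, which is a likely cause of the error above.

```r
a <- c('x','y','z')
b <- c('w','v','s')
c <- c('x','y')
d <- c('s')
df <- data.frame(P = c('c','d','c','d'), R = c('a','b','b','a'))

# For each row, fetch the vectors named in P and R by name and
# check that every element of the P vector is also in the R vector
df$P_R <- mapply(function(p, r) all(get(p) %in% get(r)),
                 as.character(df$P), as.character(df$R),
                 USE.NAMES = FALSE)
print(df)
```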

R - How to apply different functions to certain rows in a column

I am trying to apply different functions to different rows based on the value of a string in an adjacent column. My dataframe looks like this:
type size
A 1
B 3
A 4
C 2
C 5
A 4
B 32
C 3
and I want to apply different functions to types A, B, and C, to give a third column "size2". For example, let's say the following functions apply to A, B, and C:
for A: size2 = 3*size
for B: size2 = size
for C: size2 = 2*size
I'm able to do this for each type separately using this code
df$size2 <- ifelse(df$type == "A", 3*df$size, NA)
df$size2 <- ifelse(df$type == "B", 1*df$size, NA)
df$size2 <- ifelse(df$type == "C", 2*df$size, NA)
However, I can't seem to do it for all of the types without erasing all of the other values. I tried to use this code to limit the application of the function to only those values that were NA (i.e., keep existing values and only fill in NA values), but it didn't work using this code:
df$size2 <- ifelse(is.na(df$size2), ifelse(df$type == "C", 2*df$size, NA), NA)
Does anyone have any ideas? Is it possible to use some kind of AND statement with "is.na(df$size2)" and "ifelse(df$type == "C""?
Many thanks!
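The attempt above loses earlier values because the "no" branch of each ifelse is NA, so every row that does not match is reset. A minimal sketch (re-creating the example data) that answers the AND question directly: combine the two conditions with & and return the existing df$size2 in the "no" branch.

```r
df <- data.frame(type = c("A", "B", "A", "C", "C", "A", "B", "C"),
                 size = c(1, 3, 4, 2, 5, 4, 32, 3))

df$size2 <- ifelse(df$type == "A", 3 * df$size, NA)
# Fill only rows that are still NA and have the right type;
# every other row keeps its current value
df$size2 <- ifelse(is.na(df$size2) & df$type == "B", 1 * df$size, df$size2)
df$size2 <- ifelse(is.na(df$size2) & df$type == "C", 2 * df$size, df$size2)
print(df)
```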
This might be a bit more R-ish (I called my dataframe 'dat' instead of 'df', since df is also a commonly used function, the F-distribution density in stats):
> facs <- c(3,1,2)
> dat$size2= dat$size* facs[ match( dat$type, c("A","B","C") ) ]
> dat
type size size2
1 A 1 3
2 B 3 3
3 A 4 12
4 C 2 4
5 C 5 10
6 A 4 12
7 B 32 32
8 C 3 6
match returns, for each row, the position of its type in c("A","B","C"); those positions are then passed to the extract function [ to pick the matching factor from facs.
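To make the indexing step concrete, here is a small sketch using the type and size values from the question as plain vectors, printing the positions returned by match and the resulting multipliers:

```r
type <- c("A", "B", "A", "C", "C", "A", "B", "C")
size <- c(1, 3, 4, 2, 5, 4, 32, 3)
facs <- c(3, 1, 2)

# Position of each type within c("A", "B", "C")
idx <- match(type, c("A", "B", "C"))
print(idx)

# Use those positions to pick one multiplier per row
print(facs[idx])
print(size * facs[idx])
```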
If you want, you can nest the ifelse calls:
df$size2 <- ifelse(df$type == "A", 3*df$size,
ifelse(df$type == "B", 1*df$size,
ifelse(df$type == "C", 2*df$size, NA)))
# > df
# type size size2
#1 A 1 3
#2 B 3 3
#3 A 4 12
#4 C 2 4
#5 C 5 10
#6 A 4 12
#7 B 32 32
#8 C 3 6
You could do it like this, creating a separate logical vector for each type:
As <- df$type == 'A'
Bs <- df$type == 'B'
Cs <- df$type == 'C'
df$size2[As] <- 3*df$size[As]
df$size2[Bs] <- df$size[Bs]
df$size2[Cs] <- 2*df$size[Cs]
but a more direct approach would be to create a separate lookup table like this:
df$size2 <- c(A=3,B=1,C=2)[as.character(df$type)] * df$size
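The as.character() in that lookup matters when df$type is a factor: indexing by a factor uses its integer codes, not its labels (this is documented in ?Extract). A small sketch with a hypothetical two-level factor:

```r
mult <- c(A = 3, B = 1, C = 2)
# Hypothetical factor: levels "B", "C", so its integer codes are 1 and 2
type <- factor(c("B", "C"))

# Looked up by name: B = 1, C = 2
by_name <- mult[as.character(type)]
# Factor used as positions 1 and 2: A = 3, B = 1
by_code <- mult[type]

print(by_name)
print(by_code)
```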
