How to delete entire row for x if y appears at least once in same column? - r

I would like to delete every row where var4 is "x", but only if "y" appears at least once in the same column (var4). I can't find a solution in R. Below is what I tried.
In the code below, I tried to tell R that if var4 contains at least one y, all rows containing x should be filtered out.
Example for df:
var1 var2 var3 var4
a b b a
b a b x
a b a x
a a a y
if (all(df$var4 %in% c("y"))) {
df <- filter(!var4 %in% c("x"))
}
So, I would like to delete rows 2&3 because y appears in var4. Unfortunately the code above doesn't return any change in df, even though y appears several times in var4.
Many thanks. I appreciate any kind of recommendation.

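In the OP's code, filter() is called without the data frame, and all(df$var4 %in% c("y")) is TRUE only if every value of var4 is "y", so the condition is never met. Instead, it can be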
library(dplyr)
if ("y" %in% df$var4) {
  df <- df %>%
    filter(!var4 %in% "x")
}
df
# var1 var2 var3 var4
#1 a b b a
#2 a a a y
It can also be written as a single filter that keeps every row unless "y" is present and the row's var4 is "x"
df %>%
  filter(!("y" %in% var4 & var4 %in% "x"))
data
df <- structure(list(var1 = c("a", "b", "a", "a"), var2 = c("b", "a",
"b", "a"), var3 = c("b", "b", "a", "a"), var4 = c("a", "x", "x",
"y")), class = "data.frame", row.names = c(NA, -4L))

If you want to use base R commands:
df[!df$var4 == "x", ] removes the rows where var4 is "x" (unconditionally; it does not check whether "y" is present).
df$var4 == "x" will return a vector of TRUE/FALSE
> df$var4 == "x"
[1] FALSE TRUE TRUE FALSE
The ! in front negates the whole comparison (in R, ! binds more loosely than ==), flipping TRUE and FALSE
> !df$var4 == "x"
[1] TRUE FALSE FALSE TRUE
Then the bracket notation refers to subsetting the object by rows, then columns.
df[rows,columns]
Putting it all together, the following will subset rows based on the criteria supplied, and include all columns.
df[!df$var4 == "x", ]
Note that leaving nothing after the comma means all columns are included.
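The subsetting above drops the "x" rows whether or not "y" is present. A minimal base R sketch that also honours the condition from the question (remove "x" rows only if "y" occurs in var4) might look like the following. The helper name drop_x_if_y is made up for illustration, and %in% is used instead of == so that NA values in the column are kept rather than turned into NA rows.

```r
# Example data from the question
df <- data.frame(
  var1 = c("a", "b", "a", "a"),
  var2 = c("b", "a", "b", "a"),
  var3 = c("b", "b", "a", "a"),
  var4 = c("a", "x", "x", "y"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop rows where `col` is "x",
# but only when "y" occurs at least once in `col`
drop_x_if_y <- function(data, col = "var4") {
  if ("y" %in% data[[col]]) {
    data <- data[!(data[[col]] %in% "x"), , drop = FALSE]
  }
  data
}

res <- drop_x_if_y(df)
res$var4
# [1] "a" "y"
```

If "y" is absent (for example drop_x_if_y(df[1:3, ])), the data frame comes back unchanged.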

Related

Subsetting a dataframe using %in% and ! in R

I have the following dataframe.
Test_Data <- data.frame(x = c("a", "b", "c"), y = c("d", "e", "f"), z = c("g", "h", "i"))
x y z
1 a d g
2 b e h
3 c f i
I would like to filter it based on multiple conditions. Specifically, I would like to remove any record that has the value of "b" in column x or "f" in column y. My subsetted result would be;
x y z
1 a d g
I tried the following solutions;
View(Test_Data %>% subset(!x %in% "b" | !y %in% "f"))
View(Test_Data %>% subset(!x %in% "b" & !y %in% "f"))
View(Test_Data %>% subset(!(x %in% "b" | y %in% "f")))
The last two solutions give me the result I want, however the first one is the only one that makes 'sense' to me because it uses the OR operator and I only need one of the conditions to be met. Why do the last solutions work but not the first?
The subset operation returns the rows that you want to KEEP.
However your set of rules defines the rows you want NOT TO KEEP. Therefore you're getting confused with the negation logic.
The rows you don't want to keep follow a series of rules: r1 | r2 | ....
The NEGATION is: !(r1 | r2 | ...), or: !r1 & !r2 & ...
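To make the negation rule concrete, here is a small sketch on the question's Test_Data that builds the "drop" condition once and compares the different ways of negating it. The variable names drop, keep_a, keep_b and keep_wrong are only illustrative.

```r
Test_Data <- data.frame(
  x = c("a", "b", "c"),
  y = c("d", "e", "f"),
  z = c("g", "h", "i"),
  stringsAsFactors = FALSE
)

# Rows to drop: x is "b" OR y is "f"
drop <- Test_Data$x %in% "b" | Test_Data$y %in% "f"

# Two equivalent ways to express the rows to keep
keep_a <- !drop
keep_b <- !(Test_Data$x %in% "b") & !(Test_Data$y %in% "f")

# The first attempt from the question: TRUE whenever at least one rule is not hit
keep_wrong <- !(Test_Data$x %in% "b") | !(Test_Data$y %in% "f")

identical(keep_a, keep_b)  # TRUE
keep_wrong                 # TRUE TRUE TRUE, so no row is removed
Test_Data[keep_a, ]        # only the row with x = "a" remains
```

No row has both x equal to "b" and y equal to "f", so the OR of the two negations is TRUE everywhere, which is why the first attempt returns the full data frame.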

Select for every row between two columns based on condition in another column in R

Could someone point me to an existing answer or suggest a method? I cannot find a solution.
What I want to do:
For every row if the value in column "x" is "A" then select the value in column "y" from the same row and if the value in column "x" is "B" then select the value in column "z" from the same row.
Ideally collected in a vector to include as a new column in the df afterwards.
df <- data.frame(x = c("A", "B", "B", "A"), y = c(1,2,3,4), z = c(4,3,2,1), fix.empty.names = FALSE)
df
x y z
1 A 1 4
2 B 2 3
3 B 3 2
4 A 4 1
result
[1] 1 3 2 4
Thank you very much in advance
If we can assume x is always "A" or "B":
ifelse(df$x == "A", df$y, df$z)
More generally:
ifelse(df$x == "A", df$y, ifelse(df$x == "B", df$z, NA))
You can, of course, assign this directly as a new column: df$result <- ifelse...
If you like dplyr:
library(dplyr)
df %>%
  mutate(
    result = case_when(
      x == "A" ~ y,
      x == "B" ~ z,
      TRUE ~ NA_real_
    )
  )
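As a quick check of the base R approach against the result requested in the question, the nested ifelse() can be assigned to a new column like this. It is only a sketch; the column name result and the extra vector x2 are chosen for illustration.

```r
df <- data.frame(
  x = c("A", "B", "B", "A"),
  y = c(1, 2, 3, 4),
  z = c(4, 3, 2, 1)
)

# Take y where x is "A", z where x is "B", and NA for anything else
df$result <- ifelse(df$x == "A", df$y,
                    ifelse(df$x == "B", df$z, NA))

df$result
# [1] 1 3 2 4

# An unexpected value in x falls through to NA
x2 <- c("A", "C")
ifelse(x2 == "A", 10, ifelse(x2 == "B", 20, NA))
# [1] 10 NA
```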

How can I make a data.frame using chr vector [duplicate]

I have the data.frame below.
> Chr Chr
> A E
> A F
> A E
> B G
> B G
> C H
> C I
> D E
I want to convert the dataset into the form shown below.
That is, I want to collapse all the chr values that belong to the same key into a single row.
chr chr
A E,F
B G
C H,I
D E
They are all character values, and I tried several things to get there.
First, I took the unique values of the first column with FILTER <- unique(chr[,15]) and tried to subset the data by each FILTER value,
combining the pieces with rbind or bind_rows.
Second, I tested whether the idea works:
FILTER <- unique(Top[,15])
NN <- data.frame()
for(i in 1 :nrow(FILTER)){
result = unique(Top10Data[TGT == FILTER[i]]$`NM`)
print(result)
}
Up to this stage, it seems to work well.
The problem is that when I use both functions, the resulting data frame contains only one column and ignores the other values (the second variable in the data frame above).
The functions work when a key has a single value (chr[1,1]), but keys with several values (chr[1,n]) cannot be combined.
Here is my code for reference.
FILTER <- unique(Top[,15])
NN <- data.frame()
for(i in 1 :nrow(FILTER)){
CGONM <- rbind(NN,unique(Top10Data[TGT == FILTER[i]]$`NM`))
}
Base R solutions:
# Solution 1:
df_str_agg1 <- aggregate(var2 ~ var1, df, FUN = function(x) {
  paste0(unique(x), collapse = ",")
})
# Solution 2:
df_str_agg2 <- data.frame(
  do.call("rbind", lapply(split(df, df$var1), function(x) {
    data.frame(var1 = unique(x$var1),
               var2 = paste0(unique(x$var2), collapse = ","))
  })),
  row.names = NULL
)
Tidyverse solution:
library(tidyverse)
df_str_agg3 <-
  df %>%
  group_by(var1) %>%
  summarise(var2 = str_c(unique(var2), collapse = ",")) %>%
  ungroup()
Data:
df <- data.frame(var1 = c("A", "A", "A", "B", "B", "C", "C", "D"),
var2 = c("E", "F", "E", "G", "G", "H", "I", "E"), stringsAsFactors = FALSE)

Identify index that is not shared between two variables in R

I would like to identify the indices for which there is not a match between two variables. The following code identifies the matches rather than the mismatches:
x <- c("a", "b", "c")
y <- c("a", "z", "c")
which(unique(as.character(x))%in% unique(y))
Thoughts on how to get this to identify the False indices (or in this example, 2)?
which(!(unique(as.character(x))%in% unique(y)))
cdeeterman is basically correct; just make sure that the not (!) applies to the entire expression unique(as.character(x)) %in% unique(y)
You could also compare the vectors element by element with two equal signs: x == y tests whether each element of x equals the element of y at the same position (this assumes x and y have the same length)
x = c("a", "b", "c")
y = c("a", "z", "c")
z = x == y
which(z == FALSE)
What about setdiff?
> which( y %in% setdiff(y,x) )
[1] 2
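The answers above mix two different notions of "mismatch": position by position (==) and set membership (%in%, setdiff). A short sketch of the difference, using the vectors from the question plus a reordered vector y2 that is added here purely for illustration:

```r
x <- c("a", "b", "c")
y <- c("a", "z", "c")

# Position-wise: which positions hold different values? (needs equal lengths)
pos_mismatch <- which(x != y)

# Set-wise: which elements of x occur nowhere in y?
set_mismatch <- which(!(x %in% y))

pos_mismatch  # 2
set_mismatch  # 2

# The two notions diverge once the order differs
y2 <- c("c", "a", "z")
which(x != y2)       # 1 2 3: every position differs
which(!(x %in% y2))  # 2: only "b" is missing from y2
```

Which one is appropriate depends on whether the order of the elements matters for the comparison.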

Subset a data frame using OR when the column contains a factor

I would like to make a subset of a data frame in R that is based on one OR another value in a column of factors but it seems I cannot use | with factor values.
Example:
# fake data
x <- sample(1:100, 9)
nm <- c("a", "a", "a", "b", "b", "b", "c", "c", "c")
fake <- cbind(as.data.frame(nm), as.data.frame(x))
# subset fake to only rows with name equal to a or b
fake.trunk <- fake[fake$nm == "a" | "b", ]
produces the error:
Error in fake$nm == "a" | "b" :
operations are possible only for numeric, logical or complex types
How can I accomplish this?
Obviously my actual data frame has more than 3 values in the factor column so just using != "c" won't work.
You need fake.trunk <- fake[fake$nm == "a" | fake$nm == "b", ]. A more concise way of writing that (especially with more than two conditions) is:
fake[ fake$nm %in% c("a","b"), ]
Another approach would be to use subset() and write
fake.trunk = subset(fake, nm %in% c('a', 'b'))
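A small self-contained sketch comparing the three forms on the fake data. The seed is an arbitrary choice so the sample is reproducible, and the names keep, by_or, by_in and by_subset are illustrative.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
fake <- data.frame(
  nm = factor(c("a", "a", "a", "b", "b", "b", "c", "c", "c")),
  x  = sample(1:100, 9)
)

keep <- c("a", "b")

by_or     <- fake[fake$nm == "a" | fake$nm == "b", ]  # explicit OR
by_in     <- fake[fake$nm %in% keep, ]                # %in% with a vector
by_subset <- subset(fake, nm %in% keep)               # subset() form

nrow(by_or)                       # 6
nrow(by_in)                       # 6
nrow(by_subset)                   # 6
unique(as.character(by_in$nm))    # "a" "b"
```

The %in% form scales more comfortably than chained | comparisons when the list of values to keep grows.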
