Delete rows based on values in R [duplicate]

This question already has answers here:
Subset data frame based on multiple conditions [duplicate]
(3 answers)
How to combine multiple conditions to subset a data-frame using "OR"?
(5 answers)
Closed 2 years ago.
Is there a way to delete rows based on values? For example:
df
ColA ColB
A 1
B 2
A 3
Expected output (I know we can delete rows by row number, but is there a way to delete them based on values, e.g. ("A", 3)?):
df
ColA ColB
A 1
B 2

You can use subset from base R
> subset(df,!(ColA=="A"&ColB==3))
ColA ColB
1 A 1
2 B 2
or a data.table solution
> setDT(df)[!.("A",3),on = .(ColA,ColB)]
ColA ColB
1: A 1
2: B 2

An option with filter from dplyr:
library(dplyr)
df %>%
filter(!(ColA == "A" & ColB == 3))

One way to do this is to use the which() function (see ?which). Use its result with a minus sign when indexing to drop the rows that match a particular criterion.
df <- data.frame(ColA = c("A", "B", "A"), ColB = c(1, 2, 3))
df <- df[-which(df$ColA == "A" & df$ColB == 3), ]
View(df)
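As a minimal sketch in base R (the names keep, result and none are illustrative and do not come from the answers above), the same row can be dropped with a logical mask. The sketch also shows why a negated which() needs care: when nothing matches, which() returns an empty vector and the negative index then selects no rows at all.

```r
# Example data from the question
df <- data.frame(ColA = c("A", "B", "A"), ColB = c(1, 2, 3))

# Logical mask: TRUE for rows to keep
keep <- !(df$ColA == "A" & df$ColB == 3)
result <- df[keep, ]

# Pitfall: no row has ColB == 99, so which() is empty
# and df[-integer(0), ] returns zero rows instead of all rows
none <- df[-which(df$ColB == 99), ]
```

Logical indexing with ! keeps every row when nothing matches, which is usually the behaviour one wants.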

Related

Is there a way to tell a replicate function how long to make the string?

I'm using RStudio and am trying to compute a new column that contains a set of repeated letters with the value of those numbers taken from a corresponding column.
E.g., I have the data in ColA below, but would like to create ColB:
ColA <- c(1, 4, 6)
I would like ColB to be like this:
ColA ColB
1 p
4 pppp
6 pppppp
I have been trying to use replicate but I can't work out how to make the frequency/length of the string equal to the value of the corresponding ColA.
df %>% mutate(ColB = rep("p", length.out = df$ColA, nrow(df)))
I can't seem to get rep to accept another value other than the number of rows of data - is there a way to also feed in the string length?
Any help greatly appreciated! :-)
Here is a base R solution:
df$ColB <- strrep("p", df$ColA)
ColA ColB
1 1 p
2 4 pppp
3 6 pppppp
You can also use this with dplyr:
df %>%
mutate(ColB = strrep("p", ColA))
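The difference between rep() and strrep() can be sketched as follows. This is only an illustration built on the question's ColA values; the name alt is hypothetical. rep() repeats the elements of a vector, whereas strrep() repeats the characters inside each string and is vectorised over the times argument.

```r
ColA <- c(1, 4, 6)
df <- data.frame(ColA)

# strrep() builds one string per value of ColA
df$ColB <- strrep("p", df$ColA)

# rep() returns separate elements, not one longer string
four <- rep("p", 4)

# An equivalent, more verbose route: collapse the repeated elements
alt <- sapply(df$ColA, function(n) paste(rep("p", n), collapse = ""))
```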

Change values of a column conditionally using pipe() function [duplicate]

This question already has answers here:
Replace a value in a data frame based on a conditional (`if`) statement
(10 answers)
R ifelse statement only return number
(2 answers)
R ifelse is erroneously replacing text with integers
(1 answer)
How does R's ifelse work with character data?
(1 answer)
Closed 3 years ago.
I want to use the pipe operator from the dplyr package to change the values of a column conditionally. I used the approach below:
library(dplyr)
> df = data.frame("Col1" = letters[1:6], "Col2" = 1:6)
> df %>% mutate(Col3 = ifelse(Col1 == "a", "aa", Col1))
Col1 Col2 Col3
1 a 1 aa
2 b 2 2
3 c 3 3
4 d 4 4
5 e 5 5
6 f 6 6
In the result above, the first value of Col3 is assigned correctly but the rest are not. Can someone please help me understand the correct approach?
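Col1 is a factor here (before R 4.0.0, data.frame() converted strings to factors by default), so ifelse() returns its underlying integer codes. Two approaches: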
Either use as.character in ifelse
library(dplyr)
df %>% mutate(Col3 = ifelse(Col1 == "a", "aa", as.character(Col1)))
Or use stringsAsFactors = FALSE while constructing the dataframe.
df = data.frame("Col1" = letters[1:6], "Col2" = 1:6, stringsAsFactors = FALSE)
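The behaviour can be reproduced in base R without dplyr. In this sketch, stringsAsFactors = TRUE is set explicitly (an assumption made so the example behaves the same on R 4.0.0 and later), and the names bad and good are illustrative.

```r
# Force Col1 to be a factor, as older R versions did by default
df <- data.frame(Col1 = letters[1:6], Col2 = 1:6, stringsAsFactors = TRUE)

# The factor's integer codes leak into the result
bad <- ifelse(df$Col1 == "a", "aa", df$Col1)

# Converting to character first keeps the labels
good <- ifelse(df$Col1 == "a", "aa", as.character(df$Col1))
```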

How to count with condition how many zeros in a data frame using just one function() in R? [duplicate]

This question already has answers here:
Grouping functions (tapply, by, aggregate) and the *apply family
(10 answers)
Closed 5 years ago.
Consider the following replicable data frame:
col1 <- c(rep("a", times = 5), rep("b", times = 5), rep("c", times = 5))
col2 <- c(0,0,1,1,0,0,1,1,1,0,0,0,0,0,1)
data <- as.data.frame(cbind(col1, col2))
The data now has 15 rows and 2 columns. I want to count how many zeros there are in col2, but only for the rows where col1 is "a". I use table():
table <- table(data$col2[data$col1=="a"])
table[names(table)==0]
This works just fine and result is 3.
But my real data has 100,000 observations with 12 different values of such col1 so I want to make a function so I don't have to type the above lines of code 12 times.
countzero <- function(row){
table <- table(data$col2[data$col1=="row"])
result <- table[names(table)==0]
return(result)
}
I expected that when I run countzero(row = a) it will return 3 as well but instead it returns 0, and also 0 for b and c.
For my real data, it returns
numeric(0)
which I have no idea why.
Anyone could help me out please?
EDIT: To all the answers showing me how to count in total how many zeros for each value of col1, it works all fine, but my purpose is to build a function that returns only the count of one specific col1 value, e.g. just the a's, because that count will be used later to compute other stuff (the percent of 0's in all a's, e.g.)
1) aggregate. Try aggregate:
aggregate(col2 == 0 ~ col1, data, sum)
giving:
col1 col2 == 0
1 a 3
2 b 2
3 c 4
2) table. Or try table (omit the [, 1] if you want the counts of 1's too):
table(data)[, 1]
giving:
a b c
3 2 4
We can use data.table which would be efficient
library(data.table)
setDT(data)[col2==0, .N, col1]
# col1 N
#1: a 3
#2: b 2
#3: c 4
Or with dplyr
library(dplyr)
data %>%
filter(col2==0) %>%
count(col1)
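For the single-value function requested in the EDIT, note that the original countzero() compares col1 with the literal string "row" instead of the argument, so nothing ever matches. A minimal sketch of one possible fix follows; the parameter names value and df are illustrative, and the data frame is built with data.frame() rather than cbind() so that col2 stays numeric.

```r
col1 <- c(rep("a", times = 5), rep("b", times = 5), rep("c", times = 5))
col2 <- c(0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1)
data <- data.frame(col1, col2)

# Count the zeros in col2 for one value of col1
countzero <- function(value, df = data) {
  sum(df$col2[df$col1 == value] == 0)
}

countzero("a")  # 3
```

The value must be passed as a quoted string, e.g. countzero("a"), not countzero(a).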

R: show ALL rows with duplicated elements in a column [duplicate]

This question already has answers here:
Fastest way to remove all duplicates in R
(3 answers)
Closed 6 years ago.
Does a function like this exist in any package?
isdup <- function (x) duplicated (x) | duplicated (x, fromLast = TRUE)
My intention is to use it with dplyr to display all rows with duplicated values in a given column. I need the first occurrence of the duplicated element to be shown as well.
In this data.frame for instance
dat <- as.data.frame (list (l = c ("A", "A", "B", "C"), n = 1:4))
dat
> dat
l n
1 A 1
2 A 2
3 B 3
4 C 4
I would like to display the rows where column l is duplicated, i.e. those with an A value, by doing:
library (dplyr)
dat %>% filter (isdup (l))
returns
l n
1 A 1
2 A 2
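One option is to group by the column with dplyr and keep only the groups that have more than one row:
dat %>% group_by(l) %>% filter(n() > 1)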
I don't know if it exists in any package, but since you can implement it easily, I'd say just go ahead and implement it yourself.
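A minimal sketch of that do-it-yourself route, using the question's own isdup() with plain base R indexing so that no package is needed (the name dups is illustrative):

```r
# TRUE for every element that occurs more than once, including the first occurrence
isdup <- function(x) duplicated(x) | duplicated(x, fromLast = TRUE)

dat <- data.frame(l = c("A", "A", "B", "C"), n = 1:4)

# Keep all rows whose value in column l is duplicated
dups <- dat[isdup(dat$l), ]
```

The same helper can be passed to dplyr's filter() exactly as written in the question.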

R: How to remove rows and add the value of a variable to the previous row as a comma separated value? [duplicate]

This question already has answers here:
Collapse / concatenate / aggregate a column to a single comma separated string within each group
(6 answers)
Closed 7 years ago.
I have a dataset as follows:
col1 col2
a 1
a 2
b 1
b 3
c 4
I want the output as follows:
col1 col2
a 1,2
b 1,3
c 4
How is it possible in R?
We can group by 'col1' and paste the 'col2' with collapse=',' option. A convenient wrapper would be toString. This can be done with any of the aggregate by group functions. For example, with data.table, we convert 'data.frame' to 'data.table' (setDT(df1)) and use the logic as described above
library(data.table)
setDT(df1)[, list(col2 = toString(col2)), by = col1]
Or with aggregate from base R
aggregate(col2~col1, df1, FUN=toString)
If you need a list output for 'col2'
aggregate(col2~col1, df1, FUN=I)
Or using dplyr
library(dplyr)
df1 %>%
group_by(col1) %>%
summarise(col2= toString(col2))
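One detail worth noting: toString() separates values with a comma followed by a space ("1, 2"). If the output must match the question exactly ("1,2"), paste() with collapse = "," can be used instead. A minimal sketch, assuming df1 holds the question's data (the names res_space and res_tight are illustrative):

```r
df1 <- data.frame(col1 = c("a", "a", "b", "b", "c"),
                  col2 = c(1, 2, 1, 3, 4))

# toString() inserts ", " between values
res_space <- aggregate(col2 ~ col1, df1, FUN = toString)

# paste() with collapse = "," gives the exact format from the question
res_tight <- aggregate(col2 ~ col1, df1,
                       FUN = function(x) paste(x, collapse = ","))
```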
