This question already has answers here:
How to delete multiple values from a vector?
(9 answers)
Filter data frame rows based on values in vector
(4 answers)
Closed 4 years ago.
Sorry for the silly question, but I have a huge dataframe (called "totaldecade") with columns named:
Event.ID,Event.Date,CAMEO.Code
I want to delete all rows whose value in the CAMEO.Code column falls within any of these ranges: 10:58, 90:145, 1011:1454, 160:166, 1661:1663.
I have tried:
totaldecade[with(totaldecade, !((CAMEO.Code %between% 10:58) |
(CAMEO.Code %between% 90:145) |
(CAMEO.Code %between% 1011:1454) | (CAMEO.Code %between% 160:166) |
(CAMEO.Code %between% 1661:1663))), ]
But it doesn't seem to work.
Any help is appreciated!
Michelle
We put the ranges in a single vector, use %in% to create a logical vector, and negate it (!) so that the FALSE elements become TRUE and vice versa.
library(dplyr)
totaldecade %>%
filter(!CAMEO.Code %in% c(10:58, 90:145, 1011:1454, 160:166, 1661:1663))
Or using subset from base R
subset(totaldecade, !CAMEO.Code %in% c(10:58, 90:145, 1011:1454,
160:166, 1661:1663))
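To see the negated %in% approach on something small, here is a minimal sketch in base R. The toy data frame and the name drop_codes are made up for illustration; only the column name CAMEO.Code and the ranges come from the question. As a side note, data.table's %between% takes a lower and an upper bound (for example c(10, 58)) rather than a full sequence such as 10:58, which is likely why the original attempt misbehaved.

```r
# Hypothetical toy data standing in for the real totaldecade
totaldecade <- data.frame(
  Event.ID   = 1:6,
  CAMEO.Code = c(5, 10, 58, 100, 150, 1662)
)

# All codes to drop, collected into one vector
drop_codes <- c(10:58, 90:145, 1011:1454, 160:166, 1661:1663)

# Keep only rows whose code is NOT in drop_codes
kept <- totaldecade[!totaldecade$CAMEO.Code %in% drop_codes, ]
kept$CAMEO.Code
```

Rows with a missing CAMEO.Code are kept by this filter, because NA %in% drop_codes is FALSE and the negation turns it into TRUE.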
This question already has answers here:
How to delete rows where all the columns are zero
(6 answers)
How does the logical negation operator ! work?
(3 answers)
Closed 2 years ago.
Looking to subset a data frame (all columns numeric) based on a condition. I would like to keep only the rows whose row sum equals 0. I have only been able to find solutions that keep the rows whose sum is not 0!
Would anyone be able to help?
Thanks in advance.
We can use subset with rowSums
subset(df1, rowSums(df1, na.rm = TRUE) == 0)
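A small sketch of the rowSums filter on made-up data. It also shows one caveat: a row sum of 0 does not guarantee that every value is 0, since negative and positive values can cancel. The second filter, which counts non-zero entries instead, is an alternative added here and was not part of the original answer.

```r
# Hypothetical toy data; row 3 sums to 0 without being all zeros
df1 <- data.frame(a = c(0, 1, -2, NA),
                  b = c(0, 2,  2,  0))

# Rows whose sum is 0 (NA treated as 0 via na.rm = TRUE)
by_sum <- subset(df1, rowSums(df1, na.rm = TRUE) == 0)

# Stricter: rows with no non-zero entry at all
all_zero <- subset(df1, rowSums(df1 != 0, na.rm = TRUE) == 0)

rownames(by_sum)
rownames(all_zero)
```

If the data can contain negative values, the second form is probably closer to "rows where all the columns are zero".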
This question already has answers here:
How to remove columns with same value in R
(4 answers)
Closed 2 years ago.
I have a really large dataset and I want to filter out some of the columns because it is the same data all throughout (ex: company name is all "Walmart"). I can go through and do these manually but I'm looking for a code to do it automatically.
I had in mind a function to subset based on if sum(unique(colnam)) == 1 but not sure how to get it to work. Thanks.
which(sapply(dat, function(col) length(unique(col)) == 1))
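The line above only reports which columns are constant. A minimal sketch of actually dropping them, assuming a toy data frame (the column names and the names is_constant and dat_reduced are invented for illustration):

```r
# Hypothetical toy data with one constant column
dat <- data.frame(company = rep("Walmart", 3),
                  store   = c(1, 2, 3),
                  sales   = c(10, 10, 12))

# TRUE for every column that holds a single distinct value
is_constant <- sapply(dat, function(col) length(unique(col)) == 1)

# Keep the non-constant columns; drop = FALSE keeps a data frame
# even if only one column survives
dat_reduced <- dat[, !is_constant, drop = FALSE]
names(dat_reduced)
```

Note that unique() counts NA as a value of its own, so a column that is constant apart from some missing entries would not be dropped by this check.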
This question already has answers here:
Subset of rows containing NA (missing) values in a chosen column of a data frame
(7 answers)
Closed 2 years ago.
I would like to subset my data based on two conditions: if X is blank and if Y is blank.
Subsetting based on 1 condition is:
Blank_X <- Q4[is.na(Q4$X),]
How do I add a second condition to this?
Here is one way with subset
Blank_X <- subset(Q4,is.na(Q4$X) & is.na(Q4$Y))
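Or with filter from dplyr (a comparison such as X != NA always evaluates to NA, so is.na() is needed here as well)
library(dplyr)
Blank_X <- Q4 %>% filter(is.na(X) & is.na(Y))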
You can use & (and) to combine multiple conditions.
Blank_X <- Q4[is.na(Q4$X) & is.na(Q4$Y),]
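A minimal sketch of the two-condition filter on made-up data, assuming "blank" means NA (if the blanks are empty strings, the test would be X == "" instead). The data frame contents and the name Blank_XY are hypothetical.

```r
# Hypothetical toy data; only row 1 is missing in both X and Y
Q4 <- data.frame(X = c(NA, 1, NA),
                 Y = c(NA, NA, 2),
                 Z = c("a", "b", "c"))

# Both conditions must hold, so they are combined with &
Blank_XY <- Q4[is.na(Q4$X) & is.na(Q4$Y), ]
nrow(Blank_XY)

# Comparing against NA directly does not test for missingness:
# every element of the result is itself NA
Q4$X == NA
```

Using | instead of & would keep rows where either column is missing.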
This question already has answers here:
Filter data.frame rows by a logical condition
(9 answers)
Closed 4 years ago.
How can I extract two rows (rows a and b below) from my data frame df?
df1 = df[(df$column1 == "a" & "b"),]
Just use the subset function: subset(df, subset = column1 %in% c("a","b"))
Regards.
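For illustration, a small sketch comparing %in% with the spelled-out version using == and | (or). The data frame below is made up. The attempt in the question fails because & is applied to the string "b", which is not a logical value.

```r
# Hypothetical toy data
df <- data.frame(column1 = c("a", "b", "c", "a"),
                 value   = 1:4)

# Membership test against a set of values
by_in <- subset(df, column1 %in% c("a", "b"))

# Same rows, written as two comparisons joined with |
by_or <- df[df$column1 == "a" | df$column1 == "b", ]

by_in$value
by_or$value
```

With more than two or three values, %in% is usually the more readable choice.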
This question already has answers here:
Subset multiple rows with condition
(3 answers)
Closed 8 years ago.
Here is a trivialized example whose solution would help me greatly.
v.1<- c(5,8,7,2)
v.2<- c("hi", "hello", "hum", "bo")
df<- data.frame(v.1, v.2)
desired.values<- c("hi", "bo")
I would like all rows of the dataset where v.2 takes on one of the desired.values.
Desired output:
5 "hi"
2 "bo"
In my real dataset, v.2 has more than 10000 values and desired.values contains more than 2000 values.
You could try data.table
library(data.table)
setkey(setDT(df),v.2)[desired.values]
Or using base R methods
df[df$v.2 %in% desired.values,]
Or
df[grep(paste(desired.values, collapse="|"), df$v.2),]
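One difference between the last two base R options is worth a quick sketch: %in% matches whole values, while grep matches the pattern anywhere inside the string. The extra row "this" below is an assumption added to the question's example to make the difference visible.

```r
# Example from the question plus one hypothetical extra row ("this")
df <- data.frame(v.1 = c(5, 8, 7, 2, 9),
                 v.2 = c("hi", "hello", "hum", "bo", "this"))
desired.values <- c("hi", "bo")

# Exact matching on whole values
exact <- df[df$v.2 %in% desired.values, ]

# Regex matching: "hi" is also found inside "this"
partial <- df[grep(paste(desired.values, collapse = "|"), df$v.2), ]

exact$v.1
partial$v.1
```

For the stated goal (rows where v.2 takes on one of the desired values), %in% is probably the safer choice, and it avoids building a very long regular expression from 2000 values.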