This question already has answers here:
How to delete rows where all the columns are zero
(6 answers)
How does the logical negation operator ! work?
(3 answers)
Closed 2 years ago.
I'm looking to subset a data frame (all columns numeric) based on a condition: I would like to keep only the rows whose rowSums value equals 0. So far I have only been able to find a solution that keeps the rows whose sum does not equal 0!
Would anyone be able to help?
Thanks in advance.
We can use subset with rowSums; na.rm = TRUE makes the sum ignore missing values:
subset(df1, rowSums(df1, na.rm = TRUE) == 0)
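As a minimal sketch of the call above, assuming a small made-up data frame df1 (the values are hypothetical, not from the question), the rows whose values sum to 0 are kept:

```r
# Hypothetical all-numeric data frame, for illustration only
df1 <- data.frame(a = c(0, 1, 0, NA),
                  b = c(0, 2, 0, 0),
                  c = c(0, 3, 0, 0))

# rowSums() adds up each row; na.rm = TRUE skips NA values instead of returning NA
zero_rows <- subset(df1, rowSums(df1, na.rm = TRUE) == 0)

print(zero_rows)
```

Note that a row such as c(-1, 1, 0) also sums to 0. If only rows made up entirely of zeros are wanted, rowSums(df1 != 0, na.rm = TRUE) == 0 is a stricter test that counts the non-zero entries instead.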
I have a large dataframe and need to remove a few rows from it as rowSums shows these to be 0 and thus impacting my downstream analysis. I've seen that you can do df[rowSums(df[, -1])>0, ] to achieve this, but I get an error stating that "x must be numeric" as both the first and last columns are not numeric. Any ideas of how to get around this?
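We need to exclude the non-numeric columns (here the first and last) before calling rowSums. Rather than hard-coding their positions, build a logical index of the numeric columns and use it:
i1 <- sapply(df, is.numeric)  # logical index: TRUE for numeric columns
df[rowSums(df[i1]) > 0, ]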
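A small worked example may help here. The data frame below is hypothetical (the column names id, x, y and grp are assumptions), with non-numeric first and last columns as in the question:

```r
# Hypothetical data frame whose first and last columns are not numeric
df <- data.frame(id  = c("g1", "g2", "g3"),
                 x   = c(0, 2, 0),
                 y   = c(0, 5, 1),
                 grp = c("a", "b", "a"))

# Logical index: TRUE for numeric columns, FALSE otherwise
i1 <- sapply(df, is.numeric)

# Keep rows whose numeric columns sum to more than 0; all columns are retained
res <- df[rowSums(df[i1]) > 0, ]

print(res)
```

If the numeric columns can hold negative values, rowSums(df[i1]) != 0 is probably the safer condition, since > 0 would also drop rows with a negative total.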
I have a really large dataset and I want to filter out some of the columns because they hold the same value in every row (e.g. the company name is always "Walmart"). I could go through and do this manually, but I'm looking for code to do it automatically.
I had in mind a function to subset based on if sum(unique(colnam)) == 1 but not sure how to get it to work. Thanks.
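Counting the unique values per column with sapply gives the positions of the columns that hold a single distinct value:
which(sapply(dat, function(col) length(unique(col)) == 1))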
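The line above only locates the constant columns. One possible way to drop them, sketched on a hypothetical dat (the column names and values are made up), is to negate the same logical test:

```r
# Hypothetical data: 'company' holds the same value in every row
dat <- data.frame(company = rep("Walmart", 3),
                  store   = c(1, 2, 3),
                  sales   = c(10, 20, 10))

# TRUE for columns with exactly one distinct value
is_constant <- sapply(dat, function(col) length(unique(col)) == 1)

# Drop those columns; drop = FALSE keeps a data frame even if one column remains
dat_varying <- dat[, !is_constant, drop = FALSE]

print(names(dat_varying))
```

Keep in mind that unique() counts NA as a value of its own, so a column that is constant apart from some missing entries is not flagged by this test.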
I want to filter for rows that are NOT "NA" using the filter command. How do I do that?
Where "Dose_extract_IBUPROFEN" is the data frame, and "Drug3" is the variable for which I want to keep the rows that are NOT missing (NA), I tried the following, which does not work.
filter(Dose_extract_IBUPROFEN, Drug3 != NA)
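You should use !is.na(Drug3) instead of Drug3 != NA. Any comparison with NA evaluates to NA rather than TRUE or FALSE, and filter() drops rows where the condition is NA, so Drug3 != NA returns no rows. The working call is filter(Dose_extract_IBUPROFEN, !is.na(Drug3)).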
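The behaviour can be checked without dplyr. The sketch below uses base R indexing in place of filter() and a made-up stand-in for the asker's data frame (the id column and the values are assumptions):

```r
# Hypothetical stand-in for the asker's data frame
Dose_extract_IBUPROFEN <- data.frame(
  id    = 1:4,
  Drug3 = c("ibuprofen", NA, "ibuprofen", NA)
)

# Comparing with NA gives NA for every element, never TRUE or FALSE
cmp <- Dose_extract_IBUPROFEN$Drug3 != NA

# is.na() returns a proper TRUE/FALSE vector, which can be negated
not_missing <- Dose_extract_IBUPROFEN[!is.na(Dose_extract_IBUPROFEN$Drug3), ]

print(cmp)
print(not_missing)
```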
Sorry for the silly question, but I have a huge dataframe (called "totaldecade") with columns named:
Event.ID,Event.Date,CAMEO.Code
I want to delete all rows whose value in the CAMEO.Code column falls within the number ranges 10:58, 90:145, 1011:1454, 160:166, 1661:1663.
I have tried:
totaldecade[with(totaldecade, !((CAMEO.Code %between% 10:58) |
(CAMEO.Code %between% 90:145) |
(CAMEO.Code %between% 1011:1454) | (CAMEO.Code %between% 160:166) |
(CAMEO.Code %between% 1661:1663))), ]
But it doesn't seem to work.
Any help is appreciated!
Michelle
We collect the ranges in a single vector, use %in% to create a logical vector, and negate it (!) so that TRUE becomes FALSE and vice versa, which keeps only the rows whose code is not in any of the ranges:
library(dplyr)
totaldecade %>%
filter(!CAMEO.Code %in% c(10:58, 90:145, 1011:1454, 160:166, 1661:1663))
Or using subset from base R
subset(totaldecade, !CAMEO.Code %in% c(10:58, 90:145, 1011:1454,
160:166, 1661:1663))
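To see which rows survive, here is a small sketch using the base R version on a hypothetical six-row totaldecade (the dates and codes are made up for illustration):

```r
# Hypothetical miniature version of totaldecade
totaldecade <- data.frame(
  Event.ID   = 1:6,
  Event.Date = as.Date("2010-01-01") + 0:5,
  CAMEO.Code = c(5, 10, 58, 150, 1200, 1663)
)

# All codes to remove, collected into one vector
drop_codes <- c(10:58, 90:145, 1011:1454, 160:166, 1661:1663)

# %in% binds more tightly than !, so this reads as !(CAMEO.Code %in% drop_codes)
kept <- subset(totaldecade, !CAMEO.Code %in% drop_codes)

print(kept)
```

Because the ranges are built with the colon operator, only whole-number codes are matched. This assumes CAMEO.Code holds integers stored as numbers.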
How can I extract two rows (rows a and b below) from my data frame df?
df1 = df[(df$column1 == "a" & "b"),]
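The condition df$column1 == "a" & "b" fails because & needs a logical value on both sides, and "b" on its own is a character string. Just use the subset function with %in%: subset(df, subset = column1 %in% c("a", "b"))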
Regards.
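For comparison, the sketch below shows the spelled-out form of the condition next to the %in% form, on a hypothetical df (the value column is an assumption added for illustration):

```r
# Hypothetical data frame with a character key column
df <- data.frame(column1 = c("a", "b", "c", "a"), value = 1:4)

# Each comparison must be complete; combine them with | (or), not &
by_or <- df[df$column1 == "a" | df$column1 == "b", ]

# Shorter form that selects the same rows
by_in <- subset(df, column1 %in% c("a", "b"))

print(by_in)
```

The %in% form scales better when the list of wanted values grows, since only the vector on the right-hand side changes.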