Subsetting based on multiple conditions in R [duplicate] - r

This question already has answers here:
Subset of rows containing NA (missing) values in a chosen column of a data frame
(7 answers)
Closed 2 years ago.
I would like to subset my data based on two conditions: if X is blank and if Y is blank.
Subsetting based on 1 condition is:
Blank_X <- Q4[is.na(Q4$X),]
How do I add a second condition to this?

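Here is one way, with subset (column names can be used directly inside subset):
Blank_X <- subset(Q4, is.na(X) & is.na(Y))
or with filter from dplyr. Note that comparing against NA with != or == always returns NA, so is.na() is needed here as well:
Blank_X <- Q4 %>% filter(is.na(X) & is.na(Y))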

You can use & (and) to combine multiple conditions.
Blank_X <- Q4[is.na(Q4$X) & is.na(Q4$Y),]
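A minimal sketch of how the combined condition behaves, using an invented stand-in for Q4 (the values and the id column are made up for illustration). It also shows why a direct comparison with NA cannot be used as a filter. If "blank" in the real data means an empty string rather than NA, the test would presumably need to be Q4$X == "" instead.

```r
# Toy stand-in for the asker's Q4 (values are invented for illustration)
Q4 <- data.frame(
  id = 1:4,
  X  = c(NA, 2, NA, 4),
  Y  = c(NA, NA, 3, 4)
)

# Rows where both X and Y are missing
both_blank <- Q4[is.na(Q4$X) & is.na(Q4$Y), ]

# Rows where at least one of X or Y is missing
either_blank <- Q4[is.na(Q4$X) | is.na(Q4$Y), ]

# Comparing with NA yields NA for every element, never TRUE or FALSE
na_comparison <- Q4$X == NA
all(is.na(na_comparison))
```

Swapping & for | is all it takes to move from "both missing" to "either missing".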

Related

Split data in R with two specific values of column [duplicate]

This question already has answers here:
Using multiple criteria in subset function and logical operators
(2 answers)
Closed 1 year ago.
The following code takes all the rows in which LABEL == 0. How can I modify it to take the rows for both 0 and 4 in the LABEL column?
dt<- read.csv("log1.csv")
dt
dt_inactivity <- dt[dt$LABEL == 0, ]
dt_inactivity
Using dplyr, this should do what you want:
library(dplyr)
dt_inactivity <- filter(dt, LABEL == 0 | LABEL == 4)
See the full documentation here: https://dplyr.tidyverse.org/reference/filter.html
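The same selection can also be written in base R, without dplyr. The sketch below uses an invented data frame in place of log1.csv (the value column is hypothetical) and compares the | form with %in%, which tends to be more convenient once the list of wanted values grows.

```r
# Invented stand-in for the contents of log1.csv
dt <- data.frame(
  LABEL = c(0, 1, 4, 2, 0, 4),
  value = c(10, 20, 30, 40, 50, 60)
)

# Keep rows whose LABEL is 0 or 4, using the "or" operator
by_or <- dt[dt$LABEL == 0 | dt$LABEL == 4, ]

# Same selection with %in%, which takes a vector of accepted values
by_in <- dt[dt$LABEL %in% c(0, 4), ]

# Both approaches select the same rows
identical(by_or, by_in)
```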

is there a way to set all cells in a dataframe in the form of a vector as NA? [duplicate]

This question already has answers here:
R: Count number of objects in list [closed]
(5 answers)
Closed 2 years ago.
I have a dataframe in R, and I am trying to set all cells in the form of a vector, either c(1,2,3) or 1:2 to NA. Is there any easy way to do this?
You can use lengths to count the number of elements in each cell of a column, then set the cells to NA where that length is greater than 1. Here the data frame is assumed to be called df and the column col_name; change them to match your data.
df$col_name[lengths(df$col_name) > 1] <- NA
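A small worked example of the lengths approach, assuming the column in question is a list column (the only way a data frame cell can hold a vector). The data frame and its contents are invented for illustration.

```r
# Hypothetical data frame with a list column; some cells hold vectors
df <- data.frame(id = 1:4)
df$col_name <- list(1, c(1, 2, 3), "a", 1:2)

# lengths() returns the number of elements in each cell
cell_lengths <- lengths(df$col_name)

# Replace every cell holding more than one element with NA
df$col_name[lengths(df$col_name) > 1] <- NA
```

The column stays a list after the replacement; cells that held a single value are left untouched.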

Create a dataframe from a subset of rows in a larger dataframe [duplicate]

This question already has answers here:
Filter multiple values on a string column in dplyr
(6 answers)
Closed 3 years ago.
I have a large dataframe "Marks", containing marks each year from 2014/5-2017/8. I have separated the dataframe into 4 smaller ones, by year of completion using:
marks14 <-
Marks%>%
filter(YearOfCompletion == "2014/5")
marks15 <-
Marks%>%
filter(YearOfCompletion == "2015/6")
marks16 <-
Marks%>%
filter(YearOfCompletion == "2016/7")
marks17 <-
Marks%>%
filter(YearOfCompletion == "2017/8")
I am now trying to combine the "2016/7" and "2017/8" marks into one dataframe. I have tried to adapt the filter call, but I'm unable to figure it out and I can't find the code for this in online cookbooks.
We can use %in% to filter on a vector of one or more values:
library(dplyr)
Marks %>%
filter(YearOfCompletion %in% c("2016/7", "2017/8"))

replacing NA's in one column with the product of two other columns [duplicate]

This question already has answers here:
Replace NA in column with value in adjacent column
(3 answers)
Closed 4 years ago.
In a column of 7000 rows there are 11 NA's. I want to replace those NA's with the product of two other columns in my data frame.
The column with NA's is TOTALCHARGES and the two columns I want to multiply are TENURE and MONTHLYCHARGES.
Find the indices of the missing data:
na.vals <- which(is.na(your_data$TOTALCHARGES))
Modify the relevant elements of TOTALCHARGES (within the data set):
your_data <- transform(your_data,
  TOTALCHARGES = replace(TOTALCHARGES, na.vals,
                         TENURE[na.vals] * MONTHLYCHARGES[na.vals]))
Something like this (assuming df is your data.frame)?
df[is.na(df$TOTALCHARGES), "TOTALCHARGES"] <- df[is.na(df$TOTALCHARGES), "TENURE"] * df[is.na(df$TOTALCHARGES), "MONTHLYCHARGES"]
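To check the replacement on something small, here is a sketch with invented numbers; only the column names come from the question. The logical index is computed once and reused, which keeps the assignment short.

```r
# Invented data; column names follow the question
your_data <- data.frame(
  TENURE         = c(1, 2, 3, 4),
  MONTHLYCHARGES = c(10, 20, 30, 40),
  TOTALCHARGES   = c(10, NA, 95, NA)
)

# Logical index of the rows to fill, computed once
missing <- is.na(your_data$TOTALCHARGES)

# Fill only those rows with TENURE * MONTHLYCHARGES
your_data$TOTALCHARGES[missing] <-
  your_data$TENURE[missing] * your_data$MONTHLYCHARGES[missing]
```

Rows that already had a TOTALCHARGES value (here 10 and 95) are left as they were.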

Subsetting data frame based on values a particular column takes [duplicate]

This question already has answers here:
Subset multiple rows with condition
(3 answers)
Closed 8 years ago.
Here is a trivialized example whose solution would help me greatly.
v.1<- c(5,8,7,2)
v.2<- c("hi", "hello", "hum", "bo")
df<- data.frame(v.1, v.2)
desired.values<- c("hi", "bo")
I would like all rows of the dataset where v.2 takes on one of the desired.values.
Desired output:
5 "hi"
2 "bo"
In my real dataset, v.2 has more than 10000 values and desired.values contains more than 2000 values.
You could try data.table
library(data.table)
setkey(setDT(df),v.2)[desired.values]
Or using base R methods
df[df$v.2 %in% desired.values,]
Or
df[grep(paste(desired.values, collapse="|"), df$v.2),]
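The %in% and grep approaches are not interchangeable: %in% compares whole values, while an unanchored regular expression also matches substrings. The sketch below adds one extra row ("bob", invented for illustration) to the question's data to show the difference, and one possible way to anchor the pattern. It assumes the desired values contain no regex metacharacters.

```r
v.1 <- c(5, 8, 7, 2, 9)
v.2 <- c("hi", "hello", "hum", "bo", "bob")   # "bob" added for illustration
df <- data.frame(v.1, v.2)
desired.values <- c("hi", "bo")

# Exact matching: only the rows with "hi" and "bo"
exact <- df[df$v.2 %in% desired.values, ]

# The unanchored pattern "hi|bo" also matches "bob"
loose <- df[grep(paste(desired.values, collapse = "|"), df$v.2), ]

# Anchoring the alternation restores whole-value matching
pattern <- paste0("^(", paste(desired.values, collapse = "|"), ")$")
anchored <- df[grep(pattern, df$v.2), ]
```

For a plain lookup of many exact values, %in% is usually the simpler choice.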
