Split data in R with two specific values of column [duplicate] - r

This question already has answers here:
Using multiple criteria in subset function and logical operators
(2 answers)
Closed 1 year ago.
The following code takes all the rows in which LABEL == 0. How can I modify it to take the rows where LABEL is either 0 or 4?
dt<- read.csv("log1.csv")
dt
dt_inactivity <- dt[dt$LABEL == 0, ]
dt_inactivity

Using dplyr, this should do what you want:
library(dplyr)
dt_inactivity <- filter(dt, LABEL == 0 | LABEL == 4)
See full documentation here : https://dplyr.tidyverse.org/reference/filter.html
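For comparison, the same selection can be sketched in base R with %in%, which tests each LABEL against a set of allowed values. The data frame below is a made-up stand-in for log1.csv (the ID column and its values are assumptions for illustration only):

```r
# Hypothetical stand-in for log1.csv; only the LABEL column matters here
dt <- data.frame(ID = 1:6, LABEL = c(0, 1, 4, 2, 0, 4))

# Keep the rows whose LABEL is either 0 or 4
dt_inactivity <- dt[dt$LABEL %in% c(0, 4), ]
dt_inactivity
```

Using %in% rather than chaining == with | scales better if more values need to be matched later.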

Related

Select rows in dataframe based on a specific value in any column [duplicate]

This question already has answers here:
Finding rows containing a value (or values) in any column
(3 answers)
Closed 1 year ago.
I have a dataframe, that contains e.g. 5 rows and 3 columns:
I would like to select the rows that contain, for example, the text "yellow" (rows 1 and 4). How can I do this?
Use the following to select rows that contain "yellow" in any column:
library(tidyverse)
result <- mydata %>%
  filter_all(any_vars(. == "yellow"))
A base R option using subset + rowSums
subset(df,rowSums(df=="yellow")>0)
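To see the rowSums idea on concrete data, here is a small sketch. The original table was not preserved, so the data frame below is invented (column names a, b, c and the colour values are assumptions). It also passes na.rm = TRUE, since a missing value in a row would otherwise make that row's sum NA:

```r
# Hypothetical 5 x 3 data frame; "yellow" appears in rows 1 and 4
mydata <- data.frame(
  a = c("yellow", "red", "blue", "green", "red"),
  b = c("red", "blue", NA, "yellow", "green"),
  c = c("blue", "green", "red", "red", "blue"),
  stringsAsFactors = FALSE
)

# mydata == "yellow" gives a logical matrix; count the TRUEs per row
hits <- rowSums(mydata == "yellow", na.rm = TRUE) > 0

# Keep the rows with at least one match
result <- mydata[hits, ]
result
```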

How do I subset a dataframe's columns if the data is all the same? [duplicate]

This question already has answers here:
How to remove columns with same value in R
(4 answers)
Closed 2 years ago.
I have a really large dataset and I want to filter out some of the columns because they contain the same value throughout (e.g. company name is always "Walmart"). I could go through and do this manually, but I'm looking for code to do it automatically.
I had in mind a function to subset based on if sum(unique(colnam)) == 1 but not sure how to get it to work. Thanks.
which(sapply(dat, function(col) length(unique(col)) == 1))
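The line above only reports which columns are constant. A minimal sketch of actually dropping them, on an invented data frame (the column names and values are assumptions), could look like this:

```r
# Hypothetical data: company and region never vary
dat <- data.frame(
  company = rep("Walmart", 4),
  store   = c("A", "B", "C", "D"),
  region  = rep("US", 4),
  sales   = c(10, 20, 15, 30)
)

# TRUE for every column that holds a single distinct value
is_constant <- sapply(dat, function(col) length(unique(col)) == 1)

# Keep only the varying columns; drop = FALSE keeps a data frame
# even if just one column survives
dat_trimmed <- dat[, !is_constant, drop = FALSE]
names(dat_trimmed)
```

Note that unique() treats NA as a value of its own, so a column holding one value plus some NAs would not be counted as constant here.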

Subsetting based on multiple conditions in R [duplicate]

This question already has answers here:
Subset of rows containing NA (missing) values in a chosen column of a data frame
(7 answers)
Closed 2 years ago.
I would like to subset my data based on two conditions: if X is blank and if Y is blank.
Subsetting based on 1 condition is:
Blank_X <- Q4[is.na(Q4$X),]
How do I add a second condition to this?
Here is one way with subset
Blank_X <- subset(Q4,is.na(Q4$X) & is.na(Q4$Y))
with filter
Blank_X <- Q4 %>% filter(is.na(X) & is.na(Y))
You can use & (and) to combine multiple conditions.
Blank_X <- Q4[is.na(Q4$X) & is.na(Q4$Y),]
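One point worth spelling out: missing values have to be tested with is.na(), because comparing anything to NA with == or != returns NA rather than TRUE or FALSE. A short sketch on made-up data (the values in Q4 are assumptions):

```r
# Hypothetical data: only row 1 has both X and Y missing
Q4 <- data.frame(X = c(NA, 1, NA, 2), Y = c(NA, NA, 3, 4))

# A comparison against NA is itself NA, so it cannot be used as a filter
NA == NA   # NA

# is.na() returns TRUE/FALSE, and & requires both conditions to hold
Blank_XY <- Q4[is.na(Q4$X) & is.na(Q4$Y), ]
Blank_XY
```

Replacing & with | would instead keep the rows where at least one of the two columns is missing.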

Is there a way to set all cells in a dataframe that hold a vector to NA? [duplicate]

This question already has answers here:
R: Count number of objects in list [closed]
(5 answers)
Closed 2 years ago.
I have a dataframe in R, and I am trying to set every cell that holds a vector, such as c(1,2,3) or 1:2, to NA. Is there an easy way to do this?
You can use lengths to count the number of elements in each value of the column, then set the values to NA where the length is greater than 1. Here the dataframe is assumed to be named df and the column col_name; change them to match your data.
df$col_name[lengths(df$col_name) > 1] <- NA
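For a reproducible illustration, the sketch below builds a list column by hand (the id column and the cell contents are assumptions) and then applies the same lengths test:

```r
# Hypothetical data frame with a list column; cells 2 and 4 hold vectors
df <- data.frame(id = 1:4)
df$col_name <- list(1, c(1, 2, 3), "a", 1:2)

# lengths() gives the number of elements in each cell: 1, 3, 1, 2
lengths(df$col_name)

# Replace every cell with more than one element by NA
df$col_name[lengths(df$col_name) > 1] <- NA
df$col_name
```

The column stays a list after the replacement, so single-element cells keep their original types.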

Create a dataframe from a subset of rows in a larger dataframe [duplicate]

This question already has answers here:
Filter multiple values on a string column in dplyr
(6 answers)
Closed 3 years ago.
I have a large dataframe "Marks", containing marks each year from 2014/5-2017/8. I have separated the dataframe into 4 smaller ones, by year of completion using:
marks14 <- Marks %>%
  filter(YearOfCompletion == "2014/5")
marks15 <- Marks %>%
  filter(YearOfCompletion == "2015/6")
marks16 <- Marks %>%
  filter(YearOfCompletion == "2016/7")
marks17 <- Marks %>%
  filter(YearOfCompletion == "2017/8")
I am now trying to put the "2016/7" and "2017/8" marks into one dataframe. I have tried to manipulate the filter function, but I can't figure it out and I can't find the code for this in online cookbooks.
We can use %in% to keep the rows whose YearOfCompletion matches any element of a vector of one or more values:
library(dplyr)
Marks %>%
  filter(YearOfCompletion %in% c("2016/7", "2017/8"))
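As a rough base R sketch of the same idea, on invented data (the Mark column and its values are assumptions), %in% can also be used directly as a row index. The last step shows split(), which is one possible way to get all the per-year dataframes at once instead of writing one filter per year:

```r
# Hypothetical stand-in for the Marks dataframe
Marks <- data.frame(
  Mark = c(55, 62, 70, 48, 81),
  YearOfCompletion = c("2014/5", "2015/6", "2016/7", "2017/8", "2016/7")
)

# Rows from either of the two later years, in one dataframe
marks16_17 <- Marks[Marks$YearOfCompletion %in% c("2016/7", "2017/8"), ]
marks16_17

# A named list with one dataframe per year
by_year <- split(Marks, Marks$YearOfCompletion)
names(by_year)
```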
