How to subset data with multiple criteria from one column - r

I need to create a data subset from multiple "inclusion" criteria from a column (V5:Format) of my df.
I have tried :
new.data <- old.data[grep("text1", old.data$V5), ]
This works for one inclusion criterion. I want to add a second one: a row must contain both "text1" and "text2" in V5 to be included in the subset.
Thanks in advance.

You can use grepl() instead of grep() to get a boolean vector which tells you which strings contain the pattern. On these vectors, you can use logical conditions like &:
new.data <- old.data[grepl("text1", old.data$V5) & grepl("text2", old.data$V5), ]
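As a minimal sketch of the approach above, here is the same idea on a small made-up data frame. The contents of old.data and the extra id column are assumptions for illustration only; the original data was not posted.

```r
# Hypothetical data standing in for old.data; only V5 matters here.
old.data <- data.frame(
  id = 1:4,
  V5 = c("text1 only", "text1 and text2", "text2 only", "neither"),
  stringsAsFactors = FALSE
)

# grepl() returns one TRUE/FALSE per row.
has_text1 <- grepl("text1", old.data$V5)
has_text2 <- grepl("text2", old.data$V5)

# Keep only the rows whose V5 contains both patterns.
new.data <- old.data[has_text1 & has_text2, ]
print(new.data)
```

Note that grepl() treats the pattern as a regular expression by default; if the text contains characters such as . or (, passing fixed = TRUE matches it literally.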

Related

R subsetting if both conditions are met

I am trying to subset a data frame df.1 based on two conditions:
observations in Accession variable should contain ;
observations in kinase.or.not should be kinase
Below is the code I used. But it seems that the first condition grep(";", df.1$Accession) is ignored. Why is that? Thanks!
df.2 <- df.1[grep(";", df.1$Accession) & df.1$kinase.or.not == "Kinase",]
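We need grepl instead of grep. The difference is that grep returns the numeric positions of the matching elements, whereas grepl returns a logical vector with one value per element, which can be combined with & to build the compound condition. When the numeric positions from grep are passed to &, they are coerced to TRUE (every non-zero number is TRUE) and recycled, so the first condition appears to be ignored.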
df.1[grepl(";", df.1$Accession) & df.1$kinase.or.not == "Kinase",]
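The difference can be seen on a small example. This is only a sketch: the contents of df.1 below are made up, and the names idx, lgl, is_kinase and wrong are hypothetical helpers that do not appear in the question.

```r
# Hypothetical stand-in for df.1.
df.1 <- data.frame(
  Accession = c("P1;P2", "P3", "P4;P5", "P6"),
  kinase.or.not = c("Kinase", "Kinase", "Other", "Kinase"),
  stringsAsFactors = FALSE
)

idx <- grep(";", df.1$Accession)    # integer positions of the matches
lgl <- grepl(";", df.1$Accession)   # one TRUE/FALSE per row
is_kinase <- df.1$kinase.or.not == "Kinase"

# With grep, the positions are coerced to TRUE and recycled,
# so the ";" condition no longer filters anything.
wrong <- df.1[idx & is_kinase, ]

# With grepl, both conditions are checked row by row.
df.2 <- df.1[lgl & is_kinase, ]

print(wrong)
print(df.2)
```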

Sorting data in an imported table in R by multiple conditions

I have imported a data set into RStudio and want to count the rows that have certain values in multiple columns. The columns I want to filter on are "ROW", which should be less than or equal to 90, "house", which should equal 1, and "type", which should equal 1.
I know that I can use the sum command like this:
sum(data$type==1)
and that returns the number of rows with the value 1 in the "type" column. I have tried to combine these conditions like this:
with(data, sum((type==1),(ROW<=90),(house==1)))
to no avail.
Any suggestions on what I can do?
To combine logical expressions, use & when all of the conditions must be TRUE, or | when any one of them being TRUE is enough. Separating the conditions with commas inside sum() adds up the three individual counts instead.
with(data, sum((type==1)&(ROW<=90)&(house==1)))
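A small sketch of the three variants, using made-up values since the imported table was not shown. The names n_all, n_any and n_commas are hypothetical.

```r
# Hypothetical data with the three columns from the question.
data <- data.frame(
  ROW   = c(10, 95, 50, 90, 20),
  house = c(1, 1, 2, 1, 1),
  type  = c(1, 1, 1, 1, 2)
)

# Rows where all three conditions hold.
n_all <- with(data, sum(type == 1 & ROW <= 90 & house == 1))

# Rows where at least one condition holds.
n_any <- with(data, sum(type == 1 | ROW <= 90 | house == 1))

# Commas make sum() add the three separate counts together,
# which is not a count of rows meeting every condition.
n_commas <- with(data, sum(type == 1, ROW <= 90, house == 1))

print(c(all = n_all, any = n_any, commas = n_commas))
```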

Subsetting with an if....else statement in r

I am trying to subset a data frame so that if a column name is present I subset, but if not I ignore it. For the example I will use the mtcars data set. What I am trying to accomplish is: if there is a column "vs", subset the first 3 columns and vs. This would be a data frame named "vsdf".
df <- mtcars
if(colnames(df)=="vs") {
vsdf <- df[,1,2,3,"vs"]
} else {
NULL
}
Any help or guidance would be greatly appreciated.
There are two problems with your code:
1) using ==
You want to check whether "vs" is one of the column names, but == compares every column name to "vs" and returns one logical value per column, whereas if() expects a single TRUE or FALSE. The comparison would only do what you intend if there were exactly one column and it were called "vs". Instead you need to use %in%, as in
if ("vs" %in% colnames(df))
{...}
2) the subsetting syntax df[,1,2,3,"vs"]
subsetting a data.frame usually follows the syntax
df[i, j]
where i denotes rows and j denotes columns. Since you want to subset columns, you do this in j. What you did was supply many more arguments to [.data.frame than it takes, because you didn't put those values into a vector. The vector can be numeric / integer or character, but not a mix of both forms, as in your attempt. Instead you can build the vector like this:
df[, c(names(df)[1:3], "vs")]
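Putting both fixes together, a minimal sketch of the whole if/else might look like this. Assigning NULL to vsdf in the else branch is an assumption about what "ignore" should mean here.

```r
df <- mtcars

if ("vs" %in% colnames(df)) {
  # First three columns by name, plus "vs", in one character vector.
  vsdf <- df[, c(names(df)[1:3], "vs")]
} else {
  # Assumed behaviour when "vs" is absent: leave vsdf empty.
  vsdf <- NULL
}

print(head(vsdf))
```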

R: Assign values to a new column based on values of another column where a condition is satisfied

I want to create a new column in a data.frame where its value is equal to the value in another data.frame where a particular condition is satisfied between two columns in each data frame.
The R pseudo-code being something like this:
DF1$Activity <- DF2$Activity where DF2$NAME == DF1$NAME
In each data.frame values for $NAME are unique in the column.
Use the ifelse function. Here, I put NA when the condition is not met. However, you may choose any value or values from any vector.
Recycling rules apply.
DF1$Activity <- ifelse(DF2$NAME == DF1$NAME, DF2$Activity, NA)
I'm not sure this one actually needs an example. You can create the column filled with NA values and then assign only the required rows, using the same logical vector on both sides of the assignment:
DF1$Activity <- NA
DF1$Activity[DF2$NAME == DF1$NAME] <- DF2$Activity[DF2$NAME == DF1$NAME]
Without an example it's quite hard to tell, but from your description it sounds like a base::merge or dplyr::inner_join operation. Those are quite fast in comparison to if statements.
Cheers
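For what it's worth, here is a sketch of the lookup on made-up data. The element-wise comparison DF2$NAME == DF1$NAME only lines up when both data frames have their rows in the same order, so this example uses match(), which is a swapped-in technique not mentioned in the answers above, alongside the suggested merge(). The data values are assumptions.

```r
# Hypothetical data: NAME is unique in each data frame, but the
# rows are in a different order and not every name appears in both.
DF1 <- data.frame(NAME = c("a", "b", "c"), stringsAsFactors = FALSE)
DF2 <- data.frame(
  NAME = c("c", "a", "d"),
  Activity = c("run", "swim", "bike"),
  stringsAsFactors = FALSE
)

# match() gives, for each DF1$NAME, its row position in DF2 (NA if absent).
DF1$Activity <- DF2$Activity[match(DF1$NAME, DF2$NAME)]
print(DF1)

# The same lookup with merge(); all.x = TRUE keeps every row of DF1.
merged <- merge(DF1["NAME"], DF2, by = "NAME", all.x = TRUE)
print(merged)
```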

Exclude rows by identifying a sequence using subset()

I am trying to exclude a series of rows from a dataset by using the subset() command by identifying a sequence of numbers in the "Rec" column that I want to remove. My attempts to use : and > within subset have failed, for example:
dataset<-subset(dataset,Rec !1812:1843) #here I'd like to exclude all rows with values of 1812:1843 for Rec in the dataset
or
dataset<-subset(dataset,Rec !>1812) #here I'd like to exclude all rows with Rec>1812
Can someone show me how to use the <> and : operators in this way? Can it be done with subset()?
For inclusion/exclusion based on membership in a list in general, you can use the %in% operator:
dataset <- subset(dataset, !(Rec %in% 1812:1843))
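Both exclusions from the question can be sketched on a small made-up data set. The Rec values and the value column are assumptions; no_range and no_high are hypothetical names.

```r
# Hypothetical data with a Rec column.
dataset <- data.frame(
  Rec = c(1800, 1812, 1830, 1843, 1850),
  value = 1:5
)

# Exclude rows whose Rec falls in the sequence 1812:1843.
no_range <- subset(dataset, !(Rec %in% 1812:1843))

# Exclude rows with Rec > 1812; negating the comparison
# is the same as keeping Rec <= 1812.
no_high <- subset(dataset, !(Rec > 1812))

print(no_range)
print(no_high)
```

The ! operator negates a whole logical expression, so it goes in front of the parenthesised condition rather than in front of > or :.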
