Exclude rows by identifying a sequence using subset() - r

I am trying to exclude a series of rows from a dataset by using the subset() command by identifying a sequence of numbers in the "Rec" column that I want to remove. My attempts to use : and > within subset have failed, for example:
dataset<-subset(dataset,Rec !1812:1843) #here I'd like to exclude all rows with values of 1812:1843 for Rec in the dataset
or
dataset<-subset(dataset,Rec !>1812) #here I'd like to exclude all rows with Rec>1812
Can someone show me how to use the <> and : operators in this way? Can it be done with subset()?

For inclusion/exclusion based on membership in a list in general, you can use the %in% operator:
dataset <- subset(dataset, !(Rec %in% 1812:1843))
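As a minimal sketch of both exclusions from the question, assuming Rec holds whole numbers (the sample values below are invented for illustration): negate the condition with !( ... ), or write the complementary comparison directly.

```r
# Invented sample data; only the column name Rec comes from the question
dataset <- data.frame(Rec = c(1800, 1812, 1830, 1843, 1900), value = 1:5)

# Drop rows whose Rec is one of 1812, 1813, ..., 1843
no_range <- subset(dataset, !(Rec %in% 1812:1843))

# Drop rows with Rec > 1812: negate the test, or keep Rec <= 1812
no_high <- subset(dataset, !(Rec > 1812))
same    <- subset(dataset, Rec <= 1812)
```

Note that %in% only matches the exact values in 1812:1843. If Rec could contain non-integer values, a range test such as Rec < 1812 | Rec > 1843 is probably the safer way to exclude the interval.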

Related

Sorting data in an imported table in R by multiple conditions

I have imported a data set into R studio and want to count the rows that have certain values in multiple columns. The columns I want to sort by are titled "ROW" which I want less than or equal to 90, "house" which I want equal to 1 and "type" which I want equal to 1.
I know that I can use the sum command like this:
sum(data$type==1)
and that returns the number of rows with the value 1 in the "type" column. I have tried to combine these conditions like this:
with(data, sum((type==1),(ROW<=90),(house==1))
to no avail.
Any suggestions on what I can do?
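To combine logical expressions, use & (TRUE only where all of the conditions are TRUE) or | (TRUE where any of the conditions is TRUE). Passing the conditions to sum() as separate arguments adds up the TRUE values of each condition instead of counting the rows that satisfy all of them.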
with(data, sum((type==1)&(ROW<=90)&(house==1)))
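A small sketch of the difference, using invented values for the three columns named in the question:

```r
# Invented sample data with the question's column names
data <- data.frame(type  = c(1, 1, 2, 1),
                   ROW   = c(50, 95, 10, 90),
                   house = c(1, 1, 1, 1))

# Separate arguments: sum() adds the TRUEs of each condition (3 + 3 + 4)
wrong <- with(data, sum(type == 1, ROW <= 90, house == 1))

# Combined with &: counts the rows where all three conditions hold
n_match <- with(data, sum(type == 1 & ROW <= 90 & house == 1))
```

Here only the first and last rows satisfy all three conditions, so n_match is 2 while wrong is 10.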

Is there a function to subset data using a qualitative requirement in a column?

I am having trouble creating a subset for a large dataframe. I need to extract all rows that match one of two correct cities in one of the columns, however any subset that I create ends up empty. Given the main dataframe, I try:
New = data[data$Home.port %in% c("ARDGLASS","NEWLYN")]
However R returns "undefined columns selected"
A comma is missing:
New = data[data$Home.port %in% c("ARDGLASS","NEWLYN"), ]
That is because you are selecting rows, not columns; if you leave out the comma, R tries to subset columns instead of rows.
I recommend using data.table:
# install.packages("data.table")
library(data.table)
data <- as.data.table(data)
new_data <- data[Home.port %in% c("ARDGLASS","NEWLYN")]
data.table is very fast with large data sets.
The subset function will also do this task
new <- subset(data, subset = Home.port %in% c("ARDGLASS","NEWLYN"))
The base approach is functionally the same; it's just a matter of whether you use a declarative function for the task.
When using subset(), the first argument is the data frame you want to subset. When you refer to variables in the condition, you do not need to put "data$" in front. This saves time and makes the code easier to read.
datasubset <- subset(data, Home.port %in% c("ARDGLASS","NEWLYN"))
You can also combine multiple conditions: use "&" for AND or "|" for OR, depending on what you plan to do. A single value of Home.port cannot equal both cities at once, so here you need "|":
datasubset <- subset(data, Home.port == "ARDGLASS" | Home.port == "NEWLYN")
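To see that these base R forms select the same rows, here is a small sketch on invented data (only the column name Home.port and the two city names come from the question):

```r
# Invented sample data; the catch column is hypothetical
data <- data.frame(Home.port = c("ARDGLASS", "NEWLYN", "PLYMOUTH", "NEWLYN"),
                   catch     = c(10, 20, 30, 40),
                   stringsAsFactors = FALSE)

ports <- c("ARDGLASS", "NEWLYN")

# Bracket indexing: the trailing comma means "these rows, all columns"
by_bracket <- data[data$Home.port %in% ports, ]

# subset() with %in%
by_subset <- subset(data, Home.port %in% ports)

# subset() with two equality tests joined by |
by_or <- subset(data, Home.port == "ARDGLASS" | Home.port == "NEWLYN")

# Joining the same tests with & can never match, so no rows come back
by_and <- subset(data, Home.port == "ARDGLASS" & Home.port == "NEWLYN")
```

With more than two or three values, %in% tends to be easier to read and maintain than a chain of == tests.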

How to subset data with multiple criteria from one column

I need to create a data subset from multiple "inclusion" criteria from a column (V5:Format) of my df.
I have tried :
new.data <- old.data[grep("text1", old.data$V5), ]
This works for 1 inclusion criteria. I want to add a second inclusion criteria - data must include "text1" & "text2" for data subset
Thanks in advance.
You can use grepl() instead of grep() to get a boolean vector which tells you which strings contain the pattern. On these vectors, you can use logical conditions like &:
new.data <- old.data[grepl("text1", old.data$V5)&grepl("text2", old.data$V5), ]
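A minimal sketch of the same idea on invented data (the id column and the strings are made up; V5 is the column from the question):

```r
# Invented sample data
old.data <- data.frame(id = 1:4,
                       V5 = c("text1 only", "text1 and text2",
                              "text2 only", "neither"),
                       stringsAsFactors = FALSE)

# One logical vector per pattern, combined element-wise with &
has_both <- grepl("text1", old.data$V5) & grepl("text2", old.data$V5)

new.data <- old.data[has_both, ]
```

grepl() treats its pattern as a regular expression by default. If the real criteria contain characters such as . or (, passing fixed = TRUE matches them literally.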

Subsetting with an if....else statement in r

I am trying to subset a data frame so that if a column name is present I subset, but if not I ignore it. For the example I will use the mtcars data set. What I am trying to accomplish is: if there is a column "vs", subset the first 3 columns and vs. This would be a data frame named "vsdf".
df <- mtcars
if(colnames(df)=="vs") {
vsdf <- df[,1,2,3,"vs"]
} else {
NULL
}
Any help or guidance would be greatly appreciated.
There are two problems with your code:
1) using ==
You want to check whether "vs" is among the column names, but == compares element by element, so colnames(df)=="vs" returns one TRUE/FALSE per column rather than a single answer. if() needs a single value, so this only behaves as intended when there is exactly one column (recent versions of R raise an error for a longer condition; older versions used only the first element and gave a warning). Instead you need to use %in%, as in
if("vs" %in% colnames(df))
{...}
2) the subsetting syntax df[,1,2,3,"vs"]
subsetting a data.frame usually follows the syntax
df[i, j]
where i denotes rows and j denotes columns. Since you want to subset columns, you'll do this in j. What you did is supply many more arguments to [.data.frame than it takes, because you didn't put those values into a vector. The vector can be numeric / integer or a character vector, but not both forms mixed, as you did. Instead you can build the vector like this:
df[, c(names(df)[1:3], "vs")]
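Putting both fixes together, the question's snippet could look roughly like this. Assigning NULL in the else branch is an assumption about what "ignore" should mean; the original code only evaluated NULL without storing it.

```r
df <- mtcars

if ("vs" %in% colnames(df)) {
  # first three columns by name, plus "vs", as one character vector
  vsdf <- df[, c(names(df)[1:3], "vs")]
} else {
  # assumption: no "vs" column means there is nothing to build
  vsdf <- NULL
}

names(vsdf)
```

For mtcars this keeps the columns mpg, cyl, disp and vs.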

How to code this if else clause in R?

I have a function that outputs a list containing strings. Now, I want to check if this list contain strings which are all 0's or if there is at least one string which doesn't contain all 0's (can be more).
I have a large dataset. I am going to execute my function on each of the rows of the dataset. Now,
Basically,
for each row of the dataset
mylst <- func(row[i])
if (mylst(contains strings containing all 0's)
process the next row of the dataset
else
execute some other code
Now, I can code the if-else clause but I am not able to code the part where I have to check the list for all 0's. How can I do this in R?
Thanks!
You can use this for loop:
for (i in seq_len(nrow(dat))) {
  # run the extra code only when at least one string in row i is not all 0s
  if (!all(grepl("^0+$", dat[i, ]))) {
    # execute some other code
  }
}
where dat is the name of your data frame.
Here, the regex "^0+$" matches a string that consists of 0s only.
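A sketch that mirrors the question's pseudocode more literally, using next to skip rows whose strings are all 0s. The func() below is a hypothetical stand-in for the asker's function (it simply returns the row's values as a list of strings), and the data and the processed vector are invented for illustration.

```r
# Invented sample data
dat <- data.frame(V1 = c("00", "01", "00"),
                  V2 = c("000", "010", "020"),
                  stringsAsFactors = FALSE)

# Hypothetical stand-in for the asker's function: returns a list of strings
func <- function(row) as.list(as.character(unlist(row)))

processed <- integer(0)  # records which rows reached the "other code"
for (i in seq_len(nrow(dat))) {
  mylst <- func(dat[i, ])
  # every string consists of 0s only: go on to the next row
  if (all(grepl("^0+$", unlist(mylst)))) next
  # otherwise "execute some other code"; here we just record the row index
  processed <- c(processed, i)
}
```

In this toy data only the first row consists entirely of all-zero strings, so rows 2 and 3 are the ones that get processed.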
I'd like to suggest a solution that avoids an explicit for-loop.
For a given data set df, one can find a logical vector that indicates the rows with all zeroes:
all.zeros <- apply(df,1,function(s) all(grepl('^0+$',s))) # grepl() was taken from Sven's solution
With this logical vector, it is easy to subset df to remove all-zero rows:
df[!all.zeros,]
and use it for any subsequent transformations.
'Toy' dataset
df <- data.frame(V1=c('00','01','00'),V2=c('000','010','020'))
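Run on the toy data, the steps above can be sketched as follows. The name kept is introduced here only for illustration.

```r
df <- data.frame(V1 = c('00', '01', '00'),
                 V2 = c('000', '010', '020'))

# TRUE for rows in which every string consists of 0s only
all.zeros <- apply(df, 1, function(s) all(grepl('^0+$', s)))

# keep the rows that are not all zeros
kept <- df[!all.zeros, ]
```

Only the first row ('00', '000') is flagged; the third row is kept because '020' is not made up of 0s alone.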
UPDATE
If you'd like to apply the function to each row first and then analyze the resulting strings, you should slightly modify the all.zeros expression:
all.zeros <- apply(df,1,function(s) all(grepl('^0+$',func(s))))
