How to extract rows from dataframe by factors? [duplicate] - r

I have data similar to this:
dt <- structure(list(fct = structure(c(1L, 2L, 3L, 4L, 3L, 4L, 1L, 2L, 3L, 1L, 2L, 3L, 2L, 3L, 4L), .Label = c("a", "b", "c", "d"), class = "factor"), X = c(2L, 4L, 3L, 2L, 5L, 4L, 7L, 2L, 9L, 1L, 4L, 2L, 5L, 4L, 2L)), .Names = c("fct", "X"), class = "data.frame", row.names = c(NA, -15L))
I want to select rows from this data frame based on the values in the fct variable. For example, if I wish to select rows containing either "a" or "c" I can do this:
dt[dt$fct == 'a' | dt$fct == 'c', ]
which yields
1 a 2
3 c 3
5 c 5
7 a 7
9 c 9
10 a 1
12 c 2
14 c 4
as expected. But my actual data is more complex and I actually want to select rows based on the values in a vector such as
vc <- c('a', 'c')
So I tried
dt[dt$fct == vc, ]
but of course that doesn't work. I know I could code something to loop through the vector and pull out the rows needed and append them to a new dataframe, but I was hoping there was a more elegant way.
So how can I filter/subset my data based on the contents of the vector vc?

Have a look at ?"%in%".
dt[dt$fct %in% vc,]
fct X
1 a 2
3 c 3
5 c 5
7 a 7
9 c 9
10 a 1
12 c 2
14 c 4
You could also use ?is.element:
dt[is.element(dt$fct, vc),]
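To see why dt$fct == vc fails while %in% works, here is a minimal base R sketch. It rebuilds the question's data with data.frame() for brevity; the names recycled and member are only illustrative.

```r
# Same columns as the question's data, rebuilt with data.frame() for brevity.
dt <- data.frame(
  fct = factor(c("a", "b", "c", "d", "c", "d", "a", "b", "c", "a", "b", "c", "b", "c", "d")),
  X   = c(2L, 4L, 3L, 2L, 5L, 4L, 7L, 2L, 9L, 1L, 4L, 2L, 5L, 4L, 2L)
)
vc <- c("a", "c")

# `==` compares element by element and recycles the shorter vector, so row i
# is tested only against vc[1] (odd i) or vc[2] (even i).
# R also warns here because 15 is not a multiple of 2.
recycled <- suppressWarnings(dt$fct == vc)

# %in% asks, for every element of fct, whether it occurs anywhere in vc.
member <- dt$fct %in% vc

which(recycled)  # only a subset of the wanted rows
which(member)    # every row whose fct is "a" or "c"

dt[member, ]
```

Because %in% returns one logical value per row of dt, it can be used directly as a row index, whatever the length of vc.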

Similar to the above, using filter() from dplyr:
library(dplyr)
filter(dt, fct %in% vc)

Another option would be to use a keyed data.table:
library(data.table)
setDT(dt, key = 'fct')[J(vc)] # or: setDT(dt, key = 'fct')[.(vc)]
which results in:
fct X
1: a 2
2: a 7
3: a 1
4: c 3
5: c 5
6: c 9
7: c 2
8: c 4
What this does:
setDT(dt, key = 'fct') transforms the data.frame to a data.table (which is an enhanced form of a data.frame) with the fct column set as key.
Next you can just subset with the vc vector with [J(vc)].
NOTE: when the key is a factor/character variable, you can also use setDT(dt, key = 'fct')[vc], but that won't work when vc is a numeric vector: a numeric vector that is not wrapped in J() or .() is treated as a row index.
A more detailed explanation of the concept of keys and subsetting can be found in the vignette Keys and fast binary search based subset.
An alternative, as suggested by @Frank in the comments:
setDT(dt)[J(vc), on=.(fct)]
When vc contains values that are not present in dt, you'll need to add nomatch = 0:
setDT(dt, key = 'fct')[J(vc), nomatch = 0]
or:
setDT(dt)[J(vc), on=.(fct), nomatch = 0]
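The effect of nomatch can be sketched as follows, assuming the data.table package is installed. The vector vc2 and the copy DT are hypothetical names introduced here so the original dt is left untouched; "z" stands in for any value that does not occur in fct.

```r
library(data.table)

# Same columns as the question's data, rebuilt with data.frame() for brevity.
dt <- data.frame(
  fct = factor(c("a", "b", "c", "d", "c", "d", "a", "b", "c", "a", "b", "c", "b", "c", "d")),
  X   = c(2L, 4L, 3L, 2L, 5L, 4L, 7L, 2L, 9L, 1L, 4L, 2L, 5L, 4L, 2L)
)
DT <- as.data.table(dt)  # a copy, so dt itself is not modified by reference

vc2 <- c("a", "z")  # hypothetical lookup vector; "z" does not occur in fct

# Default behaviour: every value in vc2 yields at least one row,
# so the unmatched "z" appears as a row with X = NA.
with_na <- DT[.(vc2), on = .(fct)]

# nomatch = 0 drops lookup values that have no match in DT.
matched <- DT[.(vc2), on = .(fct), nomatch = 0]

nrow(with_na)
nrow(matched)
```

Unlike the %in% approach, the join returns rows grouped in the order of the lookup vector rather than in the original row order of the data.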
