Defining what species I want to include in a scatterplot [duplicate] - r

This question already has answers here:
Filter data.frame rows by a logical condition
(9 answers)
Closed 6 years ago.
I have a data frame df with an ID column (e.g. A, B, etc.). I also have a vector containing certain IDs:
L <- c("A", "B", "E")
How can I filter the data frame to get only the IDs present in the vector? Individually, I would use
subset(df, ID == "A")
but how do I filter on a whole vector?

You can use the %in% operator:
> df <- data.frame(id=c(LETTERS, LETTERS), x=1:52)
> L <- c("A","B","E")
> subset(df, id %in% L)
id x
1 A 1
2 B 2
5 E 5
27 A 27
28 B 28
31 E 31
If your IDs are unique, you can use match():
> df <- data.frame(id=c(LETTERS), x=1:26)
> df[match(L, df$id), ]
id x
1 A 1
2 B 2
5 E 5
or make them the rownames of your dataframe and extract by row:
> rownames(df) <- df$id
> df[L, ]
id x
A A 1
B B 2
E E 5
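One caveat with match() is worth a quick sketch: it returns only the first position of each ID, and NA for an ID that is not in the data frame, which then shows up as a row of NAs. The ID "notAnId" below is made up purely to illustrate that case.

```r
df <- data.frame(id = LETTERS, x = 1:26)
L2 <- c("A", "B", "notAnId")       # "notAnId" is a hypothetical ID absent from df

idx <- match(L2, df$id)            # positions of each ID in df$id, NA if absent
with_na <- df[idx, ]               # the unmatched ID yields a row filled with NA
clean <- df[idx[!is.na(idx)], ]    # drop unmatched positions before indexing
```

If unmatched IDs are possible, filtering with %in% avoids the NA row altogether.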
Finally, for more advanced users, and if speed is a concern, I'd recommend looking into the data.table package.

I reckon you need to use 'match'. It matches the values in one vector to the values in another vector, and gives NA where there's no match. So then you subset based on !is.na of the match.
See ?match and you can probably work it out for yourself, in which case you'll learn more than from the exact answer someone will do shortly which will just encourage you to cut n paste :)
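For reference, the match-based approach described in this answer could look roughly like the sketch below. The data frame is the same illustrative one used in the earlier answer, and the variable names pos, res and same are just placeholders.

```r
# Illustrative data, same shape as in the earlier answer
df <- data.frame(id = c(LETTERS, LETTERS), x = 1:52)
L <- c("A", "B", "E")

# match() gives the position of each id within L, or NA when the id is not in L
pos <- match(df$id, L)

# Keep only the rows whose id was found in L
res <- df[!is.na(pos), ]

# %in% is built on match(), so it selects the same rows
same <- identical(res, df[df$id %in% L, ])
```

Note the argument order: here the data frame column is matched against the vector, so every row with a listed ID is kept, including repeated IDs.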

Related

how to get all rows with max value of a variable [duplicate]

This question already has answers here:
Extracting indices for data frame rows that have MAX value for named field
(3 answers)
Closed 4 years ago.
I have a matrix with two columns and many rows. The first column is named idCombinaison and the second is named accuarcy. The accuarcy column holds float values.
Now I want to get all rows where accuarcy equals its maximum value. In some cases (as depicted in the picture), several rows share the maximum accuarcy, and I want to get all of them!
I tried this:
maxAccuracy <- subset(accuarcyMatrix, accuarcyMatrix['accuarcy'] == max(accuarcyMatrix['accuarcy']))
But this returns an empty result. Any ideas, please?
A reproducible data simulating your matrix:
set.seed(123)
x <- matrix(sample(1:9, 30, T), 10, 3)
row.names(x) <- 1:10
colnames(x) <- LETTERS[1:3]
# A B C
# 1 3 9 9
# 2 8 5 7
# 3 4 7 6
# ...
With a matrix, you need the two-index form data[a, b] to extract a row or column. Taking the data above as an example, x["C"] returns NA (a single index looks up element names, which the matrix does not have), whereas x[, "C"] returns all elements of column C. The following two calls therefore produce different output.
subset(x, x["C"] == max(x["C"]))
# A B C (Empty)
subset(x, x[, "C"] == max(x[, "C"]))
# A B C
# 1 3 9 9
# 4 8 6 9
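To make the difference between the two indexing forms easy to check without relying on a random seed, here is a small sketch with made-up values in which column C has a tied maximum. The names single, column and top are placeholders.

```r
# Small matrix with made-up values; matrix() fills column by column
x <- matrix(c(3, 8, 4, 8,    # column A
              9, 5, 7, 6,    # column B
              9, 7, 6, 9),   # column C
            nrow = 4,
            dimnames = list(as.character(1:4), c("A", "B", "C")))

single <- x["C"]      # NA: a single index looks for an element named "C"
column <- x[, "C"]    # the whole column C

# Logical row index; drop = FALSE keeps a matrix even if only one row matches
top <- x[x[, "C"] == max(x[, "C"]), , drop = FALSE]
```

Here top contains rows 1 and 4, the two rows that share the maximum of column C.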
Maybe something like this? (The dplyr verbs work on data frames, so the matrix is converted first.)
library(dplyr)
as.data.frame(accuarcyMatrix) %>%
  filter_at(vars(accuarcy),
            any_vars(. == max(.)))
Base R solution, using [, "accuarcy"] because $ does not work on a matrix (although this is very likely a duplicate):
accuarcyMatrix[which(accuarcyMatrix[, "accuarcy"] == max(accuarcyMatrix[, "accuarcy"])), , drop = FALSE]
I'm guessing you will want to change "accuarcy" to "accuracy".
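One reason to wrap the comparison in which() is missing values: a plain logical index turns each NA comparison into a row of NAs, while which() silently drops them. A minimal sketch, assuming the data is held in a data frame and using made-up accuracy values (acc, m and best are placeholder names):

```r
# Made-up values; one accuarcy entry is missing and the maximum is tied
acc <- data.frame(idCombinaison = 1:4,
                  accuarcy = c(0.80, 0.95, NA, 0.95))

m <- max(acc$accuarcy, na.rm = TRUE)       # ignore the NA when taking the max
best <- acc[which(acc$accuarcy == m), ]    # which() skips the NA comparison
```

Both rows that share the maximum (idCombinaison 2 and 4) are returned.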

Split column into vectors by group R - independent of column order

Edit
This question seems to be a duplicate of the question How to group a vector into a list of vectors?, and the answer split(df$b, df$id) was suggested. Although I was initially happy with that solution, I realized that the given answers do not fully address my question. In the question below, I would like to obtain a list in which the vector elements are named by the value of a third column (df$a in my example). This is important, as otherwise the order of df$b plays a role. Obviously I can arrange by df$a and then call split(), but maybe there is another way of doing that.
My sample df:
df <- tibble::tibble(id = paste0('id', rep(1:2, each = 5)), a = rep(letters[1:5], 2), b = c(1:5, 5:1))
df should be grouped by ID (df$id). For each group (id), I would like to create a vector that contains that group's values of df$b, and collect these vectors in a list. My approach:
library(dplyr)  # provides %>%
library(tidyr)
spread_df <- df %>% spread(id, b)  # makes a new column for each id
list_group_elements <- list()      # must exist before the loop assigns into it
# loop over the columns of spread_df
for (i in seq_along(spread_df)) {
  list_group_elements[i] <- list(spread_df[[i]])
  # I want each vector to be identified by the values of column df$a, therefore:
  names(list_group_elements[[i]]) <- list_group_elements[[1]]
}
This results in :
list_group_elements
[[1]]
a b c d e
"a" "b" "c" "d" "e"
[[2]]
a b c d e
1 2 3 4 5
[[3]]
a b c d e
5 4 3 2 1
I don't need the first element of the list, but the rest is basically what I need. I have the peculiar impression that my approach is somewhat not ideal and if someone has an idea to improve this, (e.g., with dplyr?) this would be highly appreciated. Why do I want this: I made a function that uses vectors as arguments and I would like to run this function over certain columns from dataframes - but only using the grouped values as arguments and not the entire column.
You may make df$b a named vector using setNames, and then split it into a list:
split(setNames(df$b, df$a), df$id)
# $id1
# a b c d e
# 1 2 3 4 5
#
# $id2
# a b c d e
# 5 4 3 2 1
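Because the names travel with the values, the result no longer depends on row order: each vector can be reordered by name after splitting. A small sketch, using a plain data.frame with the question's values and a hand-picked row shuffle (shuffled and groups are placeholder names):

```r
df <- data.frame(id = paste0("id", rep(1:2, each = 5)),
                 a  = rep(letters[1:5], 2),
                 b  = c(1:5, 5:1))

# Scramble the rows to mimic data that is not sorted by df$a
shuffled <- df[c(3, 10, 1, 7, 5, 2, 9, 4, 6, 8), ]

# Name the values by column a, split by id, then sort each vector by its names
groups <- split(setNames(shuffled$b, shuffled$a), shuffled$id)
groups <- lapply(groups, function(v) v[order(names(v))])
```

After sorting by name, groups matches the output shown above even though the input rows were scrambled.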
One way is
lapply(unique(df$id), function(L) df$b[df$id == L])  # unique() also works when id is character rather than factor
[[1]]
[1] 1 2 3 4 5
[[2]]
[1] 5 4 3 2 1
Consider by(), an object-oriented wrapper for tapply() designed to split a data frame by one or more factors:
by(df, df$id, FUN=function(i) i$b)
