dplyr::select one column and output as vector [duplicate]

This question already has answers here:
Extract a dplyr tbl column as a vector
(8 answers)
Closed 8 years ago.
dplyr::select returns a data.frame. Is there a way to make it return a vector when the result is a single column?
Currently, I have to do an extra step (res <- res$y) to convert the data.frame to a vector, as in this example:
library(dplyr)
#dummy data
df <- data.frame(x = 1:10, y = LETTERS[1:10], stringsAsFactors = FALSE)
#dplyr filter and select results in data.frame
res <- df %>% filter(x > 5) %>% select(y)
class(res)
#[1] "data.frame"
#desired result is a character vector
res <- res$y
class(res)
#[1] "character"
I tried something like this:
res <- df %>% filter(x > 5) %>% select(y) %>% as.character
res
# This gives strange output
#[1] "c(\"F\", \"G\", \"H\", \"I\", \"J\")"
# I need:
# [1] "F" "G" "H" "I" "J"

The best way to do it (IMO):
library(dplyr)
df <- tibble(x = 1:10, y = LETTERS[1:10]) # tibble() replaces the deprecated data_frame()
df %>%
  filter(x > 5) %>%
  .$y
Since dplyr 0.7.0, you can use pull():
df %>% filter(x > 5) %>% pull(y)
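pull() also accepts a column position instead of a name: a positive integer counts from the left, a negative one from the right, and the default (var = -1) extracts the last column. A small sketch with the same dummy data:

```r
library(dplyr)

df <- data.frame(x = 1:10, y = LETTERS[1:10], stringsAsFactors = FALSE)

# Extract by column name
by_name <- df %>% filter(x > 5) %>% pull(y)

# Extract by position: 2 is the second column from the left
by_pos <- df %>% filter(x > 5) %>% pull(2)

# No argument: pull() defaults to the last column (var = -1), which is y here
by_last <- df %>% filter(x > 5) %>% pull()

by_name
#> [1] "F" "G" "H" "I" "J"
```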

Something like this?
> res <- df %>% filter(x>5) %>% select(y) %>% sapply(as.character) %>% as.vector
> res
[1] "F" "G" "H" "I" "J"
> class(res)
[1] "character"

You could also try
res <- df %>%
  filter(x>5) %>%
  select(y) %>%
  as.matrix() %>%
  c()
#[1] "F" "G" "H" "I" "J"
class(res)
#[1] "character"
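Because a data frame is a list of columns, the one-column result can also be flattened with unlist(). A sketch of that variant; use.names = FALSE drops the names ("y1", "y2", ...) that unlist() would otherwise build from the column name:

```r
library(dplyr)

df <- data.frame(x = 1:10, y = LETTERS[1:10], stringsAsFactors = FALSE)

# unlist() flattens the single list element (column y) into a plain vector
res <- df %>%
  filter(x > 5) %>%
  select(y) %>%
  unlist(use.names = FALSE)
res
#> [1] "F" "G" "H" "I" "J"
```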

Related

R:purrr Finding the list elements that contain named variables

I have a character vector nms of variable names that all appear in at least one of several files. If a variable exists in more than one file, the values will be the same.
I have a named list test_lst where the top-level names are the names of the files. A sublist of the list includes a vector of the names of the variables in the file.
I would like to use purrr to go through test_lst, find the first file that contains each of the variables, and return a named list where the names are the file names and each element is a vector of the variables in nms that exist in that file. I would also like to index the sublist by name, not by position.
It seems like this should be easy, and I don't know why I cannot make it work.
Data:
test_lst <- list(ob1 = list(v1 = list(s1 = "X", s2 = paste0("A", 1:3)), v2 = paste0("A", 4:8)),
                 ob2 = list(v1 = list(s1 = "X", s2 = paste0("A", 9:11)), v2 = paste0("A", 12:16)))
nms <- c(paste0("A", 1:2), paste0("A", 9:10))
Non-working code:
find_vars <- function(var_names, meta){
  map_chr(meta, c("v1", "s2")) -> var_vecs
  names(var_vecs) <- names(meta)
  map_chr(var_vecs, var_names %in% .) -> out
  names(out) <- names(var_vecs)
  out
}
find_vars(var_names = nms, meta = test_lst)
Desired output, a list:
$ob1
[1] "A1" "A2"
$ob2
[1] "A9" "A10"
We can use modify_depth
library(tidyverse)
modify_depth(test_lst, 2, ~ enframe(.x) %>%
               select(value) %>%
               unnest %>%
               filter(value %in% nms)) %>%
  flatten %>%
  keep(~ nrow(.x) > 0) %>%
  map(~ .x %>%
        pull(value)) %>%
  set_names(names(test_lst))
#$ob1
#[1] "A1" "A2"
#$ob2
#[1] "A9" "A10"
Or we can enframe first and then loop through the 'value' column to subset the elements
enframe(test_lst) %>%
  unnest %>%
  mutate(value = map(value, ~ intersect(nms, unlist(.x)))) %>%
  unnest %>%
  deframe %>%
  split(names(.))
Or using the same notation we used with intersect earlier
map(test_lst, ~ intersect(nms, unlist(.x)))
Another option is melt() from reshape2:
library(reshape2)
melt(test_lst) %>%
  select(L1, value) %>%
  group_by(L1) %>%
  filter(value %in% nms) %>%
  {split(as.character(.$value), .$L1)}
We can unlist the values of each element of test_lst and keep those that are also in nms using intersect:
lapply(test_lst, function(x) intersect(unlist(x), nms))
#$ob1
#[1] "A1" "A2"
#$ob2
#[1] "A9" "A10"
If you want to use purrr, replace lapply with map:
purrr::map(test_lst, ~intersect(unlist(.), nms))
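The intersect() answers list every file in which a variable appears, so a variable present in several files would be listed under each of them. The question asks for the first file only. A minimal sketch of that extra step, reusing the per-file map() result and reshaping it with base R's stack() so that duplicated() can keep the first occurrence of each variable:

```r
library(purrr)

test_lst <- list(ob1 = list(v1 = list(s1 = "X", s2 = paste0("A", 1:3)), v2 = paste0("A", 4:8)),
                 ob2 = list(v1 = list(s1 = "X", s2 = paste0("A", 9:11)), v2 = paste0("A", 12:16)))
nms <- c(paste0("A", 1:2), paste0("A", 9:10))

# Variables from nms found anywhere in each file
per_file <- map(test_lst, ~ intersect(nms, unlist(.x)))

# Long format: one row per (variable, file); ind is a factor of file names in list order
long <- stack(per_file)

# Keep only the first file in which each variable appears
long <- long[!duplicated(long$values), ]

# Back to a named list of character vectors, dropping files left with no variables
out <- split(as.character(long$values), long$ind)
out <- keep(out, ~ length(.x) > 0)
out
```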

R select items from list in pipeline

I want to select items by index from a list before applying another function to them with purrr::map. I have tried the following, but can't find a way that works.
library(dplyr)
library(purrr)
dat <- list(1:3,
            4:6,
            letters[1:3])
# I can select one item
dat[1]
# I can select two items
dat[c(1,2)]
# But how can I do this in a pipeline by index?
dat %>% map(mean)
dat %>%
filter(c(1,2)) %>%
map(mean)
dat %>%
keep(1,2) %>%
map(mean)
dat %>%
select(1,2) %>%
map(mean)
We can use `[` and do
dat %>%
  .[c(1, 2)] %>%
  map(., mean)
#[[1]]
#[1] 2
#[[2]]
#[1] 5
Or define an alias in the way the magrittr package does it
extract <- `[` # literally the same as magrittr::extract
dat %>%
  extract(c(1, 2)) %>%
  map(., mean)
Which could also be written as
dat %>% `[`(c(1,2))
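Instead of defining the alias yourself, you can call the one magrittr already exports. A sketch using the same dat list; the magrittr:: prefix is used because tidyr also has a function called extract() that can mask it when both packages are attached:

```r
library(magrittr)
library(purrr)

dat <- list(1:3, 4:6, letters[1:3])

# magrittr::extract is an alias for `[`, so this step is dat[c(1, 2)]
res <- dat %>%
  magrittr::extract(c(1, 2)) %>%
  map(mean)
res
```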
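Using the base R pipe operator (R >= 4.1.0), wrap the subsetting in an anonymous function; in R >= 4.3.0 the placeholder can also start an extraction directly, as in dat |> _[c(1, 2)]
dat |>
  (\(x) x[c(1, 2)])() |>
  lapply(mean)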
#[[1]]
#[1] 2
#
#[[2]]
#[1] 5
An option is
library(tidyverse)
keep(dat, seq_along(dat) %in% 1:2) %>%
map(mean)
#[[1]]
#[1] 2
#[[2]]
#[1] 5
Or map with pluck
map(1:2, ~ pluck(dat, .x) %>%
mean)
Or with assign_in
assign_in(dat, 3, NULL) %>%
map(mean)
Or another option is map_if
map_if(dat, is.numeric, mean, .else = ~ NULL) %>%
discard(is.null)
Or with discard
discard(dat, is.character) %>%
map(mean)
Or with Filter and map
Filter(is.numeric, dat) %>%
map(mean)
NOTE: All of them give the expected output.

dplyr::select_ behavior with factor

I use dplyr::select to select columns of my dataset. I noticed some curious behavior of select_ with a factor and would like to know why it happens.
I have a 4x3 data frame and want to select columns "a" and "c":
library(dplyr)
x <- matrix(1:12, ncol = 3) %>%
  as.data.frame() %>%
  `colnames<-`(c("a", "b", "c"))
# works, output: "a" "c"
x %>% select_(.dots = c("a", "c")) %>% colnames()
# change the search term to a factor, output wrong columns: "a" "b"
x %>% select_(.dots = as.factor(c("a", "c"))) %>% colnames()
Could you kindly give a hint why this happens?
The problem is that a factor is stored internally as integer codes. as.factor(c("a", "c")) has the codes 1 and 2, so select_ ends up with the integers 1 and 2 and selects the first two columns. In general, select_ with the .dots argument is outdated (the underscore verbs are deprecated). We can use quosures, select_at(), select_if(), etc.:
x %>%
select_at(vars(a, c))
Or
x %>%
select(a, c)
Or
x %>%
select(!!! quos(a, c))
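For dplyr >= 1.0.0, a minimal sketch, assuming the column names arrive as a factor: all_of() (from tidyselect, re-exported by dplyr) selects columns named in a character vector, so converting the factor with as.character() first makes the selection use the labels rather than the integer codes:

```r
library(dplyr)

x <- as.data.frame(matrix(1:12, ncol = 3))
colnames(x) <- c("a", "b", "c")

cols <- as.factor(c("a", "c"))

# The factor's underlying integer codes, which is what select_ ended up using
as.integer(cols)
#> [1] 1 2

# Convert the factor to its labels, then select by name
res <- x %>% select(all_of(as.character(cols)))
colnames(res)
#> [1] "a" "c"
```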

R: Check if all values of one column match uniquely all values of another column

I have a data set with a lot of values. Most values of x map to a single value of y, but some values of y are shared by several values of x. Is there an easy way to find which values of y map to multiple xs?
mydata <- data.frame(x = c(letters,letters), y=c(LETTERS,LETTERS))
mydata$y[c(3,5)] <- "A"
mydata$y[c(10,15)] <- "Z"
mydata %>% foo
[1] "A" "Z"
I apologize if I am missing some obvious command here.
Using dplyr, you can do:
library(dplyr)
mydata <- data.frame(x = letters, y=LETTERS, stringsAsFactors = FALSE)
mydata$y[c(3,5)] <- "A"
mydata$y[c(10,15)] <- "Z"
mydata %>% group_by(y) %>% filter(n() > 1)
If you want to extract just the y values, store the result in a data frame and take its unique y values:
df <- mydata %>% group_by(y) %>% filter(n() > 1)
unique(df$y)
Alternatively, the following returns the same values as a single-column data frame instead of a vector:
mydata %>% group_by(y) %>% filter(n() > 1) %>% select(y) %>% distinct()
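Note that n() counts rows, not distinct x values. With the question's original data, where every x appears twice, filter(n() > 1) would also keep values such as "B", which appears twice but always paired with x = "b". A sketch that counts distinct x per y with n_distinct() instead, using the question's data:

```r
library(dplyr)

# The question's data: every x appears twice
mydata <- data.frame(x = c(letters, letters), y = c(LETTERS, LETTERS),
                     stringsAsFactors = FALSE)
mydata$y[c(3, 5)] <- "A"
mydata$y[c(10, 15)] <- "Z"

# Count distinct x per y (not rows) and keep ys shared by more than one x
multi_y <- mydata %>%
  group_by(y) %>%
  summarise(n_x = n_distinct(x)) %>%
  filter(n_x > 1) %>%
  pull(y)
multi_y
#> [1] "A" "Z"
```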
Using data.table:
library(data.table)
setDT(mydata)
mydata[, list(n = length(unique(x))), by = y][n > 1, ]
# y n
# 1: A 3
# 2: Z 3
If you also need the corresponding unique values of x:
library(data.table)
setDT(mydata)[, if (length(unique(x)) > 1) toString(unique(x)), by = y]
# y V1
#1: A a, c, e
#2: Z j, o, z

use dplyr to get values of a column [duplicate]

This question already has answers here:
Extract a dplyr tbl column as a vector
(8 answers)
Closed 7 years ago.
I'd like to have dplyr return a character vector instead of a data frame. Is there an easy way to do this?
#example data frame
df <- data.frame(x = c('a','b','c','d','e','f','g','h'),
                 y = c('a','a','b','b','c','c','d','d'),
                 z = c('a','a','a','a','a','a','d','d'),
                 stringsAsFactors = FALSE)
#desired output
unique(df$z)
[1] "a" "d"
#dplyr's output
df %>%
  select(z) %>%
  unique()
z
1 a
7 d
Try
library(dplyr)
df %>%
  select(z) %>%
  unique() %>%
  .$z
#[1] "a" "d"
Or using magrittr
library(magrittr)
df %>%
  select(z) %>%
  unique() %>%
  use_series(z)
#[1] "a" "d"
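With dplyr 0.7.0 or later, pull() returns the column as a vector directly, so the extra extraction step is not needed. A sketch of two equivalent orderings, using the question's data:

```r
library(dplyr)

df <- data.frame(x = c('a','b','c','d','e','f','g','h'),
                 y = c('a','a','b','b','c','c','d','d'),
                 z = c('a','a','a','a','a','a','d','d'),
                 stringsAsFactors = FALSE)

# Pull the column out as a character vector, then drop duplicates
res <- df %>% pull(z) %>% unique()
res
#> [1] "a" "d"

# Equivalent: keep the distinct values in the data frame, then pull the column
res2 <- df %>% distinct(z) %>% pull(z)
```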
