I know that the %>% operator passes the LHS to the first argument of the RHS (so that xxx %>% fun() is equivalent to fun(xxx)), which allows us to "chain" functions together. Is there a way to generalize this operation so that I can pass the LHS to the nth argument of the RHS? I am using the R programming language.
You use the . to pass the LHS into the desired named argument on the right. If you want to replace 'hey' with 'ho' in 'hey ho' using gsub(pattern, replacement, x), then you could do any of the following. Note that %>% does not pass the LHS into the first argument of the function, but into the first argument not already matched by name (see the third example below).
'hey ho' %>% gsub('hey','ho',.)
'hey ho' %>% gsub('hey','ho',x=.)
'hey ho' %>% gsub(pattern='hey',replacement='ho')
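A related detail worth noting (a minimal sketch, using seq() purely for illustration): when . appears as a direct argument of the right-hand call, magrittr does not also insert the LHS as the first argument, so you can route the value to whichever parameter you like.
library(magrittr)
# pass the LHS to the `by` argument of seq(); equivalent to seq(1, 10, by = 5)
5 %>% seq(1, 10, by = .)
#> [1] 1 6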
Let's say I need to do this:
foo <- list(`a+b` = 5)
but I have 'a+b' (a string) saved in a variable, let's say name:
name <- 'a+b'
How do I create that list with an element whose name is the value stored in the variable name?
Note: I am aware of other ways of assigning names to elements of lists. The list here is just an example. What I want to understand is how to deal with non-standard evaluation so that I can supply the argument name to a function without having to type it directly inline.
I have read Hadley's Advanced R Chapter 13 on non-standard evaluation but I am still lost on how to do this.
Any solution with base R or Tidy Evaluation is appreciated.
We can use setNames
bar <- setNames(list(5), name)
identical(foo, bar)
#[1] TRUE
Or create the object first and then use names
bar2 <- list(5)
names(bar2) <- name
or with names<-
bar3 <- `names<-`(list(5), name)
Also, the tidyverse option would be to unquote (!!) and assign (:=)
library(tidyverse)
lst(!! name := 5)
#$`a+b`
#[1] 5
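As a hedged sketch of how this generalizes inside a function (the helper name make_named_list is purely illustrative):
# build a one-element list whose name comes from a variable
make_named_list <- function(name, value) {
  setNames(list(value), name)
  # tidy-eval equivalent: tibble::lst(!!name := value)
}
make_named_list('a+b', 5)
#> $`a+b`
#> [1] 5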
I wanted to see the difference between the pipe operator (%>%) and the compound assignment pipe operator (%<>%) in R, so I used this example. This example may not be a good choice.
library("magrittr", lib.loc="~/R/R-3.5.1/library")
x <- rnorm(10)
x
bb = x %<>% abs %>% sort
bb1 = x %>% abs %>% sort
But it seems like I am getting the same answer in both cases.
Are there any unique uses of the compound assignment operator?
Thank you
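For reference, the magrittr documentation describes x %<>% foo %>% bar as equivalent to x <- x %>% foo %>% bar, i.e. the result is assigned back to x. A minimal sketch with a fixed vector (instead of rnorm(), so the change is visible):
library(magrittr)

x <- c(-2, 1, -0.5)
bb <- x %<>% abs %>% sort    # bb gets sort(abs(x)); x is also overwritten
x                            # x has been updated, no longer c(-2, 1, -0.5)

y <- c(-2, 1, -0.5)
bb1 <- y %>% abs %>% sort    # bb1 gets sort(abs(y)); y is unchanged
y
#> [1] -2.0  1.0 -0.5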
I am trying to do a filter in dplyr where a column matches any of several patterns. I can do this with sqldf as
Test <- sqldf("select * from database
Where SOURCE LIKE '%ALPHA%'
OR SOURCE LIKE '%BETA%'
OR SOURCE LIKE '%GAMMA%'")
I tried to use the following which doesn't return any results:
database %>% dplyr::filter(SOURCE %like% c('%ALPHA%', '%BETA%', '%GAMMA%'))
Thanks
You can use grepl with ALPHA|BETA|GAMMA, which will match if any of the three patterns is contained in the SOURCE column.
database %>% filter(grepl('ALPHA|BETA|GAMMA', SOURCE))
If you want it to be case insensitive, add ignore.case = TRUE in grepl.
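For instance (a sketch reusing the database and SOURCE names from the question):
# case-insensitive version: also matches alpha, Beta, gamma, ...
database %>% filter(grepl('ALPHA|BETA|GAMMA', SOURCE, ignore.case = TRUE))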
%like% is from the data.table package. You're probably also seeing this warning message:
Warning message:
In grepl(pattern, vector) :
argument 'pattern' has length > 1 and only the first element will be used
The %like% operator is just a wrapper around the grepl function, which does string matching using regular expressions. So the % wildcards aren't necessary; in fact, they are treated as literal percent signs.
You can only supply one pattern to match at a time, so either combine them using the regex 'ALPHA|BETA|GAMMA' (as Psidom suggests) or break the tests into three statements:
database %>%
dplyr::filter(
SOURCE %like% 'ALPHA' |
SOURCE %like% 'BETA' |
SOURCE %like% 'GAMMA'
)
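If the patterns already live in a character vector, another option (a sketch; the vector name patterns is just illustrative) is to collapse them into a single regular expression first:
patterns <- c('ALPHA', 'BETA', 'GAMMA')   # illustrative vector of patterns
database %>%
  dplyr::filter(SOURCE %like% paste(patterns, collapse = '|'))   # %like% from data.table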
Building on Psidom's and Nathan Werth's responses, for a tidyverse-friendly and concise method, we can do:
library(data.table); library(tidyverse)
database %>%
  dplyr::filter(SOURCE %ilike% "ALPHA|BETA|GAMMA") # %ilike% = case-insensitive pattern match
Using R 3.2.2 and dplyr 0.7.2 I'm trying to figure out how to effectively use group_by with fields supplied as character vectors.
Selecting is easy: I can select a field via a string like this
(function(field) {
mpg %>% dplyr::select(field)
})("cyl")
Multiple fields via multiple strings like this
(function(...) {
mpg %>% dplyr::select(!!!quos(...))
})("cyl", "hwy")
and multiple fields via one character vector with length > 1 like this
(function(fields) {
mpg %>% dplyr::select(fields)
})(c("cyl", "hwy"))
With group_by I cannot really find a way to do this for more than one string, because when I do manage to get an output it ends up grouping by the literal string I supply.
I managed to group by one string like this
(function(field) {
mpg %>% group_by(!!field := .data[[field]]) %>% tally()
})("cyl")
Which is already quite ugly.
Does anyone know what I have to write so I can run
(function(field) {...})("cyl", "hwy")
and
(function(field) {...})(c("cyl", "hwy"))
respectively? I tried all sorts of combinations of !!, !!!, UQ, enquo, quos, unlist, etc., and saving them in intermediate variables (because that sometimes seems to make a difference), but I cannot get it to work.
select() is very special in dplyr. It doesn't accept columns, but column names or positions. So that's about the only main verb that accepts strings. (Technically when you supply a bare name like cyl to select, it actually gets evaluated as its own name, not as the vector inside the data frame.)
If you want your function to take simple strings, as opposed to bare expressions or symbols, you don't need quosures. Just create symbols from the strings and unquote them:
myselect <- function(...) {
syms <- syms(list(...))
select(mtcars, !!! syms)
}
mygroup <- function(...) {
syms <- syms(list(...))
group_by(mtcars, !!! syms)
}
myselect("cyl", "disp")
mygroup("cyl", "disp")
To debug the unquoting, wrap with expr() and check that the expression looks right:
syms <- syms(list("cyl", "disp"))
expr(group_by(mtcars, !!! syms))
#> group_by(mtcars, cyl, disp) # yup, looks right!
See this talk for more on this (we'll update the programming vignette to make the concepts clearer): https://schd.ws/hosted_files/user2017/43/tidyeval-user.pdf.
Finally, note that many verbs have a _at suffix variant that accepts strings and character vectors without fuss:
group_by_at(mtcars, c("cyl", "disp"))
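So, for the second form of the question (a single character vector of field names), a minimal sketch using the _at variant (the wrapper name mygroup2 is just illustrative):
library(dplyr)

mygroup2 <- function(fields) {
  group_by_at(mtcars, fields) %>% tally()   # count rows per group
}
mygroup2(c("cyl", "disp"))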
I am quite new to R.
Using the table called SE_CSVLinelist_clean, I want to extract the rows where the variable called where_case_travelled_1 DOES NOT contain the strings "Outside Canada" OR "Outside province/territory of residence but within Canada", and then create a new table called SE_CSVLinelist_filtered.
SE_CSVLinelist_filtered <- filter(SE_CSVLinelist_clean,
where_case_travelled_1 %in% -c('Outside Canada','Outside province/territory of residence but within Canada'))
The code above works when I just use "c" and not "-c".
So, how do I specify the above when I really want to exclude the rows that contain those outside-of-country or outside-of-province values?
Note that %in% returns a logical vector of TRUE and FALSE. To negate it, you can use ! in front of the logical statement:
SE_CSVLinelist_filtered <- filter(SE_CSVLinelist_clean,
!where_case_travelled_1 %in%
c('Outside Canada','Outside province/territory of residence but within Canada'))
Regarding your original approach with -c(...), - is a unary operator that "performs arithmetic on numeric or complex vectors (or objects which can be coerced to them)" (from help("-")). Since you are dealing with a character vector that cannot be coerced to numeric or complex, you cannot use -.
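You can see this directly by trying unary minus on a character vector:
# unary minus is not defined for character vectors
-c('Outside Canada')
#> Error in -c("Outside Canada") : invalid argument to unary operator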
Try wrapping the condition in parentheses, as shown below. The parenthesized expression returns the logical result of the %in% test; you then compare that result to FALSE to keep only the rows that do not match any of the options in the vector.
SE_CSVLinelist_filtered <- filter(SE_CSVLinelist_clean,
(where_case_travelled_1 %in% c('Outside Canada','Outside province/territory of residence but within Canada')) == FALSE)
Just be careful with the previous solutions, since they require you to type out EXACTLY the string you are trying to detect.
Ask yourself if the word "Outside", for example, is sufficient. If so, then:
data_filtered <- data %>%
  filter(!str_detect(where_case_travelled_1, "Outside"))
A reprex version:
library(dplyr)
library(stringr)

iris

iris %>%
  filter(!str_detect(Species, "versicolor"))
Quick fix. First define the opposite of %in%:
`%ni%` <- Negate(`%in%`)
Then apply:
SE_CSVLinelist_filtered <- filter(
SE_CSVLinelist_clean,
where_case_travelled_1 %ni% c('Outside Canada',
'Outside province/territory of residence but within Canada'))
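A quick sanity check of the new operator on a toy vector:
c("a", "b", "c") %ni% c("b", "c")
#> [1]  TRUE FALSE FALSE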