dplyr date as.numeric strange behavior [duplicate] - r

This question already has answers here:
magrittr and date objects
(3 answers)
Closed 6 years ago.
I just noticed a strange and interesting bug:
as.numeric((Sys.Date()-30)-Sys.Date())
#[1] -30
Which is correct. But:
library(dplyr)
(Sys.Date()-30)-Sys.Date() %>% as.numeric()
#[1] "1969-12-02"
If the %>% simply feeds the output into the first argument slot, surely this behavior isn't correct?

I've modified your code to make it reproducible for the future:
date <- as.Date("2016-10-18")
as.numeric((date-30)-date)
#[1] -30
(date-30)-date %>% as.numeric()
#[1] "1969-12-02"
You may also have noticed that adding parentheses can change these results:
(date-30)-(date %>% as.numeric())
#[1] "1969-12-02"
((date-30)-date) %>% as.numeric()
#[1] -30
The answer lies in the order of operations, as specified on the Syntax help page (?Syntax). It states that:
The following unary and binary operators are defined. They are listed
in precedence groups, from highest to lowest.
:: ::: access variables in a namespace
$ @ component / slot extraction
[ [[ indexing
^ exponentiation (right to left)
- + unary minus and plus
: sequence operator
%any% special operators (including %% and %/%)
* / multiply, divide
+ - (binary) add, subtract
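Note here that %any% (which includes %>%) comes before binary + and -. So in (date-30)-date %>% as.numeric(), only date is piped into as.numeric(), and the resulting number is then subtracted from the Date date-30, which yields another Date rather than a number. For the difference between unary and binary operators, I recommend the answer to this question.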
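To see the grouping without loading any package, here is a minimal base R sketch. It assumes a hypothetical operator `%then%` (not part of magrittr or dplyr) as a stand-in for `%>%`; every `%name%` operator sits in the same precedence group, so the grouping should be the same.

```r
# Hypothetical stand-in for %>%: applies function f to the left-hand side.
`%then%` <- function(lhs, f) f(lhs)

date <- as.Date("2016-10-18")

# %then% binds tighter than binary -, so only `date` reaches as.numeric.
implicit <- (date - 30) - date %then% as.numeric
explicit <- (date - 30) - (date %then% as.numeric)

identical(implicit, explicit)  # TRUE: both are Date minus number
format(implicit)               # "1969-12-02"

# Parenthesizing the subtraction pipes the whole difference instead.
fixed <- ((date - 30) - date) %then% as.numeric
fixed                          # -30
```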

Related

How to filter R dataset by multiple partial match strings, similar to SQL % wildcard? [duplicate]

This question already has answers here:
What's the R equivalent of SQL's LIKE 'description%' statement?
(4 answers)
Closed 11 days ago.
I have a dataset with a field of interest and a list of strings (several hundred of them).
What I want to do is, for each line of the data, check whether the field contains any of the partial strings.
Essentially, I want to replicate the SQL % wildcard. So, if for example a value is "Game123" and one of my strings is "Ga" I want that to be a match. (But I don't want "OGame" to match "Ga").
I'm hoping to write some statement like this:
df%>%
filter(My_Field contains any one of List_Of_Strings)
How do I fill in that filter statement?
I tried to use the %in% operator but couldn't make it work. I know how to use substrings to check against a single string, but I have a long list of them and need to check all of them.
R filter rows based on multiple partial strings applied to multiple columns: This post is similar to what I'm trying to do, but my list of substrings is 400 plus, so I can't write it all out manually in a grepl statement (I think?)
Since there is no particular dataset or reproducible example, here is one way to implement it with two apply-style calls and a regex anchor. Remember that the regex operator ^ anchors a pattern to the start of the string, so "^Ga" matches "Game123" but not "OGame".
library(dplyr)
MyField <- c("OGame","Game123","Duck","Dugame","Aldubame")
df <- data.frame(MyField)
ListOfStrings <- c("^Ga","^Du") # notice the ^ anchor in each pattern
# TRUE if `entry` matches at least one of `patterns`
match_s <- function(patterns, entry){
  any(vapply(patterns, grepl, logical(1), x = entry))
}
# vapply returns a plain logical vector rather than a list column
df$match_string <- vapply(df$MyField, match_s, logical(1),
                          patterns = ListOfStrings, USE.NAMES = FALSE)
df %>% filter(match_string)
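For a list of several hundred prefixes, the same idea can be sketched in base R without a per-row function. This is only an illustration: the names `fields` and `prefixes` are hypothetical, and the regex variant assumes the prefixes contain no regex metacharacters (such as `.` or `(`).

```r
fields   <- c("OGame", "Game123", "Duck", "Dugame", "Aldubame")
prefixes <- c("Ga", "Du")  # hypothetical stand-in for the long list

# Variant 1: one anchored alternation, here "^(Ga|Du)".
pattern  <- paste0("^(", paste(prefixes, collapse = "|"), ")")
by_regex <- grepl(pattern, fields)

# Variant 2: literal prefix test, no regex interpretation at all.
# Each startsWith() call gives one logical vector; Reduce ORs them together.
by_prefix <- Reduce(`|`, lapply(prefixes, function(p) startsWith(fields, p)))

fields[by_regex]   # "Game123" "Duck" "Dugame"
```

Either logical vector can be passed straight to `filter()` or used for base subsetting.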
Another option uses dplyr with grepl, prefixing each pattern with \\b so that it matches only at the start of a word. The example uses the words and sentences vectors that ship with stringr.
library(stringr)
library(dplyr)
set.seed(22)
tibble(sentences) %>%
  rowwise() %>%
  filter(any(sapply(words[sample(length(words), 10)], function(x)
    grepl(paste0("\\b", x), sentences)))) %>%
  ungroup()
# A tibble: 32 × 1
sentences
<chr>
1 It's easy to tell the depth of a well.
2 Kick the ball straight and follow through.
3 A king ruled the state in the early days.
4 March the soldiers past the next hill.
5 The dune rose from the edge of the water.
6 The grass curled around the fence post.
7 Cats and Dogs each hate the other.
8 The harder he tried the less he got done.
9 He knew the skill of the great young actress.
10 The club rented the rink for the fifth night.
# … with 22 more rows
I guess the problem you're facing is this:
You have a list of what could be called key words (what you call "a list of strings") and a vector/column with text (what you call "a field of interest") and your goal is to filter the vector/column on whether or not any of the key words is present. If that's correct the solution might be this:
Data:
a. List of key words:
keys <- c("how", "why", "what")
b. Dataframe with a vector/column of text:
df <- data.frame(
text = c("Hi there", "How are you?", "I'm fine.", "So how's work?", "Ah kinda stressful.", "Why?", "Well you know")
)
Solution:
To filter df on keys in text, convert keys into a regex alternation pattern by collapsing the strings with |. Depending on your keys, it may be useful or even necessary to add word boundary markers (\\b) so that each key matches only as a whole word and not inside other words. Finally, if upper- and lower-case should be treated alike, use the case-insensitive flag (?i):
library(dplyr)
library(stringr)
df %>%
  filter(str_detect(text, str_c("(?i)\\b(", str_c(keys, collapse = "|"), ")\\b")))
text
1 How are you?
2 So how's work?
3 Why?

How come as.character(1) == as.numeric(1) is TRUE? [duplicate]

This question already has answers here:
Why does "one" < 2 equal FALSE in R?
(2 answers)
Why is the expression "1"==1 evaluating to TRUE? [duplicate]
(1 answer)
Closed 3 years ago.
Just like the title says, why is "1" == 1 TRUE? What is the real reason behind this? Is R trying to be kind, or is this something else? I was thinking that since "1" (or any number, it really doesn't matter) is read by R as a character, the comparison would automatically return FALSE when compared with as.numeric(1) or as.integer(1).
> as.character(1) == as.numeric(1)
[1] TRUE
or
> "1" == 1
[1] TRUE
I guess it is a simple question but I'd like to get an answer. Thank you.
According to ?==
For numerical and complex values, remember == and != do not allow for the finite representation of fractions, nor for rounding error. Using all.equal with identical is almost always preferable.
In another paragraph, it is also written
x, y
atomic vectors, symbols, calls, or other objects for which methods have been written. If the two arguments are atomic vectors of different types, one is coerced to the type of the other, the (decreasing) order of precedence being character, complex, numeric, integer, logical and raw.
identical(as.character(1), as.numeric(1))
#[1] FALSE
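The coercion rule quoted above can be checked directly. The following is a small sketch with made-up variable names: because character outranks numeric, the number is converted to its string form and the two strings are then compared.

```r
# The numeric 1 is coerced to "1", so this is really "1" == "1".
eq_coerced <- "1" == 1
eq_coerced                 # TRUE

# The comparison is textual, not numeric: 1 becomes "1", never "1.0".
eq_decimal <- "1.0" == 1
eq_decimal                 # FALSE

# identical() performs no coercion, so the differing types matter.
same_object <- identical("1", 1)
same_object                # FALSE
```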

Referencing values in a vector [duplicate]

This question already has an answer here:
Order of operator precedence when using ":" (the colon)
(1 answer)
Closed 4 years ago.
I have a vector and would like to extract elements 3 and 4. Can you please help me understand the logic behind the version of the code without parentheses? I appreciate your help.
a <- c(1:5)
a[(2+1):4]  # with parentheses, makes sense
# [1] 3 4
a[2+1:4]    # without parentheses, what is the logic here?
# [1] 3 4 5 NA
The : operator is evaluated before the + operator.
Consider
print(c(2+1:4))
This returns
[1] 3 4 5 6
Because the vector 1, 2, 3, 4 is created first, and then 2 is added to every element.
R Operator Syntax and Precedence gives an overview of the priority of R's operators. The sequence operator : comes before the arithmetic operators like + or -.
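Applied to the vector from the question, the two groupings can be compared side by side. This is a short sketch; the index variable names are made up for illustration.

```r
a <- 1:5

idx_no_parens <- 2 + 1:4    # grouped as 2 + (1:4), i.e. 3 4 5 6
idx_parens    <- (2 + 1):4  # grouped as 3:4

a[idx_no_parens]  # 3 4 5 NA  (index 6 is past the end of a)
a[idx_parens]     # 3 4
```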

Combining the common elements in two lists in R, using only logical and arithmetic operators

I'm currently attempting to work out the GCD of two numbers (x and y) in R. I'm not allowed to use loops or if, else, ifelse statements, so I'm restricted to logical and arithmetic operators. So far, using the code below, I've managed to make lists of the factors of x and y.
xfac<-1:x
xfac[x%%xfac==0]
This gives me two lists of factors but i'm not sure where to go from here. Is there a way I can combine the common elements in the two lists and then return the greatest value?
Thanks in advance.
Yes, max(intersect(xfac,yfac)) should give the gcd.
You have almost solved the problem. Let's take the example x <- 12 and y <- 18. The GCD is in this case 6.
We can start by creating vectors xf and yf containing the divisors of each number, similar to the code you have shown:
xf <- (1:x)[!(x%%(1:x))]
#> xf
#[1] 1 2 3 4 6 12
yf <- (1:y)[!(y%%(1:y))]
#> yf
#[1] 1 2 3 6 9 18
The parentheses after the negation operator ! are not necessary due to specific rules of operator precedence in R, but I think that they make the code clearer in this case (see fortunes::fortune(138)).
Once we have defined these vectors, we can extract the GCD with
max(xf[xf %in% yf])
#[1] 6
Or, equivalently,
max(yf[yf %in% xf])
#[1] 6
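Put together, the steps above can be wrapped in a single function. This is a minimal sketch: the name `gcd_by_divisors` is made up here, and it assumes x and y are positive whole numbers.

```r
# GCD via common divisors, without loops or if/else.
# Assumes x and y are positive whole numbers.
gcd_by_divisors <- function(x, y) {
  xf <- (1:x)[x %% (1:x) == 0]  # divisors of x
  yf <- (1:y)[y %% (1:y) == 0]  # divisors of y
  max(xf[xf %in% yf])           # largest divisor they share
}

gcd_by_divisors(12, 18)  # 6
```

Since 1 divides every positive integer, the set of common divisors is never empty and `max()` always has something to return.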

Order of operator precedence when using ":" (the colon)

I am trying to extract values from a vector using numeric vectors expressed in two seemingly equivalent ways:
x <- c(1,2,3)
x[2:3]
# [1] 2 3
x[1+1:3]
# [1] 2 3 NA
I am confused why the expression x[2:3] produces a result different from x[1+1:3] -- the second includes an NA value at the end. What am I missing?
Because the operator : has precedence over +, 1+1:3 is really 1+(1:3) (i.e. 2:4) and not 2:3. To override the order of evaluation defined by operator precedence, use parentheses: (1+1):3.
You can see the order of precedence of operators in the help file ?Syntax. Here is the relevant part:
The following unary and binary operators are defined. They are listed in precedence groups, from highest to lowest.
:: ::: access variables in a namespace
$ @ component / slot extraction
[ [[ indexing
^ exponentiation (right to left)
- + unary minus and plus
: sequence operator
%any% special operators (including %% and %/%)
* / multiply, divide
+ - (binary) add, subtract
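A few rows of this table can be checked directly at the console. The sketch below uses made-up variable names and inspects one parsed expression with `quote()` to show which operator ends up outermost.

```r
# Binary + is below ":", so the outermost call in 1 + 1:3 is "+".
top_call <- as.character(quote(1 + 1:3)[[1]])
top_call             # "+"

# Unary minus is above ":", so -1:3 means (-1):3.
neg_start <- -1:3    # -1 0 1 2 3
neg_all   <- -(1:3)  # -1 -2 -3

# "^" is also above ":", so 2^1:3 means (2^1):3.
pow_seq <- 2^1:3     # 2 3
```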
