Strange as.Date() behavior

I'm using R 4.2.1 with all packages updated to the latest version.
The two lines below differ only in the order of the elements in a concatenated vector, yet the output is completely different.
as.Date(c(Sys.Date(), "2020-09-09"))
as.Date(c("2020-09-09", Sys.Date()))
The output is:
> as.Date(c(Sys.Date(), "2020-09-09"))
[1] "2022-09-16" "2020-09-09"
> as.Date(c("2020-09-09", Sys.Date()))
[1] "2020-09-09" NA
The first line correctly coerces the system date to a string, while the second coerces it first to a numeric value and then to a string. I have never before run into a situation where coercion in R depends on the order of elements in a vector.
Can someone explain why coercion behaves this way, and where I can read more about it?
And what can I do when the type of the elements inside c() is not known a priori?
Thank you!

The default c() strips the class from each argument before combining them (unclass(Sys.Date()) was 19251 on 2022-09-16, the date shown in the question). This is because (at least the default version of) c() removes "all attributes except names", and the class is one of those attributes.
The order matters because c() is an internal S3 generic: it dispatches on the class of its first argument only. So c(<date>, <character>) calls c.Date(), while c(<character>, <date>) falls through to the default method, which is implemented in C inside the primitive c() (and which I won't dig through here).
The code of c.Date:
function (..., recursive = FALSE)
.Date(c(unlist(lapply(list(...), function(e) unclass(as.Date(e))))))
In other words, it coerces every argument to a date, unclasses it, and then turns the concatenated vector back into dates.
A possible workaround is to call c.Date() explicitly, if you know that's what you want.
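To see the dispatch rule and the workaround side by side, here is a minimal sketch. It assumes a fixed date in place of Sys.Date() so that the output is reproducible. The second workaround (converting every element with as.Date() before combining) is my own suggestion for the case where element types are not known in advance, not part of the original answer.

```r
d <- as.Date("2022-09-16")  # fixed stand-in for Sys.Date() on the day of the question

# c() dispatches on the class of its FIRST argument only
date_first <- c(d, "2020-09-09")  # c.Date(): every element goes through as.Date()
chr_first  <- c("2020-09-09", d)  # default c(): the Date is unclassed, then coerced to character

print(date_first)  # [1] "2022-09-16" "2020-09-09"
print(chr_first)   # [1] "2020-09-09" "19251"

# Workaround 1: call the Date method explicitly, whatever the order
fixed <- c.Date("2020-09-09", d)

# Workaround 2 (types unknown): convert each element to Date before combining
parts <- list("2020-09-09", d)
fixed2 <- do.call(c, lapply(parts, as.Date))
print(fixed2)
```

With the second approach the first element is always a Date by the time c() sees it, so dispatch no longer depends on the original order.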

Related

Referencing recently used objects in R

My question refers to redundant code and a problem that I've been having with a lot of my R-Code.
Consider the following:
list_names<-c("putnam","einstein","newton","kant","hume","locke","leibniz")
combined_df_putnam$fu_time<-combined_df_putnam$age*365.25
combined_df_einstein$fu_time<-combined_einstein$age*365.25
combined_df_newton$fu_time<-combined_newton$age*365.25
...
combined_leibniz$fu_time<-combined_leibniz$age*365.25
I am trying to slim-down my code to do something like this:
list_names<-c("putnam","einstein","newton","kant","hume","locke","leibniz")
paste0("combined_df_",list_names[0:7]) <- data.frame("age"=1)
paste0("combined_df_",list_names[0:7]) <- paste0("combined_df_",list_names[0:7])$age*365.25
When I try to do that, I get "target of assignment expands to non-language object".
Basically, I want to create a list that contains descriptors, use that list to create a list of dataframes/lists and use these shortcuts again to do calculations. Right now, I am copy-pasting these assignments and this has led to various mistakes because I failed to replace the "name" from the previous line in some cases.
Any ideas for a solution to my problem would be greatly appreciated!
The central problem is that you are trying to assign a value (or data.frame) to the result of a function.
In paste0("combined_df_",list_names[0:7]) <- data.frame("age"=1), the left-hand-side returns a character vector:
> paste0("combined_df_",list_names[0:7])
[1] "combined_df_putnam" "combined_df_einstein" "combined_df_newton"
[4] "combined_df_kant" "combined_df_hume" "combined_df_locke"
[7] "combined_df_leibniz"
R will not interpret these strings as names of variables to create or refer to. For that, look at the function assign (and get, for reading a variable by name).
Similarly, in the code paste0("combined_df_",list_names[0:7])$age*365.25, the paste0 function does not refer to variables, but simply returns a character vector, on which the $ operator cannot be used.
There are many ways to solve your problem, but I recommend creating a function that performs the necessary operations on each data frame and then returns the data frame. You can then reuse the function for all 7 philosophers/scientists.
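A minimal sketch of the assign()/get() route, using three of the question's names; the loop and the variable name var_name are illustrative only. It works, but it spreads related data over many separate global variables.

```r
list_names <- c("putnam", "einstein", "newton")

for (nm in list_names) {
  var_name <- paste0("combined_df_", nm)  # e.g. "combined_df_putnam"
  assign(var_name, data.frame(age = 1))   # create a variable with that name
  df <- get(var_name)                     # fetch the object the string names
  df$fu_time <- df$age * 365.25
  assign(var_name, df)                    # write the modified copy back
}

combined_df_newton
```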
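Following that recommendation, a sketch could keep the data frames in one named list and apply a single function to every element. The helper add_fu_time() and the list name combined are hypothetical; the stand-in data frames mimic the question's data.frame("age"=1).

```r
list_names <- c("putnam", "einstein", "newton", "kant", "hume", "locke", "leibniz")

# Hypothetical helper: the per-data-frame work lives in one place
add_fu_time <- function(df) {
  df$fu_time <- df$age * 365.25
  df
}

# One named list instead of seven loose variables
combined <- lapply(setNames(list_names, list_names), function(nm) data.frame(age = 1))
combined <- lapply(combined, add_fu_time)

combined$leibniz$fu_time
```

Because the names live in the list rather than in variable names, a typo in one element's name can no longer silently reuse the previous data frame.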

Why are names(x)<-y and "names<-"(x,y) not equivalent?

Consider the following:
y<-c("A","B","C")
x<-z<-c(1,2,3)
names(x)<-y
"names<-"(z,y)
If you run this code, you will discover that names(x)<-y is not identical to "names<-"(z,y). In particular, one sees that names(x)<-y actually changes the names of x whereas "names<-"(z,y) returns z with its names changed.
Why is this? I was under the impression that the difference between writing a function normally and writing it as an infix operator was only one of syntax, rather than something that actually changes the output. Where in the documentation is this difference discussed?
Short answer: names(x)<-y is actually sugar for x<-"names<-"(x,y) and not just "names<-"(x,y). See the R Language Definition, section "Subset assignment", pages 18-19 (pages 23-24 of the PDF), which works through basically the same example.
For example, names(x) <- c("a","b") is equivalent to:
`*tmp*`<-x
x <- "names<-"(`*tmp*`, value=c("a","b"))
rm(`*tmp*`)
If you are more familiar with getters and setters, you can think of it this way: if somefunction is a getter function, somefunction<- is the corresponding setter. In R, where objects are immutable, it's more accurate to call the setter a replacement function, because it actually creates a new object identical to the old one, but with an attribute added, modified or removed, and this new object then replaces the old one.
In the example case, for instance, the names attribute is not simply added to x; rather, a new object with the same values as x but with the names is created and bound to the symbol x.
Since there are still some doubts about why this is discussed in the language documentation instead of directly in ?names, here is a short recap of this property of the R language.
You can give a function any name you wish (within some restrictions, of course), and the name has no effect when the function is called normally.
However, if a function's name ends with the <- suffix, it becomes a replacement function, and the parser applies it with the mechanism described at the beginning of this answer when it is called with the syntax foo(x)<-value. Note that you don't call foo<- explicitly; with a slightly different syntax you obtain an object replacement (because of the name).
Although there are no formal restrictions, it's common in R to give a getter and its setter the same name (for instance names and names<-). In this case, the function with the <- suffix is the replacement function for the version without the suffix.
As stated at the beginning, this behaviour is general and a property of the language, so it doesn't need to be discussed in any replacement function doc.
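The same mechanism applies to user-defined functions. The pair second()/`second<-` below is hypothetical and exists only to show that the foo(x) <- value syntax is rewritten into an ordinary call plus an assignment, while calling the replacement function directly leaves its argument untouched.

```r
# Hypothetical getter/replacement pair, for illustration only
second <- function(x) x[[2]]
`second<-` <- function(x, value) {
  x[[2]] <- value
  x
}

w <- c(1, 2, 3)
copy <- `second<-`(w, 10)  # ordinary call: returns a modified copy
print(w)                   # still 1 2 3

v <- c(1, 2, 3)
second(v) <- 10            # rewritten by R as v <- `second<-`(v, value = 10)
print(v)                   # 1 10 3
second(v)                  # 10
```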
In particular, one sees that names(x)<-y actually changes the names of x whereas "names<-"(z,y) returns z with its names changed.
That’s because `names<-`¹ is a regular function, albeit with an odd name². It performs no assignment; it returns a new object with the names attribute set. In fact `names<-` is a primitive function in R, but it could be implemented as follows (there are shorter, better ways of writing this in R, but I want the separate steps to be explicit):
`names<-` = function (x, value) {
new = x
attr(new, 'names') = value
new
}
That is, it
… creates a new object that’s a copy of x,
… sets the names attribute on that newly created object, and
… returns the new object.
Since virtually all objects in R are immutable, this fits naturally into R’s semantics. In fact, a better name for this exact function would be with_names³. But the creators of R found it convenient to be able to write such an assignment without repeating the name of the object. So instead of writing
x = with_names(x, c('foo', 'bar'))
or
x = `names<-`(x, c('foo', 'bar'))
R allows us to write
names(x) = c('foo', 'bar')
R handles this syntax specially by internally converting it to another expression, documented in the Subset assignment section of the R language definition, as explained in the answer by Nicola.
But the gist is that names(x) = y and `names<-`(x, y) are different because … they just are. The former is a special syntactic form that gets recognised and transformed by the R parser. The latter is a regular function call, and the weird function name is a red herring: it doesn’t affect the execution whatsoever. It does the same as if the function were named differently, and you can confirm this by assigning it a different name:
with_names = `names<-`
`another weird(!) name` = `names<-`
# These are all identical:
`names<-`(x, y)
with_names(x, y)
`another weird(!) name`(x, y)
¹ I strongly encourage using backtick quotes (`) instead of straight quotes (' or ") to quote R variable names. While both are allowed in some circumstances, the latter invites confusion with strings, and is conceptually bonkers. These are not strings. Consider:
"a" = "b"
"c" = "a"
Rather than copy the value of a into c, what this code actually does is set c to literal "a", because quotes now mean different things on the left- and right-hand side of assignment.
The R documentation confirms that
The preferred quote [for variable names] is the backtick (`)
² Regular variable names (aka “identifiers” or just “names”) in R can only contain letters, digits, underscore and the dot, must start with a letter, or with a dot not followed by a digit, and can’t be reserved words. But R allows using pretty much arbitrary characters (including punctuation and even spaces!) in variable names, provided the name is backtick-quoted.
³ In fact, R has an almost-alias for this function, called setNames (from the stats package). It isn’t a great name, since set… implies mutating the object, but of course it doesn’t do that.
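A quick check of that last footnote, assuming the stats package is attached (as it is by default): setNames() returns a renamed copy and leaves its input alone, exactly like calling `names<-` as a regular function.

```r
x <- c(1, 2)
named <- setNames(x, c("foo", "bar"))  # returns a renamed copy

names(x)      # NULL: the original is untouched
names(named)  # "foo" "bar"
```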

Why does substitute change noquote text to a string in R?

I wanted to answer a question regarding plotmath but I failed to get my desired substitute output.
My desired output: paste("Hi", paste(italic(yes),"why not?"))
What I get: paste("Hi", "paste(italic(yes),\"why not?\")")
text<-'paste(italic(yes),"why not?")'
text
[1] "paste(italic(yes),\"why not?\")"
noqoute_text<-noquote(text)
noqoute_text
[1] paste(italic(yes),"why not?")
sub<-substitute(paste("Hi",noqoute_text),
env=list(noqoute_text=noqoute_text))
sub
paste("Hi", "paste(italic(yes),\"why not?\")")
You're using the wrong function; use parse instead of noquote:
text<-'paste(italic(yes),"why not?")'
noquote_text <- parse(text=text)[[1]]
sub<- substitute(paste("Hi",noquote_text),env=list(noquote_text= noquote_text))
# paste("Hi", paste(italic(yes), "why not?"))
noquote just applies a class to an object of type character, with a specific print method not to show the quotes.
str(noquote("a"))
Class 'noquote' chr "a"
unclass(noquote("a"))
[1] "a"
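The difference is easy to confirm in code. This sketch reuses the question's text (the names nq, parsed and lab are mine) and shows that noquote() leaves a character vector while parse() produces a call, and what each one becomes once substituted:

```r
text <- 'paste(italic(yes),"why not?")'

nq <- noquote(text)
class(nq)   # "noquote"
typeof(nq)  # "character": still just a string

parsed <- parse(text = text)[[1]]  # first (and only) parsed expression
class(parsed)                      # "call"

sub_nq <- substitute(paste("Hi", lab), env = list(lab = nq))
sub_ok <- substitute(paste("Hi", lab), env = list(lab = parsed))

is.character(sub_nq[[3]])  # TRUE: a string was spliced in
is.call(sub_ok[[3]])       # TRUE: a call was spliced in
sub_ok                     # paste("Hi", paste(italic(yes), "why not?"))
```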
Would you please elaborate on your answer?
In R you ought to be careful about the difference between what's in an object, and what is printed.
What noquote does is:
add "noquote" to the class attribute of the object
That's it
The code is:
function (obj)
{
if (!inherits(obj, "noquote"))
class(obj) <- c(attr(obj, "class"), "noquote")
obj
}
Then, when you print it, the method print.noquote:
Removes the class "noquote" from the object if it's there
calls print with the argument quote = FALSE
that's it
You can actually call print.noquote on a string too:
print.noquote("a")
[1] a
It prints the same way quote(a) or substitute(a) would, but it's a totally different beast.
In the code you tried, you've been substituting a string instead of a call.
For solving the question I think Moody_Mudskipper's answer works fine, but as you asked for some elaboration...
You need to be careful about different ways similar-looking things are actually stored in R, which means they behave differently.
This is especially true for the way plotmath handles labels: it tries to emulate the way character strings are normally handled, but then applies its own rules. I think you are mixing three things:
character() is the most familiar: just a string. Printing can be confusing when quotes etc. are escaped. The function noquote basically tells R to mark its argument so that it is printed without quotes (and without escaping the quotes inside it).
calls are "unevaluated function-calls": it's an instruction as to what R should do, but it's not yet executed. Any errors in this call don't come up yet, and you can inspect it.
Note that a call does not carry its own environment with it, which means a call can give different results if evaluated, e.g., from within a function.
Expressions are like calls, but more general, i.e. not always a function that needs to be executed. An expression can be a variable name, but also a simple value such as "why not?". Also, an expression can consist of multiple units, like the statements you would have inside { }.
Different functions can convert between these classes, but sometimes functions (such as paste!) also convert unexpectedly:
noquote does not do much that is useful here, as Moody_Mudskipper already pointed out: it only changes the printing. The object basically remains a character.
substitute not only substitutes variables, but also converts its first argument into (most often) a call. Here, printing bites you: when a call is printed, there is no provision for special classes of its members. Try it: sub[[3]] from the question gives
[1] paste(italic(yes),"why not?")
without any backslashes! Only when the full call is printed is the noquote part lost.
parse is used to transform a character to an expression. Nothing is evaluated yet, but some structure is introduced, so that you could manipulate the expression.
paste often behaves annoyingly (although as documented), as it can only paste together character strings. Therefore, if you feed it anything but a character, it first calls as.character. So if you give it a call, you just get a line of text again. So in your question, even if you used parse, as soon as you start pasting things together, you get the quotes again.
Finally, your problem is harder because it relies on plotmath's internal logic.
That means that as soon as you try to evaluate your text, you'll probably get an error "could not find function italic" (or a more confusing error if there is a function italic defined elsewhere). When providing it in plotmath, it works because the call is only evaluated by plotmath, which will give it a nice environment, where italic works as expected.
This all means you need to treat it all as an expression or call. As long as evaluation cannot be done (as long as it's you that handles the expression, instead of plotmath) it all needs to remain an expression or call. Giving substitute a call works, but you can also emulate more closely what happens in R, with
call('paste', 'Hi', parse(text=text)[[1]])
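A sketch of that call()-based construction, together with the failure mode described above when you evaluate the result yourself. The plot() line is commented out because it needs a graphics device; treat it as an illustration of handing the call to plotmath, not as a tested line.

```r
text <- 'paste(italic(yes),"why not?")'

# Build the call directly instead of substituting into a template
built <- call("paste", "Hi", parse(text = text)[[1]])
built  # paste("Hi", paste(italic(yes), "why not?"))

# Evaluating it ourselves fails: italic() only means something to plotmath
msg <- tryCatch(eval(built), error = function(e) conditionMessage(e))
msg    # could not find function "italic" (unless some italic() is defined)

# plotmath interprets the call itself, e.g.:
# plot(1, main = built)
```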

R: IF statement evaluating expression despite condition being FALSE?

I've got a large function in R and the users have the ability to not include/specify an object. If they DO, the code checks to make sure the names in that object match the names in another. If they DON'T, there's no need to do that checking. The code line is:
if(exists("grids")) if(!all(expvarnames %in% names(grids))) {stop("Not all expvar column names found as column names in grids")}
But I'm getting the following error:
Error in match(x, table, nomatch = 0L) : argument "grids" is missing, with no default
Well in this trial run, grids is SUPPOSED to be missing. If I try
if(exists("grids")) print("yay")
Then nothing prints, i.e. the absence of grids means the expression isn't evaluated, which is as I'd expect. So can anyone think why R seems to be evaluating the subsequent IF statement in the main example? Should I slap another set of curly brackets around the second one??
Thanks!
Edit: more problems. Removing "grids," from the function's argument list means it works if there's no object called grids and you don't specify it in the call (i.e. function(x,grids=whatever)). And keeping "grids," IN the function's argument list means it works if there IS an object called grids and you do specify it in the call.
Please see this: http://i.imgur.com/9mr1Lwi.png
using exists(grids) is out because exists wants "quotes" and without em everything fails. WITH them ("grids"), I need to decide whether to keep "grids," in the functions list. If I don't, but I specify it in the call (function(x,grids=whatever)) then I get unused argument fail. If I DO, but don't specify it in the call because grids doesn't exist and I don't want to use it, I get match error, grids missing no default.
How do I get around this? Maybe list it in the function's argument list as grids="NULL", then rather than if(exists("grids")) do if(grids!="NULL")
I still don't know why the original match problem is happening though. Match is from the expvarnames/grids names checker, which is AFTER if(exists("grids")) which evaluates to FALSE. WAaaaaaaiiiiittttt..... If I specify grids in the function variables list, i.e. simply putting function(x,grids,etc){do stuff}, does that mean the function CREATES an object called grids, within its environment?
Man this is so f'd up....
testfun <- function(x,grids)
{if(exists("grids")) globalgrids<<-grids
print(x+1)}
testfun(1) # Error in testfun(1) : argument "grids" is missing, with no default
testfun <- function(x,grids)
{if(exists("grids")) a<<-c(1,2,3)
print(x+1)}
testfun(1) #2 (and globally assigns a)
So in the first example, the function seems to have created an object called "grids" because exists("grids") evaluates to true. But THEN, ON THE SAME LINE, when asked to do something with grids, it says it doesn't exist! Schroedinger's object?!
This is proven in example 2: grids evaluates true and a is globally assigned then the function does its thing. Madness. Complete madness. Does anyone know WHY this ridiculousness is going on? And is the best solution to use my grids="NULL" default in the functions variables list?
Thanks.
Reproducible example, if you want to but I've already done it for every permutation:
testfun <- function(x,grids)
{if(exists("grids")) if(!all(expvarnames %in% names(grids))) {stop("Not all expvar column names found as column names in grids")}
print(x+1)}
testfun(1)
testfun(x=1,grids=grids)
grids<-data.frame(c(1,2,3),c(1,2,3),c(1,2,3))
expvarnames <- c("a","b","c")
colnames(grids) <- c("a","b","c")
Solution
Adapting your example, use:
testfun <- function(x,grids = NULL)
{
if(!is.null(grids)){
if(!all(expvarnames %in% names(grids))){
stop("Not all expvar column names found as column names in grids")
}
print(x+1)
}
}
With this, testfun(1) returns nothing. Because grids defaults to NULL, the function checks for that value (i.e. no argument supplied) and, if it finds it, skips the rest of its body.
The Reason the Problem Occurs
We go through each of the examples:
testfun <- function(x,grids)
{if(exists("grids")) globalgrids<<-grids
print(x+1)}
testfun(1) # Error in testfun(1) : argument "grids" is missing, with no default
Here we call the function testfun, giving only the x argument. testfun knows it takes two arguments, and so creates local variables x and grids. We supplied a value for x, so that value is assigned to x. There is no argument for grids, but the variable has still been created, even though no value has been assigned to it. So grids exists, but has no value.
From this exists("grids") will be TRUE, but when we try to do globalgrids<<-grids we will get an error as grids has not been assigned a value, and so we can't assign anything to globalgrids.
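The gap between "exists" and "has a value" can be made visible with a small probe. The function probe() is hypothetical; missing() is the base R function that reports whether a formal argument was supplied, which exists() cannot tell you:

```r
# exists() only checks that a binding is there;
# missing() checks whether the caller supplied a value
probe <- function(x, grids) {
  c(exists = exists("grids", inherits = FALSE),
    missing = missing(grids))
}

probe(1)             # exists TRUE, missing TRUE
probe(1, grids = 2)  # exists TRUE, missing FALSE
```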
testfun <- function(x,grids)
{if(exists("grids")) a<<-c(1,2,3)
print(x+1)}
testfun(1) #2 (and globally assigns a)
This, however, is fine. grids exists as in the previous case, and we never actually try to access the value stored in grids, which would cause an error because none was assigned.
In the solution, we simply set a default value for grids, which means we always get something whenever we try to access the variable. Unlike in the previous cases, we get NULL instead of an error about a missing value.
The main point is that when you declare arguments in your function, they are created each time you call the function. They exist. However, if you don't give them values in your function call, they will exist but have no value. Then, when you try to use them, their lack of a value throws an error.
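An alternative to the NULL default, not taken from the answers here, is to test the argument with missing() directly. This sketch reuses the question's names; unlike the solution above, it still computes x + 1 when grids is not supplied, which seems closer to what the question wanted:

```r
expvarnames <- c("a", "b", "c")

testfun <- function(x, grids) {
  # missing() is TRUE when the caller did not supply `grids`;
  # && short-circuits, so names(grids) is never evaluated in that case
  if (!missing(grids) && !all(expvarnames %in% names(grids))) {
    stop("Not all expvar column names found as column names in grids")
  }
  x + 1
}

testfun(1)                                      # 2, no check performed
grids <- data.frame(a = 1:3, b = 1:3, c = 1:3)
testfun(1, grids = grids)                       # 2, check passes
```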
> a <- c(1,2,3,4)
> b <- c(2,4,6,8)
> if(exists("a")) if(!all(a %in% b)) {stop("Not all a in b")}
Error: Not all a in b
> rm(a)
> if(exists("a")) if(!all(a %in% b)) {stop("Not all a in b")}
>
When a does not exist, the expression does not evaluate, as expected. Before testing your first expression, make sure that grids does not exist by running rm(grids) in the console.
Richard Scriven's comment got me thinking: grids was an argument in my function but was optional, so maybe shouldn't be specified (like anything in "..." optional functions). I commented it out and it worked. Hooray, cheers everyone.

The arcane formals(function(x){})$x

What is the object formals(function(x){})$x?
It's found in the formals of a function, bound to arguments without default value.
Is there any other way to refer to this strange object? Does it have some role other than representing an empty function argument?
Here are some of its properties that can be checked in the console:
> is(formals(function(x){})$x)
[1] "name" "language" "refObject"
> formals(function(x){})$x
> as.character(formals(function(x){})$x)
[1] ""
EDIT: Here are some other ways to get this object:
alist(,)[[1]]
bquote()
quote(expr=)
Background: What is formals(function(x) {})?
Well, to start with (and as documented in ?formals), formals(function(x) {}) returns a pairlist:
is(formals(function(x){}))
# [1] "pairlist"
Unlike list objects, pairlist objects can have named elements that contain no value -- a very nice thing when constructing a function that has a possibly optional formal argument. From ?pairlist:
tagged arguments with no value are allowed whereas ‘list’ simply ignores them.
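To see the difference, compare alist(), which accepts tagged arguments with no value just as a function's formals do (although the object it returns is an ordinary list, not a pairlist), with list(), which constructs 'plain old' lists: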
list(x=, y=2)
# Error in list(x = , y = 2) : argument 1 is empty
alist(x=, y=2)
# $x
#
# $y
# [1] 2
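A short check of the object types involved (a sketch; the variable names are mine). Note that alist() itself returns an ordinary list, while the formals of a closure are a pairlist:

```r
f_formals <- formals(function(x) {})
is.pairlist(f_formals)  # TRUE: formals are stored as a pairlist

a <- alist(x = , y = 2)
typeof(a)               # "list": alist() returns an ordinary list
names(a)                # "x" "y"

# list() refuses the empty argument outright
res <- tryCatch(list(x = , y = 2), error = function(e) conditionMessage(e))
res                     # "argument 1 is empty"
```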
Your question: What is formals(function(x) {})$x?
Now to your question about what formals(function(x) {})$x is. My understanding is that, in some sense, its real value is the "empty symbol". You can't, however, get at it from within R, because the "empty symbol" is an object that R's developers -- very much by design -- try to hide entirely from R users. (For an interesting discussion of the empty symbol, and why it's kept hidden, see the thread starting here).
When one tries to get at it by indexing an empty-valued element of a pairlist, R's developers foil the attempt by having R return the name of the element instead of its verboten-for-public-viewing value. (This is, of course, the name object shown in your question.)
It's a name or symbol, see ?name, e.g.:
is(as.name('a'))
#[1] "name" "language" "refObject"
The only difference from your example is that you can't use as.name to create an empty one.
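These properties can be checked directly. The sketch below inspects the object without ever storing it in a variable, on the assumption (consistent with how missing arguments behave) that binding the empty symbol to a name makes any later evaluation of that name fail with a "missing argument" error:

```r
# Inspect the empty symbol inline, never assigning it to a variable
is.name(formals(function(x) {})$x)                     # TRUE
as.character(formals(function(x) {})$x)                # ""
identical(formals(function(x) {})$x, quote(expr = ))   # TRUE: the same object

# as.name() refuses to create it
tryCatch(as.name(""), error = function(e) conditionMessage(e))
```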
