Guidance on missing() in R - r

Suppose a function, G, takes two arguments; a and b: G(a = some number, b = some number).
Now two situations (wondering what commands to use in each case?):
1- if a user puts G(b = some number), will the if(missing(a)){do this} recognize the complete absence of a argument? AND more importantly:
2- if a user puts G(a =, b = some number), still will the if(missing(a)){do this} recognize a = but lack of some number in front of it?

Defining the function as below doesn't throw an error in both the cases:
ch <- function(a=NA,b=NA){ if(is.na(a)) return(b) else( return(a+b)) }
> ch(b=2)
[1] 2
> ch(a=,b=2)
[1] 2

Related

Using R, can I create special functions like %in% just one input (LEFT) or (RIGHT)?

The common notation for factorial is the ! operator in mathematics.
I can create function:
"%!%" = function(n, r=NULL) { factorial(n); }
This works as expected if I pass a NULL or NA to the RHS, which I don't really want to do.
3 %!% NA
3 %!% NULL
3 %!% .
What I would like to do is just enter:
3 %!%
Any suggestions on HOW I can do that? In that setup, I want the LHS (left) to be the input and the RHS (right) to be ignored.
I can do the BOTH no problem:
nPr = function(n, r, replace=FALSE)
{
if(replace) { return( n^r ); }
factorial(n) / factorial(n-r);
}
"%npr%" = "%nPr%" = nPr;
nCr = function(n, r, replace=FALSE)
{
# same function (FALSE, with n+r-1)
if(replace) { return( nCr( (n+r-1), r, replace=FALSE ) ); }
factorial(n) / ( factorial(r) * factorial(n-r) );
}
"%ncr%" = "%nCr%" = nCr;
where
5 %nCr% 3
5 %nPr% 3
work as expected based on selection without replacement.
Question: How to use the special operator with just the LHS?
The follow-on question is the opposite. Let's say I want the LHS (left) to be ignored and focus on the RHS (right). I believe this is how the built-in ? function links to help and ?? links to help.search(). Let's say I wanted to create an %$$% operator that worked that way.
No, you can't do that. The %any% operators are defined by the parser to be binary operators.
You can see all of the operators in R in the ?Syntax help page. Some are binary, some are unary, and some can be either one, but the unary operators always precede the argument. You can attach different functions to most of them (e.g. change the meaning of ! in !x), but you can't change the parser to allow x! to be legal code.

ellipsis ... as function in substitute?

I'm having trouble understanding how/why parentheses work where they otherwise should not work®.
f = function(...) substitute(...()); f(a, b)
[[1]]
a
[[2]]
b
# but, substitute returns ..1
f2 = function(...) substitute(...); f2(a, b)
a
Normally an error is thrown, could not find function "..." or '...' used in an incorrect context, for example when calling (\(...) ...())(5).
What I've tried
I have looked at the source code of substitute to find out why this doesn't happen here. R Internals 1.1.1 and 1.5.2 says ... is of SEXPTYPE DOTSXP, a pairlist of promises. These promises are what is extracted by substitute.
# \-substitute #R
# \-do_substitute #C
# \-substituteList #C recursive
# \-substitute #C
Going line-by-line, I am stuck at substituteList, in which h is the current element of ... being processed. This happens recursively at line 2832 if (TYPEOF(h) == DOTSXP) h = substituteList(h, R_NilValue);. I haven't found exception handling of a ...() case in the source code, so I suspect something before this has happened.
In ?substitute we find substitute works on a purely lexical basis. Does it mean ...() is a parser trick?
parse(text = "(\\(...) substitute(...()))(a, b)") |> getParseData() |> subset(text == "...", select = c(7, 9))
#> token text
#> 4 SYMBOL_FORMALS ...
#> 10 SYMBOL_FUNCTION_CALL ...
The second ellipsis is recognized during lexical analysis as the name of a function call. It doesn't have its own token like |> does. The output is a pairlist ( typeof(f(a, b)) ), which in this case is the same as a regular list (?). I guess it is not a parser trick. But whatever it is, it has been around for a while!
Question:
How does ...() work?
Note: When referring to documentation and source code, I provide links to an unofficial GitHub mirror of R's official Subversion repository. The links are bound to commit 97b6424 in the GitHub repo, which maps to revision 81461 in the Subversion repo (the latest at the time of this edit).
substitute is a "special" whose arguments are not evaluated (doc).
typeof(substitute)
[1] "special"
That means that the return value of substitute may not agree with parser logic, depending on how the unevaluated arguments are processed internally.
In general, substitute receives the call ...(<exprs>) as a LANGSXP of the form (pseudocode) pairlist(R_DotsSymbol, <exprs>) (doc). The context of the substitute call determines how the SYMSXP R_DotsSymbol is processed. Specifically, if substitute was called inside of a function with ... as a formal argument and rho as its execution environment, then the result of
findVarInFrame3(rho, R_DotsSymbol, TRUE)
in the body of C utility substituteList (source) is either a DOTSXP or R_MissingArg—the latter if and only if f was called without arguments (doc). In other contexts, the result is R_UnboundValue or (exceptionally) some other SEXP—the latter if and only if a value is bound to the name ... in rho. Each of these cases is handled specially by substituteList.
The multiplicity in the processing of R_DotsSymbol is the reason why these R statements give different results:
f0 <- function() substitute(...(n = 1)); f0()
## ...(n = 1)
f1 <- function(...) substitute(...(n = 1)); f1()
## $n
## [1] 1
g0 <- function() {... <- quote(x); substitute(...(n = 1))}; g0()
## Error in g0() : '...' used in an incorrect context
g1 <- function(...) {... <- quote(x); substitute(...(n = 1))}; g1()
## Error in g1() : '...' used in an incorrect context
h0 <- function() {... <- NULL; substitute(...(n = 1))}; h0()
## $n
## [1] 1
h1 <- function(...) {... <- NULL; substitute(...(n = 1))}; h1()
## $n
## [1] 1
Given how ...(n = 1) is parsed, you might have expected f1 to return call("...", n = 1), both g0 and g1 to return call("x", n = 1), and both h0 and h1 to throw an error, but that is not the case for the above, mostly undocumented reasons.
Internals
When called inside of the R function f,
f <- function(...) substitute(...(<exprs>))
substitute evaluates a call to the C utility do_substitute—you can learn this by looking here—in which argList gets a LISTSXP of the form pairlist(x, R_MissingArg), where x is a LANGSXP of the form pairlist(R_DotsSymbol, <exprs>) (source).
If you follow the body of do_substitute, then you will find that the value of t passed to substituteList from do_substitute is a LISTSXP of the form pairlist(copy_of_x) (source).
It follows that the while loop inside of the substituteList call (source) has exactly one iteration and that the statement CAR(el) == R_DotsSymbol in the body of the loop (source) is false in that iteration.
In the false branch of the conditional (source), h gets the value
pairlist(substituteList(copy_of_x, env)). The loop exits and substituteList returns h to do_substitute, which in turn returns CAR(h) to R (source 1, 2, 3).
Hence the return value of substitute is substituteList(copy_of_x, env), and it remains to deduce the identity of this SEXP. Inside of this call to substituteList, the while loop has 1+m iterations, where m is the number of <exprs>. In the first iteration, the statement CAR(el) == R_DotsSymbol in the body of the loop is true.
In the true branch of the conditional (source), h is either a DOTSXP or R_MissingArg, because f has ... as a formal argument (doc). Continuing, you will find that substituteList returns:
R_NilValue if h was R_MissingArg in the first while iteration and m = 0,
or, otherwise,
a LISTSXP listing the expressions in h (if h was a DOTSXP in the first while iteration) followed by <exprs> (if m > 1), all unevaluated and without substitutions, because the execution environment of f is empty at the time of the substitute call.
Indeed:
f <- function(...) substitute(...())
is.null(f())
## [1] TRUE
f <- function(...) substitute(...(n = 1))
identical(f(a = sin(x), b = zzz), pairlist(a = quote(sin(x)), b = quote(zzz), n = 1))
## [1] TRUE
Misc
FWIW, it helped me to recompile R after adding some print statements to coerce.c. For example, I added the following before UNPROTECT(3); in the body of do_substitute (source):
Rprintf("CAR(t) == R_DotsSymbol? %d\n",
CAR(t) == R_DotsSymbol);
if (TYPEOF(CAR(t)) == LISTSXP || TYPEOF(CAR(t)) == LANGSXP) {
Rprintf("TYPEOF(CAR(t)) = %s, length(CAR(t)) = %d\n",
type2char(TYPEOF(CAR(t))), length(CAR(t)));
Rprintf("CAR(CAR(t)) = R_DotsSymbol? %d\n",
CAR(CAR(t)) == R_DotsSymbol);
Rprintf("TYPEOF(CDR(CAR(t))) = %s, length(CDR(CAR(t))) = %d\n",
type2char(TYPEOF(CDR(CAR(t)))), length(CDR(CAR(t))));
}
if (TYPEOF(s) == LISTSXP || TYPEOF(s) == LANGSXP) {
Rprintf("TYPEOF(s) = %s, length(s) = %d\n",
type2char(TYPEOF(s)), length(s));
Rprintf("TYPEOF(CAR(s)) = %s, length(CAR(s)) = %d\n",
type2char(TYPEOF(CAR(s))), length(CAR(s)));
}
which helped me confirm what was going into and coming out of the substituteList call on the previous line:
f <- function(...) substitute(...(n = 1))
invisible(f(hello, world, hello(world)))
CAR(t) == R_DotsSymbol? 0
TYPEOF(CAR(t)) = language, length(CAR(t)) = 2
CAR(CAR(t)) = R_DotsSymbol? 1
TYPEOF(CDR(CAR(t))) = pairlist, length(CDR(CAR(t))) = 1
TYPEOF(s) = pairlist, length(s) = 1
TYPEOF(CAR(s)) = pairlist, length(CAR(s)) = 4
invisible(substitute(...()))
CAR(t) == R_DotsSymbol? 0
TYPEOF(CAR(t)) = language, length(CAR(t)) = 1
CAR(CAR(t)) = R_DotsSymbol? 1
TYPEOF(CDR(CAR(t))) = NULL, length(CDR(CAR(t))) = 0
TYPEOF(s) = pairlist, length(s) = 1
TYPEOF(CAR(s)) = language, length(CAR(s)) = 1
Obviously, compiling R with debugging symbols and running R under a debugger helps, too.
Another puzzle
Just noticed this oddity:
g <- function(...) substitute(...(n = 1), new.env())
gab <- g(a = sin(x), b = zzz)
typeof(gab)
## [1] "language"
gab
## ...(n = 1)
Someone here can do another deep dive to find out why the result is a LANGSXP rather than a LISTSXP when you supply env different from environment() (including env = NULL).

How do I create a function that takes a function as input and returns this function modified but with same input arguments?

Let for example function g be defined by g(x):=x+1.
I want to programm a function f which can take a arbitrary function h(a_1,...,a_n) (a_1,...,a_n being the arguments) and returns the function g(h). So that
f(h)(a_1=1,...,a_n=n) works and returns the same as g(h(a_1=1,...,a_n=n)).
So we need something like
f <- (h){
- get the arguments of h and put them in a list/vector arg(I found functions that do that)
- return a function ´f(h)´ that has the elements of arg as arguments. (I am not sure how to do that)
}
I'm not sure I understand your question since what you wrote seems ok but is that what you are looking for?
somelistorvector <- list(a = 1, b = 2)
fct <- function(arg){
arg[[1]] + arg[[2]] # arg[["a"]] + arg[["b"]] could also work
}
fct(somelistorvector)
[1] 3
Also are the arguments always going to be a and b or element 1 and 2 ?

Is it possible to overload functions in Scilab?

I would like to know how to overload a function in scilab. It doesn't seem to be as simple as in C++. For example,
function [A1,B1,np1]=pivota_parcial(A,B,n,k,np)
.......//this is just an example// the code doesn't really matter
endfunction
//has less input/output variables//operates differently
function [A1,np1]=pivota_parcial(A,n,k,np)
.......//this is just an example// the code doesn't really matter
endfunction
thanks
Beginner in scilab ....
You can accomplish something like that by combining varargin, varargout and argn() when you implement your function. Take a look at the following example:
function varargout = pivota_parcial(varargin)
[lhs,rhs] = argn();
//first check number of inputs or outputs
//lhs: left-hand side (number of outputs)
//rhs: right-hand side (number of inputs)
if rhs == 4 then
A = varargin(1); B = 0;
n = varargin(2); k = varargin(3);
np = varargin(4);
elseif rhs == 5 then
A = varargin(1); B = varargin(2);
n = varargin(3); k = varargin(4);
np = varargin(5);
else
error("Input error message");
end
//computation goes on and it may depend on (rhs) and (lhs)
//for the sake of running this code, let's just do:
A1 = A;
B1 = B;
np1 = n;
//output
varargout = list(A1,B1,np1);
endfunction
First, you use argn() to check how many arguments are passed to the function. Then, you rename them the way you need, doing A = varargin(1) and so on. Notice that B, which is not an input in the case of 4 inputs, is now set to a constant. Maybe you actually need a value for it anyways, maybe not.
After everything is said and done, you need to set your output, and here comes the part in which using only varargout may not satisfy your need. If you use the last line the way it is, varargout = list(A1,B1,np1), you can actually call the function with 0 and up to 3 outputs, but they will be provided in the same sequence as they appear in the list(), like this:
pivota_parcial(A,B,n,k,np);: will run and the first output A1 will be delivered, but it won't be stored in any variable.
[x] = pivota_parcial(A,B,n,k,np);: x will be A1.
[x,y] = pivota_parcial(A,B,n,k,np);: x will be A1 and y will be B1.
[x,y,z] = pivota_parcial(A,B,n,k,np);: x will be A1, y will be B1, z will be np1.
If you specifically need to change the order of the output, you'll need to do the same thing you did with your inputs: check the number of outputs and use that to define varargout for each case. Basically, you'll have to change the last line by something like the following:
if lhs == 2 then
varargout = list(A1,np1);
elseif lhs == 3 then
varargout = list(A1,B1,np1);
else
error("Output error message");
end
Note that even by doing this, the ability to call this functions with 0 and up to 2 or 3 outputs is retained.

Getting count of occurrences for X in string

Im looking for a function like Pythons
"foobar, bar, foo".count("foo")
Could not find any functions that seemed able to do this, in a obvious way. Looking for a single function or something that is not completely overkill.
Julia-1.0 update:
For single-character count within a string (in general, any single-item count within an iterable), one can use Julia's count function:
julia> count(i->(i=='f'), "foobar, bar, foo")
2
(The first argument is a predicate that returns a ::Bool).
For the given example, the following one-liner should do:
julia> length(collect(eachmatch(r"foo", "bar foo baz foo")))
2
Julia-1.7 update:
Starting with Julia-1.7 Base.Fix2 can be used, through ==('f') below, as to shorten and sweeten the syntax:
julia> count(==('f'), "foobar, bar, foo")
2
What about regexp ?
julia> length(matchall(r"ba", "foobar, bar, foo"))
2
I think that right now the closest built-in thing to what you're after is the length of a split (minus 1). But it's not difficult to specifically create what you're after.
I could see a searchall being generally useful in Julia's Base, similar to matchall. If you don't care about the actual indices, you could just use a counter instead of growing the idxs array.
function searchall(s, t; overlap::Bool=false)
idxfcn = overlap ? first : last
r = findnext(s, t, firstindex(t))
idxs = typeof(r)[] # Or to only count: n = 0
while r !== nothing
push!(idxs, r) # n += 1
r = findnext(s, t, idxfcn(r) + 1)
end
idxs # return n
end
Adding an answer to this which allows for interpolation:
julia> a = ", , ,";
julia> b = ",";
julia> length(collect(eachmatch(Regex(b), a)))
3
Actually, this solution breaks for some simple cases due to use of Regex. Instead one might find this useful:
"""
count_flags(s::String, flag::String)
counts the number of flags `flag` in string `s`.
"""
function count_flags(s::String, flag::String)
counter = 0
for i in 1:length(s)
if occursin(flag, s)
s = replace(s, flag=> "", count=1)
counter+=1
else
break
end
end
return counter
end
Sorry to post another answer instead of commenting previous one, but i've not managed how to deal with code blocks in comments :)
If you don't like regexps, maybe a tail recursive function like this one (using the search() base function as Matt suggests) :
function mycount(what::String, where::String)
function mycountacc(what::String, where::String, acc::Int)
res = search(where, what)
res == 0:-1 ? acc : mycountacc(what, where[last(res) + 1:end], acc + 1)
end
what == "" ? 0 : mycountacc(what, where, 0)
end
This is simple and fast (and does not overflow the stack):
function mycount2(where::String, what::String)
numfinds = 0
starting = 1
while true
location = search(where, what, starting)
isempty(location) && return numfinds
numfinds += 1
starting = location.stop + 1
end
end
one liner: (Julia 1.3.1):
julia> sum([1 for i = eachmatch(r"foo", "foobar, bar, foo")])
2
Since Julia 1.3, there has been a count method that does exactly this.
count(
pattern::Union{AbstractChar,AbstractString,AbstractPattern},
string::AbstractString;
overlap::Bool = false,
)
Return the number of matches for pattern in string.
This is equivalent to calling length(findall(pattern, string)) but more
efficient.
If overlap=true, the matching sequences are allowed to overlap indices in the
original string, otherwise they must be from disjoint character ranges.
│ Julia 1.3
│
│ This method requires at least Julia 1.3.
julia> count("foo", "foobar, bar, foo")
2
julia> count("ana", "bananarama")
1
julia> count("ana", "bananarama", overlap=true)
2

Resources