Does unique() preserve order?

Does unique() preserve order? - r

Imagine we're using the following code:
set.seed(42)
v <- sample(1:10, 100, T)
v <- sort(v)
unique.v <- unique(v)
Can I be sure that unique.v is already sorted?
In a more general setting, is that true that unique() returns a vector, ordered according to the first entry?
The documentation does not imply this, looking to the source with
?unique
getAnywhere('unique.default')
is not of a much help.
Related questions: one, two.

Here's what I found.
This guide leads us to names.c, where we see
{"unique", do_duplicated, 1, 11, 4, {PP_FUNCALL, PREC_FN, 0}},
After that we move to unique.c and find an entry
SEXP attribute_hidden do_duplicated(SEXP call, SEXP op, SEXP args, SEXP env)
Browsing the code, we stumble upon
dup = duplicated3(x, incomp, fL, nmax);
which is a reference to
static SEXP duplicated3(SEXP x, SEXP incomp, Rboolean from_last, int nmax)
Finally, the main loop here is
for (i = 0; i < n; i++) {
// if ((i+1) % NINTERRUPT == 0) R_CheckUserInterrupt();
v[i] = isDuplicated(x, i, &data);
}
So the answer to my question is yes.

Related

Memoization code for "Longest Common Substring" doesn't work as expected

I was able to think of a recursive solution for the problem "Longest Common Substring" but when I try to memoize it, it doesn't seem to work as I expected it to, and throws a wrong answer.
Here is the recursive code.
int lcs(string X, string Y,int i, int j, int count)
{
if (i == 0 || j == 0)
return count;
if (X[i - 1] == Y[j - 1])
count = lcs(X,Y,i - 1, j - 1, count + 1);
count = max(count,max(lcs(X,Y,i, j-1, 0),lcs(X,Y,i - 1, j, 0)));
return count;
}
int longestCommonSubstr(string S1, string S2, int n, int m)
{
return lcs(S1,S2,n,m,0,dp);
}
And here is the memoized code.
int lcs(string X, string Y,int i, int j, int count,vector<vector<vector<int>>>& dp)
{
if (i == 0 || j == 0)
return count;
if(dp[i - 1][j - 1][count] != -1)
return dp[i - 1][j - 1][count];
if (X[i - 1] == Y[j - 1])
count = lcs(X, Y, i - 1, j - 1, count + 1, dp);
count = max(count,max(lcs(X,Y,i, j-1, 0,dp),lcs(X,Y,i - 1, j, 0,dp)));
return dp[i-1][j-1][count]=count;
}
int longestCommonSubstr(string S1, string S2, int n, int m)
{
int maxSize=max(n,m);
vector<vector<vector<int>>> dp(n,vector<vector<int>>(m,vector<int>(maxSize,-1)));
return lcs(S1,S2,n,m,0,dp);
}
I do know that the problem can be solved using a 2D DP vector as well but my objective was to convert my original recursive solution to a memoized solution and not write a solution from scratch. And as I have 3 parameters which are changing, so it should use a 3D DP table.
Can anyone figure out what's wrong or help me out with a 3D DP solution with recursive code same or similar to mine.
Note:-
An interesting observation, the max function for some reason works from left to right on my Mac system and on Ubuntu running under parallels as well, but the same function works from right to left in Windows machine and in online compilers. I do not know the reason but I would be happy to know about it. I'm running the code in an M1 Mac, I don't know if the ARM compiler is different from x86 Mac compiler or not.
Another thing, the memoized code gives different answers depending upon which recursive call is called first on the line,
count = max(count,max(lcs(X,Y,i, j-1, 0),lcs(X,Y,i - 1, j, 0)));
If I swap the positions of the function call statements then it gives a correct output but for that specific test case and probably similar cases.
This Memo solution gives TLE as well in large test cases, and I do not know why.
I recently started studying DP and this is the only question which I wasn't able to solve by just modifying the original recursive solution. It has been two days and I just can't figure out the proper reasons.
Submission Link:- https://practice.geeksforgeeks.org/problems/longest-common-substring1452/1/#
Any help in this regard would be great.

Halide: Using constant_exterior() + vectorize() in OpenCL

I can't generate an OpenCL implementation with Halide when I choose a constant_exterior() type of boundary condition with vectorize scheduling.
When compiling, I get the following error:
Error:
Vector of bool not valid in OpenCL C (yet)
I don't understand why it would need to use a boolean vector..
My function looks something like this:
void dummy_step()
{
Var x("x"), y("y"), c("c");
Func src("src");
Func dst("dst");
// input parameters
ImageParam image(UInt(8), 3, "inputImage");
Param<int> W;
Param<int> H;
// boundary condition
src = constant_exterior(image, 0, 0, W, 0, H);
Expr x0 = cast<int>(x + y);
Expr y0 = cast<int>(x - y);
dst(x, y, c) = cast<uint8_t>(clamp(src(x0, y0, c), 0.0f, 255.0f));
// scheduling
dst.vectorize(x, 4).gpu_tile(x, y, 16, 8).compute_root();
dst.compile_to_file("test", {image, W, H});
}
If I remove .vectorize(x, 4), the code compiles. If I use another boundary condition, let's say, src = repeat_edge(image, 0, W, 0, H); it also works.

constant_exterior checks if each x coordinate in the vector is within the bounds in order to mux between the constant exterior value and the interior values. The result of this check is a vector of booleans. repeat_edge doesn't need to do that check - it can just clamp the coordinates directly using min and max operations.
I suggest not vectorizing this part of the code using a schedule like so:
src.compute_at(dst, x);
dst.vectorize(x, 4).gpu_tile(x, y, 16, 8).compute_root();

Source code of nlm function in stats package

I need to find the source code of the nlm function.
When I use
edit(nlm)
below code appears
function (f, p, ..., hessian = FALSE, typsize = rep(1, length(p)),
fscale = 1, print.level = 0, ndigit = 12, gradtol = 1e-06,
stepmax = max(1000 * sqrt(sum((p/typsize)^2)), 1000), steptol = 1e-06,
iterlim = 100, check.analyticals = TRUE)
{
print.level <- as.integer(print.level)
if (print.level < 0 || print.level > 2)
stop("'print.level' must be in {0,1,2}")
msg <- (1 + c(8, 0, 16))[1 + print.level]
if (!check.analyticals)
msg <- msg + (2 + 4)
.External2(C_nlm, function(x) f(x, ...), p, hessian, typsize,
fscale, msg, ndigit, gradtol, stepmax, steptol, iterlim)
}
now when I want to see what is insode C_nlm
I tried
stats:::C_nlm
and I get
$name
[1] "nlm"
$address
<pointer: 0x0000000004a83920>
attr(,"class")
[1] "RegisteredNativeSymbol"
$dll
DLL name: stats
Filename: C:/Program Files/R/R-3.1.2/library/stats/libs/x64/stats.dll
Dynamic lookup: FALSE
$numParameters
[1] 11
attr(,"class")
[1] "ExternalRoutine" "NativeSymbolInfo"
After some web search I found out that I need to use grep after this.
But I am not getting how to use it.
I tried these references
How to locate code called by .External2()?
How can I view the source code for a function?
Can anyone please tell me how to proceed further?

You can browse the R source code at this GitHub repo: r-source.
Search it for the term "SEXP nlm" since stats:::C_nlm points to a function with the name "nlm" and all functions returning data to R use a datatype called SEXP (S expression).
You'll get two hits in the files statsR.h and optimize.c. The c-file is what you are looking for, so go down to the line starting with SEXP nlm and you got it.
SEXP nlm(SEXP call, SEXP op, SEXP args, SEXP rho)
{
SEXP value, names, v, R_gradientSymbol, R_hessianSymbol;
double *x, *typsiz, fscale, gradtl, stepmx,
steptol, *xpls, *gpls, fpls, *a, *wrk, dlt;
int code, i, j, k, itnlim, method, iexp, omsg, msg,
n, ndigit, iagflg, iahflg, want_hessian, itncnt;
/* .Internal(
* nlm(function(x) f(x, ...), p, hessian, typsize, fscale,
* msg, ndigit, gradtol, stepmax, steptol, iterlim)
*/
function_info *state;

Passing mean and standard deviation into dnorm() using Rcpp Sugar

I am converting some R code to Rcpp code and need to calculate the likelihood for a vector of observations given a vector of means and vector of standard deviations. If I assume the means are 0 and the standard deviations 1, I can write this function (running this requires the 'inline' and 'Rcpp' packages to be loaded),
dtest1 = cxxfunction(signature( x = "numeric"),
'Rcpp::NumericVector xx(x);
return::wrap(dnorm(xx, 0.0, 1.0));',
plugin='Rcpp')
and the result is as expected.
> dtest1(1:3)
[1] 0.241970725 0.053990967 0.004431848
However, if I try to make a function
dtest2 = cxxfunction(signature( x = "numeric", y="numeric", z="numeric" ),
'Rcpp::NumericVector xx(x);
Rcpp::NumericVector yy(y);
Rcpp::NumericVector zz(z);
return::wrap(dnorm(xx, yy, zz));',
plugin='Rcpp')
which would allow me to pass in different means and standard deviations results in an error, shown below. Is there a way to make the function I am trying to make, or I do need to program the normal density manually?
Error
Error in compileCode(f, code, language = language, verbose = verbose) :
Compilation ERROR, function(s)/method(s) not created! file31c82bff9d7c.cpp: In function ‘SEXPREC* file31c82bff9d7c(SEXP, SEXP, SEXP)’:
file31c82bff9d7c.cpp:33:53: error: no matching function for call to
‘dnorm4(Rcpp::NumericVector&, Rcpp::NumericVector&, Rcpp::NumericVector&)’
file31c82bff9d7c.cpp:33:53: note: candidates are:
/home/chris/R/x86_64-pc-linux-gnu-library/3.0/Rcpp/include/Rcpp/stats/norm.h:106:1:
note: template<int RTYPE, bool NA, class T> Rcpp::stats::D0<RTYPE, NA, T> Rcpp::dnorm4(const Rcpp::VectorBase<RTYPE, NA, VECTOR>&, bool)
/home/chris/R/x86_64-pc-linux-gnu-library/3.0/Rcpp/include/Rcpp/stats/norm.h:107:1:
note: template<int RTYPE, bool NA, class T> Rcpp::stats::D1<RTYPE, NA, T> Rcpp::dnorm4(const Rcpp::VectorBase<RTYPE, NA, VECTOR>&, double, bool)
/home/chris/R/x86_64-pc-linux-gnu-library/3.0/Rcpp/include/Rcpp/stats/norm.h:108:1:
note: template<int RTYPE, bool NA, class T> Rcpp::stats::D2<RTYPE, NA, T> Rcpp::dnorm4(const Rcpp::VectorBase<RTYPE, NA,
In addition: Warning message:
running command '/usr/lib/R/bin/R CMD SHLIB file31c82bff9d7c.cpp 2> file31c82bff9d7c.cpp.err.txt' had status 1

The sugar dnorm is only vectorized in terms of the first argument.
To simplify (it is slightly more involved, but we don't need to concern ourselves with this yet here), the call
dnorm(xx, 0.0, 1.0)
uses the overload
NumericVector dnorm( NumericVector, double, double )
And the second call tries to use something like
NumericVector dnorm( NumericVector, NumericVector, NumericVector )
which is not implemented. We could implement it, it would have to go high enough in our priority list.
In the meantime, it is easy enough to write a small wrapper, like (this does not handle argument lengths, etc ...) :
NumericVector my_dnorm( NumericVector x, NumericVector means, NumericVector sds){
int n = x.size() ;
NumericVector res(n) ;
for( int i=0; i<n; i++) res[i] = R::dnorm( x[i], means[i], sds[i] ) ;
return res ;
}

Convert RcppArmadillo vector to Rcpp vector

I am trying to convert RcppArmadillo vector (e.g. arma::colvec) to a Rcpp vector (NumericVector). I know I can first convert arma::colvec to SEXP and then convert SEXP to NumericVector (e.g. as<NumericVector>(wrap(temp)), assuming temp is an arma::colvec object). But what is a good way to do that?
I want to do that simply because I am unsure if it is okay to pass arma::colvec object as a parameter to an Rcpp::Function object.

I was trying to Evaluate a Rcpp::Function with argument arma::vec, it seems that it takes the argument in four forms without compilation errors. That is, if f is a Rcpp::Function and a is a arma::vec, then
f(a)
f(wrap(a))
f(as<NumericVector>(wrap(a)))
f(NumericVector(a.begin(),a.end()))
produce no compilation and runtime errors, at least apparently.
For this reason, I have conducted a little test for the four versions of arguments. Since I suspect that somethings will go wrong in garbage collection, I test them again gctorture.
gctorture(on=FALSE)
Rcpp::sourceCpp(code = '
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;
// [[Rcpp::export]]
double foo1(arma::vec a, arma::vec b, Function f){
double sum = 0.0;
for(int i=0;i<100;i++){
sum += as<double>(f(a, b));
}
return sum;
}
// [[Rcpp::export]]
double foo2(arma::vec a, arma::vec b, Function f){
double sum = 0.0;
for(int i=0;i<100;i++){
sum += as<double>(f(wrap(a),wrap(b)));
}
return sum;
}
// [[Rcpp::export]]
double foo3(arma::vec a, arma::vec b, Function f){
double sum = 0.0;
for(int i=0;i<100;i++){
sum += as<double>(f(as<NumericVector>(wrap(a)),as<NumericVector>(wrap(b))));
}
return sum;
}
// [[Rcpp::export]]
double foo4(arma::vec a, arma::vec b, Function f){
double sum = 0.0;
for(int i=0;i<100;i++){
sum += as<double>(f(NumericVector(a.begin(),a.end()),NumericVector(b.begin(),b.end())));
}
return sum;
}
')
# note that when gctorture is on, the program will be very slow as it
# tries to perfrom GC for every allocation.
# gctorture(on=TRUE)
f = function(x,y) {
mean(x) + mean(y)
}
# all three functions should return 700
foo1(c(1,2,3), c(4,5,6), f) # error
foo2(c(1,2,3), c(4,5,6), f) # wrong answer (occasionally)!
foo3(c(1,2,3), c(4,5,6), f) # correct answer
foo4(c(1,2,3), c(4,5,6), f) # correct answer
As a result, the first method produces an error, the second method produces a wrong answer and only the third and the fourth method return the correct answer.
> # they should return 700
> foo1(c(1,2,3), c(4,5,6), f) # error
Error: invalid multibyte string at '<80><a1><e2>'
> foo2(c(1,2,3), c(4,5,6), f) # wrong answer (occasionally)!
[1] 712
> foo3(c(1,2,3), c(4,5,6), f) # correct answer
[1] 700
> foo4(c(1,2,3), c(4,5,6), f) # correct answer
[1] 700
Note that, if gctorture is set FALSE, then all functions will return a correct result.
> foo1(c(1,2,3), c(4,5,6), f) # error
[1] 700
> foo2(c(1,2,3), c(4,5,6), f) # wrong answer (occasionally)!
[1] 700
> foo3(c(1,2,3), c(4,5,6), f) # correct answer
[1] 700
> foo4(c(1,2,3), c(4,5,6), f) # correct answer
[1] 700
It means that method 1 and method 2 are subjected to break when garbage is collected during runtime and we don't know when it happens. Thus, it is dangerous to not wrap the parameter properly.
Edit: as of 2017-12-05, all four conversions produce the correct result.
f(a)
f(wrap(a))
f(as<NumericVector>(wrap(a)))
f(NumericVector(a.begin(),a.end()))
and this is the benchmark
> microbenchmark(foo1(c(1,2,3), c(4,5,6), f), foo2(c(1,2,3), c(4,5,6), f), foo
3(c(1,2,3), c(4,5,6), f), foo4(c(1,2,3), c(4,5,6), f))
Unit: milliseconds
expr min lq mean median uq
foo1(c(1, 2, 3), c(4, 5, 6), f) 2.575459 2.694297 2.905398 2.734009 2.921552
foo2(c(1, 2, 3), c(4, 5, 6), f) 2.574565 2.677380 2.880511 2.731615 2.847573
foo3(c(1, 2, 3), c(4, 5, 6), f) 2.582574 2.701779 2.862598 2.753256 2.875745
foo4(c(1, 2, 3), c(4, 5, 6), f) 2.378309 2.469361 2.675188 2.538140 2.695720
max neval
4.186352 100
5.336418 100
4.611379 100
3.734019 100
And f(NumericVector(a.begin(),a.end())) is marginally faster than other methods.

This should works with arma::vec, arma::rowvec and arma::colvec:
template <typename T>
Rcpp::NumericVector arma2vec(const T& x) {
return Rcpp::NumericVector(x.begin(), x.end());
}

I had the same question. I used wrap to do the conversion at the core of several layers of for loops and it was very slow. I think the wrap function is to blame for dragging the speed down so I wish to know if there is an elegant way to do this.
As for Raymond's question, you might want to try including the namespace like: Rcpp::as<Rcpp::NumericVector>(wrap(A)) instead or include a line using namespace Rcpp; at the beginning of your code.

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

Does unique() preserve order? - r

Related

Memoization code for "Longest Common Substring" doesn't work as expected

Halide: Using constant_exterior() + vectorize() in OpenCL

Source code of nlm function in stats package

Passing mean and standard deviation into dnorm() using Rcpp Sugar

Convert RcppArmadillo vector to Rcpp vector

Categories

Resources