C way to detect and 'separate' a binary expression - r

I have a bottleneck in my code in expressions like any(x >= b | x == y) for a large x.
I'd like to avoid the allocation x >= b | x == y. I've found that it's easy to write a function for particular cases.
SEXP eval_any_or2(SEXP x, SEXP b, SEXP y) {
R_xlen_t N = xlength(x);
if (xlength(y) != N || xlength(b) != 1) {
error("Wrong lengths.");
}
const int *xp = INTEGER(x);
const int *yp = INTEGER(y);
const int *bp = INTEGER(b);
bool o = false;
for (R_xlen_t i = 0; i < N; ++i) {
if (xp[i] >= bp[0] || xp[i] == yp[i]) {
o = true;
break;
}
}
SEXP ans = PROTECT(allocVector(LGLSXP, 1));
LOGICAL(ans)[0] = o ? TRUE : FALSE;
UNPROTECT(1);
return ans;
}
However, for clarity I'd like to keep as much of the natural syntax as possible, like any_or(x >= b, x == y). So I'd like to be able to detect whether a call is of the form <vector> <operator> <vector> when <operator> is one of the standard binary operators, and each <vector> is of equal length vectors length 1. Something like this:
any_or2 <- function(expr1, expr2) {
sexp1 <- substitute(expr1)
sexp2 <- substitute(expr2)
if (!is_binary_sexp(sexp1) || !is_binary_sexp(sexp2) {
# fall through to just basic R
return(any(expr1 | expr2))
}
# In C
eval_any_or2(...) # either the substituted expression or x,y,b
}
I've attempted the following C function which detects whether a substituted expression/call is a binary expression, but (a) I'm having trouble detecting whether the operator is a binary operator and (b) getting the vectors from the expression (x, y, b in the example) to use later (either in the same C function or as passed to a C function like the one above).
#define return_false SEXP ans = PROTECT(allocVector(LGLSXP, 1)); \
LOGICAL(ans)[0] = FALSE; \
UNPROTECT(1); \
return ans; \
SEXP is_binary_sexp(SEXP sx) {
if (TYPEOF(sx) != LANGSXP) {
return_false
}
// does it have three elements?
int len = 0;
SEXP el, nxt;
for (nxt = sx; nxt != R_NilValue || len > 4; el = CAR(nxt), nxt = CDR(nxt)) {
len++;
}
if (len != 3) {
return_false;
}
if (TYPEOF(CAR(sx)) != SYMSXP) {
return_false;
}
SEXP ans = PROTECT(allocVector(LGLSXP, 1));
LOGICAL(ans)[0] = TRUE;
UNPROTECT(1);
return ans;
}
In R I would write something like:
is_binary_sexp_R <- function(sexprA) {
# sexprA is the result of substitute()
is.call(sexprA) &&
length(sexprA) == 3L &&
match(as.character(sexprA[[1]]), c("!=", "==", "<=", ">=", "<", ">"), nomatch = 0L) &&
is.name(lhs <- sexprA[[2L]])
}
but I'd like to do as much as possible in C.

Related

Recursion with memorization gives TLE for c++ while the same logic written in python passes all the test cases?

Longest Palindromic Subsequence problem:
C++:
class Solution {
public:
vector<vector<int>> dp;
Solution(){
dp = vector<vector<int>>(1001, vector<int>(1001, -1));
}
int lps(string s, int i, int j){
if(i == j)
return 1;
if(i>j)
return 0;
if(dp[i][j] != -1)
return dp[i][j];
if(s[i] == s[j])
return dp[i][j]= 2 + lps(s, i+1, j-1);
else
return dp[i][j]= max(lps(s,i+1,j), lps(s,i,j-1));
}
int longestPalindromeSubseq(string s) {
return lps(s, 0, s.size()-1);
}
};
Gives TLE
Python code:
class Solution(object):
def lps(self, s, i, j, dp):
if i == j:
return 1
if i> j:
return 0
if dp[i][j] != -1:
return dp[i][j];
if s[i] == s[j]:
dp[i][j]= 2 + self.lps(s, i+1, j-1, dp)
else:
dp[i][j]= max(self.lps(s, i+1, j, dp), self.lps(s, i, j-1, dp))
return dp[i][j]
def longestPalindromeSubseq(self, s):
dp = [[-1 for x in range(len(s))] for y in range(len(s))]
ans= self.lps(s, 0, len(s) -1, dp)
return ans
Passes all the test cases in leetcode.
Can anyone please help me understand this behavior?
Thanks in advance.
In C++ version you are passing string by value in function.(i.e. a new copy of string is made a each function call).
In python version since strings by default are immutable they are passed by reference.
So to make your code work in C++ just do int lps(string& s, int i, int j)
string &s here ensures that string get passed by reference.

TLE on UVA 10776 - Determine The Combination

I tried this problem by backtracking and did some optimization, though I am getting TLE. what further optimization can I do on this code?
Abridged problem statement - Task is to print all different r combinations of a string s (a r combination of a string s is a collection of exactly r letters from different positions in s).There may be different permutations of the same combination; consider only the one that has its r
characters in non-decreasing order. If s = "abaa" and s = 3.Then output should be (aaa,aab).
My code(in c)
int compare_chars(const void* a, const void* b);
char s[50];
int len;
int r ;
char combination[50];
void combinate(int index,int at)
{
if(at == r)
{
combination[at] = '\0';
puts(combination);
return ;
}
int i = index+1;
for ( ; i <= len-r+at ;)
{
char temp = s[I];
combination[at] = temp;
combinate(i,at+1);
while(s[i] == temp and i <= len-r+at)
i++;
}
return ;
}
int solve()
{
while ((scanf("%s %i",s,&r)) == 2)
{
len = strlen(s);
if(len == r)
{
printf("%s\n",s);
continue;
}
qsort(s,len,sizeof(char),compare_chars);
combinate(-1,0);
}
return 0;
}
int main()
{
int a = 1;
int t = 1;
while (a <= t)
{
int kk = solve();
}
return 0;
}
int compare_chars(const void* a, const void* b)
{
char arg1 = *(const char*)a;
char arg2 = *(const char*)b;
if (arg1 < arg2) return -1;
if (arg1 > arg2) return 1;
return 0;
}

Rewriting R's cummin() function using Rcpp and allowing for NAs

I'm learning Rcpp. In this example, I'm attempting to roll my own cummin() function like base R's cummin(), except I'd like my version to have a na.rm argument. This is my attempt
cummin.cpp
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector cummin_cpp(NumericVector x, bool narm = false){
// Given a numeric vector x, returns a vector of the
// same length representing the cumulative minimum value
// if narm = true, NAs will be ignored (The result may
// contain NAs if the first values of x are NA.)
// if narm = false, the resulting vector will return the
// cumulative min until the 1st NA value is encountered
// at which point all subsequent entries will be NA
if(narm){
// Ignore NAs
for(int i = 1; i < x.size(); i++){
if(NumericVector::is_na(x[i]) | (x[i-1] < x[i])) x[i] = x[i-1];
}
} else{
// Don't ignore NAs
for(int i = 1; i < x.size(); i++){
if(NumericVector::is_na(x[i-1]) | NumericVector::is_na(x[i])){
x[i] = NA_REAL;
} else if(x[i-1] < x[i]){
x[i] = x[i-1];
}
}
}
return x;
}
foo.R
library(Rcpp)
sourceCpp("cummin.cpp")
x <- c(3L, 1L, 2L)
cummin(x) # 3 1 1
cummin_cpp(x) # 3 1 1
class(cummin(x)) # integer
class(cummin_cpp(x)) # numeric
I have a few questions..
R's standard variable name is na.rm, not narm as I've done. However, it seems I can't use a dot in the c++ variable name. Is there a way around this so I can be consistent with R's convention?
I don't know ahead of time if the user's input is going to be a numeric vector or an integer vector, so I've used Rcpp's NumericVector type. Unfortunately, if the input is integer, the output is cast to numeric unlike base R's cummin() behavior. How do people usually deal with this issue?
The line if(NumericVector::is_na(x[i]) | (x[i-1] < x[i])) x[i] = x[i-1]; seems silly, but I don't know a better way to do this. Suggestions here?
I would use this:
template<typename T, int RTYPE>
Vector<RTYPE> cummin_cpp2(Vector<RTYPE> x, bool narm){
Vector<RTYPE> res = clone(x);
int i = 1, n = res.size();
T na;
if(narm){
// Ignore NAs
for(; i < n; i++){
if(ISNAN(res[i]) || (res[i-1] < res[i])) res[i] = res[i-1];
}
} else{
// Do not ignore NAs
for(; i < n; i++){
if(ISNAN(res[i-1])) {
na = res[i-1];
break;
} else if(res[i-1] < res[i]){
res[i] = res[i-1];
}
}
for(; i < n; i++){
res[i] = na;
}
}
return res;
}
// [[Rcpp::export]]
SEXP cummin_cpp2(SEXP x, bool narm = false) {
switch (TYPEOF(x)) {
case INTSXP: return cummin_cpp2<int, INTSXP>(x, narm);
case REALSXP: return cummin_cpp2<double, REALSXP>(x, narm);
default: Rcpp::stop("SEXP Type Not Supported.");
}
}
Try this on:
x <- c(NA, 7, 5, 4, NA, 2, 4)
x2 <- as.integer(x)
cummin_cpp(x, narm = TRUE)
x
cummin_cpp(x2)
x2
x <- c(NA, 7, 5, 4, NA, 2, 4)
x2 <- as.integer(x)
x3 <- replace(x, is.na(x), NaN)
cummin_cpp2(x, narm = TRUE)
x
cummin_cpp2(x2)
x2
cummin_cpp2(x3)
x3
Explanation:
Joran's advice is good, just wrap that in an R function
I use a dispatcher as Joseph Wood suggested
Beware that x is passed by reference and is modified if of the same type of what you declared (see these 2 slides)
You need to handle NA as well as NaN
You can use || instead of | to evaluate only the first condition if it is true.

Apply functions instead of for loop in R

I am novice in R. I want to know how we can write the below for loop in an efficient way. I am getting correct answer by the below code for small dataset.
data <- data.frame(x1=c(rep('a',12)),
x2=c(rep('b',12)),
x3=c(rep(as.Date('2017-03-09'),4),rep(as.Date('2017-03-10'),4),rep(as.Date('2017-03-11'),4)),
value1= seq(201,212),
x4=c(as.Date('2017-03-09'),as.Date('2017-03-10'),as.Date('2017-03-11'),as.Date('2017-03-12')
,as.Date('2017-03-10'),as.Date('2017-03-11'),as.Date('2017-03-12'),as.Date('2017-03-13')
,as.Date('2017-03-11'),as.Date('2017-03-12'),as.Date('2017-03-13'),as.Date('2017-03-14')),
value2= seq(101,112), stringsAsFactors = FALSE)
Below for loop script:
for (i in 1:length(data$x3)){
print(i)
if (!is.na(data$x4[i])){
if(data$x4[i] == data$x3[i] && data$x2[i]==data$x2[i] && data$x1[i]==data$x1[i]){
data$diff[i] <- data$value1[i] - data$value2[i]
}
else{
print("I am in else")
for (j in 1:length(data$x3)){
print(c(i,j))
# print(a$y[i])
if(data$x4[i]==data$x3[j] && data$x1[i]==data$x1[j] && data$x2[i]==data$x2[j]){
# print(a$x[j])
data$diff[i] <- data$value1[j] - data$value2[i]
break
}
}
}
}
}
If you want performance, the answer is often Rcpp.
Translating your R code in Rcpp:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector f_Rcpp(List data) {
StringVector x1 = data["x1"];
StringVector x2 = data["x2"];
NumericVector x3 = data["x3"];
NumericVector x4 = data["x4"];
NumericVector value1 = data["value1"];
NumericVector value2 = data["value2"];
int n = value1.size();
NumericVector diff(n, NA_REAL);
int i, j;
for (i = 0; i < n; i++) {
Rprintf("%d\n", i);
if (x4[i] != NA_REAL) {
if (x4[i] == x3[i]) {
diff[i] = value1[i] - value2[i];
} else {
Rprintf("I am in else\n");
for (j = 0; j < n; j++) {
Rprintf("%d %d\n", i, j);
if (x4[i] == x3[j] && x1[i] == x1[j] && x2[i] == x2[j]) {
diff[i] = value1[j] - value2[i];
break;
}
}
}
}
}
return diff;
}
/*** R
f_Rcpp(data)
*/
Put that in a .cpp file and source it.
You can do this:
data$diff <- sapply(seq_along(data$x3), function(i) {
if (!is.na(data$x4[i])){
ind <- which(data$x4[i] == data$x3 & data$x1[i] == data$x1 & data$x2[i] == data$x2)
j <- `if`(i %in% ind, i, min(ind))
data$value1[j] - data$value2[i]
} else {
NA
}
})
Beware in your code, if column $diff doesn't exist yet, doing data$diff[1] <- 100 will put all the values of the column at 100.

Should SEXP function args be PROTECTed when put inside an Rcpp::Xptr?

Look at the (oversimplified) Rcpp + R code below :
test.cpp :
#include <Rcpp.h>
using namespace Rcpp;
class VecWrap{
public:
SEXP vector;
int type;
VecWrap(SEXP vector)
{
this->vector = vector;
this->type = TYPEOF(vector);
if(this->type != INTSXP && this->type != REALSXP)
stop("invalid type");
}
bool contains(double val){
if(type == INTSXP){
IntegerVector v = vector;
for(int i = 0; i < v.size(); i++)
if(v[i] == val)
return true;
}else if(type == REALSXP){
NumericVector v = vector;
for(int i = 0; i < v.size(); i++)
if(v[i] == val)
return true;
}
return false;
}
};
// [[Rcpp::export]]
SEXP createVecWrap(SEXP x) {
VecWrap* w = new VecWrap(x);
return XPtr< VecWrap >(w);
}
// [[Rcpp::export]]
SEXP vecWrapContains(XPtr< VecWrap > w, double val){
return wrap(w->contains(val));
}
test.R :
library(Rcpp)
sourceCpp(file='test.cpp')
v <- 1:10e7
w <- createVecWrap(v)
vecWrapContains(w, 10000) # it works
# remove v and call the garbage collector
rm(v)
gc()
vecWrapContains(w, 10000) # R crashes (but it works with small vector "v")
Basically I put inside the custom class VecWrap the SEXP vector received as argument of createVecWrap function, in order to use it later.
But, as explained by the comments in the code, if I remove the vector v from the R-side and call the garbage collector, the R process crashes when I try to access the vector.
Should the vector be protected by the GC in someway ? If so, how? (Rcpp-style if possible)
Generally speaking you should try to stick to the C++ type system / Rcpp classes as much as possible (re: avoid handling SEXP directly if possible). However, the RObject class will provide your SEXP with protection from the garbage collector, and seems to work in this case:
#include <Rcpp.h>
class VecWrap {
public:
Rcpp::RObject vector;
int type;
VecWrap(SEXP vector_)
: vector(vector_)
{
type = vector.sexp_type();
if (type != INTSXP && type != REALSXP) {
Rcpp::stop("invalid type");
}
}
bool contains(double val) {
if (type == INTSXP){
Rcpp::IntegerVector v = Rcpp::as<Rcpp::IntegerVector>(vector);
for (int i = 0; i < v.size(); i++) {
if (v[i] == val) return true;
}
} else if (type == REALSXP) {
Rcpp::NumericVector v = Rcpp::as<Rcpp::NumericVector>(vector);
for (int i = 0; i < v.size(); i++) {
if (v[i] == val) return true;
}
}
return false;
}
};
// [[Rcpp::export]]
Rcpp::XPtr<VecWrap> createVecWrap(SEXP x) {
return Rcpp::XPtr<VecWrap>(new VecWrap(x));
}
// [[Rcpp::export]]
bool vecWrapContains(Rcpp::XPtr<VecWrap> w, double val) {
return w->contains(val);
}
v <- 1:10e7
w <- createVecWrap(v)
vecWrapContains(w, 10000)
# [1] TRUE
rm(v)
gc()
# used (Mb) gc trigger (Mb) max used (Mb)
# Ncells 366583 19.6 750400 40.1 460000 24.6
# Vcells 100559876 767.3 145208685 1107.9 100560540 767.3
vecWrapContains(w, 10000)
# [1] TRUE
Unrelated: consider using { } for your control flow structures, and don't get carried away with this->; both of those will improve the readability of your code IMO.

Resources