L1 regularisation in cplex - convex-optimization

I am trying to solve an optimization problem that uses L1 regularisation.
However, I am using CPLEX and I do not see an obvious way to add L1 regularisation in CPLEX. Can someone please help?

Let me start with the curve-fitting example from Model Building.
Without regularization:
int n=...;
range points=1..n;
float x[points]=...;
float y[points]=...;
// y== b*x+a
dvar float a;
dvar float b;
minimize sum(i in points) (b*x[i]+a-y[i])^2;
subject to
{
}
The Lasso Version (L1 regularisation) would be:
int n=...;
range points=1..n;
float x[points]=...;
float y[points]=...;
float lambda=0.1;
// y== b*x+a
dvar float a;
dvar float b;
minimize sum(i in points) (b*x[i]+a-y[i])^2+lambda*(abs(a)+abs(b));
subject to
{
}
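To show what abs() amounts to, here is a hedged sketch of the usual way to write an L1 penalty for a solver that accepts a convex quadratic objective with linear constraints: bound each absolute value by an auxiliary variable. The variables t_a and t_b are introduced here for illustration only; they are not part of the model above.

```latex
\begin{aligned}
\min_{a,\,b,\,t_a,\,t_b}\quad & \sum_{i \in \text{points}} \left(b x_i + a - y_i\right)^2 + \lambda\,(t_a + t_b) \\
\text{subject to}\quad & -t_a \le a \le t_a, \\
& -t_b \le b \le t_b.
\end{aligned}
```

Because λ > 0 and the objective increases with t_a and t_b, every optimal solution has t_a = |a| and t_b = |b|. The problem is therefore the same as the Lasso objective above, stated as a convex QP with only linear constraints. In OPL the two extra variables would be ordinary dvar float declarations, with the four inequalities placed inside the subject to block.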


Segmentation fault for big input (recursive function)

#include <bits/stdc++.h>
using namespace std;
#define ll long long
ll solve(ll a, ll b, ll i){
    //base case
    if (a == 0) return i;
    if (b > a) return i+1;
    //recursive case
    if (b == 1) {
        return solve(a,b+1,i+1);
    }
    ll n = solve(a, b+1, i+1);
    ll m = solve(a/b, b, i+1);
    return min(n,m);
}
int main(){
    int t;
    cin >> t;
    ll a, b;
    cin >> a >> b;
    cout << solve(a, b, 0)<< endl;
}
The question is from Codeforces (problem 1485A). The problem is that when I give a large input, such as a = 50000000 and b = 5, I get a segmentation fault, while the code works fine for smaller inputs. Please help me solve it.
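Using recursion is a terrible choice here. In solve(), the first recursive call solve(a, b+1, i+1) keeps increasing b until b > a, so for a = 50000000 the calls nest tens of millions of levels deep and overflow the call stack; that is what produces the segmentation fault. You also need to make all the obvious algorithmic optimizations.
The key insight is that for any sequence of operations that divides before increasing b, there is a sequence that is at least as good and does all of its increments of b first. Why divide by a smaller number when you can divide by a bigger one, if you are going to spend the steps increasing b anyway?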
With that insight, and removing recursion, the problem is trivial to solve:
#include <iostream>

unsigned long long divisions(unsigned long long a, unsigned long long b)
{
    // figure out how many divide operations we need
    int ops = 0;
    while (a > 0)
    {
        a /= b;
        ++ops;
    }
    return ops;
}

unsigned long long ops(unsigned long long a, unsigned long long b)
{
    // figure out how many divides we need with the smallest possible b
    unsigned long long min_ops = (b == 1) ? (1 + divisions(a, b+1)) : divisions(a, b);

    // try every sensible larger b to see if it takes fewer operations
    for (unsigned long long num_inc = 1; num_inc <= min_ops; ++num_inc)
    {
        unsigned long long ops = num_inc + divisions (a, b + num_inc);
        if (ops < min_ops)
            min_ops = ops;
    }
    return min_ops;
}

int main(void)
{
    int t;
    std::cin >> t;
    while (t--)
    {
        unsigned long long a, b;
        std::cin >> a >> b;
        std::cout << ops(a, b) << std::endl;
    }
}
Again, the lesson is that you must make algorithmic optimizations before you start coding. No amount of great coding will make a terrible algorithm work well.
By the way, there was a huge hint on the problem page. Something in the problem tags gives the key optimization away.
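The insight that every increment of b can be moved before the divisions can be checked on small inputs by brute force. The sketch below compares the greedy count (the same idea as ops() above) with an exhaustive breadth-first search over states (a, b) that allows the two operations in any order. The names greedy_ops and bfs_ops are hypothetical. The cap on b is an assumption justified in the comments: once b exceeds the starting a, a single division finishes, so a larger b never helps.

```cpp
#include <cassert>
#include <map>
#include <queue>
#include <utility>

typedef unsigned long long ull;

// Number of divisions a -> a / b needed to reach 0.
ull divisions(ull a, ull b) {
    assert(b >= 2 && "dividing by 1 never reaches 0");
    ull count = 0;
    while (a > 0) {
        a /= b;
        ++count;
    }
    return count;
}

// Greedy: do all increments of b first, then only divide.
ull greedy_ops(ull a, ull b) {
    ull best = (b == 1) ? 1 + divisions(a, 2) : divisions(a, b);
    for (ull inc = 1; inc <= best; ++inc) {
        ull candidate = inc + divisions(a, b + inc);
        if (candidate < best) best = candidate;
    }
    return best;
}

// Exhaustive BFS over states (a, b); the two moves may be interleaved freely.
// Only practical for small a.
ull bfs_ops(ull a0, ull b0) {
    // With b > a0 >= a, one division reaches 0, so b never needs to exceed a0 + 1.
    const ull b_cap = (b0 > a0 + 1) ? b0 : a0 + 1;
    std::map<std::pair<ull, ull>, ull> dist;
    std::queue<std::pair<ull, ull>> frontier;
    dist[std::make_pair(a0, b0)] = 0;
    frontier.push(std::make_pair(a0, b0));
    while (!frontier.empty()) {
        std::pair<ull, ull> state = frontier.front();
        frontier.pop();
        ull d = dist[state];
        if (state.first == 0) return d;  // BFS pops states in order of distance
        std::pair<ull, ull> moves[2] = {
            std::make_pair(state.first / state.second, state.second),  // a = a / b
            std::make_pair(state.first, state.second + 1)               // b = b + 1
        };
        for (const auto& next : moves) {
            if (next.second > b_cap) continue;
            if (dist.count(next)) continue;  // already reached at least as fast
            dist[next] = d + 1;
            frontier.push(next);
        }
    }
    return 0;  // not reached: raising b to b_cap and dividing always ends at a == 0
}
```

For example, greedy_ops(9, 2) and bfs_ops(9, 2) should both give 4 (for instance, increase b to 3 and then divide three times: 9 → 3 → 1 → 0).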

Method in a constraint

I have a CPLEX constraint of the form: a binary variable multiplied by a number must be >= another number.
The second number is complicated to calculate, so I think I need a method to compute it. Is it possible in CPLEX to write a constraint like this:
k*y[i] > method(parameter1,parameter2)
Inside the method I need access to the values of binary variables.
Thanks a lot for your replies.
Let me try this oulipo challenge.
Write an OPL model that works and that contains what you wrote.
Could this help?
float k=1.2;
dvar boolean y[1..1];
int parameter1=1;
int parameter2=2;
dvar boolean x;
dexpr float method[i in 1..10,j in 1..10]=x*(i+j);
subject to
{
forall(i in 1..1)
k*y[i] >= method[parameter1,parameter2];
}
PS: following your later comments:
float k=1.2;
dvar boolean y[1..1];
int parameter1=1;
int parameter2=2;
dvar boolean x;
float methodresults[i in 1..10,j in 1..10]; //=x*(i+j);
range r=1..10;
execute
{
function method(i,j)
{
return i+j;
}
for(var i in r) for (var j in r) methodresults[i][j]=method(i,j);
}
subject to
{
forall(i in 1..1)
k*y[i] >= x*methodresults[parameter1,parameter2];
}
If you are using a script in a .mod file, then you can define a function within an execute block [1]. These blocks define pre-processing or post-processing instructions written in ILOG Script [2]. Here's a trivial example from the documentation at https://www.ibm.com/support/knowledgecenter/SSSA5P_12.9.0/ilog.odms.ide.help/OPL_Studio/opllangref/topics/opl_langref_script_struct_statements_function.html:
execute {
  function add(a, b) {
    return a+b
  }
}
[1] https://www.ibm.com/support/knowledgecenter/SSSA5P_12.9.0/ilog.odms.ide.help/OPL_Studio/opllanguser/topics/opl_languser_script_intro_presynt.html
[2] https://www.ibm.com/support/knowledgecenter/SSSA5P_12.9.0/ilog.odms.ide.help/OPL_Studio/opllanguser/topics/opl_languser_script.html

Recursive Vs iterative Traversal of a BST

If I do a recursive traversal of a binary tree with N nodes, it will occupy N slots in the execution stack.
If I use iteration, I will have to use N slots in an explicit stack.
So my question is: do we say that recursive traversal also uses O(N) space, like the iterative one?
I am asking in terms of running the traversal code on a platform that imposes memory limits.
Also, I am not talking about implementing a plain traversal (where one could say either approach is fine); I am implementing KthSmallestElement() in a BST, which uses a kind of traversal through the BST.
Should I use the iterative or the recursive approach, in terms of space complexity, so that my code does not fail the memory limit?
Putting it clearly:
Here is what I implemented:
int Solution::kthsmallest(TreeNode* root, int k) {
stack<TreeNode *> S;
return root->val;
Here is what my friend implemented:
class Solution {
public:
    int find(TreeNode* root, int &k) {
        if (!root) return -1;
        // We do an inorder traversal here.
        int k1 = find(root->left, k);
        if (k == 0) return k1; // left subtree has k or more elements.
        k--; // count the root itself
        if (k == 0) return root->val; // root is the kth element.
        return find(root->right, k); // answer lies in the right node.
    }
    int kthsmallest(TreeNode* root, int k) {
        return find(root, k); // Call another function to pass k by reference.
    }
};
So which of the two is better, and why?
If you care about memory use, you should try to ensure that your tree is balanced, i.e. that its depth is much smaller than the number of nodes. A perfectly balanced binary tree with N nodes has log2(N+1) levels, rounded up.
This matters because the memory needed to visit all nodes of a binary tree is proportional to the depth of the tree, not to the number of nodes as you assumed: the recursive or iterative program only needs to "remember" the path from the root to the current node, not the other previously visited nodes. Both versions therefore use O(h) space, where h is the height of the tree; only for a degenerate (list-shaped) tree does h approach N.
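To illustrate the point about depth: an iterative in-order traversal keeps only the not-yet-visited ancestors of the current node on its explicit stack, so the stack never grows beyond the number of levels in the tree. This is a sketch, not the asker's missing implementation. TreeNode is a hypothetical minimal version of the usual node type, insert() is a hypothetical unbalanced BST helper for trying it out, and returning -1 when the tree has fewer than k nodes follows the same convention as find() above.

```cpp
#include <cassert>
#include <stack>

// Hypothetical minimal node type (the question does not show its definition).
struct TreeNode {
    int val;
    TreeNode* left;
    TreeNode* right;
    explicit TreeNode(int v) : val(v), left(nullptr), right(nullptr) {}
};

// Hypothetical helper: plain (unbalanced) BST insertion, for testing only.
TreeNode* insert(TreeNode* root, int v) {
    if (root == nullptr) return new TreeNode(v);
    if (v < root->val) root->left = insert(root->left, v);
    else root->right = insert(root->right, v);
    return root;
}

// k-th smallest value (k is 1-based) via iterative in-order traversal.
// The stack holds only ancestors whose own value has not been visited yet,
// so its size never exceeds the number of levels in the tree.
int kthSmallest(TreeNode* root, int k) {
    assert(k >= 1 && "k is 1-based");
    std::stack<TreeNode*> pending;
    TreeNode* node = root;
    while (node != nullptr || !pending.empty()) {
        // Descend the left spine, remembering each ancestor.
        while (node != nullptr) {
            pending.push(node);
            node = node->left;
        }
        node = pending.top();  // smallest value not yet visited
        pending.pop();
        if (--k == 0) return node->val;
        node = node->right;    // continue with the right subtree
    }
    return -1;  // fewer than k nodes
}
```

On a degenerate tree (for example, keys inserted in sorted order) the height equals N, and then this version, like the recursive one, needs O(N) space; keeping the tree balanced is what actually bounds the memory.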

Rcpp keeps running for a seemingly simple task

I've been thinking about this all day and still cannot figure out why it happens. My objective is simple: STEP 1, define a function S(h,p); STEP 2, numerically integrate S(h,p) with respect to p using the trapezoidal rule to obtain a new function SS(h). I wrote the code and sourced it with sourceCpp, and it successfully generated the two functions S(h,p) and SS(h) in R. But when I tried to test it by calculating SS(1), R just kept running and never returned a result, which is weird because the amount of computation is small. Any idea why this happens?
My code is here:
#include <Rcpp.h>
using namespace Rcpp;

//generate the first function that gives S(h,p)
// [[Rcpp::export]]
double S(double h, double p){
    double out=2*(h+p+h*p);
    return out;
}

//generate the second function that gives the numerical integration of S(h,p) w.r.t. p
// [[Rcpp::export]]
double SS(double h){
    double out1=0;
    double sum=0;
    for (int i=0;i<1;i=i+0.01){
        // ... (loop body accumulating the trapezoidal sum) ...
    }
    return out1;
}
The problem is that you are treating i as if it were not an int in this statement:
for (int i=0;i<1;i=i+0.01){
After each iteration you are attempting to add 0.01 to an integer, which is of course immediately truncated towards 0, meaning that i is always equal to zero, and you have an infinite loop. A minimal example highlighting the problem, with a couple of possible solutions:
#include <Rcpp.h>
#include <cstdio>

// [[Rcpp::export]]
void bad_loop() {
    for (int i = 0; i < 1; i += 0.01) {
        std::printf("i = %d\n", i);
    }
}
// [[Rcpp::export]]
void good_loop() {
    for (int i = 0; i < 100; i++) {
        std::printf("i = %d\n", i);
    }
}
// [[Rcpp::export]]
void good_loop2() {
    for (double j = 0.0; j < 1.0; j += 0.01) {
        std::printf("j = %.2f\n", j);
    }
}
The first alternative (good_loop) is to scale your step size appropriately -- looping from 0 through 99 by 1 takes the same number of iterations as looping from 0 to 0.99 by 0.01. Additionally, you could just use a double instead of an int, as in good_loop2. At any rate, the main takeaway here is that you need to be more careful about choosing your variable types in C++. Unlike R, when you declare i to be an int it will be treated like an int, not a floating point number.
As @nrussell pointed out very expertly, there is an issue with treating i as an int when the type held is a double. The goal of posting this answer is to stress the need to avoid using a double or float as a loop counter. I've opted to post it as an answer instead of a comment for readability.
Please note that the loop variable should never be a double or a float, because of precision issues: 0.01 has no exact binary representation, so repeatedly adding it accumulates rounding error. The loop variable ends up close to, but not exactly at, values such as 0.99, and such a loop can in general run one iteration more or fewer than intended.
Instead, I would opt to have the loop be processed as an int and convert it to a double / float as soon as possible, e.g.
for (int i=0; i < 100; i++){
    // Make sure to use double division
    // (e.g. either numerator or denominator is a floating / double)
    sum += S(h, i/100.0);
}
Further notes:
RcppArmadillo and C++ division issue
Using float / double as a loop variable
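For completeness, here is a hedged sketch of what SS(h) could look like with an integer loop counter, as recommended above. It is plain C++ so it can run outside R; to call it from R you would add #include <Rcpp.h> and the // [[Rcpp::export]] attribute as in the question. The number of sub-intervals n = 100 is an assumption chosen to match the intended 0.01 step.

```cpp
#include <cassert>

// Integrand from the question: S(h, p) = 2 * (h + p + h * p).
double S(double h, double p) {
    return 2 * (h + p + h * p);
}

// Composite trapezoidal rule for the integral of S(h, p) over p in [0, 1]
// with n equal sub-intervals. The loop counter is an int; each grid point is
// computed directly as i / n, so no rounding error accumulates across steps.
double SS(double h, int n = 100) {
    assert(n > 0);
    const double step = 1.0 / n;
    // Endpoints p = 0 and p = 1 get weight 1/2.
    double sum = 0.5 * (S(h, 0.0) + S(h, 1.0));
    // Interior points p = i / n get weight 1.
    for (int i = 1; i < n; ++i) {
        sum += S(h, static_cast<double>(i) / n);
    }
    return sum * step;
}
```

Because S is linear in p, the trapezoidal rule is exact here up to rounding: the integral of 2(h + p + hp) over [0, 1] is 3h + 1, so SS(1) should come out as 4.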

boosting parallel reduction OpenCL

I have an algorithm that performs a two-stage parallel reduction on the GPU to find the smallest element in an array. I know that there is a hint on how to make it faster, but I don't know what it is. Any ideas on how I can tune this kernel to speed up my program? It is not necessary to change the algorithm itself; maybe there are other tricks. All ideas are welcome.
Thank you!
__kernel
void reduce(__global float* buffer,
            __local float* scratch,
            __const int length,
            __global float* result) {
    // Stage 1: each work-item scans its strided share of the input.
    int global_index = get_global_id(0);
    float accumulator = INFINITY;
    while (global_index < length) {
        float element = buffer[global_index];
        accumulator = (accumulator < element) ? accumulator : element;
        global_index += get_global_size(0);
    }
    // Stage 2: tree reduction in local memory (assumes a power-of-two local size).
    int local_index = get_local_id(0);
    scratch[local_index] = accumulator;
    barrier(CLK_LOCAL_MEM_FENCE);
    for(int offset = get_local_size(0) / 2;
        offset > 0;
        offset = offset / 2) {
        if (local_index < offset) {
            float other = scratch[local_index + offset];
            float mine = scratch[local_index];
            scratch[local_index] = (mine < other) ? mine : other;
        }
        barrier(CLK_LOCAL_MEM_FENCE);
    }
    if (local_index == 0) {
        result[get_group_id(0)] = scratch[0];
    }
}
accumulator = (accumulator < element) ? accumulator : element;
Use the fmin function: it is exactly what you need, and it may result in faster code (a call to a built-in instruction, if available, instead of costly branching).
global_index += get_global_size(0);
What is your typical get_global_size(0)?
Although your access pattern is not bad (it is coalesced: 128-byte chunks per 32-thread warp), it is better to access memory sequentially whenever possible. For instance, sequential access may aid memory prefetching (note that OpenCL code can be executed on any device, including a CPU).
Consider the following scheme: each thread processes the range
[ get_global_id(0)*delta , (get_global_id(0)+1)*delta )
This results in fully sequential access within each thread.
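A serial C++ sketch of the two work distributions discussed above can help check the index arithmetic before moving it into the kernel. Each simulated work-item g computes a partial minimum with std::fmin. Defining delta = ceil(length / global_size) is an assumption (the answer does not say how delta is chosen), and the last ranges are clipped to the buffer length. The sketch only models which elements each work-item reads; it says nothing about actual device performance.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Strided distribution, as in the kernel: work-item g reads
// g, g + global_size, g + 2 * global_size, ...
float min_strided(const std::vector<float>& buf, std::size_t global_size) {
    assert(global_size > 0);
    float result = std::numeric_limits<float>::infinity();
    for (std::size_t g = 0; g < global_size; ++g) {
        float acc = std::numeric_limits<float>::infinity();
        for (std::size_t i = g; i < buf.size(); i += global_size)
            acc = std::fmin(acc, buf[i]);
        result = std::fmin(result, acc);  // stands in for the local reduction
    }
    return result;
}

// Blocked distribution: work-item g reads [g * delta, (g + 1) * delta),
// clipped to the buffer, with delta = ceil(length / global_size) (assumed).
float min_blocked(const std::vector<float>& buf, std::size_t global_size) {
    assert(global_size > 0);
    const std::size_t delta = (buf.size() + global_size - 1) / global_size;
    float result = std::numeric_limits<float>::infinity();
    for (std::size_t g = 0; g < global_size; ++g) {
        const std::size_t begin = g * delta;
        const std::size_t end = std::min(begin + delta, buf.size());
        float acc = std::numeric_limits<float>::infinity();
        for (std::size_t i = begin; i < end; ++i)  // empty if begin >= end
            acc = std::fmin(acc, buf[i]);
        result = std::fmin(result, acc);
    }
    return result;
}
```

On a GPU the blocked layout means adjacent work-items read addresses delta elements apart, so whether it beats the strided layout depends on the device; the motivation given above is sequential access and prefetching, notably on CPUs.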
