In VRPTW, how can I make the constraints soft instead of strictly hard when routing?

As the title indicates, when I add time-window and vehicle constraints for nodes, the constraints are sometimes too strict and the solver cannot produce a solution.
How can I make the constraints optional or soft, and reflect violations in a cost (utility) function so that I can rank my solutions?
I combined several constraints to solve the VRPTW, but the problem turned out to be infeasible. How can I make it solvable while reflecting the degree to which the constraints are violated?

Please try to use
void RoutingDimension::SetCumulVarSoftUpperBound(
int64_t index,
int64_t upper_bound,
int64_t coefficient);
ref: https://github.com/google/or-tools/blob/f460e9b0fcd444c37878ec64be9822d40fb375f4/ortools/constraint_solver/routing.h#L2905-L2914
and/or
void RoutingDimension::SetCumulVarSoftLowerBound(
int64_t index,
int64_t lower_bound,
int64_t coefficient);
ref: https://github.com/google/or-tools/blob/f460e9b0fcd444c37878ec64be9822d40fb375f4/ortools/constraint_solver/routing.h#L2927-L2937
note: Supposing you have
0 ---- [min_hard -- [min_soft --- max_soft] -- max_hard] --- vehicle_capacity
You could use (in Python)
index = manager.NodeToIndex(42)
time_dimension = routing.GetDimensionOrDie('Time')
time_dimension.CumulVar(index).SetRange(min_hard, max_hard)
penalty = 100
time_dimension.SetCumulVarSoftLowerBound(index, min_soft, penalty)
time_dimension.SetCumulVarSoftUpperBound(index, max_soft, penalty)
If the vehicle visits index at max_soft + k (with k > 0), the objective incurs an extra cost of k * penalty. Likewise, a visit at min_soft - k adds k * penalty.
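The penalty rule can be sketched in plain Python without the solver. This only illustrates the cost arithmetic described above and is not OR-Tools code; the function name soft_window_penalty and the sample numbers are assumptions made for the example.

```python
def soft_window_penalty(arrival, min_soft, max_soft, penalty):
    """Extra objective cost for visiting a node at `arrival`.

    Mirrors the linear rule described above: each time unit outside
    [min_soft, max_soft] costs `penalty`.
    """
    early = max(0, min_soft - arrival)   # violation of the soft lower bound
    late = max(0, arrival - max_soft)    # violation of the soft upper bound
    return penalty * (early + late)


# Hypothetical soft window [20, 100] with a penalty of 100 per time unit.
for arrival in (15, 50, 105):
    print(arrival, soft_window_penalty(arrival, 20, 100, 100))
```

Summing this value over all visited nodes gives a rough measure of how far a solution is from satisfying every soft window, which is what lets you rank solutions.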

Related

What unit is `getFitnessScore()` in the IterativeClosestPoint class from PCL returning?

I use the pcl::IterativeClosestPoint method from the Point-Cloud-Library.
At the moment its documentation seems to be offline.
It is still available in the Google cache, and there is also a tutorial.
You can call icp.getFitnessScore() to get the mean squared distance between the points of the two clouds. I just can't find any information on what unit this value is expressed in. Does anyone know what the number I get means? For example, my output was 0.0003192. This seems low, but I have no clue whether it is meters, centimeters, feet, or something else.
Thank you very much.
What kind of unit does icp.getFitnessScore() use?
As Joy said in his comment, the unit follows from your input data.
For example, your input point cloud might come from an obj file, where a point is stored like v 9.322 -1.0778 0.44997. The number returned by icp.getFitnessScore() is based on the same unit as the point coordinates. Because it is a mean of squared distances, it is strictly speaking that unit squared (for coordinates in meters, square meters).
Does anyone know what the number I get means?
The number you get represents the mean squared distance from each point in source to its closest point in target.
That is to say, if you assume every point in source has a corresponding point in target, and the correspondence set comes from closest-point data association, then the number represents the mean squared distance over all correspondences. That can be seen from the source code below.
To make more sense of the function, you might want to filter out correspondences that have a large distance between them. (The two point clouds might only partially overlap.) The function actually has an optional parameter max_range that does this.
The method getFitnessScore() is defined in pcl::Registration, the base class of pcl::IterativeClosestPoint. The optional parameter max_range defaults to std::numeric_limits<double>::max(), as you can see in the definition:
/** \brief Obtain the Euclidean fitness score (e.g., sum of squared distances from the source to the target)
* \param[in] max_range maximum allowable distance between a point and its correspondence in the target
* (default: double::max)
*/
inline double
getFitnessScore (double max_range = std::numeric_limits<double>::max ());
And the source code of this function is:
template <typename PointSource, typename PointTarget, typename Scalar> inline double
pcl::Registration<PointSource, PointTarget, Scalar>::getFitnessScore (double max_range)
{
  double fitness_score = 0.0;

  // Transform the input dataset using the final transformation
  PointCloudSource input_transformed;
  transformPointCloud (*input_, input_transformed, final_transformation_);

  std::vector<int> nn_indices (1);
  std::vector<float> nn_dists (1);

  // For each point in the source dataset
  int nr = 0;
  for (size_t i = 0; i < input_transformed.points.size (); ++i)
  {
    // Find its nearest neighbor in the target
    tree_->nearestKSearch (input_transformed.points[i], 1, nn_indices, nn_dists);

    // Deal with occlusions (incomplete targets)
    if (nn_dists[0] <= max_range)
    {
      // Add to the fitness score
      fitness_score += nn_dists[0];
      nr++;
    }
  }

  if (nr > 0)
    return (fitness_score / nr);
  else
    return (std::numeric_limits<double>::max ());
}
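The same computation can be sketched in plain Python to see how the units work out. This is a brute-force illustration, not PCL code: the nearest neighbour is found by exhaustive search instead of a k-d tree, the points are made up, and an empty match set returns infinity here where the function above returns the largest double.

```python
import math


def fitness_score(source, target, max_range=float("inf")):
    """Mean of squared nearest-neighbour distances from source to target.

    As in the function above, max_range is compared against the
    squared distance.
    """
    total = 0.0
    count = 0
    for p in source:
        # Squared distance to the closest target point (exhaustive search).
        d2 = min(sum((a - b) ** 2 for a, b in zip(p, q)) for q in target)
        if d2 <= max_range:
            total += d2
            count += 1
    return total / count if count else float("inf")


# Hypothetical clouds with coordinates in meters.
source = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
target = [(0.0, 0.0, 0.1), (1.0, 0.0, 0.3)]

score = fitness_score(source, target)   # in square meters
rms = math.sqrt(score)                  # back in meters
print(score, rms)
```

Taking the square root of the score gives a root-mean-square distance in the input unit, which is usually easier to interpret than the raw value.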

Simple low pass filter in fixed point

I have a simple circuit set up to read the light level via an LDR into an Arduino. I'm trying to apply a simple low-pass filter to the data I read in. How is this best tackled, given that analogRead() returns an unsigned int?
I have tried to implement a simple fixed point representation but am unsure if this is the correct approach.
Here's a code snippet:
#define WLPF 0.1
#define FIXED_SHIFT 4
ldr_val = ((int)analogRead(A0)) << FIXED_SHIFT;
while (true) {
int newval = (int)analogRead(A0) << FIXED_SHIFT;
ldr_val += WLPF*(newval - ldr_val);
Serial.println(ldr_val >> FIXED_SHIFT, DEC);
}
Note the resolution of the ADC is 10 bits and I am working with an 8-bit Arduino Micro.
I'm paraphrasing from the book "Musical Applications of Microprocessors" by Hal Chamberlin, page 438:
If you allow large numbers in the accumulator, then you can make a first-order low-pass filter with one multiplication and some right-shifts.
out = accum >> k
accum = accum - out + in
Choose 'k' to change the cutoff frequency. The more shifts, the lower the low-pass cutoff, but the larger the value in the accumulator. With a 10-bit value from analogRead(), you can easily shift by 4 places and still have 2 bits of headroom in a 16-bit accumulator (as @datafiddler noted above).
Cypress has some app-notes for their PSOC chips with similar equations, and using shifts. I remember one had a nice table that related number of shifts to the cutoff frequency.
The approximate cutoff frequency is the sampling frequency divided by 2-pi times the gain factor:
f0 ~ fs / (2 pi a)
where 'a' is that power of two.
Keep smoothin' those signals!
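A small simulation of the two-line recurrence, as a sketch only: it uses Python integers to stand in for the accumulator, a constant full-scale input of 1023, and an assumed sampling rate of 1 kHz for the cutoff estimate. The names shift_lowpass and cutoff_hz are made up for the example.

```python
import math


def shift_lowpass(samples, k):
    """First-order low-pass using only subtract, add and right-shift."""
    accum = 0
    peak = 0
    outputs = []
    for sample in samples:
        out = accum >> k                 # out = accum >> k
        accum = accum - out + sample     # accum = accum - out + in
        peak = max(peak, accum)          # track how large the accumulator gets
        outputs.append(out)
    return outputs, peak


def cutoff_hz(fs, k):
    """Approximate cutoff f0 ~ fs / (2 pi a) with a = 2**k."""
    return fs / (2 * math.pi * (1 << k))


outputs, peak = shift_lowpass([1023] * 300, k=4)
print(outputs[-1], peak, peak < 2**16)
print(cutoff_hz(1000.0, 4))
```

With k = 4 the accumulator settles at the input shifted left by 4 bits, so a 10-bit input needs 14 bits, consistent with the headroom remark above.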
On a device with no FPU, rather than multiplying by 0.1 (which in any case makes this a floating-point rather than a fixed-point implementation), you should divide by 10:
#define WLPF_DIV 10
...
ldr_val += (newval - ldr_val) / WLPF_DIV;
However division on an 8 bit processor is often expensive (although probably dwarfed by the execution time of Serial.println() in the loop - but that is a different issue). Instead it is more efficient to select a power of two so that the division can be performed with a right-shift.
#define WLPF_SHIFT 3 // divide by 8
...
ldr_val += (newval - ldr_val) >> WLPF_SHIFT ;
The use of signed int is problematic since right-shifting a negative signed value is implementation-defined behaviour. In this case this can be resolved by changing the code to:
#define WLPF_DIV 8
...
ldr_val += (newval - ldr_val) / WLPF_DIV ;
The compiler will most likely spot the power-of-two constant and generate the code using an arithmetic-shift-right in any case. However you would probably do better to reconsider the data type.
You still have a right-shift in the Serial.println() call, but that too could be replaced with a divide-by-16:
#define WLPF_DIV 8
#define FIXED_MUL 16
ldr_val = (int)analogRead(A0) * FIXED_MUL ;
for(;;)
{
int newval = (int)analogRead(A0) * FIXED_MUL ;
ldr_val += (newval - ldr_val) / WLPF_DIV;
Serial.println(ldr_val / FIXED_MUL, DEC);
}
Non-deterministic output of the data on a per sample basis is not going to make for a very accurate filter and will dominate the timing in any case so you have little control over the frequency response and it will not be stable. It also makes the previous performance optimisations rather pointless. You may want to think about that if it is important in your application - but that is a different question.
Stick with integer arithmetic:
#define WLPF 9
filtered = ((long)filtered * WLPF + newValue) / (WLPF + 1);
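One thing worth checking with this form is what integer truncation does to the steady state. The sketch below replays the one-line update in Python, where // matches C integer division for the non-negative values used here; the constant input of 1023 and the iteration count are assumptions for the example.

```python
WLPF = 9


def step(filtered, new_value):
    """One update of the weighted integer average shown above."""
    return (filtered * WLPF + new_value) // (WLPF + 1)


filtered = 0
for _ in range(200):
    filtered = step(filtered, 1023)

# The filter stops moving once the remaining error is smaller than
# WLPF + 1, so it settles slightly below the input.
print(filtered, step(filtered, 1023) == filtered)
```

Keeping a few extra fractional bits in the stored value, as the fixed-point versions above do, reduces this residual offset.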

Optimizing mask function with ARM SIMD instructions

I was wondering if you could help me use NEON intrinsics to optimize this mask function. I already tried auto-vectorization with gcc's -O3 flag, but the function ran slower than with -O2, which leaves auto-vectorization off. For some reason the assembly produced with -O3 is 1.5 times longer than the one produced with -O2.
void mask(unsigned int x, unsigned int y, uint32_t *s, uint32_t *m)
{
unsigned int ixy;
ixy = xsize * ysize;
while (ixy--)
*(s++) &= *(m++);
}
I probably have to use the following intrinsics:
vld1q_u32 // to load 4 integers from s and m
vandq_u32 // to execute logical and between the 4 integers from s and m
vst1q_u32 // to store them back into s
However, I don't know how to do this in the most efficient way. For instance, should I advance s and m by 4 after loading, ANDing and storing? I am quite new to NEON, so I would really appreciate some help.
I am using gcc 4.8.1 and I am compiling with the following cmd:
arm-linux-gnueabihf-gcc -mthumb -march=armv7-a -mtune=cortex-a9 -mcpu=cortex-a9 -mfloat-abi=hard -mfpu=neon -O3 -fprefetch-loop-arrays name.c -o name
Thanks in advance
I would probably do it like this. I've included 4x loop unrolling. Preloading the cache is always a good idea and can speed things up another 25%. Since there's not much processing going on (it's mostly spending time loading and storing), it's best to load lots of registers, then process them as it gives time for the data to actually load. It assumes the data is an even multiple of 16 elements.
void fmask(unsigned int x, unsigned int y, uint32_t *s, uint32_t *m)
{
unsigned int ixy;
uint32x4_t srcA,srcB,srcC,srcD;
uint32x4_t maskA,maskB,maskC,maskD;
ixy = xsize * ysize;
ixy /= 16; // process 16 at a time
while (ixy--)
{
__builtin_prefetch(&s[64]); // preload the cache
__builtin_prefetch(&m[64]);
srcA = vld1q_u32(&s[0]);
maskA = vld1q_u32(&m[0]);
srcB = vld1q_u32(&s[4]);
maskB = vld1q_u32(&m[4]);
srcC = vld1q_u32(&s[8]);
maskC = vld1q_u32(&m[8]);
srcD = vld1q_u32(&s[12]);
maskD = vld1q_u32(&m[12]);
srcA = vandq_u32(srcA, maskA);
srcB = vandq_u32(srcB, maskB);
srcC = vandq_u32(srcC, maskC);
srcD = vandq_u32(srcD, maskD);
vst1q_u32(&s[0], srcA);
vst1q_u32(&s[4], srcB);
vst1q_u32(&s[8], srcC);
vst1q_u32(&s[12], srcD);
s += 16;
m += 16;
}
}
I would start with the simplest version and use it as a reference to compare future routines against.
A good rule of thumb is to compute things as early as possible, not exactly when they are needed.
Instructions can take X cycles to execute, and their results are not always available immediately, so scheduling is important.
As an example, a simple scheduling scheme for your case would be (pseudocode):
nn=n/4 // Assuming n is a multiple of 4
LOADI_S(0) // Load and immediately after increment pointer
LOADI_M(0) // Load and immediately after increment pointer
for( k=1; k<nn;k++){
AND_SM(k-1) // Inner op
LOADI_S(k) // Load and increment after
LOADI_M(k) // Load and increment after
STORE_S(k-1) // Store and increment after
}
AND_SM(nn-1)
STORE_S(nn-1) // Store. Not needed to increment
By moving these instructions out of the inner loop, the operations inside it no longer depend on the result of the immediately preceding operation.
This scheme can be extended further to make use of the time that would otherwise be lost waiting for the result of the previous operation.
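The ordering of the pseudocode can be checked with a small emulation. This sketch is plain Python and says nothing about NEON timing; it only confirms that loading block k before storing block k-1 produces the same result as the simple loop. The function names and the sample data are invented for the example.

```python
def mask_reference(s, m):
    """Plain element-wise AND, the baseline to compare against."""
    return [a & b for a, b in zip(s, m)]


def mask_pipelined(s, m, width=4):
    """Emulates the schedule: load block k while block k-1 is processed."""
    assert len(s) == len(m) and len(s) >= width and len(s) % width == 0
    out = list(s)
    nn = len(s) // width

    def load(data, k):
        return data[k * width:(k + 1) * width]

    cur_s, cur_m = load(s, 0), load(m, 0)              # LOADI_S(0), LOADI_M(0)
    for k in range(1, nn):
        anded = [a & b for a, b in zip(cur_s, cur_m)]  # AND_SM(k-1)
        cur_s, cur_m = load(s, k), load(m, k)          # LOADI_S(k), LOADI_M(k)
        out[(k - 1) * width:k * width] = anded         # STORE_S(k-1)
    anded = [a & b for a, b in zip(cur_s, cur_m)]      # AND_SM(nn-1)
    out[(nn - 1) * width:] = anded                     # STORE_S(nn-1)
    return out


s = [0xFFFF0000 + i for i in range(16)]
m = [0x00FF00FF] * 16
print(mask_pipelined(s, m) == mask_reference(s, m))
```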
Also, since intrinsics still depend on the optimizer, check what the compiler does under different optimization options. I prefer inline assembly, which is not difficult for small routines and gives you more control.

Integer polynomial interpolation (or fast select case)

Let x be an element of the set {10, 37, 96, 104}.
Let f(x) be a "select case" function:
int f1(int x) {
switch(x) {
case 10: return 3;
case 37: return 1;
case 96: return 0;
case 104: return 1;
}
assert(...);
}
Then we can avoid conditional jumps by writing f(x) as an "integer polynomial" like
int f2(int x) {
// P(x) = (x - 70)^2 / 1024 (the shift below divides by 2^10)
int q = x - 70;
return (q * q) >> 10;
}
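A quick check that f2 reproduces the table, together with a naive brute-force search over one family of candidates, ((x - c)^2) >> s. The search is only an illustration of how such a form could be found for small tables and is not a general method; the search bounds are arbitrary assumptions.

```python
TABLE = {10: 3, 37: 1, 96: 0, 104: 1}


def f2(x):
    """Python transcription of f2 above."""
    q = x - 70
    return (q * q) >> 10


def search(table, max_center=128, max_shift=16):
    """Finds all (center, shift) pairs with ((x - center)**2 >> shift) == y."""
    hits = []
    for center in range(max_center):
        for shift in range(max_shift):
            if all(((x - center) ** 2) >> shift == y for x, y in table.items()):
                hits.append((center, shift))
    return hits


print(all(f2(x) == y for x, y in TABLE.items()))
print((70, 10) in search(TABLE))
```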
In some cases (even with the mul operations included) f2 would be better than f1 (e.g. large conditional evaluations).
Are there methods to find P(x) from such a switch mapping?
Thank you very much!
I suggest you start by reading the Wikipedia page on polynomial interpolation if you do not know how to calculate the interpolation polynomial.
Note that not all calculation methods are suitable for practical application because of numerical issues (e.g. the divisions in the Lagrange form). I am confident that you should be able to find a library providing this functionality. Note that the construction takes some time too, so this only makes sense if your function is called quite frequently.
Be aware that integer function values and integer points of support do not imply integer coefficients for your polynomial! Thus, in the general case, you will need O(n) floating-point operations, followed by rounding to the nearest integer. Whether the interpolation method is reliable and faster than the switch approach may depend on your input.
Further, I want to propose a different solution, assuming that n is rather large. Why don't you put your entries (the pairs (10,3), (37,1), (96,0), (104,1) in your example) inside a search tree (e.g. std::map in C++ or SortedDictionary in C#)? Your query cost would then drop from linear to O(log n)!
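The remark about non-integer coefficients can be checked on the four example points with exact rational arithmetic. This is a sketch using the Lagrange form and Python's fractions module; the helper names are made up for the example.

```python
from fractions import Fraction

POINTS = [(10, 3), (37, 1), (96, 0), (104, 1)]


def lagrange_eval(points, x):
    """Evaluates the interpolation polynomial at x, exactly."""
    total = Fraction(0)
    for i, (xi, yi) in enumerate(points):
        term = Fraction(yi)
        for j, (xj, _) in enumerate(points):
            if j != i:
                term *= Fraction(x - xj, xi - xj)
        total += term
    return total


def leading_coefficient(points):
    """Coefficient of the highest power: sum of y_i / prod(x_i - x_j)."""
    total = Fraction(0)
    for i, (xi, yi) in enumerate(points):
        denom = 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                denom *= xi - xj
        total += Fraction(yi, denom)
    return total


lead = leading_coefficient(POINTS)
print(lead, float(lead))
print([lagrange_eval(POINTS, x) for x, _ in POINTS])
```

The cubic passes through all four points, but its leading coefficient is a small fraction rather than an integer, so evaluating it needs either rational or floating-point arithmetic plus rounding.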

number squared in programming

I know this is probably a very simple question, but how would I do something like
n² in a programming language?
Is it n * n? Or is there another way?
n * n is the easiest way.
For languages that support the exponentiation operator (** in this example), you can also do n ** 2
Otherwise you could use a Math library to call a function such as pow(n, 2) but that is probably overkill for simply squaring a number.
n * n will almost always work. The couple of cases where it won't are prefix languages (Lisp, Scheme, and co.) and postfix languages (Forth, Factor, bc, dc); but obviously then you can just write (* n n) or n n * respectively.
It will also fail when there is an overflow case:
#include <limits.h>
#include <stdio.h>
int main()
{
volatile int x = INT_MAX;
printf("INT_MAX squared: %d\n", x * x);
return 0;
}
I threw the volatile qualifier on there just to point out that this can be compiled with -Wall and not raise any warnings, but on my 32-bit computer this says that INT_MAX squared is 1.
Depending on the language, you might have a power function such as pow(n, 2) in C, or math.pow(n, 2) in Python... Since those power functions cast to floating-point numbers, they are more useful in cases where overflow is possible.
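The value 1 reported above can be reproduced by emulating 32-bit two's-complement wraparound. This is a sketch: Python integers do not overflow, so the helper wrap_int32 (a name invented here) truncates the exact product by hand. In C, signed overflow is undefined behaviour, so the wrapped value is what typically happens rather than a guarantee.

```python
INT_MAX = 2**31 - 1


def wrap_int32(value):
    """Keeps the low 32 bits and reinterprets them as a signed integer."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value >= 2**31 else value


exact = INT_MAX * INT_MAX      # exact product, needs 62 bits
wrapped = wrap_int32(exact)    # what a 32-bit int would hold after wrapping
print(exact, wrapped)
```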
There are many programming languages, each with their own way of expressing math operations.
Some common ones will be:
x*x
pow(x,2)
x^2
x ** 2
square(x)
(* x x)
If you specify a specific language, we can give you more guidance.
If n is an integer :p :
int res=0;
for(int i=0; i<n; i++)
res+=n; //res=n+n+...+n=n*n
For positive integers you may use recursion:
int square(int n){
if (n>1)
return square(n-1)+(n-1)+n;
else
return 1;
}
Calculate using array allocation (extremely sub-optimal):
#include <iostream>
using namespace std;
int heapSquare(int n){
return sizeof(char[n][n]);
}
int main(){
for(int i=1; i<=10; i++)
cout << heapSquare(i) << endl;
return 0;
}
Using bit shift (ancient Egyptian multiplication):
int sqr(int x){
int i=0;
int result = 0;
for (;i<32;i++)
if (x>>i & 0x1)
result+=x << i;
return result;
}
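For comparison, the same shift-and-add idea in Python, checked against plain multiplication. This is a sketch for non-negative inputs below 2**32; the function name is made up for the example.

```python
def egyptian_square(x, bits=32):
    """Squares x by adding x << i for every set bit i of x."""
    result = 0
    for i in range(bits):
        if (x >> i) & 1:
            result += x << i
    return result


print(egyptian_square(10), egyptian_square(12345))
```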
Assembly:
int x = 10;
__asm__ __volatile__("imul %%eax,%%eax"
:"=a"(x)
:"a"(x)
);
printf("x*x=%d\n", x);
Always use the language's multiplication, unless the language has an explicit square function. Specifically avoid using the pow function provided by most math libraries. Multiplication will (except in the most outrageous of circumstances) always be faster, and -- if your platform conforms to the IEEE-754 specification, which most platforms do -- will deliver a correctly-rounded result. In many languages, there is no standard governing the accuracy of the pow function. It will generally give a high-quality result for such a simple case (many library implementations will special-case squaring to save programmers from themselves), but you don't want to depend on this[1].
I see a tremendous amount of C/C++ code where developers have written:
double result = pow(someComplicatedExpression, 2);
presumably to avoid typing that complicated expression twice or because they think it will somehow slow down their code to use a temporary variable. It won't. Compilers are very, very good at optimizing this sort of thing. Instead, write:
const double myTemporaryVariable = someComplicatedExpression;
double result = myTemporaryVariable * myTemporaryVariable;
To sum up: Use multiplication. It will always be at least as fast and at least as accurate as anything else you can do[2].
1) Recent compilers on mainstream platforms can optimize pow(x,2) into x*x when the language semantics allow it. However, not all compilers do this at all optimization settings, which is a recipe for hard to debug rounding errors. Better not to depend on it.
2) For basic types. If you really want to get into it, if multiplication needs to be implemented in software for the type that you are working with, there are ways to make a squaring operation that is faster than multiplication. You will almost never find yourself in a situation where this matters, however.
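A small Python illustration of why going through a floating-point power function can change the result. It relies on two facts about Python specifically: integer multiplication is exact, and math.pow converts its arguments to float first. The chosen value of n is an assumption picked to sit just past the range where floats represent every integer.

```python
import math

n = 2**53 + 1                    # not exactly representable as a float

exact = n * n                    # exact integer arithmetic
via_pow = int(math.pow(n, 2))    # n is rounded to a float before squaring

print(exact == via_pow)
print(exact - via_pow)
```

The difference comes from the conversion of n, not from the squaring itself, which is one more reason to multiply in the type you already have.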
