Minizinc soft constraint issue


I'm facing an issue with a MiniZinc model that has to schedule some exams in the available rooms and periods. I'll skip the full problem and code, because my only problem is in the definition of this soft constraint:
Some exams are marked as exclusive, meaning that they need a room to themselves. This is a soft constraint, so I need to minimize its number of violations. Each exam scheduled in the same room and period as an exam requiring a dedicated room counts as one violation.
The data and variables useful for this constraint are:
Exams (the number of exams to be scheduled)
room_exclusive (a vector of boolean that stands for the exclusivity required by each exam)
room_assignment (a variable vector of the assigned room to each exam)
period_assignment (a variable vector of the period assigned to exams)
roomexclusivity (an int vector that counts the exclusivity violations for each exams)
roomexclusivity_violations (the total number of violations).
For example if four exams share the room, and two of them require a dedicated room, the total number of violations is six. Each of those two has three other exams in the same room.
My idea was to use two nested loops over the exams: first find an exclusive exam, then check whether it shares its room and period with other exams. The issue may be in the sum(e2 in 1..Exams, e1!=e2), because I wrote it without knowing whether putting e1!=e2 there is valid MiniZinc syntax.
constraint
forall(e1 in 1..Exams)
(roomexclusivity[e1]=sum(e2 in 1..Exams, e1!=e2)(bool2int(room_exclusive[e1]=true /\ room_assignment[e1]=room_assignment[e2] /\ period_assignment[e1]=period_assignment[e2])));
constraint
roomexclusivity_violation=sum(e in 1..Exams)(bool2int(roomexclusivity[e]>0));
The error I get is on the (roomexclusivity[e1].. line:
syntax error at `('; expected `)' after body expression in generator call
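For reference, MiniZinc introduces a condition on a generator with the keyword where rather than a comma, e.g. sum(e2 in 1..Exams where e1 != e2)(...). Note also that, going by the four-exam example above, the total is the sum of the per-exam counts, whereas summing bool2int(roomexclusivity[e]>0) would only count the exams that have at least one violation. The counting rule itself can be checked outside the solver with a small brute-force sketch in Python; the function name and the sample data below are made up for illustration.

```python
def exclusivity_violations(room_exclusive, room_assignment, period_assignment):
    """Count, for each exclusive exam, the other exams sharing its room and period."""
    n = len(room_exclusive)
    per_exam = []
    for e1 in range(n):
        count = 0
        if room_exclusive[e1]:
            for e2 in range(n):
                if (e1 != e2
                        and room_assignment[e1] == room_assignment[e2]
                        and period_assignment[e1] == period_assignment[e2]):
                    count += 1
        per_exam.append(count)
    return per_exam, sum(per_exam)


# Four exams in the same room and period; the first two require a dedicated room.
per_exam, total = exclusivity_violations(
    room_exclusive=[True, True, False, False],
    room_assignment=[1, 1, 1, 1],
    period_assignment=[1, 1, 1, 1],
)
print(per_exam, total)
```

Each of the two exclusive exams sees three other exams in its room, which gives the total of six described in the question.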


Is it always necessary to make hash table number of buckets a prime number for performance reason?

https://www.quora.com/Why-should-the-size-of-a-hash-table-be-a-prime-number?share=1
I see that people mention that the number of buckets of a hash table is better to be prime numbers.
Is it always the case? When the hash values are already evenly distributed, there is no need to use prime numbers then?
https://github.com/rui314/chibicc/blob/main/hashmap.c
For example, the above hash table code does not use prime numbers as the number of buckets.
https://github.com/rui314/chibicc/blob/main/hashmap.c#L37
But the hash values are generated from strings using fnv_hash.
https://github.com/rui314/chibicc/blob/main/hashmap.c#L17
So there is a reason why it makes sense to use bucket sizes that are not necessarily prime numbers?

The answer is "usually you don't need a table whose size is a prime number, but there are some implementation reasons why you might want to do this."
Fundamentally, hash tables work best when hash codes are spread out as close to uniformly at random as possible. That prevents items from clustering in any one location within the table. At some level, provided that you have a good enough hash function to make this happen, the size of the table doesn't matter.
So why do folks say to pick tables whose size is a prime? There are two main reasons for this, and they're due to specific cases that don't arise in all hash tables.
One reason why you sometimes see prime-sized tables is due to a specific way of building hash functions. You can build reasonable hash functions by picking functions of the form h(x) = (ax + b) mod p, where a is a number in {1, 2, ..., p-1} and b is a number in {0, 1, 2, ..., p-1}, assuming that p is a prime. If p isn't prime, hash functions of this form don't spread items out uniformly. As a result, if you're using a hash function like this one, then it makes sense to pick a table whose size is a prime number.
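A quick way to see the effect is to count how many distinct buckets h(x) = (ax + b) mod p actually reaches. The sketch below is only an illustration; the values a = 4, b = 1 and the moduli 7 and 8 are arbitrary choices, not taken from any particular implementation.

```python
def buckets_hit(a, b, p, keys):
    """Return the set of buckets reached by h(x) = (a*x + b) mod p."""
    return {(a * x + b) % p for x in keys}


keys = range(100)
prime_hit = buckets_hit(4, 1, 7, keys)      # p = 7 is prime
composite_hit = buckets_hit(4, 1, 8, keys)  # p = 8 shares a factor with a = 4

print(len(prime_hit), len(composite_hit))
```

With the prime modulus every bucket is reachable, while with p = 8 the multiplier 4 collapses all keys onto two buckets.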
The second reason you see advice about prime-sized tables is if you're using an open-addressing strategy like quadratic probing or double hashing. These hashing strategies work by hashing items to some initial location k. If that slot is full, we look at slot (k + r) mod T, where T is the table size and r is some offset. If that slot is full, we then check (k + 2r) mod T, then (k + 3r) mod T, etc. If the table size is a prime number and r isn't zero, this has the nice, desirable property that these indices will cycle through all the different positions in the table without ever repeating, ensuring that items are nicely distributed over the table. With non-prime table sizes, it's possible that this strategy gets stuck cycling through a small number of slots, which gives less flexibility in positions and can cause insertions to fail well before the table fills up.
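The cycling behaviour can be checked directly by enumerating the slots (k + i*r) mod T. This is a minimal sketch; the starting slot 3, the offset 4 and the table sizes 11 and 12 are arbitrary example values.

```python
def probe_slots(k, r, table_size):
    """Slots visited by the sequence k, k + r, k + 2r, ... (mod table_size)."""
    return {(k + i * r) % table_size for i in range(table_size)}


prime_table = probe_slots(3, 4, 11)      # prime size: every slot is visited
composite_table = probe_slots(3, 4, 12)  # 4 divides 12: only 3 slots are visited

print(sorted(prime_table))
print(sorted(composite_table))
```

In the second case an insertion would fail as soon as those three slots are full, even though the rest of the table is empty.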
So assuming you aren't using double hashing or quadratic probing, and assuming you have a strong enough hash function, feel free to size your table however you'd like.

templatetypedef has some excellent points as always - just adding a couple more and some examples...
Is it always necessary to make hash table number of buckets a prime number for performance reason?
No. Firstly, using prime numbers for bucket count tends to mean you need to spend more CPU cycles to fold/mod a hash value returned by the hash function into the current bucket count. A popular alternative is to use powers of two for the bucket count (e.g. 8, 16, 32, 64... as you resize), because then you can do a bitwise AND operation to map from a hash value to a bucket in 1 CPU cycle. That answers your "So there is a reason why it makes sense to use bucket sizes that are not necessarily prime numbers?"
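As a small illustration of the power-of-two trick, the sketch below checks that masking with n - 1 gives the same bucket as a modulo when n is a power of two. The function names are made up for this example, and hash values are assumed to be non-negative.

```python
def bucket_by_mod(h, n):
    """Map a hash value to a bucket with a (comparatively slow) modulo."""
    return h % n


def bucket_by_mask(h, n):
    """Map a hash value to a bucket with a bitwise AND; n must be a power of two."""
    if n <= 0 or n & (n - 1) != 0:
        raise ValueError("bucket count must be a power of two")
    return h & (n - 1)


same = all(bucket_by_mod(h, 64) == bucket_by_mask(h, 64) for h in range(10_000))
print(same)
```

The mask simply keeps the low-order bits of the hash, which is also why this approach depends on those bits being well distributed.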
Tuning a hash table for performance often means weighing the cost of a stronger hash function and modding by prime numbers against the cost of higher collisions.
Prime bucket counts often help reduce collisions when the hash function is unable to produce a very good distribution for the keys it's fed.
For example, if you hashed a bunch of pointers to 64-bit doubles using an identity hash (basically, casting the pointer address to a size_t), then the hash values would all be multiples of 8 (due to alignment), and if you had a hash table size like say 1024 or 2048 (powers of 2), then all your pointers would hash onto 1/8th of the bucket indices (specifically, buckets 0, 8, 16, 24, 32 etc.). With a prime number of buckets, at least the pointer values - which if the load factor is high are inevitably spread out over a much larger range than the range of bucket indices - tend to wrap around the hash table hitting different indices.
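The pointer example can be simulated with made-up addresses. In this sketch the base address 0x10000 and the count of 800 pointers are assumptions chosen for illustration; 1021 is used as a prime close to 1024.

```python
def used_buckets(addresses, bucket_count):
    """Number of distinct buckets hit when using the address itself as the hash."""
    return len({address % bucket_count for address in addresses})


# 800 consecutive 8-byte-aligned addresses (hypothetical values).
addresses = [0x10000 + 8 * i for i in range(800)]

power_of_two = used_buckets(addresses, 1024)
prime = used_buckets(addresses, 1021)

print(power_of_two, prime)
```

With 1024 buckets only the 128 indices that are multiples of 8 are ever used, while with 1021 buckets each of the 800 addresses lands in its own bucket.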
When you use a very strong hash function - where the low order bits are effectively random but repeatable, you'll already get a good distribution across buckets regardless of the bucket count. There are also times when even with a terribly weak hash function - like an identity hash - h(x) == x - all the bits in the keys are so random that they produce as good a distribution as a cryptographic hash could produce, so there's no point spending extra time on a stronger hash - that may even increase collisions.
There are also times when the distribution isn't inherently great, but you can afford to use extra memory to keep the load factor low, so it's not worth using primes or a better hash function. Still, extra buckets put more strain on the CPU caches too - so things can end up slower than hoped for.
Other times, keys with an identity hash have an inherent tendency to fall into distinct buckets (e.g. because they might have been generated by an incrementing counter, even if some of the values are no longer in use). In that case, a strong hash function increases collisions and worsens CPU cache access patterns. Whether you use powers of two or prime bucket counts makes little difference here.
When the hash values are already evenly distributed, there is no need to use prime numbers then?
That statement is trivially true but kind of pointless if you're talking about hash values after the mod-to-current-hash-table-size operation: even distribution there directly relates to few collisions.
If you're talking about the more interesting case of hash values evenly distributed in the hash function return type value space (e.g. a 64-bit integer), before those values are modded into whatever the current hash table bucket count is, then there's still room for prime numbers to help, but only when the hashed key space spans a larger range than the hash bucket indices. The pointer example above illustrated that: if you had say 800 distinct 8-byte-aligned pointers going into ~1000 buckets, then the difference between the numerically lowest pointer and the highest address would be at least 799*8 = 6392... you're wrapping around the table more than 6 times at a minimum (for close-as-possible pointers), and a prime number of buckets would increase the odds of each "wrap" modding onto previously unused buckets.
Note that some of the above benefits to prime bucket counts apply to any kind of collision handling - separate chaining, linear probing, quadratic probing, double hashing, cuckoo hashing, robin hood hashing etc.

Are these two different formulas for Value-Iteration update equivalent?

While studying MDP via different sources, I came across two different formulas for the Value update in Value-Iteration algorithm.
The first one (the one on Wikipedia and in a couple of books) is:
[formula not preserved]
And the second one (in some questions here on Stack Overflow, and in my course slides) is:
[formula not preserved]
For a specific iteration, they don't seem to give the same answer. Does one of them converge faster to the solution?

Actually, the difference is in the reward function: R(s, s') in the first formula versus R(s) in the second.
The first equation is the more general one.
In the first one, the reward is Ra(s, s') when transitioning from state s to state s' due to action a.
The reward can be different for different states and actions.
But if every state s has some predefined reward (regardless of the previous state and the action that leads to s), then we can simplify the formula to the second one.
The final values are not necessarily equal, but the policies are the same.
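The two updates are not reproduced in this copy of the question. Assuming the first is the form given on Wikipedia and the second is the common state-reward form (both are assumptions, written here in Wikipedia's notation with discount factor \gamma), they would read as below. The last two lines sketch why the values differ while the greedy policy does not, under the additional assumption that the reward depends only on the state being entered, R_a(s, s') = R(s'); V^{(1)} and V^{(2)} denote the fixed points of the first and second update.

```latex
% First form: reward attached to the transition (s, a, s')
V_{i+1}(s) = \max_a \sum_{s'} P_a(s, s') \left( R_a(s, s') + \gamma V_i(s') \right)

% Second form: reward attached to the state s
V_{i+1}(s) = R(s) + \gamma \max_a \sum_{s'} P_a(s, s') V_i(s')

% Assuming R_a(s, s') = R(s'), the fixed points are related by
V^{(2)}(s) = R(s) + \gamma V^{(1)}(s)

% so both value functions select the same greedy action in every state:
\arg\max_a \sum_{s'} P_a(s, s') \left( R(s') + \gamma V^{(1)}(s') \right)
  = \arg\max_a \sum_{s'} P_a(s, s') V^{(2)}(s')
```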

customer segmentation in retail [closed]

I have a large sales database of a 'home and construction' retailer.
And I need to know who the electricians, plumbers, painters, etc. are among the store's customers.
My first approach was to select the articles related to a specialty (wires [article] is related to an electrician [specialty], for example), and then, based on customer sales, work out who the customers are.
But this is a lot of work.
My second approach is to make a cluster segmentation first, and then discover which clusters belong to a specialty. (This is a lot better because I would be able to discover new segments.)
But how can I do that? What type of clustering should I use? K-means, fuzzy? What variables should I feed into the model? Should I use PCA to know how many clusters to search for?
The header of my data (simplified):
customer_id | transaction_id | transaction_date | item_article_id | item_group_id | item_category_id | item_qty | sales_amt
Any help would be appreciated
(sorry my english)

You want to identify classes of customers based on what they buy (I presume this is for marketing reasons). This calls for a clustering approach. I will talk you through the entire setup.
The clustering space
Let us first consider what exactly you are clustering: either orders or customers. In either case, the way you characterize the items and the distances between them is the same. I will discuss the basic case for orders first, and then explain the considerations that apply to clustering by customers instead.
For your purpose, an order is characterized by what articles were purchased, and possibly also how many of them. In terms of a space, this means that you have a dimension for each type of article (item_article_id), for example the "wire" dimension. If all you care about is whether an article is bought or not, each item has a coordinate of either 0 or 1 in each dimension. If some order includes wire but not pipe, then it has a value of 1 on the "wire" dimension and 0 on the "pipe" dimension.
However, there is something to say for caring about the quantities. Perhaps plumbers buy lots of glue while electricians buy only small amounts. In that case, you can set the coordinate in each dimension to the quantity of the corresponding article (presumably item_qty). So suppose you have three articles, wire, pipe and glue, then an order described by the vector (2, 3, 0) includes 2 wire, 3 pipe and 0 glue, while an order described by the vector (0, 1, 4) includes 0 wire, 1 pipe and 4 glue.
If there is a large spread in the quantities for a given article, i.e. if some orders include orders of magnitude more of some article than other orders, then it may be helpful to work with a log scale. Suppose you have these four orders:
2 wire, 2 pipe, 1 glue
3 wire, 2 pipe, 0 glue
0 wire, 100 pipe, 1 glue
0 wire, 300 pipe, 3 glue
The former two orders look like they may belong to electricians while the latter two look like they belong to plumbers. However, if you work with a linear scale, order 3 will turn out to be closer to orders 1 and 2 than to order 4. We fix that by using a log scale for the vectors that encode these orders (I use the base 10 logarithm here, but it does not matter which base you take because they differ only by a constant factor):
(0.30, 0.30, 0)
(0.48, 0.30, -2)
(-2, 2, 0)
(-2, 2.48, 0.48)
Now order 3 is closest to order 4, as we would expect. Note that I have used -2 as a special value to indicate the absence of an article, because the logarithm of 0 is not defined (log(x) tends to negative infinity as x tends to 0). -2 means that we pretend that the order included 1/100th of the article; you could make the special value more or less extreme, depending on how much weight you want to give to the fact that an article was not included.
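The effect of the log scale on the four example orders can be checked with a few lines of Python. This is a sketch only; the helper names are made up, and the special value -2 for an absent article follows the convention described above.

```python
import math


def log_encode(quantity, absent=-2.0):
    """Base-10 log of the quantity, with a special value for absent articles."""
    return math.log10(quantity) if quantity > 0 else absent


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nearest(points, i):
    """Index of the point closest to points[i], excluding i itself."""
    others = (j for j in range(len(points)) if j != i)
    return min(others, key=lambda j: euclidean(points[i], points[j]))


# (wire, pipe, glue) quantities for orders 1 to 4.
orders = [(2, 2, 1), (3, 2, 0), (0, 100, 1), (0, 300, 3)]
logged = [tuple(log_encode(q) for q in order) for order in orders]

# Indices are zero-based, so order 3 is index 2 and order 4 is index 3.
print(nearest(orders, 2), nearest(logged, 2))
```

On the linear scale the nearest neighbour of order 3 is one of the small orders; on the log scale it is order 4.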
The input to your clustering algorithm (regardless of which algorithm you take, see below) will be a position matrix with one row for each item (order or customer), one column for each dimension (article), and either the presence (0/1), amount, or logarithm of the amount in each cell, depending on which you choose based on the discussion above. If you cluster by customers, you can simply sum the amounts from all orders that belong to that customer before you calculate what goes into each cell of your position matrix (if you use the log scale, sum the amounts before taking the logarithm).
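As an illustration of how such a position matrix could be assembled per customer, here is a minimal Python sketch. It assumes the sales rows have already been reduced to (customer_id, item_article_id, item_qty) tuples; the function name and the sample rows are invented for the example.

```python
from collections import defaultdict


def position_matrix(rows):
    """Sum quantities per customer and article; return row labels, column labels, matrix."""
    totals = defaultdict(lambda: defaultdict(int))
    articles = set()
    for customer_id, article_id, quantity in rows:
        totals[customer_id][article_id] += quantity
        articles.add(article_id)
    columns = sorted(articles)
    customers = sorted(totals)
    matrix = [[totals[c].get(a, 0) for a in columns] for c in customers]
    return customers, columns, matrix


rows = [
    ("c1", "wire", 2), ("c1", "pipe", 2), ("c1", "wire", 1),
    ("c2", "pipe", 100), ("c2", "glue", 1),
]
customers, columns, matrix = position_matrix(rows)
print(customers, columns, matrix)
```

If the log scale is wanted, the logarithm would be applied to each cell of this matrix afterwards, so that the amounts are summed before taking the logarithm.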
Clustering by orders rather than by customers gives you more detail, but also more noise. Customers may be consistent within an order but not between them; perhaps a customer sometimes behaves like a plumber and sometimes like an electrician. This is a pattern that you will only find if you cluster by orders. You will then find how often each customer belongs to each cluster; perhaps 70% of somebody's orders belong to the electrician type and 30% belong to the plumber type. On the other hand, a plumber may only buy pipe in one order and then only buy glue in the next order. Only if you cluster by customers and sum the amounts of their orders do you get a balanced view of what each customer needs on average.
From here on I will refer to your position matrix by the name my.matrix.
The clustering algorithm
If you want to be able to discover new customer types, you probably want to let the data speak for themselves as much as possible. A good old-fashioned hierarchical clustering with complete linkage (CLINK) may be an appropriate choice in this case. In R, you simply do hclust(dist(my.matrix)) (this will use the Euclidean distance measure, which is probably good enough in your case). It will join closely neighbouring items or clusters together until all items are categorized in a hierarchical tree. You can treat any branch of the tree as a cluster, observe typical article amounts for that branch and decide whether that branch represents a customer segment by itself, should be split in sub-branches, or joined with a sibling branch instead. The advantage is that you find the "full story" of which items and clusters of items are most similar to each other and how much. The disadvantage is that the outcome of the algorithm does not tell you where to draw the borders between your customer segments; you can cut up the clustering tree in many ways, so it's up to your interpretation how you want to identify your customer types.
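To make the merging process concrete, here is a naive complete-linkage sketch in Python, run on the four log-scaled example orders from earlier. It is not how R's hclust is implemented, only an illustration of the same idea; the function names are made up.

```python
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def complete_linkage(points):
    """Repeatedly merge the two clusters whose farthest members are closest together."""
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Complete linkage: cluster distance is the largest pairwise distance.
                d = max(euclidean(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return merges


points = [(0.30, 0.30, 0), (0.48, 0.30, -2), (-2, 2, 0), (-2, 2.48, 0.48)]
merges = complete_linkage(points)
for left, right, distance in merges:
    print(left, right, round(distance, 2))
```

The list of merges is the hierarchical tree: the two plumber-like orders are joined first, then the two electrician-like orders, and finally the two branches.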
On the other hand, if you are comfortable fixing the number of clusters (k) beforehand, k-means is a very robust way to get just any segmentation of your customers in k distinct types. In R, you would do kmeans(my.matrix, k). For marketing purposes, it may be sufficient to have (say) 5 different profiles of customers that you make custom advertisement for, rather than treating all customers the same. With k-means you don't explore all of the diversity that is present in your data, but you might not need to do so anyway.
If you don't want to fix the number of clusters beforehand, but you also don't want to manually decide where to draw the borders between the segments afterwards, there is a third possibility. You start with the k-means algorithm, where you let it generate a number of cluster centers that is much larger than the number of clusters that you hope to end up with (for example, if you hope to end up with somewhere about 10 clusters, let the k-means algorithm look for 200 clusters). Then, use the mean shift algorithm to further cluster the resulting centers. You will end up with a smaller number of compact clusters. The approach is explained in more detail by James Li over here. You can use the mean shift algorithm in R with the ms function from the LPCM package, see this documentation.
About using PCA
PCA will not tell you how many clusters you need. PCA answers a different question: which variables seem to represent a common underlying (hidden) factor. In a sense, it is a way to cluster variables, i.e. properties of entities, not to cluster the entities themselves. The number of principal components (common underlying factors) is not indicative of the number of clusters needed. PCA can still be interesting if you want to learn something about the predictive value of each article about a customer's interests.
Sources
Michael J. Crawley, 2005. Statistics. An Introduction using R.
Gerry P. Quinn and Michael J. Keough, 2002. Experimental Design and Data Analysis for Biologists.
Wikipedia: hierarchical clustering, k-means, mean shift, PCA

What is the cost of deleting a value from a hashtable?

I have a question where I was asked for the cost of deleting a value from a hash table when linear probing was used during insertion.
What I could figure out from reading various material on the internet is that it has something to do with the load factor. I am not sure, but I read about a relation between the load factor and the number of probes required: number of probes = 1 / (1-LF).
So I believe the cost has to be dependent on the probe sequence. But then another thought ruins everything.
What if the element was inserted after p probes and now I am trying to delete it, but before this I had already deleted a few elements that have the same hash code and were inserted in fewer than p probes?
In this case I reach a point where I see an empty slot in the hash table, but I cannot tell whether the element I am trying to delete has already been deleted or sits at some other location as a result of probing.
I also found that once I delete an element I must mark its slot with some special indicator to show that it is available, but this doesn't solve my problem of being uncertain about the element I want to delete.
Could anyone please suggest how to find the cost in such cases?
Is the approach going to vary if it is non-linear probing?

The standard approach is "lookup the element, mark as deleted". Marking obviously has O(1) cost, so the total operation cost is the same as just lookup: O(1) expected. It can be as high as O(n) in degenerate cases (e.g. all elements have the same hash). O(1) expected is all we can say theoretically.
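A minimal Python sketch of the "look up, then mark as deleted" approach is shown below. The class and method names are made up for illustration, and the table has a fixed capacity with no resizing, which a real implementation would need.

```python
EMPTY = object()    # slot that has never held a key
DELETED = object()  # tombstone: slot held a key that was removed


class LinearProbingSet:
    def __init__(self, capacity=8):
        self.slots = [EMPTY] * capacity

    def _probe(self, key):
        start = hash(key) % len(self.slots)
        for i in range(len(self.slots)):
            yield (start + i) % len(self.slots)

    def _find(self, key):
        for index in self._probe(key):
            slot = self.slots[index]
            if slot is EMPTY:          # a never-used slot ends the search
                return None
            if slot is not DELETED and slot == key:
                return index           # tombstones are skipped, not treated as empty
        return None

    def add(self, key):
        first_free = None
        for index in self._probe(key):
            slot = self.slots[index]
            if slot is EMPTY:
                if first_free is None:
                    first_free = index
                break
            if slot is DELETED:
                if first_free is None:
                    first_free = index  # reuse the first tombstone seen
            elif slot == key:
                return                  # already present
        if first_free is None:
            raise RuntimeError("table is full")
        self.slots[first_free] = key

    def __contains__(self, key):
        return self._find(key) is not None

    def remove(self, key):
        index = self._find(key)
        if index is None:
            return False
        self.slots[index] = DELETED     # O(1) marking after the lookup
        return True


table = LinearProbingSet()
for key in (1, 9, 17):                  # all three hash to slot 1 in a table of 8
    table.add(key)
table.remove(9)
print(17 in table, 9 in table)
```

Because a tombstone does not stop the search, 17 is still found after 9 is removed from the middle of its probe sequence. The search only stops at a slot that has never been used, which is what resolves the uncertainty described in the question.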
About the load factor: the higher the load factor (the ratio of the number of occupied buckets to the total number), the larger the expected constant factor (but this doesn't change the theoretical O cost). Note that in this case the load factor counts both the elements currently present in the table and the buckets that were previously marked as deleted.
Other probing kinds (e.g. quadratic) don't change the theoretical cost, but may alter the expected constant factor or its variance. If you look at "fallback" sequences, in linear ordering the sequences of different buckets overlap. This means that if for some bucket the sequence is long, for adjacent buckets it will also be long. E.g.: if buckets 4 to 10 are occupied, sequence for bucket #4 is 7 bucket long (4, 5, 6, ..., 10), for #5 it's 6 and so on. For quadratic probing this is not the case.
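The example with buckets 4 to 10 can be reproduced with a short Python sketch. The table size of 16 and the quadratic step i*i are assumptions made for this illustration; other quadratic variants exist.

```python
def probes_until_free(start, occupied, table_size, step):
    """Number of occupied buckets visited before the first free one is reached."""
    for i in range(table_size):
        if (start + step(i)) % table_size not in occupied:
            return i
    return None  # no free bucket found along this sequence


occupied = set(range(4, 11))  # buckets 4..10 are taken


def linear(i):
    return i


def quadratic(i):
    return i * i


linear_lengths = [probes_until_free(s, occupied, 16, linear) for s in (4, 5, 6)]
quadratic_lengths = [probes_until_free(s, occupied, 16, quadratic) for s in (4, 5, 6)]
print(linear_lengths, quadratic_lengths)
```

With linear probing the length depends directly on how far the start is from the end of the occupied run, while the quadratic sequences jump out of the run after a few steps.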
However, linear probing has the benefit of better memory-cache behavior, since you check memory cells close to each other. In practice, though, for quadratic probing fallback sequences are rarely long enough for this to matter.
Finally, in linear probing case, it is possible to work without deleted mark, but for this you'd have to complicate deleting procedure considerably (still O(1) expected, though, but with much higher constant factor). Whether it is worth it has to be decided with actual profiling; for example, this simplifies inserting somewhat and lookup a bit. For a C++ implementation this would have the downside that erase() would invalidate iterators, though.

Formula to prioritize tasks based on weight and date

Is there a formula or algorithm which can prioritize items based on a weight and a date? For instance, a critical item would always be at the top of the list, while two normal items would be prioritized based on their due date.

Scheduling is one of the most-studied areas of computer science, which is convenient, because it gives a lot of prior art that you can learn from.
Perhaps the easiest approach is Earliest Deadline First -- where you schedule the task with the first deadline and work on it until it blocks. Then work on the next earliest deadline. The downside is that low-priority tasks that take a long time might stall higher-priority tasks.
It might be worthwhile to determine if your scheduling must be hard, firm, or soft -- sometimes it makes sense to drop tasks completely and finish nearly everything on time than to finish everything but half a second too late.
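Earliest Deadline First, as described above, can be sketched with a priority queue. This is a simplified illustration that only produces the order in which tasks would be picked; it ignores blocking and task durations, and the task names and deadlines are invented.

```python
import heapq


def earliest_deadline_first(tasks):
    """Return task names in the order EDF would pick them.

    tasks is an iterable of (name, deadline) pairs; smaller deadline means sooner.
    """
    heap = [(deadline, name) for name, deadline in tasks]
    heapq.heapify(heap)
    order = []
    while heap:
        _deadline, name = heapq.heappop(heap)
        order.append(name)
    return order


order = earliest_deadline_first([("report", 5), ("email", 2), ("deploy", 3)])
print(order)
```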

Yes. One way is to define a comparison function that checks priority first, i.e.
// Returns n < 0, 0, or n > 0 if value1 is less than, equal to, or greater than value2
function compare(value1, value2) {
    if (value1.priority != value2.priority) {
        return value1.priority - value2.priority;
    }
    return value1.date - value2.date;
}
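Alternatively, the following function returns a single value calculated from the date and the priority, which can be used to compare tasks and order them by priority (and then by date):
// Returns a single sortable value: the priority dominates and the date breaks ties,
// provided every date value is smaller than MAX_DATE_VALUE
function GetValue(task) {
    return task.GetDateAsIntegerValue() + MAX_DATE_VALUE * task.GetPriority();
}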
But just as sarnold mentioned, this is a highly studied area.
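For comparison, the same "priority first, then date" ordering can be sketched in Python, both with an explicit comparison function and with a tuple sort key. The task data is invented, and the sketch assumes that a lower priority number means a more urgent task (0 being critical).

```python
from datetime import date
from functools import cmp_to_key

tasks = [
    {"name": "write docs", "priority": 1, "due": date(2024, 3, 1)},
    {"name": "fix outage", "priority": 0, "due": date(2024, 3, 10)},
    {"name": "pay invoice", "priority": 1, "due": date(2024, 2, 15)},
]


def compare(a, b):
    """Negative, zero or positive if a sorts before, equal to, or after b."""
    if a["priority"] != b["priority"]:
        return a["priority"] - b["priority"]
    return (a["due"] - b["due"]).days


by_compare = sorted(tasks, key=cmp_to_key(compare))
# Equivalent and usually simpler: compare (priority, due date) tuples.
by_tuple = sorted(tasks, key=lambda t: (t["priority"], t["due"]))

print([t["name"] for t in by_tuple])
```

The critical task comes first even though its due date is the latest; the two normal tasks are then ordered by due date.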

A different way to look at this is as a ranking problem. If you take these two values, weight and priority as inputs, you can create a table of paired comparisons that decompose items into their inputs (weight and priority) and outputs are relative orderings.
Consider, say, item 42 and item 69, denoted X42 and X69: if you have their weights and priority (W42, P42) and (W69, P69), you'd like to know if X42 should appear before X69, after it, or at an equal position. If you have a training set, you can tag whether one is preferred to the other.
What we're lacking here is a method for comparing these. A very simple method is to use logistic regression on the differences, i.e. a simple function f( (W_A - W_B), (P_A - P_B)), or f((W42 - W69),(P42 - P69)), in this case. If the result is above some threshold, then A is preferred to B, otherwise B is preferred to A. You can use this to sort the results.
As usual, most of the results online are not very accessible to beginners. Here's a short chapter that may be helpful in understanding the logistic regression. However, if you'd like to address such matters in more depth, the statistics StackExchange site would be better.
You'll have to decide: (1) if what you're looking at can be decomposed into an additive function of the weight and priority, and, if so, (2) the loss function or objective function that you need to minimize, so that you can get the optimal parameters for this additive function. An ordinal logistic model is one choice, ordinal probit another, and there are tons of other options. If you don't use an additive function (i.e. a linear combination), you'll have a challenging range of possibilities to consider, so it's best to start with something simple.

You can separate the tasks by rating the impact 1-10 (10 being highest) and the output needed 1-10 (10 being hardest).
You add the numbers together and divide by two. The result will be the priority ranking of your task 1-10 (10 being most important).
Example:
Check Emails: impact 2 output 1 = 1.5
Call potential customer: impact 10 output 2 = 6
From this example the calling of the customer would then be placed in a higher priority than checking emails.
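The averaging rule above is simple enough to sketch in a few lines of Python; the function name is made up, and the two tasks are the ones from the example.

```python
def priority_score(impact, output):
    """Average of the impact and output ratings (both on a 1-10 scale)."""
    return (impact + output) / 2


ratings = {
    "Check emails": (2, 1),
    "Call potential customer": (10, 2),
}
ranked = sorted(ratings, key=lambda name: priority_score(*ratings[name]), reverse=True)
print(ranked)
```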
