OptaPlanner - Constraint streams groupBy - constraints

I am trying to solve a scheduling problem, which centers around the following arrangement:
Equipment <- Task <- ShiftAssignment(Planning Variable) -> Shift
Tasks have a problem fact of their equipment usage; each may reference a specific Equipment instance or instances, with an associated time in minutes. It takes up that time for that equipment in the Shift it's assigned to.
Should I be able to achieve the constraint with join() and groupBy()? I've tried pursuing the following route:
private Constraint doNotOverbookEquipment(ConstraintFactory factory) {
    return factory.from(ShiftAssignment.class)
            // join shift assignments with the shifts they are assignments of
            .join(Shift.class, equal(ShiftAssignment::getShiftId, Shift::getId))
            // join with ALL pieces of equipment
            .join(Equipment.class)
            .groupBy([Shift and Equipment, summing the equipment-usage for each ShiftAssignment])
            .filter([equipment usage greater than a constant])
            .penalizeConfigurable("do not overbook Equipment");
}
I think the filter() should be no problem, but unsure exactly how to get this groupBy() to achieve what I want. Do I need a TriConstraintCollector here? Or is there a different, better overall approach?
For reference, the ShiftAssignment class can easily have a method like the following:
public LinkedHashMap<Equipment, Integer> getEquipmentUsage()

I think that your groupBy() would look something like this:
.groupBy(
        (shiftAssignment, shift, equipment) -> shift,
        (shiftAssignment, shift, equipment) -> equipment,
        ConstraintCollectors.sum((shiftAssignment, shift, equipment) ->
                shiftAssignment.getUsage(shift, equipment)))
The actual logic of what is to be summed should, in this case, be implemented in shiftAssignment.getUsage(...) or any other method you choose to use there.
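To make the intended grouping concrete, here is a minimal sketch in plain Python (not OptaPlanner code) of what the groupBy() plus filter() combination computes: total minutes per (shift, equipment) pair, keeping only the pairs above a limit. The sample data, the 480-minute limit, and the function name are assumptions made up for illustration. Unlike the join with ALL equipment, this sketch only visits equipment that an assignment actually uses.

```python
from collections import defaultdict

# Hypothetical sample data: one (shift_id, {equipment_id: minutes}) entry per assignment.
assignments = [
    ("shift-1", {"lathe": 200, "drill": 60}),
    ("shift-1", {"lathe": 300}),
    ("shift-2", {"lathe": 120}),
]

CAPACITY_MINUTES = 480  # assumed limit per equipment per shift


def overbooked(assignments, capacity):
    """Sum usage per (shift, equipment) and keep the pairs that exceed capacity."""
    totals = defaultdict(int)
    for shift_id, usage in assignments:
        for equipment_id, minutes in usage.items():
            totals[(shift_id, equipment_id)] += minutes
    return {key: total for key, total in totals.items() if total > capacity}


result = overbooked(assignments, CAPACITY_MINUTES)
```

In the constraint stream, the dictionary key corresponds to the two group keys (shift and equipment) and the running total corresponds to the sum collector.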

Related

Gremlin - optimize query

I have a graph, that represents database objects, parent-child relations and dataflows relations (only in-between columns).
Here is my current gremlin query (in python), that should find dataflow impact of a column:
g.V().has('fqn', 'some fully qualified name').
repeat(outE("flows_into").dedup().store('edges').inV()).
until(
or_(
outE("flows_into").count().is_(eq(0)),
cyclicPath(),
)
).
cap('edges').
unfold().
dedup().
map(lambda: "g.V(it.get().getVertex(0).id()).in('child').in('child').id().next().toString() + ',' + g.V(it.get().getVertex(1).id()).in('child').in('child').id().next().toString()").
toList()
This query should return all edges that are somehow impacted by the initial column.
The problem is that in some cases I do not care about column-level detail and want the edges at the 'schema level'. That is what the lambda does: for both vertices of the edge, it traverses two steps up the object tree, which returns the schema node.
The problem is in this lambda function - I cannot just do this:
it.get().getVertex(1).in('child').in('child').id().next().toString()
because getVertex(1) does not return a traversable instance. So I need to start new traversal by g.V().... By my debugging, this line causes the horrible slowdown. It gets about 50x slower if I leave this transformation in.
Do you have any ideas how to optimize this query?
You might consider not using a lambda at all, given they tend to not be portable between implementations. Perhaps the map step could be replaced with a project step something like:
project('v0','v1').
  by(outV().in('child').in('child').id()).
  by(inV().in('child').in('child').id())

Create Simple join in Entity Framework Core

I am trying to do something simple, something I've done countless times in SQL. I have Table A, which has an id column as the primary key that I'm not using here. Table B also has one, also not used for this.
I want to get all the rows in B, that match on a column that is in A (int) and have them show up in that entity.
Thus: Let's for sake of argument say that both tables have a column called "Joiner", If I was writing this in SQL, I'd say:
Select A.*, B.* from TableA A
Join TableB B on A.Joiner = B.Joiner
Easy right? Couldn't be a more common thing unless it was A.Id... But it isn't
So, in Entity Framework Core, I want to get an A by id, and .Include B.
Now, I get A.Bs which is a list of all the B's that match the A's on their Joiner fields.
Super Easy right?
I'm using the EntityMappingConfiguration<A> and EntityMappingConfiguration<B> to define the mapping, Overriding the 'Map()' Method, so things that tell me how to modify that would be most helpful.
I've been able to have it create mappings that are looking for B.Joiner = A.Id, And things that are looking for an array of B to exist in A (that just doesn't even make sense)...
Oh, there are MANY B's that match A's value, and could be that same value in many other A's (From what I can tell it is a Many to Many relationship, except that you would never ask for B's matching A records.) The idea being that we can update A's Joiner value, and have it switch to a different B, and thus, be able to pick up other configuration columns in B..
This just seems like it should be a slam dunk, yet none of the Entity Framework (Core) gurus seem to be able to do this.
In reference to code examples, I have about 200 things that don't work. Here is one example:
public class TableAMap : EntityMappingConfiguration<TableA>
{
    public override void Map(EntityTypeBuilder<TableA> b)
    {
        b.ToTable("TableA");
        b.HasKey(x => x.Id);
        b.Property(x => x.Id).UseIdentityColumn();
        b.HasMany(x => x.TableBs)
            .WithOne()
            .HasForeignKey(x => x.Joiner);
    }
}
The class TableA has a Joiner field that is an Int and a Joiners field that is public virtual ICollection Joiners{ get; set; }, (In my case, I'm calling TableB the same as the field, so both would be 'Joiner' in this case.) But it wants to use the Id/TableAId to join. I've tried using HasPrincipalKey () - which seems like exactly what I'm wanting to do, except that there is a comment on the documentation that says "if there is not a unique constraint, it will add one" - sigh, there are multiple duplicate values, so nope, not unique, so please don't automatically add the constraint!
I'll try to write up an example that is simple (and isn't code that I can't share)

Dynamodb: Index on List attribute and query NOT_CONTAINS

I am trying to figure out (at this point I think the answer is no) whether it is possible to build an index on a List attribute and query NOT_CONTAINS on that attribute.
Example table:
Tasks
Task_id: string
solved_by: List<String> # stores list of user_ids who previously solved this task.
My query would be:
Get me all the tasks not yet solved by current_user
select * from tasks where tasks.solved_by NOT_CONTAINS current_user_id
Is it possible to do this without full scans? I tried creating an attribute of type L, but the AWS CLI errors out saying: Member must satisfy enum value set: [B, N, S]
If this is not possible with dynamodb, please suggest what datastore I can use.
Any help is highly appreciated. Thanks!
As you found out, and as the error you got suggests, this is NOT possible.
However, I'd question whether your design could be improved. Storing a potentially unbounded list of entries (users in your case) inside a single item, which is limited to 400 KB, seems dangerous.
If you instead stored, for each task, the fact that a particular user solved it as a separate item (partition key: task_id, sort key: user_id), then you could easily look up whether a user solved a task. You could also store additional information about the particular solution or attempts.
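As a rough illustration of the item-per-solution layout, the sketch below uses a plain Python dictionary as a stand-in for the table; it is not DynamoDB client code. The function names and the sample attributes are hypothetical. The point is only that a composite key of (task_id, user_id) turns "did this user solve this task?" into a single key lookup.

```python
# In-memory stand-in for a table keyed by (task_id, user_id).
# Each entry plays the role of one item; the value holds extra attributes.
solutions = {}


def record_solution(task_id, user_id, **details):
    """Store one item per (task, user) pair, with optional extra attributes."""
    solutions[(task_id, user_id)] = details


def has_solved(task_id, user_id):
    """Single key lookup, analogous to reading an item by its full primary key."""
    return (task_id, user_id) in solutions


record_solution("task-1", "user-42", attempts=3)
record_solution("task-2", "user-7")
```

Note that this only answers the positive question for a known task and user; it does not by itself list every task a user has not solved.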
If you haven't heard of DynamoDB single table design yet, or how to overload indexes, I can recommend looking at
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-modeling-nosql-B.html
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-gsi-overloading.html
https://www.dynamodbbook.com/
Update
I just realised, you care about a negation (NOT_CONTAINS) - for those, you can't use an index anyway. For the sort key you can only use positive comparison (=, <, >, <=, >=, between, begins_with): https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Query.html#Query.KeyConditionExpressions
So you might have to rethink the whole approach, to better pre-process the data stored in DDB, so it's easier to fetch, or pick a different database.
In your original question, you defined your access pattern as
Get me all the tasks not yet solved by current_user
In a later comment, you clarified that the access pattern is
A solver should be shown a task that is not yet solved by them.
which is a slightly different access pattern.
Here's one way you could fetch a task not yet solved by a user.
In this data model, I chose to model Users and Tasks as separate items. Tasks have numerically increasing IDs. Each User item should start with the lastSolved attribute set to 1. Each time you fetch a new Task for a user, you fetch TASK#{lastSolved+1} and increment the lastSolved attribute by 1.
You could probably take a similar approach by using timestamps instead of numbers... anything sortable, really.

Linq 'contains' query taking too long

I have this query:
var newComponents = from ic in importedComponents
where !existingComponents.Contains(ic)
select ic;
importedComponents and existingComponents are of type List<ImportedComponent>, and exist only in memory (are not tied to a data context). In this instance, importedComponents has just over 6,100 items, and existingComponents has 511 items.
This statement is taking too long to complete (I don't know how long, I stop the script after 20 minutes). I've tried the following with no improvement in execution speed:
var existingComponentIDs = from ec in existingComponents
select ec.ID;
var newComponents = from ic in importedComponents
where !existingComponentIDs.Contains(ic.ID)
select ic;
Any help will be much appreciated.
The problem is the quadratic complexity of this algorithm. Put the IDs of all existingComponents into a HashSet and use the HashSet.Contains method. It has O(1) lookup cost, compared to O(N) for Contains/Any on a list.
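The same idea, sketched in Python for illustration, with a built-in set standing in for the C# HashSet. The ImportedComponent fields and the function name are assumptions, not the asker's actual types. The existing IDs are collected once, so each membership test is a hash lookup rather than a scan of the whole list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImportedComponent:
    # Hypothetical shape; only the id matters for the comparison.
    id: int
    name: str


def new_components(imported, existing):
    """Return imported components whose id is not among the existing ones."""
    existing_ids = {c.id for c in existing}  # built once, hash-based lookups
    return [c for c in imported if c.id not in existing_ids]


existing = [ImportedComponent(1, "bolt"), ImportedComponent(2, "nut")]
imported = [
    ImportedComponent(2, "nut"),
    ImportedComponent(3, "washer"),
    ImportedComponent(4, "gear"),
]
fresh = new_components(imported, existing)
```

Building the set costs one pass over the existing list, after which the filter is a single pass over the imported list.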
The morelinq project contains a method that does all of that in one convenient step: ExceptBy.
You could use Except to get the set difference:
var existingComponentIDs = existingComponents.Select(c => c.ID);
var importedComponentIDs = importedComponents.Select(c => c.ID);
var newComponentIDs = importedComponentIDs.Except(existingComponentIDs);
var newComponents = from ic in importedComponents
join newID in newComponentIDs on ic.ID equals newID
select ic;
foreach (var c in newComponents)
{
// insert into database?
}
Why is LINQ JOIN so much faster than linking with WHERE?
In short: the Join method can set up a hash table to use as an index to quickly zip two tables together.
Well, based on the logic and numbers you provided, you are performing up to 6,100 × 511 = 3,117,100 comparisons when you run that statement. That is not entirely accurate, because the condition may be satisfied before the whole list has been scanned, but you get my point.
With collections this large, you are going to want to use a collection where you can index your key (in this case your component ID) to reduce the overhead of the search. The thing to remember is that even though LINQ looks like SQL, there are no magic indexes here; it is mainly for convenience. In fact, I have seen articles where a LINQ lookup is actually slightly slower than a brute-force lookup.
EDIT: If it is possible I would suggest trying a Dictionary or SortedList for your values. I believe either one would have slightly better lookup performance.

Understanding HQL queries on collection objects

This is similar to a question I asked earlier. The answers to that question partially solved my issue, but I'm still having some issues in trying to perform the kind of search I specified there; furthermore, I'm simply having trouble understanding how Hibernate chooses what to return in different scenarios.
Here's my mapping:
Client {
    @OneToMany(mappedBy="client", cascade=CascadeType.ALL)
    private Set<Group> groups = new HashSet<Group>();
}
Group {
    @ManyToOne(cascade=CascadeType.ALL)
    private Client client = new Client();
    private String name;
    private String state; // two-char state code
    private String extId; // unique identifier; candidate key, but not the @Id.
}
Queries by name are inline (e.g., like with wildcards on both ends of the param); state and extId are by equality.
The following query returns a single client, with only the matching group attached, even if other groups are associated to the client (note again that extId will only return one group):
select distinct client from Client as client
inner join client.groups as grp
where grp.extId = :extId
This query returns a single client, but with all associated groups attached, regardless of whether the group's state code matches the criteria:
select distinct client from Client as client
inner join client.groups as grp
where grp.state= :state
Finally, this query returns a separate copy of the client for each matched group, and each copy contains all of its associated groups, regardless of whether the group's name matches the criteria:
select distinct client from Client as client
inner join client.groups as grp
where grp.name like :name
I'm new to Hibernate, and I'm finding it immensely frustrating that I'm unable to predict what is going to be returned from a given query. All three queries are nearly identical, except for some small changes in the WHERE clause, yet I get radically different results for each. I'd spent time reviewing the documentation, but I'm missing wherever this behavior is explained. Can anyone help shed some light on this?
Finally, what I really need to do is to return Clients when querying by Group, and have the client only contain the Groups which match the search criteria. Is there a single-shot way I can construct an HQL query to do so, or will I have to do multiple queries and build my objects up in code?
Thanks.
The answer to this is twofold. One, there was a problem with the test harness, which was (sensibly) using transaction rollback to create test instances without leaving artifacts in the database. This was the source of my odd responses in the queries.
I managed to return just the values I wanted in the collections by simply changing to an outer fetch join.
