Retain the value/state of a variable for a particular spout id in Storm (R)

I have defined a single bolt that calculates a certain threshold. The bolt receives data for several values of a field. Is it possible to retain the value/state of a variable for a particular value of that field?
Suppose I have two sets of tuple inputs s$tuple$input:

s$id = "21343254545454354343"    s$tuple$input = ["ABC", 2]
s$id = "45645465645456561234"    s$tuple$input = ["CDE", 5]
Is it possible to retain the value of a variable, say counter = 5 for "ABC" and counter = 9 for "CDE", and update each one only when a tuple with the corresponding id is received?

I haven't used Storm with R, but hopefully the ideas are similar to Java.
You have a few options for storing state:
In worker memory (per bolt)
External store (not within Storm)
What you choose depends on your requirements, but let's assume you're just trying to count words and don't really care if a worker dies. For that, the implementation is simple: just create a private variable in your bolt and keep track.
For example, let's say you have a counts variable:
Map<String, Integer> counts = new HashMap<String, Integer>();
Then, in your bolt's execute method you just check if you've gotten the word before and if so increment the count:
Integer count = counts.get(word);
if (count == null) {
    count = 0;
}
count++;
counts.put(word, count);
Source: WordCountBolt.java
You also want to consider how tuples flow to the workers. You probably don't want to use shuffle grouping anymore; instead, use a fields grouping on the ID so that tuples with the same ID always go to the same bolt task.
Going forward you probably want something more durable (so if you lost a worker then you don't lose all your counts) so you'd probably store your counts in something like HBase.
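The per-key counting pattern above can be written as a self-contained sketch (plain Java, no Storm dependencies; the class and method names are illustrative, not from any Storm API):

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of per-key state kept in worker memory.
// In a real bolt, 'counts' would be a private field and
// track() would be called from the bolt's execute() method.
class PerKeyCounter {
    private final Map<String, Integer> counts = new HashMap<>();

    // Increment and return the count for the given key.
    int track(String key) {
        Integer count = counts.get(key);
        if (count == null) {
            count = 0;
        }
        count++;
        counts.put(key, count);
        return count;
    }
}
```

With a fields grouping on the ID, every tuple for "ABC" reaches the same task, so that task's map stays consistent for "ABC" without any coordination.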


How to count agents and save the number to a variable

I would like to count the number of agents that exit the Sink and save that number continuously to my variable "Loads" while I run the simulation.
(screenshot: https://imgur.com/rAUQ52n)
AnyLogic suggests the following, but I can't seem to get it right:
long count() - returns the number of agents exited via this Sink block.
Hope some of you can help me out.
It should be easy... you don't even need a variable, since sink.count() always has the information you need.
But if you insist on creating one, you can create a variable called counter of type int and in the sink's action just use counter++;

Avoiding salesforce governing limits on soql queries getting group members for each group?

I am working in Apex on the Salesforce platform. I have this loop to grab all group names, Ids, and their respective group members, place them in an object that collects all this info, then put that in a list so I have a list of all groups and all the information I need:
List<groupInfo> memberList = new List<groupInfo>();
for (Id key : groupMap.keySet()) {
    groupInfo newGroup = new groupInfo();
    Group g = groupMap.get(key);
    if (g.Name != null) {
        Set<Id> memberSet = getGroupEventRelations(new Set<Id>{ g.Id });
        if (memberSet.size() != 0) {
            newGroup.groupId = g.Id;
            newGroup.groupName = g.Name;
            newGroup.groupMemberIds = memberSet;
            memberList.add(newGroup);
        }
    }
}
My getGroupEventRelations method is as such:
global static Set<Id> getGroupEventRelations(Set<Id> groupIds) {
    Set<Id> nestedIds = new Set<Id>();
    Set<Id> returnIds = new Set<Id>();
    List<GroupMember> members = [SELECT Id, GroupId, UserOrGroupId
                                 FROM GroupMember
                                 WHERE GroupId IN :groupIds];
    for (GroupMember member : members) {
        if (Schema.Group.SObjectType == member.UserOrGroupId.getSObjectType()) {
            nestedIds.add(member.UserOrGroupId);
        } else {
            returnIds.add(member.UserOrGroupId);
        }
    }
    if (nestedIds.size() > 0) {
        returnIds.addAll(getGroupEventRelations(nestedIds));
    }
    return returnIds;
}
getGroupEventRelations contains a SOQL query, and it is called inside a loop over groups. If someone has over 100 groups with members, or a chain of 100 nested groups inside groups, this is going to hit Salesforce's governor limit on SOQL queries pretty quickly.
I am wondering if anyone knows a way to get the SOQL query out of getGroupEventRelations, so there is no query in the loop. When I want the group members for a specific group, I don't see a way around more loops inside loops, where I'd risk hitting the CPU timeout governor limit instead :(
Thank you in advance for any help!
At large enough numbers there's no solution; you'll run into SOME governor limit eventually. But you can certainly make your code work with bigger numbers than it does now. Here's a quick little cheat to cut nesting 5-fold: instead of looking only at the immediate parent (a single level of children), look for the parent, grandparent, great-grandparent, etc., all in one query.
[SELECT Id, GroupId, UserOrGroupId
 FROM GroupMember
 WHERE (GroupId IN :groupIds
        OR Group.GroupId IN :groupIds
        OR Group.Group.GroupId IN :groupIds
        OR Group.Group.Group.GroupId IN :groupIds
        OR Group.Group.Group.Group.GroupId IN :groupIds
        OR Group.Group.Group.Group.Group.GroupId IN :groupIds)
   AND Id NOT IN :returnIds];
You just got 5 (or is it 6?) levels of children in one SOQL call, so you can support that many times more nest levels now. Note that I added a 'NOT IN' clause to make sure you don't repeat children that you already have, since you won't know which Ids came from the bottom level.
You can also make your very first call for all groups instead of each group at a time. So if someone has 100 groups you'll make just one call instead of 100.
List<Group> groups = groupMap.values();
List<GroupMember> allMembers = [SELECT Id, GroupId, UserOrGroupId FROM GroupMember WHERE GroupId IN :groups];
Lastly, you could query all GroupMembers in a single SOQL call and then iterate yourself. Like you said, you risk running into the 10 second limit here, but if the number of groups isn't in the millions you'll likely be just fine, especially if you do some O(n) analysis and choose good data structures and algorithms. On the plus side, you won't have to worry about SOQL limits regardless of the nesting and the tree complexity. This answer should be very helpful; it does almost exactly what you'd have to do if you pulled all members in one call:
How to efficiently build a tree from a flat structure?
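To illustrate the pull-everything-once approach, here is a sketch (in Java rather than Apex, with illustrative names) of resolving nested group membership in memory from one flat pull of (groupId, userOrGroupId) pairs, instead of issuing one query per nesting level:

```java
import java.util.*;

// Sketch: resolve nested group membership in memory from a single
// flat pull of membership rows. Names and structures are illustrative.
class GroupResolver {
    // groupId -> direct members (user IDs or nested group IDs)
    private final Map<String, List<String>> directMembers;
    // the set of IDs that are themselves groups
    private final Set<String> groupIds;

    GroupResolver(Map<String, List<String>> directMembers, Set<String> groupIds) {
        this.directMembers = directMembers;
        this.groupIds = groupIds;
    }

    // Breadth-first walk: expand nested groups, collect user IDs.
    // The 'seen' set also guards against cycles between groups.
    Set<String> resolveUsers(String rootGroupId) {
        Set<String> users = new LinkedHashSet<>();
        Set<String> seen = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(rootGroupId);
        seen.add(rootGroupId);
        while (!queue.isEmpty()) {
            String gid = queue.poll();
            for (String member : directMembers.getOrDefault(gid, List.of())) {
                if (groupIds.contains(member)) {
                    if (seen.add(member)) {
                        queue.add(member); // nested group: expand later
                    }
                } else {
                    users.add(member); // plain user
                }
            }
        }
        return users;
    }
}
```

The query count is now constant (one pull of the membership table), and the traversal cost is linear in the number of membership rows, which is what keeps the CPU time manageable.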

Dynamodb data model for process/transaction monitoring

I am wanting to keep track of a multi-stage processing job.
I likely just need the following fields:
batchId (guid) | eventId (guid) | statusId (int) | timestamp | message (string)
There are a relatively small number of events per batch.
I want to be able to easily query events that have a statusId less than n (still being processed, or that didn't finish processing).
Would using multiple rows for each status change and querying for the latest status be the best approach? I would use a global secondary index, but statusId does not seem like a good candidate for a hash key (there are fewer than 10 statuses).
Instead of using multiple rows for every status change, if you updated the same event row instead, you could use a technique described in the DynamoDB documentation in the section 'Use a Calculated Value'. Basically this would involve adding another attribute (say 'derivedStatusId') which would be derived by appending a random number to statusId at the time of writing to DynamoDB. For example, for a statusId of 2, derivedStatusId could be one of {"2-00", "2-01", .. "2-99"}. Setting up a Global Secondary Index on derivedStatusId would give you some fan-out that will help in preventing the index from becoming hot.
If you are sure that you will use this index only for unfinished events, then removing the derivedStatusId attribute from the record when it transitions to a finished status will remove it from the index as well, which is a good property if events are expected to finish processing eventually rather than stay around forever. This technique is called a "Sparse Index" and is described in more detail here.
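The derivation above can be sketched as follows (a minimal illustration of the technique, with hypothetical names; a real implementation would sit in your write path):

```java
import java.util.concurrent.ThreadLocalRandom;

// Sketch of the 'calculated value' technique: append a random
// two-digit suffix to statusId so the GSI partition key fans out
// across up to 100 partitions instead of fewer than 10.
class DerivedStatus {
    // Called when writing an event row.
    static String derive(int statusId) {
        int suffix = ThreadLocalRandom.current().nextInt(100);
        return statusId + "-" + String.format("%02d", suffix);
    }

    // To read back all events at a given status, issue one Query
    // per possible key ("2-00" .. "2-99") and merge the results.
    static String[] allKeysFor(int statusId) {
        String[] keys = new String[100];
        for (int i = 0; i < 100; i++) {
            keys[i] = statusId + "-" + String.format("%02d", i);
        }
        return keys;
    }
}
```

The fan-out factor (here 100) is a tuning knob: a larger factor spreads writes more evenly but multiplies the number of queries needed to read a full status back.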
From your question, it seems that keeping a status history is a desired property (I assume this because you want multiple rows for status changes). Consider putting this historical information in the same row: DynamoDB supports list data types and has a generous 400 KB item limit, which may be enough to capture all the desired history in the same record.

In a Drupal Views argument, how can I get the total number of nodes in a nodequeue?

I'm working on a site that is a database of thousands of record albums, and I am attempting to create an "album of the day" block. My solution has been to create a nodequeue of specific records, create a view that passes the current day of the year as an argument, and then use this value to call the nodequeue item at that same numbered position. I do this by providing a "Default Argument" as PHP code in the "Nodequeue: Position" argument setting. Here's the code I use:
$nodequeueTotalNodes = 120;
$dayOfTheYear = date("z");
$nodeQueuePosition = $dayOfTheYear % $nodequeueTotalNodes;
return $nodeQueuePosition;
The above code works to my satisfaction but my problem is I have to manually change the value of $nodequeueTotalNodes every time I add or remove an item from my nodequeue.
Is there a way to pull the total number of nodes from my queue to replace the "120" in my code above?
The nodequeue_nodes table holds all the nodes in your queues. Something like this should do the trick, where qid is the queue id:
$nodequeueTotalNodes = db_result(db_query('SELECT COUNT(nid) FROM {nodequeue_nodes} WHERE qid = %d', $qid));
If you're using a subqueue there's a column called sqid which you can use.

AdvancedDataGrid (grouping) quick jump to row

I have a problem with the AdvancedDataGrid widget. When the dataProvider is an ArrayCollection (of arrays), the nth array (within the collection) is also the nth row within the grid, and I can jump to and display the i-th row by scripting:
adg.selectedIndex = i;
adg.scrollToIndex(i);
Now, when I add a grouping, the dataProvider ends up being a GroupingCollection2, and the index in the dataProvider's source no longer corresponds to the index in the ADG (which is understandable, because the data is grouped).
How can I select and display a row in grouped data efficiently? Currently, I have to traverse the adg and compare each found item with its data attributes in order to find the correct index of the row within the adg, and jump to it like above. This process is very slow. Any thoughts?
edited later:
We already used a caching object as Shaun suggests, but it still didn't compensate for the search times. To fully construct an ordering of a list of items (which this problem amounts to, since the list is completely reordered by the grouping), you always have to know the entire set. In the end we didn't solve the problem; the project is over now. I will accept Shaun's answer if no one knows a better way within three days.
Depending on what values you're comparing against, you can store the objects in a dictionary keyed by the property (or properties) that would be searched for. This gives you a constant-time lookup for the object (no need to look at every single item). Say, for example, you're using a property called id on each object; then you can build a lookup object like this:
var idLookup:Object = {};
for each (var myObject:Object in objects)
{
    idLookup[myObject.id] = myObject;
}
// Say you want multiple properties:
// idLookup[myObject.id] = {};
// idLookup[myObject.id][myObject.otherProp] = myObject;
Now, say the user types in an id: you index into idLookup with it and retrieve the object:
var myObject:Object = idLookup[userInput.text];
myAdg.expandItem(myObject, true);
expandItem() is documented in the AdvancedDataGrid reference:
http://help.adobe.com/en_US/FlashPlatform/reference/actionscript/3/mx/controls/AdvancedDataGrid.html#expandItem()
I haven't done any thorough testing of this directly, but I use a similar concept for doing quick look-ups for advanced filtering. Let me know if this helps or is going in the wrong direction. Also, if you could clarify a bit more what types and number of values you need to look up, and whether multiple matches are possible, I may be able to provide a better answer.
Good luck,
Shaun
