Aggregations in MarkLogic 8 Java - XQuery

I'm trying to group all the documents based on an element value. In XQuery, I'm able to get each element value and its corresponding count, but with the Java API I'm not able to do that.
XQuery:
for $name in distinct-values(doc()/document/element_name)
return fn:concat("Element Value:", $name, ", Count:", fn:count(doc()/document[element_name eq $name]))
Output:
Element Value:A, Count:100
Element Value:B, Count:200
Java:
QueryManager qryMgr = client.newQueryManager();
StructuredQueryBuilder qb = new StructuredQueryBuilder();
StructuredQueryDefinition querydef = qb.containerQuery(qb.element("element_name"), qb.term("A"));
SearchHandle handle = new SearchHandle();
qryMgr.search(querydef, handle);
System.out.println(handle.getTotalResults());
With this method, I'm able to get the document count only for one particular value. Is there any way to get the counts for all distinct values? Kindly help!

If I understand your use case correctly, you want to know all the values of a particular element and how many documents have each value. You can use a range index to solve this problem; that's exactly what a range index is for.
Try adding a range index on "element_name"; you can use the ML Admin app for that: go to your database and click on Element Range Indexes.
In XQuery, you can then do something like this:
for $val in cts:element-values(xs:QName("element_name"))
return text{$val, cts:frequency($val)}
With the Java Client, you can do the same by adding a range-based constraint and a values definition to a search options file; the values response from QueryManager will then contain all of the values and frequencies that match your query. Check the REST API docs for constructing such a search options file. A sketch of the Java side follows.
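As a minimal sketch, assuming search options named "valuesoptions" are already installed on the server with a values entry named "element_name" backed by the range index (both names are placeholders, not part of the question):

import com.marklogic.client.io.ValuesHandle;
import com.marklogic.client.query.CountedDistinctValue;
import com.marklogic.client.query.QueryManager;
import com.marklogic.client.query.ValuesDefinition;

QueryManager qryMgr = client.newQueryManager();
// "element_name" is the values entry defined in the installed "valuesoptions" options
ValuesDefinition vdef = qryMgr.newValuesDefinition("element_name", "valuesoptions");
ValuesHandle results = qryMgr.values(vdef, new ValuesHandle());
for (CountedDistinctValue value : results.getValues()) {
    // getCount() is the frequency: how many matches carry this value
    System.out.println("Element Value:" + value.get("xs:string", String.class)
            + ", Count:" + value.getCount());
}

This mirrors the XQuery output above: one line per distinct value with its frequency.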

Related

Kibana: How to aggregate for all UUIDs

I am tracking my URL hit counts and want to aggregate them.
I have a few URLs as follows:
example.com/service/{uuid}
When I view this in Kibana, it lists the total hit count of each URL individually, so my table has something like:
example.com/homepage 100 count
example.com/service/uuid1 10 count
example.com/service/uuid2 5 count
Is there an easy way to combine all the UUIDs into one entry?
I was thinking of replacing the UUIDs with a static string; however, the admins blocked regex support, making that replacement very difficult. So I am trying to see if there is any other way before doing that.
Thanks!
I would suggest creating a new field using scripted fields.
The new field would return the value example.com/service/{uuid} if the URL contains a UUID; otherwise it would return the URL as is.
Then you could run the aggregation on the new field, as in the sketch below.
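A minimal sketch of such a scripted field in Painless (Kibana's Java-like scripting language), assuming the URL is indexed in a keyword field named url.keyword; the field name and the '/service/' marker are assumptions you would adjust to your mapping:

// Guard against documents that lack the field
if (doc['url.keyword'].size() == 0) {
    return null;
}
def url = doc['url.keyword'].value;
// Collapse every /service/{uuid} hit into a single bucket
if (url.contains('/service/')) {
    return 'example.com/service/{uuid}';
}
return url;

A terms aggregation (or a Kibana data table) on this scripted field then reports one combined count for all UUID URLs.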

TypeError: Cannot read property "data" from undefined

There is a many-to-many relation between two record types, countries and clients. When I fetch some records from the clients (an array of clients) and try to assign them programmatically to a country record like this: record[clientsRelationName] = clients, I get the following bizarre error: TypeError: Cannot read property "data" from undefined. I know for sure that the variable clientsRelationName is a string that corresponds to the name of the relation, which is simply called clients, and it has nothing to do with a variable called data; in fact, data doesn't exist. I also know for sure that record is a defined variable.
Any idea why this is happening? Is it a bug?
I have seen this issue, where using Object.keys() on a server-side record yields [key, data, state] instead of the expected fields for that record. So if your programmatic assignment involves iterating over the properties of that record object, you may be hitting this data property.
Unfortunately that's all I know so far. Maybe the App Maker Team can provide further insight.
As you pointed out in your question, clientsRelationName is a string corresponding to the name of the relation. Your actual relation is just Clients, therefore either of the following should work:
record['Clients'] = clients;
or
record.Clients = clients;
I would actually suggest using record.YourRelation, because when you type the dot after your record, IntelliSense will automatically bring up all the field names and relation end names available on that record.
After a lot of trial and error, I finally found a way to make it work using a rather simple solution, and it's the only way I could make it work. To avoid getting this strange error when modifying an association on a record (TypeError: Cannot read property "data" from undefined), I did the following:
Loop through the record's relation (an array) and pop every record inside it. Then loop through the other records that you want to assign to the relation (to modify the association), pushing each element onto the record's relation.
// First, empty the existing association
var length = record[relationNameAsVariable].length;
for (var i = 0; i < length; i++) {
  record[relationNameAsVariable].pop();
}
Now record[relationNameAsVariable] is empty, so do the following:
// Re-populate the association with the new records
for (var i = 0; i < clientsArray.length; i++) {
  record[relationNameAsVariable].push(clientsArray[i]);
}
It could be a bug or something else that I'm doing wrong when trying to replace the whole association. I'm not sure. But this works like a champ.

How can I be notified of an index being updated on DynamoDB?

I have a table (key=username, value=male or female) and an index on the values.
After I add an item to the table, I want to update the counts of males and females. However, after a successful write, as the index is a Global Secondary Index, the count query is not consistent.
Is there a way (dynamo db Streams, Lambda, ...) to monitor when the index is up to date?
Note that I'm not looking for a solution that involves something else (keeping count of increments in Redis or ...); what I describe here is a simplified problem, specifically to ask how I can monitor an index in DynamoDB.
Thanks!
I am not sure if there is any mechanism currently provided to check this, but you can easily work around it by adding a single option to your query:
ConsistentRead = True
DynamoDB has a parameter which, when set to true, makes sure that you read the latest updated value. Now, when you add or update an item and then query the data with the ConsistentRead option, you will get the latest count value. One caveat: strongly consistent reads are only supported on base tables and local secondary indexes, not on global secondary indexes, so for a GSI you would have to query the base table instead. A sketch is below.
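A minimal sketch with the AWS SDK for Java v2, assuming a table named users keyed on username (the table, key, and value names here are placeholders, not from the question):

import java.util.Map;
import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.QueryRequest;
import software.amazon.awssdk.services.dynamodb.model.QueryResponse;

DynamoDbClient ddb = DynamoDbClient.create();
QueryRequest request = QueryRequest.builder()
        .tableName("users")                       // placeholder table name
        .keyConditionExpression("username = :u")
        .expressionAttributeValues(
                Map.of(":u", AttributeValue.builder().s("alice").build()))
        .consistentRead(true)                     // strongly consistent: base table or LSI only
        .build();
QueryResponse response = ddb.query(request);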
Here is the reference link.
If you are able to accomplish using other technique then please do share it.
Hope that helps.

How can I look for objects with missing value or None as key?

I would like to perform a search on the Zope catalog for objects with missing index key values. Is that possible?
For example consider the subsequent code lines:
from Products.CMFCore.utils import getToolByName
catalog = getToolByName(context, 'portal_catalog')
results = catalog.searchResults({'portal_type': 'Event', 'review_state': 'pending'})
What should I do if I'm interested in objects for which a certain index value (rather than portal_type or review_state) has never been set?
You can search for both cases, but searching for MissingValue entries requires custom handling of the internal catalog data structures.
Indexes take the value from an object and index that. If there is an AttributeError or similar, the index stores nothing for that object, and if the same field is part of the returned metadata columns, a MissingValue is used to indicate that the index is empty for that field.
In the following examples I assume you have a variable catalog that points to the site's portal_catalog tool; e.g. the result of getToolByName(context, 'portal_catalog') or similar.
Searching for None
You can search for None in many indexes just fine:
catalog(myKeywordIndex=None)
The problem is that most index types ignore None as a value. Searching for None will therefore fail on Date and Path indexes (they ignore None when indexing) and on Boolean indexes (they turn None into False when indexing).
Keyword indexes ignore None as well, unless it is part of a sequence. If the indexed method returns [None] it'll happily be indexed, but None on its own won't be.
Field indexes do store None in the index.
Note that each index can show unique values, so you can check if there are None values stored for a given index by calling:
catalog.uniqueValuesFor(indexname)
Searching for missing values
This is a little trickier. Each index does keep track of what objects it has indexed, to be able to remove data from the index when the object is removed, for example. At the same time, the catalog keeps track of what objects it has indexed as a whole.
Thus, we can calculate the difference between these two sets of information. This is what the catalog does all the time when you call the published APIs, but for this trick there is no such public API. We'll need to reach into the catalog internals and grab these sets for ourselves.
Luckily, these are all BTree sets, and the operations are thus relatively efficient. Here is how I'd do it:
from BTrees.IIBTree import IISet, difference

def missing_entries_for_index(catalog, index_name):
    # Return the difference between catalog and index ids
    index = catalog._catalog.getIndex(index_name)
    # referencedObjects() works with any UnIndex-based index
    referenced = IISet(index.referencedObjects())
    return (
        difference(IISet(catalog._catalog.paths), referenced),
        len(catalog) - len(referenced)
    )
The missing_entries_for_index function returns an IISet of catalog ids and its length; each id is a pointer to a catalog record for which the named index has no entry. You can then use catalog.getpath to turn an id into the full path of the object, catalog.getMetadataForRID to get a dictionary of metadata values, catalog.getobject to get the original object itself, or catalog._catalog[] to get catalog brains.
The following method will give you a catalog result set, just like you would get from a regular catalog search:
from Products.ZCatalog.Lazy import LazyMap

def not_indexed_results(catalog, index_name):
    rs, length = missing_entries_for_index(catalog, index_name)
    return LazyMap(catalog._catalog.__getitem__, rs.keys(), length)
Thanks Ago. Actually, reading the link you suggested, I discovered that it's not possible without a trick. Quoting from PyPI:
Note that negative filtering on an index still restricts items to those having a value in the index. So with 10 documents, 5 of them in the foo index with a value of 1, a query for not 1 will return no items instead of the 5 items without a value. You need to index a dummy/default value if you want to consider all items for a particular index.
So it is necessary to give your items a default value and search for that.

Apachesolr query - filter by text in fields

I'm constructing an apachesolr query in my Drupal module programmatically and have had success with some aspects, but am still struggling with others.
So far, I have been able to construct a query that searches for specific text and limits the results based on taxonomy terms, using the following code:
$subquery_region->addFilter('tid', $term->tid);
$query->addFilterSubQuery($subquery_region, 'OR', 'AND');
What I'd like to achieve next is to narrow the search further by adding a filter that finds certain text within a specific field of the node. Has anyone been able to do this?
I've been researching online and have tried lots of different approaches, such as adding the filter directly to the main search query:
$query->addParam('fl', 'ss_my_field');
$query->addFilter("ss_my_field", "field_substring_to_search_for");
As well as breaking that out into a subquery added to the main search query:
$subquery_test = apachesolr_drupal_query("Test");
$subquery_test->addParam('fl', 'ss_my_field');
$subquery_test->addFilter("ss_my_field", "field_substring_to_search_for");
$query->addFilterSubQuery($subquery_test, 'OR', 'AND');
But none of these are working. They return an empty result set, even though I know the substring exists in the field I'm filtering on and that field has been indexed. I have verified through the apachesolr views module that the search index has been populated with that field, and I can see the substring exists.
Is there anything wrong with my syntax or the way I'm building the query?
If you know how to add filters for searching for text within certain fields, please share! It may not even be done with the addFilter function; that's just all I have tried so far.
Thanks!
First you have to index that specific field, by implementing hook_apachesolr_update_index() in your module:
// Implements hook_apachesolr_update_index().
function mymodule_apachesolr_update_index(&$document, $node) {
  $document->ss_your_field_name = $node->your_field_name;
}
where the prefix encodes the Solr field type:
ss_* -> String
is_* -> Integer
im_* -> Integer, multivalued
After that you have to:
1. delete the index - admin/settings/apachesolr/index
2. re-index the content
3. run the cron
4. check the filter that you created - admin/reports/apachesolr/index
Then, you can add filters
$query->addFilter("ss_your_field_name", "value");
Hope this helps you.
// Implements hook_apachesolr_modify_query().
function mymodule_apachesolr_modify_query(&$query, &$params, $caller) {
  $subquery = apachesolr_drupal_query();
  $subquery->add_filter("ss_my_field", "field_substring_to_search_for");
  $query->add_subquery($subquery, "AND");
}
