How to get the total count of unique values of a field in Solr - SolrCloud

I want to get the total count of unique values of a field in Solr.
My original requirement is like this:
I have a collection in SolrCloud, spread across multiple shards, from which I have to get the list of unique values of a field (let's say abc) that has duplicate values. The search results should be sorted so that the value with the most occurrences is at the top and the one with the fewest is at the bottom.
I have been able to achieve this requirement with the query below, and it works fine.
http://localhost:8983/solr/secondcol/select?q=test&wt=json&indent=true&facet=true&facet.field=abc&facet.mincount=1&rows=0&facet.offset=0&facet.limit=10
But I am not able to get the total number of facet values returned here.
Is there a way to do this?
I am doing this in Solr 4.10 and 5.3.

You could specify facet.limit=-1 and count the number of terms returned.
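For example, reusing the query from the question, that would look roughly like this:
http://localhost:8983/solr/secondcol/select?q=test&wt=json&indent=true&facet=true&facet.field=abc&facet.mincount=1&rows=0&facet.offset=0&facet.limit=-1
Counting the entries under facet_counts/facet_fields/abc in the response then gives the number of unique values, though returning every term can get expensive for high-cardinality fields.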
In Solr 5.3 there is the JSON Facet API, which has a parameter numBuckets that seems to do what you want, but I do not know whether that works properly in SolrCloud.
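As a sketch of the JSON Facet API variant (same collection, query and field as above; the abc_values label is arbitrary and the request is untested, so treat the exact syntax as an approximation):
curl http://localhost:8983/solr/secondcol/select -d 'q=test&rows=0&wt=json&json.facet={
  abc_values: {
    type: terms,
    field: abc,
    limit: 10,
    numBuckets: true
  }
}'
With numBuckets:true, the abc_values section of the response should also report a numBuckets value, i.e. the total number of distinct terms that matched, independent of the limit.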

Related

Gremlin Graph Search Query: What Percent of my Workspace has been Filled Out?

I recently began using the Enterprise Architecture tool Ardoq for metadata modeling. I have roughly 55 workspaces set up, all of which have differing numbers of fields and have been filled out to differing degrees.
What I need to do is create a Gremlin query that will tell me what percent of a given workspace has been filled out i.e. count of all custom fields in workspace with data entered / count of total custom fields within the workspace. There are a number of default fields that are automatically filled in, and I would not like to include these. That's what I've tried to do with 'by(properties().not(hasKey('field1','field2',...)' below.
The goal is to create a report for every workspace, then aggregate these reports within a dashboard. So, the ideal query will have minimal inputs that need to be changed when calculating this percentage for separate workspaces. I believe that the only input that should need to be changed is the workspace ID, and that is what I have tried in my query below. However, while the query below was accurate for 1 of my workspaces, it is anywhere from 10% to 50% off for all others. Any suggestions?
My Query:
totalFieldCount = g.V().
group().
by('typeId').
by(properties().not(hasKey('_id','Ardoq-ID','last-updated-date','last-updated-by','parent','lastModifiedByName','createdByEmail','type','name','created','rootWorkspace','typeId','created-by','created-date','last-modified-by','lastUpdated','lastModifiedByEmail',
'component-key','entity-type','createdByName','model','description','_order','children','incomingReferenceCount','outgoingReferenceCount','persistent','_version','ardoq','icon','version','target','origin','image','shape','id','order','targetWorkspace','isPublic','returnValue','displayText','source','ardoq-entity-type','color','mustBeSaved','tags','lock')).label().dedup().count()).next()
g.V().has('rootWorkspace', '3907706fb235eda5ec15fb89').
project('fieldsOnThisComponent', 'allFields').
by(properties().not(hasKey('_id','Ardoq-ID','last-updated-date','last-updated-by','parent','lastModifiedByName','createdByEmail','type','name','created','rootWorkspace','typeId','created-by','created-date','last-modified-by','lastUpdated','lastModifiedByEmail','component-key','entity-type','createdByName','model','description','_order','children','incomingReferenceCount','outgoingReferenceCount','persistent','_version','ardoq','icon','version','target','origin','image','shape','id','order','targetWorkspace','isPublic','returnValue','displayText','source','ardoq-entity-type','color','mustBeSaved','tags','lock')).count()).
by(map{ totalFieldCount[it.get().value('typeId')] }).
math('100 * fieldsOnThisComponent / allFields').
mean().
map{ it.get().round(2) }

Aggregation tool in BIRT : display a count like number on a table column

It must be very easy, but I can't display a very simple filtered aggregation in a table header row in BIRT 3.7. I managed to use a count aggregation on group headers or footers, but not a filtered aggregation on a simple table column.
Use case: my SQL statement can return the string value "ERROR..." for a string field named TEST. The query returns 734 results. My table displays all the results.
In the header row I just want to display what would be, in SQL, a count with LIKE 'ERROR%'.
I can't manage to do that with the aggregation tool!
[screenshot: aggregation builder]
Many thanks for your help.
Julien
As I can't manage to find the correct way of filtering my aggregation, I provide the results from my SQL request. But I'd be glad to find a dynamic way of filtering it.
Here is an image of a column that contains strings with "ERR". The top field in red is my aggregation field, served by a SQL LIKE '%ERR :%' statement.
[screenshot: example]
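In SQL terms, the number that should appear in the header row corresponds roughly to the following (the table name is a placeholder):
SELECT COUNT(*) FROM my_table WHERE TEST LIKE 'ERROR%'
What I am after is this same filtered count computed dynamically by the BIRT aggregation, rather than by the query itself.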

Need help applying a Kibana query

I want to know the query for Kibana. I tried terms and aggregations, but didn't get the right output, so I need to filter the data based on a distinct query in Kibana.
I want to apply a query to the following input data in Elasticsearch.
Rows and columns
CELLID|MCC|MNC|
1222|405|861|
1222|405|861|
1222|405|122|
1233|406|861|
1233|406|861|
1224|407|777|
1224|407|777|
I need to apply a query in such a way that it removes the rows for any CELLID that appears with different MNC values, so I am expecting output like this:
CELLID|MCC|MNC|
1233|406|861|
1233|406|861|
1224|407|777|
1224|407|777|
As you know, it's impossible to have such a row-vs-column layout within Kibana graphs as of now, because this feature has yet to be added to newer versions as an enhancement.
But if you're simply trying to print out the count, sum, or whatever aggregation you need, you can create a Data Table visualization with a metric of Count and then, within your buckets, define multiple terms aggregations. In your case, you should have CELLID, MCC and MNC each split by a terms aggregation, which should do the job for you. Hope this helps!
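If you want to see what that bucket configuration corresponds to on the Elasticsearch side, here is a minimal sketch of the nested terms aggregations (the index name is a placeholder, the aggregation names are arbitrary, and depending on your mapping the fields may need a .raw or .keyword suffix):
GET my-index/_search
{
  "size": 0,
  "aggs": {
    "by_cellid": {
      "terms": { "field": "CELLID", "size": 100 },
      "aggs": {
        "by_mcc": {
          "terms": { "field": "MCC", "size": 100 },
          "aggs": {
            "by_mnc": { "terms": { "field": "MNC", "size": 100 } }
          }
        }
      }
    }
  }
}
Each CELLID bucket then carries its MCC and MNC sub-buckets with document counts, which is essentially what the Data Table renders.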

"Calculated columns cannot contain volatile functions like Today and Me" error message on Sharepoint

I am trying to add a new calculated column to a SharePoint list that will show elapsed days. I enter the name and write a formula like:
=ABS(ROUND(Today-Created;0))
The data type returned from this formula is: Single line of text
When I try to save, I get an error like:
Calculated columns cannot contain volatile functions like Today and
Me.
Calculated Column Values Only Recalculate As Needed
The values in SharePoint columns--even in calculated columns--are stored in SharePoint's underlying SQL Server database.
The calculations in calculated columns are not performed upon page load; rather, they are recalculated only whenever an item is changed (in which case the formula is recalculated just for that specific item), or whenever the column formula is changed (in which case the formula is recalculated for all items).
(As a side note, this is the reason why in SharePoint 2010 you cannot create or change a calculated column on a list that has more than the list view threshold of 5000 items; it would require a mass update of values in all those items, which could impact database performance.)
Thus, in order for calculated columns to accurately store "volatile" values like "Me" and "Today", SharePoint would need to somehow constantly recalculate those column values and continuously update the column values in the database. This simply isn't possible.
Alternatives to Calculated Columns
I suggest taking a different approach entirely instead of using a calculated column for this purpose.
Conditional Formatting: You can apply conditional formatting to highlight records that meet certain criteria. This can be done using SharePoint Designer or HTML/JavaScript.
Filtered List views: Since views of lists are queried and generated in real time, you can use volatile values in list view filters. You can set up a list view web part that only shows items where Created is equal to [Today]. Since you can place multiple list view web parts on one page, you could have one section for today's items, and another web part for all the other items, giving you a visual separation (a rough CAML sketch of such a filter follows this list).
A workflow, timer job, or scheduled task: You can use a repeating process to set the value of a normal (non-calculated) column on a daily basis. You need to be careful with this approach to ensure good performance; you wouldn't want it to query for and update every item in the list if the list has surpassed the list view threshold, for example.
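As an illustration of the "Filtered List views" option above, a minimal CAML sketch of such a view filter (untested; it assumes the internal field name Created):
<Where>
  <Eq>
    <FieldRef Name='Created' />
    <Value Type='DateTime'><Today /></Value>
  </Eq>
</Where>
Because views are evaluated at query time, <Today /> is resolved on every request, which is exactly what a calculated column cannot do.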
I found some conversations about this issue. Many people suggest creating a new Date Time column named Today, with its visibility set to false and its default value set to Today's Date. Then we can use this column in our formulas.
I tried this suggestion and yes, the error is gone and the formula is accepted, but the calculated column's values are wrong. I set the Today column to visible and checked it: it was empty, so the default value of Today's Date was not working. While looking for a solution to this issue I carelessly deleted the Today column. Then I realized the calculated column's values were right.
Finally: I don't know what the trick is, but if you create a column named Today before using the Today keyword in your formulas, and then delete the Today column after saving your formula, it works.
UPDATE
After Thriggle's answer I realized this approach doesn't work like a charm. Yes, the formula doesn't cause an error when the calculated column is saved, but it works correctly only the first time; the next day the calculated column shows old values, because its values are static, as Thriggle explained.

How to list unique values of a particular field in Kibana

I have a field named rpc in my Elasticsearch database and I am displaying it using Kibana. When I search in the Kibana search bar like:
rpc:*
It displays all the values of the rpc field, but I want only the unique values to be displayed.
I have been playing around with Kibana 4 for a couple of weeks now. I find it intuitive and simple, and the experience has been great so far. Following your question, I tried getting unique results via a Data Table visualization. Why? Because I personally find it easier to understand. Here are the steps:
1. Get unique count
Create the visualization (Visualize -> Data Table). First let's get the count of how many unique entries we have for a particular field (we will use this in the later part for verification). I'm using clientip.raw, but as far as I can see it will work just fine with any friendly field name too.
2. Set the aggregation right
Set your aggregation back to Count and add a Split Rows bucket as follows. Not doing this will give you a count of 1 for each field value (since it is looking for unique counts) when you populate the table. The noteworthy part is setting the Top field to 0, because Kibana won't let you enter anything other than a digit (obviously!). This was the tricky part. Hit Apply and you'll get the results: unique field values and the count of each of them.
3. Verification:
Going to the last page of the table, we see there are exactly 543 results. This is how I know it works.
What Next?
You save this visualization and add it to a Dashboard. There you can always check the request, query, response and other stats.
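If you inspect that request, the two pieces behind steps 1 and 2 correspond roughly to a cardinality aggregation (the Unique Count metric) and a terms aggregation (the Split Rows bucket). A sketch against the same clientip.raw field (the index pattern is a placeholder and the aggregation names are arbitrary):
GET logstash-*/_search
{
  "size": 0,
  "aggs": {
    "unique_count": {
      "cardinality": { "field": "clientip.raw" }
    },
    "unique_values": {
      "terms": { "field": "clientip.raw", "size": 0 }
    }
  }
}
The "size": 0 on the terms aggregation is the Elasticsearch 1.x/2.x way of asking for all buckets, which is what the Top field of 0 in Kibana 4 translates to; newer versions require an explicit number instead (see the next answer).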
Just an addition to mathakoot's answer above.
For users of newer versions (which do not allow a bucket size of 0 anymore), just set a value greater than the maximum number of results,
and enter that value in the Options > Per Page field.
I am using Kibana 6, so the UI looks a bit different from the older answers here.
Here is what worked for me:
Create a visualization from your query; I used a line graph type (I don't think it matters)
Under Data, set metrics aggregation = "Unique Count" and set field to your field.
Set x-axis aggregation = "Terms" and set field to your field.
Set Size > your number of records
Under Metrics and Axes, disable drawing of the graph, circles, and labels (this really helps the UI not lag)
Run query and then click "Inspect" and download CSV
[screenshot: Data]
[screenshot: Metrics & Axes]
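For what it's worth, the request this visualization ends up sending is roughly a terms aggregation with a cardinality sub-aggregation on the same field. A sketch using the rpc field from the question (the index name, the .keyword suffix and the size of 10000 are placeholders that depend on your mapping and data volume):
GET my-index/_search
{
  "size": 0,
  "aggs": {
    "rpc_values": {
      "terms": { "field": "rpc.keyword", "size": 10000 },
      "aggs": {
        "rpc_unique": {
          "cardinality": { "field": "rpc.keyword" }
        }
      }
    }
  }
}
The size of 10000 stands in for the "Size > your number of records" setting from the steps above.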
I wanted to achieve something similar but I'm stuck with Kibana 3.1.
I simply added a panel of type "TERMS" and configured its Field = User-agent and left everything else on default values. This gave me a nice bar chart with one bar for each User-agent.
