I am tracking my URL hit counts and want to aggregate them.
I have a few URLs of the following form:
example.com/service/{uuid}
When I view these in Kibana, it lists the total hit count of each URL individually, so my table looks something like:
example.com/homepage 100 count
example.com/service/uuid1 10 count
example.com/service/uuid2 5 count
Is there an easy way to combine all the UUIDs into one entry?
I was thinking of replacing the UUIDs with a static string, but the admins have disabled regex support, which makes that replacement very difficult. So I am trying to see if there is any other way before going down that path.
Thanks!
I would suggest creating a new field with scripted fields.
The new field would return the value example.com/service/uuid if the URL contains a UUID; otherwise it would return the URL as-is.
Then you could do the aggregation on the new field.
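For example, in Kibana 5.x a scripted field can be written in Painless. Here is a minimal sketch, assuming the raw URL is indexed in a keyword field named url.keyword (that field name is an assumption; adjust it to your mapping). It keys on the /service/ path segment rather than on the literal word "uuid" or a regex, which also sidesteps the blocked regex support:

def url = doc['url.keyword'].value;
// Collapse every example.com/service/<uuid> hit into one bucket.
if (url != null && url.contains('/service/')) {
    return 'example.com/service/{uuid}';
}
// Leave all other URLs untouched.
return url;

A terms aggregation (e.g. a data table visualization) on this scripted field should then show a single row for example.com/service/{uuid}.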
I'm trying to group all the documents based on an element value. With XQuery I'm able to get each element value and its corresponding count, but with the Java API I'm not able to do that.
XQuery:
for $name in distinct-values(doc()/document/element_name)
return fn:concat("Element Value:", $name, ", Count:", fn:count(doc()/document[element_name eq $name]))
Output:
Element Value:A, Count:100
Element Value:B, Count:200
Java:
QueryManager qryMgr = client.newQueryManager();
StructuredQueryBuilder qb = new StructuredQueryBuilder();
StructuredQueryDefinition querydef = qb.containerQuery(qb.element("element_name"), qb.term("A"));
SearchHandle handle = new SearchHandle();
qryMgr.search(querydef, handle);
System.out.println(handle.getTotalResults());
With this method, I'm able to get the document count only for a particular value. Is there any way to get the counts for all values? Kindly help!
If I understand your use case - you want to know what all the values are for a particular element, and then how many documents have each value - you can use a range index to solve this problem. That's exactly what a range index is for.
Try adding a range index on "element_name" - you can use the ML Admin app for that - go to your database and click on Element Range Indexes.
In XQuery, you can then do something like this:
for $val in cts:element-values(xs:QName("element_name"))
return text{$val, cts:frequency($val)}
With the Java Client, you can do the same by adding a range-based values constraint to a search options file; the response from QueryManager will then have all of the values and frequencies that match your query. Check the REST API docs for constructing such a search options file.
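As a sketch, the values constraint over that range index might look like this in the search options (the options name "value-options" and constraint name "element-values" are made-up examples):

<options xmlns="http://marklogic.com/appservices/search">
  <values name="element-values">
    <range type="xs:string">
      <element ns="" name="element_name"/>
    </range>
  </values>
</options>

Assuming those options are installed on the server, reading the values and their frequencies from Java could look roughly like this (connection details are placeholders; adjust to your Java Client API version):

import com.marklogic.client.DatabaseClient;
import com.marklogic.client.DatabaseClientFactory;
import com.marklogic.client.io.ValuesHandle;
import com.marklogic.client.query.CountedDistinctValue;
import com.marklogic.client.query.QueryManager;
import com.marklogic.client.query.ValuesDefinition;

public class ElementValueCounts {
    public static void main(String[] args) {
        // Placeholder connection details.
        DatabaseClient client = DatabaseClientFactory.newClient("localhost", 8000,
                new DatabaseClientFactory.DigestAuthContext("user", "password"));
        try {
            QueryManager qryMgr = client.newQueryManager();
            // "element-values" is the values constraint defined in the
            // installed search options named "value-options".
            ValuesDefinition vdef = qryMgr.newValuesDefinition("element-values", "value-options");
            ValuesHandle handle = qryMgr.values(vdef, new ValuesHandle());
            for (CountedDistinctValue value : handle.getValues()) {
                System.out.println("Element Value:" + value.get("xs:string", String.class)
                        + ", Count:" + value.getCount());
            }
        } finally {
            client.release();
        }
    }
}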
I'm on the latest Kibana, 5.4.0, and the docs say:
https://www.elastic.co/guide/en/kibana/current/index-patterns.html#settings-create-pattern
To use an event time in an index name, enclose the static text in the pattern and specify the date format using the tokens described in the following table.
For example, [logstash-]YYYY.MM.DD matches all indices whose names have a timestamp of the form YYYY.MM.DD appended to the prefix logstash-, such as logstash-2015.01.31 and logstash-2015.02.01.
When I try to create the pattern [testx_]YYYY-MM-DD_HH-mm, [testx_]YYYY-MM-DD_HH, or [testx_]YYYY-MM-DD, Kibana can't find the @timestamp field and says that none of the indices match these patterns.
GET _cat/indices
yellow open testx_2017-06-19_14 dHAfSzAuSEKpYLuA8p5EIw 1 1 1 0 4.6kb 4.6kb
yellow open testx_2017-06-19_13-59 hfGkELCsSUavaX8GuLPuMQ 1 1 1 0 4.6kb 4.6kb
yellow open testx_2017-06-19 lbsdW18cSIuZ2bNn1Fw7WA 1 1 1 0 4.6kb 4.6kb
On the other hand, for the testx_* pattern Kibana finds the @timestamp field and matches 100% of the indices...
Does the latest Kibana support time-based names for indices?
I would like to gain the performance benefits of a time-based index naming scheme if it's still appropriate...
UPDATE: screenshots of what is wrong and some warnings omitted.
UPDATE 2: I found https://www.elastic.co/blog/managing-time-based-indices-efficiently, which promotes the "Rollover Pattern". Maintaining a date/time in the index name is no longer the recommended way, but I doubt the new API makes life easier.
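For reference, the rollover flow from that post looks roughly like this (the index and alias names are made up):

PUT /testx-000001
{ "aliases": { "testx_write": {} } }

POST /testx_write/_rollover
{
  "conditions": {
    "max_age": "1d",
    "max_docs": 1000000
  }
}

Writes always go through the testx_write alias, and once a condition is met the rollover call creates testx-000002 and flips the alias to it.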
According to these issues:
https://github.com/elastic/kibana/issues/5447 - Default Logstash index pattern should be "[logstash-]YYYY.MM.DD", not "logstash-*"
Kibana 4.3.0 should address this for you: it automatically optimizes wildcard index patterns such as logstash-* in the same way that you could previously only achieve by manually configuring a time-based index pattern name that matches your underlying indexing scheme (e.g. [logstash-]YYYY.MM.DD).
https://github.com/elastic/kibana/issues/4342 - Efficiently search against wildcard indices regardless of underlying indexing strategy
Elasticsearch 1.6 introduced the _field_stats API which will, for the first time, allow us to search for indices that contain fields within a given range. For example, we can search for indices that contain an @timestamp between X and Y.
This means that users will no longer be required to roll their indices at UTC midnight, nor use date patterns at all. They can effectively name indices whatever they want, and Kibana can automatically optimize requests by firing a pre-flight request for indices. We might need to add some caching here, but it should greatly enhance usability.
So there is no need for time-based names for performance, but keeping time-based index names is still useful for archiving old indices.
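For example, with date-stamped names an entire old period can be dropped with a single wildcard call (assuming your cluster settings permit wildcard deletes):

DELETE /testx_2017-05-*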
UPDATE: Created an issue to remove the time-based pattern from the docs: https://github.com/elastic/kibana/issues/12406
Previous versions of Elasticsearch allowed automatic addition of fields like @timestamp.
https://www.elastic.co/guide/en/elasticsearch/reference/current/breaking_50_mapping_changes.html
So the indices don't contain time-based events; in other words, no field holds a datetime value.
I am dumping JSON logs directly to Elasticsearch and adding a timestamp field before indexing. So while creating the index pattern I select the timestamp field I have defined.
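For illustration, a log document with such a self-defined timestamp might look like this (the index, type, and field names are placeholders). Elasticsearch's dynamic mapping detects the ISO 8601 string as a date, so Kibana can then offer it as the time field:

POST /testx_2017-06-19/log
{
  "message": "user logged in",
  "timestamp": "2017-06-19T14:00:00Z"
}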
I would like to filter certain sources and mediums (specifically email clients). I need to keep the dimension as one column (I use the maximum number of dimensions - 7).
The filter works fine when I have only one sourceMedium such as:
ga:sourceMedium!=amail.centrum.cz / referral
The filter doesn't work at all when I use two sourceMediums:
ga:sourceMedium!=amail.centrum.cz / referral,ga:sourceMedium!=mail.google.com / referral
It doesn't matter if I use AND or OR; the query doesn't output the desired data.
I assume there is supposed to be some delimiter which identifies amail.centrum.cz / referral as one string, delimited from the next one. I already tried putting ' at the beginning and at the end of the string, but it seems that doesn't work.
Is there anything I missed in the docs, or anything else? Looking for your help :)
BTW: I'm aware of the fallback solution: pull the data out of GA and filter it manually (compare the output data against my list of email clients to exclude).
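For completeness, in the filters syntax a comma means OR and a semicolon means AND, so the AND variant I tried (the only one that can express excluding both, since OR-ing two != conditions matches every row) looks like this:

ga:sourceMedium!=amail.centrum.cz / referral;ga:sourceMedium!=mail.google.com / referral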
I have found a similar question earlier here:
Google Analytics Visitors Flow: grouping URLs?
However, I'm confused because people suggest different ways to write the Replace String, and whichever way I try it, I am not able to make it work.
So I have an ecommerce site with hundreds of different pages. The different parts of the website are:
http://example.com/sv/ (Root)
http://example.com/sv/category/1-name/
http://example.com/sv/product/1-name/
http://example.com/sv/designer-tool/1-name/
http://example.com/sv/checkout/
When I go to the Visitors Flow, I want to see the number of people that go from, for example, Root to Category, from Category to Product, from Product to Designer Tool, and from Designer Tool to Checkout. However, now that I have so many different pages it becomes very difficult to follow the visitors flow, because the product pages, for example, are not grouped together.
So instead of the above, I would like to remove the 1-name/ part at the end and only see /sv/category/, /sv/product/, and /sv/designer-tool/.
In the earlier post I understand you can use an advanced filter to do this. I have set the following settings:
Type: Search & Replace
Field: Request URI
Search String: ^/(category|product|designer-tool)(/\d*)(.*)
Replace String: /$A1$A3
I guess that my Search String and my Replace String are wrong. Any ideas?
EDIT: I updated my filter to the following:
Search String: ^/sv/(category|product|designer-tool)(/\d*)(.*)$
Replace String: /sv/\1/
Still testing and unsure if it's the correct way to set it up.
I was able to solve this with the Search String and the Replace String in my edit above.
So basically what I did was:
Create a secondary view/profile for your site. If you apply the filter to your one and only view/profile, you won't be able to see any detailed data about specific pages, because the filter removes that detail.
Add an Advanced Filter with the following settings:
Type: Search & Replace
Field: Request URI
Search String: ^/sv/(category|product|designer-tool)(/\d*)(.*)$
Replace String: /sv/\1/
You need to wait 24h after creating your new profile/view before you can see any data in it.
So my confusion was regarding the Search and Replace Strings. The Search String is a regular expression matched against everything after your TLD. So for example, for http://www.example.com/sv/mypage/1-post/, the Search String will only search within /sv/mypage/1-post/.
The Replace String is what the whole Search String match should be replaced with. So in my case, I matched all URLs of the form /sv/category/1-string/. I only wanted to keep the "category" part, so I replaced the whole string with /sv/category/ by entering the Replace String /sv/\1/.
/sv/ means just what it says. \1 takes the value of the first () group of my Search String (in this case "category"). The ending / is just a trailing slash.
All in all, it means that any URL that looked like http://example.com/sv/category/1-string/ was changed to http://example.com/sv/category/, meaning that I can now see data for all my categories as a group, instead of as individual pages.
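If you want to sanity-check the regex outside of GA, here is a minimal Java sketch (note that GA's Replace String uses \1 for backreferences, while Java's replaceAll uses $1):

public class FilterRegexCheck {
    public static void main(String[] args) {
        String path = "/sv/category/1-name/";
        // Same Search String as the filter; the replacement keeps only the section name.
        String grouped = path.replaceAll(
                "^/sv/(category|product|designer-tool)(/\\d*)(.*)$", "/sv/$1/");
        System.out.println(grouped); // prints: /sv/category/
    }
}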
I am using custom variables to track order IDs. In order to aggregate analytics data into our data warehouse, I want to select a number of metrics with the custom variable as a dimension. However, if I do so, I will not get the entries where the variable is not set (e.g. sessions that didn't result in a sale). I need to get these as well.
Can I write a filter or segment that selects only the entries that don't have a particular custom variable set? I have tried:
segment=dynamic::ga:customVarValue1==
But that doesn't seem to work (it gives no results back).
Basically I'm looking for the equivalent of WHERE ga:customVarValue1 IS NULL in SQL.
In short, it's not possible to get the nullset data, as explained by a Google rep:
For some dimensions, GA uses the default value of (not set). Custom Variables do not have a default value, so if a hit does not have a custom variable associated with it, all the other dimensions in the query are not added to the reports.
The original answer is a little confusing, but when you read between the lines it suggests that they throw out these "empty" values when they run their aggregates.
The "correct" approach, as he explains, is to set a default value for any row you want reported:
If you need to see the (not set) value, you could try sending a default value for custom variables.
For example if you use visitor level custom vars to track member vs non-member, you should always set non-member as a default for everybody; then modify to member once they register.
Details are here: http://groups.google.com/group/google-analytics-data-export-api/browse_thread/thread/cd078ddb26ca18d5?pli=1
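With classic ga.js, that default would look something like this (the slot number and names are just examples):

// Visitor-level (scope 1) default sent for everybody:
_gaq.push(['_setCustomVar', 1, 'memberType', 'non-member', 1]);
// Overwritten once the visitor registers:
_gaq.push(['_setCustomVar', 1, 'memberType', 'member', 1]);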
I've just had some success solving this by using a regex to capture users or sessions where the custom dimension has no value. In my case I want to separate logged-in and logged-out users.
The regex .+ will capture any non-empty value, so it can be used to get the job done.
The filter for my Returning users segment is: matches regex .+
For the Customer prospects segment I used: does not match regex .+
It's early days, but this appears to be working.