Is there an equivalent to SumoLogs LogReduce for Application Insights?

Is there an equivalent to SumoLogs LogReduce for Application Insights? - azure-application-insights

Using Azure App Insights, I'm wanting to generate statistics for controller endpoints. The catch is the URL path might look something like:
/api/v1/test/val1/statistics
/api/v1/test/val2/statistics
Where val1, val2 etc are varying to a large degree. I'm wanting to determine how many times /api/v1/test/*/statistics has been loaded (and also generate average durations, percentiles etc).
I've started with examples from Azure, such as :
requests
| summarize RequestsCount=sum(itemCount), AverageDuration=avg(duration), percentiles(duration, 50, 95, 99) by operation_Name
| order by RequestsCount desc
Have also started to split the URL by:
extend urlParts = parseurl(url)| project url, urlParts.Path|
but no luck.

I think you can leverage Parse operator for this purpose:
| parse url with * "/api/" version "/" environment "/" valueParameter "/" *
This will produce parts of the url you can then concatenate in a required way by ignoring unnecessary parameters. (Or use parameters for some other calculation..)
Alternatively, you can amend URL before it is sent from AI SDK to put "*" at the locations you'd like to ignore, then all default visualization will have the URL you'd like to see. You can do it with Telemetry Initializer or Telemetry Processor.

Related

LocustIO: How to do batch request

I started to use LocustIO for load testing a 3rd party API which provides a way to do batch requests (http://docs.oasis-open.org/odata/odata/v4.01/odata-v4.01-part1-protocol.html#sec_BatchRequests).
How can this be done using LocustIO?
I tried with the following:
def batch(self):
response = self.client.request(method="POST", url="/$batch", auth=("ABC", "DEF"), headers={"ContentType": "multipart/mixed; boundary=batch_36522ad7-fc75-4b56-8c71-56071383e77b"}, data="Content-Type: application/http\nContent-Transfer-Encoding: binary\n\nGET putyoururlhere HTTP/1.1\nAccept: application/json\n\n\n")
Auth is something I need to have authentication to the API, but that's not the point of the question and "putyoururlhere" should be replaced with the actual url. Either way, it gives errors when executing the test, so I must be doing something wrong.
People with experience how to do this?
Kind regards!

The data parameter should be your POST body (only), you cant put additional headers in it the way you did. You probably just want to add them as additional entries in the dict you pass as headers
Se the documentation for python requests library for more details. https://requests.readthedocs.io/en/master/

Kronos API v2 - including multiple query parameters in a request

I am trying to access the Time Entries object via the Kronos API v2.
The documentation says that there are two required Query Parameters: start_date and end_date.
I am able to query the endpoint including one of the parameters at a time but am not able to enter both. And, I find the documentation quite lacking.
The root of the endpoint is:
https://secure.saashr.com/ta/rest/v2/companies/{cid}/time-entries
Here are things I have tried to suffix to the above endpoint:
?start_date=2019-11-01&end_date=2019-12-01
?start_date=2019-11-01|end_date=2019-12-01
?start_date=2019-11-01 end_date=2019-12-01
?start_date=2019-11-01?end_date=2019-12-01
?start_date=2019-11-01:end_date=2019-12-01
?filter=start_date:=:2019-11-01|end_date:=:2019-12-01
I also tried including quotes around the dates.
Everything results in some 400 level error when querying the API. With most of the above suffixes, it recognizes start_date but not the end_date. In this case, the error is:
{'code': 400, 'message': 'Missing required: end_date'}]
Note, above {cid} is replaced with the company's id.
In summary, how should I include two query parameters in the Kronos API?

The first option is correct.
https://secure.saashr.com/ta/rest/v2/companies/{cid}/time-entries?start_date=2019-11-01&end_date=2019-12-01
should work just fine.
Could you provide full URL you set in request?

Java API to query CommonCrawl to populate Digital Object Identifier (DOI) Database

I am attempting to create a database of Digital Object Identifier (DOI) found on the internet.
By manually searching the CommonCrawl Index Server manually I have obtained some promising results.
However I wish to develop a programmatic solution.
This may result in my process only requiring to read the index files and not the underlying WARC data files.
The manual steps I wish to automate are these:-
1). for each CommonCrawl Currently available index collection(s):
2). I search ... "Search a url in this collection: (Wildcards -- Prefix: http://example.com/* Domain: *.example.com) " e.g. link.springer.com/*
3). this returns almost 6MB of json data that contains approx 22K unique DOIs.
How can I browse all available CommonCrawl indexes instead of searching for specific URLs?
From reading the API documentation for CommonCrawl I cannot see how I can browse all the indexes to extract all DOIs for all domains.
UPDATE
I found this example java code https://github.com/Smerity/cc-warc-examples/blob/master/src/org/commoncrawl/examples/S3ReaderTest.java
that shows how to access a common crawl dataset.
However when I run it I receive this exception
"main" org.jets3t.service.S3ServiceException: Service Error Message. -- ResponseCode: 404, ResponseStatus: Not Found, XML Error Message: <?xml version="1.0" encoding="UTF-8"?><Error><Code>NoSuchKey</Code><Message>The specified key does not exist.</Message><Key>common-crawl/crawl-data/CC-MAIN-2016-26/segments/1466783399106.96/warc/CC-MAIN-20160624154959-00160-ip-10-164-35-72.ec2.internal.warc.gz</Key><RequestId>1FEFC14E80D871DE</RequestId><HostId>yfmhUAwkdNeGpYPWZHakSyb5rdtrlSMjuT5tVW/Pfu440jvufLuuTBPC25vIPDr4Cd5x4ruSCHQ=</HostId></Error>
In fact every file I try to read results in the same error. Why is that?
what is the correct common crawl uri's for their datasets?

The data set location has changed since more than one year, see announcement. However, many examples and libraries still contain the old pointers. You can access the index files for all crawls back to 2013 on s3://commoncrawl/cc-index/collections/CC-MAIN-YYYY-WW/indexes/cdx-00xxx.gz - replace YYYY-WW with year and week of the crawle and expand xxx to 000-299 to get all 300 index parts. New crawl data is announced on the Common Crawl group, or read more about how to access the data.

To get the example code to work replace lines 24 and 25 with:
String fn = "crawl-data/CC-MAIN-2013-48/segments/1386163035819/warc/CC-MAIN-20131204131715-00000-ip-10-33-133-15.ec2.internal.warc.gz";
S3Object f = s3s.getObject("commoncrawl", fn, null, null, null, null, null, null);
Also note that the commoncrawl group have an updated example.

Retrieve current metric value from graphite

Suppose I have a metric named a.b.c.count. I am trying to write a python script which reads the latest value of the metric a.b.c.count in graphite.
I went through the docs and figured out that we can use curl to retrieve metrics from graphite using functions http://graphite.readthedocs.org/en/0.9.13-pre1/functions.html.
But still unable to figure out how to achieve the same.

I haven't seen a way to ask Graphite for a single value, but you can ask for a summary of values over a configurable period, and take the last one. (This is just for minimizing the data returned, you could as easily pull out the last value from any series in a given timeframe.) Example render parameters:
target=summarize(a.b.c.count,'1hour','last')&from=-1h&format=json
The JSON returned will look like this:
[{"target": "summarize(a.b.c.count, \"1hour\", \"last\")",
"datapoints": [[5.1333330000000004, 1442160000],
[5.5499989999999997, 1442163600]]}]
Here is a Python snippet to retrieve and parse this, using the 'requests' HTTP library
import requests
r = requests.get("http://graphite.yourdomain.com/render/?" +
"target=summarize(a.b.c.count,'1hour','last')&from=-1h&format=json")
print r.json()[0][u'datapoints'][-1][0]

Google Analytics Realtime Sandbox Environment

I am looking for a way to setup a google analytics sandbox environment that will allow me
to test out my custom js code near real time.
My app will be using custom variables for advanced segmentation, and I would like to test out multiple scenarios quickly, as opposed to setting up a dummy GA account and wait for a whole day to confirm the test.
Thanks

Great question.
For GA, server updates occur every four hours, and after every sixth such update, the entire set is recalculated, which means a 24-hour lag from code change to reliable feedback. This delay also applies to most customizations to the GA Browser (e.g., "custom filters").
So if you are going to use GA as your web metrics system, and you expect to actually rely on those data then a test rig is essential.
For me, it's useful to group test systems for client-side analytics using two rubrics: (i) complete, self-contained (closed-loop) systems; or (ii) simpler automated data pulls from the production system (by "production system" here i mean GA's system, not the Site whose pages the GA code is tracking).
For the latter, just add this line to each page of your Site that contains the GA tracking code, just below '__trackPageview()':
pageTracker._setLocalRemoteServerMode();
That line will cause a copy of each transaction line to be logged to your server's activity log--so in essence, you get the data captured by GA in real-time That's all you need to do to capture the data; to parse it, you can use, for instance, any of the excellent open source web log analyzers like AWStats, or roll your own.
This is simple and reliable--but all it can do is tell you (in real-time) "does the analytics code i just implemented on pages served by my production server actually work?"
Usually, that's not good enough--you would rather know if your code will work before it's on your production server. To do that, you need to simulate the production environment and find a way to access in real-time the data GA collects.
This kind of test rig is a little more involved, but still not difficult.
In sum, it requires these steps:
host/serve the ga.js and the
tracking pixel locally;
log the __utm.gif requests (in the
GA data flow, each request
corresponds to one logged
transaction); and
parse the headers into some
convenient human-readable form.
If you want more detail than that (ie, a step-by-step implementation), here it is:
I. Hosting/Serving the GA Script (& automating updates
To do that, you can create a small shell script like this one to wget the latest ga.js version into your local directory (replacing the extant version it finds there).
#!/bin/sh
rm /My_Sites/sitename.com/analytics/ga.js
cd /My_Sites/sitename.com/analytics/
wget http://www.google-analytics.com/ga.js
chmod 644 /My_Sites/sitename.com/analytics/ga.js
cd ${OLDPWD}
exit 0;
(Thanks to AskApache.com, which provided the original motivation and config details to do this in a production context.)
II. Create __utm.gif file
This is just a transparent 1x1 pixel gif image, which you will place in Site directory (doesn't matter where, it just needs to match the location recited in your pages)
III. Log the __utm.gif Requests
For a testing protocol in which you are the source of the client-side activity (e.g., you want to verify the cross-browser fidelity of some event-tracking code you've added to a page on your Site, so you automate 5000 clicks on the button you just wired up,serving the page from your dev server set up for this purpose) it's probably simplest to just log the Request Headers, because it's in those headers that the GA script directs the client to gather various data from the DOM, from the location bar (url), and from prior http headers, and append them to a request for a resource on the GA server (__utm.gif, which is just a 1x1 transparent pixel).
For this type of protocol, i use the Firefox addon, LiveHTTPHeaders. You install it like any other Firefox addon, a few mouse clicks is all. Next, open it, and click the "Generator" tab. From this window, you can see the actual requests in real time. At the bottom of the window is a 'save' button to store the log. I find it easier to configure LiveHTTPHeaders to log only the __utm.gif requests; to do that, just click the 'Edit' tab and create a siimple filter to exclude everything except these particular gif images (using the check boxes on the right, and the large text box to the right).
Other kinds of test protocols require you to work from your Server Activity Logs; in that case just add this line to each page of your Site, just below __trackPageview():
pageTracker._setLocalRemoteServerMode();
IV. Parse those logged requests so you can actually read them
So now your log will contain individual transction lines, each one of which is a string appended to an HTTP Request for the GA tracking pixel. This string is just a concatenation of key-value pairs, each key begins with the letters "utm" (probably for "urchin tracker"). Each of these parameters corresponds to a variable that you see in the GA Dashboard (here's a complete list and description of them). This is all you need to know to build a parser. In more detail:
First, here's a sanitized __utm.gif request (the entries in your LiveHTTPHeaders log):
http://www.google-analytics.com/__utm.gif?utmwv=1&utmn=1669045322&utmcs=UTF-8&utmsr=1280x800&utmsc=24-bit&utmul=en-us&utmje=1&utmfl=10.0%20r45&utmcn=1&utmdt=Position%20Listings%20%7C%20Linden%20Lab&utmhn=lindenlab.hrmdirect.com&utmr=http://lindenlab.com/employment&utmp=/employment/openings.php?sort=da&&utmac=UA-XXXXXX-X&utmcc=__utma%3D87045125.1669045322.1274256051.1274256051.1274256051.1%3B%2B__utmb%3D87045125%3B%2B__utmc%3D87045125%3B%2B__utmz%3D87045125.1274256051.1.1.utmccn%3D(referral)%7Cutmcsr%3Dlindenlab.com%7Cutmcct%3D%2Femployment%7Cutmcmd%3Dreferral%3B%2B
This is my parser (in Python):
# regular expression module imported
import re
pattern = r'\&{1,2}'
pat_obj = re.compile(pattern)
# splitting the gif request on the '&' character
# (which GA originally used to concatenate each piece to build the request)
# (here, i've bound the __utm.gif to the variable by 'gfx')
gfx1 = pat_obj.split(gfx)
# create a look-up table to map a descriptive name to each gif request parameter
# (note, this isn't the entire list, which i've linked to above)
keys = "utmje utmsc utmsr utmac utmcc utmcn utmcr utmcs utmdt utme utmfl utmhn utmn utmp utmr utmul utmwv"
values = "java_enabled screen_color_depth screen_resolution account_string cookies campaign_session_new repeat_campaign_visit language_encoding page_title event_tracking_data flash_version host_name GIF_req_unique_id page_request referral_url browser_language gatc_version"
keys = keys.strip().split()
#create the look-up table
GIF_REQUEST_PARAMS = dict(zip(keys, values))
# parse each request parameter and map the parameter name to a descriptive name:
pattern = r'(utm\w{1,2})=(.*?)$'
pat_obj = re.compile(pattern)
for itm in gfx1 :
m = pat_obj.search(itm)
if m :
fmt = '{0:25} {1:10}'
print( fmt.format( GIF_REQUEST_PARAMS[m.group(1)], m.group(2) ) )
The result looks like this:
gatc_version              1         
GIF_req_unique_id         1669045322
language_encoding         UTF-8     
screen_resolution         1280x800  
screen_color_depth        24-bit    
browser_language          en-us     
java_enabled              1         
flash_version             10.0%20r45
campaign_session_new      1         
page_title                Position%20Listings%20%7C%20Linden%20Lab
host_name                 lindenlab.hrmdirect.com
referral_url              http://lindenlab.com/employment
page_request              /employment/openings.php?sort=da
account_string            UA-XXXXXX-X
cookies
To avoid making this longer still, i left out the cookies' value. They obviously require a separate parsing step, though it's virtually identical to the step i just showed. Again, each request represents a single transaction, so you can store them as you need to.

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

Is there an equivalent to SumoLogs LogReduce for Application Insights? - azure-application-insights

Related

LocustIO: How to do batch request

Kronos API v2 - including multiple query parameters in a request

Java API to query CommonCrawl to populate Digital Object Identifier (DOI) Database

Retrieve current metric value from graphite

Google Analytics Realtime Sandbox Environment

Categories

Resources