OpenTSDB - Get the last data point for a metric timeseries before a given timestamp

Use case
I want to find the last data point for a metric timeseries just before or on a given timestamp from OpenTSDB.
So far I tried
api/query/last HTTP API
It did provide the last data point for a metric, after changing the OpenTSDB config and adding:
tsd.core.meta.enable_realtime_uid = true
tsd.core.meta.enable_realtime_ts = true
tsd.core.meta.enable_tsuid_incrementing = true
tsd.core.meta.enable_tsuid_tracking = true
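For reference, a sketch of the kind of call being used (host, port, metric and tag are placeholders; back_scan is the number of hours OpenTSDB scans back for the latest value):
http://tsdb-host:4242/api/query/last?timeseries=sys.cpu.user{host=web01}&back_scan=24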
Is there any way to query it with a timestamp, or is there any other API or approach that would get the desired result?

Related

Querying Assisted Conversions with a Conversion Segment in R

I am using the googleAnalyticsR package. Since Google Analytics accounts all have different configurations, an MWE isn't possible, but I'll at least provide the relevant code, which queries the v3 API:
library(googleAnalyticsR)
ga_auth()
google_analytics(id = ga_id,
                 start = start_date,
                 end = end_date,
                 metrics = c("assistedConversions", "assistedValue",
                             "lastInteractionConversions", "lastInteractionValue"),
                 dimensions = c("nthDay"),
                 samplingLevel = "WALK",
                 segment = "sessions::condition::ga:landingPagePath=#/some_string",
                 type = "mcf")
The variables ga_id, start_date, and end_date are numeric, string, and string respectively, with the dates in the form YYYY-MM-DD.
I am trying to execute the query above, which pulls Assisted Conversions, the values of these Assisted Conversions, Last Interaction Conversions, and the values of these Last Interaction Conversions from the Multi-Channel funnel.
The main problem is the segment argument above. When I run this query, it pulls everything for the entire website and does not appear to apply the segment. Even if I create a conversion segment under Conversions > Assisted Conversions > Conversion Segments, when I run
ga_segment_list() %>% .$items %>% as.data.frame() %>% arrange(name)
the segment I created doesn't even show up. I also have the same segment under Admin > Segments, and using gaid:: with the segment_id does not work either.
Is there any way for me to pull the data I'm after? I'm open to using a different R package if one is available.

Google Analytics query for sessions

I am trying to analyze visits to purchase in Google Analytics through R.
Here is the code:
query.list <- Init(start.date = "2016-07-01",
                   end.date = "2016-08-01",
                   dimensions = c("ga:daysToTransaction", "ga:sessionsToTransaction"),
                   metrics = c("ga:transaction"),
                   sort = c("ga:date"),
                   table.id = "ga:104454195")
This code produces the following error:
Error in ParseDataFeedJSON(GA.Data) :
code : 400 Reason : Sort key ga:date is not a dimension or metric in this query.
Can you help me get the desired output below?
Days to Transaction   Transaction   % total
0                     44            50%
1                     11            20%
2-5                   22            30%
You are trying to sort your results based on a dimension that is not included in your result set. You have the ga:daysToTransaction and ga:sessionsToTransaction dimensions, and you have tried to apply a sort based on ga:date.
You'll need to use this for sorting:
sort = c("ga:daysToTransaction")
It is not clear to me whether you'll use ga:sessionsToTransaction in another part of your script, as it adds another breakdown compared to your desired output and would need to be aggregated later to get your expected results.
Also, will you calculate % total in another part of the script, or do you expect it to be returned as part of the Analytics response? (I am not sure whether that is possible in the GA API.)
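A minimal sketch of the corrected query, assuming the same RGoogleAnalytics package as above and an OAuth token obtained via Auth(); note that the Core Reporting metric is spelled ga:transactions, and that % total has to be computed locally because the API does not return it:
library(RGoogleAnalytics)

# sort on a dimension that is actually part of the query
query.list <- Init(start.date = "2016-07-01",
                   end.date   = "2016-08-01",
                   dimensions = "ga:daysToTransaction",
                   metrics    = "ga:transactions",
                   sort       = "ga:daysToTransaction",
                   table.id   = "ga:104454195")

# column names are assumed to come back without the "ga:" prefix
ga.data <- GetReportData(QueryBuilder(query.list), token)

# % total is derived from the returned counts, not requested from the API
ga.data$pct.total <- round(100 * ga.data$transactions / sum(ga.data$transactions))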

Using latitude and longitude to generate timezone

I have about 9 million records with latitude and longitude, in addition to the timestamp in EST. I want to use the latitude and longitude to generate the appropriate regional timezone, from which I then want to adjust all these times for the relevant timezone.
I have tried using the geonames package:
data_raw$tz <- mapply(GNtimezone, data$lat, data$lon)
However, this returns the following error:
Error in getJson("timezoneJSON", list(lat = lat, lng = lng, radius = 0)) :
error code 13 from server: ERROR: canceling statement due to statement timeout
I have tried to use a method described in this post.
data$tz_url <- sprintf("https://maps.googleapis.com/maps/api/timezone/%s?location=%s,%s&timestamp=%d&sensor=%s",
                       "xml",
                       data$lat,
                       data$lon,
                       as.numeric(data$time),
                       "false")
for (i in 1:100) {
  data$tz[i] <- xmlParse(readLines(data$tz_url[i]), isURL = TRUE)[["string(//time_zone_name)"]]
}
With this method, I am able to get the URLs for the XML data. But when I try to pull the XML data in a for loop and append the timezone to the dataframe, it doesn't do it for all the records... (in fact, only 10 records at a time, intermittently).
Does anyone know of any alternate methods or packages to get the three-character timezone (e.g. EST) for about 9 million records relatively quickly? Your help is much appreciated. Or better yet, if you have ideas on why the code above isn't working, I'd appreciate that too.
For a list of methods of converting latitude and longitude to time zone, see this post. These mechanisms will return the IANA/Olson time zone identifier, such as America/Los_Angeles.
However, you certainly don't want to make 9 million individual HTTP calls. You should attempt to group the records to distinct locations to minimize the number of lookups. If they are truly random, then you will still have a large number of locations, so you should consider the offline mechanisms described in the previous post (i.e. using the tz_world shapefile with some sort of geospatial lookup mechanism).
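As a rough sketch of that offline approach, assuming the sf package and a local copy of the tz_world shapefile (whose polygons carry a TZID attribute; the file path is a placeholder):
library(sf)

# tz_world polygons, one per IANA/Olson zone (path is a placeholder)
zones <- st_read("tz_world/tz_world.shp")

# de-duplicate locations first so 9 million rows become far fewer lookups
locs <- unique(data[, c("lon", "lat")])
pts  <- st_as_sf(locs, coords = c("lon", "lat"), crs = 4326)

# point-in-polygon join: attach the TZID of the zone containing each point
locs$tz <- st_join(pts, zones)$TZID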
Once you have the IANA/Olson time zone identifier for the location, you can then use R's time zone functionality (as.POSIXct, format, etc.) with each corresponding timestamp to obtain the abbreviation.
However, do recognize that time zone abbreviations themselves can be somewhat ambiguous. They are useful for human readability, but not much else.
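For the abbreviation step, a small illustration with base R (the EST/EDT result depends on the date because of daylight saving):
# the source timestamps are in EST; render them in the looked-up zone
t <- as.POSIXct("2016-07-15 12:00:00", tz = "EST")
format(t, "%Z", tz = "America/New_York")   # "EDT" in July, "EST" in January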
I've written the package googleway to access the Google Maps API. You'll need a valid API key (and, for Google to handle 9 million calls, you'll have to pay for it, as the free tier only covers 2,500).
library(googleway)
key <- "your_api_key"
google_timezone(location = c(-37, 144),
                key = key)
$dstOffset
[1] 0
$rawOffset
[1] 36000
$status
[1] "OK"
$timeZoneId
[1] "Australia/Hobart"
$timeZoneName
[1] "Australian Eastern Standard Time"

graphite summarize function not working as expected

I am feeding data into a metric, let's say it is "local.junk". What I send is just that metric name, a 1 for the value, and the timestamp:
local.junk 1 1394724217
where the timestamp changes, of course. I want to graph the total number of these instances over a period of time, so I used
summarize(local.junk, "1min")
Then I went and made some data entries. I expected to see the number of requests received in each minute, but it always just shows the line at 1. If I summarize over a longer period, like 5 minutes, it shows me some seemingly random number: I tried 10 requests and I see the graph at 4 or 5. Am I loading the data wrong, or using the summarize function wrong?
The summarize() function just sums up your data values, so cross-check that you are indeed sending the correct values.
Also, to narrow down whether the function or the data is the issue, you can run it on metricsReceived:
summarize(carbon.agents.ip-10-0-0-1-a.metricsReceived,"1hour")
Which version of Graphite are you running?
You may want to check your carbon aggregator settings. By default, carbon aggregates data every 10 seconds. Without an entry in aggregation-rules.conf, Graphite only keeps the last metric value it receives within that 10-second window.
That behaviour is why you are seeing the problem above. You need to add an entry for your metric in aggregation-rules.conf with the sum method, like this:
local.junk (10) = sum local.junk
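The general shape of a rule is output_template (frequency) = method input_pattern, so a slightly more explicit variant (output name here is illustrative) would write the sum to its own series and leave the raw metric untouched:
# aggregation-rules.conf: output_template (frequency) = method input_pattern
local.junk.sum (10) = sum local.junk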

Transform for graphite counter

I'm using the incr function from the python statsd client. The key I'm sending for the name is registered in graphite but it shows up as a flat line on the graph. What filters or transforms do I need to apply to get the rate of the increments over time? I've tried an apply function > transform > integral and an apply function > special > aggregate by sum but no success yet.
The function you want is summarize() - see it here: http://graphite.readthedocs.org/en/latest/functions.html
To get totals over time, just use the summarize function with alignToFrom = true.
For example:
You can use the following for a 1-day period:
summarize(stats_counts.your.metrics.path, "1d", "sum", true)
See graphite summarize datapoints for more details.
The data is there; it just needs hundreds of counts before you start to be able to see it on the graph. Taking the integral also works and shows the cumulative number of hits over time; I have had to multiply it by 100 to get approximately the correct value.
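For reference, the integral variant described here can be expressed directly as a render target (a sketch; the metric path is the same placeholder as above, and the 100 is the empirical correction factor mentioned):
scale(integral(stats_counts.your.metrics.path), 100)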
