I'm running into an issue that I'm hoping someone might know how to resolve. In the reporting api, I can get information based on a sku. Something like so:
{
'reportRequests': [{
'viewId': VIEW_ID,
'dateRanges': [{'startDate': '7daysAgo', 'endDate': 'today'}],
'metrics': [
{"expression": "ga:productDetailViews"},
{"expression": "ga:itemQuantity"},
{"expression": "ga:uniquePurchases"},
{"expression": "ga:itemRevenue"}
],
'dimensions': [{'name':'ga:productSku'}]
}
]
}
This is great for getting the data tied to a sku, but I was wondering if there is a way to get additional product information for each sku rather than metric information. For example, given a set of skus, I'd like to see the category hierarchy (section, category, subcategory) as well as the product name:
SKU, Name, Section,Category, Subcategory
1, Blah, Shoes, Basketball,
2, Another,Shoes, Soccer, Mens
...
Given that information like this is all correlated on a page when sent from a website, I don't see why this shouldn't be possible.
ga('ec:addProduct', {
'id': '1',
'name': 'Blah',
'category': 'Shoes/Basketball',
});
Adobe has something similar with their SAINT Classifications, and I was hoping it would be possible to get something like this out of GA. I can't seem to find anything like it in any of the APIs, so I'm hoping someone can point me in the right direction!
Since you appear to have specified categories with the correct '/' delimiter, you can use the ga:productCategoryLevelXX dimensions (where XX is the level number). You do need at least one metric in your request, but you could do something like this:
{
'reportRequests': [{
'viewId': VIEW_ID,
'dateRanges': [{'startDate': '7daysAgo', 'endDate': 'today'}],
'metrics': [{"expression": "ga:productDetailViews"}],
'dimensions': [
{"name": "ga:productSku"},
{"name": "ga:productCategoryLevel1"},
{"name": "ga:productCategoryLevel2"},
{"name": "ga:productCategoryLevel3"}
]
}]
}
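Once the report comes back, the SKU and category levels sit in each row's dimensions list, in the same order as in the request. Here is a minimal sketch of flattening that into per-SKU records, assuming the usual v4 response layout (columnHeader plus data.rows). The sample response is made up for illustration, and rows_by_dimension is a hypothetical helper name, not part of any client library:

```python
def rows_by_dimension(report):
    """Turn one v4 report into a list of {dimension name: value} dicts."""
    names = report["columnHeader"]["dimensions"]
    rows = report.get("data", {}).get("rows", [])
    return [dict(zip(names, row["dimensions"])) for row in rows]


# Made-up response fragment, shaped like one batchGet result.
sample_response = {
    "reports": [{
        "columnHeader": {"dimensions": [
            "ga:productSku",
            "ga:productCategoryLevel1",
            "ga:productCategoryLevel2",
            "ga:productCategoryLevel3",
        ]},
        "data": {"rows": [
            {"dimensions": ["1", "Shoes", "Basketball", "(not set)"],
             "metrics": [{"values": ["12"]}]},
            {"dimensions": ["2", "Shoes", "Soccer", "Mens"],
             "metrics": [{"values": ["7"]}]},
        ]},
    }]
}

products = rows_by_dimension(sample_response["reports"][0])
for product in products:
    print(product)
```

The metric values are ignored here because the request only includes a metric to satisfy the API.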
I am using the Analytics Reporting API v4 to get a report with this body:
{
'viewId': XXX,
'dateRanges': {
'startDate': "2021-03-16",
'endDate': "2021-04-14"
},
'metrics': [
"ga:itemQuantity", "ga:itemRevenue"
],
'samplingLevel': 'LARGE',
'pageSize': 100000,
'hideTotals': False,
'dimensions': ["ga:source", "ga:date", "ga:dimension4"]
}
If I compare the total itemRevenue with the "Product performance" report in the Analytics interface, the results match for the whole period, but for a sub-period some of the numbers differ.
I ran the same test with start_date="2021-04-09" and end_date="2021-04-15", and this time the results always differ. The missing rows always have ga:source=affiliation.
If I remove ga:dimension4 (which corresponds to userType in the interface), my results always match the interface.
Could someone explain why my results are not consistent between the API and the interface?
I have this relationship:
person --likes--> subject
This is my query:
g.V().
hasLabel('person').
has('name', 'Joe').
outE('likes').
range(0, 2).
union(identity(), inV().hasLabel('subject')).
valueMap('rating', 'name').
At this point, I get a result that looks like this:
[
{
"rating": 3.236155563
},
{
"rating": 3.162886797
},
{
"name": "math"
},
{
"name": "history"
}
]
I'd like to get something like this:
[
{
"rating": 3.236155563,
"name": "math"
},
{
"rating": 3.162886797,
"name": "history"
}
]
I've tried grouping the results - which gives me the structure I want - but because of the identical keys, I only get 1 set of results back.
It always helps when you post the code to create the graph, so that we can give you a tested answer. Like so:
g.addV('person').property('name', 'P1').as('p1').
addV('subject').property('name', 'Math').as('math').
addV('subject').property('name', 'History').as('history').
addV('subject').property('name', 'Geography').as('geography').
addE('likes').from('p1').to('math').property('rating', 1.2).
addE('likes').from('p1').to('history').property('rating', 2.3).
addE('likes').from('p1').to('geography').property('rating', 3.4)
I believe you are trying to write a traversal that starts from a certain person, goes out along the first two "likes" edges, and returns the name of each subject they like together with the rating on the corresponding "likes" edge:
g.V().has('person', 'name', 'P1').
outE('likes').
range(0, 2).
project('SubjectName', 'Rating').
by(inV().values('name')).
by(values('rating'))
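The difference between the two traversals is that union() emits the edge and the vertex as separate results, so their properties end up in separate maps, while project() builds one map per edge. Here is a plain-Python model of that difference, using made-up in-memory tuples in place of a real graph (no Gremlin server is involved, and result ordering is not modeled):

```python
# Each tuple models one 'likes' edge: (rating on the edge, name of the subject it points to).
likes_edges = [(1.2, "Math"), (2.3, "History"), (3.4, "Geography")]
first_two = likes_edges[:2]  # plays the role of range(0, 2)

# union(identity(), inV()): every edge produces two separate results.
union_style = []
for rating, name in first_two:
    union_style.append({"rating": rating})
    union_style.append({"name": name})

# project(...).by(...).by(...): every edge produces one combined result.
project_style = [
    {"SubjectName": name, "Rating": rating} for rating, name in first_two
]

print(union_style)
print(project_style)
```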
Here is a sample query that I created to fetch Google Analytics data:
response = service.reports().batchGet(
body={
'reportRequests': [
{
'viewId': 'xxxx',
'dateRanges': [{'startDate': '2021-01-14', 'endDate': '2021-01-15'}],
'metrics': [
{'expression': 'ga:pageViews'},
{'expression': 'ga:sessions'},
{'expression': 'ga:itemRevenue'},
{'expression': 'ga:hits'},
{'expression': 'ga:sessionDuration'},
],
# Get Pages
'dimensions': [
{"name": "ga:clientId"},
{"name": "ga:pagePath"},
{"name": "ga:dateHourMinute"},
{"name": "ga:shoppingStage"},
{"name": "ga:source"},
{"name": "ga:campaign"},
],
# Filter by condition
"filtersExpression": "ga:clientId==yyyy.zzzz",
'orderBys': [{"fieldName": "ga:dateHourMinute", "sortOrder": "DESCENDING"}],
'pageSize': 500
}]
}
).execute()
Sample response:
{'dimensions': ['yyyy.zzzz',
'/products/pants-green?variant=456456456',
'202101142347',
'ALL_VISITS',
'newsletter',
'2021_01-pre-sale',
'282'],
'metrics': [{'values': ['0',
'0',
'0.0',
'1',
'0.0']}]},
Is it possible to define alternate naming for the dimensions in the response within the query itself, e.g.
strip the variant part from the page path with regex,
change the wording for "ga:shoppingStage" from ALL_VISITS to something else?
Or is this something which needs to be done in post-processing?
The dimensions and metrics are standard within Google Analytics. The response returned to you from the API simply uses the names of the dimensions and metrics as the API defines them.
Even if you have your own custom dimensions and metrics set up, the API is still just going to return them with the name ga:dimensionXX.
If you want to change the names or values, you're going to have to do that locally after the data is returned to you.
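Here is a minimal post-processing sketch for the two cases in the question, assuming the dimension values arrive in the same order as the requested dimensions. The helper name clean_row and the replacement label "All sessions" are made up for illustration:

```python
import re

# Dimension names in the order they were requested.
DIMENSION_NAMES = [
    "ga:clientId", "ga:pagePath", "ga:dateHourMinute",
    "ga:shoppingStage", "ga:source", "ga:campaign",
]

# Hypothetical replacement wording; choose whatever labels suit your report.
STAGE_LABELS = {"ALL_VISITS": "All sessions"}


def clean_row(dimension_values):
    """Label one row's dimension values and tidy two of them."""
    row = dict(zip(DIMENSION_NAMES, dimension_values))
    # Drop the query string (e.g. ?variant=...) from the page path.
    row["ga:pagePath"] = re.sub(r"\?.*$", "", row["ga:pagePath"])
    # Swap the stage code for a friendlier label, keeping unknown codes as-is.
    stage = row["ga:shoppingStage"]
    row["ga:shoppingStage"] = STAGE_LABELS.get(stage, stage)
    return row


cleaned = clean_row([
    "yyyy.zzzz",
    "/products/pants-green?variant=456456456",
    "202101142347",
    "ALL_VISITS",
    "newsletter",
    "2021_01-pre-sale",
])
print(cleaned["ga:pagePath"], cleaned["ga:shoppingStage"])
```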
I have discrepancies in the revenue metric, between the data I collect from the Google Analytics API and the custom reports in the user interface.
The discrepancies for each value maintain the same rate, where the data collected through the API is greater than the data in the custom reports.
This is the body of the request I'm using:
{
"reportRequests":[
{
"viewId":"xxxxxxxxxx",
"dateRanges": [{"startDate":"2017-07-01","endDate":"2018-12-31"}],
"metrics": [
{"expression": "ga:transactionRevenue","alias": "transactionRevenue","formattingType": "CURRENCY"},
{"expression": "ga:itemRevenue","alias": "itemRevenue","formattingType": "CURRENCY"},
{"expression": "ga:productRevenuePerPurchase","alias": "productRevenuePerPurchase","formattingType": "CURRENCY"}
],
"dimensions": [
{"name": "ga:channelGrouping"},
{"name": "ga:sourceMedium"},
{"name": "ga:dateHour"},
{"name": "ga:transactionId"},
{"name": "ga:keyWord"}
],
"pageSize": "10000"
}]}
This is an extract of the response:
{
"reports": [
{
"columnHeader": {
"dimensions": [
"ga:channelGrouping",
"ga:sourceMedium",
"ga:dateHour",
"ga:transactionId",
"ga:keyWord"
],
"metricHeader": {
"metricHeaderEntries": [
{
"name": "transactionRevenue",
"type": "CURRENCY"
},
{
"name": "itemRevenue",
"type": "CURRENCY"
},
{
"name": "productRevenuePerPurchase",
"type": "CURRENCY"
}
]
}
},
"data": {
"rows": [
{
"dimensions": [
"(Other)",
"bing / (not set)",
"2018052216",
"834042319461-01",
"(not set)"
],
"metrics": [
{
"values": [
"367.675436",
"316.55053699999996",
"316.55053699999996"
]
}
]
},
...
So, if I create a custom report in the Google Analytics user interface and look for the transaction ID 834042319461-01, I get the following result:
[Screenshot: Google Analytics custom report filtered by transaction ID 834042319461-01]
In the end I have a revenue value of 367.675436 in the API response, but a value of 333.12 in the custom report, so the API value is 10.37% higher. I get this 10.37% increase for all values.
Why am I seeing this discrepancy?
What would you recommend doing to solve this problem?
Thanks.
My bet is that you're experiencing sampling (is your time range in the UI shorter than in the API?): https://support.google.com/analytics/answer/2637192?hl=en
Sampling applies when:
you customize the reports
the number of sessions in the report's overall time range exceeds 500K (GA) or 100M (GA 360), whether or not your query returns fewer sessions
The consequence is that:
the report will be based on a subset of the data (the percentage depends on the total number of sessions)
therefore your report data won't be as accurate as usual
What you can do to reduce sampling:
increase the sample size (this only reduces sampling to a certain extent and in most cases won't remove it completely). In the UI this is done via the option at the top of the report; in the API it is done with the samplingLevel option
reduce the time range
create filtered views so your reports contain the data you need without needing to customize reports
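To check whether an API response was sampled, you can look for the samplesReadCounts and samplingSpaceSizes fields in the report's data, which the v4 API only includes when the results are sampled. Here is a minimal sketch; the helper name sampling_ratios and the sample numbers are made up for illustration:

```python
def sampling_ratios(report):
    """Return the sampled fraction per date range, or None if unsampled."""
    data = report.get("data", {})
    read_counts = data.get("samplesReadCounts")
    space_sizes = data.get("samplingSpaceSizes")
    if not read_counts or not space_sizes:
        # Fields absent: the report was not sampled.
        return None
    # The API returns these counts as strings, one entry per date range.
    return [int(read) / int(size) for read, size in zip(read_counts, space_sizes)]


# Made-up fragments standing in for real reports.
sampled_report = {"data": {"samplesReadCounts": ["250000"],
                           "samplingSpaceSizes": ["1000000"]}}
unsampled_report = {"data": {"rows": []}}

print(sampling_ratios(sampled_report))
print(sampling_ratios(unsampled_report))
```

If this returns None for the report in question, sampling is unlikely to be the cause of the difference.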
Because you are looking at a particular transaction ID, this might not be a sampling issue.
From your question, the ratio seems to be a consistent 10.37%, so I believe this is a currency issue.
Try using the local currency metrics when making monetary API calls.
For example:
ga:localTransactionRevenue instead of ga:transactionRevenue
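As a quick check of that reasoning, the two values from the question can be compared directly: a factor that stays constant across transactions looks more like a currency conversion than like sampling. A small sketch using only the numbers quoted in the question (the variable names are illustrative):

```python
api_value = 367.675436  # ga:transactionRevenue from the API response
ui_value = 333.12       # same transaction in the UI custom report

ratio = api_value / ui_value
percent_higher = (ratio - 1) * 100
print(f"ratio={ratio:.4f}, API value is {percent_higher:.2f}% higher")

# Metrics to request instead, so revenue comes back in the local currency.
local_metrics = [
    {"expression": "ga:localTransactionRevenue"},
    {"expression": "ga:localItemRevenue"},
]
```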
Using segments, I am building an app that lets users enter a URL and check different metrics for it.
I see some significant differences between reports pulled with the API and reports generated in the UI. Maybe I am misunderstanding something about segments.
For example, I have a segment designed to show only users who visited a specific page, which maps to a small fraction of my users.
When looking at the user age brackets, I see small numbers scattered across all categories.
Now to run the equivalent report in the API, I am using the payload below.
{
"reportRequests": [
{
"viewId": "#####",
"dateRanges": [
{ "startDate": "2017-03-01",
"endDate": "2017-04-27" }
],
"metrics": [
{"expression": "ga:pageviews"},
{"expression": "ga:sessions"},
{"expression": "ga:users"}
],
"segments": [
{
"dynamicSegment": {
"name": "Users of /apath/ofinterest/",
"userSegment": {
"segmentFilters": [
{
"simpleSegment": {
"orFiltersForSegment": {
"segmentFilterClauses": [
{
"dimensionFilter": {
"dimensionName": "ga:pagePath",
"operator": "EXACT",
"expressions": [
"/apath/ofinterest/"
]}}]}}}]}}}
],
"dimensions": [
{ "name": "ga:userAgeBracket" },
{ "name": "ga:segment" }
]
}
]
}
This yields completely different results:
only 2 age brackets and, weirdly, the same number of users in each (I tried different time frames and saw the same behaviour).
Any ideas on what could be wrong? Could it be something in the settings of the segment? Related to "Sessions/User Include"?
Or could the warning that I see in the UI have a different impact in the UI and the API?
According to this comment, it sounds like numbers might be calculated differently for the API and the UI. Is that still the case?
Thanks a lot!
GA UI data is read from pre-aggregated tables.
Pre-calculated data (pre-aggregated tables)
These are precalculated data that Google uses to speed up the UI. Google does not specify when the aggregation is done; it can happen at any time. These are known as pre-aggregated tables.
So if you compare the numbers in the GA UI with your GA API data, you may see a discrepancy, because the UI view might have been aggregated earlier in the day, while your API data is fresh.