URL Manipulation With Google Analytics Advanced Filters - google-analytics

In Google Analytics, I have a view for a web site in which I'm trying to use Advanced filters to codify a transformation on the "Request URI" field:
If the Request URI matches "/product/[productid]/someproductscreen", then I want to strip "/[productid]" from the Request URI so I can combine all visits to /someproductscreen across all products.
All Request URIs that do not match the pattern above should be passed into the view unmodified.
When I view the traffic in the Site Content..All Pages report, I don't want to see any values of "/[productid]" in the URIs in the "Page" column - I'd like all visits to a particular product page to roll up under a URI like "/product/warranty" or "/product/description".
Unfortunately, I'm finding this hard to work out on my own: results take a while to show up in Google Analytics after each change, and I only have a shaky grasp of how regular expressions are used in Advanced Filters.

GA Advanced Filter
Assuming your [product id] is 3 or more consecutive digits, e.g. /product/123456789/someproductscreen, then this would work:
Advanced Filter
Field A: Request URI: ^/product/\d{3,}(.*)
Field B:
Output to: Request URI: /product/{id}$A1
Check Field A Required and Override Output Field
The above configuration will rewrite the Request URI from:
/product/123456789/someproductscreen
/product/12345
/some/other/url
to:
/product/{id}/someproductscreen
/product/{id}
/some/other/url
You mention you'd want to see /product/warranty. That would hide the fact that the URI was edited. My suggestion is to leave a placeholder where the edit was made. I use {id}, but it could be any string, e.g. <product id>.
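To try the configuration above without waiting for GA to process new hits, the rewrite can be approximated locally. This is a minimal Python sketch, not GA itself: the function name rewrite_request_uri is made up for illustration, and it assumes Python's re module treats this particular pattern the same way GA does.

```python
import re

# Field A pattern from the answer: one capture group for everything after the ID.
FIELD_A = re.compile(r"^/product/\d{3,}(.*)")


def rewrite_request_uri(uri):
    """Approximate 'Field A Required' plus 'Override Output Field' for one URI."""
    match = FIELD_A.match(uri)
    if match is None:
        # Field A did not match, so the URI passes through unchanged.
        return uri
    # Constructor from the answer: /product/{id}$A1
    return "/product/{id}" + match.group(1)


for uri in [
    "/product/123456789/someproductscreen",
    "/product/12345",
    "/some/other/url",
]:
    print(uri, "->", rewrite_request_uri(uri))
```

Running it against a handful of real URIs from the All Pages report is a quick way to check the pattern before saving the filter.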
Level Up the Regex
Link to regex101 example
GA Filters use regular expressions. In the example above we used a regex to match a product ID that is all digits. A more general form of that expression, which does not hard-code the /product/ folder, is:
^(/.*/)(\d{3,})(.*)
This matches when the Request URI starts with one or more folders, captured by (/.*/), followed by three or more digits, captured by (\d{3,}). Finally, (.*) captures the remainder of the URI. We use groups so we can access the values in a later step.
GA Advanced Filters keep the groups extracted from Field A and Field B. We use this feature to rebuild the Request URI in the Output To -> Constructor. Below is an example of condensing dynamic IDs to a static string:
$A1{id}$A3
$A1 inserts the first group from Field A, and $A3 inserts the third group from Field A, if it exists. {id} is a static string that acts as a placeholder for the dynamic value.
If your product ID is alphanumeric, we simply need to find a pattern that matches it. You didn't provide any examples of IDs, so here are a few common ID patterns found in URLs:
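The group-and-constructor idea can be sketched in Python to see what each group holds. The helper expand_constructor is hypothetical and only imitates the $A1-style references. Expanding a missing group to an empty string is an assumption of this sketch, not documented GA behaviour.

```python
import re

# General Field A pattern: folders, numeric ID, remainder.
FIELD_A = re.compile(r"^(/.*/)(\d{3,})(.*)")


def expand_constructor(template, match):
    """Replace $A1, $A2, ... in the template with groups captured by Field A."""

    def group_value(ref):
        index = int(ref.group(1))
        # Assumption: a group that does not exist expands to an empty string.
        if index > match.re.groups:
            return ""
        return match.group(index) or ""

    return re.sub(r"\$A(\d)", group_value, template)


m = FIELD_A.match("/product/123456789/someproductscreen")
print(m.groups())
print(expand_constructor("$A1{id}$A3", m))
```

Because (/.*/) is greedy, it first grabs as many folders as it can and then backs off until the digits match, so the ID is always the last numeric segment that the rest of the pattern allows.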
[A-Z]-\d+ // matches Z-764537389
\d{4}-\d{3}-\d{2} // matches 1234-123-12
Easy mode, right? What if you have an RFC 4122 compliant UUID in the URL that you need to match? No problem:
[0-9a-f]{8}-[0-9a-f]{4}-[1-5][0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}
// matches 0df98a02-c438-4c57-8d1c-2f6041804e2c
Note: GA Advanced Filter regex is case insensitive by default; this can be overridden in the filter settings.
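The ID patterns above can be checked against sample values with a short Python sketch. The dictionary labels and the function matching_patterns are invented for illustration. re.fullmatch is used so that each pattern is tested against a single ID segment, and the ignore_case flag imitates the case-insensitive default mentioned in the note.

```python
import re

# The three ID patterns from the answer, keyed by made-up labels.
ID_PATTERNS = {
    "letter-dash-digits": r"[A-Z]-\d+",
    "grouped digits": r"\d{4}-\d{3}-\d{2}",
    "RFC 4122 UUID": (
        r"[0-9a-f]{8}-[0-9a-f]{4}-[1-5][0-9a-f]{3}"
        r"-[89ab][0-9a-f]{3}-[0-9a-f]{12}"
    ),
}


def matching_patterns(segment, ignore_case=True):
    """Return the labels of all patterns that match the whole segment."""
    flags = re.IGNORECASE if ignore_case else 0
    return [
        label
        for label, pattern in ID_PATTERNS.items()
        if re.fullmatch(pattern, segment, flags)
    ]


for sample in ["Z-764537389", "1234-123-12", "0df98a02-c438-4c57-8d1c-2f6041804e2c"]:
    print(sample, "->", matching_patterns(sample))
```

Passing ignore_case=False shows which IDs would stop matching if the filter were made case sensitive, for example an upper-case UUID.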

Here: https://regex101.com/r/kRUJnU/1
Start playing with this tool; it'll become really helpful in the future, since custom filters with regex matching and capturing groups are REALLY important in GA.
EDIT: How to go from regex101 to GA.
In the image below you can see how I deleted the last part of URLs when they are something like:
www.mysite.com/vuelos/carrito/checkout/46787654567898765
Or something like:
www.mysite.com/vuelos/carrito/46787654567898765

Related

How does one remove numeric id ending for restful URL in Google Analytics?

In Google Analytics, we track urls such as
/app/person/1
/app/person/78
or
/app/person/pet/456
/app/person/pet/65
And we'd like to remove the final Identifier so we can report on the page itself (i.e. /app/person and /app/person/pet)
Is there a way to do this? Thanks!
2 ways:
Search/Replace Filters: this will overwrite the original URLs with the non-numeric ones (if you ever want to know the exact original URL you're screwed).
Content Grouping: this will retain the original URLs, but will create a separate reporting option whereby you can switch from original URLs to your content group (see below example showing the page type, but you can really do whatever you want with content groups, including non-numeric IDs):
In both cases, you'll need a regular expression to handle the conversion: ([^0-9]+)/[0-9]+ should work (note that with GA's regular expression engine you do not need to escape /)
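A quick way to sanity-check that expression is to run it over the example URLs in Python. This is only a sketch of the search/replace variant: strip_numeric_id is a made-up name, and \1 is Python's way of referring to the first captured group.

```python
import re

# Expression from the answer: everything up to a slash followed by digits.
TRAILING_ID = re.compile(r"([^0-9]+)/[0-9]+")


def strip_numeric_id(uri):
    """Keep the captured non-numeric part and drop the /<digits> that follows."""
    return TRAILING_ID.sub(r"\1", uri)


for uri in ["/app/person/1", "/app/person/78", "/app/person/pet/456", "/app/person"]:
    print(uri, "->", strip_numeric_id(uri))
```

URLs without a numeric segment are left alone, which is the behaviour you want for pages that are not identifier-based.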
Accordingly the content grouping setup would look like:

Google Analytics doesn't apply my filter

I created a filter on my account.
This filter is a custom filter, search and replace.
I use
"Request URI" for Filter Field,
\?.* for Search String
I also attached this filter to my specific view.
My problem is that if I go to the view->Reporting->Behavior->Site Content->All Pages, I see that the filter is not applied. I still see pages such as "/xy.html?id=12345".
I would expect "/xy.html" only. I've read somewhere that filters do not work on past data, but I did some test visits after I applied the filter and the URLs weren't changed :(
If I click on verify, I get this message: "This filter would not have changed your data. Either the filter configuration is incorrect, or the set of sampled data is too small."
Your filter definition should use regular expressions with capture groups for search and replace.
Search String: ([^?]*)(\?.*)
Replace String: \1
This will search for two parts: 1. all symbols before the very first "?" 2. the "?" and all symbols after it in your URI.
The replacement uses the first part (all symbols before the very first "?").
Make sure you google some regex basics.
Filters only apply to new data collected after the filter is added, never to the historic data already collected in your properties.
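The "everything before the very first ?" idea can be tried locally in Python. This sketch is an illustration rather than the filter itself: the names are invented, and the character class [^?] is one way to make sure the first group stops at the first question mark. The greedy variant is included only to show how it differs.

```python
import re

# First group stops at the first "?"; second group is the query string.
UP_TO_FIRST_QUESTION_MARK = re.compile(r"([^?]*)(\?.*)")
# Greedy variant for comparison: the first group runs to the LAST "?".
GREEDY = re.compile(r"(.*)(\?.*)")


def strip_query(uri, pattern=UP_TO_FIRST_QUESTION_MARK):
    """Replace the whole match with its first group, i.e. drop the query string."""
    return pattern.sub(r"\1", uri, count=1)


print(strip_query("/xy.html?id=12345"))
print(strip_query("/xy.html"))
print(strip_query("/a?b=1?c=2"))
print(strip_query("/a?b=1?c=2", GREEDY))
```

The last two calls differ only when a URI contains more than one "?", which is rare but worth knowing about when choosing the pattern.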

Filter to Group URL on Visitors Flow

I have found a similar question earlier here:
Google Analytics Visitors Flow: grouping URLs?
However, I'm confused because people suggest different ways to write the Replace String, and whichever way I try, I'm not able to make it work.
So I have an ecommerce site with hundreds of different pages. The different parts of the website are:
http://example.com/sv/ (Root)
http://example.com/sv/category/1-name/
http://example.com/sv/product/1-name/
http://example.com/sv/designer-tool/1-name/
http://example.com/sv/checkout/
When I go to the Visitors Flow, I want to see the number of people that go from, for example, Root to Category, from Category to Product, from Product to Designer Tool, and from Designer Tool to Checkout. However, with so many different pages it becomes very difficult to follow the visitors flow, because the product pages, for example, are not grouped together.
So instead of the above, I would like to remove the 1-name/ part at the end and only see /sv/category/, /sv/product/, /sv/designer-tool/.
From the earlier post I understand you can use an advanced filter to do this. I have set the following settings:
Type: Search & Replace
Field: Request URI
Search String: ^/(category|product|designer-tool)(/\d*)(.*)
Replace String: /$A1$A3
I guess that my search string and my replace string are wrong. Any ideas?
EDIT: I updated my filter to the following:
Search String: ^/sv/(category|product|designer-tool)(/\d*)(.*)$
Replace String: /sv/\1/
Still testing and unsure if it's the correct way to set it up.
I was able to solve this with the Search String and the Replace String in my edit above.
So basically what I did was:
Create a secondary view/profile for your site. If you apply your filter to your one and only view/profile, you won't be able to see any detailed data about specific pages, because the filter removes that detail.
Add an Advanced Filter with the following settings:
Type: Search & Replace
Field: Request URI
Search String: ^/sv/(category|product|designer-tool)(/\d*)(.*)$
Replace String: /sv/\1/
You need to wait 24h after creating your new profile/view before you can see any data in it.
So my confusion was regarding the Search String and the Replace String. The Search String is a regular expression that is matched against everything after your .tld. So for example, for http://www.example.com/sv/mypage/1-post/, the Search String will only search within /sv/mypage/1-post/.
The Replace String is what the whole matched text is replaced with. So in my case, I matched all URLs that looked like /sv/category/1-string/. I wanted to keep only the "category" part, so I replaced the whole string with /sv/category/ by entering the Replace String /sv/\1/
/sv/ means just what it says. \1 means that it should take the value of the first () of my Search String (in this case "category"). The ending / is just an ending slash.
All in all, it means that any URL that looked like http://example.com/sv/category/1-string/ was changed to http://example.com/sv/category/, meaning that I can now see data for all my categories as a group, instead of individual pages.
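The Search String and Replace String from the solution can be replayed in Python to see which URIs get grouped and which stay as they are. This is a local approximation, assuming Python's re.sub handles \1 the same way the filter does; group_uri is a made-up name.

```python
import re

# Search String and Replace String from the working filter.
SEARCH = re.compile(r"^/sv/(category|product|designer-tool)(/\d*)(.*)$")
REPLACE = r"/sv/\1/"


def group_uri(uri):
    """Collapse /sv/<section>/<id>-<name>/ to /sv/<section>/; leave other URIs alone."""
    return SEARCH.sub(REPLACE, uri)


for uri in [
    "/sv/",
    "/sv/category/1-name/",
    "/sv/product/1-name/",
    "/sv/designer-tool/1-name/",
    "/sv/checkout/",
]:
    print(uri, "->", group_uri(uri))
```

Only the three sections named in the first group are rewritten, so the root and checkout pages keep their original URIs in the flow report.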

Using filters to track hits for multiple URLs

We have a large website that is split up into groups of organisations with a number of micro-sites. We would like to provide one organisation within a group with their own set of data and I am having trouble getting the filtering working.
I think my main problem is I have 2 include filters. According to the documentation:
"If you apply multiple Include Filters, the hit must match every applied Include Filter in order to save the hit."
Our website urls would go something like this: https://[host]/[group]/[site]/[params]. I would like to track the following, given that this client (id 9) is in group "foo":
https://mysite.com/foo/live/default.aspx?id=9
https://mysite.com/foo/live/?id=9
https://mysite.com/foo/reporting/9/*
so that any hits on those urls would be captured for this particular client.
Our 2 current filters (type="Include") are as follows:
/foo/Reporting/9/
/foo/[^\?]*\?id=9
but these do not seem to track everything we think they should. Any help would be much appreciated.
By the time the first filter is done there is nothing left for the second filter to match: the first filter throws away everything that does not match it (that's what Google means by "the hit must match every applied Include Filter").
I would suggest you first use an advanced filter to transform your URLs so they all follow the same pattern (i.e. grab the value from the query parameter and append it to the URL path) and then apply a single include filter. I'm pretty certain that would be easier than trying to include different URL structures (if you need help with the filters, holler away in the comments, but the example given in the advanced filters interface should give you a clue how this works).
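The effect of stacking two Include filters, and the suggested "rewrite first, then include once" approach, can be sketched in Python. Everything beyond the two original filter patterns is an assumption made for illustration: the rewrite pattern QUERY_ID, the target shape /foo/.../9/, the single include pattern INCLUDE_CLIENT_9, the sample path /foo/reporting/9/summary, and case-insensitive matching.

```python
import re

# The two Include filters from the question (assumed case-insensitive here).
INCLUDE_REPORTING = re.compile(r"/foo/Reporting/9/", re.IGNORECASE)
INCLUDE_QUERY_ID = re.compile(r"/foo/[^\?]*\?id=9", re.IGNORECASE)


def passes_all(uri, include_filters):
    """A hit is kept only if it matches every Include filter."""
    return all(f.search(uri) for f in include_filters)


# Hypothetical rewrite step: move ?id=N into the path as /N/.
QUERY_ID = re.compile(r"^(/foo/[^?]*?)/?\?id=(\d+)$", re.IGNORECASE)


def normalise(uri):
    m = QUERY_ID.match(uri)
    return f"{m.group(1)}/{m.group(2)}/" if m else uri


# Hypothetical single Include filter applied after the rewrite.
INCLUDE_CLIENT_9 = re.compile(r"^/foo/[^?]*/9/", re.IGNORECASE)

HITS = [
    "/foo/live/default.aspx?id=9",
    "/foo/live/?id=9",
    "/foo/reporting/9/summary",
]

for uri in HITS:
    both = passes_all(uri, [INCLUDE_REPORTING, INCLUDE_QUERY_ID])
    single = passes_all(normalise(uri), [INCLUDE_CLIENT_9])
    print(uri, "| two includes:", both, "| rewrite + one include:", single)
```

With the two original filters no hit survives, because no single URL has both shapes. After the rewrite, one pattern covers all three URL forms.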

Google Analytics - Is head match of goal URL any substring of the actual URL

I have 2 urls to be tracked under a single goal :
/vpv/purchase_item
/vpv/purchase_coupon
So, I have setup a head match as /vpv/purchase.
Is this correct, as /vpv/purchase is a substring of both the URLs I need to track?
Or does head match consider the complete URL except the query string?
Is this correct, as /vpv/purchase is a substring of both the URLs I need to track? Yes, because both URLs begin with /vpv/purchase.
Or does head match consider the complete URL except the query string? No.
A head match matches identical characters starting from the beginning of the string up to and including the last character in the string you specify.
Head Match
Suppose your pet store website has a number of pages in a single directory, and you want to use a head match URL to create a goal only for the fish-related pages, which all have the same structure:
/supplies/fishFood.html
/supplies/fishTanks.html
/supplies/fishTankDecorations.html
To determine whether your head match URI works, go to the Pages report for your site, click the Search button and choose "Begins with" as your search type. To match the URLs above, you would enter /supplies/fish in the search field. If your search returns the pages you expect to match, you can use that same URI string as your goal URL.
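Since a head match compares characters from the beginning of the string, it behaves like a "begins with" test. A minimal Python sketch of that idea follows. head_match is a made-up helper, and the comparison here is case sensitive, which is an assumption of the sketch rather than a statement about GA's goal settings.

```python
def head_match(goal_url, page_uri):
    """True when page_uri begins with goal_url, as in a 'Begins with' search."""
    return page_uri.startswith(goal_url)


pages = [
    "/vpv/purchase_item",
    "/vpv/purchase_coupon",
    "/shop/vpv/purchase_item",
    "/supplies/fishFood.html",
    "/supplies/fishTanks.html",
    "/supplies/catFood.html",
]

for goal in ["/vpv/purchase", "/supplies/fish"]:
    print(goal, "->", [p for p in pages if head_match(goal, p)])
```

Note that /shop/vpv/purchase_item is not matched by /vpv/purchase: the goal URL has to appear at the start of the URI, not just anywhere inside it.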
See Verifying Correct URL Expressions for Goals
