When I search for a keyword on google.com, I see this URL in the browser:
https://www.google.com/search?q=harry+potter&sxsrf=AOaemvJzqEslTi5rksHz8Da7pgdZ1J3uMw%3A1634810260185&source=hp&ei=lDlxYYaCCNaL9u8Popq2-AQ&iflsig=ALs-wAMAAAAAYXFHpA2d9PU58mYXikU2pl90IN7Z8wXq&ved=0ahUKEwiGnNLmntvzAhXWhf0HHSKNDU8Q4dUDCAg&uact=5&oq=harry+potter&gs_lcp=Cgdnd3Mtd2l6EAMyCAguEIAEEJMCMgUILhCABDIFCAAQgAQyBQguEIAEMgUIABCABDIFCAAQgAQyBQguEIAEMgUIABCABDIFCC4QgAQyBQgAEIAEOgcIIxDqAhAnOgQIIxAnOgUIABCRAjoLCC4QgAQQxwEQowI6CwguEIAEEMcBEK8BOgsILhCABBDHARDRA1D3GliFJmDtJmgAcAB4AIABowGIAeQKkgEDNi43mAEAoAEBsAEK&sclient=gws-wiz
I understand that virtually all websites work via the Hypertext Transfer Protocol. Some of the most common HTTP methods are GET and POST.
I assume the above is a POST method, since it has a request payload (my search query) and a response payload (the webpage returned).
The parameter "q" is clearly my search keyword.
What do
sxsrf=AOaemvJzqEslTi5rksHz8Da7pgdZ1J3uMw%3A1634810260185
source=hp
ei=lDlxYYaCCNaL9u8Popq2-AQ
iflsig=ALs-wAMAAAAAYXFHpA2d9PU58mYXikU2pl90IN7Z8wXq
ved=0ahUKEwiGnNLmntvzAhXWhf0HHSKNDU8Q4dUDCAg
uact=5
oq=harry+potter
gs_lcp=Cgdnd3Mtd2l6EAMyCAguEIAEEJMCMgUILhCABDIFCAAQgAQyBQguEIAEMgUIABCABDIFCAAQgAQyBQguEIAEMgUIABCABDIFCC4QgAQyBQgAEIAEOgcIIxDqAhAnOgQIIxAnOgUIABCRAjoLCC4QgAQQxwEQowI6CwguEIAEEMcBEK8BOgsILhCABBDHARDRA1D3GliFJmDtJmgAcAB4AIABowGIAeQKkgEDNi43mAEAoAEBsAEK
sclient=gws-wiz
represent, and how does one know?
There are two ways to find out: semi-automatically with the Unfurl tool, or manually by reading published lists of explanations.
Semi-automated way with use of Unfurl
Unfurl is a URL parser and browser, a free tool to check and decode Google Search URLs. It is introduced at https://dfir.blog/introducing-unfurl/.
A hosted version of Unfurl is available online: https://dfir.blog/unfurl/
It shows the URL parameters as a visual 2D graph. Use the mouse wheel to zoom in and out, drag the nodes to unclutter them, and read the explanations attached to the query parameters. It is not limited to Google URLs.
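For a quick look without a graphical tool, the query string can also be split into its parameters programmatically. This is a minimal sketch using the standard `URL` class. The shortened URL is an illustrative stand-in for the one in the question, and `listParams` is a hypothetical helper name.

```javascript
// Split a search URL into [name, value] pairs using the WHATWG URL API.
function listParams(urlString) {
  const url = new URL(urlString);
  // searchParams percent-decodes values and turns "+" into a space.
  return Array.from(url.searchParams.entries());
}

const example =
  "https://www.google.com/search?q=harry+potter&source=hp&uact=5&oq=harry+potter&sclient=gws-wiz";

for (const [name, value] of listParams(example)) {
  console.log(`${name} = ${value}`);
}
```

This only separates the names from the values. What each value means still has to come from a list like the ones below.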
Below is some further information that I collected by searching in September 2022.
Beware that explanations of Google query parameters go out of date quickly, within a few years, so the only thing you can do is search again for newer explanations.
Manual way of reading the list of explanations
Explanations of Google query parameters from [2021]:
q= query sent to search engine
oq= 'original query': the text last typed by the user into the search box before the user selected a search term from the suggestions; it coincides with q= if the query was typed entirely by hand
ei= Search Session Start Date/Time; represents the time that the user's session started, in "Google time" (so it does not depend on the local system time)
ved= Page Load Date/Time
sxsrf= Previous Page Load Date/Time
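As an illustration of the timestamp claims: in the URL from the question, the first four bytes of the base64url-decoded `ei` value, read as a little-endian unsigned integer, give a Unix time in seconds that agrees to the second with the millisecond value after `%3A` in `sxsrf`. The sketch below assumes that layout. It is an observation about this one URL, not a documented Google format, and both function names are hypothetical.

```javascript
// Assumption: the first 4 bytes of the decoded "ei" value are a
// little-endian Unix timestamp in seconds (observed, not documented).
function eiTimestampSeconds(ei) {
  // Map the URL-safe base64 alphabet back to the standard one.
  const standard = ei.replace(/-/g, "+").replace(/_/g, "/");
  const bytes = Buffer.from(standard, "base64");
  return bytes.readUInt32LE(0);
}

// Assumption: "sxsrf" is "<token>:<milliseconds since the Unix epoch>".
function sxsrfTimestampMillis(sxsrf) {
  const decoded = decodeURIComponent(sxsrf);
  return Number(decoded.slice(decoded.lastIndexOf(":") + 1));
}

const eiSeconds = eiTimestampSeconds("lDlxYYaCCNaL9u8Popq2-AQ");
const sxsrfMillis = sxsrfTimestampMillis(
  "AOaemvJzqEslTi5rksHz8Da7pgdZ1J3uMw%3A1634810260185"
);

console.log(new Date(eiSeconds * 1000).toISOString());
console.log(new Date(sxsrfMillis).toISOString());
```

Both values fall on 21 October 2021, which fits the date of the question.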
Explanations from [2016]:
Here is a list of the URL parameters that we would commonly see:
q= the query string (keyword) that the user searched
oq= tracks the characters that were last typed into the search box before the user selected a suggested search term
hl= controls the interface language
redir_esc= unknown
sa= user search behavior
rct= unknown; seems to be related to Google AdWords
gbv= control the presence of JavaScript on the page
gs_l= unknown; seems to be related to what type of search is being done (i.e., mobile, serp, img, youtube, etc.)
esrc= set to ‘s’ for secure search
frm= unknown
source= where the search originated (i.e., google.com, toolbar, etc.)
v= unknown
qsubts= unknown
action= unknown
ct= click location
oi= unknown
cd= ranking position of the search result that was clicked
cad= unknown; appears to be a referrer, affiliate or client token
sqi= unknown
ved= contains information about the search result link that was clicked (see https://moz.com/blog/inside-googles-ved-parameter)
url= the URL that Google will redirect the user to after a search result link is clicked
ei= passes an alphanumeric parameter that decodes the originating SERP where user clicked on a related search
usg= unknown; possibly handling the encrypted search string
bvm= unknown; possibly a location tracker
ie= input encoding (default: utf-8)
oe= output encoding
sig2= unknown
Sources:
[2021]: Analyzing Timestamps in Google Search URLs - Magnet Forensics
[2016]: The Approaching Darkness: The Google Referral URL In 2016
I hope others will update this list and add newer explanations later.
I also leave here a few articles with outdated explanations:
2008 article: Moz's The Ultimate Guide to the Google Search Parameters. It is very similar to, or the same as, the blog post Google Search URL Parameters [Ultimate Guide] by SEOQuake mentioned in a neighbouring answer, which also reads as if it dates from 2008.
2014 article How to Use the Information Inside Google's Ved Parameter - Moz.
First of all, the request you wrote uses GET as Request Method.
You can easily check that on the Network tab of the Developer Tools in any browser.
Second, the difference between GET and POST (there are many other methods, but that is another topic) is not the one you describe: every HTTP exchange has a request and a response, whatever the method. The main difference between these two methods is whether the request carries a body. (A GET request can technically include a body, but that is strongly discouraged.) Here your search query travels in the URL's query string, not in a body.
The purpose of the request method is to tell the destination server how it should treat the request.
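To make the difference concrete, here is a small sketch that builds the text of a minimal HTTP/1.1 request for each method. `buildRequest` is a hypothetical helper, and the POST target on `www.example.com` is made up for contrast, since the search in the question is a GET.

```javascript
// Build the text of a minimal HTTP/1.1 request to show where the data goes.
function buildRequest(method, host, path, params) {
  const encoded = new URLSearchParams(params).toString();
  if (method === "GET") {
    // GET: the parameters travel in the request line; there is no body.
    return `GET ${path}?${encoded} HTTP/1.1\r\nHost: ${host}\r\n\r\n`;
  }
  // POST: the parameters travel in the body, described by two headers.
  return (
    `POST ${path} HTTP/1.1\r\nHost: ${host}\r\n` +
    "Content-Type: application/x-www-form-urlencoded\r\n" +
    `Content-Length: ${Buffer.byteLength(encoded)}\r\n\r\n` +
    encoded
  );
}

const getText = buildRequest("GET", "www.google.com", "/search", { q: "harry potter" });
const postText = buildRequest("POST", "www.example.com", "/search", { q: "harry potter" });
console.log(getText);
console.log(postText);
```

In the GET text everything after the `?` is part of the request line, which is why the parameters are visible in the address bar.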
Now, focusing on your question: you could have discovered the meaning of all of them with a simple Google search, but in any case here is a blog post that explains all the parameters of the Google Search URL:
Google Search URL Parameters [Ultimate Guide]
I asked this question in other forums, and didn't have a solution so far.
I would like to have Google Analytics' source and medium added in every form sent by my websites, as hidden fields.
I use WordPress, and the plugins that I commonly use for contact forms are Contact Form 7 and Fast Secure Contact Form.
Any ideas?
Thanks in advance!
-- Gabriel
There isn't really a way.
It used to be that channel attribution was computed in the GA tracking code in the client, and you could extract it from the cookie values. However since Google has switched to the measurement protocol (which is what's behind both analytics.js and gtag.js) attribution is determined on the Google servers and there is no really feasible way to get the information in realtime to include it in a form.
You could create a script that emulates GA attribution, but the rules are somewhat complex and it is unlikely you would get an exact match.
Another way (which, if you are in Europe, might bring you into conflict with the new privacy guidelines from March on) would be to save a unique token with the form, send the same token as a custom dimension to Google Analytics, and then join the information after the fact via the API. Some time ago I described the process in a tutorial, and even though it is written for Salesforce (and the specific code is almost certainly obsolete), it describes the problem and the solution fairly exhaustively.
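The token approach can be sketched roughly as follows. `newToken`, `joinByToken`, and both record shapes are hypothetical. In practice the analytics rows would come from an export through the Google Analytics reporting API, which this sketch does not call.

```javascript
const nodeCrypto = require("crypto");

// Hypothetical token generator: a random value stored in a hidden form
// field and also sent to Google Analytics as a custom dimension.
function newToken() {
  return nodeCrypto.randomBytes(16).toString("hex");
}

// Join form submissions with rows exported later from the reporting API.
// Both record shapes are assumptions made for this sketch.
function joinByToken(submissions, analyticsRows) {
  const byToken = new Map(analyticsRows.map((row) => [row.token, row]));
  return submissions.map((submission) => {
    const match = byToken.get(submission.token);
    return {
      ...submission,
      source: match ? match.source : null,
      medium: match ? match.medium : null,
    };
  });
}

const token = newToken();
const submissions = [{ email: "gabriel@example.com", token }];
const analyticsRows = [{ token, source: "google", medium: "organic" }];
console.log(joinByToken(submissions, analyticsRows));
```

The join happens after the fact, so the source and medium are not available at the moment the form is submitted.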
We want to capture aggregated, anonymous search query history for analytic purposes to improve our internal search engine performance and metadata practices.
I found this article: https://support.google.com/analytics/answer/1012264?hl=en
Unfortunately, our search engine uses a hash tag instead of a question mark (nonstandard query string).
For example: http://www.site.com/search#q=search%20term
Is there a way to configure Google Analytics to recognize hash tag values in the URLs and capture these given a defined pattern?
Thanks
Sorry to say this, but hash tags won't "make it" into the reports at all, so there are no search reports for hash tags.
There is a simple workaround though: use virtual pageviews that emulate the request with a regular query parameter after a ? sign.
_gaq.push(['_trackPageview', '/search?q=search%20term']);
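Rather than hard-coding the search term, the virtual path could be derived from the current URL. This is a sketch only: `virtualPathFromHash` is a hypothetical helper, and it assumes the fragment already has the `name=value` form shown in the question.

```javascript
// Turn "http://www.site.com/search#q=search%20term" into the virtual
// path "/search?q=search%20term" that a pageview call can report.
function virtualPathFromHash(urlString) {
  const url = new URL(urlString);
  const fragment = url.hash.replace(/^#/, "");
  // Without a fragment, fall back to the real path and query string.
  return fragment ? `${url.pathname}?${fragment}` : url.pathname + url.search;
}

const virtualPath = virtualPathFromHash("http://www.site.com/search#q=search%20term");
console.log(virtualPath);

// In the page itself (classic ga.js, as in the call above) this would be:
//   _gaq.push(['_trackPageview', virtualPathFromHash(window.location.href)]);
```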
However, this virtual pageview will generate a second pageview for a given page, which isn't preferable. So I would recommend setting up a new view specifically for site search reports (or try to play around with advanced filters, which might get the work done). Also, don't forget to turn on Site Search in the view settings, as you would otherwise.
Have you tested putting a hash into the Query Parameter field?
Important: This question isn't really an ASP.NET question. Anyone who knows anything about URLs can answer it. I just happen to be using ASP.NET routing, so I included that detail.
In a nutshell my question is:
"What future-proof URL format should I design that I can give to external parties to get to a specific place on my site? [I'm new to creating these 'REST' URLs]."
I need an ASP.NET routing URL that will be given to a third party for tracking marketing campaigns. It is essentially a 'gateway' URL that redirects the user to a specific page on our site which may be the homepage, a special contest or a particular product.
In addition to trying to capture the referrer I will need to receive a partnerId, a campaign number and possibly other parameters. I want to provide a route to do this BUT I want to get it right first time, because obviously I can't easily change it once it's being used externally.
How does something like this look?
routes.MapRoute(
    "3rd-party-campaign-route",
    "campaign/{destination}/{partnerid}/{campaignid}/{custom}",
    new
    {
        controller = "Campaign",
        action = "Redirect",
        custom = (string)null // optional so we need to set it null
    }
);
campaign : I possibly don't want the word 'campaign' in the actual link, since users will see it in the URL bar. I might change this to something cryptic like 'c'.
destination : dictates which page on our site the link will take the user to. For instance, PR to direct the user to the products page.
partnerid : the ID that we've assigned to the company, such as SO for Stack Overflow.
campaignid : campaign ID such as 123, unique to each partner. I have realized that I'd prefer the 3rd party company to be able to manage the campaign IDs themselves rather than us providing a website to 'create a campaign'. I'm not completely sure about this yet though.
custom : custom data (optional). I can add further custom data parameters without breaking existing URLs.
Note: the reason I have 'destination' is that the campaign ID is decided upon by the client, so they also need to tell us where the destination of that campaign is. Alternatively they could 'register' a campaign with us. This may be a better solution to avoid people putting in random campaign IDs, but I'm not overly concerned about that and I think this system gives more flexibility.
In addition we perhaps want to know which image they used to link to us (so we can track which banner works best). I THINK this is a candidate for a new campaignid as opposed to a custom data field, but I'm not sure.
Currently I am using a very primitive URL such as http://example.com?cid=123. In this case the campaign ID needs to be issued to the third party and it just isn't a very flexible system. I want to move immediately to a new system for new clients.
Any thoughts on future proofing this system? What may I have missed? I know I can always add new formats, but I want to use this format as much as possible if that is a good idea.
This URL:
"campaign/{destination}/{partnerid}/{campaignid}/{custom}",
...doesn't look like a resource to me, it looks like a remote method call. There is a lot of business logic here which is likely to change in the future. Also, it's complicated. My gut instinct when designing URLs is that simpler is generally better. This goes double when you are handing the URL to an external partner.
Uniform Resource Locators are supposed to specify, well, resources. The destination is certainly a resource (but more on this in a moment), and I think you could consider the campaign a resource. The partner is not a resource you serve. Custom is certainly not a resource, as it's entirely undefined.
I hear what you're saying about not wanting to have to tell the partners to "create a campaign," but consider that you're likely to eventually have to go down this road anyway. As soon as the campaign has any properties other than the partner identifier, you pretty much have to do this.
So my first two conclusions are that you should probably get rid of the partner ID and derive it from the campaign. Get rid of custom, too, and use query string parameters instead, should it be necessary. It is appropriate to use query string parameters to specify how to return a resource (as opposed to the identity of the resource).
Removing those yields:
"campaign/{destination}/{campaignid}",
OK, that's simpler, but it still doesn't look right. What's destination doing in between campaign and campaign ID? One approach would be to rearrange things:
"campaign/{campaignid}/{destination}",
Another would be to use Astoria-style indexing:
"campaign({campaignid})/{destination}",
For some reason, this looks odd to a lot of people, but it's entirely legal. Feel free to use other legal characters to separate campaign from the ID; the point here is that a / is not the only choice, and may not be the appropriate choice.
However...
One question we haven't covered yet is what should happen if/when the user submits a valid destination, but an invalid campaign or partner ID. If the correct response is that the user should see an error, then all of the above is still valid. If, on the other hand, the correct response is that the user should be silently taken to the destination page anyway, then the campaign ID is really a query string parameter, not a part of the resource. Perhaps some partners wouldn't like being given a URL with a question mark in it, but from a purely REST point of view, I think that's the right approach, if the campaign ID's validity does not determine where the user ends up. In this case, the URL would be:
"campaign/{destination}",
...and you would add a query string parameter with the campaign ID.
I realize that I haven't given you a definite answer to your question. The trouble is that most of this rests on business considerations which you are probably aware of, but I'm certainly not. So I'm more trying to cover the philosophy of a REST-ful URL, rather than attempting to explain your business to you. :)
I think the URL rewriting is getting out of hand a little bit lately. Not everything belongs to the URL. After all, a URL is supposed to describe a resource that can be searched for, discovered or manipulated and it seems to me that at least the partner ID and the custom fields from above are not part of the resource.
Not to mention that at some point you would like to keep the partner ID constant across multiple campaigns, which means that it is orthogonal to the particular places they need to visit. If you keep these as parameters, you will allow your partners to uniformly access multiple resources on your website while still reliably identifying themselves, so you can track their participation in any of your campaigns.
It looks like you've covered all of your bases. The only suggestion I have is to change
{custom}
to
{*custom}
That way, if you ever need to accept further parameters, you don't have to take the chance that old URLs will get a 404. For example:
If you have a URL that looks like:
campaign/PR/SO/123
and you decide in the future that you would like to accept a fourth and fifth parameter:
campaign/PR/SO/123/blah/foo
then the first URL will still be valid, because you're using a wildcard character in {*custom}. "blah/foo" would be passed as a string to your action. To get those extra two parameters, you would simply split the custom argument in your action by '/'. Add some friendly error handling if they don't exist and you've successfully changed the amount of information you can receive with a campaign URL without completely breaking URLs already in the wild.
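The splitting step is language-neutral, so here it is sketched in JavaScript rather than in the ASP.NET action itself. `parseCampaignPath` is a hypothetical helper that mimics what the route plus the action would do with the catch-all segment.

```javascript
// Parse "campaign/{destination}/{partnerid}/{campaignid}/{*custom}".
// Returns null when the path does not match the route.
function parseCampaignPath(path) {
  const segments = path.split("/").filter((s) => s.length > 0);
  if (segments.length < 4 || segments[0] !== "campaign") {
    return null;
  }
  const [, destination, partnerId, campaignId, ...custom] = segments;
  // "custom" holds whatever follows the campaign id, possibly nothing.
  return { destination, partnerId, campaignId, custom };
}

console.log(parseCampaignPath("campaign/PR/SO/123"));
console.log(parseCampaignPath("campaign/PR/SO/123/blah/foo"));
```

Old links simply yield an empty `custom` list, which is the backwards compatibility the wildcard is meant to provide.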
Why not use URL encoded variables instead of routes? They're a lot more flexible - you can add any new features in the future while still maintaining 100% backwards compatibility. Admittedly, it's a little more trouble to type manually, but if there's all those parameters anyway, it's already no picnic.
http://mysite.com/page?campaign=1&dest=products&pid=15&cid=25
To me, this is much more indicative of what is really going on. Using paths implies that a resource exists at that location. But really you're just providing a web service with various parameters, and this model captures that much more clearly. And in the future, you can add more parameters effortlessly. You can also default parameters if they are missing without messing anything up.
Not sure of the code in ASP, but it should be trivial to implement.
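As a language-neutral illustration of the defaulting idea, here is a sketch in JavaScript. `readCampaignParams` and the default values are assumptions for the example. The parameter names are taken from the URL above.

```javascript
// Defaults for parameters a partner may leave out (values are assumptions).
const DEFAULTS = { dest: "home", pid: null, cid: null };

function readCampaignParams(urlString) {
  const query = new URL(urlString).searchParams;
  const result = { ...DEFAULTS };
  for (const name of Object.keys(DEFAULTS)) {
    if (query.has(name)) {
      result[name] = query.get(name);
    }
  }
  return result; // unknown parameters are ignored, so old links keep working
}

console.log(readCampaignParams("http://mysite.com/page?campaign=1&dest=products&pid=15&cid=25"));
console.log(readCampaignParams("http://mysite.com/page?pid=15"));
```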
I think I'd look at doing it the way that SO does its questions.
"campaign/{campaign-id}/friendly-name-of-campaign"
Create a mapping in your database when the campaign is created that associates all the data you need with an automatically generated id. The friendly name could be assigned basically the same way as a question is on SO -- by the user -- but you could also have an approval process that makes sure that it meets your requirements and is distinct from any existing campaign names. Your tracking company can track by the id and you can correlate that with your associated data with a simple look up.
What you have looks good for your needs. The other posts here make good points, but they may not be suitable for you. One thing you could consider for future proofing your links is to put a version number somewhere in there.
"campaign/{version}/{destination}/{partnerid}/{campaignid}/{custom}"
This way if you decide to completely change your format you can up the version to 2.0 (or whatever) and still keep track of the old links coming in.
I would do
/c/{destination}/{partnerid}/{campaignid}/?customvar=s
You should think about the hierarchy of the first parameters; you have already managed that quite well. Path segments should be used only where there is a hierarchy.
From your description, destination seems to be the broadest parameter, partnerid only works with destination, and campaignid is specific to a partner.
When you really need to add custom parameters I would go for query variables (they are not forbidden in REST), because these are not part of the hierarchy.
You also shouldn't try to be too RESTful here. After all, it's for a campaign and for redirecting to a final resource. So the URL you want to design here is not really a specific resource in the terms of REST.
Create a URL called http://mysite.com/gateway
Return an HTML form, tell your partners to fill in the form and POST it. Redirect based on the form values.
You could easily provide your partners with the javascript to do the GET and POST. Should be trivial.
The most important thing I have learned about REST URLs, which is usually buried deep in some book or article:
The URL should point to a resource, and the ?querystring that follows should carry all the scoping information needed. DON'T mix the two or you will have a design that's very hard to work with.
Other than that, I fully agree with Craig Stuntz.
Now, I realise the initial response to this is likely to be "you can't" or "use analytics", but I'll continue in the hope that someone has more insight than that.
Google AdWords with "autotagging" appends a "gclid" (presumably "Google click ID") to the link that sends you to the advertised site. It appears in the web log since it's a query parameter, and it's used by Analytics to tie that visit to the ad/campaign.
What I would like to do is to extract any useful information from the gclid in order to do our own analysis on our traffic. The reasons for this are:
Stats are imperfect, but if we are collating them, we know exactly what assumptions we have made, and how they were calculated.
We can tie the data to the rest of our data and produce far more accurate stats wrt conversion rate.
We don't have to rely on javascript for conversions.
Now it is clear that the gclid is base64 encoded (or some close variant), and some parts of it vary more than others. Beyond that, I haven't been able to determine what any of it relates to.
Does anybody have any insight into how I might approach decoding this, or has anybody already related gclids back to campaigns or even accounts?
I have spoken to a couple of people at google, and despite their "don't be evil" motto, they were completely unwilling to discuss the possibility of divulging this information, even under an NDA. It seems they like the monopoly they have over our web stats.
By far the easiest solution is to manually tag your links with Google Analytics campaign tracking parameters (utm_source, utm_campaign, utm_medium, etc.) and then pull out that data.
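A minimal sketch of manual tagging and of reading the tags back from a logged request URL. The `utm_*` parameter names are the ones mentioned above. `tagUrl`, `readUtm`, the landing page, and the campaign values are made up for illustration.

```javascript
// Append Google Analytics campaign parameters to a landing page URL.
function tagUrl(landingUrl, { source, medium, campaign }) {
  const url = new URL(landingUrl);
  url.searchParams.set("utm_source", source);
  url.searchParams.set("utm_medium", medium);
  url.searchParams.set("utm_campaign", campaign);
  return url.toString();
}

// Read the same parameters back on the server, e.g. from a logged URL.
function readUtm(requestUrl) {
  const query = new URL(requestUrl).searchParams;
  return {
    source: query.get("utm_source"),
    medium: query.get("utm_medium"),
    campaign: query.get("utm_campaign"),
  };
}

const tagged = tagUrl("https://www.example.com/landing", {
  source: "google",
  medium: "cpc",
  campaign: "spring_sale",
});
console.log(tagged);
console.log(readUtm(tagged));
```

Because the values are plain text chosen by you, nothing needs to be decoded afterwards.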
The gclid is dependent on more than just the adwords account/campaign/etc. If you click on the same adwords ad twice, it could give you different gclids, because there's all sorts of session and cost data associated with that particular click as well.
Gclid is probably not 100% random, true, but I'd be very surprised and concerned if it were possible to extract all your Adwords data from that number. That would be a HUGE security flaw (i.e. an arbitrary user could view your Adwords data). More likely, a pseudo-random gclid is generated with every impression, and if that ad is clicked on, the gclid is logged in Adwords (otherwise it's thrown out). Analytics then uses that number to reconcile the data with Adwords after the fact. Other than that, there's no intrinsic value in the gclid number itself.
In regards to your last point, attempting to crack or reverse-engineer this information is explicitly forbidden in both the Google Analytics and Google Adwords Terms of Service, and is grounds for a permanent ban. Additionally, the TOS that you agreed to when signing up for these services says that it is not your data to use in any way you feel like. Google is providing a free service, so there are strings attached. If you don't like not having complete control over your data, then there are plenty of other solutions out there. However, you will pay a premium for that kind of control.
Google makes nearly all their money from selling ads. Adwords is their biggest money-making product. They're not going to give you confidential information about how it works. They don't know who you are, or what you're going to do with that information. It doesn't matter if you sign an NDA and they have legal recourse to sue you; if you give away that information to a competitor, your life isn't worth enough to pay back the money you will have lost them.
Sorry to break it to you, but "Don't be Evil" or not, Google is a business, not a charity. They didn't become one of the most successful companies in the world by giving away their search algorithm to the first guy who asked for it.
The gclid parameter is encoded in Protocol Buffers, and then in a variant of Base64.
See this guide to decoding the gclid and interpreting it, including an (Apache-licensed) PHP function you can use.
There are basically 3 parameters encoded inside it, one of which is a timestamp. The other 2 as yet are not known.
As far as understanding what these other parameters mean—it may be helpful to compare it to the ei parameter, which is encoded in an extremely similar way (basically Protocol Buffers with the keys stripped out). The ei parameter also has a timestamp, with what seem to be microseconds, and 2 other integers.
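For readers unfamiliar with the encoding, Protocol Buffers store integers as base-128 varints. The sketch below decodes a run of such varints from raw bytes, which is the step that would follow the Base64 decoding if the value really is a sequence of key-less integers as described. It makes no claim about what the individual gclid fields mean, and `decodeVarints` is a hypothetical helper name.

```javascript
// Decode a sequence of Protocol Buffers base-128 varints from raw bytes.
// Each byte carries 7 payload bits; a set high bit means "more bytes follow".
function decodeVarints(bytes) {
  const values = [];
  let current = 0n;
  let shift = 0n;
  for (const byte of bytes) {
    current |= BigInt(byte & 0x7f) << shift;
    if (byte & 0x80) {
      shift += 7n;
    } else {
      values.push(current);
      current = 0n;
      shift = 0n;
    }
  }
  return values; // a trailing incomplete varint is silently dropped
}

// 0xAC 0x02 is the usual textbook example for the value 300.
console.log(decodeVarints([0xac, 0x02, 0x01]));
```

BigInt is used so that values wider than 32 bits, such as microsecond timestamps, do not overflow.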
FYI, I just posted a quick analysis of some gclid data from my sites on this post. There definitely is some structure to the gclid, but it is difficult to decipher.
I think you can get all the goodies linked to the gclid via Google's AdWords API. Specifically, you can query the click performance report.
https://developers.google.com/adwords/api/docs/appendix/reports#click
I've been working on this problem at our company as well. We'd like to be able to get a better sense of what our AdWords are doing but we're frustrated with limitations in Analytics.
Our current solution is to look in the Apache access logs for GET requests using the regex:
.*[?&]gclid=([^$&]*)
If that exists, then we look at the referer string to get the keyword:
.*[?&]q=([^$&]*).*
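A sketch of how those two lookups might be applied to one access log line. The log line is a made-up sample in Apache combined format, and `extractClick` is a hypothetical helper. The patterns differ slightly from the ones above: inside a character class `$` is a literal dollar sign, not an end-of-line anchor, so this version stops at `&`, whitespace, or a quote instead.

```javascript
// Stop at "&", whitespace or a quote so the match ends with the value even
// when the URL is embedded in a combined-format access log line.
const GCLID_PATTERN = /[?&]gclid=([^&\s"]*)/;
const KEYWORD_PATTERN = /[?&]q=([^&\s"]*)/;

function extractClick(logLine) {
  const gclid = GCLID_PATTERN.exec(logLine);
  if (!gclid) {
    return null; // not a tagged AdWords click
  }
  const keyword = KEYWORD_PATTERN.exec(logLine);
  return {
    gclid: gclid[1],
    // "+" encodes a space in query strings; decode after replacing it.
    keyword: keyword ? decodeURIComponent(keyword[1].replace(/\+/g, " ")) : null,
  };
}

const sampleLine =
  '203.0.113.7 - - [21/Oct/2021:09:57:40 +0000] "GET /landing?gclid=EXAMPLE123 HTTP/1.1" ' +
  '200 512 "http://www.google.com/search?q=stack+overflow&ie=utf-8" "Mozilla/5.0"';
console.log(extractClick(sampleLine));
```

For simplicity the keyword pattern is run over the whole line, so it would also match a `q=` parameter in the requested URL. A stricter version would isolate the referer field first.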
An alternative option is to change your Apache web log to start logging the __utmz cookie that google sets, which should have a piece for the keyword in utmctr. Google __utmz cookie and you should be able to find plenty of information.
How accurate is the referer string? Not 100%. Firewalls and security appliances will strip it out. But parsing it out yourself does give you more flexibility than Google Analytics. It would be a great feature to send the gclid to AdWords and get data back, but that feature does not look like it's available.
EDIT: Since I wrote this we've also created our own tags that are appended to each destination url as a request parameter. Each tag is just an md5 hash of the text, ad group, and campaign name. We grab it using regex from the access log and look it up in a SQL database.
This is a non-programmatic way to decode the GCLID parameter. Chances are you are simply trying to figure out the campaign, ad group, keyword, placement, ad that drove the click and conversion. To do this, you can upload the GCLID into AdWords as a separate conversion type and then segment by conversion type to drill down to the criteria that triggered the conversion. These steps:
In AdWords UI, go to Tools->Conversions->Add conversion with source "Import from clicks"
Visit the AdWords help topic about importing conversions https://support.google.com/adwords/answer/7014069 and create a bulk load file with your GCLID values, assigning the conversions to your new "Import from clicks" conversion type
Upload the conversions into AdWords in Tools->Conversions->Conversion actions (Uploads) on left navigation
Go to campaigns tab, Segment->Conversions->Conversion name
Find your new conversion name in the segment list, this is where the conversion came from. Continue this same process on the ad groups and keywords tab until you know the GCLID originating criteria
Well, this is no answer, but the approach is similar to how you'd tackle any cryptography problem.
Possibility 1: They're just random, in which case, you're screwed. This is analogous to a one-time pad.
Possibility 2: They "mean" something. In that case, you have to control the environment.
Get a good database of them. Find gclids for your site, and others. Record all times that all clicks occur, and any other potentially useful data
Get cracking! As you have already started, regress your collected data against what you know, and see if you can find patterns using decryption techniques
Start scraping random gclid's, and see where they take you.
I wouldn't hold high hope for this to be successful though, but I do wish you luck!
Looks like my rep is weak, so I'll just post another answer rather than a comment.
This is not an answer, clearly. Just voicing some thoughts.
When you enable auto tagging in Adwords, the gclid params are not added to the destination URLs. Rather they are appended to the destination URLs at run time by the Google click tracking servers. So, one of two things is happening:
The click servers are storing the gclid along with Adwords entity identifiers so that Analytics can later look them up.
The gclid has the entity identifiers encoded in some way so that Analytics can decode them.
From a performance perspective it seems unlikely that Google would implement anything like option 1. Forcing Analytics to "join" the gclid to Adwords IDs seems exceptionally inefficient at scale.
A different approach is to simply look at the referrer data which will at least provide the keyword which was searched.
Here's a thought: Is there a chance the gclid is simply a cryptographic hash, a la bit.ly or some other URL shortener?
In which case the contents of the hashed text would be written to a database, and replaced with a unique id.
After all, the gclid is shortening a bunch of otherwise long text.
Take this example:
www.example.com?utm_source=google&utm_medium=cpc
Is converted to this:
www.example.com?gclid=XDF
just like a URL shortener.
One would need a substitution cipher in order to reverse engineer the cryptographic hash... not an easy task: https://crypto.stackexchange.com/questions/300/reverse-engineering-a-hash
Maybe some deep digging into logs, looking for patterns, etc...
I agree with Ophir and Chris. My feeling is that it is purely a serial number / unique click ID, which only opens up its secrets when the Analytics and Adwords systems talk to each other behind the scenes.
Knowing this, I'd recommend looking at the referring URL and pulling as much as possible from this to use in your back end click tracking setup.
For example, I live in NZ, and am using Firefox. This is a search from the Firefox Google toolbar for "stack overflow":
http://www.google.co.nz/search?q=stack+overflow&ie=utf-8&oe=utf-8&aq=t&client=firefox-a&rlz=1R1GGLL_en-GB
You can see that: a) I'm using the .NZ domain, b) my keyword is "stack+overflow", c) I'm running Firefox.
Finally, if you also stash the full landing page URL, you can store the GCLID, which will tell you the visitor came from paid search, whereas if it doesn't have a GCLID, then the user must have come from natural search (if URL tagging is enabled, of course).
This would theoretically allow you to then search for the keyword in your campaign, and figure out which ad group they came from. Knowing the creative would probably be impossible though, unless you split test your landing URLs or tag them somehow.
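Put together, the referrer and landing page checks could look like the sketch below. `describeVisit` is a hypothetical helper, the landing URL and its gclid value are made up, and the referrer is the example URL from above.

```javascript
// Classify a visit from its landing page URL and referrer.
function describeVisit(landingUrl, referrer) {
  const landing = new URL(landingUrl);
  const ref = new URL(referrer);
  return {
    // A gclid on the landing URL marks a paid click (auto-tagging enabled).
    paid: landing.searchParams.has("gclid"),
    gclid: landing.searchParams.get("gclid"),
    searchDomain: ref.hostname,
    keyword: ref.searchParams.get("q"),
    client: ref.searchParams.get("client"),
  };
}

const visit = describeVisit(
  "http://www.example.com/?gclid=EXAMPLE123",
  "http://www.google.co.nz/search?q=stack+overflow&ie=utf-8&oe=utf-8&aq=t&client=firefox-a&rlz=1R1GGLL_en-GB"
);
console.log(visit);
```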