What I'm wondering is: what kind of behaviour does Google Analytics show when a DDoS attack occurs? Any theories?
My theory would be that an effective DDoS platform/script would not include anything as heavyweight as a JavaScript engine, and that therefore the DDoS activity would not show up in Google Analytics at all.
The point of a DDoS attack is to overwhelm the server with a flood of requests. Any CPU cycles spent evaluating JavaScript in the response that the server sends back are cycles that could better be used churning out more requests to the server. I would fully expect a properly executed DDoS attack not to waste time parsing the response from the server, or even reading it off the underlying socket, let alone interpreting and executing any JavaScript that may be embedded in the markup, or fetching scripts and other resources from domains other than the target server.
Of course, this does not preclude the possibility of an exceptionally naive DDoS attack implemented using web frameworks and libraries that do evaluate embedded JavaScript. Such an attack would not (or rather, should not if you've implemented your server code correctly) be very effective, but it would likely generate a spike in Google Analytics traffic.
It depends on the way that the DDoS is implemented. If it's simply an executable distributed to multiple machines, making simple HTTP queries using native TCP sockets, then Google Analytics wouldn't notice anything at all, because the JavaScript that gets returned would never be executed.
However, other sorts of DDOS attacks could leverage actual browsers distributed across many machines. For instance, if you could hack the Yahoo home page and insert an <iframe src='takemedown.com'> into it, you could easily DDOS "takemedown.com". In this particular scenario, GA would certainly detect the impressions, and because (depending on the scenario) there might be an HTTP referrer tag, you could possibly run a report in GA that could pull out the suspicious impressions.
But there are other similar scenarios that wouldn't leave any particular footprints. For instance, if you could hack Lady Gaga's Twitter account, you could send out a link to her 16 million followers, and a significant number would immediately click on it. Since most of those clicking on it would probably be doing so from within a separate app, there wouldn't be any referrer tag, and no particular way of identifying the requests.
In other words, it all depends, but it's probably not a terribly useful avenue to investigate. In many (most?) scenarios, GA wouldn't even recognize the impression; and in many others, wouldn't have any reasonable way of picking out the good impressions from the bad.
It will definitely show up as significant peaks in Google Analytics, simply because there is a huge number of requests from multiple sources with a very high bounce rate.
When an HTTP DDoS attack occurs, the attacker is usually using several (thousand) computers to do so. Sometimes it's also done with servers. When they make the request, they don't render the JavaScript or anything; in most cases they simply make a GET request to the webpage.
So no, it shouldn't really have an impact on Google Analytics.
Well, I'm also searching this kind of information, but I have some considerations about the answer:
You will probably not see the attack itself in Google Analytics, but you should see its results. A DDoS is a "distributed denial of service", so if the service is effectively denied, you should see a flat line on the graph in Google Analytics.
It depends how the bot works, but here's what happened to my website:
[Image: Google Analytics real time report for the monk]
As well as the increase in traffic, you will likely see your bounce rate go sky high and your average time on page drop significantly, which I'm sure can have a negative impact on SERPs.
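The pattern described above (bounce rate up, time on page down) can be checked numerically. This is only a rough sketch: the `Session` record, the function names, and both thresholds are hypothetical and are not taken from any Google Analytics API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Session:
    pageviews: int
    duration_seconds: float


def bounce_rate(sessions: List[Session]) -> float:
    """Share of sessions that viewed at most one page."""
    if not sessions:
        return 0.0
    bounces = sum(1 for s in sessions if s.pageviews <= 1)
    return bounces / len(sessions)


def average_duration(sessions: List[Session]) -> float:
    """Mean session duration in seconds."""
    if not sessions:
        return 0.0
    return sum(s.duration_seconds for s in sessions) / len(sessions)


def looks_like_bot_traffic(
    baseline: List[Session],
    current: List[Session],
    bounce_jump: float = 0.25,    # hypothetical threshold
    duration_drop: float = 0.5,   # hypothetical threshold
) -> bool:
    """Flag a period whose bounce rate rose sharply while duration collapsed."""
    bounce_rose = bounce_rate(current) - bounce_rate(baseline) >= bounce_jump
    duration_fell = average_duration(current) <= average_duration(baseline) * duration_drop
    return bounce_rose and duration_fell
```

The thresholds would need tuning against your own normal traffic before the flag means anything.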
For me it coincided with a Google update so first I put it down to that, but I started getting a lot of traffic to the root page, terms, and privacy, with many prefixed with /?m=0 which is in itself odd (and I'd love for someone to shed light).
The attack caused a great deal of timeouts and was painful to fix:
In short, I hooked up CloudFlare, then created Security -> WAF rules to challenge countries where I was receiving most of the bot traffic. I also switched on the basic bot attack mode (there's a more effective super bot attack mode with the paid subscriptions).
The other interesting point of note was why my site was subject to a DDoS attack. I wish I knew, but at around the time the attack started I was approached by someone who enquired about buying the website. Possibly a tactic to get me to sell it, or to sell it cheap.
After some experimenting, I noticed it is possible to send events directly to a server container via HTTP request instead of pushing to the data layer (which is connected to a web container). A big advantage of this setup is that the front-end doesn't need to load any GTM script. Yet, I have some doubts because I don't find much documentation about this setup. This setup also brings some challenges like implementing automatically collected events (e.g. page_view). Does anyone have experience with this setup or is able to tell me why I shouldn't be following this path?
Regards, Thomas
This is definitely not a best practice, although this is actually a technically more beneficial path since... A few things, actually:
Can make your tracking completely immune to adblockers.
Has the potential to protect from malicious analytics spam, also makes it way harder for third parties to spoil your data.
Doesn't surface your analytics stack and libraries to the public.
Is typically way lighter than the GTM lib.
You have a much better degree of control over what happens and much more power over the tracking.
But this is only if you have the competency to develop it, which is a rarity, actually. Normally web-developers don't know analytics well enough to make it work well while analytics developers lack the technical knowledge. You now suddenly can't just hire a junior or mid implementation expert to help with the tracking. A lot of those who call themselves seniors wouldn't be able to maintain raw JS tracking libraries either.
As you've mentioned, you won't be able to rely on automatic tracking from GTM or gtag libraries. And not having automatic events is actually not the issue. The more important thing is manually collecting all dimensions, including the proper maintenance of client ids and session ids.
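To make the client id and session id bookkeeping concrete, here is a minimal sketch. The `VisitorState` class, its field names, and the 30-minute inactivity timeout are all assumptions chosen for illustration; none of this is part of GTM or gtag.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Optional

# Assumption: a session ends after 30 minutes of inactivity.
SESSION_TIMEOUT_SECONDS = 30 * 60


@dataclass
class VisitorState:
    """Hypothetical per-visitor state, e.g. persisted in a first-party cookie."""
    client_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    session_id: Optional[str] = None
    last_activity: float = 0.0

    def touch(self, now: Optional[float] = None) -> str:
        """Record activity and return the session id to attach to the event."""
        now = time.time() if now is None else now
        expired = now - self.last_activity > SESSION_TIMEOUT_SECONDS
        if self.session_id is None or expired:
            # First event, or the previous session timed out: start a new one.
            self.session_id = str(uuid.uuid4())
        self.last_activity = now
        return self.session_id
```

The client id stays stable across sessions, while the session id rotates only after the inactivity window has passed.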
Once your front-end is ready, it's important to note that you don't want to expose your server-side GTM's endpoint. I mean, you can, but this would defeat the purpose significantly. You want to make a mirror on your backend that would reroute the events to the sGTM.
Finally, you may want to add some kind of data encryption/protection/validation/authentication logic on your mirror for the data. It's worth considering because, without surfacing the endpoints, you're now able to further conceal what you're doing and so avoid much of the potential data tampering. This won't make it impossible to look into what you're doing, of course, but it will make any casual interference nearly impossible.
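One possible shape for the validation step on the mirror is an HMAC check plus an event allowlist, sketched below. The secret, the allowlist, and the function names are hypothetical. Note that any key shipped in browser code can be discovered, so this only deters casual tampering.

```python
import hashlib
import hmac
import json
from typing import Optional

SHARED_SECRET = b"replace-me"                # hypothetical signing key
ALLOWED_EVENTS = {"page_view", "purchase"}   # hypothetical allowlist


def sign(body: bytes, secret: bytes = SHARED_SECRET) -> str:
    """Hex HMAC-SHA256 signature of the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def validate_event(body: bytes, signature: str,
                   secret: bytes = SHARED_SECRET) -> Optional[dict]:
    """Return the parsed event if it is signed and allowed, otherwise None."""
    expected = sign(body, secret)
    # Constant-time comparison on bytes avoids leaking timing information.
    if not hmac.compare_digest(expected.encode(), signature.encode()):
        return None
    try:
        event = json.loads(body)
    except ValueError:
        return None
    if not isinstance(event, dict) or event.get("name") not in ALLOWED_EVENTS:
        return None
    return event
```

Only events that pass `validate_event` would then be rerouted to the sGTM endpoint.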
In the end, people don't do it because this would effectively double the monetary cost of tracking, since sufficiently skilled experts would charge approximately double what regular analytics folks charge. However, the clarity of data will only grow about 10-20%. Such an exchange generally doesn't make business sense unless you're a huge corporation for which even enterprise analytics solutions like Adobe Analytics are not good enough. Amazon would probably be a good example.
Also, if you're already redefining users and sessions, you're not that far from using something like Segment for tracking, then ETLing all that into a data warehouse and using a proper BI tool for further analysis. And at that point, is there still any sense in having the sGTM at all, if you can just stream your events to Segment in real time from your mirror and have it seamlessly re-integrate this data into GA, Firebase, AA, Snowflake, Facebook and tens if not hundreds more destinations, all server-side?
You want to know where to stop, and the best way to do it is by assessing the depth of the analysis/data science your company is conducting on the user behavioral data. And in 99% of cases, it's not deep enough to even consider sGTM.
In response to #BNazaruk
So it's been a while now… I've been looking into the setup, because it's just way too cool. I also took a deeper dive into CGTM to better understand the benefits of SGTM. And honestly, anything that has the potential to replace CGTM should be considered. My main reasons are:
Cybersecurity - Through injection it is possible to insert malicious software like keyloggers. The only thing that prevents this is the login details to CGTM. These are, relatively speaking, easy to get with targeted phishing.
Speed - A CGTM setup, with about 10 - 15 tags, means an avg performance loss of 40 points in Lighthouse.
Quality - As you said, because of browser restrictions like cookie policies, and ad blockers that intercept, manipulate or block CGTM signals, on average 10-20% of the events are not registered properly.
Mistakes - Developing code outside a proper dev process limits the insight into the impact of the code, with possible errors or performance loss as a result.
So far I have created a standardized setup (container templates, measurement plans, libraries) for online marketers and developers to use. Within the setup, we maintain our own client and session IDs. Developers are able to make optimal use of SGTM and increase productivity drastically. The only downside to the setup is that we still use CGTM to implement page_view and exceptions, which is a shame, because I'm not far away from a full server-to-server setup. Companies are still too skeptical to fully commit to SGTM, I guess. Though my feeling is that in 5 years' time, high-end apps won't use CGTM anymore.
Once again, thanks for your answer, it’s been an important part of my journey.
I am working on a project to remove gtag based tracking from a website. We have concerns about not being able to implement tracking due to poor network connections. The ideal solution would be to handle conversion tracking on the backend when we receive a purchase request (so as not to send too many requests.) It seems that Google Analytics Measurement Protocol would be the correct tool for the job. However, this (somewhat frustrating) note mentions that "only partial reporting may be available."
We have explored server-side GTM, but that still adds unwanted network requests and is not the approach we want to take. It seems that there are workarounds such as in this question, but they seem pretty precarious. Is there any other API available, or another approach that might fit our use case?
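For reference, a backend-only purchase hit could be built roughly as follows. This sketch assumes the GA4 flavour of the Measurement Protocol (the `/mp/collect` endpoint with `measurement_id` and `api_secret` query parameters). The function name and the example values are hypothetical, and it does not address the partial-reporting caveat mentioned above.

```python
import json
import urllib.parse
import urllib.request

MP_ENDPOINT = "https://www.google-analytics.com/mp/collect"


def build_purchase_request(measurement_id: str, api_secret: str, client_id: str,
                           transaction_id: str, value: float,
                           currency: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying one purchase event."""
    query = urllib.parse.urlencode(
        {"measurement_id": measurement_id, "api_secret": api_secret}
    )
    payload = {
        "client_id": client_id,
        "events": [
            {
                "name": "purchase",
                "params": {
                    "transaction_id": transaction_id,
                    "value": value,
                    "currency": currency,
                },
            }
        ],
    }
    return urllib.request.Request(
        f"{MP_ENDPOINT}?{query}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Sending would be a single call from the purchase handler, for example:
# urllib.request.urlopen(build_purchase_request(...), timeout=5)
```

Because the request is made server to server, it does not depend on the visitor's network connection once the purchase request itself has arrived.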
My website uses Google Ads Conversion and Google Analytics. From time to time, I see Chrome report errors when accessing Google-related URLs, such as
GET https://www.googleadservices.com/pagead/conversion_async.js net::ERR_TIMED_OUT
A screenshot is shown below:
The error is generated from Google Tag Manager with the following URL:
https://www.googletagmanager.com/gtag/js?id=AW-1071615551&l=dataLayer&cx=c
I am curious why this occurs so frequently, given that Google has the best servers and network connectivity in the world?
I have seen some reports indicating that Google's servers are the best, for example:
Google CDN is the 2nd fastest, see https://www.cdnperf.com/
Another report is not in English so I do not put it here.
I've been working with various Google services for a while now. I have never seen a timed-out or otherwise erroneous hit except in these cases:
Sudden client connection loss. Especially with mobile traffic.
The second most common issue: adblockers. They tend to block tracking too, often by interfering with the network requests, resulting in network errors that then surface in Datadog, Dynatrace and whatnot.
When it's impossible for the client to play its part in the encryption, which is likely due either to a middle man sniffing the traffic or, more commonly, to the client having a completely wrong date. This should be quite a rare occurrence.
When there's a firewall-like filtering in place blocking requests to Google servers. Even more rare in my experience.
With the new HTTP reporting headers being developed and refined, it seems more important than ever to be able to tell/validate where the reports are coming from.
For example, someone attempting to "hack" the site can very easily flood the reporting endpoint with false reports, drowning out the details of what they're attempting. It's also a vector for a DDOS attack.
Is there some mechanism for doing this aside from obfuscation?
Do the User Agents sign their reports?
Any advice would be much appreciated!
I took a quick glance through the standard draft for the Report-To header, but it doesn't seem to touch on it.
One thought on application-level mitigation: record the IPs of all clients that are connected and authenticated and only accept reports from IPs that are whitelisted in this way. This assumes that the browser sends its reports direct from the client machine (I believe this is the case, but can anyone confirm?).
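A minimal sketch of that idea is below. The class name, method names, and the one-hour expiry are all hypothetical, and the approach still rests on the unconfirmed assumption that reports arrive from the same IP as the authenticated client.

```python
import time
from typing import Dict, Optional


class ReportAllowlist:
    """Hypothetical in-memory record of IPs that recently authenticated."""

    def __init__(self, ttl_seconds: float = 3600.0):
        self._ttl = ttl_seconds
        self._seen: Dict[str, float] = {}

    def record_authenticated(self, ip: str, now: Optional[float] = None) -> None:
        """Call this whenever a client at `ip` authenticates successfully."""
        self._seen[ip] = time.time() if now is None else now

    def accepts(self, ip: str, now: Optional[float] = None) -> bool:
        """Accept a report only from an IP seen within the TTL window."""
        now = time.time() if now is None else now
        seen = self._seen.get(ip)
        if seen is None:
            return False
        if now - seen > self._ttl:
            # Entry is stale: forget it so the table does not grow forever.
            del self._seen[ip]
            return False
        return True
```

The reporting endpoint would call `accepts()` with the request's remote address before storing anything. Clients behind shared NAT or changing mobile IPs would weaken the check.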
I build ASP.NET websites (hosted under IIS 6 usually, often with SQL Server backends and forms authentication).
Clients sometimes ask if I can check whether there are people currently browsing (and/or whether there are users currently logged in to) their website at a given moment, usually so they can safely do a deployment (they want a hotfix, for example).
I know the web is basically stateless so I can't be sure whether someone has closed the browser window, but I imagine there'd be some count of not-yet-timed-out sessions or something, and surely logged-in-users...
Is there a standard and/or easy way to check this?
Jakob's answer is correct but does rely on installing and configuring the Membership features.
A crude but simple way of tracking users online would be to store a counter in the Application object. This counter could be incremented/decremented upon their sessions starting and ending. There's an example of this on the MSDN website:
Session-State Events (MSDN Library)
Because the default Session Timeout is 20 minutes the accuracy of this method isn't guaranteed (but then that applies to any web application due to the stateless and disconnected nature of HTTP).
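The counter idea is language-agnostic, so here is a rough sketch of it in Python rather than ASP.NET. The class and method names are hypothetical; in ASP.NET the two methods would correspond to the session start and end handlers, with the count kept in the Application object.

```python
import threading


class OnlineCounter:
    """Hypothetical thread-safe count of sessions that have not yet ended."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._count = 0

    def session_started(self) -> None:
        with self._lock:
            self._count += 1

    def session_ended(self) -> None:
        with self._lock:
            # Never go below zero if an end event arrives without a start.
            self._count = max(0, self._count - 1)

    @property
    def count(self) -> int:
        with self._lock:
            return self._count
```

The lock matters because session start and end events can fire concurrently on different request threads.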
I know this is a pretty old question, but I figured I'd chime in. Why not use Google Analytics and view their real time dashboard? It will require minor code modifications (i.e. a single script import) and will do everything you're looking for...
You may be looking for the Membership.GetNumberOfUsersOnline method, although I'm not sure how reliable it is.
Sessions, suggested by other users, are a basic way of doing things, but are not too reliable. They can also work well in some circumstances, but not in others.
For example, if users are downloading large files or watching videos or listening to the podcasts, they may stay on the same page for hours (unless the requests to the binary data are tracked by ASP.NET too), but are still using your website.
Thus, my suggestion is to use the server logs to detect if the website is currently used by many people. It gives you the ability to:
See what sort of requests are done. It's quite easy to detect humans and crawlers, and with some experience, it's also possible to see if the human is currently doing something critical (such as writing a comment on a website, editing a document, or typing her credit card number and ordering something) or not (such as browsing).
See who is doing those requests. For example, if Google is crawling your website, it is a very bad idea to go offline, unless the search rating doesn't matter for you. On the other hand, if a bot is trying for two hours to crack your website by doing requests to different pages, you can go offline for sure.
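As a starting point for that kind of log inspection, the sketch below counts distinct recent visitors and separates likely crawlers by user agent. The `LogEntry` shape, the marker list, and the five-minute window are hypothetical; real logs need parsing first, and user agents can be spoofed.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set

BOT_MARKERS = ("bot", "crawler", "spider")  # hypothetical, crude heuristic


@dataclass
class LogEntry:
    timestamp: float  # seconds since the epoch
    ip: str
    user_agent: str


def is_crawler(user_agent: str) -> bool:
    """Guess whether a user agent string belongs to a crawler."""
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


def active_visitors(entries: Iterable[LogEntry], now: float,
                    window_seconds: float = 300.0) -> Dict[str, int]:
    """Count distinct IPs seen within the window, split into humans and crawlers."""
    humans: Set[str] = set()
    crawlers: Set[str] = set()
    for entry in entries:
        age = now - entry.timestamp
        if age < 0 or age > window_seconds:
            continue  # outside the window of interest
        (crawlers if is_crawler(entry.user_agent) else humans).add(entry.ip)
    return {"humans": len(humans), "crawlers": len(crawlers)}
```

A result of zero humans and zero crawlers over the window would suggest it is a reasonable moment to deploy.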
Note: if a website has some critical areas (for example, while writing this long answer, I would be angry if Stack Overflow went offline a few seconds before I submitted it), you can also send regular AJAX requests to the server while the user stays on the page. Of course, you must be careful when implementing such a feature, and take into account that it will increase the bandwidth used and will not work if the user has JavaScript disabled.
You can run the netstat command and see how many active connections exist to your website's ports.
Default port for http is *:80.
Default port for https is *:443.
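Counting those connections by hand gets tedious, so here is a rough sketch that tallies ESTABLISHED connections on the web ports from captured netstat output. The function name is hypothetical, and it assumes the usual column order where the state follows the local and foreign addresses (as in `netstat -an` output).

```python
from typing import Iterable


def count_established(netstat_output: str,
                      ports: Iterable[int] = (80, 443)) -> int:
    """Count ESTABLISHED connections whose local port is one of `ports`."""
    wanted = set(ports)
    count = 0
    for line in netstat_output.splitlines():
        fields = line.split()
        if "ESTABLISHED" not in fields:
            continue
        state_index = fields.index("ESTABLISHED")
        if state_index < 2:
            continue  # not enough columns before the state
        # Local address sits two columns before the state, e.g. 192.168.1.5:80
        local_address = fields[state_index - 2]
        port_text = local_address.rsplit(":", 1)[-1]
        if port_text.isdigit() and int(port_text) in wanted:
            count += 1
    return count
```

Several connections can belong to one browser, and keep-alive connections linger, so treat the number as an upper bound on active visitors rather than an exact count.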