Why would a consistent number of visitors be missing User-Agent headers? - http

We recently deployed a mobile version of our site, and part of that deployment included a User-Agent check to determine which version to deliver to the end user.
Every minute or so since we released, we've had an Elmah error from an exception that was thrown when a User-Agent was blank.
We've already fixed the issue in production, but I'm curious as to why a consistent (but very small) percentage of our traffic might not have a User-Agent header defined.
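For context, the kind of fix involved is simply treating the header as optional. A minimal C# sketch (not the poster's actual code; the detection logic is illustrative only), assuming an ASP.NET request object:

    using System;
    using System.Web;

    public static class UserAgentCheck
    {
        // Sketch: Request.UserAgent is null when the header is absent,
        // so guard before inspecting it and fall back to the default site.
        public static bool IsMobileRequest(HttpRequestBase request)
        {
            string userAgent = request.UserAgent;
            if (string.IsNullOrWhiteSpace(userAgent))
            {
                return false; // no User-Agent: serve the desktop/default version
            }
            return userAgent.IndexOf("iPhone", StringComparison.OrdinalIgnoreCase) >= 0
                || userAgent.IndexOf("Android", StringComparison.OrdinalIgnoreCase) >= 0;
        }
    }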

This is a simple guess, but it could come from bots.
There's an amazing number of bots (search engines, botnets, and others) that constantly scan websites and servers for vulnerabilities, passwords and such. Sometimes they have a known User-Agent, sometimes not.
You could use a CDN service like CloudFlare to get an idea of how many of those requests come from robots (no, I don't work for that company, but using their services made me realize how much the web is polluted by bots; the stats are scary).

Related

Why so many timeouts from Google-related URLs from time to time?

My website uses Google Ads conversion tracking and Google Analytics. From time to time, I see Chrome report errors in accessing Google-related URLs, such as
GET https://www.googleadservices.com/pagead/conversion_async.js net::ERR_TIMED_OUT
The error is generated from Google Tag Manager with the following URL:
https://www.googletagmanager.com/gtag/js?id=AW-1071615551&l=dataLayer&cx=c
I am curious why this occurs so frequently, since Google has some of the best servers and network connectivity in the world.
I have seen reports indicating that Google's servers are among the best; for example:
Google's CDN is the 2nd fastest, see https://www.cdnperf.com/
Another report is not in English, so I do not include it here.
I've been working with various Google services for a while now. I have never seen a timeout or any other kind of erroneous hit except in these cases:
Sudden client connection loss, especially with mobile traffic.
The second most common issue: ad blockers. They tend to block tracking too, often by interfering with the network requests, resulting in network errors that then surface in Datadog/Dynatrace and the like.
When the client cannot complete its part of the encryption handshake, which is likely either because there is a man in the middle sniffing the traffic or, more commonly, because the client has a completely wrong system date. This should be quite rare.
When firewall-like filtering is in place blocking requests to Google's servers. Even rarer in my experience.

Is it possible to add logic to a CDN?

Is it possible to serve two different pages based on the user agent?
I want to serve pagename-iphone.html if the user agent matches iPhone and pagename-other.html for all other user agents. I want all pages on the site to follow this pattern. Is it possible to do this at the CDN level (CloudFront, Akamai, etc.)?
Thanks for your help!
I think what you are after is User-Agent-based caching, i.e. Vary: User-Agent.
In theory, a server providing a cache service can definitely do this; however, as far as I can tell, CloudFront and most other major CDN providers don't support it.
The basic reason is straightforward: there are too many distinct User-Agent values, and the header is almost unique to every single browser, not to mention the different versions of the same browser. If you key purely on the whole User-Agent, you lose the benefit of the CDN cache most of the time.
Some of the more advanced caching servers allow you to add conditions based on headers; in Varnish, for example, you can even add if/else logic to return different values. But this is not available on the majority of CDNs.
On the other hand, you should not rely on a CDN to return different HTML pages. CDNs are more commonly used to accelerate static assets (JS/CSS/images) rather than whole pages.
EDIT: Actually, I just received an email from AWS mentioning that CloudFront now supports this:
Mobile Device Detection: You can now cache and deliver customized
content to your viewers on different devices (e.g. mobile vs. desktop)
based on the value of the User Agent header.
Please refer to http://aws.amazon.com/about-aws/whats-new/2014/06/26/amazon-cloudfront-device-detection-geo-targeting-host-header-cors/ for more details.
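To illustrate how that CloudFront feature is typically used: the CDN does not choose the HTML file itself; you whitelist its device-detection headers so they are forwarded to the origin (and become part of the cache key), and the origin branches on them. A rough ASP.NET MVC sketch (the file names follow the question; note these headers describe a device class such as mobile vs. desktop, not a specific model like iPhone):

    using System;
    using System.Web.Mvc;

    public class PageController : Controller
    {
        // Sketch: CloudFront sets CloudFront-Is-Mobile-Viewer to "true"/"false"
        // when that header is whitelisted in the distribution's cache behaviour.
        public ActionResult Show(string pagename)
        {
            bool isMobile = string.Equals(
                Request.Headers["CloudFront-Is-Mobile-Viewer"], "true",
                StringComparison.OrdinalIgnoreCase);

            // pagename is assumed to be validated elsewhere (no path characters).
            string file = isMobile ? pagename + "-iphone.html" : pagename + "-other.html";
            return File(Server.MapPath("~/pages/" + file), "text/html");
        }
    }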

Implications of having a site in full HTTPS

I am currently developing an MVC4 web application for eCommerce. The site will contain a login and users can visit the site, input their details and submit orders etc. This is a traditional eCommerce site.
To boost the security of the site, I am looking to set up the entire site over HTTPS. As the user will be supplying their login credentials and storing personal information in cookies, I would like the site to be fully secured.
I have some concerns, though: if I set up the site over HTTPS, will it hurt performance? Will it negatively impact search engine optimization? Are there any other implications of having an entire site on HTTPS?
I use output caching to cache the content of my views; with HTTPS, will these still get cached?
I have been reviewing security guidelines and documentation, such as this from OWASP, and they recommend it. Also, I see that sites such as Twitter are fully HTTPS.
Generally speaking, no - whole-site encryption is not a problem for performance.
(Just make sure you disable SSL 2.0 on your server, as it has well-known protocol weaknesses; you should use TLS 1.0 or SSL 3.0, which have been supported by pretty much every browser since 2000.)
The performance issues were a problem years ago, but not anymore. Modern servers have the capacity to deal with the encryption of hundreds of requests and responses every second.
You haven't mentioned deploying a load-balancer or failover system, which implies your site won't be subject to thousands of pageviews every second. That's when you need to start using SSL offloaders - but you're okay for now.
Output caching is not affected by encryption; just make sure you're not serving one person's output to another (e.g. keep a shopping cart or banking details in Session, or include the session ID in the cache key).
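As a rough MVC4 illustration of both points (the "session" vary key name is just an example): enforce HTTPS globally, and vary any per-user cached output by session ID so one visitor's cart is never served to another. Note that varying by session effectively gives each visitor their own cache entry, which trades cache hit rate for safety.

    using System.Web;
    using System.Web.Mvc;

    public class FilterConfig
    {
        // Redirect every request to HTTPS site-wide.
        public static void RegisterGlobalFilters(GlobalFilterCollection filters)
        {
            filters.Add(new RequireHttpsAttribute());
        }
    }

    public class MvcApplication : HttpApplication
    {
        // Custom vary key used by [OutputCache(VaryByCustom = "session")].
        public override string GetVaryByCustomString(HttpContext context, string custom)
        {
            if (custom == "session" && context.Session != null)
            {
                return context.Session.SessionID;
            }
            return base.GetVaryByCustomString(context, custom);
        }
    }

    // Example usage on an action whose output is user-specific:
    // [OutputCache(Duration = 60, VaryByParam = "none", VaryByCustom = "session")]
    // public ActionResult Cart() { ... }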

Google Analytics vs DDoS

What I'm wondering is: what kind of behaviour does Google Analytics show when a DDoS attack occurs? Any theories?
My theory would be that an effective DDoS platform/script would not include anything as heavyweight as a JavaScript engine, and that therefore the DDoS activity would not show up in Google Analytics at all.
The point of a DDoS attack is to overwhelm the server with a flood of requests. Any CPU cycles that are spent evaluating JavaScript in the response that the server sends back are cycles that could better be used churning out more requests to the server. I would fully expect a properly executed DDoS attack not to waste time parsing the response from the server, or even reading it off the underlying socket, let alone interpreting and executing any JavaScript that may be embedded in the markup or fetching scripts and other resources from domains other than the target server.
Of course, this does not preclude the possibility of an exceptionally naive DDoS attack implemented using web frameworks and libraries that do evaluate embedded JavaScript. Such an attack would not (or rather, should not if you've implemented your server code correctly) be very effective, but it would likely generate a spike in Google Analytics traffic.
It depends on the way that the DDOS is implemented. If it's simply an executable distributed to multiple machines, making simple HTTP queries using native TCP sockets, then Google Analytics wouldn't notice anything at all: because the JavaScript that gets returned would never be executed.
However, other sorts of DDOS attacks could leverage actual browsers distributed across many machines. For instance, if you could hack the Yahoo home page and insert an <iframe src='takemedown.com'> into it, you could easily DDOS "takemedown.com". In this particular scenario, GA would certainly detect the impressions, and because (depending on the scenario) there might be an HTTP referrer tag, you could possibly run a report in GA that could pull out the suspicious impressions.
But there are other similar scenarios that wouldn't leave any particular footprints. For instance, if you could hack Lady Gaga's twitter account, you could send out a link to her 16MM followers, and a significant number would immediately click on it: and since most of those clicking on it would probably be doing so from within a separate app, there wouldn't be any referrer tag, and no particular way of identifying the requests.
In other words, it all depends, but it's probably not a terribly useful avenue to investigate. In many (most?) scenarios, GA wouldn't even recognize the impression; and in many others, wouldn't have any reasonable way of picking out the good impressions from the bad.
It will absolutely show up as significant peaks in Google Analytics, simply because there is a huge number of requests from multiple sources with a huge bounce rate!
When an HTTP DDoS attack occurs, the attacker is usually using several thousand computers to do so; sometimes it's also done with servers. When they make the requests, they don't render the JavaScript or anything; in most cases they simply make a GET request to the web page.
So no, it shouldn't really have an impact on Google Analytics.
Well, I'm also searching for this kind of information, but I have some considerations about the answer:
You will probably not see the attack itself with Google Analytics, but you should see the results. I mean, a DDoS is a "distributed denial of service", so if the service is effectively denied, then you should see a flat line on the Google Analytics graph.
It depends how the bot works, but here's what happened to my website:
(Screenshot: Google Analytics real-time report.)
As well as the increase in traffic, you will likely see your bounce rate go sky high and average time on page drop significantly, which I'm sure can have a negative impact on SERPs.
For me it coincided with a Google update, so at first I put it down to that, but I started getting a lot of traffic to the root page, terms, and privacy pages, with many requests carrying /?m=0, which is in itself odd (and I'd love for someone to shed light on that).
The attack caused a great deal of timeouts and was painful to fix:
In short, I hooked up CloudFlare, then created Security -> WAF rules to challenge countries where I was receiving most of the bot traffic. I also switched on the basic bot attack mode (there's a more effective super bot attack mode with the paid subscriptions).
The other interesting point of note was why my site was subject to a DDoS attack in the first place. I wish I knew, but around the time the attack started I was approached by someone who enquired about buying the website. Possibly a tactic to get me to sell it, or sell it cheap.

Check if anyone is currently using an ASP.Net app (site)

I build ASP.NET websites (hosted under IIS 6 usually, often with SQL Server backends and forms authentication).
Clients sometimes ask if I can check whether there are people currently browsing (and/or whether there are users currently logged in to) their website at a given moment, usually so they can safely do a deployment (they want a hotfix, for example).
I know the web is basically stateless so I can't be sure whether someone has closed the browser window, but I imagine there'd be some count of not-yet-timed-out sessions or something, and surely logged-in-users...
Is there a standard and/or easy way to check this?
Jakob's answer is correct but does rely on installing and configuring the Membership features.
A crude but simple way of tracking users online would be to store a counter in the Application object. This counter could be incremented/decremented upon their sessions starting and ending. There's an example of this on the MSDN website:
Session-State Events (MSDN Library)
Because the default session timeout is 20 minutes, the accuracy of this method isn't guaranteed (but then that applies to any web application, due to the stateless and disconnected nature of HTTP).
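For reference, the Application-object approach boils down to something like the following Global.asax.cs sketch (along the lines of the MSDN sample; Session_End only fires with in-process session state, and only after the timeout elapses):

    using System;
    using System.Web;

    public class Global : HttpApplication
    {
        protected void Application_Start(object sender, EventArgs e)
        {
            Application["UsersOnline"] = 0;
        }

        protected void Session_Start(object sender, EventArgs e)
        {
            Application.Lock();
            Application["UsersOnline"] = (int)Application["UsersOnline"] + 1;
            Application.UnLock();
        }

        protected void Session_End(object sender, EventArgs e)
        {
            // Decrement when a session times out; never fires for out-of-process state.
            Application.Lock();
            Application["UsersOnline"] = (int)Application["UsersOnline"] - 1;
            Application.UnLock();
        }
    }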
I know this is a pretty old question, but I figured I'd chime in. Why not use Google Analytics and view their real time dashboard? It will require minor code modifications (i.e. a single script import) and will do everything you're looking for...
You may be looking for the Membership.GetNumberOfUsersOnline method, although I'm not sure how reliable it is.
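For completeness, that call is a one-liner if the site already uses forms authentication with a membership provider; it counts users whose last activity falls within Membership.UserIsOnlineTimeWindow (15 minutes by default), so it reflects recent activity rather than open browser windows, and anonymous visitors are not included:

    using System.Web.Security;

    public static class OnlineUsers
    {
        // Counts membership users active within Membership.UserIsOnlineTimeWindow.
        public static int CountLoggedInUsers()
        {
            return Membership.GetNumberOfUsersOnline();
        }
    }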
Sessions, suggested by other users, are a basic way of doing this, but they are not very reliable. They work well in some circumstances, but not in others.
For example, if users are downloading large files, watching videos, or listening to podcasts, they may stay on the same page for hours (unless the requests for the binary data are tracked by ASP.NET too), but they are still using your website.
Thus, my suggestion is to use the server logs to detect if the website is currently used by many people. It gives you the ability to:
See what sort of requests are done. It's quite easy to detect humans and crawlers, and with some experience, it's also possible to see if the human is currently doing something critical (such as writing a comment on a website, editing a document, or typing her credit card number and ordering something) or not (such as browsing).
See who is making those requests. For example, if Google is crawling your website, it is a very bad idea to go offline, unless your search ranking doesn't matter to you. On the other hand, if a bot has been trying for two hours to crack your website by making requests to different pages, you can go offline for sure.
Note: if a website has some critical areas (for example, while writing this long answer, I would be angry if Stack Overflow went offline a few seconds before I submitted it), you can also send regular AJAX requests to the server while the user stays on the page. Of course, you must be careful when implementing such a feature, and take into account that it will increase the bandwidth used and will not work if the user has JavaScript disabled.
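A minimal sketch of such a heartbeat on the server side (hypothetical controller and action names): the page would ping Ping() every couple of minutes via AJAX, and the server keeps a last-seen timestamp per session so "currently active" users can be counted without waiting for sessions to time out.

    using System;
    using System.Collections.Concurrent;
    using System.Linq;
    using System.Web.Mvc;

    public class HeartbeatController : Controller
    {
        // Session ID -> last time a heartbeat was received.
        private static readonly ConcurrentDictionary<string, DateTime> LastSeen =
            new ConcurrentDictionary<string, DateTime>();

        // Called by client-side script while a page stays open.
        [HttpPost]
        public ActionResult Ping()
        {
            LastSeen[Session.SessionID] = DateTime.UtcNow;
            return new HttpStatusCodeResult(204);
        }

        // Number of sessions that pinged within the last five minutes.
        public ActionResult ActiveCount()
        {
            DateTime cutoff = DateTime.UtcNow.AddMinutes(-5);
            return Content(LastSeen.Count(kvp => kvp.Value >= cutoff).ToString());
        }
    }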
You can run the netstat command and see how many active connections exist to your website's ports.
The default port for HTTP is *:80.
The default port for HTTPS is *:443.
