Checking if a computer is hacked via HTTP headers

If your computer is infected, apparently Google will tell you so by displaying a warning message on its pages.
According to this article, Google uses HTTP headers to work this out. But how do they do it? What sort of headers should we look for?
Thank you!

The Google Security blog post you linked doesn't mention HTTP headers.
A key point in the blog post is this:
This particular malware causes infected computers to send traffic to Google through a small number of intermediary servers called “proxies.”
And this:
...taking steps to notify users whose traffic is coming through these proxies...
Google doesn't say much about the proxies, for instance if they were standards-compliant(ish) HTTP proxies or just servers echoing the user requests.
The "unusual" traffic that originated from Google would have been from a small set of IP addresses. No special HTTP headers would be necessary. Google only had to add the warning message to pages being served to the suspect IP addresses. That's it.
The term "signature" in the follow-up link from your comments is used very informally, probably alluding to the IP addresses of the proxy servers. If you want to imagine something more complicated than that, then I suppose it's possible that these proxies (like many HTTP clients) could be identified by some pattern of HTTP headers unique to them: the User-Agent or Via headers, for example, or even something more subtle like the ordering or capitalization of headers. I doubt it came to that, though, and I don't see much value in speculating, especially two years after the fact.
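To make the answer's point concrete, here is a minimal sketch of the IP-based approach: no header inspection at all, just a comparison of the requesting address against a known list before the page is rendered. The addresses, port, and page content are hypothetical placeholders, not anything Google published.

from wsgiref.simple_server import make_server

# Hypothetical addresses of the malware's intermediary proxies (TEST-NET-3 range).
SUSPECT_PROXY_IPS = {"203.0.113.10", "203.0.113.11"}
WARNING = b"<div>Your computer appears to be infected.</div>"

def app(environ, start_response):
    body = b"<html><body>"
    # The only signal needed is the source address of the connection.
    if environ.get("REMOTE_ADDR", "") in SUSPECT_PROXY_IPS:
        body += WARNING
    body += b"Search results...</body></html>"
    start_response("200 OK", [("Content-Type", "text/html")])
    return [body]

make_server("", 8000, app).serve_forever()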

Related

For network traffic, what does the word "origin" mean to you?

At my company, we often use the word "origin" to mean "the place where all your content lives and where we send our network requests to get that content." This does not seem to make sense to me, and I'm not sure that it makes sense to our prospective customers either.
For me, the concept of an origin being the place that things are sent TO is very non-intuitive... but I have not been in a networking/CDN field for very long, and I am trying to determine if this is an industry-standard definition.
In other words: if I talk to someone who needs to configure internet traffic for their company's network, and I say "origin," will it mean to them what we THINK it means to them?
Thanks!
As far as discussions of networking in the particular context of the Web, the word origin is a term of art with a specific meaning, defined in RFC 6454. It means a particular scheme/host/port triple, such that two URLs that have the same scheme/host/port triple are understood to have the same origin. So http://example.com and https://example.com do not have the same origin, because they have different schemes, and http://example.com and http://example.com:8888 do not have the same origin, because they have different ports.
That definition is most directly relevant to the Web because the entire Web security model is based on a well-defined "same origin" policy, but that said it seems like the underlying notion is useful for discussions in other contexts as well, because it amounts to considering things simply in terms of "a service on a particular host listening on a particular port and speaking a particular protocol".
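As a rough sketch of that definition (assuming the usual default ports; RFC 6454 also covers corner cases, such as internationalized hostnames, that this ignores):

from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}

def origin(url):
    # The origin is the scheme/host/port triple, with the port defaulted
    # from the scheme when the URL doesn't name one explicitly.
    parts = urlsplit(url)
    return (parts.scheme, parts.hostname, parts.port or DEFAULT_PORTS.get(parts.scheme))

assert origin("http://example.com") == origin("http://example.com:80/any/path")
assert origin("http://example.com") != origin("https://example.com")      # scheme differs
assert origin("http://example.com") != origin("http://example.com:8888")  # port differs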
An origin is where something originates from. In networking, the origin of network traffic is the source of the network traffic. This may sometimes look backwards because of the request/response of many network protocols. For example, a network client receiving a web page originated the conversation by sending an HTTP request to the server, so it is the origin of the conversation, but the server is the origin of the web page.
The word origin in networking is really no different than the simple definition provided by Webster's Dictionary:
origin
noun or·i·gin \ˈȯr-ə-jən, ˈär-\
the point or place where something begins or is created : the source or cause of something
the place, social situation, or type of family that a person comes from

The reason for a mandatory 'Host' clause in HTTP 1.1 GET

Last week I started quite a fuss in my Computer Networks class over the need for a mandatory Host clause in the header of HTTP 1.1 GET messages.
The reason I'm given, be it written on the Web or shouted at me by my classmates, is always the same: the need to support virtual hosting. However, and I'll try to be as clear as possible, this does not appear to make sense.
I understand that in order to allow two domains to be hosted on a single machine (and, by consequence, to share the same IP address), there has to be a way of differentiating between the two domain names.
What I don't understand is why it isn't possible to achieve this without a Host clause (HTTP 1.0 style) by using an absolute URL (e.g. GET http://www.example.org/index.html) instead of a relative one (e.g. GET /index.html).
When the HTTP message got to the server, it (the server) would redirect the message to the appropriate host, not by looking at the Host clause but, instead, by looking at the hostname in the URL present in the message's request line.
I would be very grateful if any of you hardcore hackers could help me understand what exactly I'm missing here.
This was discussed in this thread:
modest suggestions for HTTP/2.0 with their rationale.
Add a header to the client request that indicates the hostname and
port of the URL which the client is accessing.
Rationale: One of the most requested features from commercial server
maintainers is the ability to run a single server on a single port
and have it respond with different top level pages depending on the
hostname in the URL.
Making an absolute request URI required (because there's no way for the client to know beforehand whether the server hosts one or more sites) was suggested:
Re the first proposal, to incorporate the hostname somewhere. This
would be cleanest put into the URL itself :-
GET http://hostname/fred http/2.0
This is the syntax for proxy redirects.
To which this argument was made:
Since there will be a mix of clients, some supporting host name reporting
and some not, it just doesn't matter how this info gets to the server.
Since it doesn't matter, the easier to implement solution is a new HTTP
request header field. It allows all clients and servers to operate as they
do now with NO code changes. Clients and servers that actually need host
name information can have tiny mods made to send the extra header field
containing the URL and process it.
[...]
All I'm suggesting is that there is a better way to
implement the delivery of host name info to the server that doesn't involve
hacking the request syntax and can be backwards compatible with ALL clients
and servers.
Feel free to read on to discover the final decision yourself. But be warned, it's easy to get lost in there.
The reason for adding support for specifying a host in an HTTP request was the limited supply of IP addresses (which was not an issue yet when HTTP 1.0 came out).
If your question is "why specify the host in a Host header as opposed to on the Request-Line", the answer is the need for interoperability between HTTP/1.0 and 1.1.
If the question is "why is the Host header mandatory", this has to do with the desire to speed up the transition away from assigned IP addresses.
Here's some background on Internet address conservation with respect to HTTP/1.1.
The reason for the 'Host' header is to make explicit which host this request refers to. Without 'Host', the server must know ahead of time that it is supposed to route 'http://joesdogs.com/' to Joe's Dogs while it is supposed to route 'http://joscats.com/' to Jo's Cats even though they are on the same webserver. (What if a server has 2 names, like 'joscats.com' and 'joescats.com' that should refer to the same website?)
Having an explicit 'Host' header makes these kinds of decisions much easier to program.
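A minimal sketch of name-based virtual hosting may help here, reusing the hypothetical domains from the answer above: the server picks a site by reading the Host header, so a single IP address and port can serve several domains.

from http.server import BaseHTTPRequestHandler, HTTPServer

SITES = {
    "joesdogs.com": b"<h1>Joe's Dogs</h1>",
    "joscats.com": b"<h1>Jo's Cats</h1>",
    "joescats.com": b"<h1>Jo's Cats</h1>",  # second name for the same site
}

class VHostHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Route by the Host header (ignoring any :port suffix).
        host = (self.headers.get("Host") or "").split(":")[0]
        body = SITES.get(host)
        if body is None:
            self.send_error(400, "Unknown or missing Host header")
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        self.wfile.write(body)

HTTPServer(("", 8080), VHostHandler).serve_forever()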

Can GET and POST requests from the same machine come from different IPs?

I'm pretty sure I remember reading (but I can no longer find the links) about this: with some ISPs (including at least one big ISP in the U.S.) it is possible for a user's GET and POST requests to appear to come from different IPs.
(note that this is totally programming related, and I'll give an example below)
I'm not talking about having your IP address dynamically change between two requests.
I'm talking about this:
IP 1: 123.45.67.89
IP 2: 101.22.33.44
The same user makes a GET, then a POST, then a GET again, then a POST again and the servers see this:
- GET from IP 1
- POST from IP 2
- GET from IP 1
- POST from IP 2
So although it's the same user, the web server sees different IPs for the GET and the POSTs.
Surely, given that HTTP is a stateless protocol, this is perfectly legit, right?
I'd like to find that explanation again of how/why certain ISPs have their networks configured such that this may happen.
I'm asking because someone asked me to implement the following IP filter and I'm pretty sure it is fundamentally broken code (wreaking havoc for the users of at least one major American ISP).
Here's a Java servlet filter that is supposed to protect against some attacks. The reasoning is that:
"For any session filter checks that IP address in the request is the same that was used when session was created. So in this case session ID could not be stolen for forming fake sessions."
http://www.servletsuite.com/servlets/protectsessionsflt.htm
However, I'm pretty sure this is inherently broken, because there are ISPs where you may see GET and POST requests coming from different IPs.
Some ISPs (or university networks) operate transparent proxies which relay the request from the outgoing node that is under the least network load.
It would also be possible to configure a local machine to use the NIC with the lowest load, which could, again, result in this situation.
You are correct that this is a valid state for HTTP and, although it should occur relatively infrequently, this is why validation of a user based on IP is not an appropriate determinant of identity.
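To make the failure mode concrete, here is a sketch of the kind of IP-pinning check the linked filter performs, with hypothetical session storage and the addresses from the question:

sessions = {}  # session ID -> IP address recorded when the session was created

def create_session(session_id, client_ip):
    sessions[session_id] = client_ip

def validate(session_id, client_ip):
    # Reject the request if the source IP differs from the one at creation.
    return sessions.get(session_id) == client_ip

create_session("abc123", "123.45.67.89")    # GET leaves via outgoing node 1
print(validate("abc123", "123.45.67.89"))   # True
# The POST leaves via a different node, so the check fails and a perfectly
# legitimate user is kicked out: a false positive, not a stolen session.
print(validate("abc123", "101.22.33.44"))   # False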
For a web server to be seeing this implies that the end user is behind some kind of proxy/gateway. As you say it's perfectly valid given that HTTP is stateless, but I imagine would be unusual. As far as I am aware most ISPs assign home users a real, non-translated IP (albeit usually dynamic).
Of course, corporate/institutional networks could be doing anything. Load balancing could mean that requests come from different IPs, and maybe sometimes request types get farmed out to different gateways (although I'd be interested to know why, given that N_GET >> N_POST).

How to tell if a request is coming from a proxy?

Is it possible to detect if an incoming request is being made through a proxy server? If a web application "bans" users via IP address, they could bypass this by using a proxy server. That is just one reason to block these requests. How can this be achieved?
IMHO there's no 100% reliable way to achieve this, but the presence of any of the following headers is a strong indication that the request was routed through a proxy server:
Via:
Forwarded:
X-Forwarded-For:
Client-IP:
You could also look for the strings "proxy" or "pxy" in the client's domain name.
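A rough sketch of those heuristics, combining the header check with a reverse-DNS lookup (the dict-of-headers interface is an assumption, and as noted, none of this is anywhere near 100% reliable):

import socket

PROXY_HEADERS = ("via", "forwarded", "x-forwarded-for", "client-ip")

def looks_like_proxy(headers, client_ip):
    # headers: request headers as a dict with lower-cased names.
    if any(name in headers for name in PROXY_HEADERS):
        return True
    try:
        hostname = socket.gethostbyaddr(client_ip)[0]
    except OSError:
        return False  # no reverse DNS entry; nothing more to go on
    return "proxy" in hostname or "pxy" in hostname

print(looks_like_proxy({"via": "1.1 cache01"}, "198.51.100.7"))  # True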
If a proxy server is set up properly to avoid detection, you won't be able to tell.
Most proxy servers supply headers as others mention, but those are not present on proxies meant to completely hide the user.
You will need to employ several detection methods, such as cookies, proxy header detection, and perhaps IP heuristics to detect such situations. Check out http://www.osix.net/modules/article/?id=765 for some information on this situation. Also consider using a proxy blacklist - they are published by many organizations.
However, nothing is 100% certain. You can employ the above tactics to avoid most simple situations, but at the end of the day it's merely a series of packets forming a TCP/IP transaction, and the TCP/IP protocol was not developed with today's ideas on security, authentication, etc.
Keep in mind that many corporations deploy company-wide proxies for various reasons, and if you simply block proxies as a general rule you necessarily limit your audience, which may not always be desirable. However, these proxies usually announce themselves with the appropriate headers - you may end up blocking legitimate users, rather than users who are good at hiding themselves.
-Adam
Did a bit of digging on this after my domain got hosted up on Google's AppSpot.com with nice hardcore porn ads injected into it (thanks Google).
Taking a leaf from this htaccess idea, I'm doing the following, which seems to be working. I added a specific rule for AppSpot, which injects an HTTP_X_APPENGINE_COUNTRY ServerVariable.
Dim varys As New List(Of String)
' ASP.NET's Request.Headers collection is keyed by the real header names,
' so the underscore ServerVariable-style names (e.g. X_FORWARDED_FOR) would
' never match here; use the actual header names instead.
varys.Add("Via")
varys.Add("Forwarded")
varys.Add("User-Agent-Via")
varys.Add("X-Forwarded-For")
varys.Add("Proxy-Connection")
varys.Add("XProxy-Connection")
varys.Add("PC-Remote-Addr")
varys.Add("Client-IP")
varys.Add("X-AppEngine-Country") ' injected by Google App Engine (AppSpot)

' Redirect to the canonical domain as soon as any proxy header is found.
For Each vary As String In varys
    If Not String.IsNullOrEmpty(HttpContext.Current.Request.Headers(vary)) Then
        HttpContext.Current.Response.Redirect("http://www.your-real-domain.com")
    End If
Next
You can look for these headers in the Request object and decide accordingly whether the request came via a proxy:
1) Via
2) X-Forwarded-For
Note that this is not a 100% sure-shot trick; it depends on whether the proxy servers choose to add the above headers.

Is data sent via HTTP POST when the server does not exist?

I work for a large-ish advertising company. We've created a very lightweight clone of the PayPal IPN so we can offer CC Processing services for our top advertisers.
Like the PP IPN, it's a simple RESTful interface.
I deliberately instructed our admin guys to configure the vhost for this web app to only respond to requests on port 443.
This particular question is beyond my HTTP Protocol knowledge:
This may vary from browser to browser, but when a user submits a form whose ACTION is, say, http://www.somesite.com, and the browser cannot resolve that site, does the POST payload ever get sent over the wire?
I know this is a bit esoteric, and it's more of an implementation question than something that exists in the HTTP RFC (as far as I could tell). Any takers?
Before sending any data, the browser needs to open a TCP connection to the target site. Since that connection cannot be opened in the first place, no data can be sent.
Update (thanks for the hint in the comments):
A proxy changes the picture. With a proxy configured, the TCP connection (to the proxy) is always established successfully, and the HTTP request headers are sent to it even if the target site does not exist. A POST request carries its additional data in the request body, which would ideally be sent only if the request headers produced no error. However, proxy implementations differ, and I cannot guarantee that a given proxy will return an error for a non-existent target before the body is sent. If it doesn't, I don't know of any way to avoid sending the complete data over the wire...
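For the non-proxy case, this is easy to observe for yourself; the hostname below uses the reserved .invalid TLD, so name resolution is guaranteed to fail before anything leaves the machine:

import http.client
import socket

try:
    conn = http.client.HTTPConnection("nonexistent.invalid", timeout=5)
    # request() only connects at this point; DNS fails before any bytes go out.
    conn.request("POST", "/", body="secret=payload")
except socket.gaierror as err:
    print("Name resolution failed; no data was sent:", err)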
