We set up server-side tagging using the Docker container Google provides in its "manual setup guide".
Everything is working fine, but all requests to the tagging server are answered without any compression: no gzip, no deflate, no br, just plain text.
Is there anything we are missing? The docs provided by Google do not give any hints...
As of 2022, this is not possible. We used a CDN that gives us content compression.
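Not a fix, but if you want to verify what the container (or a CDN in front of it) actually returns, a minimal C# sketch along these lines sends an Accept-Encoding header and prints the Content-Encoding of the response (the URL is a placeholder):

using System;
using System.Net.Http;
using System.Threading.Tasks;

class CompressionCheck
{
    static async Task Main()
    {
        using var client = new HttpClient();
        // Placeholder URL - replace with your own tagging-server endpoint.
        var request = new HttpRequestMessage(HttpMethod.Get, "https://tagging.example.com/");
        request.Headers.TryAddWithoutValidation("Accept-Encoding", "gzip, deflate, br");

        using var response = await client.SendAsync(request);
        // If the server (or a CDN in front of it) compressed the body,
        // Content-Encoding names the algorithm; otherwise the list is empty.
        Console.WriteLine("Content-Encoding: " +
            string.Join(", ", response.Content.Headers.ContentEncoding));
    }
}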
A website was audited for vulnerabilities, and the scan flagged XSS for many pages which, from my point of view, do not appear to be vulnerable, as I don't display any data captured from the form, the page, or the URL (such as the query string).
Acunetix flagged the following URL as XSS after injecting some JavaScript code:
http://www.example.com/page-one//?'onmouseover='pU0e(9527)
Report:
GET /page-one//?'onmouseover='pU0e(9527)'bad=' HTTP/1.1
Referer: https://www.example.com/
Connection: keep-alive
Authorization: Basic FXvxdAfafmFub25cfGb=
Accept: */*
Accept-Encoding: gzip,deflate
Host: example.com
So, how could this be vulnerable, and is it really exploitable?
Above all, if an onmouseover handler can be injected like this, what impact would it have?
Since you asked for more information, I'll post my response as an answer.
The main question as I see it:
Can there still be an XSS vulnerability from the query string if I don't use any of the parameters in my code?
Well, if they actually aren't used at all, then it should not be possible. However, there are subtle ways that you could be using them that you may have overlooked. (Posting the actual source code would be useful here).
One example would be something like this:
Response.Write("<a href='" +
    HttpContext.Current.Request.Url.AbsoluteUri + "'>share this link!</a>");
This would put the entire URL in the body of the web page. The attacker can make use of the query string even though its parameters aren't mapped to variables, because the full URL is written into the response. Keep in mind it could also end up in a hidden field.
Be careful writing out values like HttpContext.Current.Request.Url.AbsoluteUri or HttpContext.Current.Request.Url.PathAndQuery.
Some tips:
Confirm that the scanner is not reporting a false positive by opening the link in a modern browser like Chrome. Check the console for an error about "XSS Auditor" or similar.
Use an anti-XSS library to encode untrusted output before writing it to the response (see the sketch after this list).
Read this: https://www.owasp.org/index.php/XSS_(Cross_Site_Scripting)_Prevention_Cheat_Sheet
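As a minimal sketch of the encoding tip - using HttpUtility from System.Web rather than a dedicated anti-XSS library, and a hypothetical BuildShareLink helper - the idea is to attribute-encode the URL before echoing it:

using System;
using System.Web; // HttpUtility; reference System.Web (or the System.Web.HttpUtility package on modern .NET)

class ShareLinkExample
{
    // Hypothetical helper: builds the "share this link" anchor safely.
    static string BuildShareLink(string absoluteUri)
    {
        // With a double-quoted attribute plus HtmlAttributeEncode (which escapes
        // double quotes, & and <), the injected 'onmouseover=' text stays inert
        // inside the attribute value instead of becoming a new attribute.
        return "<a href=\"" + HttpUtility.HtmlAttributeEncode(absoluteUri) + "\">share this link!</a>";
    }

    static void Main()
    {
        string attack = "http://www.example.com/page-one//?'onmouseover='pU0e(9527)'bad='";
        Console.WriteLine(BuildShareLink(attack));
    }
}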
I am writing a webserver. I implemented GET and POST (application/x-www-form-urlencoded, multipart/form-data) and that works fine.
I am thinking of adding a RESTful module to the server, so I had a look at what's out there and gathered opinions about when to use PUT, POST, and GET.
My question is: what encoding (application/x-www-form-urlencoded, multipart/form-data) does PUT support (per the HTTP specifications), or can it handle both?
I am trying to make the webserver as standards-compliant as I can without shooting myself in the foot.
The limitation to application/x-www-form-urlencoded and multipart/form-data is not in the HTTP standard but in HTML: those are the only formats an HTML form can produce. From HTTP's point of view, you can use any format, as long as you declare it to the server (via the Content-Type header) and the server can understand it. If it cannot, it replies with a 415 Unsupported Media Type status code.
See:
http://www.w3.org/TR/1999/REC-html401-19991224/interact/forms.html#h-17.13.4
http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.16
http://www.w3.org/Protocols/rfc2616/rfc2616-sec7.html#sec7
HTTP PUT can have whatever content-type the user wishes (the same as for all other HTTP methods).
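To illustrate both answers from the client side, here is a minimal C# sketch (the http://localhost:8080/items/1 endpoint is hypothetical) that PUTs a JSON body; a server that doesn't understand that Content-Type would be expected to answer 415:

using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

class PutExample
{
    static async Task Main()
    {
        using var client = new HttpClient();
        // PUT is not limited to the two HTML-form encodings; any Content-Type works
        // as long as the server understands it.
        var body = new StringContent("{\"name\":\"example\"}", Encoding.UTF8, "application/json");
        HttpResponseMessage response = await client.PutAsync("http://localhost:8080/items/1", body);

        if ((int)response.StatusCode == 415)
            Console.WriteLine("Server rejected the format: 415 Unsupported Media Type");
        else
            Console.WriteLine("Status: " + (int)response.StatusCode);
    }
}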
I'm writing a simple crawler, and ideally, to save bandwidth, I'd like to download only the text and links on the page. Can I do that using HTTP headers? I'm confused about how they work.
You're on the right track to solving the problem.
I'm not sure how much you already know about HTTP headers, but basically an HTTP request is just formatted text sent to a web server - it follows a protocol - and is pretty straightforward in that respect. You send a request and receive a response. The requests look like the things you see in the Firefox plugin LiveHTTPHeaders at https://addons.mozilla.org/en-US/firefox/addon/3829/.
I wrote a small post at my site http://blog.gnucom.cc/2010/write-http-request-to-web-server-with-php/ that shows you how you can write a request to a web server and then later read the response. If you only accept text/html you'll only accept a subset of what is available on the web (so yes, it will "optimize" your script to an extent). Note this example is really low level, and if you're going to write a spider you may want to use an existing library like cURL or whatever other tools your implementation language offers.
Yes, by using Accept: text/html you should only get HTML as a valid response. That's at least how it ought to be.
But in practice there is a huge difference between the standards and the actual implementations. And proper content negotiation (that’s what Accept is for) is one of the things that are barely supported.
An HTML page contains just the text plus some tag markup.
Images, scripts and stylesheets are (usually) external files that are referenced from the HTML markup. This means that if you request a page, you will already receive just the text (without the images and other stuff).
Since you are writing the crawler, you should make sure it doesn't follow URLs from images, scripts or stylesheets.
I'm not 100% sure, but I believe that GET /foobar.png will return the image even if you send Accept: text/html. For this reason I believe you should just filter what kind of URLs you crawl.
In addition, you may try to read the response headers in the crawler and close the connection before you read the body if the Content-Type is not text/html. It might be worthwhile to avoid downloading undesired large files.
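Putting the two suggestions together - send Accept: text/html, then look at the Content-Type of the response headers before deciding whether to read the body - a minimal C# sketch might look like this (example.com is a placeholder):

using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

class CrawlerFetch
{
    static async Task Main()
    {
        using var client = new HttpClient();
        var request = new HttpRequestMessage(HttpMethod.Get, "http://example.com/");
        request.Headers.Accept.Add(new MediaTypeWithQualityHeaderValue("text/html"));

        // ResponseHeadersRead returns as soon as the headers arrive,
        // so we can skip the body of non-HTML resources.
        using var response = await client.SendAsync(request, HttpCompletionOption.ResponseHeadersRead);
        string mediaType = response.Content.Headers.ContentType?.MediaType;

        if (mediaType == "text/html")
        {
            string html = await response.Content.ReadAsStringAsync();
            Console.WriteLine("Got HTML, " + html.Length + " characters");
        }
        else
        {
            Console.WriteLine("Skipping non-HTML resource: " + mediaType);
        }
    }
}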
I have a website that contains pages with many small images. The images are set to cache, with the headers containing:
Expires "Thu, 31 Dec 2037 23:55:55 GMT"
Cache-Control "public, max-age=315360000"
When someone loads a page, however, it seems that we are still forced to send a 304 response for each image, which is better than sending the whole image but still takes some time. Of course, this sort of caching is up to the browser, but is it possible to suggest to the browser that it use the cached images without making any request at all?
If you have many small images on a page, consider making a CSS sprite with all the images - that will reduce the number of requests a lot. A List Apart explains the concept.
Take a look at RFC 2616, part of the HTTP/1.1 specification:
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9
You'll see lots of options to play with, primarily intended for proxies rather than browsers. You can't really force the browser to stop sending If-Modified-Since requests entirely.
Older proxies especially might ignore your Cache-Control hints; see this paragraph on the aforementioned page:
Note that HTTP/1.0 caches might not implement Cache-Control and
might only implement Pragma: no-cache (see section 14.32).
If you are really concerned about such short requests, check whether the HTTP keep-alive feature is enabled in your server (which has side effects of its own, of course).
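For reference, the per-image revalidation the question describes is a conditional GET: the browser sends the validators it cached and the server answers 304 if nothing changed. A minimal C# sketch of that exchange (the image URL and date are placeholders):

using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

class ConditionalGet
{
    static async Task Main()
    {
        using var client = new HttpClient();
        var request = new HttpRequestMessage(HttpMethod.Get, "http://www.example.com/images/icon.png");
        // Pretend we cached the image yesterday; a browser would send the
        // Last-Modified / ETag values it stored with the cached copy.
        request.Headers.IfModifiedSince = DateTimeOffset.UtcNow.AddDays(-1);

        using var response = await client.SendAsync(request);
        if (response.StatusCode == HttpStatusCode.NotModified)
            Console.WriteLine("304 Not Modified - reuse the cached image");
        else
            Console.WriteLine(((int)response.StatusCode) + " - download and re-cache the image");
    }
}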
Has anyone found any documentation or research about what data is transferred to Google Analytics when it's added to a site? The main thing I'm wondering about is POST data, but the details of exactly what is sent would be useful.
I'm considering implementing it on sites that have a lot of private data on them, and I'm wondering what data Google will capture, if any. (The sites are login-only.) I need proof that I can provide to the users.
The official information can be found here
The visitor tracking information that you can get in the Google Analytics reports depends on Javascript code that you include in your website pages, referred to as the Google Analytics Tracking Code (GATC). Initial releases of the GATC used a Javascript file called urchin.js.
That script is then discussed in detail in that blog, and the Google Analytics Help group can also provide some details.
A more detailed list of what that JavaScript collects is available here.
I found the official Google documentation here:
http://code.google.com/apis/analytics/docs/tracking/gaTrackingTroubleshooting.html
I also found this discussion very useful:
http://www.google.com/support/forum/p/Google%20Analytics/thread?tid=5f11a529100f1d47&hl=en
It helped me figure out what utmcc actually does.
All info passes via URL and post params:
page     1
utmac    UA-745459-1
utmcc    __utma=52631473.656111131.1231670535.1235325662.1235336522.264;+__utmz=52631473.1235287959.257.8.utmccn=(organic)|utmcsr=google|utmctr=site:domain.com|utmcmd=organic;+
utmcs    windows-1255
utmdt    page title
utmfl    10.0 r12
utmhid   1524858795
utmhn    www.domain.com
utmje    1
utmn     1273285258
utmp     /shakeit/?
utmr     0
utmsc    32-bit
utmsr    1280x800
utmul    en-us
utmwv    1.3

Request headers:

Host              www.google-analytics.com
User-Agent        Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.0.6) Gecko/2009011913 Firefox/3.0.6 (.NET CLR 3.5.30729)
Accept            image/png,image/*;q=0.8,*/*;q=0.5
Accept-Language   en-us,en;q=0.5
Accept-Encoding   gzip,deflate
Accept-Charset    ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive        300
Connection        keep-alive
Referer           http://www.hadash-hot.co.il/shakeit/?&page=1
Pragma            no-cache
Cache-Control     no-cache
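If you want to check what your own pages send, copy the __utm.gif request URL from a tool like LiveHTTPHeaders or the extension mentioned below and split out its parameters. A minimal C# sketch (the URL is a made-up example):

using System;
using System.Web; // HttpUtility.ParseQueryString (reference System.Web, or System.Web.HttpUtility on modern .NET)

class UtmInspector
{
    static void Main()
    {
        // Made-up example of a captured tracking request URL.
        string gifUrl = "http://www.google-analytics.com/__utm.gif?utmwv=1.3&utmhn=www.domain.com&utmp=/shakeit/&utmac=UA-745459-1";

        var query = HttpUtility.ParseQueryString(new Uri(gifUrl).Query);
        foreach (string key in query.AllKeys)
            Console.WriteLine(key + " = " + query[key]);
    }
}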
Look at http://www.google-analytics.com/urchin.js, under the urchinTracker function, and you'll see what's going on :)
I recommend trying this Google Chrome extension: https://chrome.google.com/extensions/detail/jnkmfdileelhofjcijamephohjechhna
This extension will provide debug information for all of the data sent to Google Analytics. It's especially helpful when you are adding new analytics features and want to verify they are working the way you expect.