Duplicate HTTP requests for email tracking images

I am embedding tracking images in emails sent from a custom-built opt-in CRM system. The image src is an encoded .gif, such as src="12_34_675.gif". The image is served by an ASP.NET HttpHandler that decodes the src encoding and serves a transparent image.
Everything works fine, but some email clients request the image multiple times, creating duplicate entries. Some clients make three calls all within one second, and some seem to make tens of calls over a day or so. Most email clients make a single call, but these few duplicates are very perplexing. I know I can code around them, but I'd really like to understand what's going on.
I've checked the IIS log files, which show that the duplicate requests are coming from the client machines. I can't think what might be causing these duplicate HTTP requests.
Help!

I don't think this is something you can control. What if they have an old version of Outlook open (older versions used to load images embedded in the message by default, nice) with the preview pane, and pass over your email a few times?
I assume you are not using the tracking data directly from Urchin (or whatever you are using). Is there a reason the duplicate log entries for the image are a problem?

I actually have this same problem now and I'm not sure why.
In my code I use mod_rewrite to redirect to a tracking script. The script parses the GET params for the campaign, contact list, etc., then writes some data to the database.
At the end of the script, I output the image using code like:
header("Content-Type: image/jpeg");
readfile($filename);
$filename is the correct file; I var_dump'd it and the script does output the correct file.
In my access logs I get TWO hits on the script, TWO duplicate records in the database, and all my stats are double what they should be.
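For now the plan is to code around it. A minimal sketch of a dedupe guard, assuming a MySQL table open_events(campaign_id, contact_id, hit_at) reached through PDO, where $campaign_id and $contact_id are the values parsed from the GET params (all names are illustrative):

$pdo = new PDO('mysql:host=localhost;dbname=tracking', 'user', 'pass');
$window = 60; // seconds within which a repeat hit is treated as a duplicate
$stmt = $pdo->prepare(
    'SELECT COUNT(*) FROM open_events
      WHERE campaign_id = ? AND contact_id = ?
        AND hit_at > DATE_SUB(NOW(), INTERVAL ? SECOND)');
$stmt->execute(array($campaign_id, $contact_id, $window));
if ((int)$stmt->fetchColumn() === 0) {
    $ins = $pdo->prepare(
        'INSERT INTO open_events (campaign_id, contact_id, hit_at) VALUES (?, ?, NOW())');
    $ins->execute(array($campaign_id, $contact_id));
}
// ...then output the image as before.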

Related

Calculate the number of visits based on downloaded GB

I have a website hosted in Firebase that totally went viral for a day. Since I wasn't expecting that, I didn't install any analytics tool. However, I would like to know the number of visits or downloads. The only metric I have available is the GB downloaded: 686.8 GB. But I am confused, because if I open the website with the Chrome console I get two different metrics about the size of the page: 319 KB transferred and 1.2 MB resources. Furthermore, not all of those things are transferred from Firebase but from other CDNs, as you can see in the screenshots. What is the proper way of calculating the visits I had?
Transferred metric is how much bandwidth was used after compression was applied.
Resources metric is how much disk space those resources use before they are compressed (for transfer).
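As a rough upper bound, assuming every page load transferred about 319 KB from Firebase: 686.8 GB is about 686,800,000 KB, and 686,800,000 / 319 is roughly 2.15 million page loads. Treat that as a ceiling; caching, bots, and the resources served from other CDNs all push the real human visit count lower.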
True analytics requires an understanding of what is out on the web. There are three classifications:
Humans: flesh and blood, and overwhelmingly (though not absolutely) using web browsers.
Spiders (or search engines): they request pages with the notion that they obey robots.txt and will list your website in their results for relevant search queries.
Rejects (basically spammers and the unknowns), which include (though are far from limited to) content/email scrapers, brute-force password guessers, vulnerability scanners and POST spammers.
With this clarification in place what you're asking in effect is, "How many human visitors am I receiving?" The easiest way to obtain that information is to:
Determine what user agent requests are human (not easy, behavior based).
Determine the length of time a single visit from a human should count as.
Assign human visitors a session.
I presume you understand what a cookie is and how it differs from a session cookie. Obviously, when you sign in to a website you are assigned a session; if that session cookie is not sent to the server on a page request you will, in effect, be signed out. You can make session cookies last a long time; how long comes down to factors such as convenience for the visitor and whether you count those sessions directly or use them in conjunction with something else.
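For example, a bare-bones sketch (in PHP, as in the code further below) of assigning a session and counting each one once; the table is illustrative:

session_start();                      // sets a session cookie if the browser doesn't already have one
if (empty($_SESSION['counted'])) {
    $_SESSION['counted'] = true;      // count this session only once
    // INSERT a row into a visits table here, e.g. (session_id(), user agent, timestamp)
}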
Now your next thought is likely, "But how do I count downloads?" Thankfully you mention PHP on your website, so I can give you some code that should make sense to you. If you just link directly to the file you'd be stuck with (at best) counting clicks via a click event on the anchor element, and a download that gets canceled because it was a mistake (or for some other reason) makes that even more subjective than my suggestion. Granted, my suggestion can still be subjective (e.g. they decide they don't actually want the download and cancel before completion), and whether they ever use the download is another aspect to consider. That being said, if you want the server to give you a download count you'd want to do the following:
You may want to use Apache rewrite (or whatever the other HTTP server equivalents are) so that PHP handles the download.
You may need to ensure Apache has the proper handling for PHP (e.g. AddType application/x-httpd-php5 .exe .msi .dmg) so your server knows to let PHP run on the requested file.
You'll want to use PHP's file_exists() with an absolute file path on the server for the sake of security.
You'll want to ensure that you set the correct MIME type for the file via PHP's header(), as you should expect browsers to be horrible at guessing.
You absolutely need to use die() or exit() at the end; if your script leaks even whitespace, the browser would interpret it as part of the file, likely causing corruption.
Here is the code for PHP itself:
$p = explode('/', strrev($_SERVER['REQUEST_URI']));
$file = basename(strrev($p[0]));            // basename() guards against path traversal
if (!file_exists($path_absolute.$file)) {   // $path_absolute: absolute directory holding the downloads
    header('HTTP/1.1 404 Not Found');
    die();
}
header('HTTP/1.1 200 OK');
header('Content-Type: '.$mime);             // $mime: the file's real MIME type; don't rely on browser guessing
readfile($path_absolute.$file);             // streams the file without loading it all into memory
die();                                      // exit immediately so stray output can't corrupt the download
For counting downloads if you want to get a little fancy you could create a couple of database tables. One for the files (download_files) and the second table for requests (download_requests). Throw in basic SQL queries and you're collecting data. Record IPv6 (Storing IPv6 Addresses in MySQL) and you'll be able to discern from a query how many unique downloads you have.
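A rough sketch of those two tables and the logging insert, assuming $pdo is a PDO connection to MySQL 5.6+ (for INET6_ATON) and $file_id was resolved from download_files beforehand; all names are illustrative:

$pdo->exec('CREATE TABLE IF NOT EXISTS download_files (
    id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(255) NOT NULL UNIQUE)');
$pdo->exec('CREATE TABLE IF NOT EXISTS download_requests (
    id INT AUTO_INCREMENT PRIMARY KEY,
    file_id INT NOT NULL,
    ip VARBINARY(16) NOT NULL,          -- INET6_ATON() output, fits IPv4 and IPv6
    requested_at DATETIME NOT NULL)');
// Log one request:
$stmt = $pdo->prepare('INSERT INTO download_requests (file_id, ip, requested_at)
                       VALUES (?, INET6_ATON(?), NOW())');
$stmt->execute(array($file_id, $_SERVER['REMOTE_ADDR']));
// Unique downloads per file: SELECT file_id, COUNT(DISTINCT ip) FROM download_requests GROUP BY file_id;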
Back to human visitors: it takes a very thorough study to understand the differences between humans and bots. Things like Captcha are garbage and are utterly annoying. You can get a rough start by requiring a cookie to be sent back on requests though not all bots are ludicrously stupid. I hope this at least gets you on the right path.

HTTP PUT and POST alternatives for uploading content

Other than HTTP PUT and POST, what other methods can a web application designer use to allow users to upload content (either files or listbox text) from a page of his web app to a remote server?
On the same topic, I was wondering what technology/APIs does a service like Google Docs or Google Drive use? The reason I ask this is: Our Sys Admin has disabled file uploading (via Squid proxy), yet I was able to create and share a document using Google Docs / Google Drive.
Many thanks in advance,
/HS
EDIT Please see the strikeout above.
This depends on the server in question, as the standard set of HTTP methods can be expanded and some may not be configured/allowed. One of the common methods is "OPTIONS", which asks "what can I do?".
But to answer more helpfully: you generally have two main options:
POST (the one you probably want to use, as it's nearly always available).
GET. You could use GET (but I'm NOT advocating it - just saying you could - you should not use a GET to make changes on the server). There are problems with this approach (including size of files, manually handling the encoding, etc.) but it's possible if you have to go this route.
PUT is often not enabled on servers for security reasons.
More reading: http://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html
Edit: if "file uploading" is prevented by proxy, have you tried encoding the POST? i.e. As opposed to sending a multipart POST, try encoding the files yourself into POST string and sending that instead? Or encode the file and split into multiple small posts and piecing them together at the other end?
Google Docs uses a mixture of POST and GET. POST for the updates. Google Drive I don't know.

Scraping ASP.NET with Python and urllib2

I've been trying (unsuccessfully, I might add) to scrape a website created with the Microsoft stack (ASP.NET, C#, IIS) using Python and urllib/urllib2. I'm also using cookielib to manage cookies. After spending a long time profiling the website in Chrome and examining the headers, I've been unable to come up with a working solution to log in. Currently, in an attempt to get it to work at the most basic level, I've hard-coded the encoded URL string with all of the appropriate form data (even View State, etc..). I'm also passing valid headers.
The response that I'm currently receiving reads:
29|pageRedirect||/?aspxerrorpath=/default.aspx|
I'm not sure how to interpret the above. Also, I've looked pretty extensively at the client-side code used in processing the login fields.
Here's how it works: You enter your username/pass and hit a 'Login' button. Pressing the Enter key also simulates this button press. The input fields aren't in a form. Instead, there are a few onClick events on said Login button (most of which are just for aesthetics), but the one in question handles validation. It does some rudimentary checks before sending it off to the server side. Based on the web resources, it definitely appears to be using .NET AJAX.
When logging into this website normally, you request the domain as a POST with form data of your username and password, among other things. Then there is some sort of URL rewrite or redirect that takes you to a content page at url.com/twitter. When attempting to access url.com/twitter directly, it redirects you to the main page.
I should note that I've decided to leave the URL in question out. I'm not doing anything malicious, just automating a very monotonous check once every reasonable increment of time (I'm familiar with compassionate screen scraping). However, it would be trivial to associate my StackOverflow account with that account in the event that it didn't make the domain owners happy.
My question is: I've been able to successfully log in and automate services in the past, none of which were .NET-based. Is there anything different that I should be doing, or maybe something I'm leaving out?
For anyone else that might be in a similar predicament in the future:
I'd just like to note that I've had a lot of success with a Greasemonkey user script in Chrome to do all of my scraping and automation. I found it to be a lot easier than Python + urllib2 (at least for this particular case). The user scripts are written in 100% Javascript.
When scraping a web application, I use either:
1) WireShark ... or...
2) A logging proxy server (that logs headers as well as payload)
I then compare what the real application does (in this case, how your browser interacts with the site) with the scraper's logs. Working through the differences will bring you to a working solution.

Need to check uptime on a large file being hosted

I have a dynamically generated RSS feed that is about 150 MB in size (don't ask).
The problem is that it keeps crapping out sporadically, and there is no way to monitor it without downloading the entire feed to get a 200 status. Pingdom times out on it and returns a 'down' error.
So my question is: how do I check that this thing is up and running?
What type of web server, and server side coding platform are you using (if any)? Is any of the content coming from a backend system/database to the web tier?
Are you sure the problem is not with the client code accessing the file? Most clients have timeouts and downloading large files over the internet can be a problem depending on how the server behaves. That is why file download utilities track progress and download in chunks.
It is also possible that other load on the web server or the number of users is impacting the server. If you have little memory available, certain servers may not be able to serve a file of that size to many users. You should review how the server is sending the file and make sure it is chunking it up.
I would recommend that you do a HEAD request to check that the URL is accessible and that the server is responding at minimum. The next step might be to setup your download test inside or very close to the data center hosting the file to monitor further. This may reduce cost and is going to reduce interference.
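As a starting point, a quick PHP sketch of such a HEAD check using cURL (the URL is a placeholder); only the headers come back, so the 150 MB body is never downloaded:

$ch = curl_init('http://example.com/feed.rss');
curl_setopt($ch, CURLOPT_NOBODY, true);           // HEAD request: headers only
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
curl_exec($ch);
$status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);
echo ($status === 200) ? "up\n" : "down (HTTP $status)\n";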
Found an online tool that does what I needed
http://wasitup.com uses head requests so it doesn't time out waiting to download the whole 150MB file.
Thanks for the help BrianLy!
Looks like pingdom does not support the head request. I've put in a feature request, but who knows.
I hacked this capability into mon for now (mon is a nice compromise between paying someone else to monitor and doing everything yourself). I have switched entirely to https, so I modified the https monitor to do it. I did it the dead-simple way: copied the https.monitor file and called it https.head.monitor. In the new monitor file I changed the line that says (you might also want to update the function name and the place where it's called):
get_https to head_https
Now in mon.cf you can call a head request:
monitor https.head.monitor -u /path/to/file

How can I prevent/make it hard to download my flash video?

I want to at least prevent normal users from downloading my flash video.
What's the best way to do it?
Create an HttpHandler, add a token (e.g. a time-based id), and set the cache control to no-cache so that only users with the correct token can view the video. Is that feasible?
It is a requirement from the client that the video should not be downloadable by users and should be watched only on that particular website.
I want to know if this works:
http://www.somesite.com/video.swf?time=1248319067
The server will generate a token (time in the above example) so that a user can only make one request to this link. If the user wants to watch the video again, he needs to go to our website to get the token again. Is this enough to prevent novices from downloading?
I can't download this flash video with the DownloadHelper Firefox plugin:
http://news.bbc.co.uk/2/hi/americas/8164177.stm
Updated (13:49 pm 2009/07/23):
The above file can be downloaded using some video download software.
The video files of the following Chinese sites are well protected (I can't download them with various video download tools):
http://programme.tvb.com/drama/abrideforaride/video/
Do you know how it is done?
I don't think there is an easy way to stop people from getting your videos if they want them;
there are plenty of plugins for Firefox that allow downloading from even YouTube and many other places, and I imagine those plugins would defeat any attempt you made to hide your videos.
It's not too terribly different from taking an image from Flickr: they put a clear GIF over the image that you want to view, so that when you right-click and save you get "the shield" image; however, that can be defeated by the lowly Print Screen button.
If you want to keep casual users from getting your file, use a Flash control that buffers a minute or two of your video and make that Flash authenticate with the server to get those files. That seems reasonable to me.
I don't think there really is an easy way to keep people from getting at it. You're sending them the video; that is how they are able to view it. Any user could just use FRAPS or a similar tool to copy the video from the screen as well.
If your worry is it being copied and used elsewhere, then you can watermark it or use a few other types of copy protection that will allow you to identify your work on other sites. If you're worried about people copying it for personal use, then you really have no way of stopping it; you are sending it to them.
Edit: Due diligence would be to inform your customer of how easy it is to copy the work that they will be posting. Most clients really have no idea how easy it is.
This is how I like to tackle this issue.
This method works by creating a ticket to download the content over one HTTP request. Any further attempt to use the same ticket fails, so extensions (or users manually) trying to download the content again will fail, and the Flash player becomes the only way to fetch it. There is one downside to this approach: users will not be able to seek to a part of the video that has not been downloaded yet, and in some standard player implementations that may even stop the video from loading. Any ideas on this would be highly appreciated.
I begin by writing a PHP script that takes in a video_id, file_name, or local path to your video file (depending on the storage infrastructure of your video collection) as GET parameters, along with a unique hash value. The hash is hard to guess and generated with a secret key, so it can be validated as coming from our receiver (the Flash player); if the hacker sends us a used hash or an invalid one (it does not satisfy our key), we do not send him the file. The PHP script then opens the video file and sends its content with the correct video MIME type; for FLV the MIME type is video/x-flv. It makes sure that the hash has not been used before and was validly generated from your secret encryption key.
Then, when the page with the Flash player loads, we can give the .php URL with the right GET parameters to the video player as the video URL. (If it is a fussy player that only allows .flv files, you can always configure your .htaccess file to parse .flv files as PHP scripts in that specific folder only, rename your .php file to .flv, and try your luck.) Also generate a hash key: perhaps take the server's current time, append a salt value such as another key known by both scripts, and encrypt this final concatenation with your secret key.
So once the video gateway PHP script receives a filename and hash key, it decrypts the hash key, checks that it was validly generated by the sister script, and makes sure not to send the video again for the same hash key.
For added security you can perhaps reset the secret key every day using either a cron job or a bootstrap mechanism. To prevent duplicate use of hash keys you can store them in a MySQL database, flat files, or a NoSQL store (depending on your needs and infrastructure).
Make sure the file is requested by the same user agent the hash key was generated for, in case the hacker tries to cURL or wget your video's unused URL before the Flash player gets a chance to consume the hash key. In that case the hacker will have to imitate the browser's user agent or download the file with a command-line tool as well; however, please note that this is not your average user.
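To make the flow concrete, here is a minimal sketch of the gateway script described above. It assumes an illustrative MySQL table video_tickets(token, file, user_agent, used, created_at) whose rows are written by the sister script that renders the player page, with token = hash_hmac('sha256', file . '|' . created_at, secret); every name here is an assumption, not a fixed API:

// gateway.php - serves a video only for a valid, unused, unexpired ticket.
$secret = 'change-me';                                  // shared secret; rotate daily if you like
$pdo    = new PDO('mysql:host=localhost;dbname=video', 'user', 'pass');

$token = isset($_GET['token']) ? $_GET['token'] : '';
$stmt  = $pdo->prepare('SELECT file, user_agent, used, created_at FROM video_tickets WHERE token = ?');
$stmt->execute(array($token));
$ticket = $stmt->fetch(PDO::FETCH_ASSOC);

$valid = $ticket
    && !$ticket['used']
    && $ticket['user_agent'] === (isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '')
    && (time() - strtotime($ticket['created_at'])) < 300          // 5-minute lifetime
    && $token === hash_hmac('sha256', $ticket['file'] . '|' . $ticket['created_at'], $secret);

if (!$valid) {
    header('HTTP/1.1 403 Forbidden');
    die();
}

// Burn the ticket so the same URL cannot be replayed by a download helper.
$pdo->prepare('UPDATE video_tickets SET used = 1 WHERE token = ?')->execute(array($token));

header('Content-Type: video/x-flv');
readfile('/var/videos/' . basename($ticket['file']));             // basename() blocks path traversal
die();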
It sounds like you need to add authorization and authentication.
You could put the flash video under a different folder in your ASP.Net application and add a web.config file in that folder to deny access to unauthorized users. For example:
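Something along these lines should work in that folder's web.config (a minimal sketch; "?" denies all anonymous users, adjust to taste):

<?xml version="1.0"?>
<configuration>
  <system.web>
    <authorization>
      <deny users="?" />
    </authorization>
  </system.web>
</configuration>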
Then you need to enable authentication for your website. The simplest method is forms authentication. A trivial example with hard coded username and password is provided here.
There is loads you can do with the authentication framework in ASP.NET; I suggest googling a bit.
The only way to do this is with a trusted client, DRM and an encrypted source.
Your player opens up a connection, the user has a connection to the stream, you perform some magic authentication with their token and then transmit the encrypted data to them.
If you don't do this then anyone can download your video and save it out.
However, with all that aside, someone can run screen capture, then save your video and redistribute it again. This is again where DRM comes in, as one of the key features of DRM on Windows clients is that the buffer cannot be sniffed, since it's on the protected media path.
I guess it's a question of how to protect your revenue, but dealing with pirates is always going to be a problem for software devs no matter what their business is.
I have a solution that I'm going to try for myself (as I have the same worries), but I know that it involves a lot of extra time and work...
Solution: using Flash, package the video into an SWF file. Before packaging, add some ActionScript code to the movie for authentication. Suggestions for authentication:
1 test a URL
2 create a dedicated flash player that has handshake code checked by the video.swf
I like #2 better, and as an extra measure you can overlay an id code over the video, so if someone captures the video using screen recording software you'd at least be able to track the original source of the copied video... and exact suitable retribution...
Simply put, you can't prevent it.
But... you can make it difficult.
Here are some ideas that come to mind:
1 First of all, add your identifier to the video (someone can always download it).
2 The hard way... Make an Ajax call back to the server every N seconds to check a randomly generated key stored in the session. After every postback, clear the player's buffer and resume the video from where it was (using JavaScript); see the sketch at the end of this answer.
Again use JavaScript to keep the video source from being grabbed via "view source".
3 Handle all your videos through URLs like http://www.example.com/viewvideo/1 OR ../?id=1.
Add a blank image overlay with a transparent background.
Serve the original video and a blank video somewhere on the page with a normal extension and the style attribute "display:none" (this will create problems for some download helpers).
4 Every time you serve a video, CHECK that the request comes from a browser (i.e. check the User-Agent).
5 Set a cookie with some random value combined with the id of the video. Check it client-side and server-side, then serve the video.
6 On the focusout event, hide the video with JavaScript. Put a resume button in the Flash and leave the frame unchanged (like pause, but with no original video in the buffer).
7 Combine those methods.
These are off-the-cuff ideas,
not tested, and I don't claim they guarantee no video downloading.
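For idea 2, a minimal sketch of the server-side pieces (PHP assumed; the key is handed to the player when the page is rendered and polled via Ajax; all names are illustrative):

// When rendering the page that embeds the player:
session_start();
$_SESSION['video_key'] = md5(uniqid(mt_rand(), true));   // hand this key to the Flash player

// check.php - polled by the player every N seconds:
session_start();
$ok = isset($_GET['key']) && isset($_SESSION['video_key'])
   && $_GET['key'] === $_SESSION['video_key'];
header('Content-Type: application/json');
echo json_encode(array('ok' => $ok));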
I have attempted two ways to prevent the downloading, but both fail.
Using JavaScript to dynamically generate the object tag for the Flash player.
Using the token idea proposed in the question.
What annoys me most is that a simple Save As from the Firefox browser can easily bypass these tricks.
The only viable way so far is to use an empty SWF file to load another SWF file into it. Combined with the token idea, it works.
In my answer: you can't stop image/video theft, but you can make it harder for normal users; you can't make it much harder for programmers (I mean thieves who know a little web programming). There are some tricks you can try:
1.) Use Flash, as YouTube and many other sites like http://www.funnenjoy.com do.
2.) Overlap a DIV or set the image as a background (but users with a little sense can easily save all resources by opening Inspect Element or other developer tools).
3.) You can disable right-click and specific keys like CTRL+S and others with JavaScript, but the main drawback is that if the user disables JavaScript all our tricks fail.
4.) Save the image in a non-public directory (if you have full access to the web server) and read the file with a server-side language like PHP every time the image/video is required; change the image id from time to time, or create a script that automatically changes the ID after every access.
5.) Use .htaccess in Apache to prevent other sites from hotlinking your images. You can use this site to automatically generate the .htaccess: http://www.htaccesstools.com/hotlink-protection/
