How can I track downloads of files from remote websites - google-analytics

I am sharing a link to a file (e.g. a PDF) that is stored on my server. Is it possible to track whenever a user downloads the file? I don't have access to the script of the other page, but I thought I could track the incoming requests to my server. Would that be computationally expensive? Any hints on which direction to look in?

You can use the Measurement Protocol, a language-agnostic description of an HTTP tracking request sent to the Google Analytics tracking server.
The problem in your case is that you do not have a script between the click and the download to send the tracking request. One possible workaround would be to use the server access log, provided you have some control over the server.
For example, the Apache web server can use piped logs: instead of being written directly to a file, each log entry is passed to a script or program. I'm reasonably sure that other servers have something similar.
You could pipe the logs to a script that checks whether a log entry points at the URL of your PDF file and, if so, breaks the entry down into individual data fields and sends them to the GA tracking server in a programming language of your choice.
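A minimal sketch of that idea in Python, assuming Apache's combined log format and the Universal Analytics Measurement Protocol endpoint; the property ID UA-XXXXXXX-Y, the script path, and the log pattern are placeholders, not a definitive implementation:

#!/usr/bin/env python3
# Hypothetical piped-log handler. Apache would be pointed at it with something like:
#   CustomLog "|/usr/bin/python3 /path/to/ga_pipe.py" combined
# It reads combined-format log lines from stdin and forwards successful PDF
# downloads to the Google Analytics Measurement Protocol.
import re
import sys
import uuid
import urllib.parse
import urllib.request

GA_ENDPOINT = "https://www.google-analytics.com/collect"
TRACKING_ID = "UA-XXXXXXX-Y"   # replace with your property ID
LINE_RE = re.compile(r'^(\S+) \S+ \S+ \[.*?\] "(?:GET|HEAD) (\S+) HTTP/[\d.]+" (\d{3})')

for line in sys.stdin:
    m = LINE_RE.match(line)
    if not m or not m.group(2).lower().endswith(".pdf") or m.group(3) != "200":
        continue
    payload = urllib.parse.urlencode({
        "v": "1",                  # protocol version
        "tid": TRACKING_ID,        # GA property
        "cid": str(uuid.uuid4()),  # anonymous client id
        "t": "pageview",           # hit type
        "dp": m.group(2),          # document path, e.g. /files/report.pdf
    }).encode()
    try:
        urllib.request.urlopen(GA_ENDPOINT, data=payload, timeout=5)
    except OSError:
        pass  # never block log processing on tracking failures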
If you cannot control the server to that level, you'd need to place a script with the same name and location as the original file on the server, map the pdf extension to a script interpreter of your choice (in Apache via AddType/AddHandler, which with many hosts can be done via an .htaccess file) and have the script send the tracking request before delivering the original file.
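A rough sketch of that intermediary-script idea, assuming the PDF URL has been mapped to a CGI handler (e.g. AddHandler cgi-script .pdf plus Options +ExecCGI) and the real file has been moved aside into a separate directory; FILE_ROOT, the tracking ID, and send_hit() are illustrative names, not an existing API:

#!/usr/bin/env python3
# Hypothetical CGI handler that has taken the place of the PDF at its original URL.
# It fires a Measurement Protocol hit first, then streams the real file, which is
# assumed to have been moved into FILE_ROOT. All names here are placeholders.
import os
import sys
import urllib.parse
import urllib.request
import uuid

FILE_ROOT = "/var/www/private-files"   # where the real PDFs now live
TRACKING_ID = "UA-XXXXXXX-Y"

def send_hit(path):
    payload = urllib.parse.urlencode({
        "v": "1", "tid": TRACKING_ID, "cid": str(uuid.uuid4()),
        "t": "pageview", "dp": path,
    }).encode()
    try:
        urllib.request.urlopen("https://www.google-analytics.com/collect",
                               data=payload, timeout=3)
    except OSError:
        pass  # tracking must never block the download

requested = urllib.parse.urlparse(os.environ.get("REQUEST_URI", "")).path
send_hit(requested)

with open(os.path.join(FILE_ROOT, os.path.basename(requested)), "rb") as fh:
    data = fh.read()

sys.stdout.write("Content-Type: application/pdf\r\n")
sys.stdout.write("Content-Length: %d\r\n\r\n" % len(data))
sys.stdout.flush()
sys.stdout.buffer.write(data)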
Both solutions require a modicum of programming practice (the latter much less than the former). Piping logs might be expensive, depending on the number of requests to your server (you might create an extra log file for downloadable files, though). An intermediary script would not be an expensive operation.

Related

Why JMeter can't record sites that use Firebase as the data connection

I tried to record a site that uses Firebase for data storage with JMeter, but it fails to access Firebase and I cannot log into the site while recording. Is there any way to access Firebase while recording a load test in JMeter? I imported the JMeter certificate too, but the problem is still there. I also tried the Chrome extension, but it didn't give the expected output either. (Error description attached as an image.)
Most probably it's due to an incorrect JMeter configuration for recording: you need to import JMeter's certificate into your browser. The file is called ApacheJMeterTemporaryRootCA.crt; JMeter generates it under its "bin" folder when you start the HTTP(S) Test Script Recorder.
See the HTTPS recording and certificates documentation chapter for more details.
Going forward, consider looking at the View Results Tree listener output and the jmeter.log file; they should provide a sufficient amount of information to get to the bottom of the issue. If you cannot interpret what you see there yourself, add at least the essential parts of the response/log to your question.
Also be aware of the alternative, "non-invasive" way of recording a JMeter test: the JMeter Chrome Extension. In that case you won't have to worry about proxies and certificates and should be able to record whatever HTTP(S) traffic your browser generates.

Wordpress logging requests into a database

I am trying to create a plugin which logs HTTP requests from users into a database. So far I've logged the requests for PHP files by hooking my function to the init action. But now I want to know if I can also log requests for files such as images, documents, etc. Is there any PHP code executed when someone requests such files? Thank you.
Not by default, no. The normal mod_rewrite rules WordPress uses (not to be confused with WP's own rewrite rules) specifically exclude any existing files such as images, CSS, or JavaScript files. Those are handled directly by Apache.
You obviously could add a custom script that runs on each request, logs the access to the database, reads those files and prints their content to the client, but it would come at a considerable cost, I'm afraid.
Apache, albeit not the fastest web server around, is much, much faster at delivering a file to a client than running a PHP script, setting up a database connection, logging, and so on would be.
You'd get much higher server load, and probably noticeably slower page loads.
Instead, I recommend that you parse the access logs. They'll most likely contain all of the data you're looking for, and if you have access to the configuration, you can also log specific headers sent by the client. You can easily do this with a cron job that runs once a day, and it doesn't even have to run on the same server.
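To make that concrete, here is a rough sketch of such a cron job in Python, assuming Apache's combined log format; the paths, the table name, and the use of SQLite (rather than the WordPress MySQL database) are illustrative choices only:

#!/usr/bin/env python3
# Sketch of the cron-based approach: parse yesterday's rotated Apache access log
# and record static-file hits (images, documents) in a database.
import re
import sqlite3

LOG_PATH = "/var/log/apache2/access.log.1"   # yesterday's rotated log (placeholder)
TRACKED = (".jpg", ".jpeg", ".png", ".gif", ".pdf", ".doc", ".docx", ".zip")
LINE_RE = re.compile(r'^(\S+) \S+ \S+ \[(.*?)\] "(?:GET|HEAD) (\S+) HTTP/[\d.]+" (\d{3})')

db = sqlite3.connect("/var/lib/file_hits.sqlite")
db.execute("""CREATE TABLE IF NOT EXISTS file_hits
              (ip TEXT, requested_at TEXT, path TEXT, status INTEGER)""")

with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
    for line in log:
        m = LINE_RE.match(line)
        if not m:
            continue
        ip, ts, path, status = m.groups()
        if path.lower().endswith(TRACKED):
            db.execute("INSERT INTO file_hits VALUES (?, ?, ?, ?)",
                       (ip, ts, path, int(status)))

db.commit()
db.close()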

Right way to transfer a CSV file to a BI application?

We are building a BI application, and our customers send us data files daily. We exchange data using CSV files because our customers are used to viewing data in Excel, and they are not yet ready to use an API on their systems (maybe in a few years we will be able to use an XML/JSON web service, we hope).
Currently the data transfer is done over FTP (SFTP, in fact). Our customers upload files automatically to an FTP server, and we have a cron task that watches whether a file has been sent.
But there are many disadvantages with that:
We cannot reliably know whether an upload is finished or still in progress (we asked them to upload the file under a temporary name and rename it afterwards, but many of them still don't do that)
So we can try to guess and consider the upload done once enough time has passed (a size-polling sketch of this idea follows this list). But the FTP protocol doesn't let us read the server time, and clocks can be out of sync. We could upload an empty file and read its date to learn the server's time, but we would need write permission to do that...
The FTP protocol also allows uploads to be paused...
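As a sketch of the guessing approach above, one variant that sidesteps the server-time problem is to poll the file's size and treat the upload as finished once it stops growing; the host, credentials, path, and thresholds below are placeholders, and an SFTP setup would use an SFTP client (e.g. paramiko) instead of ftplib:

#!/usr/bin/env python3
# Heuristic "is the upload finished?" check: poll the remote file size and treat it
# as complete once it stays constant for several consecutive polls.
import time
from ftplib import FTP

def wait_until_stable(host, user, password, path, interval=60, checks=3):
    # Return the size once `path` has kept the same size for `checks` polls in a row.
    last_size, stable = None, 0
    while True:
        with FTP(host) as ftp:
            ftp.login(user, password)
            ftp.voidcmd("TYPE I")        # SIZE is only reliable in binary mode
            size = ftp.size(path)        # None if the server does not support SIZE
        if size is not None and size == last_size:
            stable += 1
            if stable >= checks:
                return size
        else:
            stable = 0
        last_size = size
        time.sleep(interval)

if __name__ == "__main__":
    final_size = wait_until_stable("ftp.example.com", "customer", "secret",
                                   "/incoming/data.csv")
    print("upload looks complete, size:", final_size)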
So we are now considering transferring files by asking our customers to upload them directly to our application over HTTPS. This is more reliable, but less convenient:
Our customers cannot check the content of the file after upload
We have to be careful with upload size and timeout on our server
Files can be quite large (up to 300 MB), so it's better to zip them before upload (this can reduce the size to about 10% of the original).
This is more work for us than just an FTP server (we need to create a UI, show upload progress, list the files so they can be downloaded back, ...)
Are there other solutions? How do BI applications usually exchange data? Is HTTPS a good solution for us?
We found a solution: a WebDAV server. We are using Nextcloud, which provides a web interface as well as scripted access over the WebDAV protocol.
It's more reliable than FTP because the file only appears once the upload is complete.
And it's better than an HTTPS upload to our own application: we don't have to handle file uploads, create interfaces, and so on.
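For anyone scripting the customer side, here is a minimal sketch of an upload against Nextcloud's standard WebDAV endpoint (/remote.php/dav/files/<user>/...); the host, credentials, and target folder are placeholders:

#!/usr/bin/env python3
# Minimal scripted upload to a Nextcloud WebDAV share using the requests library.
import requests

NEXTCLOUD = "https://cloud.example.com"
USER, PASSWORD = "customer1", "app-password"
LOCAL_FILE = "export_2024-01-15.csv.zip"

with open(LOCAL_FILE, "rb") as fh:
    r = requests.put(
        f"{NEXTCLOUD}/remote.php/dav/files/{USER}/incoming/{LOCAL_FILE}",
        data=fh,                 # streamed, so large files don't sit in memory
        auth=(USER, PASSWORD),
        timeout=600,
    )
r.raise_for_status()             # 201 Created on first upload, 204 on overwrite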

Is it insecure to execute code via an HTTP URL?

I'm suspicious of the installation mechanism of Bioconductor. It looks like it is just executing (via source()) the R script from an HTTP URL. Isn't this an insecure approach vulnerable to a man-in-the-middle attack? I would think that they should be using HTTPS. If not, can someone explain why the current approach is acceptable?
Yes, you are correct.
Loading executable code over a cleartext connection is vulnerable to a MITM.
Unless the code is loaded over HTTPS, where SSL/TLS can be used to encrypt and authenticate the connection, or unless it has been signed and verified at the client, a MITM attacker could alter the input stream and cause arbitrary code to be executed on your system.
Allowing code to execute via an HTTP GET request essentially means you're allowing user input to be processed directly by the application, thus directly influencing its behaviour. Whilst this is often what the developer wants (say, to query specific information from a database), it can be exploited in the ways you have already mentioned (e.g. MITM). This is often a bad idea (though I'm not referring to Bioconductor specifically here), as it opens the system to possible XSS or (blind) SQL injection attacks, among others.
However, the URL http://bioconductor.org/biocLite.R is essentially just a file placed on the web server, and from what it seems, source() is used to download it directly. There does not appear to be any user input anywhere in this example, so no, I wouldn't mark it as unsafe; however, your reasoning is indeed correct.
Note: this refers only to GET requests, e.g. http://example.com/artists/artist.php?id=1. Such weaknesses can be exploited through many kinds of HTTP request, such as Host header attacks, but the general concept is the same: no user input should ever be processed directly by the application.
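As a language-agnostic illustration of the mitigation mentioned above (fetch over HTTPS and verify the code before running it), here is a sketch in Python; the pinned digest is a placeholder you would record in advance, and handing the verified script to the R interpreter afterwards is left out:

#!/usr/bin/env python3
# Fetch the script over HTTPS and refuse to run it unless it matches a known digest.
import hashlib
import urllib.request

URL = "https://bioconductor.org/biocLite.R"
EXPECTED_SHA256 = "0000000000000000000000000000000000000000000000000000000000000000"  # placeholder

code = urllib.request.urlopen(URL, timeout=30).read()
digest = hashlib.sha256(code).hexdigest()

if digest != EXPECTED_SHA256:
    raise RuntimeError(f"checksum mismatch: got {digest}, refusing to run the script")

# Only at this point would you execute the verified code (e.g. hand it to R).
print("checksum verified, safe to source() the script")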

Need to check uptime on a large file being hosted

I have a dynamically generated RSS feed that is about 150 MB in size (don't ask).
The problem is that it keeps crapping out sporadically and there is no way to monitor it without downloading the entire feed to get a 200 status. Pingdom times out on it and returns a 'down' error.
So my question is: how do I check that this thing is up and running?
What type of web server and server-side platform are you using (if any)? Is any of the content coming from a backend system/database to the web tier?
Are you sure the problem is not with the client code accessing the file? Most clients have timeouts, and downloading large files over the internet can be a problem depending on how the server behaves. That is why file download utilities track progress and download in chunks.
It is also possible that other load on the web server, or the number of users, is impacting the server. If little memory is available, some servers may not be able to serve a file of that size to many users. You should review how the server is sending the file and make sure it is chunking it up.
I would recommend doing a HEAD request to check, at minimum, that the URL is accessible and that the server is responding. The next step might be to set up your download test inside, or very close to, the data center hosting the file to monitor further. This may reduce cost and is going to reduce interference.
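A minimal version of that HEAD-request probe, using Python's requests library; the feed URL is a placeholder, and you would typically run this from cron or a monitoring host:

#!/usr/bin/env python3
# Issue a HEAD request so the 150 MB body is never downloaded; alert on non-2xx.
import requests

FEED_URL = "https://example.com/feeds/big-feed.rss"

resp = requests.head(FEED_URL, timeout=30, allow_redirects=True)
if resp.ok:
    size = resp.headers.get("Content-Length", "unknown")
    print(f"UP: {resp.status_code}, Content-Length={size}")
else:
    print(f"DOWN: {resp.status_code}")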
Found an online tool that does what I needed:
http://wasitup.com uses HEAD requests, so it doesn't time out waiting to download the whole 150 MB file.
Thanks for the help, BrianLy!
Looks like Pingdom does not support HEAD requests. I've put in a feature request, but who knows.
I hacked this capability into mon for now (mon is a nice compromise between paying someone else to monitor and doing everything yourself). I have switched entirely to HTTPS, so I modified the https monitor to do it. I did it the dead-simple way: I copied the https.monitor file, called it https.head.monitor, and in the new monitor file changed get_https to head_https (you might also want to update the function name and the place where it's called).
Now in mon.cf you can call a HEAD request:
monitor https.head.monitor -u /path/to/file
