How can I programmatically get the same headers as "Copy as Fetch (node.js)" from the dev console?

I've got a script that automates downloading a number of files from a webpage. I need to log into the page, so the download request requires things like secure cookies and other headers.
I've created a bookmarklet to quickly grab the IDs I need and paste them into my script, but I also have to go to the Network tab, right-click a request, choose Copy > Copy as Fetch (node.js), and then pull out the two lines I need and feed them into the script. Is there any way to streamline this step?
To be clear, this is not violating any ToS or anything; it just saves me having to make 100+ clicks to download 100+ reports on the poorly designed interface.

Related

How do I download an aspnetForm page with links

I'm trying to download a municipal planning plan together with all the relevant documents.
All documents can be reached from the following link
I've tried the following command (which worked well for other sites) and some variations, without success.
wget -E -k -r -l 3 "http://www.mavat.moin.gov.il/MavatPS/Forms/SV4.aspx?tid=4&et=1&mp_id=ppnCWTcsST9gG0%2fa0ayWnjFyZ%2bo14s221Ujlpi7UvR4jIRAHLKhJ8lOLSkomZ%2fvlHk8b2T0oENpI6Wh2hKzxQJCw9BPJP8gav%2ftgiKlk5S0%3d"
I can't get the files from the same plan on their new site either:
https://mavat.iplan.gov.il/SV4/1/5000931297/310
I'd appreciate any help.
Well, these days, and especially with .NET web sites?
We don't use hyperlinks with a simple (full) path name to the actual files on the web server. In fact, in most cases one will not even give the web server rights to those folders (they are not exposed to Internet Services).
So, no actual links as a full "URL" to documents exist.
What happens when you click on a button or button link? The code behind on the web server runs (and that is code you don't have). Furthermore, that code behind can browse, read, and retrieve any file from any folder on the server or other servers. But links from the web site don't exist, and it's not even possible to type in a URL that resolves to an actual file name on the server.
So the server-side code (not Internet Services) goes and grabs the document. In fact, the documents could be in a database. So the code behind on the server side runs and pulls the binary data from the database (which represents a valid PDF file). Or the code behind reads the file from disk and then STREAMS the file for a download.
Now, this is often done for reasons of security. It means that no valid URL exists to get at a document.
Not only is this done for security, but from a developer's point of view it is often better to retrieve a row from a database. That row can have the information you SEE rendered on that form, but the web page is not static. Displaying the information thus amounts to the developer pulling rows from a database and then simply "assigning" that data to some type of control, say a datagrid, or listview, or whatever. (This assignment of data is only one or two lines of code, and then the control plus the web server renders that datagrid control.)
So this is done because the developer only assigns the result of a database query to the control, which then renders on the form. Thus, to add or remove documents, you only have to edit the database for the information on the web page to change.
As a result? There are no direct links to the actual documents on the server. To retrieve a document, you would have to send the web site the exact command required.
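To make that concrete, here is a minimal sketch of the kind of code-behind being described, assuming ASP.NET Web Forms and an invented Documents table (the real site's code is not visible to us, so every name here is illustrative):

// Hypothetical sketch of the server-side pattern described above.
// Requires: using System; using System.Data.SqlClient; using System.Web.UI;
protected void OpenDoc_Click(object sender, EventArgs e)
{
    string docId = Request.QueryString["docId"];   // internal database id
    string connectionString = "...";               // placeholder

    using (var con = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(
        "SELECT FileName, FileData FROM Documents WHERE DocId = @DocId", con))
    {
        cmd.Parameters.AddWithValue("@DocId", docId);
        con.Open();
        using (var reader = cmd.ExecuteReader())
        {
            if (reader.Read())
            {
                // Stream the binary data down; no file URL is ever exposed.
                Response.Clear();
                Response.ContentType = "application/pdf";
                Response.AddHeader("Content-Disposition",
                    "attachment; filename=" + (string)reader["FileName"]);
                Response.BinaryWrite((byte[])reader["FileData"]);
                Response.End();
            }
        }
    }
}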
You can hit F12 (most browsers support this). This puts your browser into developer mode. Use the "select element" feature, then click on a PDF link. You get this:
<img src="../images/ft/file_PDF.gif" style="cursor:pointer"
onclick="openDoc('99000526871729',
'AABA7BE646E182B67DB1C15220E531DF36BBB591D8EEA7757435B2606C08E6F9')">
So, note above: the openDoc event in that markup is the SERVER-side code you have to run to retrieve a document. There is thus NO link. And you're not going to be able to wire up, or run, your OWN web page that hits that server and runs the "onclick" routine.
However, the onclick DOES expose the internal database document numbers used to pull/read and retrieve a given document. But the path name, and how the code gets/grabs this file? You have no idea, and it HAS to run server-side code (C# or VB.NET). That code, as noted, grabs the file and then uses code to "stream" the file when you download or click on a link.
So for simple HTML-like pages? Well, for those built by someone who took a one-day HTML course? Sure, such web sites will have src=some path name to a valid URL. And these simple systems thus allow you to enter a URL to grab/get a document. Those documents are fully exposed by the web site, and a simple valid URL path to each file exists. Not so with ASP.NET, and as noted, this is not only done for security; it is a better overall developer experience to write code that grabs the files, as opposed to rendering full path link names to files.
There are many additional benefits. For example, the database that drives this likely has a setting (or some settings) that contains the path names to the documents. If they run out of storage, or say want to move older files to a much slower storage system, which of course is much lower cost? They can move the files and update the path-name columns in the database. The web site will continue to work, since we are never using an exposed URL on the web site. And as noted, actual direct URLs don't exist, and the web server (IIS), as opposed to the code behind, will not even have rights to the file names.
As a result?
You will not be able to simply pull the web page and THEN extract URLs to file names.
What you might be able to do is write code that loads the web page, then scans all the event code stubs for the links, and has your code click on each button with web browser automation, as in the sketch below. But even that doesn't let you enter file names into the download prompts.
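For what it's worth, a rough sketch of that automation idea in C# with Selenium (the Selenium.WebDriver package). The CSS selector and the Chrome download preferences are assumptions, and whether the site tolerates automated clicking is another question entirely:

// Hypothetical Selenium sketch: click every openDoc(...) image and let the
// server stream each document down.
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

class DownloadAll
{
    static void Main()
    {
        var options = new ChromeOptions();
        // Download PDFs straight to a folder instead of opening a prompt.
        options.AddUserProfilePreference("download.default_directory", @"C:\plans");
        options.AddUserProfilePreference("plugins.always_open_pdf_externally", true);

        using (var driver = new ChromeDriver(options))
        {
            driver.Navigate().GoToUrl("https://mavat.iplan.gov.il/SV4/1/5000931297/310");
            foreach (var img in driver.FindElements(By.CssSelector("img[onclick*='openDoc']")))
            {
                img.Click();                          // fires the site's own openDoc routine
                System.Threading.Thread.Sleep(2000);  // crude wait for each download to start
            }
        }
    }
}

Setting download.default_directory sidesteps the download-prompt problem mentioned above, at least in Chrome.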
So, what you ask is not easy, likely not possible, and a very difficult task. The simple reason is that the site does not use simple HTML and static links to files. It never actually exposes a direct link to the files, and worse yet, the web server does not have, or even allow, a direct URL to them. Such URLs don't exist, and the web site will not even have rights to the file names (only the .NET code behind does - not Internet Services). That code behind grabs the document and then "streams" the file down to the link you clicked on. Simple HTML coders in the past would create, say, a virtual folder that points to the files on some server/folder. But with .NET, it's easier (and far more secure) not to expose one.
Modern development tools don't use old-fashioned ideas like URLs to directly retrieve a file; they are designed differently.
In some cases, URLs are allowed or created, and this is done for the sake of sharing links. If you have a cute video or document? Then the designers of the system will often permit parameters in the URL so you can share a link with someone else. This page has no such provisions. So you can share a link to the page, but no actual URL to the documents, or even a provision to allow URLs to a document, exists.
So this pretty much means that to retrieve a document, you have to go to that web page, and ONLY when you click on a document will the web site "stream" down that one particular document.

Programmatically spoofing an HTTP script request in an iframe

I'm building a backend admin system which edits JSON files that control the look and feel of the main site. I want to add a 'preview' button before the user hits save. To do that, I want to use the main site, but instead of calling the actual JSON file in production, save a temp version of it and redirect this user's traffic for that file to the temp file - from the original site code.
I've considered Chrome plugins, configuring an iframe somehow, or, in the worst-case scenario, grabbing the production front-end, parsing out the call to the prod JSON file, and replacing it with the new temp JSON file. That is obviously not ideal, as it would entail a lot of work, and if anything changes on the prod site, this would have to be updated.
I would love your ideas!
Do you have access to the main site's source code? You could implement a preview option on the main site which accepts a GET parameter and uses a temporary JSON setting based on it.
From the backend admin system's point of view, it's just a matter of adding the JSON as part of the ajax GET request.
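A minimal sketch of that suggestion, assuming for illustration that the main site is ASP.NET (the parameter name and paths are invented; adapt to whatever stack it actually runs):

// Hypothetical code-behind on the main site: serve a temp JSON file when a
// "preview" GET parameter is present.
// Requires: using System; using System.IO; using System.Web;
protected void Page_Load(object sender, EventArgs e)
{
    string previewId = Request.QueryString["preview"];

    // Reject ids that could escape the previews folder.
    if (!string.IsNullOrEmpty(previewId) &&
        previewId.IndexOfAny(Path.GetInvalidFileNameChars()) >= 0)
        throw new HttpException(400, "Bad preview id");

    string settingsPath = string.IsNullOrEmpty(previewId)
        ? Server.MapPath("~/App_Data/settings.json")                      // production
        : Server.MapPath("~/App_Data/previews/" + previewId + ".json");  // temp copy

    string json = File.ReadAllText(settingsPath);
    // ... render the page from this JSON instead of the production settings ...
}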
Unfortunately though, there is no easy way of doing this if you don't have access to the main site's source code or if you can't reach out to whoever maintains that main site.
Your cleanest option might be to recreate the main site's look and feel instead and pass it off as a "preview".

File download via external link - how to implement?

I have a web application which consists of many aspx pages ... one of them shows a grid with rows that can be exported to a file via a button click. This works fine. Now I want a feature which allows a user to follow an external link to this page (or another) that exports to a file and downloads it. I don't need any information on the page, just the file download. How could I do this, including security features like encryption?
Thanks :)
The easiest way to do this is simply implement an HttpHandler that contains the logic to create that file and write it to the Response stream.
There are lots of examples of how to do this on the web that I won't repeat in this question. Just do a Google search for "Download File HttpHandler" and you should be golden.
One of the search results: http://www.c-sharpcorner.com/UploadFile/jhblankenship/DownloadingFromMemStreamHttp11262005061852AM/DownloadingFromMemStreamHttp.aspx
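For reference, a minimal handler along those lines might look like this; the file path and MIME type are placeholders, and you would register the handler in web.config under <handlers>:

// Sketch of an HttpHandler that writes a file to the Response stream.
using System.Web;

public class ExportHandler : IHttpHandler
{
    public bool IsReusable { get { return true; } }

    public void ProcessRequest(HttpContext context)
    {
        // Build the export (here just read from disk for brevity).
        byte[] data = System.IO.File.ReadAllBytes(
            context.Server.MapPath("~/App_Data/export.csv"));

        context.Response.ContentType = "text/csv";
        context.Response.AddHeader("Content-Disposition",
            "attachment; filename=export.csv");
        context.Response.BinaryWrite(data);
    }
}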
What you'll have to do is provide a 'unique link' when the gridview shows the correct rows: your website URL with URL variables at the end. When the page loads, it can check these variables and then use the database to look up the correct data.
Encryption in transit is handled by HTTPS (SSL); to secure it otherwise, you would require a login to view the gridview / file.
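One way to build that 'unique link' (a sketch, not the only option) is to sign the URL variables so they can't be forged; the key handling and names below are purely illustrative:

// Issue a short-lived, tamper-evident export link.
// Requires .NET 4.6+ for ToUnixTimeSeconds.
using System;
using System.Security.Cryptography;
using System.Text;

static class ExportLinks
{
    public static string MakeLink(int gridId, byte[] secretKey)
    {
        long expires = DateTimeOffset.UtcNow.AddHours(24).ToUnixTimeSeconds();
        string payload = gridId + "|" + expires;
        using (var hmac = new HMACSHA256(secretKey))
        {
            string sig = Convert.ToBase64String(
                hmac.ComputeHash(Encoding.UTF8.GetBytes(payload)));
            return "https://example.com/export.ashx?id=" + gridId +
                   "&exp=" + expires + "&sig=" + Uri.EscapeDataString(sig);
        }
    }
}

The handler (such as the one sketched above) would then recompute the HMAC over id|exp, compare it with sig, check the expiry, and only then stream the file over HTTPS.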

How to track a completed file download in ASP.NET

I have this ASP.NET web site that allows users to download program installation packages (just normal files). I want to be able to track when a download is completed (i.e. the file has been fully downloaded to the user's computer) and then invoke a Google Analytics script that reports a completed download as a 'Goal' (obviously, one of my goals is to increase file downloads).
The problem is that I need to support direct file URLs, as opposed to the "redirect page" solution. This is because a lot of traffic comes from software download sites that explicitly demand a direct file URL when submitting a product. Perhaps they do their own file analysis (e.g. virus checking). With this set of limitations, a typical scenario is:
The user visits my product listing on a software download site
The user clicks the "Download" button on this site
The "Download" page is typically a redirect that finally brings the user to my file via the direct URL I've initially submitted, i.e. http://www.ko-sw.com/somefile.exe
If an exact monitoring solution is not possible under these conditions, maybe a workaround exists? What comes to mind is temporarily storing the number of performed downloads on the server and then accessing an administrative page that somehow reports this number to Google Analytics and resets it to zero. With this workaround, there is at least no need to attach a JavaScript handler to a non-HTML resource. But even then there are issues:
How to track if a download has completed?
How to track user geolocation and browser capabilities to make them further visible in the reports?
Thanks everybody in advance
According to awstats, an aborted download has HTTP status code 206, so if you analyze the server log for that code you can identify the downloads that were not completed.
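If you want to automate that, a rough sketch of counting completed (200) versus partial (206) responses for one file in an IIS W3C log; the log path and field position are assumptions about your log configuration:

// Count 200 vs 206 responses for somefile.exe in a W3C-format log.
using System;
using System.IO;
using System.Linq;

class LogStats
{
    static void Main()
    {
        var lines = File.ReadLines(@"C:\inetpub\logs\u_ex240101.log")
                        .Where(l => !l.StartsWith("#") && l.Contains("somefile.exe"));

        int completed = 0, partial = 0;
        foreach (var line in lines)
        {
            var fields = line.Split(' ');
            // sc-status is 4th from the end in the default field order;
            // adjust for your log configuration.
            string status = fields[fields.Length - 4];
            if (status == "200") completed++;
            else if (status == "206") partial++;
        }
        Console.WriteLine("completed: {0}, partial/aborted: {1}", completed, partial);
    }
}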
@Kerido ~ I'm curious what the business case is here. Are you trying to track installs or downloads? If installs, go with @SamMeiers' solution.
However, if you're trying to track downloads, then the next question is: what web server are you using? IIS? Apache? Something else?
In IIS, assuming you're on 7 (or later), you could (easily?) write an HttpHandler that checks for the last bytes of the file to be sent and, on that, records a log somewhere (see the sketch below).
On Apache, just set up logging to tell you how many bytes were transferred (a trivial change in httpd.conf) and then parse the logs daily (awstats, amongst others, is pretty good for this, but you might have to write a sed/awk script) to find out how many full transfers were completed. It just depends on how thorough you're trying to be.
But I come back to: what's the business case for this? What does it matter if there were unfinished downloads?
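A sketch of that IIS idea, assuming classic ASP.NET: stream the file in chunks and treat "all bytes written while the client is still connected" as a completed download. The path and the logging helper are placeholders:

// HttpHandler that only logs a download once the last chunk has been sent.
using System.IO;
using System.Web;

public class TrackedDownloadHandler : IHttpHandler
{
    public bool IsReusable { get { return true; } }

    public void ProcessRequest(HttpContext context)
    {
        string path = context.Server.MapPath("~/files/somefile.exe");
        context.Response.ContentType = "application/octet-stream";
        context.Response.AddHeader("Content-Disposition",
            "attachment; filename=somefile.exe");

        var buffer = new byte[64 * 1024];
        using (var fs = File.OpenRead(path))
        {
            int read;
            while ((read = fs.Read(buffer, 0, buffer.Length)) > 0)
            {
                if (!context.Response.IsClientConnected)
                    return;                       // aborted: don't count it
                context.Response.OutputStream.Write(buffer, 0, read);
                context.Response.Flush();
            }
        }
        LogCompletedDownload(context.Request.UserHostAddress); // hypothetical helper
    }

    private void LogCompletedDownload(string ip) { /* write to DB or log file */ }
}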
It's possible to track links as a goal, which may be of use to you. However, this won't track when the download was completed.
http://www.google.com/support/analytics/bin/answer.py?answer=55529
Hope this helps.
Cheers, Tigger
I think @SamMeiers' solution is very good, but you could optimize it by calling a web service after the installation completes. One small problem: the user might be installing the app in an environment without internet, so you may be forced to check whether a connection exists.
You can set some flag when the installation starts; then, when it finishes, check whether the start flag exists. If it does, the app has been both downloaded and installed.
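A minimal sketch of that ping, with the tracking URL as a placeholder; the installer (or the app's first run) calls it and quietly ignores failures when offline:

// Report a completed installation to a hypothetical tracking endpoint.
using System;
using System.Net;

static void ReportInstallCompleted()
{
    try
    {
        using (var client = new WebClient())
        {
            client.DownloadString("https://example.com/track.ashx?event=installed");
        }
    }
    catch (WebException)
    {
        // No internet (or endpoint unreachable): ignore, or queue a retry flag.
    }
}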

Need to copy files on the client system - is there any possible way?

I'm developing an Online Examination System in C#/.NET and want to copy files to the client machine as soon as the exam starts, so that even if the internet gets disconnected the examinee can continue with the test.
You may wish to consider a client-server solution, such as WPF or WinForms, as this is more suited to this type of development. You can use ClickOnce deployment to have it still launched from the web and updated on every run.
If you do decide to use ASP.NET, this will result in a very JavaScript-heavy site with a very slow load of the first page.
To do this you would load all your test questions into a JavaScript data structure on the first page. Whenever the user went to the next page, you would need to collect all the answers and store them, using JavaScript, then re-render the entire page from your JavaScript definition of the test, with no trip back to the server. Then, once the test is complete, you would need to send your results back to the server; the internet must be active once you've completed the test.
You'll have to create a download package and provide a link for the user to click to request the files. You can't force a download.
If your exam is all in one web page, you don't need to do anything. Once the page appears in the user's browser, it has already been "copied locally".
