Download data from a website using python requests - python-requests

I'm trying to download the data from this website: https://cdr.ffiec.gov/public/PWS/DownloadBulkData.aspx.
My questions are: (1) how can I set the appropriate "payload" and POST it to the URL for the three inputs (available products, report period end date, and available file formats), and (2) how can I get the link to the files, since the website only has a download button (I can't get the link by right-clicking the button)? Sorry that my questions are basic, but I hope someone can provide step-by-step guidance. Thanks.

You can’t manipulate the web page (selecting from drop-downs, etc.) with just requests.
You need to use your browser's dev tools to capture the URL you’re redirected to when you submit the form, then use requests to call that URL with the parameters it expects.
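For example, a rough sketch of that approach with requests and BeautifulSoup might look like the following. The selection field names and values here are made-up placeholders (copy the real ones from the request you capture in dev tools), and ASP.NET WebForms pages like this one typically also expect the hidden __VIEWSTATE fields to be posted back:

```python
import requests
from bs4 import BeautifulSoup

URL = "https://cdr.ffiec.gov/public/PWS/DownloadBulkData.aspx"

session = requests.Session()

# Load the page once to pick up the hidden ASP.NET form fields
# (__VIEWSTATE, __EVENTVALIDATION, etc.) that must be posted back.
page = session.get(URL)
soup = BeautifulSoup(page.text, "html.parser")
payload = {
    field["name"]: field.get("value", "")
    for field in soup.find_all("input", type="hidden")
}

# Add the form selections captured with dev tools. These names and
# values are placeholders: copy the real ones from the captured request.
payload.update({
    "AvailableProducts": "Call Reports",   # placeholder
    "ReportPeriodEndDate": "12/31/2020",   # placeholder
    "AvailableFileFormat": "TSV",          # placeholder
})

# Submit the form and save whatever file comes back.
resp = session.post(URL, data=payload)
resp.raise_for_status()
with open("bulk_data.zip", "wb") as f:
    f.write(resp.content)
```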

Related

How to avoid this sort of paywall when scraping with Python requests?

I am trying to download content from a website which has a sort of paywall.
You can read a number of articles for free, and then it requires a subscription to read more.
However, if you open the link in incognito mode, you can read one more article for each incognito window you open.
So I am trying to download some pages from this site using Python's requests library.
I request the URL and then parse the result using BeautifulSoup (bs4). However, it only works for the first page in the list; the following ones have no content, just the "buy a subscription" message.
How can I avoid this?
I think you can try turning off JavaScript in the browser; it may work, but not 100% of the time. Many soft paywalls are applied client-side by JavaScript, so the article text may still be present in the raw HTML.
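Since requests never executes JavaScript anyway, another thing worth trying is fetching each article with a brand-new session so no cookies carry over, which roughly mimics the fresh-incognito-window behaviour described in the question. A sketch, with placeholder URLs:

```python
import requests
from bs4 import BeautifulSoup

urls = [
    "https://example.com/article-1",  # placeholder article URLs
    "https://example.com/article-2",
]

for url in urls:
    # A brand-new session per article means no cookies carry over,
    # much like opening each link in a fresh incognito window.
    with requests.Session() as session:
        resp = session.get(url)
        resp.raise_for_status()
        soup = BeautifulSoup(resp.text, "html.parser")
        # Parse the article body here; print the title as a placeholder.
        print(soup.title.string if soup.title else url)
```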

How to track a PDF view (not click) on my website using Google Tag Manager

How can I track that someone visited the following URL on my website: http://www.website.com/mypdf.pdf?
I tried using a Page View trigger on a Page View tag. I'm completely new to Google Analytics, so I'm not sure how to proceed. Most people will go to that PDF directly via its URL, as there is no link to it on my website, but I really want to be able to track how many people view it.
Thanks in advance!
You cannot track PDF views with the help of GTM. GTM for web is a JavaScript injector, and one cannot inject JavaScript into a PDF document from the browser.
One way to circumvent this is to have a gateway page, i.e. have the click go to an HTML page that counts the view before redirecting to the document in question (naturally, you could use GTM on that page). Since people go directly to the PDF URL, this would require a bit of scripting: you would have to redirect all PDF requests to your gateway page via a server directive, count the view, and then have the page load the respective document.
Another, even more roundabout, way would be to parse your server log files and send PDF requests to GA via the Measurement Protocol (many servers allow log writes to be redirected to another script, so you could even do this in real time). I would not really recommend that approach: it's technologically interesting, but probably more effort than it is worth.
The short version is, if you are not comfortable fiddling a little with your server setup, you will probably not be able to track PDF views. GTM does not work on PDF files.
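To illustrate the gateway-page idea, here is a minimal sketch using Flask and the Universal Analytics Measurement Protocol; the tracking ID, hostname, and paths are placeholders, and you would still need the server directive mentioned above to route direct PDF requests through this handler:

```python
import uuid

import requests
from flask import Flask, redirect

app = Flask(__name__)

GA_TRACKING_ID = "UA-XXXXXXX-1"  # placeholder Universal Analytics ID


@app.route("/pdf/<name>")
def pdf_gateway(name):
    # Record a pageview via the Measurement Protocol before serving.
    requests.post("https://www.google-analytics.com/collect", data={
        "v": "1",                  # protocol version
        "tid": GA_TRACKING_ID,
        "cid": str(uuid.uuid4()),  # anonymous client id
        "t": "pageview",
        "dp": f"/pdf/{name}",      # page path reported to GA
        "dh": "www.website.com",   # placeholder hostname
    })
    # Then hand the visitor the actual document.
    return redirect(f"/files/{name}")  # placeholder file location


if __name__ == "__main__":
    app.run()
```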
I was facing the same issue.
My solution was to use a URL shortener (like bitly.com) that includes click statistics.
Not a perfect solution, but it works for direct PDF access from an external source (outside your site).

Download/Upload of Page Remarkup in Phabricator Wiki (Phriction)

The company I work for uses the "Phriction" wiki in Phabricator for a considerable amount of documentation. I'd like to be able to do the following, programmatically, in order of importance:
Download (e.g., with curl or wget) the reStructuredText (RST) to a local file where I can edit it, diff it, etc. Ideally I should be able to download either the latest version or any specific version.
Locally render (e.g., in a local graphical web browser) the markup as Phabricator would render it. If relative links can link correctly back to the original wiki, that's a bonus.
Upload new versions of the wiki page.
If you don't know exactly how to do any of this, but have information or tool suggestions that would help me get started on writing software to do the above, please mention them. (If you're worried about too many answers that don't actually answer any of the questions above, try adding to or editing a single community answer for this sort of information.)
I would do the following in your situation:
Download the individual Phriction pages using the API (Conduit) methods in the phriction section.
For this you need a Conduit API token, which you can create in the profile settings of your Phabricator instance.
Then take a look at the phriction.info method: it needs the page slug as a parameter. In this example I use the /changelog/ page.
You can choose between Arcanist, cURL, or PHP to use the REST API; additionally, you can use any other tool that can perform REST calls in the cURL syntax.
If you need more examples of how to run the Conduit method, you can toggle between some variations at the bottom of the output page.
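For instance, the same call can be made from Python with requests; the instance URL and API token below are placeholders:

```python
import requests

PHAB = "https://phabricator.example.com"  # placeholder instance URL
TOKEN = "api-xxxxxxxxxxxxxxxxxxxx"        # your Conduit API token

# Fetch the current remarkup of the /changelog/ page.
resp = requests.post(f"{PHAB}/api/phriction.info",
                     data={"api.token": TOKEN, "slug": "changelog/"})
resp.raise_for_status()
result = resp.json()["result"]

# Save the raw remarkup locally so it can be edited and diffed.
with open("changelog.remarkup", "w", encoding="utf-8") as f:
    f.write(result["content"])
```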
Transform the page content as you like.
Upload the page again with the Conduit method phriction.edit.
You can edit the documents the same way you downloaded the content, but here you need a few more parameters:
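A sketch of the corresponding upload, again with a placeholder instance URL and token (check the Conduit console of your instance for the exact parameters your version expects):

```python
import requests

PHAB = "https://phabricator.example.com"  # placeholder instance URL
TOKEN = "api-xxxxxxxxxxxxxxxxxxxx"        # your Conduit API token

with open("changelog.remarkup", encoding="utf-8") as f:
    new_content = f.read()

# phriction.edit takes the slug plus the fields to change; title and
# description may be required depending on your Phabricator version.
resp = requests.post(f"{PHAB}/api/phriction.edit",
                     data={
                         "api.token": TOKEN,
                         "slug": "changelog/",
                         "title": "Changelog",
                         "content": new_content,
                         "description": "Edited locally and re-uploaded",
                     })
resp.raise_for_status()
```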
Personally, I try out all the Conduit methods via the web interface first, and then turn them into a script.

ASP.NET Browser Debug (support information) page

So one of the many, many tasks I'm faced with daily as a developer is trying to get our support department as much information about the end user's environment as possible.
Browser version, current cookies, plugins, etc., and it would be handy to point people to a specific page on our site and say "copy and paste this to support".
In the past I've always written these by hand, and used third-party tools (such as BrowserHawk) to get as much info as possible.
How does everyone else deal with getting this information from end users? Is there a nice package I'm unaware of that gives a detailed dump of a user's environment without having to get the users to run an app?
Just to clarify, I'm not looking for elmah-style reporting (which is very helpful as well!); this is mainly for the client-side stuff.
Some months ago I saw that the Google Ads page has a nice report button. What this button does is capture the page as it is, using JavaScript, and send a report with all the details, including an image of the actual page.
So I found this library, http://html2canvas.hertzen.com/, which does the same thing.
And here is an example page with this feedback:
http://hertzen.com/experiments/jsfeedback/
So I added this feedback option, and I ask users to point out the issue and send the feedback; that way, for each page I get a very nice image of what is not going well.
The next thing is that I log and check all the errors, and fix them soon.

File download via external link - how to implement?

I have a web application which consists of many aspx pages. One of them shows a grid with rows that can be exported to a file via a button click. This works fine. Now I want a feature which allows a user to access an external link to this page (or another) and to export to a file and download it. I don't need any information shown on the page, just the file download. How could I do this while also including security features like encryption?
Thanks :)
The easiest way to do this is simply implement an HttpHandler that contains the logic to create that file and write it to the Response stream.
There are lots of examples of how to do this on the web that I won't repeat in this answer. Just do a Google search for "Download File HttpHandler" and you should be golden.
One of the search results: http://www.c-sharpcorner.com/UploadFile/jhblankenship/DownloadingFromMemStreamHttp11262005061852AM/DownloadingFromMemStreamHttp.aspx
What you're going to have to do is provide a 'unique link' when the gridview shows the correct rows: your website URL with query-string variables at the end. When the page loads, it can check these variables and then use the database to look up the correct data, etc.
Encryption in transit is handled by HTTPS (SSL); to secure it otherwise, you would require a login to view the gridview / file.
