Logging into a webpage via HTTP Request

So I have a webpage ("http://data.terapeak.com/verify/") and I don't see any & parameters in the URL, so I am unsure how to post data to it. I need to do this via an HTTP request rather than a browser control. I am creating a double-threaded batch searching program. I have already made this work using a single browser control, but that won't allow for multi-threading, at least with my current knowledge, because even when creating a new frmBrw that already exists it requires me to set the thread apartment to single. If I set it to single, I am unable to have it send the data to the Excel sheet; I need both threads to access it. I hope this is clear... The basic question is: how can I log into this form via HTTP request?

This isn't going to be easy to answer without further details; however, I suspect you'll need to provide the variables via an HTTP POST request.
Can you successfully log in to this page in your browser? If so, run a proxy tool such as Fiddler and check the requests it makes to the server. You should see the form variables being passed over. You then need to mimic this in code.
How to: Send Data Using the WebRequest Class
Hope this gets you started
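For illustration, here is roughly what mimicking such a POST looks like in Python's requests (a sketch only; the 'username' and 'password' field names are placeholders, use whatever Fiddler shows the browser actually sending, and the same idea applies with WebRequest in .NET):
import requests

session = requests.Session()                         # keeps any cookies the login sets
payload = {'username': 'me', 'password': 'secret'}   # placeholder field names
resp = session.post('http://data.terapeak.com/verify/', data=payload)
print(resp.status_code)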

Related

head request returns different content-type

I would like to try sending requests.get to this website:
requests.get('https://rent.591.com.tw')
and I always get
<Response [404]>
I know this is a common problem and have tried different ways, but still failed.
All other websites are OK, though.
Any suggestions?
Webservers are black boxes. They are permitted to return any valid HTTP response, based on your request, the time of day, the phase of the moon, or any other criteria they pick. If another HTTP client gets a different response, consistently, try to figure out what the differences are in the request that Python sends and the request the other client sends.
That means you need to:
Record all aspects of the working request
Record all aspects of the failing request
Try out what changes you can make to make the failing request more like the working request, and minimise those changes.
I usually point my requests to an http://httpbin.org endpoint, have it record the request, and then experiment.
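For example (a quick sketch; https://httpbin.org/anything simply echoes back the request it received, so you can see exactly what requests sent):
import requests

resp = requests.get('https://httpbin.org/anything')
print(resp.json()['headers'])   # the headers requests actually sent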
For requests, there are several headers that are set automatically, and many of these you would not normally expect to have to change:
Host; this must be set to the hostname you are contacting, so that it can properly multi-host different sites. requests sets this one.
Content-Length and Content-Type, for POST requests, are usually set from the arguments you pass to requests. If these don't match, alter the arguments you pass in to requests (but watch out with multipart/* requests, which use a generated boundary recorded in the Content-Type header; leave generating that to requests).
Connection: leave this to the client to manage
Cookies: these are often set on an initial GET request, or after first logging into the site. Make sure you capture cookies with a requests.Session() object and that you are logged in (supply credentials the same way the browser does); see the session sketch just after this list.
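A minimal sketch of that cookie handling, assuming a hypothetical site (the URL and field names are placeholders):
import requests

session = requests.Session()
session.get('https://example.com/')                    # cookies from the response are stored on the session
session.post('https://example.com/login',              # placeholder login form
             data={'username': 'me', 'password': 'secret'})
resp = session.get('https://example.com/protected')    # stored cookies are sent back automatically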
Everything else is fair game but if requests has set a default value, then more often than not those defaults are not the issue. That said, I usually start with the User-Agent header and work my way up from there.
In this case, the site is filtering on the user agent; it looks like they are blacklisting Python. Setting the User-Agent header to almost any other value already works:
>>> requests.get('https://rent.591.com.tw', headers={'User-Agent': 'Custom'})
<Response [200]>
Next, you need to take into account that requests is not a browser. requests is only an HTTP client; a browser does much, much more. A browser parses HTML for additional resources such as images, fonts, styling and scripts, loads those additional resources too, and executes scripts. Scripts can then alter what the browser displays and load additional resources. If your requests results don't match what you see in the browser, but the initial request the browser makes matches, then you'll need to figure out what other resources the browser has loaded and make additional requests with requests as needed. If all else fails, use a project like requests-html, which lets you run a URL through an actual, headless Chromium browser.
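A sketch of that requests-html fallback (render() downloads a headless Chromium on first use; the selector is just an example):
from requests_html import HTMLSession

session = HTMLSession()
r = session.get('https://rent.591.com.tw', headers={'User-Agent': 'Custom'})
r.html.render()                                 # executes the page's JavaScript in headless Chromium
print(r.html.find('title', first=True).text)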
The site you are trying to contact makes an additional AJAX request to https://rent.591.com.tw/home/search/rsList?is_new_list=1&type=1&kind=0&searchtype=1&region=1, take that into account if you are trying to scrape data from this site.
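For example (a sketch; the endpoint may well require extra headers or cookies, and its response format is not guaranteed):
import requests

api = ('https://rent.591.com.tw/home/search/rsList'
       '?is_new_list=1&type=1&kind=0&searchtype=1&region=1')
resp = requests.get(api, headers={'User-Agent': 'Custom'})
print(resp.status_code, resp.headers.get('Content-Type'))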
Next, well-built sites will use security best-practices such as CSRF tokens, which require you to make requests in the right order (e.g. a GET request to retrieve a form before a POST to the handler) and handle cookies or otherwise extract the extra information a server expects to be passed from one request to another.
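A hypothetical sketch of such a flow; the URL and the 'csrf_token' field name are placeholders, not taken from the site above:
import re
import requests

session = requests.Session()
form_page = session.get('https://example.com/login')                 # GET the form first
match = re.search(r'name="csrf_token" value="([^"]+)"', form_page.text)
token = match.group(1) if match else ''
session.post('https://example.com/login',                            # then POST with the token
             data={'username': 'me', 'password': 'secret', 'csrf_token': token})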
Last but not least, if a site is blocking scripts from making requests, they are probably either trying to enforce terms of service that prohibit scraping, or they have an API they'd rather have you use. Check for either, and take into consideration that you might be blocked more effectively if you continue to scrape the site anyway.
One thing to note: I was using requests.get() to do some web scraping off links I was reading from a file. What I didn't realise was that each link had a trailing newline character (\n) when I read it from the file.
If you're getting your links from a file rather than from a Python string, make sure to strip any \r or \n characters before you call requests.get("your link"). In my case, I used
with open("filepath", 'w') as file:
links = file.read().splitlines()
for link in links:
response = requests.get(link)
In my case this was due to the fact that the website address had recently changed and I had been given the old address. At least this changed the status code from 404 to 500, which, I think, is progress :)

How to create article in Drupal 8 via JMeter?

I am trying to create a JMeter test case for article creation in Drupal 8. I am able to add steps for the other navigation, but when I click the Create Article button after entering some values in the form fields, JMeter gets an HTTP 200 response and the article is not created.
If I do the same steps in the browser, I get an HTTP 303 response and the article is created successfully.
I found this in the request headers of the POST request made when hitting the Create Article button. I suspect this might be the reason the Drupal server is not accepting the request, because I am not sure how this dynamic ID "JJPKbuyIinQT5mQZ" is generated.
Is it generated by the browser? If yes, how do I do the same thing in JMeter?
Is it generated by the server? If yes, I don't see this token in a previous request the way I do form_token.
This dynamic ID will be generated automatically by JMeter provided you tick the Use multipart/form-data for POST box; it is the so-called multipart boundary.
Other things to be considered:
Don't forget to add HTTP Cookie Manager, otherwise you will not be able to even perform a login
Correlate form_build_id and form_token. You can do this using CSS/JQuery Extractor
Correlate changed; you can generate a timestamp like 1532969982 using the __groovy() function, e.g. ${__groovy(Math.round(System.currentTimeMillis() / 1000),)}
Correlate created[0][value][date]. You can do this using the __time() function, e.g. ${__time(YYYY-MM-dd,)}
Correlate created[0][value][time]. You can do this using the same __time() function, e.g. ${__time(HH:mm:ss,)}
That's probably it; the other recorded values should be fine to use as they are.
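To illustrate the mechanism outside JMeter: any HTTP client that builds a multipart/form-data body invents a random boundary string and records it in the Content-Type header, which is exactly what the "JJPKbuyIinQT5mQZ" part is. A Python sketch (httpbin.org just echoes the request; the field name is arbitrary):
import requests

# requests, like JMeter, generates the multipart boundary automatically
resp = requests.post('https://httpbin.org/post', files={'title': (None, 'My article')})
print(resp.request.headers['Content-Type'])
# e.g. multipart/form-data; boundary=a1b2c3... (a new random value each time)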

Observe http request and simulate the same request in code

Is there a way to observe an HTTP request in the browser, save that request (header data and parameters), and simulate the same request in code?
What I want is to "simulate" a browser in my project, to get the same response back as if the user were using a normal browser.
I don't know exactly how to phrase the question, but what I want is to simulate the authentication on some websites and scrape the same data as when I am in the browser.
What I wanted was to crawl a website that is secured with authentication using simple HTTP requests, building the request headers in my code. It was not only about sending a POST request with name + password, but also about some other hidden parameters that are only generated when a user visits the website, on the client side with JavaScript.
Maybe it is possible to understand the algorithm behind generating those hidden parameters, but it could take a long time because of the complexity.
The best way to crawl a website in an automated way without caring about the correct headers is to use a "headless" browser, which is nothing but a normal browser without a GUI. You can control it in your code. A list of such headless browsers can be found here.
So there is no need to observe and record the request and simulate it in code - just use a headless browser.
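For example, a minimal sketch with Selenium driving headless Chrome (the URL and field names are placeholders; any headless browser from such a list would do):
from selenium import webdriver
from selenium.webdriver.common.by import By

options = webdriver.ChromeOptions()
options.add_argument('--headless')                     # run Chrome without a GUI
driver = webdriver.Chrome(options=options)

driver.get('https://example.com/login')                # placeholder URL
driver.find_element(By.NAME, 'username').send_keys('me')
driver.find_element(By.NAME, 'password').send_keys('secret')
driver.find_element(By.CSS_SELECTOR, 'button[type=submit]').click()

html = driver.page_source                              # page after client-side JavaScript has run
driver.quit()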

How to manipulate a .NET ASPX form programmatically?

I'm trying to manipulate a .net ASP form on a site that's using AJAX Control Toolkit. The site is only accessible to valid logins, and I do have a valid account. It consists of a search page with a form. Each time a submit button is clicked on the form, the server is updated using the values of some text fields on the form, and then the VIEWSTATE and EVENTVALIDATION tokens will be updated based on the response from the server, ready for the next request.
I'm using HttpClient in Java to do this. I suspect there's something I'm not doing correctly with regard to interacting with ASPX forms in general.
When I hit the main search page for the first time (cookies are validating my login with the server), I get the HTML for the search page back. I extract the VIEWSTATE and EVENTVALIDATION tokens for the next request. I've examined the exact form fields and their values that need to be sent to the server in a POST by looking at the Chrome debugger utility after making a request on the site manually. I've replicated them exactly as they should be, inserting the VIEWSTATE and EVENTVALIDATION appropriately.
But the response I get back from the server is not what it should be. What I get back is just the same HTML for the main search page that I get the first time I hit the webpage. The form data I'm using looks like this:
ctl00$ScriptManager1:ctl00$ContentPlaceHolder1$UpdatePanel1|ctl00$ContentPlaceHolder1$TabContainer1$TabPanel1$acceptButton
ctl00_ContentPlaceHolder1_TabContainer1_ClientState:{"ActiveTabIndex":0,"TabState":[true,true]}
__EVENTTARGET:
__EVENTARGUMENT:
__LASTFOCUS:
__VIEWSTATE:<token extracted from first page hit>
__VIEWSTATEENCRYPTED:
__EVENTVALIDATION:<token extracted from first page hit>
ctl00$ContentPlaceHolder1$LabelFee:0
ctl00$ContentPlaceHolder1$TabContainer1$TabPanel1$RadioButtonList1:Person
ctl00$ContentPlaceHolder1$TabContainer1$TabPanel1$snameText:aSurname
ctl00$ContentPlaceHolder1$TabContainer1$TabPanel1$HiddenField1:
ctl00$ContentPlaceHolder1$TabContainer1$TabPanel1$fnameText:aFirstname
ctl00$ContentPlaceHolder1$TabContainer1$TabPanel1$dayFromTextBox:01
ctl00$ContentPlaceHolder1$TabContainer1$TabPanel1$monthFromTextBox:January
ctl00$ContentPlaceHolder1$TabContainer1$TabPanel1$yearFromTextBox:2001
ctl00$ContentPlaceHolder1$TabContainer1$TabPanel1$dayToTextBox:01
ctl00$ContentPlaceHolder1$TabContainer1$TabPanel1$monthToTextBox:January
ctl00$ContentPlaceHolder1$TabContainer1$TabPanel1$yearToTextBox:2008
ctl00$ContentPlaceHolder1$TabContainer1$TabPanel1$DropDownList1:aCity
ctl00$ContentPlaceHolder1$TabContainer1$TabPanel1$PropText:
ctl00$ContentPlaceHolder1$TabContainer1$TabPanel2$RefText:
__ASYNCPOST:true
ctl00$ContentPlaceHolder1$TabContainer1$TabPanel1$acceptButton:Accept
I've also tried replicating the headers that the Chrome debugger shows, so my request is including the same Content-Type, Host, Origin, Referer, User-Agent (for my browser) and every other header, including this header X-MicrosoftAjax: Delta=true.
I know there's a lot of moving parts here, but I intentionally haven't mentioned how I'm actually making the POST request with the HttpClient lib because I don't want to complicate the question any more or alienate anyone who doesn't know Java but knows ASP. I'd like to know if there's an ASP issue I'm not addressing, but I can post the Java code if necessary.
Edit:
I've checked the debugging info that HttpClient is outputting just before sending the request, and the form data is being added properly as multi-part form data. The headers are all there too.
This answer is a long shot, but I've seen weirder things.
You mention this header:
X-MicrosoftAjax: Delta=true
I did some deep googling and found that this is often shown as all lower case in dumps of Ajax and UpdatePanel POST requests:
x-microsoftajax: Delta=true
See here and here.
Could it be as simple as not casing the header correctly?
I eventually got this working. The problem was not specific to ASP in general; it was actually a problem with how Java (specifically HttpClient) was sending the request. I was using HttpClient to build the request as a multi-part form, but after using Fiddler to analyse and compare the requests (see the edited part of this question for more details on that) sent from both my application and the actual webpage, my app's request was structured very differently.
The real website request had the form options embedded in the request body as what looked like a URL-encoded query string. My request was a series of entries in the request body where each option was wrapped in Content-Type and Content-Disposition headers. The requests succeeded after changing the POST to add the parameters like this:
request.setEntity(new UrlEncodedFormEntity(paramList));
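For comparison, the same distinction in Python's requests (a sketch with the URL as a placeholder and the field list abbreviated): passing a dict via data= produces the URL-encoded body the real site sends, while files= produces the multipart body that was being rejected.
import requests

url = 'https://example.com/search.aspx'                # placeholder
form = {
    '__VIEWSTATE': '...token from the first page load...',
    '__EVENTVALIDATION': '...token from the first page load...',
    'ctl00$ContentPlaceHolder1$TabContainer1$TabPanel1$acceptButton': 'Accept',
}

requests.post(url, data=form)                                        # application/x-www-form-urlencoded body
requests.post(url, files={k: (None, v) for k, v in form.items()})    # multipart/form-data body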

How does Backbone send a PUT and PATCH request to server

According to this question and also many documents, sending a PUT request directly via a form in the browser is impossible due to security reasons.
However, what I am seeing in Backbone is that it can still send a direct PUT request via the browser without a workaround like adding a hidden form field.
This is confusing to me. Is there anything that I'm missing here?
A form can only send a GET or a POST request, as set in the method attribute.
However, Backbone delegates its requests to jQuery.ajax by default (or whatever you want via Backbone.ajax) which itself wraps XMLHttpRequest, an object that can send PUT/DELETE/PATCH requests.
From https://developer.mozilla.org/en-US/docs/Web/API/XMLHttpRequest
XMLHttpRequest is a JavaScript object that was designed by Microsoft and adopted by Mozilla, Apple, and Google. It's now being standardized in the W3C. It provides an easy way to retrieve data from a URL without having to do a full page refresh. A Web page can update just a part of the page without disrupting what the user is doing. XMLHttpRequest is used heavily in AJAX programming.
many documents have stated that sending a PUT request directly via browser is impossible due to security reason
Citation please.
Backbone sends a PUT just like it sends any other request, with jQuery:
Backbone.ajax({
    type: 'PUT'
    ...
});
It is just that some server-side languages, like PHP, have problems receiving a PUT request.
The hidden form field is used when posting from a <form>. Backbone uses JavaScript.
