Retrieve openid bearer token using headless browser setup - web-scraping

I had been happily scraping a website with OkHttp3 for quite some time. However, some components of the website have been upgraded and now use an additional OpenID bearer authentication.
I am 99.9% positive my requests are failing because of this bearer token: when I check with Chrome dev tools, I see the bearer token only on requests for these parts. Moreover, a couple of requests go to links that end with ".well-known/openid-configuration". In addition, when I hardcode the bearer token from my browser in my OkHttp3 code, everything works. Without it, I get a 401 Unauthorized response.
I figured that my browser emulation was not close enough to the real situation, so I decided to use a headless browser setup that executes some of the JavaScript. Since I am using Java, I used HtmlUnit. With this tool I could quickly get to the point where I could successfully scrape parts of the website (just as with OkHttp3), but it again failed on the newly updated parts. I checked, but couldn't find the bearer token in any of the responses (neither in the headers nor in the cookies).
Is there any chance this approach (using a headless browser) could work? Or are there alternative approaches I could check?
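
For reference, the hardcoding experiment described above can be sketched as follows. This is a minimal sketch that uses the JDK's `java.net.http` request builder instead of OkHttp3 so it needs no dependencies. The issuer URL, API URL, and token value are placeholders (assumptions, not taken from the site in question); only the `/.well-known/openid-configuration` suffix comes from the requests observed in dev tools.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class BearerRequestSketch {

    // Builds the discovery URL for an issuer (the issuer value is a placeholder).
    static URI discoveryUri(String issuer) {
        String base = issuer.endsWith("/")
                ? issuer.substring(0, issuer.length() - 1)
                : issuer;
        return URI.create(base + "/.well-known/openid-configuration");
    }

    // Builds a GET request that carries the token in the Authorization header.
    static HttpRequest withBearer(URI target, String token) {
        return HttpRequest.newBuilder(target)
                .header("Authorization", "Bearer " + token)
                .GET()
                .build();
    }

    public static void main(String[] args) {
        // Placeholder values for illustration only; nothing is sent over the network.
        System.out.println(discoveryUri("https://issuer.example"));
        HttpRequest request = withBearer(URI.create("https://api.example/data"), "token-from-browser");
        System.out.println(request.headers().firstValue("Authorization").orElse("(none)"));
    }
}
```

The sketch only builds the request; where the token comes from in the first place is exactly the open question.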

Related

Do Desktop apps need HTTP preflight requests?

I'm developing an app that needs to get info from a third-party API. I've been developing it as a web application with Vue.js. For the requests I tried axios, jQuery and the fetch API, but I'm having trouble with the preflight requests: the API does not seem to handle OPTIONS requests properly and returns a 405 error. (I made a GET request to the same URL through Postman and it worked normally. I also edited an OPTIONS request in the Firefox network panel to make it a GET request, and it returned a 200 status.)
Now I'm thinking of abandoning the idea of a web application and building a desktop application instead, but I need to know whether preflight requests are a default behavior in that kind of app too.
Thanks for your attention!
No, CORS preflight requests are made by browsers, and are necessary due to the browser security model. They would not be used by a desktop application.
You can easily test this with curl, postman, etc. It sounds like you tried this, but the details you've described are off. Don't change anything to GET. Use the actual request you're trying to make, but do it outside the browser context. If the API responds appropriately then it should work in a desktop application.
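
To make the browser-only nature of preflights concrete, here is a rough sketch of the decision a browser makes before sending a cross-origin request. It is a simplified approximation of the "simple request" rules, not the full Fetch specification (it ignores, for example, limits on header values and the `Range` header), and the class and method names are made up for illustration. A desktop HTTP client performs no such check and just sends the request.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class PreflightSketch {

    static final Set<String> SIMPLE_METHODS = Set.of("GET", "HEAD", "POST");
    static final Set<String> SAFELISTED_HEADERS =
            Set.of("accept", "accept-language", "content-language", "content-type");
    static final Set<String> SIMPLE_CONTENT_TYPES =
            Set.of("application/x-www-form-urlencoded", "multipart/form-data", "text/plain");

    // Returns true if a browser would be expected to send an OPTIONS preflight
    // before this cross-origin request (simplified rules, see the note above).
    static boolean browserWouldPreflight(String method, Map<String, String> headers) {
        if (!SIMPLE_METHODS.contains(method.toUpperCase(Locale.ROOT))) {
            return true;
        }
        for (Map.Entry<String, String> header : headers.entrySet()) {
            String name = header.getKey().toLowerCase(Locale.ROOT);
            if (!SAFELISTED_HEADERS.contains(name)) {
                return true; // e.g. Authorization or any custom X- header
            }
            if (name.equals("content-type")) {
                // Compare only the media type, ignoring parameters such as charset.
                String type = header.getValue().split(";")[0].trim().toLowerCase(Locale.ROOT);
                if (!SIMPLE_CONTENT_TYPES.contains(type)) {
                    return true; // e.g. application/json
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(browserWouldPreflight("GET", Map.of()));
        System.out.println(browserWouldPreflight("GET", Map.of("Authorization", "Bearer abc")));
    }
}
```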

Obtaining token from token service

I am trying to obtain a token for my UCWA app using passive auth. My setup is that once I receive the 401 challenge, I take the link to the token service from the ms_rtc_passiveauthuri parameter and visit this page (PassiveAuth.aspx) by creating a hidden iframe in the background of my website. A couple of redirects happen in that iframe, but eventually I successfully get the cookie and proceed with creating the UCWA app.
This works nicely in IE, Chrome, Firefox and Opera, but Safari seems to refuse to perform these redirections inside the iframe.
I also tried to visit this token service by using the XFrame (and using helper library's Transport.clientRequest), but the result is 406 Not Acceptable.
Do you know about any workaround for Safari? Or, more importantly, is my approach correct - is this how it's meant to be used?
Thanks for any suggestion
Did you manage to work this out? I am having the same issues.
Edit : See the comments below for the answer - look out for the WWW-Authenticate and Www-Authenticate headers.

How to automate logging in and retrieve data?

I want to automate logging into a website and retrieving certain data.
I thought the way to do this would be to sniff the HTTP requests so I know where the login form is being POSTed to so I can do the same using NodeJS/Java/Python.
However I can't seem to find the HTTP request that handles it.
The site seems to use a Java applet and a lot of JavaScript.
This is the site: link
Should I have a different approach?
I also wonder about storing a session cookie and sending it with each HTTP request after logging in.
I'm sorry if I am not too clear; I will try to explain myself further and edit this post if needed.
You can use the developer console (hit F12) in Chrome (this works also in other browsers) and then click the "Network" tab. There you see all network calls.
To detect what http requests are performed from a mobile device, you can use a proxy like Charles Proxy.
Also be aware that if you post from Node.js, the cookies won't be set in the user's browser.
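
On the question of storing a session cookie and sending it with each request: since Java is one of the languages mentioned, here is a minimal sketch using the JDK's `java.net.CookieManager`. The URLs and the cookie name and value are invented for illustration, and the helper names are hypothetical. In a real client the `Set-Cookie` value would come from the login response.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.CookieManager;
import java.net.URI;
import java.util.List;
import java.util.Map;

class CookieSessionSketch {

    // Records the Set-Cookie header value from a login response in the cookie jar.
    static void rememberLoginCookie(CookieManager jar, URI loginUri, String setCookieValue) {
        try {
            jar.put(loginUri, Map.of("Set-Cookie", List.of(setCookieValue)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns the Cookie header value to send with a later request, or "" if none applies.
    static String cookieHeaderFor(CookieManager jar, URI requestUri) {
        try {
            Map<String, List<String>> headers = jar.get(requestUri, Map.of());
            return String.join("; ", headers.getOrDefault("Cookie", List.of()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        CookieManager jar = new CookieManager();
        // Placeholder login response header; no network traffic happens here.
        rememberLoginCookie(jar, URI.create("https://example.com/login"), "SESSION=abc123; Path=/");
        System.out.println(cookieHeaderFor(jar, URI.create("https://example.com/data")));
    }
}
```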

how to implement the authentication in Single Page Application?

As the title says, I want to build an app that runs in the browser as a single HTML page, but I am unsure how to implement authentication. My solution is:
The server side consists entirely of RESTful APIs, which can be used by multiple platforms (web, mobile, etc.). Every API that needs authentication expects a token to parse; if the API does not receive a token, it returns 401.
Since my first client runs in the browser, I need to request the token at login, and whenever the app calls the authenticated APIs, I put the token in the request header.
My question is: is this safe enough? Is there a better solution?
No, it's not safe enough if the token is accessible through JavaScript, for the same reason that you should set your cookies to HTTP-only and restrict them to SSL.
If an attacker can inject JavaScript into your app, they can steal the token and use it from their own machine.
For that reason I suggest you use a secure, HTTP-only cookie instead of the token when using a website.
If your API is going to be accessed from a native mobile app then you could add a token to each url.
Having a custom header in the http request might cause issues with certain proxies which might not pass all headers through.
A cookie is nothing more than a standardised http header so you might as well reuse that.
What you could also consider using is OAuth if you're going to allow 3rd party apps access to parts of your API.
There is no reason why you could not use cookies for browser based clients and an ApiKey query parameter for other clients.
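
As an illustration of the "secure, HTTP-only cookie" suggestion, here is a small sketch that builds such a `Set-Cookie` header value and checks an existing one for both flags. The cookie name, the `Path=/` attribute, and the helper names are assumptions made for the example; a real server framework would normally set these flags through its own cookie API.

```java
import java.util.Locale;

class SessionCookieSketch {

    // Builds a Set-Cookie header value. HttpOnly hides the cookie from page
    // scripts; Secure restricts it to HTTPS connections.
    static String sessionCookieHeader(String name, String value) {
        return name + "=" + value + "; Path=/; Secure; HttpOnly";
    }

    // Returns true if a Set-Cookie header value carries both the Secure and HttpOnly flags.
    static boolean isHardened(String setCookieValue) {
        boolean secure = false;
        boolean httpOnly = false;
        String[] parts = setCookieValue.split(";");
        for (int i = 1; i < parts.length; i++) { // index 0 is the name=value pair
            String attribute = parts[i].trim().toLowerCase(Locale.ROOT);
            if (attribute.equals("secure")) {
                secure = true;
            }
            if (attribute.equals("httponly")) {
                httpOnly = true;
            }
        }
        return secure && httpOnly;
    }

    public static void main(String[] args) {
        String header = sessionCookieHeader("sid", "abc");
        System.out.println(header + " -> hardened: " + isHardened(header));
    }
}
```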

Embedding User + Password data for HTTP Basic Access Authentication in Querystring

We're trying to test an API that requires HTTP Basic Access Authentication credentials (http://en.wikipedia.org/wiki/Basic_access_authentication) in the request.
Ideally, we could just test the API using a web browser by putting all API parameters in the URL querystring, but we haven't yet found a way to encode the HTTP Basic Access Authentication credentials (username and password) in the querystring.
Does anyone know a way to do this?
Thus far, we've tried:
https://username:password@mydomain.com/
...without success.
username:password@url authentication has been disabled in many browsers for security reasons.
For example in IE:
Internet Explorer does not support user names and passwords in Web site addresses (HTTP or HTTPS URLs)
As far as I know, there is no way to circumvent this if it is blocked. It's possible that this can be turned off in Firefox using a setting in about:config. Or use some other browser that doesn't block it - I don't know which ones do and which don't.
Alternatively, consider building a quick web form that submits the options to a server-side language (e.g. PHP) that makes the request, or use a command-line client like wget to send the requests. The latter might even be easiest.
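
If the request is made from code rather than from the browser address bar, the credentials go into an `Authorization` header instead of the URL. A minimal sketch of how that header value is built (Base64 of `username:password`), with a hypothetical helper name and the example credentials from the Basic authentication specification:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class BasicAuthSketch {

    // Builds the value of the Authorization header for HTTP Basic authentication.
    static String basicAuthHeader(String username, String password) {
        String credentials = username + ":" + password;
        String encoded = Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }

    public static void main(String[] args) {
        // prints "Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ=="
        System.out.println(basicAuthHeader("Aladdin", "open sesame"));
    }
}
```

Note that Base64 is an encoding, not encryption, so this header should only be sent over HTTPS.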
