Rvest doesn't recognize a form - r

I'm trying to scrape a website that requires logging in to a session, using rvest.
I'm using this code to begin:
login<-"https://www.drugs.com/account/login/"
session<-html_session(login)
form<-html_form(session)
But even when I extract all the forms, it only recognizes the "Advanced Search" form and not the login form.
Do you have any idea why this happens? I was wondering whether the login form requires JavaScript or something similar.
Thank you,
Vitruves

Depending on where you are, I believe the problem may be the EU GDPR consent. The first time I opened the website it asked me to accept cookies in order to log in. Accepting set the following cookie in my browser:
ddbab21688799cacb48f7d384642573f = "agree"
and only then was the log-in form displayed. For me the cookie always had the same name and value, but if that is not always the case, you may have to accept the consent prompt within your rvest session to have the cookie set.
If I set the cookie when opening the rvest session, I get two forms returned, one of which is the log-in form.
You can set the cookie as follows:
login <- "https://www.drugs.com/account/login/"
session <- html_session(login, httr::set_cookies(ddbab21688799cacb48f7d384642573f = "agree"))
form <- html_form(session)
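
For readers working outside R, the same idea (sending the consent cookie with the very first request) can be sketched with Python's standard library. This is only an illustration under assumptions: the cookie name and value are the ones observed above and may differ for other visitors, `build_login_request` is a hypothetical helper name, and the request is built but deliberately not sent.

```python
import urllib.request

LOGIN_URL = "https://www.drugs.com/account/login/"
# Cookie name and value as observed in the answer above; they may differ per visitor.
CONSENT_COOKIE = "ddbab21688799cacb48f7d384642573f=agree"


def build_login_request(url=LOGIN_URL, cookie=CONSENT_COOKIE):
    """Build (but do not send) a GET request that carries the consent cookie."""
    return urllib.request.Request(url, headers={"Cookie": cookie})


request = build_login_request()
print(request.get_header("Cookie"))
```

Passing the request to `urllib.request.urlopen` would perform the actual fetch; the returned HTML could then be inspected for the login form.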

Related

Extract part of an URL behind a login page with Paw

I'm a newbie, but I think Paw can do what I need:
I need to extract a session ID from behind a login page.
I go to https://admin.booking.com and fill in the form (login and password), and the landing page behind it includes a session ID:
https://admin.booking.com/pc/index.html?ses=xxxxyyyyyzzzzz11112222233333
I'd like to:
1) Push credentials with Paw as part of my request,
2) get the above item (ses) back as a response, so I can use the PHP script extension provided by Paw and then call this script "on demand".
Is this possible? If so, what should I do?
Thanks for your help
UPDATE: we've added a documentation article that describes the process in a little more detail: Login via a web form in Paw. It also covers how to deal with CSRF tokens.
Paw isn't quite ready for handling web/HTML forms yet. There is, however, one way to do it properly: if you inspect the form with the Chrome dev tools, you'll find the names of the inputs in the DOM/HTML.
In your case, you have the inputs: loginname, password, lang.
Also, find the <form…> tag to see what's the action attribute. If there's no action attribute (like in your example), it means the target URL for your form is the current page's URL (https://admin.booking.com/ in your case). Also, make sure the method="POST" is also there in the <form…> tag, otherwise this method won't work.
Then jump into Paw and set:
URL (in your case https://admin.booking.com/)
method to POST
go to the Body tab, select "Form URL-Encoded" and fill in the fields from your form
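
To make the three settings above concrete, here is a rough Python equivalent of the request Paw would send. It is a sketch under assumptions: the field values are placeholders, `build_form_post` is a hypothetical helper, and the request is only constructed, not sent.

```python
import urllib.request
from urllib.parse import urlencode


def build_form_post(url, fields):
    """Build (but do not send) a POST request with a URL-encoded form body."""
    body = urlencode(fields).encode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Field names come from the form's inputs; the values here are placeholders.
login_request = build_form_post(
    "https://admin.booking.com/",
    {"loginname": "my-user", "password": "my-pass", "lang": "en"},
)
print(login_request.get_method())   # POST
print(login_request.data.decode())  # loginname=my-user&password=my-pass&lang=en
```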
If all works, you'll see Paw show a redirection request, and if you go to the right-hand side panel under "Response" > "Headers", you should see a Location header with a value similar to the URL you initially mentioned (https://admin.booking.com/pc/index.html?ses=xxxxyyyyyzzzzz11112222233333). Hurray! You got your value into Paw!
Now that you have that, you can create a new request (click the + button at the bottom of the left-hand side list). Wherever you want to use this session token/ID, you can insert a dynamic value to retrieve that URL value. There is more information in our docs, but I'll describe the steps here:
On whichever field you want to insert the token, right-click and pick Responses > Response Header.
Make sure you pick the first request in the "Request" dropdown menu, and enter Location in the "Header" field.
You should see the value of the Location header of the previous response appear here.
Now what you want to do is extract only the part you need (i.e. the value of the ses param in your case). For that you'll need the RegExMatch extension for Paw, so please install it now: https://luckymarmot.com/paw/extensions/RegExMatch
Copy the dynamic value you have just inserted (the blue token), then right-click on that field to insert a new dynamic value and pick Extensions > RegExp match.
In the Input field, paste the dynamic value you copied. Then use the RegExp field to write a regular expression that extracts the part of the URL you want (ses=(.*) should work in your case, since ses is the only parameter in the URL).
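
To see what such an expression captures, the extraction can be tried outside Paw. The sketch below uses the example Location value from the question. It swaps in a slightly stricter pattern than ses=(.*) that stops at the next `&` or `#`, and shows query-string parsing as an alternative. Both function names are hypothetical.

```python
import re
from urllib.parse import parse_qs, urlsplit

LOCATION = "https://admin.booking.com/pc/index.html?ses=xxxxyyyyyzzzzz11112222233333"


def ses_by_regex(location):
    """Capture the ses value, stopping at the next '&' or '#'."""
    match = re.search(r"[?&]ses=([^&#]*)", location)
    return match.group(1) if match else None


def ses_by_parsing(location):
    """Parse the query string instead of pattern matching."""
    values = parse_qs(urlsplit(location).query).get("ses")
    return values[0] if values else None


print(ses_by_regex(LOCATION))    # xxxxyyyyyzzzzz11112222233333
print(ses_by_parsing(LOCATION))  # xxxxyyyyyzzzzz11112222233333
```

The stricter pattern only matters if the redirect URL ever gains further parameters after ses; with the URL shown above, both approaches return the same value.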
Now you're set up. You should be able to use this new blue token wherever you like and automagically extract the value from the previous form. And whenever you send the initial request again and get a new token, everything else will update too! :)
It was a rather long guide, but I hope it helps you, and hopefully others too.

Passing a Session Variable via a URL (ASP.Net)

I have an aspx page that serves as a "User Log-In" area. I want to pass the userid to another page that is linked from the aspx page.
The link I have looks something like this:
www.abcdefg.com/Home/Redirect/?authtkn=123456abcd=xxxx
I need the xxxx to be a session variable, which in this case is userid.
**userid is not sensitive information; this is simply to redirect the user to another page for specified information.
Any thoughts on how to pass a session variable in a URL, or whether this can be done at all? The example www.abcdefg.com is a different domain (on a different server) from the original aspx page.
Why not append it like this?
string.Format("www.abcdefg.com/Home/Redirect/?authtkn=123456abcd={0}",
Session["UserId"]);
That is, if I understood you correctly.
I think your question is how to maintain a session or authentication across domains.
Check the link Maintaining Session State Across Domains; it may give you some ideas.
Or this one: How can I share a session across multiple subdomains?
You don’t need to pass a session variable via the URL. You can just start a session on the next page and have access to all your session variables! If you do want the variables in the URL so you can quickly switch between userids with a quick URL change, put userid in the URL on your previous page and read it with $_GET[ ].

Restrict page access only can enter from a specified page?

I am kind of new to ASP.NET.
I wonder whether there is any way to ensure that users can only enter a page from a specified page?
For example, I have a Page A where they enter some information; on submit, I use Response.Redirect to send them to Page B. But I don't want users to be able to go to Page B directly by URL.
If I use Session and the user doesn't close the browser to end the session, another user can just go to Page B directly.
The computer that accesses these pages is used by the public, so I want to see whether there is any way to make sure users follow a one-way process, without going back to the previous page or jumping to another page.
Thanks in advance.
If you control the other page, start a session and set a session variable to a value that can be verified and that only your server could (or should) create, much like a serial key. For example, 72150166 qualifies because the sum of the digits in odd positions equals the sum of the digits in even positions (7 + 1 + 0 + 6 = 2 + 5 + 1 + 6), but you could choose an algorithm as complex or as simple as you want. When the user navigates to the second page, check the session variable. This is not invincible security, but it is better than checking the referrer (especially since some browsers do not set it), and I imagine security based on coming from a certain page doesn't have to be that strict.
Edit: You should also add it to a database and link it with the particular user's IP so someone else can't use the same key.
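
As an illustration of the digit-sum rule from the example, a check along these lines would accept 72150166. This is a minimal sketch of that one rule only (the function name is hypothetical), not a recommendation of this particular algorithm.

```python
import re


def is_valid_key(key):
    """Accept a key whose odd-position digits sum to the same total as its even-position digits."""
    if not re.fullmatch(r"[0-9]+", key):
        return False
    digits = [int(ch) for ch in key]
    # digits[0::2] are positions 1, 3, 5, ...; digits[1::2] are positions 2, 4, 6, ...
    return sum(digits[0::2]) == sum(digits[1::2])


print(is_valid_key("72150166"))  # True  (7 + 1 + 0 + 6 == 2 + 5 + 1 + 6)
print(is_valid_key("72150167"))  # False (14 != 15)
```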
You can use the Request.UrlReferrer property in the Page_Load handler of PageB to see which page the request is coming from. If the request is not coming from PageA, redirect the user to PageA.
Check this link for more information: http://msdn.microsoft.com/en-us/library/system.web.httprequest.urlreferrer.aspx
Note: UrlReferrer is dependent on a request header and someone can set the header to mimic the request coming from PageA.
You could have the redirecting page send some sort of specially generated hash/key in the query string that expires quickly and/or once viewed. This should be a lot more solid on the security side.
You will still need some way to store this key, or the value that produces the hash, so you can validate it on the receiving end; I would think in your DB.
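
One possible shape for such a short-lived key is sketched below. Note that it swaps in a different technique from the stored key described above: the key is a timestamp signed with HMAC-SHA256, so expiry can be checked without a lookup. Making the key single-use ("once viewed") would still need server-side storage, which this sketch does not cover. The secret, the 60-second lifetime, and the function names are all assumptions.

```python
import hashlib
import hmac
import time

SECRET_KEY = b"replace-with-a-server-side-secret"  # hypothetical; keep it out of source control
MAX_AGE_SECONDS = 60  # assumed lifetime of a key


def _sign(issued):
    return hmac.new(SECRET_KEY, issued.encode("ascii"), hashlib.sha256).hexdigest()


def make_key(now=None):
    """Return 'timestamp.signature' for Page A to append to the redirect URL."""
    issued = str(int(time.time() if now is None else now))
    return f"{issued}.{_sign(issued)}"


def check_key(key, now=None):
    """Return True if the key was signed by this server and has not expired."""
    issued, _, signature = key.partition(".")
    if not (issued.isascii() and issued.isdigit()):
        return False
    # Compare as bytes in constant time to avoid leaking how much of the signature matched.
    if not hmac.compare_digest(signature.encode("utf-8"), _sign(issued).encode("ascii")):
        return False
    current = time.time() if now is None else now
    return 0 <= current - int(issued) <= MAX_AGE_SECONDS


key = make_key()
print(check_key(key))  # True while the key is still fresh
```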

Change link after submitted the form (ASP)

I have one question: how can I change a link after the user has submitted the form? What I mean is that once the user submits the form, the link that directed the user to the FORM should change to another URL, which is ViewFormA.asp. How can I do that? I need your advice. Thanks.
Does this help?
Response.Redirect "/ViewFormA.asp"
First, at the beginning of the FORM page on the server side, you need to check your own special cookie or session variable (like Session("AlreadySubmitted")):
a) if this variable exists, the user has already submitted the form and must be redirected to another page.
b) if this variable does not exist yet or is equal to zero, the user is allowed to fill in the FORM and submit the data.
Second, on the page that receives the submitted data, you have to set this variable to 1.
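
The two steps above can be sketched language-neutrally as follows. Here a plain dict stands in for the ASP Session object, and the function names and return strings are hypothetical placeholders for "render the form" and "redirect".

```python
def form_page_action(session):
    """Decide what the FORM page should do for this visitor."""
    if session.get("AlreadySubmitted", 0) == 1:
        return "redirect:/ViewFormA.asp"  # already submitted: send them to the view page
    return "show-form"                    # not submitted yet: let them fill in the form


def record_submission(session):
    """Call this on the page that receives the submitted data."""
    session["AlreadySubmitted"] = 1


session = {}
print(form_page_action(session))  # show-form
record_submission(session)
print(form_page_action(session))  # redirect:/ViewFormA.asp
```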

Validate Origin of FORM POST to ensure it came from same server/app

I want to find a platform/language-agnostic solution for ensuring that a FORM POST originates from an expected source, e.g. Page1.aspx posting to Page2.php within the same website.
Specifically what I am attempting to do here is to prevent request forgery.
Use a hidden field in your form, which contains a token your app generated. Store the token in the user session. When the form is submitted, your app will check that the value of the hidden field is identical to the value stored in the user session.
If it is identical, then you know the submitted form comes from where it is expected to come.
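
A minimal sketch of this hidden-field token check, with a plain dict standing in for the user session. The key name `csrf_token` and both function names are assumptions made for illustration.

```python
import hmac
import secrets


def issue_token(session):
    """Generate a token, store it in the session, and return it for the hidden field."""
    token = secrets.token_urlsafe(32)
    session["csrf_token"] = token
    return token


def is_valid_submission(session, submitted_token):
    """Check the hidden-field value against the token stored in the session."""
    expected = session.get("csrf_token")
    if not expected or not submitted_token:
        return False
    # Constant-time comparison, so timing does not reveal how many characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), submitted_token.encode("utf-8"))


session = {}
hidden_field_value = issue_token(session)
print(is_valid_submission(session, hidden_field_value))  # True
print(is_valid_submission(session, "forged-value"))      # False
```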
Old Thread, but might still be useful.
If you cannot use session info (which is the best option), you can include a hidden field with an encrypted timestamp, then decrypt it on the processing end and compare it to the current time to make sure it is as recent as you deem necessary.
You could include into the form a hidden field which would be the SHA1Hash("some-secret" + Remote_IP + PerSessionSecret).
The PerSessionSecret is something you autogenerate at the beginning of the session. "some-secret" is a global secret value, which helps a little in case the randomly generated PerSessionSecret turns out not to be random enough.
Then do the same calculation upon form submission, and you know it was most probably submitted from the same client it was sent to. (Of course, if you have multiple clients behind a single address, such as a proxy or a NAT, you cannot distinguish between them reliably.)
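
The calculation described above might look like this in Python. It follows the SHA-1 concatenation exactly as written; an HMAC with SHA-256 is the more common choice for new code. The secret value, the example IP addresses, and the function names are placeholders.

```python
import hashlib
import hmac
import secrets

GLOBAL_SECRET = "some-secret"  # placeholder for the global secret value


def new_session_secret():
    """Autogenerate the PerSessionSecret at the beginning of a session."""
    return secrets.token_hex(16)


def form_hash(remote_ip, per_session_secret):
    """SHA1Hash("some-secret" + Remote_IP + PerSessionSecret) as a hex string."""
    data = (GLOBAL_SECRET + remote_ip + per_session_secret).encode("utf-8")
    return hashlib.sha1(data).hexdigest()


def matches(submitted_hash, remote_ip, per_session_secret):
    """Redo the calculation on submission and compare in constant time."""
    expected = form_hash(remote_ip, per_session_secret)
    return hmac.compare_digest(expected.encode("ascii"), submitted_hash.encode("utf-8"))


secret = new_session_secret()
hidden_value = form_hash("203.0.113.7", secret)        # example IP from a documentation range
print(matches(hidden_value, "203.0.113.7", secret))    # True
print(matches(hidden_value, "198.51.100.9", secret))   # False
```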

Resources