scrapy.Request() not working with response.url

I am building a spider to crawl different tabs in a page.
There are cases where I need to extract a URL to go to the next page:
url = i.css('a').attrib['href']
yield response.follow(url=url, callback=self.parse_menu)
And there are cases where I don't need to go to a different page, but still want to go to the next step in the pipeline (parse_menu), so I do something like this:
yield response.follow(url=response.url,callback=self.parse_menu)
The first scenario works well, but in the second scenario parse_menu never gets called.
I think I am missing something about how requests and callbacks work.
Thanks in advance!

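If I understand you correctly, you are sending the same request twice. Scrapy's duplicate filter silently drops a request for a URL it has already seen, which is why parse_menu never gets called. Set dont_filter to True to bypass the filter: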
yield response.follow(url=response.url,callback=self.parse_menu,dont_filter=True)
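To see why the second request disappears, here is a toy model of a duplicate filter in plain Python. It is only a sketch: Scrapy's real filter works on request fingerprints, not bare URL strings, and the ToyScheduler class and the example URL are made up for illustration.

```python
class ToyScheduler:
    """Simplified stand-in for a crawler's duplicate filter (not Scrapy's real code)."""

    def __init__(self):
        self.seen = set()

    def enqueue(self, url, dont_filter=False):
        """Return True if the request would be scheduled, False if it is dropped."""
        if not dont_filter and url in self.seen:
            return False  # already seen: dropped, so its callback never runs
        self.seen.add(url)
        return True


scheduler = ToyScheduler()
page = "https://example.com/menu"  # hypothetical URL

first = scheduler.enqueue(page)                     # the request that produced `response`
repeat = scheduler.enqueue(page)                    # like response.follow(url=response.url, ...)
forced = scheduler.enqueue(page, dont_filter=True)  # same request with dont_filter=True

print(first, repeat, forced)
```

If you do not actually need to download the page again, calling the callback directly with `yield from self.parse_menu(response)` should avoid the second request altogether.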

Try:
url = i.css('a::attr(href)').get()

Related

Avoid Areas not working in Here Fleet Telematics

I'm using the "avoidareas" parameter to remove some points from the initial trip. This method worked a couple of months ago, but now it returns the same route as the initial trip, ignoring the "avoidareas" parameter.
Here is the example I'm working with:
"https://fleet.ls.hereapi.com/2/calculateroute.json?apiKey={YOUR API KEY}&mode=fastest;truck;traffic:enabled;&currency=EUR&restTimes=EU&traverseGates=true&tollVehicleType=3&trailerType=2&trailersCount=1&length=13.6&width=2.4m&height=3m&limitedWeight=24000kg&legAttributes=li,-mn,sh&linkAttributes=wn,le,sh,-fc&mapMatchRadius=5000&ignoreWaypointVehicleRestriction=5000;0;all&departure=2022-09-08T16:00:00&waypoint0=40.613223,-3.2451044&waypoint1=39.919898,-8.634333&avoidareas41.6419,-5.16831;41.541900000000005,-5.06831"
Am I doing something wrong? Has something been updated?

The 'avoidAreas' feature works, but check the following in your request:
The parameter name is in camelCase, so it should be 'avoidAreas'.
You are missing the "=" sign right after avoidareas in your request URL.
The avoid-area rectangle should be specified as latMax,lonMin;latMin,lonMax.
A sample request looks like https://fleet.api.here.com/2/calculateroute.json?waypoint0=50.112698,8.675777&waypoint1=48.544180,9.662530&mode=fastest;truck;traffic:enabled&departure=2022-08-09T13:12:35&alternatives=0&weightPerAxle=3.25t&limitedWeight=7.5t&height=3.4m&width=2.5m&length=7.2m&trailersCount=1&avoidAreas=48.988757459020015,8.436328214242295;48.94714399157084,8.493687099461848
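If you build the URL in code, a small helper can put the corners in the expected order for you. This is just a sketch in Python; the avoid_area function name is made up, and it only formats the parameter value in the latMax,lonMin;latMin,lonMax layout described above.

```python
def avoid_area(lat_a, lon_a, lat_b, lon_b):
    """Format two opposite corners of a rectangle as latMax,lonMin;latMin,lonMax."""
    lat_max, lat_min = max(lat_a, lat_b), min(lat_a, lat_b)
    lon_min, lon_max = min(lon_a, lon_b), max(lon_a, lon_b)
    return f"{lat_max},{lon_min};{lat_min},{lon_max}"


# Corners may be given in any order; note the camelCase name and the "=" sign.
param = "avoidAreas=" + avoid_area(41.5419, -5.06831, 41.6419, -5.16831)
print(param)
```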
Thanks
/MS

Understand Dynamic Links Firebase

I would like to understand Firebase Dynamic Links better, as I am very new to this subject.
What I would like to know:
Is FirebaseDynamicLinks.instance.getInitialLink() supposed to return "only" the last dynamic link created, with the "initial" URL (before it was shortened)?
Or why doesn't FirebaseDynamicLinks.instance.getInitialLink() take a String url as a parameter?
FirebaseDynamicLinks.instance.getDynamicLink(String url) doesn't read custom parameters if the URL was shortened, so how can we retrieve custom parameters from a shortened link?
My use case is quite simple: I am trying to share an object through messages in my application, so I want to save the dynamic link in my database and be able to read it to run a query according to specific parameters.

FirebaseDynamicLinks.instance.getInitialLink() returns the link that opened the app; if the app was not opened by a dynamic link, it returns null.
Future<PendingDynamicLinkData?> getInitialLink()
Attempts to retrieve the dynamic link which launched the app.
This method always returns a Future. That Future completes to null if there is no pending dynamic link or any call to this method after the first attempt.
https://pub.dev/documentation/firebase_dynamic_links/latest/firebase_dynamic_links/FirebaseDynamicLinks/getInitialLink.html
FirebaseDynamicLinks.instance.getInitialLink() does not accept a string url as parameter because it is just meant to return the link that opened the app.
Looks like there's no straightforward answer to getting the query parameters back from a shortened link. Take a look at this discussion to see if any of the workarounds fit your use case.
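One possible workaround, if you control how the links are created: store the long form of the dynamic link (or the deep link itself) in your database and read the parameters from that instead of from the short link. A minimal sketch in Python, assuming the usual long-link layout where the deep link is carried URL-encoded in the `link` query parameter. The domain, package name and parameter names below are invented for illustration.

```python
from urllib.parse import urlparse, parse_qs


def deep_link_params(long_dynamic_link):
    """Return the query parameters of the deep link embedded in a long dynamic link."""
    outer = parse_qs(urlparse(long_dynamic_link).query)
    deep_link = outer["link"][0]  # parse_qs has already percent-decoded it
    inner = parse_qs(urlparse(deep_link).query)
    return {name: values[0] for name, values in inner.items()}


# Hypothetical long link stored in the database
stored = (
    "https://example.page.link/"
    "?link=https%3A%2F%2Fexample.com%2Fitem%3Fid%3D42%26kind%3Dmovie"
    "&apn=com.example.app"
)
params = deep_link_params(stored)
print(params)
```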

What is the most efficient way of filling in details on a web form?

Take a standard web page with lots of text fields, drop-downs, etc.
What is the most efficient way in WebDriver to fill out the values and then verify that the values have been entered correctly?

You only have to test that the values are entered correctly if you have some javascript validation or other magic happening at your input fields. You don't want to test that webdriver/selenium works correctly.
There are various ways, depending on whether you want to use WebDriver or Selenium. Here is a potpourri of the stuff I'm using.
Assert.assertEquals("input field must be empty", "", selenium.getValue("name=model.query"));
driver.findElement(By.name("model.query")).sendKeys("Testinput");
//here you have to wait for javascript to finish. E.g wait for a css Class or id to appear
Assert.assertEquals("Testinput", selenium.getValue("name=model.query"));
With webdriver only:
WebElement inputElement = driver.findElement(By.id("input_field_1"));
inputElement.clear();
inputElement.sendKeys("12");
//here you have to wait for javascript to finish. E.g wait for a css Class or id to appear
Assert.assertEquals("12", inputElement.getAttribute("value"));

Hopefully, the results of filling out your form are visible to the user in some manner. So you could think along these BDD-esque lines:
When I create a new movie
Then I should see my movie page
That is, your "new movie" steps would do the field entry & submit. And your "Then" would assert that the movie shows up with your entered data.
element = driver.find_element(:id, "movie_title")
element.send_keys 'The Good, the Bad, the Ugly'
# etc.
driver.find_element(:id, "submit").click
I'm just dabbling in this now, but this is what I came up with so far. It certainly seems more verbose than something like Capybara:
fill_in 'movie_title', :with => 'The Good, the Bad, the Ugly'
Hope this helps.

IE response.redirect

I ran into an extremely odd issue with IE today. IE fails every time I try to do a response.redirect more than ten times! Of course, the page works fine in FF and Chrome. Has anyone else experienced something like this?
Here are some code snippets to make sure I am not doing anything blatantly wrong...
Loop
if ( iDomain < ubound(aDomain) ) then
Response.Redirect "/home/login/a_logout.asp?site=" & strSite & "&domain=" & iDomain+1 & "&l=" & ilogout & "&s=" &sSid
end if
Array
Dim aDomain(10)
aDomain(0) = ".x.com"
aDomain(1) = "www.x.com"
aDomain(2) = "w1.x.com"
aDomain(3) = "w2.x.com"
aDomain(4) = "x.com"
aDomain(5) = "w3.corporate.x.com"
'aDomain(5) = "w4.x.com"
aDomain(6) = "w5.x.com"
aDomain(7) = "w6.x.com"
'aDomain(8) = ""
'aDomain(9) = "w8.x.com"
aDomain(8) = "w9.x.com"
aDomain(9) = "w10.x.com"
Removed context sensitive data.
Let me know if you need any other info. Thanks!

This is the default behaviour to prevent a user from being looped back to the same page infinitely.
IE8's limit is 10 requests to the same page; I believe Chrome's and Firefox's limit is 20.
And no, a different query string doesn't constitute a new page, as I found out myself.

I would highly suggest that you change this. Redirecting multiple times is a pretty bad idea.
Instead, just run whatever code is being run by your a_logout page locally. I'm assuming you're clearing several cookies. Go ahead and resend all of the appropriate cookies with blank data and an expiry time of yesterday.

Redirecting so often is blatantly wrong. The ideal maximum number of redirects is 1. In practice it can be a lot easier to do certain tasks if you allow for more than that, but anywhere more than 5 redirects happen should be considered a bug (more than 1 on the same server, or more than 3 that cross to another server, should be considered sub-optimal, but not urgent to fix).
Browsers can't depend upon servers never doing anything blatantly wrong, so after a few goes they give up to save the user from the server. Sometimes user-agents don't protect themselves in this way (not serious browsers, but it's an easy mistake to make writing a simple piece of HTTP client code). It isn't pretty.
To demonstrate just how bad this can be, consider a case where the handler for /somePath/?id=1 redirects to /somePath/?id=2 which redirects to /somePath/?id=3 and so on. For all the server knows, you've just got a more obscure version of that, and will never stop redirecting.
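The give-up behaviour is easy to sketch. The Python below is a toy client, not how any real browser is implemented: follow_redirects, TooManyRedirects and the shortened /a_logout.asp URLs are invented for illustration, and the redirect table stands in for the server's responses.

```python
class TooManyRedirects(Exception):
    """Raised when the client stops following a redirect chain."""


def follow_redirects(redirects, start, max_redirects=10):
    """Follow `redirects` (a url -> next url mapping) until a page stops redirecting."""
    url, hops = start, 0
    while url in redirects:
        if hops >= max_redirects:
            raise TooManyRedirects(f"gave up after {hops} redirects at {url}")
        url = redirects[url]
        hops += 1
    return url, hops


# A logout chain like the one in the question: each page redirects to domain+1.
chain = {f"/a_logout.asp?domain={i}": f"/a_logout.asp?domain={i + 1}" for i in range(11)}
start = "/a_logout.asp?domain=0"

try:
    follow_redirects(chain, start, max_redirects=10)
    gave_up = False
except TooManyRedirects:
    gave_up = True

result = follow_redirects(chain, start, max_redirects=20)
print(gave_up, result)
```

With a limit of 10 the client gives up on the eleventh redirect, while a client that allows 20 reaches the final page, which matches the difference between browsers described above.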

ASP.Net links won't disable if done during postback

I'm still fairly new to ASP.Net, so forgive me if this is a stupid question.
On page load I'm displaying a progress meter after which I do a post back in order to handle the actual loading of the page. During the post back, based on certain criteria I'm disabling certain links on the page. However, the links won't disable. I noticed that if I force the links to disable the first time in (through debug) that the links disable just fine. However, I don't have the data I need at that time in order to make the decision to disable.
Code Behind
If (Not IsCallback) Then
pnlLoading.Visible = True
pnlQuote1.Visible = False
Else
pnlLoading.Visible = False
pnlQuote1.Visible = True
<Load data from DB and web service>
<Build page>
If (<Some Criteria>) Then
somelink.Disable = True
End If
End If
JavaScript
if (document.getElementById('pnlQuote1') === null) {
ob_post.post(null, 'PerformRating', ratingResult);
}
ob_post.post is an obout js function that does a normal postback, follows up with a call to the server method named by the second param, and then calls the JavaScript method named by the third param. The first parameter is the page to post back to. A value of null posts back to the current page.
The post back is working fine. All methods are called in the correct order. The code that gives me trouble is the somelink.Disable = True line in the code behind (it does not actually disable the link). Again, if I debug and force the disabling of the link to happen the first time in, it disables. Does anyone know what I might do to get around this?
Thanks,
GRB

Your code example uses the IsCallback check, while the question text talks about a postback. I'd verify that you're using Page.IsPostBack in your code to turn off the links.
