Lua script failed clicking on a button - web-scraping

I'm trying to scrape flights from link with scrapy-splash using this Lua script:
function main(splash)
    local waiting_time = 2
    -- Go to the URL
    assert(splash:go(splash.args.url))
    splash:wait(waiting_time)
    -- Click on "Outgoing tab"
    local outgoing_tab = splash:select('#linkRealTimeOutgoing')
    outgoing_tab:mouse_click()
    splash:wait(waiting_time)
    -- Click on "More Flights" button
    local more_flights_btn = splash:select('#ctl00_rptOutgoingFlights_ctl26_divPaging > div.advanced.noTop > a')
    more_flights_btn:mouse_click()
    splash:wait(waiting_time)
    return splash:html()
end
and for some reason I'm getting this error:
'LUA_ERROR', 'message': 'Lua error: [string "..."]:16: attempt to index local \'more_flights_btn\' (a nil value)', 'error': "attempt to index local 'more_flights_btn' (a nil value)"}, 'type': 'ScriptError', 'description': 'Error happened while executing Lua script'}
Does anyone know why this happens?
Also, does anyone know where I can find a tutorial on Lua script integration with Splash, besides the official site?
Thanks in advance!

This just looks like a timing issue. I ran your Lua script a couple of times and I got that error only once.
Simply waiting longer before getting the button should be enough. However, if the time it takes varies a lot and you don't always want to wait the full time, then you can try a slightly smarter loop like this:
-- Click on "More Flights" button
local more_flights_btn
-- Wait up to 10 seconds:
for i=1,10 do
    splash:wait(1)
    more_flights_btn = splash:select('#ctl00_rptOutgoingFlights_ctl26_divPaging > div.advanced.noTop > a')
    if more_flights_btn then break end
    -- If it was not found we'll wait again.
end

Related

requests.get(url) hangs after 5 iterations

I am attempting to run a web-scraping algorithm on Indeed using BeautifulSoup, looping through the different result pages. However, after 2-6 iterations, requests.get(url) hangs and never fetches the next page. I have read that it might have something to do with the server blocking me, but I would expect that to block the initial requests as well, and I have also read online that Indeed allows web scraping. I have also heard that I should set a header, but I am unsure how to do that. I am running the latest version of Safari on macOS 12.4.
A solution I came up with, though it does not answer the question specifically, is to use a try/except statement and set a timeout value on the request. Once the timeout is reached, the except block runs and sets a boolean flag, and the loop then continues and tries the same page again. The code is below.
while i < 10:
    url = get_url('software intern', '', i)
    print("Parsing Page Number:" + str(i + 1))
    error = False
    try:
        response = requests.get(url, timeout=10)
    except requests.exceptions.Timeout as err:
        error = True
    if error:
        print("Trying to connect to webpage again")
        continue
    i += 1
I am leaving the question unanswered for now, however, as I still don't know the root cause of this issue and this solution is just a workaround.
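On the header question: the following is a minimal sketch, not a confirmed fix for the hang, of how a browser-like User-Agent header and a bounded number of retries could be combined. The helper name fetch_with_retries, the header value, and the retry parameters are all assumptions made for illustration. The get argument is expected to behave like requests.get, and the sketch relies on the fact that the timeout and connection exceptions in requests derive from OSError.

```python
import time

# Assumed header value; any realistic browser User-Agent string could be used.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 12_4) AppleWebKit/605.1.15 "
        "(KHTML, like Gecko) Version/15.5 Safari/605.1.15"
    )
}


def fetch_with_retries(get, url, max_attempts=3, timeout=10, pause=2.0,
                       sleep=time.sleep):
    """Call get(url, headers=..., timeout=...) up to max_attempts times.

    `get` is expected to behave like requests.get. Timeout and connection
    errors raised by requests derive from OSError, so catching OSError
    covers them. The last error is re-raised once attempts run out.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return get(url, headers=BROWSER_HEADERS, timeout=timeout)
        except OSError as err:
            last_error = err
            if attempt < max_attempts:
                sleep(pause)  # brief pause before trying the same page again
    raise last_error
```

Inside the loop above this would be called as response = fetch_with_retries(requests.get, url). Unlike the bare continue, it gives up after a fixed number of attempts, so a page that never responds cannot keep the loop running forever.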

Scheduling an R script that shows a popup/message box in case of an error on Windows

My aim is to check the http status of a website every 5 minutes and throw an alert message in case it is not 200. To keep it simple, I would like to discuss my question based on the piece of code given below.
library(httr)
a <- status_code(GET("http://httpbin.org/status/404"))
if (a == 404) system('CMD /C "ECHO Client error: (404) Not Found && PAUSE"',
invisible=FALSE, wait=FALSE)
I found the last bit, which begins with system, at
https://heuristically.wordpress.com/2013/04/19/popup-notification-from-r-on-windows/
and
Show a popup/message box from a Windows batch file
The lines above result in a message box from C:\windows\SYSTEM32\CMD.exe popping up that says:
Client error:(404) Not Found
Press any key to continue...
Is there a possibility to add Sys.time() to this message?
I scheduled the script above using taskscheduleR. For help, see:
http://bnosac.be/index.php/blog/50-taskscheduler-r-package-to-schedule-r-scripts-with-the-windows-task-manager-2
library(taskscheduleR)
myscript <- "the address of your r script"
taskscheduler_create(taskname = "myfancyscript_5min", rscript = myscript,
schedule = "MINUTE", starttime = "11:20", modifier = 5)
In this case the message box I get is shown below. Note that this time it is without the message.
How can I get the message written when I run the script using the task scheduler?
You just need to edit the first part of the code. As advised in the comment we will make use of notifier:
https://github.com/gaborcsardi/notifier
In case you have issues with installing the notifier, I was only able to install it through the command below.
devtools::install_version("notifier")
Replace the first bit with the following:
library(httr)
library(notifier)
a <- status_code(GET("http://httpbin.org/status/404"))
if (a == 404) notify(
title = "404",
msg = c("Client error: (404) Not Found")
)

AutoIt Scripting for an External CLI Program - eac3to.exe

I am attempting to design a front-end GUI for a CLI program called eac3to.exe. The problem as I see it is that this program sends all of its output to a cmd window. This is giving me no end of trouble because I need to get a lot of this output into a GUI window. This sounds easy enough, but I am beginning to wonder whether I have found one of AutoIt's limitations.
I can use the Run() function with a Windows internal command such as Dir and then get the output into a variable with the AutoIt StdoutRead() function, but I just can't get the output from an external program such as eac3to.exe. It just doesn't seem to work whatever I do! Just for testing purposes, I don't even need to get the output into a GUI window: printing it with ConsoleWrite() is good enough, as this proves that I was able to read it into a variable. So at this stage that's all I need to do: get the text (usually about 10 lines) that my external CLI program has output to a cmd window into a variable. Once I can do this the rest will be a lot easier. This is what I have been trying, but it never works:
Global $iPID = Run("C:\VIDEO_EDITING\eac3to\eac3to.exe", "", @SW_SHOW)
Global $ScreenOutput = StdoutRead($iPID)
ConsoleWrite($ScreenOutput & @CRLF)
After running this script all I get from ConsoleWrite() is a blank line, not the text that eac3to.exe output (running eac3to without any arguments just lists a screen of help text covering all the command-line options). That text is what I am trying to get into a variable so that I can use it later in the program.
Before I suggest a solution, let me just tell you that AutoIt has one of the best help files out there. Use it.
You are missing the $STDOUT_CHILD flag, which provides a handle to the child's STDOUT stream.
Also, you can't just call Run and immediately call StdoutRead. At what point did you give the app some time to do anything and actually print something back to the console?
You need to either use ProcessWaitClose and read the stream afterwards, or read the stream in a loop. The simplest check would be to put a Sleep between Run and StdoutRead and see what happens.
#include <AutoItConstants.au3>
Global $iPID = Run("C:\VIDEO_EDITING\eac3to\eac3to.exe", "", @SW_SHOW, $STDOUT_CHILD)
; Wait until the process has closed using the PID returned by Run.
ProcessWaitClose($iPID)
; Read the Stdout stream of the PID returned by Run. This can also be done in a while loop. Look at the example for StderrRead.
; If the process doesn't end when finished, you need to put this inside a loop.
Local $ScreenOutput = StdoutRead($iPID)
ConsoleWrite($ScreenOutput & @CRLF)

'File not found' error on an existing file

I sometimes get a 'file not found' error on the 'DeleteFile' line of this small script
(I guess when several clients open the script at the same time):
if objFSO.FileExists(fileName) then
    Set f = objFSO.GetFile(fileName)
    if DateDiff("d", f.DateLastModified, date()) > 3 then
        Application.Lock
        objFSO.DeleteFile(fileName)
        Application.Unlock
    end if
    Set f = nothing
end if
But this should be protected by the 'FileExists' on the first line?
Any ideas? Thanks.
You're running into a race condition. The file attributes are cached in the second line with GetFile. If the file exists at that point, the code will continue to run. You either need to lock before that point, or refresh your attribute cache and double-check existence after Application.Lock.
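The check-then-act race described here is not specific to classic ASP. As a language-neutral illustration, here is a minimal Python sketch (not a drop-in replacement for the VBScript above) in which the existence check is dropped and "file already gone" is treated as a normal outcome of the delete itself. The function name delete_if_stale and its parameters are hypothetical, and age is measured in elapsed seconds rather than the calendar-day boundaries that DateDiff("d", ...) counts.

```python
import os
import time


def delete_if_stale(path, max_age_days=3, now=None):
    """Delete path if it was last modified more than max_age_days ago.

    Returns True if this call deleted the file and False otherwise. A file
    that disappears between the age check and the delete (for example,
    because another client removed it first) is not treated as an error.
    """
    now = time.time() if now is None else now
    try:
        age_seconds = now - os.path.getmtime(path)
        if age_seconds <= max_age_days * 86400:
            return False
        os.remove(path)
        return True
    except FileNotFoundError:
        # Another client won the race; the desired end state is reached anyway.
        return False
```

The design choice is to let the delete report the outcome instead of trusting an earlier existence check, since any check can be stale by the time the delete runs.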

Vendor's black box function can only be called successfully once

(first question here, sorry if I am breaking a piece of etiquette)
My site is running on an eCommerce back end provider that I subscribe to. They have everything in classic ASP. They have a black box function called import_products that I use to import a given text file into my site's database.
The problem is that if I call the function more than once, something breaks. Here is my example code:
for blah = 1 to 20
    thisfilename = "fullcatalog_" & blah & ".csv"
    Response.Write thisfilename & "<br>"
    Response.Flush
    Call Import_Products(3,thisfilename,1)
Next
Response.End
The first execution of the Import_Products function works fine. The second time I get:
Microsoft VBScript runtime error '800a0009'
Subscript out of range: 'i'
The filenames all exist. That part is fine. There are no bugs in my calling code. I have tried checking the value of "i" before each execution. The first time the value is blank, and before the second execution the value is "2". So I tried setting it to null during each loop iteration, but that didn't change the results at all.
I assume that the function is setting a variable or opening a connection during its execution, but not cleaning it up, and then not expecting it to already be set the second time. Is there any way to find out what this would be? Or somehow reset the condition back to nothing so that the function will be 'fresh'?
The function is in an unreadable include file so I can't see the code. Obviously a better solution would be to go through the company's support, and I have a ticket in with them, but it is like pulling teeth to get them to even acknowledge that there is a problem, let alone solve it.
Thanks!
EDIT: Here is a further simplified example of calling the function. The first call works. The second call fails with the same error as above.
thisfilename = "fullcatalog_testfile.csv"
Call Import_Products(3,thisfilename,1)
Call Import_Products(3,thisfilename,1)
Response.End
The likely cause of the error is the two numeric parameters of the Import_Products subroutine.
Import_Products(???, FileName, ???)
The values are 3 and 1 in your example but you never explain what they do or what they are documented to do.
EDIT: Since correcting the vendor's subroutine is impossible, but it always works the first time it is called, let's use an HTTP redirect instead of a For loop so that it technically only gets called once per page execution.
www.mysite.tld/import.asp?current=1&end=20
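' "End" is a reserved word in VBScript, so the upper bound is stored in lastFile.
curr = CInt(Request.QueryString("current"))
lastFile = CInt(Request.QueryString("end"))
If curr <= lastFile Then
    thisfilename = "fullcatalog_" & curr & ".csv"
    Call Import_Products(3,thisfilename,1)
    ' Relative URL: without a scheme, "www.mysite.tld/..." would be treated as a relative path.
    Response.Redirect("import.asp?current=" & (curr + 1) & "&end=" & lastFile)
End If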
Note: the above was written in my browser and is untested, so syntax errors may exist.
