How to automate scrapy - web-scraping

I've got a small problem with my Scrapy spider. I set up Scrapy and everything works fine, but every time I want to scrape a website I have to start the spider myself. I want it to be fully automated and don't know how to do that.
Currently I start the spider with cmdline.execute. I thought I could simply write a while True loop, but it turns out that doesn't work. I also found that the spider doesn't really quit, which is hard to explain: PyCharm says "Finished with exit code 0", but if I put a print("End of program") after the cmdline.execute call, it doesn't print anything.
At this point I'm confused about what to do. Can you help me?

Try using scrapyd.
Scrapyd is an application for deploying and running Scrapy spiders. It enables you to deploy (upload) your projects and control their spiders using a JSON API.
Some tutorials:
How to deploy scrapy spider using scrapyd?
Deploy, Schedule & Run Your Scrapy Spiders
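
To make the JSON API part concrete, here is a minimal sketch of starting a spider run through Scrapyd's schedule.json endpoint. It assumes Scrapyd is running locally on its default port 6800 and that the project has already been deployed; the project and spider names are placeholders, and the helper function names are mine, not part of Scrapyd.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Scrapyd is listening locally on its default port.
SCRAPYD_URL = "http://localhost:6800"


def build_schedule_request(project, spider, base_url=SCRAPYD_URL):
    """Build a POST request for Scrapyd's schedule.json endpoint."""
    data = urllib.parse.urlencode({"project": project, "spider": spider}).encode()
    return urllib.request.Request(f"{base_url}/schedule.json", data=data, method="POST")


def schedule_spider(project, spider, base_url=SCRAPYD_URL):
    """Send the request and return Scrapyd's decoded JSON reply."""
    request = build_schedule_request(project, spider, base_url)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


# Example with placeholder names (needs a running Scrapyd instance):
# print(schedule_spider("myproject", "myspider"))
```

Calling schedule_spider from a cron job or any other scheduler would then start a run without launching the spider by hand.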

There are many options for scheduling spiders.
Cron: As Alexander commented, you can create a cron job. I think this is best suited to a situation where you have just a few spiders whose schedule you won't change often.
Scrapydweb: It's a web interface for managing Scrapyd. You must host it yourself. Quite easy to use in my experience.
Zyte: Practically the same as Scrapydweb, but it's a SaaS app that you do not host yourself. Very easy to use but expensive.
Gerapy: I have not tried it, but I believe it's similar to Scrapydweb and seems to be built on more modern frameworks.

Thank you all, but I've found a solution. I wrote a Bash script, and in that script I am able to loop.
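
For anyone who wants the same loop without leaving Python: as far as I know, cmdline.execute ends by calling sys.exit, which would explain why nothing after it runs and why a while True loop around it stops after the first crawl. A rough sketch of the alternative is to start each crawl in its own child process. The function name, the spider name and the delay below are placeholders, not anything from the original script.

```python
import subprocess
import time


def run_repeatedly(command, runs, delay_seconds=0.0):
    """Run `command` in a fresh child process `runs` times; return the exit codes."""
    exit_codes = []
    for index in range(runs):
        # Each run gets its own process, so the spider exiting does not
        # terminate this controlling script.
        completed = subprocess.run(command)
        exit_codes.append(completed.returncode)
        if index < runs - 1:
            time.sleep(delay_seconds)
    return exit_codes


# Example with a hypothetical spider name, run from the project directory:
# run_repeatedly(["scrapy", "crawl", "myspider"], runs=3, delay_seconds=3600)
```

This is the same idea as the Bash loop: the loop lives outside the process that runs the spider.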

Related

Is there a way to set breakpoints in Cypress tests?

As the title says. I've recently inherited a Cypress codebase that I need to convert to Playwright. I'm new to both Cypress and Playwright but have experience with other automated testing systems. The last one I used made it easy to set breakpoints on any line, so I could step through the code and see what each line was doing. I figured that if I could do this here, it would make deciphering the Cypress code, and turning it into something that works with Playwright, an easier prospect. Google has not been much help.

You can use Cypress' debugger to achieve something similar to setting breakpoints while executing via cypress open.
cy.get('foo').debug();

If you prefer VSCode, there is a similar SO question. It seems to require some manual setup to get the debugger to work.
For IntelliJ (IDEA, WebStorm etc.), there is a paid Cypress Support Pro plugin (I'm the author). It fully integrates the IDE debugger with Cypress. See this video overview.

How can you specify your terminal emulator in Corda

Xterm is used when running Corda locally on one computer using Gradle.
Is there a way to specify your terminal emulator when running, as suggested by the following issue?
https://github.com/corda/corda/issues/2605

I completely share your pain on this. The way that runnodes has its tooling baked in makes it impossible to customize how the cordform plugin runs the nodes without digging into the internals.
Some other ideas for you:
One thing you could do is stop using cordform altogether and run your Corda network using dockerform (example here: https://github.com/corda/samples-java/blob/master/Features/dockerform-yocordapp/build.gradle#L93) so that the plugin doesn't need to create new terminals.
The much harder way would be to download the Corda Gradle plugins (https://github.com/corda/corda-gradle-plugins#installing-locally) and install them locally with your edits to the cordform task so that it opens the terminal of your choice. You may be able to submit your edits as a PR, since the cordform task that's usually used to generate the runnodes script comes from here, as far as I know.
As a separate note, I saw your GitHub issue and was disappointed by how it was handled. I'm sorry you had that experience, and I'm going to dig into the issue internally to find out what's happening with it.
Feel free to reach out to me (David Awad) on slack.corda.net and I can let you know what's going on there.
Thanks as always

What is a good hosting solution for running node.js with R / Rserve?

I need to run R with Node.js, using Rio (https://github.com/albertosantini/node-rio) as the node binding to Rserve.
I like Heroku but this seems like it is pushing the Heroku envelope beyond what it or I am competent with:
I've looked briefly into installing a custom buildpack
https://github.com/virtualstaticvoid/heroku-buildpack-r
to run simultaneously with node.js:
https://github.com/ddollar/heroku-buildpack-multi
This all seems pretty scary. Anyone got any good advice for how best to host this? My app works just fine locally.

http://prgmr.com/xen/
I currently use this solution to run my Node.js server and so far it has been great.
They have wonderful support and their uptime is 100%. I cannot recommend this highly enough, but you will need to know how to set up a simple OS and run it from the ground up.
For example, if you want to run a server without having it stop when you close the SSH connection, you would run screen node script.js and then press Ctrl+A followed by D to detach.
You might already know this, so simply take my advice and view the website.

After some research and recommendations from Heroku, I believe the Heroku solution would be
Use https://github.com/virtualstaticvoid/heroku-buildpack-r
in combination with
https://github.com/ddollar/heroku-buildpack-multi#readme
to build a multi build pack.

Jmeter console manipulation for automation purposes

I am pretty new to this.
I want to know how JMeter can be controlled from the console (bash or cmd).
As a start, my goal is to understand how to run my testplan.jmx against several URLs. For this I added "server" and "port" parameters to my test plan.
How can I change these parameters from the console and then run JMeter?
Moreover, I'd like to ask you to suggest any free online tutorials where I can learn more about JMeter in non-GUI mode and the possibilities for integrating JMeter with different frameworks for automated testing.
Thank you very much indeed.

See:
http://jmeter.512774.n5.nabble.com/How-to-Run-Jmeter-in-command-line-td2640725.html
You can launch your test plan from the command line, specifying parameters, like:
jmeter -n -t plan.jmx -Jmy_url=http://www.firsturl.com
Inside your testplan you'd reference that command line param as ${__P(my_url)}
In terms of capturing results when running in non-gui mode, you may want to see:
http://blogs.amd.com/developer/2009/03/31/using-apache-jmeter-in-non-gui-mode/
Personally, my experience is with using the GUI and writing and running test plans that way but this seems workable.
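
To run the same plan against several hosts, the command line above could be wrapped in a small script. The sketch below assumes jmeter is on the PATH and that the test plan reads the two properties as ${__P(server)} and ${__P(port)}; the function names, the target list and the results file naming are made up for illustration.

```python
import subprocess


def jmeter_command(plan, server, port, results_file):
    """Build a non-GUI JMeter command line that sets two properties."""
    return [
        "jmeter",
        "-n",                     # non-GUI mode
        "-t", plan,               # test plan to run
        "-l", results_file,       # file to write sample results to
        f"-Jserver={server}",     # read in the plan as ${__P(server)}
        f"-Jport={port}",         # read in the plan as ${__P(port)}
    ]


def run_for_targets(plan, targets):
    """Run the plan once per (server, port) pair, sequentially."""
    for server, port in targets:
        results_file = f"results_{server}_{port}.jtl"
        subprocess.run(jmeter_command(plan, server, port, results_file), check=True)


# Example with placeholder hosts:
# run_for_targets("testplan.jmx", [("www.firsturl.com", 80), ("www.secondurl.com", 8080)])
```

Each run writes its own results file, so the outcomes for different hosts stay separate.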

How do I debug server side code in Aptana Jaxer?

I'm trying to debug some server-side JavaScript code running in Aptana Jaxer and I'm not having any success. I haven't even been able to find any tutorials or posts about this issue. Does anyone know if it's possible and if so, what am I missing?

You can set Jaxer.Config.DEV_MODE = true; to get some error information in your browser.
Also use Jaxer.Log to debug.
Hope this helps a bit.

Jaxer and Aptana Studio do not yet have the ability to debug remote scripts from the client side. That is, you can't single-step into a callback and have your code window show you the first line of code in the remote method. This is on their wishlist, of course, but it'd be pretty tricky to do well.
Personally, I use logging. Jaxer has strong facilities for this, in Jaxer.Log.*.
A lot of people sneer at "printf() debugging", but the fact is, it works, and it's often less trouble to set up than an interactive debugger, especially for server applications and remote method invocation. You just sprinkle logging messages wherever you want to know the state of the system at that point, then make your app try to do the thing that's failing. Study the logs, rinse, repeat.

tail -f /opt/AptanaJaxer/logs/jaxer.log
