Is it possible to implement Scrapy in a desktop app? - web-scraping

Can Scrapy be embedded in my custom desktop app so that it automates scraping a list of webpages and exports the results as a table into a database?

Yes, you can. I have done the same with Python, PyQt5, Scrapy, and a database.
1. Create the application frontend using PyQt5 (a GUI toolkit; Qt Designer is available for layout).
2. Create a Scrapy project and run the spider from a Python script (see the sketch after this list).
If you are deploying the spider in any service, use that service's API call instead.
3. Check the database connectivity from the frontend itself;
if the check succeeds, save the spider's output to the corresponding database.
4. Use PyInstaller or py2exe to convert the code into a standalone executable.
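A minimal sketch of step 2, using Scrapy's documented CrawlerProcess API to launch a spider from plain Python; the spider name "quotes" and the file name are hypothetical.

    # run_spider.py - minimal sketch: launch a Scrapy spider from a plain
    # Python script (e.g. from a PyQt5 button handler). Assumes the script
    # runs inside a Scrapy project that defines a spider named "quotes"
    # (hypothetical name).
    from scrapy.crawler import CrawlerProcess
    from scrapy.utils.project import get_project_settings

    def run_spider():
        process = CrawlerProcess(get_project_settings())
        process.crawl("quotes")   # spider name as registered in the project
        process.start()           # blocks until the crawl finishes

    if __name__ == "__main__":
        run_spider()

Note that CrawlerProcess starts Twisted's reactor, which can only run once per process, so GUI apps often launch the crawl in a subprocess (e.g. calling "scrapy crawl quotes") to keep the UI responsive.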

Yes, you can do that. Use Scrapy (Python) to scrape the web pages and store the desired data in any database you want, then periodically pick the data up from the database and show it in the desktop app. The desktop app can be written in any framework or language; a sketch of the polling half follows.
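A hedged sketch of the desktop side of this approach, polling the database on a timer; the SQLite file scraped.db and the items(url, title) table are hypothetical names.

    # poll_db.py - sketch: periodically pick up scraped rows from the
    # database and hand them to the UI. The file scraped.db and the
    # items(url, title) table are hypothetical.
    import sqlite3
    import time

    def fetch_rows(db_path="scraped.db"):
        with sqlite3.connect(db_path) as conn:
            return conn.execute("SELECT url, title FROM items").fetchall()

    while True:
        for url, title in fetch_rows():
            print(url, title)   # in a real app: refresh the table widget
        time.sleep(60)          # poll once a minute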

Related

How can I run a standalone script within a next.js project?

Within a next.js project, I wish to run a standalone script written in TypeScript. This script needs to import various source files in my next.js project, but it just performs some database queries and writes some output to stdout, so it doesn't need to start a webserver or anything like that. What is the best way to do this?
If I just run node myscript.js then I get a ton of import errors, presumably because my typescript code is not directly parseable by node.
What is the canonical way to run a standalone script in a next.js project?

Connect Airflow to Google Data Fusion

I'd like to write a Python script which manages my Google Data Fusion pipelines and instances (creates new ones, deletes them, starts them, etc.). For that purpose I use Airflow installed as a library. I've read some tutorials and documentation but I still can't make the script connect to the Data Fusion instance. I've tried to use the following string:
export AIRFLOW_CONN_GOOGLE_CLOUD_DEFAULT='google-cloud-platform://?extra__google_cloud_platform__key_path=%2Fkeys%2Fkey.json&extra__google_cloud_platform__scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcloud-platform&extra__google_cloud_platform__project=airflow&extra__google_cloud_platform__num_retries=5'
with my JSON key file and project ID, but it still doesn't work. Can you give me an example of creating that connection?
You can find an example Python script here:
https://airflow.readthedocs.io/en/latest/_modules/airflow/providers/google/cloud/example_dags/example_datafusion.html
This page provides a breakdown for each Data Fusion Operator if you would like to learn more about them:
https://airflow.readthedocs.io/en/latest/howto/operator/gcp/datafusion.html
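For orientation, a minimal DAG in the spirit of the linked example, using the Google provider's CloudDataFusionStartPipelineOperator; the region, instance, and pipeline names are placeholders. The operator's default gcp_conn_id is "google_cloud_default", which is exactly the connection defined by the AIRFLOW_CONN_GOOGLE_CLOUD_DEFAULT variable above.

    # datafusion_dag.py - minimal sketch of a DAG that starts a Data Fusion
    # pipeline. Region, instance, and pipeline names are placeholders;
    # requires the apache-airflow Google provider package.
    from datetime import datetime

    from airflow import DAG
    from airflow.providers.google.cloud.operators.datafusion import (
        CloudDataFusionStartPipelineOperator,
    )

    with DAG(
        dag_id="example_datafusion",
        start_date=datetime(2024, 1, 1),
        schedule_interval=None,   # trigger manually
        catchup=False,
    ) as dag:
        start_pipeline = CloudDataFusionStartPipelineOperator(
            task_id="start_pipeline",
            location="europe-west1",      # placeholder region
            instance_name="my-instance",  # placeholder
            pipeline_name="my-pipeline",  # placeholder
            # gcp_conn_id defaults to "google_cloud_default", i.e. the
            # connection defined by AIRFLOW_CONN_GOOGLE_CLOUD_DEFAULT above
        )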

How can I use the SonarQube web service API for reporting purposes?

I want to create a custom report. The response format of the SonarQube web service API /api/issues/search is JSON or XML. How can I use that response to create an HTML or CSV file using "unix shell without using command line tools", so that I can use it as a report? Or is there a better way to achieve this?
You can generate an HTML file if you run an analysis in preview mode: http://docs.sonarqube.org/pages/viewpage.action?pageId=6947686
It looks as if the SonarQube team has been working hard to not allow people to do this. They appear to want people to purchase an Enterprise Subscription in order to export reports.
An old version of sonar-runner (now called sonar-scanner) had an option to allow local report output. But that feature is "no more supported".
ERROR: The preview mode, along with the 'sonar.analysis.mode' parameter, is no more supported. You should stop using this parameter.
Looks like version 2.4 of Sonar Runner does what you want, if you can find it. Of course, only 2.5RC1 is available on the site now.
Using the following command should work on version 2.4:
sonar-runner -Dsonar.analysis.mode=preview -Dsonar.issuesReport.html.enable=true
There are at least two open-source projects that query the SQ API to generate reports in various formats.
https://github.com/cnescatlab/sonar-cnes-report/tree/dev (Java)
https://github.com/soprasteria/sonar-report (JavaScript/Node)
At the time of writing both are active.
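If rolling your own is acceptable, a small script against /api/issues/search is enough for a CSV report; here is a hedged sketch in Python, where the server URL, project key, and token are placeholders.

    # sq_report.py - hedged sketch: query the SonarQube web API and write a
    # CSV report. Server URL, project key, and token are placeholders.
    import csv
    import requests

    BASE = "https://sonarqube.example.com"   # placeholder server
    AUTH = ("my_token", "")                   # token as username, blank password
    PROJECT = "my:project"                    # placeholder project key

    def fetch_issues():
        page, issues = 1, []
        while True:
            resp = requests.get(
                BASE + "/api/issues/search",
                params={"componentKeys": PROJECT, "ps": 500, "p": page},
                auth=AUTH,
            )
            resp.raise_for_status()
            data = resp.json()
            issues.extend(data["issues"])
            if page * 500 >= data["total"]:   # the API caps results at 10,000
                break
            page += 1
        return issues

    with open("report.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["key", "severity", "component", "line", "message"])
        for issue in fetch_issues():
            writer.writerow([issue["key"], issue.get("severity"),
                             issue["component"], issue.get("line"),
                             issue["message"]])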

Quartz.Net (2.2.3) Scheduling New Jobs

I am running the Quartz.Net server as a Windows service, as described in the documentation. I am trying to understand how I can create new jobs for Quartz to schedule, without the need to rebuild the Quartz.Net server application every time.
I would like to be able to add new jobs from an exe, dll, or other options (suggestions welcome), so I can add jobs dynamically. From what I can tell, it seems all jobs must be defined up front and built into the server; from there the user can pass parameters and enable triggers via an XML file. I am using MS SQL Server instead of an XML file for the persistence layer.
My use case is I need to generate reports at particular times, but the users can create new reports after launch of my application. I am using Dev Express for my reporting (not sure if this matters).
Any guidance is very appreciated.
You should check out the work Tolis Bekiaris did on the eXpand Framework's JobScheduler. It's a module for DevExpress's XAF and Quartz.NET which should give you plenty of sample code, especially if you are already using XPO for your data.
You can get the source code here.
Or alternatively, it's on Github.
You'll find the job scheduler code in eXpand/Xpand/Xpand.ExpressApp.Modules/JobScheduler.

Is there a good way to create a recurring import for an ASP.NET site?

The site I'm working on is running Windows Server 2003 and SQL Server 8 (2000?), and ASP.NET 3.5.
I need to have some sort of script or application run to import data from an FTP'd text file, into the database. There is already a site running on the machine, that uses the current database. Can I use a scheduled task to reliably kick off some sort of .aspx page that will import the data? Or is there a better approach?
What about making sure that no one else can access the page that runs the import? I don't want random users running the import!
Thanks in advance!
P.S. Some processing needs to occur on the data before it's inserted, i.e. lookups, conditionals, etc., so the plain DB tools aren't robust enough (I think). I hate DTS, and SSIS is not available in this version, I think.
If you want a C# app to handle your import, I would suggest a windows application (exe) without a form (better than a console app because it does not pop up any UI whenever it runs). Have it run every so often (every minute) by a scheduled task.
Why would you use ASP.NET? Depending on the complexity of the job you could either load it directly to the database (BULK LOAD) or use DTS (SQL Server 2000) or SSIS (SQL Server 2005/2008) if more complex processing is needed.
DTS and stored procedures in a job.
BCP and stored procedures in a job.
You say you need to do a lot of lookups and conversions? SQL is good at that, and good at doing it fast. It can seem a little intimidating at first, but it's not hard.
Run a BULK INSERT or bcp to import the data instead; see http://msdn.microsoft.com/en-us/library/aa173839(SQL.80).aspx
I'll echo other people here - you don't want to have a scheduled task hit a web page. SQL Server provides some good data import options, or you could just write a simple windows program and run it as a scheduled task.
Another option would be to write a windows service that watches your FTP directory and does the import.
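For illustration only, here is the shape of that watch-and-import loop; the thread assumes a .NET Windows service, but the pattern is the same in any language (Python here), and the folder names and record handling are placeholders.

    # watch_import.py - illustration of the watch-the-drop-folder idea.
    # Folder names are placeholders; the record handling stands in for the
    # lookups/conditionals mentioned in the question.
    import os
    import time

    INBOX = r"C:\ftp\inbox"       # placeholder: where FTP'd files land
    DONE = r"C:\ftp\processed"    # placeholder: archive folder

    def import_file(path):
        with open(path) as f:
            for line in f:
                record = line.rstrip("\n").split("\t")
                # placeholder for the lookups/conditionals, followed by the
                # actual INSERT into the database
                print(record)

    os.makedirs(DONE, exist_ok=True)
    while True:
        for name in os.listdir(INBOX):
            src = os.path.join(INBOX, name)
            import_file(src)
            os.replace(src, os.path.join(DONE, name))  # move so each file imports once
        time.sleep(30)   # poll every 30 seconds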
As others have said, probably a separate console application (triggered by a scheduled task) or a windows service would be the best option for this scenario.
On the other hand, if you already have all the required functionality available in the web app running on the server, you could set up a scheduled task that starts a script (VBScript, JScript), which in turn calls a page of the web app.
To add some security (e.g. to prevent arbitrary users from calling that page), you could add code to the page that checks whether it was requested via http://localhost. This would at least prevent the page from being called from a remote client.
