Automatically fetch data every 10 minutes (Simple HTML DOM)

I'm working on a project where I want to fetch last-minute flights and save them into my database. The problem is that I don't want to scrape every time a user visits the website and then save the results, because that would only create a lot of duplicates. Can I somehow make the website fetch the data on a schedule and then delete the previous records in the database?

If you want the OS to execute a task periodically, a cron job is what you want.
Either have the cron job call your program from the command line, or use wget to fetch the page that triggers the data fetching.
More on cron jobs:
http://www.thesitewizard.com/general/set-cron-job.shtml
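A minimal sketch of the "delete previous records" part, assuming a Python script run by cron and an SQLite table named `flights` with the columns shown. The table, its columns, `fetch_flights`, `main` and the script path are all hypothetical. The delete and the inserts run in one transaction, so other connections see either the old snapshot or the new one, never an empty table halfway through a refresh.

```python
import sqlite3
from typing import Iterable, List, Tuple

# Hypothetical record shape: (flight_no, origin, destination, price)
Flight = Tuple[str, str, str, float]


def fetch_flights() -> List[Flight]:
    """Hypothetical placeholder for the scraping step (e.g. the Simple HTML DOM code)."""
    raise NotImplementedError("plug the scraper in here")


def replace_flights(conn: sqlite3.Connection, flights: Iterable[Flight]) -> int:
    """Swap the old snapshot for the new one inside a single transaction."""
    rows = list(flights)  # materialize first, before touching the table
    with conn:  # commits on success, rolls back if any statement fails
        conn.execute("DELETE FROM flights")
        conn.executemany(
            "INSERT INTO flights (flight_no, origin, destination, price) "
            "VALUES (?, ?, ?, ?)",
            rows,
        )
    return len(rows)


def main(db_path: str = "flights.db") -> None:
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS flights "
            "(flight_no TEXT, origin TEXT, destination TEXT, price REAL)"
        )
        replace_flights(conn, fetch_flights())
    finally:
        conn.close()

# Hypothetical crontab entry running the script every 10 minutes:
#   */10 * * * * /usr/bin/python3 /path/to/refresh_flights.py
# (the script would call main() as its last line)
```

Because `fetch_flights()` is evaluated before `replace_flights` starts, a failed scrape raises before anything is deleted and the previous rows stay in place.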

Related

How to get fresh db data with wc_get_order

I have a script that runs as a daemon.
Every so often, this script is supposed to retrieve order data and then process it.
While the script is running, it retrieves the data for a given order whose status is, for example, "on-hold". If I then change that order's status to "processing", the script still sees "on-hold" when it retrieves the data via wc_get_order, because it uses an internal WP cache that is not refreshed.
So how do I retrieve the most current order data from the database?
I searched the source code for a parameter that forces the data to be read from the database, but did not find one.
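After hours of searching, I found a solution. These three lines clear the WordPress object cache:
wp_using_ext_object_cache( false ); // tell WordPress not to use an external (persistent) object cache
wp_cache_flush();                   // drop everything currently held in the object cache
wp_cache_init();                    // re-initialize a fresh object cache
After that, wc_get_order reads the current order data instead of the stale cached copy.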
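To see why a long-running daemon gets stale data, here is a toy read-through cache model in Python. It is not WooCommerce's or WordPress's actual code; `CachedOrderStore`, `flush` and `poll` are hypothetical names, and `flush` only plays the role that `wp_cache_flush()` plays above. A normal page request starts with an empty cache, but a daemon keeps one process alive, so whatever it cached first is what it keeps seeing until the cache is cleared.

```python
from typing import Callable, Dict, Iterable


class CachedOrderStore:
    """Toy model of a per-process object cache sitting in front of a database."""

    def __init__(self, load_from_db: Callable[[int], dict]) -> None:
        self._load = load_from_db
        self._cache: Dict[int, dict] = {}

    def get_order(self, order_id: int) -> dict:
        # Read-through: the database is only queried on a cache miss.
        if order_id not in self._cache:
            self._cache[order_id] = self._load(order_id)
        return self._cache[order_id]

    def flush(self) -> None:
        # Drop every cached entry so the next read goes to the database.
        self._cache.clear()


def poll(store: CachedOrderStore, order_ids: Iterable[int]) -> Dict[int, str]:
    """One daemon cycle: flush first so every read in this cycle is fresh."""
    store.flush()
    return {oid: store.get_order(oid)["status"] for oid in order_ids}
```

Without the `flush()` call at the top of each cycle, a status change made in the database after the first read would never become visible to the daemon.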

I have a news app that fetches RSS feeds through a cron job every hour. Because the list of files (URLs) it has to fetch runs into the hundreds, it gets stuck

I have a news app that fetches RSS feeds through a cron job every hour. Because the list of files (URLs) it has to fetch runs into the hundreds, it gets stuck. Is it possible to write a cron script which is conditional, like fetching only 10 at a time from the list of hundreds and run
You can do this by adding a timestamp field to the database table and updating it on every fetch, so each time cron fetches a URL, its last-fetch time is recorded. Then query the URLs ordered by last-fetch time in ascending order (oldest first) with LIMIT 10.
That way, each run fetches only the 10 URLs that have gone longest without being fetched. I think this will solve your problem.
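A sketch of that approach, assuming an SQLite table `feeds(url, last_fetched)` where `last_fetched` holds a Unix timestamp, or NULL for a URL that has never been fetched. The table and function names are hypothetical. Note the ascending sort: the oldest fetch time comes first, which is what selects the URLs that have waited longest.

```python
import sqlite3
import time
from typing import List, Optional


def next_batch(conn: sqlite3.Connection, batch_size: int = 10) -> List[str]:
    """Return the URLs that have gone longest without a fetch."""
    # In SQLite (and MySQL) NULLs sort first in ASC order,
    # so never-fetched URLs are picked before everything else.
    rows = conn.execute(
        "SELECT url FROM feeds ORDER BY last_fetched ASC LIMIT ?",
        (batch_size,),
    ).fetchall()
    return [row[0] for row in rows]


def mark_fetched(
    conn: sqlite3.Connection, urls: List[str], now: Optional[float] = None
) -> None:
    """Stamp the URLs just processed so they move to the back of the queue."""
    stamp = time.time() if now is None else now
    with conn:  # one transaction for the whole batch
        conn.executemany(
            "UPDATE feeds SET last_fetched = ? WHERE url = ?",
            [(stamp, url) for url in urls],
        )
```

Each cron run would call `next_batch`, fetch those feeds, then call `mark_fetched` on them; over successive runs the whole list is worked through 10 URLs at a time.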

Unordering jobs that are scheduled to run in Control-M

How do I stop my running Control-M jobs from executing? Basically, I want them to stop running and to remove them from the Monitoring view.
In Control-M version 8, you can unschedule a whole folder by selecting Manual Order as the Order Method. If it's just one specific job, you can edit that job's scheduling properties directly and select Manual Order there too.
Right-click your job and select "Hold". More actions, including Delete, should then be available.
In Control-M V7 and V8/9, open the Desktop/Planning environment, load the schedule table, select the individual job, clear the Months selection on the Scheduling tab, then Verify and Check in. This prevents the job from being loaded and executed (so, obviously, it will not show up in the Monitoring window).
Note: in Planning, always load the whole scheduling table, not individual jobs or sub-applications. Checking in overwrites the CtmServer database.

How to reschedule a coordinator job in OOZIE without restarting the job?

When I change the start time of a coordinator job in job.properties, the job does not pick up the new time; it keeps running at the old scheduled time.
Old job.properties:
startMinute=08
startTime=${startDate}T${startHour}:${startMinute}Z
New job.properties:
startMinute=07
startTime=${startDate}T${startHour}:${startMinute}Z
The job is not running at the new time (minute 07); it still runs at minute 08 of every hour.
Can you please tell me how to make the job pick up the updated properties (the changed timing) without restarting or killing it?
You can't change the timing of a coordinator through any of the methods Oozie (v3.3.2) provides. When you submit a job, its properties are stored in the database, whereas the actual workflow lives in HDFS.
Every time the coordinator executes, the workflow must be at the path specified in the properties at submission time, but the properties file itself is not needed. In other words, the properties file plays no part after the job has been submitted.
One hack is to update the time directly in the database with an SQL query, but I am not sure about the implications; the property might become inconsistent across the database.
You have to kill the job and resubmit a new one.
Note: Oozie does provide a way to change the concurrency, endtime and pausetime of a coordinator, as described in the official docs.
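The kill-and-resubmit route can be scripted. This sketch only builds argument lists for the `oozie` CLI and runs them with `subprocess`; it assumes the standard `oozie job -kill`, `-config ... -run` and `-change ... -value` options, and the server URL, job ID and properties path you pass in are your own. The function names are hypothetical.

```python
import subprocess
from typing import Dict, List

# Coordinator attributes that `oozie job -change` can modify in place.
CHANGEABLE = {"endtime", "concurrency", "pausetime"}


def resubmit_commands(oozie_url: str, job_id: str, config_path: str) -> List[List[str]]:
    """Kill the running coordinator, then submit and start it again from the edited properties."""
    return [
        ["oozie", "job", "-oozie", oozie_url, "-kill", job_id],
        ["oozie", "job", "-oozie", oozie_url, "-config", config_path, "-run"],
    ]


def change_command(oozie_url: str, job_id: str, values: Dict[str, str]) -> List[str]:
    """Build an in-place change; the start time is deliberately not accepted."""
    bad = set(values) - CHANGEABLE
    if bad:
        raise ValueError(f"cannot change {sorted(bad)} on a running coordinator")
    # Passed as a single argv element, so the ';' separators need no shell escaping.
    value = ";".join(f"{key}={val}" for key, val in values.items())
    return ["oozie", "job", "-oozie", oozie_url, "-change", job_id, "-value", value]


def run_all(commands: List[List[str]]) -> None:
    for cmd in commands:
        subprocess.run(cmd, check=True)  # stop at the first failing step
```

For example, `run_all(resubmit_commands("http://localhost:11000/oozie", "<coordinator-id>", "job.properties"))` would kill the old coordinator and submit a new one that reads `startMinute=07`.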

WordPress Multi-Master DB Replication: Deadlock when updating cron table in wp_options

We're running WordPress in an environment that features a multi-master DB behind a load balancer. The error log was filling up with a deadlock error when WP tried to update the cron table in wp_options. We disabled wp-cron altogether but are still seeing the error, so, two questions:
1) What causes the cron table in wp_options to be updated?
2) It appears to run on every page load. Can this be disabled and a cronjob setup to run it periodically in crontab?
Thanks
WordPress uses wp-cron.php to run scheduled tasks for users who don't have access to, or don't want to set up, Unix cron jobs. It looks at the scheduled jobs in the cron table in wp_options and executes every job whose scheduled time has arrived or passed.
wp-cron.php uses wp-includes/cron.php (the WordPress Cron API) to run scheduled jobs. In cron.php you'll find a number of functions that update the cron table; they all deal with scheduling events.
Any part of WordPress or a plugin that needs a scheduled event uses the Cron API to create it. However, scheduling an event (even one that already exists) updates the cron table in wp_options. So even with wp-cron.php disabled entirely, WordPress core and plugins still load and schedule their events, and try to update the cron table in the process.
I haven't figured out exactly why the deadlock occurs, other than that it must be related to the DB/site configuration, but I do now know that WordPress is behaving as designed.
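A simplified Python model of the check described above: events are keyed by Unix timestamp, and every event whose timestamp is not later than "now" is due. This is a sketch, not WordPress's code; the real cron option is a nested PHP array (timestamp, then hook, then argument hash), and after running a recurring event WordPress reschedules it and saves the cron option again, which is one more write to wp_options.

```python
from typing import Dict, List


def due_hooks(cron: Dict[int, List[str]], now: int) -> List[str]:
    """Return the hook names due at `now`, oldest timestamp first."""
    due: List[str] = []
    for timestamp in sorted(cron):
        if timestamp > now:
            break  # every remaining event is scheduled in the future
        due.extend(cron[timestamp])
    return due
```

Sorting the timestamps first lets the loop stop at the first future event instead of scanning the whole schedule.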
I've run into this same issue: the databases would go out of sync very quickly. Certain plugins made it happen faster (they scheduled lots of cron jobs), but even with them disabled, the errors would eventually block replication.
I was able to keep replication working by doing two things.
The first, in my.ini, was to add:
slave-skip-errors = 1062
This tells MySQL replication to skip statements that fail with error 1062 (duplicate entry for a key) instead of stopping. My cluster is set up as active-passive, so in theory there should be no "real" writes to the passive MySQL node unless the active node is down, in which case there will be no "real" writes to the failed active node. The only data written to the passive node comes from wp-cron jobs, which (in theory) are also running on the active node.
The second, in each site's wp-config.php, was to add:
/** disable cron */
define('DISABLE_WP_CRON', true);
This stops wp-cron from running at all. Either of these changes should work on its own.
Another option would be to disable wp-cron but leave full database syncing in place, and schedule a script to call each site's wp-cron.php (doing manually what the wp-cron mechanism does automatically). That way, cron runs only on the active node, and the data should be synced over to the passive node without problems.
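A sketch of that scheduled script, assuming it is run from the crontab of the active node only and that each site's wp-cron.php is reachable over HTTP at the site root. The site URLs, script path and function names are hypothetical; the script just issues a plain GET to each site's wp-cron.php.

```python
import urllib.request
from typing import Any, Callable, Iterable, List


def cron_urls(site_urls: Iterable[str]) -> List[str]:
    """Map each site root to its wp-cron.php endpoint."""
    return [url.rstrip("/") + "/wp-cron.php" for url in site_urls]


def trigger_all(
    site_urls: Iterable[str],
    opener: Callable[..., Any] = urllib.request.urlopen,
    timeout: float = 30.0,
) -> List[int]:
    """GET every site's wp-cron.php and collect the HTTP status codes."""
    statuses: List[int] = []
    for url in cron_urls(site_urls):
        with opener(url, timeout=timeout) as response:
            statuses.append(response.status)
    return statuses

# Hypothetical crontab entry on the active node only:
#   */5 * * * * /usr/bin/python3 /path/to/trigger_wp_cron.py
# where the script calls trigger_all([...your site URLs...]).
```

The `opener` parameter exists only so the HTTP call can be swapped out, for example in a test; in normal use the default `urlopen` is used.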
