How to scrape a site protected by Cloudflare in 2022 - web-scraping

I used the code from the Stack Overflow question
How to scrape site protected by cloudfare
but the Cloudflare module mentioned there no longer works: the response is a "Please Wait... | Cloudflare" page. Is there a way to scrape such a website with Python or Node as of 2022?

Related

Wordpress website stops working for a particular IP after updating a page

Weirdest problem ever. I use WordPress (latest version) for my website, together with the WPBakery page builder. It has worked without problems for years. All my plugins are updated to the latest version, except the WPBakery page builder, because I'm not paying for it.
Today I decided to update the website. I went to the contact page, changed one word, and clicked "Update". The request ended in a timeout and the website stopped working.
"This site can’t be reached. Try:
Checking the connection
Checking the proxy and the firewall
ERR_CONNECTION_TIMED_OUT"
In Opera the error is DNS_PROBE_FINISHED_NXDOMAIN.
Then I tried with a VPN and the website worked, so I tried to update the page again: timeout again, and the website was down for that IP too.
Then I changed the VPN IP again and my website worked on the new IP. I changed a blog post with no problem and added new pages with the testimonial plugin with no problem. Thinking that everything was OK, I went to the contact page again, changed one word, and boom: timeout, and the website was dead for that IP.
Now I have run out of VPNs to test and I can't access my website anymore. This problem is very strange.
I tried https://www.isitdownrightnow.com/ and it says that my site is up, but I can't access it anymore after this change, which produced the same awful result 3 or 4 times.
Does anyone know what causes this mysterious issue and how to solve it?
The following is the list of plugins I use:
Akismet
Contact Form 7
Cookie Notice
Google Analytics for WordPress by MonsterInsights
Hello Dolly
Hide Featured Image
Jivo Chat
Maintenance and WP Maintenance mode
PageBuilder by Site Origin and WPBakery page builder
Read More without refresh and WP Show More
ShortPixel image optimizer
Show Hide Author
Simple Custom CSS
Slider Revolution
ThemesFlat by Themesflat.com
WP Downgrade | Specific core version
I don't know why it happened, but I discovered that a ModSecurity rule blocked my IP whenever I attempted to update my page. It seems to be a security measure that protects WordPress against attacks. The block shows up in the server log.
To solve it I removed the rule for my website, which may cause some security issues, but who cares about security.
I still don't know what about the page update triggers the rule.

Website not posting to Facebook: security & app id issues

I'm a new WordPress designer. My site runs Tesseract Theme and is built with Beaver Builder.
PROBLEM: When I posted my website (https://louiseclark.tech) on Facebook, the post was removed after a couple of minutes. Now when I try to post my site, I get this message: "It looks like a link you're sharing might be unsafe. If you can, please remove this link: louiseclark.tech Note: The unsafe link might be on the page you’re linking to."
What I've done to try and resolve:
When I ran my site through the Facebook debugger I got this message:
The 'fb:app_id' property should be explicitly provided, Specify the app ID so that stories shared to Facebook will be properly attributed to the app. Alternatively, app_id can be set in url when open the share dialog.
I created an app id following this instructional video: https://www.youtube.com/watch?v=V97h03H21y0
I pasted my app id into my Yoast SEO plugin under the Facebook category.
I checked my sitemap in Google Webmaster Tools: everything is verified and the sitemap is set.
The SSL certificate is set; I checked with my hosting company, SiteGround. When I asked them about this problem, they didn't really feel that the security issues were on their side.
I've reported this problem to the black hole that is Facebook support.
Thank you for any insight.
In case anyone sees this thread, I found the solution.
When I moved my WordPress sites to managed WordPress hosting, I also migrated them to https with SSL certificates. The pages were migrated and displayed over https just fine, but the images still held their old URLs (http).
I did two things:
I installed SSL Content Fixer plugin. This worked for some images but not others.
I installed the Better Search Replace plugin. I had found the specific insecure images using Firefox. From my page in Firefox, I went to:
Tools -> Page Info -> Media
This showed me every image/js/css call on the page. Finding these images allowed me to use the plugin to replace their URLs.
It worked. I'm quite sure knowing how to code my site would be much better in this situation. But I'm a newbie and this is what I could come up with.
What I learned: it's a red flag when a secure site embeds non-secure objects/images.
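
For anyone who prefers a script to clicking through Page Info, the same check can be sketched in Python. This is only a rough sketch, not what the plugins do internally: the tag and attribute lists are assumptions and not exhaustive, `find_insecure_resources` is a made-up helper name, and the sample HTML uses example.com placeholders.

```python
from html.parser import HTMLParser

# Assumption: these attributes cover the common ways a page references a resource.
URL_ATTRS = {"src", "href", "data-src", "poster"}
# Tags that usually cause a fetch on page load. A plain <a href> is only a link,
# so it is ignored. Note that <link rel="canonical"> would also be reported,
# even though browsers do not fetch it.
RESOURCE_TAGS = {"img", "script", "link", "iframe", "video", "audio", "source", "embed"}


class InsecureResourceFinder(HTMLParser):
    """Collects (tag, url) pairs for embedded resources that still use http://."""

    def __init__(self):
        super().__init__()
        self.insecure = []

    def handle_starttag(self, tag, attrs):
        if tag not in RESOURCE_TAGS:
            return
        for name, value in attrs:
            if name in URL_ATTRS and value and value.lower().startswith("http://"):
                self.insecure.append((tag, value))


def find_insecure_resources(html):
    finder = InsecureResourceFinder()
    finder.feed(html)
    finder.close()
    return finder.insecure


sample = """
<html><body>
<img src="http://example.com/wp-content/uploads/photo.jpg">
<img src="https://example.com/wp-content/uploads/logo.png">
<script src="http://example.com/old.js"></script>
<a href="http://example.com/about">About</a>
</body></html>
"""

for tag, url in find_insecure_resources(sample):
    print(tag, url)
```

The URLs it reports are the ones to feed into a search-and-replace of http to https.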

Getting an Adsense account approved on a Meteor website

I am having difficulty getting approved with Adsense. It seems there is not enough content but I have many blog articles, no inappropriate content or copyright infringements and I have the Ad code in place within the footer.
I believe the issue may be caused by my site using client-side rendering (the Meteor JavaScript framework).
So this means that if I do:
$> curl http://www.dales-sports-media.com
I get mostly empty html (meta and html tags, but nothing in the body)
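
The same check can be written as a small script. This is a minimal sketch that only measures how much visible text the raw HTML carries inside `<body>`; the helper name `visible_body_text` and both sample documents are invented for illustration, and no JavaScript is executed, which is the point of the test.

```python
from html.parser import HTMLParser


class BodyTextCollector(HTMLParser):
    """Collects visible text inside <body>, ignoring <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.in_body and not self.skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_body_text(html):
    collector = BodyTextCollector()
    collector.feed(html)
    collector.close()
    return " ".join(collector.chunks)


# What a client-rendered app typically returns before any JavaScript runs.
client_rendered = (
    '<html><head><title>App</title><script src="/app.js"></script></head>'
    "<body></body></html>"
)
# What a crawler would need to see instead.
server_rendered = (
    "<html><body><h1>Match report</h1><p>Full article text.</p></body></html>"
)

print(repr(visible_body_text(client_rendered)))
print(repr(visible_body_text(server_rendered)))
```

To run it against a live page, the HTML could be fetched with `urllib.request.urlopen(url).read().decode()` and passed to `visible_body_text`.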
Sharing articles from my site to Facebook and Twitter seems to work fine
Is it possible that Google's AdSense approval bot is unable to see the fully rendered page?
Has anyone successfully applied for an AdSense account with a Meteor web app?
Thanks,
Mick
What you need is Prerender, a service that renders and caches your pages and serves that version to bots, so they get the full HTML body.
You should set up nginx to be in front of your Meteor app, so that nginx will use proxy_pass to pass traffic from port 80 into your Meteor app on localhost port 3000, for example.
Then use this nginx config file as a guideline to set up Prerender: https://gist.github.com/thoop/8165802
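
The routing decision behind that setup can be sketched in Python. This is not the logic of the linked gist, only an illustration of the idea: the marker list is deliberately short and illustrative, and `should_prerender` is a hypothetical helper name.

```python
# Illustrative crawler markers only; a real deployment would use a longer,
# maintained list. "mediapartners-google" is the AdSense crawler's user agent.
BOT_MARKERS = (
    "googlebot",
    "mediapartners-google",
    "bingbot",
    "facebookexternalhit",
    "twitterbot",
)
# Static assets are never worth prerendering.
STATIC_EXTENSIONS = (".js", ".css", ".png", ".jpg", ".gif", ".ico", ".svg")


def should_prerender(user_agent, path):
    """Return True when a request should get the cached, fully rendered page."""
    clean_path = path.split("?", 1)[0].lower()
    if clean_path.endswith(STATIC_EXTENSIONS):
        return False
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


crawler = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
browser = "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"

print(should_prerender(crawler, "/blog/post-1"))
print(should_prerender(browser, "/blog/post-1"))
print(should_prerender(crawler, "/app.js"))
```

Requests for which this returns True would go to the prerendered cache; everything else would be proxied to the Meteor app as usual.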
If you're limited and can't install your own web server, make sure you've tried the spiderable package.
$ meteor add spiderable

Can I mirror wordpress public REST API to my self hosted wordpress site?

I have a self-hosted WordPress site, for example www.example.com.
With the Jetpack plugin installed and enabled, I can access the site's posts and other data through the official WordPress REST API at "https://public-api.wordpress.com".
But I cannot reach "https://public-api.wordpress.com" from my country, since it is blocked by a firewall deployed there.
So my question is:
Can I mirror this feature on my self-hosted WordPress site so that my local app can access the content directly?
I have never used the official API, but I have used these plugins to create an API for my WordPress site: https://wordpress.org/plugins/json-api/ and https://wordpress.org/plugins/json-rest-api/.
The plugins are pretty functional; you just need to install them and start using the REST API. They may not have many features, but they get the job done. For advanced stuff you will have to modify the plugin to your needs.
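
As a rough sketch of what a client for a self-hosted API could look like: the code below assumes the site exposes the `/wp-json/wp/v2/posts` route that WordPress core has shipped since version 4.7; the plugins above may use different routes, so check their documentation. `posts_url` and `post_titles` are hypothetical helper names, and the sample response is made up.

```python
import json
from urllib.parse import urlencode, urljoin


def posts_url(site, per_page=5, page=1):
    """Build the posts listing URL for a self-hosted site (assumed core route)."""
    query = urlencode({"per_page": per_page, "page": page})
    return urljoin(site.rstrip("/") + "/", "wp-json/wp/v2/posts") + "?" + query


def post_titles(response_body):
    """Extract rendered titles from a JSON posts listing."""
    return [post["title"]["rendered"] for post in json.loads(response_body)]


# Made-up response in the shape the core route returns.
sample_body = (
    '[{"id": 1, "title": {"rendered": "Hello world"}},'
    ' {"id": 2, "title": {"rendered": "Second post"}}]'
)

print(posts_url("https://www.example.com"))
print(post_titles(sample_body))
```

In a real app the body would come from the site itself, for example `urllib.request.urlopen(posts_url("https://www.example.com")).read()`, so no request ever goes through public-api.wordpress.com.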

Integrating wordpress with website written in JSP

I have a website developed with Spring MVC and JSP. I want to integrate a WordPress blog into it, i.e., install WordPress on the web server and link it from the site menu.
The site is hosted on AWS with Apache Tomcat, and the database is RDS.
I have read some pointers on integrating but most of them suggest using iframes. Is that the only solution? Can someone share some thoughts?
Appreciate your help
Best,
Donald
See: http://codex.wordpress.org/XML-RPC_WordPress_API
WordPress has an XML-RPC API. You can use it to fetch content and to submit new blog posts.
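
A minimal sketch of what such a call looks like on the wire, using Python's standard `xmlrpc.client` to build a `wp.getPosts` request without contacting any server. The credentials are placeholders and `build_get_posts_request` is a hypothetical helper name; a JSP site would send the same XML from Java.

```python
import xmlrpc.client


def build_get_posts_request(username, password, number=5, blog_id=1):
    """Serialize a wp.getPosts call; the filter struct limits the post count."""
    params = (blog_id, username, password, {"number": number})
    return xmlrpc.client.dumps(params, methodname="wp.getPosts")


# Placeholder credentials, for illustration only.
payload = build_get_posts_request("editor", "secret", number=3)
print(payload)
```

Against a live site the same call would go through `xmlrpc.client.ServerProxy("https://www.example.com/xmlrpc.php").wp.getPosts(1, "editor", "secret", {"number": 3})`, with the URL and credentials replaced by real ones.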
