How to prevent Scrapy Shell from redirecting - response.redirect

I am trying to scrape data from a search query on Lulu and Georgia's website. This is the link to the search query when I searched for "desks": https://www.luluandgeorgia.com/catalogsearch/result/?indexName=magento_en_products&facetsFilter=&page=0&numericFilters=
However, when I use the Scrapy shell with that link and try to extract images of the products, it extracts images from the main web page instead, meaning that the request is being redirected to the home page. To prevent this redirect, I went into shell.py and set redirect = False wherever I could. However, this is not working and the shell continues to scrape information from the home page. What else can I do to prevent this?
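For reference, Scrapy exposes redirect behaviour through settings and request meta, so editing shell.py by hand shouldn't be necessary: starting the shell with `scrapy shell -s REDIRECT_ENABLED=False '<url>'`, or fetching with `meta={'dont_redirect': True, 'handle_httpstatus_list': [301, 302]}` on the request, disables the redirect middleware. Outside Scrapy, the same idea can be sketched with only the standard library, refusing to follow 3xx responses so the original response (and its Location header) stays visible:

```python
import urllib.request

class NoRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Refuse to follow 3xx responses so the original response is visible."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None aborts the redirect; urllib then raises an
        # HTTPError carrying the 3xx status code and the Location header,
        # instead of silently fetching the redirect target.
        return None

# An opener built with this handler never follows redirects.
opener = urllib.request.build_opener(NoRedirectHandler)
```

Opening a URL with this opener raises `urllib.error.HTTPError` whose `.code` is the redirect status and whose headers carry the `Location` target, which is useful for seeing exactly where the site wants to send you.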

Related

Please recommend a WordPress plugin

I need a page that redirects users. Usually, these pages are used for free file downloads, so that users have to view advertisements. I would like to build this page with a plugin because I am not a programmer, but I can put code in the header, footer and body with the help of my WordPress theme (the Jannah theme).
The page should work in such a way that users wait for some time after entering it, for example 30 seconds. After that, the download button is displayed for them.
Note: I know about redirecting users and I'm familiar with plugins like Yoast; what I need is a page, a page that can redirect users.
Sample: https://weadown.com/get/down=06EheQIs9D0ICD9MS4cy9OwdV
Please help me, friends.
I didn't try anything because no matter what I searched, I couldn't find a plugin for this.

Removed a page from a live WordPress website, but it is still showing up in Google search

Working on a live WordPress website, I deleted a page that had been published by mistake:
1- Turned the page into a draft, but it was still showing.
2- Moved the page to the trash in the WordPress dashboard and then deleted it permanently from the bin, but it is still showing in Google search.
3- Cleared the cache, but it is still showing up in Google search, and the link returns a 404 error.
Following Google's instructions:
Make removal permanent:
Remove or update the actual content from your site (images, pages, directories) and make sure that your web server returns either a 404 (Not Found) or 410 (Gone) HTTP status code. Non-HTML files (like PDFs) should be completely removed from your server. (Learn more about HTTP status codes)
Block access to the content, for example by requiring a password.
Indicate that the page should not be indexed using the noindex meta tag. This is less secure than the other methods. The Remove URLs tool is only temporary.
The first part is easy and the page shows the 404 error.
The second part: after turning the page into a password-protected page, it now takes me to the password prompt, but it is still showing in Google search.
Also, for noindex I had the following options:
Any idea why, or any recommendation?
Thank you
Google Search is based on indexing; it takes time for Google to update your website's content, pages, and dependencies. Search updates depend on Google's crawlers (so-called spiders), which crawl your website's content and follow your meta tags and robots.txt file.
Generally, it takes about 1-3 days for a page to be removed from the search results. There is no quick way to do it, since it is based on indexing. Make sure you check your Google Webmasters (Search Console) account for errors relating to pages not found.
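One detail worth noting from Google's guidance above: a 410 (Gone) signals a deliberate, permanent removal, whereas a 404 is ambiguous, so serving 410 for intentionally deleted pages can get them dropped from the index sooner. WordPress would normally do this via a plugin or .htaccess rule; purely as an illustration of the mechanism, here is a minimal sketch with Python's standard library (the path is made up):

```python
import http.server

class GoneHandler(http.server.BaseHTTPRequestHandler):
    # Paths that were deleted on purpose; everything else still 404s.
    REMOVED = {"/old-page"}

    def do_GET(self):
        if self.path in self.REMOVED:
            # 410 tells crawlers the removal is permanent, which can get
            # the URL dropped from the index faster than a generic 404.
            self.send_response(410, "Gone")
        else:
            self.send_response(404, "Not Found")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo quiet
```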

How to restrict visitors from downloading until they subscribe to my WordPress site?

I have some PDF links on my site. Whenever someone tries to download one of those PDFs, I want a subscription popup to appear. The main thing is that I don't want users to be able to dismiss this popup unless they subscribe to my website.
The PDFs should be downloadable only after the subscription process.
Note: there is no login or signup option on the website. It is just a normal browsable website with some PDF links.
This plugin should do what you're looking for:
https://wordpress.org/plugins/email-to-download/
I've used it before and it worked fine back then.
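Whatever plugin is used, the underlying gate has to live server-side: a client-side popup alone can be bypassed by requesting the PDF URL directly. The core idea is to serve the file only when the request carries proof of subscription, e.g. a token issued after the email step. A hypothetical standard-library sketch (the token value, path, and PDF bytes are all made up):

```python
import http.server
import urllib.parse

VALID_TOKENS = {"tok-abc123"}  # issued after a successful subscription (made up)

class GatedDownloadHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        # Expect /download?token=...; without a valid token, refuse the file.
        parsed = urllib.parse.urlsplit(self.path)
        params = urllib.parse.parse_qs(parsed.query)
        if parsed.path == "/download" and params.get("token", [None])[0] in VALID_TOKENS:
            self.send_response(200)
            self.send_header("Content-Type", "application/pdf")
            self.end_headers()
            self.wfile.write(b"%PDF-1.4 ...")  # placeholder bytes
        else:
            # No (or bad) token: the real site would redirect to the
            # subscription page here instead of a bare 403.
            self.send_response(403, "Subscribe first")
            self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo quiet
```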

Google cached URLs for a specific site and permanent URL redirection

First, I'd like to know how I can see which pages of my website are cached. Say my website is www.mysite.com.
I am going to change a few URLs on my site, but the problem is that I may lose SEO. Suppose Google cached the page at www.mysite.com/detailproduct.aspx?id=200, and now I have changed the location and name of the page, say from "detail product" to "product", so the URL looks like www.mysite.com/catalog/products.aspx?id=200. When people search Google and the old link www.mysite.com/detailproduct.aspx?id=200 comes up in the results, clicking it will display no relevant page. So first of all I need to know which pages of my website Google has cached; if I know that, I can write permanent-redirection logic, and as a result the cached page URLs will change, I guess.
If anyone knows a best practice for handling this situation, please discuss it in detail. The situation is that a few page names and locations have changed, and if a user searches Google and an old page URL comes up, no page will be displayed when the user clicks that link. I want to handle this situation in the best way. What are your suggestions? Thanks.
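The usual fix for renamed URLs is a 301 (Moved Permanently) redirect from each old URL to its new one: browsers follow it, and crawlers update the indexed URL and transfer most of its ranking over time. On an ASP.NET site this would typically be done in web.config or with Response.RedirectPermanent; purely to illustrate the mechanism, here is a standard-library Python sketch using the example URLs from the question:

```python
import http.server

# Map of old URL paths to their new locations (paths from the question).
REDIRECTS = {
    "/detailproduct.aspx?id=200": "/catalog/products.aspx?id=200",
}

class PermanentRedirectHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        new_url = REDIRECTS.get(self.path)
        if new_url:
            # 301 tells crawlers the move is permanent, so the cached
            # search-result URL is replaced by the new one over time.
            self.send_response(301, "Moved Permanently")
            self.send_header("Location", new_url)
            self.end_headers()
        else:
            self.send_response(404, "Not Found")
            self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo quiet
```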

Python : Problems getting past the login page of an .aspx site

Problem:
I have searched several websites/blogs/etc. to find a solution but did not find what I was looking for. The problem, in short, is that I would like to scrape a site, but to get to that site I have to get past the login page.
What I did:
I did manage to use urllib2 and httplib to open the page, but even after logging in (no errors being displayed), the redirect that the login page performs in the browser does not happen. My code was not too different from what is shown here: How to use Python to login to a webpage and retrieve cookies for later usage?, except that I did not use cookies.
What am I looking for?
I am not entirely sure what fields I need to look for besides the "username" and "password" fields. What I would like the script to do is:
1) Successfully log in to the .aspx site and display a message of some sort that the login was successful.
2) Redirect to another page after logging in, so I can scrape the data from the site.
3) Figure out how to gather any site's POST/GET fields, so I know that I am passing/calling the right parameters.
Any assistance/help/advice would be much appreciated.
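The missing piece in the description above is almost certainly cookies: .aspx sites set an auth cookie on the login POST and expect it on every later request, and the form usually also carries hidden fields such as __VIEWSTATE and __EVENTVALIDATION that must be scraped from the login page and echoed back. A sketch of the session plumbing in modern Python 3 (where the urllib2 pieces live in urllib.request; the URL and field names in the commented flow are placeholders):

```python
import http.cookiejar
import urllib.parse
import urllib.request

def make_session():
    """Build an opener that stores and resends cookies, like a browser.

    Without this, the auth cookie set by the login POST is dropped and
    the site sends you back to the login page on every request.
    """
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar

def encode_form(fields):
    """URL-encode login form fields into a POST body (bytes)."""
    return urllib.parse.urlencode(fields).encode("utf-8")

# Typical flow (hypothetical URL and field names):
#   opener, jar = make_session()
#   body = encode_form({"username": "me", "password": "secret",
#                       "__VIEWSTATE": "...scraped from the login page..."})
#   resp = opener.open("https://example.com/login.aspx", data=body)
# A successful login usually shows up as new cookies in `jar` and a
# redirect to the post-login page (resp.geturl() differs from the login URL).
```

For question 3), the browser's developer tools (Network tab) show exactly which form fields the login POST sends, which is the most reliable way to discover the parameters to replicate.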
