I've used Selenium to scrape articles from a website, and I had to specify the exact name of the div to search.
Now I have a list of sites, and I want to check whether they display any ads at all. Going element by element won't work, since every site names its elements/divs differently.
Is there a way for Selenium to check whether there are any ads on a website?
Thank you.
I'm currently using WooCommerce, and while looking through Google Search Console I noticed that well over 15,000 pages had been indexed. This instantly raised a concern, because I know I should have no more than 400 actual pages.
After looking into it, I noticed that every possible parameter ("variations, grid styles, shipping methods", etc.) is being indexed, turning 400 pages into 15,000 variations.
Does it have an effect on Google ranking when 15,000 pages are shown but there are really only 400?
I cannot find a single resource that explains whether Google indexing so many variations has a positive or negative impact on rankings.
Finally, how do I prevent Google or any other search engine from indexing URLs with parameters? I've seen advice to use robots.txt, but no recommendation on which standard WooCommerce filters should be excluded, or whether that's a bad idea.
I have a feeling I am losing a lot of link juice by having so many indexed pages with parameters.
By default, WordPress assumes you want to share everything, because why not? If you have 15,000 indexable items, you would surely be more upset if your site decided on a whim to hide some of them without telling you. It is understandable that this is confusing, though, and it is always better to have control over which parts of your site search engines see.
There are plugins that let you customise which parts of a website are visible to search engines by generating an XML sitemap (an index of the pages that should be crawlable). A very common one is Yoast SEO. Another I have had some success with is Google XML Sitemaps. You may find another that works better for you, but that should give you an idea of what you need.
Also, if Googlebot does not visit your site often, take the sitemap from one of those plugins and submit it via Google Search Console to help Google better understand your site.
I have a website with ads on it, and people can purchase ad space. The person I'm creating the website for wants to be able to preview live pages with "example" ads (it's basically a placeholder ad with the pixel size on it, they are just images that have already been generated).
So they want to be able to see, for example, the LIVE blog page but with the current ads replaced by the placeholder ads to preview it to the people purchasing ad space.
I'm not sure what the best way to do this is. If it were just for the homepage, I'd probably use an if/else to display the right ad div, but this occurs on 5-6 pages. I've tried using page variants, but I can't get them to work at all.
Any guidance is appreciated.
I've been using a very simple approach to implement this kind of functionality.
You might use $_GET parameters in your URLs to show a live example.
I might be wrong, but as far as I remember, any request with $_GET parameters will result in X-Drupal-Cache: MISS, so you can use drupal_get_query_parameters to check the URL for a query and alter the output.
And then you might use http://example.com/?client=demo1 to show your "conditional" ads.
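As a rough client-side sketch of the same idea (the answer above does this on the server in Drupal), the snippet below reads the `client` query parameter and decides which ad variant to show. The function name `pickAdVariant`, the rule that any value starting with `demo` triggers placeholders, and the returned labels are assumptions for illustration, not part of any Drupal API.

```javascript
// Hypothetical helper: decide which ad variant to render from the page URL.
// "client" and "demo1" mirror the example URL above; the labels are made up.
function pickAdVariant(pageUrl) {
  const params = new URL(pageUrl).searchParams;
  // A missing parameter is treated as an empty string.
  const client = params.get("client") || "";
  // Assumption: any ?client=demo... value switches to placeholder ads.
  return client.startsWith("demo") ? "placeholder" : "live";
}

console.log(pickAdVariant("http://example.com/?client=demo1")); // placeholder
console.log(pickAdVariant("http://example.com/blog"));          // live
```

Keeping the decision in one function means the 5-6 affected pages can share the same check instead of each carrying its own if/else.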
Hypothetical Situation: I have a small obscure website called "miniatureBoltsInCarburetors.com" which provides content about the miniature bolts which hold a carburetor together as well as some general related automotive information. My site also has a single page which allows someone to find the missing bolt in their carburetor, and while no one will access this page directly from my website, one billion other popular automotive sites have embedded this single page in their website using an iframe, yet not included a link back to my site.
I recognize that this question is related to SEO, which is considered off topic. However, the many SEO forums discuss the marketing steps one could take rather than the programming steps or strategies, so I hope others will allow this question to be answered here.
I wish my site "miniatureBoltsInCarburetors.com" to be ranked high for general automotive searches. What could I do to allow the 3rd party sites which include an iframe back to my site to improve my ranking? Could using JavaScript in the iframe to create a link on the parent page provide any value? What about when my server renders the page, use PHP to get the referring URL from $_SERVER, and include it in the content?
Here is a possible solution, though I'm not sure it is what you want.
In the page that other websites embed in an iframe, you can add the JavaScript below. It checks whether the page is opened inside an iframe or directly in the browser.
When the check shows that the page is inside an iframe, you can make a click on some element navigate to your website.
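// This works in all browsers
function inIframe() {
    try {
        // In a top-level page, window.self and window.top are the same object.
        return window.self !== window.top;
    } catch (e) {
        // If access to window.top is blocked, assume the page is framed.
        return true;
    }
}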
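One way the click-through described above might look, as a minimal sketch: when the page detects that it is framed, it produces a link that opens the full site in the top-level window. `isFramed` takes the window object as a parameter so that it can be exercised outside a browser. `backlinkHtml` and its link text are hypothetical, and the URL is the one from the question.

```javascript
// Same check as inIframe(), but with the window passed in explicitly.
function isFramed(win) {
  try {
    return win.self !== win.top;
  } catch (e) {
    // If reading win.top fails, assume the page is framed.
    return true;
  }
}

// Hypothetical helper: return link markup only when the page is embedded.
function backlinkHtml(win, siteUrl) {
  if (!isFramed(win)) return "";
  // target="_top" asks the browser to navigate the embedding page, not the frame.
  return '<a href="' + siteUrl + '" target="_top">Visit the full site</a>';
}

// In a browser, append the link to the embedded page (skipped elsewhere).
if (typeof window !== "undefined" && typeof document !== "undefined") {
  document.body.insertAdjacentHTML(
    "beforeend",
    backlinkHtml(window, "http://miniatureBoltsInCarburetors.com/")
  );
}
```

Note that the link still lives inside the framed document. An embedding site that sandboxes the iframe may block the top-level navigation.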
Also for your reference you can check the below URL.
How to prevent my site page to be loaded via 3rd party site frame of iFrame
Hope it helps.
Google sees iframes as separate pages. Your approach may end up being penalized because the content is sourced from an untrusted site. According to Google Webmaster Support:
Frames can cause problems for search engines because they don't
correspond to the conceptual model of the web. Google tries to
associate framed content with the page containing the frames, but we
don't guarantee that we will.
One of the best approaches to ranking higher for a specific keyword is to make multiple related sites. In your case, a 3-4 page site about carburetors, bolts, and the other things your primary site contains would do. These mini sites will be more focused on the subject because they have fewer pages. Of course, each page should contain unique articles. Then link from the mini sites to the primary site, and you may see a dramatic change.
In fact, what you are trying to do used to be a tactic for pushing competitors down the rankings, and it worked occasionally a few years ago. It is still a risk now.
I see. You don't want to mess up the page for your own site, but you want to do something with all the uncredited embeddings.
The solution is fairly simple:
Create a copy of the page.
Switch your site to use the copy.
Amend the version that countless other sites are embedding, so that there is a small link back to you. Or, add an iframe blocker script that will load your site.
If the page is active (i.e., the user interacts with it to find the missing bolt), you could include a sales message with the response, encouraging the user to visit your site.
I think that your goal is getting your link onto these other sites long enough to get indexed by Google before it is noticed by the people doing the embedding, so it's a bit of a balancing act.
I see conflicting advice about how Google indexes iframes. You should use a PageRank checker to see if the existing iframe page url has PageRank, and compare it to the page that you embed it on.
I don't think you need to worry.
Googlebot does seem to crawl iframes, but the web page containing the iframe is not credited for that content. In other words, the page ranking of that particular web page does not change because of the contents of the iframe.
is IFrame crawled by Google?
Do robots crawl iframes?
I have a website with a lot of TinyMCE instances. I am wondering whether that is a problem for Google's robots when indexing my content, because there is a lot of valuable information inside the iframes. If this information is not indexed, it will hurt my page rank, and appearing in the top results when someone performs a search will be impossible!
If you don't want Google to index your files, or you want to change how it crawls your site, edit your server's robots.txt. Info on how to configure it here
Is it possible to track whether someone links to data on my site? Specifically, can I tell if my data is used in a site dynamically generated by a developer's program? I would like to know if someone is blatantly passing off my site's data as their own. There are obviously ways around directly linking to content, such as content manipulation or even manual manipulation. But if someone were to link to my content (or copy it word for word, or manipulate it) in their website, is there a way to track it?
Can I avoid someone being able to scrape my website at all, or is everything just up for grabs?
The best and easiest answer is Google Webmaster Tools.
Doing this yourself is very hard: you would need to crawl the web to discover the links that point to your pages. Dynamically generated content is linked too, so Google will find it as well.
This tool lets you see the external links that point to your site, so you can check them.
As an extra, you can monitor requests and traffic to your site and look for IPs that request the same page over and over again. That can tell you that an external page is dynamically loading content from your site.
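The traffic-monitoring idea could be sketched roughly as follows, assuming the access log has already been parsed into objects with `ip` and `path` fields. That shape, the function name `findRepeatFetchers`, the sample data, and the threshold are all assumptions for illustration.

```javascript
// Count hits per (ip, path) pair and report pairs at or above a threshold.
function findRepeatFetchers(entries, threshold) {
  const counts = new Map();
  for (const { ip, path } of entries) {
    const key = ip + " " + path; // IPs contain no spaces, so this splits cleanly
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  const suspects = [];
  for (const [key, hits] of counts) {
    if (hits >= threshold) {
      const sep = key.indexOf(" ");
      suspects.push({ ip: key.slice(0, sep), path: key.slice(sep + 1), hits });
    }
  }
  // Most frequent first.
  return suspects.sort((a, b) => b.hits - a.hits);
}

// Made-up sample data using documentation-only IP addresses.
const sampleLog = [
  { ip: "203.0.113.7", path: "/data/prices" },
  { ip: "203.0.113.7", path: "/data/prices" },
  { ip: "203.0.113.7", path: "/data/prices" },
  { ip: "198.51.100.2", path: "/about" },
];
console.log(findRepeatFetchers(sampleLog, 3));
```

A high count alone does not prove scraping, since search-engine crawlers and shared proxies also produce repeated hits, so treat the output as a list of candidates to inspect.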
EDIT:
Here is a good article on this subject: link. Scroll down and you can see the use of Google Webmaster Tools together with some other programs and methods.
Here is a good getting-started guide to Google Webmaster Tools: link
ENJOY!