WooCommerce Parameters Google Indexing - wordpress

I'm currently using WooCommerce, and by chance when looking through Google Search Console I noticed well over 15,000 pages had been indexed. This instantly raised a concern, because I know I should have no more than 400 actual pages.
After looking into it, I noticed that absolutely every possible parameter (variations, grid styles, shipping methods, etc.) is being indexed, turning what is really 400 pages into 15,000 variations.
Does this have an effect on Google ranking, showing 15,000 pages when really there are only 400?
I cannot find a single resource that explains whether Google indexing so many variations has a positive or negative impact on rankings.
Finally, how do I prevent Google or any other search engine from indexing URLs with parameters? I've seen advice suggesting robots.txt, but no recommendation of which standard WooCommerce filters should be excluded, or whether that's a bad idea.
I have a feeling I am losing a lot of link juice by having so many indexed pages with parameters.

WordPress by default expects that you wish to share everything, because why not? If you have 15,000 items that are indexable, surely you'd be more annoyed if your site decided on a whim to hide parts of that without you knowing. It is understandable that this is confusing, though, and quite rightly it is always better to have control over which parts of your site get seen by search engines.
There are plugins which let you customise which parts of a website are visible or hidden to search engines by providing a mechanism for generating an XML sitemap (an index of which pages should be crawlable). A very common plugin I see people use is Yoast SEO. Another one I have had some success with is Google XML Sitemaps. You may find another that works better for you, but those should give an idea of what you need.
Also, if you are not being visited often by Googlebot, take the sitemap from one of those plugins and submit it via Google Search Console to help Google better understand your site.
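If you do go the robots.txt route, a minimal sketch might look like the lines below. The parameter names (orderby, add-to-cart, min_price, max_price, filter_) are standard WooCommerce query parameters, but check your own filtered URLs before copying, and bear in mind that robots.txt blocks crawling rather than indexing, so URLs that are already indexed may also need a noindex tag or a removal request.
# Sketch only: block crawling of common WooCommerce filter/sort parameters.
# Adjust the parameter names to match the URLs you actually see indexed.
User-agent: *
Disallow: /*?*orderby=
Disallow: /*?*add-to-cart=
Disallow: /*?*min_price=
Disallow: /*?*max_price=
Disallow: /*?*filter_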

Related

How does amp analytics affect multiple sessions on the same property?

Essentially, I'm concerned that a single user can be counted twice. Is there a best practice, etc.? I've tried googling, and I'm not sure if I'm just not asking the right question with the right words. The platform is Sitecore.
Using the same property to track AMP and non-AMP pages will result in multiple users. See here for Google's recommendation.
Though it looks like you can use the Google AMP Client ID API to work around this.
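For what it's worth, the opt-in looked roughly like the snippet below for analytics.js; treat it as a sketch and verify the flag against Google's current documentation.
// Sketch: opt in to the Google AMP Client ID API so AMP and
// non-AMP pageviews can share a client ID. 'UA-XXXXX-Y' is a placeholder.
ga('create', 'UA-XXXXX-Y', 'auto', { useAmpClientId: true });
ga('send', 'pageview');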

Bing Web Search shows results only for my domain

Using the Bing Web Search API, I need to filter results to only my domain. Example query:
https://api.cognitive.microsoft.com/bing/v7.0/search?q=site:mysite.com+myquery
But in the results I receive not only mysite.com results but also results from sites like Wikipedia and others.
How can I get results only for my domain?
Bing Custom Search does not work for me because I have more than 10k transactions.
Your website is probably not known or indexed by Bing, since you are using the Bing Search API rather than a custom indexing service or a search across a sitemap.
The actual Bing website needs to be able to find your site.
Since it doesn't, this triggers the default behavior of returning the most relevant results possible.
This behavior holds for URLs such as the following:
https://api.cognitive.microsoft.com/bing/v7.0/search?q=microsoft+site:notAnIndexedWebsite.com
Formatting-wise, there are multiple options, as seen here. None of them is the problem in this case.
You can try Bing Custom Search (at https://www.customsearch.ai/), especially as it is now in GA. It also provides an option to get your pages crawled into the Bing index through the webmaster tools, if they are not crawled already. This should make sure that you get results only from your website.
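For reference, a request scoped with the site: operator looks roughly like this sketch; YOUR_KEY is a placeholder for your subscription key, and Ocp-Apim-Subscription-Key is the standard Cognitive Services auth header. Results will still be empty or off-site if Bing has not indexed the domain.
// Sketch: Bing Web Search API v7 query restricted to one domain.
const url = 'https://api.cognitive.microsoft.com/bing/v7.0/search?q=' +
  encodeURIComponent('site:mysite.com myquery');
fetch(url, { headers: { 'Ocp-Apim-Subscription-Key': 'YOUR_KEY' } })
  .then(res => res.json())
  // webPages.value holds the organic results when any are returned.
  .then(data => console.log(data.webPages ? data.webPages.value : []));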

Web crawlers and IFrames

Hypothetical Situation: I have a small obscure website called "miniatureBoltsInCarburetors.com" which provides content about the miniature bolts which hold a carburetor together as well as some general related automotive information. My site also has a single page which allows someone to find the missing bolt in their carburetor, and while no one will access this page directly from my website, one billion other popular automotive sites have embedded this single page in their website using an iframe, yet not included a link back to my site.
I recognize that this question is related to SEO, which is considered off topic; however, all of the many SEO-related forums discuss the marketing steps one could take, not the programming steps or strategies, so I hope others will allow this question to be answered here.
I wish my site "miniatureBoltsInCarburetors.com" to be ranked high for general automotive searches. What could I do to allow the 3rd party sites which include an iframe back to my site to improve my ranking? Could using JavaScript in the iframe to create a link on the parent page provide any value? What about when my server renders the page, use PHP to get the referring URL from $_SERVER, and include it in the content?
I am providing a solution here; not sure if this is what you want, though.
In the page that other websites embed in an iframe, you can put the JavaScript below. It checks whether the page is opened inside an iframe or directly in the browser.
Using this check, when you see the page is opened in an iframe, you can navigate to your website when the user clicks on something.
// This works in all browsers.
// Cross-origin access to window.top can throw; being unable to
// read it also means we are framed, so the catch returns true.
function inIframe () {
  try {
    return window.self !== window.top;
  } catch (e) {
    return true;
  }
}
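For example, here is a minimal sketch of the "navigate on click" idea; the link text and URL are placeholders based on the question:
// Sketch: if framed, add a link that opens the full site in the top window.
if (inIframe()) {
  var link = document.createElement('a');
  link.href = 'https://miniatureBoltsInCarburetors.com';
  link.target = '_top'; // escape the iframe when clicked
  link.textContent = 'View this tool on miniatureBoltsInCarburetors.com';
  document.body.appendChild(link);
}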
Also, for your reference, you can check the URL below.
How to prevent my site page to be loaded via 3rd party site frame of iFrame
Hope it helps.
Iframes are seen as separate pages by Google. Your approach may end up being penalized because the content is sourced from an untrusted site. According to Google Webmaster Support:
Frames can cause problems for search engines because they don't correspond to the conceptual model of the web. Google tries to associate framed content with the page containing the frames, but we don't guarantee that we will.
One of the best approaches to ranking higher for a specific keyword is to make multiple related sites. In your case, a 3-4 page site about carburetors, bolts, and the other things your primary site contains would do it. These mini sites will be more focused on the subject due to the lower page count. Of course, they should contain unique articles on each page. Then link from the mini sites to the primary site and you can see the dramatic change.
In fact, the thing you are trying to do was a tactic to rank competitors down that worked occasionally a few years ago. Now, it is still a risk.
I see. You don't want to mess up the page for your own site, but you want to do something with all the uncredited embeddings.
The solution is fairly simple:
Create a copy of the page.
Switch your site to use the copy.
Amend the version that countless other sites are embedding so that there is a small link back to you. Or, add an iframe blocker script that will load your site (a sketch follows this answer).
If the page is active (i.e. the user interacts with it to find the missing bolt), you could include a sales message with the response, encouraging the user to visit your site.
I think that your goal is getting your link onto these other sites long enough to get indexed by Google before it is noticed by the people doing the embedding, so it's a bit of a balancing act.
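As a rough sketch, the iframe blocker mentioned above is usually just a few lines; note that embedders using sandboxed iframes can defeat it, so treat it as best-effort:
// Sketch: classic frame-buster. If this page is framed, replace the
// top window with this page instead.
if (window.top !== window.self) {
  window.top.location.href = window.self.location.href;
}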
I see conflicting advice about how Google indexes iframes. You should use a PageRank checker to see if the existing iframe page URL has PageRank, and compare it to the page that you embed it on.
I don't think you need to worry.
Googlebot does seem to crawl through iframes, but the web page containing the iframe is not credited for that content. In other words, the PageRank of that particular web page does not change due to content from an iframe.
is IFrame crawled by Google?
Do robots crawl iframes?

Google Analytics and measuring search terms to destination pages

I've been using internal site search with Google Analytics, and while I love the ability to see what my users are searching for, I am having a really hard time figuring out which search terms lead to which pages.
When I query on both the nextPagePath and searchKeyword dimensions while filtering on the search results page as the current path, the nextPagePath is always the search results page, even when I know it shouldn't be (when tracking my own obscure searches). The same goes for using the searchDestinationPage dimension. I can't get any data that shows a jump from a search results page to another page on the site.
Here's a cleaned-up example of my API query.
dimensions=ga:searchKeyword,ga:nextPagePath&metrics=ga:pageviews&filters=ga:previousPagePath=#dosearch
When I use the standard Analytics UI and look at the Destination Pages list under Content->Site Search->Destination pages, I only have 25 or so, all of which are just the variations on the base search-result page URL.
Do I need additional tracking code on my search results pages? Custom variables? A different query through the API?
I can see the tracking requests going out from both the search results and the pages selected from the results.
I found a couple of questions in the Analytics forums that ask this same question, but none of them had anything resembling a working solution.
I would bet you are not using the proper dimensions in the API.
See https://developers.google.com/analytics/devguides/reporting/core/dimsmets/internalsearch
ga:searchDestinationPage is probably what you want where you are using ga:nextPagePath.
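Untested, but a corrected query along those lines would look something like this, using the site-search dimension and metric from that reference:
dimensions=ga:searchKeyword,ga:searchDestinationPage&metrics=ga:searchUniques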

Unpublished items showing in Drupal search results (Google Search Appliance)

I inherited a Drupal 5 site recently and have a series of enhancements to make. Several of them revolve around search results.
Unpublished pages are showing up in search engine results. Some of these are old pages; others were recently unpublished. All are correctly marked as unpublished in the CMS and are still showing up.
Outdated pages are showing up from the search engine. The URL path structure changed and those items are old results in the DB.
From what I can tell, the site uses the Google Search Appliance (GSA) for search rather than the default Drupal search. Is there a way I can be certain that it's using the GSA, other than seeing the module enabled?
If it is GSA it seems that I could get someone with access to the GSA to rebuild the search results on the site. Is this correct?
If rebuilding the search results is the right way to go about it, it seems whenever a fair amount of content is removed from the site I'll need to get someone to rebuild the search. Is there a better/automatic way?
Sounds like it's Drupal that is handling the search. Google would need database access to show unpublished nodes. It could be that you are using Views to do the search but forgot to filter to only published nodes.
If Drupal is handling the search, you just need to flush and rebuild the search index. This can be done without too much trouble if you don't have too much content.
The GSA could still be showing deleted content, depending on what your data source is.
If the content is coming from a database feed, it will be dropped from the results once it is dropped from the query. If the content is coming from a natural crawl or through a custom connector feed, it will not be removed from the index on delete; instead it needs to naturally cycle out of the index, which can take a while.
One way to block deleted URLs from being displayed is to do it through the front end. In the GSA admin interface go to Serving > Front Ends, then choose your front end and click the Remove URL tab. You can either list your URLs or block a group of URLs with regular expressions.
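For example, entries in that tab might look like the lines below; these patterns are hypothetical, so check the URL pattern syntax for your GSA version before using them:
www.example.com/old-path/
regexp:/node/[0-9]+/old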
I have posted an answer to your more general question concerning node access. The problem with your search results might well be related to that.
In order to keep the Google Search Appliance more up to date, you might try out XmlSiteMap, a module that publishes a proper XML sitemap for all your content.
For a public website, publishing a sitemap is a good way to keep search engines up to date, as they can use it to learn about new pages and to purge old ones. I'm assuming that the Google Appliance would use this too.
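A sitemap is just an XML file listing your URLs with optional metadata, along these lines (the URL and date are placeholders):
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/some-page</loc>
    <lastmod>2012-01-01</lastmod>
  </url>
</urlset>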
