How to create a sitemap for a growing site that will potentially have a huge number of URLs?

I have a website that will potentially have a huge number of URLs, and I would like Googlebot to know about them.
So I figured I would use a sitemap index that points to other sitemap indexes in a tree-like way, where the leaf level of the tree points to the URLs.
But as I understand it, a sitemap index can't refer to another sitemap index.
So how can I include all my URLs without having to manually submit a new sitemap index of 50,000 links each time?

Your sitemap index (http://yourdomain.com/sitemap.xml) should be a sitemap index of sitemaps, not of sitemap indexes.
A sitemap can hold a maximum of 50,000 links. That means you can have one sitemap index (sitemap.xml) pointing to 50,000 sitemaps with 50,000 links each. That's 2.5 billion pages.
If you don't have more than 2.5 billion pages, you only need one sitemap index that points to all of your sitemaps.
Here's an example of the kind of hierarchy I've worked with in the past: a single sitemap.xml index at the root, pointing to numbered child sitemaps (sitemap-1.xml, sitemap-2.xml, ...), each holding up to 50,000 URLs.
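For illustration, here is a minimal Python sketch of that structure, assuming a placeholder domain and hypothetical filenames: it chunks a flat URL list into child sitemaps of 50,000 entries each and writes a single index pointing at them.

```python
# Minimal sketch: chunk URLs into child sitemaps, then write one index.
# Domain, filenames, and the URL list are placeholders; re-running the
# script as the site grows regenerates the index automatically, so you
# never have to hand-maintain a 50,000-entry file.
from xml.sax.saxutils import escape

MAX_URLS = 50000  # per-sitemap limit from the sitemaps.org protocol
BASE = "http://yourdomain.com"

def write_sitemaps(urls):
    chunks = [urls[i:i + MAX_URLS] for i in range(0, len(urls), MAX_URLS)]
    for n, chunk in enumerate(chunks, start=1):
        with open("sitemap-%d.xml" % n, "w") as f:
            f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
            f.write('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
            for url in chunk:
                f.write("  <url><loc>%s</loc></url>\n" % escape(url))
            f.write("</urlset>\n")
    # The single index you submit, referencing every child sitemap.
    with open("sitemap.xml", "w") as f:
        f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        f.write('<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
        for n in range(1, len(chunks) + 1):
            f.write("  <sitemap><loc>%s/sitemap-%d.xml</loc></sitemap>\n" % (BASE, n))
        f.write("</sitemapindex>\n")

write_sitemaps(["%s/page/%d" % (BASE, i) for i in range(1, 120001)])
```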

Related

How to stop recrawling and reindexing the pages on my website?

I have about 2 million pages on my website (WordPress). Google is constantly crawling many pages on my site (recrawling old pages), and recently it has taken about 3 days for new pages to be indexed and show up in Google results.
How can I stop the recrawling?
Sample of my sitemap: http://www.serze.com/post_part1.xml
For that, just go to Admin > Settings > Reading and enable "Discourage search engines from indexing this site", and for Google create a separate robots.txt.
Be aware, though, that the "Discourage search engines from indexing" option will make your entire site inaccessible to Google. Your website won't rank in search results after that.
Instead, you can set an update frequency for specific pages in your sitemap.xml:
https://www.sitemaps.org/protocol.html (check the changefreq and priority tags)
You can also check your crawl stats in Search Console:
https://support.google.com/webmasters/answer/48620?hl=en
But it might be a different problem. Did you do anything for SEO, or apply any software updates?
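For reference, this is roughly what the changefreq and priority hints look like on a single sitemap entry; a minimal sketch with a hypothetical URL and values. Note that sitemaps.org defines both tags as hints that crawlers are free to ignore, so they won't stop recrawling outright.

```python
# Minimal sketch: emit one sitemap <url> entry carrying the optional
# changefreq and priority tags from the sitemaps.org protocol.
# The URL and the chosen values are hypothetical placeholders.
# changefreq accepts: always, hourly, daily, weekly, monthly, yearly, never.
# priority ranges from 0.0 to 1.0 (default 0.5).
entry = (
    "  <url>\n"
    "    <loc>http://www.serze.com/some-old-post</loc>\n"
    "    <changefreq>yearly</changefreq>\n"
    "    <priority>0.2</priority>\n"
    "  </url>\n"
)
print(entry)
```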

WP plugin(s) that will allow easily voting rows up/down in a list (not at the post level)

In my WP site I have a table (currently Formidable Forms entries, but not attached to it) that contains only a couple of fields (stem and word). I want users of the site to be able to see a list of all the "words" on a single page (or, in practice, to page through the several-hundred-row list 25 entries at a time) and to be able to click to vote each entry up or down. I've tried a number of vote/like plugins, but these always work only at the post level, and I don't want users to have to click into the post itself just to be able to vote. Is there anything out there that does this? TIA, M

Find how many users viewed any page within a section of pages

Site structure is simple:
/menu
/blog
/photos
Each of these "sections" has many pages below it. All of these pages follow the same URL structure: pages under /menu have URLs like /menu/nameofthething, etc.
How do I answer the question: "Of all the users of our website, how many ever viewed any page under /blog?"
"Content Drilldown" allows you to see page views for a section like /menu or /blog but it doesn't appear to give me any data that would allow me to calculate that from page views into users.
Any help would be much appreciated.
You can create a custom segment of users who have visited a given page or set of pages, or a segment of sessions that contain a visit to the selected page(s). (Look for the Conditions tab under the Advanced group of the New Segment dialog.) You can apply this segment to your reports, e.g. the Audience reports in your case, which will give you the number of users for this particular segment and the selected time period.
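If you'd rather pull that number programmatically than through the segment UI, here is a rough sketch against the legacy Core Reporting API v3 for a Universal Analytics view. The view ID and key file are placeholders, and the assumption that ga:users combined with a hit-level pagePath filter approximates "users who ever viewed a /blog page" is mine, not from the original answer.

```python
# Rough sketch: count users who viewed at least one page under /blog,
# using the Google Analytics Core Reporting API v3 (Universal Analytics).
# The view (profile) ID and service-account key file are placeholders.
from googleapiclient.discovery import build
from google.oauth2 import service_account

SCOPES = ["https://www.googleapis.com/auth/analytics.readonly"]
creds = service_account.Credentials.from_service_account_file(
    "keyfile.json", scopes=SCOPES)
analytics = build("analytics", "v3", credentials=creds)

result = analytics.data().ga().get(
    ids="ga:12345678",               # placeholder view ID
    start_date="30daysAgo",
    end_date="today",
    metrics="ga:users",
    filters="ga:pagePath=~^/blog/",  # regex: any path starting with /blog/
).execute()

print(result["totalsForAllResults"]["ga:users"])
```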

How to make search engines crawl tens of thousands of pages on my website

I have a large number of items, and each item has a page like site/item_show.aspx?id=The_ID_here. There are tens of thousands of items, and nearly two thousand are added each day. Furthermore, each item has a description on its page, so every item's page should be crawled by search engines.
My question, given the amount of data described, is: how can I generate sitemaps (or anything like that) to make all the items visible to Google and other search engines?
Clearly I cannot show all the items on the first pages, but I could make pages that simply contain links to items, tens of them per page, just for search engines. Would that work, or is there anything better I can do to get the items indexed by Google?
Essentially, there are three methods which will help you with mass indexing:
1. Create an XML sitemap for all of your pages and link to it from your homepage (you can also advertise it in robots.txt, as sketched after this list).
2. You should have Google Webmaster Tools set up, and you can submit that same XML file there.
3. Have an organized category structure. Depending on the type of your site, think about a logical category structure; for example, in eCommerce stores all the products are categorized by a main category, then a product sub-category, and sometimes by brand, etc. Of course you should do this via your shopping-cart platform. Just remember that if you begin changing the URL structure, you'll need to take care of all the redirects from the old URLs to the new ones.
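On point 1, the sitemaps.org protocol also lets you advertise the sitemap location in robots.txt, so crawlers can find it without you linking it anywhere. A tiny sketch (the domain is a placeholder):

```python
# Tiny sketch: write a robots.txt that allows crawling and advertises
# the sitemap index location. The domain is a placeholder.
with open("robots.txt", "w") as f:
    f.write("User-agent: *\n")
    f.write("Disallow:\n\n")  # empty Disallow = allow everything
    f.write("Sitemap: http://example.com/sitemap.xml\n")
```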
First, use XML sitemaps and submit those to Google (note that I said sitemapS - more than one).
Next, ensure that your on-site content is nicely organised into categories and sub-categories - ideally you'd want every item to be reachable in as few clicks as possible, without users (or Googlebot) having to resort to the search function.
Finally, ensure that your more popular / important items are featured on the homepage or 1-2 clicks deep, and get links and social shares to those specific product pages.
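As a rough illustration of the multiple-sitemaps approach for a catalogue growing by ~2,000 items a day (my sketch, not from the answers above): write a dated delta sitemap for each day's new items and rewrite the index to include it. The domain, filenames, and ID source are placeholders; 2,000 URLs a day stays far under the 50,000-per-sitemap limit.

```python
# Rough sketch: once a day, write a delta sitemap containing just that
# day's new items, then rewrite the sitemap index to include every file.
# Domain, filenames, and the source of new item IDs are placeholders.
import datetime
import glob
from xml.sax.saxutils import escape

BASE = "http://example.com"

def write_daily_sitemap(new_item_ids):
    today = datetime.date.today()
    with open("sitemap-%s.xml" % today.strftime("%Y%m%d"), "w") as f:
        f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        f.write('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
        for item_id in new_item_ids:
            loc = "%s/item_show.aspx?id=%s" % (BASE, item_id)
            f.write("  <url><loc>%s</loc><lastmod>%s</lastmod></url>\n"
                    % (escape(loc), today.isoformat()))
        f.write("</urlset>\n")

def rewrite_index():
    with open("sitemap.xml", "w") as f:
        f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        f.write('<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
        for name in sorted(glob.glob("sitemap-*.xml")):
            f.write("  <sitemap><loc>%s/%s</loc></sitemap>\n" % (BASE, name))
        f.write("</sitemapindex>\n")

write_daily_sitemap(range(70001, 72001))  # placeholder: today's ~2,000 new IDs
rewrite_index()
```

Run from a daily scheduled task (cron, or a Windows task given the .aspx URLs), this keeps the submitted index current without any manual resubmission.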
Be popular and get links to your site. Have a good server which can handle the crawl.
As Matt Cutts put it:
"There is also not a hard limit on our crawl. The best way to think about it is that the number of pages that we crawl is roughly proportional to your PageRank. So if you have a lot of incoming links on your root page, we'll definitely crawl that. Then your root page may link to other pages, and those will get PageRank and we'll crawl those as well. As you get deeper and deeper in your site, however, PageRank tends to decline..."
https://www.stonetemple.com/matt-cutts-interviewed-by-eric-enge-2/

Find #pages in website

I'm collecting data on the complexity of several domains, represented by total pages, visited and unvisited.
I was initially finding what I wanted in Google Analytics by drilling down to Behavior -> Site Content -> Landing Pages, but I wasn't sure whether that was returning unvisited pages. Then I tried All Pages per domain, but that returned around 1,800 results for "pages", in some cases with params, e.g. /Pages/Results.aspx?k=update.
That being said, I don't think I can rely on GA for total pages per site.
So I thought about using a web scraper, namely web2disk or httrack.com, to find the number of pages per domain. Is that a good path to take to get this information?
Thanks
If you want to know how many pages there are on your site, you need to crawl your site to find all the pages. Because of the way it works, Google Analytics will only ever show you data for pages which have actually been loaded in a browser (which is what fires the analytics code).
http://www.screamingfrog.co.uk/seo-spider/ is a paid-for crawler you can use to find all the pages (£99), or you could potentially hack something together using a free crawler like http://import.io (disclaimer: I work at import.io) to get all the URLs.
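If you do want to hack something together yourself, here is a minimal sketch of a same-domain crawler that counts unique pages. It assumes the third-party requests and beautifulsoup4 packages, the start URL is a placeholder, and a real crawl should also respect robots.txt and rate-limit itself.

```python
# Minimal sketch: crawl a single host and count the unique URLs found.
# Assumes `pip install requests beautifulsoup4`; the start URL is a placeholder.
from urllib.parse import urljoin, urldefrag, urlparse
import requests
from bs4 import BeautifulSoup

START = "http://example.com/"
host = urlparse(START).netloc

seen, queue = {START}, [START]
while queue:
    url = queue.pop()
    try:
        resp = requests.get(url, timeout=10)
    except requests.RequestException:
        continue  # skip pages that time out or error
    if "text/html" not in resp.headers.get("Content-Type", ""):
        continue  # only parse HTML pages for further links
    for a in BeautifulSoup(resp.text, "html.parser").find_all("a", href=True):
        link = urldefrag(urljoin(url, a["href"]))[0]  # resolve + strip #fragment
        if urlparse(link).netloc == host and link not in seen:
            seen.add(link)
            queue.append(link)

print("pages found:", len(seen))
```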
To find all visited pages via GA:
Behaviour -> Site Content -> Landing Pages does not include any pages which were not 'landed upon'.
As for "Then I tried All Pages per domain, but that returned like 1,800 results for 'pages', with params in some cases":
To remove the params from the page URLs, you can use a report filter at the top right of the table. Click 'advanced' and use the tools there to exclude params from URLs.
Alternatively, you can switch your primary dimension to 'Page Title', if you have unique page titles for each page (and identical titles for pages with params).
