How to get all sites from a given domain? - web-scraping

If this is not the right place to ask this kind of question please tell me where I can ask this.
Basically I want to scrape news from a site. It has the following format:
The link to a specific news item:
https://www.presseportal.de/blaulicht/pm/4970/1345341
where https://www.presseportal.de/blaulicht/pm/4970/ is the root and 1345341 is a random number. Only the random number changes.
So how can I find all the news that belong to this specific root?
(There is no link to them on the website, because they only show 300 pages)

Use the paginated URL www.presseportal.de/blaulicht/nr/4970/{number} (note nr in place of pm). Each page lists 27 titles; scrape the links from those titles.
Then increase {number} by 27 to get the next page, and repeat until you reach the end:
www.presseportal.de/blaulicht/nr/4970/27
www.presseportal.de/blaulicht/nr/4970/54
...
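
A rough sketch of that loop in Python, standard library only. Several things here are assumptions, not something confirmed for the site: that offset 0 returns the first page, that article links contain /blaulicht/pm/4970/, and that a page with no new article links means the end has been reached. `crawl` and `extract_links` are made-up names, and `fetch` is whatever function you use to download a page.

```python
from html.parser import HTMLParser
from itertools import count
from urllib.parse import urljoin

# Listing URL pattern from the answer; offset 0 as the first page is an assumption.
LISTING = "https://www.presseportal.de/blaulicht/nr/4970/{offset}"
PAGE_SIZE = 27  # titles per listing page, per the answer


class ArticleLinkParser(HTMLParser):
    """Collects hrefs of <a> tags that look like article links."""

    def __init__(self, prefix="/blaulicht/pm/4970/"):  # assumed link shape
        super().__init__()
        self.prefix = prefix
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if self.prefix in href and href not in self.links:
            self.links.append(href)


def extract_links(html, base="https://www.presseportal.de"):
    """Return absolute article URLs found in one listing page."""
    parser = ArticleLinkParser()
    parser.feed(html)
    return [urljoin(base, link) for link in parser.links]


def crawl(fetch, max_pages=None):
    """Walk the listing pages; fetch(url) must return the page HTML as a string."""
    seen = []
    for page in count(0):
        if max_pages is not None and page >= max_pages:
            break
        html = fetch(LISTING.format(offset=page * PAGE_SIZE))
        new = [link for link in extract_links(html) if link not in seen]
        if not new:  # assumed end-of-listing signal
            break
        seen.extend(new)
    return seen
```

For a real run, `fetch` could be something like `lambda url: urllib.request.urlopen(url).read().decode("utf-8")`, ideally with a delay between requests and a `max_pages` cap while testing.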

Related

Given the URL of Page A, how can I see what other pages have featured in sessions that Page A has?

My end goal is to find related pages for any given input page, using 'other pages viewed in a session' as a proxy.
So given the URL of '/mens/sweatshirts', I wanted to know that there are usually always views recorded on '/mens/sweatshirts', '/mens/hoodies' and '/mens/t-shirts' for example
Any direction appreciated. Thanks
You can do this with segments.
Your setup would be something like this:
Say you're working with the Google demo account and looking at the men's t-shirt page.
Note that page's URL and use it as the condition in the segment configuration. Set the filter scope to 'session' at first; use 'user' if you want to isolate it by user across multiple sessions.
Apply the segment and now all the data you see when browsing around GA will be focused only on people who have visited that men's t-shirt page. In this specific example, it shows all the pages people visit in sessions that visited the men's t-shirt page.
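
If you have the hit data outside the GA interface, the same "pages seen in sessions that include page A" idea can be sketched in a few lines of Python. This is not the segment feature itself, just an illustration of the logic; the `(session_id, page)` row format and the function name `co_viewed_pages` are assumptions.

```python
from collections import Counter, defaultdict


def co_viewed_pages(hits, target):
    """hits: iterable of (session_id, page) rows.

    Returns (page, session_count) pairs for pages that appear in the same
    sessions as `target`, most frequent first.
    """
    sessions = defaultdict(set)
    for session_id, page in hits:
        sessions[session_id].add(page)

    counts = Counter()
    for pages in sessions.values():
        if target in pages:  # keep only sessions that viewed the target page
            counts.update(pages - {target})
    return counts.most_common()
```

Called with '/mens/sweatshirts' as the target, this returns the other pages ranked by how many of those sessions also viewed them.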

Blog post not showing in specific categories - BeTheme

I have assigned every blog post on my blog to a specific category, yet whenever I open a particular category page, I do not see only the posts that belong to it.
I have the following 6 categories:
Mixed
Relationships
Entrepreneurship
Blogging
Health and Lifestyle
Technology
Now, let us assume I try to open the Relationships category, so my URL will be something like: (screenshot: an example of my URL while opening a particular category page)
Here, I want only the posts which belong to 'Relationships' to be visible to my users, but that is not happening.
What the users see is a complete list of blog posts.
Is there any way to resolve this issue? Is there a way to simply show blog posts according to the category chosen?
Looking forward to some assistance on this issue.
Regards!

How to make search engines crawl tens of thousands of pages in my website

I have a large number of items, and each item has a page like site/item_show.aspx?id=The_ID_here. There are tens of thousands of items, and nearly two thousand are added each day. Each item has a description on its page, so every item page should be crawled by search engines.
My question, given this amount of data: how can I generate sitemaps or anything similar to make all items visible to Google and other search engines?
Clearly I cannot show all items on the first pages, but I can make pages that simply contain links to items, tens of them per page, just for search engines. Would that work, or is there a better way to get the items indexed by Google?
Essentially, there are 3 methods that will help you with mass indexing:
1. Create an XML sitemap for all of your pages and link to it from your homepage.
2. Set up Google Webmaster Tools and submit that same XML file there.
3. Have an organized category structure. Depending on the type of your site, think about a logical category structure: for example, in eCommerce stores all the products are categorized by main category, then by sub-category, and sometimes by brand, etc. Of course you should do this via your shopping-cart platform. Just remember that if you begin changing the URL structure you'll need to take care of all the redirects from the old URLs to the new ones.
First, use XML sitemaps and submit those to Google (note that I said sitemapS - more than one).
Next, ensure that your on-site content is nicely organised into categories and sub-categories - ideally you'd want all elements to be reachable in as few clicks as possible without users (or Googlebot) having to resort to the search function.
Finally, ensure that your more popular / important items are featured in the homepage or 1-2 clicks deep, and get links and social shares to those specific product pages.
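
To make the "more than one sitemap" point concrete, here is a minimal Python sketch that splits item IDs into sitemap files plus a sitemap index. The sitemaps.org protocol limits a single sitemap file to 50,000 URLs, which is why tens of thousands of items with thousands added daily will need several files. The domain https://example.com, the sitemap-N.xml file names and the function name `build_sitemaps` are placeholders I made up; only the item_show.aspx?id= URL shape comes from the question.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50000  # per-file URL limit in the sitemaps.org protocol


def build_sitemaps(item_ids, site="https://example.com", chunk=MAX_URLS):
    """Return (index_xml, [sitemap_xml, ...]) as strings.

    `site` and the sitemap-N.xml naming are hypothetical placeholders.
    """
    files = []
    for start in range(0, len(item_ids), chunk):
        urlset = ET.Element("urlset", xmlns=NS)
        for item_id in item_ids[start:start + chunk]:
            url = ET.SubElement(urlset, "url")
            loc = ET.SubElement(url, "loc")
            loc.text = f"{site}/item_show.aspx?id={quote(str(item_id))}"
        files.append(ET.tostring(urlset, encoding="unicode"))

    # The index file lists every sitemap file so only one URL needs submitting.
    index = ET.Element("sitemapindex", xmlns=NS)
    for n in range(1, len(files) + 1):
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = f"{site}/sitemap-{n}.xml"
    return ET.tostring(index, encoding="unicode"), files
```

Write each returned string to its file (prepending an XML declaration if you want one), regenerate as items are added, and submit the index URL to the search engines.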
Be popular and get links to your site. Have a good server which can handle the crawl.
There is also not a hard limit on our crawl. The best way to think
about it is that the number of pages that we crawl is roughly
proportional to your PageRank. So if you have a lot of incoming links
on your root page, we’ll definitely crawl that. Then your root page
may link to other pages, and those will get PageRank and we’ll crawl
those as well. As you get deeper and deeper in your site, however,
PageRank tends to decline...
https://www.stonetemple.com/matt-cutts-interviewed-by-eric-enge-2/

How to track conversion funnel in Google Analytics where a banner ad is shown on many pages.

I have a website that features a call to action/promotion button on nearly all pages of the site.
I have currently configured a conversion funnel that shows me how many people arrive on the call to action page, and then how many people make it to successfully complete the action page.
What I want to see, though, is how many unique visitors over the reporting period see the banner at the top of the funnel.
eg. Something like this:
Visitors accessing website: 1000
Visitors clicked on call to action page: 100
Visitors successfully submitted call to action form: 45
My initial thought was to do this using the front page only, but I forgot that this banner/call-to-action ad is featured on many pages around the website. Many people find the site through SEO and never even pass through the front page.
Is it possible to use a wildcard for a domain or something similar in Google Analytics? Or maybe I am approaching this the wrong way.
Last of all: I know I can accomplish this by pulling up 2 reports, site-wide unique visitors and how many people hit the first stage of the existing conversion funnel, and comparing them. But it's a hassle to do this manually on a regular basis.
In funnel analysis it is normal to have funnel steps that represent more than one URL. Take the basic case of ecommerce sites, where the final goal may be the same transaction completion page, but the funnel step corresponding to the product page can be triggered by many different product pages and not just one.
Based on the page URL structure of your website, you can choose either of the 2 match types below to add multiple URLs to a single step:
1. Begins with: If all the different pages displaying the ad share a set of common characters at the beginning, use this.
2. Regular Expression Match: If the different pages that contain your banner ad have totally unrelated URLs, find a suitable regex that captures all of those URLs.
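
Before putting a pattern into the funnel step, it can help to check it against a few sample paths. Here is a small Python sketch of both match types; the paths (/shop/, /products, /blog, /about) are purely hypothetical, and Python's `re` is only a stand-in for the GA regex field, which should behave the same for simple patterns like this one.

```python
import re

# "Begins with": every banner page shares a common prefix (hypothetical).
BEGINS_WITH = "/shop/"

# "Regular Expression Match": unrelated sections listed as alternatives (hypothetical).
BANNER_PAGES = re.compile(r"^/(products|blog|about)(/.*)?$")


def matches_begins_with(path):
    return path.startswith(BEGINS_WITH)


def matches_regex(path):
    return BANNER_PAGES.match(path) is not None


for path in ["/shop/shoes", "/products/shoes", "/blog", "/blogger", "/checkout"]:
    print(path, matches_begins_with(path), matches_regex(path))
```

The `(/.*)?$` part keeps /blog and /blog/some-post in the step while leaving out lookalikes such as /blogger.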

Limit wordpress posts loop to continue onto another page

There's a site I built where, on the home page, I created a kind of news box,
which is what the client wanted...
The bit I'm using to limit the posts on the index page is
query_posts('posts_per_page=4');
Now, this limits my posts to "4" or whatever number I want. When the user clicks on the
"news" page button, it takes them to a page which has the full posts loop without the
query_posts('posts_per_page=4'); which essentially shows them all the posts.
The problem is that this person is posting A LOT now, and the page has a bunch of posts one after the other.
Is there a way to limit these to any number (that I choose) and show the rest of the posts on another page?
So essentially something that says:
show 20 posts/excerpts, for example, BUT after 20 show a "next" page link or something like that?
Is that possible? Because as it stands now, by year's end this ONE page will have hundreds of posts.
I've looked around but I'm not sure exactly what I'm looking for. This one came close (here on SO):
Wordpress loop show limit posts
And although it's similar, I need a way to limit the posts on the page AND continue them on another page, so that one page doesn't hold ALL the posts.
Thanks in advance.
The homepage is controlled by index.php, or by the template file of a page that you have set as your homepage at Settings > Reading.
Check your index.php file, or the template file used by the page that serves as the homepage.
If the queries in that code do not include any "posts per page" argument, you can control this number from Settings > Reading.
There is also a plugin which can help you:
http://wordpress.org/extend/plugins/custom-post-limits/
Now, for the news page, check whether it is a page or a category listing.
If it's a page, then it is handled by page.php or by a template attached to it. Again, the code is important: how the queries are written. If it's a category listing, then category.php is handling that page.
Pay attention to which template file each page/post uses, and read up on this: http://codex.wordpress.org/Class_Reference/WP_Query
Good luck! :)
http://codex.wordpress.org/Settings_Reading_Screen
In the settings you can set a default max limit of how many posts per page.
