Retrieving relevant posts from WordPress blogs - wordpress

I have a requirement to write a program in Java that retrieves all the posts, from all WordPress sites, that contain one or more given keywords.
This is how I approached the problem. I initially thought I would crawl WordPress sites looking for the keywords I am interested in. But I realized that if there is an endpoint for WordPress search, it would make my job a lot easier. So I looked around for a search endpoint to which I could submit queries and get back links to the posts.
All I found is http://wwww.en.search.wordpress.com. I can tweak the URL and get some links, but:
I would like to know if there is a better way to handle this problem.
The search link I posted is meant for users, and it might limit my search results because I query it from a program.
I would also like to retrieve posts from a given date range. I am not sure whether this is possible with my approach.
I appreciate any help in this regard. Thank you.

How about this approach:
Assuming you don't need to go back through the history and scrape all the data, I would just stick to tags:
http://en.wordpress.com/tags/
I would crawl it every day, get the most popular tags (by font size), and then for each tag get the articles published in the past 24 hours.
For each post, get all the comments and search for your keywords.
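The selection logic of that daily crawl could look roughly like the Java sketch below. It assumes the fetching and HTML parsing are done elsewhere: the tag cloud has already been reduced to a map of tag name to font size, and each post to a hypothetical `Post` record holding its URL, publish time, and the combined text of the post and its comments. Only "pick the biggest tags" and "keep posts from the last 24 hours that mention a keyword" are shown; the `window` parameter also covers the asker's date-range requirement. Records need Java 16 or later.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

public class TagCrawlSketch {

    // Hypothetical representation of a crawled post: its URL, publish time,
    // and the combined text of the post body and its comments.
    record Post(String url, Instant published, String text) {}

    // Returns the n tags with the largest font size in the tag cloud,
    // treating font size as a proxy for popularity.
    static List<String> topTags(Map<String, Integer> fontSizeByTag, int n) {
        return fontSizeByTag.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    // Keeps the URLs of posts published within [now - window, now] whose text
    // contains at least one keyword (case-insensitive substring match).
    static List<String> matchingPosts(List<Post> posts, List<String> keywords,
                                      Instant now, Duration window) {
        Instant cutoff = now.minus(window);
        List<String> result = new ArrayList<>();
        for (Post p : posts) {
            boolean inRange = !p.published().isBefore(cutoff) && !p.published().isAfter(now);
            String text = p.text().toLowerCase(Locale.ROOT);
            boolean hit = keywords.stream()
                    .anyMatch(k -> text.contains(k.toLowerCase(Locale.ROOT)));
            if (inRange && hit) {
                result.add(p.url());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> cloud = Map.of("travel", 24, "food", 18, "diy", 12);
        System.out.println(topTags(cloud, 2)); // [travel, food]

        Instant now = Instant.parse("2024-01-02T00:00:00Z");
        List<Post> posts = List.of(
                new Post("https://example.com/a", Instant.parse("2024-01-01T12:00:00Z"), "Great Java tips"),
                new Post("https://example.com/b", Instant.parse("2023-12-25T12:00:00Z"), "Old Java post"),
                new Post("https://example.com/c", Instant.parse("2024-01-01T18:00:00Z"), "Cooking notes"));
        System.out.println(matchingPosts(posts, List.of("java"), now, Duration.ofHours(24)));
        // [https://example.com/a]
    }
}
```

A plain substring match is the simplest choice here; it will also match "javascript" for "java", so word-boundary matching may be worth adding if that matters.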
Would that work? If not, please share more details.
Good luck

Related

How do I make WordPress automatically republish old posts so they look new in search results?

I need to delete and repost my old WordPress posts so that search engines reindex them as new content. I don't want to do it manually. Can someone suggest how to do this?
To do this you update your posts' post_date values. There are plugins for the purpose.
But be careful. Search engines like Google are really good at detecting attempts to play games with them to boost search rankings. People have been playing those games since Yahoo ruled the search-engine world. If they detect such things on a site they down-rank it or even stop showing it altogether. Simply updating dates without changing content may trigger a game-playing filter. You don't want that.
Ask yourself the question, "how do I make my site more useful to people who search for it?" New and updated content is a good way to do that.

WooCommerce parameters and Google indexing

I'm currently using WooCommerce, and while looking through Google Search Console I happened to notice that well over 15,000 pages had been indexed. That instantly raised a concern, because I know I should have no more than 400 actual pages.
After looking into it, I noticed that every possible parameter (variations, grid styles, shipping methods, etc.) is being indexed, turning what should be 400 pages into 15,000 variations.
Does it affect my Google ranking to have 15,000 pages showing when there are really only 400?
I can't find a single resource that explains whether Google indexing so many variations has a positive or negative impact on rankings.
Finally, how do I prevent Google or other search engines from indexing URLs with parameters? I've seen advice to use robots.txt, but no recommendation on which standard WooCommerce filters should be excluded, or whether that's a bad idea.
I have a feeling I am losing a lot of link juice by having so many indexed pages with parameters.
WordPress by default assumes you want to share everything, because why not? If you have 15,000 indexable items, you would surely be more annoyed if your site decided on a whim to hide some of them without telling you. It is understandable that this is confusing, though, and it is always better to have control over which parts of your site are seen by search engines.
Anyway, there are plugins that let you customise which parts of a website are visible or hidden to search engines, for example by generating an XML sitemap (an index of the pages that should be crawled). A very common plugin I see people use is Yoast SEO. Another one I have had some success with is Google XML Sitemaps. You may find another that works better for you, but that should give you an idea of what you need.
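To see what such a sitemap boils down to, here is a rough Java sketch (not how Yoast SEO or Google XML Sitemaps work internally). It collapses parameterized URLs by dropping the query string, which is how 15,000 indexed URLs can reduce to a few hundred real pages, and writes each distinct page once in the sitemaps.org 0.9 format. The example URLs and the `view=grid` parameter are made up. Stripping the whole query string assumes pretty permalinks; with WordPress's plain permalinks (`?p=123`) the query string identifies the page and must be kept. Also note that a sitemap only suggests pages to crawl; leaving a URL out does not by itself stop it from being indexed.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class SitemapSketch {

    // Drops the query string and fragment, so that e.g.
    // /product/shirt/?orderby=price and /product/shirt/?view=grid
    // collapse to the same page. Assumes pretty permalinks.
    static String canonical(String url) {
        try {
            URI u = new URI(url);
            return new URI(u.getScheme(), u.getAuthority(), u.getPath(), null, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Bad URL: " + url, e);
        }
    }

    // Escapes the five characters that are special in XML text.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;").replace("'", "&apos;");
    }

    // Builds a minimal sitemap listing each distinct canonical URL once,
    // in first-seen order.
    static String sitemap(List<String> urls) {
        Set<String> pages = new LinkedHashSet<>();
        for (String url : urls) {
            pages.add(canonical(url));
        }
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        sb.append("<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n");
        for (String page : pages) {
            sb.append("  <url><loc>").append(escapeXml(page)).append("</loc></url>\n");
        }
        sb.append("</urlset>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> indexed = List.of(
                "https://shop.example/product/shirt/?orderby=price",
                "https://shop.example/product/shirt/?view=grid",
                "https://shop.example/product/hat/");
        System.out.print(sitemap(indexed));
    }
}
```

Running it on the three example URLs produces a sitemap with two `<loc>` entries, one for the shirt and one for the hat.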
Also, if Googlebot doesn't visit your site often, take the sitemap generated by one of those plugins and submit it via Google Search Console to help Google better understand your site.

Change URLs in analytics reports

I have URLs like this on my site: /preinscripcion/231. The number is the code of a course and is generated automatically by my CRM. The problem comes when I want to see which course a person is trying to register for. I have an Excel sheet with these codes, but looking them up is demanding and cumbersome because I already have hundreds of courses. I've tried using advanced filters to replace each code with the course name, but I can only create up to 100 filters.
Is there a way of doing this?
It does not have to be in Analytics itself; my site is built on WordPress: http://formarte.edu.co/
Thanks in advance for all of your help.
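One way around the 100-filter limit, outside Analytics, is to relabel exported report rows with a lookup table instead of one filter per course. The Java sketch below assumes the Excel sheet is saved as CSV lines of the form `code,name` and that report rows are page paths such as `/preinscripcion/231`; the class name, method names, and course names are hypothetical. Quoted CSV fields are not handled, and a header row such as `code,name` would simply become an unused entry.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CourseLabelSketch {

    // Matches paths such as /preinscripcion/231 (optional trailing slash)
    // and captures the numeric course code.
    private static final Pattern PREINSCRIPCION = Pattern.compile("/preinscripcion/(\\d+)/?");

    // Builds a code -> course name lookup from "code,name" lines.
    // Splitting with limit 2 keeps any commas inside the course name.
    static Map<String, String> loadLookup(List<String> csvLines) {
        Map<String, String> names = new HashMap<>();
        for (String line : csvLines) {
            String[] parts = line.split(",", 2);
            if (parts.length == 2) {
                names.put(parts[0].trim(), parts[1].trim());
            }
        }
        return names;
    }

    // Returns the course name for a report path, or the path unchanged
    // when it is not a pre-registration URL or the code is unknown.
    static String label(String path, Map<String, String> names) {
        Matcher m = PREINSCRIPCION.matcher(path);
        if (m.matches()) {
            return names.getOrDefault(m.group(1), path);
        }
        return path;
    }

    public static void main(String[] args) {
        Map<String, String> names = loadLookup(List.of("231,Curso de ejemplo", "232,Otro curso"));
        System.out.println(label("/preinscripcion/231", names)); // Curso de ejemplo
        System.out.println(label("/contacto", names));           // /contacto
    }
}
```

Because the lookup is a single map, adding courses only means adding rows to the CSV rather than creating new filters.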

Modify WordPress search to display author pages

I have been looking around the web but could not find a solution to this problem. Right now the search function in WordPress, as I understand it, goes through the posts and looks for matching words. That is fine; however, I am working on a site where I won't have many posts, or any at all. I will, however, have a lot of users (let's call them authors, even though they won't be writing anything). I want the normal search to go through the author pages and display results based on what is written there.
Is this possible? I understand a bit of code, but mostly I copy from tutorials.
Thank you so much in advance!
Since you didn't provide any code attempts, I'll stick with a plugin solution.
Try this:
https://wordpress.org/plugins/amr-users/screenshots/

How to find out the upload/post time of a specific website URL?

Often when searching for information I run into the problem that the author of an article/website/blog post doesn't give a date.
Is there any way (maybe a special meta search engine, web archives, or Google search operators) to find out at least in which month and year a website URL was uploaded?
Thanks
Putting
javascript:alert(document.lastModified)
in the address bar of a browser with the page loaded pops up a date and time. As far as I can tell, the browser takes this from the Last-Modified header the server sends, if there is one, and otherwise shows the current date and time, so it is not necessarily when the HTML or PHP file was created on the server. JavaScript itself can't access the server's filesystem, but I'm no expert...
I'm still curious whether someone knows a reliable method of finding out when a specific .html page was created, as I find it useful for research.
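The same Last-Modified value can be read without a browser. This Java 11+ sketch sends a HEAD request with `java.net.http.HttpClient` and parses the header if the server returns one; the class and method names are only illustrative. Keep in mind that the header is optional: dynamically generated pages (such as WordPress pages rendered by PHP) often omit it or report the time of the request, so at best it reflects the last modification the server reports, not the creation date.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Optional;

public class LastModifiedSketch {

    // Parses an HTTP date such as "Wed, 21 Oct 2015 07:28:00 GMT".
    static ZonedDateTime parseHttpDate(String value) {
        return ZonedDateTime.parse(value, DateTimeFormatter.RFC_1123_DATE_TIME);
    }

    // Sends a HEAD request (following redirects) and returns the parsed
    // Last-Modified header, or an empty Optional if the server sent none.
    static Optional<ZonedDateTime> lastModified(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .method("HEAD", HttpRequest.BodyPublishers.noBody())
                .build();
        HttpResponse<Void> response = client.send(request, HttpResponse.BodyHandlers.discarding());
        return response.headers().firstValue("Last-Modified").map(LastModifiedSketch::parseHttpDate);
    }

    public static void main(String[] args) throws Exception {
        // Offline demo of the date parsing.
        System.out.println(parseHttpDate("Wed, 21 Oct 2015 07:28:00 GMT"));
        // Pass a URL as the first argument to query a real server.
        if (args.length > 0) {
            System.out.println(lastModified(args[0]).map(Object::toString).orElse("no Last-Modified header"));
        }
    }
}
```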
