What is the best approach to work around WordPress' default chronological ordering? What is the best plugin or method to fine-tune my search results? I have found 3 candidates:
https://wordpress.org/plugins/relevanssi/
http://wordpress.org/support/plugin/search-unleashed
Google custom search engine
Background:
I'm building a search/browse interface where users can find activities. I am writing activities as posts, then applying metadata to each post to maximize findability across a number of dimensions. I want to display results by relevancy, not chronology.
Relevanssi has worked pretty well for me; I believe there is also a 'pro' version that adds more features.
Related
I am developing a project that uses the WP REST API. After some tests, I realized that it's not retrieving all the categories I am trying to GET. To be precise, there are supposed to be 21 results but only 10 come up. Is there some kind of restriction that I am not seeing? Are there any settings I'll have to change?
Here is what I am trying:
https://example.com/wp-json/wp/v2/categories?parent=97
I saw this post here, but some of the answers relied on the WordPress JSON API, a plugin that is no longer available due to security concerns.
TIA
I had to look at the documentation more closely. In short, the default response returns a count of 10. Specifying the per_page parameter changes that. So:
https://example.com/wp-json/wp/v2/categories?parent=97&per_page=X
(Replace X with the desired number; the API caps per_page at 100.)
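For collections larger than one page, the same endpoint can be walked page by page; the X-WP-TotalPages response header tells you when to stop. A minimal stdlib-only sketch (the base URL is a placeholder, and this assumes a stock WP REST API):

```python
import json
import urllib.request

def category_url(base_url, parent, per_page=100, page=1):
    """Build the categories endpoint URL; per_page is capped at 100."""
    return (f"{base_url}/wp-json/wp/v2/categories"
            f"?parent={parent}&per_page={per_page}&page={page}")

def fetch_all_categories(base_url, parent):
    """Follow pagination until X-WP-TotalPages says we are done."""
    categories, page = [], 1
    while True:
        url = category_url(base_url, parent, page=page)
        with urllib.request.urlopen(url) as resp:
            categories.extend(json.load(resp))
            total_pages = int(resp.headers.get("X-WP-TotalPages", "1"))
        if page >= total_pages:
            return categories
        page += 1
```

For the 21 categories in the question, a single request with per_page=100 would already be enough; the loop only matters once a collection grows past 100 items.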
I've started using Bing Custom Search API and many of the top search results I get are... surprisingly old and irrelevant.
The Custom Search interface allows you to rank slices of websites higher than others and to boost some results, but it remains URL-based and doesn't go into weighting of actual page contents or metadata such as date, keywords, author and so on.
Will "classic" SEO tips such as using one h1, optimizing page title/description/keywords, etc. help improve result relevance?
I guess it boils down to asking "does Bing Custom Search API use the regular Bing search engine behind the scenes?", but if it is more complex than that, any answer to my main problem will do.
Bing Custom Search uses basically the same indexing and ranking mechanism as the Bing search engine. The only difference is that Bing Custom Search restricts results to certain sites and/or lets you control the ranking of results. So, anything that helps improve page quality (and hence ranking in Web Search) will also help improve Bing Custom Search results.
This actually becomes more important as the candidate pool to select from is very small in Custom Search (or any such API) compared to the full-fledged Web Search API, which has billions of pages to select from.
The only caveat is that it takes time to improve page quality and hence ranking, so until then you may have to pin/block/boost results manually.
I currently use Kimonolabs for scraping data from websites that share the same goal. To keep it simple, let's say these websites are online shops selling stuff (actually they are job websites with online application options, but technically it looks a lot like a webshop).
This works great. For each website, a scraper API is created that goes through the available advanced search pages to crawl all product URLs. Let's call this API the 'URL list'. Then a 'product API' is created for the product detail page that scrapes all necessary elements, e.g. the title, product text and specs like the brand, category, etc. The product API is set to crawl daily using all the URLs gathered in the 'URL list'.
The gathered information for all products is then fetched from the Kimonolabs JSON endpoint by our own service.
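The 'URL list' step described above is essentially link extraction from the paginated search pages. A toy stdlib-only Python sketch of that step (the "product" link class is a made-up assumption; any real site needs its own selector):

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect href values of anchors whose class contains 'product'."""
    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "a" and "product" in a.get("class", ""):
            self.urls.append(a.get("href"))

def extract_product_urls(search_page_html):
    """Return the product links found on one search-results page."""
    parser = LinkCollector()
    parser.feed(search_page_html)
    return parser.urls
```

Running this over every page of the advanced search, then fetching each collected URL, reproduces the two-stage 'URL list' / 'product API' pattern the question describes.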
However, Kimonolabs will shut down its service at the end of February 2016 :-(. So, I'm looking for an easy alternative. I've been looking at import.io, but I'm wondering:
Does it support automatic updates (letting the API scrape hourly/daily/etc)?
Does it support fetching all product URLs from a paginated advanced search page?
I'm tinkering around with the service. Basically, it seems to extract data via the same easy process as Kimonolabs. It's just unclear to me whether it supports paginating the URLs needed for the product API and automatically keeping them up to date.
Any import.io users here who can advise whether import.io is a useful alternative for this? Maybe even give some pointers in the right direction?
Look into Portia. It's an open source visual scraping tool that works like Kimono.
Portia is also available as a service and it fulfills the requirements you have for import.io:
automatic updates, by scheduling periodic jobs to crawl the pages you want, keeping your data up-to-date.
navigation through pagination links, based on URL patterns that you can define.
Full disclosure: I work at Scrapinghub, the lead maintainer of Portia.
Maybe you want to give Extracty a try. It's a free web-scraping tool that allows you to create endpoints that extract any information and return it in JSON. It can easily handle paginated searches.
If you know a bit of JS, you can write CasperJS endpoints and integrate any logic you need to extract your data. It has a similar goal to Kimonolabs and can solve the same problems (if not more, since it's programmable).
If Extracty does not meet your needs, you can check out these other market players that aim for similar goals:
Import.io (as you already mentioned)
Mozenda
Cloudscrape
TrooclickAPI
FiveFilters
Disclaimer: I am a co-founder of the company behind Extracty.
I'm not that fond of Import.io, but it seems to me it allows pagination through bulk input URLs. Read here.
So far, not much progress in getting the whole website through the API:

Chain more than one API/Dataset
It is currently not possible to fully automate the extraction of a whole website with Chain API. For example, if I want data that is found within category pages or paginated lists, I first have to create a list of URLs, run Bulk Extract, save the result as an import data set, and then chain it to another Extractor. Once set up, I would like to be able to do this in one click, more automatically.
P.S. If you are somehow familiar with JS you might find this useful.
Regarding automatic updates:
This is a beta feature right now. I'm testing it myself after migrating from Kimonolabs... You can enable it for your own APIs by appending &bulkSchedule=1 to your API URL. You will then see a "Schedule" tab. In the "Configure" tab, select "Bulk Extract" and add your URLs; after this, the scheduler will run daily or weekly.
I want to provide a search mechanism on my CMS. What is the preferred approach, what would be the best indexing technology to allow a site-wide search?
The CMS is written in .Net.
I would recommend that you have a look at Lucene.NET. It's a very nice helper when it comes to searching, and it's easy to use.
A very smooth feature of Lucene is that you can set annotations on your entities. This makes it very easy to customize how different fields should be indexed and searched. (I have only used Lucene with Java; there might be some differences with .NET.)
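Annotations aside, the core structure Lucene maintains per field is an inverted index. This toy Python sketch (illustrating the concept only, not the Lucene.NET API) shows how per-field terms map back to the documents that contain them:

```python
from collections import defaultdict

class TinyIndex:
    """Toy inverted index: each document's fields are tokenized,
    and every (field, term) pair maps back to the doc ids
    containing that term in that field."""

    def __init__(self):
        self.postings = defaultdict(set)  # (field, term) -> set of doc ids
        self.docs = {}                    # doc id -> stored fields

    def add(self, doc_id, **fields):
        self.docs[doc_id] = fields
        for field, text in fields.items():
            for term in text.lower().split():
                self.postings[(field, term)].add(doc_id)

    def search(self, field, term):
        return sorted(self.postings.get((field, term.lower()), ()))
```

A real engine adds analyzers (stemming, stop words) and relevance scoring on top, but the ability to index and query each field independently is what the per-field customization buys you.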
You could use Google Site Search for this; the paid version is something like $100 (so that's what, 20 euros?) a year. You can customise the search results as much as you want: you call GSS with their API and get the results in XML. There is also an autocomplete included. A lot of Google search features are supported.
I can see where to get an RSS feed for the bug list; however, I would like to get RSS updates for modifications to current bugs, if possible.
This is quite high up when searching via Google for it, so I'm adding a bit of advertisement here:
As Bugzilla still doesn't support this, I wrote a small web service that supports exactly this. You can find its source code here and a running instance here.
What you're asking for is the subject of this enhancement bug:
https://bugzilla.mozilla.org/show_bug.cgi?id=256718
but no one seems to be working on it.
My first guess is that the way to do it is to add a template somewhere like template/en/default/bug/show.atom.tmpl with whatever you need. Put it in custom or an extension as needed.
If you're interested in working on it or helping someone with it, visit channel #mozwebtools on irc.mozilla.org.
Not a perfect solution, but with the resolution of bug #255606, Bugzilla now allows listing all bugs by running a search with no criteria, and you can then get the results of the search in Atom format using the link at the bottom of the list.
From the release notes for 4.2:
Configuration: A new parameter, search_allow_no_criteria, has been added (default: on) which allows admins to forbid queries with no criteria. This is particularly useful for large installations with several tens of thousands of bugs, where returning all bugs doesn't make sense and would have a performance impact on the database.
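The Atom link at the bottom of any bug list is just buglist.cgi with ctype=atom appended to the search parameters. A small sketch (the Bugzilla host is a placeholder) that builds such a URL and pulls entry titles out of the returned Atom document:

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"

def buglist_atom_url(base_url, **criteria):
    """Build a buglist.cgi URL that returns Atom instead of HTML."""
    query = "&".join(f"{k}={v}" for k, v in criteria.items())
    sep = "&" if query else ""
    return f"{base_url}/buglist.cgi?ctype=atom{sep}{query}"

def entry_titles(atom_xml):
    """Extract the <title> of each <entry> from an Atom document."""
    root = ET.fromstring(atom_xml)
    return [e.findtext(f"{ATOM_NS}title")
            for e in root.iter(f"{ATOM_NS}entry")]
```

With no criteria at all (and search_allow_no_criteria left on), this yields a feed of every bug; passing criteria like bug_status=NEW narrows it to the matching subset.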