Wordpress search results by keyword - wordpress

Is there a way to prioritise the search results by keywords or tags.
For example, if a user types in Admissions then domain/admissions page comes first and then all the rest. If the word is apply then domain/apply page is first on the search result page and etc
Was trying to find the examples, but no luck. Never used the extended search before, so even don't know where to dig

What you are talking about here is returning search results by "relevance" and there are a few examples out there including plugins such as Relevanssi.
I have no example of how to do this, however, I thought it might be useful to know the term you need to dig deeper.

Related

How to query for similar words?

I'm not sure how I would accomplish this, but I have built an application in PHP with MariaDB. I want users to be able to search for a word, and exact results, and results similar show up.
For example, a recent one, a user typed Adison into the search box, and nothing came up. The real spelling is Addison.
I saw some answers based on Levenshtein Distance, but I figure there has to be a simpler solution

How do I search with the "&" keyword, e.g I want food & drink

In bootstrap-table jquery (http://issues.wenzhixin.net.cn/bootstrap-table/)
How do I search with the "&" keyword, e.g I want food & drink.
However, when read the source code, it has .replace for /&/ to &. Any idea I can bypass this? It is impossible to ask user to key in & in the search text box.
So... im still a little unclear as there is no bug in core code.
See this fiddle which proves "&" search works fine:
http://jsfiddle.net/dabros/x8efv6wf/1/
If this somehow doesnt answer your needs, elaborate on exactly why.
Then look at some extensions like multiple search, filter and filter-control:
http://bootstrap-table.wenzhixin.net.cn/extensions/
If still not happy, then create a custom search function. Easy to do if using server-side pagination, but possible even if client-side.
If you fixed on using client-side (not server side pagination + search, meaning that the js does the search, not your own server code) then look at custom search.
Relatively new feature i think, not used it myself as if i wanted a custom search i would use server-side code.
But here it is:
Custom search function #1956
https://github.com/wenzhixin/bootstrap-table/issues/1956
How to search within row details? #2007
https://github.com/wenzhixin/bootstrap-table/issues/2007
https://github.com/wenzhixin/bootstrap-table/pull/1979
That pull requests seems most detailed, it appears full example still otw, but shouldnt be hard if you read through there.
Though frankly, stick with plugins/extensions i listed above or use server-side code if still not happy - gives you far greater control with far simpler execution/code-maintenance.

Aggregating from various sources

It could be a project well beyond my skills right now but I've got around one full month to spend on it so I think I can do it. What I want to build is this: Gather news about a specific subject from various sources. Easy, right? Just get the rss feeds and display them on a page. Well, I want something more advanced: Duplicates removed and customized presentation (that is, be able to define/change the format in which the news headlines are displayed).
I've played a bit with Yahoo Pipes and some other tools and I am facing two big problems:
Some sources don't provide rss feeds. How do I create one?
What's the best method to find and remove duplicates. I thought about comparing the headlines and checking if there is a matching bigger than, say, 50%. Is that a good practice though?
Please add any other things (problems, suggestions, whatever) I might not have considered.
Duplication is a nasty issue. What I eventually ended up doing:
1. Strip out all HTML tags except for links (Although I started using regex, I was burned. I eventually moved to custom parsing to remove tags)
2. Strip out all whitespace
3. Case-desensitize
4. Hash all that with MD5.
Here's why you leave the link in:
A comment might be as simple as "Yes, this sucks". "Yes, this sucks" could be a common comment. BUT if the text "this sucks" is linked to different things, then it is not a duplicate comment.
Additionally, you will find that HTML tag escaping is weird with RSS feeds. You would think that a stray < would be double-encoded: (I think)&<;
But it is not. It is encoded <
But so too are HTML tags! :<p>
I eventually copied all the known HTML tags as parsed by Mozilla Firefox and manually recognized those tags.
Creating an RSS feed from HTML is quite nasty and I can only point you to services such as Spinn3r, which are fantastic at de-duplication and content extraction. These services typically use probability-based algorithms that are above me. I know of one provider that got away with regexing pages (They had to know that a certain page was MySpace-based or Blogger-based) but they did not perform admirably.
You might want to try to use the YQL module to scrape a webpage that doesn't provide RSS. Here's a sample of a YQL statement to scrape HTML.
About duplicates, take a look at this pipe.
Customized presentation: if you want it truly customized you'll have to manipulate the pipe results yourself, e.g. get it as JSON an manipulate it with Javascript, or process it server-side.

Interpreting Search Results

I am tasked with writing a program that, given a search term and the HTML source of a page representing search results of some unknown search engine (it can really be anything, a blog, a shop, Google, eBay, ...), needs to build a data structure of the results containing "what's in the results": a title for earch result, the "details" link, the position within the results etc. It is not known whether the results page contains any of the data at all, and whether there are any search results. The goal is to feed the data structure into another program that extracts meaning.
What I am looking for is not BeautifulSoup or a RegExp but rather some clever ideas or algorithms on how to interpret the HTML source. What do I do to find out what part of the page constitutes a single result item? How do I filter the markup noise to extract the important bits? What would you do? Pointers to fields of research covering what I try to to are aly greatly appreciated.
Thanks, Simon
I doubt that there exist a silver-bullet algorithm that without any training will just work on any arbitrary search query output.
However, this task can be solved and is actually solved in many applications, but with different approach. First you have to define general structure of single search result item based on what you actually going to do with it (it could be name, date, link, description snippet, etc.), and then write number of html parsers that will extract necessary necessary fields from search result output of particular web sites.
I know it is not super sexy solution, but it probably the only one that works. And it is not rocket science. Writing parsers is actually extremly simple, you can make dozen per day. If you will look into html source of search result, you will notice that output results are typically very structured and marked with specific div sections or class atributes, so it is very easy to find it in the document. You dont have even use any complicated HTML parsing library for that, something grep-like will be enough.
For example, on this particular page your question starts with <div class="post-text"> and ends with </div>. Everything in between is actually a post text with some HTML formatting that you may want to remove along with extra spaces and "\n". And this <div class="post-text"> appears on the page only once.
Once you go at large scale with your retrieval applicaiton, you will find out that there is not that big variety of different search engines on different sites, and you will be able to re-use already created parsers for sties using similar search engines.
The only thing you have to remember is built-in self-testing. Sites tend to upgrade and change design from time to time. If your application is going to live for some time, you will need to include into your parsers some logic that will check validity of their results and notify you every time search output has changed and is not compatible anymore with your parser. Then you will have to modify particular parser or write new one.
Hope this helps.

Token replacement

I currently implement a replace function in the page render method which replaces commonly used strings - such as replace [cfe] with the root to the customer front end. This is because the value may be different based on the version of the site - for example the root to the image folder ([imagepath]) is /Images on development and live, but /Test/Images on test.
I have a catalogue of products for which I would like to change [productName] to a link to the catalogue page for that product. I would like to go through the entire page and replace all instances of [someValue] with the relevant link. Currently I do this by looping through all the products in the product database and replacing [productName] with the link to the catalog page for that product. However this is limited to products which exist in the database. "Links" to products which have been removed currently wont be replaced, so [someValue] will be displayed to the user. This does not look good.
So you should be able to see my problem from this. Does anyone know of a way to achieve what I would like to easily? I could use regexes, but I don't have much experience of those. If this is the easiest way, using "For Each Match As String In Regex.Matches(blah, blah)" then I am willing to look further into this.
However at some point I would like to take this further - for example setting page layouts such as 3 columns with an image top right using [layout type="3colImageTopRight" imageURL="imageURL"]Content here[/layout]. I think I could kind of do this now, but I cant figure out how to deal with this if the imageURL were, say, [Image:Product01.gif] (using regex.match("[[a-zA-Z]{0,}]") I think would match just [layout type="3colImageTopRight" imageURL="[Image:Product01.gif] (it would not get to the end of the layout tag). Obviously the above wouldn't quite work, as I haven't included double quotes in the match string or anything, but you get the general idea. You should be able to get the general idea of what I am getting at and what I am trying to do though.
Does anyone have any ideas or pointers which could help me with this? Also if this is not strictly token replacement then please point me to what it is, so I can further develop this.
Aristos - hope reexplaining this resolves the confusion.
Thanks in advance,
Regards,
Richard Clarke
#RichardClarke - I would go with Regular Expressions, they're not as terrible to learn as you might think and with a bit of careful usage will solve your problems.
I've always found this a very useful tool.
http://derekslager.com/blog/posts/2007/09/a-better-dotnet-regular-expression-tester.ashx
goes nicely with a cheat sheet ;-)
http://www.addedbytes.com/cheat-sheets/regular-expressions-cheat-sheet/
Good luck.

Resources