Can regular SEO help improve Bing Custom Search API results? - microsoft-cognitive

I've started using Bing Custom Search API and many of the top search results I get are... surprisingly old and irrelevant.
The Custom Search interface allows you to rank slices of websites higher than others and to boost some results, but it remains URL-based and doesn't go into weighting of actual page contents or metadata such as date, keywords, author and so on.
Will "classic" SEO tips such as using one h1, optimizing page title/description/keywords, etc. help improve result relevance?
I guess it boils down to asking "does the Bing Custom Search API use the regular Bing search engine behind the scenes?", but if it's more complex than that, any answer to my main problem will do.

Bing Custom Search uses essentially the same indexing and ranking mechanism as the Bing search engine. The only difference is that Bing Custom Search restricts results to certain sites and/or lets you control the ranking of results. So anything that helps improve page quality (and hence ranking in Web Search) will also help improve Bing Custom Search results.
This actually becomes more important because the candidate pool to select from in Custom Search (or any such API) is very small compared to the full-fledged Web Search API, which has billions of pages to choose from.
The only caveat is that improving page quality, and hence ranking, takes time, so until then you may have to pin/block/boost results.
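For reference, querying a Custom Search instance is just a REST call against your configured instance. Below is a rough Python sketch, assuming the v7.0 endpoint with its customconfig parameter and Ocp-Apim-Subscription-Key header; the key and config ID are placeholders you would substitute:

```python
import requests

# Placeholders -- substitute your own subscription key and Custom Configuration ID.
SUBSCRIPTION_KEY = "your-subscription-key"
CUSTOM_CONFIG_ID = "your-custom-config-id"
ENDPOINT = "https://api.cognitive.microsoft.com/bingcustomsearch/v7.0/search"

def custom_search(query, count=10):
    """Query the Custom Search instance and return the list of web page results."""
    response = requests.get(
        ENDPOINT,
        headers={"Ocp-Apim-Subscription-Key": SUBSCRIPTION_KEY},
        params={"q": query, "customconfig": CUSTOM_CONFIG_ID, "count": count},
        timeout=10,
    )
    response.raise_for_status()
    # Web results live under webPages.value in the JSON response.
    return response.json().get("webPages", {}).get("value", [])

for page in custom_search("example query"):
    print(page["name"], page["url"])
```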

Related

Twitter scraping vs Twitter Advanced Search in providing results

I need to analyze tweets on a specific topic, but my application (to use the Twitter API) was not approved. Instead, I tried to do it manually using Twitter Advanced Search. Besides being more burdensome than the easy-to-use Twitter API, I think Advanced Search doesn't retrieve all the relevant tweets containing a particular keyword, as I found when testing several cases.
So there are two questions. First, am I right about the incompleteness of Advanced Search results? Second, is there another way (or workaround) to use the API without needing approval?
Specifically, is there any limit on the results returned by Advanced Search, or does it provide all possible results, just like the API?
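For comparison, note that the v1.1 standard search API is itself limited to roughly the last 7 days of tweets, so neither it nor the Advanced Search UI is exhaustive. A minimal sketch of the API route, assuming an approved app with an app-only bearer token (which the question notes is not available here), might look like:

```python
import requests

# Hypothetical bearer token obtained through an approved developer app.
BEARER_TOKEN = "your-app-only-bearer-token"

def search_tweets(keyword, count=100):
    """Query the v1.1 standard search endpoint; it only indexes roughly the last 7 days."""
    response = requests.get(
        "https://api.twitter.com/1.1/search/tweets.json",
        headers={"Authorization": f"Bearer {BEARER_TOKEN}"},
        params={"q": keyword, "count": count, "result_type": "recent"},
        timeout=10,
    )
    response.raise_for_status()
    return response.json().get("statuses", [])

for tweet in search_tweets("some topic"):
    print(tweet["created_at"], tweet["text"])
```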

Import.io - Can it replace Kimonolabs

I currently use Kimonolabs for scraping data from websites that have the same goal. To keep it simple, let's say these websites are online shops selling stuff online (they are actually job websites with online application options, but technically they look a lot like a webshop).
This works great. For each website a scraper API is created that goes through the available advanced search page to crawl all product URLs. Let's call this API the 'URL list'. Then a 'product API' is created for the product detail page that scrapes all necessary elements, e.g. the title, product text and specs like the brand, category, etc. The product API is set to crawl daily using all the URLs gathered in the 'URL list'.
The gathered information for all products is then fetched from the Kimonolabs JSON endpoint by our own service.
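For context, the same two-stage flow can be expressed in a few lines of Python. This is only a sketch: the site, URL pattern and CSS selectors below are hypothetical, using requests and BeautifulSoup.

```python
import requests
from bs4 import BeautifulSoup  # pip install beautifulsoup4

BASE = "https://example-shop.test"          # hypothetical site
SEARCH_PAGE = BASE + "/search?page={page}"  # hypothetical paginated search URL

def collect_product_urls(max_pages=50):
    """Stage 1 -- the 'URL list': walk the paginated search results and collect product links."""
    urls = []
    for page in range(1, max_pages + 1):
        html = requests.get(SEARCH_PAGE.format(page=page), timeout=10).text
        soup = BeautifulSoup(html, "html.parser")
        links = [a["href"] for a in soup.select("a.product-link")]  # hypothetical selector
        if not links:
            break  # ran past the last page
        urls.extend(BASE + href for href in links)  # hrefs assumed relative on this site
    return urls

def scrape_product(url):
    """Stage 2 -- the 'product API': pull the fields of interest from one detail page."""
    soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
    return {
        "url": url,
        "title": soup.select_one("h1.title").get_text(strip=True),
        "description": soup.select_one("div.description").get_text(strip=True),
        "brand": soup.select_one("span.brand").get_text(strip=True),
    }

if __name__ == "__main__":
    products = [scrape_product(u) for u in collect_product_urls()]
    print(f"Scraped {len(products)} products")
```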
However, Kimonolabs will shut down its service at the end of February 2016 :-(. So I'm looking for an easy alternative. I've been looking at import.io, but I'm wondering:
Does it support automatic updates (letting the API scrape hourly/daily/etc.)?
Does it support fetching all product URLs from a paginated advanced search page?
I'm tinkering around with the service. It seems to extract data via the same easy process as Kimonolabs. However, it's unclear to me whether pagination of the URLs needed for the product API and automatic updates are supported.
Are there any import.io users here who can advise whether import.io is a useful alternative for this? Maybe even give some pointers in the right direction?
Look into Portia. It's an open source visual scraping tool that works like Kimono.
Portia is also available as a service and it fulfills the requirements you have for import.io:
automatic updates, by scheduling periodic jobs to crawl the pages you want, keeping your data up-to-date.
navigation through pagination links, based on URL patterns that you can define.
Full disclosure: I work at Scrapinghub, the lead maintainer of Portia.
Maybe you want to give Extracty a try. It's a free web scraping tool that allows you to create endpoints that extract any information and return it in JSON. It can easily handle paginated searches.
If you know a bit of JS you can write CasperJS Endpoints and integrate any logic that you need to extract your data. It has a similar goal to Kimonolabs and can solve the same problems (if not more, since it's programmable).
If Extracty does not solve your needs, you can check out these other market players that aim for similar goals:
Import.io (as you already mentioned)
Mozenda
Cloudscrape
TrooclickAPI
FiveFilters
Disclaimer: I am a co-founder of the company behind Extracty.
I'm not that fond of Import.io, but it seems to me it allows pagination through bulk input URLs. Read here.
So far there hasn't been much progress in getting the whole website through the API:
Chain more than one API/Dataset: It is currently not possible to fully automate the extraction of a whole website with Chain API.
For example, if I want data that is found within category pages or paginated lists, I first have to create a list of URLs, run Bulk Extract, save the result as an import data set, and then chain it to another Extractor. Once it is set up, I would like to be able to do this more automatically, in one click.
P.S. If you are somewhat familiar with JS, you might find this useful.
Regarding automatic updates:
This is a beta feature right now. I'm testing this for myself after migrating from Kimonolabs... You can enable this for your own APIs by appending &bulkSchedule=1 to your API URL. Then you will see a "Schedule" tab. In the "Configure" tab, select "Bulk Extract" and add your URLs; after this the scheduler will run daily or weekly.

Best plugin/homegrown approach to WordPress relevancy search?

What is the best approach to work around WordPress' default chronological ordering? What is the best plugin or method to fine-tune my search results? I have found 3 candidates:
https://wordpress.org/plugins/relevanssi/
http://wordpress.org/support/plugin/search-unleashed
Google custom search engine
Background:
I'm building a search/browse interface where users can find activities. I am writing activities as posts, then applying metadata to each post to maximize findability across a number of dimensions. I want to display results by relevancy, not chronology.
Relevanssi has worked pretty well for me; I believe there is also a 'pro' version that adds more features.

Is static URL better than dynamic URL in terms of SERP?

I've been reading up on SEO and how to construct my links to get a better SERP position.
I'm using WordPress as the framework for my site and have custom templates retrieving data from my DB.
What makes a URL dynamic is the usage of ? and &. Nothing more, nothing less. Google recommends that I should not have too many attributes in my URL, and that's understandable.
Dynamic: www.mysite.com/?id=123&name=some+store+name&city=london
Static: www.mysite.com/london/some+store+name/123
Q1: I don't feel that adding the store ID in this static URL looks nice. But I do need it in order to fetch data from the DB, right?
Reading various blogs, I see many SEO (experts) saying different things, but I feel most of it is just talk without actually proving their statements. We can all agree that static URLs are good in terms of usability (and readability).
Q2: But many claim that static URLs prevent duplicate content. I don't agree on that as all my contents have unique ID. Can anyone comment on this?
Q3: In the end, for the Google search engine (and others) it really doesn't matter if the URL is static or dynamic. But since Google is working towards user friendly content, is that the only argument for having static URLs?
1) There's no problem using DB ids alongside static URLs. Many huge e-commerce and other commercial sites do this (Amazon, eBay... hell, everyone really.)
2) A static URL in and of itself does not prevent duplicate content. There are hundreds of ways duplicates can happen (child pages, external copy, JavaScript, form fields, AJAX, archive sections... the list goes on.)
3) It doesn't matter whether the URL is static or dynamic for indexing. But in terms of ranking well, static URLs that are informative (and relevant to the targeted keywords) are hugely beneficial. Multivariate testing I've done shows users are also generally reassured by clean-looking URLs in terms of usability.
If you give me some more examples, I can probably help out a bit more.
URLs without parameters are always better. Parameters won't absolutely kill SEO, but it is better not to have them.
10 years ago Google would ignore parameters and would penalize you for URLs with parameters. Today they are really good at figuring out these DB parameters, but not perfect. Among other things, Google has to try to figure out which URL parameters matter, which don't, and whether parameter order matters.
E.g. you may have URL parameters that store user preferences, navigation state, etc. This just proliferates URLs that Google has to try to decode. So what you should do is:
Right before generating a URL, at least sort your parameters.
Convert parameters that matter into things that don't look like parameters. So if I had a shoe store with URLs like http://mystore.com/mypage?category=boots&brand=great&color=red I'd rewrite that to something like http://mystore.com/mypage/category/boots/brand/great/color/red or even better:
http://mystore.com/mypage/boots/great/red
Then you can add the parameters that don't matter for the page content at the end. Google will figure out they don't matter.
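Here is a minimal Python sketch of those two steps: turn the parameters that determine page content into path segments, and keep whatever is left in a stable sorted order. The parameter names are the illustrative ones from the example above.

```python
from urllib.parse import urlencode, urlsplit, parse_qsl

# Parameters that actually determine the page content, in the order they
# should appear as path segments (illustrative names from the example above).
CONTENT_PARAMS = ["category", "brand", "color"]

def rewrite_url(url):
    """Turn /mypage?category=boots&brand=great&color=red into /mypage/boots/great/red,
    keeping any leftover parameters sorted into a canonical order."""
    parts = urlsplit(url)
    params = dict(parse_qsl(parts.query))

    # Step 1: pull out the parameters that matter and make them path segments.
    segments = [params.pop(name) for name in CONTENT_PARAMS if name in params]
    path = parts.path.rstrip("/") + "/" + "/".join(segments)

    # Step 2: sort whatever is left (preference/tracking params) for a canonical form.
    leftover = urlencode(sorted(params.items()))
    return f"{parts.scheme}://{parts.netloc}{path}" + (f"?{leftover}" if leftover else "")

print(rewrite_url("http://mystore.com/mypage?color=red&category=boots&brand=great&sessionid=42"))
# -> http://mystore.com/mypage/boots/great/red?sessionid=42
```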
The other reason to fix your URLs is that Google displays them to users in the SERP, and people are more likely to click on readable URLs than database URLs.
Why do big stores like Amazon use database URLs? Because they are giant, bad URLs don't hurt them, and their systems are so large and complex that it is the only way to manage them. But for smaller sites with fewer products, readable URLs are achievable and are one of the few advantages a small site can have over a big one.
If you observe Google SERP results closely, you will definitely find that some parts of the results are highlighted in bold. Looking a bit further, you can easily see that the search query gets bolded in the title, description and URL of pages that use that same query in their title, description and URL.
Now the thing is, if a website's URLs are dynamic and come with a parameter ID, they lose those keywords from the title, description and URL.
Ex:
http://www.johnzaccheofineart.com/catagory-2/?id=4
http://www.johnzaccheofineart.com/painting/johnzaccheo
Sample search: Painting for Sale
Now we can easily see the difference between static and dynamic URL performance. One URL contains words with no search value, while the other contains the category name as well as the painter's name.
So, as a user, I will give preference to the second one, which is understandable from the URL itself.

web information extraction

I want to create a shopping search engine that shows products from many websites, and I wonder how I can retrieve information about products from those sites.
I am not interested in the search engine part but in extracting product information from web pages in an automated manner using auto-generated templates. Does anybody know some good algorithms for this / papers to read?
Dapper looks pretty close to what you are looking for. http://open.dapper.net/
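The usual search term for auto-generating such templates is "wrapper induction". As a hand-rolled baseline, a per-site template can be as simple as a mapping from field names to CSS selectors; the host and selectors below are purely hypothetical, and an induced wrapper would learn these mappings instead of hard-coding them.

```python
from urllib.parse import urlparse

import requests
from bs4 import BeautifulSoup  # pip install beautifulsoup4

# A per-site "template": field names mapped to CSS selectors.
# The host and selectors are purely hypothetical.
TEMPLATES = {
    "example-shop.test": {
        "title": "h1.product-title",
        "price": "span.price",
        "brand": "span.brand",
    },
}

def extract_product(url):
    """Apply the template registered for the URL's host to one product page."""
    template = TEMPLATES[urlparse(url).netloc]
    soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
    return {
        field: soup.select_one(selector).get_text(strip=True)
        for field, selector in template.items()
    }
```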
