The Google Search Appliance goes through and finds out the date of each article when it crawls (last modified date is the default).
However, it doesn't turn up articles when you query by date code.
Is there any way to get the GSA to do this?
(We have a daily broadcast which people often search for by date code. Right now we have to manually add the 4 most common date codes to the meta keywords so that a query pulls the broadcast up.)
Have you tried using inmeta:date as described in the Search Protocol Reference documentation?
Alternatively, if the date code is in the document content or the URL you could use entity recognition to extract it.
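As a rough illustration of the inmeta:date suggestion, here is a minimal Python sketch that builds a search URL restricted to a date range. The host name, the search term, and the collection and front end names are assumptions made for the example, and the `inmeta:date:daterange:START..END` form should be checked against the Search Protocol Reference for your GSA version.

```python
from urllib.parse import urlencode


def build_gsa_date_query(host, terms, start, end):
    """Build a GSA search URL limited to a document-date range (YYYY-MM-DD)."""
    # Assumed syntax: inmeta:date:daterange:START..END filters on the date
    # the appliance recorded for each document.
    query = f"{terms} inmeta:date:daterange:{start}..{end}"
    params = {
        "q": query,
        "site": "default_collection",  # assumed collection name
        "client": "default_frontend",  # assumed front end name
        "output": "xml_no_dtd",        # raw XML results
    }
    return f"http://{host}/search?{urlencode(params)}"


# Hypothetical host and search term, for illustration only.
url = build_gsa_date_query("gsa.example.com", "broadcast", "2013-05-01", "2013-05-01")
print(url)
```

Using the same day for both ends of the range is one way to look up a single daily broadcast.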
One way to make sure GSA is collecting the document date is to check the search results in XML format and see if tag has the date value. You can see the results in XML format by removing any proxystylesheet parameter in the URL.
If the value of tag is empty then GSA is not getting the document dates.
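To get at the raw XML quickly, a small helper can strip the proxystylesheet parameter from an existing search URL. This is only a sketch using Python's standard library; the example URL and its parameter values are made up.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def raw_xml_url(search_url):
    """Return the search URL with any proxystylesheet parameter removed."""
    parts = urlsplit(search_url)
    # Keep every query parameter except proxystylesheet.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "proxystylesheet"
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


# Hypothetical search URL, for illustration only.
example = "http://gsa.example.com/search?q=news&proxystylesheet=default_frontend&output=xml_no_dtd"
print(raw_xml_url(example))
```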
You can configure the document dates under Crawl and Index > Document Dates (at least in GSA version 7). We are using a meta tag approach: we add a date meta tag to each document/page and tell the GSA to use this meta tag to sort the documents. The full list of options is:
URL
Meta Tag
Title
Body
Last Modified
Here are some links that helped me to find answers when dealing with a similar problem:
https://support.google.com/gsa/answer/2675414?hl=en
https://developers.google.com/search-appliance/documentation/64/xml_reference#request_sort_by_date
https://groups.google.com/forum/#!searchin/google-search-appliance-help/sort$20by$20date$20not$20working
I'm trying to add some filters to a news request in the Bing API, but the filters currently have no effect (for example, a filter for news from the current month).
I'm trying to do this with: https://api.cognitive.microsoft.com/bing/v5.0/news/search?freshness=month&?category=business , and I have tried replacing some of the filters, but I always get the same result.
I want to add three filters: freshness, category and language, to get news from the current day and month.
So is this a bug, or am I doing something wrong with the filters?
One problem is that you have an extra "?" in your query. You only need the first one, and then you can use "&" to delimit individual parameters:
https://api.cognitive.microsoft.com/bing/v5.0/news/search?freshness=month&category=business
You might also try adding a market to the query string, like so:
https://api.cognitive.microsoft.com/bing/v5.0/news/search?freshness=month&category=business&mkt=en-us
I'm using 7.0 and don't know what headers you're passing, so I can't test this directly, but it's possible a default market isn't being set. Since categories are market specific, then depending on how Bing handles this, it could plausibly prevent your category from being used.
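One way to avoid a stray "?" is to let a URL library assemble the query string instead of concatenating it by hand. The sketch below uses Python's standard library and only builds the URL; it sends no request and sets no subscription key header, and `build_news_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.cognitive.microsoft.com/bing/v5.0/news/search"


def build_news_url(**filters):
    """Join the filters into one query string with a single leading '?'."""
    # urlencode separates the key=value pairs with '&'.
    return f"{BASE_URL}?{urlencode(filters)}"


news_url = build_news_url(freshness="month", category="business", mkt="en-us")
print(news_url)
```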
I am trying to scrape a web forum using Scrapy for the href link info and when I do so, I get the href link with many letters and numbers where the question mark should be.
This is a sample of the html document that I am scraping:
I am scraping the html data for the href link using the following code:
response.xpath('.//*[contains(@id, "thread_title")]/@href').extract()
When I run this, I get the following results:
[u'showthread.php?s=f969fe6ed424b22d8fddf605a9effe90&t=2676278']
What should be returned is:
[u'showthread.php?t=2676278']
I have run other tests scraping for href data with question marks elsewhere in the document and I also get the "s=f969fe6ed424b22d8fddf605a9effe90&" returned.
Why am I getting this data returned with the "s=f969fe6ed424b22d8fddf605a9effe90&" instead of just the question mark?
Thanks!
It seems that the site I am scraping uses a unique identifier in order to more accurately update the number of views per thread. I was not able to get the scraped href without this unique ID, and the ID changes over time. So I scraped a different HTML tag for the thread ID and then joined it to the web address (showthread.php?t=) to create the link I was looking for.
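A variation on that idea is sketched below: rather than scraping a second tag, it pulls the `t` value out of the href that was already scraped and rebuilds the link without the `s` identifier. It uses only the Python 3 standard library, and `stable_thread_link` is a hypothetical helper name, not part of Scrapy.

```python
from urllib.parse import urlsplit, parse_qs


def stable_thread_link(href):
    """Rebuild a showthread.php link from the thread ID alone."""
    # parse_qs maps each query parameter to a list of its values.
    query = parse_qs(urlsplit(href).query)
    thread_id = query["t"][0]
    return "showthread.php?t=" + thread_id


scraped = "showthread.php?s=f969fe6ed424b22d8fddf605a9effe90&t=2676278"
print(stable_thread_link(scraped))
```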
First of all, I'm kinda new to Odoo and I'm trying to understand some basic logic. I created my own report based on the basic report of Odoo.
There are a lot of fields like t-field="o.date_invoice" or t-field="o.partner_id" etc. which work really well, but where can I find all the available fields? Is there a list?
For example, I need a field for the order date, for the print date, or for a customer ID.
With a t-field attribute you can access and print fields from the actual model or from a related model, for example with the following element you can print the content of the phone column (field) of the actual record:
<span t-if="o.phone"
t-field="o.phone" />
Explanation of t-field in the documentation:
The t-field directive can only be used when performing field access
(a.b) on a "smart" record (result of the browse method). It is able to
automatically format based on field type, and is integrated in the
website's rich text edition.
Check this link for further information if you want to build reports and this one, where you can read about some of the elements that you can use in QWeb
In addition, you can check here a list of some attributes that you can use in a Qweb template
In Amazon MWS API, when requesting report of type "_GET_MERCHANT_LISTINGS_DATA_"
What is the difference between the returned attributes:
product-id
listing-id
asin1
I also have tried to find any reference for the tab-delimited report types, but it seems to be scattered all around the web. The best description I found was part of the instructions for the Amazon Inventory Loader. (Note: may require a MWS seller login, the corresponding XLS does not have all columns described on the linked webpage) That page should answer most of your questions.
Since the link above might require a login, here's a short description on what these columns do:
asin1 refers to an item's Amazon Standard Identification Number. Every item on Amazon has such a number, there even is a Wikipedia entry describing what it is.
product-id along with product-id-type refers to the item's non-Amazon standard identification number, if such a thing exists (otherwise it'll contain a copy of the item's ASIN).
product-id-type=1 -> product-id is ASIN
product-id-type=2 -> product-id is ISBN.
product-id-type=3 -> product-id is UPC
product-id-type=4 -> product-id is EAN (now called GTIN)
sku is your own item identifier such as part number. You created the link between an ASIN and your own SKU by creating the product. (I know you didn't ask for this, but this is for the sake of completeness)
listing-id There does not seem to be a lot of documentation on what these are. There is a page explaining how to find out an item's listing ID. It does not say why you'd ever want to know, though. I assume a listing ID identifies a certain seller's (your) offer for a specific item. All MWS requests I've ever done required me to link to either an ASIN or my own SKU, but there may be others that require this ID.
Sidenote: I find it weird that a single listing-id may relate to more than one ASIN - otherwise, why are there columns named asin2 and asin3?
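To make the column descriptions above concrete, here is a small Python sketch that reads a tab-delimited report and translates product-id-type into a readable label. The sample rows and all of their values are made up, and the header names are assumptions taken from the column names discussed here; a real report has more columns and its header names may differ.

```python
import csv
import io

# Mapping taken from the product-id-type list above.
PRODUCT_ID_TYPES = {"1": "ASIN", "2": "ISBN", "3": "UPC", "4": "EAN"}

# Made-up sample data, for illustration only.
SAMPLE_REPORT = (
    "sku\tasin1\tproduct-id\tproduct-id-type\tlisting-id\n"
    "PART-001\tB000000001\tB000000001\t1\tLISTING0001\n"
    "PART-002\tB000000002\t0123456789012\t4\tLISTING0002\n"
)


def describe_listings(report_text):
    """Return (sku, asin, id kind, product id) for each report row."""
    reader = csv.DictReader(io.StringIO(report_text), delimiter="\t")
    rows = []
    for row in reader:
        kind = PRODUCT_ID_TYPES.get(row["product-id-type"], "unknown")
        rows.append((row["sku"], row["asin1"], kind, row["product-id"]))
    return rows


for listing in describe_listings(SAMPLE_REPORT):
    print(listing)
```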
I'm using YQL to retrieve an RSS feed using JavaScript (as JSON). For example, I use the following query:
select * from rss where url = "http://feeds2.feedburner.com/ajaxian"
The response contains the feed items, already parsed as json and everything is cool so far.
Now, I also want to get the title of the entire feed (not the title of a specific item) - but it's not a part of the result (even though the original XML feed contains it).
There is the possibility of querying the original XML itself. for example:
select channel.title from xml where url = "http://feeds2.feedburner.com/ajaxian"
and it indeed returns the feed title for that specific RSS, but that query is only valid for RSS 2.0 formatted feeds, which store it under rss\channel\title.
What about atom feeds which store the title under feed\title ?
What about other formats?
My question is: is there any generic way to request the feed's title through YQL? Maybe somehow along with the feed itself?
thanks,
You can use the feednormalizer table to convert the feed (regardless of its format) into one of the standard formats, then grab the title from the proper node for that format.
To take the Ajaxian feed, "normalize" it as Atom and get the feed title, the query would look like:
SELECT title
FROM feednormalizer
WHERE output="atom_1.0" AND url="http://feeds2.feedburner.com/ajaxian"
(Try this in the YQL console)
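Outside the console, the same query has to be URL-encoded before it is sent. The Python sketch below only builds such a request URL and does not call the service; the endpoint address and the `format=json` parameter are assumptions about the public YQL web service, and `feed_title_query_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Assumed public YQL endpoint; adjust if your setup uses a different one.
YQL_ENDPOINT = "https://query.yahooapis.com/v1/public/yql"


def feed_title_query_url(feed_url):
    """Build a request URL for the feednormalizer title query."""
    yql = (
        "SELECT title FROM feednormalizer "
        f'WHERE output="atom_1.0" AND url="{feed_url}"'
    )
    params = {"q": yql, "format": "json"}
    # urlencode escapes the spaces, quotes and '=' inside the query.
    return f"{YQL_ENDPOINT}?{urlencode(params)}"


request_url = feed_title_query_url("http://feeds2.feedburner.com/ajaxian")
print(request_url)
```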
There are also other tables that you can use like feed, rss and atom.
Regarding your follow up question of how to find data tables:
Go to the YQL console, make sure that the Community Tables are loaded (should already be the case with this link) and then just type in the search box on the right hand side what you are looking for. Often you can find something useful.