I downloaded the Freebase triples from https://developers.google.com/freebase/data
But I have no idea how to convert a MID to the actual entity name, e.g. /m/02hl8db maps to "Burgertime".
Is there any dump available from Freebase that provides all attributes associated with a MID? My use case is to find all product names and the companies that own those products. Similarly, I also want to extract the names of all persons and their name variants.
The Freebase triples (i.e. RDF) dump is that dump. Each line has the form subject MID, property/predicate, object, so if you have a MID that you want an attribute of (e.g. its name), just find the appropriate triple with that MID as the subject.
Here's what all the property values look like for that subject:
ns:cvg.game_version ns:type.type.instance ns:m.02hl8db.
ns:m.02hl8db ns:type.object.type ns:cvg.game_version.
ns:m.02hl8db ns:cvg.game_version.game ns:m.01xq80.
ns:m.02hl8db ns:type.object.name "Burgertime"@en.
ns:m.02hl8db ns:type.object.type ns:common.topic.
ns:m.02hl8db ns:cvg.game_version.platform ns:m.01145.
ns:m.02hl8db rdfs:label "Burgertime"@en.
ns:m.02hl8db rdf:type ns:cvg.game_version.
ns:m.02hl8db rdf:type ns:common.topic.
It also appears as the object of these triples:
ns:m.01xq80 ns:cvg.computer_videogame.versions ns:m.02hl8db.
ns:m.01145 ns:cvg.cvg_platform.games_on_this_platform ns:m.02hl8db.
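A minimal Python sketch of that lookup, assuming one triple per line with subject, predicate and object separated by whitespace. The helper names (local_id, names_by_mid) are hypothetical, and since the exact line layout of your dump file is an assumption here, the sketch accepts both the prefixed form shown above and full http://rdf.freebase.com/ns/ URIs:

```python
import re

# A quoted literal followed by a language tag, e.g. "Burgertime"@en
# (escape sequences such as \" are kept as-is, not decoded).
LITERAL = re.compile(r'^"((?:[^"\\]|\\.)*)"@([A-Za-z-]+)')

def local_id(term: str) -> str:
    """'ns:m.02hl8db' or '<http://rdf.freebase.com/ns/m.02hl8db>' -> 'm.02hl8db'."""
    term = term.strip("<>")
    for prefix in ("ns:", "http://rdf.freebase.com/ns/"):
        if term.startswith(prefix):
            return term[len(prefix):]
    return term

def names_by_mid(lines, lang="en"):
    """Collect type.object.name values in `lang`, keyed by MID (e.g. 'm.02hl8db')."""
    names = {}
    for line in lines:
        # subject, predicate, rest: the literal itself may contain spaces
        parts = line.split(None, 2)
        if len(parts) < 3 or local_id(parts[1]) != "type.object.name":
            continue
        match = LITERAL.match(parts[2])
        if match and match.group(2) == lang:
            names[local_id(parts[0])] = match.group(1)
    return names
```

For a gzipped dump you could pass the open file directly, e.g. names_by_mid(gzip.open(path, "rt", encoding="utf-8")) with path pointing at your downloaded file. On the full dump the dictionary gets very large, so you may want to keep only the MIDs you care about.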
When converting IDs that you see on Freebase.com, whether they are type/property IDs or topic IDs, remember to drop the leading slash and replace the remaining slashes with dots when looking for them in the RDF dump.
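That conversion is a one-liner; here it is as a small Python helper (the function name is hypothetical), checked against the IDs that appear in the triples above:

```python
def freebase_id_to_dump_id(web_id: str) -> str:
    """Convert an ID as shown on Freebase.com ('/m/02hl8db', '/cvg/game_version')
    to the dotted form used in the RDF dump ('m.02hl8db', 'cvg.game_version')."""
    # Remove the leading (and any trailing) slash, then turn path separators into dots.
    return web_id.strip("/").replace("/", ".")
```

Prefix the result with ns: (or the full namespace URI) depending on how your copy of the dump writes its terms.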
I want to create a SQLite virtual table with the content of a real one.
I have a small sample that demonstrates my problem. I already read the official tutorial, but can't find anything wrong in this code. Some users suggest the rebuild option, but it doesn't work for me.
CREATE TABLE if NOT EXISTS posts (a INTEGER PRIMARY KEY);
INSERT OR IGNORE INTO posts (a) VALUES(510000);
INSERT OR IGNORE INTO posts (a) VALUES(510001);
INSERT OR IGNORE INTO posts (a) VALUES(510300);
CREATE VIRTUAL TABLE IF NOT EXISTS posts_fts using fts5(content=posts, content_rowid=a, a);
SELECT * FROM posts_fts where posts_fts MATCH '10' ORDER BY a ASC;
If I run this, I get:
0 rows returned in 2ms from: SELECT * FROM posts_fts where posts_fts match '10' ORDER BY a ASC;
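Does anyone have an idea what I am doing wrong?
"10" is not a token in the FTS table. The tokens are the complete values 510000, 510001 and 510300, and a query term only matches whole tokens.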
From the doc:
4.3.1. Unicode61 Tokenizer
The unicode tokenizer classifies all unicode characters as either "separator" or "token" characters. By default all space and punctuation characters, as defined by Unicode 6.1, are considered separators, and all other characters as token characters. More specifically, all unicode characters assigned to a general category beginning with "L" or "N" (letters and numbers, specifically) or to category "Co" ("other, private use") are considered tokens. All other characters are separators.
Each contiguous run of one or more token characters is considered to be a token. The tokenizer is case-insensitive according to the rules defined by Unicode 6.1.
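To see why "10" cannot match, here is a rough Python approximation of the tokenization rules quoted above (a sketch, not SQLite's code). It uses unicodedata.category for the "L"/"N"/"Co" test and str.lower() for case folding. It ignores that Python's Unicode database is newer than 6.1 and that unicode61 also strips diacritics by default:

```python
import unicodedata

def unicode61_like_tokens(text: str) -> list[str]:
    """Approximate unicode61 tokenization: runs of token characters form tokens."""
    tokens, current = [], []
    for ch in text:
        category = unicodedata.category(ch)
        # Letters (L*), numbers (N*) and private-use (Co) characters are token characters.
        if category[0] in ("L", "N") or category == "Co":
            current.append(ch.lower())
        elif current:
            # Any other character ends the current token.
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens
```

A value such as 510000 produces the single token "510000", so a query for the token "10" has nothing to match.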
Also from the doc:
3.2. FTS5 Phrases
FTS queries are made up of phrases. A phrase is an ordered list of one or more tokens.
You might try a prefix query, i.e. MATCH '5*', to confirm that you get results.
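The whole example can be reproduced with Python's built-in sqlite3 module, assuming its SQLite library was compiled with FTS5 (most current builds are, but it is not guaranteed). The sketch also populates the external-content table with the 'rebuild' command, because an FTS5 table declared with content=posts does not index the existing rows of posts by itself. The search helper is hypothetical:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE posts (a INTEGER PRIMARY KEY);
    INSERT INTO posts (a) VALUES (510000), (510001), (510300);
    -- External-content FTS5 table: column values are read from posts,
    -- but the full-text index is not filled automatically.
    CREATE VIRTUAL TABLE posts_fts USING fts5(a, content=posts, content_rowid=a);
    -- 'rebuild' indexes every existing row of the content table.
    INSERT INTO posts_fts(posts_fts) VALUES ('rebuild');
""")

def search(query: str) -> list[int]:
    rows = con.execute(
        "SELECT a FROM posts_fts WHERE posts_fts MATCH ? ORDER BY a", (query,))
    return [a for (a,) in rows]

print(search("10"))      # [] : "10" is only part of a token
print(search("5*"))      # [510000, 510001, 510300] : prefix query
print(search("510001"))  # [510001] : whole token
```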
Existing records:
{"myLabel":"AFRICANA"}
{"myLabel":"africans"}
{"myLabel":"AFRICAN"}
{"myLabel":"Africa"}
Query : `cts:json-property-word-match("myLabel", "Africa*")`
Result:
{"myLabel":"Africa"}
The query returns only the record whose case matches, not all relevant rows.
Query : `cts:json-property-word-match("myLabel", "Africa*", "case-insensitive")`
Result:
your query returned an empty sequence
If I use the "case-insensitive" option, it returns an empty sequence.
I have set up a word lexicon on myLabel.
How do I search JSON data case-insensitively?
Both of the examples provided return the expected results for me. Did you intend to show the second query as a search with cts:json-property-value-query()?
If so, adding the "wildcarded" option alongside "case-insensitive" ensures that the value is matched both case-insensitively and as a wildcard pattern:
cts:search(doc(),
cts:json-property-value-query("myLabel", "Africa*", ("wildcarded","case-insensitive")))
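To illustrate the intended result, here is a rough Python emulation (not MarkLogic's implementation) of matching the four example values against Africa* as a wildcard pattern, with and without case sensitivity. Only * and ? are treated as wildcards, and the helper names are hypothetical:

```python
import re

LABELS = ["AFRICANA", "africans", "AFRICAN", "Africa"]

def wildcard_regex(pattern: str, case_insensitive: bool):
    """Translate * (any run of characters) and ? (one character) into a regex;
    every other character is matched literally."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE if case_insensitive else 0)

def matching(pattern: str, case_insensitive: bool) -> list:
    rx = wildcard_regex(pattern, case_insensitive)
    # The whole value must match the pattern, as with a value query.
    return [label for label in LABELS if rx.fullmatch(label)]
```

Case-sensitively only "Africa" matches, which is the result from the question; case-insensitively all four values match.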
Double-check whether you have "trailing wildcard searches", or any of the "three character searches", "two character searches" or "one character searches" indexes, enabled for your content database. According to the rules for wildcard searches, one of these indexes must be enabled for a query term containing wildcard characters to be treated automatically as a wildcarded query:
If neither "wildcarded" nor "unwildcarded" is present, the database configuration and $text determine wildcarding. If the database has any wildcard indexes enabled ("three character searches", "two character searches", "one character searches", or "trailing wildcard searches") and if $text contains either of the wildcard characters '?' or '*', it specifies "wildcarded". Otherwise it specifies "unwildcarded".
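The quoted rule boils down to a small decision procedure. This Python sketch restates it; the function name and the representation of options and index settings as lists of strings are assumptions made for illustration:

```python
WILDCARD_INDEXES = {
    "three character searches",
    "two character searches",
    "one character searches",
    "trailing wildcard searches",
}

def effective_wildcarding(options, enabled_indexes, text):
    """Return "wildcarded" or "unwildcarded" following the rule quoted above."""
    # An explicit option always wins.
    if "wildcarded" in options:
        return "wildcarded"
    if "unwildcarded" in options:
        return "unwildcarded"
    # Otherwise: some wildcard index must be enabled AND the text must contain * or ?.
    if WILDCARD_INDEXES & set(enabled_indexes) and ("*" in text or "?" in text):
        return "wildcarded"
    return "unwildcarded"
```

With no wildcard index enabled, "Africa*" with only "case-insensitive" is searched literally, asterisk included, which is consistent with the empty result in the question.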
With any of those index settings enabled, a query term containing * should automatically execute as a wildcarded query, and you can remove the explicit "wildcarded" option:
cts:search(doc(),
cts:json-property-value-query("myLabel", "Africa*", "case-insensitive"))
I am using Jena's SPARQL engine and trying to write a query to filter on a date range as I need to find the value of a property after a fixed date.
My date property is in the following format:
Fri May 23 10:20:13 IST 2014
How do I write a SPARQL query to get other properties with dates greater than this?
With your data in that format, you can't filter on a date range without adding a custom extension function to ARQ (which is intended for advanced users), since the query would need to parse and interpret the date-time string.
What you should do instead is translate your data into the standard date-time format xsd:dateTime, which all SPARQL implementations are required to support. See the XML Schema Part 2: Datatypes specification for details of this format.
Your specific example date would translate as follows:
2014-05-23T10:20:13+05:30
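If you have many existing values in the original format, you need to convert them before loading. A minimal Python sketch of that conversion, assuming the strings always follow the layout of Java's Date.toString() and that "IST" means India Standard Time (UTC+05:30). Timezone abbreviations are ambiguous, so TZ_OFFSETS is a hypothetical mapping you would adapt to your data:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical mapping: abbreviations are ambiguous, so list the ones your data uses.
TZ_OFFSETS = {"IST": timezone(timedelta(hours=5, minutes=30))}

def to_xsd_datetime(raw: str) -> str:
    """'Fri May 23 10:20:13 IST 2014' -> '2014-05-23T10:20:13+05:30'."""
    _weekday, month, day, clock, tz_abbr, year = raw.split()
    naive = datetime.strptime(f"{month} {day} {clock} {year}", "%b %d %H:%M:%S %Y")
    # Attach the offset; isoformat() then yields the xsd:dateTime lexical form.
    return naive.replace(tzinfo=TZ_OFFSETS[tz_abbr]).isoformat()
```

The %b month name is locale-dependent, so run this under the default C locale or an English one.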
You must also declare the value to be a typed literal of type xsd:dateTime when you use it in data and queries. For example, in the readable Turtle RDF syntax:
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix : <http://example.org/> .
:subject :date "2014-05-23T10:20:13+05:30"^^xsd:dateTime .
You could then write a SPARQL query that filters by a range of dates like so:
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX : <http://example.org/>
SELECT *
WHERE
{
?s :date ?date .
FILTER (?date > "2014-05-23T10:20:13+05:30"^^xsd:dateTime)
}
This finds all records where ?date is after the given date.
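Declaring the datatype matters because xsd:dateTime values are compared as points in time, not as strings. This Python sketch mimics that with timezone-aware datetime objects (an analogy, not ARQ itself). The candidate values are made up for illustration:

```python
from datetime import datetime

threshold = "2014-05-23T10:20:13+05:30"   # 04:50:13 UTC
candidates = [
    "2014-05-23T05:00:00+00:00",  # later than the threshold (05:00 UTC)
    "2014-05-23T04:00:00+00:00",  # earlier (04:00 UTC)
    "2014-05-23T11:00:00+05:30",  # later (05:30 UTC)
]

# Typed comparison: parse to aware datetimes and compare the instants.
typed = [c for c in candidates
         if datetime.fromisoformat(c) > datetime.fromisoformat(threshold)]

# Plain string comparison, which is what an untyped literal would give you.
lexical = [c for c in candidates if c > threshold]

print(typed)    # ['2014-05-23T05:00:00+00:00', '2014-05-23T11:00:00+05:30']
print(lexical)  # ['2014-05-23T11:00:00+05:30']
```

The string comparison misses the value written in UTC, even though it is later than the threshold.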
I have an application that allows users to search on multiple columns (prod_name, prod_desc).
So I used full-text search as below, but it does not return all the records. For example, I tried to find the 'o' character in the two columns (prod_name, prod_desc), but it cannot find it in some records.
Also, when I do not use a wildcard for the 'o' character, it finds nothing, even though I expected CONTAINS to behave like LIKE '%o%'.
I am a bit confused about full-text search.
Please help me understand the problem.
CREATE FULLTEXT CATALOG catalog_crashcourse3;
CREATE FULLTEXT INDEX ON products(prod_name,prod_desc)
KEY INDEX pk_products ON catalog_crashcourse3;
SELECT prod_name, prod_desc
FROM products
WHERE CONTAINS((prod_name,prod_desc), '"*o*"');
SQL Server FTS is a word-based search process. When you create a full-text index on a column, the indexing engine crawls the content and breaks it into individual words, a process known as tokenization. The index then stores each word, the primary key of the row it was found in, and the word's position in the content (i.e. whether it is the first word in the field, the 57th word, or whatever).
When you specify a CONTAINS predicate such as
CONTAINS((prod_name,prod_desc), '"o"');
the SQL Server FTS engine looks for tokens (i.e. words) in its index that are exactly "o". If your content does not contain the word "o" (which it probably doesn't), no matches will be found.
As you point out, you can do wildcard searches, where you try to match patterns in the indexed words. For example, if you specify a predicate such as
CONTAINS((prod_name,prod_desc), '"o*"');
then the search will return all rows containing a word, in the indexed columns, that starts with the letter "o".
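A toy Python model makes the difference between word-based CONTAINS and LIKE visible. It is a sketch under simplifying assumptions: word breaking is a regex rather than SQL Server's language-specific word breakers, there is no stoplist or stemming, and the rows and helper names are hypothetical:

```python
import re

# Illustrative rows (hypothetical data).
rows = ["Fish bean bag toy", "King doll", "Raggedy Ann", "Octopus plush"]

def tokens(text: str) -> list[str]:
    # Case-insensitive word breaking approximated by a regex.
    return re.findall(r"\w+", text.lower())

def contains(text: str, term: str) -> bool:
    """Rough model of CONTAINS(col, '"term"') and '"term*"': whole word or word prefix."""
    term = term.lower()
    if term.endswith("*"):
        return any(tok.startswith(term[:-1]) for tok in tokens(text))
    return term in tokens(text)

def like_contains(text: str, fragment: str) -> bool:
    """Rough model of LIKE '%fragment%' under a case-insensitive collation."""
    return fragment.lower() in text.lower()

print([r for r in rows if contains(r, "o")])       # []
print([r for r in rows if contains(r, "o*")])      # ['Octopus plush']
print([r for r in rows if like_contains(r, "o")])  # ['Fish bean bag toy', 'King doll', 'Octopus plush']
```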
FTS is best used when you want to search for groups of words in your indexed content. It can do sophisticated word stemming (such as searching for "ran" and "running" when you specify "run"). It also provides a ranking of the search result content so that you can find the best match. If you just want to search for a specified word in your content and your content is not too large, you may not need FTS. As MikeSmithDev pointed out in the comments, you may be able to just get away with a LIKE clause.
Note added: In response to your comment, if you have a table with 8 columns that you want to search using FTS, include all of these columns in the table's full-text index (SQL Server allows only one full-text index per table) and search them as follows:
CONTAINS(*, '"Word"')
where the asterisk indicates that all 8 indexed columns in the table should be included in the search.
You have two issues:
1. You are using a leading wildcard (*o), which SQL Server FTS cannot handle. It only supports trailing wildcards (prefix terms) such as word*.
2. You are using a single-character search term. Single-character words are excluded from the full-text index by default, which is a good thing: unless specified otherwise, SQL Server associates the system full-text stoplist with the index when it is created.
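The effect of a stoplist can be sketched in Python: stopwords never reach the index, so no query can find them. SINGLE_CHAR_STOPLIST below is a hypothetical stand-in for the single-character entries of the system stoplist, and the index layout is simplified to word -> set of row numbers:

```python
import re
from collections import defaultdict

# Hypothetical stoplist: single letters and digits only.
SINGLE_CHAR_STOPLIST = set("abcdefghijklmnopqrstuvwxyz0123456789")

def build_index(rows, stoplist):
    """Map each indexed word to the set of row numbers it occurs in, skipping stopwords."""
    index = defaultdict(set)
    for row_id, text in enumerate(rows):
        for word in re.findall(r"\w+", text.lower()):
            if word not in stoplist:
                index[word].add(row_id)
    return index

rows = ["Model O doll", "Plan B kit"]
with_stoplist = build_index(rows, SINGLE_CHAR_STOPLIST)
without_stoplist = build_index(rows, set())   # no stoplist at all

print(with_stoplist.get("o", set()))     # set()
print(without_stoplist.get("o", set()))  # {0}
```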
To see the default stoplist your database is using behind your back, use this query:
Select SysStop.stopword, Langs.name
From sys.fulltext_system_stopwords SysStop
Inner Join sys.fulltext_languages Langs
On Langs.lcid = SysStop.language_id;
If you really want to search for single characters, you can drop and recreate the full-text index with the option WITH STOPLIST = OFF, but be prepared for a lot of noise. See the CREATE FULLTEXT INDEX documentation.
I am faced with a database (sqlite specifically) query that I am not sure how to approach.
I'm looking for all tuples that have 1 to n word matches between their 'name' attribute and a constant, sorted in descending order of the number of matches.
For example, take a database containing food items. If the constant is "Maranatha Natural Almond Butter 26oz Lightly Roasted", I would like any tuple in the database that contains at least one of the words in that constant to be returned. For example, "Almond Butter Natural" would come before "Maranatha Natural", which would come before "Almond", etc.
Essentially, as long as there is at least one word shared between the tuple's attribute and the constant, it qualifies as a match.
Matching words is what SQLite's full-text search extension is designed for. Please read that page to see how SQLite must be compiled and what is possible, but I'll add some remarks:
Simple matching is done with a query like:
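SELECT * FROM foods_fts_tab WHERE name MATCH 'Maranatha OR Natural OR Almond OR ...'
The OR is needed: FTS treats space-separated terms as an implicit AND, so 'Maranatha Natural Almond' would only return records containing all three words. With OR, the query returns all records where at least one word matches.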
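Because space-separated terms are combined with an implicit AND, a small helper can build the OR expression from the constant. This is a Python sketch with a hypothetical helper name. Each word is wrapped in double quotes so that it is treated as a plain term rather than as query syntax:

```python
import re

def any_word_match_query(constant: str) -> str:
    """Join the words of `constant` with OR for an FTS3/FTS4 MATCH expression.

    Each word is double-quoted so that a word such as OR or NOT in the
    constant is searched for as a term instead of being read as an operator.
    """
    words = re.findall(r"\w+", constant)
    return " OR ".join(f'"{w}"' for w in words)

query = any_word_match_query("Maranatha Natural Almond Butter 26oz Lightly Roasted")
print(query)
```

Bind the result as a parameter (... WHERE name MATCH ?) rather than pasting it into the SQL text.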
You can weight the matches with information returned by the auxiliary functions.
For example,
SELECT ... ORDER BY length(offsets(foods_fts_tab)) DESC
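will roughly sort by the number of matches: offsets() returns four integers for each matching term occurrence, so its result string gets longer as the number of matches grows.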
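If you want the exact count of distinct shared words instead of the length of the offsets() string, one option is to compute it yourself. This Python sketch registers a hypothetical shared_word_count function with sqlite3's Connection.create_function and sorts by it. The food rows are illustrative. It scans every row, so on a large table you would apply it only to the rows returned by the FTS query:

```python
import re
import sqlite3

CONSTANT = "Maranatha Natural Almond Butter 26oz Lightly Roasted"

def words(text: str) -> set:
    """Lower-cased set of the words in text."""
    return set(re.findall(r"\w+", text.lower()))

def shared_word_count(name: str, constant: str) -> int:
    """Number of distinct words that name and constant have in common."""
    return len(words(name) & words(constant))

con = sqlite3.connect(":memory:")
# Make the Python function callable from SQL with two arguments.
con.create_function("shared_word_count", 2, shared_word_count)
con.execute("CREATE TABLE foods (name TEXT)")
con.executemany("INSERT INTO foods (name) VALUES (?)",
                [("Almond",), ("Cashews",), ("Maranatha Natural",), ("Almond Butter Natural",)])

ranked = con.execute(
    """SELECT name, shared_word_count(name, :c) AS score
       FROM foods
       WHERE shared_word_count(name, :c) > 0
       ORDER BY score DESC""",
    {"c": CONSTANT},
).fetchall()
print(ranked)  # [('Almond Butter Natural', 3), ('Maranatha Natural', 2), ('Almond', 1)]
```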
You are asking for the number of word matches, but real search engines also use other information to compute relevancy scores. See the matchinfo() function in section 4.3 and the example in appendix A.