Include synonyms in solr without using synonyms.txt - drupal

I am using Drupal Apache Solr for my searches. In its configuration I found a synonyms.txt file in which you can manually add synonyms for the words you want.
But since my application holds a large amount of data, adding synonyms manually for each word would be very hard.
What I want to achieve in my search results is the following:
When the user searches for allu instead of potato, we display potato as the first result.
Another example: when the user searches for 'raw apple', we display 'apple' as the first record, because 'raw apple' is a synonym of 'apple'.
The problem is that there are 100K records and each record has 4-5 synonyms, so entering them manually is not feasible.
Another issue is that if I want to change the synonyms of a particular record, I have to do it manually, which is time-consuming as well.
Is there any other option so that I do not need to enter synonyms manually?

IMO this is close to search engine optimization. Also, you may have a tough time managing the synonyms manually.
Follow what Indian e-retail sites are doing to accommodate synonyms. For example, e-retail stores have adapted by renaming a certain product to "belly shoes", as shoppers tend to mispronounce and misspell "ballet". They wouldn't have anticipated this before users actually searched for it.
So log all requests which return few results (or otherwise dissatisfy customers). Maintain a list of synonyms in the index, and include these synonyms in the keywords when adding a new product: when adding a product x y z, automatically fetch all synonyms of x, y and z and let your data entry staff choose from them.
'type':'synonym'
'terms':'ballet','belly'
'type':'synonym'
'terms':'potato','allu','aloo'
'type':'product'
'name':'home garden potato planter'
'keywords':'allu','aloo'
'type':'product'
'name':'aloo mutter fry mix'
'keywords':'potato','allu','cheese'
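The "fetch all synonyms when adding a product" step could be sketched roughly as follows. This is a plain-Python illustration, not Solr or Drupal code; SYNONYM_GROUPS, build_term_index and suggest_keywords are names made up for the example, and the groups are the ones from the sample records above.

```python
# Hypothetical sketch: suggest synonym keywords for a new product name.
SYNONYM_GROUPS = [
    {"ballet", "belly"},
    {"potato", "allu", "aloo"},
]

def build_term_index(groups):
    """Map every term to the full synonym group it belongs to."""
    index = {}
    for group in groups:
        for term in group:
            index[term] = group
    return index

def suggest_keywords(product_name, index):
    """Return synonyms of the words in product_name, excluding the words themselves."""
    words = product_name.lower().split()
    suggestions = set()
    for word in words:
        suggestions |= index.get(word, set())
    return sorted(suggestions - set(words))

index = build_term_index(SYNONYM_GROUPS)
print(suggest_keywords("home garden potato planter", index))  # ['allu', 'aloo']
```

The suggestions would then be shown to whoever enters the product, who picks the ones to store as keywords.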

We can maintain a list of synonyms in the index and include these synonyms in the keywords when adding a new product. When adding a new product a b c, it can fetch the synonyms of a, b and c.
'type':'product'
'name':'monety carlo shirt for men'
'keywords':'montey carlo','shirts'
Example: online shopping stores have adapted by renaming certain products to match the misspelled names that shoppers search for.

Related

How to generate recommendations for a User using Gremlin?

I am using Gremlin on an AWS Neptune database to generate recommendations for a user to try new food items. The problem I am facing is that the recommendations need to be in the same cuisine that the user likes.
We are given three different types of nodes: "User", "the cuisine he likes" and "the category of the cuisine" that it lies in.
In the picture above, the recommendations for "User 2" would be "Node 1" and "Node 2". However, "Node 1" belongs to a different category, which is why we cannot recommend that node to "User 2". We can only recommend "Node 2" to the user, since that is the only node that belongs to the same category the user likes. How do I write a Gremlin query to achieve this?
Note: there are multiple nodes for a user and multiple categories that these nodes belong to.
Here's a sample dataset that we can use:
g.addV('user').property('name','ben').as('b')
.addV('user').property('name','sally').as('s')
.addV('food').property('foodname','chicken marsala').as('fvm')
.addV('food').property('foodname','shrimp diavolo').as('fsd')
.addV('food').property('foodname','kung pao chicken').as('fkpc')
.addV('food').property('foodname','mongolian beef').as('fmb')
.addV('cuisine').property('type','italian').as('ci')
.addV('cuisine').property('type','chinese').as('cc')
.addE('hasCuisine').from('fvm').to('ci')
.addE('hasCuisine').from('fsd').to('ci')
.addE('hasCuisine').from('fkpc').to('cc')
.addE('hasCuisine').from('fmb').to('cc')
.addE('eats').from('b').to('fvm')
.addE('eats').from('b').to('fsd')
.addE('eats').from('b').to('fkpc')
.addE('eats').from('b').to('fmb')
.addE('eats').from('s').to('fmb')
Let's start with the user Sally...
g.V().has('name','sally').
Then we want to find all food item nodes that Sally likes.
(Note: It is best to add edge labels to your edges here to help with navigation.)
Let's call the edge from a user to a food item, "eats". Let's also assume that the direction of the edge (they must have a direction) goes from a user to a food item. So let's traverse to all foods that they like. We'll save this to a temporary list called 'liked' that we'll use later in the query to filter out the foods that Sally already likes.
out('eats').aggregate('liked').
From this point in the graph, we need to diverge and fetch two downstream pieces of data. First, we want to go fetch the cuisines related to food items that Sally likes. We want to "hold our place" in the graph while we go fetch these items, so we use the sideEffect() step which allows us to go do something but come back to where we currently are in the graph to continue our traversal.
sideEffect(
out('hasCuisine').
dedup().
aggregate('cuisineschosen')).
Inside the sideEffect() we traverse from food items to cuisines, deduplicate the related cuisines, and save them in a temporary list called 'cuisineschosen'.
Once we have fetched the cuisines, we come back to where we were previously, at the food items. We now want to find the users related to Sally through common food items. We also want to make sure we're not traversing back to Sally, so we use simplePath() here. simplePath() filters out any traverser whose path revisits a vertex, so the traversal cannot loop back to Sally.
in('eats').
simplePath().
From here we want to find all food items that our related users like and only return the ones with a cuisine that Sally already likes. We also remove the foods that Sally already likes.
out('eats').
where(without('liked')).
where(
out('hasCuisine').
where(
within('cuisineschosen'))).
values('foodname')
NOTE: You may also want to add a dedup() here after out('eats') to only return a distinct list of food items.
Putting it all together...
g.V().has('name','sally').
out('eats').aggregate('liked').
sideEffect(
out('hasCuisine').
dedup().
aggregate('cuisineschosen')).
in('eats').
simplePath().
out('eats').
where(without('liked')).
where(
out('hasCuisine').
where(
within('cuisineschosen'))).
values('foodname')
Results:
['kung pao chicken']
At scale, you may need to use the sample() or coin() steps in Gremlin when finding related users as this can fan out really fast. Query performance is going to be based on how many objects each query needs to traverse.
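To make the filtering logic easier to follow, here is a plain-Python mirror of the traversal over the same sample dataset. It is only an illustration of what each Gremlin step contributes, not something Neptune runs; EATS, CUISINE and recommend are names invented for the sketch, and unlike the traversal it deduplicates the result.

```python
# Plain-Python mirror of the traversal over the sample dataset (illustrative only).
EATS = {
    "ben": ["chicken marsala", "shrimp diavolo", "kung pao chicken", "mongolian beef"],
    "sally": ["mongolian beef"],
}
CUISINE = {
    "chicken marsala": "italian",
    "shrimp diavolo": "italian",
    "kung pao chicken": "chinese",
    "mongolian beef": "chinese",
}

def recommend(user):
    liked = set(EATS[user])                        # aggregate('liked')
    cuisines_chosen = {CUISINE[f] for f in liked}  # aggregate('cuisineschosen')
    # in('eats') + simplePath(): other users who share at least one liked food
    related = [u for u, foods in EATS.items() if u != user and liked & set(foods)]
    recommendations = set()
    for other in related:
        for food in EATS[other]:                   # out('eats')
            # where(without('liked')) and the cuisine filter
            if food not in liked and CUISINE[food] in cuisines_chosen:
                recommendations.add(food)
    return sorted(recommendations)

print(recommend("sally"))  # ['kung pao chicken']
```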

How to give synonyms to an employee name lookup table?

I have created a lookup table of employee names as a text file, following the Rasa blog post (link below).
Improving entity extractions with Rasa
Now my use case also requires me to give synonyms to these employees in the lookup table. For example, “Nicholas” can also be referred to as “Nick” or “Nic”, so the Rasa bot should extract “nick” as “nicholas” and fulfill the use case.
Please advise how to achieve this.
Thanks
Lookups and synonyms have different purposes: lookups are used for entity extraction, while synonyms are used as a post-processing step that maps any synonym back to its original value. Therefore, I think you can't have synonyms within the lookup table, so you might have to define them separately.
However, if you have a long list of synonyms, you can use a file path instead of a list.
## synonym:Nick
data/path/nick.txt
I had a similar situation with city names and their nicknames: I used the city names from the lookup but placed their synonyms in the main data file, as follows:
## synonym:New York City
- NY
- NYC
- New York
## lookup:city
data/lookups/city_lookup.txt
I recommend using https://github.com/rodrigopivi/Chatito which will really ease the task for you as it has a really good mapping system that does the work for you with regards to synonyms and lookups.
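Conceptually, the synonym step is just a normalization applied after an entity has been extracted. The sketch below illustrates that idea in plain Python. It is not Rasa's implementation, and SYNONYMS and normalize_entity are hypothetical names; the values come from the question and the example above.

```python
# Canonical value -> variants that should be mapped back to it.
SYNONYMS = {
    "Nicholas": ["Nick", "Nic"],
    "New York City": ["NY", "NYC", "New York"],
}

# Invert to a case-insensitive lookup from each variant to its canonical value.
VARIANT_TO_CANONICAL = {
    variant.lower(): canonical
    for canonical, variants in SYNONYMS.items()
    for variant in variants
}

def normalize_entity(value):
    """Return the canonical value for an extracted entity, or the value unchanged."""
    return VARIANT_TO_CANONICAL.get(value.lower(), value)

print(normalize_entity("nick"))  # Nicholas
```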

Amazon MWS Report Type "_GET_MERCHANT_LISTINGS_DATA_" result attributes meaning

In the Amazon MWS API, when requesting a report of type "_GET_MERCHANT_LISTINGS_DATA_",
what is the difference between the returned attributes:
product-id
listing-id
asin1
I have also tried to find a reference for the tab-delimited report types, but the information seems to be scattered all around the web. The best description I found was part of the instructions for the Amazon Inventory Loader. (Note: may require an MWS seller login; the corresponding XLS does not have all the columns described on the linked webpage.) That page should answer most of your questions.
Since the link above might require a login, here's a short description of what these columns mean:
asin1 refers to an item's Amazon Standard Identification Number. Every item on Amazon has such a number; there is even a Wikipedia entry describing what it is.
product-id, along with product-id-type, refers to the item's non-Amazon standard identification number, if such a thing exists (otherwise it'll contain a copy of the item's ASIN).
product-id-type=1 -> product-id is ASIN
product-id-type=2 -> product-id is ISBN
product-id-type=3 -> product-id is UPC
product-id-type=4 -> product-id is EAN (now called GTIN)
sku is your own item identifier, such as a part number. You created the link between an ASIN and your own SKU when you created the product. (I know you didn't ask for this, but it is included for the sake of completeness.)
listing-id: there does not seem to be much documentation on what these are. There is a page explaining how to find out an item's listing ID. It does not say why you'd ever want to know, though. I assume a listing ID identifies a certain seller's (your) offer for a specific item. All MWS requests I've ever made required me to refer to either an ASIN or my own SKU, but there may be others that require this ID.
Sidenote: I find it weird that a single listing-id may relate to more than one ASIN. Otherwise, why are there columns named asin2 and asin3?
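As a rough illustration of how these columns relate, the following Python sketch parses tab-delimited report text and labels each product-id using the product-id-type mapping described above. The mapping is taken from this answer rather than checked against Amazon's documentation, the sample row is invented, and read_listings is a hypothetical helper name.

```python
import csv
import io

# Mapping as described in the answer above; not verified against Amazon's docs.
PRODUCT_ID_TYPES = {"1": "ASIN", "2": "ISBN", "3": "UPC", "4": "EAN"}

def read_listings(report_text):
    """Parse tab-delimited report text and label each product-id with its kind."""
    reader = csv.DictReader(
        io.StringIO(report_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    rows = []
    for row in reader:
        kind = PRODUCT_ID_TYPES.get(row.get("product-id-type", ""), "unknown")
        rows.append({
            "listing-id": row.get("listing-id"),
            "asin1": row.get("asin1"),
            "product-id": row.get("product-id"),
            "product-id-kind": kind,
        })
    return rows

# Invented sample data, only to show the shape of the report.
SAMPLE = (
    "listing-id\tproduct-id\tproduct-id-type\tasin1\n"
    "L-0001\tB000000001\t1\tB000000001\n"
)
rows = read_listings(SAMPLE)
print(rows[0]["product-id-kind"])  # ASIN
```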

DBpedia: get a list of Chinese universities and their addresses to populate a Google map?

I'm trying to get a list of Chinese universities and their addresses, the minimum being the city/town name. I will use these addresses to populate a Google map, fiddle here.
I saw interesting code such as:
SELECT ?resource ?value
WHERE {
?resource a <http://dbpedia.org/class/yago/CitiesAndTownsInDenmark> .
?resource <http://dbpedia.org/property/populationTotal> ?value .
FILTER (?value > 100000)
}
ORDER BY ?resource ?value
Since CitiesAndTownsInChina doesn't work,
1. Where can I find the exact name of the class I am targeting? and
2. Where can I find DBpedia's operator's manual?
Note: I am a very active user on Wikipedia and well aware of all the data available there, but the DBpedia ontology/syntax/keywords are quite hard to grasp.
Personal note: queries on http://dbpedia.org/snorql/ , http://dbpedia.org/sparql/ , http://querybuilder.dbpedia.org/
(Expanding on my reply to How to find cities with more than X population in a certain country)
CitiesAndTownsInDenmark exists because people use the category http://en.wikipedia.org/wiki/Category:Cities_and_towns_in_Denmark in Wikipedia. Wikipedia categories are pretty loose and as a result there's a lot of variation in style, so even if a useful category exists, its name may not be guessable.
In addition, categories are maintained manually and may not be consistently applied.
A good place to start is looking at the data. Visiting http://dbpedia.org/page/Beijing I see yago:MetropolitanAreasOfChina, which seems promising, but if you follow that link you'll see it's not well populated.
Consequently, avoid relying on the existence of such categories and query directly for populated places in a country. This information comes from Wikipedia infoboxes, which are much more consistent than categories. Taking Beijing as an exemplar again, I found:
select ?s {
?s a <http://dbpedia.org/ontology/PopulatedPlace> ;
<http://dbpedia.org/ontology/country> <http://dbpedia.org/resource/China>
}
(The relevant properties and values for my query were found by copying link location in the Beijing page)
with the result:
"http://dbpedia.org/resource/Hulunbuir"
"http://dbpedia.org/resource/Guangzhou"
"http://dbpedia.org/resource/Chongqing"
"http://dbpedia.org/resource/Kuqa_County"
"http://dbpedia.org/resource/Changzhou"
... nearly 3000 results ...
You'll notice that position is encoded multiple times (geo:lat and long, georss:point, various dbpprop:latd longd things) and, excitingly, there seem to be two values. You can either simply deal with the multiple values in whichever format you prefer, or try picking just one using GROUP BY and SAMPLE.
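A possible shape for the GROUP BY and SAMPLE variant, wrapped in a small Python helper that prepares the HTTP request for the public endpoint, is sketched below. The choice of geo:lat and geo:long, the variable names, and the build_request helper are assumptions made for illustration, and the query has not been run against DBpedia here.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://dbpedia.org/sparql"

# One coordinate pair per place: SAMPLE picks an arbitrary value per group.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
SELECT ?s (SAMPLE(?lat) AS ?latitude) (SAMPLE(?long) AS ?longitude)
WHERE {
  ?s a dbo:PopulatedPlace ;
     dbo:country dbr:China ;
     geo:lat ?lat ;
     geo:long ?long .
}
GROUP BY ?s
"""

def build_request(query, endpoint=ENDPOINT):
    """Build a GET request carrying the query, asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})

request = build_request(QUERY)
# To execute it, pass `request` to urllib.request.urlopen and JSON-decode the body.
```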
As for a manual, almost everything I know of is academic papers, and not very useful. However, the data is reasonably self-documenting.
For your first question:
You can see the possible classes by querying one member of your intended set of entities (e.g. Shanghai).
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?type WHERE {
<http://dbpedia.org/resource/Shanghai> rdf:type ?type.
FILTER regex(str(?type), ".*China", "i").
} LIMIT 100
which gives this result:
dbpedia:class/yago/MetropolitanAreasOfChina [http]
dbpedia:class/yago/PortCitiesAndTownsInChina [http]
dbpedia:class/yago/MunicipalitiesOfThePeople'sRepuBlicOfChina [http]
dbpedia:class/yago/PopulatedCoastalPlacesInChina [http]
These are CamelCase versions of the categories that you will find at the bottom of Wikipedia pages. I was fooled for a while by the erroneous capitalization of RepuBlic, and finally saw that the class contains only 4 cities, so it is of limited use to you.
So I would propose going with @user205512's answer and getting the cities by linking two properties.
For your second question:
I would advise you to search/ask on http://answers.semanticweb.com

Allow users to create new categories and fields on ASP.NET website

We have a database-driven ASP.NET / SQL Server website and would like to investigate how we can allow users to create new database categories and fields. Is this crazy? Are there any examples of such organic websites out there? The fact that I haven't seen any maybe suggests I am.
I am interested in the best approach that would allow some level of control by an admin.
I've implemented things along these lines with a dictionary table, rather than a more traditional table.
The dictionary table might look something like this:
create table tblDictionary
(id uniqueidentifier, --Surrogate key (PK)
itemid uniqueidentifier, --Think PK in a traditional database
colmn uniqueidentifier, --Think "column name" in a traditional database
value nvarchar(max), --Can hold either string or number; a bare nvarchar would default to length 1
sortby integer) --Sorting columns may or may not be needed.
So, then, what would have been one row in a traditional table would become multiple rows:
Traditional Way (of course I'm not making up GUIDs):
ID  Type  Make  Model    Year  Color
1   Car   Ford  Festiva  2010  Lime
...would become multiple rows in the dictionary:
ID  ITEMID  COLUMN    VALUE
0   1       Type      Car
1   1       CarMake   Ford
2   1       CarModel  Festiva
3   1       CarYear   2010
4   1       CarColor  Lime
Your GUI can search for all records where itemid=1 and get all of the columns it needs.
Or it can search for all records where itemid in (select itemid from tblDictionary where colmn='Type' and value='Car') to get all columns for all cars.
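On the application side, turning those dictionary rows back into one record per item is a simple pivot. The following Python sketch shows the idea using the example rows above; ROWS, pivot and items_of_type are illustrative names, and a real site would read the rows from tblDictionary instead of a hard-coded list.

```python
from collections import defaultdict

# Rows as (id, itemid, colmn, value), matching the example table above.
ROWS = [
    (0, 1, "Type", "Car"),
    (1, 1, "CarMake", "Ford"),
    (2, 1, "CarModel", "Festiva"),
    (3, 1, "CarYear", "2010"),
    (4, 1, "CarColor", "Lime"),
]

def pivot(rows):
    """Group dictionary rows by itemid into one {column: value} dict per item."""
    items = defaultdict(dict)
    for _id, itemid, colmn, value in rows:
        items[itemid][colmn] = value
    return dict(items)

def items_of_type(rows, type_name):
    """Return only the pivoted items whose 'Type' column equals type_name."""
    return {k: v for k, v in pivot(rows).items() if v.get("Type") == type_name}

print(items_of_type(ROWS, "Car")[1]["CarModel"])  # Festiva
```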
In theory, you can put the user-defined types into the same table (Type='Type') as well as the user-defined columns that each Type has (Type='Column', Column='ColumnName'). This is where the sortby column comes in: it helps build the GUI in the correct order, if you don't want to rely on something else.
A number of times, though, I have felt that storing the user-defined dictionary elements in the dictionary was a bit too much drinking-the-kool-aid. Those can be separate tables because you already know what structure they need at design time. :)
This method will never have the speed or quality of reporting that a traditional table would have. Those generally require the developer to know the structures in advance. But if the requirement is flexibility, this can do the job.
Often enough, what starts out as a user-defined area of my sites has had a later project to normalize the data for reporting, etc. But this allows users to get started in a limited way and work out their requirements before engaging the developers.
After all that, I just want to mention a few more options which may or may not work for you:
If you have SharePoint, users already have the ability to create their own lists in this way.
Excel documents in a shared folder that are saved in such a way to allow multiple simultaneous edits would also serve the purpose.
Excel documents, stored on the webserver and accessed via ODBC, would also serve as single-table databases like this.
