I'm new to Elasticsearch in ASP.NET, and I have a problem that I have so far been unable to resolve.
From the documentation, I've seen that the & sign is not listed as a special character. Yet when I submit my search, the ampersand is completely ignored. For example, if I search for procter & gamble, the & sign is dropped. That causes a lot of problems for me, because I have companies with names like M&S. When the & sign is ignored, I get basically everything that has M or S in it. If I try an exact search (M&S), I have the same problem.
My code is:
void Connect()
{
node = new Uri(ConfigurationManager.AppSettings["Url"]);
settings = new ConnectionSettings(node);
settings.DefaultIndex(ConfigurationManager.AppSettings["defaultIndex"]);
settings.ThrowExceptions(true);
client = new ElasticClient(settings);
}
private string escapeChars(string inStr) {
var temp = inStr;
temp = temp
.Replace(@"\", @"\\")
.Replace(@">",string.Empty)
.Replace(@"<",string.Empty)
.Replace(@"{",string.Empty)
.Replace(@"}",string.Empty)
.Replace(@"[",string.Empty)
.Replace(@"]",string.Empty)
.Replace(@"*",string.Empty)
.Replace(@"?",string.Empty)
.Replace(@":",string.Empty)
.Replace(@"/",string.Empty);
return temp;
}
And then inside one of my functions
Connect();
ISearchResponse<ElasticSearch_Result> search_result;
var QString = escapeChars(searchString);
search_result = client.Search<ElasticSearch_Result>(s => s
.From(0)
.Size(101)
.Query(q =>
q.QueryString(b =>
b.Query(QString)
//.Analyzer("whitespace")
.Fields(fs => fs.Field(f => f.CompanyName))
)
)
.Highlight(h => h
.Order("score")
.TagsSchema("styled")
.Fields(fs => fs
.Field(f => f.CompanyName)
)
)
);
I've tried specifying analyzers, but then I found out that they change the way tokenizers split words. I haven't been able to implement changes to the tokenizer.
I would like to support the following scenario:
Search: M&S Company Foo Bar
Tokens: M&S Company Foo Bar + bonus is if it's possible to have M S tokens too
I'm using Elasticsearch 5.0.
Any help is more than welcome. Including better documentation than the one found here: https://www.elastic.co/guide/en/elasticsearch/client/net-api/5.x/writing-queries.html.
By default, a text field is analyzed with the standard analyzer, which applies the standard tokenizer along with a lowercase token filter. So when you index a value against that field, the standard analyzer is applied to the value and the resulting tokens are indexed against the field.
Let's look at an example. For the field companyName (text type), assume that the value M&S Company Foo Bar is passed when indexing a document. After the standard analyzer is applied, the resulting tokens will be:
m
s
company
foo
bar
Notice that not just whitespace but also & is used as a delimiter to split the text and generate the tokens.
When you query this field without specifying an analyzer in the search query, Elasticsearch by default applies the same analyzer at search time that was applied at index time. Therefore, if you search for M&S, it gets tokenized into m and s, and the actual search looks for these two tokens instead of M&S.
To solve this, you need to change the analyzer for the companyName field. Instead of the standard analyzer, you can create a custom analyzer that uses the whitespace tokenizer and the lowercase filter (to keep the search case-insensitive). For this you need to change the settings and mapping as below:
{
"settings": {
"analysis": {
"analyzer": {
"whitespace_lowercase": {
"tokenizer": "whitespace",
"filter": [
"lowercase"
]
}
}
}
},
"mappings": {
"_doc": {
"properties": {
"companyName": {
"type": "text",
"analyzer": "whitespace_lowercase",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
}
}
Now for the above input the tokens generated will be:
m&s
company
foo
bar
This will ensure that when searching for M&S, & is not ignored.
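To make the difference concrete, here is a rough JavaScript sketch that mimics the two analyzers on the example value. It is only an approximation for illustration: the function names are made up, and the real standard tokenizer follows Unicode text segmentation rules rather than this simple regular expression.

```javascript
// Rough stand-in for the standard analyzer: split on anything that is not a
// letter or digit, then lowercase. (Assumption: simplified on purpose; the
// real tokenizer is based on Unicode text segmentation.)
function standardLikeTokens(text) {
  return text
    .toLowerCase()
    .split(/[^\p{L}\p{N}]+/u)
    .filter((token) => token.length > 0);
}

// Stand-in for the custom analyzer above: whitespace tokenizer + lowercase filter.
function whitespaceLowercaseTokens(text) {
  return text
    .toLowerCase()
    .split(/\s+/)
    .filter((token) => token.length > 0);
}

const value = "M&S Company Foo Bar";
console.log(standardLikeTokens(value));        // [ 'm', 's', 'company', 'foo', 'bar' ]
console.log(whitespaceLowercaseTokens(value)); // [ 'm&s', 'company', 'foo', 'bar' ]
```

Keep in mind that documents indexed before the mapping change still carry the old tokens, so they would need to be reindexed for the new analyzer to take effect.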
Currently, I'm building an app with logic similar to the following:
...
const user = {
isAdmin: true,
company: '5faa6a847b42bf47b8f785a1',
projects: ['5faa6a847b42bf47b8f785a2']
}
function defineAbilityForUser(user) {
return defineAbility((can) => {
if (user.isAdmin) {
can('create', 'ProjectTime', {
company: user.company,
}
);
}
can(
'create',
'ProjectTime',
["company", "project", "user", "start", "end"],
{
company: user.company,
project: {
$in: user.projects
}
}
);
});
}
const userAbility = defineAbilityForUser(user); //
console.log( permittedFieldsOf(userAbility, 'create', 'ProjectTime') );
// console output: ['company', 'project', 'user', 'start', 'end']
Basically, an admin should be allowed to create a project time with no field restrictions.
A non-admin user should only be allowed to set the specified fields, and only for projects to which they belong.
The problem is that I would expect to get [] as output, because an admin should be allowed to set all fields for a project time.
The only solution I found was to list all fields in the admin rule. But this requires a lot of migration work later when new fields are added to the project time model. (Wrapping the second rule in an else block is also not possible in my case.)
Is there a better way to do this? Or would it be better if the permittedFieldsOf function prioritized the rule with no field restrictions?
There is actually no way for casl to know what "all fields" means in the context of your models. It knows almost nothing about their shapes and relies on the conditions you provide to check objects later. So it does not have full information.
What you need to do is pass the 4th argument to override the fieldsFrom callback. Check the API docs and the reference implementation in @casl/mongoose.
In casl v5, that parameter is mandatory, so this confusion will disappear very soon.
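The idea behind such a callback can be sketched in plain JavaScript. This is a simplified model, not casl's actual implementation: collectPermittedFields, ALL_PROJECT_TIME_FIELDS and the extra "note" field are hypothetical names introduced only for illustration, and the exact shape of the 4th argument should be checked against the API docs.

```javascript
// Simplified model: collect the union of fields contributed by each rule.
// `fieldsFrom` decides which fields a single rule contributes.
function collectPermittedFields(rules, fieldsFrom) {
  const fields = new Set();
  for (const rule of rules) {
    for (const field of fieldsFrom(rule)) {
      fields.add(field);
    }
  }
  return [...fields];
}

// Hypothetical full field list of the ProjectTime model.
const ALL_PROJECT_TIME_FIELDS = ["company", "project", "user", "start", "end", "note"];

const rules = [
  // Admin rule: no `fields`, i.e. no field restriction was declared.
  { action: "create", subject: "ProjectTime" },
  // Regular rule: restricted to specific fields.
  { action: "create", subject: "ProjectTime", fields: ["company", "project", "user", "start", "end"] },
];

// Without model knowledge, a rule with no `fields` contributes nothing.
const restricted = collectPermittedFields(rules, (rule) => rule.fields || []);

// With an overridden callback, a rule with no `fields` means "all fields".
const everything = collectPermittedFields(rules, (rule) => rule.fields || ALL_PROJECT_TIME_FIELDS);

console.log(restricted);
console.log(everything);
```

Keeping the full field list in one place next to the model means new fields only need to be added there, not in every admin rule.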
Edit for clarity: There are no error messages, it simply returns an empty list if the input string is from the context.arguments, suggesting that it simply isn't getting the input variable out on the query tester (setting it up incorrectly brings up that famous typing error of course). I've also made this into a pipeline with the exact same result. Looking around, people suggest making an intermediate object, but surely I'm just getting my input variables out wrong somehow.
I'm working on a project in AWS Appsync using DynamoDB and I've run into a problem with the context.arguments input.
Basically the code all works if I hardcode the string for the book id into the query (full context to follow), but if I use the context.arguments, it simply refuses to work properly, returning an empty array for the "spines".
I have the following types in my schema:
type Book {
id: ID!
title: String
spines: [Spine]
}
type Spine {
id: ID!
name: String
bookId: ID!
}
I use the following query:
type Query {
getBook(id: ID!): Book
}
query getBook($bookId: ID!){
getBook(id: $bookId){
title
id
spines {
name
bookId
}
}
}
With the following input (assume this is a relevant guid):
{
"bookId": "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa"
}
And this resolver for the spines object:
{
"version" : "2017-02-28",
"operation" : "Query",
"index" : "bookId-index",
"query" : {
"expression": "#bookId = :bookId",
"expressionNames" : {
"#bookId" : "bookId"
},
"expressionValues" : {
":bookId" : { "S" : "${context.arguments.id}" }
}
}
}
I made sure my data set contained false positives too (spines for other books) so that I know when my query brings back the correct data.
This works if I hardcode a guid as string instead of using context.arguments, and gets exactly what I'm looking for for each book guid.
For example, replacing the expression values with this works perfectly:
"expressionValues" : {
":bookId" : { "S" : "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa" }
}
Why does "${context.arguments.id}" not get the input variable here the same way as it seems to in other queries?
Thanks to @IonutTrestian for pointing me in the right direction.
$ctx.args was empty, so I decided to go up the chain and look at the entire context using $util.error($util.toJson($ctx)).
The JSON object I found included a little object called "source", which contained the query result for the Book object.
Long story short, using $ctx.source.id in my query worked like a charm.
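The likely explanation is that the spines resolver runs as a field resolver on Book: the id argument belongs to getBook, not to spines, so it is only visible through the parent object. A small JavaScript sketch of that situation follows; the ctx object is a hypothetical stand-in for the resolver context (the title value is made up) and buildSpinesQuery is an illustrative helper, not part of AppSync.

```javascript
// Hypothetical context as seen by the resolver attached to Book.spines.
const ctx = {
  arguments: {}, // the spines field itself takes no arguments
  source: {      // the parent Book returned by getBook
    id: "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa",
    title: "Some title",
  },
};

// Builds the same DynamoDB query shape as the mapping template above.
function buildSpinesQuery(bookId) {
  return {
    version: "2017-02-28",
    operation: "Query",
    index: "bookId-index",
    query: {
      expression: "#bookId = :bookId",
      expressionNames: { "#bookId": "bookId" },
      expressionValues: { ":bookId": { S: bookId } },
    },
  };
}

// Reading from arguments finds nothing; reading from source finds the parent id.
const fromArguments = buildSpinesQuery(ctx.arguments.id);
const fromSource = buildSpinesQuery(ctx.source.id);

console.log(fromArguments.query.expressionValues[":bookId"].S); // undefined
console.log(fromSource.query.expressionValues[":bookId"].S);
```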
I also know a bit more about debugging DynamoDB resolvers in case I encounter problems like this in future. Thank you so much!
Searching on 'Newtonsoft.Json.JsonReaderException: Additional text encountered after finished reading JSON content: {. Path '', ...' finds at least 3 SO questions all of which were traced to invalid Json.
I've tried 3 different validators on:
[{"Imported": "This registration imported on: 06/20/2016"},{"ContactInfoUpdated": " Street Address2: Suite 222 to Shipping Address2: "}]
and all three report it as valid. And yet the runtime error tosses that same 'Additional text encountered...':
if (!string.IsNullOrWhiteSpace(UserComments))
{
JToken addresses;
addresses = JObject.Parse(UserComments).GetValue("CarbonCopy"); //errors here
if (!ReferenceEquals(null, addresses))
{
//stuff
}
}
To establish that there are no unintended characters after the json closes, here's the sql:
UPDATE dbo.[Order] SET UserComments = '[{"Imported": "This registration imported on: 06/20/2016"},{"ContactInfoUpdated": " Street Address2: Suite 222 to Shipping Address2: "}]' WHERE idOrder =121050
With thanks to Brian's prompting I found this post to be very helpful:
Get Value from JSON using JArray
My JSON is somewhat unusual in that it's an array of dissimilar objects. Instead of a series of name/value pairs where Name stays constant, my 'UserComments' field contains pairs where Name can be 'ProfileWasEdited', 'CCRequested', 'Feedback', and so on.
In order to accommodate that type of structure I need to test against the Name property:
var fields = JToken.Parse(UserComments);
var isCC = "";
foreach (JObject content in fields.Children<JObject>())
{
foreach (JProperty prop in content.Properties())
{
if (prop.Name == "CarbonCopy")
isCC = prop.Value.ToString();
}
}
Resharper informs me that I can Linq-ize the above to:
foreach (JProperty prop in fields.Children<JObject>().SelectMany(content => content.Properties().Where(prop => prop.Name == "CarbonCopy")))
{
isCC = prop.Value.ToString();
}
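For comparison, the same lookup over an array of dissimilar objects can be sketched in JavaScript. This is only an illustration of the traversal, not a replacement for the Json.NET code: findPropertyValue is a made-up helper, and the CarbonCopy entry with its e-mail address is hypothetical sample data added so the lookup has something to find.

```javascript
// Parse a JSON array of objects and return the value of the named property.
// Like the C# loop above, the last matching property wins.
function findPropertyValue(json, name) {
  const entries = JSON.parse(json); // the top-level value is an array
  let value = "";
  for (const entry of entries) {
    if (Object.prototype.hasOwnProperty.call(entry, name)) {
      value = String(entry[name]);
    }
  }
  return value;
}

// Sample data: the first entry is from the question, the second is hypothetical.
const userComments = JSON.stringify([
  { Imported: "This registration imported on: 06/20/2016" },
  { CarbonCopy: "someone@example.com" },
]);

const isCC = findPropertyValue(userComments, "CarbonCopy");
const missing = findPropertyValue(userComments, "Feedback");

console.log(isCC);
console.log(missing === "");
```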
I've created an index in sense which I'm happy with and am trying to implement a typed query in the NEST client as follows:
var node = new Uri("http://elasticsearch-blablablamrfreeman");
var settings = new ConnectionSettings(node)
.SetTimeout(300000)
.SetDefaultIndex("films")
.MapDefaultTypeIndices(d => d
.Add(typeof(film), "films"))
.SetDefaultPropertyNameInferrer(p=>p);
Inject it (amongst the searcher and indexer) with my DI:
builder.Register(c => new ElasticClient(settings)).Named<ElasticClient>("esclient");
Search using any query, such as the below:
var result = _client.Search<film>(s => s
.AllIndices()
.From(0)
.Size(10)
.Query(q => q
.Term(p => p.Title, query)
));
The indexer seems to work fine so code not included here. I've swapped in any number of settings parameters so I know that there's some redundancy in the code set above (or at least the default index would've sufficed).
The result var contains nothing whatsoever, with a big fat 0 across all its properties, despite my having a wealth of data across my indices (including the "films" index).
I've even tried a raw QueryRaw method with a matchall and nada!
EDIT (Chris Pratt was along the right lines here)
Running:
var result = _client.Search<film>(s => s
.From(0)
.Size(10)
.QueryRaw(@"{ ""match_all"": {} }"));
And having:
var settings = new ConnectionSettings(node)
.SetTimeout(300000)
.MapDefaultTypeIndices(d => d
.Add(typeof (film), "chosen_index"))
.MapDefaultTypeNames(t => t
.Add(typeof (film), "en"));
Returns debug info as:
[Elasticsearch.Net.ElasticsearchResponse<Nest.SearchResponse<film>>] = {StatusCode: 200,
Method: POST,
Url: http://elasticsearch-blablablamrfreeman/chosen_index/film/_search,
Request: {
"from": 0,
"size": 10,
"query": { "match_all": {} }
},
Response: <Response stream not captured or already read...
My question being: It seemed I was in fact querying the wrong URL as per Chris Pratt's comment, but why isn't the type inference working for the type but it is for the index?
/chosen_index/film/_search
should read
/chosen_index/en/_search
If my inferencing is correct.
Should it be POST or GET? I usually use GET via the search API in Sense. And finally, what if I want to write my queries against my native film type but have it override the ES type in the URL in some instances?
For example if I inject a different language parameter and wish to now query the same index but both "en" and "de" ES-types etc (which are all valid types under the same index as already constructed via sense).
Thanks in advance!
Nothing obvious is jumping out at me for why this isn't working for you. However, I can give you a few avenues to pursue to attempt to resolve the issue.
I'm not familiar with the particular DI container that you're using, but it's possible that it's not binding properly, resulting in some of your settings not actually being used in the instance that's created. It might be a long shot, but I'd recommend digging in and at least verifying that the client instance you're getting is set up the way it should be.
It sort of side-steps the issue in a way, but Elasticsearch explicitly recommends you don't handle localization via different types. You should either use different indexes, i.e. chosen_index_en, chosen_index_es, etc., or use multifields:
"title": {
"type": "string",
"fields": {
"en": {
"type": "string",
"analyzer": "english"
},
"es": {
"type": "string",
"analyzer": "spanish"
}
}
}
Then you can search on things like title.en or title.es.
As I see it, you are using the default mappings for the film type. That is, the data is analyzed by the standard analyzer before being indexed.
In the query, you are using the Term query, which finds documents that contain the exact (not analyzed) term in the inverted index (see here). So be careful what your query is.
Try to use a match query like below:
var result = _client.Search<film>(s => s
.AllIndices()
.From(0)
.Size(10)
.Query(q => q
.Match(p => p.Title, query)
));
The query is now analyzed by the standard analyzer before being applied (see here).
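The difference between the two query types can be illustrated with a toy inverted index in JavaScript. This is a simplified model, not how Elasticsearch is implemented: analyze is a crude stand-in for the standard analyzer, and the film titles are hypothetical sample data.

```javascript
// Crude stand-in for the standard analyzer: lowercase, split on non-alphanumerics.
function analyze(text) {
  return text
    .toLowerCase()
    .split(/[^\p{L}\p{N}]+/u)
    .filter((token) => token.length > 0);
}

// Toy inverted index: token -> set of document ids (array positions).
function buildIndex(titles) {
  const index = new Map();
  titles.forEach((title, id) => {
    for (const token of analyze(title)) {
      if (!index.has(token)) index.set(token, new Set());
      index.get(token).add(id);
    }
  });
  return index;
}

// Term-style lookup: the input is used as-is, with no analysis.
function termSearch(index, term) {
  return [...(index.get(term) || [])];
}

// Match-style lookup: the input is analyzed first, then each token is looked up.
function matchSearch(index, query) {
  const hits = new Set();
  for (const token of analyze(query)) {
    for (const id of index.get(token) || []) hits.add(id);
  }
  return [...hits];
}

const titles = ["The Godfather", "Jaws"]; // hypothetical sample titles
const index = buildIndex(titles);

console.log(termSearch(index, "Godfather"));  // [] because only "godfather" is indexed
console.log(matchSearch(index, "Godfather")); // [ 0 ]
console.log(termSearch(index, "godfather"));  // [ 0 ]
```

In other words, a term query can still work on an analyzed field, but only if the search text already matches the indexed token exactly.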
I have a JSON string in which a child element is an array that contains only strings. Is there a way I can get a reference to the objects whose array contains a specific string?
Example:
{ "Books":{
"History":[
{
"badge":"y",
"Tags":[
"Indian","Culture"
],
"ISBN":"xxxxxxx",
"id":1,
"name":"Cultures in India"
},
{
"badge":"y",
"Tags":[
"Pre-historic","Creatures"
],
"ISBN":"xxxxxxx",
"id":1,
"name":"Pre-historic Ages"
}
]
}
}
To Achieve:
From the above JSON string, I need to get all books in History that contain "Indian" in the "Tags" list.
I am using JSONPath in my project, but if there is another API that can provide similar functionality, any help is welcome.
If you're using Goessner JSONPath, $.Books.History[?(@.Tags.indexOf('Indian') != -1)] as mentioned by Duncan should work.
If you're using the Jayway Java port (github.com/jayway/JsonPath), then
$.Books.History[?(@.Tags[?(@ == 'Indian')] != [])] or more elegantly, use the in operator like this $.Books.History[?('Indian' in @.Tags)]. Tried them both here.
Assuming you are using Goessner JSONPath (http://goessner.net/articles/JsonPath/) the following should work:
$.Books.History[?(@.Tags.indexOf('Indian') != -1)]
According to the Goessner site, you can use underlying JavaScript inside the ?() filter. You can therefore use the JavaScript indexOf function to check if your Tags array contains the tag 'Indian'.
See a working example here using this JSONPath query tester:
http://www.jsonquerytool.com/sample/jsonpathfilterbyarraycontents
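If a JSONPath library is not available, the same filter can be written directly in JavaScript. The sketch below uses the data from the question and applies the same indexOf check as the filter expression; it is a plain-JavaScript equivalent for illustration, not something JSONPath itself provides.

```javascript
// Data from the question.
const data = {
  Books: {
    History: [
      { badge: "y", Tags: ["Indian", "Culture"], ISBN: "xxxxxxx", id: 1, name: "Cultures in India" },
      { badge: "y", Tags: ["Pre-historic", "Creatures"], ISBN: "xxxxxxx", id: 1, name: "Pre-historic Ages" },
    ],
  },
};

// Keep only the books whose Tags array contains "Indian".
const indianBooks = data.Books.History.filter(
  (book) => book.Tags.indexOf("Indian") !== -1
);

console.log(indianBooks.map((book) => book.name)); // [ 'Cultures in India' ]
```

The filter returns references to the original book objects, so changes made through indianBooks are visible in data as well.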
Did you try Underscore.js? You can get the Indian books like this:
var data = {"Books": ....};
var indianBooks = _.filter(data.Books.History, function(book) { return _.contains(book.Tags, "Indian"); });