Is there a way to verify whether a word exists in a given language?

I have some word lists that were extracted from the aspell dictionary.
The problem is that some of the words aspell returns aren't valid words.
I'd like to know if there is a way to check whether the returned words exist in a given language or not.
Thanks!

You could validate them using the Google spelling API.

Related

Watson Conversation example case sensitivity

I am trying to use the Watson REST APIs to add examples to an Intent for Watson. Before I call the Create Endpoint I call the Get Example endpoint with the intent and example.
When I call the Get Example endpoint with the word "fine", it returns a 404. Then, when I try to create the example, it returns a 400 response:
{"error":"Unique Violation: The value \"fine\" already exists"}
This is happening because we already have an example "Fine" (notice the capital first letter).
How can I prevent this? Is it best practice to store examples in all lower case? Or should I just catch the 400 exception and look at the error?
A violation error means the update did not happen, so you can certainly check for it and take action. Personally, though, I recommend looking for the related item first to avoid the error.
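One way to look for the related item first is to compare the new text against the intent's existing examples while ignoring case, since an exact-match lookup for "fine" misses "Fine". The following is a minimal sketch, assuming the example texts have already been fetched into a plain list; the helper name and the sample data are hypothetical, and no Watson API call is shown.

```python
def find_existing_example(existing_examples, candidate):
    """Return the stored example that equals `candidate` ignoring case, or None."""
    wanted = candidate.casefold()
    for text in existing_examples:
        if text.casefold() == wanted:
            return text
    return None


# Hypothetical data: example texts already stored under one intent.
stored = ["Fine", "I am good"]

print(find_existing_example(stored, "fine"))   # Fine
print(find_existing_example(stored, "great"))  # None
```

If the helper returns a stored text, the Create call can be skipped; the 400 check remains useful as a fallback.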
Coding convention recommendations: these formats make it easy to recognize what is being referenced in code, even when the identifier prefix is missing.
For example, is the name below an intent, an entity, or a context variable?
accountingPayBillCode
Intents
All caps, spaces as underscores.
#ACCOUNTING_PAY_BILL
The examples (questions) should be entered exactly as you received them. Do not attempt to fix spelling or grammar errors.
Example:
I need to pay my bill. Can yuo help me?
Entities
CamelCase with the first word capitalized. The value should be all lowercase and, where possible, a single meaningful word.
@AccountDetail:code
The reason to avoid multiple words in the value is that you can end up with something like this:
@AccountDetail:(part number)
That makes it more prone to mistakes.
Synonyms should also be stored in all lower case.
Context variables
Always reference them using the $ prefix. Use camelCase with the first character lowercase.
$accountCode
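The conventions above can be checked mechanically. The sketch below is one interpretation of them as regular expressions, assuming the usual Watson prefixes (# for intents, @ for entities, $ for context variables); the patterns and the function name are illustrative, not an official rule set.

```python
import re

# Interpretation of the naming conventions above; not an official Watson rule set.
INTENT = re.compile(r"#[A-Z0-9]+(?:_[A-Z0-9]+)*")          # all caps, underscores
ENTITY = re.compile(r"@[A-Z][A-Za-z0-9]*(?::[a-z0-9]+)?")  # CamelCase, lowercase value
CONTEXT = re.compile(r"\$[a-z][A-Za-z0-9]*")               # camelCase, lowercase first


def classify(name):
    """Return which convention `name` follows, or None if it follows none."""
    if INTENT.fullmatch(name):
        return "intent"
    if ENTITY.fullmatch(name):
        return "entity"
    if CONTEXT.fullmatch(name):
        return "context variable"
    return None


for name in ["#ACCOUNTING_PAY_BILL", "@AccountDetail:code",
             "$accountCode", "accountingPayBillCode"]:
    print(name, "->", classify(name))
```

A name without a prefix, such as accountingPayBillCode, matches none of the patterns, which is the ambiguity the conventions are meant to remove.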

How to search for special characters in Kibana

I would like to find messages that contain the sequence "--->", but the Kibana results are wrong.
How can I escape these characters to get correct results?
Thanks,
First, check your mappings to see whether the field is marked as not_analyzed (or uses the keyword analyzer). If it is not, and the field goes through the standard analyzer, you won't see any results for this search, because the standard analyzer strips such characters when it indexes a document.
Have you tried searching with the sequence, special characters included, inside quotes?
This SO question could help you. You could also have a look at the Special Characters section of the Lucene documentation. Hope it helps!
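As an illustration of the escaping that the Lucene documentation describes, here is a minimal sketch that backslash-escapes the characters the classic Lucene query parser treats as syntax. The function name is hypothetical. Note that ">" is not in that list and is left untouched, and that Elasticsearch's query_string documentation states that < and > cannot be escaped, so escaping alone may not be enough for this particular sequence; it also cannot help if the analyzer has already removed the characters at index time.

```python
import re

# Characters the classic Lucene query parser treats as query syntax.
_LUCENE_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/&|])')


def escape_lucene(text):
    """Prefix every Lucene special character in `text` with a backslash."""
    return _LUCENE_SPECIAL.sub(r"\\\1", text)


print(escape_lucene("--->"))      # \-\-\->
print(escape_lucene("key:value")) # key\:value
```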

How to access a specific part of the data as input to AWK

Suppose I want to access an online dictionary and look up a specific word. I would like only the relevant part of the data, the word and its translation, as input to AWK. Any ideas?
In other words, I want only a fraction of the data on my machine. How can I avoid downloading all of it, and so save space and time? Is there any way to do this without downloading all the data to my local machine?
This question is related to my last question here.
Edit 1:
I chose a dictionary as an example because, when you look up a word, it is enough to access a specific part of the data; there is no need to process all of it.
I am not an expert in programming, so I was thinking I could modify this answer to make it work (that is why I added the AWK tag again). I don't use any specific OS or tool. This is just a basic idea to see what the possibilities are, so I don't know how I could improve the tags.
awk cannot download. You must download the file and pipe it into a command that terminates as soon as it finds a result:
wget -qO- http://example.com/path | grep -wim1 "word"
wget -qO- URL produces no output other than the content of the given URL, which it writes to standard output so you can then parse it. grep -wim1 "word" finds the first line containing "word" as a whole word and then terminates. If you don't need the match printed, you can use -wiq instead. If the dictionary has one word per line (and nothing else), you're better off with -x instead of -w so that you match "can" in its entirety rather than "can't" (' is a word boundary). Remove the -i if you want to match case.
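The same stop-at-first-match idea can be sketched in Python, reading the response line by line and returning as soon as a line matches. This is only an illustration under stated assumptions: the function names are hypothetical, the URL is whatever word list you are querying, and the data is assumed to be line-oriented UTF-8 text.

```python
import re
import urllib.request


def first_match(lines, word):
    """Return the first line containing `word` as a whole word (ignoring case), else None."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    for raw in lines:
        line = raw.decode("utf-8", errors="replace") if isinstance(raw, bytes) else raw
        if pattern.search(line):
            return line.rstrip("\r\n")
    return None


def lookup(url, word):
    """Stream `url` and stop reading at the first matching line."""
    # Leaving the with-block closes the connection, so later lines are never read.
    with urllib.request.urlopen(url) as response:
        return first_match(response, word)


# Local demonstration with in-memory lines instead of a real download.
sample = [b"apple\n", b"Can't stop\n", b"can\n"]
print(first_match(sample, "can"))  # Can't stop
```

As with grep -w, the apostrophe counts as a word boundary, so "can" matches the "Can't" line first; use an exact comparison on the whole line if the list has one word per line.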
In the comments, you asked:
It might help to jump to the start of the "w" entries, so as not to download all the data from "a" to "w". Is that possible? I guess not.
Some programs can "resume" downloads and you may be able to play with that, but you'd have to guess where to start. This would be a lot of work and you might seek too far and therefore fail to get a match.
If you are querying this dictionary more than once, I'd recommend downloading it and saving it so you can query it locally. Even the largest dictionary I know of is only 213MB (compressed, search with zgrep), though I am assuming you're talking about a traditional word list rather than a hash table or other arbitrary data form. Of course, anything longer would take such a long time to download that you'd only want to do it once.
If you really don't want to store it locally, you should probably consider a database rather than a flat file.

Comment in aspell .dic files?

They look like this:
abaft
abbreviation/M
abdicate/DNGSn
Abelard/M
abider/M
Abidjan
ablaze
abloom
I am using this kind of dictionary with a Node.js application, but I will need it to be smarter. Specifically, I want to remember the occurrence probability of every word based on text that has already been processed. I'd like to save this information in the existing .dic file, but how can I do that without making it invalid?
Is there any comment syntax that would allow me to store additional data next to the words in the file, such that a normal dictionary parser will ignore it?

Extract wordlist from wordweb lst files

Hi. I am working on a natural language processing project (it will be in Java). A paragraph of text goes in, and the program extracts information from the paragraph and adds it to its database. For example, give it text from Wikipedia and it will update its database.
First of all, I have to identify verbs, nouns, adjectives, etc. (all the facts known in advance) in the paragraph.
This can be done using a dictionary database that I can query to find out whether a word is a verb, a noun, an adjective, or something else.
The problem is that I can't find a dictionary database to download.
Or is there some other way I can do this? Please tell me if I am not clear.
