The Content Moderator API for text fails to recognize simple drug terms such as "cocaine" and other simple profanities. It seems to work only on a very limited set of profanities.
I'm using the Web GUI with my resource key at the address: https://westus.dev.cognitive.microsoft.com/docs/services/57cf753a3f9b070c105bd2c1/operations/57cf753a3f9b070868a1f66f/console
Cristian, Content Moderator's Text API is for filtering out profanity specifically and does not check for mention of drugs.
You can use the custom lists API to create your own list of stop words, which the API will scan against in addition to the built-in list of terms.
See https://westus.dev.cognitive.microsoft.com/docs/services/57cf755e3f9b070c105bd2c2/operations/57cf755e3f9b070868a1f67f.
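As a rough sketch of how a custom term list might be passed when screening text: the helper below only assembles the request and does not send it. The path `/contentmoderator/moderate/v1.0/ProcessText/Screen`, the `listId` and `language` query parameters, and the `Ocp-Apim-Subscription-Key` header are assumptions based on the public REST reference, and `build_screen_request` is a hypothetical name, so check them against the console linked above.

```python
from urllib.parse import urlencode


def build_screen_request(endpoint, subscription_key, text, list_id=None, language="eng"):
    """Assemble (but do not send) a text-screening request.

    Assumption: the path, query parameters and header name below follow the
    Content Moderator REST reference; verify them for your own resource.
    """
    params = {"language": language}
    if list_id is not None:
        # ID of a custom term list created beforehand with the lists API.
        params["listId"] = list_id
    url = (
        endpoint.rstrip("/")
        + "/contentmoderator/moderate/v1.0/ProcessText/Screen?"
        + urlencode(params)
    )
    headers = {
        "Content-Type": "text/plain",
        "Ocp-Apim-Subscription-Key": subscription_key,
    }
    return {"method": "POST", "url": url, "headers": headers, "body": text.encode("utf-8")}
```

The returned dictionary can be handed to any HTTP client; without `list_id`, only the built-in term list would be consulted.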
I would also appreciate knowing more about the sample terms that you felt should have been detected. Let's figure out a way to share those if that's all right with you.
Thanks!
Hi and thank you for looking into this.
(Disclaimer: I have little-to-no technical background and would like to find the least complex solution. Ideally, only "connecting" different out-of-the-box components and no coding.)
HIGH-LEVEL PROBLEM:
I have trained a model for text classification using Google AutoML. I want to make this model available on a website, i.e., I want to enable visitors to enter their text and receive the model's predicted class.
CONSIDERATIONS SO FAR: AutoML allows us to deploy the model via a REST API, and I understand that what I want are the API's PUT and GET functions (right?). Ideally, I would use some form of plug-in or script to create an input field for the user which accepts the PUT and then delivers the GET.
Are you aware of any services for this? I'm also happy to host the website in a content management system like WordPress.
I'm very open regarding other approaches to solving my problem and highly appreciate any constructive input.
Many thanks!
AutoML Documentation https://cloud.google.com/natural-language/automl/docs/predict
EDIT Jan 10 There is another related question in which a repo is shared that supposedly provides a solution. I'm not able to access the repo, but the question might help you understand my issue: Is there a way to use Googles AutoML with JavaScript?
EDIT Jan 16 I have learned that POST, rather than PUT, can be used to provide the input to the model.
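For readers who do end up writing a little code, here is a minimal sketch of what such a POST might look like. It only builds the request and sends nothing. The URL pattern, the `us-central1` location and the `payload.textSnippet` body are assumptions taken from the AutoML prediction documentation linked above; `build_predict_request` and its arguments are hypothetical names. The access token would have to stay on a server, not in the visitor's browser.

```python
import json


def build_predict_request(project_id, model_id, text, access_token, location="us-central1"):
    """Assemble (but do not send) a prediction request for a text model.

    Assumption: URL pattern and body shape follow the AutoML Natural
    Language REST documentation; check them against your own deployment.
    """
    url = (
        "https://automl.googleapis.com/v1/"
        f"projects/{project_id}/locations/{location}/models/{model_id}:predict"
    )
    body = {"payload": {"textSnippet": {"content": text, "mime_type": "text/plain"}}}
    headers = {
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "application/json",
    }
    return {"method": "POST", "url": url, "headers": headers, "body": json.dumps(body)}
```

A small server-side script could receive the text from a form field, send this request, and return the predicted class from the response to the page.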
When using the Translation API, I get a different (and worse) translation than if I use translate.google.com.
I am working on a project for a client, and the client was dissatisfied with the translation and noticed the difference.
Do these two services use different engines? I read that the API uses nmt-mode now, and that translate.google.com already uses the same engine.
Both set to translate from Norwegian to English.
Any more information that can clear this up?
Thanks!
Differences between the results from translate.google.com and the Translation API are considered expected behavior; they can arise from maintenance tasks and from the logic used by internal processes. However, which engine each service uses seems to be private information.
Based on this, it is normal to see some variance when using the API. As a workaround, I think you can use the model parameter to specify which of the available models to use; see the official "Specifying a model" documentation for detailed information about this option.
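A minimal sketch of setting the model parameter, assuming the v2 REST endpoint and its `q`, `source`, `target`, `format`, `model` and `key` parameters, with `nmt` and `base` as the accepted model values at the time of the documentation. The helper name `build_translate_params` is hypothetical, and nothing is sent.

```python
# Assumed v2 endpoint; verify against the current Translation API docs.
TRANSLATE_URL = "https://translation.googleapis.com/language/translate/v2"

# Assumed model names from the "Specifying a model" documentation.
ALLOWED_MODELS = {"nmt", "base"}


def build_translate_params(text, source, target, model="nmt", api_key="YOUR_API_KEY"):
    """Return the form parameters for a v2 translate call with an explicit model."""
    if model not in ALLOWED_MODELS:
        raise ValueError(f"unknown model: {model!r}")
    return {
        "q": text,
        "source": source,
        "target": target,
        "format": "text",
        "model": model,
        "key": api_key,
    }
```

For the Norwegian to English case above, the call would be `build_translate_params("...", "no", "en", model="nmt")`, posted to `TRANSLATE_URL`.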
It's almost 3 years later and the problem still remains!
So I was trying to translate a dataset with the Google Translate API, but in the end it failed to translate some texts to the target language (in my case, Persian/Farsi). So I decided to check them to see if there's a pattern and maybe translate them using the web version of Google Translate.
As I was doing so, I found that the web version could actually translate some of those untranslated texts, BUT not all. When trying to find a reason for this behaviour, I found out that most of them were names and not sentences. As we know, names can easily be written with the target language's characters as the translation. So why doesn't the API convert those names while the web version does? This photo will perhaps explain everything:
verified translation
As can be seen, some translations have a badge indicating that the translation has been verified, while some others don't.
So to recap, my guess is that maybe the API is set to only use verified translations, but as for the web version, even unverified translations are allowed since you can edit or report them.
I know that Google Dictionary was discontinued in 2011, but the dictionary information and definitions are still available through google search results:
Does anyone know whether this information can be accessed through the Custom Search API or the Translate API?
I found this related question (but sadly without a satisfying answer).
I also needed Google Dictionary API for my project, it was not present so I decided to create one.
I scraped the web page at the URL https://www.google.com/#q=define+term, where term is any word you want the meaning of, and created the API. You can find it here: Google Dictionary API.
How to use
The basic syntax of a URL request to the API is shown below:
https://api.dictionaryapi.dev/api/v2/entries/<--language_code-->/<--word-->
As an example, to get the definition of the English word hello, you can send a request to:
https://api.dictionaryapi.dev/api/v2/entries/en/hello
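The URL pattern above can be filled in with a small helper. This is only a sketch that builds the URL; the function name `build_entry_url` is hypothetical, and the response format is not assumed here.

```python
from urllib.parse import quote

BASE_URL = "https://api.dictionaryapi.dev/api/v2/entries"


def build_entry_url(language_code, word):
    """Fill the <--language_code--> and <--word--> slots of the URL pattern."""
    # Percent-encode both parts so spaces or non-ASCII characters stay valid.
    return f"{BASE_URL}/{quote(language_code, safe='')}/{quote(word, safe='')}"
```

Calling `build_entry_url("en", "hello")` reproduces the example URL; the result can then be fetched with any HTTP client.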
The API also provides other meanings of the word, example sentences, and synonyms, if any.
If you want me to include any other details, please comment and I will happily extend the API to cover your needs.
In case you wish to see the code, it is on GitHub.
Google Dictionary's content is licenced from Oxford Dictionaries' Lexico. Their API can be accessed from here.
Note their free access platform ("prototype") has a number of limitations:
1000 requests per month
Limited data access
Limited request rate
It doesn't look promising from the API Explorer
https://developers.google.com/apis-explorer/#search/dictionary/
I use Kimonolabs right now for scraping data from websites that have the same goal. To make it easy, let's say these websites are online shops selling stuff online (actually they are job websites with online application possibilities, but technically it looks a lot like a webshop).
This works great. For each website, a scraper API is created that goes through the available advanced search page to crawl all product URLs. Let's call this API the 'URL list'. Then a 'product API' is created for the product detail page that scrapes all necessary elements, e.g. the title, product text and specs like the brand, category, etc. The product API is set to crawl daily using all the URLs gathered in the 'URL list'.
Then the gathered information for all products is fetched by our own service through Kimonolabs' JSON endpoint.
However, Kimonolabs will shut down its service at the end of February 2016 :-(. So, I'm looking for an easy alternative. I've been looking at import.io, but I'm wondering:
Does it support automatic updates (letting the API scrape hourly/daily/etc)?
Does it support fetching all product URLs from a paginated advanced search page?
I'm tinkering around with the service. Basically, it seems to extract data via the same easy process as Kimonolabs. However, it's unclear to me whether paginating through the URLs needed for the product API and automatically keeping it up to date are supported.
Are there any import.io users here who can advise whether import.io is a useful alternative for this? Maybe even give some pointers in the right direction?
Look into Portia. It's an open source visual scraping tool that works like Kimono.
Portia is also available as a service and it fulfills the requirements you have for import.io:
automatic updates, by scheduling periodic jobs to crawl the pages you want, keeping your data up-to-date.
navigation through pagination links, based on URL patterns that you can define.
Full disclosure: I work at Scrapinghub, the lead maintainer of Portia.
Maybe you want to give Extracty a try. It's a free web scraping tool that allows you to create endpoints that extract any information and return it in JSON. It can easily handle paginated searches.
If you know a bit of JS, you can write CasperJS endpoints and integrate any logic that you need to extract your data. It has a similar goal to Kimonolabs and can solve the same problems (if not more, since it's programmable).
If Extracty does not meet your needs, you can check out these other market players that aim for similar goals:
Import.io (as you already mentioned)
Mozenda
Cloudscrape
TrooclickAPI
FiveFilters
Disclaimer: I am a co-founder of the company behind Extracty.
I'm not very fond of Import.io, but it seems to me that it allows pagination through bulk input URLs. Read here.
So far there has not been much progress in getting a whole website through the API:
Chain more than one API/Dataset It is currently not possible to fully automate the extraction of a whole website with Chain API.
For example, if I want data that is found within category pages or paginated lists, I first have to create a list of URLs, run Bulk Extract, save the result as an import data set, and then chain it to another Extractor. Once set up, I would like to be able to do this more automatically, in one click.
P.S. If you are somehow familiar with JS you might find this useful.
Regarding automatic updates:
This is a beta feature right now. I'm testing it myself after migrating from Kimonolabs. You can enable it for your own APIs by appending &bulkSchedule=1 to your API URL. You will then see a "Schedule" tab. In the "Configure" tab, select "Bulk Extract" and add your URLs; after this, the scheduler will run daily or weekly.
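If you have many API URLs to update, appending the flag can be scripted. The sketch below only manipulates the URL string; `add_bulk_schedule` is a hypothetical helper, and the `bulkSchedule=1` flag is taken from the description above, not from any official reference.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_bulk_schedule(api_url):
    """Return api_url with bulkSchedule=1 added to its query string."""
    parts = urlsplit(api_url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    # Avoid adding the flag twice if it is already present.
    if not any(name == "bulkSchedule" for name, _ in query):
        query.append(("bulkSchedule", "1"))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Parsing the query string instead of concatenating text means the helper also works for URLs that have no existing parameters, where a bare `&` would be wrong.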
E.g. - translating "amigo" from Spanish to English.
This gives a result ("friend"), which I'd expect in the API.
Does the API also offer the dictionary-like elements from that page like in the following image?
The API is not free to test, so I've been unable to see if it contains the result I want or not.
If not possible, can anyone suggest a different API for the purpose (multilingual dictionary, at least English -> other languages)?
No. The Google Translate API doesn't expose an endpoint for retrieving the dictionary-like elements you're asking about.
As of today the functions available through the API are for:
Translation of text
Detection of the source language of the given text
Listing which language codes the API supports.
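As a sketch, the three functions listed above map onto three REST paths. The paths shown are assumptions based on the v2 REST reference, `endpoint_for` is a hypothetical helper, and the absence of a dictionary entry in the table mirrors the point of this answer.

```python
# Assumed v2 base URL; verify against the current Translation API reference.
BASE = "https://translation.googleapis.com/language/translate/v2"

ENDPOINTS = {
    "translate": BASE,                  # translation of text
    "detect": BASE + "/detect",         # detection of the source language
    "languages": BASE + "/languages",   # list of supported language codes
}


def endpoint_for(operation):
    """Look up the URL for one of the three supported operations."""
    try:
        return ENDPOINTS[operation]
    except KeyError:
        raise ValueError(f"the API exposes no {operation!r} operation") from None
```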
There's no endpoint available for retrieving the audio for the translations either.
On the plus side, I've seen the API's list of supported languages expanded regularly, and its language models have apparently been updated as well.
Recommendations for other APIs to use are outside the scope of Stack Overflow, but some Google searching should help you find what's available.