Cognitive Services Translation and Profanity Filtering

Issue Description
I use the Cognitive Services TranslateArray method to translate my users' comments. One advantage of this service is that ProfanityAction can mark every profane word in the destination language. I also use automatic language detection, so that I do not have to identify the language of the content before sending it in.
When I get my translation back for a destination language that matches the source language, the profanity is not marked. Is there another endpoint I could/should hit, or a parameter I do not know about, or is this a possible improvement to the service?
Corresponding Documentation
Follow the Cognitive Services protocol to hit the TranslateArray endpoint with an English sentence containing profanities, using the ProfanityAction: Marked behavior: http://docs.microsofttranslator.com/text-translate.html#!/default/post_TranslateArray
Reproduction Steps
Send an English sentence with profanities
Translate to fr, notice that the profanities are correctly marked
Translate to en, notice that the profanity tags are missing
Expected Behavior
Profanities should be marked even if no translation occurred.
Actual Results
I obtained the unmodified sentence back.

There is nothing in the documentation that specifies what happens if the source and target languages are the same. My guess is that if it sees that they match, it will do nothing.
However, there is a specific API that detects profanity in any given language: Content Moderation for Text (see its API documentation).
The Text - Screen function does it all: it scans the incoming text (maximum 1024 characters) for profanity, autocorrects text, and extracts Personally Identifiable Information (PII), all while matching against custom lists of terms.
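A minimal Python sketch of calling Text - Screen and then marking the flagged terms yourself. It assumes the request path `/contentmoderator/moderate/v1.0/ProcessText/Screen`, the `Ocp-Apim-Subscription-Key` header, and a JSON response whose `Terms` list carries `OriginalIndex` and `Term` for each match, as recalled from the Content Moderator reference; check these against the current docs. The helper names `build_screen_request` and `mark_terms` and the `<profanity>` markers are hypothetical choices for illustration. The request is only built here, not sent.

```python
import urllib.parse
import urllib.request

# Assumption: path and query parameter names as recalled from the Content
# Moderator "Text - Screen" reference; verify against the current docs.
SCREEN_PATH = "/contentmoderator/moderate/v1.0/ProcessText/Screen"
MAX_CHARS = 1024  # Text - Screen limit stated in the answer


def build_screen_request(endpoint: str, key: str, text: str,
                         language: str = "eng") -> urllib.request.Request:
    """Build (but do not send) a Text - Screen request for `text`."""
    if len(text) > MAX_CHARS:
        raise ValueError(f"Text - Screen accepts at most {MAX_CHARS} characters")
    query = urllib.parse.urlencode(
        {"language": language, "autocorrect": "false", "PII": "false"})
    return urllib.request.Request(
        url=endpoint.rstrip("/") + SCREEN_PATH + "?" + query,
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain",
                 "Ocp-Apim-Subscription-Key": key},
        method="POST",
    )


def mark_terms(text: str, response: dict,
               open_tag: str = "<profanity>",
               close_tag: str = "</profanity>") -> str:
    """Wrap each term listed in response["Terms"] with hypothetical markers."""
    terms = response.get("Terms") or []  # Terms may be null when nothing matched
    # Work from the end of the string so earlier insertions don't shift later offsets.
    # Assumption: the matched span is len(Term) characters starting at OriginalIndex.
    for term in sorted(terms, key=lambda t: t["OriginalIndex"], reverse=True):
        start = term["OriginalIndex"]
        end = start + len(term["Term"])
        text = text[:start] + open_tag + text[start:end] + close_tag + text[end:]
    return text
```

To actually send it, pass the request to `urllib.request.urlopen`, parse the body with `json.load`, and call `mark_terms(original_text, parsed)`.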

Your observation that the Translator API does nothing if the source and target languages are the same is correct. This is not an answer, just a clarification.
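Since same-language output comes back unmarked, a caller can decide per target language whether to send the text to the Translator or to screen the original through another service instead. A hypothetical helper; the name `split_targets` and the plain case-insensitive comparison of language codes are assumptions:

```python
from typing import Iterable, List, Tuple


def split_targets(detected: str, targets: Iterable[str]) -> Tuple[List[str], bool]:
    """Split requested target languages into those worth sending to the
    Translator and a flag saying whether the original must be screened separately."""
    detected = detected.lower()
    to_translate: List[str] = []
    screen_original = False
    for target in targets:
        # Assumption: exact code comparison; regional variants such as
        # "en" vs "en-GB" are NOT treated as equal here.
        if target.lower() == detected:
            # Same language: the Translator returns the text untouched and
            # unmarked, so profanity has to be detected some other way.
            screen_original = True
        else:
            to_translate.append(target)
    return to_translate, screen_original
```

For a comment detected as `en` with targets `fr`, `en` and `de`, only `fr` and `de` would go to TranslateArray, and the original text would go to the screening call.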

Related

What are the components of a google.com URL string? [closed]

When I search for a keyword on google.com, I see this URL in the browser:
https://www.google.com/search?q=harry+potter&sxsrf=AOaemvJzqEslTi5rksHz8Da7pgdZ1J3uMw%3A1634810260185&source=hp&ei=lDlxYYaCCNaL9u8Popq2-AQ&iflsig=ALs-wAMAAAAAYXFHpA2d9PU58mYXikU2pl90IN7Z8wXq&ved=0ahUKEwiGnNLmntvzAhXWhf0HHSKNDU8Q4dUDCAg&uact=5&oq=harry+potter&gs_lcp=Cgdnd3Mtd2l6EAMyCAguEIAEEJMCMgUILhCABDIFCAAQgAQyBQguEIAEMgUIABCABDIFCAAQgAQyBQguEIAEMgUIABCABDIFCC4QgAQyBQgAEIAEOgcIIxDqAhAnOgQIIxAnOgUIABCRAjoLCC4QgAQQxwEQowI6CwguEIAEEMcBEK8BOgsILhCABBDHARDRA1D3GliFJmDtJmgAcAB4AIABowGIAeQKkgEDNi43mAEAoAEBsAEK&sclient=gws-wiz
I understand that virtually all websites work via the Hypertext Transfer Protocol. Some of the most common HTTP methods are GET and POST.
I assume the above is a POST method, since it has a request payload (my search query) and a response payload (the webpage returned).
The parameter "q" is clearly my search keyword.
What do
sxsrf=AOaemvJzqEslTi5rksHz8Da7pgdZ1J3uMw%3A1634810260185
source=hp
ei=lDlxYYaCCNaL9u8Popq2-AQ
iflsig=ALs-wAMAAAAAYXFHpA2d9PU58mYXikU2pl90IN7Z8wXq
ved=0ahUKEwiGnNLmntvzAhXWhf0HHSKNDU8Q4dUDCAg
uact=5
oq=harry+potter
gs_lcp=Cgdnd3Mtd2l6EAMyCAguEIAEEJMCMgUILhCABDIFCAAQgAQyBQguEIAEMgUIABCABDIFCAAQgAQyBQguEIAEMgUIABCABDIFCC4QgAQyBQgAEIAEOgcIIxDqAhAnOgQIIxAnOgUIABCRAjoLCC4QgAQQxwEQowI6CwguEIAEEMcBEK8BOgsILhCABBDHARDRA1D3GliFJmDtJmgAcAB4AIABowGIAeQKkgEDNi43mAEAoAEBsAEK
sclient=gws-wiz
represent, and how does one know?
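Python's standard library can take such a URL apart, which is a quick way to see each parameter on its own. The URL below is a shortened copy of the one above with some parameters omitted:

```python
from urllib.parse import parse_qs, urlsplit


def query_params(url: str) -> dict:
    """Return each query parameter once, percent-decoded ('+' becomes a space)."""
    parts = urlsplit(url)
    return {name: values[0] for name, values in parse_qs(parts.query).items()}


url = ("https://www.google.com/search?q=harry+potter"
       "&sxsrf=AOaemvJzqEslTi5rksHz8Da7pgdZ1J3uMw%3A1634810260185"
       "&source=hp&ei=lDlxYYaCCNaL9u8Popq2-AQ&uact=5"
       "&oq=harry+potter&sclient=gws-wiz")

parts = urlsplit(url)
print(parts.scheme, parts.netloc, parts.path)  # https www.google.com /search
for name, value in query_params(url).items():
    print(f"{name:8} {value}")
```

Note that `%3A` in `sxsrf` decodes to a colon, which separates an opaque token from a long run of digits.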

There are two ways to find out: semi-automatically, using the Unfurl tool, or manually, by reading a list of explanations.
Semi-automated way with use of Unfurl
Unfurl is a free URL parser and browser that can check and decode Google Search URLs: https://dfir.blog/introducing-unfurl/.
An online hosted version of Unfurl is available at https://dfir.blog/unfurl/
It displays URL parameters as a visual 2D graph: use the mouse wheel to zoom in and out, drag the nodes to unclutter them, and read the explanations for the query parameters. It works for other URLs too, not only Google's.
I also collected some explanations by searching the web (as of September 2022).
Beware that explanations of Google query parameters go out of date quickly (within a few years), so the only remedy is to search again for newer explanations.
Manual way of reading the list of explanations
Explanations of Google query parameters from [2021]:
q= query sent to search engine
oq= 'original query': the text the user last typed into the search box before selecting a search term from the suggestions; it coincides with q= if the query was typed entirely by hand
ei= Search Session Start Date/Time; represents the time the user's session started, in "Google time" (so it does not depend on the local system time)
ved= Page Load Date/Time
sxsrf= Previous Page Load Date/Time
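If, as we read the 2021 article, the digits after the colon in `sxsrf` are a Unix timestamp in milliseconds, and the first four bytes of the URL-safe base64 `ei` value are a little-endian Unix timestamp in seconds, both can be decoded as below. These formats are assumptions that Google may change at any time. For the URL in the question, both values decode to 2021-10-21 09:57:40 UTC.

```python
import base64
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def sxsrf_time(sxsrf: str) -> datetime:
    """Assumption: the part after the last ':' is a Unix timestamp in milliseconds."""
    millis = int(sxsrf.rsplit(":", 1)[1])
    return EPOCH + timedelta(milliseconds=millis)


def ei_time(ei: str) -> datetime:
    """Assumption: ei is URL-safe base64 whose first 4 bytes are a
    little-endian Unix timestamp in seconds."""
    raw = base64.urlsafe_b64decode(ei + "=" * (-len(ei) % 4))  # restore padding
    return EPOCH + timedelta(seconds=int.from_bytes(raw[:4], "little"))


# sxsrf is shown already percent-decoded (%3A -> ':').
print(sxsrf_time("AOaemvJzqEslTi5rksHz8Da7pgdZ1J3uMw:1634810260185"))
print(ei_time("lDlxYYaCCNaL9u8Popq2-AQ"))
```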
Explanations from [2016]:
Here is a list of the URL parameters that we would commonly see:
q= the query string (keyword) that the user searched
oq= tracks the characters that were last typed into the search box before the user selected a suggested search term
hl= controls the interface language
redir_esc= unknown
sa= user search behavior
rct= unknown; seems to be related to Google AdWords
gbv= controls the presence of JavaScript on the page
gs_l= unknown; seems to be related to what type of search is being done (i.e., mobile, serp, img, youtube, etc.)
esrc= set to ‘s’ for secure search
frm= unknown
source= where the search originated (i.e., google.com, toolbar, etc.)
v= unknown
qsubts= unknown
action= unknown
ct= click location
oi= unknown
cd= ranking position of the search result that was clicked
cad= unknown; appears to be a referrer, affiliate or client token
sqi= unknown
ved= contains information about the search result link that was clicked (see https://moz.com/blog/inside-googles-ved-parameter)
url= the URL that Google will redirect the user to after a search result link is clicked
ei= passes an alphanumeric parameter that decodes the originating SERP where user clicked on a related search
usg= unknown; possibly handling the encrypted search string
bvm= unknown; possibly a location tracker
ie= input encoding (default: utf-8)
oe= output encoding
sig2= unknown
Sources:
[2021]: Analyzing Timestamps in Google Search URLs - Magnet Forensics
[2016]: The Approaching Darkness: The Google Referral URL In 2016
I hope others will update this list with newer explanations.
Here are also a few articles with outdated explanations:
2008 article: Moz's The Ultimate Guide to the Google Search Parameters. It is very similar, if not identical, to the blog post mentioned in the neighbouring answer, Google Search URL Parameters [Ultimate Guide] by SEOQuake, which also appears to date from 2008.
2014 article: How to Use the Information Inside Google's Ved Parameter - Moz.

First of all, the request you posted uses GET as its request method.
You can easily check that on the Network tab of the Developer Tools in any browser.
Second, the difference between GET and POST (there are many other methods, but that's another topic) isn't the one you describe. The main difference between these two methods is whether the request carries a body (you could send a body in a GET request, but that's highly discouraged).
The request method's purpose is to tell the destination server how it should treat the request.
Now, focusing on your question, you could have discovered the values of all of them with a simple Google search, but anyway, here is a blog post where all the parameters of the Google Search URL are explained:
Google Search URL Parameters [Ultimate Guide]
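The GET/POST distinction described in this answer can be checked without a browser: Python's `urllib.request.Request` reports `GET` when no body is attached and `POST` once one is. Both kinds of request receive a response; only the presence of a request body differs. Nothing is sent over the network here, and `https://example.com/form` is a placeholder.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Search parameters travel in the URL's query string; the GET request has no body.
query = urlencode({"q": "harry potter", "source": "hp"})
get_req = Request("https://www.google.com/search?" + query)

# Attaching a body (data) is what makes urllib default to POST.
post_req = Request("https://example.com/form",
                   data=urlencode({"q": "harry potter"}).encode())

print(get_req.get_method(), get_req.full_url, get_req.data)
print(post_req.get_method(), post_req.full_url, post_req.data)
```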

Google translate API with annotations

When using the Google Translate API, it just returns one translation for a given word. For example, when I let it translate the English word "hide" to Italian, it just responds with "nascondere".
However, Google Translate on the web offers much more: it doesn't just show one translation (or a list of possible translations), but also the frequency and the precise meaning of each translation.
I'd like to get these results via an API.
Is there a public API that offers the same results?
Of course, I could just use the endpoint /translate_a/single that is used by the Google Translate website. But this endpoint does not include an API key, so if I send too many requests, they will most likely block me.
Also, the endpoint /translate_a/single returns many fields of which I do not know the precise meaning, so its usage would most likely involve some reverse engineering.

Does the Google Translate API return any data about translated word verification?

It's important for me to identify whether a word is verified (that is, shown with the verification badge next to the translation on the Google Translate website).

This does not appear to be available. If you check the docs for the body of a response to the API's translate method, there's no field indicating that the translation is checked by the Community (which is what this little badge next to the translated text implies).
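A sketch of reading a v2 `translate` response, assuming the documented shape `data.translations[]` with `translatedText` and, when no source language was given, `detectedSourceLanguage` per entry. The sample below is hand-written to that shape, not real API output. As the answer notes, none of these fields reports community verification.

```python
import json

# Hand-written sample shaped like a Cloud Translation v2 `translate` response;
# illustrative only, not captured API output.
SAMPLE = json.loads("""
{"data": {"translations": [
    {"translatedText": "nascondere", "detectedSourceLanguage": "en"}
]}}
""")


def read_translations(response: dict) -> list:
    """Return (translatedText, detectedSourceLanguage) pairs.

    detectedSourceLanguage is only present when no source language was
    passed, so .get() is used for it.
    """
    return [(t["translatedText"], t.get("detectedSourceLanguage"))
            for t in response["data"]["translations"]]


def field_names(response: dict) -> set:
    """Every field name that appears on any returned translation."""
    names = set()
    for t in response["data"]["translations"]:
        names.update(t.keys())
    return names


print(read_translations(SAMPLE))
print(sorted(field_names(SAMPLE)))
```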

What is 'uniform' about URI, URL and URN [Uniform Resource Identifier, Uniform Resource Location, Uniform Resource Name]?

I have read about the differences between URI, URN and URL in a couple of other questions, but the answers discuss the difference in the last letter, that is, the differences among identifier, name and location respectively.
What I have not understood is why all these terms contain the word 'uniform' and what is uniform about them. The relevant Wikipedia section doesn't say much about why the change was made from 'universal' to 'uniform'.
I would like to find the missing explanation and not just memorize the terms as they are without fully understanding them.

Based on Tim Berners-Lee’s own account, as published in his book Weaving the Web:
At an IETF meeting, Tim Berners-Lee tried to form a working group that would create an Internet standard for what he suggested to be named universal document identifiers.
About the meeting, in his words (page 61):
[…] there was a strong reaction against the "arrogance" of calling something a universal document identifier. How could I be so presumptuous as to define my creation as "universal"? If I wanted the UDI addresses to be standardized, then the name "uniform document identifiers" would certainly suffice.
While he didn’t agree (it was an issue of whether the Web could be something "universal"), there wasn’t much time and so (page 62):
I was willing to compromise so I could get to the technical details. So universal became uniform, and document became resource.
They formed "a uniform resource identifier working group". (And this group then decided that "identifier" wasn’t a good label, and they chose "locator" instead, forming "URL" – which he also didn’t agree with.)
The current URI Internet standard (RFC 3986) describes the meaning of "Uniform", "Resource" and "Identifier" in section 1.1.
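One concrete face of that uniformity: RFC 3986 (Appendix B) gives a single regular expression that splits any URI reference into scheme, authority, path, query and fragment, whatever the scheme. The sketch below applies it to an `https` URL, a URN and a `mailto` URI; the helper name `split_uri` is ours.

```python
import re

# Regular expression from RFC 3986, Appendix B. Every part is optional,
# so it matches any string; the groups below pick out the five components.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def split_uri(uri: str) -> dict:
    m = URI_RE.match(uri)
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


for uri in ("https://www.google.com/search?q=harry+potter",
            "urn:isbn:0451450523",
            "mailto:someone@example.com"):
    print(split_uri(uri))
```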

Make search URL search engine friendly: hash -> what?

I am developing a flight search engine for a customer, and currently the URLs look as follows (ad = destination airport, ao = origin airport, dates and number of passengers are not specified here):
http://example.com/#ad=S%C3%A3o+Paulo+-+Todos+os+aeroportos+(SAO),+Brasil&ao=Recife+-+Guararapes+Intl+(REC),+Brasil
My customer wants to make search pages more search engine friendly (SEO). The idea is that Brazilians who are looking for flights from, say, SAO to REC by e.g. Google should have a higher chance of finding that particular flight search engine.
The first step is probably replacing the fragment identifier (#) by a query string (?). The server then dynamically generates nice text content that can be viewed without JavaScript (search results would still be loaded via XHR). In my opinion, that makes a lot of sense.
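The reason the fragment has to go is that browsers do not send the part after `#` to the server, so server-rendered content cannot depend on it. A small sketch of the rewrite (the helper name `fragment_to_query` is hypothetical):

```python
from urllib.parse import urlsplit, urlunsplit


def fragment_to_query(url: str) -> str:
    """Move the '#...' part of a URL into the query string.

    The fragment is never sent in the HTTP request, so parameters kept
    there are invisible to the server.
    """
    parts = urlsplit(url)
    if not parts.fragment:
        return url
    # Keep any existing query and append the former fragment to it.
    query = "&".join(p for p in (parts.query, parts.fragment) if p)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


print(fragment_to_query(
    "http://example.com/#ad=S%C3%A3o+Paulo+-+Todos+os+aeroportos+(SAO),+Brasil"
    "&ao=Recife+-+Guararapes+Intl+(REC),+Brasil"))
```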
Now, to make the URLs more search engine friendly:
(A) My customer proposes adding additional keywords into the URL, something like:
http://example.com?flights+to+Porto+Alegre&S%C3%A3o+Paulo+-+Todos+os+aeroportos+(SAO),+Brasil&ao=Recife+-+Guararapes+Intl+(REC),+Brasil
(B) I propose adding a slug instead, which can easily be internationalized, and which is good to read also for humans. Example:
http://example.com/pt_BR?ad=REC&ao=SAO/voos_de_Sao_Paulo_para_Recife
(C) Or, perhaps without a slug (but - due to parsability - only for a limited parameter set, which has the disadvantage of limiting sharing of URLs by users):
http://example.com/pt_BR/voos_de_Sao_Paulo_(SAO)_para_Recife_(REC)
What do you suggest? Any examples of good URLs for similar use cases?
That all being said: I understand that links from highly ranked pages are still the most important ranking measure. In the end, I wonder if all that complexity really is worth the effort. When I look at Google's own search pages, they are rather simple. For example, there is no summary of the search query in an H1 tag, which is exactly what my customer wants. Of course, Google doesn't search itself...

Don't use _ (underscore) to delimit words. Google interprets hello_world as one word but hello-world as two words.
Don't put your human-readable keywords in the query string (after the ?). Instead, make them part of a normal URL: http://example.com/pt_BR/search/voos-de-Sao-Paulo-(SAO)-para-Recife-(REC)
I would go for something like: http://example.com/pt_BR/2012-10-28/voos-de-Sao-Paulo-(SAO)-para-Recife-(REC)
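A sketch of generating path-style URLs along those lines: hyphens between words, accents stripped (São becomes Sao), and airport codes in parentheses so the path stays parsable as in option (C). The Portuguese words `voos-de`/`para`, the helper names and the regex are illustrative assumptions, not taken from either post.

```python
import re
import unicodedata
from datetime import date


def words_to_slug(text: str) -> str:
    """Drop accents and join words with hyphens rather than underscores."""
    # NFKD splits "ã" into "a" plus a combining tilde; the ASCII encode drops the tilde.
    plain = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    return "-".join(plain.split())


def flight_path(locale: str, day: date, origin: str, origin_code: str,
                dest: str, dest_code: str) -> str:
    """Build a path such as /pt_BR/2012-10-28/voos-de-Sao-Paulo-(SAO)-para-Recife-(REC)."""
    slug = (f"voos-de-{words_to_slug(origin)}-({origin_code})"
            f"-para-{words_to_slug(dest)}-({dest_code})")
    return f"/{locale}/{day.isoformat()}/{slug}"


# The parenthesised codes let the server recover the search from the path.
CODES_RE = re.compile(r"-\(([A-Z]{3})\)-para-.+-\(([A-Z]{3})\)$")


def parse_codes(path: str):
    """Return (origin_code, dest_code), or None if the path does not match."""
    m = CODES_RE.search(path)
    return (m.group(1), m.group(2)) if m else None


path = flight_path("pt_BR", date(2012, 10, 28), "São Paulo", "SAO", "Recife", "REC")
print(path)
print(parse_codes(path))
```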
