Why does text within an HTML tag sometimes not get translated? - google-translate

I've written a program that uses the Google Translate Python API to translate webpages. Most of the time, the API does translation as I expect, but in some cases text within a tag does not get translated.
I tried putting one such tag in the Google Translate web interface and found that the text is still not translated; i.e., the problem has to do with the Google Translate service rather than the way I am using the API.
The specific tag I am looking at is: <div class="someClass">World:</div>
I want the word "World" to be translated in the output, regardless of the language into which I am translating. In certain languages, such as French and Khmer, the word "World" is translated as expected, but in other languages, such as Spanish and Somali, it remains "World." I have noticed that removing the class attribute sometimes helps (translation then works in Spanish but not in Somali), and adding more text seems to help as well (I've never seen this issue when the text is a full sentence or paragraph, for example).
In the context of my project, it is particularly important that the case of a tag with just one word inside be handled correctly. Does anyone know why this is happening or how I can make translation happen consistently? A solution requiring minimal to no changes to the original HTML would be ideal.
Edit: A little more context based on playing around with things: Directly calling google.cloud.translate.Client().translate('<div class="someClass">World:</div>', 'es') actually has the correct behavior: "World" becomes "Mundo." I incrementally lengthened the page text by adding tags that came before and after that div in the original webpage--none of which wrapped more than one word of text--and the text between tags stopped being translated when the text was around 1,000 characters long. However, when I changed "World:" to a whole sentence, all of the text between tags was translated even when the page text was longer than 1,000 characters.
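Since the single-div call translated correctly while the long page did not, one possible workaround is to send each text node to the service on its own and leave the markup untouched. The following is a minimal sketch under that assumption, not a confirmed fix: `translate_html` and the `translate_text` callable are hypothetical names, and the callable stands in for whatever wraps the real API call. Translating node by node costs one request per node and removes cross-tag context, which may lower quality for longer passages.

```python
from html import escape
from html.parser import HTMLParser
from typing import Callable, List


class _TextNodeTranslator(HTMLParser):
    """Rebuilds the HTML, passing only visible text nodes to a translator."""

    def __init__(self, translate_text: Callable[[str], str]) -> None:
        super().__init__(convert_charrefs=True)
        self._translate_text = translate_text
        self._raw_depth = 0  # > 0 while inside <script> or <style>
        self.parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        # Keep the tag and its attributes exactly as written in the source.
        self.parts.append(self.get_starttag_text())
        if tag in ("script", "style"):
            self._raw_depth += 1

    def handle_startendtag(self, tag, attrs):
        self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.parts.append(f"</{tag}>")
        if tag in ("script", "style") and self._raw_depth:
            self._raw_depth -= 1

    def handle_comment(self, data):
        self.parts.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.parts.append(f"<!{decl}>")

    def handle_data(self, data):
        stripped = data.strip()
        if self._raw_depth or not stripped:
            # Script/style bodies and whitespace-only nodes pass through unchanged.
            self.parts.append(data)
            return
        start = data.index(stripped)
        lead, trail = data[:start], data[start + len(stripped):]
        translated = self._translate_text(stripped)
        self.parts.append(lead + escape(translated, quote=False) + trail)


def translate_html(html: str, translate_text: Callable[[str], str]) -> str:
    """Translate each text node separately; translate_text is supplied by the caller."""
    parser = _TextNodeTranslator(translate_text)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)
```

With the client from the question, the callable could be something like `lambda s: client.translate(s, 'es')['translatedText']`, assuming the client returns a dict with that key for a single string input.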

Related

Translate link destination with WPML for Elementor Text Editor links

After having been given the task of translating a whole "document" from German to English, I am really glad that WPML's Translation Editor makes it so easy to keep the structure intact and only change the actual text inside it.
That said, I've been running into an issue when the text comes from an Elementor Text Editor and contains links: WPML's Editor doesn't seem to offer a way to also translate a link's destination. It provides a way to mark where the text of a link is in the translation, but that's it.
And some links are language coded, like &lang=de, but since the Editor only translates the text of the link, the result then points at the wrong destination.
E.g. the English section of a text pointing to a source in German.
This is something I keep having to fix manually afterwards, because the Editor only offers translating destinations for Elementor widgets that declare the fields translatable, and even there it can be spotty, as I've found out.
Now I am wondering if there's something I could do to configure WPML to take those links inside the Text Editor element like it does with actual link elements, or if that's out of the question.
I'd really prefer not to have to routinely check the translated link destinations, because as it stands I do not even know when WPML has triggered an automatic translation because something about the layout changed but no text changed.
So the changes to the link destinations could be just undone silently, which has happened before for a different part of the site.
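Until the editor handles these links itself, the manual check could at least be automated. The sketch below is only an illustration of that idea, independent of WPML: it assumes the translated page's HTML is available as a string and that the language is carried in a `lang` query parameter, as in the `&lang=de` example above. The function name `find_wrong_language_links` is hypothetical.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import parse_qs, urlsplit


class _LinkCollector(HTMLParser):
    """Collects the href value of every <a> element."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def find_wrong_language_links(html: str, expected_lang: str) -> List[str]:
    """Return hrefs whose lang query parameter differs from expected_lang."""
    collector = _LinkCollector()
    collector.feed(html)
    collector.close()
    wrong = []
    for href in collector.hrefs:
        langs = parse_qs(urlsplit(href).query).get("lang", [])
        # Links without a lang parameter are not language coded and are skipped.
        if langs and langs[-1] != expected_lang:
            wrong.append(href)
    return wrong
```

Running this over the English pages with `expected_lang="en"` would list the links that still point at German sources after an automatic retranslation.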

How does #:~:text= work for highlighting text?

When following a link from the Google search page, I often see the most relevant part of the page highlighted like so:
Looking at the URL, I see bits of the highlighted text appear after the #:~:text= anchor (hash, colon, tilde, colon, text - in case someone is trying to Google this question):
https://ru.wikipedia.org/wiki/%D0%A4%D0%B0%D0%BC%D0%B8%D0%BB%D1%8C%D0%BD%D0%B0%D1%8F_%D0%BF%D1%80%D0%B8%D1%81%D1%82%D0%B0%D0%B2%D0%BA%D0%B0#:~:text=%D0%BF%D0%B0%D0%BB%D0%BE%D0%BC%D0%BD%D0%B8%D1%87%D0%B5%D1%81%D1%82%D0%B2%D0%BE%20%D0%B2%20%D0%9C%D0%B5%D0%BA%D0%BA%D1%83-,%D0%90%D1%80%D0%BC%D0%B5%D0%BD%D0%B8%D1%8F,%D1%8F%D0%B2%D0%BB%D1%8F%D0%BB%D1%81%D1%8F%20%D1%81%D1%8B%D0%BD%D0%BE%D0%BC%20%D0%B8%D0%BB%D0%B8%20%D0%BF%D0%BE%D1%82%D0%BE%D0%BC%D0%BA%D0%BE%D0%BC%20%D1%81%D0%B2%D1%8F%D1%89%D0%B5%D0%BD%D0%BD%D0%B8%D0%BA%D0%B0
Is this notation (#:~:text=) Wikipedia-specific or is it a more standard way of highlighting text on a web page? Where can I read up on this? So far my google-fu has failed me.
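The notation is not Wikipedia-specific. It is the text fragment syntax from the WICG "URL Fragment Text Directives" draft (also known as Scroll to Text Fragment), with the general form #:~:text=[prefix-,]textStart[,textEnd][,-suffix]. As a rough illustration of how the URL above breaks down, here is a small sketch that splits such a fragment into its parts. The function name `parse_text_directives` is made up for this example, and the parsing is simplified compared with the draft.

```python
from typing import Dict, List, Optional
from urllib.parse import unquote, urlsplit


def parse_text_directives(url: str) -> List[Dict[str, Optional[str]]]:
    """Split each text= directive of a URL into prefix, start, end and suffix."""
    fragment = urlsplit(url).fragment
    if ":~:" not in fragment:
        return []
    directives = fragment.split(":~:", 1)[1]
    results = []
    for directive in directives.split("&"):
        if not directive.startswith("text="):
            continue
        # Split before percent-decoding: literal commas and dashes inside the
        # quoted text are percent-encoded, so raw ones are separators.
        parts = directive[len("text="):].split(",")
        prefix = suffix = None
        if parts and parts[0].endswith("-"):
            prefix = unquote(parts.pop(0)[:-1])
        if parts and parts[-1].startswith("-"):
            suffix = unquote(parts.pop()[1:])
        results.append({
            "prefix": prefix,
            "start": unquote(parts[0]) if parts else None,
            "end": unquote(parts[1]) if len(parts) > 1 else None,
            "suffix": suffix,
        })
    return results
```

Applied to the Wikipedia URL above, this yields a prefix ending in "Мекку", the start text "Армения", and an end text beginning with "являлся": the browser highlights from the start text through the end text, using the prefix only to disambiguate the match.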

Assign <wbr> to specific characters within CSS style

I am new to CSS and web development in general. Hopefully there is a way to accomplish what I am trying to do. What I am trying to do is simple to explain, but I need to give some background info first, sorry for the length of the post.
I have created a webpage that is in the Tibetan language. Tibetan does not have spaces between words; it only has a character called a "tsheg" (་ - U+0F0B) that is used to separate every syllable. It also has a mark called a "shey" (། - U+0F0D) that comes at the end of phrases, clauses, and sentences. Although it is sometimes doubled, a shey is generally followed by a space before the next line of text. When typing in Tibetan this space is represented not as a normal space (U+0020) but as U+00A0; however, when it comes to browsers and HTML/coding in general, these two seem to behave the same.
In any Tibetan writing, the ideal aesthetic is for full justification. Traditionally there would be slight spaces placed between the tsheg marks and the shey marks to achieve a perfectly flush left and right alignment. (The exception would be the last line of a text, or a paragraph in contemporary formatting, does not need to be justified). It is acceptable for lines to break mid-word or mid-sentence, but never mid syllable. So the last character on any line is going to be either a tsheg or a shey. It is also not acceptable to start a line with a shey. In the last few years this has been easy to achieve for desktop publishing using MS Word, using "Thai Justification." However that option is not available even in other Office products, never mind outside of the Office environment. Other work-arounds have been to add invisible width characters after every tsheg and shey, allowing for wrapping at any point.
Now comes the question and the difficulty. I am using distributed justification, and that seems to be the best option. It does not break syllables up, which is important. But it only wants to break at the spaces after shey marks. It breaks elsewhere when there is a long string of text without a space, but if there is a space then it breaks there, sometimes stretching one or two syllables across an entire line, which is obviously not ideal.
Now, when coding the HTML of the text I can use the same work-around that was used for desktop publishing before "Thai justification": I can add a <wbr> after every single tsheg, and this will not be visible to the end user and should allow cleaner breaking. However, there are two problems with this. By inserting that many <wbr> tags I am essentially doubling, or close to doubling, my character count, which can make the page take twice as long to load, even if half of those characters are invisible. More important, however, is that it disrupts search functionality. Although you may see the word that has the syllables "AB", for instance, if you tried searching for AB you wouldn't find it, because the HTML sees "A<wbr>B". And being able to search is kind of critical. Enough so that ugly formatting is preferable to losing the ability to search and to be indexed properly. Obviously, since I need the site to be responsive and I do not know what size screens will be used, I cannot have forced line breaks either, which is another trick used when publishing.
So, finally, my question. Is there a way I can define a style or function or some sort of element that automatically associates a certain character--in my case the tsheg character--as having a <wbr> command after it without actually needing to input that command into my HTML? So when the text is justified it treats every tsheg as a <wbr>? I have a class .Tibetan in my stylesheet that defines the font and the justification and so forth, is there some way I can add some code there that achieves what I am looking for?
The one other thing I tried was replacing all of the spaces with which gave a beautiful justified appearance but it also caused the browser to disregard the tsheg marks entirely and it allowed for the cutting in half of syllables.
If you want to see an example of what I am talking about you can visit this page of my site: http://publishing.simplebuddhistmonk.net/index.php/downloads/critical-editions/ and next to the word "English" click the Tibetan characters and that will bring up a paragraph of prose, or you can look here: http://publishing.simplebuddhistmonk.net/index.php/downloads/tibetan/essence-of-dispelling-errors-tib/ (though the formatting on that latter page is less egregious than the former, at least on my screen).
EDIT It looks like the solution this person used might be able to be adapted for my use: Dynamically add <wbr> tag before punctuation however I do not actually understand what I would need to add, and where, to make that work for me. Anyone think that might apply to this scenario? And if so, what code would I add where?
NEW EDIT So, I think the problem might be with the search function that comes from my WordPress theme. I used my workaround as mentioned above, adding the <wbr> tag after every tsheg, on this page: http://publishing.simplebuddhistmonk.net/index.php/downloads/tibetan/essence-of-dispelling-errors-tib/ and as you can see, it displays perfectly. But if you search for any phrase from that page using the search function that is up in my header, it will not find it. If you do a Ctrl+F and search on the page, though, it will find it. Even if you copy the text from the page and paste it into the search box it still does not find it. Copying the text into a word editor doesn't reveal any hidden or invisible characters. However, if you search for a term from this page http://publishing.simplebuddhistmonk.net/index.php/downloads/tibetan/beautiful-garland-ten-innermost-jewels-tib/ which I have not added the tags to, you will see that it finds it with no problem.
So, that leads me to believe the error is in the search function. Any experience with this? Search is important, but I can quite possibly find alternative search widgets to replace the one that comes with the theme. What is most important, though, is that if you search for a line of text on Google it needs to be found. My site has not been indexed fully by any search engine, so I cannot yet confirm whether this does or does not affect them.
So... at this point I will take any advice I can get: any advice regarding the original question (is there a way to tell the style guide "if you are displaying X then treat it like X<wbr>"?) or any idea about this issue with the search functionality, and how the <wbr> tag may or may not affect search, both from within the site and from search engines.
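One way to keep the stored text searchable while still giving the browser break opportunities is to add the <wbr> tags only when the page is rendered, not in the saved content. The sketch below illustrates that idea in Python as a generic preprocessing step. It is not a WordPress or CSS feature, and the function name `add_break_opportunities` is hypothetical. It assumes the input is plain text rather than existing HTML, and it skips a tsheg that is directly followed by a shey so that no line can start with a shey.

```python
import re
from html import escape

TSHEG = "\u0f0b"  # syllable separator
SHEY = "\u0f0d"   # phrase/clause terminator

# A tsheg followed by at least one more character that is not a shey.
_BREAKABLE_TSHEG = re.compile(f"{TSHEG}(?=[^{SHEY}])")


def add_break_opportunities(text: str) -> str:
    """Return HTML for display with <wbr> after each breakable tsheg.

    The original text is left untouched, so a search index can still be
    built from the unmodified string.
    """
    return _BREAKABLE_TSHEG.sub(TSHEG + "<wbr>", escape(text, quote=False))
```

The site search would then index the original string, and only the displayed markup would carry the extra tags. Whether the theme's search reads the stored content or the rendered output is something that would need checking.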

writing right-to-left sentences in ASP.NET labels and text boxes

How can I correctly display English and non-English (Persian/Farsi, Middle Eastern) words in ASP.NET labels or text boxes? It is OK when I type or display only English or only non-English (Farsi) words, but when I type or display a sentence which contains both of them, everything gets out of order: my sentences are misplaced and punctuation symbols are wrongly inserted. In other words, it is difficult to understand what is written.
When I use Office Word for writing Persian documents (which may contain English words), I first set the paragraph direction to Right-To-Left. Is it possible to do something similar in ASP.NET? I have set the following style in my ASPX files, and now my text boxes start writing from right to left, but it does nothing to solve the aforementioned problem:
Style="text-align: right"
How can I solve it? Thanks.
You need to use the correct value for the dir attribute - in this case, rtl:
dir="rtl"
This needs to be done in the containing element.
There are also CSS properties you can set, as discussed in this document (thanks @ANeves).
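For a page that is mostly Persian, setting dir="rtl" on the container as described above is the direct fix. When the base direction of a string is not known in advance, a first-strong-character heuristic (the same idea behind HTML's dir="auto") can pick it. The following is a small language-neutral sketch in Python, not ASP.NET code: the helper names `detect_direction` and `wrap_with_dir` are made up for illustration.

```python
import unicodedata
from html import escape


def detect_direction(text: str, default: str = "ltr") -> str:
    """Return "rtl" or "ltr" based on the first strongly directional character."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):  # Hebrew-type or Arabic-type letter
            return "rtl"
        if bidi == "L":  # left-to-right letter
            return "ltr"
    return default  # only digits, punctuation, or whitespace


def wrap_with_dir(text: str) -> str:
    """Wrap escaped text in a span that carries an explicit dir attribute."""
    return f'<span dir="{detect_direction(text)}">{escape(text)}</span>'
```

The explicit dir attribute lets the browser's bidirectional algorithm place punctuation and embedded English words correctly, which text-align alone does not do.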

How to get plain html from textarea

I want to get the data from a textbox as plain HTML, i.e. if I write Hello World then it should return
Hello World
I don't want to use HtmlEditor. Can I get plain HTML using a TextArea?
http://www.dotnetperls.com/encode-html-string
If you really need the &nbsp; you can always string-replace spaces.
You're probably going to need to transform the text manually (with string.Replace() or something similar) to accomplish this. Consider, for example, the "enter" and "space" tags you're looking for. If the user enters this as plain text (such as in a TextArea):
Hello World
Another line
Then that's precisely the value that's in the TextArea. The user didn't enter this:
Hello World<br />Another line
That's an entirely different string value. A WYSIWYG HTML editing control (and there are many for ASP.NET) would do some of the text transforms for you. At least it would probably convert the carriage returns into break tags for you.
But I doubt it would convert every space into a non-breaking space, since that's a very different value than what was entered. You'll likely have to do that yourself. (And be aware that converting all spaces into non-breaking spaces might not render the output like you expect. Look forward to a lot of horizontal scrolling.)
HTML isn't a translation of text into another medium, it's markup that's included in the text. Both of my examples above are perfectly valid HTML. One of them just doesn't include any markup tags.
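The manual transform described above can be sketched as follows. This is an illustration in Python rather than the C# string.Replace() the answer mentions, and `text_to_html` is a hypothetical helper name. It escapes the text first, so that any markup characters the user typed are shown literally, and only then adds the break tags and, optionally, the non-breaking spaces.

```python
from html import escape


def text_to_html(text: str, keep_spaces: bool = False) -> str:
    """Convert plain TextArea input into HTML with explicit line breaks."""
    # Escape first so the entities added below are not escaped again.
    html_text = escape(text, quote=False)
    if keep_spaces:
        html_text = html_text.replace(" ", "&nbsp;")
    # Normalise Windows and old Mac line endings before inserting break tags.
    html_text = html_text.replace("\r\n", "\n").replace("\r", "\n")
    return html_text.replace("\n", "<br />")
```

For the two-line input above, this returns the "Hello World<br />Another line" string from the second example. With keep_spaces=True every space becomes &nbsp;, which prevents wrapping as the answer warns.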

Resources