Using the Tridion TMS connector, adding comments to the text sent for translation

I am using the Tridion TMS / WorldServer Connector to send translations to SDL. This is working fine.
However, I also want to send a comment along with the text, such as 'This is a heading and please keep it to 30 characters' or 'Use slang and make this sound cool'. This is not for translation.
Can I add a comments field value (normal Schema field in Tridion) to the XML generated for TMS / World Server and mark it as 'not for translation'?

I don't have first-hand experience, but my understanding (based on a demo and Q&A session) is that you can do this by creating .anl files, which describe which XML fields are to be translated and which are not.
You define each element/attribute with an attribute that says whether it is translatable; you specify XLATE="NO" for fields that should not be translated.
I don't have the TMS binaries, but I believe the install includes some sample .anl files.
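Purely as an illustration, an entry in such a file might look something like this (a hypothetical sketch; I don't know the exact ANL syntax, so compare it against the shipped samples):

<!-- hypothetical ANL rule excluding the comments field from translation -->
<ELEMENT NAME="Comments" XLATE="NO"/>
<ELEMENT NAME="Heading" XLATE="YES"/>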
Hope this information helps.

As long as you do not use the aggregation functionality the Tridion Translation Manager will send the full component XML to TMS/World Server, including your non-translatable string.
If you use an older version of Tridion and/or TMS, there will be an ANL file on TMS specifying which texts to translate. On newer systems (that are not configured to run in backwards-compatibility mode) the component XML will include ITS markup.
But unfortunately neither Tridion nor TMS (I do not know about WorldServer) currently supports the Localization Note rules of ITS, so the translator will not see notes unless they actively go looking for them (which might seem like a reasonable thing to expect them to do... until you look at the number of words they are expected to turn around in a day).
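For what it's worth, ITS itself does define local markup for exactly this. A rough sketch using the W3C ITS attributes (the element names here are invented for illustration; its:translate and its:locNote come from the ITS specification):

<content xmlns:its="http://www.w3.org/2005/11/its">
  <heading its:locNote="This is a heading; please keep it to 30 characters">Summer Sale</heading>
  <notes its:translate="no">Use slang and make this sound cool</notes>
</content>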
It might be possible to insert a workflow step in TMS or WorldServer that uses a naming convention of your field to extract the comment and attach it to the translatable string. But I do not know whether this is even possible; that would be a question for the TMS or WorldServer experts.

Related

How to reuse reviewed documents in further translations?

Our instruction manual is in Markdown format. We improve the manual daily or weekly, and then we would like to translate it into other languages. Until now this process has been manual, using TRADOS (an assisted-translation tool).
We are looking into improving the process. What we want to do is translate the manual with Microsoft Translator and then have a human reviewer do the fine-tuning. The problem is how to reuse the corrected document in Microsoft Translator for the next translation.
Many times we change only two or three words in a topic, and the translator would then produce a completely new translation, making the reviewer's work very tedious if the translation is not accurate.
I know that we can train the model, but I think there is no guarantee that the translator will use the reviewed text. It also seems very time-consuming to maintain the dictionary after reviewing the document.
I was wondering if somebody has solved this kind of problem.
We can automate the translation of the document using Microsoft Flow. In Microsoft Flow we can use the SharePoint service to store the file that needs translating and then create a flow of operations:
Create a folder in SharePoint and specify the site address.
Specify the folder name and the language to translate into; the flow runs when a file arrives in the SharePoint folder.
Give the destination folder; again, this can be a SharePoint folder.
Note: there is no guarantee that the meaning after translation will be the same as in the original statement.
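If you would rather call the service directly than go through Flow, below is a minimal Java sketch against the Microsoft Translator Text REST API v3 (the subscription key and sample sentence are placeholders; depending on your Azure resource you may also need an Ocp-Apim-Subscription-Region header):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class TranslateManualSnippet {
    public static void main(String[] args) throws Exception {
        String key = "YOUR_SUBSCRIPTION_KEY"; // placeholder
        // one JSON object per text segment to translate
        String body = "[{\"Text\": \"Insert the battery as shown in figure 2.\"}]";
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://api.cognitive.microsofttranslator.com/translate?api-version=3.0&from=en&to=es"))
                .header("Ocp-Apim-Subscription-Key", key)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body()); // JSON array holding the translations
    }
}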
You are looking for a translation memory system (TMS). A TMS will store your human edits to documents and reapply them to future translations of documents where a paragraph is repeated. A TMS will also help your human translator find close matches of a new segment with a previously translated one, highlighting the change the human will have to perform.
Most TMSes integrate machine translation systems like Microsoft Translator and others as a suggestion to the human editor, who can then approve or edit the suggestion.
https://en.wikipedia.org/wiki/Translation_memory
When selecting a TMS here are some features to consider:
Integration with the content management system (CMS) your business uses for storing, maintaining and publishing documents
Collaboration: Multiple people working on a shared set of documents
Workflow management: Initiate a translation job, track the progress of it, and execute payments to the collaborating humans
Ease of use and translator acceptance of the human-facing CAT component of the TMS
Pretranslate with your favorite machine translation system from inside the TMS
Once you have collected a significant amount of domain-specific human translations, you can train a custom translation system directly from the content of your TMS, tuning future machine translations in the direction of the terminology and style that are exemplified by your human translations.
You said you already use Trados. So why not create a Translation Memory in Trados and save your corrected work to that Translation Memory? Then the next time you create your project in Trados, presumably also using the MS Translator plugin that is freely available from the RWS AppStore, you will pre-translate using your translation memory AND MS Translator. Anything you translated before that is in your translation memory will be used in preference to the machine-translation result.
If you have the Professional version of Trados you can also use Perfect Match. Then, when you receive your updated document with only a few words changed, you match it against the bilingual file from your previous translation. Everything that remains the same receives a Perfect Match status and is optionally locked, making it simple to identify what the translator needs to change.

Translate API - different result from the web service

When using the translation API, I get a different translation (and worse) than if I use translate.google.com.
I am working on a project for a client, and the client was dissatisfied with the translation and noticed the difference.
Do these two services use different engines? I read that the API now uses NMT mode, and that translate.google.com already uses the same engine.
Both set to translate from Norwegian to English.
Any more information that can clear this up?
Thanks!
The differences between results from translate.google.com and from Translation API calls are considered expected behavior; they can arise from maintenance tasks and from the logic used by the internal processes. However, the engines used for each service appear to be private information.
Based on this, it is normal to get some variance when using the API. You can use the model parameter as an available workaround in case you want to specify which of the available models to use, and take a look at the official "Specifying a model" documentation for details on this alternative.
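As a concrete sketch of that workaround, a v2 REST call with the model parameter could look like this in Java (the API key is a placeholder; per the documentation, model accepts nmt or base):

import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class SpecifyModelSnippet {
    public static void main(String[] args) throws Exception {
        String apiKey = "YOUR_API_KEY"; // placeholder
        String url = "https://translation.googleapis.com/language/translate/v2"
                + "?key=" + apiKey
                + "&q=" + URLEncoder.encode("God morgen", StandardCharsets.UTF_8)
                + "&source=no&target=en"
                + "&model=nmt"; // "base" selects the older phrase-based engine
        HttpResponse<String> response = HttpClient.newHttpClient().send(
                HttpRequest.newBuilder().uri(URI.create(url)).GET().build(),
                HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}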
Almost three years later, the problem still remains!
I was trying to translate a dataset with the Google Translate API, but it failed to translate some texts into the target language (in my case, Persian/Farsi). So I decided to check them for a pattern, and maybe translate them using the web version of Google Translate.
As I did so, I noticed that the web version actually could translate some of those untranslated texts, but not all. Looking for a reason for this behaviour, I found that most of them were names, not sentences. But as we know, names can simply be written in the target language's characters as the translation. So why does the API not transliterate those names while the web version does? This screenshot explains it, perhaps:
[screenshot: Google Translate web results, some carrying a "verified translation" badge]
As can be seen, some translations have a badge indicating that the translation has been verified, while others don't.
So to recap, my guess is that the API may be set to use only verified translations, whereas the web version shows even unverified ones, since you can edit or report them.

How to feed Word 2010 (.docx) documents/templates with data from MySQL database?

What would be the best approach to replace placeholders in a .docx document (Word 2010) with data coming from a MySQL database?
Can I just open the file using a server side language and do a string replace per each placeholder?
Is there any existing tool/library available?
Thanks
Disclosure: I work for Invantive.
Using Invantive Composition (http://www.invantive.com/products/invantive-composition) you can fill Word documents (letters, legal pleadings, insurance policies) with data from a database (IBM DB2, Oracle, MySQL, Teradata and SQL Server) and then fully change the contents manually at will. It is intended for real Microsoft Word end-users (both the people who make the template and the ones who use it) who access the databases through a central web service and models with queries. Invantive Composition allows nested repeating groups of data and layout, and integrates into Microsoft Word using ClickOnce.
In the past, I have also personally used JasperReports (http://community.jaspersoft.com/project/jasperreports-library) to generate letters via the RTF output target of JasperReports. It is free and works fine as long as you do not need to edit the output by more than a few words, though it requires Java/SQL development skills. Just like Invantive Composition, it works fine for large numbers of different reports.
As long as you can control the environment completely, you can also consider using RTF as an intermediate language (not for end-users, only real developers). Save the document as RTF, mark the parts of the text that need to be replaceable, and write a web service that accepts the parameters and dumps back the resulting RTF. It takes some time to generate more complex tables (tables are obviously something the human race invented after the RTF specification was written :-). This approach only works with a very limited number of templates and when you have sufficient developer time available to get it up and running and stabilized.
As an independent reviewer, I have also seen cases where XML templates were used, but the results were not as good as with JasperReports.
**Disclosure: I lead the docx4j project**
There are heaps of existing tools/libraries available!
Yes, you can just do a string replace, but that is a brittle approach, since Word may have split the string across runs.
You can use MERGEFIELDs, or content control data binding.
docx4j supports all three approaches, but content control data binding is the most powerful.
ContentControlsMergeXML
MERGEFIELDs
VariableReplace
One thing to consider especially is "repeats". If you want, say, a row of a table in Word for each matching row in your MySQL table, then you need a way to make that happen.
docx4j does this with a "repeat" content control around the table row; whichever solution you choose, I'd make sure up front that it can handle repeats.
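To make the simplest route concrete, here is a minimal VariableReplace sketch (file names and placeholder keys are assumptions; VariablePrepare tidies the runs so a ${placeholder} split across runs can still be found):

import java.io.File;
import java.util.HashMap;

import org.docx4j.model.datastorage.migration.VariablePrepare;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.openpackaging.parts.WordprocessingML.MainDocumentPart;

public class VariableReplaceSnippet {
    public static void main(String[] args) throws Exception {
        // template.docx contains ${firstName} etc., filled e.g. from a MySQL query
        WordprocessingMLPackage pkg = WordprocessingMLPackage.load(new File("template.docx"));
        MainDocumentPart documentPart = pkg.getMainDocumentPart();
        VariablePrepare.prepare(pkg); // join runs so placeholders are not split
        HashMap<String, String> mappings = new HashMap<>();
        mappings.put("firstName", "John");
        documentPart.variableReplace(mappings);
        pkg.save(new File("letter.docx"));
    }
}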
If you want to use PHP, the most complete available solution is PHPDocX.
You may check in the tutorial how to substitute placeholder variables with data coming from any data source (like a MySQL DB).
In particular, you may populate table rows with an indefinite number of entries, delete whole blocks of the Word document depending on the data fed to the application, or build dynamic Word charts.
You may check the available DEMO for a simple but quite illustrative example (its inner workings are explained in the tutorial section).
You can also use the Open XML SDK to replace your placeholders programmatically.
Disclosure: I lead the docxgenjs project
I think you shouldn't have to code everything yourself; that's why I created a Mustache-like templating engine for docx.
Demo:
http://javascript-ninja.fr/docxgenjs/examples/demo.html
Repo:
https://github.com/edi9999/docxgenjs
It is JS-based and works client and server side.
Yes, you can use a server-side language to do it.
Check out Apache POI.
http://poi.apache.org
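As a minimal sketch of the naive replace with POI's XWPF API (the placeholder name is an assumption, and this only inspects body paragraphs; note the run-splitting caveat mentioned earlier in this thread):

import java.io.FileInputStream;
import java.io.FileOutputStream;

import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;

public class PoiReplaceSnippet {
    public static void main(String[] args) throws Exception {
        try (XWPFDocument doc = new XWPFDocument(new FileInputStream("template.docx"))) {
            for (XWPFParagraph p : doc.getParagraphs()) {
                for (XWPFRun r : p.getRuns()) {
                    String text = r.getText(0);
                    // only works when Word kept ${name} inside a single run
                    if (text != null && text.contains("${name}")) {
                        r.setText(text.replace("${name}", "John"), 0);
                    }
                }
            }
            try (FileOutputStream out = new FileOutputStream("filled.docx")) {
                doc.write(out);
            }
        }
    }
}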
I read the above, especially the comments, and Invantive looks impressive, but the solution I needed was much simpler: use Selection.Range.InsertDatabase in Word to fetch records from an Access database, an Excel spreadsheet, or even just another Word document. With the Access solution you can choose the layout of the records to fetch and have it fetch just particular records based on a field (e.g. ID). Google the words above and they'll take you to the MS guidance and an example VB script. It worked well in just a few minutes. Now I'm looking for a VB script that asks the person which ID they want from the database, and we're done.
XDocReport uses docx templates that have merge fields bound to Java objects (the objects hold the data you load from MySQL or any other source). XDocReport is a Java project; its home page is https://code.google.com/p/xdocreport/.
**Disclosure: I created the templ4docx project**
You can use the templ4docx Java library, which is on the Maven Central repository, so you can just add it to your Maven dependencies:
<dependency>
    <groupId>pl.jsolve</groupId>
    <artifactId>templ4docx</artifactId>
    <version>2.0.0</version>
</dependency>
Example usage:
// load the template containing ${firstName} and ${lastName} placeholders
Docx docx = new Docx("E:\\template.docx");
Variables variables = new Variables();
variables.addTextVariable(new TextVariable("${firstName}", "John"));
variables.addTextVariable(new TextVariable("${lastName}", "Sky"));
// replace the placeholders and save the result as a new document
docx.fillTemplate(variables);
docx.save("E:\\filledTemplate.docx");
You can find more details here: http://jsolve.github.io/java/templ4docx/

English dictionary needed for a word game

I'm looking for a way to include a full-blown English dictionary in an iPhone app (a word game). The database must include all conjugation possibilities for verbs as well as singular and plural spellings, so my app can query the database to check whether a spelling is correct.
Is there a free or commercial database that includes such data?
You can use UITextChecker for spell-checking.
Regarding a dictionary: when I built an iOS dictionary library some time ago (www.lexicontext.com) I used WordNet. WordNet contains a lot of interesting semantic info ...
NSSpellChecker is your easiest option, but it might be more complete to use the official online Scrabble dictionary as well and check against both (only one match required).
You could do a web-service request using http://www.hasbro.com/scrabble/en_US/search.cfm
http://www.a2zwordfinder.com/cgi-bin/scrabble.cgi?Letters=&Pattern=______&MatchType=Exactly&MinLetters=3&SortBy=Alpha&SearchType=Scrabble
Change MinLetters in the URL to get different results.
The best place to find a database for a spell-checker is probably a free text-processing application, so I'd try OpenOffice's counterpart to Word (Writer). Download it, install it, and simply find the dictionary file.
OpenOffice is licensed under the LGPL, so it should be fine; just check whether the licence covers the data as well (i.e. the dictionary file).
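If you go that route: the dictionaries are in Hunspell format, where the .dic file starts with a word count and each following line is a word optionally followed by /affix flags. A minimal Java sketch (for illustration, assuming UTF-8) for pulling out the plain word forms; note that without applying the companion .aff affix rules you will miss many conjugated and plural forms:

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DicLoader {
    public static Set<String> load(Path dicFile) throws IOException {
        List<String> lines = Files.readAllLines(dicFile, StandardCharsets.UTF_8);
        Set<String> words = new HashSet<>();
        for (String line : lines.subList(1, lines.size())) { // skip the count line
            int slash = line.indexOf('/');                   // strip affix flags
            String word = (slash >= 0 ? line.substring(0, slash) : line).trim();
            if (!word.isEmpty()) {
                words.add(word.toLowerCase());
            }
        }
        return words;
    }
}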
Maybe this English corpus helps: http://www.wordfrequency.info/free.asp

Storing content in multiple languages? E.g. English, French, German

How should I store (and present) the text on a website intended for worldwide use, with several languages? The content is mostly in the form of 500+ word articles, although I will need to translate tiny snippets of text on each page too (such as "print this article" or "back to menu").
I know there are several CMS packages that handle multiple languages, but I have to integrate with our existing ASP systems too, so I am ignoring such solutions.
One concern I have is that Google should be able to find the pages, even for foreign users. I am less concerned about issues with processing dates and currencies.
I worry that, left to my own devices, I will invent a way of doing this which works, but eventually leads to disaster! I want to know what professional solutions you have actually used on real projects, not untried ideas! Thanks very much.
I looked at RESX files, but felt they were unsuitable for all but the most trivial translation solutions (I will elaborate if anyone wants to know).
Google will help me with translating the text, but not storing/presenting it.
Has anyone worked on a multi-language project that relied on their own code for presentation?
Any thoughts on serving up content in the following ways, and which is best?
http://www.website.com/text/view.asp?id=12345&lang=fr
http://www.website.com/text/12345/bonjour_mes_amis.htm
http://fr.website.com/text/12345
(these are not real URLs, I was just showing examples)
First, put all content for all languages under one domain; it will help your Google rank.
We have a fully multi-lingual system, with localisations stored in a database but cached with the web application.
Wherever we want a localisation to appear we use:
<%$ Resources: LanguageProvider, Path/To/Localisation %>
Then in our web.config:
<globalization resourceProviderFactoryType="FactoryClassName, AssemblyName"/>
FactoryClassName then implements ResourceProviderFactory to provide the actual dynamic functionality. Localisations are stored in the DB with a string key "Path/To/Localisation"
It is important to cache the localised values - you don't want to have lots of DB lookups on each page, and we cache thousands of localised strings with no performance issues.
Use the user's current browser localisation to choose what language to serve up.
You might want to check out the GNU Gettext project - at least as something to start with.
Edited to add info about projects:
I've worked on several multilingual projects that used Gettext with different technologies, including C++/MFC and J2EE/JSP, and it all worked fine. However, you need to write or find your own code to display the localized data, of course.
If you are using .Net, I would recommend going with one or more resource files (.resx). There is plenty of documentation on this on MSDN.
As with most general programming questions, it depends on your needs.
For static text, I would use RESX files. For me, as .Net programmer, they are easy to use and the .Net Framework has good support for them.
For any dynamic text, I tend to store such information in the database, especially if the site maintainer is going to be a non-developer. In the past I've used two approaches, adding a language column and creating different entries for the different languages or creating a separate table to store the language specific text.
The table for the first approach might look something like this:
Article Id | Language Id | Language Specific Article Text | Created By | Created Date
This works for situations where you can create different entries for a given article and you don't need to keep any data associated with these different entries in sync (such as an Updated timestamp).
The other approach is to have two separate tables, one for non-language specific text (id, created date, created user, updated date, etc) and another table containing the language specific text. So the tables might look something like this:
First Table: Article Id | Created By | Created Date | Updated By | Updated Date
Second Table: Article Id | Language Id | Language Specific Article Text
For me, the question comes down to updating the non-language dependent data. If you are updating that data then I would lean towards the second approach, otherwise I would go with the first approach as I view that as simpler (can't forget the KISS principle).
If you're just worried about the article content being translated, and do not need a fully integrated option, I have used Google's translation in the past and it works great on a smaller scale.
Wonderful question.
I solved this problem for the website I made (link in my profile) with a homemade Python 3 script that translates the general template on the fly and inserts a specific content page in the requested language (or the one Apache guesses from Accept-Language).
It was fun, since I got to learn Python and write my own mini-library for creating content pages. One downside was that our hosting didn't have Python 3, so I made my script generate static HTML (the original version examined the User-Agent) and upload it to the server. That has worked so far, and making a new language version of the site is now a breeze :)
The biggest downside of this method is that writing things from scratch is time-consuming. So if you want, drop me a line and I'll help you use my script :)
As for the URL format, I use site.com/content/example.fr, since this allows Apache to perform language negotiation when somebody asks for /content/example and their browser says it prefers French. When you do this, Apache also adds .html or whatever as a bonus.
So when a request is for example and I have files
example.fr
example.en
example.vi
Apache will automatically serve example.vi to a person with a Vietnamese-configured browser, and example.en to a person with a German-configured browser (there is no German version, so it falls back). Pretty useful.
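The Apache side of that setup is only a few directives; a minimal sketch, assuming mod_mime and mod_negotiation are enabled:

# map the file suffixes to languages and let MultiViews negotiate
Options +MultiViews
AddLanguage en .en
AddLanguage fr .fr
AddLanguage vi .vi
# what to serve when the browser's languages match nothing
LanguagePriority en fr vi
ForceLanguagePriority Fallback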
