Japanese to English translation - Julia

I am trying to build a power network model for Japan using OpenStreetMap data and some open-source projects. When I filtered the power data, I realized that most of the tag information (words such as power station, power lines, etc.) is written in Japanese characters.
Does anyone know of a Julia package I can use to translate these tags from Japanese to English?

For map data you mostly need an English spelling of the name rather than a translation, so my first try would be TextUnidecode:
using TextUnidecode
julia> unidecode("尾垂山")
"Wei Chui Shan"
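Please also note that OSM records for popular places often have English names (tagged with name:en), and this is probably what you want to use where it is available. It matters here because unidecode romanizes kanji by their Chinese readings, as in the output above, while the Japanese reading of this peak is different. See the example below: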
<node id="4165900342" lat="33.2750587" lon="134.1751027" version="2" ... >
<tag k="ele" v="242"/>
<tag k="name" v="尾垂山"/>
<tag k="name:en" v="Mt. Otaru"/>
<tag k="name:ja" v="尾垂山"/>
<tag k="name:ja-Hira" v="おたるやま"/>
<tag k="natural" v="peak"/>
<tag k="source" v="GSImaps/std"/>
</node>
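As a rough illustration of preferring name:en and falling back to the original name, here is a minimal Python sketch (Python rather than Julia, purely for illustration). The helper name preferred_name and the trimmed sample node are assumptions made for this example and are not part of any OSM library.

```python
import xml.etree.ElementTree as ET

# Sample node trimmed from the OSM record quoted above.
OSM_NODE = """
<node id="4165900342" lat="33.2750587" lon="134.1751027" version="2">
  <tag k="name" v="尾垂山"/>
  <tag k="name:en" v="Mt. Otaru"/>
  <tag k="name:ja-Hira" v="おたるやま"/>
  <tag k="natural" v="peak"/>
</node>
"""


def preferred_name(node, keys=("name:en", "name")):
    """Return the value of the first tag key found in `keys`, or None.

    Hypothetical helper: it checks the keys in order, so name:en wins
    when present and the untranslated name is used otherwise.
    """
    tags = {tag.get("k"): tag.get("v") for tag in node.iter("tag")}
    for key in keys:
        if key in tags:
            return tags[key]
    return None


node = ET.fromstring(OSM_NODE.strip())
print(preferred_name(node))  # prints: Mt. Otaru
```

Names that come back without a name:en tag are the ones left over for transliteration or a translation service.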
If those approaches do not meet your needs, you can use a Python library via PyCall.jl or call a service such as AWS Translate, which is directly supported via the AWS.jl library.

Related

Meaning of Monit status codes

I need a way to find out what every monit xml status code means. I have some xml output which is in this format:
https://gist.github.com/plasticbrain/54ceaf101168d20f9a90
Or in case the link doesn't work, this:
<services>
<service name="system">
<type>5</type>
<collected_sec>1414691061</collected_sec>
<collected_usec>254769</collected_usec>
<status>0</status>
<status_hint>0</status_hint>
<monitor>1</monitor>
<monitormode>0</monitormode>
<pendingaction>0</pendingaction>
</service>
Within this code block we can see:
<status> 0 </status>
I've googled a lot to find a complete list of all possible codes and their meanings but I've been unable to find anything so far. The monit documentation does not appear to mention it at all either.

Improve performance of query with range indexes in eXist-db

Reading the docs at http://exist-db.org/exist/apps/doc/indexing.xml,
I'm finding it difficult to understand whether and how I can improve the performance of a 'read' query (with 2 parameters: a string and an integer).
Does eXist-db have a default structural index? Can I improve a 2-parameter query with a 'range index'?
More details about my XML db (note there are 2 different dbs simply merged on the same root):
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<db>
<docs>
<doc>
<header>
<year>2001</year>
<number>1</number>
<type>O</type>
</header>
<metas>
<meta>
<number>26001</number>
<details>
<detail>
<description>legge</description>
<number>19</number>
<date>14/01/1994</date>
</detail>
<detail>
<description>decreto legge</description>
<number>453</number>
<date>15/11/1993</date>
</detail>
</details>
</meta>
</metas>
</doc>
<doc>
<header>
<year>2001</year>
<number>2</number>
<type>O</type>
</header>
<metas>
<meta>
<number>26002</number>
<details>
<detail>
<description>decreto legislativo</description>
<number>29</number>
<date>03/02/1993</date>
</detail>
</details>
</meta>
<meta>
<number>26016</number>
<details>
<detail>
<description>decreto legislativo</description>
<number>29</number>
<date>03/02/1993</date>
</detail>
</details>
</meta>
</metas>
</doc>
</docs>
<full_text_docs>
<doc>
<header>
<year>2001</year>
<number>1</number>
<type>O</type>
<president>ferrari</president>
</header>
<text>lorem ipsum ...
</text>
</doc>
<doc>
<header>
<year>2001</year>
<number>2</number>
<type>O</type>
<president>ferrari</president>
</header>
<text>lorem ipsum......
</text>
</doc>
</full_text_docs>
</db>
This is my XQuery:
xquery version "3.0";
let $doc := doc("/db//index_test/test_general.xml")//db/docs/doc
let $fulltxt := doc("/db//index_test/test_general.xml")//db/full_text_docs/doc
return <root> {
for $a in $doc[metas/meta/details/detail[date="03/02/1993" and number = "29"]]/header
return $fulltxt[header/year/text()=$a/year/text() and
header/number/text()=$a/number/text() and
header/type/text()=$a/type/text()
]
} </root>
Basically, I find the detail/number and detail/date values that match the input in the first db and use the results to query the second db. The results are all the matching <doc> elements under <full_text_docs>.
I would like to know if I can create indexes for the number and date fields to improve performance. Note that this is the ONLY query I need to optimize (the only one I run on this db); obviously the number and date values change :).
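To make the intended lookup concrete, here is a minimal Python sketch of the same two-step join, run against a shortened copy of the sample data. It only illustrates the logic; it is not XQuery and says nothing about how eXist-db evaluates the query. The names header_key and matching_full_text are made up for this example.

```python
import xml.etree.ElementTree as ET

# Shortened version of the sample database shown above.
SAMPLE = """
<db>
  <docs>
    <doc>
      <header><year>2001</year><number>1</number><type>O</type></header>
      <metas><meta><details>
        <detail><number>19</number><date>14/01/1994</date></detail>
      </details></meta></metas>
    </doc>
    <doc>
      <header><year>2001</year><number>2</number><type>O</type></header>
      <metas><meta><details>
        <detail><number>29</number><date>03/02/1993</date></detail>
      </details></meta></metas>
    </doc>
  </docs>
  <full_text_docs>
    <doc><header><year>2001</year><number>1</number><type>O</type></header><text>first</text></doc>
    <doc><header><year>2001</year><number>2</number><type>O</type></header><text>second</text></doc>
  </full_text_docs>
</db>
"""


def header_key(doc):
    """Return the (year, number, type) triple that identifies a <doc>."""
    header = doc.find("header")
    return tuple(header.findtext(name) for name in ("year", "number", "type"))


def matching_full_text(root, date, number):
    """Step 1: collect headers of docs with a matching detail.
    Step 2: return the full-text docs whose header matches one of them."""
    wanted = set()
    for doc in root.findall("docs/doc"):
        for detail in doc.iter("detail"):
            if detail.findtext("date") == date and detail.findtext("number") == str(number):
                wanted.add(header_key(doc))
    return [d for d in root.findall("full_text_docs/doc") if header_key(d) in wanted]


root = ET.fromstring(SAMPLE.strip())
for doc in matching_full_text(root, "03/02/1993", 29):
    print(doc.findtext("text"))  # prints: second
```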
SOLUTION:
For a clear explanation, read joewiz's answer. My problem was getting the .xconf file recognized correctly. It has to be placed in /db/yourcollectiondir. If you're using eXide, when you create the file you should select the XML type with the template "eXist-db collection configuration". When you save the file you will see the prompt "Apply configuration?"; click 'ok'. Only then run this XQuery: xmldb:reindex('/db/yourcollectiondir').
Now, if everything is set up correctly, when you run an XQuery that involves an index you will see its usage in "Monitoring and profiling".
As that documentation page states, eXist does create a structural index for all XML stored in the database. This is not an index of values, though, so without further indexes, queries based on value (rather than structure) would involve a lookup of values in the DOM. As your data grows larger, looking up values in the DOM gets slower and slower. This is where value-based indexes, such as a range index, save the day. (For a fuller explanation, see the "Indexing" section of Wolfgang Meier's "Tuning the Database" article, which is essential for getting the most performance out of eXist.)
So, yes, you can create indexes for the <number> and <date> fields. I'd recommend the "new range" index, as described on that documentation page. Your collection.xconf file setting up these indexes would look like this:
<collection xmlns="http://exist-db.org/collection-config/1.0"
xmlns:xs="http://www.w3.org/2001/XMLSchema">
<index>
<range>
<create qname="number" type="xs:integer"/>
<create qname="date" type="xs:string"/>
</range>
</index>
</collection>
You have to store this within the /db/system/config/ collection, in a subcollection corresponding to the location of your data in the database. So if your data is located in /db/apps/myapp/data, you would place this collection.xconf file in /db/system/config/db/apps/myapp/data.
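The path rule above is simple enough to sketch in a few lines. The following Python snippet is only an illustration of the mapping described in this answer; the function name config_collection_for is hypothetical and is not part of eXist-db or any client library.

```python
import posixpath

# Root under which eXist-db keeps collection configurations, as described above.
CONFIG_ROOT = "/db/system/config"


def config_collection_for(data_collection):
    """Map a data collection path to the collection that holds its collection.xconf.

    Hypothetical helper: it mirrors the data path underneath CONFIG_ROOT.
    """
    if data_collection != "/db" and not data_collection.startswith("/db/"):
        raise ValueError("expected an absolute database path under /db")
    # normpath drops a trailing slash so the result has no doubled separators.
    return CONFIG_ROOT + posixpath.normpath(data_collection)


print(config_collection_for("/db/apps/myapp/data"))
# prints: /db/system/config/db/apps/myapp/data
```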
Note that the configuration here would only affect the for clause's comparisons of date and number values, and not the predicates in the return clause that depend on the values of the <year> and <type> elements. So, to ensure your query makes maximal use of indexes, you should declare indexes on these as well; xs:integer seems appropriate for <year>, whereas <type> holds values such as O in your sample and would need xs:string.
Lastly, I would suggest eliminating the /text() steps, which are completely extraneous. For more on the use/abuse of text(), see Evan Lenz's article, "text() is a code smell".
Update (2016-07-17): With the updated code sample above, I have a couple of additional suggestions. First, since the code is in /db/index_test, we will store our files as follows:
Assuming you're using eXide, when you store the collection.xconf file in a collection, eXide will prompt you to have a copy of the file placed in the correct location in /db/system/config. If you're not using eXide, you need to store the collection.xconf file there yourself.
Using the unmodified query, I can confirm that despite the presence of the collection.xconf file, monex shows no indexes are being applied.
Let's make a few modifications to the query to ensure indexes are properly applied:
xquery version "3.0";
<root> {
for $a in doc("/db/index_test/test_general.xml")//detail[date = "03/02/1993" and number = 29]/ancestor::doc/header
return
doc("/db/index_test/test_general.xml")/db/full_text_docs/doc
[
header/year = $a/year and
header/number = $a/number and
header/type = $a/type
]
} </root>
With these modifications, monex shows that indexes are applied to the comparisons in the for clause:
The insights here are derived from the "Tuning the Database" article. To get full indexing for all comparisons, you will need to define additional indexes and may need to make similar modifications to your query.
One final note: the version of monex you see in these pictures is using a feature I added this weekend, called "Tare", which tries to filter out other operations from the query profiling results in order to help the user see just the effects of their own query. This feature is still just a pull request, so running the current release version, you won't see identical results.

Custom dictionary is not working in endeca

I am trying to add a custom dictionary for stemming but have had no luck.
Steps I tried:
1) I have added the following lines in /config/script/DataIngest.xml:
<dgidx id="Dgidx" host-id="ITLHost">
<args>
.....
<arg>--stemming-updates</arg>
<arg>C:/Endeca/Apps/CRS/config/script/stemmingExtension.en.xml</arg>
</args>
</dgidx>
And I added the following lines to stemmingExtension.en.xml:
<word_forms_collection_updates>
<WORD_FORMS>
<WORD_FORM>shuts</WORD_FORM>
<WORD_FORM>shirts</WORD_FORM>
</WORD_FORMS>
</word_forms_collection_updates>
I ran a baseline update and then searched for "shuts", expecting to get "shirts" results, but did not.
What's the correct way of setting up custom dictionary words in stemming?
Thanks in advance for your help.
Basavaraj
What version of the ETL salience component are you using? I remember a similar bug in the OEID 3.0 bundle, and unfortunately the answer is that the component used in CloverETL doesn't call the appropriate method from the Java API to get the stemmed word. You can build a mockup that calls the Java API directly to see the different methods used.
For Endeca version 3.1.2, try adding it to /MDEX/<version>/conf/stemming/en_word_forms_collection.xml (for English).
Example:
<WORD_FORMS_COLLECTION>
...
<WORD_FORMS>
<WORD_FORM>shuts</WORD_FORM>
<WORD_FORM>shirts</WORD_FORM>
</WORD_FORMS>
</WORD_FORMS_COLLECTION>

How to create/read a node bilingually

I need to create nodes with bilingual properties and use all of these nodes as list constraints, where one node property will be the label and another property will be the value.
Is this doable, and if so, how?
I see sys:localized, which is described as follows:
Localization:
If you add this aspect to a node, then the server will assume that all non-multilingual
properties apply to this locale.
Can this help me?
Thanks
Mohammed Amr
Senior System Developer
To handle multilingual documents from your code, you are supposed to use the MultilingualContentService to:
add a translation
retrieve the available translations
etc.
Multilingual documents have the cm:mlDocument aspect applied. This enables them to be listed as children of the special cm:mlContainer that's created under /cm:multilingualRoot to track translations of a single document. The cm:mlContainer is defined as follows:
<type name="cm:mlContainer">
<title>Multilingual Container</title>
<parent>sys:container</parent>
<associations>
<child-association name="cm:mlChild">
<source>
<mandatory>false</mandatory>
<many>false</many>
</source>
<target>
<class>cm:mlDocument</class>
<mandatory>true</mandatory>
<many>true</many>
</target>
</child-association>
</associations>
<mandatory-aspects>
<aspect>cm:versionable</aspect>
<aspect>cm:author</aspect>
<aspect>sys:localized</aspect>
</mandatory-aspects>
</type>
There are different kinds of localization options in Alfresco:
The MultilingualContentService (with the cm:mlDocument aspect) allows you to store translated content on a single node. You can use this if you have a document that is translated into multiple languages. There is no support for this in Share but you can use it via the Alfresco Explorer or the API.
There are also multilingual text properties. You can use the datatype d:mlText in your content model to store property values (strings only) based on the user language. The built-in properties cm:title and cm:description are of type d:mlText. The usage in Share is a bit tricky though: Alfresco uses the browser language to automatically choose the locale, so users with different browser languages will see different values.
As far as I understand your question, I think what you need are the ml:properties. The Share UI only supports them indirectly; maybe that's OK for you. As for the list constraints, I have not seen any multi-language support there, so you will probably have to extend those yourself.

Dynamic connectors in Visio .vdx files

Currently I am trying to understand .vdx files, because in the future I want to generate my own. I'm having problems with dynamic connectors. When defining them as follows:
<Shape ID="46" Type="Shape" Master="10">
<Geom IX="0">
<MoveTo IX='1'></MoveTo><LineTo IX='23'></LineTo></Geom>
</Shape>
....
<Connect FromSheet="45" FromCell="BeginX" FromPart="9" ToSheet="1" ToCell="PinX" ToPart="3" />
<Connect FromSheet="45" FromCell="EndX" FromPart="12" ToSheet="23" ToCell="PinX" ToPart="3" />
they are not displayed. After moving a node, the connectors are displayed. What am I missing?
When taking the minimal settings from a Visio generated .vdx file, there are lots of coordinates, which I want to avoid:
<Shape ID="47" Type="Shape" Master="10">
<XForm>
<PinX F="Inh">1.669258233656828</PinX>
<PinY F="Inh">7.519214852067909</PinY>
</XForm>
<XForm1D>
<BeginX F="_WALKGLUE(BegTrigger,EndTrigger,WalkPreference)">1.737275462308963</BeginX>
<BeginY F="_WALKGLUE(BegTrigger,EndTrigger,WalkPreference)">7.671541057367827</BeginY>
<EndX F="_WALKGLUE(EndTrigger,BegTrigger,WalkPreference)">1.601241005004693</EndX>
<EndY F="_WALKGLUE(EndTrigger,BegTrigger,WalkPreference)">7.366888646767992</EndY>
</XForm1D>
<Geom IX="0">
<LineTo IX="2"><X>-0.1664424255025283</X><Y>-0.3046524105998358</Y></LineTo>
</Geom>
</Shape>
What is the best and easiest way to work with dynamic connectors in .vdx files?
EDIT: With Visio 2010 it is much better, and the connectors are shown most of the time. So it really looks like a Visio bug...
According to the documentation for the Connect element:
In untrusted XML files, when Visio opens the file, it uses the Connect elements to set glue formulas for shapes, similar to the GlueTo method in Automation. However, geometry will not be updated, so connectors may need to be manually rerouted.
With Visio 2010, the connectors are displayed, so it looks like it was a bug in an earlier version of Visio.
