Word boundaries standard - standards

Some time ago I found an ISO standard (I think) that described the boundaries to use to identify words in a text, depending on the language.
Did I make this up in my dreams, or can you help me find it? I've tried Google but didn't find anything.
Thanks,
BJ

Found what I was looking for. Sorry, it's not an ISO standard but a Unicode Standard Annex (UAX #29, Unicode Text Segmentation):
http://www.unicode.org/reports/tr29/tr29-25.html

What does the !! operator / notation stand for in R?

I was reading a question on SO and came across the !! operator. I've been working with R for some time and have never seen it. First, I searched for questions about it and couldn't find one on SO (so this may be a duplicate). I also went back through various posts and pages on R operators, and none of them mentions it.
In the question, the !! preceded an R object, like:
> !!object
Thanks for the help.
P.S.: If it's a duplicate, please close.

White squares when using plotmath expressions

Whenever I try to use symbols in a plotmath expression in R, I get white squares. For example, when I run demo(plotmath), I get the following.
Does anyone have an idea where the problem may lie? I am using R 3.4.1 in RStudio on Mac OS X 10.11.6.
Update:
As mentioned in the comments, it seems to be an issue with my fonts.
When I look at Symbol, I have two "Symbol Regular"s, and the second one appears as question marks when viewing both together. However, when I click on the second one individually, the fonts appear normally. I tried to validate fonts and remove duplicates, but Font Book did not detect any problems. What should I do?
If you go to Fontbook.app and examine the Symbol font, is it perhaps duplicated or can you see any other evidence of corruption? – 42- 25 mins ago
@42- Thanks very much, it does seem to be an issue with the Symbol font. I listed what I see above; do you know what I should do with the font to fix it? – angryavian 9 mins ago
Delete it. It will get replaced from some magic Apple storeroom buried deep in the bowels of the System.
I don't know how this happens, but it used to happen to me fairly often. Doesn't seem to be happening lately. I remain puzzled. I'm guessing there may be answers at Ask Different (but I didn't find an answer.) Whatever the mechanism it's been around for a long, long time:
http://hints.macworld.com/article.php?story=20031025010930633

Amount value regular expression

I am trying to create a regular expression for a dollar amount that accepts values between 5.00 and 1000.00.
Here is what I have so far:
^([5-9](\d){0,4}([.](\d){1,2})?|1000([.](0){1,2})?)?$
I have already tried the range validator and it isn't working for this field.
Any help is much appreciated.
This is what I came up with, which could likely be improved. It seems to work for my limited testing. You may want to tag your question with "regex" to get some expert advice!
^(?:[5-9](?:\.\d{0,2})?|\d{2,3}(?:\.\d{0,2})?|1000(?:\.0{0,2})?)$
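A quick way to check a pattern like this is to run it against the edge cases. The sketch below uses Python's `re` module purely for illustration (the original question may target a different regex engine, though the syntax used here is common to most). It shows that the pattern above still accepts inputs such as "04" and "5.", and tries a slightly tightened variant. The names `ORIGINAL`, `TIGHTENED` and `is_valid_amount` are made up for this example, and the tightened pattern assumes that leading zeros and a trailing dot should be rejected.

```python
import re

# The pattern from the answer above, unchanged.
ORIGINAL = re.compile(
    r"^(?:[5-9](?:\.\d{0,2})?|\d{2,3}(?:\.\d{0,2})?|1000(?:\.0{0,2})?)$"
)

# Tightened variant (an assumption about the intended rules):
#   - 5 to 9, or 10 to 999 with no leading zero
#   - optional decimal part with one or two digits
#   - exactly 1000, optionally followed by .0 or .00
TIGHTENED = re.compile(
    r"(?:[5-9]|[1-9]\d{1,2})(?:\.\d{1,2})?|1000(?:\.0{1,2})?"
)


def is_valid_amount(text):
    """Return True if text is a dollar amount between 5.00 and 1000.00."""
    return TIGHTENED.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ["5", "5.00", "999.99", "1000.00", "4.99", "04", "5.", "1000.01"]:
        print(sample, bool(ORIGINAL.match(sample)), is_valid_amount(sample))
```

If the amount is going to be used as a number anyway, parsing it and comparing it against 5 and 1000 is usually easier to maintain than encoding the range in a regex.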

Genbank query (package seqinr): searching in sequence description

I am using the function query() of package seqinr to download myoglobin DNA sequences from Genbank. E.g.:
query("myoglobins","K=myoglobin AND SP=Turdus merula")
Unfortunately, for a lot of the species I'm looking for I don't get any sequence at all (or for this species, only a very short one), even though I find sequences when I search manually on the website. This is because of searching for "myoglobin" in the keywords only, while often there isn't any entry in there. Often the protein type is only specified in the name ("definition" on Genbank) -- but I have no idea how to search for this.
The help page on query() doesn't seem to offer any option for this in the details, a "generic search" without any "K=" doesn't work, and I haven't found anything via googling.
I'd be happy about any links, explanations and help. Thank you! :)
There is a complete manual for the seqinr package which describes the query language in more depth in chapter 5 (available at http://seqinr.r-forge.r-project.org/seqinr_2_0-1.pdf). I was trying to do a similar query, and the description for many of the genes/CDS is blank, so they don't come up when searching with the K= option. One alternative would be to search for the organism alone, then match gene names in the individual annotations and pull out the accession numbers, which you could then use to re-query the database for your sequences.
This would pull out the annotation for the first gene:
choosebank("emblTP")
query("ACexample", "sp=Turdus merula")
getName(ACexample$req[[1]])
annotations <- getAnnot(ACexample$req[[1]])
cat(annotations, sep = "\n")
I think this would be a pretty time-consuming way to tackle the problem, but there doesn't seem to be an efficient way of searching the annotations directly. I'd be interested in any solutions you might come up with.

Computer readable list of human names / phone directory?

I'm trying to compile a decent .zwl file for Squiggly spell checking in Flex, using British words rather than the American ones supplied by default.
I've managed to create a decent British word list and ran it through the AdobeSpellingGen app to get a .zwl; great stuff.
However, I need to add a list of names to this word list so they won't be flagged.
Does anyone know of a good source, free or paid, for a list of English forenames and surnames? I'm trying BT as I type :)
Thanks, any help with this would be greatly appreciated.
There are lots of baby names sites out there. This might be a good one http://www.listofbabynames.org/a_boys.htm as it would be fairly easy to copy.
I'll keep looking
You can screen scrape http://www.britishsurnames.co.uk/browse for a list of surnames. I'm not sure where you'd find first names though.
GNU Aspell has spell checking for common names. You can try it out here:
http://chxo.com/scripts/spellcheck.php?showsource=1
The source is here: http://aspell.net/
I'm not too familiar with it though, so I couldn't tell you how to extract the dictionaries.
The US Census site has a list of >150k first names and surnames from the 1990 and 2000 censuses, at
http://www.census.gov/genealogy/www/
Of course, these aren't UK names, but might do if you can't find anything better.
