What are ppearson and spearson in datamash, and how do you use them? - gnu-datamash

When I look at the documentation, there is no "correlation", but there is "ppearson" and "spearson". They are mentioned exactly once, as a "group-by statistical operation." But .. how exactly are they defined?
Also, when I try to use one, there is an error message, but I don't understand how to fix it. How do you use ppearson or spearson?
$ cat > foo.tsv
1^I2
2^I3
$ cat foo.tsv | datamash ppearson 1,2
datamash: operation ‘ppearson’ requires field pairs
EDIT: This documentation section says
GNU Datamash is designed to closely follow R project’s (https://www.r-project.org/) statistical functions. See the files/operators.R file for the R equivalent code for each of datamash’s operators. When building datamash from source code on your local computer, operators are compared to known results of the equivalent R functions.
Looking in R, I don't see an spearson:
> ?spearson
No documentation for ‘spearson’ in specified packages and libraries:
you could try ‘??spearson’

Related

How to combine two Vim commands into one (command not keybinding)

I've found few Stack Overflow questions talking about this, but they are all regarding only the :nmap or :noremap commands.
I want a command, not just a keybinding. Is there any way to accomplish this?
Use-case:
When I run :make, I doesn't saves automatically. So I'd like to combine :make and :w. I'd like to create a command :Compile/:C or :Wmake to achieve this.
The general information about concatenating Ex command via | can be found at :help cmdline-lines.
You can apply this for interactive commands, in mappings, and in custom commands as well.
Note that you only need to use the special <bar> in mappings (to avoid to prematurely conclude the mapping definition and execute the remainder immediately, a frequent beginner's mistake: :nnoremap <F1> :write | echo "This causes an error during Vim startup!"<CR>). For custom commands, you can just write |, but keep in mind which commands see this as their argument themselves.
:help line-continuation will help with overly long command definitions. Moving multiple commands into a separate :help :function can help, too (but note that this subtly changes the error handling).
arguments
If you want to pass custom command-line arguments, you can add -nargs=* to your :command definition and then specify the insertion point on the right-hand side via <args>. For example, to allow commands to your :write command, you could use
:command -nargs=* C w <args> | silent make | redraw!
You can combine commands with |, see help for :bar:
command! C update | silent make | redraw!
However, there is a cleaner way to achieve what you want.
Just enable the 'autowrite' option to automatically write
modified files before a :make:
'autowrite' 'aw' 'noautowrite' 'noaw'
'autowrite' 'aw' boolean (default off)
global
Write the contents of the file, if it has been modified, on each
:next, :rewind, :last, :first, :previous, :stop, :suspend, :tag, :!,
:make, CTRL-] and CTRL-^ command; and when a :buffer, CTRL-O, CTRL-I,
'{A-Z0-9}, or `{A-Z0-9} command takes one to another file.
Note that for some commands the 'autowrite' option is not used, see
'autowriteall' for that.
This option is mentioned in the help for :make.
I have found a solution after a bit of trial and error.
Solution for my usecase
command C w <bar> silent make <bar> redraw!
This is for compiling using make and it prints output only if there is nonzero output.
General solution
command COMMAND_NAME COMMAND_TO_RUN
Where COMMAND_TO_RUN can be constructed using more than one command using the following construct.
COMMAND_1_THAN_2 = COMMAND_1 <bar> COMMAND_2
You can use this multiple times and It is very similar to pipes in shell.

Is there a list of BCP 47 language codes in R?

I'm running the fantastic pandoc from within an R package, relying on the LaTeX babel package for some typesetting niceties.
Pandoc expects a lang argument as a BCP 47 code (e.g. en-US), but babel expects its own language codes (e.g. american).
Pandoc, being as awesome as it is, maps between the two in this haskell script.
In the spirit of defensive programming, I'd like to warn my users when they're using a wrong language code, and give them a definitive list of such acceptable BCP 47 codes.
Does such a list (or vector, or whatever) exist somewhere in R or a package for convenient use?
I'm trying to avoid manually typing up the pandoc haskell script.
I needed to provide a convenient selectize input, so I had to have the available options ready in R and ended up hand-copying them (yikes).
In case anyone finds this useful, here are:
- language codes in short (lang_short),
- variant or locale (var_short),
- a longer version of the language (helpful for input) lang_long (possibly non-standard!),
- a longer version of the variant or locale (helpful for input) var_long (probably non-standard!),
- logical values for polyglossia and babel, indicating whether pandoc maps to one or both of these (might come in handy if you need to rely on only one of these LaTeX packages.
Remember that pandoc expects languages of the form en_US etc., so you need to paste column 1 and 2.
Remember that these are not all languages and variants under the BCP 47 standard; it's just the (small) subset mapped by pandoc.
(If anyone comes across a more definitive list of language codes in R, that would be great).
lang_short;var_short;lang_long;var_long;polyglossia;babel
ar;DZ;arabic;Algeria;TRUE;FALSE
ar;IQ;arabic;Iraq;TRUE;FALSE
ar;JO;arabic;Jordan;TRUE;FALSE
ar;LB;arabic;Lebanon;TRUE;FALSE
ar;LY;arabic;Libya;TRUE;FALSE
ar;MA;arabic;Morocco;TRUE;FALSE
ar;MR;arabic;Mauritania;TRUE;FALSE
ar;PS;arabic;Palestinian Territory;TRUE;FALSE
ar;SY;arabic;Syria;TRUE;FALSE
ar;TN;arabic;Tunisia;TRUE;FALSE
de;DE;german;;TRUE;TRUE
de;AT;german;Austria;TRUE;TRUE
de;CH;german;Switzerland;TRUE;TRUE
dsb;;lower sorbian;;TRUE;FALSE
hsb;;upper sorbian;;FALSE;TRUE
el;polyton;greek;polytonic;TRUE;TRUE
en;AU;english;Australia;TRUE;TRUE
en;CA;english;Canada;TRUE;TRUE
en;GB;english;Great Britain;TRUE;TRUE
en;NZ;english;New Zealand;TRUE;TRUE
en;UK;english;United Kingdom;TRUE;TRUE
en;US;english;United States;TRUE;TRUE
grc;ancient;greek;ancient;TRUE;TRUE
la;;latin;;TRUE;TRUE
sl;;slovenian;;TRUE;TRUE
fr;CA;french;Canada;FALSE;TRUE
pt;BR;portoguese;Brazil;TRUE;TRUE
sr;;serbian;;TRUE;TRUE
af;;afrikaans;;TRUE;TRUE
am;;amharic;;TRUE;TRUE
ar;;arabic;;TRUE;TRUE
as;;assamese;;TRUE;TRUE
ast;;asturian;;TRUE;TRUE
bg;;bulgarian;;TRUE;TRUE
bn;;bengali;;TRUE;TRUE
bo;;tibetan;;TRUE;TRUE
br;;breton;;TRUE;TRUE
ca;;catalan;;TRUE;TRUE
cy;;welsh;;TRUE;TRUE
cs;;czech;;TRUE;TRUE
cop;;coptic;;TRUE;TRUE
da;;danish;;TRUE;TRUE
dv;;divehi;;TRUE;TRUE
el;;greek;;TRUE;TRUE
en;;english;;TRUE;TRUE
eo;;esperanto;;TRUE;TRUE
es;;spanish;;TRUE;TRUE
et;;estonian;;TRUE;TRUE
eu;;basque;;TRUE;TRUE
fa;;farsi;;TRUE;TRUE
fr;;french;;TRUE;TRUE
fur;;friulan;;TRUE;TRUE
ga;;irish;;TRUE;TRUE
gd;;scottish;;TRUE;TRUE
gez;;ethiopic;;TRUE;TRUE
gl;;galician;;TRUE;TRUE
he;;hebrew;;TRUE;TRUE
hi;;hindi;;TRUE;TRUE
hr;;croatian;;TRUE;TRUE
hu;;magyar;;TRUE;TRUE
hy;;armenian;;TRUE;TRUE
ia;;interlingua;;TRUE;TRUE
id;;indonesian;;TRUE;TRUE
is;;icelandic;;TRUE;TRUE
it;;italian;;TRUE;TRUE
km;;khmer;;TRUE;TRUE
kmr;;kurmanji;;TRUE;TRUE
kn;;kannada;;TRUE;TRUE
ko;;korean;;TRUE;TRUE
lo;;lao;;TRUE;TRUE
lt;;lithuanian;;TRUE;TRUE
lv;;latvian;;TRUE;TRUE
ml;;malayalam;;TRUE;TRUE
mn;;mongolian;;TRUE;TRUE
mr;;marathi;;TRUE;TRUE
nb;;norsk;;TRUE;TRUE
nl;;dutch;;TRUE;TRUE
nn;;nynorsk;;TRUE;TRUE
no;;norsk;;TRUE;TRUE
nqo;;nko;;TRUE;TRUE
oc;;occitan;;TRUE;TRUE
pa;;panjabi;;TRUE;TRUE
pms;;piedmontese;;TRUE;TRUE
pt;;portoguese;;TRUE;TRUE
rm;;romanian;;TRUE;TRUE
ro;;russian;;TRUE;TRUE
sa;;sanskrit;;TRUE;TRUE
se;;samin;;TRUE;TRUE
sk;;slovak;;TRUE;TRUE
sq;;albanian;;TRUE;TRUE
sr;;serbian;;TRUE;TRUE
syr;;syriac;;TRUE;TRUE
ta;;tamil;;TRUE;TRUE
te;;telugu;;TRUE;TRUE
th;;thai;;TRUE;TRUE
ti;;ethiopic;;TRUE;TRUE
tk;;turkmen;;TRUE;TRUE
tr;;turkish;;TRUE;TRUE
uk;;ukrainian;;TRUE;TRUE
ur;;urdu;;TRUE;TRUE
vi;;vietnamese;;TRUE;TRUE
Since the R script doesn't have access to the Haskell code (which is run in its own process), that won't be possible. However, pandoc >2.0 emits warnings to STDERR, in this case:
$ echo "foo" | pandoc -M lang=asdf -t latex
[WARNING] Invalid 'lang' value 'asdf'.
Use an IETF language tag like 'en-US'.
There should be a way to catch this from R.

differences between .sage and .spyx in numerical evaluation

the question seems very basic, i'm sorry but i could not find an answer in the documentation.
the content of both files test.sage and test.spyx is identical; it's just
a = 1/sqrt(2)
print a
if i run test.sage with
$ sage test.sage
i get
1/2*sqrt(2)
but the outcome is different from if i run the file test.spyx with
$ sage test.spyx
where i get
Compiling test.spyx...
0.707106781187
how can i prevent sage from numerically evaluating 1/\sqrt(2) in .spyx mode?
(i asked the question on ask.sagemath.org as well but got no answer yet...)

Visualization library for disjoint intervals

I want to visualize memory mapping states of processes. For this I parsed the output of
# strace -s 256 -v -k -f -e trace=memory,process command
and now I have a time series of disjoint sums of intervals on the real line. Is there a convenient visualization library for such data? Haskell interface would be the most time-saving for me, but any suggestion is welcome. Thanks!
Just in case this might be useful for anyone, I hacked up a little tool to do this. (By the way I ended up using R/Shiny for interactive visualization.)
Here's the github repo.
It's interactive in that if you click a region, the stack traces responsible for the memory mapping
will be shown like this:
trace:
22695 mmap(NULL, 251658240, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x2b4210000000
/lib/x86_64-linux-gnu/libc-2.19.so(mmap64+0xa) [0xf487a]
/usr/lib/jvm/java-8-oracle/jre/lib/amd64/server/libjvm.so(_ZN2os17pd_reserve_memoryEmPcm+0x31) [0x91e9c1]
/usr/lib/jvm/java-8-oracle/jre/lib/amd64/server/libjvm.so(_ZN2os14reserve_memoryEmPcm+0x20) [0x91ced0]
/usr/lib/jvm/java-8-oracle/jre/lib/amd64/server/libjvm.so(_ZN13ReservedSpace10initializeEmmbPcmb+0x256) [0xac20a6]
/usr/lib/jvm/java-8-oracle/jre/lib/amd64/server/libjvm.so(_ZN17ReservedCodeSpaceC1Emmb+0x2c) [0xac270c]
/usr/lib/jvm/java-8-oracle/jre/lib/amd64/server/libjvm.so(_ZN8CodeHeap7reserveEmmm+0xa5) [0x61a3c5]
/usr/lib/jvm/java-8-oracle/jre/lib/amd64/server/libjvm.so(_ZN9CodeCache10initializeEv+0x80) [0x47ff50]
/usr/lib/jvm/java-8-oracle/jre/lib/amd64/server/libjvm.so(_Z12init_globalsv+0x45) [0x63c905]
/usr/lib/jvm/java-8-oracle/jre/lib/amd64/server/libjvm.so(_ZN7Threads9create_vmEP14JavaVMInitArgsPb+0x23e) [0xa719be]
/usr/lib/jvm/java-8-oracle/jre/lib/amd64/server/libjvm.so(JNI_CreateJavaVM+0x74) [0x6d11c4]
/usr/lib/jvm/java-8-oracle/lib/amd64/jli/libjli.so(JavaMain+0x9e) [0x745e]
/lib/x86_64-linux-gnu/libpthread-2.19.so(start_thread+0xc4) [0x8184]
/lib/x86_64-linux-gnu/libc-2.19.so(clone+0x6d) [0xfa37d]
The same colors correspond to the same flags for mmap/msync/madvise etc.
Synopsis
$ make show-prerequisites
# (Follow the instructions)
$ make COMMAND="time ls"
...
DATA_DIR=build/data-2016-12-12_02h38m13s
Listening on http://127.0.0.1:5000
....
$ firefox http://127.0.0.1:5000
$ # Re-browse the previous results
$ make DATA_DIR=build/data-2016-12-12_02h38m13s
In the process of development I realized the striking geometricity of the problem.
So I created a module called Sheaf and described there a recipe for defining a Grothendieck
topology and a constant sheaf on it. It now seems the Grothendieck (or Lawvere-Tierney) topologies
are actually ubiquitous for programming.. but I'm not sure if it will prove anything worthy.
So feel free to check it!

R grep with 'AND' logic

I'm working with RJDBC on a server whose maintainers frequently update jar versions. Since RJDBC requires classpaths, this poses a problem when paths break. My situation is fortuitous in that the most current jars will always be in the same directory, but the version numbers will have changed.
I'm trying to use a simple grep function in R to isolate which jar I need based on a regex with some AND logic, however R makes this surprisingly difficult...
This question demonstrates how grep in R can function with the | operator for OR logic, but I can't seem to find similar AND logic operator.
Here's an example:
## Let's say I have three jars in a directory
jars <- list.files('/the/dir')
> jars
[1] "hive-jdbc-1.1.0-cdh5.4.3-standalone.jar" "hive-jdbc-1.1.0-cdh5.4.3.jar" "jython-standalone-2.5.3.jar"
The jar I want is "hive-jdbc-1.1.0-cdh5.4.3-standalone.jar"—how can I use AND logic in grep to extract it?
## I know that OR logic is supported:
j <- jars[grep('hive-jdbc|standalone', jars)]
> j
[1] "hive-jdbc-1.1.0-cdh5.4.3-standalone.jar" "hive-jdbc-1.1.0-cdh5.4.3.jar" "jython-standalone-2.5.3.jar"
## Would AND logic look like the same format?
> jars[grep('hive-jdbc&standalone', jars)]
character(0)
Not all-that-surprisingly, that last piece doesn't work... I found a useful, yet non-comprehensive, link for grep in R, but it doesn't show an AND operator. Any thoughts?
You could try
grep('hive-jdbc.*standalone', jars) # 'hive-jdbc' followed by 'standalone'
or
grepl('hive-jdbc', jars) & grepl('standalone', jars) # 'hive-jdbc' AND 'standalone'

Resources