Vectorization of different dimensions - Julia

I want to get all permutations of 1:3 as strings.
julia> string.(permutations(1:3)...) # 1:3 is for the example, the real problem is larger
3-element Vector{String}:
"112233"
"231312"
"323121"
However, the result is "transposed" relative to what I want:
6-element Vector{String}:
"123"
"132"
"213"
"231"
"312"
"321"
This vector will be the input of some other (vectorized) function call, F.(perms), and I want to do this efficiently.
How should I do this?

Just do:
julia> join.(permutations(1:3),"")
6-element Vector{String}:
"123"
"132"
"213"
"231"
"312"
"321"


Add a character to each vector in a list of vectors in R

I have a list of character vectors:
vector.list <- list("cytidine", "uridine", "dihydrouridine", "2'-O-methylcytidine")
I want to add the character "30" to each vector in this list, resulting in:
vector.list
[[1]]
[1] "30" "cytidine"
[[2]]
[1] "30" "uridine"
[[3]]
[1] "30" "dihydrouridine"
[[4]]
[1] "30" "2'-O-methylcytidine
I know how to do this with a "for loop" or by writing a function and using lapply. For example:
for (i in seq_along(vector.list)) {
  vector.list[[i]] <- c("30", vector.list[[i]])
}
However, I need to do this for many combinations of vector lists and characters and I want this code to be accessible to other people. I was wondering if there was a more elegant command for adding a character to each vector in a list of vectors in R?
lapply(vector.list, append, "30", after=0)
lapply(vector.list, function(x) c("30", x))

How to pass the length of each element in an R vector to the substr function?

I have the following vector.
v <- c('X100kmph','X95kmph', 'X90kmph', 'X85kmph', 'X80kmph',
'X75kmph','X70kmph','X65kmph','X60kmph','X55kmph','X50kmph',
'X45kmph','X40kmph','X35kmph','X30kmph','X25kmph','X20kmph',
'X15kmph','X10kmph')
I want to extract the digits representing speed. They all start at the 2nd position, but end at different places, so I need (length of element i) - 4 as the ending position.
The following doesn't work as length(v) returns the length of the vector and not of each element.
vnum <- substr(v, 2, length(v)-4)
I tried lengths() as well, but that doesn't work either.
How can I supply the length of each element to substr?
Context:
v actually represents a character column (called Speed) in a tibble which I'm trying to mutate into the corresponding numeric column.
mytibble <- mytibble %>%
  mutate(Speed = as.numeric(substr(Speed, 2, length(Speed) - 4)))
Using nchar() instead of length() as suggested by tmfmnk does the trick!
vnum <- substr(v, 2, nchar(v)-4)
If you just want to extract the digits, then here is another option
vnum <- gsub("\\D","",v)
such that
> vnum
[1] "100" "95" "90" "85" "80" "75" "70" "65" "60" "55"
[11] "50" "45" "40" "35" "30" "25" "20" "15" "10"

Regular expressions, extract specific parts of pattern

I haven't worked with regular expressions for quite some time, so I'm not sure if what I want to do can be done "directly" or if I have to work around.
My expressions look like the following two:
crb_gdp_g_100000_16_16_ftv_all.txt
crt_r_g_25000_20_40_flin_g_2.txt
Only the parts replaced by an asterisk are "varying"; the rest is constant (or irrelevant, as in the case of the last part, after "f*_"):
cr*_*_g_*_*_*_f*_
Is there a straightforward way to get only the values of the asterisk parts? E.g. in the case of "r" or "gdp" I have to include the underscores, otherwise I also get the "r" at the beginning of the expression. But including the underscores gives "_r_" or "_gdp_", when I only want "r" or "gdp".
Or in short: I know a lot about my expressions but I only want to extract the varying parts. (How) Can I do that?
You can use sub with captures and then strsplit to get a list of the separated elements:
str <- c("crb_gdp_g_100000_16_16_ftv_all.txt", "crt_r_g_25000_20_40_flin_g_2.txt")
strsplit(sub("cr([[:alnum:]]+)_([[:alnum:]]+)_g_([[:alnum:]]+)_([[:alnum:]]+)_([[:alnum:]]+)_f([[:alnum:]]+)_.+", "\\1.\\2.\\3.\\4.\\5.\\6", str), "\\.")
#[[1]]
#[1] "b" "gdp" "100000" "16" "16" "tv"
#[[2]]
#[1] "t" "r" "25000" "20" "40" "lin"
Note: I replaced \\w with [[:alnum:]] to avoid inclusion of the underscore.
We can also use regmatches and regexec to extract these values like this:
regmatches(str, regexec("^cr([^_]+)_([^_]+)_g_([^_]+)_([^_]+)_([^_]+)_f([^_]+)_.*$", str))
[[1]]
[1] "crb_gdp_g_100000_16_16_ftv_all.txt" "b"
[3] "gdp" "100000"
[5] "16" "16"
[7] "tv"
[[2]]
[1] "crt_r_g_25000_20_40_flin_g_2.txt" "t" "r"
[4] "25000" "20" "40"
[7] "lin"
Note that the first element in each vector is the full string, so to drop that, we can use lapply and "["
lapply(regmatches(str,
regexec("^cr([^_]+)_([^_]+)_g_([^_]+)_([^_]+)_([^_]+)_f([^_]+)_.*$", str)),
"[", -1)
[[1]]
[1] "b" "gdp" "100000" "16" "16" "tv"
[[2]]
[1] "t" "r" "25000" "20" "40" "lin"

Convert nested lists to multidimensional array in R, preserving slice order

I need to export some data bidirectionally between R and Matlab, and the latter prefers arrays. I am trying to convert my R nested-list data structures into a multidimensional array before conversion to matlab, such that the slicing remains the same. This is (analogous to) what I am currently doing:
nestlist <- lapply(1:2, function(x) lapply(1:3, function(y) lapply(1:4, function(z) paste(x, y, z, sep = ""))))
unlist(nestlist)
[1] "111" "112" "113" "114" "121" "122" "123" "124" "131" "132" "133" "134"
[13] "211" "212" "213" "214" "221" "222" "223" "224" "231" "232" "233" "234"
> length(nestlist)
[1] 2
> length(nestlist[[1]])
[1] 3
> length(nestlist[[1]][[1]])
[1] 4
As you can see, dimensions are 2x3x4 as expected. Now:
> ar <- array(unlist(nestlist), c(2, 3, 4))
> nestlist[[1]][[1]][[1]]
[1] "111"
> ar[1,1,1]
[1] "111"
so far so good, but....
> nestlist[[2]][[2]][[3]]
[1] "223"
> ar[2,2,3]
[1] "214"
So somehow array creation is not happening in the same order as the list is parsed by unlist. How can I do this efficiently, preserving the indexing order and dimensions? I'd like to avoid nested sapplys etc. ("manual" parsing) if possible.
Here is a generalization of the transpose (t) function for multi-dimensional arrays:
tarray <- function(x) aperm(x, rev(seq_along(dim(x))))
Then you can define ar as follows:
ar <- tarray(array(unlist(nestlist), c(4, 3, 2)))
ar[2,2,3]
# [1] "223"

R format NaN function

I'm trying to find a format function that will suppress NaN output in R. I want to pass in a vector of doubles and have the NaN values returned as empty strings rather than "NaN". I'm trying to format output for a LaTeX table. This should be simple, right? Is there such a function?
Here is what I get now:
> x <- c(seq(1,2,0.2), NaN)
> as.character(x)
> [1] "1" "1.2" "1.4" "1.6" "1.8" "2" "NaN"
This is what I want to get:
> x <- c(seq(1,2,0.2), NaN)
> formatting.function(x)
> [1] "1" "1.2" "1.4" "1.6" "1.8" "2" ""
Here you go:
R> x <- c(seq(1,2,0.2), NaN)
R> zx <- as.character(x)
R> zx
[1] "1" "1.2" "1.4" "1.6" "1.8" "2" "NaN"
So now we define a new function mattFun():
R> mattFun <- function(x) gsub("NaN", "", as.character(x))
and use it:
R> zy <- mattFun(x)
R> zy
[1] "1" "1.2" "1.4" "1.6" "1.8" "2" ""
R>
In all seriousness, you are simply looking for a simple pattern replacement, which is exactly what regular expressions do; gsub() is one of several functions offering that. It is worth reading up on regular expressions.
Just replace "NaN" by "":
x <- c(seq(1,2,0.2), NaN)
fofu <- function(x) {
  cx <- as.character(x)
  cx[cx == "NaN"] <- ""
  cx
}
fofu(x)
#[1] "1" "1.2" "1.4" "1.6" "1.8" "2" ""
(edit)
Or using implicit conversion to make it shorter:
fofu2 <- function(x) "[<-"(x, is.nan(x), "")
# or ... replace(x, is.nan(x), "")
This is about two times faster than the gsub-based solution (using microbenchmark and x as defined above), although of course what usually matters most is convenience rather than computing time.
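For LaTeX-table output specifically, one more hedged variant (fofu3 is a made-up name) runs the vector through format() before blanking the NaN positions, so the numeric formatting stays consistent rather than falling back to as.character():

```r
# Sketch: format the numbers first, then blank the NaN slots.
# Note format() pads to a common width ("1.0", "1.2", ...), which
# differs from the as.character() output shown above.
fofu3 <- function(x, ...) {
  out <- format(x, ...)
  out[is.nan(x)] <- ""
  out
}

x <- c(seq(1, 2, 0.2), NaN)
fofu3(x, trim = TRUE)
```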
