Sort Array by Order of another Array - julia

I have
a = ["B", "C", "A"]
and
b = [7, 10, 5]
How can I sort a by the order of the elements of b?
To explain: sorting b in ascending order takes its elements from positions [3, 1, 2]. I would like to use those indices to reorder a, like this:
a[[3,1,2]]
["A", "B", "C"]

You are looking for sortperm:
sortperm(v; alg::Algorithm=DEFAULT_UNSTABLE, lt=isless, by=identity, rev::Bool=false, order::Ordering=Forward)
Return a permutation vector I that puts v[I] in sorted order. The order is specified using the same keywords as sort!. The permutation is guaranteed to be stable even if the sorting algorithm is unstable, meaning that indices of equal elements appear in ascending order.
Applied to your example:
julia> a = ["B", "C", "A"]
3-element Array{String,1}:
"B"
"C"
"A"
julia> b = [7, 10, 5]
3-element Array{Int64,1}:
7
10
5
julia> perm = sortperm(b)
3-element Array{Int64,1}:
3
1
2
julia> a[perm]
3-element Array{String,1}:
"A"
"B"
"C"

Related

R: How to count length of intervals between specific word/symbol in a vector?

I have a vector that contains series of texts and numbers, like:
t <- c("A", 1:3, "A", 1:4, "A", 1:3)
t
#> [1] "A" "1" "2" "3" "A" "1" "2" "3" "4" "A" "1" "2" "3"
Created on 2022-08-06 by the reprex package (v2.0.1)
That is, the actual data is taken from a pdf, with the data frame collapsed into a single column vector, and the wrap length is uneven for some reason (probably because of the cell merging).
To process this data efficiently, I want to know the length from "A" to next "A" or end. In this example the answer would be 3, 4, 3 (Edit: sorry for a simple mistake, it would be 4, 5, 4).
I have tried many different methods but can't find one that works. Does anyone know of a better way?
An alternative using rle (run-length encoding)
with(rle(t == "A"), subset(lengths, !values))
#> [1] 3 4 3
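One caveat worth checking before relying on rle here: it merges adjacent runs, so two "A"s in a row produce no zero-length gap. The following is only a small sketch on a hypothetical input x (not from the question) comparing it with a position-based count:

```r
x <- c("A", "A", "1")   # hypothetical input with two adjacent "A"s

# rle sees one run of TRUE (length 2) and one run of FALSE (length 1),
# so the empty gap between the two "A"s is not reported.
r <- rle(x == "A")
rle_counts <- r$lengths[!r$values]
rle_counts
#> [1] 1

# Counting from the positions of "A" keeps the empty gap as a 0.
diff_counts <- diff(c(which(x == "A"), length(x) + 1)) - 1
diff_counts
#> [1] 0 1
```

If the data never has consecutive "A"s, both approaches agree.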
You want the number of elements
(1) between adjacent "A"s;
(2) from the last "A" (excluding it) to the end.
We can use either of the following:
diff(c(which(t == "A"), length(t) + 1)) - 1
#[1] 3 4 3
diff(which(c(t, "A") == "A")) - 1
#[1] 3 4 3
Essentially we pad an "A" at the end to turn (2) into (1). If the last element of t happens to be an "A", the last value in the result will be 0.
Extension:
If you further want to know the number of elements from the beginning to the first "A" (excluding it), we can pad a leading "A":
diff(c(0, which(t == "A"), length(t) + 1)) - 1
#[1] 0 3 4 3
diff(which(c("A", t, "A") == "A")) - 1
#[1] 0 3 4 3
Here, the first value is 0, because the first element of t happens to be an "A".
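The edit in the question asks for 4, 5, 4, which reads as segment lengths that include the leading "A". Assuming that is the intended count, a minimal sketch is to drop the "- 1"; the names starts, seg_len and seg_id are just illustrative:

```r
t <- c("A", 1:3, "A", 1:4, "A", 1:3)

# Positions of each "A", plus one past the end as a closing boundary.
starts <- which(t == "A")
seg_len <- diff(c(starts, length(t) + 1))
seg_len
#> [1] 4 5 4

# Equivalent view: label each element with its segment number, then count.
# This assumes t starts with "A"; otherwise a segment 0 appears first.
seg_id <- cumsum(t == "A")
seg_len2 <- unname(lengths(split(t, seg_id)))
seg_len2
#> [1] 4 5 4
```

The split() form is handy if you also need the segments themselves, not only their lengths.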

Create array of values based on dictionary and array of keys

I'm new to Julia, so I'm sorry if this is a basic question.
Say we have a dictionary, and a vector of keys:
X = [2, 1, 1, 3]
d = Dict( 1 => "A", 2 => "B", 3 => "C")
I want to create a new array which contains values instead of keys (according to the dictionary), so the end result would be something like
Y = ["B", "A", "A", "C"]
I suppose I could iterate over the vector elements, look it up in the dictionary and return the corresponding value, but this seems awfully inefficient to me.
Something like
Y = Array{String}(undef, length(X))
for i in 1:length(X)
Y[i] = d[X[i]]
end
EDIT: Also, my proposed solution doesn't work if X contains missing values.
So my question is if there is some more efficient way of doing this (I'm doing it with a much larger array and dictionary), or is this an appropriate way of doing it?
Efficiency can mean different things in different contexts, but I would probably do:
Y = [d[i] for i in X]
If X contains missing values, you could use skipmissing(X) in the comprehension.
You can use an array comprehension to do this pretty tersely:
julia> [d[x] for x in X]
4-element Array{String,1}:
"B"
"A"
"A"
"C"
In the future it may be possible to write d.[X] to express this even more concisely, but as of Julia 1.3, that is not yet allowed.
As per the edit to the question, let's suppose there is a missing value somewhere in X:
julia> X = [2, 1, missing, 1, 3]
5-element Array{Union{Missing, Int64},1}:
2
1
missing
1
3
If you want to map missing to missing or some other value like the string "?" you can do that explicitly like this:
julia> [ismissing(x) ? missing : d[x] for x in X]
5-element Array{Union{Missing, String},1}:
"B"
"A"
missing
"A"
"C"
julia> [ismissing(x) ? "?" : d[x] for x in X]
5-element Array{String,1}:
"B"
"A"
"?"
"A"
"C"
If you're going to do that a lot, it might be easier to put missing in the dictionary like this:
julia> d = Dict(missing => "?", 1 => "A", 2 => "B", 3 => "C")
Dict{Union{Missing, Int64},String} with 4 entries:
2 => "B"
missing => "?"
3 => "C"
1 => "A"
julia> [d[x] for x in X]
5-element Array{String,1}:
"B"
"A"
"?"
"A"
"C"
If you want to simply skip over missing values, you can use skipmissing(X) instead of X:
julia> [d[x] for x in skipmissing(X)]
4-element Array{String,1}:
"B"
"A"
"A"
"C"
There's generally not a single correct way to handle missing values, which is why you need to explicitly code how to handle missing data.

Changing the levels of a pooled DataArray

I'm looking for a way to modify the levels of a DataArray:
result = pool(["a", "a", "b"])
levels(result) = ["A", "B"]
As a quick-and-dirty solution, you can change the pool field of the object -- it happens to be mutable.
result.pool = [ "A", "B" ]
result
# 3-element PooledDataArray{ASCIIString,Uint8,1}:
# "A"
# "A"
# "B"
xdump( result )
# PooledDataArray{ASCIIString,Uint8,1}
# refs: Array(Uint8,(3,)) Uint8[0x01,0x01,0x02]
# pool: Array(ASCIIString,(2,)) ASCIIString["a","b"]

Shuffling a vector - all possible outcomes of sample()?

I have a vector with five items.
my_vec <- c("a","b","a","c","d")
If I want to re-arrange those values into a new vector (shuffle), I could use sample():
shuffled_vec <- sample(my_vec)
Easy - but the sample() function only gives me one possible shuffle. What if I want to know all possible shuffling combinations? The various "combn" functions don't seem to help, and expand.grid() gives me every possible combination with replacement, when I need it without replacement. What's the most efficient way to do this?
Note that my vector contains the value "a" twice, so every shuffled vector in the returned set should also contain "a" twice.
I think permn from the combinat package does what you want
library(combinat)
permn(my_vec)
A smaller example
> x
[1] "a" "a" "b"
> permn(x)
[[1]]
[1] "a" "a" "b"
[[2]]
[1] "a" "b" "a"
[[3]]
[1] "b" "a" "a"
[[4]]
[1] "b" "a" "a"
[[5]]
[1] "a" "b" "a"
[[6]]
[1] "a" "a" "b"
If the duplicates are a problem you could do something similar to this to get rid of duplicates
strsplit(unique(sapply(permn(my_vec), paste, collapse = ",")), ",")
Or probably a better approach to removing duplicates...
dat <- do.call(rbind, permn(my_vec))
dat[!duplicated(dat), ] # keep only the first occurrence of each row
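Generating every permutation and then dropping duplicates is fine for short vectors. As an alternative, here is a rough base R sketch that never builds the duplicates in the first place; distinct_perms is a hypothetical helper name, not a function from combinat:

```r
# Recursively build all distinct permutations of x (base R only).
# At each level, every distinct value is tried once as the first element.
distinct_perms <- function(x) {
  if (length(x) <= 1) return(list(x))
  out <- list()
  for (v in unique(x)) {
    rest <- x[-match(v, x)]            # remove one occurrence of v
    for (p in distinct_perms(rest)) {
      out[[length(out) + 1]] <- c(v, p)
    }
  }
  out
}

my_vec <- c("a", "b", "a", "c", "d")
perms <- distinct_perms(my_vec)
length(perms)   # 5! / 2! distinct arrangements
#> [1] 60
```

Each element of perms is one shuffled vector; do.call(rbind, perms) turns the list into a matrix with one permutation per row.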
Noting that your data is effectively 5 levels from 1-5, encoded as "a", "b", "a", "c", and "d", I went looking for ways to get the permutations of the numbers 1-5 and then remap those to the levels you use.
Let's start with the input data:
my_vec <- c("a","b","a","c","d") # the character
my_vec_ind <- seq_along(my_vec) # their identifiers
To get the permutations, I applied the function given at Generating all distinct permutations of a list in R:
permutations <- function(n){
  if(n==1){
    return(matrix(1))
  } else {
    sp <- permutations(n-1)
    p <- nrow(sp)
    A <- matrix(nrow=n*p,ncol=n)
    for(i in 1:n){
      A[(i-1)*p+1:p,] <- cbind(i,sp+(sp>=i))
    }
    return(A)
  }
}
First, create a data.frame with the permutations:
tmp <- data.frame(permutations(length(my_vec)))
You now have a data frame tmp of 120 rows, where each row is a unique permutation of the numbers, 1-5:
>tmp
X1 X2 X3 X4 X5
1 1 2 3 4 5
2 1 2 3 5 4
3 1 2 4 3 5
...
119 5 4 3 1 2
120 5 4 3 2 1
Now you need to remap them to the strings you had. You can remap them using a variation on the theme of gsub(), proposed here: R: replace characters using gsub, how to create a function?
gsub2 <- function(pattern, replacement, x, ...) {
  # apply each pattern/replacement pair in turn
  for(i in seq_along(pattern))
    x <- gsub(pattern[i], replacement[i], x, ...)
  x
}
A single gsub() call won't work here because gsub() takes only one pattern and one replacement, while you have several.
You also need a function you can call using lapply() to use the gsub2() function on every element of your tmp data.frame.
remap <- function(x,
                  old,
                  new){
  return(gsub2(pattern = old,
               replacement = new,
               fixed = TRUE,
               x = as.character(x)))
}
Almost there. We do the mapping like this:
shuffled_vec <- as.data.frame(lapply(tmp,
                                     remap,
                                     old = as.character(my_vec_ind),
                                     new = my_vec))
which can be simplified to...
shuffled_vec <- as.data.frame(lapply(data.frame(permutations(length(my_vec))),
                                     remap,
                                     old = as.character(my_vec_ind),
                                     new = my_vec))
.. should you feel the need.
That gives you your required answer:
> shuffled_vec
X1 X2 X3 X4 X5
1 a b a c d
2 a b a d c
3 a b c a d
...
119 d c a a b
120 d c a b a
Looking at a previous question (R: generate all permutations of vector without duplicated elements), I can see that the gtools package has a function for this. However, I couldn't get it to work directly on your vector, because it requires the elements of v to be distinct:
library(gtools)
permutations(n = 5, r = 5, v = my_vec)
#Error in permutations(n = 5, r = 5, v = my_vec) :
# too few different elements
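You can, however, adapt it by permuting the indices 1 to 5 and mapping them back to my_vec (each column of the result is one permutation):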
apply(permutations(n = 5, r = 5), 1, function(x) my_vec[x])
# [,1] [,2] [,3] [,4]
#[1,] "a" "a" "a" "a" ...
#[2,] "b" "b" "b" "b" ...
#[3,] "a" "a" "c" "c" ...
#[4,] "c" "d" "a" "d" ...
#[5,] "d" "c" "d" "a" ...

R dynamic lists of lists

Hi, I'm new to R and for a school project I'm trying to create a list of lists that I can access by index and append to. Something like
aList[1] = A, B, C
aList[1] returns [1] A, B, C
aList[1] += D
aList[1] returns [1] A, B, C, D
aList[2] = 1, 2, 3
aList[2] returns [2] 1, 2, 3
aList returns [1] A, B, C, D
[2] 1, 2, 3
However, I'm not sure if I'm using the right datatype (and definitely not the proper syntax) as everything I've tried just either makes a single index of a list or makes multiple indexes of one item.
This isn't the homework. This shouldn't even be an issue but I can't find a solution.
Lists in R differ from atomic vectors: every element of an atomic vector must be of the same basic type, such as a number or a string, while a list can contain vectors or other lists. It sounds like you want to create a list of vectors. This could be done as:
> aList = list(c("A", "B", "C"), c(1, 2, 3))
> aList[[1]]
[1] "A" "B" "C"
> aList[[1]] = c(aList[[1]], "D")
> aList[[1]]
[1] "A" "B" "C" "D"
> aList[[2]]
[1] 1 2 3
> aList
[[1]]
[1] "A" "B" "C" "D"
[[2]]
[1] 1 2 3
Note that you normally access a list item using double brackets, like [[1]]. If you access a list using single brackets, you'll get a sublist (a list of length one) rather than the item itself:
> aList[1]
[[1]]
[1] "A" "B" "C" "D"
This isn't what you want if you want to modify that item.
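Since the question is about building the structure dynamically, here is a small sketch that starts from an empty list and grows it by index. It mirrors the pseudocode in the question and is only one way to do it:

```r
aList <- list()                        # start empty

aList[[1]] <- c("A", "B", "C")         # create item 1
aList[[1]] <- c(aList[[1]], "D")       # append to item 1
aList[[2]] <- c(1, 2, 3)               # create item 2

# Single brackets return a list; double brackets return the item itself.
class(aList[1])
#> [1] "list"
class(aList[[1]])
#> [1] "character"
```

Assigning to an index one past the current length, as with aList[[2]] above, extends the list by one item.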
