Create array of values based on dictionary and array of keys - julia

I'm new to Julia, so I'm sorry if this is a basic question.
Say we have a dictionary, and a vector of keys:
X = [2, 1, 1, 3]
d = Dict( 1 => "A", 2 => "B", 3 => "C")
I want to create a new array which contains values instead of keys (according to the dictionary), so the end result would be something like
Y = ["B", "A", "A", "C"]
I suppose I could iterate over the vector elements, look each one up in the dictionary, and store the corresponding value, but this seems awfully inefficient to me.
Something like
Y = Array{String}(undef, length(X))
for i in eachindex(X)
    Y[i] = d[X[i]]
end
EDIT: Also, my proposed solution doesn't work if X contains missing values.
So my question is whether there is a more efficient way of doing this (my real array and dictionary are much larger), or whether this is an appropriate way of doing it.

Efficiency can mean different things in different contexts, but I would probably do:
Y = [d[i] for i in X]
If X contains missing values, you could use skipmissing(X) in the comprehension. Note that this drops those entries, so Y will be shorter than X.
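On the efficiency concern: each d[x] is a hash-table lookup, so the comprehension does one (on average constant-time) lookup per element, and it is hard to do asymptotically better than that. A minimal sketch, assuming the X and d from the question; the helper name lookup_values is made up for illustration, and putting the work in a function follows the usual Julia advice of not running performance-sensitive code on untyped globals:

```julia
# Data from the question.
X = [2, 1, 1, 3]
d = Dict(1 => "A", 2 => "B", 3 => "C")

# Hypothetical helper (not from the original post): one dictionary lookup per key.
lookup_values(d, X) = [d[x] for x in X]

Y = lookup_values(d, X)      # comprehension inside a function
Y_map = map(x -> d[x], X)    # equivalent formulation with map
```

Both forms build the result in a single pass over X, just like the explicit loop in the question.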

You can use an array comprehension to do this pretty tersely:
julia> [d[x] for x in X]
4-element Array{String,1}:
"B"
"A"
"A"
"C"
In the future it may be possible to write d.[X] to express this even more concisely, but as of Julia 1.3, that is not yet allowed.
As per the edit to the question, let's suppose there is a missing value somewhere in X:
julia> X = [2, 1, missing, 1, 3]
5-element Array{Union{Missing, Int64},1}:
2
1
missing
1
3
If you want to map missing to missing, or to some other value such as the string "?", you can do that explicitly like this:
julia> [ismissing(x) ? missing : d[x] for x in X]
5-element Array{Union{Missing, String},1}:
"B"
"A"
missing
"A"
"C"
julia> [ismissing(x) ? "?" : d[x] for x in X]
5-element Array{String,1}:
"B"
"A"
"?"
"A"
"C"
If you're going to do that a lot, it might be easier to put missing in the dictionary like this:
julia> d = Dict(missing => "?", 1 => "A", 2 => "B", 3 => "C")
Dict{Union{Missing, Int64},String} with 4 entries:
2 => "B"
missing => "?"
3 => "C"
1 => "A"
julia> [d[x] for x in X]
5-element Array{String,1}:
"B"
"A"
"?"
"A"
"C"
If you want to simply skip over missing values, you can use skipmissing(X) instead of X:
julia> [d[x] for x in skipmissing(X)]
4-element Array{String,1}:
"B"
"A"
"A"
"C"
There's generally not a single correct way to handle missing values, which is why you need to explicitly code how to handle missing data.
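Another option, not mentioned above, is to broadcast get over X. This is only a sketch, assuming the original d without a missing key. get(d, key, default) returns default when key is not in the dictionary, and Ref(d) makes broadcasting treat the dictionary as a single value. Be aware that this also maps any other absent key to the default instead of throwing a KeyError, which may or may not be what you want:

```julia
X = [2, 1, missing, 1, 3]
d = Dict(1 => "A", 2 => "B", 3 => "C")

# Keys that are not in d (here: missing) fall back to the third argument.
Y_missing = get.(Ref(d), X, missing)   # missing stays missing
Y_marked  = get.(Ref(d), X, "?")       # missing becomes "?"
```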

Related

Any built-in function in MATLAB that can work like `ave` in R?

I am looking for a built-in function in MATLAB that works similarly to ave in R.
Here is an example with R:
set.seed(0)
x <- sample(c("A", "B"), 10, replace = TRUE)
xid <- ave(seq_along(x), x, FUN = seq_along)
which gives
> x
[1] "B" "A" "B" "A" "A" "B" "A" "A" "A" "B"
> xid
[1] 1 1 2 2 3 3 4 5 6 4
In other words, I don't know which MATLAB function lets me group by x and assign sequence ids within each group, so that I get an array like xid. I know splitapply might be close to the goal, but it doesn't give me the desired output since it yields summarized results.
The question asks to replace each entry in x by the number of times it has occurred so far.
I don't know of a built-in function that does this. Here are some approaches. Let
x = ['B' 'A' 'B' 'A' 'A' 'B' 'A' 'A' 'A' 'B']; % example data. Row vector
Short code, but memory-inefficient (computes an intermediate N×N matrix, where N is the length of x):
xid = sum(triu(x==x.'));
A little more efficient (computes an intermediate U×N matrix, where U is the number of unique elements of x):
t = x==unique(x).';
xid = nonzeros(t.*cumsum(t,2)).';
Boring efficient code with a loop:
xid = NaN(size(x)); % preallocate
for u = unique(x)
    t = x==u;
    xid(t) = 1:sum(t);
end
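For readers coming from the Julia thread above, the same "number of occurrences so far" result can be sketched in Julia with a different technique: a single pass that keeps a dictionary of per-value counters. This is not a translation of the MATLAB code above, and the function name running_count is made up for illustration:

```julia
# Hypothetical helper: for each element, how many times has its value occurred so far?
function running_count(x)
    counts = Dict{eltype(x), Int}()          # value => occurrences seen so far
    xid = Vector{Int}(undef, length(x))
    for (i, v) in enumerate(x)
        counts[v] = get(counts, v, 0) + 1    # bump the counter for this value
        xid[i] = counts[v]
    end
    return xid
end

x = ['B', 'A', 'B', 'A', 'A', 'B', 'A', 'A', 'A', 'B']
xid = running_count(x)
```

This needs only one counter per unique value, so memory stays proportional to the number of unique elements plus the output.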

Sort Array by Order of another Array

I have
a = ["B", "C", "A"]
and
b = [7, 10, 5]
How can I sort a by the order of the elements of b?
To explain: sorting b puts its elements in the order given by the indices [3, 1, 2]. I would like to use those indices to do this:
a[[3,1,2]]
["A", "B", "C"]
You are looking for sortperm:
sortperm(v; alg::Algorithm=DEFAULT_UNSTABLE, lt=isless, by=identity, rev::Bool=false, order::Ordering=Forward)
Return a permutation vector I that puts v[I] in sorted order. The order is specified using the same keywords as sort!. The permutation is guaranteed to be stable even if the sorting algorithm is unstable, meaning that indices of equal elements appear in ascending order.
Applied to your example:
julia> a = ["B", "C", "A"]
3-element Array{String,1}:
"B"
"C"
"A"
julia> b = [7, 10, 5]
3-element Array{Int64,1}:
7
10
5
julia> perm = sortperm(b)
3-element Array{Int64,1}:
3
1
2
julia> a[perm]
3-element Array{String,1}:
"A"
"B"
"C"
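The keywords from the docstring work the same way as for sort. A small sketch using the a and b from the question; the ties vector is an extra example added here to illustrate the stability guarantee quoted above:

```julia
a = ["B", "C", "A"]
b = [7, 10, 5]

# Descending order of b (10, 7, 5) corresponds to indices 2, 1, 3.
a_desc = a[sortperm(b, rev=true)]

# With ties, indices of equal elements appear in ascending order.
ties = [1, 0, 1, 0]
perm = sortperm(ties)
```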

Changing the levels of a pooled DataArray

I'm looking for a way to modify the levels of a DataArray:
result = pool(["a", "a", "b"])
levels(result) = ["A", "B"]
As a quick-and-dirty solution, you can change the pool field of the object -- it happens to be mutable.
result.pool = [ "A", "B" ]
result
# 3-element PooledDataArray{ASCIIString,Uint8,1}:
# "A"
# "A"
# "B"
xdump( result )
# PooledDataArray{ASCIIString,Uint8,1}
# refs: Array(Uint8,(3,)) Uint8[0x01,0x01,0x02]
# pool: Array(ASCIIString,(2,)) ASCIIString["a","b"]
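Why replacing the pool relabels every element: as the dump shows, a pooled array stores each distinct value once in pool and keeps only small integer references to it in refs. The following plain-Julia sketch mimics that layout with two ordinary vectors. It does not use the DataArrays API, and the variable names are chosen for illustration only:

```julia
refs = UInt8[0x01, 0x01, 0x02]   # one reference per element
pool = ["a", "b"]                # each distinct value stored once

before = pool[refs]              # materialize the values via the references

pool = ["A", "B"]                # swap the pool, leave refs untouched
after = pool[refs]               # the same references now yield the new labels
```

The new pool must have at least as many entries as the largest reference, otherwise the lookup goes out of bounds.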

Random sequence from fixed ensemble that contains at least one of each character

I am trying to generate a random sequence from a fixed number of characters that contains at least one of each character.
For example having the ensemble
m = letters[1:3]
I would like to create a sequence of N = 10 elements that contain at least one of each m characters, like
a
a
a
a
b
c
c
c
c
a
I tried sample(m, N, replace=TRUE), but this way a sequence like
a
a
a
a
a
c
c
c
c
a
can be generated that does not contain b.
f <- function(x, n){
  # One of each element of x, the rest drawn at random from x, then shuffled.
  sample(c(x, sample(x, n-length(x), replace=TRUE)))
}
f(letters[1:3], 5)
# [1] "a" "c" "a" "b" "a"
f(letters[1:3], 5)
# [1] "a" "a" "b" "b" "c"
f(letters[1:3], 5)
# [1] "a" "a" "b" "c" "a"
f(letters[1:3], 5)
# [1] "b" "c" "b" "c" "a"
Josh O'Brien's answer is a good way to do it but doesn't provide much input checking. Since I had already written mine, I might as well post it. It's pretty much the same thing, but it only considers unique items and makes sure n is large enough to guarantee you get at least one of each.
at_least_one_samp <- function(n, input){
  # Only consider unique items.
  items <- unique(input)
  unique_items_count <- length(items)
  if(unique_items_count > n){
    stop("n is smaller than the number of unique items, so not every item can appear")
  }
  # Get values for vector - force each item in at least once
  # then randomly select values to get the remaining.
  vals <- c(items, sample(items, n - unique_items_count, replace = TRUE))
  # Now shuffle them
  sample(vals)
}
m <- c("a", "b", "c")
at_least_one_samp(10, m)
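For readers following the Julia threads above, the same "one of each, fill the rest, shuffle" idea can be sketched in Julia. The function name at_least_one_sample is hypothetical and simply mirrors the R function; rand(items, k) draws k elements with replacement and shuffle comes from the Random standard library:

```julia
using Random  # provides shuffle

# Hypothetical helper mirroring at_least_one_samp; not part of any package.
function at_least_one_sample(n, input)
    items = unique(input)
    length(items) > n &&
        throw(ArgumentError("n is smaller than the number of unique items"))
    # One copy of every item, then fill the remainder by sampling with replacement.
    vals = vcat(items, rand(items, n - length(items)))
    return shuffle(vals)
end

s = at_least_one_sample(10, ["a", "b", "c"])
```

The result is random, but it always has length n and contains every unique item at least once.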

R dynamic lists of lists

Hi, I'm new to R, and for a school project I'm trying to create a list of lists that I can access by index and append to. Something like
aList[1] = A, B, C
aList[1] returns [1] A, B, C
aList[1] += D
aList[1] returns [1] A, B, C, D
aList[2] = 1, 2, 3
aList[2] returns [2] 1, 2, 3
aList returns [1] A, B, C, D
[2] 1, 2, 3
However, I'm not sure if I'm using the right data type (and definitely not the proper syntax), as everything I've tried either makes a single index holding a list or makes multiple indexes of one item each.
This isn't the homework. This shouldn't even be an issue but I can't find a solution.
Lists in R are distinct from vectors: each item in a vector can only be a basic type like a number or a string, and all items must share one type, while a list can contain vectors or other lists. It sounds like you want to create a list of vectors. This could be done as:
> aList = list(c("A", "B", "C"), c(1, 2, 3))
> aList[[1]]
[1] "A" "B" "C"
> aList[[1]] = c(aList[[1]], "D")
> aList[[1]]
[1] "A" "B" "C" "D"
> aList[[2]]
[1] 1 2 3
> aList
[[1]]
[1] "A" "B" "C" "D"
[[2]]
[1] 1 2 3
Note that you normally access a list element using double brackets, like [[1]]. If you access a list using single brackets, you'll get a subset of the list (a list containing that element) rather than the element itself:
> aList[1]
[[1]]
[1] "A" "B" "C" "D"
This isn't what you want if you want to modify that item.
