Difference between `CategoricalArray` constructor and `categorical` function - julia

The CategoricalArray constructor and the categorical function from CategoricalArrays.jl seem to be nearly identical in behavior:
julia> using CategoricalArrays
julia> x = CategoricalArray(["a", "b", "c"]; ordered=true, levels=["c", "b", "a"])
3-element CategoricalArray{String,1,UInt32}:
"a"
"b"
"c"
julia> y = categorical(["a", "b", "c"]; ordered=true, levels=["c", "b", "a"])
3-element CategoricalArray{String,1,UInt32}:
"a"
"b"
"c"
julia> x == y
true
Is there any notable difference between CategoricalArray and categorical? If they're basically the same, then what's the reason for including the redundant categorical function?

categorical supports the compress keyword argument, whereas the CategoricalArray constructor does not.

Related

Subset a vector and retrieve first elements if exceed the length in R

Imagine I have a vector like this one:
c("A", "B", "C", "D")
There are 4 positions. If I draw a sample of size 1, I can get 1, 2, 3 or 4. What I want is to take a subset of length 3 from that vector, in order, starting at the sampled position. For example, if I get 2:
c("B", "C", "D")
If I get 3:
c("C", "D", "A")
If I get 4:
c("D","A", "B")
So that's the logic: the vector is ordered, and the last element connects back to the first element when I subset.
Using seq, the function f below gives you the desired subset of a vector v: l elements starting at the nth position, wrapping around to the start when the end of v is reached.
f <- function(v, n, l) v[seq(n - 1, length.out = l) %% length(v) + 1]
Output:
v <- c("A", "B", "C", "D")
f(v, n = 4, l = 3)
#[1] "D" "A" "B"
f(v, n = 3, l = 4)
#[1] "C" "D" "A" "B"
f(v, n = 2, l = 5)
#[1] "B" "C" "D" "A" "B"
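To see why the modulo approach wraps around, here is a small sketch that breaks the index computation into steps. The intermediate names (zero_based, wrapped, idx) are only illustrative and are not part of the answer above.

```r
v <- c("A", "B", "C", "D")
n <- 4  # starting position (1-based)
l <- 3  # number of elements wanted

zero_based <- seq(n - 1, length.out = l)  # 3 4 5: positions counted from 0
wrapped    <- zero_based %% length(v)     # 3 0 1: wrap past the end back to 0
idx        <- wrapped + 1                 # 4 1 2: back to R's 1-based indexing

v[idx]
#[1] "D" "A" "B"
```

Shifting to 0-based positions before taking the remainder is what keeps the last position from collapsing to index 0, which R would silently drop.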
I think I got it!
v <- c("A", "B", "C", "D")
p <- sample(1:length(v), 1)
r <- c(v[p:length(v)])
c(r, v[!(v %in% r)])[1:3]
And the outputs:
v <- c("A", "B", "C", "D") # your vector
r <- c(v[2:length(v)])
c(r, v[!(v %in% r)])[1:3]
#> [1] "B" "C" "D"
r <- c(v[3:length(v)])
c(r, v[!(v %in% r)])[1:3]
#> [1] "C" "D" "A"
r <- c(v[4:length(v)])
c(r, v[!(v %in% r)])[1:3]
#> [1] "D" "A" "B"
Created on 2022-05-16 by the reprex package (v2.0.1)
Wrapped in a function (this always returns 3 elements and relies on the values of v being unique):
f <- function(v, nth) {
  r <- c(v[nth:length(v)])
  return(c(r, v[!(v %in% r)])[1:3])
}

Reconstructing Markov chain from figure in R

I am trying to reconstruct a Markov process from Shannon's paper "A Mathematical Theory of Communication". My question concerns Figure 3 on page 8 and a corresponding sequence (message) from that Markov chain on page 5, section (B). I just wanted to check whether I coded the right Markov chain for this figure from Shannon's paper:
Here is my attempt:
install.packages("markovchain")
library(markovchain)
MessageABCDE = c("A", "B", "C", "D", "E")
MessageTransitionMatrix = matrix(c(.4,.1,.2,.2,.1,
                                   .4,.1,.2,.2,.1,
                                   .4,.1,.2,.2,.1,
                                   .4,.1,.2,.2,.1,
                                   .4,.1,.2,.2,.1),
                                 nrow = 5,
                                 byrow = TRUE,
                                 dimnames = list(MessageABCDE, MessageABCDE))
MCmessage = new("markovchain", states = MessageABCDE,
                byrow = TRUE,
                transitionMatrix = MessageTransitionMatrix,
                name = "WritingMessage")
steadyStates(MCmessage)
markovchainSequence(n = 20, markovchain = MCmessage, t0 = "A")
My goal was also to generate a sequence (message) from that chain. I am mostly uncertain about the transition matrix, where I inferred that the probabilities had to be the same in every row. I am happy with the output of markovchainSequence, but I am not 100% sure that I am doing it right.
Here is my console output for markovchainSequence:
> markovchainSequence(n = 20, markovchain = MCmessage, t0 = "A")
[1] "D" "E" "A" "D" "A" "A" "B" "D" "E" "C" "A" "A" "E" "C" "C" "D" "D" "D"
[19] "A" "C"
Looks fine. It may seem odd, though: when every row of the transition matrix is identical, the next state does not depend on the current one, so there isn't any need for a Markov chain. One could equally well use
tokens <- c("A", "B", "C", "D", "E")
probs <- c(0.4, 0.1, 0.2, 0.2, 0.1)
sample(tokens, size=20, replace=TRUE, prob=probs)
## [1] "A" "B" "A" "B" "D" "B" "C" "D" "A" "D" "C" "E" "A" "A" "C" "E" "C" "D" "C" "C"
Will likely make more sense once there is a variety of conditional probabilities.
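As a quick base-R check of that point (a sketch, not taken from the paper): with identical rows, the common row is itself a stationary distribution, which is why plain sampling with those probabilities behaves the same way. The names P, rows_sum_to_one and is_stationary are made up for this example.

```r
tokens <- c("A", "B", "C", "D", "E")
probs  <- c(0.4, 0.1, 0.2, 0.2, 0.1)

# Transition matrix with every row equal to probs, as in the question
P <- matrix(rep(probs, times = length(tokens)), nrow = length(tokens),
            byrow = TRUE, dimnames = list(tokens, tokens))

# Each row should be a probability distribution
rows_sum_to_one <- isTRUE(all.equal(unname(rowSums(P)), rep(1, length(tokens))))

# One step of the chain started from probs gives probs again,
# so probs is a stationary distribution of P
is_stationary <- isTRUE(all.equal(as.numeric(probs %*% P), probs))

rows_sum_to_one
is_stationary
```

Both checks should come back TRUE, which should agree with what steadyStates reports for the chain in the question.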

R: Vectorize deparse(substitute(x))

I want to vectorize deparse(substitute(x)).
f <- function(...){
  deparse(substitute(...))
}
f(a, b, c)
This gives only the first element, "a", but I expected "a", "b", "c". By accident, I found this:
f2 <- function(...){
  deparse(substitute(...()))
}
f2(ab, b, c)
This gives "pairlist(ab, b, c)". Now I could strip out everything I do not need in order to obtain "a", "b", "c", but that does not seem elegant to me. Is there a way to vectorize deparse(substitute(x))?
I know there is a question with a similar issue but the answer does not include deparse(substitute(x)).
match.call is a good starting point. I'd encourage you to explore everything you can do with it. I believe this gets you where you want, though:
f <- function(...){
  as.character(match.call(expand.dots = FALSE)[[2]])
}
and an example of using it...
f(hey, you)
[1] "hey" "you"
We can use match.call
f <- function(...) sapply(as.list(match.call())[-1], as.character)
f(a)
#[1] "a"
f(a, b)
#[1] "a" "b"
f(a, b, c)
#[1] "a" "b" "c"
Or using substitute
f <- function(...) sapply(substitute(...()), deparse)
f(a)
#[1] "a"
f(a, b)
#[1] "a" "b"
f(a, b, c)
#[1] "a" "b" "c"
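One caveat, offered as a sketch rather than a definitive fix: the as.character-based versions do not return one string per argument when an argument is an expression such as b + 1 rather than a bare name. Deparsing each argument separately and collapsing the result handles that case. The name arg_names is made up for this example.

```r
arg_names <- function(...) {
  # substitute(list(...)) captures the unevaluated arguments as a call to list();
  # dropping the first element removes the `list` symbol itself
  args <- as.list(substitute(list(...)))[-1]
  # deparse() can return several strings for a long expression, so collapse them
  vapply(args, function(e) paste(deparse(e), collapse = " "), character(1))
}

arg_names(a, b, c)
#[1] "a" "b" "c"
arg_names(a, b + 1)  # returns the two strings "a" and "b + 1"
```

vapply with character(1) is used instead of sapply so that the result is always a character vector, even when no arguments are passed.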

How do I change split to a horizontal split in R?

I have the vector
x <- c("A", "B", "C", "D", "E", "F")
that I split in the following manner:
split(x, 1:2)
It comes out as (A, C, E) and (B, D, F), yet I want (A, B, C) and (D, E, F). Is there any way to change it to a horizontal split rather than a vertical one?
You can do:
split(x, rep(1:2, each = length(x)/2))
which gives:
$`1`
[1] "A" "B" "C"
$`2`
[1] "D" "E" "F"
We can also use gl
split(x, as.numeric(gl(length(x), 3, length(x))))
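If the length of x is not an exact multiple of the group size, a grouping index built from seq_along is one alternative. This is only a sketch; chunk_size and groups are names chosen for the example.

```r
x <- c("A", "B", "C", "D", "E", "F")
chunk_size <- 3  # number of consecutive elements per group

# ceiling(position / chunk_size) yields 1 1 1 2 2 2,
# so consecutive elements end up in the same group
groups <- ceiling(seq_along(x) / chunk_size)
split(x, groups)
```

With a seventh element in x, the same code would put the single leftover element into a third group instead of recycling the grouping vector.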

Make R consider negative position searches as being out of bounds

vec<-c("a", "b", "c", "d")
My task is to extract the second element to the right and to the left of the key string.
If our key string is "d", if we do
i<-c("d")
vec.1 <- append(vec.1, vec[which(vec == i) + 2])
we get NA. But if we do
i<-c("a")
vec.1 <- append(vec.1, vec[which(vec == i) - 2])
we get "b", "c", "d" (because a negative index drops that element instead of selecting one). Is it possible to treat negative subscripts as positions outside the vector, like a positive subscript that exceeds the length of the vector? That way the result would be NA.
library(Hmisc)
Lag(vec,2)[vec=="d"]
#[1] "b"
Lag(vec,2)[vec=="a"]
#[1] ""
Lag(vec,-2)[vec=="a"]
#[1] "c"
Lag(vec,-2)[vec=="c"]
#[1] ""
I'm sure I could do better, but it's late here. Why not write a small function to do what you want?
myVec <- function(input, match, change) {
  temp = which(input == match)
  if ((temp + change) <= 0) {
    append(NA, input)
  } else {
    append(input, input[temp + change])
  }
}
vec <- c("a", "b", "c", "d")
myVec(vec, "a", -1)
# [1] NA "a" "b" "c" "d"
myVec(vec, "c", -1)
# [1] "a" "b" "c" "d" "b"
myVec(vec, "c", -3)
# [1] NA "a" "b" "c" "d"
myVec(vec, "d", 1)
# [1] "a" "b" "c" "d" NA
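If only the looked-up element is needed rather than the extended vector, a base-R variant could mask every out-of-range position before indexing. This is a sketch under that assumption; offset_element is a hypothetical helper name, not something from the answers above.

```r
offset_element <- function(vec, key, offset) {
  idx <- which(vec == key) + offset
  # Treat anything outside 1..length(vec) as out of bounds, including zero
  # and negative positions, which R would otherwise interpret differently
  idx[idx < 1 | idx > length(vec)] <- NA
  vec[idx]
}

vec <- c("a", "b", "c", "d")
offset_element(vec, "d", -2)
# [1] "b"
offset_element(vec, "a", -2)
# [1] NA
offset_element(vec, "d", 2)
# [1] NA
```

Indexing with NA returns NA, so both directions of overflow give the same result without any special-casing.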
