Compare if the elements of two vectors are equal in Julia - r

I am trying to reproduce the behavior of R's == applied to two vectors, which compares them element by element.
a <- c(1,2 ,3 )
b <- c(1, 2 ,5 )
a==b
#[1] TRUE TRUE FALSE
In Julia, I came up with a very clumsy way of doing it, but now I wonder whether there are easier ways.
a = [1 2 3 ]
b = [1 2 5 ]
a == b #this does not return what I want.
#false
rows_a =size(a)[2]
equal_terms =ones(rows_a)
for i in 1:rows_a
equal_terms[i] =(a[i] == b[i])
end
equal_terms
#1.0
#1.0
#0.0
Thank you in advance.

In Julia you need to vectorize your operation:
julia> a .== b
1×3 BitMatrix:
1 1 0
Unlike R (or NumPy in Python), Julia requires explicit vectorization each time you need it. Any operator or function call can be vectorized (broadcast) by adding a dot: a .== b for operators, f.(x) for function calls.
Please note that a and b here are row vectors, which Julia represents as 1×n matrices. A Vector in Julia is always a column; writing the elements with commas, as in [1, 2, 3], creates one.
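For readers coming from other languages, the difference between comparing two containers as a whole and comparing them element by element can be sketched in plain Python. This is only an analogue of a == b versus a .== b, not Julia code, and the helper name elementwise_equal is made up for illustration.

```python
def elementwise_equal(a, b):
    """Compare two equal-length sequences position by position."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return [x == y for x, y in zip(a, b)]


a = [1, 2, 3]
b = [1, 2, 5]

whole = (a == b)                       # one Boolean for the whole list
per_element = elementwise_equal(a, b)  # one Boolean per position

print(whole)        # False
print(per_element)  # [True, True, False]
```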

Related

Conditional statements with Dataframes [Julia v1.0]

I am porting over custom functions from R. I would like to use Julia DataFrames to store my data. I like to reference columns by name rather than by, say, array index, hence I am using the DataFrames package.
I simplified the following to illustrate:
if( DataFrame(x=1).x .>1) end
The error is:
ERROR: TypeError: non-boolean (BitArray{1}) used in boolean context
Is there a simple workaround that would allow me to continue using DataFrames?
The expression:
DataFrame(x=1).x .> 1
Does the following things:
Creates a DataFrame
Extracts a column x from it
Compares all elements of this column to 1 using vectorized operation .> (broadcasting in Julia parlance)
In effect you get the following one element array:
julia> DataFrame(x=1).x .> 1
1-element BitArray{1}:
false
As opposed to R, Julia distinguishes between vectors and scalars, so this is not the same as simply writing false. Moreover, an if statement expects a scalar Bool, not a vector, so something like this works:
if 2 > 1
println("2 is greater than 1")
end
but not something like this:
if DataFrame(x=2).x .> 1
println("success!")
end
However, for instance this would work:
if (DataFrame(x=2).x .> 1)[1]
println("success!")
end
as you extract the first (and only in this case) element from the array.
Notice that in R, if you pass a vector with more than one element to a conditional expression, you get a warning like this (R 4.2.0 and later raise an error instead):
> if (c(T,F)) {
+ print("aaa") } else {print("bbb")}
[1] "aaa"
Warning message:
In if (c(T, F)) { :
  the condition has length > 1 and only the first element will be used
Simply put, Julia is stricter than R in checking the types in this case. R makes no distinction between scalars and vectors, but Julia does.
EDIT:
length(df) returns the number of columns of a DataFrame (not the number of rows). If you are coming from R, the nrow and ncol functions are easier to remember.
Now regarding your question you can write either:
for i in 1:nrow(df)
if df.x[i] > 3
df.y[i] = df.x[i] + 1
end
end
or
bigx = df.x .> 3
df.y[bigx] = df.x[bigx] .+ 1
or
df.y .= ifelse.(df.x .> 3, df.x .+ 1, df.y)
or using DataFramesMeta to shorten the notation:
using DataFramesMeta
@with df begin
df.y .= ifelse.(:x .> 3, :x .+ 1, :y)
end
or
using DataFramesMeta
@byrow! df begin
if :x > 3
:y = :x + 1
end
end

return indices of duplicated elements corresponding to the unique elements in R

Does anyone know if there's a built-in function in R that can return the indices of duplicated elements corresponding to the unique elements?
For instance I have a vector
a <- ["A","B","B","C","C"]
unique(a) will give ["A","B","C"]
duplicated(a) will give [F,F,T,F,T]
Is there a built-in function to get a vector of indices, of the same length as the original vector a, that shows the location of a's elements in the unique vector (which is [1,2,2,3,3] in this example)?
i.e., something like the output variable "ic" of the MATLAB function "unique" (that is, if we let c = unique(a), then a = c(ic,:)).
http://www.mathworks.com/help/matlab/ref/unique.html
Thank you!
We can use match
match(a, unique(a))
#[1] 1 2 2 3 3
Or convert to factor and coerce to integer
as.integer(factor(a, levels = unique(a)))
#[1] 1 2 2 3 3
data
a <- c("A","B","B","C","C")
This should work (for a vector that is already sorted, as in the example):
cumsum(!duplicated(sort(a))) # once you replace the MATLAB syntax with R syntax.
Or just:
as.numeric(factor(a) )
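As a language-neutral illustration of what match(a, unique(a)) computes, here is a minimal Python sketch. It is not R code; the function name inverse_unique is hypothetical, and the indices are kept 1-based only to mirror the R output above.

```python
def inverse_unique(values):
    """Return (unique values in first-seen order, 1-based index of each
    element of `values` within that unique list)."""
    first_seen = {}  # value -> 1-based position among the unique values
    inverse = []
    for v in values:
        if v not in first_seen:
            first_seen[v] = len(first_seen) + 1
        inverse.append(first_seen[v])
    # dicts preserve insertion order, so the keys are the unique values
    return list(first_seen), inverse


uniques, ic = inverse_unique(["A", "B", "B", "C", "C"])
print(uniques)  # ['A', 'B', 'C']
print(ic)       # [1, 2, 2, 3, 3]
```

Because the unique values are kept in first-seen order rather than sorted, the original vector can be rebuilt as [uniques[i - 1] for i in ic] even when the input is not sorted.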

Accessing the object referenced in a logical condition

I am writing an xor function for a class, so although any recommendations on currently existing xor functions would be nice, I have to write my own. I have searched online, but have not been able to find any solution so far. I also realize my coding style may be sub-optimal. All criticisms will be welcomed.
I am writing a function that will return an element-wise TRUE iff exactly one condition is true. Conditions are given as strings; otherwise they would throw an error due to unexpected symbols (e.g. >). I would like to output a list of the pairwise elements of a and b for which my xor function is true.
The problem is that, while I can create a logical vector of xor T/F based on the conditions, I cannot access the objects directly to subset them. It is the conditions that are function arguments, not the objects themselves.
'%xor%' <- function(condition_a, condition_b) {
# Perform an element-wise "exclusive or" on the conditions being true.
if (length(eval(parse(text= condition_a))) != length(eval(parse(text= condition_b))))
stop("Objects are not of equal length.") # Objects must be equal length to proceed
logical_a <- eval(parse(text= condition_a)) # Evaluate and store each logical condition
logical_b <- eval(parse(text= condition_b))
xor_vector <- logical_a + logical_b == 1 # Only one condition may be true.
xor_indices <- which(xor_vector == TRUE) # Store a vector which gives the indices of the elements which satisfy the xor condition.
# Somehow access the objects in the condition strings
list(a = a[xor_indices], b = b[xor_indices]) # Desired output
}
# Example:
a <- 1:10
b <- 4:13
"a < 5" %xor% "b > 4"
Desired output:
$a
[1] 1 5 6 7 8 9 10
$b
[1] 4 8 9 10 11 12 13
I have thought about doing a combination of ls() and grep() to find existing object names in the conditions, but this would run into problems if the objects in the conditions were not initialized. For example, if someone tried to run "c(1:10) < 5" %xor% "c(4:13) > 4".

Function/instruction to count number of times a value has already been seen

I'm trying to identify if MATLAB or R has a function that resembles the following.
Say I have an input vector v.
v = [1, 3, 1, 2, 4, 2, 1, 3]
I want to generate a vector, w of equivalent length to v. Each element w[i] should tell me the following: for the corresponding value v[i], how many times has this value been encountered so far in v, i.e. in all elements of v up to, but not including, position i. In this example
w = [0, 0, 1, 0, 0, 1, 2, 1]
I'm really looking to see if any statistical or domain-specific languages have a function/instruction like this and what it might be called.
In R, you can try this:
v <- c(1,3,1,2,4,2,1,3)
ave(v, v, FUN=seq_along)-1
#[1] 0 0 1 0 0 1 2 1
Explanation
ave(seq_along(v), v, FUN=seq_along) #It may be better to use `seq_along(v)` considering different classes i.e. `factor` also.
#[1] 1 1 2 1 1 2 3 2
Here, we are grouping the sequence of elements by v. For elements that belong to the same group, the seq_along function produces 1, 2, 3, and so on. In the case of v, the elements of group 1 are at positions 1, 3, 7, so those positions receive 1, 2, 3. Subtracting 1 makes the counts start from 0.
To understand it better,
lst1 <- split(v,v)
lst2 <- lapply(lst1, seq_along)
unsplit(lst2, v)
#[1] 1 1 2 1 1 2 3 2
Using data.table
library(data.table)
DT <- data.table(v, ind=seq_along(v))
DT[, n:=(1:.N)-1, by=v][,n[ind]]
#[1] 0 0 1 0 0 1 2 1
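The grouping idea in this answer (number the elements within each group of equal values, then subtract 1) can also be sketched as a single pass with a running tally. The following is a Python sketch rather than R, and prior_counts is a hypothetical name.

```python
from collections import defaultdict


def prior_counts(values):
    """For each element, count how many times its value occurred earlier."""
    seen = defaultdict(int)  # value -> number of occurrences so far
    out = []
    for v in values:
        out.append(seen[v])  # occurrences strictly before this position
        seen[v] += 1
    return out


v = [1, 3, 1, 2, 4, 2, 1, 3]
w = prior_counts(v)
print(w)  # [0, 0, 1, 0, 0, 1, 2, 1]
```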
In MATLAB there is no built-in function for that (as far as I know), but you can achieve it this way:
w = sum(triu(bsxfun(@eq, v, v.'), 1));
Explanation: bsxfun(...) compares each element with each other. Then triu(..., 1) keeps only matches of an element with previous elements (i.e. values above the diagonal). Finally sum(...) adds all coincidences with previous elements.
A more explicit, but slower alternative (not recommended) is:
w = arrayfun(@(n) sum(v(1:n-1)==v(n)), 1:numel(v));
Explanation: for each index n (where n varies as 1:numel(v)), compare all previous elements v(1:n-1) to the current element v(n), and get the number of matches (sum(...)).
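To make the first explanation concrete, here is a small Python sketch of the same pairwise idea: build the full equality matrix, keep only the entries above the diagonal, and sum each column. It uses plain lists rather than MATLAB arrays, the name prior_counts_pairwise is made up, and like the original it needs quadratic time and memory.

```python
def prior_counts_pairwise(v):
    """Count earlier matches via an explicit pairwise comparison matrix."""
    n = len(v)
    # eq[i][j] is True when v[i] == v[j] (the role of bsxfun(@eq, v, v.'))
    eq = [[v[i] == v[j] for j in range(n)] for i in range(n)]
    # For column j, sum only rows i < j, i.e. the strictly upper triangle
    # (the role of sum(triu(..., 1)))
    return [sum(eq[i][j] for i in range(j)) for j in range(n)]


v = [1, 3, 1, 2, 4, 2, 1, 3]
w_pairwise = prior_counts_pairwise(v)
print(w_pairwise)  # [0, 0, 1, 0, 0, 1, 2, 1]
```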
R has a function called make.unique that can be used to obtain the required result. First use it to make all elements unique:
(v.u <- make.unique(as.character(v))) # it only works on character vectors so you must convert first
[1] "1" "3" "1.1" "2" "4" "2.1" "1.2" "3.1"
You can then take this vector, remove the original data, convert the blanks to 0, and convert back to integer to get the counts:
as.integer(sub("^$","0",sub("[0-9]+\\.?","",v.u)))
[1] 0 0 1 0 0 1 2 1
If you want to use a for-loop in matlab you can get the result with:
res=v;
res(:)=0;
for c=1:length(v)
helper=find(v==v(c));
res(c)=find(helper==c);
end
not sure about runtime compared to Luis Mendo's solution. Gonna check that now.
Edit
Running the code 10,000 times results in:
My Solution: Elapsed time is 0.303828 seconds
Luis Mendo's Solution (bsxfun): Elapsed time is 0.180215 seconds.
Luis Mendo's Solution (arrayfun): Elapsed time is 3.868467 seconds.
So the bsxfun solution is fastest, then the for-loop, followed by the arrayfun solution. Gonna generate longer v-arrays now and see if anything changes.
Edit 2
Changing v to
v = ceil(rand(100,1)*8);
resulted in a more obvious runtime ranking:
My Solution: Elapsed time is 4.020916 seconds.
Luis Mendo's Solution (bsxfun):Elapsed time is 0.808152 seconds.
Luis Mendo's Solution (arrayfun): Elapsed time is 22.126661 seconds.

package or function to count sequence lengths?

I was wondering if there is a package or generic function in R that counts sequence lengths.
For instance, if I input a sequence
s1<-c('a','a','b','a','a','a','b','b')
The proposed function F(s1,'a') would return a vector:
[2,3]
and F(s1,'b') would return [1,2]
Those madly typing people must have gone elsewhere:
s1<- c('a','a','b','a','a','a','b','b')
F1 <- function(s, el) {
  r <- rle(s)  # compute the run-length encoding once
  r$lengths[r$values == el]
}
F1(s1, "a")
#[1] 2 3
F1(s1, "b")
#[1] 1 2
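For comparison, the same run-length idea can be sketched in Python with itertools.groupby, which groups consecutive equal items much as rle does. This is an illustrative analogue rather than R code, and run_lengths is a hypothetical name.

```python
from itertools import groupby


def run_lengths(seq, element):
    """Lengths of the consecutive runs of `element` in `seq`, in order."""
    return [
        sum(1 for _ in group)        # length of this run
        for key, group in groupby(seq)
        if key == element            # keep only runs of the requested value
    ]


s1 = ["a", "a", "b", "a", "a", "a", "b", "b"]
runs_a = run_lengths(s1, "a")
runs_b = run_lengths(s1, "b")
print(runs_a)  # [2, 3]
print(runs_b)  # [1, 2]
```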
