Results of @view in an array? - julia

I have a 121x2 matrix named AgeValues, which has Age in column 1 and a corresponding age value in column 2. I'm using @view to create a subset of AgeValues for each age in a 165-element data set.
AgeValues = Matrix(AgeValues)
function AgeValuesX(i)
    @view AgeValues[i+1:end,2]
end
Values_ageX = AgeValuesX.(age)
This is returning the data I need, but in a 165x1 vector (with the results comma delimited in each row of the vector). I'm trying to convert this into a 165x121 array.
My problem is that the elements all have different lengths. I've tried reduce(vcat, ...):
reduce(vcat, transpose(Values_ageX))
ERROR: ArgumentError: number of columns of each array must match (got (29, 42))
I've tried reshape:
reshape(Values_ageX, 165, 121)
ERROR: DimensionMismatch("new dimensions (165, 121) must be consistent with array size 165")
I've tried resizing the @view:
AgeValues = Array(AgeValues)
function AgeValuesX(i)
    resize!(@view AgeValues[i+1:end,2],121)
end
ERROR: LoadError: ArgumentError: Invalid use of @view macro: argument must be a reference expression A[...]
Any suggestions? I need this to be an array, so I can use the exp.() function on the results - this doesn't seem to work on my 165x1 vector (ERROR: MethodError: no method matching exp(::SubArray{Float64, 1, Matrix{Float64}, Tuple{UnitRange{Int64}, Int64}, true})) - I need to take the exp of each value in each element. Thank you!
EDIT to add more info:
AgeValues is a DataFrame read from a CSV file before I convert it to a matrix. Even if I don't convert it to a matrix, once I use the function, it outputs a vector.
The first few lines of Values_ageX look like this:
165-element Vector{SubArray{Float64, 1, Vector{Float64}, Tuple{UnitRange{Int64}}, true}}:
[0.228571429, 0.2, 0.171428571, 0.142857143, 0.114285714, 0.085714286, 0.057142857, 0.028571429, 0.0, 0.0 … 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
[0.6, 0.571428571, 0.542857143, 0.514285714, 0.485714286, 0.457142857, 0.428571429, 0.4, 0.371428571, 0.342857143 … 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
The number of values in each element is 121 less age. I want to be able to resize! each row to 121 (padding with zeros, which resize! will do for me), which will give me the 165x121 matrix.
My end goal is for this vector to be output as an array so I can multiply each value by another value in another 165x121 array.

I'm still unsure exactly what you want. I assume that the variable age is a vector with 165 elements? You don't state this, forcing everyone to guess.
One should always start questions with a complete, small, example and the desired output:
agevalues = [0 0.9;
1 0.8;
2 0.7;
3 0.6]
# I can only assume this contains duplicate values since it's supposed to be longer:
ages = [0, 1, 2, 2, 1, 3]
# Desired output???
agematrix = [0.9 0.8 0.7 0.6;
0.8 0.7 0.6 0.0;
0.7 0.6 0.0 0.0;
0.7 0.6 0.0 0.0;
0.8 0.7 0.6 0.0;
0.6 0.0 0.0 0.0]
The @view macro replaces what would have been a (potentially) costly copy of the data with an indirect view onto the same data, for which changing the layout doesn't make any sense. (I incorrectly claimed earlier that views are read-only; you can of course modify the contents they view if desired.)
Secondly, resize! does not initialize the new elements, so there wouldn't be any zero padding. We can't use it on a view anyway, since a view can't be resized, and we don't want uninitialized data either: you wanted zeros.
A good general strategy for loops and broadcasts is to only think of the single case first:
What do we want from a single age value?
Well, the sliced vector, padded with extra zeros.
With minimal modifications to your code, we pad it with zeros:
function AgeValuesX(i)
    vcat((@view AgeValues[i+1:end,2]), zeros(i))
end
This creates a new array that matches a single row of our final data.
The rest of your code then just works.
(There are faster ways to achieve this, avoiding extra allocations, by simply assembling into a pre-allocated 165x121 matrix directly.)
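The pre-allocation idea mentioned above could look roughly like the following. This is only a sketch: the function name `agematrix` and the small stand-in data are made up for illustration, and it assumes every age is an integer with `0 <= age < size(AgeValues, 1)`.

```julia
# Hypothetical helper: fill a zero matrix row by row instead of
# building one padded vector per age.
function agematrix(AgeValues::AbstractMatrix, ages::AbstractVector{<:Integer})
    nvals = size(AgeValues, 1)
    # Starting from zeros means the padding is already in place
    out = zeros(eltype(AgeValues), length(ages), nvals)
    for (row, age) in enumerate(ages)
        len = nvals - age   # number of values from this age onward
        out[row, 1:len] .= @view AgeValues[age+1:end, 2]
    end
    return out
end

# Small stand-in for the 121x2 matrix and the 165 ages
AgeValues = [0 0.9; 1 0.8; 2 0.7; 3 0.6]
ages = [0, 1, 2, 2, 1, 3]
M = agematrix(AgeValues, ages)
E = exp.(M)   # elementwise functions broadcast over a plain Matrix
```

Since the result is an ordinary `Matrix{Float64}`, it can also be multiplied elementwise with another matrix of the same size using `.*`.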

Related

How to get a value from a vector including a range in Julia

Coming from R I am used to do something like this to get the first element of vector a:
a <- c(1:3, 5)
a[1]
[1] 1
How can I get the 1 in Julia? The first element of a is now a range.
a = [1:3, 5]
a[1]
1-element Array{UnitRange{Int64},1}:
1:3
The core problem here is that c(1:3, 5) in R and [1:3, 5] in Julia do not do the same thing. The R code concatenates a vector with an integer producing a vector of four integers:
> c(1:3, 5)
[1] 1 2 3 5
The Julia code constructs a two-element vector whose elements are the range 1:3 and the integer 5:
julia> [1:3, 5]
2-element Vector{Any}:
1:3
5
julia> map(typeof, ans)
2-element Vector{DataType}:
UnitRange{Int64}
Int64
This vector has element type Any because there's no smaller useful common supertype of a range and an integer. If you want to concatenate 1:3 and 5 together into a vector you can use ; inside of the brackets instead of ,:
julia> a = [1:3; 5]
4-element Vector{Int64}:
1
2
3
5
Once you've defined a correctly, you can get its first element with a[1] just like in R. In general inside of square brackets in Julia:
Comma (,) is only for constructing vectors of the exact elements given, much like in Python, Ruby, Perl or JavaScript.
If you want block concatenation like in R or Matlab, then you need to use semicolons/newlines (; or \n) for vertical concatenation and spaces for horizontal concatenation.
The given example of [1:3; 5] is a very simple instance of block concatenation, but there are significantly more complex ones possible. Here's a fancy example of constructing a block matrix:
julia> using LinearAlgebra
julia> A = rand(2, 3)
2×3 Matrix{Float64}:
0.895017 0.442896 0.0488714
0.750572 0.797464 0.765322
julia> [A' I
0I A]
5×5 Matrix{Float64}:
0.895017 0.750572 1.0 0.0 0.0
0.442896 0.797464 0.0 1.0 0.0
0.0488714 0.765322 0.0 0.0 1.0
0.0 0.0 0.895017 0.442896 0.0488714
0.0 0.0 0.750572 0.797464 0.765322
To explain this example a bit:
A is a 2×3 random matrix of Float64 elements
A' is the adjoint (conjugate transpose) of A
I is a variable size unit diagonal operator
0I is similar but the diagonal scalar is zero
These are concatenated together to form a single 5×5 matrix of Float64 elements where the upper left and lower right parts are filled from A' and A, respectively, while the lower left is filled with zeros and the upper right is filled with the 3×3 identity matrix (i.e. zeros with diagonal ones).
In this case, your a[1] is a UnitRange collection. If you want to access an individual element of it, you can use collect.
For example, for the first element:
collect(a[1])[1]
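Ranges also support indexing directly, so the `collect` step is optional. A brief sketch, reusing `a` as defined in the question:

```julia
a = [1:3, 5]            # two elements: the range 1:3 and the integer 5

x1 = a[1][1]            # index into the range directly, no copy needed
x2 = first(a[1])        # same result via first
x3 = collect(a[1])[1]   # works too, but allocates a temporary Vector
```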

Julia: broadcasting `findfirst()` across rows of a matrix

I want to find the index of the first value in each row of a matrix that satisfies some condition. I want to figure out how to do this without using array comprehensions.
This is how I would do it with an array comprehension:
# let's say we want to find the first column index, per row, where a number in that row is below some threshold.
threshold = 0.5;
data = randn(50,100);
first_threshold_crossings = [findfirst(data[i,:] .< threshold) for i in 1:size(data,1)];
Yielding a list of indices that tells you where (column-wise) each row has a value that first drops below the threshold, going from left to right.
Any faster way you can imagine doing this?
Here's how you can do it:
julia> using Random # For RNG reproducibility
julia> A = rand(Random.MersenneTwister(0), 3, 3)
3×3 Array{Float64,2}:
0.823648 0.177329 0.0423017
0.910357 0.27888 0.0682693
0.164566 0.203477 0.361828
julia> findfirst.(x -> x < 0.1, eachrow(A))
3-element Array{Union{Nothing, Int64},1}:
3
3
nothing
Note that findfirst returns nothing if no index satisfies the condition.
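Because the result can contain `nothing`, downstream code usually has to handle that case. A small sketch, assuming that 0 is an acceptable sentinel for "no crossing" (that choice and the sample data are mine, not from the question) and that Julia 1.2 or later is available for the `<(threshold)` shorthand:

```julia
data = [0.8 0.2 0.04;
        0.9 0.3 0.07;
        0.2 0.2 0.4]
threshold = 0.1

# <(threshold) is shorthand for x -> x < threshold
crossings = findfirst.(<(threshold), eachrow(data))

# Replace `nothing` with the sentinel 0 to get a plain integer vector
crossings_or_zero = something.(crossings, 0)
```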

Assign values to Julia Matrix based on Condition

I have a matrix of zeros A which has dimension (m x n). I have another matrix of some integer values b. b has length n. I want to have A be set to the identity wherever b has values greater than 5. So basically, for every row of A where b has value greater than 5, set it to the identity.
I tried to do this, but it's not working. Does anyone have an idea of how to do this in Julia?
using LinearAlgebra
usable_values = filter((x) -> x > 5, b)
# A[:, usable_values] = I
A[:, b .> 5] = I
I'm not certain I understand what you mean by "set to the identity": the identity matrix must be square, and hence a row or column of a matrix can't be equal to the identity matrix. I'll operate under the assumption that you want the entries to have value 1. In that case,
A[:, findall(b .> 5)] .= 1
is a simple one-liner. Let's discuss the elements here:
As proposed above, filter will select out the elements of b bigger than 5. But you want the indices of those elements, for which findall is the appropriate function.
Note the use of broadcasted assignment, .=. This means to assign the RHS to each element of the left side. That way you don't need to create a matrix on the RHS.
The loop approach is fine too, but for reasons of performance I'd put that in a function. See the performance tips.
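Wrapping the loop in a function, as suggested for performance, might look like this. It is a minimal sketch: the name `mark_columns!` and the sample data are assumptions for illustration, and it follows the reading above that the selected entries should be set to 1.

```julia
# Hypothetical helper: set column j of A to 1 wherever b[j] exceeds cutoff
function mark_columns!(A::AbstractMatrix, b::AbstractVector, cutoff)
    size(A, 2) == length(b) ||
        throw(DimensionMismatch("b needs one entry per column of A"))
    for j in axes(A, 2)
        if b[j] > cutoff
            A[:, j] .= 1   # broadcasted assignment, no temporary array
        end
    end
    return A
end

A = zeros(2, 4)
b = [3, 7, 5, 9]
mark_columns!(A, b, 5)
```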
If what you need is "for every row of A where b has value greater than 5, set it to the identity", this might be helpful to you while you wait for one of the gurus here to write the same thing in one line of code :)
n = 2
m = 5
A = zeros(m, n)
b = rand(1:10, m)
println(b)
for (cnt, value) in enumerate(b)
    if value > 5
        A[cnt, :] = ones(1, n)
    end
end
A
The result I get is:
b = [4, 2, 6, 8, 1]
5×2 Array{Float64,2}:
0.0 0.0
0.0 0.0
1.0 1.0
1.0 1.0
0.0 0.0
I am fairly new to the language, this is the best I can do to help, for now.

Assigning specific values to a boolean array

Say I am tossing a fair coin where 'tails' is assigned the value x = -1/2 and 'heads' is assigned x = 1/2.
I do this N times and I want to obtain the sum. This is what I have tried:
p = 0.5
N = 1e4
X(N,p)=(rand(N).<p)
I know this is incomplete but when I check (rand(N).<p) I see an array consisting of true, false. I interpret this as 'Tails' or 'Heads'. However, I don't know how to assign the values 1/2 and -1/2 to each of these elements in order for me to find the sum. If I simply use sum((rand(N).<p)) I do get an integer value, but I don't think this is the right way to do it because I haven't specified the values 1/2 and -1/2 anywhere.
Any help is greatly appreciated.
As indicated by the comments already, you want to do
sum(rand([-0.5, 0.5], N))
where N must be an integer (you wrote N=1e4, therefore typeof(N) == Float64 and rand won't work).
The documentation of rand (obtained by ?rand) describes what rand(S, N) does:
Pick a random element or array of random elements from the set of
values specified by S
Here, S can be an optional indexable collection, an array of values in your case (or a type like Int). So, above S = [-0.5, 0.5] and rand draws N random elements from this collection, which we can afterwards sum up.
Assigning specific values to a boolean array
Since this is the title of your question, and the answer above doesn't actually address this, let me comment on this as well.
You could do sum((rand(N) .< p) .- 0.5), i.e. you shift all the ones to 0.5 and all the zeros to -0.5 to get the wanted result (note the dotted operators, which current Julia versions require for elementwise operations on arrays). This is a general strategy: say you want true to be a and false to be b, where a and b are numbers. You achieve this with (rand(N) .< p) .* (a - b) .+ b.
However, beyond being more "complicated", sum((rand(N) .< p) .- 0.5) will allocate temporary arrays: first the random numbers from rand(N), then the shifted values that eventually go into sum. Because of these unnecessary allocations this approach will be slower than the solution above.
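If the mapping from outcomes to values should stay explicit without building intermediate arrays, one option is to pass a function to `sum`. The following is a rough sketch; `coin_sum` and its keyword names are invented for illustration.

```julia
# Hypothetical helper: sum of N tosses where an outcome with probability p
# counts as a and the other outcome counts as b. No arrays are allocated.
function coin_sum(N::Integer, p::Real; a = 0.5, b = -0.5)
    return sum(_ -> rand() < p ? a : b, 1:N)
end

N = 10^4               # an integer, unlike 1e4
s = coin_sum(N, 0.5)
```

Each toss is drawn and mapped to its value on the fly, so memory use does not grow with `N`.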

How can I sort a vector based on the indices contained in a different vector?

I have a vector that I would like to sort based on the indices contained in another vector. For instance, if I have these vectors:
x <- c(0.4, 0.8, 0.1, 0.2) #<--values to be sorted
y <- c(3,1,4,2)# <--indices to base the sorting
Vector y will always have distinct values from 1 to the length of x (and therefore, both vectors will always have the same number of elements)
The expected vector would be:
0.8,0.2,0.4,0.1
Or use order
x[order(y)]
## [1] 0.8 0.2 0.4 0.1
Try rev(x[y]) to get your expected output.
