I need to generate vectors of the following form:
[1,1,1,1,...,1]
[2,1,1,1,...,1]
[3,1,1,1,...,1]
.
.
.
[J1,1,1,1,...,1]
[1,2,1,1,...,1]
.
.
.
[1,J2,1,1,...,1]
[1,1,2,1,...,1]
.
.
.
[1,1,J3,1,...,1]
.
.
.
[1,1,1,1,1,JD]
For D = 5, this is easy to hard-code:
J = [3,4,5,3,4]
D = length(J)
H = Vector{Vector{Int8}}(undef,1+sum( J.-1 ))
cum = cumsum(J)
H[1:cum[1]] = [[i,1,1,1,1] for i=1:J[1]]
H[cum[1]+1:cum[2]-1] = [[1,j,1,1,1] for j=2:J[2]]
H[cum[2]+0:cum[3]-2] = [[1,1,k,1,1] for k=2:J[3]]
H[cum[3]-1:cum[4]-3] = [[1,1,1,l,1] for l=2:J[4]]
H[cum[4]-2:cum[5]-4] = [[1,1,1,1,m] for m=2:J[5]];
#15-element Vector{Vector{Int8}}:
# [1, 1, 1, 1, 1]
# [2, 1, 1, 1, 1]
# [3, 1, 1, 1, 1]
# [1, 2, 1, 1, 1]
# [1, 3, 1, 1, 1]
# [1, 4, 1, 1, 1]
# [1, 1, 2, 1, 1]
# [1, 1, 3, 1, 1]
# [1, 1, 4, 1, 1]
# [1, 1, 5, 1, 1]
# [1, 1, 1, 2, 1]
# [1, 1, 1, 3, 1]
# [1, 1, 1, 1, 2]
# [1, 1, 1, 1, 3]
# [1, 1, 1, 1, 4]
How can I implement it with general D?
I wrote the code as follows:
J = [3,5,4,2,5,8]
D = length(J)
H = Vector{Vector{Int8}}(undef,1+sum( J.-1 ))
cum = cumsum(J)
for d = 1:D
    if d==1
        for q = 1:J[1]
            one_hot = ones(Int8,D)
            one_hot[d] = q
            H[q] = one_hot
        end
    else
        for q = 2:J[d]
            one_hot = ones(Int8,D)
            one_hot[d] = q
            H[cum[d-1]+(2-d)+q-1] = one_hot
        end
    end
end
But I suspect there is a better way.
Do you have any ideas?
EDIT
Thank you for providing ideas.
I ran a benchmark to compare your solutions. AboAmmar's code turned out to be the fastest and to allocate the least.
using BenchmarkTools
J = [120,120,120,120,120]
@btime get_H_August(J)
# 16.900 μs (600 allocations: 79.59 KiB)
@btime get_H_Stepan(J)
# 1.733 μs (2 allocations: 23.39 KiB)
@btime get_H_AboAmmar(J)
# 705.755 ns (1 allocation: 3.06 KiB)
@btime get_H_Dan(J)
# 72.900 μs (1795 allocations: 177.88 KiB)
function get_H_August(J)
    H = typeof(J)[ones(size(J))] # first row of 1's
    sizehint!(H, 1+sum(J.-1)) # we know the final size
    for (idx, j) in enumerate(J)
        for i = 2:j
            # Place `i` at index `idx` and 1's elsewhere
            row = ifelse.(1:length(J) .== idx, i, 1)
            push!(H, row)
        end
    end
    return H
end
function get_H_Stepan(J)
    colsize = sum(J) - (length(J) - 1)
    M = fill(1, (colsize, length(J)))
    for (j, jval) in enumerate(J)
        if j == 1
            M[1:J[1], 1] .= 1:J[1]
            continue
        end
        s = sum(@view J[1:j-1]) - j + 3 # start index is
        # sum of previous J's - number of intersections + 1
        # number of intersections = length of previous J's array - 1
        # length of previous J's array is j - 1
        # so, sum - (j - 1 - 1) + 1
        f = s + (jval - 2) # final index
        M[s:f, j] .= 2:jval # filling
    end
    return M
end
function get_H_AboAmmar(J)
    l = 1
    H = ones(Int8, sum(J)-length(J)+1,length(J))
    for (i,j) in pairs(J)
        for k in 2:j
            H[l+=1,i] = k
        end
    end
    return H
end
function get_H_Dan(J)
    D = length(J)
    H = vcat([[vcat(ones(Int8,i-1),Int8(k),ones(Int8,D-i))
               for k=1+(i>1):J[i]] for i=1:D]...)
    return H
end
This can be written quite easily as an array comprehension:
julia> J = [3, 4, 5, 3, 4];
julia> l = length(J);
julia> H = [(v=ones(Int8,l);v[i]=k;v) for (i,j) in pairs(J) for k in 2:j];
julia> H = [[ones(Int8,l)]; H]
15-element Vector{Vector{Int8}}:
[1, 1, 1, 1, 1]
[2, 1, 1, 1, 1]
[3, 1, 1, 1, 1]
[1, 2, 1, 1, 1]
[1, 3, 1, 1, 1]
[1, 4, 1, 1, 1]
[1, 1, 2, 1, 1]
[1, 1, 3, 1, 1]
[1, 1, 4, 1, 1]
[1, 1, 5, 1, 1]
[1, 1, 1, 2, 1]
[1, 1, 1, 3, 1]
[1, 1, 1, 1, 2]
[1, 1, 1, 1, 3]
[1, 1, 1, 1, 4]
If you want something about 10x faster, build H as a matrix and use its rows, like this:
l = 1
H = ones(Int8, sum(J)-length(J)+1,length(J))
for (i,j) in pairs(J)
    for k in 2:j
        H[l+=1,i] = k
    end
end
I've come up with
J = [3, 4, 5, 3, 4]
colsize = sum(J) - (length(J) - 1)
M = fill(1, (colsize, length(J)))
for (j, jval) in enumerate(J)
    if j == 1
        M[1:J[1], 1] .= 1:J[1]
        continue
    end
    s = sum(@view J[1:j-1]) - j + 3 # start index is
    # sum of previous J's - number of intersections + 1
    # number of intersections = length of previous J's array - 1
    # length of previous J's array is j - 1
    # so, sum - (j - 1 - 1) + 1
    f = s + (jval - 2) # final index
    M[s:f, j] .= 2:jval # filling
end
# Your vectors
@show M[1, :]
@show M[2, :]
@show M[3, :]
@show M[4, :]
@show M[5, :]
# For debug
# for row in eachrow(M)
# println(row)
# end
My idea is to treat the desired vectors as the rows of a matrix and to fill that matrix column by column.
There are many ways to do this, of course. I think this one is readable and concise enough.
J = [3,4,5,3,4]
H = typeof(J)[ones(size(J))] # first row of 1's
sizehint!(H, 1+sum(J.-1)) # we know the final size
for (idx, j) in enumerate(J)
    for i = 2:j
        # Place `i` at index `idx` and 1's elsewhere
        row = ifelse.(1:length(J) .== idx, i, 1)
        push!(H, row)
    end
end
This version looks quite short and cute:
julia> J = [3,4,5,3,4];
julia> D = length(J);
julia> H = vcat([[vcat(ones(Int8,i-1),Int8(k),ones(Int8,D-i))
for k=1+(i>1):J[i]] for i=1:D]...)
15-element Vector{Vector{Int8}}:
[1, 1, 1, 1, 1]
[2, 1, 1, 1, 1]
[3, 1, 1, 1, 1]
[1, 2, 1, 1, 1]
[1, 3, 1, 1, 1]
[1, 4, 1, 1, 1]
[1, 1, 2, 1, 1]
[1, 1, 3, 1, 1]
[1, 1, 4, 1, 1]
[1, 1, 5, 1, 1]
[1, 1, 1, 2, 1]
[1, 1, 1, 3, 1]
[1, 1, 1, 1, 2]
[1, 1, 1, 1, 3]
[1, 1, 1, 1, 4]
I have a column in my data frame containing ascending numbers that are interrupted by zeros.
I would like to find every row that comes directly before a zero and create a new data frame containing only these rows.
My Column: 1, 2, 3, 4, 0, 0, 1, 2, 3, 4, 5, 6, 0
What I need: 4, 6
Any help would be much appreciated! Thanks!
A dplyr solution:
library(dplyr)
df %>%
filter(lead(x) == 0, x != 0)
#> x
#> 1 4
#> 2 6
Created on 2021-07-08 by the reprex package (v2.0.0)
data
df <- data.frame(x = c(1, 2, 3, 4, 0, 0, 1, 2, 3, 4, 5, 6, 0))
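If dplyr is not available, the same filter can be sketched in base R. This is only an illustration of what `lead()` does here: `nxt` and `keep` are hypothetical helper names, and the last element gets `NA` because it has no successor.

```r
# Same data as above
df <- data.frame(x = c(1, 2, 3, 4, 0, 0, 1, 2, 3, 4, 5, 6, 0))

# nxt[i] holds x[i + 1]; the last element has no successor, so it is NA
nxt <- c(df$x[-1], NA)

# Keep rows whose successor is 0 but which are not 0 themselves;
# the is.na() guard excludes the final row
keep <- !is.na(nxt) & nxt == 0 & df$x != 0
result <- df[keep, , drop = FALSE]
print(result)
```

The `x != 0` condition plays the same role as in the dplyr call: it stops the first of two consecutive zeros from being selected.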
Welcome to SO!
You can try with base R. The idea is to fetch the rownames of the rows before the 0 and subset() the df by them:
# your data
df <- data.frame(col = c(1, 2, 3, 4, 0, 0, 1, 2, 3, 4, 5, 6, 0))
# an index holding the row names of the rows just before each 0
index <- as.numeric(rownames(df)[df$col == 0]) -1
# subset the original df by index; the != 0 condition drops a 0 that precedes another 0
df_ <- subset(df, rownames(df) %in% index & col !=0)
df_
col
4 4
12 6
Using base R:
df <- data.frame(x = c(1, 2, 3, 4, 0, 0, 1, 2, 3, 4, 5, 6, 0),
y = LETTERS[1:13])
df[diff(df$x)<0,]
x y
4 4 D
12 6 L
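This works because the values only ever go down when they step from a number to a zero, so a negative difference marks the row before a zero. Note that `diff()` returns one element fewer than there are rows. A slightly more defensive sketch, assuming the same data, pads the index with `FALSE` so it has exactly one entry per row (`is_drop` and `res` are hypothetical names):

```r
df <- data.frame(x = c(1, 2, 3, 4, 0, 0, 1, 2, 3, 4, 5, 6, 0),
                 y = LETTERS[1:13])

d <- diff(df$x)              # 12 differences for 13 rows
is_drop <- c(d < 0, FALSE)   # the last row has no successor, so pad with FALSE
res <- df[is_drop, ]
print(res)
```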
Using run lengths in base R: the cumulative sum of the run lengths gives the position in x where each run ends, so the run just before each run of zeros ends at the value you want.
x <- c(1, 2, 3, 4, 0, 0, 1, 2, 3, 4, 5, 6, 0)
y <- rle(x)
x[cumsum(y$lengths)][which(y$values == 0) - 1]
# [1] 4 6
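To see why this works, here is the same computation split into steps. The intermediate names (`run_end`, `zero_run`, `before_zero`) are made up for this illustration.

```r
x <- c(1, 2, 3, 4, 0, 0, 1, 2, 3, 4, 5, 6, 0)
y <- rle(x)

run_end  <- cumsum(y$lengths)      # position in x where each run ends
zero_run <- which(y$values == 0)   # which runs consist of zeros

# End position of the run immediately before each run of zeros
before_zero <- run_end[zero_run - 1]
print(before_zero)
print(x[before_zero])
```

Because `before_zero` holds positions in `x`, it can also be used to subset a data frame by row, not just to pull out the values.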
I have a matrix
A <- matrix(1:16, nrow = 4, ncol = 4, byrow = FALSE)
I want the row-wise difference of matrix A. That is, take the element-wise difference between the first and second rows of A, then between the second and third rows, and so on. Since A ∈ R^(4×4), the resulting matrix of row-wise differences has dimension 3 × 4.
Instead of using a for-loop to iterate over the rows of A and take differences between consecutive rows, I would like to use the discrete difference operator to speed up the operation. I use sapply() to construct this difference operator B, then compute B × A to get the row-wise differences.
Let's say matrix B ∈ R^(3×4):
B <- matrix(c(-1,  1,  0,  0,
               0, -1,  1,  0,
               0,  0, -1,  1), nrow = 3, ncol = 4, byrow = TRUE)
The expected output is a matrix C ∈ R^(3×4) of all 1's.
Result_C <- matrix(c(1, 1, 1, 1,
                     1, 1, 1, 1,
                     1, 1, 1, 1), nrow = 3, ncol = 4, byrow = TRUE)
How should I proceed? And what is the difference operator for a matrix in R?
We can use diff to calculate the difference between the rows
diff(A)
# [,1] [,2] [,3] [,4]
#[1,] 1 1 1 1
#[2,] 1 1 1 1
#[3,] 1 1 1 1
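If you do want the explicit operator from the question, here is a minimal sketch of one way to build it for any number of rows and check it against `diff()`. The construction via `diag()` and matrix indexing is just one option, not the only way to build B.

```r
A <- matrix(1:16, nrow = 4, ncol = 4)
n <- nrow(A)

# (n - 1) x n operator: -1 on the main diagonal, +1 immediately to its right
B <- diag(-1, n - 1, n)
B[cbind(1:(n - 1), 2:n)] <- 1

C <- B %*% A                  # row i of C is A[i + 1, ] - A[i, ]
same <- all(C == diff(A))     # compare with the built-in
print(C)
print(same)
```

For plain row differences `diff(A)` is the simpler choice; the explicit B is mainly useful when the operator itself is needed, for example inside a larger matrix expression.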
You can also subtract whole row slices of the matrix directly:
A <- matrix(1:16, nrow = 4, ncol = 4)
A[2:(nrow(A)),]-A[1:(nrow(A)-1),]
And yes, diff(A) gives the same result here.
If I start with vector1, and test to see which items equal 1:
vector1 <- c(0, 1, 1, 1, 0, 1, 1, 1, 0, 1)
test <- which(vector1 == 1)
test now equals: 2, 3, 4, 6, 7, 8, 10
then, I want to randomly choose two of the items in test:
sample_vector <- sample(test, 2, replace = FALSE)
the above code generated a sample_vector: 6, 3
My question is how do I take sample_vector and turn it into:
vector2 <- c(0, 0, 1, 0, 0, 1, 0, 0, 0, 0)
Essentially, I want to set only the items in sample_vector to 1 and all remaining items from vector1 to 0 (so that the result looks like vector2). vector2 needs to have the same length as vector1 (10 items).
Thanks!
vector2 <- rep(0, length(vector1))
vector2[sample_vector] <- 1
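The same result can be sketched as a single expression by testing each position for membership in `sample_vector`. The draw `c(6, 3)` is hard-coded from the question so that the example is reproducible.

```r
vector1 <- c(0, 1, 1, 1, 0, 1, 1, 1, 0, 1)
sample_vector <- c(6, 3)   # the example draw from the question

# TRUE where the position was sampled, then converted to 1/0
vector2 <- as.numeric(seq_along(vector1) %in% sample_vector)
print(vector2)
```

Both versions give a vector of the same length as `vector1`; the two-line form is probably easier to read, while this one avoids the intermediate assignment.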
set.seed(44)
vector1 <- c(0, 1, 1, 1, 0, 1, 1, 1, 0, 1)
test <- which(vector1 == 1)
sample_vector <- sample(test, 2, replace = FALSE)
sample_vector
#[1] 8 3
replace(tabulate(seq_along(vector1)) - 1, sample_vector, 1)
#[1] 0 0 1 0 0 0 0 1 0 0
Use this code:
vector2 <- rep(0, length(vector1))
vector2[sample_vector] = 1