Fill out multi-dimensional array using Julia - jupyter-notebook

I am trying to fill a multi-dimensional array. For example, X[1] should be a vector containing all values k*h[1] where k = 0, ..., floor(Int, 15/h[1])+1. I have not managed to solve this problem.
`h=[0.01 0.02 0.04 0.08 0.1 0.2 0.5 0.8]
X=[k*h[i] for k in 0:floor(Int,15/h[i])+1 for i in 1:8]`
I got this error

You can get a Vector of Vectors instead of a 2D Array, as noted by @PrzemyslawSzufel. This is because X[1] has a different length from X[2], which has a different length from X[3], and so on.
The real error, then, is that i is not defined. In a comprehension with two for clauses, the first clause is the outer loop, so the range 0:floor(Int,15/h[i])+1 is evaluated before i (introduced by the second clause, for i in 1:8) exists. To solve this, we can wrap the first comprehension in its own pair of brackets so that it becomes a single element of the outer comprehension over i. We then get X[1] as a vector containing all values k*h[1], and so on.
Note: In Julia, h = [0.01 0.02 ...] is a 1×8 Matrix{Float64} and not a Vector, so you should use h = [0.01, 0.02, ...] for a Vector. Also, the range 0:floor(Int,15/h[i])+1 is similar to 0:15/h[i]+1 because the default step is 1, except that the second range has Float64 values. Note that the code below starts k at 1 rather than 0, so X[1] begins at 0.01; use 0:15/h[i]+1 to include k = 0 as in the question.
h = [0.01, 0.02, 0.04, 0.08, 0.1, 0.2, 0.5, 0.8];
X = [[k * h[i] for k in 1:15/h[i]+1] for i in 1:8];
X[1]
1501-element Vector{Float64}:
0.01
0.02
0.03
0.04
0.05
0.06
0.07
...

Related

Divide rows into groups given the similarity between them

Given this example data frame:
DF <- data.frame(x = c(1, 0.85, 0.9, 0, 0, 0.9, 0.95),
y = c(0, 0, 0.1, 0.9, 1, 0.9, 0.97),
z = c(0, 0, 0, 0.9, 0.9, 0.0, 0.9 ))
I am trying to assign each row to a group containing rows adjacent to one another, based on their similarity. I would like to use a cutoff of 0.35, meaning that consecutive rows of values c(1, 0.85, 0.7) can be assigned to one group, but c(0, 1, 0) cannot.
Regarding the columns, column-to-column differences are not important i.e. c(1, 1, 1) and c(0, 0, 0) could still be assigned to one group, HOWEVER, if rows in one column meet the criteria (e.g. c(1, 1, 1)) but the rows in another column(s) do not (e.g. c(1, 0, 1)) - the row is invalid.
Here is the desired output for the example I gave above:
[1] 1 1 1 2 2 NA NA
I am currently applying the abs(diff()) function to determine the difference between the values, and then for each row I take the largest value (adding 1 at the beginning to account for the first row):
diff <- apply(DF, MARGIN = 2, function (x) abs(diff(x)))
max_diff <- c(1, apply(diff, MARGIN = 1, function (x) max(x, na.rm = T)))
max_diff
[1] 1.00 0.15 0.10 0.90 0.10 0.90 0.90
I am stuck at this point and not sure of the best way to proceed with the group assignment. I initially tried converting max_diff into a logical vector (max_diff < 0.35) and then running a for loop that groups all the TRUEs together. This has a couple of problems:
My dataset has millions of rows, so the for loop takes ages,
I "ignore" the first component of the group - e.g. I would not consider the first row as a member of the first group, because the max_diff value of 1 gives FALSE. I don't want to ignore anything.
I will be very grateful for any advice on how to proceed in an efficient way.
PS. The way of determining the difference between sites is not crucial - here it is just a difference of 0.35 but this is very flexible. All I am after is an adjustable method of finding similar rows.
You could do a cluster analysis and play around with different cutoffs h.
cl <- hclust(dist(DF))
DF$group <- cutree(cl, h=.5)
DF
# x y z group
# 1 1.00 0.00 0.0 1
# 2 0.85 0.00 0.0 1
# 3 0.90 0.10 0.0 1
# 4 0.00 0.90 0.9 2
# 5 0.00 1.00 0.9 2
# 6 0.90 0.90 0.0 3
# 7 0.95 0.97 0.9 4
A dendrogram helps to determine h.
plot(cl)
abline(h=.5, col=2)
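Clustering ignores row order, so it does not enforce the adjacency requirement from the question. A minimal vectorised sketch that does, assuming "similar" means that every column changes by less than the cutoff between one row and the next, is shown below. The names cutoff, step, starts, run and group are illustrative and not from the original posts.

```r
DF <- data.frame(x = c(1, 0.85, 0.9, 0, 0, 0.9, 0.95),
                 y = c(0, 0, 0.1, 0.9, 1, 0.9, 0.97),
                 z = c(0, 0, 0, 0.9, 0.9, 0.0, 0.9))
cutoff <- 0.35

# Largest absolute change from the previous row, taken over all columns
step <- apply(abs(diff(as.matrix(DF))), 1, max)

# A row starts a new run when it differs too much from the row before it;
# the first row always starts a run, so it is never ignored
starts <- c(TRUE, step >= cutoff)
run <- cumsum(starts)

# Runs of length 1 have no similar neighbour, so they become NA
run_len <- ave(run, run, FUN = length)
group <- ifelse(run_len > 1, run, NA)
group
#> [1]  1  1  1  2  2 NA NA
```

There is no explicit loop, so this should scale to millions of rows. Group ids are not renumbered after a singleton run is dropped, so ids can skip values on other data.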

need to write a new random generator in R

I need to generate 7 random numbers between -1 and 1 whose sum equals 1. I used this code to do so.
diff(c(0, sort(round(runif(7,-1,1),2)), 1))
But I have a big problem with this.
One output of this code is -0.89, 0.21, 0.00, 0.21, 0.30, 0.19, 0.61, -0.63.
The problem is that it is uniform, I guess, so every time it generates large numbers in the first and last positions, which is not what I want. I need the values to be spread across all positions.
ex. 0.22 -.21 .33 -.12 0.11 0.35 -0.08 (the sum is not equal to 1 just an example)
Do you know how I can write code to get this kind of random numbers?
Your general idea is probably inspired by the answers linked in the random description. The standard problem is how to generate 7 numbers between 0 and 1 that add to 1. The answer is:
diff(c(0, sort(runif(6, 0, 1)), 1))
#> [1] 0.27960792 0.02035231 0.02638626 0.09945877 0.25134002 0.03379598 0.28905874
The necessary modifications for getting numbers between -1 and 1 are quite simple; just leave out the sort:
diff(c(0, runif(6, 0, 1), 1))
#> [1] 0.9961661 -0.6528227 0.5298829 -0.2087127 -0.2298045 0.2017705 0.3635203
How does this work? We again partition the space between zero and one. But by leaving out the sort, we allow for the possibility of going backward, i.e. negative numbers are possible. Here is the histogram for 1000 generations: [histogram image not included]
One weakness in this approach is that the first and last numbers are necessarily positive. If this bothers you, you can additionally shuffle the result with sample(), e.g.:
sample(diff(c(0, runif(6, 0, 1), 1)), 7)
#> [1] -0.004242793 -0.725348335 0.385971491 0.320525822 0.389915347
#> [6] 0.053195271 0.579983197
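The two properties the question asks for can be checked directly. The sketch below repeats the shuffled construction and tests that the seven values sum to 1 (the differences telescope to 1 - 0) and that each lies strictly between -1 and 1. The seed is arbitrary and only there for reproducibility.

```r
set.seed(1)  # arbitrary seed, an assumption for reproducibility only

# Differences of unsorted cut points in (0, 1), then shuffled
x <- sample(diff(c(0, runif(6, 0, 1), 1)), 7)

length(x)                     # 7 values
isTRUE(all.equal(sum(x), 1))  # the sum is 1 up to floating-point error
all(abs(x) < 1)               # every value lies in (-1, 1)
```

The sum is compared with all.equal() rather than == because the floating-point sum can differ from 1 in the last bits.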
Here are two possible solutions. Both use a loop that repeats until a valid sample is found:
Solution 1:
You can draw 6 random numbers and make the 7th dependent on them so that the sum is 1. However, the dependent element may end up greater than 1 or less than -1, so we cannot accept every draw.
while (TRUE) {
  res <- runif(6, -1, 1)
  res <- append(res, 1 - sum(res))  # 7th value makes the sum equal 1
  if (all(abs(res) <= 1))           # reject draws with a value outside [-1, 1]
    break
}
res
Output is:
-0.34038038 0.15811401 -0.20748670 0.26443104 0.45216639 -0.09912685 0.77228248
Solution 2:
We continuously generate new candidates and hope to hit a valid one. To reduce the running time, we round the random numbers to 1 digit:
while (TRUE) {
  res <- round(runif(7, -1, 1), digits = 1)
  print(sum(res))
  if (isTRUE(all.equal(sum(res), 1)))  # tolerant comparison avoids floating-point misses
    break
}
res
Output:
> res
[1] -0.6 0.2 0.4 0.7 -0.2 0.6 -0.1
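This is similar to Salman Lashkarara's solution. You need to round the numbers to find a solution. Note that the code as posted stops when the sum is 0, not the sum of 1 asked for in the question.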
library(magrittr)
set.seed(42)
x <- 1
while(sum(x) != 0){
x <- runif(7,-1,1) %>%
round(3)
}
x
#> [1] 0.155 0.559 -0.335 -0.230 -0.490 -0.557 0.898
sum(x)
#> [1] 0
Created on 2018-09-16 by the [reprex package](http://reprex.tidyverse.org) (v0.2.0).

R dataframe with nested vector

Background
In R this works:
> df <- data.frame(a=numeric(), b=numeric())
> rbind(df, list(a=1, b=2))
a b
1 1 2
But if I want the list to have a vector, rbind fails:
> df <- data.frame(a=numeric(), b=vector(mode="numeric"))
> rbind(df, list(a=1, b=c(2,3)))
Error in rbind(deparse.level, ...) :
invalid list argument: all variables should have the same length
And if I try to specify the vector length, declaring the dataframe fails:
> df <- data.frame(a=numeric(), b=vector(mode="numeric", length=2))
Error in data.frame(a = numeric(), b = vector(mode = "numeric", length = 2)) :
arguments imply differing number of rows: 0, 2
Finally, if I eschew declaring the dataframe and try rbind two lists directly, it looks like everything is working, but the datatypes are all wrong, and none of the columns appear to exist.
> l1 <- list(a=1, b=c(2,3))
> l2 <- list(a=10, b=c(20,30))
> obj <- rbind(l1, l2)
> obj
a b
l1 1 Numeric,2
l2 10 Numeric,2
> typeof(obj)
[1] "list"
> obj$a
NULL
> obj$b
NULL
> names(obj)
NULL
My setup
I have an embedded device that gathers data every 50ms and spits out a packet of data. In my script, I'm parsing a waveform that represents the states of that process (process previous frame and transmit, gather new data, dead time where nothing happens) with a state machine. For each packet I'm calculating the duration of the process period, the duration of the data-gathering period (which is subdivided into 8 or 16 acquisition cycles, each of which I time separately), and the remaining dead time.
My list basically looks like `list(process=#, cycles=c(#,#,#,#), deadtime=#)`. Different packet types have different cycle lengths, so I pass that in as a parameter and I want the script to work on any packet type.
My question
Is there a way to declare a dataframe that does what I want, or am I using R in a fundamentally wrong way, and should I break each cycle into its own list element? I was hoping to avoid the latter as it will make treating the cycles as a group more difficult.
I will note that I've just started learning R so I'm probably doing some odd things with it.
Expected output
If I were to process 4 packets worth of signal with 3 acq. cycles each, this would be my ideal output:
df <- data.frame(processTime=numeric(), cycles=???, deadtime=numeric())
df <- rbind(df, list(processTime=0.05, cycles=c(0.08, 0.10, 0.07), deadtime=0.38))
etc...
processTime cycles deadtime
1 0.05 0.08 0.10 0.07 0.38
2 0.06 0.07 0.11 0.09 0.36
3 0.07 0.28 0.11 0.00 0.00
4 0.06 0.08 0.08 0.09 0.41
I'll take a different stab. Dealing with just your first 2 records.
processTime<-c(.05,.06)
cycles<-list(list(.08,.10,.07), list(.07,.09,.38))
deadtime<-c(.38,.36)
For cycles, we created a list in which each element is itself a list of 3 elements. So cycles[[1]][1] refers to .08, cycles[[1]][2] refers to the second element of the first list, and cycles[[2]][3] refers to the 3rd item in the second list.
If we use cbind to bind these we get the following:
test<-as.data.frame(cbind(processTime,cycles,deadtime))
test
processTime cycles deadtime
1 0.05 0.08, 0.10, 0.07 0.38
2 0.06 0.07, 0.09, 0.38 0.36
test$cycles[[1]] returns the first list:
test$cycles[[1]]
[[1]]
[[1]][[1]]
[1] 0.08
[[1]][[2]]
[1] 0.1
[[1]][[3]]
[1] 0.07
Whereas the 3rd element of the second list can be called with:
test$cycles[[2]][3]
[[1]]
[1] 0.38
You can also unlist later for calculations:
unlist(test$cycles[[2]])
[1] 0.07 0.09 0.38
To do this iteratively as you requested.
test<-data.frame()
processTime<-c(.05)
cycles<-list(list(.08,.10,.07))
deadtime<-c(.38)
test<-as.data.frame(cbind(processTime,cycles,deadtime))
test
processTime cycles deadtime
1 0.05 0.08, 0.10, 0.07 0.38
processTime<-c(.06)
cycles<-list(list(.07,.09,.38))
deadtime<-c(.36)
test<- rbind(test,as.data.frame(cbind(processTime,cycles,deadtime)))
test
processTime cycles deadtime
1 0.05 0.08, 0.10, 0.07 0.38
2 0.06 0.07, 0.09, 0.38 0.36
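A different option, not used in the answer above, is a list column that stores each packet's cycles as a plain numeric vector, so no unlist() is needed later. The sketch below is a minimal illustration under that assumption: it collects one list per packet (values taken from the first two rows of the expected output in the question) and builds the data frame once at the end. The name rows is illustrative.

```r
# One list per packet, e.g. appended inside the parsing loop
rows <- list()
rows[[1]] <- list(processTime = 0.05, cycles = c(0.08, 0.10, 0.07), deadtime = 0.38)
rows[[2]] <- list(processTime = 0.06, cycles = c(0.07, 0.11, 0.09), deadtime = 0.36)

# Scalar fields become ordinary numeric columns
df <- data.frame(processTime = sapply(rows, function(r) r$processTime),
                 deadtime    = sapply(rows, function(r) r$deadtime))

# The cycles become a list column: one numeric vector per row
df$cycles <- lapply(rows, function(r) r$cycles)

df$cycles[[2]]           # numeric vector for the second packet
sapply(df$cycles, mean)  # treat the cycles of each packet as a group
```

Because each cell of cycles is an ordinary vector, packets with 8 or 16 cycles can sit in the same column. Building the data frame once at the end also avoids growing it row by row with rbind.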

Generating random variables with specific correlation threshold value

I am generating random variables with a specified range and dimension. I have written the following code for this:
generateRandom <- function(size,scale){
result<- round(runif(size,1,scale),1)
return(result)
}
flag=TRUE
x <- generateRandom(300,6)
y <- generateRandom(300,6)
while(flag){
corrXY <- cor(x,y)
if(corrXY>=0.2){
flag=FALSE
}
else{
x <- generateRandom(300,6)
y <- generateRandom(300,6)
}
}
I want the following 6 variables of size 300, all on a scale from 1 to 6 except for one variable on a scale from 1 to 7, with the following correlation structure among them:
1 0.45 -0.35 0.46 0.25 0.3
1 0.25 0.29 0.5 -0.3
1 -0.3 0.1 0.4
1 0.4 0.6
1 -0.4
1
But when I try to increase the threshold value, my program gets very slow. Moreover, I want more than 7 variables of size 300, with a specific correlation threshold between each pair of variables. How can I do this efficiently?
This answer is directly inspired from here and there.
We would like to generate 300 samples of a 6-variate uniform distribution with correlation structure equal to
Rhos <- matrix(0, 6, 6)
Rhos[lower.tri(Rhos)] <- c(0.450, -0.35, 0.46, 0.25, 0.3,
0.25, 0.29, 0.5, -0.3, -0.3,
0.1, 0.4, 0.4, 0.6, -0.4)
Rhos <- Rhos + t(Rhos)
diag(Rhos) <- 1
From this target correlation structure we first derive the correlation matrix of the Gaussian copula:
Copucov <- 2 * sin(Rhos * pi/6)
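The factor in this line appears to come from a standard identity for the bivariate normal distribution, stated here without proof. The symbols $r$, $\rho$, $Z_i$, $U_i$ and $\Phi$ are notation introduced for this note and do not appear in the original answer. If $(Z_1, Z_2)$ is standard bivariate normal with correlation $r$ and $U_i = \Phi(Z_i)$, then each $U_i$ is uniform on $(0, 1)$ and

$$
\rho = \operatorname{cor}(U_1, U_2) = \frac{6}{\pi} \arcsin\!\left(\frac{r}{2}\right)
\quad\Longleftrightarrow\quad
r = 2 \sin\!\left(\frac{\pi \rho}{6}\right).
$$

Applying the right-hand formula entrywise to the target matrix Rhos gives Copucov. For $\rho = 1$ it returns $2 \sin(\pi/6) = 1$, so the diagonal stays at 1.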
This matrix is not positive definite, so we use the nearest positive definite matrix instead:
library(Matrix)
Copucov <- cov2cor(nearPD(Copucov)$mat)
This correlation structure can be used as one of the inputs of MASS::mvrnorm:
G <- MASS::mvrnorm(n=300, mu=rep(0,6), Sigma=as.matrix(Copucov), empirical=TRUE)
We then transform G into a multivariate uniform sample whose values range from 1 to 6, except for the last variable which ranges from 1 to 7:
U <- matrix(NA, 300, 6)
U[, 1:5] <- 5 * pnorm(G[, 1:5]) + 1
U[, 6] <- 6 * pnorm(G[, 6]) + 1
After rounding (and taking the nearest positive definite matrix to the copula's covariance matrix etc.), the correlation structure is not changed much:
Ur <- round(U, 1)
cor(Ur)

Easily input a correlation matrix in R

I have an R script I'm running now that is currently using 3 correlated variables. I'd like to add a 4th, and am wondering if there's a simple way to input matrix data, particularly for correlation matrices---some Matlab-like technique to enter a correlation matrix, 3x3 or 4x4, in R without the linear to matrix reshape I've been using.
In Matlab, you can use the semicolon as an end-row delimiter, so it's easy to keep track of where the cross correlations are.
In R, where I first create
corr <- c(1, 0.1, 0.5,
0.1, 1, 0.9,
0.5, 0.9, 1)
cormat <- matrix(corr, ncol=3)
Versus
cormat = [1 0.1 0.5;
0.1 1 0.9;
0.5 0.9 1]
It just feels clunkier, which makes me suspect there's a smarter way I haven't looked up yet. Thoughts?
Welcome to the site! :) you should be able to do it in one step:
MyMatrix = matrix(
c(1, 0.1, 0.5,
0.1, 1, 0.9,
0.5, 0.9, 1),
nrow=3,
ncol=3)
Here is another way:
CorrMat <- matrix(scan(),3,3,byrow=TRUE)
1 0.1 0.5
0.1 1 0.9
0.5 0.9 1
The trailing blank line is important: it tells scan() that the input has ended.
If you want to input a symmetric matrix, you can use the xpnd() function in the MCMCpack library.
xpnd() takes a vector which corresponds to the upper-triangle of the matrix (thus you only have to enter each value once). For instance, if you want to input:
$\left(\begin{array}{c c c}
1 & 0.1 & 0.5 \\
0.1 & 1 & 0.9 \\
0.5 & 0.9 & 1
\end{array}\right)$
You would use
library(MCMCpack)
xpnd(c(1, 0.1, 0.5, 1, 0.9, 1), 3)
where 3 refers to the number of rows in the matrix.
Help page for xpnd.
rbind(c(1, 0.1, 0.5),
c(0.1, 1, 0.9),
c(0.5, 0.9, 1))
Some of the existing solutions may only work for a 3x3 matrix. I tried this one:
a<-diag(3)
m<-diag(3)
m[lower.tri(m,diag=F)]<-c(0.1, 0.5, 0.9)
m<-m+t(m)-a
As you are working with correlation matrices, you are probably not interested in entering the diagonal, and both the upper and lower parts. You can manipulate/extract those three parts separately using diag(), upper.tri() and lower.tri().
> M <- diag(3) # create 3x3 matrix, diagonal defaults to 1's
> M[lower.tri(M, diag=F)] <- c(0.1, 0.5, 0.9) # read in lower part
> M # lower matrix containing all information
[,1] [,2] [,3]
[1,] 1.0 0.0 0
[2,] 0.1 1.0 0
[3,] 0.5 0.9 1
If you want the full matrix:
> M[upper.tri(M)] <- t(M)[upper.tri(M)] # mirror lower part into upper part (works for any size)
> M # full matrix
[,1] [,2] [,3]
[1,] 1.0 0.1 0.5
[2,] 0.1 1.0 0.9
[3,] 0.5 0.9 1.0
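Since the question also mentions a 4x4 matrix, here is a small sketch of the same idea wrapped in a function. The helper name corr_from_lower and the 4x4 values are made up for illustration. The below-diagonal entries are given column by column, and the upper triangle is filled from the transpose so that the result is symmetric for any size.

```r
# Hypothetical helper: build a symmetric correlation matrix from the
# below-diagonal entries, listed column by column
corr_from_lower <- function(lower, n) {
  stopifnot(length(lower) == n * (n - 1) / 2)
  M <- diag(n)                           # ones on the diagonal
  M[lower.tri(M)] <- lower               # fill the lower triangle
  M[upper.tri(M)] <- t(M)[upper.tri(M)]  # mirror it into the upper triangle
  M
}

# Illustrative 4x4 example (values are invented)
M4 <- corr_from_lower(c(0.1, 0.5, 0.2,   # column 1: rows 2 to 4
                        0.9, 0.3,        # column 2: rows 3 to 4
                        0.4),            # column 3: row 4
                      4)
M4
isSymmetric(M4)
```

Copying M[lower.tri(M)] straight into M[upper.tri(M)] only lines up for a 3x3 matrix, because the two triangles are traversed in different orders. Going through t(M) avoids that.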
