R Loop random normal distribution

I'm trying to achieve the following: I want to generate 7 values from a normal distribution. Then I want to take these values and, using each one as a mean, generate 3 more values per initial value from a normal distribution to replace them. I'd like to write this in a loop.
Let's use sd = 1.5 and sd = 0.7, and start with a mean of 0.
set.seed(1234)
mu.mat<-rnorm(7,mean=0,sd=1.5)
Gives me 7 nice values.
Then I want to create a numeric vector (num [1:21]) that holds 3 normally distributed values generated with mean = the first value of the vector just created and sd = 0.7, three more using the second value, and so on.
Of the form:
rnorm(3,mean=mu.mat[1],sd=0.7)
Just for all entries in a loop.
What I've tried:
mu.mat2<-NULL
for(i in 1:7) {
mu.mat2[i]<-rnorm(3,mean=mu.mat[i],sd=0.7)
}
Results in error: no. of items to replace is not a multiple of replacement length.
Any help on how to put this into a loop is very appreciated. Thanks in advance!

You don't need a loop. You can do:
rnorm(21, mean = rep(mu.mat, each = 3), sd = 0.7)
#> [1] -0.4811184 -1.2327778 -1.8603816 -3.3073277 -2.5190560 -3.2298056
#> [7] -2.3695570 -2.0228732 -1.1692489 2.0342910 1.0186855 1.0838678
#> [13] 0.5486730 -0.2439510 -0.1831147 2.2026024 0.1925301 -0.2153864
#> [19] 2.8944894 1.9213206 1.3804706
But the problem with your code is that you are trying to write three values (rnorm(3,mean=mu.mat[i],sd=0.7)) into a single atomic index mu.mat2[i]. It's not clear whether you were expecting a matrix as a result, but if so your loop would be:
mu.mat2 <- matrix(ncol = 3, nrow = 7)
for(i in 1:7) {
mu.mat2[i,] <- rnorm(3, mean = mu.mat[i], sd = 0.7)
}
If you were wanting the result as a 7 x 3 matrix, you can do:
matrix(rnorm(21, mean = rep(mu.mat, each = 3), sd = 0.7), ncol = 3, byrow = TRUE)
#> [,1] [,2] [,3]
#> [1,] -0.96624036 -1.4808460 -2.6824842
#> [2,] -2.88942108 -1.7299094 -3.0446737
#> [3,] -2.82034688 -0.9570087 -2.1822797
#> [4,] 0.58997289 1.0384926 1.8111506
#> [5,] -0.07705959 -0.1024418 0.7249310
#> [6,] 0.48851487 1.4729882 0.6496858
#> [7,] 1.47961292 1.5653253 2.0629409
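To see why each = 3 together with byrow = TRUE puts one mean per row, here is a small sketch with made-up means (mu is a stand-in for mu.mat, and two draws per mean are used instead of three to keep it short):

```r
# Hypothetical means standing in for mu.mat.
mu <- c(10, 20, 30)

# each = 2 repeats every mean consecutively: 10 10 20 20 30 30
means <- rep(mu, each = 2)

# Filling by row keeps all repeats of one mean on the same row.
m <- matrix(means, ncol = 2, byrow = TRUE)
print(m)
```

Because rnorm is vectorised over mean, passing a vector laid out like means gives draws in the same order.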

Try replicate, as below:
> replicate(3,rnorm(length(mu.mat),mu.mat,0.7))
[,1] [,2] [,3]
[1,] -2.19324092 -1.13895278 -2.1540788
[2,] 0.02102746 0.33894402 0.1077604
[3,] 1.00363528 1.26895511 1.9483744
[4,] -3.85258144 -4.15638335 -4.0041507
[5,] -0.05518348 0.05766686 -0.3700564
[6,] 0.21570611 2.45016846 1.1614128
[7,] -0.81698877 -0.76824819 -1.5786689
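A quick way to check the layout of the replicate result, as a sketch with made-up means: setting sd = 0 makes rnorm return the means themselves, so you can see which mean each cell was drawn around.

```r
# Hypothetical means standing in for mu.mat.
mu <- c(10, 20, 30)

# With sd = 0 every draw equals its mean, which exposes the layout:
# one row per mean, one column per replicate.
layout_check <- replicate(3, rnorm(length(mu), mean = mu, sd = 0))
print(layout_check)
```

So row i of the replicate result holds the draws for mu[i], the same orientation as the byrow matrix in the other answer.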

Related

Count all values in a correlation matrix that are above 0.8 and below -0.8

I have a matrix of 2134 by 2134 of correlation values and I would like to count the total number of values that are above 0.8 and below -0.8. I have tried
length(TFcoTF[TFcoTF>.8])
but this does not seem to be correct as I am getting about 50 percent of values above .8 which does not correspond to the histogram I have for the data. Also when I do
length(TFcoTF[TFcoTF<-.8])
I got 0 as the output. Any help is appreciated.
The data.table package has a function called between. It returns TRUE/FALSE for each value in your matrix according to whether the value lies between two bounds.
In my example below, I create a 10x10 matrix of random values in [-1,+1], then use the length function and subsetting to count the values that are inside the range [-0.8,+0.8]. The number of values outside that range (above 0.8 or below -0.8) is the total number of values minus this count.
library(data.table)
data <- matrix(runif(100,-1,1), nrow = 10, ncol=10)
data
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 0.05585901 -0.7497720 -0.8371569 -0.401079424 -0.4130752 -0.788961736 0.2909987 0.48965177 0.4076504 -0.0682856
[2,] -0.42442920 0.7476111 0.8238973 -0.912507391 -0.4450897 -0.001308901 0.5151425 -0.16838841 -0.1648151 0.8370660
[3,] -0.73295874 0.5271986 0.5822628 -0.008554908 -0.2785803 -0.499058508 -0.5661172 0.35957967 0.5807055 0.2350893
[4,] 0.18949338 0.3827603 -0.6112584 0.209209240 -0.5883962 -0.087900052 0.1272227 0.58165922 -0.9950324 -0.9118599
[5,] 0.40862973 0.9496163 0.4996253 0.079538601 0.9839763 -0.119883751 0.3667418 -0.02751815 -0.6724141 0.3217434
[6,] 0.77338548 -0.7698167 -0.5632436 0.223301216 -0.9936610 0.650110638 -0.9400395 -0.47808065 -0.1579283 -0.6896787
[7,] 0.93210326 0.5360980 0.7677325 0.815231731 -0.4320206 0.647954028 0.5180600 -0.09574138 -0.3848389 0.9726445
[8,] -0.66411834 0.1125759 -0.4021577 -0.711363103 0.7161801 -0.071971464 0.7953436 0.40326575 0.6895480 0.7496597
[9,] 0.14118154 0.4775983 0.8966069 0.852880293 0.4715885 -0.542526148 0.5200246 -0.62649677 -0.3677738 0.1961003
[10,] -0.59353193 -0.2358892 0.5769562 -0.287113142 -0.7100862 -0.107092848 -0.8101459 -0.46754146 -0.4082147 -0.4475972
length(data[between(data,-0.8,0.8)])
[1] 84
It's difficult to answer without your dataset; please provide a minimal reproducible example.
The first line of code looks correct.
For the second, the problem is how R parses the expression. In R you can assign a value with = and <-, so x<-1 assigns 1 to x, whereas x < -1 returns a logical. TFcoTF<-.8 is therefore an assignment that overwrites TFcoTF with 0.8, not a comparison with -0.8; the index 0.8 is then truncated to 0, which selects nothing, hence the length of 0.
You can then combine logical conditions, as in the code below:
set.seed(42)
m <- matrix(runif(25, min = -1, max = 1), nrow = 5, ncol = 5)
m
length(m[m > .8]) + length(m[m < -.8]) # long version of what you did
length(m[m < -.8 | m > .8]) # | means "or": TRUE | FALSE returns TRUE
sum(m > .8 | m < -.8)
# The sum of a logical vector is the number of TRUE values, since sum(c(TRUE, FALSE)) is sum(c(1, 0))
sum(abs(m) > .8) # the shortest version
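One more thing to watch for, assuming TFcoTF is a full correlation matrix: it is symmetric with ones on the diagonal, so counting over every cell includes the diagonal and counts each pair twice. A minimal sketch that counts each pair once (x and cc are made-up stand-ins for your data):

```r
set.seed(42)
# Hypothetical data: 20 observations of 10 variables.
x <- matrix(rnorm(200), nrow = 20, ncol = 10)
cc <- cor(x)

# Keep only the cells above the diagonal: one value per pair of variables.
pairs <- cc[upper.tri(cc)]
n_strong <- sum(abs(pairs) > 0.8)

# Counting over the whole matrix adds the diagonal and doubles every pair.
n_all_cells <- sum(abs(cc) > 0.8)
print(c(pairs_only = n_strong, all_cells = n_all_cells))
```

Whether you want the per-pair count or the all-cells count depends on what your histogram was built from.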

Calculating a distance matrix by dtw

I have two matrices of normalized read counts for control and treatment in a time series from day 1 to day 26. I want to calculate a distance matrix by Dynamic Time Warping and afterwards use it for clustering, but it seems too complicated. This is what I tried; can anyone help clarify? Thanks a lot
> head(control[,1:4])
MAST2 WWC2 PHYHIPL R3HDM2
Control_D1 6.591024 5.695156 3.388652 5.756384
Control_D1 8.043454 5.365221 6.859768 6.936970
Control_D3 7.731590 4.868267 6.919972 6.931073
Control_D4 8.129948 5.105528 6.627016 7.090268
Control_D5 7.690863 4.729501 6.824746 6.904610
Control_D6 8.101723 5.334501 6.868990 7.115883
>
> head(lead[,1:4])
MAST2 WWC2 PHYHIPL R3HDM2
Lead30_D1 6.418423 5.610699 3.734425 5.778046
Lead30_D2 7.918360 4.295191 6.559294 6.780952
Lead30_D3 7.807142 4.294722 6.599187 6.716040
Lead30_D4 7.856720 4.432136 6.572337 6.848483
Lead30_D5 7.827311 4.204738 6.607107 6.784094
Lead30_D6 7.848760 4.458451 6.581216 6.943003
>
> dim(control)
[1] 26 2603
> dim(lead)
[1] 26 2603
library(dtw)
for (i in control) {
for (j in lead) {
result[i,j] <- dtw( dist(control[,,i],lead[,,j]), distance.only=T )$normalizedDistance
}
}
Says that
Error in lead[, , j] : incorrect number of dimensions
There have already been questions similar to yours,
but the answers haven't been too detailed.
Here's a breakdown of what you need to know,
in the specific case of R.
Calculating cross-distance matrices
The proxy package is made specifically for the calculation of cross-distance matrices.
You should check its vignette to know which measures are already implemented by it.
An example of its use:
set.seed(1L)
sample_data <- matrix(rnorm(50L), nrow = 5L, ncol = 10L)
suppressPackageStartupMessages(library(proxy))
distance_matrix <- proxy::dist(sample_data, method = "euclidean",
upper = TRUE, diag = TRUE)
print(distance_matrix)
#> 1 2 3 4 5
#> 1 0.000000 2.636027 3.834764 5.943374 3.704322
#> 2 2.636027 0.000000 2.587398 4.515470 2.310364
#> 3 3.834764 2.587398 0.000000 4.008678 3.899561
#> 4 5.943374 4.515470 4.008678 0.000000 5.059321
#> 5 3.704322 2.310364 3.899561 5.059321 0.000000
Note: in the context of time series,
proxy treats each row in a matrix as a series,
which can be confirmed by the fact that sample_data above is a 5x10 matrix and the resulting cross-distance matrix is 5x5.
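If your series are stored in columns instead (time in rows, as the head() output in the question suggests), you would transpose first. A sketch using base stats::dist, which follows the same rows-as-observations convention; series_in_columns is a made-up stand-in, not your data:

```r
set.seed(1L)
# Hypothetical layout: 10 time points in rows, 5 series in columns.
series_in_columns <- matrix(rnorm(50L), nrow = 10L, ncol = 5L)

# Transpose so that each series becomes a row before computing distances.
d <- stats::dist(t(series_in_columns), method = "euclidean")
dm <- as.matrix(d)
print(dim(dm))
```

The result is 5x5, one row and column per series, rather than 10x10.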
Using the DTW distance
The dtw package implements many variations of DTW,
and it also leverages proxy.
You could calculate a DTW distance matrix with:
suppressPackageStartupMessages(library(dtw))
dtw_distmat <- proxy::dist(sample_data, method = "dtw",
upper = TRUE, diag = TRUE)
print(dtw_distmat)
Using custom distances
One nice thing about proxy is that it gives you the option to register custom functions.
You seem to be interested in the normalized version of DTW,
so you could do something like this:
ndtw <- function(x, y = NULL, ...) {
dtw::dtw(x, y, ..., distance.only = TRUE)$normalizedDistance
}
pr_DB$set_entry(
FUN = ndtw,
names = "ndtw",
loop = TRUE,
distance = TRUE
)
ndtw_distmat <- proxy::dist(sample_data, method = "ndtw",
upper = TRUE, diag = TRUE)
print(ndtw_distmat)
#> 1 2 3 4 5
#> 1 0.0000000 0.4046622 0.5075772 0.6789465 0.5290478
#> 2 0.4046622 0.0000000 0.3630849 0.4866252 0.3612722
#> 3 0.5075772 0.3630849 0.0000000 0.5678698 0.3303344
#> 4 0.6789465 0.4866252 0.5678698 0.0000000 0.5078112
#> 5 0.5290478 0.3612722 0.3303344 0.5078112 0.0000000
See the documentation of pr_DB for more information.
Other DTW implementations
The dtwclust package
(which I made)
implements a basic but faster version of DTW which can use multi-threading and also leverages proxy:
suppressPackageStartupMessages(library(dtwclust))
dtw_basic_distmat <- proxy::dist(sample_data, method = "dtw_basic", normalize = TRUE)
print(dtw_basic_distmat)
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 0.0000000 0.4046622 0.5075772 0.6789465 0.5290478
#> [2,] 0.4046622 0.0000000 0.3630849 0.4866252 0.3612722
#> [3,] 0.5075772 0.3630849 0.0000000 0.5678698 0.3303344
#> [4,] 0.6789465 0.4866252 0.5678698 0.0000000 0.5078112
#> [5,] 0.5290478 0.3612722 0.3303344 0.5078112 0.0000000
The dtw_basic implementation only supports two step patterns and one window type,
but it is considerably faster:
suppressPackageStartupMessages(library(microbenchmark))
microbenchmark(
proxy::dist(sample_data, method = "dtw", window.type = "sakoechiba", window.size = 5L),
proxy::dist(sample_data, method = "dtw_basic", window.size = 5L)
)
Unit: microseconds
expr min lq mean
proxy::dist(sample_data, method = "dtw", window.type = "sakoechiba", window.size = 5L) 5279.124 5621.742 6070.069
proxy::dist(sample_data, method = "dtw_basic", window.size = 5L) 657.966 710.418 776.474
median uq max neval cld
5802.354 6348.199 10411.000 100 b
752.282 814.037 1161.626 100 a
Another multi-threaded implementation is included in the parallelDist package,
although I haven't personally tested it.
Multivariate or multi-dimensional time series
A single multivariate series is commonly a matrix where time spans the rows and the multiple variables span the columns.
DTW also works for them:
mv_series1 <- matrix(rnorm(15L), nrow = 5L, ncol = 3L)
mv_series2 <- matrix(rnorm(15L), nrow = 5L, ncol = 3L)
print(dtw_distance <- dtw_basic(mv_series1, mv_series2))
#> [1] 22.80421
The nice thing about proxy is that it can calculate distances between objects contained in lists too,
so you can put several multivariate series in lists of matrices:
mv_series <- lapply(1L:5L, function(dummy) {
matrix(rnorm(15L), nrow = 5L, ncol = 3L)
})
mv_distmat_dtwclust <- proxy::dist(mv_series, method = "dtw_basic")
print(mv_distmat_dtwclust)
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 0.00000 27.43599 32.14207 36.42211 31.19279
#> [2,] 27.43599 0.00000 20.88470 23.88436 29.73219
#> [3,] 32.14207 20.88470 0.00000 22.14376 29.99899
#> [4,] 36.42211 23.88436 22.14376 0.00000 28.81111
#> [5,] 31.19279 29.73219 29.99899 28.81111 0.00000
Your case
Regardless of what you choose,
you can probably use proxy to get your result,
but since you haven't provided your whole data,
I can't give you a more specific example.
I presume that dtwclust::dtw_basic(control[, 1:4], lead[, 1:4], normalize = TRUE) would give you the distance between one pair of series,
assuming you're treating each one as a multivariate series with 4 variables.
If your question is "why am I getting this error?" the answer is that you're trying to subset a matrix, which is a two dimensional array, according to a 3rd dimension.
see:
dim(lead)
# [1] 26 2603
lead[,,6.418423] # yes, that's the value j has the first time through the loop
# This will reproduce your error
lead[,,1]
# This will also reproduce your error
Hopefully you can see now that you have a few problems:
You're trying to subset a matrix according to a 3rd dimension.
Your i and j values are the values in control and lead respectively. You can use them as their values, or you can generate the index, e.g., for(i in seq_along(control)), if you're planning to use it for something other than getting that same value out.
Taking it to the next step, it's unclear what you want to pass to the dist function. dist takes a single matrix and computes the distance between its rows. You seem to be trying to pass it two values from two different matrices, or perhaps two subsets of two different matrices. It looks like you might need to go back and look at the examples in the documentation for xtr
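Putting the indexing points together, here is a sketch of what the loop could look like when it iterates over column indices. It assumes each column is one series, and it uses a plain textbook DTW written in base R (simple_dtw is a hypothetical helper, not the dtw package's implementation, so its values will not match dtw()'s default step pattern or its normalizedDistance). control_demo and lead_demo are made-up stand-ins for the real matrices.

```r
# Textbook DTW between two numeric vectors, with absolute difference as local cost.
simple_dtw <- function(a, b) {
  n <- length(a)
  m <- length(b)
  acc <- matrix(Inf, nrow = n + 1, ncol = m + 1)
  acc[1, 1] <- 0
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      cost <- abs(a[i] - b[j])
      # Cheapest way to reach this cell: from above, from the left, or diagonally.
      acc[i + 1, j + 1] <- cost + min(acc[i, j + 1], acc[i + 1, j], acc[i, j])
    }
  }
  acc[n + 1, m + 1]
}

set.seed(1)
# Hypothetical stand-ins: 26 time points in rows, 3 series in columns.
control_demo <- matrix(rnorm(26 * 3), nrow = 26, ncol = 3)
lead_demo <- matrix(rnorm(26 * 3), nrow = 26, ncol = 3)

# Preallocate the result and loop over column indices, not over values.
result <- matrix(NA_real_, nrow = ncol(control_demo), ncol = ncol(lead_demo))
for (i in seq_len(ncol(control_demo))) {
  for (j in seq_len(ncol(lead_demo))) {
    result[i, j] <- simple_dtw(control_demo[, i], lead_demo[, j])
  }
}
print(result)
```

With 2603 columns per matrix a double loop in plain R will be slow, which is one reason to prefer the proxy-based approach from the other answer.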

Iterating over a matrix and a list of times to plug into nls function in R

I have spent a fair amount of time searching for an answer to my novice question and am still confused. I am trying to plot the initial magnetization of an FID versus time. My initial magnetizations are in a matrix, and the time values corresponding to each column of the matrix are in a list. How do I run nls for an exponential decay over each column of data with the corresponding value in the list of times? I am trying to have the nls function take the first time value from the list, use the initial magnetization values column-wise, and return the rates in a matrix of the same dimensions as m0_matrix.
> m0_matrix
[,1] [,2] [,3] [,4]
[1,] 19439311560 15064186946 11602185622 9009147617
[2,] 9437620734 7135488585 5348160563 4156154903
[3,] 11931439242 9584153017 7765094983 6470870180
[4,] 9367920785 7612552829 5927424214 4331819248
[5,] 12077347835 8892705185 6866664357 5530601653
[6,] 20191716524 15729555553 11920147205 8964406945
[7,] 20177137879 15744074858 12364404080 9971845743
[8,] 15990100401 12464163359 9724743390 8294038306
[9,] 19409862926 16085027074 13110425604 10330007806
[10,] 15367044986 11994945813 9565243969 7535061239
r2_from_decay_matrix = matrix(data = NA, nrow = nrow(m0_matrix), ncol = ncol(m0_matrix))
t <- c(0.1, 0.2, 0.3, 0.4)
for (i in seq(1,nrow(m0_matrix))) {
m0 <- m0_matrix[,i]
t <- t[i]
r <- 1
mCPMG_function <- function(m0, t)
results <- paste(a = m0, b = t)
mCPMG_formula <- mCPMG ~ m0*exp(-r*t)
fit_start <- c(m0= 19439311560, r=1)
fit_data <- list(m0=m0, t=t)
r2 <- nls(mCPMG_formula, fit_data, fit_start)
r2_from_decay_matrix <- r2$m$getPars()["r"][i]
}
Thank you for helping!
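One possible reading of the data, offered as a sketch rather than a definitive fix: each row of m0_matrix is one decay curve sampled at the four times, so a fit yields one rate per row rather than one per cell. The example below swaps nls for a log-linear fit with lm, which needs no starting values; that substitution, the helper name fit_rate, and the name t_vals are assumptions of this sketch, not something from the question. Only the first three rows of the posted matrix are used.

```r
# First three rows of m0_matrix from the question; columns correspond to the times.
m0_matrix <- rbind(
  c(19439311560, 15064186946, 11602185622, 9009147617),
  c( 9437620734,  7135488585,  5348160563, 4156154903),
  c(11931439242,  9584153017,  7765094983, 6470870180)
)
t_vals <- c(0.1, 0.2, 0.3, 0.4)

# Fit log(signal) = log(m0) - r * time for one curve and return the rate r.
fit_rate <- function(signal, time) {
  fit <- lm(log(signal) ~ time)
  -unname(coef(fit)["time"])
}

# One decay rate per row of the matrix.
rates <- apply(m0_matrix, 1, fit_rate, time = t_vals)
print(rates)
```

If you need the nls estimates specifically, the log-linear coefficients would be reasonable starting values for m0 and r.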

R: How to print names of calculated list element in a function?

I have two lists of more than 1500 elements, one vector list and one matrix list. Here some example data:
Z_matr <- list("111.2012"= matrix(c(0,0,100,200,0,0,0,0,50,350,0,50,50,200,200,0),
nrow = 4, ncol = 4, byrow = T),
"112.2012"= matrix(c(10,90,0,30,10,90,0,10,200,50,10,350,150,100,200,10),
nrow = 4, ncol = 4, byrow = T))
p <- list("111.2012"=c(200, 1000, 100, 10), "112.2012"=c(300, 900, 50, 100))
On these two lists I want to apply the following function, which of course works fine on this data:
kast <- function(Z_matr, p) {
imp <- rowSums(Z_matr)
exp <- colSums(Z_matr)
x = p + imp
ac = p + imp - exp
einsdurchx = 1/as.vector(x)
einsdurchx[is.infinite(einsdurchx)] <- 0
A = Z_matr %*% diag(einsdurchx)
return(A)
}
mapply(kast, Z_matr,p, SIMPLIFY=FALSE)
However, with my original lists I get an error. What I need is a printout of the list names that have already been processed before the failing list element is reached (so that I know which of the list combinations creates the error). So I tried print(names(A)), but I only get NULL, NULL... How can I instead get, for this example, 111.2012 and 112.2012 with print?
Set it up so that you pass the names and use them to index the object:
kast <- function(item, p) {
print(item)
imp <- rowSums(Z_matr[[item]])
exp <- colSums(Z_matr[[item]])
x = p + imp
ac = p + imp - exp
einsdurchx = 1/as.vector(x)
einsdurchx[is.infinite(einsdurchx)] <- 0
A = Z_matr[[item]] %*% diag(einsdurchx)
return(A)
}
mapply(kast, names(Z_matr),p, SIMPLIFY=FALSE)
The output (obviously you would take out the print statement once you have found the failing element):
[1] "111.2012"
[1] "112.2012"
$`111.2012`
[,1] [,2] [,3] [,4]
[1,] 0.0 0.00 0.1818182 0.4347826
[2,] 0.0 0.00 0.0000000 0.0000000
[3,] 0.1 0.35 0.0000000 0.1086957
[4,] 0.1 0.20 0.3636364 0.0000000
$`112.2012`
[,1] [,2] [,3] [,4]
[1,] 0.02325581 0.08910891 0.00000000 0.05357143
[2,] 0.02325581 0.08910891 0.00000000 0.01785714
[3,] 0.46511628 0.04950495 0.01515152 0.62500000
[4,] 0.34883721 0.09900990 0.30303030 0.01785714
This is a longstanding issue with the use of sapply/lapply and mapply. Only the values, and not the names, of list items are passed to the function. The names are only added back after the processing. You can see this if you attempt to print(deparse(substitute(Z_matr))) as the first call inside your example function.
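If the goal is only to find which element fails, another option is to iterate over the names and wrap the call in tryCatch, so the run continues and the failing names are collected. A minimal sketch with made-up data; scale_cols is a hypothetical stand-in for kast, and items and weights stand in for Z_matr and p:

```r
# Hypothetical inputs: the second matrix is not square, so the product fails.
items <- list(ok = matrix(1:4, nrow = 2), bad = matrix(1:6, nrow = 2))
weights <- list(ok = c(1, 2), bad = c(1, 2))

# Hypothetical stand-in for kast(): errors when dimensions do not conform.
scale_cols <- function(m, w) m %*% diag(w)

results <- lapply(names(items), function(nm) {
  tryCatch(
    scale_cols(items[[nm]], weights[[nm]]),
    error = function(e) {
      message("failed on element: ", nm)
      NULL  # mark the failure and keep going
    }
  )
})
names(results) <- names(items)

# Names of the elements whose computation raised an error.
failed <- names(results)[vapply(results, is.null, logical(1))]
print(failed)
```

This relies on the function never returning NULL on success; if it can, return a different marker from the error handler.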

Storing multiple output vectors without list

I have a matrix of N*200 values.
For each row I am calculating the first 5 acf values using
for(i in 1:N){
xx[[i]] <- acf(x[i,], plot=F)$acf[1:5]
}
I was wondering whether there is an alternative for xx[[i]] other than using a list, i.e. is it possible to have an N*5 matrix containing the acf values?
I know I can get the list and then unlist it, but is there a quicker way?
Use apply for cleaner code:
iN = 1000
mX = matrix(rnorm(iN*200), iN, 200)
mACF = t(apply(mX, MARGIN = 1,
FUN = function(vX) acf(vX, plot = FALSE, lag.max = 4)$acf))
Output:
> head(mACF)
[,1] [,2] [,3] [,4] [,5]
[1,] 1 -0.01301076 -0.02077288 -0.09442797 -0.010610654
[2,] 1 -0.03060448 -0.06019641 -0.04674656 -0.086555364
[3,] 1 0.09513999 -0.05021542 -0.02757927 -0.002984605
[4,] 1 -0.08135746 0.11003419 -0.06550000 0.033755892
[5,] 1 0.09014033 0.09981602 0.11100782 0.057275603
[6,] 1 -0.08462636 -0.10192390 0.05601853 -0.019114467
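If you would rather keep the loop from the question, you can preallocate the N x 5 matrix and assign one row per iteration. A minimal sketch with made-up data (N is kept small here just for illustration):

```r
set.seed(1)
N <- 4
# Hypothetical data standing in for the N x 200 matrix.
x <- matrix(rnorm(N * 200), nrow = N, ncol = 200)

# Preallocate the result, then fill one row per series.
xx <- matrix(NA_real_, nrow = N, ncol = 5)
for (i in seq_len(N)) {
  xx[i, ] <- acf(x[i, ], plot = FALSE)$acf[1:5]
}
print(xx)
```

The first column is the lag-0 autocorrelation, so it is 1 for every row, as in the apply output above.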
