I would like to generate numbers from a triangular distribution with three parameters: a, b, c where c in my case is defined as (a+b)/2.
Let's say I have a vector x:
x <- c(1,-1,2,-2,3,-3,4,-4,5,-5,11,-11,12,-12,13,-13)
I want to generate as many new values as there are negative numbers in x, so that I can then replace the negative values with numbers drawn from the triangular distribution.
library(triangle)
c = abs(x[x<0])/2
sample <- rtriangle(length(x[which(x<0)]), 0, abs(x[x<0]),c)
Obviously this does not work, as I get these warning messages:
Warning messages:
1: In if (a > c | b < c) return(rep(NaN, times = n)) :
the condition has length > 1 and only the first element will be used
2: In if (a != c) { :
the condition has length > 1 and only the first element will be used
3: In p[i] * (b - a) :
longer object length is not a multiple of shorter object length
4: In p[i] <- a + sqrt(p[i] * (b - a) * (c - a)) :
number of items to replace is not a multiple of replacement length
5: In (1 - p[j]) * (b - a) :
longer object length is not a multiple of shorter object length
6: In p[j] <- b - sqrt((1 - p[j]) * (b - a) * (b - c)) :
number of items to replace is not a multiple of replacement length
rtriangle expects scalar values for a, b and c here (the warnings come from its if conditions receiving vectors), so you can call it once per element with sapply, like this:
x <- c(1,-1,2,-2,3,-3,4,-4,5,-5,11,-11,12,-12,13,-13)
library("triangle")
sample = sapply(abs(x[x<0]), function(b){ rtriangle(1,0,b,b/2) })
> sample
[1] 0.6514940 0.6366981 1.8598445 0.9866790 1.7517438 2.9444719 4.1537113 2.2315813
You get one random draw from each of the 8 triangular distributions, one per negative value in x.
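For this particular case (lower limit 0, mode at half the upper limit) the distribution is symmetric, and a symmetric triangular variable on [0, b] can be written as b * (U1 + U2) / 2 with U1 and U2 independent uniforms on [0, 1]. That gives a vectorised alternative in base R. This is only a sketch under that symmetry assumption, not a replacement for rtriangle in general; the names neg, upper, draws and x_new are mine.

```r
set.seed(42)
x <- c(1,-1,2,-2,3,-3,4,-4,5,-5,11,-11,12,-12,13,-13)

neg   <- x < 0            # positions of the negative values
upper <- abs(x[neg])      # upper limit b of each triangle; a = 0, c = b / 2

# Half the sum of two uniforms is symmetric triangular on [0, 1];
# scaling by b stretches it to [0, b] with mode b / 2.
draws <- upper * (runif(length(upper)) + runif(length(upper))) / 2

# Replace the negative values with the draws
x_new <- x
x_new[neg] <- draws
x_new
```

Each draw lies between 0 and the absolute value it replaces, so x_new contains no negative numbers.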
Problem: how can I write a function that receives a and b as inputs and returns all integers in between them? Assuming we have a function called integers_inbetween that behaves like this, we should expect the following examples:
# Returns an array of integers in between a and b
integers_inbetween(1, 4)
[1] 2 3
and
# Returns an array of integers in between a and b
integers_inbetween(4, 1)
[1] 2 3
and
# Returns NULL if there are no integers in between a and b
integers_inbetween(3.5, 4)
NULL
How can one implement that logic in R?
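This solution should work for integer inputs, whether a > b or not. The way I wrote it, the function returns NULL if a == b after rounding. Note that a and b are rounded first, so non-integer endpoints are not handled exactly: 1.6 and 3.4 round to 2 and 3 and give NULL, although 2 and 3 both lie between them.
inbetween_integers <- function(a, b) {
  a <- round(a)
  b <- round(b)
  # adjacent or equal endpoints: nothing in between
  if (abs(a - b) <= 1)
    return(NULL)
  if (b < a)
    return(seq.int(from = b + 1, length.out = abs(a - b) - 1))
  return(seq.int(from = a + 1, length.out = abs(a - b) - 1))
}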
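You can try the code below
inbetween_integers <- function(a, b) {
  u <- sort(c(a, b))
  lo <- ceiling(u[1])
  hi <- floor(u[2])
  # no integer in range, e.g. (3.2, 3.8); without this check lo:hi would count down
  if (lo > hi) return(NULL)
  # drop the endpoints themselves if they are integers
  res <- setdiff(lo:hi, c(a, b))
  if (!length(res)) {
    NULL
  } else {
    res
  }
}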
and you will see
> inbetween_integers(1, 4)
[1] 2 3
> inbetween_integers(4, 1)
[1] 2 3
> inbetween_integers(3.5, 4)
NULL
This works regardless of the order of arguments.
First this function sorts the arguments, then determines the minimum and maximum values in the sequence (exclusive of integer boundaries), then returns the sequence as requested.
integers_in_between <- function(x, y) {
  values <- sort(c(x, y))
  minimum <- ifelse(ceiling(values[1]) == values[1], ceiling(values[1]) + 1, ceiling(values[1]))
  maximum <- ifelse(floor(values[2]) == values[2], floor(values[2]) - 1, floor(values[2]))
  if (maximum - minimum < 0) {
    NULL
  } else {
    minimum:maximum
  }
}
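The same exclusive-bounds logic can be compressed a little, since floor(v) + 1 is the smallest integer strictly above v and ceiling(v) - 1 is the largest integer strictly below v. A minimal sketch (between_ints is a hypothetical name, not taken from the answers above):

```r
between_ints <- function(a, b) {
  lo <- floor(min(a, b)) + 1     # smallest integer strictly greater than the lower end
  hi <- ceiling(max(a, b)) - 1   # largest integer strictly less than the upper end
  if (lo > hi) NULL else seq.int(lo, hi)
}

between_ints(1, 4)       # 2 3
between_ints(4, 1)       # 2 3
between_ints(3.5, 4)     # NULL
between_ints(1.6, 3.4)   # 2 3
```

Because both bounds are computed as strict, no separate check for integer endpoints is needed.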
I am trying to figure out how to compute the dot product.
b = matrix(1:70, ncol=7)
g= matrix(1:48, ncol=6)
resulta = matrix(0,6,7)
for (c in 1:ncol(b)){
for (i in 1:ncol(g)){
resulta[i,c] <- sum((g[,i]) * (b[,c]))
}
}
Warning messages:
1: In (g[, i]) * (b[, c]) :
longer object length is not a multiple of shorter object length
2: In (g[, i]) * (b[, c]) :
longer object length is not a multiple of shorter object length
... (42 similar messages in total)
Whenever you multiply matrices, you have to make sure that the number of columns of the first matrix equals the number of rows of the second: if the first matrix is a x b, the second matrix has to be b x c (c and a may or may not be equal), and the resultant matrix is a x c.
In your case, matrix b is 10 x 7, so for b %*% g matrix g should be a 7 x something matrix. In other words, matrix g should have exactly 7 rows (yours is 8 x 6).
Once you have fixed the dimensions, try this for quick matrix multiplication:
resulta <- b %*% g
resulta
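If the goal is the one in the question's loop, where resulta[i, c] is the dot product of column i of g with column c of b, then the two matrices need the same number of rows instead, and the whole loop collapses to crossprod(g, b), which is t(g) %*% b. A minimal sketch, assuming g is rebuilt with 10 rows (matrix(1:60, ncol = 6) is my stand-in, not the asker's data):

```r
b <- matrix(1:70, ncol = 7)   # 10 x 7, as in the question
g <- matrix(1:60, ncol = 6)   # 10 x 6, assumed replacement for the 8 x 6 original

# Loop version from the question (loop variable renamed so it does not mask c())
loop_result <- matrix(0, ncol(g), ncol(b))
for (cc in seq_len(ncol(b))) {
  for (i in seq_len(ncol(g))) {
    loop_result[i, cc] <- sum(g[, i] * b[, cc])
  }
}

# Same column-by-column dot products in one call
resulta <- crossprod(g, b)
dim(resulta)                      # 6 7
all.equal(loop_result, resulta)   # TRUE
```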
I am writing a function that performs bit inversion on each row of a binary matrix, depending on a predefined value n. The value n determines the number of 1 bits each row should have.
set.seed(123)
## generate a random 5 by 10 binary matrix
init <- t(replicate(5, {i <- sample(3:6, 1); sample(c(rep(1, i), rep(0, 10 - i)))}))
n <- 3
## init_1 is used to illustrate my problem (single-row matrix)
init_1 <- t(replicate(1, {i <- sample(3:6, 1); sample(c(rep(1, i), rep(0, 10 - i)))}))
The bit_inversion function does the following:
If the selected row has fewer 1's than n, it randomly selects as many 0 positions as the difference and inverts them (0 to 1).
Else if the selected row has more 1's than n, it randomly selects as many 1 positions as the difference and inverts them (1 to 0).
Else it does nothing (when the row has exactly n 1's).
Below is the function I implemented:
bit_inversion<- function(pop){
for(i in 1:nrow(pop)){
difference <- abs(sum(pop[i,]) - n)
## checking condition where there are more bits being turned on than n
if(sum(pop[i,]) > n){
## determine position of 1's
bit_position_1 <- sample(which(pop[i,]==1), difference)
## bit inversion
for(j in 1:length(bit_position_1)){
pop[bit_position_1[j]] <- abs(pop[i,][bit_position_1[j]] - 1)
}
}
else if (sum(pop[i,]) < n){
## determine position of 0's
bit_position_0 <- sample(which(pop[i,]==0), difference)
## bit inversion
for(j in 1:length(bit_position_0)){
pop[bit_position_0[j]] <- abs(pop[bit_position_0[j]] - 1)
}
}
}
return(pop)
}
Outcome:
call <- bit_inversion(init)
> rowSums(call) ## supposed to be all 3
[1] 3 4 5 4 3
But when using init_1 (a single row matrix), the function seems to work fine.
Outcome:
call_1 <- bit_inversion(init_1)
> rowSums(call_1)
[1] 3
Is there a mistake in my for and if...else loop?
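In the j loop, change the line
pop[bit_position_1[j]] <- abs(pop[i,][bit_position_1[j]] - 1)
into
pop[i,bit_position_1[j]] <- abs(pop[i,][bit_position_1[j]] - 1)
You forgot the row index, so pop[k] indexes the whole matrix as one long vector. That is also why the single-row init_1 works. The same fix is needed in the bit_position_0 branch.
Here is a more compact version of your for loop: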
for(i in 1:nrow(pop)){
difference <- abs(sum(pop[i,]) - n)
logi <- sum(pop[i,]) > n
pop[i,sample(which(pop[i,]==logi), difference)] <- !logi
}
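Putting the fix together, here is a self-contained sketch of the whole function. Passing n as an argument and using sample.int on the candidate positions are my choices, not part of the original code; sample.int avoids the case where sample(x, k) treats a single number x as 1:x.

```r
set.seed(123)
## random 5 by 10 binary matrix, generated as in the question
init <- t(replicate(5, {i <- sample(3:6, 1); sample(c(rep(1, i), rep(0, 10 - i)))}))

bit_inversion <- function(pop, n) {
  for (i in seq_len(nrow(pop))) {
    ones <- sum(pop[i, ])
    difference <- abs(ones - n)
    if (difference == 0) next                 # row already has n ones
    too_many <- ones > n
    # flip 1's when there are too many, 0's when there are too few
    candidates <- which(pop[i, ] == as.integer(too_many))
    chosen <- candidates[sample.int(length(candidates), difference)]
    pop[i, chosen] <- as.integer(!too_many)   # row index included
  }
  pop
}

fixed <- bit_inversion(init, n = 3)
rowSums(fixed)   # 3 3 3 3 3
```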
I have an integer vector vec1 and I am generating a distance matrix using the dist function. I want to get the coordinates (row and column) of the elements with a certain value in the distance matrix. Essentially, I would like to get the pairs of elements that are a distance d apart. For example:
vec1 <- c(2,3,6,12,17)
distMatrix <- dist(vec1)
#    1  2  3  4
# 2  1
# 3  4  3
# 4 10  9  6
# 5 15 14 11  5
Say I am interested in the pairs of elements in the vector that are 5 units apart. I want to get coord1, the rows, and coord2, the columns, of the matching entries of the distance matrix. In this toy example, I would expect
coord1
# [1] 5
coord2
# [1] 4
I am wondering if there is an efficient way to get these values that doesn't involve converting the dist object to a matrix or looping through the matrix?
A distance matrix is a lower triangular matrix in packed format, where the lower triangular is stored as a 1D vector by column. You can check this via
str(distMatrix)
# Class 'dist' atomic [1:10] 1 4 10 15 3 9 14 6 11 5
# ...
Even if we call dist(vec1, diag = TRUE, upper = TRUE), the vector is still the same; only the printing style changes. That is, no matter how you call dist, you always get a vector.
This answer focuses on how to transform between 1D and 2D indices, so that you can work with a "dist" object without first making it a complete matrix using as.matrix. If you do want to make it a matrix, use the dist2mat function defined in "as.matrix on a distance object is extremely slow; how to make it faster?".
R functions
It is easy to write vectorized R functions for these index transforms. We only need some care with out-of-bound indices, for which NA should be returned.
## 2D index to 1D index
f <- function (i, j, dist_obj) {
if (!inherits(dist_obj, "dist")) stop("please provide a 'dist' object")
n <- attr(dist_obj, "Size")
valid <- (i >= 1) & (j >= 1) & (i > j) & (i <= n) & (j <= n)
k <- (2 * n - j) * (j - 1) / 2 + (i - j)
k[!valid] <- NA_real_
k
}
## 1D index to 2D index
finv <- function (k, dist_obj) {
if (!inherits(dist_obj, "dist")) stop("please provide a 'dist' object")
n <- attr(dist_obj, "Size")
valid <- (k >= 1) & (k <= n * (n - 1) / 2)
k_valid <- k[valid]
j <- rep.int(NA_real_, length(k))
j[valid] <- floor(((2 * n + 1) - sqrt((2 * n - 1) ^ 2 - 8 * (k_valid - 1))) / 2)
i <- j + k - (2 * n - j) * (j - 1) / 2
cbind(i, j)
}
These functions are extremely cheap in memory usage, as they work with indices instead of matrices.
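As a sanity check on the 1D-to-2D mapping, the sketch below recomputes (i, j) for every position of a small "dist" object and compares the packed values with the corresponding cells of the full matrix. It restates the formulas inline, without the bounds checks, so that it runs on its own; as.matrix is used only for verification.

```r
vec1 <- c(2, 3, 6, 12, 17)
d <- dist(vec1)
n <- attr(d, "Size")

# 1D position k -> (i, j), same formulas as in finv
k <- seq_along(d)
j <- floor(((2 * n + 1) - sqrt((2 * n - 1)^2 - 8 * (k - 1))) / 2)
i <- j + k - (2 * n - j) * (j - 1) / 2

cbind(k, i, j)

# Every packed entry should equal the matching cell of the full matrix
m <- as.matrix(d)
all(m[cbind(i, j)] == as.vector(d))   # TRUE
```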
Applying finv to your question
You can use
vec1 <- c(2,3,6,12,17)
distMatrix <- dist(vec1)
finv(which(distMatrix == 5), distMatrix)
# i j
#[1,] 5 4
Generally speaking, a distance matrix contains floating point numbers, and it is risky to use == to judge whether two floating point numbers are equal. Read "Why are these numbers not equal?" for more details and possible strategies.
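One common strategy is to compare within a tolerance instead of testing exact equality. A minimal sketch; the tolerance 1e-8 is an arbitrary assumption and should be chosen to suit the scale of your data:

```r
vec1 <- c(2, 3, 6, 12, 17)
d <- dist(vec1)

target <- 5
tol <- 1e-8   # assumed tolerance, adjust to your data

# 1D positions whose distance is within tol of the target
hits <- which(abs(as.vector(d) - target) < tol)
hits   # 10
```

The resulting 1D positions can then be passed to finv in place of which(distMatrix == 5).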
Alternative with dist2mat
Using the dist2mat function given in "as.matrix on a distance object is extremely slow; how to make it faster?", we may use which(, arr.ind = TRUE).
library(Rcpp)
sourceCpp("dist2mat.cpp")
mat <- dist2mat(distMatrix, 128)
which(mat == 5, arr.ind = TRUE)
# row col
#5 5 4
#4 4 5
Appendix: Markdown (needs MathJax support) for the picture
## 2D index to 1D index
The lower triangular looks like this: $$\begin{pmatrix} 0 & 0 & \cdots & 0\\ \times & 0 & \cdots & 0\\ \times & \times & \cdots & 0\\ \vdots & \vdots & \ddots & 0\\ \times & \times & \cdots & 0\end{pmatrix}$$ If the matrix is $n \times n$, then there are $(n - 1)$ elements ("$\times$") in the 1st column, and $(n - j)$ elements in the $j$<sup>th</sup> column. Thus, for element $(i,\ j)$ (with $i > j$, $j < n$) in the lower triangular, there are $$(n - 1) + \cdots + (n - (j - 1)) = \frac{(2n - j)(j - 1)}{2}$$ "$\times$" in the previous $(j - 1)$ columns, and it is the $(i - j)$<sup>th</sup> "$\times$" in the $j$<sup>th</sup> column. So it is the $$\left\{\frac{(2n - j)(j - 1)}{2} + (i - j)\right\}^{\textit{th}}$$ "$\times$" in the lower triangular.
----
## 1D index to 2D index
Now for the $k$<sup>th</sup> "$\times$" in the lower triangular, how can we find its matrix index $(i,\ j)$? We take two steps: 1> find $j$; 2> obtain $i$ from $k$ and $j$.
The first "$\times$" of the $j$<sup>th</sup> column, i.e., $(j + 1,\ j)$, is the $\left\{\frac{(2n - j)(j - 1)}{2} + 1\right\}^{\textit{th}}$ "$\times$" of the lower triangular, thus $j$ is the maximum value such that $\frac{(2n - j)(j - 1)}{2} + 1 \leq k$. This is equivalent to finding the max $j$ so that $$j^2 - (2n + 1)j + 2(k + n - 1) \geq 0.$$ The LHS is a quadratic polynomial, and it is easy to see that the solution is the integer no larger than its first root (i.e., the root on the left side): $$j = \left\lfloor\frac{(2n + 1) - \sqrt{(2n-1)^2 - 8(k-1)}}{2}\right\rfloor.$$ Then $i$ can be obtained from $$i = j + k - \left\{\frac{(2n - j)(j - 1)}{2}\right\}.$$
If the vector is not too large, the best way is probably to wrap the output of dist into as.matrix and to use which with the option arr.ind=TRUE. The only disadvantage of this standard method to retrieve the index numbers within a dist matrix is an increase of memory usage, which may become important in the case of very large vectors passed to dist. This is because the conversion of the lower triangular matrix returned by dist into a regular, dense matrix effectively doubles the amount of stored data.
An alternative consists in converting the dist object into a list, such that each column in the lower triangular matrix of dist represents one member of the list. The index number of the list members and the position of the elements within the list members can then be mapped to the column and row number of the dense N x N matrix, without generating the matrix.
Here is one possible implementation of this list-based approach:
distToList <- function(x) {
  d <- dist(x)  # compute the distances once
  # 1D positions at which each column of the lower triangle starts
  idx <- sum(seq(length(x) - 1)) - rev(cumsum(seq(length(x) - 1))) + 1
  # split the vector into one list element per column
  # http://stackoverflow.com/a/16358095/4770166
  unname(split(d, cumsum(seq_along(d) %in% idx)))
}
findDistPairs <- function(vec, theDist) {
  listDist <- distToList(vec)
  inList <- lapply(listDist, is.element, theDist)
  matchedCols <- which(vapply(inList, any, logical(1)))
  # lapply keeps one list entry per column; sapply would collapse the result
  # into a matrix when several columns have the same number of matches
  matchedRows <- lapply(matchedCols, function(col) which(inList[[col]]) + col)
  matches <- cbind(col = rep(matchedCols, lengths(matchedRows)),
                   row = as.integer(unlist(matchedRows)))
  return(matches)
}
vec1 <- c(2, 3, 6, 12, 17)
findDistPairs(vec1, 5)
# col row
#[1,] 4 5
The parts of the code that might be somewhat unclear concern the mapping of the position of an entry within the list to a column / row value of the N x N matrix. While not trivial, these transformations are straightforward.
In a comment within the code I have pointed out an answer on StackOverflow which has been used here to split a vector into a list. The loops (sapply, lapply) should be unproblematic in terms of performance since their range is of order O(N). The memory usage of this code is largely determined by the storage of the list. This amount of memory should be similar to that of the dist object since both objects contain the same data.
The dist object is calculated and transformed into a list in the function distToList(). Because of the dist calculation, which is required in any case, this function could be time-consuming in the case of large vectors. If the goal is to find several pairs with different distance values, then it may be better to calculate listDist only once for a given vector and to store the resulting list, e.g., in the global environment.
Long story short
The usual way to treat such problems is simple and fast:
distMatrix <- as.matrix(dist(vec1)) * lower.tri(diag(vec1))
which(distMatrix == 5, arr.ind = TRUE)
# row col
#5 5 4
I suggest using this method by default. More complicated solutions may become necessary in situations where memory limits are reached, i.e., in the case of very large vectors vec1. The list-based approach described above could then provide a remedy.
Let me describe the problem setting. The function I am depicting is a probability function, and upon integration its value has to be equal to 1. So I will be dividing 1 by the result of the integration to get the value of C, which means I can't assign a value to C beforehand.
Have a look at the below code and error message -
> f <- function(x) (C*x*(exp(-x)))
> z=integrate(f, lower = 0, upper=Inf)
Error in C * x : non-numeric argument to binary operator
How am I supposed to define C here?
Second question: can somebody figure out what's wrong with the value of z?
> f <- function(x) (x*(exp(-x)))
> z=integrate(f, lower = 0, upper=Inf)
> z
1 with absolute error < 6.4e-06
> 1/z
Error in 1/z : non-numeric argument to binary operator
Set C = 1 when you compute the integral of the function. To do that, you can make it an optional argument of your function with a default value:
f <- function(x, C = 1) C * x * exp(-x)
Then, compute:
z <- integrate(f, lower = 0, upper = Inf)
For the integral to be 1 with the real value for C, you need C * z == 1, i.e.:
C <- 1 / z$value
C
# [1] 1
As it turns out, the integral z is already equal to 1 so picking C = 1 was a lucky choice. You have nothing to do and you can just start using f as-is. Had it not been the case, I would have suggested to redefine f:
f_final <- function(x) f(x, C = 1 / z$value)
(Regarding your second question: integrate returns a list of class "integrate", not a number, so 1/z fails. The numeric estimate is z$value; see the "Value" section of ?integrate.)
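To see the normalisation do real work, here is a sketch with a different, purely illustrative kernel, x^2 * exp(-x), whose integral over [0, Inf) is 2, so the constant should come out near 0.5. The names g, zg, Cg and g_final are hypothetical and not part of the question.

```r
# Unnormalised kernel with C as an optional argument, as above
g <- function(x, C = 1) C * x^2 * exp(-x)

# Integrate with C = 1, then take the reciprocal of the numeric estimate
zg <- integrate(g, lower = 0, upper = Inf)
Cg <- 1 / zg$value
Cg

# The normalised function should now integrate to about 1
g_final <- function(x) g(x, C = Cg)
integrate(g_final, lower = 0, upper = Inf)$value
```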