Say I have the matrix d, which holds two different realizations (rows) of a sampling procedure in two dimensions (columns). I want to write a function that creates the fully antithetic draws from this original matrix.
c1 <- c(0.1, 0.6);c2 <- c(0.3, 0.8);d <- rbind(c1,c2)
# [,1] [,2]
# c1 0.1 0.6
# c2 0.3 0.8
That is to say, for example, for the first realization (c(0.1, 0.6)) I want to obtain the mirror images of this random draw in two dimensions, which generates 4 (2^2) possible combinations as follows:
d1_anthi = matrix(
c( d[1,1] , d[1,2],
1 - d[1,1], d[1,2],
d[1,1] , 1 - d[1,2],
1 - d[1,1], 1 - d[1,2]), nrow=2,ncol=4)
t(d1_anthi)
# [,1] [,2]
# [1,] 0.1 0.6
# [2,] 0.9 0.6
# [3,] 0.1 0.4
# [4,] 0.9 0.4
Analogously, for the second realization, the result is the following:
d2_anthi = matrix(
c( d[2,1] , d[2,2],
1 - d[2,1], d[2,2],
d[2,1] , 1 - d[2,2],
1 - d[2,1], 1 - d[2,2]), nrow=2, ncol=4)
t(d2_anthi)
# [,1] [,2]
# [1,] 0.3 0.8
# [2,] 0.7 0.8
# [3,] 0.3 0.2
# [4,] 0.7 0.2
Accordingly, my desired object should look like this:
anthi_draws <- rbind(t(d1_anthi),t(d2_anthi))
# [,1] [,2]
# [1,] 0.1 0.6 <- original first realization
# [2,] 0.9 0.6
# [3,] 0.1 0.4
# [4,] 0.9 0.4
# [5,] 0.3 0.8 <- original second realization
# [6,] 0.7 0.8
# [7,] 0.3 0.2
# [8,] 0.7 0.2
Finally, I would like to create a function that, given a matrix of random numbers, is able to create this expanded matrix of antithetic draws. For example, in the picture below I have a sampling in three dimensions, then the total number of draws per original draw is 2^3 = 8.
In particular, I am having problems creating the full set of combinations, whose size depends on the dimension of the original sampling (the columns of the matrix). I was planning on using expand.grid() but I couldn't create the full combinations with it. Any hints or help in creating such a function are welcome. Thank you in advance.
You can try this
do.call(
rbind,
apply(
d,
1,
function(x) {
expand.grid(data.frame(rbind(x, 1 - x)))
}
)
)
which gives
X1 X2
c1.1 0.1 0.6
c1.2 0.9 0.6
c1.3 0.1 0.4
c1.4 0.9 0.4
c2.1 0.3 0.8
c2.2 0.7 0.8
c2.3 0.3 0.2
c2.4 0.7 0.2
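Since the question asks for a function, the same idea can be wrapped up as a minimal sketch. The name antithetic_draws is hypothetical (it is not from the original post), and the sketch assumes d is a numeric matrix with values in [0, 1]:

```r
# Hypothetical helper: expand each row of d into its 2^ncol(d) antithetic draws.
antithetic_draws <- function(d) {
  d <- as.matrix(d)
  blocks <- lapply(seq_len(nrow(d)), function(i) {
    x <- d[i, ]
    # one list element per dimension: the value and its mirror image
    pairs <- Map(function(v) c(v, 1 - v), x)
    # expand.grid varies the first dimension fastest
    as.matrix(expand.grid(pairs, KEEP.OUT.ATTRS = FALSE))
  })
  out <- do.call(rbind, blocks)
  dimnames(out) <- NULL
  out
}

d <- rbind(c(0.1, 0.6), c(0.3, 0.8))
antithetic_draws(d)
```

The result is a plain numeric matrix with nrow(d) * 2^ncol(d) rows, so a three-column input gives 8 rows per original draw.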
I have a matrix dsts with 3 columns; the third is a factor. I want my line plot to be colored by the factor, but this command is not working:
plot(dsts[ ,'x'],dsts[,'dist'],col=dsts[,'i'],type='l')
and,
plot(dsts[ ,'x'],dsts[,'dist'],col=dsts[,'i'],type='n')
lines(dsts[ ,'x'],dsts[,'dist'],col=dsts[,'i'])
is not working either!!!
I want to avoid using matplot which accepts matrices.
The col option, though able to take vector input, only effectively controls point colour, not line colour, so type = "p" works but type = "l" does not. For type = "b", only the points will have the correct colours.
If you want to have several lines with different colours, you have to draw them with separate plot or lines calls. A better way to go is to reshape your data, then use matplot. It takes a matrix and plots its columns one by one via a for loop.
Since you've already got a function to reshape data, you have the right way to go.
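As a rough illustration of the reshape-then-matplot route, here is a small sketch on toy data. The objects x_grid, long and wide are made up for this example, and it assumes every group shares the same x values:

```r
# Toy long-format data (hypothetical): 3 groups, 5 points each, shared x grid
x_grid <- 1:5
long <- cbind(x    = rep(x_grid, times = 3),
              dist = c(x_grid, x_grid^2, sqrt(x_grid)),
              i    = rep(1:3, each = 5))

# Reshape: one column of y values per group
wide <- sapply(split(long[, "dist"], long[, "i"]), identity)

# matplot draws each column as its own line, so col is applied per line
matplot(x_grid, wide, type = "l", lty = 1, col = 1:3,
        xlab = "x", ylab = "dist")
```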
The reason that plot and lines disregard vector values in col for line display is that they have no idea whether this vector has a reasonable, non-random pattern. They do something safe and use only col[1]. I will elaborate on this in two steps.
Firstly, consider this example to see that plot will always use col[1] when type = "l":
set.seed(0); mat1 <- round(cbind(rnorm(9),rnorm(9),rep(1:3, each = 3)), 1)
# [,1] [,2] [,3]
# [1,] 1.3 2.4 1
# [2,] -0.3 0.8 1
# [3,] 1.3 -0.8 1
# [4,] 1.3 -1.1 2
# [5,] 0.4 -0.3 2
# [6,] -1.5 -0.3 2
# [7,] -0.9 -0.4 3
# [8,] -0.3 0.3 3
# [9,] 0.0 -0.9 3
Then we reorder the rows of mat1:
mat2 <- mat1[c(4:9,1:3), ]
# [,1] [,2] [,3]
# [1,] 1.3 -1.1 2
# [2,] 0.4 -0.3 2
# [3,] -1.5 -0.3 2
# [4,] -0.9 -0.4 3
# [5,] -0.3 0.3 3
# [6,] 0.0 -0.9 3
# [7,] 1.3 2.4 1
# [8,] -0.3 0.8 1
# [9,] 1.3 -0.8 1
We use the 3rd column for col, now compare:
par(mfrow = c(1,2))
plot(mat1[,1], mat1[,2], col = mat1[,3], type = "l")
plot(mat2[,1], mat2[,2], col = mat2[,3], type = "l")
mat1[, 3] starts with 1, so the line colour is black; mat2[,3] starts with 2, so the line colour is red.
Now it is time to say why plot and lines disregard a vector col when type = "l". Consider a random row shuffle of mat1:
set.seed(0); mat3 <- mat1[sample(9), ]
# [,1] [,2] [,3]
# [1,] 0.0 -0.9 3
# [2,] 1.3 -0.8 1
# [3,] -0.3 0.3 3
# [4,] 1.3 -1.1 2
# [5,] 0.4 -0.3 2
# [6,] 1.3 2.4 1
# [7,] -0.9 -0.4 3
# [8,] -0.3 0.8 1
# [9,] -1.5 -0.3 2
plot(..., type = "l") will line up points one by one. Be aware that a line of a single colour can only be drawn, if data points on this path have the same colour specification. Now, the 3rd column is completely random: there is no way to line points up with such colour specification.
The best & safest assumption plot and lines can take is that col vector is completely random. Thus, it will only retain col[1] to produce a single colour plot. The full vector will only be used, when type = "p".
Note, the same logic applies to lwd and lty, too. Any argument associated with line display will take only the first vector element. As I said earlier, if you do want to draw several different lines in different styles, do them one by one.
On top of @Zheyuan Li's valuable insight into the problem at hand, I wrote a simple function to overcome the problem:
plot_line_color <- function(x, y, fact, lwd = 2, ...)
{
  # set up an empty plot, then draw one line per distinct value of fact
  plot(x, y, type = 'n')
  lev <- unique(fact)
  invisible(
    lapply(seq_along(lev), function(j) {
      keep <- fact == lev[j]   # works even if fact is not coded 1, 2, 3, ...
      lines(x[keep], y[keep], col = j, lwd = lwd, ...)
    })
  )
}
A simple simulation:
k <- 1:5
x <- seq(0,10,length.out = 100)
dsts <- lapply(1:length(k), function(i) cbind(x=x, distri=dchisq(x,k[i]),fact=i) )
dsts <- do.call(rbind,dsts)
plot_line_color(x=dsts[,1],y=dsts[,2],fact=dsts[,3])
I have two data frames:
DF1
e l u
1 0.5 1.5
2 1 3
3 2 4
DF2
e l u
0.1 0.01 0.15
0.2 0.1 0.3
0.3 0.2 0.4
I want to combine these two data frames into single list like so:
L
[[1]]
$e: [(1 0.1);(2 0.2);(3 0.3)] #numeric
$l: [(0.5 0.01);(1 0.1);(2 0.2)] #numeric
$u: [(1.5 0.15);(3 0.3);(4 0.4)] #numeric
I have tried to rbind the two data frames and then split by the same column. I was also advised to use Map, but it results in multiple lists rather than a single one, or all variables become factors.
Thank you for any suggestions.
It looks like you want a list of arrays.
> mapply(cbind, DF1, DF2, SIMPLIFY=FALSE)
$e
[,1] [,2]
[1,] 1 0.1
[2,] 2 0.2
[3,] 3 0.3
$l
[,1] [,2]
[1,] 0.5 0.01
[2,] 1.0 0.10
[3,] 2.0 0.20
$u
[,1] [,2]
[1,] 1.5 0.15
[2,] 3.0 0.30
[3,] 4.0 0.40
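For a reproducible version, here is a small sketch that rebuilds the two data frames from the question as numeric columns and uses Map, which is equivalent to mapply(..., SIMPLIFY = FALSE). The way DF1 and DF2 are constructed is an assumption, since the original post only shows them printed:

```r
# Assumed construction of the question's data frames (numeric columns only)
DF1 <- data.frame(e = c(1, 2, 3), l = c(0.5, 1, 2), u = c(1.5, 3, 4))
DF2 <- data.frame(e = c(0.1, 0.2, 0.3), l = c(0.01, 0.1, 0.2), u = c(0.15, 0.3, 0.4))

# Pair up columns by position: one 3 x 2 numeric matrix per column name
L <- Map(cbind, DF1, DF2)
str(L)
```

If the columns come out as factors instead, the data frames were probably read in with strings converted to factors, and converting them to numeric first should avoid that.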
Do you know a more efficient way to generate a matrix holding all unique combinations of "weights" (let the weights be w with 0 <= w <= 1, in steps of 0.1), such that the weights sum to one AND the first is the highest and the last the lowest weight?
Here is code that does the job, but it seems inefficient to delete rows:
# generate combinations of weights such that w1 >= w2 >= w3 ...
w = seq(0, 1, 0.1) #weights 0, 0.1, ..., 0.9, 1
w = expand.grid(w, w, w, KEEP.OUT.ATTRS = FALSE) #all combinations of 3 weights
w = w[rowSums(w) == 1, ] #make sure the weights sum to one
w = w[!(w[, 1] < w[, 2] | w[, 2] < w[, 3]),] #make sure w1 >= w2 >= w3 ...
w
# Var1 Var2 Var3
# 11 1.0 0.0 0.0
# 21 0.9 0.1 0.0
# 31 0.8 0.2 0.0
# 41 0.7 0.3 0.0
# 51 0.6 0.4 0.0
# 61 0.5 0.5 0.0
# 141 0.8 0.1 0.1
# 151 0.7 0.2 0.1
# 171 0.5 0.4 0.1
# 271 0.6 0.2 0.2
# 281 0.5 0.3 0.2
# 291 0.4 0.4 0.2
# 401 0.4 0.3 0.3
Let me add some more general info:
In this problem (3 weights in the above order) the upper limits for the first, second, third values are as follows:
the first number can maximally be 1 for the combination (1, 0, 0)
the second number can maximally be 1/2 for the combination (1/2, 1/2, 0)
the third number can maximally be 1/3 for the combination (1/3, 1/3, 1/3)
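A side note on the filter in the question's code: the printed result has 13 rows and lacks (0.6, 0.3, 0.1), which is a valid combination. A likely cause is the exact comparison rowSums(w) == 1 on floating-point values. One way around it, sketched here under the assumption that the step is 0.1, is to work in integer units of the step and divide at the end (steps, g and w_int are names made up for this sketch):

```r
steps <- 0:10                                   # weights in units of 0.1
g <- expand.grid(steps, steps, steps, KEEP.OUT.ATTRS = FALSE)
g <- g[rowSums(g) == 10, ]                      # exact comparison on whole numbers
g <- g[g[, 1] >= g[, 2] & g[, 2] >= g[, 3], ]   # w1 >= w2 >= w3
w_int <- g / 10                                 # back to weights in [0, 1]
nrow(w_int)
```

This still builds the full grid and deletes rows, so it addresses correctness rather than efficiency.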
A non-base possibility:
library(partitions)
step <- 0.1
n_weights <- 3
t(restrictedparts(n = 1/step, m = n_weights) * step)
# [1,] 1.0 0.0 0.0
# [2,] 0.9 0.1 0.0
# [3,] 0.8 0.2 0.0
# [4,] 0.7 0.3 0.0
# [5,] 0.6 0.4 0.0
# [6,] 0.5 0.5 0.0
# [7,] 0.8 0.1 0.1
# [8,] 0.7 0.2 0.1
# [9,] 0.6 0.3 0.1
# [10,] 0.5 0.4 0.1
# [11,] 0.6 0.2 0.2
# [12,] 0.5 0.3 0.2
# [13,] 0.4 0.4 0.2
# [14,] 0.4 0.3 0.3
General purpose function with standard packages:
# Generate a weights matrix with noWeights columns and noRows rows.
# Each row holds random weights sorted in decreasing order. They are normalized
# to sum to 1.0 before rounding, so the rounded row total can differ slightly from 1.0.
generateWeights = function(noWeights,
                           noRows,
                           distribution = runif,
                           rounding = function(x){ round(x, 1) })
{
  generator = function()
  {
    x = distribution(noWeights)
    x = x / sum(x)
    sort(rounding(x), decreasing = TRUE)
  }
  t(replicate(noRows, generator()))
}
# example of use
generateWeights(3, 10)
I'd like to use quadplot to graph data points that are color coded based on a 5th variable. As an example:
a <- c(.2,.4,.6,.4,.2,.4,.2,.5)
b <- c(.2,.3,.1,.3,.3,.3,.4,.2)
c <- c(.3,.1,.2,.1,.1,.1,.1,.1)
d <- c(.2,.2,.1,.2,.4,.2,.3,.2)
e <- c(-10,20,-100,90,-10,20,-100,90)
f <- data.matrix(data.frame(a,b,c,d,e))
a b c d e
[1,] 0.2 0.2 0.3 0.2 -10
[2,] 0.4 0.3 0.1 0.2 20
[3,] 0.6 0.1 0.2 0.1 -100
[4,] 0.4 0.3 0.1 0.2 90
[5,] 0.2 0.3 0.1 0.4 -10
[6,] 0.4 0.3 0.1 0.2 20
[7,] 0.2 0.4 0.1 0.3 -100
[8,] 0.5 0.2 0.1 0.2 90
I want to plot a, b, c, and d, and have a color gradient for each data point, based on the value in column "e". Any ideas? If there is another package that can do what I need, then that works too. Thanks for any input, in advance.
Is this what you want ?
# quadplot is provided by the klaR package
library(klaR)
# Create custom color palette (blue to red gradient)
grad <- colorRampPalette(c("blue","red"))
# Get a vector of colors, one per integer step between min and max of column "e"
colors <- grad(length(min(f[,5]):max(f[,5])))
# Shift the values so that the minimum maps to index 1 (an index of 0 would drop the point's color)
index <- f[,5] - min(f[,5]) + 1
# Plot using quadplot
quadplot(f[,-5], labelcol=1, labelpch=1:4, col=colors[index], pch=19)
# Add color legend
library(shape)
colorlegend(col=colors, zlim=range(f[,5]), zval=sort(unique(f[,5])), posx = c(0.86, 0.89), posy=c(0.2,0.9))
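The value-to-color mapping can be checked on its own, without any plotting package. This is a minimal sketch that assumes the values in column "e" are whole numbers, as in the example; point_cols is a name made up here:

```r
e <- c(-10, 20, -100, 90, -10, 20, -100, 90)

grad <- colorRampPalette(c("blue", "red"))
colors <- grad(max(e) - min(e) + 1)   # one color per integer step
index <- e - min(e) + 1               # smallest value -> 1, largest -> length(colors)
point_cols <- colors[index]

point_cols
```

The smallest value gets pure blue and the largest pure red. For non-integer data, something like cut() or findInterval() would be needed to bin the values first.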
I am having some trouble understanding the prob argument of sample. For example, I want to create a sample data set of size 100 with the integers 1, 2, 3 & 4. I am using probabilities of 0.1, 0.2, 0.3 & 0.4 respectively.
sample1<-sample(1:4,100,replace=T,prob=seq(0.1,0.4,0.1))
So, now I am expecting a sample with integers of 1,2,3 & 4 repeating 10,20,30 & 40 times respectively. But the result is different
> table(sample1)
sample1
1 2 3 4
7 24 33 36
Can anyone explain this? And what should I do if I want to get the expected results which is
> table(sample1)
sample1
1 2 3 4
10 20 30 40
sample takes a sample with the specified probabilities. That implies randomness - you won't get the same result every time. To do what you want just use rep
rep(1:4, 100*seq(0.1,0.4,0.1))
sample(...) takes a random sample with probabilities given in prob=..., so you will not get exactly that proportion every time. On the other hand, the proportions get closer to those specified in prob as n increases:
f <- function(n)sample(1:4,n,replace=T,prob=(1:4)/10)
samples <- lapply(10^(2:6),f)
t(sapply(samples,function(x)c(n=length(x),table(x)/length(x))))
# n 1 2 3 4
# [1,] 1e+02 0.090000 0.220000 0.260000 0.430000
# [2,] 1e+03 0.076000 0.191000 0.309000 0.424000
# [3,] 1e+04 0.095300 0.200200 0.310100 0.394400
# [4,] 1e+05 0.099720 0.199800 0.302250 0.398230
# [5,] 1e+06 0.099661 0.199995 0.300223 0.400121
If you need a random sample with exactly those proportions, use rep(...) and randomize the order.
g <- function(n) rep(1:4,n*(1:4)/10)[sample(1:n,n)]
samples <- lapply(10^(2:6),g)
t(sapply(samples,function(x)c(n=length(x),table(x)/length(x))))
# n 1 2 3 4
# [1,] 1e+02 0.1 0.2 0.3 0.4
# [2,] 1e+03 0.1 0.2 0.3 0.4
# [3,] 1e+04 0.1 0.2 0.3 0.4
# [4,] 1e+05 0.1 0.2 0.3 0.4
# [5,] 1e+06 0.1 0.2 0.3 0.4
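The rep-and-shuffle idea can also be written as a small helper. This is only a sketch: exact_sample is a hypothetical name, and it assumes n * prob gives whole-number counts (the round() guards against floating-point results such as 30.000000000000004):

```r
# Hypothetical helper: a shuffled vector with exactly n * prob copies of each value
exact_sample <- function(values, prob, n) {
  counts <- round(n * prob)
  stopifnot(sum(counts) == n)          # fail loudly if the counts do not add up
  sample(rep(values, times = counts))  # sample() on a vector permutes it
}

set.seed(1)
s <- exact_sample(1:4, c(0.1, 0.2, 0.3, 0.4), 100)
table(s)
```

Only the order is random here; the counts per value are fixed at 10, 20, 30 and 40.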