I'm using a recursive algorithm to generate samples and include them in a list. For that I was using rbind (since I don't know the final number of rows, I can't just declare it and access it through list[i, ] to assign the values).
The problem is I start sampling from the last value to the first, so my list is upside down.
Is there a way to use rbind to create a row upwards instead of downwards?
Example for illustration:
Suppose you have x1 = c(1, 2) and x2 = c(3, 4)
if you do: rbind(x1, x2) you get:
1 2
3 4
But what I need is:
3 4
1 2
Remember that I can't just do rbind(x2, x1), because I'm sampling backwards, so I don't have all values before binding.
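One possible approach, sketched under the assumption that each new sample arrives as a numeric vector of fixed length: rbind stacks its arguments in the order given, so putting the new row first and the accumulated matrix second grows the result upwards. The names samples and new_row are hypothetical, and the loop over list(x1, x2) only stands in for the recursive sampler.

```r
x1 <- c(1, 2)
x2 <- c(3, 4)

# Option 1: prepend each new row by listing it first in rbind()
samples <- NULL
for (new_row in list(x1, x2)) {        # x1 is sampled first, x2 second
  samples <- rbind(new_row, samples)   # the newest row goes on top
}
rownames(samples) <- NULL

# Option 2: append as usual, then reverse the row order once at the end
appended <- rbind(x1, x2)
reversed <- appended[nrow(appended):1, , drop = FALSE]
```

Both variants end with the row 3 4 above the row 1 2. For many rows, reversing once at the end is likely cheaper than rebuilding the matrix on every step.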
I have something of a basic question as I do some preliminary coding in preparation for my dissertation. I have some experience with R, but am still somewhat new. I've looked all over the internet, and haven't found a good answer yet. Hope someone out there can help improve my code and make it more efficient.
I'm trying to create a series of 4 randomly-drawn 5x5 networks that change slightly at each time point. To do that, I create a vector of 25 randomly drawn (prob=.5) 0s and 1s, and then create a 5x5 matrix from the vector. The matrix will serve as the adjacency matrix for each network. Creating the initial matrix is pretty easy:
a <- rbinom(25, 1, .5)
matrix_a <- matrix(a, ncol = 5, nrow = 5)
This matrix will serve as my network at time point 1. For time points 2-4, I want 5 randomly-selected cells to flip, so a 0 becomes a 1, and a 1 becomes a 0. For those unfamiliar with networks, that means in five instances edges change and are added (if there wasn't one before) or are removed (if there was).
The way I've figured out how to do that is to first select 5 elements from the vector a at random:
spot <- sample(25,5)
This will give me a vector of 5 elements, each representing a randomly-drawn position from 1 to 25. Next, I want to change those 5 zeroes or ones to their opposite (so a zero becomes a one and vice versa), and then I can insert them back into the 25-element vector, make matrix_b at time point 2 from that, and repeat two more times. This way, the networks stay fairly stable, but change slightly and at random at time points 2 through 4.
But here's where I'm having trouble. I'd like to create a function to automate changing the five zeroes to ones and vice versa, which seems like it should be easy to do. So far, this is the best I've been able to pull off:
x <- (a[spot])
y1 <- if (x[1]==0) {
x[1]+1
} else {
x[1]-1
}
y1
y2 <- if (x[2]==0) {
x[2]+1
} else {
x[2]-1
}
y2
I've tested this, and it does change a zero to a one and vice versa.
I repeat that three more times to create y3, y4, and y5, then create a new vector of 5 elements:
y <- c(y1,y2,y3,y4,y5)
y
Now I replace five elements from the 25-element vector a with the vector y above (which have changed from zeroes to ones and vice versa) to create the new vector b:
b <- a
b[spot] <- y
matrix_b <- matrix(b, ncol = 5, nrow = 5)
I wind up with matrix_b at time point 2, in which 5 cells have changed from zero to one or vice versa, representing edges that have been added or dropped.
This will work, but it's really inefficient. I know there's a way to automate--using functions? apply?--creating y1 through y5 above. But I've been looking for hours, and this is still the best I can do.
Any suggestions for improving the code? Thanks in advance for any help you're able to offer.
You can change all of the sampled values at once with b[spot] = 1 - b[spot]
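A minimal sketch of how that one-liner could drive all four time points, assuming each network is derived from the previous one. The list name networks, the variable current, and the seed are illustrative choices, not part of the original question.

```r
set.seed(1)                        # arbitrary seed, only for reproducibility
a <- rbinom(25, 1, 0.5)            # network at time point 1

networks <- vector("list", 4)
networks[[1]] <- matrix(a, nrow = 5, ncol = 5)

current <- a
for (t in 2:4) {
  spot <- sample(25, 5)                 # 5 distinct cells to flip
  current[spot] <- 1 - current[spot]    # 0 becomes 1 and 1 becomes 0
  networks[[t]] <- matrix(current, nrow = 5, ncol = 5)
}
```

Because sample(25, 5) draws without replacement, exactly five cells differ between consecutive matrices.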
I am trying to create a vector from another vector where I multiply the numbers in the vector one more each time.
For example if I had (1,2,3) the new vector would be (1, 1 x 2, 1 x 2 x 3)=(1,2,6)
I tried to create a loop for this as seen below. It seems to work for whole numbers but not decimals. I am not sure why.
x <- c(0.99,0.98,0.97,0.96,0.95)
for(i in 1:5){x[i]=prod(x[1:i])}
The result given is 0.9900000 0.9702000 0.9316831 0.8590845 0.7303385
which is incorrect, as prod(x) = 0.8582777, which is not the same as the last element of the vector.
Does anyone know why this is the case, or have a suggestion for improving my code to get the correct answer?
test<-c(1,2,3)
cumprod(test)
[1] 1 2 6
As @akrun suggests, one can achieve the same with:
Reduce("*", test, accumulate = TRUE)
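As for why the loop in the question goes wrong: it overwrites x[i] in place, so later iterations multiply products that have already been accumulated rather than the original values. It only appears to work for c(1, 2, 3) because the first element is 1. The sketch below writes into a separate vector (the name result is illustrative) and compares it with cumprod.

```r
x <- c(0.99, 0.98, 0.97, 0.96, 0.95)

# Write into a separate vector so the inputs are never overwritten
result <- numeric(length(x))
for (i in seq_along(x)) {
  result[i] <- prod(x[1:i])
}

# The built-in gives the same running product in one call
same_as_cumprod <- isTRUE(all.equal(result, cumprod(x)))
```

With the inputs left untouched, the last element of result agrees with prod(x).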
Let's say we have a hypothetical complete schedule of potential outcomes from an experiment.
Y0<-c(10,15,20,20,10,15,15)
Y1<-c(15,15,30,15,20,15,30)
budgets<-matrix(data=c(Y0,Y1),nrow=7,ncol=2)
I would like to list all of the ways to choose two elements from Y1 and the remaining 5 from Y0. Ideally, this would look like an array of 21 lists, each with five elements labeled Y0 and two elements labeled Y1.
edit: These are matched pairs, so choosing Y0[1] removes Y1[1] from consideration.
Thanks in advance! I think there are many ways to approach this (sapply?) but would appreciate help on the details.
Here is a longer method, there is probably a more compact solution out there:
# get within group combinations as matrix
grp0 <-t(combn(Y0, 5))
grp1 <-t(combn(Y1, 2))
# get all possible combos of these rows
grpCombos <- expand.grid(1:nrow(grp0), 1:nrow(grp1))
# get all combinations as a matrix
allGroups <- cbind(grp0[grpCombos[,1],], grp1[grpCombos[,2],])
To get all the combinations of 2 elements from Y1 and the remaining 5 elements from Y0, choosing only one element from each position, try the following code:
cb <- as.data.frame(combn(1:7, 2))
sapply(cb, FUN = function(x) c(Y1[x], Y0[-x]))
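If the labeled-list layout from the question is preferred over a 7 x 21 matrix, one possible variant passes a function to combn directly. This is a sketch; the name assignments is illustrative.

```r
Y0 <- c(10, 15, 20, 20, 10, 15, 15)
Y1 <- c(15, 15, 30, 15, 20, 15, 30)

# One list per way of picking 2 of the 7 positions for Y1;
# the other 5 positions take their value from Y0
assignments <- combn(7, 2, FUN = function(idx) {
  list(Y1 = Y1[idx], Y0 = Y0[-idx])
}, simplify = FALSE)

length(assignments)
```

Each element of assignments then holds two values labeled Y1 and five labeled Y0, and no position is used twice.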
previous: If you want all combinations of choosing 2 from 7 within Y1 and 5 from 7 within Y0, the total number of combinations would be 21 * 21.
I would like to do two things to my fairly large data set, which is about 10 K x 50 K. The following is a smaller set of 200 x 10000.
First I want to generate 5% missing values, which is perhaps simple and can be done with a simple trick:
# dummy data
set.seed(123)
# matrix of X variable
xmat <- matrix(sample(0:4, 2000000, replace = TRUE), ncol = 10000)
colnames(xmat) <- paste ("M", 1:10000, sep ="")
rownames(xmat) <- paste("sample", 1:200, sep = "")
Generate missing values at 5% random places in the data.
N <- 2000000*0.05 # 5% random missing values
inds_miss <- round ( runif(N, 1, length(xmat)) )
xmat[inds_miss] <- NA
Now I would like to generate errors, meaning values different from what I have in the matrix above. The matrix has values of 0 to 4. So what I would like to do:
(1) I would like to replace a value x with another value that is not x. For example, 0 can be replaced by a random draw that is not 0 (i.e. 1, 2, 3 or 4); similarly, 1 can be replaced by a value that is not 1 (i.e. 0, 2, 3 or 4). The indices where values are replaced can be generated simply with:
inds_err <- round ( runif(N, 1, length(xmat)) )
If I randomly sample values from 0:4 and assign them at those indices, this will sometimes replace a value with the same value (0 with 0, 1 with 1 and so on) without creating an error.
errorg <- sample(0:4, length(inds_err), replace = TRUE)
xmat[inds_err] <- errorg
(2) So what I would like to do is introduce errors into xmat, which already contains missing values. However, I do not want the NAs generated in the step above to be replaced with a value (0 to 4), so inds_err should not contain any member of inds_miss.
So, the rules in summary:
(1) The missing values should not be replaced with error values.
(2) An existing value must be replaced with a different value (which is the definition of error here); with plain random sampling there is a 1/5 probability of drawing the same value again.
How can this be done? I need a fast solution that can be used on my large dataset.
You can try this:
inds_err <- setdiff(round ( runif(2*N, 1, length(xmat)) ),inds_miss)[1:N]
xmat[inds_err]<-(xmat[inds_err]+sample(4,N,replace=TRUE))%%5
With the first line you generate 2*N candidate error indices, then remove the ones belonging to inds_miss and take the first N. With the second line you add a random number between 1 and 4 to the values you want to change and then take the result mod 5. This way you can be sure that the new value is different from the original and still in the 0-4 range.
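A variation on the same idea, sketched with sample() in place of round(runif(...)). This is a swapped-in technique, not the answer's code. Sampling positions without replacement avoids duplicate indices, and drawing only from cells that are not NA makes the setdiff step unnecessary. The small matrix and the names candidates and before are stand-ins for illustration.

```r
set.seed(123)
xmat <- matrix(sample(0:4, 2000, replace = TRUE), ncol = 100)  # small stand-in
N <- length(xmat) * 0.05                                       # 5% of the cells

inds_miss <- sample(length(xmat), N)   # N distinct positions
xmat[inds_miss] <- NA

candidates <- which(!is.na(xmat))      # cells that are still observed
inds_err <- sample(candidates, N)      # distinct, and never an NA cell

before <- xmat[inds_err]
# Adding 1..4 and wrapping mod 5 always lands on a different value in 0..4
xmat[inds_err] <- (before + sample(4, N, replace = TRUE)) %% 5
```

Everything here is vectorized, so the same lines should carry over to the full-size matrix.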
Here's an if/else solution that could work for you. It is a for loop, so I'm not sure if that will be fast enough for you. It could possibly be vectorized in some way to make it faster.
# vector of options
vec <- 0:4
# simple logic based solution if just don't want NA changed
for (i in inds_err) {
  if (is.na(xmat[i])) {
    next  # leave missing values untouched
  } else {
    # draw from the values that differ from the current one
    xmat[i] <- sample(vec[vec != xmat[i]], 1)
  }
}
I would like to ask for your opinion, since I am not sure how to do this. It concerns one part of my paper project, and my situation is:
Stage I
I have 2 groups and for each group I need to compute the following steps:
Generate 3 random numbers from a normal distribution, square them and sum the squares.
Repeat step 1 15 times, so that at the end I get 15 random numbers.
I have already done stage I using a for loop.
n1<-3
n2<-3
miu<-0
sd1<-1
sd2<-1
asim<-15
w<-rep(NA,asim)
x<-rep(NA,asim)
for (i in 1:asim) {
print(i)
set.seed(i)
data1<-rnorm(n1,miu,sd1)
data2<-rnorm(n2,miu,sd2)
w[i]<-sum(data1^2)
x[i]<-sum(data2^2)
}
w
x
The second stage is:
Stage II
For each group, I need to:
Sort the group;
Find trimmed mean for each group.
I need to simulate the whole process (stage I and stage II) 5000 times. How should I proceed with stage II? Do you think I need to add another loop for it?
Those are tasks you can do without explicit loops. First, note a few things: it makes no difference whether you generate 3 times 15 times 2000 random numbers or generate them all at once. They still share the same distribution.
Next: Setting the seed within each loop makes your simulation deterministic. Call set.seed once at the start of your script.
So, what we will do is to generate all random numbers at once, then compute their squared norms for groups of three, then build groups of 15.
First some variable definitions:
set.seed(20131301)
repetitions <- 2000
numperval <- 3
numpergroup <- 15
miu <- 0
sd1 <- 1
sd2 <- 1
As we need two groups, we wrap the group generation into a custom function. This is not necessary, but it does help a bit in keeping the code clean and readable.
generateGroup <- function(repetitions, numperval, numpergroup, m, s) {
# Generate all data
data <- rnorm(repetitions*numperval*numpergroup, m, s)
# Build groups of 3:
data <- matrix(data, ncol=numperval)
# And generate the squared norm of those
data <- rowSums(data*data)
# Finally build a matrix with 15 columns, each column one dataset of numbers, each row one repetition
matrix(data, ncol=numpergroup)
}
Great, now we can generate random numbers for our group:
group1 <- generateGroup(repetitions, numperval, numpergroup, miu, sd1)
group2 <- generateGroup(repetitions, numperval, numpergroup, miu, sd2)
To compute the trimmed mean, we use apply:
trimmedmeans_group1 <- apply(group1, 1, mean, trim=0.25)
trimmedmeans_group2 <- apply(group2, 1, mean, trim=0.25)
I used mean with the trim argument instead of sorting, throwing away and computing the mean. If you need the sorted numbers explicitly, you could do it by hand (just for one group, this time):
sorted <- t(apply(group1, 1, sort))
# We have to transpose as apply by default returns a matrix with each observation in one column. I chose the other way around above, so we stick with this convention and transpose.
Now it would be easy to throw away the first and last few columns and compute the mean, if you want to do it manually. Note that trim=0.25 drops floor(15 * 0.25) = 3 values from each end of every row.
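A small sketch of that manual route, checking it against mean with trim=0.25. The matrix below is only a stand-in for group1 (20 rows instead of 2000, generated directly rather than through generateGroup), and the names keep, manual and builtin are illustrative.

```r
set.seed(20131301)
# Stand-in for group1: 20 repetitions, 15 values each
group1 <- matrix(rnorm(20 * 15)^2, ncol = 15)

# Sort each row; apply() returns rows as columns, hence the transpose
sorted <- t(apply(group1, 1, sort))

# trim = 0.25 drops floor(15 * 0.25) = 3 values from each end,
# leaving columns 4 to 12 of each sorted row
keep <- 4:12
manual <- rowMeans(sorted[, keep])

builtin <- apply(group1, 1, mean, trim = 0.25)
```

The two vectors should agree up to floating-point tolerance, which all.equal(manual, builtin) can confirm.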