Replacing numbers in a vector at random in R efficiently

I have something of a basic question as I do some preliminary coding in preparation for my dissertation. I have some experience with R, but am still somewhat new. I've looked all over the internet, and haven't found a good answer yet. Hope someone out there can help improve my code and make it more efficient.
I'm trying to create a series of 4 randomly-drawn 5x5 networks that change slightly at each time point. To do that, I create a vector of 25 randomly drawn (prob=.5) 0s and 1s, and then create a 5x5 matrix from the vector. The matrix will serve as the adjacency matrix for each network. Creating the initial matrix is pretty easy:
a <- rbinom(25, 1, .5)
matrix_a <- matrix(a, ncol = 5, nrow = 5)
This matrix will serve as my network at time point 1. For time points 2-4, I want 5 randomly-selected cells to flip, so a 0 becomes a 1 and a 1 becomes a 0. For those unfamiliar with networks, that means five edges change: an edge is added where there wasn't one before, or removed where there was.
The way I've figured out how to do that is to first select 5 elements from the vector a at random:
spot <- sample(25,5)
This will give me a vector of 5 elements representing randomly-drawn positions from 1 to 25. Next, I want to change those 5 zeroes or ones to their opposite (so a zero becomes a one and vice versa), and then I can insert them back into the 25-element vector, make matrix_b at time point 2 from that, and repeat two more times. This way, the networks stay fairly stable, but change slightly and at random at time points 2 through 4.
But here's where I'm having trouble. I'd like to create a function to automate changing the five zeroes to ones and vice versa, which seems like it should be easy to do. So far, this is the best I've been able to pull off:
x <- a[spot]
y1 <- if (x[1] == 0) {
  x[1] + 1
} else {
  x[1] - 1
}
y1
y2 <- if (x[2] == 0) {
  x[2] + 1
} else {
  x[2] - 1
}
y2
I've tested this, and it does change a zero to a one and vice versa.
I repeat that three more times to create y3, y4, and y5, then create a new vector of 5 elements:
y <- c(y1,y2,y3,y4,y5)
y
Now I replace five elements of the 25-element vector a with the vector y above (whose elements have been flipped from zero to one and vice versa) to create the new vector b:
b <- a
b[spot] <- y
matrix_b <- matrix(b, ncol = 5, nrow = 5)
I wind up with matrix_b at time point 2, in which 5 cells have changed from zero to one or vice versa, representing edges that have been added or dropped.
This will work, but it's really inefficient. I know there's a way to automate creating y1 through y5 above (using functions? apply?), but I've been looking for hours, and this is still the best I can do.
Any suggestions for improving the code? Thanks in advance for any help you're able to offer.

You can change all of the sampled values at once with b[spot] = 1 - b[spot]
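Putting that together with the setup from the question, a minimal sketch of the whole sequence might look like this (the list name networks is just for illustration):
# Sketch: network at time 1, then time points 2-4 each flip 5 random cells
a <- rbinom(25, 1, .5)
networks <- list(matrix(a, ncol = 5, nrow = 5))   # time point 1
b <- a
for (tp in 2:4) {
  spot <- sample(25, 5)      # 5 randomly chosen positions
  b[spot] <- 1 - b[spot]     # flip 0 <-> 1 in one vectorized step
  networks[[tp]] <- matrix(b, ncol = 5, nrow = 5)
}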

Related

In R: Avoid duplicates in selection from many rows

Summary: I have an array of 10 rows and 4 columns filled with numbers. I select one number from each row and want to avoid duplicates in the selection.
Elaborate:
I have a grid of 100*100 cells. In that grid are 10 cells that contain a "person". In an iterative process I want to make the persons "walk around" in the grid, but I do not want to occur that two persons are in the same cell at the same time.
I have a vector that describes the positions of the 10 persons. It contains the cell numbers of the cells with a person, counting across all rows and columns (i.e., it ranges from 1 to 10000; for example, position 234 would be in the 3rd row, 34th column).
Positions<-sample(1:10000,10) #Initial positions
What I did is to first make an array of the surrounding cells of each person (up, right, down, left) giving 4 positions for each person:
Surroundings<-array(c(Positions+100,Positions+1,Positions-100,Positions-1),dim=c(10,4))
I then take a random direction from each of the rows in Surroundings into vector PosNew. It is this last vector in which I want to avoid duplicates.
I could repeat the random selection process of PosNew until it has no duplicates, but this could take very long. There are probably more efficient ways to do this.
For simplicity sake, let's assume that persons do not walk off the grid and no other errors occur.
My script:
Positions<-sample(1:10000,10) #Initial positions
for(i in 1:50) {
Surroundings<-array(c(Positions+100,Positions+1,Positions-100,Positions-1),dim=c(10,4))
PosNew<-Surroundings[cbind(1:10,sample(1:4,10,replace=TRUE))]
Dups<-length(which(duplicated(PosNew)==TRUE))
Positions<-PosNew
}
I am looking for a way to check for duplicates in the selected new positions and make sure that Dups is never above zero. Any suggestions are welcome, including suggestions to make the code faster/more efficient.
Added: What could I do when at some point one or more of the persons really cannot move to an empty cell, because all 4 sides are occupied? I want that person to stay in its original cell. How to code that?
Thank you so much for your time!
As this is an iterative process, where every person's move depends on the locations of the others, I don't think you can do much better than moving one person at a time and sampling the position for the next from the difference between the set of all directions and the set of occupied positions (note that this adds a bit of unfairness, as the first person has the most freedom to move, so to speak).
So the code would be something like this:
Positions <- sample(1:10000, 10) # Initial positions
for (i in 1:50) {
  Surroundings <- array(c(Positions + 100, Positions + 1, Positions - 100, Positions - 1),
                        dim = c(10, 4))
  # BEGIN NEW CODE
  PosNew <- numeric(10)
  for (j in 1:10) { # use j so we don't shadow the outer loop index
    # PosNew[seq_len(j - 1)] is the set of already-assigned new positions
    available <- setdiff(Surroundings[j, ], PosNew[seq_len(j - 1)])
    if (length(available) != 0)
      # index into 'available' because sample() treats a single number n as 1:n
      PosNew[j] <- available[sample(length(available), 1)]
    else
      PosNew[j] <- Positions[j] # stay where you are
  }
  # END NEW CODE
  Dups <- sum(duplicated(PosNew)) # shorter version - sum logical values to get a count
  Positions <- PosNew
}
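If you want to convince yourself that the collisions are gone, a one-line check after the loop (or at the end of each iteration) is enough:
# Should never fail if the sequential sampling above works as intended
stopifnot(!any(duplicated(Positions)))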
Hope this helps!

Find n closest non-NA values to position t in vector

This is probably a simple question for those experienced in R, but it is something that I (a novice) am struggling with...
I have two examples of vectors that are common to the problem I am trying to solve, A and B:
A <- c(1,3,NA,3,NA,4,NA,1,7,NA,2,NA,9,9,10)
B <- c(1,3,NA,NA,NA,NA,NA,NA,NA,NA,2,NA,9)
#and three scalars
R <- 4
t <- 5
N <- 3
There is a fourth scalar, n, where 0<=n<=N. In general, N <= R.
I want to find the n closest non-NA values to t such that they fall within a radius R centered on t. That is, the search radius R comprises R+1 values. For example, in A the search radius sequence is (3,NA,3,NA,4,NA,1), where t is the NA in the middle of that sequence.
The expected answer can be one of two results for A:
answerA1 <- c(3,4,1)
OR
answerA2 <- c(3,4,3)
The expected answer for B:
answerB <- c(1,3)
How would I accomplish this task in the most time- and space-efficient manner? One liners, loops, etc. are welcome. If I have to choose a preference, it is for speed!
Thanks in advance!
Note:
For this case, I understand that choosing the third closest non-NA value may involve a preference for whether it falls to the right or the left of t (as shown by the two possible answers above). I do not have a preference for which side it falls on, but if there is a way to leave it to random chance, that would be ideal (again, it is not a requirement).
A relatively short solution is:
orderedA <- A[order(abs(seq_len(length(A)) - t))][seq_len(R*2)]
n_obj <- min(sum(is.na(orderedA)), N, length(na.omit(orderedA)))
res <- na.omit(orderedA)[seq_len(n_obj)]
res
#[1] 3 4 3
Breaking this down a little more, the steps are:
1. Order A by the absolute distance from the position of interest, t.
Code is: A[order(abs(seq_len(length(A)) - t))]
2. Subset to the first R*2 elements (this gets the elements on either side of t within R).
Code is: [seq_len(R*2)]
3. Work out how many elements to keep: the minimum of the number of NA values, N, and the number of non-NA values.
Code is: min(sum(is.na(orderedA)), N, length(na.omit(orderedA)))
4. Drop the NA values.
Code is: na.omit()
5. Take the first n_obj elements, as determined in step 3.
Code is: [seq_len(n_obj)]
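If you need to do this for several vectors, the same steps wrap naturally into a small helper (a sketch; the function name closest_non_na is just illustrative):
# Sketch: package the steps above into a reusable function
closest_non_na <- function(A, t, R, N) {
  orderedA <- A[order(abs(seq_along(A) - t))][seq_len(R * 2)]
  n_obj <- min(sum(is.na(orderedA)), N, length(na.omit(orderedA)))
  as.vector(na.omit(orderedA)[seq_len(n_obj)])
}
closest_non_na(A, t = 5, R = 4, N = 3)
#[1] 3 4 3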
Something like this?
thingfinder <- function(A, R, t, n) {
  left  <- A[t:(t - R - 1)]              # t and the cells to its left
  right <- A[t:(t + R + 1)]              # t and the cells to its right
  leftrightmat <- cbind(left, right)
  raw_ans <- as.vector(t(leftrightmat))  # interleave: nearest first, left before right
  ans <- raw_ans[!is.na(raw_ans)]        # drop the NAs
  return(ans[1:n])
}
thingfinder(A=c(1,3,NA,3,NA,4,NA,1,7,NA,2,NA,9,9,10), R=3, t=5, n=3)
## [1] 3 4 3
This would give priority to the left side, of course.
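The question mentions leaving that choice to chance; one way to sketch that is to randomize which side is interleaved first (the name thingfinder_rand is just illustrative):
# Illustrative variant: randomly decide whether the left or the right
# side gets priority when interleaving the candidate values
thingfinder_rand <- function(A, R, t, n) {
  left  <- A[t:(t - R - 1)]
  right <- A[t:(t + R + 1)]
  sides <- if (runif(1) < 0.5) cbind(left, right) else cbind(right, left)
  raw_ans <- as.vector(t(sides))
  ans <- raw_ans[!is.na(raw_ans)]
  ans[1:n]
}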
In case it is helpful to others, @Mike H. also provided me with a solution to return the index positions associated with the desired vector elements res:
A <- setNames(A, seq_len(length(A)))
orderedA <- A[order(abs(seq_len(length(A)) - t))][seq_len(R*2)]
n_obj <- min(sum(is.na(orderedA)), N, length(na.omit(orderedA)))
res <- na.omit(orderedA)[seq_len(n_obj)]
positions <- as.numeric(names(res))

How to use pointDistance with a very large vector

I've got a big problem.
I've got a large raster (rows = 180, columns = 480, number of cells = 86400).
At first I binarized it (so that there are only 1's and 0's) and then I labelled the clusters (cells that are 1 and connected to each other got the same label).
Now I need to calculate all the distances between the cells that are NOT 0.
There are quite a lot of them, and that's my big problem.
I did this to get the coordinates of the cells I'm interested in (i.e., the positions/cell numbers of the cells that are not 0):
V=getValues(label)
Vu=c(1:max(V))
pos=which(V %in% Vu)
XY=xyFromCell(label,pos)
This works very well. So XY is a matrix, which contains all the coordinates (of cells that are not 0). But now I'm struggling. I need to calculate the distances between ALL of these coordinates. Then I have to put each one of them in one of 43 bins of distances. It's kind of like this (just an example):
0 < x < 0.2   bin 1
0.2 < x < 0.4   bin 2
When I use this:
pD=pointDistance(XY,lonlat=FALSE)
R says it's not possible to allocate a vector of this size; it's getting too large.
Then I thought I could do this (create an empty data frame df or something like that and let the function pointDistance run over every single value of XY):
for (i in 1:nrow(XY)) {
  pD <- pointDistance(XY, XY[i, ], lonlat = FALSE)
  pDbin <- as.matrix(table(cut(pD, breaks = seq(0, 8.6, by = 0.2), labels = 1:43)))
  df <- cbind(df, pDbin)
  df <- apply(df, 1, FUN = function(x) sum(x))
}
It is working when I try this with e.g. the first 50 values of XY.
But when I use it for the whole XY matrix it takes too much time. (Sometimes this XY matrix contains 10000 xy-coordinates.)
Does anyone have an idea how to do it faster?
I don't know if this will work fast or not, but I recommend you try it:
Let's say you have an object dataframe with a value of 0 or 1 in each cell. To find the coordinates, all you have to do is write the code below:
cord_matrix <- which(dataframe == 1, arr.ind = TRUE)
Now you get the coordinate matrix with row and column indices.
To find the Euclidean distances, use the dist() function. It will look like this:
dist_vector <- dist(cord_matrix)
It returns a lower-triangular distance object, which can be transformed into a vector or a symmetric matrix. Now all you have to do is compute the bins according to your requirement.
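Putting the pieces together, a minimal sketch of the binning step could look like this (it reuses the names above and assumes, as in the question, 43 bins of width 0.2; if your distances are in cell units rather than map units you will need different breaks):
# Sketch: non-zero cell coordinates, pairwise distances, and bin counts
cord_matrix <- which(dataframe == 1, arr.ind = TRUE)
dist_vector <- dist(cord_matrix)                 # lower-triangular distances
# Count how many distances fall in each of the 43 bins of width 0.2
bins <- cut(as.vector(dist_vector),
            breaks = seq(0, 8.6, by = 0.2), labels = 1:43)
bin_counts <- table(bins)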
Let me know if this works within the specific memory space.

Counting consecutive repeats, and returning the maximum value in each string of repeats if over a threshold

I am working with long strings of repeating 1's and 0's representing the presence of a phenomenon as a function of depth. If this phenomenon is flagged for over 1m, it is deemed significant enough to use for further analyses; if not, it could be due to experimental error.
I ultimately need to get a total thickness displaying this phenomenon at each location (if over 1m).
In a dummy data set the input and expected output would look like this:
#Depth from 0m to 10m with 0.5m readings
depth <- seq(0, 10, 0.5)
#Phenomenon found = 1, not = 0
phenomflag <- c(1,0,1,1,1,1,0,0,1,0,1,0,1,0,1,1,1,1,1,0)
What I would like as output is a vector containing 4 and 5 (which get converted back to 2m and 2.5m).
I have attempted to solve this problem using
y <- rle(phenomflag)
z <- y$length[y$values ==1]
but once I have my count, I have no idea how to:
a) Isolate 1 maximum number from each group of consecutive repeats.
b) Restrict to consecutive strings longer than (x) - this might be easier after a.
Thanks in advance.
count posted a good solution in the comments section.
y <- rle(phenomflag) # the repeating series of 1's and 0's
x <- cbind(y$lengths, y$values); x[which(x[,1] >= 3 & x[,2] == 1)]
This returns just the run lengths for strings of 1's at or above the threshold (here, at least 3 consecutive readings), one value per qualifying group.
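If you then want the thicknesses themselves, you can filter the run lengths and multiply by the reading interval; a minimal sketch using the dummy data from the question:
# Sketch: run lengths of 1's lasting at least 3 readings (over 1 m),
# converted back to thickness at 0.5 m per reading
runs <- y$lengths[y$values == 1 & y$lengths >= 3]
runs        # 4 5
runs * 0.5  # 2.0 2.5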

How should I combine two loops in R?

I want to ask your opinion, since I am not sure how to do this. It concerns one part of my paper project, and my situation is as follows:
Stage I
I have 2 groups, and for each group I need to do the following:
1. Generate 3 random numbers from a normal distribution and square them.
2. Repeat step 1 15 times, so that at the end I will have 15 numbers.
I already done stage I using for loop.
n1<-3
n2<-3
miu<-0
sd1<-1
sd2<-1
asim<-15
w<-rep(NA,asim)
x<-rep(NA,asim)
for (i in 1:asim) {
  print(i)
  set.seed(i)
  data1 <- rnorm(n1, miu, sd1)
  data2 <- rnorm(n2, miu, sd2)
  w[i] <- sum(data1^2)
  x[i] <- sum(data2^2)
}
w
x
The second stage is:
Stage II
For each group, I need to:
1. Sort the group;
2. Find the trimmed mean of each group.
I need to simulate the whole process (stage I and stage II) 5000 times. How should I proceed with stage II? Do you think I need to put in another loop for it?
These are tasks you can do without explicit loops. First, note a few things: it makes no difference whether you generate 3 times 15 times 2000 random numbers in pieces or all at once; they share the same distribution either way.
Next: setting the seed within each loop iteration makes your simulation deterministic. Call set.seed once at the start of your script instead.
So what we will do is generate all the random numbers at once, then compute their squared norms for groups of three, and then build groups of 15.
First some variable definitions:
set.seed(20131301)
repetitions <- 2000
numperval <- 3
numpergroup <- 15
miu <- 0
sd1 <- 1
sd2 <- 1
As we need two groups, we wrap the group-generation code into a custom function. This is not necessary, but it does help a bit in keeping the code clean and readable.
generateGroup <- function(repetitions, numperval, numpergroup, m, s) {
# Generate all data
data <- rnorm(repetitions*numperval*numpergroup, m, s)
# Build groups of 3:
data <- matrix(data, ncol=numperval)
# And generate the squared norm of those
data <- rowSums(data*data)
# Finally build a matrix with 15 columns, each column one dataset of numbers, each row one repetition
matrix(data, ncol=numpergroup)
}
Great, now we can generate random numbers for our group:
group1 <- generateGroup(repetitions, numperval, numpergroup, miu, sd1)
group2 <- generateGroup(repetitions, numperval, numpergroup, miu, sd2)
To compute the trimmed means, we use apply:
trimmedmeans_group1 <- apply(group1, 1, mean, trim=0.25)
trimmedmeans_group2 <- apply(group2, 1, mean, trim=0.25)
I used mean with the trim argument instead of sorting, discarding the extremes, and computing the mean by hand. If you need the sorted numbers explicitly, you could do it manually (just for one group this time):
sorted <- t(apply(group1, 1, sort))
# We have to transpose as apply by default returns a matrix with each observation in one column. I chose the other way around above, so we stick with this convention and transpose.
Now it would be easy to throw away the first and last three columns (with trim = 0.25 on 15 values, mean() drops floor(15 * 0.25) = 3 observations from each end) and take the row means, if you want to do it manually, as sketched below.
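A minimal sketch of that manual version, which should agree with the trim = 0.25 result up to floating-point error:
# Sketch: keep only the middle 9 sorted columns and average them per row
manual_trimmed <- rowMeans(sorted[, 4:12])
all.equal(manual_trimmed, trimmedmeans_group1)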
