nested loop matrix index in R

I am learning matrix multiplication in R, and the following is what I want to achieve. I am doing this purely to improve my skills in R.
The following is the kind of matrix I am working with:
m <- matrix(1, 100, 10)
The matrix has 100 rows and 10 columns, and every element is 1. In column 1, I want to replace the 1s with 0 in rows 1 to 10. In column 2, I want to replace them in rows 11 to 20, in column 3 in rows 21 to 30, and so on up to column 10. Here is my attempt:
m <- matrix(1, 100, 10)
for(j in 1:10){
for(i in (j-1)*10+1: j*10){
m[i,j] <-0
}
}
I was quite confident that my logic was correct, but every time I run the code I get the error message "subscript out of bounds". I have been trying for a couple of days and cannot resolve it. I would appreciate any hints or a direct solution. Many thanks in advance.

You could use just one loop variable, which I think would be easier. For each column j, compute the lower and upper row indices and assign 0 to that range.
for(j in 1:10){
row_lower <- (j-1)*10+1
row_upper <- j*10
m[row_lower: row_upper, j] <- 0
}
Returns 0's in your specified range.
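The reason the original loop fails is operator precedence: in R the `:` operator is evaluated before `*` and `+`, so `(j-1)*10+1: j*10` is read as `(j-1)*10 + (1:j)*10`. From j = 6 onward that produces row indices above 100, hence the error. The short sketch below shows the difference for one column and, as an optional alternative that is not part of the answer above, fills the whole pattern without a loop (the names `wrong` and `right` are only for illustration):

```r
j <- 3

# `:` binds tighter than `*` and `+`, so this is (j - 1) * 10 + (1:j) * 10
wrong <- (j - 1) * 10 + 1:j * 10
print(wrong)   # 30 40 50

# Parenthesising both endpoints gives the intended rows 21 to 30
right <- ((j - 1) * 10 + 1):(j * 10)
print(right)   # 21 22 23 24 25 26 27 28 29 30

# Loop-free alternative: index with a two-column matrix of (row, column) pairs
m <- matrix(1, 100, 10)
m[cbind(1:100, rep(1:10, each = 10))] <- 0
```

Indexing a matrix with a two-column matrix selects one cell per row of the index, which is why pairing rows 1 to 100 with columns 1, 1, ..., 10, 10 reproduces the block pattern.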

Related

Checking For Largest Value in a Matrix Using "For" and "If" Loop

I'm having trouble with a for loop in a simple piece of code in R...
So I have a data frame with precipitation data: the "stations" (1-75) are in rows and the monthly values are in columns. I created a vector (which I called "Jan") of precipitation values for just the Jan column, so it looks something like this (I've put in random values for the purpose of this post):
V1
1 10
2 5
3 15
...
75 5
I want to use a "for" loop with "if" to return the highest value of this vector. However, the code below seems to just go through each value in the vector and always returns the last value of the entire thing (i.e. the value in row 75, "5"). I know something in my "if" statement isn't letting me actually test the values of the vector; instead it seems to be testing the "row number" value. Any advice?
highest_ppt<- function(v) {
i=0
output<-v[i] #c(0,length(v))
for (i in 2:length(v)){
if (v[i] > (v[i-1])){
output <- (v[i])
}
}
return(output)
}
max_ppt <- highest_ppt(Jan)
max_ppt
Thank you!
A for loop is the wrong approach here, but if you insist:
Max <- -Inf
for (i in seq_along(v)){
if (v[i] > Max)
Max <- v[i]
}
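The loop in the question compares each element with its predecessor instead of with the largest value seen so far, and it starts from `v[0]`, which is an empty vector in R. The sketch below wraps the running-maximum loop from the answer in a function and checks it against the built-in `max()`, which is presumably the approach the answer has in mind. The vector `jan` holds made-up values.

```r
# Running maximum with an explicit loop
highest_ppt <- function(v) {
  best <- -Inf
  for (value in v) {
    if (value > best) best <- value
  }
  best
}

jan <- c(10, 5, 15, 5)   # made-up precipitation values

highest_ppt(jan)   # 15
max(jan)           # 15, the vectorised built-in
which.max(jan)     # 3, the position (station) of the maximum
```

If the data contain missing values, `max(jan, na.rm = TRUE)` skips them, whereas the loop as written would stop with an error at the first `NA` comparison.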

create an incidence matrix with restrictions in r (i.graph)

I would like to create a (N*M)-Incidence Matrix for a bipartite graph (N=M=200).
However, the following restrictions have to be considered:
Each column i (1, ..., 200) has a column sum of g = 10.
Each row has a row sum of h = 10.
No multi-edges (the entries of the incidence matrix only take the values 0 or 1).
So far I have
M <- 200; # number of rows
N <- 200; # number of columns
g <- 10
I <- matrix(sample(0:1, M*N, repl=T, prob= c(1-g/N,g/N)), M, N);
Does anybody have a solution?
Here's one way to do what you want. First the algorithm idea, then its implementation in R.
Two step Algorithm Idea
You want a matrix of 0's and 1's, with each row adding up to be 10, and each column adding up to be 10.
Step 1: First, create a trivial solution as follows:
The first 10 rows have 1's for the first 10 elements, then 190 zeros.
The second set of ten rows have 1's from the 11th to the 20th element and so on.
In other words, a feasible solution is to have a 200x200 matrix of all 0's, with dense matrices of 10x10 1's embedded diagonally, 20 times.
Step 2: Shuffle entire rows and entire columns.
Shuffling whole rows and whole columns preserves every row sum and every column sum.
Implementation in R
I use a smaller matrix of 16x16 to demonstrate. In this case, let's say we want each row and each column to add up to 4. (The row/column sum has to divide the dimension of the larger square matrix evenly.)
n <- 4 #size of the smaller square
i <- c(1,1,1,1) #dense matrix of 1's
z <- c(0,0,0,0) #dense matrix of 0's
#create a feasible solution to start with:
m <- matrix(c(rep(c(i,z,z,z),n),
rep(c(z,i,z,z),n),
rep(c(z,z,i,z),n),
rep(c(z,z,z,i),n)), 16,16)
#shuffle (Run the two lines following as many times as you like)
m <- m[sample(16), ] #shuffle rows
m <- m[ ,sample(16)] #shuffle columns
#verify that the sum conditions are not violated
colSums(m); rowSums(m)
#solution
print(m)
Hope that helps you move forward with your bipartite igraph.
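For the 200 x 200 case in the question, typing the blocks out by hand is impractical. One possible way to scale the same two-step idea, sketched here with base R's `kronecker()` rather than the `rep()` construction above, is to build the block-diagonal starting matrix as a Kronecker product and then permute its rows and columns. The names `blocks` and `inc` are illustrative.

```r
set.seed(1)
N <- 200   # rows and columns
g <- 10    # required row and column sum; must divide N

# Step 1: block-diagonal start, N/g blocks of g x g ones on the diagonal
blocks <- kronecker(diag(N / g), matrix(1, g, g))

# Step 2: shuffle whole rows and whole columns
inc <- blocks[sample(N), sample(N)]

# Both sum conditions still hold after shuffling
range(rowSums(inc))
range(colSums(inc))
```

Note that this only reaches matrices that are row and column permutations of the block pattern, so it is a feasible solution rather than a uniform sample of all matrices satisfying the constraints.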

Convert a one column matrix to n x c matrix

I have an (n*c + n + c) by 1 matrix. I want to drop the last n + c rows and convert the rest into an n x c matrix. Below is what I've tried, but it returns a matrix in which every element of a row is the same. I'm not sure why. Could someone help me out, please?
tmp=x[1:n*c,]
Membership <- matrix(tmp, nrow=n, ncol=c)
You have a vector x of length n*c + n + c, yet when you extract from it you put a comma in the index.
You should do tmp=x[1:(n*c)].
Notice the importance of the parentheses: if you do tmp=x[1:n*c], R takes the range from 1 to n, multiplies it by c to give a new set of indices, and then extracts based on those.
For example, you want to avoid:
(1:100)[1:5*5]
[1] 5 10 15 20 25
You can also avoid the index arithmetic altogether:
matrix(head(x, n*c), ncol=c)
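A small worked case makes the symptom in the question concrete: `1:n*c` yields only n indices, and `matrix()` then recycles those n values across the columns, which is why every row ends up constant. The sketch below assumes a one-column matrix as in the question, with toy sizes n = 3 and c = 2.

```r
n <- 3
c <- 2   # same name as in the question; calls to c() still find the function
x <- matrix(seq_len(n * c + n + c), ncol = 1)   # 11 x 1 matrix holding 1..11

1:n * c     # 2 4 6, i.e. (1:n) * c
1:(n * c)   # 1 2 3 4 5 6

# The n selected values are recycled, so each row repeats one value
bad <- matrix(x[1:n * c, ], nrow = n, ncol = c)

# With parentheses the first n*c values fill the matrix column by column
Membership <- matrix(x[1:(n * c), ], nrow = n, ncol = c)
```

`matrix()` fills column by column; if the values are meant to be laid out row by row, `byrow = TRUE` can be added.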

gene expression datamatrix filtration

I have a matrix with 3064 rows and 27 columns containing values between -0.5 and 2.0. I want to extract every row that has at least one value >= 0.5. As the answer, I would like the whole rows in their original matrix form.
Consider m is my matrix, I tried:
m[m[1:190,1:16]>0.5,1:16]
Since this command does not work with more than 190 rows, I used 190 rows, but something went wrong, because it gave me rows that also contain values < 0.5.
Is it possible to write a function that can be applied to the whole matrix?
You can also try this, if your data is named df:
df2<- df[apply(df, MARGIN = 1, function(x) any(x >= 0.5)), ]
library(fBasics)
m2 <- subset(x = m, subset = rowMaxs(m)>=0.5)
What mm=m[1:190,1:16]>0.5 gives you is a logical matrix indicating which values of m[1:190,1:16] are greater than 0.5.
Then when you do m[mm], R treats mm as a vector and returns the corresponding values. The problem is that dim(m) is 3064 x 27 while dim(m[1:190,1:16]) is 190 x 16. R stores matrices column by column, so the first 190 values of mm (its first column) line up with rows 1 to 190 of the first column of m, but the next 190 values (the second column of mm) are applied to rows 191 to 380 of that same column instead of to the second column of m.
So in order to get only the elements greater than 0.5, you need to apply the logical matrix to m[1:190,1:16], which has the same dimensions, i.e.:
m[1:190,1:16][m[1:190,1:16]>0.5]
But what you do here is m[mm, 1:16], so each individual value of mm is used as a row selector, even though mm is a 190 x 16 matrix. That means you supply 190*16 = 3040 row selectors; it does not work with many more rows because m only has 3064 rows.
What you want is a logical vector of length 190 (or rather 3064, for the whole matrix) specifying which rows to take. You can get this vector with rowSums(m >= 0.5) > 0, which is TRUE for each row with at least one value greater than or equal to 0.5. Then you get your output with:
m[rowSums(m >= 0.5) > 0,]
And it will work for the whole matrix. Note that some values in the result will be smaller than 0.5, since the whole row is selected whenever at least one of its values is at least 0.5.
Edit
For rows with values <0.5, the idea is the same:
m[rowSums(m < 0.5) > 0,]
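To see the row filter at work, here is a minimal sketch on a hand-made 4 x 3 matrix (toy values, not the gene expression data). It also checks that the `rowSums()` test and the `apply()`/`any()` test from the other answer select the same rows.

```r
# Toy matrix: rows 1 and 3 contain a value >= 0.5, rows 2 and 4 do not
m <- rbind(c( 0.1, 0.7, -0.2),
           c( 0.2, 0.3,  0.4),
           c( 1.5, 0.5,  0.0),
           c(-0.5, 0.0,  0.49))

# One logical value per row: TRUE if the row has at least one value >= 0.5
keep <- rowSums(m >= 0.5) > 0

# drop = FALSE keeps the result a matrix even if only one row matches
filtered <- m[keep, , drop = FALSE]

# The apply()/any() formulation gives the same selector
keep_apply <- apply(m, 1, function(x) any(x >= 0.5))
```

The key point is that the row index must have one entry per row of the matrix being subset, which a logical matrix of a different shape cannot provide.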

simulate x percentage of missing and error in data in r

I would like to do two things to my fairly large data set, which is about 10 K x 50 K. The following is a smaller set of 200 x 10000.
First I want to generate 5% missing values, which is perhaps simple and can be done with a simple trick:
# dummy data
set.seed(123)
# matrix of X variable
xmat <- matrix(sample(0:4, 2000000, replace = TRUE), ncol = 10000)
colnames(xmat) <- paste ("M", 1:10000, sep ="")
rownames(xmat) <- paste("sample", 1:200, sep = "")
Generate missing values at 5% random places in the data.
N <- 2000000*0.05 # 5% random missing values
inds_miss <- round ( runif(N, 1, length(xmat)) )
xmat[inds_miss] <- NA
Now I would like to generate errors, meaning values that differ from what is in the matrix above. The matrix has values from 0 to 4. So what I would like to do is:
(1) Replace a value x with another value that is not x. For example, 0 can be replaced by a random draw from the values that are not 0 (i.e. 1, 2, 3 or 4); similarly, 1 can be replaced by a value that is not 1 (i.e. 0, 2, 3 or 4). The indices where values are replaced can be generated simply with:
inds_err <- round ( runif(N, 1, length(xmat)) )
If I randomly sample values from 0:4 and assign them at these indices, this will sometimes replace a value with the same value (0 with 0, 1 with 1, and so on) without creating an error.
errorg <- sample(0:4, length(inds_err), replace = TRUE)
xmat[inds_err] <- errorg
(2) I would like to introduce the errors into the xmat that already has missing values. However, I do not want the NAs generated in the step above to be replaced with a value (0 to 4). So no element of inds_err should be a member of inds_miss.
So, the rules in summary:
(1) The missing values should not be replaced with error values.
(2) An existing value must be replaced with a different value (which is the definition of an error here); with plain random sampling there is a 1/5 chance that the value stays the same.
How can this be done? I need a fast solution that can be used on my large data set.
You can try this:
inds_err <- setdiff(round ( runif(2*N, 1, length(xmat)) ),inds_miss)[1:N]
xmat[inds_err]<-(xmat[inds_err]+sample(4,N,replace=TRUE))%%5
The first line generates 2*N candidate error indices, removes the ones that belong to inds_miss, and then takes the first N. The second line adds a random number between 1 and 4 to each value you want to change and then takes the result mod 5. In this way you are sure that the new value is different from the original and still in the 0-4 range.
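The modular-shift trick can be checked on a small matrix. The sketch below is a variation under stated assumptions, not the code above: it draws positions with `sample()` instead of `round(runif(...))`, so the indices are distinct and exactly the requested number of cells is affected, and it uses a hypothetical 20 x 50 matrix `x` in place of `xmat`.

```r
set.seed(42)
# Small stand-in for xmat: 1000 cells with values 0..4
x <- matrix(sample(0:4, 1000, replace = TRUE), ncol = 50)
n_pick <- 50   # 5% of the cells

# Missing values: sample() without replacement gives distinct positions
miss_idx <- sample(length(x), n_pick)
x[miss_idx] <- NA
before <- x   # snapshot taken before errors are introduced

# Error positions are drawn only from cells that are not NA
err_idx <- sample(which(!is.na(x)), n_pick)

# Adding a shift of 1..4 modulo 5 can never give back the original value
x[err_idx] <- (x[err_idx] + sample(4, n_pick, replace = TRUE)) %% 5
```

Both steps are vectorised, so the same lines should scale to a much larger matrix, memory permitting.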
Here's an if/else solution that could work for you. It uses a for loop, so I'm not sure whether it will be fast enough for you. It could possibly be vectorized in some way to make it faster.
# vector of options
vec <- 0:4
# loop over the error positions and leave NA cells unchanged
for(i in seq_along(inds_err)){
idx <- inds_err[i]
if(is.na(xmat[idx])){
next
}else{
# draw a replacement from the values other than the current one
xmat[idx] <- sample(vec[vec != xmat[idx]], 1)
}
}
