A for loop with rowSums function - r

I am a beginner in R and have written a nested for loop that calculates chi2 values for feature selection among 6610 terms and 10 classes.
Here are my for loops:
library(raster)
#for x^2 [n,r] = term n, class r. n starts from col #7 and r starts from col #6617
chi2vals <- matrix(0:0,6610,10)
chi2avgs <- vector("numeric",6610L)
for(r in 1:10){
for(n in 1:6610){
A = sum(data1.sub.added[,6+n]==1 & data1.sub.added[,6616+r]==1)
M = sum(data1.sub.added[,6+n]==1)
P = sum(data1.sub.added[,6616+r]==1)
N = nrow(data1.sub.added)
E = ((A*N)-(M*P))**2
F = (N-P)*(N-M)
chi2vals[n,r] = (N/(P*M))*(E/F) # for term n
}
Prcj = sum(data1.sub.added[,6616+r]==1)/sum(data1.sub.added[,6616:6626]==1) #probability of class c_r
pchi <- Prcj * chi2vals
chi2avgs[n] = rowSums(pchi)[n]
}
The code correctly calculates everything up to the line pchi <- Prcj * chi2vals. The result is a nice matrix of p*chi2 values:
> head(pchi)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] 128.36551442 0.239308113 0.683517530 1.5038665 0.6145058 3.656857e-01 1.3311564 2.6977448 0.410702803
[2,] 0.06632758 0.067970859 0.019178551 0.2900692 1.5300639 4.430705e-08 0.2599859 0.6362953 0.098745147
[3,] 1.85641330 1.411925435 3.590747764 7.3018416 38.8044465 4.102248e-01 6.4118078 13.0164994 1.709506238
[4,] 0.11063892 0.005039029 0.244964758 0.1622654 0.1156411 8.274468e+00 0.2564959 0.0577651 0.242946022
[5,] 0.04788648 0.049072885 0.001420669 0.2094211 1.7200152 2.045923e-01 0.1877019 0.1468187 0.005493183
[6,] 5.39946188 6.899336618 60.735646913 7.4351538 10.7005784 9.946261e+00 35.8868899 178.7112406 11.382740754
[,10]
[1,] 0.26436516
[2,] 0.14414444
[3,] 0.90292073
[4,] 0.01168997
[5,] 0.06641298
[6,] 19.68599142
But the final chi2avgs values mostly turn out to be zeros:
> head(chi2avgs)
[1] 0.000000 0.000000 0.000000 0.000000 2.638835 0.000000
However, when I run the last line outside the loop, with n replaced by a specific number, it works well:
chi2avgs[1] = rowSums(pchi)[1]
chi2avgs[2] = rowSums(pchi)[2]
chi2avgs[3] = rowSums(pchi)[3]
chi2avgs[4] = rowSums(pchi)[4]
chi2avgs[5] = rowSums(pchi)[5]
> head(chi2avgs)
[1] 136.476367 3.112781 75.416334 9.481914 2.638835 0.000000
I wonder what causes this problem. Do you have an idea how I can fix it?

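You can call rowSums directly, without [n]. After the inner loop finishes, n keeps its last value (6610), so chi2avgs[n] inside the outer loop only ever writes the last element: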
chi2avgs = rowSums(pchi)
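As a rough sketch, the whole computation could also be done without loops. The toy matrices terms and classes below are hypothetical stand-ins for the term and class columns of data1.sub.added. The chi2 formula is the one from the question. Weighting each class column by its own probability is my assumption about what the average is meant to be, since the loop in the question multiplies the whole matrix by a single Prcj.

```r
# Hypothetical toy data: 200 documents, 5 binary term columns, 3 class columns
set.seed(42)
terms   <- matrix(rbinom(200 * 5, 1, 0.4), nrow = 200)
classes <- matrix(0, nrow = 200, ncol = 3)
classes[cbind(1:200, sample(1:3, 200, replace = TRUE))] <- 1  # one class per document

N <- nrow(terms)
A <- crossprod(terms, classes)   # A[n, r] = documents with term n and class r
M <- colSums(terms)              # documents containing term n
P <- colSums(classes)            # documents in class r

# Same formula as in the question, applied to all (n, r) pairs at once
num      <- (A * N - outer(M, P))^2
denom    <- outer(N - M, N - P)
chi2vals <- (N / outer(M, P)) * (num / denom)

# Assumption: average chi2 per term, weighted by class probability
Prc      <- P / sum(P)
chi2avgs <- as.vector(chi2vals %*% Prc)
```

Here chi2vals has one row per term and one column per class, and chi2avgs has one value per term, so no indexing by n is needed.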

Related

How to apply Fisher's exact test on matrices

I want to perform Fisher's exact test between these two matrices, comparing the columns of one matrix with the columns of the other:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
A 0.1200480 0.07189073 0.0000000 0.1016260 0.1128205 0.10200927 0.2961319 0.3020383 0.02524866 0.0000000
C 0.0300120 0.19769950 0.2012802 0.2815041 0.2358974 0.48686244 0.4724160 0.3749228 0.65340474 0.3294118
G 0.6302521 0.52120776 0.6273115 0.4085366 0.4179487 0.35548686 0.1122384 0.1247684 0.29609793 0.6705882
T 0.2196879 0.20920201 0.1714083 0.2083333 0.2333333 0.05564142 0.1192137 0.1982705 0.02524866 0.0000000
Here is the other matrix:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
A 0.3143147 0.79432422 0.03440895 0.07098851 0.0004179104 0.0002388202 0.9988065402 0.69191708 0.181764 0.00000000
C 0.1690764 0.02235194 0.89570290 0.92901149 0.0000000000 0.0004179354 0.0007757489 0.01218711 0.000000 0.48599247
G 0.3406620 0.14882308 0.06988816 0.00000000 0.9992835821 0.9993432444 0.0004177109 0.00000000 0.818236 0.02478944
T 0.1759469 0.03450076 0.00000000 0.00000000 0.0002985075 0.0000000000 0.0000000000 0.29589581 0.000000 0.48921809
How can I do this?
I tried this:
Ref <- read.table("Ref_PPM.txt", sep=" ", header=T, stringsAsFactors=F)
Pred <- read.table("Pre_PPM.txt", sep=" ", header=T, stringsAsFactors=F)
output_df_forward <- data.frame()
for(i in 1:ncol(Ref)) {
Ref_vec <- as.numeric(unlist(as.data.frame(Ref[,1:i])))
Pred_vec <- as.numeric(unlist(as.data.frame(Pred[,(ncol(Pred)-i+1):ncol(Pred)])))
res <- Fisher.test(Pred_vec, Ref_vec)
output <- as.data.frame(cbind(as.numeric(res$p.value), as.numeric(res$estimate)))
ifelse(i == 1, output_df_forward <- output, output_df_forward <- rbind(output_df_forward, output))
}
But this loop does not work. I want to apply the Fisher test through the loop above.
I also tried this:
FP<-table(Pre_PPM)
FR<-table(Ref_PPM)
fisher.test(FP, FR)
Error in fisher.test(FP, FR) : 'x' and 'y' must have the same length
Kindly guide me on how I can do this. Thanks in advance for your time.
Considering your second example, I believe your matrices do not have the same size in terms of rows and columns. Here is an example of a Fisher test:
a <- data.frame(foo=c(1,2), bar=c(3, 4), spam=c(5, 6))
b <- data.frame(eggs=c(7,10), ham=c(15, 30), spar=c(35, 40))
This call runs without error:
fisher.test(a, b)
Consider checking the sizes of your matrices.
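Two caveats may be worth keeping in mind. According to the fisher.test documentation, y is ignored when x is a matrix, so a call with two tables only tests the first one. Also, the test expects counts, while the matrices in the question hold proportions. A minimal sketch of a column-by-column comparison, assuming the underlying counts are available (ref_counts and pred_counts are hypothetical names with made-up numbers):

```r
# Hypothetical count matrices: rows are A/C/G/T, columns are positions
bases <- c("A", "C", "G", "T")
ref_counts  <- matrix(c(3, 1, 12, 4,
                        2, 5,  9, 4,
                        1, 4, 12, 3), nrow = 4, dimnames = list(bases, NULL))
pred_counts <- matrix(c( 6,  3, 7, 4,
                        15,  1, 3, 1,
                         1, 17, 1, 1), nrow = 4, dimnames = list(bases, NULL))

# For each position, build a 4 x 2 contingency table and test it
p_values <- sapply(seq_len(ncol(ref_counts)), function(i) {
  tab <- cbind(ref = ref_counts[, i], pred = pred_counts[, i])
  fisher.test(tab)$p.value
})
```

For tables larger than 2 x 2, fisher.test returns a p-value but no odds-ratio estimate, so only p.value is collected here.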

Nested apply function

I have two matrices and want to apply a linear regression. Briefly, I want to get the p-value for the regression of each column of a on each column (factor) of b.
So I want to calculate the association between (the number displays the column):
a1 ~ b1
a1 ~ b2
...
a2 ~ b1
a2 ~ b2
...
a3 ~ b1
...
Both datasets:
set.seed(1232)
a <- matrix(runif(100,min=6,max=14),10)
b <- matrix(sample(c(0,1),100,replace = T),10)
I tried to use a loop. This works, but it is too slow:
res <- NULL
for( i in 1:ncol(a)){
tmp <- apply(b,2,function(y,x) summary(lm(x~y))$coefficients[2,4],a[,i])
res <- cbind(res,tmp)
print(i)
}
So I tried a nested apply call like this, but it does not work. Do you have an idea?
apply(b, 2, function(y,x) apply(x,2,function(x,y) summary(lm(x~y))$coefficients[2,4]), a)
Try this:
apply(b, 2, function(x) apply(a, 2, function(y) summary(lm(x~y))$coefficients[2,4]))
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 0.8625602 0.2411563 0.7612476 0.509436082 0.3577235 0.45874974 0.360808572 0.05617883
[2,] 0.4136582 0.1186081 0.5161454 0.819513642 0.7813467 0.75912163 0.908950370 0.80584144
[3,] 0.5388209 0.8055687 0.7549796 0.935750996 0.0646300 0.79523596 0.973429634 0.55095667
[4,] 0.9930913 0.9622925 0.3753466 0.552830253 0.9636374 0.75312925 0.997724288 0.39246169
[5,] 0.2238149 0.4628420 0.5969530 0.007438859 0.4620672 0.42912431 0.007249279 0.50000827
[6,] 0.8022590 0.2352531 0.8990588 0.760952083 0.5794459 0.06170874 0.459247551 0.94625005
[7,] 0.4928149 0.1462937 0.5620584 0.554991195 0.6128055 0.55182670 0.874316882 0.55169689
[8,] 0.6631461 0.6260493 0.7679088 0.714076186 0.1254954 0.14316276 0.961166356 0.55342849
[9,] 0.9449110 0.2703502 0.5328246 0.533630873 0.2036671 0.87532137 0.402796595 0.24040106
[10,] 0.4151634 0.3584605 0.6923008 0.599701142 0.4649529 0.98238156 0.628130071 0.17310254
[,9] [,10]
[1,] 0.38552290 0.03078476
[2,] 0.03466566 0.64135540
[3,] 0.44603945 0.57578621
[4,] 0.47220820 0.45735156
[5,] 0.33202974 0.63330763
[6,] 0.09964719 0.19571414
[7,] 0.72649867 0.61591287
[8,] 0.22911914 0.84239810
[9,] 0.30766378 0.12782897
[10,] 0.01139275 0.46489123
You don't need regression:
res <- outer(seq_len(ncol(a)),
seq_len(ncol(b)),
FUN = Vectorize(function(k, l) cor.test(a[, k], b[, l])$p.value))
res[4, 3]
#[1] 0.3753466
summary(lm(a[,4] ~ b[, 3]))$coefficients[2,4]
#[1] 0.3753466
You can of course use lm instead of cor.test in the same way.
Alternatively, you could use package psych:
library(psych)
corr.test(as.data.frame(a), as.data.frame(b), adjust = "none")$p
Of course, its default of adjusting for multiple testing is a reminder that you really should do that.
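A minimal sketch of such an adjustment in base R, reusing the outer/cor.test approach from above. The choice of the Benjamini-Hochberg method ("BH") is only an example and not something the question asks for:

```r
set.seed(1232)
a <- matrix(runif(100, min = 6, max = 14), 10)
b <- matrix(sample(c(0, 1), 100, replace = TRUE), 10)

# Raw p-values: row k = column k of a, column l = column l of b
res <- outer(seq_len(ncol(a)), seq_len(ncol(b)),
             FUN = Vectorize(function(k, l) cor.test(a[, k], b[, l])$p.value))

# p.adjust returns a plain vector, so restore the matrix shape afterwards
res_adj <- matrix(p.adjust(res, method = "BH"), nrow = nrow(res))
```

res_adj has the same layout as res, so res_adj[4, 3] is the adjusted p-value for a[, 4] against b[, 3].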

Storing multiple output vectors without list

I have a matrix of N*200 values.
For each row I am calculating the first 5 acf values using
for(i in 1:N){
xx[[i]] <- acf(x[i,], plot=F)$acf[1:5]
}
I was wondering whether there is an alternative to xx[[i]] other than using a list, i.e.
is it possible to have a matrix of N*5 containing the acf values of each row?
I know I can build the list and then unlist it, but is there a quicker way?
Use apply for cleaner code:
iN = 1000
mX = matrix(rnorm(iN*200), iN, 200)
mACF = t(apply(mX, MARGIN = 1,
FUN = function(vX) acf(vX, plot = FALSE, lag.max = 4)$acf))
Output:
> head(mACF)
[,1] [,2] [,3] [,4] [,5]
[1,] 1 -0.01301076 -0.02077288 -0.09442797 -0.010610654
[2,] 1 -0.03060448 -0.06019641 -0.04674656 -0.086555364
[3,] 1 0.09513999 -0.05021542 -0.02757927 -0.002984605
[4,] 1 -0.08135746 0.11003419 -0.06550000 0.033755892
[5,] 1 0.09014033 0.09981602 0.11100782 0.057275603
[6,] 1 -0.08462636 -0.10192390 0.05601853 -0.019114467
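If you prefer to keep the loop from the question, a preallocated matrix can replace the list. This is only a sketch: the data are simulated, and N is kept small for illustration.

```r
set.seed(1)
N <- 50
x <- matrix(rnorm(N * 200), N, 200)

# Preallocate an N x 5 matrix and fill one row per iteration
xx <- matrix(NA_real_, nrow = N, ncol = 5)
for (i in seq_len(N)) {
  xx[i, ] <- acf(x[i, ], plot = FALSE)$acf[1:5]  # lags 0 to 4
}
```

The first column is always 1, because it holds the autocorrelation at lag 0.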

Perform pairwise comparison of matrix

I have a matrix of n variables and I want to make a new matrix that holds the pairwise difference of each pair of columns, but not of a column with itself. Here is an example of the data.
Transportation.services Recreational.goods.and.vehicles Recreation.services Other.services
2.958003 -0.25983789 5.526694 2.8912009
2.857370 -0.03425164 5.312857 2.9698044
2.352275 0.30536569 4.596742 2.9190123
2.093233 0.65920773 4.192716 3.2567390
1.991406 0.92246531 3.963058 3.6298314
2.065791 1.06120930 3.692287 3.4422340
I tried running the for loop below, but I'm aware that R is very slow with loops.
Difference.Matrix<- function(data){
n<-2
new.cols="New Columns"
list = list()
for (i in 1:ncol(data)){
for (j in n:ncol(data)){
name <- paste("diff",i,j,data[,i],data[,j],sep=".")
new<- data[,i]-data[,j]
list[[new.cols]]<-c(name)
data<-merge(data,new)
}
n= n+1
}
results<-list(data=data)
return(results)
}
As I said before, the code runs very slowly and has not even finished a single run yet. I apologize for the beginner-level coding. I am also aware that this code leaves the original data in the matrix, but I can delete it later.
Is it possible for me to use an apply function or foreach on this data?
You can find the pairs with combn and use apply to create the result:
apply(combn(ncol(d), 2), 2, function(x) d[,x[1]] - d[,x[2]])
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 3.217841 -2.568691 0.0668021 -5.786532 -3.151039 2.6354931
## [2,] 2.891622 -2.455487 -0.1124344 -5.347109 -3.004056 2.3430526
## [3,] 2.046909 -2.244467 -0.5667373 -4.291376 -2.613647 1.6777297
## [4,] 1.434025 -2.099483 -1.1635060 -3.533508 -2.597531 0.9359770
## [5,] 1.068941 -1.971652 -1.6384254 -3.040593 -2.707366 0.3332266
## [6,] 1.004582 -1.626496 -1.3764430 -2.631078 -2.381025 0.2500530
You can add appropriate names with another apply. Here the column names are very long, which impairs the formatting, but the labels tell what differences are in each column:
x <- apply(combn(ncol(d), 2), 2, function(x) d[,x[1]] - d[,x[2]])
colnames(x) <- apply(combn(ncol(d), 2), 2, function(x) paste(names(d)[x], collapse=' - '))
> x
Transportation.services - Recreational.goods.and.vehicles Transportation.services - Recreation.services
[1,] 3.217841 -2.568691
[2,] 2.891622 -2.455487
[3,] 2.046909 -2.244467
[4,] 1.434025 -2.099483
[5,] 1.068941 -1.971652
[6,] 1.004582 -1.626496
Transportation.services - Other.services Recreational.goods.and.vehicles - Recreation.services
[1,] 0.0668021 -5.786532
[2,] -0.1124344 -5.347109
[3,] -0.5667373 -4.291376
[4,] -1.1635060 -3.533508
[5,] -1.6384254 -3.040593
[6,] -1.3764430 -2.631078
Recreational.goods.and.vehicles - Other.services Recreation.services - Other.services
[1,] -3.151039 2.6354931
[2,] -3.004056 2.3430526
[3,] -2.613647 1.6777297
[4,] -2.597531 0.9359770
[5,] -2.707366 0.3332266
[6,] -2.381025 0.2500530
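The same result can also be obtained without apply by indexing the columns directly. A small sketch, where d is rebuilt from the first two rows of the example data:

```r
# First two rows of the example data from the question
d <- data.frame(
  Transportation.services         = c(2.958003, 2.857370),
  Recreational.goods.and.vehicles = c(-0.25983789, -0.03425164),
  Recreation.services             = c(5.526694, 5.312857),
  Other.services                  = c(2.8912009, 2.9698044)
)

m     <- as.matrix(d)
pairs <- combn(ncol(m), 2)   # each column of pairs is one (i, j) with i < j

# Subtract all "second" columns from all "first" columns in one step
diffs <- m[, pairs[1, ], drop = FALSE] - m[, pairs[2, ], drop = FALSE]
colnames(diffs) <- paste(colnames(m)[pairs[1, ]], colnames(m)[pairs[2, ]], sep = " - ")
```

With 4 columns this gives choose(4, 2) = 6 difference columns, in the same order as the combn/apply version.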

Replicate each time with different standard deviation

I have a vector of standard deviations, with 10 values between 0 and 20:
sd_vec<-runif(10,0,20)
[1] 11.658106 9.693493 12.695608 4.091922 5.761061 18.410951 14.710990 12.095944 18.023123
[10] 13.294963
I would like to replicate the following process:
a<-rnorm(10,0,30)
[1] -21.265083 85.557147 23.958170 -32.843328 6.629831 -23.745339 46.094324 51.020059
[9] 1.041724 13.757235
n_columns=50
replicate(n_columns, a+rnorm(length(a), mean=0,sd=sd_vec))
The result should be 10 columns, each of which is:
column 1: a + rnorm(length(a),0,11.658106)
column 2: a + rnorm(length(a),0,9.693493)
column 3: a + rnorm(length(a),0,12.695608)
.
.
.
column 10:a + rnorm(length(a),0,13.294963)
Will this use different values of sd_vec for each replication or will it use it for each random number generation?
According to your edit, you may want to try:
a+sapply(sd_vec, rnorm, n=100, mean=0)
# example
> set.seed(1)
> sd_vec <-runif(10,0,20)
> set.seed(1)
> a<-rnorm(100,0,30)
> n_columns=10
> head(a+sapply(sd_vec, rnorm, n=100, mean=0))
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] -22.087869 -15.746650 -8.554735 0.7226986 -18.481801 -24.921835 -32.16206 -33.158153 -38.187974
[2,] 5.732942 18.078702 -6.489666 39.9422684 4.311839 32.504554 42.75921 -18.624133 7.954302
[3,] -29.906010 -13.260709 -2.483113 -36.0217953 -29.841630 -15.576334 -26.76925 -11.915258 -21.741820
[4,] 48.697584 45.395650 43.463125 40.7586401 47.903975 57.600406 47.59359 47.701659 33.782184
[5,] 6.409275 -7.122582 28.836887 2.3249113 13.884993 7.429514 -11.34081 1.960571 18.075706
[6,] -15.229450 -6.025260 -7.288529 -31.4375515 -18.184563 -45.038651 -50.00938 -26.965804 -37.610292
[,10]
[1,] -17.391109
[2,] 6.883342
[3,] -26.144900
[4,] 48.118830
[5,] 9.970987
[6,] -26.668629
Your current solution recycles the whole sd_vec within every replication (element i of each column is drawn with sd_vec[i]), rather than using one sd per column.
If you want one column per sd, you can work with matrices. Create a matrix of rnorm draws with the desired sd per column by:
X <- rnorm(length(a)*n_columns, mean=0, sd=sd_vec)
X <- matrix(X, nrow=length(a), ncol=n_columns, byrow=TRUE)
Then add it to a, converted to a matrix:
matrix(a, nrow=length(a), ncol=n_columns) + X
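A quick way to convince yourself that each column really gets its own sd is to use a long vector and compare the empirical column standard deviations with sd_vec. This is only a sanity-check sketch. The values in sd_vec and the all-zero a are chosen for illustration, and the approach assumes n_columns equals length(sd_vec).

```r
set.seed(1)
sd_vec    <- c(1, 5, 10)        # illustrative standard deviations
a         <- rep(0, 20000)      # long vector of zeros so only the noise remains
n_columns <- length(sd_vec)

# rnorm recycles sd_vec element by element; filling by row puts sd_vec[j] in column j
X <- matrix(rnorm(length(a) * n_columns, mean = 0, sd = sd_vec),
            nrow = length(a), ncol = n_columns, byrow = TRUE)
res <- matrix(a, nrow = length(a), ncol = n_columns) + X

col_sds <- apply(res, 2, sd)    # should be close to sd_vec
```

If n_columns differs from length(sd_vec), the recycling no longer lines up with the columns, so the byrow trick would need adjusting.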
