I have a question regarding the Reduce function.
For example, I have a list in which one of the elements contains an NA.
a<-list(c(1,2),c(2,2),c(1,NA))
I wish to use the Reduce function to compute the element-wise average of the vectors in the list.
That is, (1+2+1)/3 = 1.33 and (2+2+NA)/3 = NA. In this last case, however, what I actually need is to ignore the NA, so the result should be (2+2)/2 = 2, and the final outcome should be the vector 1.33, 2.
I am using Reduce("+", a)/length(a), but I get an NA because of the NA element.
Thanks in advance
I wouldn't use Reduce for this. It is just a hidden for loop anyway. Here is a better alternative:
rowMeans(do.call(cbind, a), na.rm = TRUE)
#[1] 1.333333 2.000000
This combines your vectors into a matrix and calculates the row means using the rowMeans function, which can remove NA values.
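If you do want to stay with Reduce, one option is to reduce twice: once over the values with each NA replaced by 0, and once over indicators of which values are present, then divide the two. This is a minimal sketch using the list a from the question; the names sums, counts and result are just for illustration.

```r
a <- list(c(1, 2), c(2, 2), c(1, NA))

# Element-wise sum, treating each NA as 0
sums <- Reduce(`+`, lapply(a, function(v) ifelse(is.na(v), 0, v)))

# Element-wise count of the non-missing values
counts <- Reduce(`+`, lapply(a, function(v) as.numeric(!is.na(v))))

result <- sums / counts
result
#[1] 1.333333 2.000000
```

Note that if a position is NA in every vector, its count is 0 and the result for that position is NaN (0/0), which is also what rowMeans(..., na.rm = TRUE) returns in that case.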
In the following code:
x <- 1:8
x[NA]
I was expecting a TRUE or FALSE answer, but I got eight NAs instead. I discovered that is.na provides the TRUE/FALSE answer I was looking for. However, I'm still not certain why subsetting the vector with NA results in NA. Any explanation?
From the NAs in indexing section of help("["):
When extracting, a numerical, logical or character NA index picks an unknown element and so returns NA in the corresponding element of a logical, integer, numeric, complex or character result, and NULL for a list. (It returns 00 for a raw result.)
So here is what's happening in your code:
In x[NA] you are doing essentially the same thing as x[] (which simply returns all elements of x as they are), except that you are picking 8 unknown elements and getting NA for each of them. This happens because a bare NA is a logical constant, so x[NA] is logical indexing: the single NA is reused for every element of x. In R this is called recycling.
As @thelatemail notes in the comments below, this can be illustrated further with an example:
x[c(TRUE, NA, TRUE)]
# [1] 1 NA 3 4 NA 6 7 NA
The code above recycles c(TRUE, NA, TRUE) over the elements of x, where TRUE means "take this value" and NA returns an unknown value. Since x has only 8 elements, the third cycle is cut off after its NA.
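You used NA in place of an index, but the only valid positional indices for x are 1 to 8. An integer NA index (NA_integer_) behaves like x[9] and returns a single NA; a bare NA, however, is logical, so x[NA] is recycled and returns eight NAs.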
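The type of the NA is what decides between "one NA" and "eight NAs". A short check of the cases discussed above, using the x from the question:

```r
x <- 1:8

typeof(NA)       # "logical": a bare NA is a logical constant
x[NA]            # logical index, recycled to length 8: eight NAs
x[NA_integer_]   # integer (positional) index: a single NA
x[9]             # out-of-range positional index: also a single NA
```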
I am trying to compute cumulative returns in R for $1 invested, using cumprod().
I am getting NA values from cumprod() because the first return is NA, so the returns are not being cumulated.
[1] NA -0.059898142 -0.267314770 -0.075349437 0.008658063 -0.008658063 0.000000000
The first element is NA, and because of that, cumprod(x+1) returns all NAs.
How do I remove the first row/ignore the NA?
Any input would be appreciated
You can use na.omit to remove the NA values from x before applying cumprod. Since you are compounding 1 + return, add 1 after dropping the NA, e.g.,
cumprod(1 + na.omit(x))
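A sketch using the returns printed in the question, assuming they are simple returns, showing two ways of handling the leading NA: drop it, or treat the first period as a zero return so the series keeps its original length and starts at 1. The names growth, growth_full and growth_log are illustrative. If the values are log returns instead (the matching pair 0.008658063 and -0.008658063 hints that they might be), the growth of $1 is exp(cumsum(...)) rather than cumprod(1 + ...).

```r
x <- c(NA, -0.059898142, -0.267314770, -0.075349437,
       0.008658063, -0.008658063, 0.000000000)

# Option 1: drop the NA, then compound the gross returns (1 + r)
growth <- cumprod(1 + as.numeric(na.omit(x)))

# Option 2: treat the missing first return as 0, keeping all 7 periods
growth_full <- cumprod(1 + ifelse(is.na(x), 0, x))

# If x holds log returns, compound with exp(cumsum()) instead of cumprod()
growth_log <- exp(cumsum(ifelse(is.na(x), 0, x)))
```

Option 2 is handy when the result has to line up with dates or other columns of the same length as x.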
So I am working in R with a matrix like the following:
diff_0
SubPop0-1, SubPop1-1, SubPop2-1, SubPop3-1, SubPop4-1,
SubPop0-1, NA NA NA NA NA
SubPop1-1, 0.003403100 NA NA NA NA
SubPop2-1, 0.005481177 -0.002070277 NA NA NA
SubPop3-1, 0.002216444 0.005946314 0.001770977 NA NA
SubPop4-1, 0.010344845 0.007151529 0.004237316 -0.0021275130 NA
... but bigger ;-).
This is a matrix of pairwise genetic differentiation between each pair of SubPops from 0 to 4. I would like to obtain a mean differentiation value for each SubPop.
For instance, for SubPop0 the mean would simply be the mean of the 4 values in column 1. For SubPop2, however, it would be the mean of the 2 values in row 3 and the 2 values in column 3, since this is a half-matrix (only the lower triangle is filled).
I wanted to write a for loop that computes the mean for each SubPop, taking this into account. I tried the following:
Mean <- for (r in 1:nrow(diff_0)) {
mean(apply(cbind(diff_0[r,], diff_0[,r]), 1, sum, na.rm=T))
}
First, this isolates the row and the column with index r, whose values all refer to the same SubPop r. sum combines these values and drops the NAs. Finally, I get the mean value for SubPop r. I was hoping my for loop would give me one value for each index r, i.e. one per SubPop.
However, even though mean(apply(cbind(diff_0[r,], diff_0[,r]), 1, sum, na.rm=T)) gives me what I want when run alone with a fixed r between 1 and 5, the for loop itself returns nothing useful: Mean ends up empty (NULL).
Something like for (r in 1:nrow(diff_0)) { print(diff_0[r,1]) } also works, so I do not understand what is going on.
This is probably a trivial question, but I could not find an answer online, although I know I am probably missing the obvious :-)...
Thank you very much,
Cheers!
Okay, based on what you want to do (and if I understand everything correctly) there are several ways of doing this.
The one that comes to mind is to turn your lower triangular matrix into a full matrix (i.e. fill the upper triangle with the transpose of the lower triangle) and then compute row- or column-wise means.
My R is busy with something else right now, so I can't check my code, but this should work:
diff = diff_0
diff[upper.tri(diff)] = t(diff_0)[upper.tri(diff_0)] # mirror the lower triangle into the upper one
As I said, my R is busy right now, so I can't check the last line; I might be confused about the transposes there, so I'd appreciate feedback on whether it actually works.
You can then leave the diagonal as NA and add na.rm = TRUE to the mean call (setting the diagonal to 0 instead would include an extra zero in every mean):
mean_diffs = apply(diff,2,FUN = function(x)mean(x, na.rm = TRUE))
that should work
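Here is a self-contained sketch of that approach on the 5 x 5 sample matrix from the question, rebuilt by hand (the dimnames are simplified to SubPop0 to SubPop4, and the copy is called full to avoid masking base::diff). The mirroring step indexes the transposed matrix with upper.tri(), so the values are picked in the same column-major order in which upper.tri() addresses the target cells; transposing the extracted vector of lower-triangle values would not reorder anything.

```r
# Lower-triangular sample values from the question, in column-major order
vals <- c(0.003403100, 0.005481177, 0.002216444, 0.010344845,
          -0.002070277, 0.005946314, 0.007151529,
          0.001770977, 0.004237316,
          -0.0021275130)
labs <- paste0("SubPop", 0:4)
diff_0 <- matrix(NA_real_, 5, 5, dimnames = list(labs, labs))
diff_0[lower.tri(diff_0)] <- vals

# Mirror the lower triangle into the upper triangle
full <- diff_0
full[upper.tri(full)] <- t(diff_0)[upper.tri(diff_0)]

# Column means, skipping the NA diagonal
sub_means <- colMeans(full, na.rm = TRUE)
round(sub_means, 6)
```

On this sample the result agrees with the data_mean values printed in the loop-based answer below; colMeans(full, na.rm = TRUE) is also a faster equivalent of the apply(..., 2, mean, na.rm = TRUE) call.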
Also: yes, your code does not work, because the assignment is not inside the for loop. A for loop always returns NULL (invisibly), so Mean <- for (...) {...} just stores NULL. This should work:
means = rep(NA, nrow(diff_0))
for (r in 1:length(means)) {
  means[r] = mean(apply(cbind(diff_0[r,], diff_0[,r]), 1, sum, na.rm=T))
}
But in general, explicit for loops are rarely the most idiomatic choice in R; vectorized functions such as sapply or colMeans are usually preferred.
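A loop-free variant of the same idea, again as a sketch on a hand-rebuilt copy of the sample matrix. One detail worth knowing about the apply(cbind(...), 1, sum, na.rm = T) formulation: on the diagonal both entries are NA, and sum(NA, NA, na.rm = TRUE) is 0, so that zero is averaged in as well (here each mean is divided by 5 instead of 4). Pooling row r and column r and letting mean() drop the NAs avoids this. The names sub_means and pairwise are illustrative.

```r
vals <- c(0.003403100, 0.005481177, 0.002216444, 0.010344845,
          -0.002070277, 0.005946314, 0.007151529,
          0.001770977, 0.004237316,
          -0.0021275130)
labs <- paste0("SubPop", 0:4)
diff_0 <- matrix(NA_real_, 5, 5, dimnames = list(labs, labs))
diff_0[lower.tri(diff_0)] <- vals

# For each SubPop r, pool row r and column r; off the diagonal exactly one of
# diff_0[r, k] and diff_0[k, r] is filled, and both are NA on the diagonal
sub_means <- sapply(seq_len(nrow(diff_0)), function(r) {
  mean(c(diff_0[r, ], diff_0[, r]), na.rm = TRUE)
})
names(sub_means) <- rownames(diff_0)

# The pairwise-sum version counts the diagonal as 0, so it divides by 5
pairwise <- sapply(seq_len(nrow(diff_0)), function(r) {
  mean(apply(cbind(diff_0[r, ], diff_0[, r]), 1, sum, na.rm = TRUE))
})
```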
This may be a solution...
for(i in 1:nrow(diff_0)) {
k<-mean(cbind(as.numeric(diff_0[,i]),as.numeric(diff_0[i,])),na.rm=T)
if(i==1) {
data_mean <- k
}else{
data_mean <- rbind(data_mean,k)
}
}
colnames(data_mean) <- "mean"
rownames(data_mean) <- c("SubPop0","SubPop1","SubPop2","SubPop3","SubPop4")
data_mean
mean
SubPop0 0.005361391
SubPop1 0.003607666
SubPop2 0.002354798
SubPop3 0.001951555
SubPop4 0.004901544
I have two data.frames A and B.
A contains negative, positive and NA values.
B contains only positive and NA values.
The dimensions of the data frames are the same.
data.frame A looks like this:
ENSMUSG00000000001.4/Gnai3 0.1943315 0.3021675 NA NA
ENSMUSG00000000003.9/Pbsn -1.4843914 -1.2608270 -0.2587953 -0.46167430
ENSMUSG00000000028.8/Cdc45 -0.2388901 -0.1106236 0.9046436 0.08968331
ENSMUSG00000000037.9/Scml 0.3242902 0.5385371 0.2311202 0.51110287
ENSMUSG00000000049.5/Apoh -1.7606033 -1.8159545 -0.2087083 -1.09614630
ENSMUSG00000000056.7/Narf NA NA -0.3747798 -0.55547798
I need to check whether each value in this table is NA or negative and, if so, set the value of data.frame B at the same indices to 0.999.
For example:
The first record of A has two NA values, at indices [1,4] and [1,5], which means I will set B[1,4] = 0.999 and B[1,5] = 0.999.
I could do this with nested loops over columns and rows, but that would take too much time. Is there a faster way?
You can pass a logical mask as an index if it has the same dimensions:
B[is.na(A) | A < 0] <- 0.999
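A small self-contained check with two made-up 3 x 2 data frames (the values are invented for illustration). For a data frame, is.na() and the comparison A < 0 both return logical matrices, and where A is NA the TRUE from is.na() wins over the NA from the comparison (TRUE | NA is TRUE), so the mask contains no NAs:

```r
A <- data.frame(s1 = c(0.19, -1.48, NA), s2 = c(NA, -1.26, 0.54))
B <- data.frame(s1 = c(0.50, 0.20, 0.30), s2 = c(0.10, NA, 0.70))

# TRUE wherever A is missing or negative
mask <- is.na(A) | A < 0

# Logical-matrix indexing assigns 0.999 to every masked cell of B
B[mask] <- 0.999
B
```

Cells of B that are NA but not masked by A would stay NA; only the condition on A decides what gets overwritten.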
I would use ifelse to do this, since the data frames have the same dimensions.
A <- matrix(data = 1:15, nrow = 5)   # create example matrices (for data frames, converting with as.matrix() first is the safer route)
B <- matrix(data = 16:30, nrow = 5)
A[1, 2] <- NA                        # introduce an NA and a negative value in A
A[5, 3] <- -1
ifelse(is.na(A) | A < 0, 0.999, B)   # new matrix: 0.999 where A is NA or negative, otherwise the value from B
I have a data set of 144 scenarios and would like to calculate the percent change for all possible combinations using the combn function. I have tried to use a percent difference function within combn, but it keeps giving me a large number of NAs. Is there a way to accomplish this?
Create a percent change function:
pcchange=function(x,lag=1)
c(diff(x,lag),rep(NA,lag))/x*100
Use it within combn:
Catch_comp<-combn(catch_table$av_muC, 2, pcchange)
Convert the results into a matrix:
inputs <- headers
out2 <- Catch_comp
class(out2) <- "dist"
attr(out2, "Labels") <- as.character(inputs)
attr(out2, "Size") <- length(inputs)
out2 <- as.matrix(out2)
out2
This is what my table looks like:
> out2
F_R1S2_11 F_R1S2_12 F_R1S2_13 F_R1S2_21 F_R1S2_22 F_R1S2_23 F_R1S2_31
0.00000000 -0.8328001 NA -2.1972852 NA -0.11300746 NA -1.15112915 NA -2.7011787 NA -0.5359923 NA
F_R1S2_12 -0.83280008 0.0000000 NA -1.4558031 NA
As an example:
I have the average of 1000 simulations of the actual catch for two scenarios:
F_R1S1_11 = 155420.36
and
F_R1S1_12 = 154126.0215.
Using the pcchange function I would like to calculate:
((F_R1S1_11 - F_R1S1_12) / F_R1S1_11) * 100
i.e. ((155420.36 - 154126.02) / 155420.36) * 100 = 0.83%
change in the values.
I would like to do this for all possible combinations, in the form of a 144 x 144 matrix. I hope that helps.
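For reference, the NAs come from pcchange itself: combn() passes each pair as a length-2 vector, and c(diff(x, 1), NA) / x * 100 then returns two numbers, the second of which is always NA (the first is (x[2] - x[1]) / x[1] * 100, the opposite sign of the formula above). A sketch of an alternative uses outer(), which applies the formula to every ordered pair at once and keeps the names as dimnames. It assumes the averages sit in a named numeric vector; av below is a hypothetical two-element example built from the numbers above, and with the real data it would be something like setNames(catch_table$av_muC, headers).

```r
# Two averages from the question; the full data would have 144 entries
av <- c(F_R1S1_11 = 155420.36, F_R1S1_12 = 154126.0215)

# What pcchange returns for one pair: one value plus a trailing NA
pcchange <- function(x, lag = 1) c(diff(x, lag), rep(NA, lag)) / x * 100
pair_result <- pcchange(av)

# Percent change for every ordered pair: entry [i, j] = (av[i] - av[j]) / av[i] * 100
pct <- outer(av, av, function(ref, other) (ref - other) / ref * 100)
round(pct["F_R1S1_11", "F_R1S1_12"], 2)
# [1] 0.83
```

A full matrix is used here rather than a "dist" object because percent change is not symmetric: (a - b) / a and (b - a) / b differ in magnitude, while as.matrix() on a "dist" object mirrors one triangle into the other.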
Thanks!