I have a vector whose elements are either negative values or NA, and a threshold:
threshold <- -1
example <- c(NA, NA, -0.108, NA, NA, NA, NA, NA, -0.601, -0.889, -1.178, -1.089, -1.401, -1.178, -0.959, -1.085, -1.483, -0.891, -0.817, -0.095, -1.305, NA, NA, NA, NA, -0.981, -0.457, -0.003, -0.358, NA, NA)
I want to identify all the data blocks (runs of consecutive non-NA values) that contain at least one value lower than the threshold, and replace all the other blocks with NA. With my example vector, I want this result:
result <- c(NA, NA, NA, NA, NA, NA, NA, NA, -0.601, -0.889, -1.178, -1.089, -1.401, -1.178, -0.959, -1.085, -1.483, -0.891, -0.817, -0.095, -1.305, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)
So the first available value forms the first block, but -0.108 is higher than -1, so it turns into NA. The second block is kept as is because it contains at least one value lower than -1. The third block becomes NA because none of its 4 available values is lower than the threshold.
My first idea was to find the positions of the values lower than the threshold:
val <- which(example < threshold)
But then I don't know how to say "keep all the values around this position which are not NA" because it is always a different number of values...
Try
library(data.table)#v >= 1.9.5 (devel version - install from GitHub).
#library(devtools)
#install_github("Rdatatable/data.table", build_vignettes = FALSE)
as.data.table(example)[, res:=(NA | (min(example)< -1))*example, by=rleid(is.na(example))][, res]
Another way, following OlliJ's suggestion:
example <- c(NA, NA, -0.108, NA, NA, NA, NA, NA, -0.601, -0.889, -1.178, -1.089, -1.401, -1.178, -0.959, -1.085, -1.483, -0.891, -0.817, -0.095, NA, NA, NA, NA, -0.981, -0.457, -0.003, -0.358, NA, NA)
test <- !(is.na(example))
len <- rle(test)$lengths
val <- rle(test)$values
##Matrix with the beginning and the end of each group
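ind <- matrix(, nrow = sum(val), ncol = 2)
## Prepending 0 keeps the start index correct when the vector begins with a non-NA value
ind[, 1] <- c(0, cumsum(len))[which(val)] + 1
ind[, 2] <- cumsum(len)[val]
result <- rep(NA_real_, length(example))
## A for loop is used here: an assignment inside apply()'s function would only change a local copy of result
for (k in seq_len(nrow(ind))) {
  idx <- ind[k, 1]:ind[k, 2]
  if (any(example[idx] < -1)) {
    result[idx] <- example[idx]
  }
}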
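For comparison, here is a minimal base R sketch of the same idea without an index matrix. It assumes the vector and threshold from the question (with the missing comma restored); the names block and has_low are only illustrative. Each run of NA or non-NA values gets an id via rle(), and a block is kept only if any of its values falls below the threshold.

```r
threshold <- -1
example <- c(NA, NA, -0.108, NA, NA, NA, NA, NA, -0.601, -0.889, -1.178,
             -1.089, -1.401, -1.178, -0.959, -1.085, -1.483, -0.891, -0.817,
             -0.095, -1.305, NA, NA, NA, NA, -0.981, -0.457, -0.003, -0.358,
             NA, NA)

# Give every run of consecutive NA / non-NA values its own id
r <- rle(!is.na(example))
block <- rep(seq_along(r$lengths), r$lengths)

# One logical per block: does it contain a value below the threshold?
# (all-NA blocks give FALSE because of na.rm = TRUE)
has_low <- as.vector(tapply(example, block,
                            function(v) any(v < threshold, na.rm = TRUE)))

# Keep values from qualifying blocks, set everything else to NA
result <- ifelse(has_low[block], example, NA)
```

Since block ids run from 1 to the number of runs, has_low[block] simply expands the per-block flag back to the length of the vector.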
I am trying to convert a correlation matrix to a covariance matrix using cor2cov in R.
library(MBESS)
eff_1971 <- c(NA, .56, .25, .25, .22, -.47, -.01, -.06)
eff_1972 <- c(NA, NA, .23, .23, .25, .47, -.01, .03)
annual_earnings_1970 <- c(NA, NA, NA, .88, .83, -.02, -.28, -.14)
annual_earnings_1971 <- c(NA, NA, NA, NA, .88, -.02, .21, -.29)
annual_earnings_1972 <- c(NA, NA, NA, NA, NA, .03, .06, .21)
change_eff_1971_1972 <- c(NA, NA, NA, NA, NA, NA, 0.0, .1)
change_ann_earn_1970_1971 <- c(NA, NA, NA, NA, NA, NA, NA, -.29)
change_ann_earn_1971_1972 <- c(NA, NA, NA, NA, NA, NA, NA, NA)
df <- data.frame(eff_1971,
eff_1972,
annual_earnings_1970,
annual_earnings_1971,
annual_earnings_1972,
change_eff_1971_1972,
change_ann_earn_1970_1971,
change_ann_earn_1971_1972)
df <- as.matrix(df)
sd <- c(.82, .82, .52, .51, .50, .77, .25, .25)
cor2cov(df, sd)
However, I get this error message:
Error in cor2cov(df, sd) :
The object 'cor.mat' should be either a symmetric or a triangular matrix
Does anyone know how I can fix this error?
Thank you!
You can make df lower triangular by setting the diagonal to 1 and the upper-triangle values (currently NA) to 0:
diag(df) <- 1
df[is.na(df)] <- 0
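If it helps to see what the conversion amounts to, here is a small base R sketch that does not use MBESS. It assumes a hypothetical 3x3 matrix laid out like the one in the question (correlations in the lower triangle, NA elsewhere) and made-up standard deviations; cor_lower, sds, cor_full and cov_full are illustrative names. The lower triangle is mirrored to get a symmetric matrix, and each correlation is then scaled by the product of the two standard deviations.

```r
# Hypothetical input: correlations in the lower triangle, NA elsewhere
cor_lower <- matrix(c(NA,   NA,  NA,
                      0.5,  NA,  NA,
                      0.2, -0.3, NA), nrow = 3, byrow = TRUE)
sds <- c(2, 3, 4)  # made-up standard deviations

# Build a full symmetric correlation matrix
cor_full <- cor_lower
diag(cor_full) <- 1
upper <- upper.tri(cor_full)
cor_full[upper] <- t(cor_full)[upper]  # mirror the lower triangle

# cov[i, j] = sd[i] * sd[j] * cor[i, j]
cov_full <- outer(sds, sds) * cor_full
```

The diagonal of cov_full then holds the variances (the squared standard deviations).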
I want to run Pearson correlations of each row of a matrix (dat) vs a vector (v1), as part of a loop, and output the correlation coefficients and associated p-values in a table. Here is an example for random data (data pasted at the end):
result_table <- data.frame(matrix(ncol = 2, nrow = nrow(dat)))
colnames(result_table) <- c("correlation_coefficient", "pvalue")
for(i in 1:nrow(dat)){
print(i)
corr <- cor.test(as.numeric(dat[i,]), v1, na.action = "na.omit")
result_table[i,1] <- corr$estimate
result_table[i,2] <- corr$p.value
}
When cor.test() removes missing data, sometimes there are not enough observations remaining and the loop stops with an error (for example at row 11). I would like the loop to continue running, just leaving the values in the result table as NAs. I think the result table should then look like this:
> result_table
correlation_coefficient pvalue
1 0.68422642 0.04206591
2 -0.15895586 0.70694013
3 -0.37005028 0.53982309
4 0.08448970 0.89255250
5 0.86860091 0.05603661
6 0.19544883 0.75274040
7 -0.94695380 0.01454887
8 -0.03817885 0.94275955
9 -0.15214122 0.77354897
10 -0.22997890 0.70978386
11 NA NA
12 NA NA
13 -0.27769887 0.59415930
14 -0.09768153 0.81800885
15 -0.20986632 0.61790214
16 -0.40474976 0.31990456
17 -0.00605937 0.98863896
18 0.02176976 0.95919460
19 -0.14755097 0.72733118
20 -0.25830856 0.50216600
I would also like the error messages to keep being printed.
Here is the data:
> dput(v1)
c(-0.840396, 0.4746047, -1.101857, 0.5164767, 1.2203134, -0.9758888,
-0.3657913, -0.6272523, -0.5853803, 1.7367901)
> dput(dat)
structure(list(s1 = c(-0.52411895, 0.14709633, 0.05433954, 0.7504406,
-0.59971988, -0.59679685, -0.12571854, 0.73289705, -0.71668771,
-0.04813957, -0.67849896, -0.11947141, -0.26371884, -1.34137162,
2.60928064, -1.23397547, 0.51811222, -4.10759883, -0.70127093,
7.51914575), s2 = c(0.21446623, -0.27281487, NA, NA, NA, NA,
NA, NA, -0.62468391, NA, NA, NA, -3.84387999, 0.64010069, NA,
NA, NA, NA, NA, NA), s3 = c(0.3461212, 0.279062, NA, NA, NA,
-0.4737744, 0.6313365, -2.8472641, 1.2647846, 2.2524449, -0.7913039,
-0.752590307, -3.535815266, 1.692385187, 3.55789764, -1.694910854,
-3.624517121, -4.963855198, 2.395998161, 5.35680032), s4 = c(0.3579742,
0.3522745, -1.1720907, 0.4223402, 0.146605, -0.3175295, -1.383926807,
-0.688551166, NA, NA, NA, NA, NA, 0.703612974, 1.79890268, -2.625404608,
-3.235884921, -2.845474098, 0.058650461, 1.83900702), s5 = c(1.698104376,
NA, NA, NA, NA, NA, -1.488000007, -0.739488766, 0.276012387,
0.49344994, NA, NA, -1.417434166, -0.644962513, 0.04010434, -3.388182254,
2.900252493, -1.493417096, -2.852256003, -0.98871696), s6 = c(0.3419271,
0.2482013, -1.2230283, 0.270752, -0.6653978, -1.1357202, NA,
NA, NA, NA, NA, NA, NA, NA, -1.0288213, -1.17817328, 6.1682455,
1.02759131, -3.80372867, -2.6249692), s7 = c(0.3957243, 0.8758406,
NA, NA, NA, NA, NA, 0.60196247, -1.28631859, -0.5754757, NA,
NA, NA, NA, NA, NA, NA, NA, NA, -2.6303001), s8 = c(-0.26409595,
1.2643281, 0.05687957, -0.09459169, -0.7875279, NA, NA, NA, NA,
NA, NA, NA, 2.42442997, -0.00445559, -1.0341522, 2.47315322,
0.1190265, 5.82533417, 0.82239131, -0.8279679), s9 = c(0.237123,
-0.5004619, 0.4447322, -0.2155249, -0.2331443, 1.3438071, -0.3817672,
1.9228182, 0.305661, -0.01348, NA, NA, 3.4009042, 0.8268469,
0.2061843, -1.1228663, -0.1443778, 4.8789902, 1.3480328, 0.4258486
), s10 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
0.5211859, 0.2196643, -1.2333367, 0.1186947, 1.478086, 0.5211859,
0.2196643)), .Names = c("s1", "s2", "s3", "s4", "s5", "s6", "s7",
"s8", "s9", "s10"), class = "data.frame", row.names = c(NA, -20L
))
A solution with tryCatch could be:
for(i in 1:nrow(dat)){
print(i)
corr <- tryCatch(cor.test(as.numeric(dat[i,]), v1, na.action = "na.omit"), error = function(e) return(NA))
if(length(corr) == 1){
result_table[i,1] <- NA
result_table[i,2] <- NA
}else{
result_table[i,1] <- corr$estimate
result_table[i,2] <- corr$p.value
}
}
Here is a solution with tryCatch() that also prints the error messages. Replace the for loop with:
for(i in 1:nrow(dat)){
tryCatch({
print(i)
corr <- cor.test(as.numeric(dat[i,]), v1, na.action = "na.omit") # On error, control jumps to the handler and this row keeps its initial NA values
result_table[i,1] <- corr$estimate
result_table[i,2] <- corr$p.value
}, error=function(e){cat("ERROR :",conditionMessage(e), "\n")})
}
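The same pattern can also be wrapped in a small helper, sketched below on made-up data. safe_cor, toy and ref are hypothetical names, not part of the question. The helper returns a pair of NAs when cor.test() fails and still reports the error message.

```r
# Hypothetical helper: returns NA for both values if cor.test() fails
safe_cor <- function(x, y) {
  tryCatch({
    ct <- cor.test(x, y)
    c(correlation_coefficient = unname(ct$estimate), pvalue = ct$p.value)
  }, error = function(e) {
    message("ERROR: ", conditionMessage(e))  # keep the error visible
    c(correlation_coefficient = NA_real_, pvalue = NA_real_)
  })
}

# Made-up data: the second row has too few complete pairs for cor.test()
toy <- rbind(c(1, 2, 3, 4, 6),
             c(NA, NA, NA, 1, 2))
ref <- c(2, 1, 4, 3, 7)

# One row of results per row of toy
res <- t(apply(toy, 1, safe_cor, y = ref))
```

With the data from the question, the call would presumably be t(apply(as.matrix(dat), 1, safe_cor, y = v1)).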
I would like to paste values of a certain data.frame row to other rows which have a certain attribute of a certain feature, however not a whole row just a couple of values of it. Exactly it looks like:
z <- c(NA, NA, 3,4,2,3,5)
x <- c(NA, NA, 2,5,5,3,3)
a <- c("Hank", NA, NA, NA, NA, NA, NA)
b <- c("Hank", NA, NA, NA, NA, NA, NA)
c <- c(NA, NA, NA, NA, NA, NA, NA)
d <- c("Bobby", NA, NA, NA, NA, NA, NA)
df <- as.data.frame(rbind( a, b, c, d, z, x))
Now, I would like to copy df["z", 3:7] into columns 3:7 of the rows that have V1 == "Hank", and df["x", 3:7] into columns 3:7 of the rows that have V1 == "Bobby".
Does anybody have a hint for me? I guess it should be a function with sapply or something like that. Maybe dplyr could give a solution? Thanks for any advice!
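One possible base R sketch, assuming the data frame from the question. The lookup vector source_row is a hypothetical name that maps each value of V1 to the row supplying the values. Note that rbind() on mixed vectors produces a character matrix, so the copied values are strings such as "3".

```r
z <- c(NA, NA, 3, 4, 2, 3, 5)
x <- c(NA, NA, 2, 5, 5, 3, 3)
a <- c("Hank", NA, NA, NA, NA, NA, NA)
b <- c("Hank", NA, NA, NA, NA, NA, NA)
c <- c(NA, NA, NA, NA, NA, NA, NA)
d <- c("Bobby", NA, NA, NA, NA, NA, NA)
df <- as.data.frame(rbind(a, b, c, d, z, x), stringsAsFactors = FALSE)

# Hypothetical lookup: which row supplies the values for which name
source_row <- c(Hank = "z", Bobby = "x")

for (nm in names(source_row)) {
  target <- which(df$V1 == nm)  # which() drops the NA comparisons
  for (col in 3:7) {
    df[target, col] <- df[source_row[[nm]], col]
  }
}
```

After the loop, rows a and b carry the values of row z in columns 3 to 7, and row d carries those of row x.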
I have a for-loop that initializes 3 vectors (launch_2012, amount, and one_week_bf) and creates a data frame. Then it predicts a single week of data, inserts it into the vectors (amount and one_week_bf), and recreates the data.frame; this process is looped 8 times. However, I can't seem to get the data.frame to update with the new amounts. Would anyone be able to assist, please?
for (i in 1:8) {
launch_2012 <- c(rep('bf', 5), 'launch', rep('af', 7))
amount <- c(7946, 6641, 5975, 5378, 5217, NA, NA, NA, NA, NA, NA, NA, NA)
one_week_bf <- c(NA, 7946, 6641, 5975, 5378, 5217, NA, NA, NA, NA, NA, NA, NA)
newdata <- data.frame(amount = amount, one_week_bf = one_week_bf, launch = launch_2012, week = week)
predicted <- predict(model0a, newdata)
amount[i+5] <- predicted[i+5]
one_week_bf[i+6] <- predicted[i+5]
View(newdata)
}
It's difficult to be sure since your example is not reproducible, but note that predict.lm(...) has na.action = na.pass by default, which means that any row of newdata containing an NA yields NA for its prediction. Since your first pass of newdata has NA in rows 6-13, predicted will have NA in those same elements. This means that amount and one_week_bf will have NA in those elements, which in turn will generate the same newdata each time. Note also that amount and one_week_bf are re-created at the top of every iteration, so anything written in the previous pass is discarded.
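To illustrate the general pattern, here is a sketch of a one-step-ahead loop. It assumes a stand-in model (model_sketch, fitted with lm on the five known weeks), because model0a and week are not shown in the question. The vectors are created once, before the loop, and each iteration predicts a single row from the previous week's value.

```r
# Stand-in for model0a: regress amount on the previous week's amount
history <- data.frame(amount      = c(6641, 5975, 5378, 5217),
                      one_week_bf = c(7946, 6641, 5975, 5378))
model_sketch <- lm(amount ~ one_week_bf, data = history)

# Initialise the vectors once, outside the loop
amount      <- c(7946, 6641, 5975, 5378, 5217, rep(NA, 8))
one_week_bf <- c(NA, 7946, 6641, 5975, 5378, 5217, rep(NA, 7))

for (i in 1:8) {
  row <- i + 5
  # Predict only the current row, whose predictor is already known
  newdata <- data.frame(one_week_bf = one_week_bf[row])
  amount[row] <- predict(model_sketch, newdata)
  # Feed the prediction forward, without growing the vector past its length
  if (row < length(one_week_bf)) {
    one_week_bf[row + 1] <- amount[row]
  }
}
```

The length check matters because one_week_bf[i + 6] at i = 8 would otherwise write to position 14 of a 13-element vector.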
None of this should be in a for loop.
x <- data.frame("launch_2012" = c(rep('bf', 5), 'launch', rep('af', 7)),
"amount"=c(7946, 6641, 5975, 5378, 5217, NA, NA, NA, NA, NA, NA, NA, NA),
"one_week_bf"=c(NA, 7946, 6641, 5975, 5378, 5217, NA, NA, NA, NA, NA, NA, NA))
x$new_amount <- #the replacement from your predict vector
x$new_one_week_bf <- #the replacement from your predict vector
Note that I have no idea what model0a does, so the new columns are simply whatever vector your predict call returns. This adds the new data as new columns.
I have a 10x10 matrix in R, called run_off. I would like to convert this matrix to a data frame that contains the entries of the matrix (the order doesn't really matter, although I'd prefer it to be filled by row) as well as the row and columns numbers of the entries as separate columns in the data frame, so that for instance element run_off[2,3] has a row in the data frame with 3 columns, the first containing the element itself, the second containing 2 and the third containing 3.
This is what I have so far:
run_off <- matrix(data = c(45630, 23350, 2924, 1798, 2007, 1204, 1298, 563, 777, 621,
53025, 26466, 2829, 1748, 732, 1424, 399, 537, 340, NA,
67318, 42333, -1854, 3178, 3045, 3281, 2909, 2613, NA, NA,
93489, 37473, 7431, 6648, 4207, 5762, 1890, NA, NA, NA,
80517, 33061, 6863, 4328, 4003, 2350, NA, NA, NA, NA,
68690, 33931, 5645, 6178, 3479, NA, NA, NA, NA, NA,
63091, 32198, 8938, 6879, NA, NA, NA, NA, NA, NA,
64430, 32491, 8414, NA, NA, NA, NA, NA, NA, NA,
68548, 35366, NA, NA, NA, NA, NA, NA, NA, NA,
76013, NA, NA, NA, NA, NA, NA, NA, NA, NA)
, nrow = 10, ncol = 10, byrow = TRUE)
df <- data.frame()
for (i in 1:nrow(run_off)) {
for (k in 1:ncol(run_off)) {
claim <- run_off[i,k]
acc_year <- i
dev_year <- k
df[???, "claims"] <- claim # Problem here
df[???, "acc_year"] <- acc_year # and here
df[???, "dev_year"] <- dev_year # and here
}
}
dev_year refers to the column number of the matrix entry and acc_year to the row number. My problem is that I don't know the proper index to use for the data frame.
I am assuming you are not interested in the NA elements? You can use which and the arr.ind = TRUE argument to return a two column matrix of array indices for each value and cbind this to the values, excluding the NA values:
# Get array indices
ind <- which( ! is.na(run_off) , arr.ind = TRUE )
# cbind indices to values
out <- cbind( run_off[ ! is.na( run_off ) ] , ind )
head( as.data.frame( out ) )
# V1 row col
#1 45630 1 1
#2 53025 2 1
#3 67318 3 1
#4 93489 4 1
#5 80517 5 1
#6 68690 6 1
Use t() on the matrix first if you want to fill by row, e.g. which( ! is.na( t( run_off ) ) , arr.ind = TRUE ) (and when you cbind it).
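As a row-wise variant with named columns, here is a short sketch on a hypothetical 2x2 matrix (m, tm and long are illustrative names). Transposing first makes as.vector() walk the original matrix row by row, and col()/row() of the transposed matrix give back the original row and column numbers.

```r
# Hypothetical small matrix in the same layout as run_off
m <- matrix(c(45630, 23350,
              53025, NA), nrow = 2, byrow = TRUE)

tm <- t(m)  # column-major order of tm is row-major order of m
long <- data.frame(claims   = as.vector(tm),
                   acc_year = as.vector(col(tm)),  # row number in m
                   dev_year = as.vector(row(tm)))  # column number in m

# Drop the empty cells
long <- long[!is.na(long$claims), ]
```

Replacing m with run_off should give one row per non-missing claim, ordered by accident year.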