Function ignoring my if condition statement - r

I have a tree outlined in a data frame as:
number conc knot neg pick
1 1 0 0 1
2 1 0 0 1
3 1 0 0 1
4 3 164 0 1
5 1 0 0 1
6 1 0 0 1
7 3 159 1 1
8 0 0 0 0
9 0 0 0 0
10 3 208 1 1
11 3 181 1 1
12 3 1 1 1
13 3 95 0 1
14 0 0 0 0
15 0 0 0 0
I'm traversing the tree with a recursive function:
printtree <- function(number,tree) {
if (!is.na(tree[number,5] != 0)) {
letssee<-c(tree[number,1],tree[number,2],tree[number,3],tree[number,4],tree[number,5])
print(letssee)
}
left <- tree[number,1]
if (!is.na(left)) printtree(tree[left,1]*2,tree)
right <- tree[number,1]
if (!is.na(right)) printtree(tree[right,1]*2+1,tree)
}
My if condition should be omitting lines when the pick column = 0 but it is still printing and I can't figure out why.
Here's the output:
[1] 1 1 0 0 1
[1] 2 1 0 0 1
[1] 4 3 164 0 1
[1] 8 0 0 0 0
[1] 9 0 0 0 0
[1] 5 1 0 0 1
[1] 10 3 208 1 1
[1] 11 3 181 1 1
[1] 3 1 0 0 1
[1] 6 1 0 0 1
[1] 12 3 1 1 1
[1] 13 3 95 0 1
[1] 7 3 159 1 1
[1] 14 0 0 0 0
[1] 15 0 0 0 0
Is it ignoring my if statement because of is.na()? If I don't have the is.na check I get an error for "missing value where TRUE/FALSE needed" so it has to be there.

If tree[number, 5] really is zero, then the inner test
tree[number, 5] != 0
returns FALSE. FALSE is not an NA value, so !is.na(FALSE) is TRUE. In fact !is.na(...) is TRUE for every non-missing value, so the if condition as you've written it is TRUE whenever tree[number, 5] is not NA, whether or not it is zero.
Test both conditions instead:
if (!is.na(X) && X != 0) {...}
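To make the behaviour concrete, here is a minimal sketch of the traversal with both checks applied. It assumes the heap-style layout implied by the question's `*2` and `*2+1` (the children of node n are rows 2n and 2n+1); the helper name `visit` is hypothetical, and the data frame is retyped from the question.

```r
# Tree retyped from the question; row i describes node i.
tree <- data.frame(
  number = 1:15,
  conc   = c(1, 1, 1, 3, 1, 1, 3, 0, 0, 3, 3, 3, 3, 0, 0),
  knot   = c(0, 0, 0, 164, 0, 0, 159, 0, 0, 208, 181, 1, 95, 0, 0),
  neg    = c(0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0),
  pick   = c(1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0)
)

# Why the original test never filters anything:
# 0 != 0 is FALSE, and FALSE is not NA, so the condition is TRUE.
always_true <- !is.na(0 != 0)

# Hypothetical helper: pre-order walk returning the node numbers to print.
# Assumes the children of node n sit in rows 2n and 2n + 1.
visit <- function(number, tree) {
  if (number > nrow(tree)) return(integer(0))   # fell off the tree
  pick <- tree[number, "pick"]
  here <- if (!is.na(pick) && pick != 0) number else integer(0)
  c(here, visit(2 * number, tree), visit(2 * number + 1, tree))
}

kept <- visit(1, tree)
print(tree[kept, ], row.names = FALSE)
```

Returning the node numbers rather than printing inside the recursion keeps the filter in one place and makes the result easy to check.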

Related

Recoding range of numerics into single numeric in R

I am trying to recode a data frame with four columns. Across all of the columns, I want to recode all the numeric values into these ordinal numeric values:
0 stays as is
1:3 <- 1
4:10 <- 2
11:22 <- 3
22:max <-4
This is the data frame:
> df
T4.1 T4.2 T4.3 T4.4
1 0 54 0 5
2 0 5 0 0
3 0 3 0 0
4 0 2 0 0
5 0 3 0 0
6 0 2 0 0
7 0 4 0 0
8 1 20 0 0
9 1 7 0 2
10 0 14 0 0
11 0 3 0 0
12 0 202 0 41
13 2 12 0 0
14 3 6 0 0
15 3 21 0 3
16 0 143 0 0
17 0 0 0 0
18 4 9 0 0
19 3 15 0 0
20 0 58 0 6
21 2 0 0 0
22 0 52 0 0
23 0 3 0 0
24 0 1 0 0
25 4 6 0 1
26 1 4 0 0
27 0 38 0 1
28 0 6 0 0
29 0 8 0 0
30 0 29 0 4
31 1 14 0 0
32 0 12 0 10
33 4 1 0 3
I'm trying to use the recode function, but I can't seem to figure out how to input a range of numeric values into it. I get the following errors with my attempts:
> recode(df, 11:22=3)
Error: unexpected '=' in "recode(df, 11:22="
> recode(df, c(11:22)=3)
Error: unexpected '=' in "recode(df, c(11:22)="
I would greatly appreciate any advice. Thanks for your time!
Edit: Thanks all for the help!!
You can use cut, with the breaks placed between the integer ranges:
df_res <- as.data.frame(sapply(df, function(x)cut(x,
breaks = c(-0.5, 0.5, 3.5, 10.5, 22.5, Inf),
labels = c(0, 1, 2, 3, 4)))
)
str(df_res)
#'data.frame': 33 obs. of 4 variables:
# $ T4.1: Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 2 2 1 ...
# $ T4.2: Factor w/ 5 levels "0","1","2","3",..: 5 3 2 2 2 2 3 4 3 4 ...
# $ T4.3: Factor w/ 1 level "0": 1 1 1 1 1 1 1 1 1 1 ...
# $ T4.4: Factor w/ 4 levels "0","1","2","4": 3 1 1 1 1 1 1 1 2 1 ...
df_res
# T4.1 T4.2 T4.3 T4.4
# 1 0 4 0 2
# 2 0 2 0 0
# 3 0 1 0 0
# 4 0 1 0 0
# 5 0 1 0 0
# 6 0 1 0 0
# 7 0 2 0 0
# 8 1 3 0 0
# 9 1 2 0 1
# 10 0 3 0 0
# 11 0 1 0 0
# 12 0 4 0 4
# 13 1 3 0 0
# 14 1 2 0 0
# 15 1 3 0 1
# 16 0 4 0 0
# 17 0 0 0 0
# 18 2 2 0 0
# 19 1 3 0 0
# 20 0 4 0 2
# 21 1 0 0 0
# 22 0 4 0 0
# 23 0 1 0 0
# 24 0 1 0 0
# 25 2 2 0 1
# 26 1 2 0 0
# 27 0 4 0 1
# 28 0 2 0 0
# 29 0 2 0 0
# 30 0 4 0 2
# 31 1 3 0 0
# 32 0 3 0 2
# 33 2 1 0 1
I find named vectors are a nice pattern for re-coding variables, especially for irregular patterns. You could use one like this here:
decoder <- c(0, rep(1,3), rep(2,7), rep(3, 12))
names(decoder) <- 0:22
sapply(df, function(x) ifelse(x <= 22, decoder[as.character(x)], 4))
If the re-coding follows a more regular pattern, cut is a useful function.
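If plain integer codes rather than factors are wanted, base R's findInterval is another option. The sketch below is only an illustration: the small data frame and the name `lower_bounds` are made up for the example, and the top range is assumed to start at 23 so that 22 falls in group 3, matching the cut answer.

```r
# Toy data standing in for the asker's data frame (hypothetical values).
df <- data.frame(
  T4.1 = c(0, 1, 3, 4),
  T4.2 = c(54, 5, 3, 22),
  T4.4 = c(5, 0, 11, 23)
)

# Lower bound of each non-zero group: 1:3 -> 1, 4:10 -> 2, 11:22 -> 3, 23+ -> 4.
lower_bounds <- c(1, 4, 11, 23)

# findInterval() counts how many bounds are <= x, which is exactly the code.
df_num <- as.data.frame(lapply(df, findInterval, vec = lower_bounds))
df_num
```

Because findInterval returns integers, the result can be used directly in arithmetic without converting from factor levels.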

Correlation with multiple variables and their multiple combinations

Here is an example data set. I want to calculate the correlation between O_data and multiple possible combinations of the M_data columns.
O_data=runif(10)
M_a=runif(10)
M_b=runif(10)
M_c=runif(10)
M_d=runif(10)
M_e=runif(10)
M_data=data.frame(M_a,M_b,M_c,M_d,M_e)
I can calculate the correlation between O_data and individual M_data data.
correlation= matrix(NA,ncol = length(M_data[1,]))
for (i in 1:length(correlation))
{
correlation[,i]=cor(O_data,M_data[,i])
}
In addition to this, how can I get the correlation between O_data and possible multiple combinations of M_data set?
let's clarify the combination.
cor_M_ab=cor((M_a+M_b),O_data)
cor_M_abc=cor((M_a+M_b+M_c),O_data)
cor_M_abcd=...
cor_M_abcde=...
...
....
cor_M_bcd=..
..
cor_M_eab=...
....
...
I don't want combinations of non-adjacent columns such as M_a and M_c. I only want combinations of consecutive columns, such as ab, bc, bcd, abcde, ea, eab, ...
Generate the data using set.seed so you can reproduce:
set.seed(42)
O_data=runif(10)
M_a=runif(10)
M_b=runif(10)
M_c=runif(10)
M_d=runif(10)
M_e=runif(10)
M_data=data.frame(M_a,M_b,M_c,M_d,M_e)
The tricky part is just keeping things organized. Since you didn't specify, I made a matrix with 5 rows and 31 columns. The rows get the names of the variables in your M_data. Here's the matrix (motivated by: All N Combinations of All Subsets)
M_grid <- t(do.call(expand.grid, replicate(5, 0:1, simplify = FALSE))[-1,])
rownames(M_grid) <- names(M_data)
M_grid
#> 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
#> M_a 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
#> M_b 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1
#> M_c 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 0 0 0
#> M_d 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 1 1 1
#> M_e 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1
#> 28 29 30 31 32
#> M_a 1 0 1 0 1
#> M_b 1 0 0 1 1
#> M_c 0 1 1 1 1
#> M_d 1 1 1 1 1
#> M_e 1 1 1 1 1
Now when I do a matrix multiplication of M_data and any column of my M_grid I get a sum of the columns in M_data corresponding to which rows of M_grid have 1's. For example:
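as.matrix(M_data) %*% M_grid[, "4"]
gives me the sum of M_a and M_b. (The column is selected by its name "4"; positionally it is the third column, because the all-zero combination was dropped.) I can calculate the correlation between O_data and any of these sums. Putting it all together in one line: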
(final <- cbind(t(M_grid), apply(as.matrix(M_data) %*% M_grid, 2, function(x) cor(O_data, x))))
#> M_a M_b M_c M_d M_e
#> 2 1 0 0 0 0 0.066499681
#> 3 0 1 0 0 0 -0.343839423
#> 4 1 1 0 0 0 -0.255957896
#> 5 0 0 1 0 0 0.381614222
#> 6 1 0 1 0 0 0.334916617
#> 7 0 1 1 0 0 0.024198743
#> 8 1 1 1 0 0 0.059297654
#> 9 0 0 0 1 0 0.180676146
#> 10 1 0 0 1 0 0.190656099
#> 11 0 1 0 1 0 -0.140666930
#> 12 1 1 0 1 0 -0.094245439
#> 13 0 0 1 1 0 0.363591787
#> 14 1 0 1 1 0 0.363546012
#> 15 0 1 1 1 0 0.111435827
#> 16 1 1 1 1 0 0.142772457
#> 17 0 0 0 0 1 0.248640472
#> 18 1 0 0 0 1 0.178471959
#> 19 0 1 0 0 1 -0.117930168
#> 20 1 1 0 0 1 -0.064838097
#> 21 0 0 1 0 1 0.404258155
#> 22 1 0 1 0 1 0.348609692
#> 23 0 1 1 0 1 0.114267433
#> 24 1 1 1 0 1 0.131731971
#> 25 0 0 0 1 1 0.241561478
#> 26 1 0 0 1 1 0.229693510
#> 27 0 1 0 1 1 0.001390233
#> 28 1 1 0 1 1 0.030884234
#> 29 0 0 1 1 1 0.369212761
#> 30 1 0 1 1 1 0.354971839
#> 31 0 1 1 1 1 0.166132390
#> 32 1 1 1 1 1 0.182368955
The final column is the correlation of O_data with all 31 possible sums of columns in M_data. You can tell which column is included by seeing which has a 1 under it for that row.
I try not to resort to matrices too much but this was the first thing I thought of.
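The question asks only for runs of consecutive columns, including runs that wrap around (ea, eab, ...). A minimal sketch of that restriction follows; it assumes "consecutive" means cyclic adjacency in the column order of M_data, and the names `runs`, `idx` and `cors` are made up for the example.

```r
set.seed(42)
O_data <- runif(10)
M_data <- data.frame(M_a = runif(10), M_b = runif(10), M_c = runif(10),
                     M_d = runif(10), M_e = runif(10))

n <- ncol(M_data)

# Each run is a start column plus a length; lengths 2..n.
runs <- expand.grid(start = 1:n, len = 2:n)
# The full-length run is the same set for every start, so keep it once.
runs <- runs[!(runs$len == n & runs$start > 1), ]

# Column indices of each run, wrapping around past the last column.
idx <- Map(function(s, l) ((s - 1 + 0:(l - 1)) %% n) + 1, runs$start, runs$len)

# Correlation of O_data with the row-wise sum of each run.
cors <- sapply(idx, function(i) cor(O_data, rowSums(M_data[, i, drop = FALSE])))
names(cors) <- sapply(idx, function(i)
  paste(sub("M_", "", names(M_data)[i]), collapse = ""))
cors
```

With five columns this gives 16 sums (five each of length 2, 3 and 4, plus abcde) instead of all 31 subsets.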

How to find three consecutive rows with the same value

I have a dataframe as follows:
chr leftPos Sample1 X.DD 3_samples MyStuff
1 324 -1 1 1 1
1 4565 -1 0 0 0
1 6887 -1 1 0 0
1 12098 1 -1 1 1
2 12 -1 1 0 1
2 43 -1 1 1 1
5 1 -1 1 1 0
5 43 0 1 -1 0
5 6554 1 1 1 1
5 7654 -1 0 0 0
5 8765 1 1 1 0
5 9833 1 1 1 -1
6 12 1 1 0 0
6 43 0 0 0 0
6 56 1 0 0 0
6 79 1 0 -1 0
6 767 1 0 -1 0
6 3233 1 0 -1 0
I would like to convert it according to the following rules
For each chromosome:
a. If there are three or more 1's or -1's consecutively in a column then the value stays as it is.
b. If there are less than three 1's or -1s consecutively in a column then the value of the 1 or -1 changes to 0
The rows in a column have to have the same sign (+ or -ve) to be called consecutive.
The result of the dataframe above should be:
chr leftPos Sample1 X.DD 3_samples MyStuff
1 324 -1 0 0 0
1 4565 -1 0 0 0
1 6887 -1 0 0 0
1 12098 0 0 0 0
2 12 0 0 0 0
2 43 0 0 0 0
5 1 0 1 0 0
5 43 0 1 0 0
5 6554 0 1 0 0
5 7654 0 0 0 0
5 8765 0 0 0 0
5 9833 0 0 0 0
6 12 0 0 0 0
6 43 0 0 0 0
6 56 1 0 0 0
6 79 1 0 -1 0
6 767 1 0 -1 0
6 3233 1 0 -1 0
I have managed to do this for two consecutive rows but I'm not sure how to change this for three or more rows.
DAT_list2res <-cbind(DAT_list2[1:2],DAT_list2res)
colnames(DAT_list2res)[1:2]<-c("chr","leftPos")
DAT_list2res$chr<-as.numeric(gsub("chr","",DAT_list2res$chr))
DAT_list2res<-as.data.frame(DAT_list2res)
dx<-DAT_list2res
f0 <- function( colNr, dx)
{
col <- dx[,colNr]
n1 <- which(col == 1| col == -1) # The `1`-rows.
d0 <- which( diff(col) == 0) # Consecutive rows in a column are equal.
dc0 <- which( diff(dx[,1]) == 0) # Same chromosome.
m <- intersect( n1-1, intersect( d0, dc0 ) )
return ( setdiff( 1:nrow(dx), union(m,m+1) ) )
}
g <- function( dx )
{
for ( i in 3:ncol(dx) ) { dx[f0(i,dx),i] <- 0 }
return ( dx )
}
dx<-g(dx)
Here is one solution only using base R.
First define a function that replaces every run shorter than 3 with zeros:
replace_f <- function(x){
subs <- rle(x)
subs$values[subs$lengths < 3] <- 0
inverse.rle(subs)
}
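As a quick check of what this function does, here it is applied to a single made-up vector (the vector `x` is hypothetical, not taken from the question's data):

```r
replace_f <- function(x) {
  subs <- rle(x)                        # run lengths and their values
  subs$values[subs$lengths < 3] <- 0    # zero out runs shorter than 3
  inverse.rle(subs)                     # expand back to full length
}

x <- c(-1, -1, -1, 1, 0, 1, 1)
y <- replace_f(x)
y
# [1] -1 -1 -1  0  0  0  0
```

The run of three -1's survives, while the single 1 and the pair of 1's are set to 0.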
Then split your data.frame by chr and then apply the function to all columns that you want to change (in this case columns 3 to 6):
df[,3:6] <- do.call("rbind", lapply(split(df[,3:6], df$chr), function(x) apply(x, 2, replace_f)))
Notice that we combine the results together with rbind before replacing the original data; this relies on the rows of df already being ordered by chr, as they are here. This will give you the desired result:
chr leftPos Sample1 X.DD X3_samples MyStuff
1 1 324 -1 0 0 0
2 1 4565 -1 0 0 0
3 1 6887 -1 0 0 0
4 1 12098 0 0 0 0
5 2 12 0 0 0 0
6 2 43 0 0 0 0
7 5 1 0 1 0 0
8 5 43 0 1 0 0
9 5 6554 0 1 0 0
10 5 7654 0 0 0 0
11 5 8765 0 0 0 0
12 5 9833 0 0 0 0
13 6 12 0 0 0 0
14 6 43 0 0 0 0
15 6 56 1 0 0 0
16 6 79 1 0 -1 0
17 6 767 1 0 -1 0
18 6 3233 1 0 -1 0
A data.table solution using rleid would be
require(data.table)
setDT(dat)
dat[,Sample1 := Sample1 * as.integer(.N>=3), by=.(chr, rleid(Sample1))]
This groups by rleid(Sample1) and uses data.table's helpful .N variable.
To do it for all columns you could use the eval(parse(text=...)) syntax as follows:
for(i in names(dat)[3:6]){
by_string = paste0("list(chr, rleid(", i, "))")
def_string = paste0(i, "* as.integer(.N>=3)")
dat[,(i) := eval(parse(text=def_string)), by=eval(parse(text=by_string))]
}
So it results in:
> dat[]
chr leftPos Sample1 X.DD X3_samples MyStuff
1: 1 324 -1 0 0 0
2: 1 4565 -1 0 0 0
3: 1 6887 -1 0 0 0
4: 1 12098 0 0 0 0
5: 2 12 0 0 0 0
6: 2 43 0 0 0 0
7: 5 1 0 1 0 0
8: 5 43 0 1 0 0
9: 5 6554 0 1 0 0
10: 5 7654 0 0 0 0
11: 5 8765 0 0 0 0
12: 5 9833 0 0 0 0
13: 6 12 0 0 0 0
14: 6 43 0 0 0 0
15: 6 56 1 0 0 0
16: 6 79 1 0 -1 0
17: 6 767 1 0 -1 0
18: 6 3233 1 0 -1 0

Convert binary string to decimal

I have a question on data conversion from binary to decimal. Suppose I have a binary pattern like this:
pattern<-do.call(expand.grid, replicate(5, 0:1, simplify=FALSE))
pattern
Var1 Var2 Var3 Var4 Var5
1 0 0 0 0 0
2 1 0 0 0 0
3 0 1 0 0 0
4 1 1 0 0 0
5 0 0 1 0 0
6 1 0 1 0 0
7 0 1 1 0 0
8 1 1 1 0 0
9 0 0 0 1 0
10 1 0 0 1 0
11 0 1 0 1 0
12 1 1 0 1 0
13 0 0 1 1 0
14 1 0 1 1 0
15 0 1 1 1 0
16 1 1 1 1 0
17 0 0 0 0 1
18 1 0 0 0 1
19 0 1 0 0 1
20 1 1 0 0 1
21 0 0 1 0 1
22 1 0 1 0 1
23 0 1 1 0 1
24 1 1 1 0 1
25 0 0 0 1 1
26 1 0 0 1 1
27 0 1 0 1 1
28 1 1 0 1 1
29 0 0 1 1 1
30 1 0 1 1 1
31 0 1 1 1 1
32 1 1 1 1 1
I'm wondering what the easiest way is in R to convert each row to a decimal value, and vice versa, such as:
00000->0
10000->16
...
01111->15
Try:
res <- strtoi(apply(pattern,1, paste, collapse=""), base=2)
res
#[1] 0 16 8 24 4 20 12 28 2 18 10 26 6 22 14 30 1 17 9 25 5 21 13 29 3
#[26] 19 11 27 7 23 15 31
You could try intToBits to convert back to the binary:
pat2 <- t(sapply(res, function(x) as.integer(rev(intToBits(x)))))[,28:32]
pat1 <- as.matrix(pattern)
dimnames(pat1) <- NULL
identical(pat1, pat2)
#[1] TRUE
You can try:
as.matrix(pattern) %*% 2^((ncol(pattern)-1):0)
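The matrix product works because each column is multiplied by its place value. A small sketch, using the same pattern as the question, that checks this against the strtoi approach and converts back with integer arithmetic (the names `weights`, `dec`, `dec_str` and `back` are made up for the example):

```r
pattern <- do.call(expand.grid, replicate(5, 0:1, simplify = FALSE))

# Place values, most significant bit first: 16 8 4 2 1.
weights <- 2^((ncol(pattern) - 1):0)

# Binary -> decimal: weighted sum of each row.
dec <- as.vector(as.matrix(pattern) %*% weights)

# Same conversion via strings, for comparison.
dec_str <- strtoi(apply(pattern, 1, paste, collapse = ""), base = 2)

# Decimal -> binary: integer-divide by each place value, keep the last bit.
back <- t(sapply(dec, function(d) (d %/% weights) %% 2))

all(dec == dec_str)                      # both methods agree
all(back == unname(as.matrix(pattern)))  # round trip recovers the pattern
```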

How to get a square table?

I've got the following code to create a classification table in R:
> table(class = class1, truth = valid[,1])
1 2 3 4 5 6 7 8 9 10 11 12
1 357 73 0 0 47 0 5 32 20 0 4 7
2 25 71 0 0 23 4 1 0 2 1 8 3
3 1 2 120 1 5 0 1 0 0 0 0 0
4 0 0 0 77 0 0 0 0 1 0 0 0
5 15 27 0 0 67 6 7 0 4 1 5 7
6 1 2 0 0 2 44 0 0 0 7 7 0
7 1 1 0 0 10 0 66 0 1 0 1 7
9 1 0 0 0 3 0 0 2 8 0 0 2
10 1 1 0 0 1 6 0 0 0 17 0 0
11 0 7 0 0 3 1 0 0 0 4 10 2
12 0 1 0 0 1 0 0 0 0 0 0 1
However, I need this table to be a square (line 8 is missing in this example), i.e. the number of rows should equal the number of columns, and I need the rownames and colnames to be preserved. The missing line should be filled with zeros. Any way of doing this?
The problem most probably comes from a difference in levels.
Try copying the levels from valid to class1:
class1 <- factor(class1, levels=levels(valid[,1]))
table(class = class1, truth = valid[,1])
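A minimal sketch of the effect on made-up data (the vectors `truth` and `pred` are hypothetical; the idea assumes the truth column is a factor, since levels() returns NULL otherwise):

```r
# Toy data: class 2 exists in the truth but is never predicted.
truth <- factor(c(1, 2, 3, 3, 2, 1), levels = 1:3)
pred  <- c(1, 1, 3, 3, 1, 1)

# Without shared levels the table has no row for class 2.
tab_ragged <- table(class = pred, truth = truth)

# Giving the predictions the same levels as the truth adds a row of zeros.
pred_f <- factor(pred, levels = levels(truth))
tab_square <- table(class = pred_f, truth = truth)

dim(tab_ragged)   # 2 rows, 3 columns
dim(tab_square)   # 3 rows, 3 columns
tab_square
```

table counts every factor level, including unused ones, which is why the missing row appears filled with zeros and the row and column names are preserved.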
