I have this data frame in R and I need to select only the rows that meet at least two of the following conditions:
A >= 5
B >= 5
C >= 5
D >= 5
A B C D
1 0.000000 48.936170 0.000000 29.787234
2 0.000000 72.340426 0.000000 6.382979
3 0.000000 78.723404 0.000000 2.127660
4 2.127660 78.723404 0.000000 0.000000
5 0.000000 43.617021 0.000000 35.106383
6 0.000000 79.787234 0.000000 1.063830
7 3.191489 0.000000 77.659574 0.000000
8 77.659574 0.000000 2.127660 0.000000
9 46.808511 0.000000 0.000000 31.914894
10 35.106383 0.000000 27.659574 0.000000
The only solution I found is to use "if"...
if ( ((data$A >=5) + (data$B >=5) + (data$C >=5) + (data$D >=5)) >=2 ) {
#result }
...but I cannot find how to combine the if selection with my data frame.
I tried the following, but it doesn't seem to be the solution to this problem:
Selection = data[if ( ((data$A >=5) + (data$B >=5) + (data$C >=5) + (data$D >=5)) >=2 ),]
Thank you in advance for your help.
You could also do
df <- read.table(header=TRUE, text=" A B C D
1 0.000000 48.936170 0.000000 29.787234
2 0.000000 72.340426 0.000000 6.382979
3 0.000000 78.723404 0.000000 2.127660
4 2.127660 78.723404 0.000000 0.000000
5 0.000000 43.617021 0.000000 35.106383
6 0.000000 79.787234 0.000000 1.063830
7 3.191489 0.000000 77.659574 0.000000
8 77.659574 0.000000 2.127660 0.000000
9 46.808511 0.000000 0.000000 31.914894
10 35.106383 0.000000 27.659574 0.000000")
df[rowSums(df >= 5) >= 2, ]
# A B C D
# 1 0.00000 48.93617 0.00000 29.787234
# 2 0.00000 72.34043 0.00000 6.382979
# 5 0.00000 43.61702 0.00000 35.106383
# 9 46.80851 0.00000 0.00000 31.914894
# 10 35.10638 0.00000 27.65957 0.000000
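The attempt in the question fails because if expects a single TRUE or FALSE, while the comparison produces one value per row. A logical vector can index the rows directly, so no if is needed. A minimal sketch, assuming a small data frame named data built from three rounded rows of the question (the name keep is only illustrative):

```r
# Three rows borrowed (rounded) from the question's data
data <- data.frame(
  A = c(0.0, 2.1, 35.1),
  B = c(48.9, 78.7, 0.0),
  C = c(0.0, 0.0, 27.7),
  D = c(29.8, 0.0, 0.0)
)

# Each comparison yields a logical vector; adding them counts,
# per row, how many of the four conditions are met
keep <- (data$A >= 5) + (data$B >= 5) + (data$C >= 5) + (data$D >= 5) >= 2

# Index rows with the logical vector instead of wrapping it in if()
Selection <- data[keep, ]
print(Selection)
```

This is the same computation as rowSums(df >= 5) >= 2, written out column by column.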
I have a dataframe with 152 rows and 300 columns.
Showing you first 3 rows and all 300 columns
genea 2500 2691 genea 191.0 + 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 -0.380752 -0.380752 -0.531231 -0.681710 -0.681710 -0.681710 -0.681710 -0.340855 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 -0.190376 -0.380752 -0.380752 -0.380752 -0.380752 -0.380752 -0.380752 -0.380752 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.451626 0.903252 0.903252 0.654369 0.654369 0.778811 0.903252 0.903252 -0.681710 -0.681710 -0.340855 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.809625 1.619250 1.619250 1.257220 1.257220 1.057214 0.857208 0.857208 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 -0.190376 -0.380752 -0.380752 0.672255 0.672255 0.672255 0.672255 0.672255 0.903252 0.903252 1.422216 1.941180 1.941180 1.508340 1.508340 1.912020 2.315700 2.315700 3.317330 3.317330 3.840005 4.362680 4.362680 3.508340 3.508340 3.128800 2.749260 2.749260 3.531090 3.531090 2.982865 2.434640 2.434640 1.975690 1.975690 2.516920 3.058150 3.058150 5.556610 5.556610 5.922590 6.288570 6.288570 2.056200 2.056200 2.563420 3.070640 3.070640 3.577700 3.577700 4.076065 4.574430 4.574430 4.008980 4.008980 4.648165 5.287350 5.287350 5.550990 5.550990 3.810200 2.069410 2.069410 0.000000 0.000000 1.584965 3.169930 3.169930 3.169930 3.169930 3.243285 3.316640 3.316640 4.766030 
4.766030 4.925570 5.085110 5.085110 6.746300 6.746300 6.693390 6.640480 6.640480 5.850710 5.850710 5.628100 5.405490 5.405490 4.830740 4.830740 5.017090 5.203440 5.203440 6.095880 6.095880 6.392065 6.688250 6.688250 6.337030 6.337030 5.835895 5.334760 5.334760 4.836420 4.836420 4.736225 4.636030 4.636030 3.659990 3.659990 4.325255 4.990520 4.990520 4.756270 4.756270 2.378135 0.000000 0.000000 3.700440 3.700440 3.921625 4.142810 4.142810 4.318290 4.318290 4.490965 4.663640 4.663640 4.643860 4.643860 3.706855 2.769850 2.769850 2.878250 2.878250 3.156445 3.434640 3.434640 3.676790 3.676790 3.867180 4.057570 4.057570 4.192870 4.192870 4.521820 4.850770 4.850770 4.602990 4.602990 4.119790 3.636590 3.636590 3.899620 3.899620 4.155710 4.411800 4.411800
chr11 62841618 62841809 geneb 191.0 - -0.613539 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 -0.228451 -0.380752 -0.380752 -0.380752 -0.380752 -0.152301 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 -0.038075 -0.380752 -0.380752 -0.380752 -0.380752 -0.342677 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 -0.228451 -0.380752 -0.380752 -0.380752 -0.380752 -0.152301 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 -0.228451 -0.380752 -0.380752 -0.380752 -0.380752 -0.152301 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 -0.068171 -0.681710 -0.681710 0.269267 0.903252 0.857144 0.442170 0.442170 0.718819 0.903252 0.812927 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 -0.068171 -0.681710 -0.681710 -0.681710 -0.681710 -0.651614 -0.380752 -0.380752 0.447699 1.000000 1.000000 1.000000 1.000000 1.350976 1.584960 1.626464 2.000000 2.000000 1.400000 1.000000 0.900000 0.000000 0.000000 1.550976 2.584960 2.584960 2.584960 2.584960 0.805533 -0.380752 -0.180752 1.619250 1.619250 2.198676 2.584960 2.743457 4.169930 4.169930 4.612106 4.906890 4.874697 4.584960 4.584960 3.633984 3.000000 2.858496 1.584960 1.584960 3.348120 4.523560 4.523560 4.523560 4.523560 1.809424 0.000000 0.370044 3.700440 3.700440 3.824310 3.906890 3.862144 3.459430 3.459430 
3.427222 3.405750 3.561390 4.962150 4.962150 5.458362 5.789170 5.720218 5.099650 5.099650 4.866226 4.710610 4.661025 4.214760 4.214760 3.676302 3.317330 3.349619 3.640220 3.640220 4.456088 5.000000 5.032193 5.321930 5.321930 5.101292 4.954200 4.852898 3.941180 3.941180 4.168286 4.319690 4.374439 4.867180 4.867180 4.999348 5.087460 4.978714 4.000000 4.000000 2.550976 1.584960 1.688389 2.619250 2.619250 2.619250 2.619250 2.357325 0.000000 0.000000 0.000000 0.000000 0.332193 3.321930 3.321930 3.321930 3.321930 3.335680 3.459430 3.459430 3.068182 2.807350 2.807350 2.807350 2.807350 2.922940 3.000000 2.800000 1.000000 1.000000 0.400000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 -0.038075 -0.380752 -0.380752 -0.380752 -0.380752 -0.380752 -0.380752 -0.380752 0.447699 1.000000
chr17 43367899 43368087 genec 188.0 - 0.000000 1.600000 2.000000 2.000000 2.000000 2.000000 2.000000 2.000000 1.400000 0.000000 0.000000 0.000000 0.000000 -0.204513 -0.681710 -0.681710 -0.681710 -0.681710 -0.681710 -0.681710 -0.681710 -0.136342 0.000000 -0.114226 -0.380752 -0.380752 -0.380752 -0.380752 -0.266526 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 -0.204513 -0.681710 -0.681710 -0.681710 -0.681710 -0.477197 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 -0.114226 -0.380752 -0.380752 0.419248 0.619248 0.619248 0.619248 0.619248 0.923850 1.000000 1.000000 1.000000 1.000000 0.200000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 1.267968 1.584960 1.584960 1.584960 1.584960 0.316992 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.800000 1.000000 1.000000 1.000000 1.000000 0.200000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.300000 1.000000 1.000000 1.000000 1.000000 0.700000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.800000 1.000000 1.000000 1.000000 1.000000 0.200000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 1.267968 1.584960 1.584960 1.584960 1.584960 0.316992 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.300000 1.000000 1.000000 1.000000 1.000000 0.700000 0.000000 0.000000 1.600000 2.000000 2.974379 5.247930 5.247930 5.674674 5.781360 5.694507 5.491850 5.491850 3.623866 3.156870 3.505008 4.317330 4.317330 5.117330 5.317330 5.650009 6.426260 6.426260 6.155220 6.087460 6.087460 6.087460 6.087460 5.180852 4.954200 4.464519 3.321930 
3.321930 3.934354 4.087460 3.936710 3.584960 3.584960 3.252936 3.169930 2.994439 2.584960 2.584960 4.030848 4.392320 4.371203 4.321930 4.321930 4.532354 4.584960 4.722789 5.044390 5.044390 5.044390 5.044390 4.413427 2.941180 2.941180 2.141180 1.941180 2.320089 3.204210 3.204210 3.040842 3.000000 3.000000 3.000000 3.000000 0.600000 0.000000 0.996579 3.321930 3.321930 3.431930 3.459430 3.459430 3.459430 3.459430 0.387284 -0.380752 -0.380752 -0.380752 -0.380752 -0.076150 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 -0.204513 -0.681710 -0.681710 -0.681710 -0.681710 -0.477197 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
I am plotting the values in this dataframe using pheatmap function as follows:
pheatmap(dmat,
scale="none",
cluster_rows = FALSE,
cluster_cols = FALSE,
annotation_names_col = FALSE,
show_colnames= FALSE,
color = colorRampPalette(rev(brewer.pal(n = 7, name ="RdYlBu")))(500),
main = "figure1",
border_color = NA
)
I want to add the label "TSS" at the 200th column of the heatmap generated above, so that it appears at the last row, 200th column. What code should I use for that?
Thanks in advance.
Here is a solution based on grobs.
library(pheatmap)
library(RColorBrewer)
library(grid)
# Generate data
nc <- 300
nr <- 152
ref <- 200 # The column where you need to add a label
dmat <- matrix(runif(nr*nc), ncol=nc)
dmat[,ref] <- 0
q <- pheatmap(dmat,
scale="none",
cluster_rows = FALSE,
cluster_cols = FALSE,
annotation_names_col = FALSE,
show_colnames= FALSE,
color = colorRampPalette(rev(brewer.pal(n = 7, name ="RdYlBu")))(500),
main = "Figure1",
border_color = NA
)
downViewport("matrix.4-3-4-3")
grid.text("TSS", x=ref/nc, y=1, vjust=-0.5, gp=gpar(col="red", fontface=2, fontsize=12))
popViewport()
The list of available viewports generated by pheatmap can be retrieved using
grid.draw(q)
current.vpTree()
# viewport[ROOT]->(viewport[layout]->(viewport[layout]->
# (viewport[main.1-3-1-3], viewport[legend.4-5-5-5], viewport[matrix.4-3-4-3])))
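A note on the x coordinate used above: x = ref/nc corresponds to the right-hand edge of column ref, if the columns are assumed to be equal-width cells that fill the matrix viewport from 0 to 1. Under that same assumption, the centre of a column can be computed as below; col_center_npc is a hypothetical helper name, not part of pheatmap or grid.

```r
# Horizontal position (in npc units, 0..1) of the centre of column j,
# assuming n_cols equal-width columns spanning the whole viewport
col_center_npc <- function(j, n_cols) {
  (j - 0.5) / n_cols
}

x_tss <- col_center_npc(200, 300)
print(x_tss)
```

The result could then be passed as the x argument of grid.text in place of ref/nc.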
I have this kind of data:
set.seed(12345)
df <- data.frame(group=rep(c("A"),26), size=c(rep(1000,5),rep(0,3),rep(1000,7),rep(0,3),rep(1000,5),rep(0,3)),
int=c(rnorm(3,5,1),rep(0,5),rnorm(3,5,1),rep(0,7),rnorm(3,5,1),rep(0,5)),
out=c(rep(0,5),rnorm(3,5,1),rep(0,7),rnorm(3,5,1),rep(0,5),rnorm(3,5,1)))
Here is desired output:
group size int out id id2
1 A 1000 5.585529 0.000000 1 1
2 A 1000 5.709466 0.000000 1 1
3 A 1000 4.890697 0.000000 1 1
4 A 1000 0.000000 0.000000 1 1
5 A 1000 0.000000 0.000000 1 1
6 A 0 0.000000 4.080678 1 1
7 A 0 0.000000 4.883752 NA 1
8 A 0 0.000000 6.817312 NA 1
9 A 1000 4.546503 0.000000 2 2
10 A 1000 5.605887 0.000000 2 2
11 A 1000 3.182044 0.000000 2 2
12 A 1000 0.000000 0.000000 2 2
13 A 1000 0.000000 0.000000 2 2
14 A 1000 0.000000 0.000000 2 2
15 A 1000 0.000000 0.000000 2 2
16 A 0 0.000000 5.370628 2 2
17 A 0 0.000000 5.520216 NA 2
18 A 0 0.000000 4.249468 NA 2
19 A 1000 5.630099 0.000000 3 3
20 A 1000 4.723816 0.000000 3 3
21 A 1000 4.715840 0.000000 3 3
22 A 1000 0.000000 0.000000 3 3
23 A 1000 0.000000 0.000000 3 3
24 A 0 0.000000 5.816900 3 3
25 A 0 0.000000 4.113642 NA 3
26 A 0 0.000000 4.668422 NA 3
The new group id is created based on the data above. I believe rle function is the way to go, but I cannot figure it out to the end.
A variation on @ycw's answer:
library(data.table)
setDT(df)
df[, g := rleid( z <- out==0 | shift(out==0) )*NA^(!z) ]
group size int out g
1: A 1000 5.585529 0.000000 1
2: A 1000 5.709466 0.000000 1
3: A 1000 4.890697 0.000000 1
4: A 1000 0.000000 0.000000 1
5: A 1000 0.000000 0.000000 1
6: A 0 0.000000 4.080678 1
7: A 0 0.000000 4.883752 NA
8: A 0 0.000000 6.817312 NA
9: A 2000 4.546503 0.000000 3
10: A 2000 5.605887 0.000000 3
11: A 2000 3.182044 0.000000 3
12: A 2000 0.000000 0.000000 3
13: A 2000 0.000000 0.000000 3
14: A 2000 0.000000 0.000000 3
15: A 2000 0.000000 0.000000 3
16: A 0 0.000000 5.370628 3
17: A 0 0.000000 5.520216 NA
18: A 0 0.000000 4.249468 NA
19: A 5000 5.630099 0.000000 5
20: A 5000 4.723816 0.000000 5
21: A 5000 4.715840 0.000000 5
22: A 5000 0.000000 0.000000 5
23: A 5000 0.000000 0.000000 5
24: A 0 0.000000 5.816900 5
25: A 0 0.000000 4.113642 NA
26: A 0 0.000000 4.668422 NA
group size int out g
(@ycw suggested I make it a separate answer. Also, the NA^x trick is borrowed from @akrun.)
For the OP's group numbers, this extra step works:
df[, g := match(g, unique(na.omit(g)))]
For the extension the OP added ("id2"):
w = df[.(unique(na.omit(g))), on=.(g), which=TRUE, mult="first"]
df[, g2 := cumsum(.I %in% w)]
So in the end we have...
group size int out g g2
1: A 1000 5.585529 0.000000 1 1
2: A 1000 5.709466 0.000000 1 1
3: A 1000 4.890697 0.000000 1 1
4: A 1000 0.000000 0.000000 1 1
5: A 1000 0.000000 0.000000 1 1
6: A 0 0.000000 4.080678 1 1
7: A 0 0.000000 4.883752 NA 1
8: A 0 0.000000 6.817312 NA 1
9: A 2000 4.546503 0.000000 2 2
10: A 2000 5.605887 0.000000 2 2
11: A 2000 3.182044 0.000000 2 2
12: A 2000 0.000000 0.000000 2 2
13: A 2000 0.000000 0.000000 2 2
14: A 2000 0.000000 0.000000 2 2
15: A 2000 0.000000 0.000000 2 2
16: A 0 0.000000 5.370628 2 2
17: A 0 0.000000 5.520216 NA 2
18: A 0 0.000000 4.249468 NA 2
19: A 5000 5.630099 0.000000 3 3
20: A 5000 4.723816 0.000000 3 3
21: A 5000 4.715840 0.000000 3 3
22: A 5000 0.000000 0.000000 3 3
23: A 5000 0.000000 0.000000 3 3
24: A 0 0.000000 5.816900 3 3
25: A 0 0.000000 4.113642 NA 3
26: A 0 0.000000 4.668422 NA 3
group size int out g g2
For base R analogues, there is an SO Q&A on how to make rleid without data.table; shift can be constructed manually (it's just a lag operator); and there are other ways to find w (maybe tapply?).
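One possible base R version of the steps above, as a rough sketch rather than a drop-in replacement: rleid_base and lag1 are hypothetical helper names standing in for rleid and shift, ifelse replaces the NA^x trick, and the out vector uses a constant 5 instead of the rnorm draws so the result is reproducible.

```r
# Run-length id: 1 for the first run of equal values, 2 for the next, ...
rleid_base <- function(x) {
  runs <- rle(x)
  rep(seq_along(runs$lengths), runs$lengths)
}

# Lag by one position, padding the front with NA (like shift(x))
lag1 <- function(x) c(NA, x[-length(x)])

# Same zero/non-zero pattern as the question's 'out' column
out <- c(rep(0, 5), rep(5, 3), rep(0, 7), rep(5, 3), rep(0, 5), rep(5, 3))

# TRUE where out is zero, or where the previous out was zero
z <- out == 0 | lag1(out == 0)

# Run ids where z is TRUE, NA elsewhere; then renumber 1, 2, 3, ...
g <- ifelse(z, rleid_base(z), NA)
g <- match(g, unique(na.omit(g)))

# First row of each group, then a running count for the "id2" column
w <- match(unique(na.omit(g)), g)
g2 <- cumsum(seq_along(g) %in% w)

print(data.frame(out, g, g2))
```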
Here is an option using dplyr and the rleid function from the data.table package. df2 is the final output.
library(dplyr)
library(data.table)
df2 <- df %>%
mutate(non_zero = ifelse(size != 0, 1, 0)) %>%
mutate(runID = rleid(non_zero)) %>%
mutate(runID = ifelse(runID %% 2 != 0, (runID + 1)/2, runID/2)) %>%
group_by(runID) %>%
mutate(id = ifelse(row_number() %in% n():(n() - 1), NA, runID)) %>%
ungroup() %>%
select(group, size, int, out, id, id2 = runID)
I have the following Matrix in R:
> head(k)
row.names SRX003922 SRX001291 SRX001364 SRX001365
ENSG00000000003 12.1999909 1.982836 0.000000 1.335383
ENSG00000000005 47.8615027 0.000000 0.000000 0.000000
ENSG00000000419 0.9384608 11.897018 4.735838 3.338457
ENSG00000000457 13.1384517 23.794037 16.575434 16.024595
ENSG00000000460 0.0000000 0.000000 0.000000 0.000000
ENSG00000000938 10.3230692 0.000000 0.000000 0.000000
SRX001366 SRX001367 SRX001368 SRX003931
ENSG00000000003 0.0000000 1.220217 0.000000 5.641656
ENSG00000000005 0.0000000 0.000000 0.000000 1.880552
ENSG00000000419 10.5627363 6.711194 7.510932 5.014805
ENSG00000000457 21.1254727 15.862822 18.151419 7.522207
ENSG00000000460 0.0000000 3.050543 0.625911 0.000000
ENSG00000000938 0.7041824 0.000000 0.000000 1.253701
How can I retrieve an entire row from this matrix by giving its row name?
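A minimal sketch of indexing by row name, assuming k is a numeric matrix whose row names are the ENSG identifiers (the small matrix below is made up from a few rounded values of the question). If k is actually a data frame with a column called row.names, this would need adjusting.

```r
# Small stand-in for k: 3 genes x 2 samples, with row and column names
k <- matrix(
  c(12.2, 47.9, 0.94,   # SRX003922
    1.98, 0.0, 11.9),   # SRX001291
  nrow = 3,
  dimnames = list(
    c("ENSG00000000003", "ENSG00000000005", "ENSG00000000419"),
    c("SRX003922", "SRX001291")
  )
)

# A character index in the row position selects by row name;
# the result is a named vector
row_vec <- k["ENSG00000000419", ]

# drop = FALSE keeps the result as a one-row matrix
row_mat <- k["ENSG00000000419", , drop = FALSE]

print(row_vec)
print(row_mat)
```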
I'm trying to calculate the cross-correlation scores between a series of vectors.
My code for doing so is:
smusF<-array(ccf(arrMat[[r,1]],arrMat[[h,1]],lag.max=(length(arrMat[[r,1]])+length(arrMat[[h,1]])))[[1]])
Where smusF is the vector of all ccf scores, for each overlap position, and the r and h variables assign differing vectors for comparison.
What I'm finding is sometimes this code has no errors and works correctly, but sometimes it produces the error message:
Error in plot.window(...) : need finite 'ylim' values
In addition: Warning messages:
1: In min(x) : no non-missing arguments to min; returning Inf
2: In max(x) : no non-missing arguments to max; returning -Inf
I'm really stuck as to why this happens in some instances and not others, and what I can do to fix it. Any help much appreciated.
Two vectors that produce this error message when compared are:
> arrMat[[r,1]]
[1] 0.011688 0.014871 0.015314 0.013446 0.008538 0.006948 0.006514 0.004343
[9] 0.002171 0.000000 0.002196 0.006899 0.012790 0.014289 0.015993 0.015321
[17] 0.016845 0.010438 0.005219 0.003040 0.007235 0.011430 0.009546 0.005351
[25] 0.004531 0.006752 0.011283 0.009062 0.006842 0.002311 0.002311 0.003614
[33] 0.006072 0.008825 0.009742 0.012397 0.013938 0.019788 0.022163 0.025293
[41] 0.022794 0.024204 0.020493 0.017092 0.010652 0.009013 0.007760 0.008768
[49] 0.008235 0.008858 0.005392 0.002696 0.000000 0.000000 0.000869 0.001737
[57] 0.003474 0.003474 0.003474 0.002212 0.009292 0.016371 0.027220 0.023992
[65] 0.023172 0.015995 0.015421 0.010799 0.011676 0.012986 0.017361 0.018033
[73] 0.018508 0.022458 0.027989 0.034674 0.030668 0.024449 0.013905 0.009333
[81] 0.005809 0.005219 0.003441 0.002433 0.001425 0.000950 0.002138 0.003326
[89] 0.004989 0.003326 0.001663 0.000000 0.000000 0.000000 0.000000 0.000000
[97] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[105] 0.000000 0.000000 0.000000 0.000770 0.001540 0.002311 0.001540 0.000770
[113] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[121] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[129] 0.000000 0.000000 0.000000 0.001188 0.002376 0.004719 0.004687 0.004654
[137] 0.002311 0.001155 0.000000 0.000000 0.000713 0.001425 0.002138 0.001425
[145] 0.000713 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[153] 0.000000 0.000000
And..
> arrMat[[h,1]]
[1] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[9] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[17] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[25] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[33] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[41] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[49] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[57] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[65] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[73] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[81] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[89] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[97] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[105] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[113] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[121] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[129] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[137] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[145] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[153] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[161] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[169] 0.000000 0.000000 0.012729 0.025457 0.045459 0.043641 0.063643 0.089100
[177] 0.125907 0.119074 0.079510 0.040635 0.023581 0.031983 0.034051 0.036118
[185] 0.030912 0.023639 0.016365 0.007273 0.003637 0.007273 0.018184 0.029094
[193] 0.030912 0.025457 0.020002 0.010910 0.005455 0.000000 0.000000 0.000000
[201] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[209] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[217] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[225] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[233] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[241] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[249] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[257] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[265] 0.000000 0.000000 0.000000 0.000000
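For future reference: it seems ccf cannot handle constant (zero-variance) vectors, such as a vector of only 1s against a vector of only 2s. In the example above, the overlapping part of the two vectors compares 0s against 0s, so the correlation is undefined and the plot that ccf draws by default fails.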
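One way to guard against this, as a sketch only: check the variance of the overlapping part before calling ccf, and pass plot = FALSE so that no plot is attempted. safe_ccf is a hypothetical helper name, and returning NULL for the constant case is an arbitrary choice; truncating both vectors to the shorter length is an assumption about how the overlap should be defined.

```r
# Cross-correlation that skips constant (zero-variance) inputs
safe_ccf <- function(x, y, lag.max = NULL) {
  # Compare only the overlapping part of the two vectors
  n <- min(length(x), length(y))
  x <- x[seq_len(n)]
  y <- y[seq_len(n)]

  # A constant vector has zero variance, so the correlation is undefined
  if (var(x) == 0 || var(y) == 0) {
    return(NULL)
  }

  # plot = FALSE returns the scores without drawing anything
  as.vector(ccf(x, y, lag.max = lag.max, plot = FALSE)$acf)
}

constant_case <- safe_ccf(rep(1, 10), rep(2, 10))
normal_case <- safe_ccf(c(1, 2, 3, 4, 5), c(2, 1, 4, 3, 6), lag.max = 2)
print(constant_case)
print(normal_case)
```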
I know there's another post similar to this one but it has not helped my situation. I am trying to draw a dendrogram from a distance matrix I've calculated not using euclidean distance (using an earth-mover's distance from the emdist package). I am now trying to draw a dendrogram from this matrix:
dim(x)
[1] 8800 8800
x <- x[1:10,1:10]
x
1 2 3 4 5 6 7
1 0.00000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
2 0.67400563 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
3 0.02577228 0.6526842 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
4 0.37994900 0.7268372 0.1240314 0.0000000 0.0000000 0.0000000 0.0000000
5 0.85156584 1.0248822 0.6165767 0.9077611 0.0000000 0.0000000 0.0000000
6 0.51784015 0.5286874 0.5115762 0.6601093 1.1639417 0.0000000 0.0000000
7 0.19290720 0.5906327 0.6576926 0.4350795 0.2986499 0.4130357 0.0000000
8 1.57669127 1.3727582 1.4215065 1.9522834 1.0919793 0.9681544 1.0372481
9 3.01650143 3.3004177 3.0651622 3.2502077 4.1505108 2.9940774 3.6078234
10 0.48684093 0.6997258 0.3959822 0.3515030 0.8611233 0.5505790 0.3047047
8 9 10
1 0.000000 0.000000 0
2 0.000000 0.000000 0
3 0.000000 0.000000 0
4 0.000000 0.000000 0
5 0.000000 0.000000 0
6 0.000000 0.000000 0
7 0.000000 0.000000 0
8 0.000000 0.000000 0
9 3.753577 0.000000 0
10 1.500342 3.309016 0
The problem is that when I run
plot(hclust(x))
I get this error:
Error in if (is.na(n) || n > 65536L) stop("size cannot be NA nor
exceed 65536") : missing value where TRUE/FALSE needed
whereas if I run the dist function to calculate euclidean distances from the distance matrix that I've already calculated using a different approach, it draws the plot.
plot(hclust(dist(x)))
However, this is not realistic. I need hclust to work from the distance matrix I've already calculated using a different approach. Any ideas?
hclust needs an object of class dist. as.dist, rather than dist, should give you what you are looking for.
plot(hclust(as.dist(x)))
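As a small illustration of the difference: as.dist reinterprets an existing matrix of distances (using its lower triangle by default), whereas dist computes new Euclidean distances between the rows. The 4 x 4 matrix below is made up for the sketch, with values loosely based on the question.

```r
# A lower-triangular matrix of precomputed distances, as in the question
x <- matrix(0, nrow = 4, ncol = 4)
x[lower.tri(x)] <- c(0.67, 0.03, 0.38, 0.65, 0.73, 0.12)

# Convert the matrix to a "dist" object without recomputing anything
d <- as.dist(x)

# hclust accepts the dist object and the dendrogram can be drawn
hc <- hclust(d)
plot(hc)
```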
You can also try a neighbor-joining method for creating trees based on distance metrics. The nj function in the ape package can help.
http://r-eco-evo.blogspot.com/2007/09/neighbor-joining-tree-with-ape.html