labelling in pheatmap at a particular column - r

I have a data frame with 152 rows and 300 columns.
Here are the first 3 rows, with all of their columns:
genea 2500 2691 genea 191.0 + 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 -0.380752 -0.380752 -0.531231 -0.681710 -0.681710 -0.681710 -0.681710 -0.340855 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 -0.190376 -0.380752 -0.380752 -0.380752 -0.380752 -0.380752 -0.380752 -0.380752 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.451626 0.903252 0.903252 0.654369 0.654369 0.778811 0.903252 0.903252 -0.681710 -0.681710 -0.340855 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.809625 1.619250 1.619250 1.257220 1.257220 1.057214 0.857208 0.857208 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 -0.190376 -0.380752 -0.380752 0.672255 0.672255 0.672255 0.672255 0.672255 0.903252 0.903252 1.422216 1.941180 1.941180 1.508340 1.508340 1.912020 2.315700 2.315700 3.317330 3.317330 3.840005 4.362680 4.362680 3.508340 3.508340 3.128800 2.749260 2.749260 3.531090 3.531090 2.982865 2.434640 2.434640 1.975690 1.975690 2.516920 3.058150 3.058150 5.556610 5.556610 5.922590 6.288570 6.288570 2.056200 2.056200 2.563420 3.070640 3.070640 3.577700 3.577700 4.076065 4.574430 4.574430 4.008980 4.008980 4.648165 5.287350 5.287350 5.550990 5.550990 3.810200 2.069410 2.069410 0.000000 0.000000 1.584965 3.169930 3.169930 3.169930 3.169930 3.243285 3.316640 3.316640 4.766030 
4.766030 4.925570 5.085110 5.085110 6.746300 6.746300 6.693390 6.640480 6.640480 5.850710 5.850710 5.628100 5.405490 5.405490 4.830740 4.830740 5.017090 5.203440 5.203440 6.095880 6.095880 6.392065 6.688250 6.688250 6.337030 6.337030 5.835895 5.334760 5.334760 4.836420 4.836420 4.736225 4.636030 4.636030 3.659990 3.659990 4.325255 4.990520 4.990520 4.756270 4.756270 2.378135 0.000000 0.000000 3.700440 3.700440 3.921625 4.142810 4.142810 4.318290 4.318290 4.490965 4.663640 4.663640 4.643860 4.643860 3.706855 2.769850 2.769850 2.878250 2.878250 3.156445 3.434640 3.434640 3.676790 3.676790 3.867180 4.057570 4.057570 4.192870 4.192870 4.521820 4.850770 4.850770 4.602990 4.602990 4.119790 3.636590 3.636590 3.899620 3.899620 4.155710 4.411800 4.411800
chr11 62841618 62841809 geneb 191.0 - -0.613539 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 -0.228451 -0.380752 -0.380752 -0.380752 -0.380752 -0.152301 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 -0.038075 -0.380752 -0.380752 -0.380752 -0.380752 -0.342677 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 -0.228451 -0.380752 -0.380752 -0.380752 -0.380752 -0.152301 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 -0.228451 -0.380752 -0.380752 -0.380752 -0.380752 -0.152301 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 -0.068171 -0.681710 -0.681710 0.269267 0.903252 0.857144 0.442170 0.442170 0.718819 0.903252 0.812927 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 -0.068171 -0.681710 -0.681710 -0.681710 -0.681710 -0.651614 -0.380752 -0.380752 0.447699 1.000000 1.000000 1.000000 1.000000 1.350976 1.584960 1.626464 2.000000 2.000000 1.400000 1.000000 0.900000 0.000000 0.000000 1.550976 2.584960 2.584960 2.584960 2.584960 0.805533 -0.380752 -0.180752 1.619250 1.619250 2.198676 2.584960 2.743457 4.169930 4.169930 4.612106 4.906890 4.874697 4.584960 4.584960 3.633984 3.000000 2.858496 1.584960 1.584960 3.348120 4.523560 4.523560 4.523560 4.523560 1.809424 0.000000 0.370044 3.700440 3.700440 3.824310 3.906890 3.862144 3.459430 3.459430 
3.427222 3.405750 3.561390 4.962150 4.962150 5.458362 5.789170 5.720218 5.099650 5.099650 4.866226 4.710610 4.661025 4.214760 4.214760 3.676302 3.317330 3.349619 3.640220 3.640220 4.456088 5.000000 5.032193 5.321930 5.321930 5.101292 4.954200 4.852898 3.941180 3.941180 4.168286 4.319690 4.374439 4.867180 4.867180 4.999348 5.087460 4.978714 4.000000 4.000000 2.550976 1.584960 1.688389 2.619250 2.619250 2.619250 2.619250 2.357325 0.000000 0.000000 0.000000 0.000000 0.332193 3.321930 3.321930 3.321930 3.321930 3.335680 3.459430 3.459430 3.068182 2.807350 2.807350 2.807350 2.807350 2.922940 3.000000 2.800000 1.000000 1.000000 0.400000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 -0.038075 -0.380752 -0.380752 -0.380752 -0.380752 -0.380752 -0.380752 -0.380752 0.447699 1.000000
chr17 43367899 43368087 genec 188.0 - 0.000000 1.600000 2.000000 2.000000 2.000000 2.000000 2.000000 2.000000 1.400000 0.000000 0.000000 0.000000 0.000000 -0.204513 -0.681710 -0.681710 -0.681710 -0.681710 -0.681710 -0.681710 -0.681710 -0.136342 0.000000 -0.114226 -0.380752 -0.380752 -0.380752 -0.380752 -0.266526 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 -0.204513 -0.681710 -0.681710 -0.681710 -0.681710 -0.477197 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 -0.114226 -0.380752 -0.380752 0.419248 0.619248 0.619248 0.619248 0.619248 0.923850 1.000000 1.000000 1.000000 1.000000 0.200000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 1.267968 1.584960 1.584960 1.584960 1.584960 0.316992 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.800000 1.000000 1.000000 1.000000 1.000000 0.200000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.300000 1.000000 1.000000 1.000000 1.000000 0.700000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.800000 1.000000 1.000000 1.000000 1.000000 0.200000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 1.267968 1.584960 1.584960 1.584960 1.584960 0.316992 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.300000 1.000000 1.000000 1.000000 1.000000 0.700000 0.000000 0.000000 1.600000 2.000000 2.974379 5.247930 5.247930 5.674674 5.781360 5.694507 5.491850 5.491850 3.623866 3.156870 3.505008 4.317330 4.317330 5.117330 5.317330 5.650009 6.426260 6.426260 6.155220 6.087460 6.087460 6.087460 6.087460 5.180852 4.954200 4.464519 3.321930 
3.321930 3.934354 4.087460 3.936710 3.584960 3.584960 3.252936 3.169930 2.994439 2.584960 2.584960 4.030848 4.392320 4.371203 4.321930 4.321930 4.532354 4.584960 4.722789 5.044390 5.044390 5.044390 5.044390 4.413427 2.941180 2.941180 2.141180 1.941180 2.320089 3.204210 3.204210 3.040842 3.000000 3.000000 3.000000 3.000000 0.600000 0.000000 0.996579 3.321930 3.321930 3.431930 3.459430 3.459430 3.459430 3.459430 0.387284 -0.380752 -0.380752 -0.380752 -0.380752 -0.076150 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 -0.204513 -0.681710 -0.681710 -0.681710 -0.681710 -0.477197 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
I am plotting the values in this dataframe using pheatmap function as follows:
pheatmap(dmat,
scale="none",
cluster_rows = FALSE,
cluster_cols = FALSE,
annotation_names_col = FALSE,
show_colnames= FALSE,
color = colorRampPalette(rev(brewer.pal(n = 7, name ="RdYlBu")))(500),
main = "figure1",
border_color = NA
)
I want to add the label "TSS" at the 200th column of the heatmap generated above, so that it appears in the last row at column 200. What code should I use for that?
Thanks in advance.

Here is a solution based on grobs.
library(pheatmap)
library(RColorBrewer)
library(grid)
# Generate data
nc <- 300
nr <- 152
ref <- 200 # The column where you need to add a label
dmat <- matrix(runif(nr*nc), ncol=nc)
dmat[,ref] <- 0
q <- pheatmap(dmat,
scale="none",
cluster_rows = FALSE,
cluster_cols = FALSE,
annotation_names_col = FALSE,
show_colnames= FALSE,
color = colorRampPalette(rev(brewer.pal(n = 7, name ="RdYlBu")))(500),
main = "Figure1",
border_color = NA
)
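# Move into the heatmap body; the viewport name may differ between pheatmap versions
downViewport("matrix.4-3-4-3")
# (ref-0.5)/nc is the horizontal centre of column ref; y=1 is the top edge of the matrix
grid.text("TSS", x=(ref-0.5)/nc, y=1, vjust=-0.5, gp=gpar(col="red", fontface=2, fontsize=12))
popViewport()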
The list of available viewports generated by pheatmap can be retrieved using
grid.draw(q)
current.vpTree()
# viewport[ROOT]->(viewport[layout]->(viewport[layout]->
# (viewport[main.1-3-1-3], viewport[legend.4-5-5-5], viewport[matrix.4-3-4-3])))
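The navigation pattern used above does not depend on pheatmap itself. Here is a minimal sketch using only grid, with made-up viewport names (`outer`, `body`) standing in for the ones pheatmap creates, and a hypothetical helper `col_centre` that assumes the label should sit over the middle of column `ref`:

```r
library(grid)

nc  <- 300   # number of columns in the heatmap
ref <- 200   # column to label

# Horizontal centre of column `ref` in npc units (0 = left edge, 1 = right edge)
col_centre <- function(ref, nc) (ref - 0.5) / nc

pdf(NULL)                      # off-screen device, so no file is created
grid.newpage()
# Hypothetical viewport tree standing in for the one built by pheatmap
pushViewport(viewport(name = "outer", width = 0.8, height = 0.8))
pushViewport(viewport(name = "body"))
upViewport(0)                  # back to the top level

depth <- downViewport("body")  # returns how many levels were descended
grid.text("TSS", x = col_centre(ref, nc), y = 1, vjust = -0.5,
          gp = gpar(col = "red", fontface = 2, fontsize = 12))
upViewport(depth)              # climb back up without removing any viewport
invisible(dev.off())
```

Using `upViewport(depth)` rather than `popViewport()` leaves the viewport tree intact, which may be preferable if more annotations are to be added afterwards.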


Scattering blocks of a contiguous 2D Array using MPI

I am currently learning to use MPI and am trying to scatter blocks of my array to the rest of my processors.
My root processor is the last one (nproc-1), and I generate the array on that processor. In the next iteration of my code it will be a random array.
On all my processors I am allocating contiguous memory using calloc for both 'array' and 'grain'.
Grain stores the data to process, and since I need the rows above and below from the original array, I gave it grain_length+2 rows.
My issue is that I get the correct data from the original array except for the last two values (see the example output below).
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char** argv)
{
int i, j, m;
int array_size, grain_length;
int rc, rank, nproc;
MPI_Status status;
rc = MPI_Init(&argc, &argv);
if (rc != MPI_SUCCESS)
{
printf("Error starting MPI Program.\n");
MPI_Abort(MPI_COMM_WORLD, rc);
}
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &nproc);
array_size = 8;
grain_length = array_size / nproc;
double **array= (double **) calloc(array_size, sizeof (double *));
for (i = 0; i < array_size; i++)
array[i] = (double *) calloc(array_size, sizeof (double));
double **grain = (double **) calloc(grain_length+2, sizeof (double *));
for (i = 0; i < grain_length + 2; i++)
grain[i] = (double *) calloc(array_size, sizeof (double));
if (array == NULL || grain == NULL)
{
printf("Memory could not be allocated for the arrays.");
exit(EXIT_FAILURE);
}
if (rank == nproc-1)
{
for (i = 0; i < array_size; i++)
{
for (j = 0; j < array_size; j++)
{
//array[i][j] = rand() % 10;
array[i][j] = i+j;
}
}
}
MPI_Scatter(
&array[0][0], grain_length*array_size, MPI_DOUBLE,
&grain[1][0], grain_length*array_size, MPI_DOUBLE,
nproc-1, MPI_COMM_WORLD);
for (m = 0; m < nproc; m++)
{
if (rank == m)
{
printf("Grain from processor %d:\n", rank);
for (i = 0; i < grain_length+2; i++)
{
for (j = 0; j < array_size; j++)
{
printf("%f\t", grain[i][j]);
}
printf("\n");
}
printf("\n");
}
MPI_Barrier(MPI_COMM_WORLD);
}
if (rank == nproc-1)
{
printf("Array from processor %d:\n", rank);
for (i = 0; i < array_size; i++)
{
for (j = 0; j < array_size; j++)
{
printf("%f\t", array[i][j]);
}
printf("\n");
}
printf("\n");
}
MPI_Finalize();
return 0;
}
Here is the output. In Grain 0, the first and last rows are 0s, as expected, since the rows above and below will be sent and placed there. The second row is correct, but the third row is missing the values 7 and 8, which instead show up as the first values in Grain 1.
Are the two 0s in Grain 0 the addresses of the array's two pointers? I don't understand why I am getting incomplete data when the array is stored contiguously in memory.
I tried to use scatterv with displacements, but I am not sure I understand how it works.
I also tried to create an MPI type, but didn't get far with that either.
What I did manage to do is broadcast each row of the array to all the other processors, but I think that is quite inefficient. This is how I did it:
for (i=0; i < array_size; i++)
MPI_Bcast(&array[i][0], array_size, MPI_DOUBLE, nproc-1, MPI_COMM_WORLD);
Many thanks in advance for your help!!
Grain from processor 0:
0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
0.000000 1.000000 2.000000 3.000000 4.000000 5.000000 6.000000 7.000000
1.000000 2.000000 3.000000 4.000000 5.000000 6.000000 0.000000 0.000000
0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
Grain from processor 1:
0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
7.000000 8.000000 0.000000 0.000000 2.000000 3.000000 4.000000 5.000000
8.000000 9.000000 0.000000 0.000000 3.000000 4.000000 0.000000 0.000000
0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
Grain from processor 2:
0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
5.000000 6.000000 7.000000 8.000000 9.000000 10.000000 0.000000 0.000000
6.000000 7.000000 8.000000 9.000000 10.000000 11.000000 0.000000 0.000000
0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
Grain from processor 3:
0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
0.000000 0.000000 5.000000 6.000000 7.000000 8.000000 9.000000 10.000000
0.000000 0.000000 6.000000 7.000000 8.000000 9.000000 0.000000 0.000000
0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
Array from processor 3:
0.000000 1.000000 2.000000 3.000000 4.000000 5.000000 6.000000 7.000000
1.000000 2.000000 3.000000 4.000000 5.000000 6.000000 7.000000 8.000000
2.000000 3.000000 4.000000 5.000000 6.000000 7.000000 8.000000 9.000000
3.000000 4.000000 5.000000 6.000000 7.000000 8.000000 9.000000 10.000000
4.000000 5.000000 6.000000 7.000000 8.000000 9.000000 10.000000 11.000000
5.000000 6.000000 7.000000 8.000000 9.000000 10.000000 11.000000 12.000000
6.000000 7.000000 8.000000 9.000000 10.000000 11.000000 12.000000 13.000000
7.000000 8.000000 9.000000 10.000000 11.000000 12.000000 13.000000 14.000000
I was able to get the result you want by changing the number of elements sent to each process from:
MPI_Scatter(
&array[0][0], grain_length*array_size, MPI_DOUBLE,
&grain[1][0], grain_length*array_size, MPI_DOUBLE,
nproc-1, MPI_COMM_WORLD);
To:
MPI_Scatter(
&array[0][0], 4+grain_length*array_size, MPI_DOUBLE,
&grain[1][0], 4+grain_length*array_size, MPI_DOUBLE,
nproc-1, MPI_COMM_WORLD);
The result:
Grain from processor 0:
0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
0.000000 1.000000 2.000000 3.000000 4.000000 5.000000 6.000000 7.000000
1.000000 2.000000 3.000000 4.000000 5.000000 6.000000 7.000000 8.000000
0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
Grain from processor 1:
0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
2.000000 3.000000 4.000000 5.000000 6.000000 7.000000 8.000000 9.000000
3.000000 4.000000 5.000000 6.000000 7.000000 8.000000 9.000000 10.000000
0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
Grain from processor 2:
0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
4.000000 5.000000 6.000000 7.000000 8.000000 9.000000 10.000000 11.000000
5.000000 6.000000 7.000000 8.000000 9.000000 10.000000 11.000000 12.000000
0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
Grain from processor 3:
0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
6.000000 7.000000 8.000000 9.000000 10.000000 11.000000 12.000000 13.000000
7.000000 8.000000 9.000000 10.000000 11.000000 12.000000 13.000000 14.000000
0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
Array from processor 3:
0.000000 1.000000 2.000000 3.000000 4.000000 5.000000 6.000000 7.000000
1.000000 2.000000 3.000000 4.000000 5.000000 6.000000 7.000000 8.000000
2.000000 3.000000 4.000000 5.000000 6.000000 7.000000 8.000000 9.000000
3.000000 4.000000 5.000000 6.000000 7.000000 8.000000 9.000000 10.000000
4.000000 5.000000 6.000000 7.000000 8.000000 9.000000 10.000000 11.000000
5.000000 6.000000 7.000000 8.000000 9.000000 10.000000 11.000000 12.000000
6.000000 7.000000 8.000000 9.000000 10.000000 11.000000 12.000000 13.000000
7.000000 8.000000 9.000000 10.000000 11.000000 12.000000 13.000000 14.000000
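I hope this helps. Note that the extra 4 most likely compensates for the gap the allocator leaves between rows that were allocated with separate calloc calls (here apparently two doubles after each row), so the rows are not actually contiguous and this count depends on the allocator. Allocating the whole array as a single block and pointing the row pointers into it would be the portable way to make the scatter work.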

Select rows with at least two conditions from all conditions

I have this data frame in R and I need to select only the rows that match at least two of the following conditions:
A >= 5
B >= 5
C >= 5
D >= 5
A B C D
1 0.000000 48.936170 0.000000 29.787234
2 0.000000 72.340426 0.000000 6.382979
3 0.000000 78.723404 0.000000 2.127660
4 2.127660 78.723404 0.000000 0.000000
5 0.000000 43.617021 0.000000 35.106383
6 0.000000 79.787234 0.000000 1.063830
7 3.191489 0.000000 77.659574 0.000000
8 77.659574 0.000000 2.127660 0.000000
9 46.808511 0.000000 0.000000 31.914894
10 35.106383 0.000000 27.659574 0.000000
The only solution I found is to use "if"...
if ( ((data$A >=5) + (data$B >=5) + (data$C >=5) + (data$D >=5)) >=2 ) {
#result }
...but I cannot work out how to combine the if selection with my data frame.
I tried it like this, but it doesn't seem to be the solution to this problem:
Selection = data[if ( ((data$A >=5) + (data$B >=5) + (data$C >=5) + (data$D >=5)) >=2 ),]
Thanking you in advance for your help,
You could also do
df <- read.table(header=TRUE, text=" A B C D
1 0.000000 48.936170 0.000000 29.787234
2 0.000000 72.340426 0.000000 6.382979
3 0.000000 78.723404 0.000000 2.127660
4 2.127660 78.723404 0.000000 0.000000
5 0.000000 43.617021 0.000000 35.106383
6 0.000000 79.787234 0.000000 1.063830
7 3.191489 0.000000 77.659574 0.000000
8 77.659574 0.000000 2.127660 0.000000
9 46.808511 0.000000 0.000000 31.914894
10 35.106383 0.000000 27.659574 0.000000")
df[rowSums(df >= 5) >= 2, ]
# A B C D
# 1 0.00000 48.93617 0.00000 29.787234
# 2 0.00000 72.34043 0.00000 6.382979
# 5 0.00000 43.61702 0.00000 35.106383
# 9 46.80851 0.00000 0.00000 31.914894
# 10 35.10638 0.00000 27.65957 0.000000
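The reason `if` does not work here is that it expects a single TRUE or FALSE, whereas the comparison produces one value per row. The steps hidden inside the one-liner can be sketched on a made-up three-row data frame (`toy`, `hits`, `n_met` and `keep` are illustrative names, not part of the question):

```r
toy <- data.frame(A = c(0, 6, 10),
                  B = c(48, 0, 7),
                  C = c(0, 0, 9),
                  D = c(29, 2, 0))

# Comparing a data frame with a number gives a logical matrix of the same shape
hits <- toy >= 5

# TRUE counts as 1, so rowSums() gives the number of conditions met in each row
n_met <- rowSums(hits)

# A logical vector in the row position keeps only the rows where it is TRUE
keep <- n_met >= 2
selected <- toy[keep, ]
print(selected)
```

If only some columns should be tested, the comparison can be restricted to them first, for example `toy[, c("A", "B")] >= 5`.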

Sum multiple variables by group [duplicate]

I have an R data frame like this, with 45389 rows:
gene_id KOIN1 KOIN2 KOIN3 KOIP1 KOIP2 KOIP3
1 ENSMUSG00000000001 6.0056300 4.677550 6.3490400 9.9992300 9.931780 12.56900000
2 ENSMUSG00000000003 0.0000000 0.000000 0.0000000 0.0000000 0.000000 0.00000000
3 ENSMUSG00000000028 0.9988830 0.407537 1.5629300 0.1845460 1.899790 0.85186600
4 ENSMUSG00000000031 0.0000000 0.818696 0.3708190 0.0419544 0.000000 0.02832700
5 ENSMUSG00000000037 0.0160579 0.172857 0.0988266 0.0000000 1.174690 0.00726742
6 ENSMUSG00000000049 0.3923090 0.000000 0.0000000 0.0000000 0.124112 0.01811530
and so on...
There are some duplicates in the gene_id column. For example,
5090 ENSMUSG00000025515 0.00000000 0.00000000 0.1572500 0.000000000 0.000000 0.0000000
5091 ENSMUSG00000025515 0.00000000 0.00000000 0.1572500 0.000000000 0.000000 0.0000000
5095 ENSMUSG00000025515 0.00000000 0.00000000 0.0386388 0.000000000 0.000000 0.0000000
5096 ENSMUSG00000025515 0.00000000 0.00000000 0.0386388 0.000000000 0.000000 0.0000000
5100 ENSMUSG00000025515 0.00000000 0.00000000 0.0000000 0.000000000 0.000000 0.0000000
5101 ENSMUSG00000025515 0.00000000 0.00000000 0.0000000 0.000000000 0.000000 0.0000000
5105 ENSMUSG00000025515 0.33817000 0.06733700 0.4894620 0.000000000 0.000000 0.0000000
5106 ENSMUSG00000025515 0.33817000 0.06733700 0.4894620 0.000000000 0.000000 0.0000000
5110 ENSMUSG00000025515 0.00863568 0.00000000 0.0337577 0.000000000 0.000000 0.0000000
5111 ENSMUSG00000025515 0.00863568 0.00000000 0.0337577 0.000000000 0.000000 0.0000000
What I basically want to do is collapse all the duplicates into one row per gene_id, with each column holding the sum of that column's values across the duplicates.
I thought ddply from the plyr package would work, but it still gives me all the duplicates.
newdataframe <- ddply(dataframe,"gene_id",numcolwise(sum))
This is what I ran.
Any suggestions?
Another option:
library(dplyr)
df %>%
group_by(gene_id) %>%
summarise(across(everything(), sum)) # in dplyr versions before 1.0: summarise_each(funs(sum))
Which gives:
#Source: local data frame [7 x 7]
#
# gene_id KOIN1 KOIN2 KOIN3 KOIP1 KOIP2 KOIP3
# (fctr) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl)
#1 ENSMUSG00000000001 6.0056300 4.677550 6.3490400 9.9992300 9.931780 12.56900000
#2 ENSMUSG00000000003 0.0000000 0.000000 0.0000000 0.0000000 0.000000 0.00000000
#3 ENSMUSG00000000028 0.9988830 0.407537 1.5629300 0.1845460 1.899790 0.85186600
#4 ENSMUSG00000000031 0.0000000 0.818696 0.3708190 0.0419544 0.000000 0.02832700
#5 ENSMUSG00000000037 0.0160579 0.172857 0.0988266 0.0000000 1.174690 0.00726742
#6 ENSMUSG00000000049 0.3923090 0.000000 0.0000000 0.0000000 0.124112 0.01811530
#7 ENSMUSG00000025515 0.6936114 0.134674 1.4382170 0.0000000 0.000000 0.00000000
Plain old aggregate would do:
newdataframe <- aggregate(. ~ gene_id, dataframe, sum)
The formula . ~ gene_id reads as "every other column, grouped by gene_id", and sum is the function applied to each group. You could use mean instead, for instance.
If you just want some of the other columns, you can cbind them:
newdataframe <- aggregate(cbind(col1, col2) ~ gene_id, dataframe, sum)
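Both `aggregate` calls can be tried on a small made-up data frame; `toy`, `s1` and `s2` are illustrative names only. One thing to keep in mind is that the formula interface drops rows containing NA by default.

```r
toy <- data.frame(
  gene_id = c("g1", "g2", "g2", "g3", "g2"),
  s1 = c(1, 2, 3, 4, 5),
  s2 = c(10, 20, 30, 40, 50)
)

# Sum every other column within each gene_id
collapsed <- aggregate(. ~ gene_id, data = toy, FUN = sum)
print(collapsed)

# Sum only selected columns by binding them on the left-hand side
partial <- aggregate(cbind(s1) ~ gene_id, data = toy, FUN = sum)
print(partial)
```

The three `g2` rows collapse into a single row, so `collapsed` has one row per distinct `gene_id`.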

R recovering a Row from a Matrix

I have the following Matrix in R:
> head(k)
row.names SRX003922 SRX001291 SRX001364 SRX001365
ENSG00000000003 12.1999909 1.982836 0.000000 1.335383
ENSG00000000005 47.8615027 0.000000 0.000000 0.000000
ENSG00000000419 0.9384608 11.897018 4.735838 3.338457
ENSG00000000457 13.1384517 23.794037 16.575434 16.024595
ENSG00000000460 0.0000000 0.000000 0.000000 0.000000
ENSG00000000938 10.3230692 0.000000 0.000000 0.000000
SRX001366 SRX001367 SRX001368 SRX003931
ENSG00000000003 0.0000000 1.220217 0.000000 5.641656
ENSG00000000005 0.0000000 0.000000 0.000000 1.880552
ENSG00000000419 10.5627363 6.711194 7.510932 5.014805
ENSG00000000457 21.1254727 15.862822 18.151419 7.522207
ENSG00000000460 0.0000000 3.050543 0.625911 0.000000
ENSG00000000938 0.7041824 0.000000 0.000000 1.253701
How can I recover an entire row from this matrix by giving its row.names value?
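Assuming `k` is a matrix (or data frame) whose row names are the gene IDs, a row can be selected by putting its name in the row position of `[`. Here is a small sketch with a made-up matrix `m`. If the IDs are instead stored in an ordinary column called `row.names`, as the printed header may suggest, a logical comparison on that column would be needed instead.

```r
m <- matrix(c(12.2, 47.9, 0.9,
              1.98, 0.0, 11.9),
            nrow = 3,
            dimnames = list(c("ENSG00000000003", "ENSG00000000005", "ENSG00000000419"),
                            c("SRX003922", "SRX001291")))

# Select by row name; the result is a named numeric vector
one_row <- m["ENSG00000000005", ]
print(one_row)

# drop = FALSE keeps the result as a 1-row matrix
one_row_mat <- m["ENSG00000000005", , drop = FALSE]
print(one_row_mat)
```

Several rows can be requested at once by passing a character vector of names.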

R ccf function producing "need finite 'ylim' values" / "no non-missing arguments" error messages

I'm trying to calculate the cross-correlation scores between a series of vectors.
My code for doing so is:
smusF<-array(ccf(arrMat[[r,1]],arrMat[[h,1]],lag.max=(length(arrMat[[r,1]])+length(arrMat[[h,1]])))[[1]])
Where smusF is the vector of all ccf scores, for each overlap position, and the r and h variables assign differing vectors for comparison.
What I'm finding is sometimes this code has no errors and works correctly, but sometimes it produces the error message:
Error in plot.window(...) : need finite 'ylim' values
In addition: Warning messages:
1: In min(x) : no non-missing arguments to min; returning Inf
2: In max(x) : no non-missing arguments to max; returning -Inf
I'm really stuck as to why this happens in some instances and not others, and what I can do to fix it. Any help much appreciated.
Here is an example of two vectors that produce this error message when compared:
> arrMat[[r,1]]
[1] 0.011688 0.014871 0.015314 0.013446 0.008538 0.006948 0.006514 0.004343
[9] 0.002171 0.000000 0.002196 0.006899 0.012790 0.014289 0.015993 0.015321
[17] 0.016845 0.010438 0.005219 0.003040 0.007235 0.011430 0.009546 0.005351
[25] 0.004531 0.006752 0.011283 0.009062 0.006842 0.002311 0.002311 0.003614
[33] 0.006072 0.008825 0.009742 0.012397 0.013938 0.019788 0.022163 0.025293
[41] 0.022794 0.024204 0.020493 0.017092 0.010652 0.009013 0.007760 0.008768
[49] 0.008235 0.008858 0.005392 0.002696 0.000000 0.000000 0.000869 0.001737
[57] 0.003474 0.003474 0.003474 0.002212 0.009292 0.016371 0.027220 0.023992
[65] 0.023172 0.015995 0.015421 0.010799 0.011676 0.012986 0.017361 0.018033
[73] 0.018508 0.022458 0.027989 0.034674 0.030668 0.024449 0.013905 0.009333
[81] 0.005809 0.005219 0.003441 0.002433 0.001425 0.000950 0.002138 0.003326
[89] 0.004989 0.003326 0.001663 0.000000 0.000000 0.000000 0.000000 0.000000
[97] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[105] 0.000000 0.000000 0.000000 0.000770 0.001540 0.002311 0.001540 0.000770
[113] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[121] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[129] 0.000000 0.000000 0.000000 0.001188 0.002376 0.004719 0.004687 0.004654
[137] 0.002311 0.001155 0.000000 0.000000 0.000713 0.001425 0.002138 0.001425
[145] 0.000713 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[153] 0.000000 0.000000
And:
> arrMat[[h,1]]
[1] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[9] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[17] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[25] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[33] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[41] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[49] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[57] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[65] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[73] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[81] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[89] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[97] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[105] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[113] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[121] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[129] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[137] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[145] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[153] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[161] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[169] 0.000000 0.000000 0.012729 0.025457 0.045459 0.043641 0.063643 0.089100
[177] 0.125907 0.119074 0.079510 0.040635 0.023581 0.031983 0.034051 0.036118
[185] 0.030912 0.023639 0.016365 0.007273 0.003637 0.007273 0.018184 0.029094
[193] 0.030912 0.025457 0.020002 0.010910 0.005455 0.000000 0.000000 0.000000
[201] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[209] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[217] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[225] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[233] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[241] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[249] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[257] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[265] 0.000000 0.000000 0.000000 0.000000
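For future reference: it seems ccf cannot handle vectors that do not vary, such as a vector of only 1s against a vector of only 2s. A constant series has zero variance, so the correlation is undefined at every lag and the default plot has no finite values from which to set 'ylim'. In the example above the vectors have different lengths, and as far as I can tell ccf only compares the positions they share; the first 154 values of arrMat[[h,1]] are all 0, so the comparison is effectively against a vector of 0s.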
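The behaviour described above can be reproduced with a short made-up series. In this sketch, `safe_ccf` is a hypothetical helper (not part of the stats package) that skips the call when either series is constant over the shared span; passing `plot = FALSE` avoids the plotting step that raises the 'ylim' error, and the correlations for the flat series should come back as NaN.

```r
x <- c(0.1, 0.4, 0.2, 0.8, 0.3, 0.6, 0.5, 0.9)
flat <- rep(0, length(x))    # no variation at all

# plot = FALSE skips the plotting step that raises the 'ylim' error
res <- ccf(x, flat, lag.max = 3, plot = FALSE)
all_nan <- all(is.nan(res$acf))

# Hypothetical helper: only call ccf() when both series vary over the shared span
safe_ccf <- function(a, b, ...) {
  n <- min(length(a), length(b))   # ccf() only uses the overlapping positions
  a <- a[seq_len(n)]
  b <- b[seq_len(n)]
  if (sd(a) == 0 || sd(b) == 0) return(NULL)
  ccf(a, b, plot = FALSE, ...)
}

# The second series varies only beyond the shared span, as in the question
late <- c(rep(0, 8), 1, 2)
print(is.null(safe_ccf(x, late)))
```

Returning NULL is just one design choice; returning a vector of NA scores would work equally well if the caller expects a fixed-length result.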
