I have extracted tables from a PDF file with the tabulizer package. After extracting them, I want to rbind the tables, which come back as a list of matrices with different numbers of columns.
library(tabulizer)
table1 <- extract_tables("\\AC002_2017.pdf")
final <- do.call(rbind, table1)
But it gives me the following error:
Error in (function (..., deparse.level = 1) :
number of columns of matrices must match (see arg 2)
How can I rbind these tables?
The data look like this:
[[1]]
     [,1] [,2] [,3] [,4]
[1,]   20   45   34   34
[2,]   23   34   67   43
[3,]   22   23   42   34
[4,]   45   44   56   54
[5,]   12   11   12   14
[6,]   34   33   45   32
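A minimal sketch of one workaround, assuming the narrower tables are simply missing trailing columns (so their columns line up with the wider tables from the left): pad every matrix with NA columns up to the widest one, then rbind. The list `table1` here is a hypothetical stand-in for the `extract_tables()` output, and `pad_cols` is a hypothetical helper, not part of tabulizer. If the tables really have different layouts, padding will misalign them and they are better kept separate.

```r
# Hypothetical stand-in for extract_tables() output: matrices with 4 and 3 columns
table1 <- list(
  matrix(1:8, ncol = 4),
  matrix(9:14, ncol = 3)
)

# Hypothetical helper: pad a matrix on the right with NA columns up to `width`
pad_cols <- function(m, width) {
  if (ncol(m) < width) {
    m <- cbind(m, matrix(NA, nrow = nrow(m), ncol = width - ncol(m)))
  }
  m
}

# Widest table decides the final number of columns
max_cols <- max(vapply(table1, ncol, integer(1)))
final <- do.call(rbind, lapply(table1, pad_cols, width = max_cols))
final
```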
I have a numeric vector of integers that I want to transform into "bins".
I want these bins to be used as sample frames from which I can then sample again, uniformly.
So far I can do both using findInterval, but I am looking for a way to do it with cut.
Let's consider a random vector of integers, which will be split into equally sized intervals of length 10:
df = sample(1:100,10)
df
[1] 81 11 38 95 45 14 10 61 96 88
Using findInterval, I get the bins and an approximate way of sampling:
breaks = seq(1,max(df+1),by=10)
b <- findInterval(df, breaks)
b
[1] 9 2 4 10 5 2 1 7 10 9
# If b is 1 or 10, use ifelse() to keep the draws inside [1, 100]
sam <- round(runif(10,ifelse(b==1,10*b-9,10*b-10),ifelse(b==10,10*b,10*b+10)))
sam
[1] 85 14 39 94 50 16 7 63 93 85
Using cut I get the intervals:
breaks = seq(1,max(df+1),by=10)
cut(df,breaks,right=TRUE)
[1] (71,81] (1,11] (31,41] <NA> (41,51] (11,21] (1,11] (51,61] <NA> (81,91]
Levels: (1,11] (11,21] (21,31] (31,41] (41,51] (51,61] (61,71] (71,81] (81,91]
But I don't know how to use those values as intervals from which to sample.
If there is another approach, I would be interested to know!
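The `NA`s above come from the breaks: with `right = TRUE`, the first interval `(1,11]` excludes 1, and values above 91 (here 95 and 96) fall outside every interval. A minimal sketch of sampling with cut(), assuming bins (0,10], (10,20], ..., (90,100] instead: `labels = FALSE` makes cut() return each value's bin index, which then picks the lower and upper bounds passed to runif().

```r
set.seed(42)
df <- sample(1:100, 10)

# Assumed bins: (0,10], (10,20], ..., (90,100], so every value in 1:100 lands in a bin
breaks <- seq(0, 100, by = 10)

# labels = FALSE makes cut() return the integer bin index instead of a factor
bin <- cut(df, breaks, labels = FALSE)

# Bounds of each value's bin
lower <- breaks[bin]
upper <- breaks[bin + 1]

# runif() stays strictly inside (lower, upper), so ceiling() gives an
# integer drawn uniformly from lower+1..upper, i.e. from the same bin
sam <- ceiling(runif(length(df), lower, upper))
sam
```

Using ceiling() rather than round() keeps every draw inside its own bin; round() can land exactly on the lower boundary, which belongs to the previous bin.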
Good Question! I will give you a completely different approach.
So basically you want to perform Latin Hypercube sampling, i.e. stratified uniform sampling in the interval [0,100] with bins of width 10.
For this, it is easier to install the lhs package and use its randomLHS function to perform the stratified sampling.
First step: generate one uniform draw from each of the 10 strata (deciles), repeated as many times as you want. In this example, let's do it 5 times:
library(lhs)
X <- randomLHS(10, 5)  # 10 rows (one per stratum), 5 columns (repetitions)
> X
[,1] [,2] [,3] [,4] [,5]
[1,] 0.92154144 0.22185959 0.49953326 0.66248165 0.79035832
[2,] 0.47571700 0.05894016 0.55883326 0.34875162 0.98831829
[3,] 0.57738486 0.64525528 0.04955733 0.50939147 0.46297294
[4,] 0.17578838 0.83843074 0.27138703 0.87421301 0.16401042
[5,] 0.03850768 0.40746004 0.69518073 0.23487653 0.55537945
[6,] 0.83942905 0.52957416 0.84952231 0.14031915 0.84956654
[7,] 0.22802502 0.79911728 0.76789194 0.09788194 0.08667802
[8,] 0.61821268 0.93088726 0.30789950 0.95831993 0.36903120
[9,] 0.70391230 0.11445154 0.97976851 0.42027836 0.61097786
[10,] 0.31385709 0.33557430 0.18389684 0.70124986 0.27601550
Second step: each column of X is stratified, but its values are not in order. Therefore, when we show the final stratified draws, we scale them to [0,100] and sort each column:
Y <- apply(X,2, function(x) sort(round(x*100)))
> Y
[,1] [,2] [,3] [,4] [,5]
[1,] 4 6 5 10 9
[2,] 18 11 18 14 16
[3,] 23 22 27 23 28
[4,] 31 34 31 35 37
[5,] 48 41 50 42 46
[6,] 58 53 56 51 56
[7,] 62 65 70 66 61
[8,] 70 80 77 70 79
[9,] 84 84 85 87 85
[10,] 92 93 98 96 99
NB: I rounded only for convenience, to make the strata obvious; there is no need to call round if you are happy to have non-integer draws as output.
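To see what the stratification means, here is a base R sketch of the same kind of draws without the lhs package. It uses the textbook Latin Hypercube construction (a random permutation assigns one stratum to each row, and a uniform offset places the draw inside that stratum); randomLHS may differ in its internal details, so treat this as an illustration rather than its implementation. The names `n` and `k` are just for this sketch.

```r
set.seed(1)
n <- 10   # number of strata per column
k <- 5    # number of columns (repetitions)

# For each column: sample(n) picks a stratum 1..n for every row, and
# subtracting runif(n) moves the draw inside ((s - 1)/n, s/n)
X <- sapply(seq_len(k), function(j) (sample(n) - runif(n)) / n)

# Scale to [0, 100] and sort each column, as in the answer above
Y <- apply(X, 2, function(x) sort(x * 100))
round(Y)
```

Each column of `Y` then holds exactly one value in each of [0,10), [10,20), ..., [90,100).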
I'm trying to add a new column to an existing matrix, but I get a warning every time.
I'm trying this code:
normDisMatrix$newColumn <- labels
I get this message:
Warning message:
In normDisMatrix$newColumn <- labels : Coercing LHS to a list
Afterwards, when I check the matrix, its dimensions are NULL:
dim(normDisMatrix)
NULL
Note: labels is just a vector of numbers between 1 and 4.
What can be the problem?
As @thelatemail pointed out, the $ operator cannot be used to subset a matrix. This is because a matrix is just a single vector with a dimension attribute. When you used $ to try to add a new column, R coerced your matrix to a list, the simplest structure on which $ can be used, and the dim attribute was lost in the process.
The function you want is cbind() (column bind). Suppose I have the matrix m:
(m <- matrix(51:70, 4))
# [,1] [,2] [,3] [,4] [,5]
# [1,] 51 55 59 63 67
# [2,] 52 56 60 64 68
# [3,] 53 57 61 65 69
# [4,] 54 58 62 66 70
To add a new column from a vector called labels, we can do:
labels <- 1:4
cbind(m, newColumn = labels)
#                     newColumn
# [1,] 51 55 59 63 67         1
# [2,] 52 56 60 64 68         2
# [3,] 53 57 61 65 69         3
# [4,] 54 58 62 66 70         4
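A minimal sketch that reproduces the problem on a copy of `m` and shows two ways around it: cbind() to keep a matrix, or converting to a data frame, where `$` does work. The names `bad`, `m2` and `d` are only for this illustration.

```r
m <- matrix(51:70, 4)
labels <- 1:4

# Reproduce the problem: `$<-` on a matrix warns "Coercing LHS to a list"
# and the result is a plain list without a dim attribute
bad <- m
bad$newColumn <- labels
is.list(bad)
dim(bad)

# Fix 1: stay with a matrix and bind the new column
m2 <- cbind(m, newColumn = labels)
dim(m2)

# Fix 2: use a data frame, where `$` can add a column
d <- as.data.frame(m)
d$newColumn <- labels
ncol(d)
```

cbind() is the natural choice if the rest of the code expects a matrix; a data frame is more convenient if the columns may later hold different types.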
I have a vector with 49 numeric values. I want to have a 7x7 numeric matrix instead.
Is there some sort of convenient automatic conversion I can use, or do I have to make 7 separate column assignments of the correct vector subsets to a new matrix? I hope there is something like the opposite of c(myMatrix), with the option of specifying the number of rows and/or columns I want, of course.
Just use matrix:
matrix(vec,nrow = 7,ncol = 7)
One advantage of using matrix() rather than simply altering the dimension attribute (the approach Gavin points out) is that you can specify whether the matrix is filled by row or by column using the byrow argument of matrix().
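A short sketch of the byrow difference on the same 49 values: by default matrix() fills column by column, and byrow = TRUE fills row by row, which gives the transpose here.

```r
vec <- 1:49

# Default: fill column by column (same layout as setting dim(vec) <- c(7, 7))
by_col <- matrix(vec, nrow = 7, ncol = 7)

# byrow = TRUE: fill row by row
by_row <- matrix(vec, nrow = 7, ncol = 7, byrow = TRUE)

by_col[1, 2]  # 8
by_row[1, 2]  # 2
identical(by_row, t(by_col))  # TRUE
```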
A matrix is really just a vector with a dim attribute (for the dimensions). So you can add dimensions to vec using the dim() function and vec will then be a matrix:
> vec <- 1:49
> dim(vec) <- c(7, 7) ## (rows, cols)
> vec
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] 1 8 15 22 29 36 43
[2,] 2 9 16 23 30 37 44
[3,] 3 10 17 24 31 38 45
[4,] 4 11 18 25 32 39 46
[5,] 5 12 19 26 33 40 47
[6,] 6 13 20 27 34 41 48
[7,] 7 14 21 28 35 42 49
I have a matrix with column headers, and I would like to add the word "genes" at the first position of the column header, i.e. attach that word to the beginning of the header.
Here is what I have so far:
I read a matrix into R:
matrix_a <- read.table(args[1], sep='\t', header=TRUE, row.names=1)
and generate a heatmap from that matrix using heatmap.2. I then extract the data for the corresponding heatmap from the carpet component of the list that heatmap.2 returns.
Here is the code used to generate the heatmap:
library(gplots)  # provides heatmap.2() and bluered()
result <- heatmap.2(mtscaled, Rowv=TRUE, scale='none', dendrogram="row", symm=TRUE, col=bluered(16), breaks=my.breaks)
Here I am extracting the values for the clustered matrix, after passing the original matrix through heatmap.2:
new_matrix <- result$carpet
old_name <- colnames(new_matrix)
Here I am trying to attach the name "genes" to the column names:
old_name <- cat("genes",old_name)
colnames(new_matrix) <- old_name;
write.table(new_matrix, file="data_result3.txt",sep = " \t",col.names = T, row.names = T);
When I try to attach "genes" to the header using:
old_name <- cat("genes",old_name)
the headers are printed to the screen properly, but when I examine the result file, generic column names are written instead:
"V1" "V2" "V3" "V4" "V5" "V6"
Instead I would like the result to look like:
genes Pacs-11 Pacs-2 PC06E7.3 PC49C3.3 Pceh-60 PF52C6.12
That is, "genes" should come before the rest of the matrix header.
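cat() only prints its arguments to the console and returns NULL, so old_name becomes NULL and the column names are removed. Use paste() instead:
# to have a space between "genes" and each column name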
old_name <- paste("genes", old_name, sep=" ")
Edit (based on your new comment), perhaps you need:
old_name <- c("genes", old_name)
Here is a trivial example
> test <- matrix(1:50, ncol=5)
> test
[,1] [,2] [,3] [,4] [,5]
[1,] 1 11 21 31 41
[2,] 2 12 22 32 42
[3,] 3 13 23 33 43
[4,] 4 14 24 34 44
[5,] 5 15 25 35 45
[6,] 6 16 26 36 46
[7,] 7 17 27 37 47
[8,] 8 18 28 38 48
[9,] 9 19 29 39 49
[10,] 10 20 30 40 50
> colnames(test) <- c("genes", paste("V", 1:4))
> test
genes V 1 V 2 V 3 V 4
[1,] 1 11 21 31 41
[2,] 2 12 22 32 42
[3,] 3 13 23 33 43
[4,] 4 14 24 34 44
[5,] 5 15 25 35 45
[6,] 6 16 26 36 46
[7,] 7 17 27 37 47
[8,] 8 18 28 38 48
[9,] 9 19 29 39 49
[10,] 10 20 30 40 50
# to only add "genes" as the first column's name
colnames(test) <- c("genes", colnames(test)[-1])
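A small sketch of why the original attempt produced V1, V2, ... in the file, using hypothetical column names: cat() prints but returns NULL, assigning NULL as column names removes them, and write.table() then falls back to generic names. paste() returns a character vector that can be assigned.

```r
old_name <- c("Pacs-11", "Pacs-2", "PC06E7.3")   # hypothetical column names

# cat() writes to the console and returns NULL (invisibly)
res <- cat("genes", old_name, "\n")
is.null(res)   # TRUE

# Assigning that NULL as column names removes the names altogether
m <- matrix(1:6, nrow = 2, dimnames = list(NULL, old_name))
colnames(m) <- res
is.null(colnames(m))   # TRUE

# paste() returns the combined strings instead of printing them
paste("genes", old_name)
```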
I was able to get it to work by writing the header line (starting with genes) myself and then each row:
new_matrix <- result$carpet
old_name <- colnames(new_matrix)
sink("data_result3.txt")
cat(c("genes",old_name), "\n")
for (i in seq_len(nrow(new_matrix)))
{
  # row label first, then that row's values
  cat(rownames(new_matrix)[i], new_matrix[i, ], "\n")
}
sink()
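As an alternative to the sink()/cat() loop, a minimal sketch that lets write.table() produce the header: move the row names into a real first column named genes. The toy matrix is hypothetical, and a temporary file stands in for data_result3.txt. This writes tab-separated output, whereas the cat() loop writes space-separated output.

```r
# Hypothetical toy matrix standing in for result$carpet
new_matrix <- matrix(1:6, nrow = 2,
                     dimnames = list(c("g1", "g2"),
                                     c("Pacs-11", "Pacs-2", "PC06E7.3")))

# Row names become the first column, named "genes";
# check.names = FALSE keeps names such as "Pacs-11" unchanged
out <- data.frame(genes = rownames(new_matrix), new_matrix,
                  check.names = FALSE)

path <- tempfile(fileext = ".txt")
write.table(out, file = path, sep = "\t", quote = FALSE, row.names = FALSE)

readLines(path)
```

If an empty first header cell is acceptable instead of "genes", write.table(new_matrix, ..., col.names = NA) also keeps the header aligned with the row-name column.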