Combining multiple tables/files by mapping to a common column - awk,sed - unix

This is an edited repost (awk/sed method to combine multiple files into one by mapping to a common file/column).
My previous post was not in the correct format and was closed before it got a correct answer. For some unknown reason, I could not edit or delete it.
Sorry for the trouble. I hope this is the correct format.
I have 11 tab-separated files, each with two columns, as shown below. The number of rows varies: some files have 1000 rows, while others have more than 2500.
File-0
This is the mapping file
K00001 0
K00002 0
K00003 0
K00004 0
K00005 0
This file goes up to K30000 0, covering all K numbers in the rest of the files
File-1
K00002 0.60
K00003 31
K00006 0.21
K00007 0.06
K00012 0.01
File-2
K00003 21
K00004 0.54
K00005 0.4
K00006 0.01
K00009 0.39
K00010 0.01
File-3
K00002 09
K00003 0.11
K00004 0.87
K00006 0.54
K00007 0.11
K00008 0.02
I want to combine these 10 files (file-1 ... file-10) into one by mapping them to the first column of file-0. The output I would like looks like this:
K00001
K00002 0.60 9
K00003 31 21 0.11
K00004 0.54 0.87
K00005 0.4
K00006 0.21 0.01 0.54
K00007 0.06 0.11
K00008 0.02
K00009 0.39
K00010 0.01
K00011
K00012 0.01
Can anyone help me with this?
Thanks.

Looking at your last post, I believe @EdMorten's AWK answer is the solution you are looking for. You just need to change the field separator from "\t" to " ":
# To get the first column of the mapping file (i.e. lose the column of zeros):
cut -d" " -f1 file0 > test1.txt
cat test1.txt
K00001
K00002
K00003
K00004
K00005
K00006
K00007
K00008
K00009
K00010
K00011
K00012
K00013
cat test2.txt
K00002 0.60
K00003 31
K00006 0.21
K00007 0.06
K00012 0.01
cat test3.txt
K00003 21
K00004 0.54
K00005 0.4
K00006 0.01
K00009 0.39
K00010 0.01
cat test4.txt
K00002 09
K00003 0.11
K00004 0.87
K00006 0.54
K00007 0.11
K00008 0.02
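awk '
# GNU awk only: relies on ARGIND, arrays of arrays and PROCINFO["sorted_in"]
BEGIN { FS=OFS=" " }
# store each value under its key and the index of the file it came from
{ map[$1][ARGIND] = $2 }
END {
    # visit the keys in ascending string order
    PROCINFO["sorted_in"] = "@ind_str_asc"
    for (key in map) {
        printf "%s", key
        # one column per input file; keys missing from a file print as empty
        for (fileNr=1; fileNr<=ARGIND; fileNr++) {
            printf "%s%s", OFS, map[key][fileNr]
        }
        print ""
    }
}
' test*.txt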
K00001
K00002 0.60 09
K00003 31 21 0.11
K00004 0.54 0.87
K00005 0.4
K00006 0.21 0.01 0.54
K00007 0.06 0.11
K00008 0.02
K00009 0.39
K00010 0.01
K00011
K00012 0.01
K00013
Does this work on your "real" data?

Related

Creating an igraph from weighted correlation matrix csv

First of all, I'd like to say that I'm completely new to R, and I'm just trying to accomplish this one task.
So, what I'd like to do is create a network diagram from a weighted matrix. I made an example:
The CSV is a simple correlation matrix that looks like this:
,A,B,C,D,E,F,G
A,1,0.9,0.64,0.43,0.38,0.33,0.33
B,0.9,1,0.64,0.33,0.43,0.38,0.38
C,0.64,0.64,1,0.59,0.69,0.64,0.64
D,0.43,0.33,0.59,1,0.28,0.23,0.28
E,0.38,0.43,0.69,0.28,1,0.95,0.9
F,0.33,0.38,0.64,0.23,0.95,1,0.9
G,0.33,0.38,0.64,0.28,0.9,0.9,1
I tried to draw the wanted result by myself and came up with this:
To be more precise, I drew the diagram first; then, using a ruler, I took note of the distances, worked out an equation to get the weights, and made the CSV table.
The higher the value is, the closer the two points are to each other.
However, whatever I do, the best result I get is this:
And this is how I'm trying to accomplish it, using this tutorial:
First of all, I import my matrix:
> matrix <- read.csv(file = 'test_dataset.csv')
But after printing the matrix out with head(), this already somehow cuts the last line of the matrix:
> head(matrix)
ï.. A B C D E F G
1 A 1.00 0.90 0.64 0.43 0.38 0.33 0.33
2 B 0.90 1.00 0.64 0.33 0.43 0.38 0.38
3 C 0.64 0.64 1.00 0.59 0.69 0.64 0.64
4 D 0.43 0.33 0.59 1.00 0.28 0.23 0.28
5 E 0.38 0.43 0.69 0.28 1.00 0.95 0.90
6 F 0.33 0.38 0.64 0.23 0.95 1.00 0.90
> dim(matrix)
[1] 7 8
I then proceed with removing the first column so the matrix is square again...
> matrix <- data.matrix(matrix)[,-1]
> head(matrix)
A B C D E F G
[1,] 1.00 0.90 0.64 0.43 0.38 0.33 0.33
[2,] 0.90 1.00 0.64 0.33 0.43 0.38 0.38
[3,] 0.64 0.64 1.00 0.59 0.69 0.64 0.64
[4,] 0.43 0.33 0.59 1.00 0.28 0.23 0.28
[5,] 0.38 0.43 0.69 0.28 1.00 0.95 0.90
[6,] 0.33 0.38 0.64 0.23 0.95 1.00 0.90
> dim(matrix)
[1] 7 7
Then I create the graph and try to plot it:
> network <- graph_from_adjacency_matrix(matrix, weighted=T, mode="undirected", diag=F)
> plot(network)
And the result above appears...
So, after spending the last few hours googling and trying many more things, this is the closest I've been able to get.
So I'm asking for your help, thank you very much!
This is all fine.
head() just prints the first 6 rows of a matrix or data frame. If you want to see all of it, use print() or just type the name of the matrix variable.
graph_from_adjacency_matrix produces a link between two nodes if the value is non-zero. That's why you are getting every node linked to every other node.
To get what that tutorial is doing you need to add a line like
matrix[matrix<0.5] <- 0
to remove the edges for correlations below a cut off before you create the graph.
It's still not going to produce a chart like your hand-drawn one (where closeness roughly corresponds to the correlation); it will just clump nodes together if their correlation is above 0.5.
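As a rough illustration of the two points above, here is a small base-R sketch. It reads the matrix from inline text instead of test_dataset.csv, uses row.names = 1 so the label column never becomes data, and applies the 0.5 cut-off before any graph is built. The variable names and the cut-off are only assumptions for the example. The "ï.." column name in the question looks like a byte-order mark in the file; with a real file, read.csv(..., fileEncoding = "UTF-8-BOM") usually takes care of that.

```r
# Correlation matrix from the question, read from inline text instead of a file.
csv_text <- ",A,B,C,D,E,F,G
A,1,0.9,0.64,0.43,0.38,0.33,0.33
B,0.9,1,0.64,0.33,0.43,0.38,0.38
C,0.64,0.64,1,0.59,0.69,0.64,0.64
D,0.43,0.33,0.59,1,0.28,0.23,0.28
E,0.38,0.43,0.69,0.28,1,0.95,0.9
F,0.33,0.38,0.64,0.23,0.95,1,0.9
G,0.33,0.38,0.64,0.28,0.9,0.9,1"

# row.names = 1 turns the label column into row names, so the matrix is square.
m <- as.matrix(read.csv(text = csv_text, row.names = 1))

# dim() reports every row; head() only ever displays the first 6.
print(dim(m))

# Assumed cut-off of 0.5: weaker correlations become 0, i.e. no edge.
cutoff <- 0.5
m[m < cutoff] <- 0
diag(m) <- 0  # ignore self-correlations, like diag = F in the graph call

# Number of node pairs that still have an edge (each pair counted once).
n_edges <- sum(m[upper.tri(m)] > 0)
print(n_edges)
```

The thresholded m can then be passed to graph_from_adjacency_matrix() exactly as in the question.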

How can I find the difference between values without losing the first sample?

I would like to find the differences between my samples, but when I use diff() my first sample goes missing.
input:
data
XX.3.22 XX.1.2 XX.5.19 XX.2.21 XX.2.16 XX.5.27 XX.3.5 XX.2.12 XX.4.15
0.00 0.12 0.17 0.20 0.21 0.26 0.27 0.27 0.32
diff(data)
output:
XX.1.2 XX.5.19 XX.2.21 XX.2.16 XX.5.27 XX.3.5 XX.2.12 XX.4.15
0.05 0.05 0.03 0.01 0.05 0.01 0.00 0.05
I do not want to lose the first (XX.3.22) sample.
I expect:
XX.3.22 = 0.12

R rownames(foo[bar]) prints as null but can be successfully changed - why?

I've written a script that works on a set of gene-expression data.
I'll try to separate my post into a short question and a rather lengthy explanation (sorry about the long text block). I hope the short question makes sense by itself. The long explanation is only there to clarify things in case the short question doesn't get the point across.
I am trying to acquire basic R skills, and something occurred that puzzles me; I didn't find any enlightenment via Google. I really don't understand this, and I hope that clarifying what is happening here will help me understand R better.
That said I'm not a programmer so please bear with my bad code.
SHORT QUESTION:
When I have rownames(foo) e.g.
> print(rownames(foo))
"a" "b" "c" "d"
and I try to access it via print(rownames(foo[bar])), it prints NULL.
E.g
> print(rownames(foo[2]))
NULL
Here in the second answer Richie Cotton explains this as "[...] that where there aren't any names, [...]"
This would indicate to me, that either rownames(foo) is empty - which is clearly not the case as I can print it with "print(rownames(foo))" - or that this method of access fails.
However, when I try to change the value at position bar, I get a warning message that the replacement length doesn't match. The operation nevertheless succeeds, which pretty much proves that this method of access does work. E.g.
> bar = 2
> rownames(foo[bar]) = some.vector(rab)
> print(rownames(foo[bar]))
NULL
> print(rownames(foo))
"a" "something else" "c" "d"
Why is this working? Obviously the function can't properly access the position of bar in foo, as it prints it as empty.
Why the heck does it still replace the value successfully and not fail in a horrific way?
Or asked the other way around: When it successfully replaces the value at this position why is the print function not returning the value properly?
LONG BACKGROUND EXPLANATION:
The data source contains the number in the list, the Entrez ID of the gene, the official gene symbol, the Affymetrix probe ID, and then the increase or decrease values. It looks something like this:
No Entrez Symbol Probe_id Sample1_FoldChange Sample2_FoldChange
1 690244 Sumo2 1367452_at 1.02 0.19
Later when displaying the data I want it to print out only the gene symbol and the increases.
Now, if there is no gene symbol in the data set, it is printed as "n/a". This is obviously of no value to me, as I can't determine which one of many genes it is.
So I added a first processing step that, only for these cases, replaces "n/a" with "n/a (12345)", where 12345 is the Entrez ID.
I've written the following script to do this. (Note: as I'm not a programmer and I'm new to R, I doubt that it is pretty code. But that's not the point I want to discuss.)
no.symbol.idx <-which(rownames(expr.table) == "n/a")
c1 <- character (length(rownames(expr.table)))
c2 <- c1
for (x in 1:length(c1))
{
    c1[x] <- "n/a ("
}
for (x in 1:length(c2))
{
    c2[x] <- ")"
}
rownames(expr.table)[no.symbol.idx] <- paste(c1, (expr.table[no.symbol.idx , "Entrez"]),c2, sep="")
The script works and does what it should. However, I get the following warning message.
Warning message:
In rownames(expr.table)[no.symbol.idx] <- paste(c1, (expr.table[no.symbol.idx, :
number of items to replace is not a multiple of replacement length
To find out what is happening here, I put some text output into the script.
no.symbol.idx <-which(rownames(expr.table) == "n/a")
c1 <- character (length(rownames(expr.table)))
c2 <- c1
for (x in 1:length(c1))
{
    c1[x] <- "n/a ("
}
for (x in 1:length(c2))
{
    c2[x] <- ")"
}
print("print(rownames(expr.table)):")
print(rownames(expr.table))
print("print(no.symbol.idx):")
print(no.symbol.idx)
print("print(rownames(expr.table[no.symbol.idx])):")
print(rownames(expr.table[no.symbol.idx]))
print("print(rownames(expr.table[14])):")
print(rownames(expr.table[14]))
print("print(rownames(expr.table[15])):")
print(rownames(expr.table[15]))
cat("print(expr.table[no.symbol.idx,\"Entrez\"]):\n")
print(expr.table[no.symbol.idx,"Entrez"])
rownames(expr.table)[no.symbol.idx] <- paste(c1, (expr.table[no.symbol.idx , "Entrez"]),c2, sep="")
print("print(rownames(expr.table)):")
print(rownames(expr.table))
print("print(rownames(expr.table[no.symbol.idx])):")
print(rownames(expr.table[no.symbol.idx]))
And I get the following output in the console.
[1] "print(rownames(expr.table)):"
[1] "Sumo2" "Cdc37" "Copb2" "Vcp" "Ube2d3" "Becn1" "Lypla2" "Arf1" "Gdi2" "Copb1" "Capns1" "Phb2" "Puf60" "Dad1" "n/a"
[1] "print(no.symbol.idx):"
[1] 15
[1] "print(rownames(expr.table[no.symbol.idx])):"
NULL
[1] "print(rownames(expr.table[14])):"
NULL
[1] "print(rownames(expr.table[15])):"
NULL
... (to be continued)
so obviously no.symbol.idx gets the right position for the n/a value.
When I try to print it however it claims that rownames for this position was empty and returns NULL.
When I try to access this position "by hand" and use expr.table[15] it also returns NULL.
This however has nothing to do with the n/a value as the same holds true for the value stored at position 14.
... (the continuation)
print(expr.table[no.symbol.idx,"Entrez"]):
[1] "116727"
[1] "print(rownames(expr.table)):"
[1] "Sumo2" "Cdc37" "Copb2" "Vcp" "Ube2d3" "Becn1" "Lypla2" "Arf1" "Gdi2"
[10] "Copb1" "Capns1" "Phb2" "Puf60" "Dad1" "n/a (116727)"
[1] "print(rownames(expr.table[no.symbol.idx])):"
NULL
And this is the result that surprises me: it claims everything is NULL, yet the operation succeeds.
I don't understand this.
EDIT:
Here are the results of the functions you wanted me to run.
str(expr.table)
chr [1:15, 1:17] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12" "13" "14" "401" "690244" "114562" "60384" ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:15] "Sumo2" "Cdc37" "Copb2" "Vcp" ...
..$ : chr [1:17] "No" "Entrez" "Symbol" "Probe_id" ...
head(expr.table)
dput(head(expr.table,10))
structure(c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10",
"690244", "114562", "60384", "116643", "81920", "114558", "83510",
"64310", "29662", "114023", "Sumo2", "Cdc37", "Copb2", "Vcp",
"Ube2d3", "Becn1", "Lypla2", "Arf1", "Gdi2", "Copb1", "1367452_at",
"1367453_at", "1367454_at", "1367455_at", "1367456_at", "1367457_at",
"1367458_at", "1367459_at", "1367460_at", "1367461_at", "1.02000",
"-1.04000", "1.03000", "-0.12000", "-0.02000", "-0.03000", "0.09000",
"0.05000", "-0.09000", "0.16000", "0.19000", "0.11000", "-0.00425",
"0.52000", "0.46000", "0.42000", "0.20000", "0.05000", "0.21000",
"0.37000", "0.26000", "0.19000", "-0.03000", "0.35000", "0.34000",
"0.07000", "0.00156", "0.12000", "0.08000", "0.16000", "0.59000",
"0.20000", "-0.16000", "0.28000", "0.46000", "-0.15000", "0.00168",
"0.23000", "-0.01000", "0.10000", "0.05000", "0.12000", "-0.00522",
"0.58000", "0.23000", "0.06000", "0.01000", "0.07000", "-0.11000",
"0.23000", "-0.03", "0.08", "0.09", "0.08", "0.11", "0.03", "-0.08",
"0.02", "-0.05", "0.06", "0.03000", "-0.06000", "0.09000", "0.00940",
"0.11000", "-0.09000", "0.04000", "-0.04000", "-0.09000", "0.01000",
"0.04000", "-0.02000", "0.21000", "0.27000", "0.08000", "0.12000",
"0.06000", "0.26000", "0.04000", "0.40000", "0.05000", "0.05000",
"0.00897", "0.09000", "0.20000", "0.09000", "0.13000", "-0.03000",
"-0.08000", "-0.01000", "0.050000", "0.020000", "0.050000", "-0.005390",
"0.020000", "0.008080", "0.060000", "-0.030000", "-0.020000",
"-0.000406", "0.50", "0.11", "0.06", "0.19", "0.21", "0.32",
"0.15", "0.17", "0.14", "0.03", "-0.08000", "-0.11000", "-0.07000",
"0.03000", "-0.04000", "0.02000", "-0.00444", "-0.07000", "-0.13000",
"-0.11000", "0.25000", "0.15000", "0.22000", "0.74000", "0.39000",
"0.36000", "-0.08000", "0.18000", "0.00865", "0.43000"), .Dim = c(10L,
17L), .Dimnames = list(c("Sumo2", "Cdc37", "Copb2", "Vcp", "Ube2d3",
"Becn1", "Lypla2", "Arf1", "Gdi2", "Copb1"), c("No", "Entrez",
"Symbol", "Probe_id", "AA_HD_24h_FoldChange", "AAF_HD_24h_FoldChange",
"APAP_HD_24h_FoldChange", "BBZ_HD_24h_FoldChange", "BCT_HD_24h_FoldChange",
"BEA_HD_24h_FoldChange", "CBP_HD_24h_FoldChange", "CCL4_HD_24h_FoldChange",
"CPA_HD_24h_FoldChange", "CSP_HD_24h_FoldChange", "DEN_HD_24h_FoldChange",
"LS_HD_24h_FoldChange", "PCT_HD_24h_FoldChange")))
And here I added the file I use for debugging. This is the data it reads into expr.table.
No Entrez Symbol Probe_id AA_HD_24h_FoldChange AAF_HD_24h_FoldChange APAP_HD_24h_FoldChange BBZ_HD_24h_FoldChange BCT_HD_24h_FoldChange BEA_HD_24h_FoldChange CBP_HD_24h_FoldChange CCL4_HD_24h_FoldChange CPA_HD_24h_FoldChange CSP_HD_24h_FoldChange DEN_HD_24h_FoldChange LS_HD_24h_FoldChange PCT_HD_24h_FoldChange
1 690244 Sumo2 1367452_at 1.02 0.19 0.26 0.59 0.05 -0.03 0.03 0.04 0.05 0.05 0.5 -0.08 0.25
2 114562 Cdc37 1367453_at -1.04 0.11 0.19 0.2 0.12 0.08 -0.06 -0.02 0.05 0.02 0.11 -0.11 0.15
3 60384 Copb2 1367454_at 1.03 -4.25E-003 -0.03 -0.16 -5.22E-003 0.09 0.09 0.21 8.97E-003 0.05 0.06 -0.07 0.22
4 116643 Vcp 1367455_at -0.12 0.52 0.35 0.28 0.58 0.08 9.40E-003 0.27 0.09 -5.39E-003 0.19 0.03 0.74
5 81920 Ube2d3 1367456_at -0.02 0.46 0.34 0.46 0.23 0.11 0.11 0.08 0.2 0.02 0.21 -0.04 0.39
6 114558 Becn1 1367457_at -0.03 0.42 0.07 -0.15 0.06 0.03 -0.09 0.12 0.09 8.08E-003 0.32 0.02 0.36
7 83510 Lypla2 1367458_at 0.09 0.2 1.56E-003 1.68E-003 0.01 -0.08 0.04 0.06 0.13 0.06 0.15 -4.44E-003 -0.08
8 64310 Arf1 1367459_at 0.05 0.05 0.12 0.23 0.07 0.02 -0.04 0.26 -0.03 -0.03 0.17 -0.07 0.18
9 29662 Gdi2 1367460_at -0.09 0.21 0.08 -0.01 -0.11 -0.05 -0.09 0.04 -0.08 -0.02 0.14 -0.13 8.65E-003
10 114023 Copb1 1367461_at 0.16 0.37 0.16 0.1 0.23 0.06 0.01 0.4 -0.01 -4.06E-004 0.03 -0.11 0.43
11 29156 Capns1 1367462_at -0.23 0.32 0.11 0.13 -0.38 -0.15 -0.08 0.15 -0.18 0.2 0.13 -0.18 0.09
12 114766 Phb2 1367463_at 1.01E-003 0.29 0.41 0.59 0.05 -0.07 -0.13 -0.18 -0.28 -0.21 -0.22 -0.2 0.39
13 84401 Puf60 1367464_at -0.05 0.33 0.14 0.3 0.03 0.02 8.96E-003 2.96E-003 -8.63E-003 -0.13 0.07 -0.15 0.44
14 192275 Dad1 1367465_at 0.22 -0.21 -0.19 -0.24 -0.47 -0.01 -0.09 0.68 -0.06 -0.08 0.02 -0.29 -0.25
401 116727 n/a 1367852_s_at -0.34 -0.12 -0.06 -0.11 0.13 0.03 0.07 -0.18 0.08 -0.2 0.04 -0.04 0.06
The rownames are filled with the gene symbols, e.g. Sumo2 for No 1.
What the script should do (and does) is change the name of entry No 401 from n/a to n/a (116727). However, the aforementioned warning occurs, and I want to understand what's going on here.
I assume you are using a data.frame called foo. Underneath the hood, a data.frame is a list of vectors each of which is of the same length.
So foo[2] refers to the second column of foo as a dataframe, foo[,2] refers to the second column of foo as a vector. rownames(foo) is a vector and its second term is rownames(foo)[2]
If you want the second column of foo as a dataframe then you can use foo[2] or foo[,2,drop=FALSE] and print(rownames(foo[2])) will give you the same result as print(rownames(foo))
If you want the second row of foo as a dataframe then you need a comma as in foo[2,] and print(rownames(foo[2,])) will give you the same result as print(rownames(foo)[2])
If you want to change the name of the second row of foo in the original foo dataframe then try something like:
rownames(foo)[2] = "example of new name for row 2"
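One caveat: the str() output in the question shows that expr.table is a character matrix (chr [1:15, 1:17]), not a data.frame. On a matrix, m[2] is just the second element of the underlying vector, which carries no dimnames, so rownames(m[2]) is NULL either way. The sketch below uses a tiny made-up matrix (two rows taken from the question's data, names chosen only for illustration) to show this, and a vectorised paste0() call, which should avoid the "number of items to replace" warning because the replacement has the same length as the index.

```r
# Tiny stand-in for expr.table: a character matrix with gene symbols as row names.
m <- matrix(c("1", "401",
              "690244", "116727",
              "Sumo2", "n/a"),
            nrow = 2,
            dimnames = list(c("Sumo2", "n/a"), c("No", "Entrez", "Symbol")))

# m[2] is a single element of the matrix, not a row: it has no row names.
print(rownames(m[2]))   # NULL
# Index the vector of row names instead.
print(rownames(m)[2])

# Rows whose symbol is missing.
no.symbol.idx <- which(rownames(m) == "n/a")

# paste0() is vectorised, so no helper vectors or loops are needed, and the
# replacement has exactly length(no.symbol.idx) elements.
rownames(m)[no.symbol.idx] <- paste0("n/a (", m[no.symbol.idx, "Entrez"], ")")
print(rownames(m))
```

In the original script, c1 and c2 have one element per row of expr.table, so paste() returns 15 strings for a single position; R uses the first one and warns about the rest.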

Extracting rows based on the value

I have a tab-delimited text file which contains the following columns:
Probe A_sig A_Pval
ILMN_122 12.31 0.04
ILMN_456 56.12 0
ILMN_198 981.2 0.06
ILMN_980 876.0 0.001
ILMN_542 123.9 0.16
ILMN_567 134.1 0
ILMN_452 213.4 0.98
ILMN_142 543.8 0.04
ILMN_765 187.4 0.05
Now I want to extract those rows which have Pval < .05. The output should look like
Probe A_sig A_Pval
ILMN_122 12.31 0.04
ILMN_980 876.0 0.001
ILMN_142 543.8 0.04
Can anyone please help me?
I'll answer this but it's a basic question that is probably repeated elsewhere on this list.
Load data.
DAT <- read.table(text="Probe A_sig A_Pval
ILMN_122 12.31 0.04
ILMN_456 56.12 0
ILMN_198 981.2 0.06
ILMN_980 876.0 0.001
ILMN_542 123.9 0.16
ILMN_567 134.1 0
ILMN_452 213.4 0.98
ILMN_142 543.8 0.04
ILMN_765 187.4 0.05", h=T)
You can use indexing as in:
DAT[DAT$A_Pval <.05, ]
However, this returns the zero values as well, which isn't what your output looks like. If you don't want the zeros, add a second condition with the logical operator &, as in:
DAT[DAT$A_Pval <.05 & DAT$A_Pval!=0, ]
I suggest you take a look at some manuals and this (LINK) reference card to help get you started.
my_dataframe[my_dataframe$A_Pval < 0.05,]
The trailing comma is important.
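To make the role of the comma concrete: inside [ , ] the part before the comma picks rows and the part after it picks columns, so leaving the column part empty keeps every column. A minimal sketch with the data from the question follows; subset() is shown only as an alternative spelling of the same filter, and the variable names are made up for the example.

```r
DAT <- read.table(text = "Probe A_sig A_Pval
ILMN_122 12.31 0.04
ILMN_456 56.12 0
ILMN_198 981.2 0.06
ILMN_980 876.0 0.001
ILMN_542 123.9 0.16
ILMN_567 134.1 0
ILMN_452 213.4 0.98
ILMN_142 543.8 0.04
ILMN_765 187.4 0.05", header = TRUE, stringsAsFactors = FALSE)

# Logical vector with one entry per row: below 0.05 but not exactly zero.
keep <- DAT$A_Pval < 0.05 & DAT$A_Pval != 0

# Rows before the comma, nothing after it: keep all columns of the chosen rows.
rows_by_index <- DAT[keep, ]

# The same filter written with subset(), which looks up column names in DAT.
rows_by_subset <- subset(DAT, A_Pval < 0.05 & A_Pval != 0)

print(rows_by_index)
```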

Colorize/highlight values of R ftable() output in knitr/Sweave reports

I am generating a lot of ftable() crosstabulations for a descriptive report. Example:
AUS BEL BUL EST FRA GEO GER HUN ITA NET NOR ROM RUS
30- primary 0.06 0.03 0.07 0.03 0.02 0.03 0.03 0.02 0.05 0.03 0.05 0.04 0.02
secondary 0.30 0.09 0.16 0.10 0.10 0.14 0.10 0.16 0.11 0.08 0.08 0.09 0.11
tertiary 0.05 0.07 0.04 0.05 0.07 0.06 0.02 0.04 0.02 0.05 0.06 0.02 0.09
30+ primary 0.07 0.16 0.12 0.07 0.16 0.03 0.05 0.11 0.35 0.21 0.09 0.17 0.03
secondary 0.40 0.20 0.30 0.29 0.25 0.35 0.35 0.34 0.27 0.20 0.27 0.34 0.26
tertiary 0.13 0.23 0.13 0.18 0.17 0.17 0.18 0.09 0.09 0.23 0.23 0.06 0.24
60+ primary 0.00 0.12 0.10 0.13 0.14 0.07 0.05 0.12 0.09 0.11 0.06 0.19 0.12
secondary 0.00 0.05 0.05 0.08 0.06 0.10 0.14 0.09 0.02 0.04 0.11 0.07 0.06
tertiary 0.00 0.05 0.03 0.06 0.03 0.04 0.07 0.03 0.01 0.05 0.06 0.02 0.07
I am looking for a function that could take the ftable() or table() output and highlight values that deviate from the row mean, or assign an overall gradient to the text of the values, e.g. from 0-100% the values are coloured from red to green.
The output is now processed through knitr, but I'm not sure at which point in the toolchain I could intervene and add colour based on the relative size of the values.
You can use the latex function, in the Hmisc package.
# Example shamelessly copied from http://www.karlin.mff.cuni.cz/~kulich/vyuka/Rdoc/harrell-R-latex.pdf
cat('
\\documentclass{article}
\\usepackage[table]{xcolor}
\\begin{document}
<<results=tex>>=
library(Hmisc)
d <- head(iris)
cellTex <- matrix(rep("", nrow(d) * ncol(d)), nrow=nrow(d))
cellTex[2,2] <- "cellcolor{red}"
cellTex[2,3] <- "color{red}"
cellTex[5,1] <- "rowcolor{yellow}"
latex(d, file = "", cellTexCmds = cellTex, rowname=NULL)
@
\\end{document}',
file="tmp.Rnw" )
Sweave("tmp.Rnw")
library(utils)
texi2pdf("tmp.tex")
To generate latex tables from R objects, you can use the xtable package. It is available on CRAN, take a look at the documentation. To get the color in the table, use the color latex package. Some example code:
library(xtable)
n = 100
cat_country = c("NL","BE","HU")
cat_prim = c("primary","secondary","tertiary")
dat = data.frame(country = sample(cat_country, n, replace = TRUE),
prim = sample(cat_prim, n, replace = TRUE))
ftable_dat = ftable(dat)
## Make latex table:
latex_table = xtable(as.table(ftable_dat))
To get what you want I made the following (ugly) hack. The trick is to print the xtable object and then edit the result:
latex_table = within(latex_table, {
# browser()
primary = ifelse(primary > 12, sprintf("\\textbf{%s}", primary), primary)
#primary = sub("\\{", "{", primary)
})
printed_table = print(latex_table)
printed_table = sub("backslash", "\\", printed_table)
printed_table = sub("\\\\}", "}", printed_table)
printed_table = sub("\\\\\\{", "{", printed_table)
printed_table = sub("\\$", "\\", printed_table)
printed_table = sub("\\$", "\\", printed_table)
cat(printed_table)
Which leads to:
% latex table generated in R 2.14.1 by xtable 1.6-0 package
% Thu Feb 16 13:10:55 2012
\begin{table}[ht]
\begin{center}
\begin{tabular}{rrrr}
\hline
& primary & secondary & tertiary \\
\hline
BE & 10 & 5 & 11 \\
HU & \textbf{13} & 13 & 8 \\
NL & 11 & 17 & 12 \\
\hline
\end{tabular}
\end{center}
\end{table}
This example makes a number in the primary category bold, but it can work for colorization just as easily. Maybe someone else has a more elegant solution?
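For the red-to-green gradient asked about in the question, one possible starting point is to build the cell contents yourself in base R and hand them to whichever table printer you use. The sketch below is only an illustration under a few assumptions: gradient_cells() is a made-up helper, the LaTeX document loads xcolor with the table option (as in the Sweave example above), and the values are not all identical (otherwise the scaling divides by zero). It maps the smallest value to red and the largest to green using colorRamp() from grDevices.

```r
# Hypothetical helper: wrap each value of a numeric matrix in a \cellcolor
# command whose colour runs from `low` (minimum) to `high` (maximum).
gradient_cells <- function(m, low = "red", high = "green") {
  ramp <- colorRamp(c(low, high))           # maps [0, 1] to RGB in 0..255
  rng <- range(m)
  scaled <- (m - rng[1]) / (rng[2] - rng[1])
  rgb01 <- ramp(as.vector(scaled)) / 255    # one row of r, g, b per cell
  cells <- sprintf("\\cellcolor[rgb]{%.2f,%.2f,%.2f}%.2f",
                   rgb01[, 1], rgb01[, 2], rgb01[, 3], as.vector(m))
  matrix(cells, nrow = nrow(m), dimnames = dimnames(m))
}

# Four values from the "30-" block of the table in the question.
m <- matrix(c(0.06, 0.30, 0.03, 0.09), nrow = 2,
            dimnames = list(c("primary", "secondary"), c("AUS", "BEL")))

cells <- gradient_cells(m)

# Emit the body rows of a tabular environment.
for (r in rownames(cells)) {
  cat(r, " & ", paste(cells[r, ], collapse = " & "), " \\\\\n", sep = "")
}
```

Scaling per row instead (to highlight deviation from the row mean) would only change how scaled is computed.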
