Specify monospace font in `menu` - r

Language: R. Question: Can I specify fixed width font for the menu(..,graphics=T) function?
Explanation:
I recently asked this question on how to have a user select a row of a data frame interactively:
df <- data.frame(a=c(9,10),b=c('hello','bananas'))
df.text <- apply( df, 1, paste, collapse=" | " )
menu(df.text,graphics=T)
I'd like the | to line up. They don't at the moment; fair enough, I haven't padded out the columns to the same width. So I use format to get every column to the same width (later I'll write code to automagically determine the width per column, but let's ignore that for now):
df.padded <- apply(df,2,format,width=8)
df.padded.text <- apply( df.padded, 1, paste, collapse=" | ")
menu( df.padded.text,graphics=T )
See how it's still wonky? Yet, if I look at df.padded, I get:
> df.padded
     a          b
[1,] " 9      " "hello   "
[2,] "10      " "bananas "
So each cell is definitely padded out to the same length.
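For the "determine the width per column" part, here is a minimal sketch, assuming the columns are plain numeric or character vectors. It relies on format() padding every element of a vector to the width of the widest one, so no fixed width=8 is needed. The names padded and rows are just illustrative.

```r
df <- data.frame(a = c(9, 10), b = c("hello", "bananas"))

# format() on a whole character vector pads each element (on the right)
# to the width of the longest element in that vector.
padded <- lapply(df, function(col) format(as.character(col)))

# Paste the padded columns together row by row.
rows <- do.call(paste, c(unname(padded), sep = " | "))
print(rows)
```

Every string in rows has the same number of characters, so the | separators line up as soon as the text is shown in a fixed-width font.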
The reason for this is probably because the default font for this (on my system anyway, Linux) is not fixed width.
So my question is:
Can I specify fixed width font for the menu(..,graphics=T) function?
Update
@RichieCotton noticed that if you look at menu with graphics=T it calls select.list, which in turn calls tcltk::tk_select.list.
So it looks like I'll have to modify tcltk options for this. From @jverzani:
library(tcltk)
tcl("option", "add", "*Listbox.font", "courier 10")
menu(df.padded.text,graphics=T)
Given that menu(...,graphics=T) calls tcltk::tk_select.list when graphics is TRUE, my guess is that this is a viable option, as any distro that would be capable of displaying the graphical menu in the first place would also have tcltk on it, since it needs to call tk_select.list.
(As an aside, I can't find anything in the documentation that would give me the hint to try tcl('option','add',...), let alone that the option was called *Listbox.font!)
Another update -- had a closer look at the select.list and menu code, and it turns out on Windows (or if .Platform$GUI=='AQUA' -- is that Mac?), the tcltk::tk_select.list isn't called at all, and it's just some internal code instead. So modifying '*Listbox.font' won't affect this.
I guess I'll just:
if tcltk is there, load it, set the *Listbox.font to courier, and use tcltk::tk_select.list explicitly
if it isn't there, try menu(...,graphics=T) to at least get a graphical interface (which won't be monospace, but is better than nothing)
if that fails too, then just fall back to menu(...,graphics=F), which will definitely work.
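A rough sketch of that three-step fallback, under a few assumptions: choose_item is a hypothetical helper name, the font string "courier 10" is the one from the snippet above, and the session is interactive (menu() refuses to run otherwise). It returns the index of the chosen element, or 0 if nothing was chosen, like menu() does.

```r
choose_item <- function(choices, title = NULL) {
  # Step 1: Tk list box with a monospace font, if tcltk is usable.
  if (interactive() && capabilities("tcltk") &&
      requireNamespace("tcltk", quietly = TRUE)) {
    picked <- tryCatch({
      tcltk::tcl("option", "add", "*Listbox.font", "courier 10")
      tcltk::tk_select.list(choices, title = title)
    }, error = function(e) NULL)
    # tk_select.list returns the chosen string; convert it to an index.
    if (!is.null(picked)) return(match(picked, choices, nomatch = 0L))
  }
  # Step 2: whatever graphical menu the platform offers (font not controlled).
  picked <- tryCatch(menu(choices, graphics = TRUE, title = title),
                     error = function(e) NULL)
  if (!is.null(picked)) return(picked)
  # Step 3: plain text menu.
  menu(choices, graphics = FALSE, title = title)
}
```

Usage would be something like df[choose_item(df.padded.text), ]. Note that match() returns the first matching row if two rows produce identical text.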
Thanks all.

Another approach to padding:
na.pad <- function(x,len){
x[1:len]
}
makePaddedDataFrame <- function(l,...){
maxlen <- max(sapply(l,length))
data.frame(lapply(l,na.pad,len=maxlen),...)
}
x = c(rep("one",2))
y = c(rep("two",10))
z = c(rep("three",5))
makePaddedDataFrame(list(x=x,y=y,z=z))
The na.pad() function exploits the fact that R will automatically pad a vector with NAs if you try to index non-existent elements.
makePaddedDataFrame() just finds the longest one and pads the rest up to a matching length.
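To see the indexing behaviour that na.pad() relies on in isolation, here is a tiny example (the variable names are only illustrative):

```r
x <- c("one", "one")

# Indexing past the end of a vector yields NA for the missing positions.
padded <- x[1:5]
print(padded)

# Assigning a longer length has the same effect.
y <- x
length(y) <- 5
```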

I don't understand why you don't want to use View(df) (get the rowid, put the contents into temp. data frame and display it with the View command)
Edit: well, just use sprintf command
Create a function f to extract the strings from the data frame object
f <- function(x,sep1) {
sep1=format(sep1,width=8)
xa<-gsub(" ","",as.character(x[1]))
a1 <- nchar(xa)
xa=format(xa,width=8)
xb=gsub(" ","",as.character(x[2]))
b1 <- nchar(xb)
xb=format(xb,width=8)
format1=paste("%-",10-a1,"s%s%-",20-b1,"s",sep="")
concat=sprintf(format1,xa,sep1,xb)
concat
}
df <- data.frame(a=c(9,10),b=c('hello','bananas'))
df.text <- apply( df, 1, f,sep1="|")
menu(df.text,graphics=T)
The limits 10 and 20 used in the sprintf format are the maximum number of characters expected in the data frame columns a and b. Change them to fit your data.
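If you would rather not hard-code those limits, one possible variation (a sketch, not the code above) is to compute the widths from the data and build the sprintf format string from them. The names wa, wb and fmt are made up for this example.

```r
df <- data.frame(a = c(9, 10), b = c("hello", "bananas"))
a <- as.character(df$a)
b <- as.character(df$b)

# Widest entry in each column.
wa <- max(nchar(a))
wb <- max(nchar(b))

# "%-Ns" left-justifies a string in a field of N characters.
fmt <- paste0("%-", wa, "s | %-", wb, "s")
rows <- sprintf(fmt, a, b)
print(rows)
```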

Related

How to access and modify a sibling of a Tcl/Tk object in R

In short:
I use the tcltk package in R, but non-R users are welcome to suggest ideas and provide examples in languages other than R.
I have a list of Tcl/Tk objects in R <Tcl> .1.1.1.1 .1.1.1.2 .1.1.1.3 and want to access and modify each object separately. How can I do it?
Or, if I have the button A object, how can I access and modify the button B object?
Details:
I have the following widget:
The code to create it:
library(tcltk)
top <- tktoplevel()
frame_1 <- tkframe(top)
tkgrid(frame_1)
frame_n <- tkframe(frame_1)
tkgrid(frame_n)
b1 <- ttkbutton(frame_n, text = "button A")
b2 <- ttkbutton(frame_n, text = "button B")
b3 <- ttkbutton(frame_n, text = "button c")
tkgrid(b1, b2, b3)
Let's say I can access only b1:
class(b1)
# [1] "tkwin"
I want to access and modify the siblings of b1 as if I had objects b2, etc. (for example):
tkcget(b2, "-text") # Get text
tkconfigure(b2, text = "New B") # Change text
By using tkwinfo, I managed to access the parent of b1 and get a list of siblings (I'm not sure if technically it is a "list"), but I don't know how to access/modify each of them one by one:
(parent_of_b1 <- tkwinfo("parent", b1))
# <Tcl> .1.1.1
(siblings_of_b1 <- tkwinfo("children", parent_of_b1))
# <Tcl> .1.1.1.1 .1.1.1.2 .1.1.1.3
class(siblings_of_b1)
# "tclObj"
My attempt results in error:
tkcget(siblings_of_b1, "-text")
# Error in structure(.External(.C_dotTclObjv, objv), class = "tclObj") :
# [tcl] invalid command name ".1.1.1.1 .1.1.1.2 .1.1.1.3".
Most probably I don't know the way to subset the object. How can I do it?
UPDATE: based on the comments of @Donal Fellows, I found the solution.
Function as.character() does the job.
(my_tcl_object <- tkwinfo("children", parent_of_b1))
# <Tcl> .1.1.1.1 .1.1.1.2 .1.1.1.3
as.character(my_tcl_object)
## [1] ".1.1.1.1" ".1.1.1.2" ".1.1.1.3"
In this situation, tclvalue() + strsplit() works as well:
strsplit(tclvalue(my_tcl_object), " ", fixed = TRUE)[[1]]
## [1] ".1.1.1.1" ".1.1.1.2" ".1.1.1.3"
But, in general (for other problems), as.character() vs. tclvalue() + strsplit() may give different results.
The issue is that the winfo children subcommand (using the underlying Tcl name) returns a Tcl list of widget identifiers. In general, this is a bit messy to deal with from other languages (because of potential issues with handling quoting rules) but because the generated widget identifiers just contain ASCII digits and . characters and the separators are just single spaces, simply splitting by space will give you the right thing.
(siblings_of_b1 <- strsplit(tclvalue(tkwinfo("children", parent_of_b1)), " ", fixed = TRUE)[[1]])
You'll need to iterate over the resulting list, of course. Multiple siblings are multiple siblings. (Also, don't forget that this includes b1 itself; you've not asked for the actual siblings, but rather the children of the parent.)
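A sketch of that iteration, with some assumptions: sibling_ids is a hypothetical helper, and the GUI part only runs in an interactive session with Tk available. It uses the fact that a widget path such as .1.1.1.2 is itself a Tcl command, so tcl(id, "cget", ...) and tcl(id, "configure", ...) work on the ID string directly.

```r
# Drop the widget's own ID from the parent's children to get true siblings.
sibling_ids <- function(children, self) setdiff(children, self)

# Plain strings stand in for as.character(tkwinfo("children", parent)).
ids  <- c(".1.1.1.1", ".1.1.1.2", ".1.1.1.3")
sibs <- sibling_ids(ids, ".1.1.1.1")

# With a live Tk session the same helper can drive the widgets.
if (interactive() && capabilities("tcltk")) {
  library(tcltk)
  top     <- tktoplevel()
  frame_n <- tkframe(top)
  tkgrid(frame_n)
  b1 <- ttkbutton(frame_n, text = "button A")
  b2 <- ttkbutton(frame_n, text = "button B")
  tkgrid(b1, b2)

  children <- as.character(tkwinfo("children", tkwinfo("parent", b1)))
  for (id in sibling_ids(children, b1$ID)) {
    old <- tclvalue(tcl(id, "cget", "-text"))       # read the current label
    tcl(id, "configure", text = paste("New", old))  # change it
  }
}
```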

Swap names and content of a (lookup) vector in a one-liner / library function

In my code I use lookup tables quite often, for example to have more verbose versions of column names in a data frame. For instance:
lkp <- c(speed = "Speed in mph", dist = "Stopping Distance in ft")
makePlot <- function(x = names(cars)) {
x <- match.arg(x)
hist(cars[[x]], xlab = lkp[[x]])
}
Now it happens that I want to reverse the lookup vector [*], which is easily done by
setNames(names(lkp), lkp)
If lkp is a bit more complicated, this becomes quite a lot of typing:
setNames(names(c(firstLkp, secondLkp, thirdLkp, youGotTheIdea)),
c(firstLkp, secondLkp, thirdLkp, youGotTheIdea))
with a lot of redundant code. Of course I could create a temporary variable
fullLkp <- c(firstLkp, secondLkp, thirdLkp, youGotTheIdea)
setNames(names(fullLkp), fullLkp)
Or even write a simple function doing it for me
swap_names_content <- function(x) setNames(names(x), x)
However, since this seems to me to be such a common task, I was wondering whether there is already a function in one of the popular packages doing the same?
[*] A common use case for me is the use of shiny's selectInput for instance:
List of values to select from. If elements of the list are named, then that name rather than the value is displayed to the user.
That is, it is exactly the reverse of my typical lookup table.
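For concreteness, this is what the helper does to the lkp vector from the top of the question. It is only a worked example of the function already shown, not a pointer to a package function.

```r
lkp <- c(speed = "Speed in mph", dist = "Stopping Distance in ft")
swap_names_content <- function(x) setNames(names(x), x)

# Verbose labels become the names, short keys become the values.
choices <- swap_names_content(lkp)
print(choices)

# Swapping twice gives back the original keys and labels.
back <- swap_names_content(choices)
```

The round trip only works when the values are unique; duplicated values would turn into duplicated names.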

R Sweave: digits number in xtable of prop.table

I'm making an xtableFtable in R Sweave and can't find a way to suppress the digits with this code. What am I doing wrong? I've read that it can happen if your values aren't numeric but factor or character, but is prop.table making them non-numeric? I'm lost...
library(xtable)
a <- ftable(prop.table(table(mtcars$mpg, mtcars$hp), margin=2)*100)
b <- xtableFtable(a, method = "compact", digits = 0)
print.xtableFtable(b, rotate.colnames = TRUE)
I've already tried with digits=c(0,0,0,0...) too.
You could use options(digits) to control how many digits will print. Try something like options(digits = 4) as the first line of your code (change 4 to whatever value you want between 1 and 22). See ?options for more information.
Or round the values before printing
a = round(ftable(prop.table(table(mtcars$mpg, mtcars$hp), margin=2)*100), 2)
b = xtableFtable(a, method = "compact")
print.xtableFtable(b, rotate.colnames = TRUE)
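A base-R check of the rounding idea, without xtable, in case it helps to see that round() leaves the ftable structure intact. This is only a sketch: it uses cyl and gear instead of mpg and hp to keep the table small, and rounds to 0 digits since the goal was to suppress the decimals.

```r
tab <- ftable(prop.table(table(mtcars$cyl, mtcars$gear), margin = 2) * 100)

# round() keeps the attributes, so the result is still an ftable
# that can be handed to xtableFtable() afterwards.
rounded <- round(tab, 0)
print(rounded)
```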
The "digits" argument to xtableFtable seems to be unimplemented (as of my version, which is 1.8.3), since after playing around with it for half an hour nothing seems to make any difference.
There's a hint to this effect in the function documentation:
It is not recommended that users change the values of align, digits or align. First of all, alternative values have not been tested. Secondly, it is most likely that to determine appropriate values for these arguments, users will have to investigate the code for xtableFtable and/or print.xtableFtable.
It's probably just carried over from the xtable function (on which xtableFtable is surely based) as a TODO which the maintainer hasn't gotten around to yet.

Dynamically call dataframe column & conditional replacement in R

First question post. Please excuse any formatting issues that may be present.
What I'm trying to do is conditionally replace a factor level in a dataframe column. The reason is a Unicode difference between a right single quotation mark (U+2019) and an apostrophe (U+0027).
All of the columns that need this replacement begin with "INN8", so I'm using
grep("INN8", colnames(demoDf)) -> apostropheFixIndices
for(i in apostropheFixIndices) {
levels(demoDfFinal[i]) <- c(levels(demoDf[i]), "I definitely wouldn't")
(insert code here)
}
to get the indices in order to perform the conditional replacement.
I've looked at a number of questions about naming variables on the fly and about assigning values to dynamic variables, explored the R-FAQ on turning a string into a variable, and looked into Ari Friedman's suggestion that named elements in a list are preferred. However, I'm unsure about the execution as well as the significance of the best-practice suggestion.
I know I need to do something along the lines of
demoDf$INN8xx[demoDf$INN8xx=="I definitely wouldn’t"] <- "I definitely wouldn't"
but the iterations I've tried so far haven't worked.
Thank you for your time!
If I understand you correctly, then you don't want to rename the columns. Then this might work:
demoDf <- data.frame(A=rep("I definitely wouldn’t",10) , B=rep("I definitely wouldn’t",10))
newDf <- apply(demoDf, 2, function(col) {
gsub(pattern="’", replacement = "'", x = col)
})
It just checks all columns for the wrong symbol. Note that apply() returns a character matrix rather than a data frame.
Or if you have a vector containing the column indices you want to check then you could go with
# Let's say you identified columns 2, 5 and 8
cols <- c(2,5,8)
sapply(cols, function(col) {
demoDf[,col] <<- gsub(pattern="’", replacement = "'", x = demoDf[,col])
})
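If the columns are factors and should stay factors, another option is to rewrite the levels directly. This is a sketch under assumptions: the column names INN8a, INN8b and other are invented for the example, and "\u2019" is just the escaped form of the right single quotation mark.

```r
demoDf <- data.frame(
  INN8a = factor(c("I definitely wouldn\u2019t", "Maybe")),
  INN8b = factor(c("Maybe", "I definitely wouldn\u2019t")),
  other = c("don\u2019t touch", "me"),
  stringsAsFactors = FALSE
)

# Only columns whose names start with "INN8".
fix_cols <- grep("^INN8", names(demoDf))

demoDf[fix_cols] <- lapply(demoDf[fix_cols], function(col) {
  if (is.factor(col)) {
    # Replace the character inside the level labels; the data stay a factor.
    levels(col) <- gsub("\u2019", "'", levels(col), fixed = TRUE)
    col
  } else {
    gsub("\u2019", "'", col, fixed = TRUE)
  }
})
```

Because the replacement happens on the levels, no new level has to be added first, and a level that already exists with the plain apostrophe is simply merged.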

How to perform basic Multiple Sequence Alignments in R?

(I've tried asking this on BioStars, but for the slight chance that someone from text mining would think there is a better solution, I am also reposting this here)
The task I'm trying to achieve is to align several sequences.
I don't have a basic pattern to match to. All that I know is that the "True" pattern should be of length "30" and that the sequences I have had missing values introduced to them at random points.
Here is an example of such sequences, where on the left we see the real location of the missing values, and on the right we see the sequence that we will be able to observe.
My goal is to reconstruct the left column using only the sequences I've got on the right column (based on the fact that many of the letters in each position are the same)
Real_sequence The_sequence_we_see
1 CGCAATACTAAC-AGCTGACTTACGCACCG CGCAATACTAACAGCTGACTTACGCACCG
2 CGCAATACTAGC-AGGTGACTTCC-CT-CG CGCAATACTAGCAGGTGACTTCCCTCG
3 CGCAATGATCAC--GGTGGCTCCCGGTGCG CGCAATGATCACGGTGGCTCCCGGTGCG
4 CGCAATACTAACCA-CTAACT--CGCTGCG CGCAATACTAACCACTAACTCGCTGCG
5 CGCACGGGTAAGAACGTGA-TTACGCTCAG CGCACGGGTAAGAACGTGATTACGCTCAG
6 CGCTATACTAACAA-GTG-CTTAGGC-CTG CGCTATACTAACAAGTGCTTAGGCCTG
7 CCCA-C-CTAA-ACGGTGACTTACGCTCCG CCCACCTAAACGGTGACTTACGCTCCG
Here is an example code to reproduce the above example:
ATCG <- c("A","T","C","G")
set.seed(40)
original.seq <- sample(ATCG, 30, T)
seqS <- matrix(original.seq,200,30, T)
change.letters <- function(x, number.of.changes = 15, letters.to.change.with = ATCG)
{
number.of.changes <- sample(seq_len(number.of.changes), 1)
new.letters <- sample(letters.to.change.with , number.of.changes, T)
where.to.change.the.letters <- sample(seq_along(x) , number.of.changes, F)
x[where.to.change.the.letters] <- new.letters
return(x)
}
change.letters(original.seq)
insert.missing.values <- function(x) change.letters(x, 3, "-")
insert.missing.values(original.seq)
seqS2 <- t(apply(seqS, 1, change.letters))
seqS3 <- t(apply(seqS2, 1, insert.missing.values))
seqS4 <- apply(seqS3,1, function(x) {paste(x, collapse = "")})
require(stringr)
# library(help=stringr)
all.seqS <- str_replace_all(seqS4, "-", "")  # remove every gap marker, not just the first
# how do we align this?
data.frame(Real_sequence = seqS4, The_sequence_we_see = all.seqS)
I understand that if all I had was a string and a pattern I would be able to use
library(Biostrings)
pairwiseAlignment(...)
But in the case I present we are dealing with many sequences to align to one another (instead of aligning them to one pattern).
Is there a known method for doing this in R?
Writing an alignment algorithm in R looks like a bad idea to me, but there is an R interface to the MUSCLE algorithm in the bio3d package (function seqaln()). Be aware of the fact that you have to install this algorithm first.
Alternatively, you can use any of the available algorithms (e.g. ClustalW, MAFFT, T-COFFEE) and import the multiple sequence alignments into R using Bioconductor functionality.
Though this is quite an old thread, I do not want to miss the opportunity to mention that, since Bioconductor 3.1, there is a package 'msa' that implements interfaces to three different multiple sequence alignment algorithms: ClustalW, ClustalOmega, and MUSCLE. The package runs on all major platforms (Linux/Unix, Mac OS, and Windows) and is self-contained in the sense that you need not install any external software. More information can be found on http://www.bioinf.jku.at/software/msa/ and http://www.bioconductor.org/packages/release/bioc/html/msa.html.
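A minimal sketch of what a call might look like, assuming the msa and Biostrings packages are installed from Bioconductor. The three input strings are taken from the observed sequences in the question, and the block does nothing if the packages are missing.

```r
# Three of the observed (gap-free) sequences from the question.
seqs <- c("CGCAATACTAACAGCTGACTTACGCACCG",
          "CGCAATACTAGCAGGTGACTTCCCTCG",
          "CGCAATGATCACGGTGGCTCCCGGTGCG")

if (requireNamespace("msa", quietly = TRUE) &&
    requireNamespace("Biostrings", quietly = TRUE)) {
  library(msa)
  dna <- Biostrings::DNAStringSet(seqs)
  # method can also be "ClustalOmega" or "Muscle".
  aligned <- msa(dna, method = "ClustalW")
  print(aligned)
}
```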
You can perform multiple alignment in R with the DECIPHER package.
Following your example, it would look something like:
library(DECIPHER)
dna <- DNAStringSet(all.seqS)
aligned_DNA <- AlignSeqs(dna)
It is fast and at least as accurate as the other methods listed here (see the paper). I hope that helps!
You are looking for a global alignment algorithm on multiple sequences.
Did you look at Wikipedia before asking?
First learn what global alignment is, then look for multiple sequence alignment.
Wikipedia doesn't give a lot of details about algorithms, but this paper is better.
