Using expression in variables - r

I'm trying to label a plot in R with superscript. For example, I have a variable called label:
>label <- colnames(tmp.df);
>label
[1] "ColumnA" "Volume 100mm3", "ColumnC", etc.
I would like to have "3" in the "Volume 100mm3" as superscript in my plot label. I cannot use something like:
label <- c("ColumnA", expression(paste('Volume 100mm'^'3')), "ColumnC");
since the ordering of the column names in tmp.df may change from run to run. So how can I get around this problem?

You could find the one with the mm by
ind <- grep("mm",label)
splt <- strsplit(label[ind], "mm")[[1]]
and then inject the expression via
label[ind] <- parse(text=sprintf("paste('%smm'^'%s')",splt[1],splt[2]))
If there are multiple strings that indicate the need for expressions, then it should be straightforward to adapt this.
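One way the adaptation to several labels might look is sketched below. The helper name to_plotmath and the extra label "Area 20mm2" are made up for illustration. The sketch assumes each label contains "mm" at most once, directly followed by the exponent digits, and contains no single quotes.

```r
# Hypothetical labels; "Area 20mm2" is added here only for illustration.
label <- c("ColumnA", "Volume 100mm3", "Area 20mm2", "ColumnC")

# Turn one label into a length-one expression
# (to_plotmath is an assumed helper name, not an existing function).
to_plotmath <- function(lbl) {
  if (!grepl("mm[0-9]+$", lbl)) {
    return(as.expression(lbl))        # plain labels pass through unchanged
  }
  parts <- strsplit(lbl, "mm")[[1]]   # e.g. "Volume 100" and "3"
  parse(text = sprintf("paste('%smm'^'%s')", parts[1], parts[2]))
}

# Combine the pieces into one expression vector, keeping the column order.
label_expr <- do.call(c, lapply(label, to_plotmath))

# Use individual elements as axis labels.
plot(1:10, xlab = label_expr[2], ylab = label_expr[3])
```

Because every element is converted in place, the result does not depend on where the "mm" columns happen to sit in tmp.df.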

You can use bquote for this. The * connects "Volume 100" to "mm^3" without a space. If you want a space there, you can use ~ instead of *.
plot(1:10, main = bquote(.("Volume 100") * mm^3))

How about just using the Unicode character for cubed, 'SUPERSCRIPT THREE' (U+00B3)? In R, this would be escaped as '100mm\u00B3'. If the label is coming from a data file, just use the Unicode character directly in the file.
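A minimal sketch of the Unicode approach, assuming the labels end in a literal "mm3" and that the graphics device and font can render U+00B3 (the name label_u is made up here):

```r
label <- c("ColumnA", "Volume 100mm3", "ColumnC")

# Replace a trailing "mm3" with "mm" followed by SUPERSCRIPT THREE (U+00B3).
# Labels without that suffix are returned unchanged.
label_u <- sub("mm3$", "mm\u00B3", label)

plot(1:10, main = label_u[2])
```

The result is still an ordinary character vector, so it works anywhere a plain string label is accepted.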

Related

How to match any character existing between a pattern and a semicolon

I am trying to get anything existing between sample_id= and ; in a vector like this:
sample_id=10221108;gender=male
tissue_id=23;sample_id=321108;gender=male
treatment=no;tissue_id=98;sample_id=22
My desired output would be:
10221108
321108
22
How can I get this?
I've been trying several things like this, but I don't find the way to do it correctly:
clinical_data$sample_id<-c(sapply(myvector, function(x) sub("subject_id=.;", "\\1", x)))
You could use sub with a capture group to isolate that which you are trying to match:
out <- sub("^.*\\bsample_id=(\\d+).*$", "\\1", x)
out
[1] "10221108" "321108" "22"
Data:
x <- c("sample_id=10221108;gender=male",
"tissue_id=23;sample_id=321108;gender=male",
"treatment=no;tissue_id=98;sample_id=22")
Note that the actual output above is character, not numeric. But, you may easily convert using as.numeric if you need to do that.
Edit:
If you are unsure that the sample IDs would always be just digits, here is another version you may use to capture any content following sample_id:
out <- sub("^.*\\bsample_id=([^;]+).*$", "\\1", x)
out
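One caveat with sub is that an element without any sample_id comes back unchanged rather than as a missing value. A possible variant using base R's regexec and regmatches is sketched below; the fourth element of x and the name ids are added here purely for illustration.

```r
x <- c("sample_id=10221108;gender=male",
       "tissue_id=23;sample_id=321108;gender=male",
       "treatment=no;tissue_id=98;sample_id=22",
       "gender=female")                 # hypothetical row with no sample_id

# regexec returns the full match plus the capture group for each element;
# regmatches gives character(0) where nothing matched.
hits <- regmatches(x, regexec("\\bsample_id=([^;]+)", x))

# Keep the capture group, or NA when the element had no sample_id.
ids <- vapply(hits,
              function(h) if (length(h) == 2) h[2] else NA_character_,
              character(1))

ids_num <- as.numeric(ids)              # numeric version; NA stays NA
```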
You could try the str_extract function from the stringr package.
If your data is separated by line (one sample_id per element of x), you can do:
str_extract(x, "(?<=\\bsample_id=)[[:digit:]]+") # this tells the extraction to target anything that is preceded by sample_id= and is a series of digits; the + captures all of the digits
This extracts just the first number per element. If your data is instead all collected in a single string, it becomes a tad more difficult because you have to tell the extraction to continue even after it has extracted something. The code would look something like this:
str_extract_all(x, "(?<=sample_id=)\\d+")
This extracts all of the numbers you're looking for, and the output is a list. From there you can manipulate the list as you see fit.

Skip % in R when using sprintf

I am using the sprintf function in R to generate some numbers as labels for a ggplot. The problem is that I want those numbers as percentages, like the following:
sprintf("paste(round(%s*100, 2), '\\%', sep='')", data_plot[1])
As you can see I am using "\" so the sprintf function does not deal with it as a special character but I still receive the following error:
Error in sprintf("paste(round(%s*100, 2), '%', sep='')", names(data_plot [1]) : too few arguments
When I replace the "%" with for example "+" everything works fine. I found some posts regarding this and how I can write a separate function to take care of this, but I am wondering if there is an easier way of doing it.
Note: This line of code is a part of ggplot code so it has to be written like this.
Thanks
You need to use "%%" in sprintf to print a single "%".
You can use either paste or sprintf; you don't need both. So, something like
library(ggplot2)
dat <- data.frame(x=seq(0.01, 1, len=10), y=runif(10))
ggplot(dat, aes(x, y)) +
  geom_point() +
  scale_x_continuous(breaks=dat$x, labels=sprintf("%.2f%%", 100*dat$x))
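The "%%" rule can be checked without ggplot2. The sketch below shows it both for direct formatting and for the asker's pattern of building a code string; the value of x and the names pct, code_string and result are made up for illustration.

```r
x <- 0.4567

# Direct formatting: "%%" in the format string yields one literal "%".
pct <- sprintf("%.2f%%", 100 * x)

# The asker's pattern: build R code as a string, then evaluate it.
# Here "%s" is filled with the variable name and "%%" becomes "%".
code_string <- sprintf("paste(round(%s*100, 2), '%%', sep='')", "x")
result <- eval(parse(text = code_string))

print(pct)
print(code_string)
print(result)
```

A backslash does not help because sprintf, not the R string parser, is what interprets "%"; only "%%" is recognised as an escaped percent sign.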

R - Plot: How to format in base-10 scientific notation and put it in text, mtext, title, etc. functions?

I have a numeric variable, say K=3.5e-5 (its value is calculated throughout my script). I want to write this value somewhere (title, as text in the plot, etc.) in my plot as:
K_{root} = 3.5 10^{-5} cm /d
I have tried the functions bquote and substitute, and neither worked.
Let's put the question in examples. I have tried the following:
1)
png("exp_1.png")
kroot = 3.5e-5
plot(1:10,1:10,
text(4,9,bquote(italic(K[root])~"="~.(kroot)~"cm/d")))
dev.off()
Try my favorite function, paste().
plot(1:10,1:10)
text(4,9,gsub("e",paste("K[root]=",format(k,scientific=TRUE),"cm/d",sep=" "),replacement=" 10^"))
You can replace the "e" here using the function gsub. I've edited my answer to include this.
The output:
> k=.0000035
> k
[1] 3.5e-06
> gsub("e",paste("K[root]=",format(k,scientific=TRUE),"} cm/d",sep=" "),replacement=" 10^{ ")
[1] "K[root]= 3.5 10^{ -06 } cm/d"
You can remove the extra spaces around { -06 } by using the function substr, if it's important, or simply leave out the curly brackets in the gsub statement.
I try to avoid using paste inside expressions. There is generally a cleaner way to approach this:
expon <- floor(log10(kroot)) # returns -5
mantis <- kroot*10^(-1*expon ) # returns 3.5
plot(1:10,1:10)
text(4,9,substitute( italic(K[root]) == mantis %.% pten* expon ~cm/d,
list(expon=expon, mantis=mantis, pten=" 10^")))
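The same mantissa/exponent split can also be passed to bquote so that the exponent is drawn as a real superscript instead of a literal "10^". This is only a sketch: the rounding to three significant digits and the name lab are assumptions made here, not part of the answer above.

```r
kroot <- 3.5e-5

expon  <- floor(log10(kroot))           # exponent: -5
mantis <- signif(kroot / 10^expon, 3)   # mantissa: 3.5 (rounded, an assumption)

# .() splices the numeric values into the plotmath expression;
# %*% draws a multiplication sign and ^ raises the exponent.
lab <- bquote(italic(K[root]) == .(mantis) %*% 10^.(expon) ~ cm/d)

plot(1:10, 1:10)
text(4, 9, lab)
```

The text call is made after plot has finished, so the annotation is drawn on an existing plot region.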

Using column names with signs of a data frame in a qplot

I have a dataset and unfortunately some of the column labels in my dataframe contain signs (- or +). This doesn't seem to bother the dataframe, but when I try to plot this with qplot it throws me an error:
x <- 1:5
y <- x
names <- c("1+", "2-")
mydf <- data.frame(x, y)
colnames(mydf) <- names
mydf
qplot(1+, 2-, data = mydf)
and if I enclose the column names in quotes it will just give me a category (or something to that effect, it'll give me a plot of "1+" vs. "2-" with one point in the middle).
Is it possible to do this easily? I looked at aes_string but didn't quite understand it (at least not enough to get it to work).
Thanks in advance.
P.S. I have searched for a solution online but can't quite find anything that helps me with this (it could be due to some aspect I don't understand), so I reason it might be because this is a completely retarded naming scheme I have :p.
Since you have non-standard column names, you need to use backticks (`) in your column references.
For example:
mydf$`1+`
[1] 1 2 3 4 5
So, your qplot() call should look like this:
qplot(`1+`, `2-`, data = mydf)
You can find more information in ?Quotes and ?names
As said in the other answer, you have a problem because you don't have standard names. One solution that avoids backtick notation is to convert the column names to a standard form. Another motivation for converting names to regular ones is that you can't use backticks in a lattice plot, for example. Using gsub you can do this:
gsub('(^[0-9]+)[+|-]+|[+|-]+','a\\1',c("1+", "2-","a--"))
[1] "a1" "a2" "aa"
Hence, applying this to your example :
colnames(mydf) <- gsub('(^[0-9]+)[+|-]+|[+|-]+','a\\1',colnames(mydf))
qplot(a1,a2,data = mydf)
EDIT
You can use make.names with the option unique = T:
make.names(c("10+", "20-", "10-", "a30++"),unique=T)
[1] "X10." "X20." "X10..1" "a30.."
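Applied to the question's data frame, make.names could be combined with the original names as axis labels, so the plot still shows "1+" and "2-". The sketch below uses base graphics and the made-up names orig and mydf2 to stay self-contained.

```r
x <- 1:5
y <- x
orig <- c("1+", "2-")                   # the original, non-syntactic names

mydf2 <- data.frame(x, y)
colnames(mydf2) <- make.names(orig, unique = TRUE)   # "X1." and "X2."

# Syntactic names can be used without backticks; the originals
# are kept only for display.
plot(mydf2$X1., mydf2$X2., xlab = orig[1], ylab = orig[2])
```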
If you don't like R's naming rules, here is a custom version using gsubfn:
library(gsubfn)
gsubfn("[+|-]|^[0-9]+",
function(x) switch(x,'+'= 'a','-' ='b',paste('x',x,sep='')),
c("10+", "20-", "10-", "a30++"))
"x10a" "x20b" "x10b" "a30aa" ## note x10b looks better than X10..1

How to refer to a variable name with spaces?

In ggplot2, how do I refer to a variable name with spaces?
Why do qplot() and ggplot() break when used on variable names with quotes?
For example, this works:
qplot(x,y,data=a)
But this does not:
qplot("x","y",data=a)
I ask because I often have data matrices with spaces in the name. Eg, "State Income". ggplot2 needs data frames; ok, I can convert. So I'd want to try something like:
qplot("State Income","State Ideology",data=as.data.frame(a.matrix))
That fails.
Whereas in base R graphics, I'd do:
plot(a.matrix[,"State Income"],a.matrix[,"State Ideology"])
Which would work.
Any ideas?
Answer: because "x" and "y" are each treated as a length-one character vector, not as a variable name. Here you discover why it is not smart to use variable names with spaces in R, or in any other programming language for that matter.
To refer to variable names with spaces, you can use either Hadley's solution
a.matrix <- matrix(rep(1:10,3),ncol=3)
colnames(a.matrix) <- c("a name","another name","a third name")
qplot(`a name`, `another name`,data=as.data.frame(a.matrix)) # backticks!
or the more formal
qplot(get('a name'), get('another name'),data=as.data.frame(a.matrix))
The latter can be used in constructs where you pass the name of a variable as a string in eg a loop construct :
for (i in c("another name","a third name")){
  print(qplot(get(i),get("a name"),
    data=as.data.frame(a.matrix),xlab=i,ylab="a name"))
  Sys.sleep(5)
}
Still, the best solution is not to use variable names with spaces.
Using get is not more "formal", actually I would argue the opposite. As the R help says (help("`")), you can almost always use a variable name that contains spaces, provided it's quoted. (Normally, with a backtick, as already suggested.)
Something similar was asked on the ggplot2 mailing list, and Mehmet Gültaş linked to this post. Another way of using strings to construct your ggplot call is through the aes_string function. Note that you still have to put backticks inside the quotes for this to work for variables with spaces.
library(ggplot2)
names(mtcars)[1] <- "em pi dzi"
ggplot(mtcars, aes_string(x = "cyl", y = "`em pi dzi`")) +
theme_bw() +
geom_jitter()
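Outside ggplot2, the same distinction between a name and a string can be seen in base R. The sketch below reuses the a.matrix example from the earlier answer; a.df, by_backtick and by_string are names introduced here for illustration.

```r
a.matrix <- matrix(rep(1:10, 3), ncol = 3)
colnames(a.matrix) <- c("a name", "another name", "a third name")
a.df <- as.data.frame(a.matrix)         # column names keep their spaces

by_backtick <- a.df$`a name`            # the name written in code: backticks
by_string   <- a.df[["a name"]]         # the name held as a string: [[ ]]

# [[ ]] is convenient when the column name is stored in a variable.
for (nm in c("another name", "a third name")) {
  plot(a.df[[nm]], a.df[["a name"]], xlab = nm, ylab = "a name")
}
```

Backticks quote a name where R expects a symbol, while [[ ]] takes an ordinary character string, which is why it needs no extra quoting inside loops.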

Resources