How to refer to a variable name with spaces? (R)

In ggplot2, how do I refer to a variable name with spaces?
Why do qplot() and ggplot() break when I pass them variable names in quotes?
For example, this works:
qplot(x,y,data=a)
But this does not:
qplot("x","y",data=a)
I ask because I often have data matrices with spaces in the column names, e.g. "State Income". ggplot2 needs data frames; OK, I can convert. So I'd want to try something like:
qplot("State Income","State Ideology",data=as.data.frame(a.matrix))
That fails.
Whereas in base R graphics, I'd do:
plot(a.matrix[,"State Income"],a.matrix[,"State Ideology"])
Which would work.
Any ideas?

Answer: because "x" and "y" are each treated as a length-one character vector (a constant string), not as a variable name. Here you discover why it is not smart to use variable names with spaces in R, or in any other programming language for that matter.
To refer to variable names with spaces, you can use either Hadley's solution:
a.matrix <- matrix(rep(1:10,3),ncol=3)
colnames(a.matrix) <- c("a name","another name","a third name")
qplot(`a name`, `another name`,data=as.data.frame(a.matrix)) # backticks!
or the more formal
qplot(get('a name'), get('another name'),data=as.data.frame(a.matrix))
The latter can be used in constructs where you pass the name of a variable as a string, e.g. in a loop:
for (i in c("another name", "a third name")) {
  print(qplot(get(i), get("a name"),
              data = as.data.frame(a.matrix), xlab = i, ylab = "a name"))
  Sys.sleep(5)
}
Still, the best solution is not to use variable names with spaces.

Using get is not more "formal"; actually, I would argue the opposite. As the R help says (help("`")), you can almost always use a variable name that contains spaces, provided it's quoted (normally with backticks, as already suggested).
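A minimal base-R sketch of that point, using a throwaway data frame df that is not from the original post: the backtick form, the get() form and plain string indexing all reach the same column, and backticks work for ordinary variables too.

```r
# A column name containing a space; check.names = FALSE keeps it verbatim
df <- data.frame(`a name` = 1:3, check.names = FALSE)

via_backticks <- with(df, `a name` * 2)       # backticked symbol, evaluated inside df
via_get       <- with(df, get("a name") * 2)  # name supplied as a string, looked up at run time
via_string    <- df[["a name"]] * 2           # plain character indexing

# Backticks also work for ordinary variables, not only for columns
`my var` <- 10
`my var` + 1
```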

Something similar was asked on the ggplot2 mailing list, and Mehmet Gültaş linked to this post. Another way of using strings to construct your ggplot call is through the aes_string function. Note that you still have to put backticks inside the quotes for this to work with variables that contain spaces.
library(ggplot2)
names(mtcars)[1] <- "em pi dzi"
ggplot(mtcars, aes_string(x = "cyl", y = "`em pi dzi`")) +
  theme_bw() +
  geom_jitter()
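As a hedged aside: recent ggplot2 releases deprecate aes_string() in favour of tidy evaluation. Assuming ggplot2 3.0.0 or later, the .data pronoun takes the column name as an ordinary string, so no backticks are needed inside it. The copy cars2 and the variable yvar are illustrative, and geom_point() stands in for geom_jitter() only so the result is deterministic.

```r
library(ggplot2)

cars2 <- mtcars
names(cars2)[1] <- "em pi dzi"   # a column name containing spaces

yvar <- "em pi dzi"              # column name held in a string, e.g. a loop variable
p <- ggplot(cars2, aes(x = cyl, y = .data[[yvar]])) +
  theme_bw() +
  geom_point()
```

Printing p draws the plot; the same pattern works inside a for loop over column names.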

Related

expression() is not changing in the plot [duplicate]

I'm trying to work out how to have subscript letters in an axis label.
dat <- data.frame(x = rnorm(100), y = rnorm(100))
ggplot(dat, aes(x=x,y=y)) +
geom_point() +
labs(y=expression(Blah[1]))
dat <- data.frame(x = rnorm(100), y = rnorm(100))
ggplot(dat, aes(x=x,y=y)) +
geom_point() +
labs(y=expression(Blah[1d]))
The first example works because it's just a number; as soon as I have a letter in the square brackets, it fails. Blah[subscript(1d)] is essentially what I need, but I can't work out how to get letters into the subscript. I have tried variations, including paste().
The following examples show the strange behavior:
labs(y=expression(Blah[12])) # this works
labs(y=expression(Blah[d])) # this works
labs(y=expression(Blah[d1])) # this works
labs(y=expression(Blah[1d])) # this fails
Thoughts?
The reason the last one fails is that the arguments to expression are run through the R parser, and an error is returned when they fail the test of whether they could possibly be correct R syntax. The string 1d is not a valid R token (or symbol). You can either break it into valid R tokens and "connect" them with non-space operators, backtick it, or use ordinary quotes. I think any of these is better than using paste:
ggplot(dat, aes(x=x,y=y)) +
geom_point() +
labs(y=expression(Blah[1*d]))
ggplot(dat, aes(x=x,y=y)) +
geom_point() +
labs(y=expression(Blah["1d"]))
Syntactic names (or "symbols") in R cannot start with a digit. So you get around that limitation either by quoting or by separating 1 and d with a non-space separator, the * operator. That "joins" or "ligates" a pure numeric literal to a legal R symbol.
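The parser behaviour described above can be checked directly with parse(). This is a small sketch; parses() is a hypothetical helper (not part of R) that reports whether a string is syntactically valid instead of raising an error.

```r
# Hypothetical helper (not part of R): TRUE if the text parses as R code
parses <- function(txt) {
  !inherits(tryCatch(parse(text = txt), error = function(e) e), "error")
}

parses("Blah[12]")     # TRUE:  numeric literal
parses("Blah[d1]")     # TRUE:  d1 is a legal symbol
parses("Blah[1d]")     # FALSE: "1d" is not a valid token
parses("Blah[1*d]")    # TRUE:  1 and d joined by the * operator
parses('Blah["1d"]')   # TRUE:  an ordinary string
parses("Blah[`1d`]")   # TRUE:  a backticked symbol
```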
To get an unsubscripted percent sign, just use:
ggplot(dat, aes(x=x,y=y)) +
geom_point() +
labs(y=expression(Blah[1*d]*"%"))
To put parentheses around the percent sign:
expression(Blah[1*d]*"(%)")
The % character has special meaning to the R parser, since it marks the beginning of a user-defined infix operator (such as %in%). So using it as a literal requires that it be quoted. The same reasoning requires that "for" and "in" be quoted, because they are among R's reserved words. There are other reserved words, but for and in are the ones that trip me up most often. Type:
?Reserved
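The same parse() check illustrates the % and reserved-word rules. Again parses() is a hypothetical helper, and the "Mass in kg" label is an illustrative example, not from the original post.

```r
# Hypothetical helper (not part of R): TRUE if the text parses as R code
parses <- function(txt) {
  !inherits(tryCatch(parse(text = txt), error = function(e) e), "error")
}

parses('Blah[1*d]*%')      # FALSE: a bare % starts an infix operator
parses('Blah[1*d]*"%"')    # TRUE:  quoted, % is just text
parses('Mass in kg')       # FALSE: in is a reserved word
parses('Mass~"in"~kg')     # TRUE:  quoted, "in" is just text
```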
Another "trick" is to put quotation marks around digits within italic() if you need them italicized. Unquoted digits do not get italicized inside that function.
Caveats: paste inside an expression is the plotmath paste, which has different semantics from base::paste. In particular, it has no sep argument: it juxtaposes its arguments with no space between them, and if you try to pass something like sep=" ", it is treated as just another item, so a single copy of it appears after all the other arguments.
paste0 is not a plotmath function, so it is not interpreted; it appears "unprocessed" in the label, with its unprocessed arguments inside parentheses.
Okay. I swear I didn't post this just to answer it myself, despite how quickly I got it (always the way when you ask a question!)
Here it is:
ggplot(dat, aes(x=x,y=y)) +
geom_point() +
labs(y=expression(Blah[1][d]))
Thought it best to post the answer rather than remove the question as it may help someone else one day.
'Blahs' aside, what I actually wanted was expression(paste("Hb", A[1][c]," (%)",sep=""))
Why paste0() doesn't work here is beyond me.

Using expression in variables

I'm trying to label a plot in R with superscript. For example, I have a variable called label:
> label <- colnames(tmp.df)
> label
[1] "ColumnA" "Volume 100mm3" "ColumnC" etc.
I would like the "3" in "Volume 100mm3" to be a superscript in my plot label. I cannot use something like:
label <- c("ColumnA", expression(paste('Volume 100mm'^'3')), "ColumnC");
since the ordering of the column names in tmp.df may change from run to run. So how can I get around this problem?
You could find the label containing "mm" with
ind <- grep("mm",label)
splt <- strsplit(label[ind], "mm")[[1]]
and then inject the expression via
label[ind] <- parse(text=sprintf("paste('%smm'^'%s')",splt[1],splt[2]))
If there are multiple strings that indicate the need for expressions, then it should be straightforward to adapt this.
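If several labels need superscripts, one way to adapt this is to convert each matching label separately. This is a sketch under the assumption that the exponent is the run of digits right after a trailing "mm"; to_plotmath() and the example labels are hypothetical, and bquote() builds the plotmath call instead of going through parse(text = ...).

```r
# Hypothetical helper: turn "...mm<digits>" labels into plotmath superscripts,
# leaving all other labels as plain strings
to_plotmath <- function(labels) {
  out <- lapply(labels, function(s) {
    m <- regmatches(s, regexec("^(.*mm)([0-9]+)$", s))[[1]]
    if (length(m) == 0) return(s)   # no match: keep the plain string
    # m[2] is the text up to and including "mm", m[3] the exponent digits
    bquote(.(m[2])^.(m[3]))
  })
  as.expression(out)                # expression vector, one element per label
}

labs_expr <- to_plotmath(c("ColumnA", "Volume 100mm3", "Area 5mm2"))
```

Each element can then be used as a label, e.g. plot(1:10, ylab = labs_expr[[2]]), regardless of the column order.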
You can use bquote for this. The * connects "Volume 100" to "mm^3" without a space. If you want a space there, you can use ~ instead of *.
plot(1:10, main = bquote(.("Volume 100") * mm^3))
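To avoid hard-coding "Volume 100", the pieces can come from the label string itself. A minimal sketch, assuming labels end in "mm" followed by the exponent digits; lab, prefix and power are illustrative names.

```r
lab <- "Volume 100mm3"   # a label as it might come out of colnames()

# Split around the trailing "mm<digits>": text before "mm" and the exponent
prefix <- sub("^(.*)mm([0-9]+)$", "\\1", lab)   # "Volume 100"
power  <- sub("^(.*)mm([0-9]+)$", "\\2", lab)   # "3"

# .() splices the values into the expression; * joins without a space, ~ adds one
e_tight  <- bquote(.(prefix) * mm^.(power))
e_spaced <- bquote(.(prefix) ~ mm^.(power))

# plot(1:10, main = e_tight)
```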
How about just using the Unicode character for cubed, 'SUPERSCRIPT THREE' (U+00B3)? In R, this would be escaped as '100mm\u00B3', or if the labels come from a data file, just use the Unicode character directly in the file.
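A short sketch of the Unicode route applied to a vector of labels, assuming the graphics device and font can display U+00B3; the label values are illustrative.

```r
labels <- c("ColumnA", "Volume 100mm3", "ColumnC")   # example column names

# Replace a trailing "mm3" with "mm" followed by U+00B3 SUPERSCRIPT THREE
labels <- sub("mm3$", "mm\u00B3", labels)

# The result is still an ordinary character vector, so it can be used
# directly as a label, e.g. plot(1:10, ylab = labels[2])
```

Unlike the expression-based answers, this keeps labels as plain strings, so their order can change freely.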

Using column names with signs of a data frame in a qplot

I have a dataset, and unfortunately some of the column labels in my data frame contain signs (- or +). This doesn't seem to bother the data frame, but when I try to plot it with qplot, it throws an error:
x <- 1:5
y <- x
names <- c("1+", "2-")
mydf <- data.frame(x, y)
colnames(mydf) <- names
mydf
qplot(1+, 2-, data = mydf)
And if I enclose the column names in quotes, qplot treats them as constants: it gives me a plot of "1+" vs. "2-" with one point in the middle.
Is it possible to do this easily? I looked at aes_string but didn't quite understand it (at least not enough to get it to work).
Thanks in advance.
P.S. I have searched for a solution online but can't find anything that helps (it could be due to some aspect I don't understand), so I reason it might be because this is a really silly naming scheme :p.
Since you have non-standard column names, you need to use backticks (`) in your column references.
For example:
mydf$`1+`
[1] 1 2 3 4 5
So, your qplot() call should look like this:
qplot(`1+`, `2-`, data = mydf)
You can find more information in ?Quotes and ?names
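A short base-R sketch of the backtick rule, rebuilding the example data frame from the question. It also shows why the odd names survived in the first place: colnames<- stores them as given, whereas data.frame() would have rewritten them through make.names() because check.names = TRUE is its default.

```r
x <- 1:5
mydf <- data.frame(x, y = x)
colnames(mydf) <- c("1+", "2-")     # colnames<- stores the names as given

# Backticks turn non-syntactic names into symbols that R can evaluate
total <- with(mydf, `1+` + `2-`)    # 2 4 6 8 10

# data.frame() checks names by default and would have rewritten them
d_checked  <- data.frame(`1+` = 1:2)                       # name becomes "X1."
d_verbatim <- data.frame(`1+` = 1:2, check.names = FALSE)  # name stays "1+"
```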
As said in the other answer, you have a problem because you don't have standard names. One solution that avoids backtick notation is to convert the column names to a standard form. Another motivation for converting names to regular ones is that you can't use backticks in a lattice plot, for example. Using gsub, you can do this:
gsub('(^[0-9]+)[-+]+|[-+]+','a\\1',c("1+", "2-","a--"))
[1] "a1" "a2" "aa"
Applying this to your example:
colnames(mydf) <- gsub('(^[0-9]+)[-+]+|[-+]+','a\\1',colnames(mydf))
qplot(a1,a2,data = mydf)
EDIT
You can also use make.names with the option unique = TRUE:
make.names(c("10+", "20-", "10-", "a30++"), unique = TRUE)
[1] "X10." "X20." "X10..1" "a30.."
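Building on make.names(): a practical check for which names need fixing is to compare each name with what make.names() would turn it into. This sketch renames the question's columns so the plot call can use plain names; is_syntactic is an illustrative variable.

```r
mydf <- data.frame(x = 1:5, y = 1:5)
colnames(mydf) <- c("1+", "2-")

# make.names() leaves an already-syntactic name unchanged, so any
# difference flags a name that would need backticks
is_syntactic <- colnames(mydf) == make.names(colnames(mydf))   # FALSE FALSE

# Rename to syntactic, de-duplicated names
colnames(mydf) <- make.names(colnames(mydf), unique = TRUE)
colnames(mydf)   # "X1." "X2."
# qplot(X1., X2., data = mydf) then needs no backticks
```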
If you don't like R's naming rules, here is a custom version using gsubfn:
library(gsubfn)
gsubfn("[-+]|^[0-9]+",
       function(x) switch(x, '+' = 'a', '-' = 'b', paste('x', x, sep = '')),
       c("10+", "20-", "10-", "a30++"))
[1] "x10a" "x20b" "x10b" "a30aa"  ## note: x10b looks better than X10..1
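If installing gsubfn is not an option, the same renaming can be sketched with base R alone. rename_signs() is a hypothetical helper, not part of any package; it assumes the only characters to replace are "+" and "-".

```r
# Hypothetical helper reproducing the gsubfn() renaming with base R only
rename_signs <- function(nm) {
  nm <- sub("^([0-9])", "x\\1", nm)       # prefix a leading digit with "x"
  nm <- gsub("+", "a", nm, fixed = TRUE)  # every "+" becomes "a"
  gsub("-", "b", nm, fixed = TRUE)        # every "-" becomes "b"
}

rename_signs(c("10+", "20-", "10-", "a30++"))   # "x10a" "x20b" "x10b" "a30aa"
```

Like the gsubfn version, this does not guarantee unique names; wrapping the result in make.unique() would handle collisions.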

text in plot from column, first argument of a ";" divided string

One quick question.
Picture a data frame like:
data=data.frame(x=c(1,2,3), y=c(4,5,6), Genes=c("AHS;AKS;AHS","AHS;IO","HU"))
I want to plot x and y:
plot(data$x, data$y)
and label the points like this:
text(data$x+0.2,data$y+0.2,labels=data$Genes)
But I don't want to use all the entries from the Genes column, only the first one (i.e. the part before the first ";").
Can you please help me with that?
This is only an example; I have already read my data in with read.delim, so I cannot do a specific "read in" with string separation.
As per my comment, you can use gsub to do this:
gsub('^([A-Z]+);.*$', '\\1', data$Genes)
You could also use strsplit:
unlist(lapply(strsplit(as.character(data$Genes), ';'), '[', 1))
But that's yucky...
It's also probably worth mentioning the stringr package, which collects a lot of these string-munging functions into a single place with predictable syntax and names.
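A base-R sketch combining the pieces, assuming the example data frame from the question (stringsAsFactors = FALSE is spelled out so strsplit() also works on R versions before 4.0, where strings defaulted to factors). Using sub() with ";.*$" keeps everything before the first semicolon and, unlike the [A-Z]+ pattern, does not depend on gene names being upper-case letters.

```r
data <- data.frame(x = c(1, 2, 3), y = c(4, 5, 6),
                   Genes = c("AHS;AKS;AHS", "AHS;IO", "HU"),
                   stringsAsFactors = FALSE)

# Drop everything from the first ";" on; strings without ";" are unchanged
first_gene <- sub(";.*$", "", data$Genes)

# strsplit() alternative: vapply() guarantees one string per row
first_gene2 <- vapply(strsplit(data$Genes, ";", fixed = TRUE), `[`, character(1), 1)

# Label each point with its first gene:
# plot(data$x, data$y)
# text(data$x + 0.2, data$y + 0.2, labels = first_gene)
```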
