I'm trying to work out how to have subscript letters in an axis label.
library(ggplot2)
dat <- data.frame(x = rnorm(100), y = rnorm(100))
ggplot(dat, aes(x=x,y=y)) +
  geom_point() +
  labs(y=expression(Blah[1]))
dat <- data.frame(x = rnorm(100), y = rnorm(100))
ggplot(dat, aes(x=x,y=y)) +
  geom_point() +
  labs(y=expression(Blah[1d]))  # fails to parse: unexpected symbol
The first example works because the subscript is just a number; as soon as there is a letter in the square brackets, it fails. Blah[subscript(1d)] is essentially what I need, but I can't work out how to get letters into the subscript. I have tried variations, including paste().
The following examples show the strange behavior:
labs(y=expression(Blah[12])) # this works
labs(y=expression(Blah[d])) # this works
labs(y=expression(Blah[d1])) # this works
labs(y=expression(Blah[1d])) # this fails
Thoughts?
The reason the last one fails is that the arguments to expression() are run through the R parser, which returns an error when they cannot possibly be valid R syntax. The string 1d is not a valid R token (or symbol). You can break it into valid R tokens and "connect" them with a non-space operator, backtick it, or use ordinary quotes. I think any of these is better than using paste:
ggplot(dat, aes(x=x,y=y)) +
  geom_point() +
  labs(y=expression(Blah[1*d]))
ggplot(dat, aes(x=x,y=y)) +
  geom_point() +
  labs(y=expression(Blah["1d"]))
Syntactic names (or "symbols") in R cannot start with a digit. You get around that limitation either by quoting, or by separating 1 and d with a non-space operator, *. That "joins" or "ligates" a pure numeric literal with a legal R symbol, and plotmath draws the two side by side with no space between them.
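A quick way to check the parser explanation, without drawing anything, is to hand each candidate label to base R's parse() and see which ones survive. The helper parses_ok is hypothetical, defined only for this sketch.

```r
# Return TRUE if the string is syntactically valid R code, FALSE otherwise.
parses_ok <- function(txt) {
  tryCatch({
    parse(text = txt)
    TRUE
  }, error = function(e) FALSE)
}

candidates <- c("Blah[1]", "Blah[d1]", "Blah[1d]",
                "Blah[1*d]", "Blah[\"1d\"]", "Blah[`1d`]")
sapply(candidates, parses_ok)
# Only "Blah[1d]" is FALSE: the lexer reads 1 as a number, then d as a stray symbol.

# make.names() confirms "1d" is not a syntactic name: it prefixes an "X".
make.names("1d")
```

The quoted and backticked forms both parse because they no longer ask the lexer to read 1d as a single name.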
To get an unsubscripted percent sign, just append it as a quoted string:
ggplot(dat, aes(x=x,y=y)) +
  geom_point() +
  labs(y=expression(Blah[1*d]*"%"))
To put parentheses around the percent sign:
expression(Blah[1*d]*"(%)")
The % character has a special meaning to the R parser: it marks the beginning of a user-defined infix operator such as %in%. So using it as a literal requires quoting it. The same reasoning requires that "for" and "in" be quoted, because they belong to R's reserved words. There are other reserved words (but for and in are the ones that trip me up most often). Type:
?Reserved
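The % and reserved-word rules can be checked the same way, with parse() alone. parses_ok is again a hypothetical helper defined only for this sketch.

```r
# Return TRUE if the string is syntactically valid R code, FALSE otherwise.
parses_ok <- function(txt) {
  tryCatch({ parse(text = txt); TRUE }, error = function(e) FALSE)
}

parses_ok('Blah[1*d]*%')     # FALSE: a bare % starts an unfinished %op% operator
parses_ok('Blah[1*d]*"%"')   # TRUE: quoted, % is just part of a string
parses_ok('x ~ in ~ y')      # FALSE: "in" is a reserved word
parses_ok('x ~ "in" ~ y')    # TRUE
```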
Another "trick" is to put quotation marks around digits within italic() if you need them italicized. Unquoted digits do not get italicized inside that function.
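A sketch of the italic() tip in axis labels, assuming ggplot2 is installed. It shows why quoting matters to the parser: the unquoted 2 becomes a numeric constant, while "2" becomes a character string. The variable names lab_plain and lab_italic are only illustrative; the rendering difference itself is as described in the tip above.

```r
library(ggplot2)

dat <- data.frame(x = rnorm(100), y = rnorm(100))

# Unquoted digit: parsed as a number, so italic() leaves it upright.
lab_plain  <- expression(italic(k[2]))
# Quoted digit: parsed as a string, so italic() slants it along with the letter.
lab_italic <- expression(italic(k["2"]))

class(lab_plain[[1]][[2]][[3]])   # the subscript inside italic(): "numeric"
class(lab_italic[[1]][[2]][[3]])  # the subscript inside italic(): "character"

p <- ggplot(dat, aes(x = x, y = y)) +
  geom_point() +
  labs(x = lab_plain, y = lab_italic)
```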
Caveats: inside an expression, paste is a plotmath function, and it has different semantics from base::paste. In particular, it has no sep argument: it simply juxtaposes its arguments, so you cannot use sep to put spaces between them, and if you supply a non-space sep, a single instance of it appears after all the other arguments.
paste0 is not a plotmath function, so it is not interpreted; it is drawn "unprocessed", as the name paste0 followed by its unprocessed arguments inside parentheses.
Okay. I swear I didn't post this just to answer it myself, despite how quickly I got it (always the way when you ask a question!)
Here it is:
ggplot(dat, aes(x=x,y=y)) +
  geom_point() +
  labs(y=expression(Blah[1][d]))
Thought it best to post the answer rather than remove the question as it may help someone else one day.
'Blahs' aside, what I actually wanted was expression(paste("Hb", A[1][c]," (%)",sep=""))
Why paste0() doesn't work here is beyond me.
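The same label can be built without any paste, using the * and quoting rules from the answer above. This is a sketch assuming ggplot2 is loaded. A[1*c] keeps 1 and c together in a single subscript, which differs from the asker's A[1][c]; pick whichever matches the intended typography.

```r
library(ggplot2)

dat <- data.frame(x = rnorm(100), y = rnorm(100))

# * juxtaposes terms with no space; the quoted " (%)" keeps its leading space and the literal %.
hba1c_lab <- expression(Hb*A[1*c]*" (%)")

p <- ggplot(dat, aes(x = x, y = y)) +
  geom_point() +
  labs(y = hba1c_lab)
```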
Maybe I am not correct, but there appears to be an unwanted space in expressions that begin with a superscript:
df <- data.frame(treatment=as.factor(c("A", "B")), value=c(1,2))
labels <- c(expression(""^14~CH[4]),
            expression(""^14~CH[4]~"+"~"SO"[4]^{2-''}))
library(ggplot2)
ggplot(df, aes(treatment, value)) +
  geom_bar(stat="identity") +
  scale_x_discrete(labels=labels)
I could go to Photoshop to reduce the space between the superscripted 14 and the "C", but maybe there is a way in plotmath? Notice that this does not happen in the second expression, where the superscript is at the end.
In expressions, ~ puts a space between terms. If you don't want a space between terms, use * instead. The superscript at the end is not preceded by ~, so there is no space.
You can also remove most of the quote marks - these are unnecessary except when there are special characters or spaces.
So your expression can become
expression(''^14*CH[4]~+~SO[4]^'2-')
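Putting the answer's expression back into the asker's plot gives the following sketch, assuming ggplot2 is installed. Only the labels change; * now joins the empty-string base and CH[4], so no gap appears after the leading superscript.

```r
library(ggplot2)

df <- data.frame(treatment = as.factor(c("A", "B")), value = c(1, 2))

# * joins terms with no space; ~ keeps a space on each side of the +.
labels <- c(expression(''^14*CH[4]),
            expression(''^14*CH[4]~+~SO[4]^'2-'))

p <- ggplot(df, aes(treatment, value)) +
  geom_bar(stat = "identity") +
  scale_x_discrete(labels = labels)
```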
I am using the sprintf function in R to generate some numbers for ggplot labels. The problem is that I want those numbers as percentages, like the following:
sprintf("paste(round(%s*100, 2), '\\%', sep='')", data_plot[1])
As you can see, I am escaping the % with "\" so that sprintf does not treat it as a special character, but I still receive the following error:
Error in sprintf("paste(round(%s*100, 2), '%', sep='')", names(data_plot [1]) : too few arguments
When I replace the "%" with, for example, "+", everything works fine. I found some posts suggesting a separate function to take care of this, but I am wondering if there is an easier way.
Note: this line of code is part of the ggplot code, so it has to be written like this.
Thanks
You need to use "%%" in sprintf to print a single "%".
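A minimal base-R illustration. The value "0.25" stands in for the asker's data_plot[1], whose contents are not shown.

```r
# "%%" in the format string produces one literal "%" in the output.
sprintf("%.2f%%", 12.5)
#> "12.50%"

# The asker's template with %% in place of \\%:
sprintf("paste(round(%s*100, 2), '%%', sep='')", "0.25")
#> "paste(round(0.25*100, 2), '%', sep='')"
```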
You can use either paste or sprintf; you don't need both. So, something like:
library(ggplot2)
dat <- data.frame(x=seq(0.01, 1, len=10), y=runif(10))
ggplot(dat, aes(x, y)) +
  geom_point() +
  scale_x_continuous(breaks=dat$x, labels=sprintf("%.2f%%", 100*dat$x))
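A different approach, not part of the original answer, is to let the scales package (a ggplot2 dependency) format the breaks. percent_format() multiplies by 100 and appends "%" for you. accuracy = 1, which rounds to whole percents, is an arbitrary choice for this sketch.

```r
library(ggplot2)

dat <- data.frame(x = seq(0.01, 1, length.out = 10), y = runif(10))

# percent_format() returns a labelling function rather than strings.
pct <- scales::percent_format(accuracy = 1)
pct(0.5)

p <- ggplot(dat, aes(x, y)) +
  geom_point() +
  scale_x_continuous(breaks = dat$x, labels = pct)
```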
I've been using as.formula to set up a glm, and I can't figure out where the unexpected symbol is. Part of the problem is that the character vector I'm converting is so long: it's about 700 words, with + inserted between them to turn it into a formula. The error is:
Error in parse(text = x, keep.source = FALSE) :
<text>:2:10080: unexpected symbol
with the following snippet of the text:
2: c_1_E + Campaign_Search_Payroll_Generic_1_P + Campaign_Search_Performing_Core_Keywords + Campaign_Self_Employment_E + Campaign_Self_Employment_P + Campaign_Withholding + Campaign_Youtube + Sou
Things I know for sure:
No item is repeated.
No symbols other than alphanumerics and underscore (_).
No item starts with a number.
I'm not well versed enough in R to make sense of the documentation for as.formula or of the function's source code.
Any ideas?
The <text>:2:10080 part gives the location of the error: line 2, character 10080. Consider:
parse(text="1 + 1 + 2\n a - 3 b")
# Error in parse(text = "1 + 1 + 2\n a - 3 b") :
# <text>:2:8: unexpected symbol
Here, the error is with b, which is an illegal use of a symbol, and you'll note it is the 8th character of the second line.
Most likely you're missing a +, though there's no way of knowing without the text behind your error. Also, not to judge or anything, but that's a helluva lot of variables to be sticking into a model. I hope you have lots of data points.
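With a 10,000-character line, reading the position by eye is impractical. The hypothetical helper below pulls the line:column pair out of parse()'s error message and returns the text around it. It assumes the message has the "<text>:line:col:" form shown above.

```r
# Hypothetical helper: return the text around the position parse() reports,
# NULL if the text parses cleanly, or the raw message if it has no position.
show_parse_error <- function(txt, width = 30) {
  msg <- tryCatch({
    parse(text = txt)
    NULL
  }, error = function(e) conditionMessage(e))
  if (is.null(msg)) return(NULL)
  pos <- regmatches(msg, regexec("<text>:([0-9]+):([0-9]+)", msg))[[1]]
  if (length(pos) < 3) return(msg)
  line <- as.integer(pos[2])
  col <- as.integer(pos[3])
  src <- strsplit(txt, "\n", fixed = TRUE)[[1]][line]
  substr(src, max(1, col - width), col + width)
}

show_parse_error("1 + 1 + 2\n a - 3 b", width = 3)
#> " 3 b"
```

For the glm case, passing the pasted formula string with the default width should show the terms on either side of the offending symbol.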
Here is what worked for me as a workaround for this problem:
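features = make.names(features)  # make every name syntactic; rename the data's columns the same way
right_side = paste0(features, collapse=" + ")
fml = as.formula(sprintf(" ~ %s", right_side))  # one-sided formula; add the response on the left for glm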
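A variant of the same workaround, sketched on made-up data. The column names and values are hypothetical. It renames the data frame's columns with make.names() so the formula matches them, and it uses base R's reformulate() instead of pasting one long string.

```r
# Hypothetical data: column names as they might arrive, one of them not syntactic.
raw <- data.frame(y = c(0.2, 1.1, 0.9, 0.4, 1.5, 0.3),
                  Campaign_Youtube = c(1, 0, 1, 0, 1, 1),
                  "1st touch" = c(3, 1, 2, 5, 4, 2),
                  check.names = FALSE)

# Rename the columns and derive the feature list from them, so both match.
names(raw) <- make.names(names(raw))
features <- setdiff(names(raw), "y")

# reformulate() builds y ~ f1 + f2 + ... from a character vector of term labels.
fml <- reformulate(features, response = "y")
fit <- glm(fml, data = raw)
```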
In ggplot2, how do I refer to a variable name with spaces?
Why do qplot() and ggplot() break when used on variable names with quotes?
For example, this works:
qplot(x,y,data=a)
But this does not:
qplot("x","y",data=a)
I ask because I often have data matrices with spaces in the column names, e.g. "State Income". ggplot2 needs data frames; OK, I can convert. So I'd want to try something like:
qplot("State Income","State Ideology",data=as.data.frame(a.matrix))
That fails.
Whereas in base R graphics, I'd do:
plot(a.matrix[,"State Income"],a.matrix[,"State Ideology"])
Which would work.
Any ideas?
Answer: because "x" and "y" are treated as length-one character vectors, not as variable names. This is one reason it is not smart to use variable names with spaces in R, or in any other programming language for that matter.
To refer to variable names with spaces, you can use either Hadley's solution (backticks)
library(ggplot2)
a.matrix <- matrix(rep(1:10,3),ncol=3)
colnames(a.matrix) <- c("a name","another name","a third name")
qplot(`a name`, `another name`,data=as.data.frame(a.matrix)) # backticks!
or the more formal
qplot(get('a name'), get('another name'),data=as.data.frame(a.matrix))
The latter can be used where you pass the name of a variable as a string, e.g. in a loop:
for (i in c("another name","a third name")){
print(qplot(get(i),get("a name"),
data=as.data.frame(a.matrix),xlab=i,ylab="a name"))
Sys.sleep(5)
}
Still, the best solution is not to use variable names with spaces.
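In more recent ggplot2 versions (3.0.0 or later assumed here), the loop can use ggplot() with the .data pronoun instead of qplot() and get(). Backticks still handle the fixed column. This is a sketch that collects the plots in a list instead of printing them with a pause.

```r
library(ggplot2)

a.matrix <- matrix(rep(1:10, 3), ncol = 3)
colnames(a.matrix) <- c("a name", "another name", "a third name")
df <- as.data.frame(a.matrix)

# One plot per column name held in a string; .data[[v]] looks the column up by name.
plots <- lapply(c("another name", "a third name"), function(v) {
  ggplot(df, aes(x = .data[[v]], y = `a name`)) +
    geom_point() +
    labs(x = v, y = "a name")
})
```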
Using get is not more "formal"; actually, I would argue the opposite. As the R help says (help("`")), you can almost always use a variable name that contains spaces, provided it is quoted (normally with backticks, as already suggested).
Something similar was asked on the ggplot2 mailing list, and Mehmet Gültaş linked to this post. Another way of using strings to construct your ggplot call is the aes_string function. Note that you still have to put backticks inside the quotes for it to work with variables that contain spaces.
library(ggplot2)
names(mtcars)[1] <- "em pi dzi"
ggplot(mtcars, aes_string(x = "cyl", y = "`em pi dzi`")) +
theme_bw() +
geom_jitter()
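Recent ggplot2 releases deprecate aes_string() in favour of tidy evaluation. One replacement is the .data pronoun, sketched here under the assumption of ggplot2 3.0.0 or later. The column name goes in as a plain string without backticks. A copy named cars is used so mtcars itself is left untouched.

```r
library(ggplot2)

cars <- mtcars
names(cars)[1] <- "em pi dzi"

# .data[["..."]] takes the column name as an ordinary string, spaces and all.
yvar <- "em pi dzi"
p <- ggplot(cars, aes(x = cyl, y = .data[[yvar]])) +
  theme_bw() +
  geom_jitter()
```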