Formatting numbers on a continuous axis in ggplot - r

I would like to format the numbers on a continuous axis in a ggplot graph. I would like French formatting for large numbers, with a space every three digits (i.e. "20 000" instead of "20000"). I know it is possible to do it using the format() function (for instance format(20000, scientific=FALSE, big.mark = " ")) but I don't know how to combine this function with ggplot. I imagine there is an option in scale_y_continuous(), but I was unable to find the solution by myself.
Here is my gist file.

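# scientific = FALSE avoids "2e+04"-style labels; trim = TRUE drops padding
french <- function(x) format(x, big.mark = " ", scientific = FALSE, trim = TRUE)
p + scale_y_continuous(labels = french)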
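A self-contained variant of the formatter above, offered as a sketch. The helper name `french_labels` and the toy data frame are hypothetical. `scientific = FALSE` keeps large breaks from being shown as `2e+04`, and `trim = TRUE` stops `format()` from left-padding shorter labels to a common width. The ggplot part only runs when ggplot2 is installed.

```r
# Hypothetical helper: French-style labels with a space as thousands separator
french_labels <- function(x) {
  # scientific = FALSE avoids labels such as "2e+04";
  # trim = TRUE drops the leading padding format() adds for a common width
  format(x, big.mark = " ", scientific = FALSE, trim = TRUE)
}

french_labels(c(0, 20000, 1500000))
# returns c("0", "20 000", "1 500 000")

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # Toy data, made up for illustration only
  df <- data.frame(x = 1:4, y = c(5000, 20000, 60000, 120000))
  p <- ggplot(df, aes(x, y)) +
    geom_line() +
    scale_y_continuous(labels = french_labels)
  # print(p) draws the plot with space-separated y-axis labels
}
```

If the scales package (version 1.1.0 or later) is available, `scales::label_number(big.mark = " ")` should give similar labels without writing a custom function.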

Related

Plotting a sequence of strings using sprintf in R

I want to start using sprintf to combine two strings in R into the title of a figure. Can anyone show me how to do it correctly? The values of HS and score should appear as text after the terms in quotes.
title = sprintf ("HS %s", as.character(HS), "Score %s", as.character(score))
With sprintf, you can pass multiple arguments, as the usage is
sprintf(fmt, ...)
That is, there is a single fmt string, followed by any number of inputs that fill its placeholders in order:
sprintf("HS %s Score %s", as.character(HS), as.character(score))
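A runnable sketch of the corrected call; the values of `HS` and `score` are made up for illustration. It also shows that typed conversions such as `%d` and `%.2f` can replace `as.character()`, and that `sprintf()` is vectorised, so equal-length vectors give one string per element.

```r
# Made-up values standing in for the question's HS and score
HS <- 3L
score <- 0.8567

# One format string, then one argument per %s placeholder
title <- sprintf("HS %s Score %s", HS, round(score, 2))
title
# "HS 3 Score 0.86"

# Typed conversions: %d for integers, %.2f for two decimals
sprintf("HS %d Score %.2f", HS, score)
# "HS 3 Score 0.86"

# Vectorised over equal-length inputs: one string per element
sprintf("HS %s Score %s", c(1, 2), c(0.5, 0.75))
# "HS 1 Score 0.5"  "HS 2 Score 0.75"

# Use the string as a base-graphics plot title
plot(1:10, main = title)
```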

Add thousand separator to levels in cut function

My x axis labels look like [100000,250000], which makes it hard to read the numbers at first sight. I want them to look like [100.000,250.000]. I know that the cut2 function has a formatfun parameter, but I don't think I know how to use it properly.
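Try using the "formatC" function on the numeric cut points (formatC formats numbers, not the levels of a factor), e.g. with my_cuts being a numeric vector of break values:
formatC(my_cuts, big.mark = ".", decimal.mark = ",")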
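Because formatC() works on numbers, one way to use it here (a sketch, not necessarily what this answer had in mind) is to format the break points yourself and hand the resulting strings to cut() through its `labels` argument. The helper name `dot_cut` and the example values are hypothetical.

```r
# Hypothetical helper: cut() whose interval labels use "." as the
# thousands separator, built from the numeric breaks with formatC()
dot_cut <- function(x, breaks, ...) {
  # format = "f", digits = 0: whole numbers, never scientific notation
  b <- formatC(breaks, format = "f", digits = 0,
               big.mark = ".", decimal.mark = ",")
  # One label per interval; cut() defaults to right-closed "(a,b]" intervals
  labs <- paste0("(", b[-length(b)], ",", b[-1], "]")
  cut(x, breaks = breaks, labels = labs, ...)
}

dot_cut(c(120000, 300000, 450000), breaks = c(100000, 250000, 500000))
```

The labels assume the default `right = TRUE` and round non-integer breaks to whole numbers; for fractional breaks, a larger `digits` would be needed.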
Let's create an example to work on:
x <- cut(seq(0,1,length.out=8) + 1e6, 3)
This is a factor. Although at bottom it's a numeric array, you don't want to format its values; you want to format its levels, which are the strings associated with its values. This is what the levels look like in the example (calling head to prevent lots of printing in case x has many distinct levels):
(head(levels(x)))
[1] "(1000000,1000000.3]" "(1000000.3,1000000.7]" "(1000000.7,1000001]"
To format the levels, we need to pick them apart into their numeric components (which are separated by a comma ","), format each component, and reassemble the results.
Here's the picking-apart-and-formatting step in one go, using only base R functionality. It calls gsub and strsplit on the first line (for cleaning out the "(" and "]" characters and splitting each pair of numeric strings into two strings) and employs prettyNum on the second line (for the formatting), which conveniently will format any character string that looks like a number:
s <- lapply(strsplit(gsub("]|[(]", "", levels(x)), ","),
prettyNum, big.mark=".", decimal.mark=",", input.d.mark=".", preserve.width="individual")
(You might not need the input.d.mark argument, but I did because my locale uses "." for a decimal point, as you could see above. The docs say "individual" is the default for setting the output width, but that just isn't the case on my system: I had to specify it explicitly.)
The paste* functions will perform the reassembly, whose results we simply re-assign to the levels of x:
levels(x) <- paste0("(", sapply(s, function(a) paste0(a, collapse="; ")), "]")
(Since each number potentially already includes "," and "." delimiters, I have specified a third punctuation mark, ";", to separate the numbers themselves -- but you may use what you wish, of course.)
Let's display the new levels to verify the results:
(head(levels(x)))
[1] "(1.000.000; 1.000.000,3]" "(1.000.000,3; 1.000.000,7]" "(1.000.000,7; 1.000.001]"
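The pick-apart, format and reassemble steps above can be wrapped into one reusable function. This is a sketch: the name `relabel_levels` is hypothetical, and it assumes every level has the shape cut() produces (an opening bracket, two numbers separated by ",", a closing bracket). Unlike the gsub() call above, it keeps whichever brackets are present, so a "[" from include.lowest = TRUE survives.

```r
# Hypothetical helper: reformat the numbers inside cut() levels such as
# "(1000000,1000000.3]" while keeping whichever brackets cut() produced.
relabel_levels <- function(f, big.mark = ".", decimal.mark = ",", sep = "; ") {
  lv <- levels(f)
  open  <- substr(lv, 1, 1)                   # "(" or "["
  close <- substr(lv, nchar(lv), nchar(lv))   # "]" or ")"
  inner <- substr(lv, 2, nchar(lv) - 1)       # e.g. "1000000,1000000.3"
  parts <- strsplit(inner, ",", fixed = TRUE)
  # prettyNum() reformats character strings that look like numbers
  fmt <- lapply(parts, prettyNum, big.mark = big.mark,
                decimal.mark = decimal.mark, input.d.mark = ".",
                preserve.width = "individual")
  levels(f) <- paste0(open, vapply(fmt, paste, character(1), collapse = sep), close)
  f
}

x <- cut(seq(0, 1, length.out = 8) + 1e6, 3)
levels(relabel_levels(x))
```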

bquote, parsing, expression to get multiple lines labels in ggplot with greek letters and variables as subscripts

Let's say I have
paste0("Year = ",index,"\nN = ",length((dfGBD %>% filter(year==index))[[vbl]]),
" Bandwidth = ",round(stats::bw.nrd(log((dfGBD %>% filter(year == index))[[vbl]])),2),
"\nSkewness:", round(e1071::skewness(log((dfGBD %>% filter(year==index))[[vbl]])), 2),
" Kurtosis:",round(e1071::kurtosis(log((dfGBD %>% filter(year==index))[[vbl]])),2),
"\nmu[",vbl,"] = ", round(mean((dfGBD %>% filter(year==index))[[vbl]]),2),
" sigma[",vbl,"] = ",round(sd((dfGBD %>% filter(year==index))[[vbl]]),2)
)
inside an sapply over the index years. Further, vbl is a string with the name of a variable. The sapply produces a vector of labels for a factor variable.
Applying ggplot, I obtain labels similar to the following:
Year = 2000
N = 195 Bandwidth = 0.09
Skewness: 0 Kurtosis: -0.56
mu[Mortality] = 7750.85 sigma[Mortality] = 1803.28
So far, so good. I wrote mu[vbl] and sigma[vbl] with parsing and subscript notation in mind, to get Greek letters with the name of the variable stored in vbl as a subscript.
First I tried facet_wrap with the option labeller = "label_parsed". I got an error, which I only solved by writing the string between backticks ``, but then \n has no effect. I tried many options using bquote and/or parse and/or expression and/or atop etc. to get this multi-line result with the output described above, but I only get a single line, very ugly output or, mostly, errors, and I still can't see the Greek letters.
So what should I do?
PS: as answered in other Stack Overflow questions, \n does not work in this context, so a list of bquote calls, one per line, is suggested. I tried it, but then I got an error that I think is due to a mismatch between the number of list elements and the number of labels of the factor (maybe a label cannot be a list?).
Thank you!
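One common way to get multi-line plotmath labels, given here as a sketch that has not been tried on the asker's data: instead of \n, stack lines with nested atop(), quote plain text so that only mu[...] and sigma[...] are parsed as math, and wrap each line in textstyle() to keep the nested atop() from drawing the inner lines smaller. The helper `make_label` and the toy data are hypothetical; skewness and kurtosis are left as NA in the plot part to avoid requiring e1071, and vbl is assumed to be a syntactically valid R name.

```r
# Hypothetical helper: one four-line plotmath string per facet.
# Plain text is single-quoted; mu[...] and sigma[...] stay unquoted so
# that label_parsed() renders them as Greek letters with subscripts.
make_label <- function(year, n, bw, skew, kurt, m, s, vbl) {
  line1 <- sprintf("textstyle('Year = %s')", year)
  line2 <- sprintf("textstyle('N = %s   Bandwidth = %s')", n, bw)
  line3 <- sprintf("textstyle('Skewness: %s   Kurtosis: %s')", skew, kurt)
  line4 <- sprintf("textstyle(mu[%s] == %s ~~~ sigma[%s] == %s)", vbl, m, vbl, s)
  # atop() stacks two expressions; nesting it stacks four lines
  sprintf("atop(atop(%s, %s), atop(%s, %s))", line1, line2, line3, line4)
}

# Values taken from the example label in the question
lab <- make_label(2000, 195, 0.09, 0, -0.56, 7750.85, 1803.28, "Mortality")
expr <- parse(text = lab)  # errors if the string is not valid plotmath syntax

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  set.seed(1)
  # Toy data for illustration only
  df <- data.frame(year = rep(c(2000, 2010), each = 50),
                   Mortality = rlnorm(100, meanlog = log(7750), sdlog = 0.2))
  vbl <- "Mortality"
  labs <- sapply(c(2000, 2010), function(yr) {
    v <- df[df$year == yr, vbl]
    # Skewness/kurtosis left as NA; e1071::skewness()/kurtosis() as in the question
    make_label(yr, length(v), round(bw.nrd(log(v)), 2), NA, NA,
               round(mean(v), 2), round(sd(v), 2), vbl)
  })
  df$label <- factor(df$year, levels = c(2000, 2010), labels = labs)
  p <- ggplot(df, aes(x = log(Mortality))) +
    geom_density() +
    facet_wrap(~ label, labeller = label_parsed)
  # print(p) to draw the faceted plot
}
```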

Word wrap in ggmap labels?

I have some very long labels for points on a map (Avg 35 chars per label).
Is there a "word wrap" of sorts in ggmap annotate (or other labeling function) so the lines are not stretched across the map? I'd like to either limit by number of chars or force a linebreak at a space.
Thanks!
Was able to find a solution here by replacing spaces with the linebreak character "\n"
StackedList <- gsub(" ", "\n", UnStackedList)
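Replacing every space with "\n" puts each word on its own line. To break lines at a chosen width instead, base R's strwrap() can do the wrapping, as in this sketch; the helper name `wrap_labels` and the example labels are hypothetical. stringr::str_wrap() is an alternative if stringr is installed.

```r
# Hypothetical helper: wrap each label at about `width` characters,
# breaking only at spaces, and join the pieces with newlines
wrap_labels <- function(labels, width = 15) {
  vapply(labels,
         function(s) paste(strwrap(s, width = width), collapse = "\n"),
         character(1), USE.NAMES = FALSE)
}

# Made-up labels for illustration
UnStackedList <- c("Central Station north entrance parking",
                   "Old harbour lighthouse viewpoint")
StackedList <- wrap_labels(UnStackedList, width = 15)
cat(StackedList, sep = "\n\n")
# StackedList can then be used as the label, e.g. in annotate("text", ...)
# or geom_text() on a ggmap/ggplot2 plot
```

A single word longer than `width` is not split; strwrap() leaves it on its own, longer line.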

R: How do I write "≥2: n=nrow(x)" in plot legend?

I am doing boxplots and have problems with the legend. Specifically, I want to write "≥2: n=formatC(nrow(x))" but cannot combine the commands for the ≥ symbol, the function that calculates nrow(x), and formatC(nrow(x), big.mark=","), which should give the nrow number with a thousands separator.
What I tried so far:
smoke <- matrix(c(1:1200),ncol=1,byrow=TRUE)
colnames(smoke) <- c("High")
smoke <- as.table(smoke)
pdf('test.pdf')
plot(NA,xlim=c(0,100),ylim=c(0,100))
legend(10,70,bquote(paste(NA>=2, ": n=", .(formatC(nrow(smoke)), big.mark=","))))
dev.off()
which gives: ≥ 2: n=1200
I would like to have: ≥2: n=1,200
It seems that formatC does not work under bquote and I would also like to remove the space after the ≥ symbol.
I also tried:
legend(x,y, legend=c(expression(NA>=2), paste(": n=", formatC(nrow(smoke)), sep="")))
which gives the legend in two lines:
≥ 2
: n=1200
Putting paste before expression gives one line but does not convert the >= to ≥.
I am exporting the graph as pdf, which currently works for the ≥ symbol. I would prefer to keep that. Unicode does not work with pdf in my hands.
Thanks in advance,
Philipp
You have a ) in the wrong place right after smoke, so big.mark is passed to .() (where bquote ignores it) instead of to formatC. Try this:
legend(10,70,bquote(paste(NA>=2, ": n=", .(formatC(nrow(smoke), big.mark=",")))))
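A self-contained version of the corrected call, as a sketch: it writes to a temporary PDF and uses phantom(), the invisible placeholder documented in ?plotmath, as the empty left operand of >= (swapped in for the question's NA, which appears to render as nothing as well). As far as we know, plotmath adds space around relational operators such as >=, so this sketch does not remove the gap after ≥.

```r
# Same toy object as in the question: a 1200-row table
smoke <- as.table(matrix(1:1200, ncol = 1))

# big.mark belongs to formatC(); .() then splices the finished string
# "1,200" into the plotmath expression
lbl <- bquote(paste(phantom() >= 2, ": n=",
                    .(formatC(nrow(smoke), big.mark = ","))))

out <- tempfile(fileext = ".pdf")
pdf(out)
plot(NA, xlim = c(0, 100), ylim = c(0, 100))
legend(10, 70, legend = as.expression(lbl))
invisible(dev.off())
```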
