Pyramind Plot using ggplot2 in R - r

I need help in removing/converting the negative values in the x-axis of the pyramid plot.
I was able to build the pyramid through https://walker-data.com/census-r/exploring-us-census-data-with-visualization.html subheading 4.5.2. I keep getting an error saying function not found: number_format.
I think the scale_x_continuous is where I'd be able to change the - sign but returns an error everytime

So from the context of the plot, I am assuming that you don't want to convert the values themselves, but rather convert the x-axis so that the male and female sides of the pyramid are mirrored?
In which case, you can use abs. The function abs(x) computes the absolute value of x, which would remove the negative sign from your data's values.
Without seeing a reproducible example of your code (which should be included when you ask questions like this on Stack Overflow, see the package {reprex} for help with this), it's a little difficult to be sure exactly what you need to change to make the code work, but I think you should be on the right track with scale_x_continuous.
With regard to the error that you're receiving, it suggests that you haven't imported the library for that function, {scales}, as suggested by stefan in the comments (and as suggested, scales:::label_number has superseded scales:::number_format, so you should use the former).
If you're using the scale_x_continuous code from the second plot in Section 4.5.2 of the link you have shared:
utah_pyramid +
scale_x_continuous(
labels = ~ number_format(scale = .001, suffix = "k")(abs(.x)),
limits = 140000 * c(-1, 1)
)
The number_format function isn't the part of the code that is producing the absolute values, it is converting the scale of the values to thousands. It is the abs(.x) part that is removing the negative sign.

Related

plot function type=ā€œnā€ is ignored for plot(y~x)?

I am trying to plot a graph of certain values against time using the plot function.
I am simply trying to change the representation of the dots, using the pch= function. However R is simply ignoring me! I have also tried removing the dots so that I can place labels instead, but when I type in type="n" it ignores that too!
I am using the exact same format of code that I have used for other plots but this time it just isn't cooperating.
If I specify other features such as the title or the x/y axis labels, it will add those in but it simply ignores the pch or type commands.
This is my basic code:
plot(Differences ~ Time, data=subsetH)
But if I run
plot(Differences ~ Time, type="n", data=subsetH)
or
plot(Differences ~ Time, pch=2, data=subsetH)
it keeps plotting the same thing.
Is there something obvious I have missed?
I just came across your question because I encountered the same thing - creating an empty plot did not work, as type='n' was always ignored (as well as other type specifications).
With the help of this entry: Plotting time-series with Date labels on x-axis
I realized that my date format needed to be assigned as "date" class (as.Date()).
I know your entry dates back a little bit already, but maybe it's still useful.

R vegan envfit labels do not move with arrow.mul?

I am making an NMDS ordination plot in R with environmental vectors using envfit, and I have missing values in some of my environmental variables. In order not to exclude all rows from all variables containing a missing value in one variable (as with na.rm=TRUE), I have run envfit on each variable separately, but now would like to plot the results on the same plot. Thus I need to use the arrow.mul argument to make the scale of the arrows comparable to one another. And here I run into my problem: the location of the labels for the vectors do not seem to listen to arrow.mul, and labels stay in their original position, floating in space.
If someone could tell me how to get the labels to move to the new arrow point location, or suggest a workaround for the missing values, that would be fantastic. I hope I am not missing something obvious!
Here is a reproducible example of stubborn labels using the dune data from vegan:
data(dune)
data(dune.env)
de2<-dune.env[,c(1,2)]
ord<-metaMDS(dune,k=2)
plot(ord)
de2$Moisture<-as.numeric(de2$Moisture) #pretend this is numeric
ev<-envfit(ord,de2,choices=c(1,2))
ordiplot(ord,type="n")
points(ord,pch=16,col=c(1:4)[dune.env$Management])
plot(ev,arrow.mul=1,labels=list(vectors=c("A1","Moisture")),
col="black",cex=0.9,bg="white")#floating labels
plot(ev,labels=list(vectors=c("A1","Moisture")),
col="black",cex=0.9,bg="white")#labels are fine with no arrow.mul
plot(ev,arrow.mul=1) #default labels behave the same and float with arrow.mul
(yes I know I should learn to use ggplot, but would prefer not to just now, just for this ...)
This indeed is a bug in vegan. I'll fix it in github. Meanwhile, you can use plot(ev, arrow.mul = 1, rescale = FALSE) to have floating text. This will give you a warning, but it is harmless, and will work. Meanwhile I'll fix this in vegan.

Automated labeling of logarithmic plots

I would like to automate the graph axis values for a series of plots in Stata 13.
In particular, I would like to show axis labels like 10^-1, 0, 10^1, 10^2 etc. by feeding the axis options macros.
The solution in the following blog post gives a good starting point:
Labeling logarithmic axes in Stata graphs
I could also also produce nicer labels by using "10{sup:`x'}".
However, I cannot find the solution to the following additional elements:
the range of the axis labels would run from 10^-10 up to 10^10. Moreover, my baseline is ln, so log values are 2.3, 4.6, etc. In particular, the line below which only takes integers as input:
label define expon `vallabel', replace
I would like to force the range of the axis values by graph (e.g. a particular axis runs from 10^-2 to 10^5). I understand that range() only extends axes, but does not allow to trim them.
Any ideas on either or both of the above?
This is a very straightforward output in R or Python, even standard without many additional arguments, but unfortunately not so in Stata.
Q1. You should check out niceloglabels from SSC as announced in this thread. niceloglabels together with other tricks and devices in this territory will be discussed in a column expected to appear in Stata Journal 18(1) in the first quarter of 2018.
Value labels are limited to association with integers but that does not bite here. All you need focus on is the text which is to be displayed as axis labels at designated points on either axis; such points can be specified using any numeric value within the axis range.
Your specific problem appears to be that one of your variables is a natural logarithm but you wish to label axes in terms of powers of 10. Conversion to a variable containing logarithms to base 10 is surely easy, but another program mylabels (SSC) can help here. This is a self-contained example.
* ssc inst mylabels
sysuse auto, clear
set scheme s1color
gen lnprice = ln(price)
mylabels 4000 8000 16000, myscale(ln(#)) local(yla)
gen lnweight = ln(weight)
mylabels 2 3 4, myscale(ln(1000*#)) suffix(" x 10{sup:3}") local(xla)
scatter lnprice lnweight, yla(`yla') xla(`xla') ms(Oh) ytitle(Price (USD)) xtitle(Weight (lb))
I have used different styles for the two axes just to show what is possible. On other grounds it is usually preferable to be consistent about style.
Broadly speaking, the use of niceloglabels is often simpler as it boils down to specifying xscale(log) or yscale(log) with the labels you want to see. niceloglabels also looks at a variable range or a specified minimum and maximum to suggest what labels might be used.
Q2. range() is an option with twoway function that allows extension of the x axis range. For most graph commands, the pertinent options are xscale() or yscale(), which again extend axis ranges if suitably specified. None of these options will omit data as a side-effect or reduce axis ranges as compared with what axis label options imply. If you want to omit data you need to use if or in or clone your variables and replace values you don't want to show with missing values in the clone.
FWIW, superscripts look much better to me than ^ for powers.
I have finally found a clunky but working solution.
The trick is to first generate 2 locals: one to evaluate the axis value, another to denote the axis label. Then combine both into a new local.
Somehow I need to do this separately for positive and negative values.
I'm sure this can be improved...
// define macros
forvalues i = 0(1)10 {
local a`i' = `i'*2.3
local b`i' `" "10{sup:`i'}" "'
local l`i' `a`i'' `"`b`i''"'
}
forvalues i = 1(1)10 {
local am`i' = `i'*-2.3
local bm`i' `" "10{sup:-`i'}" "'
local lm`i' `am`i'' `"`bm`i''"'
}
// graph
hist lnx, ///
xl(`lm4' `lm3' `lm2' `lm1' `l0' `l1' `l2' `l3' `l4')

ggplot2 geom_violin with 0 variance

I started to really like violin plots, since they give me a much better feel that box plots when you have funny distributions. I like to automatize a lot of stuff, and thus ran into a problem:
When one variable has 0 variance, the boxplot just gives you a line at that point. Geom_violin however, terminates with an error. What behavior would I like? Well, either put in a line or nothing, but please give me the distributions for the other variables.
Ok, quick example:
dff=data.frame(x=factor(rep(1:2,each=100)),y=c(rnorm(100),rep(0,100)))
ggplot(dff,aes(x=x,y=y)) + geom_violin()
yields
Error in `$<-.data.frame`(`*tmp*`, "n", value = 100L) :
replacement has 1 row, data has 0
However, what works is:
ggplot(dff,aes(x=x,y=y)) + geom_boxplot()
Update:
The issue is resolved as of yesterday: https://github.com/hadley/ggplot2/issues/972
Update 2:
(from question author)
Wow, Hadley himself responded! geom_violin now behaves consistently with geom_density and base R density.
However, I don't think the behavior is optimal yet.
(1) The 'zero' problem
Just run it with my original example:
dff=data.frame(x=factor(rep(1:2, each=100)), y=c(rnorm(100), rep(0,100)))
ggplot(dff,aes(x=x,y=y)) + geom_violin(trim=FALSE)
Yielding this:
Is the plot on the right an appropriate representation of 'all zeroes'? I don't think so. It is better to have trimming that produces a single line to show that there is no variation in the data.
Workaround solution: Add a + geom_boxplot()
(2) I may actually want TRIM=TRUE.
Example:
dff=data.frame(x=factor(rep(1:2, each=100)), y=c(rgamma(100,1,1), rep(0,100) ))
ggplot(dff,aes(x=x,y=y)) + geom_violin(trim=FALSE)
Now I have non-zero data, and standard kernel density estimates don't handle this correctly. With trim=T I can quickly see that the data is strictly positive.
I am not arguing that the current behavior is 'wrong', since it's in line with other functions. However, geom_violin may be used in different contexts, for exploring different data.frames with heterogeneous data types (positive+skewed or not, for instance).
Three options for dealing with this until the ggplot2 issue is resolved:
As a quick hack, you can set one of the y-values to 0.0001 (instead of zero) and geom_violin will work.
Check out the vioplot package if you're not set on using ggplot2. vioplot doesn't throw an error when you feed it a bunch of identical values.
The Hmisc package includes a panel.bpplot (box-percentile plot) function that can create violin plots with the bwplot function from the lattice package. See the Examples section of ?panel.bpplot. It produces a single line when you feed it a vector of identical values.

equivalent to MatLab "bar" function in R?

Is there an function in R that does the same job as Matlab's "bar" function?
R does have a "barplot" function in the library graphics, however, it is not the same.
The Matlab bar(X,Y) (verbatim excerpt from MATLAB documentation) "draws a bar for each element in Y at locations specified in X, where X is a vector defining the x-axis intervals for the vertical bars." (emphasis mine)
However, the R barplot function does not allow one to specify locations.
Perhaps there is a method in ggplot2 that supports this? I am only able to find standard bar charts in ggplot2.
No, barplot is not the same as bar, but you should read the whole help. You can do many things to position the bars. The first is simply their order in Y. You could insert spaces if you wish (additional 0s). If you have X and Y then sort Y on X (Y[order(X)]) and plot it. If you need to change positions use the "space" and "width" arguments. It's not as straightforward as specifying X values I suppose but it's definitely more useful in most situations. Generally what you want to adjust is widths of bars and spaces between bars. Their position on the X-axis should be arbitrary. If the position on the X-axis is really meaningful then you should be using line plots, not bar graphs.
In R:
barplot(rbind(1:10, 2:11), beside=T, names.arg=1:10)
In MATLAB:
>> bar(1:10, [(1:10)' (2:11)'])
Read up on par . Then observe, for example:
x<-c(1,2,4,5,6)
y<-c(3,4,3,4,2)
plot(x,y,type='h',lwd=6)
Edit: yes, I know this doesn't (yet) plot multiple data sets, but I would hope you can see simple ways to make that happen, with spacings, colors, etc. specified to your exact liking :-)
Sounds vaguely like the R stepfun. On the other hand one would need to know what "draws a bar" means before saying it is not the same as barplot(..., horiz=TRUE) One would, of course, need to examine some more detailed evidence such as data and plots before arriving at a conclusion, however. #John Colby should be congratulated for adding some specificity to the discussion. The axis function is probably what Quant Guy needs education regarding.

Resources