I am plotting means of grouped data and I'm having trouble getting the legends (the group labels along the x-axis) to be right. The text is so large that only the names of two groups are visible, not all four. I have spent a long time trying to change the size with cex-style arguments, but it doesn't work. I have also tried rotating the labels with las=3, but that doesn't work either.
I cannot share the data, but the code is here:
library(gplots) # plotmeans() comes from gplots
plot.question = function(number){
  # which question to plot? get its ID
  question = names(sorted.by.n)[number]
  # the formula
  form = as.formula(paste0("DF.scored.g.scale ~ ", question))
  # fit it to the data
  fit = lm(form, DF.merged.g)
  # get ANOVA results
  fit.anova = anova(fit)
  # p value of the factor: row 1 of the Pr(>F) column (row 2 is Residuals, which is NA)
  p.value = round(fit.anova[["Pr(>F)"]][1], 4)
  # plot it
  plotmeans(form, DF.merged.g,
            ylab = "4 g-items sumscore",
            xlab = "Answer",
            main = paste0(questions.unique[question, "text"], "\nANOVA p=", p.value),
            cex.main = .8, # size of main title
            cex.axis = .8,
            cex.lab = .8,
            cex.sub = .8,
            las = 3)
}
Preferably, I'd like to simply make the text smaller so it fits. Alternatively, I'd like to rotate it so it fits (perhaps along with a margin change). If neither is possible, what else can I do?
One can suppress the legends with xaxt="n", but then one has to add them some other way. Can it really not be done within the plotmeans() function?
I tried many things and this was the only one that worked. Apparently plotmeans() does not let you change these labels, so the only thing I was able to do was suppress them and overlay the text with text() on top of the plotmeans plot.
library(gplots) # provides plotmeans()
myfactor <- factor(rep(c('cat1','cat2','cat3'),20)) # make a factor
mynum <- runif(60) # make a numeric field
plotmeans(mynum ~ myfactor, xaxt='n') # plot them without the x-axis labels
labs <- levels(myfactor) # the group names, in the order plotmeans() draws them
a <- seq_along(labs) # group i is drawn at x = i, so these are the x positions
text(cex=1, x=a, y=0.2, labs, xpd=TRUE, srt=35) # insert the text on the graph
# Here you need to modify y according to your data to find the best place to plot them.
# In my case x=c(1,2,3) because I have 3 categories and y=0.2
# because this is the lowest value of the y axis. The srt argument rotates the text.
To turn this into a generic function, you could either fix the y-axis limits (ylim) to standard values and use the lower limit as the y argument of text(), or compute the minimum of the y-axis each time, for example with par("usr")[3].
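A minimal sketch of the second option, reading the bottom of the y-axis from par("usr") instead of hard-coding y=0.2. The helper add_rotated_labels() is hypothetical (it is not part of gplots), and the example draws group means with base plot() so it runs without gplots; with plotmeans() you would call it the same way after plotting with xaxt='n', assuming groups sit at x = 1, 2, ... as in the answer above.

```r
# Hypothetical helper (not from gplots): write rotated category labels just
# below the bottom edge of the current plot, wherever the y-axis starts.
add_rotated_labels <- function(labels, srt = 35, cex = 0.8, offset = 0.04) {
  usr <- par("usr")                           # c(x1, x2, y1, y2) of the plot region
  y <- usr[3] - offset * (usr[4] - usr[3])    # slightly below the lowest y value
  text(x = seq_along(labels), y = y, labels = labels,
       srt = srt, adj = 1, xpd = TRUE, cex = cex)  # adj = 1: each label ends at its tick
  invisible(y)
}

set.seed(1)
myfactor <- factor(rep(c("first long group name", "second long group name", "third"), 20))
mynum <- runif(60)
group_means <- tapply(mynum, myfactor, mean)

op <- par(mar = c(9, 4, 2, 1))                # extra bottom margin for the rotated text
plot(seq_along(group_means), group_means, xaxt = "n",
     xlim = c(0.5, length(group_means) + 0.5), xlab = "", ylab = "mean of mynum")
axis(1, at = seq_along(group_means), labels = FALSE)  # tick marks only, no text
label_y <- add_rotated_labels(levels(myfactor))
par(op)
```

Enlarging the bottom margin with par(mar = ...) is what gives the rotated labels room; xpd = TRUE lets them be drawn outside the plot region.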
Hope that helps!
This question already has an answer here: Show element values in barplot (1 answer). Closed 4 years ago.
I have a barplot with grouped bars. Is it possible to include a label for each bar? Here is an example of the plot without bar labels:
test <- structure(c(0.431031856834624, 0.54498742364355, 0.495317895592119,0.341002949852507, 0.40229990800368, 0.328769657724329,0.258600583090379,0.343181818181818, 0.260619469026549), .Dim = c(3L, 3L), .Dimnames = list(
c("2015", "2016", "2017"), c("a", "b", "c")))
barplot(test,ylim=c(0,1),beside=T)
p <- barplot(test, ylim=c(0, 1), beside=T)
text(p, test + .05*sign(test), labels=format(round(test, digits=2), nsmall=2))
The last line adds the labels above the bars.
p holds the return value of barplot(), which gives the x positions of the bar midpoints.
In this example it is a 3x3 matrix.
text() then takes p as its x= argument. For its y= argument it needs a value slightly offset from the bar heights (test). sign() determines the direction of the offset (+1 above a positive bar, -1 below a negative one), and I found the factor .05 empirically by trying; it depends on the values in your table.
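The sign() trick only matters once some bars are negative, which the question's data never are. A small sketch with made-up mixed-sign values shows the offset pushing each label away from zero:

```r
# Made-up values with mixed signs, to show why sign() is used in the offset
vals <- matrix(c(0.40, -0.30, 0.20, -0.10), nrow = 2,
               dimnames = list(c("x", "y"), c("A", "B")))
p <- barplot(vals, beside = TRUE, ylim = c(-0.5, 0.5))  # p: bar midpoints, same shape as vals
label_y <- vals + 0.05 * sign(vals)   # up for positive bars, down for negative bars
text(p, label_y, labels = format(round(vals, digits = 2), nsmall = 2))
```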
So, x= and y= are the coordinates of the labels.
And finally, labels= determines which text is printed.
The combination of format() and round() gives you full control over how many digits are displayed and keeps that number consistent across all labels, which is not the case if you use only round(), because round() drops trailing zeros.
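The question's own data shows the difference: 0.4953... rounds to 0.5, so with round() alone that bar would be labelled "0.5" while its neighbours get two decimals. as.character() is used here only to show what each rounded value looks like on its own.

```r
test <- structure(c(0.431031856834624, 0.54498742364355, 0.495317895592119,
                    0.341002949852507, 0.40229990800368, 0.328769657724329,
                    0.258600583090379, 0.343181818181818, 0.260619469026549),
                  .Dim = c(3L, 3L),
                  .Dimnames = list(c("2015", "2016", "2017"), c("a", "b", "c")))

# round() alone: each value is shown on its own, so trailing zeros are lost
lab_round <- as.character(round(test, digits = 2))
# format(..., nsmall = 2): every label keeps exactly two decimals
lab_format <- format(round(test, digits = 2), nsmall = 2)

lab_round[3]   # "0.5"
lab_format[3]  # "0.50"
```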
With xpd=TRUE you can allow the labels to extend outside the plot region.
cex= sets the font size of the labels, col= their colour, and font= the font style.
Alternatively, you can pass just test as y= and use pos=3 to place each label above that point, with offset=1 setting the distance from the point in character widths.
p <- barplot(test, ylim=c(0, 1), beside=T)
text(x=p, y=test, pos=3, offset=1, labels=format(round(test, digits=2), nsmall=2))
You can find more details in the documentation; run these in the R console:
?text
?barplot
You can add labels with the text() function after drawing your barplot, and adjust its parameters as you wish. Here is sample code:
x <- barplot(test, ylim=c(0,1), beside=T)
text(x, test, labels=round(test, 2), pos=1, offset=.5, col="red", srt=90) # srt=90 gives vertical labels
If you really want to make a better plot, I would recommend ggplot2, as it has several other features, such as themes for your plots, and it is easier to customize.
If you're looking to label every bar's category rather than its value, you can do something like this:
allPermutations <- unlist(lapply(colnames(test), function(x) paste(x, rownames(test)) ))
barplot(test,ylim=c(0,1),beside=T, names.arg = allPermutations, las=2)
The first line builds all combinations of the two categories (column name plus row name), in the order the bars are drawn. The names.arg argument of barplot() lets you supply one name per bar, while las=2 turns the names perpendicular to the axis so they are easier to read.
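The same labels can be built without lapply()/unlist() using outer(); this is just an alternative way of writing the first line. Rotated names usually also need a larger bottom margin, which is what the par(mar = ...) call is for.

```r
test <- structure(c(0.431031856834624, 0.54498742364355, 0.495317895592119,
                    0.341002949852507, 0.40229990800368, 0.328769657724329,
                    0.258600583090379, 0.343181818181818, 0.260619469026549),
                  .Dim = c(3L, 3L),
                  .Dimnames = list(c("2015", "2016", "2017"), c("a", "b", "c")))

# One label per bar, in the order barplot(beside = TRUE) draws them:
# group by group (columns), and within each group year by year (rows).
bar_labels <- as.vector(outer(rownames(test), colnames(test),
                              function(year, group) paste(group, year)))

op <- par(mar = c(6, 4, 2, 1))   # more bottom margin so the rotated names fit
barplot(test, ylim = c(0, 1), beside = TRUE, names.arg = bar_labels, las = 2)
par(op)
```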
I have data that is mostly concentrated in a small range (1-10), but a significant number of points (say, 10%) lie in (10-1000). I would like to plot a histogram of this data that focuses on (1-10) but also shows the (10-1000) data, something like a log scale for the histogram.
Yes, I know this means the bins are not all of equal size.
Neither a simple hist(x) nor hist(x, breaks=c(0,1,1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,2,3,4,5,7.5,10,15,20,50,100,200,500,1000,10000)) gives what I want.
Update:
Following the answers here, I now produce something that is almost exactly what I want (I went with a continuous plot instead of a bar histogram):
library(ggplot2)
breaks <- c(0,1,1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,2,4,8)
ggplot(t, aes(x)) + geom_histogram(colour="darkblue", size=1, fill="blue") + scale_x_log10('true size/predicted size', breaks = breaks, labels = breaks)
The only problem is that I'd like the scale to match the bars actually plotted. There are two options for doing that: one is to use the actual edges of the plotted bars (how?) and accept "ugly" x-axis labels like 1.1754, 1.2985, etc. The other, which I prefer, is to control the bin edges so that they match the breaks.
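For the second option, here is a sketch in base graphics rather than ggplot2 (a swapped-in technique, not the poster's setup): bin log10(x) with breaks = log10(edges), so the bars start and end exactly at the chosen edges, then label the axis at those edges. The data are simulated, and the edges are the question's breaks without 0 (log10(0) is -Inf), padded with the data range because hist() requires breaks that span the data.

```r
set.seed(42)
# Simulated data: mostly 1-10, with a long tail up to the hundreds
x <- c(rexp(1000, 0.5) + 0.5, rexp(100, 0.5) * 100)

# Desired bin edges on the original scale (the question's breaks minus 0)
wanted <- c(1, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2, 4, 8)
wanted <- wanted[wanted > min(x) & wanted < max(x)]
edges  <- c(min(x), wanted, max(x))   # hist() needs breaks covering all of x

# Bin on the log10 scale, suppress the default x-axis, and label the bin
# edges with their values on the original scale.
h <- hist(log10(x), breaks = log10(edges), xaxt = "n",
          main = "Histogram of x", xlab = "x (log scale)")
axis(1, at = log10(wanted), labels = wanted)
```

Because the bins have unequal widths, hist() plots densities by default, so bar areas stay proportional to counts; axis() may skip some of the crowded labels between 1 and 2.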
Log scale histograms are easier with ggplot than with base graphics. Try something like
library(ggplot2)
dfr <- data.frame(x = rlnorm(100, sdlog = 3))
ggplot(dfr, aes(x)) + geom_histogram() + scale_x_log10()
If you are desperate for base graphics, you need to plot a log-scale histogram without axes, then manually add the axes afterwards.
h <- hist(log10(dfr$x), axes = FALSE)
Axis(side = 2)
Axis(at = h$breaks, labels = 10^h$breaks, side = 1)
For completeness, the lattice solution would be
library(lattice)
histogram(~x, dfr, scales = list(x = list(log = TRUE)))
AN EXPLANATION OF WHY LOG VALUES ARE NEEDED IN THE BASE CASE:
If you plot the data with no log-transformation, then most of the data are clumped into bars at the left.
hist(dfr$x)
The hist function ignores the log argument (because it interferes with the calculation of breaks), so this doesn't work.
hist(dfr$x, log = "y")
Neither does this.
par(xlog = TRUE)
hist(dfr$x)
That means that we need to log transform the data before we draw the plot.
hist(log10(dfr$x))
Unfortunately, this messes up the axes, which brings us to the workaround above.
Using ggplot2 seems like the easiest option. If you want more control over your axes and your breaks, you can do something like the following:
EDIT: new code provided
x <- c(rexp(1000,0.5)+0.5, rexp(100,0.5)*100)
breaks <- c(0.1,0.2,0.5,1,2,5,10,20,50,100,200,500,1000,10000) # minor grid lines (0 dropped: log10(0) is -Inf)
major <- c(0.1,1,10,100,1000,10000) # major grid lines
H <- hist(log10(x), plot=FALSE) # bin on the log10 scale without plotting
# empty plot whose limits cover every bar (x from first to last break, y from 0)
plot(range(H$breaks), c(0, max(H$counts)), type="n",
     xaxt="n",
     xlab="X", ylab="Counts",
     main="Histogram of X")
abline(v=log10(breaks), col="lightgrey", lty=2)
abline(v=log10(major), col="lightgrey")
abline(h=pretty(H$counts), col="lightgrey")
plot(H, add=TRUE, freq=TRUE, col="blue")
# Position of ticks
at <- log10(breaks)
# Create the x axis, labelled on the original scale
axis(1, at=at, labels=10^at)
This is as close as I can get to the ggplot2 look. Making the background grey is not that straightforward, but it is doable if you draw a rectangle the size of the plot region and fill it with grey.
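A sketch of that rectangle approach: par("usr") gives the limits of the plot region, rect() fills it, and the grid and bars are drawn afterwards so they sit on top. The colours ("grey92" panel, white grid lines) are only a guess at the ggplot2 look, and the data are simulated like the answer's.

```r
set.seed(1)
x <- c(rexp(1000, 0.5) + 0.5, rexp(100, 0.5) * 100)
H <- hist(log10(x), plot = FALSE)

# Empty plot whose limits cover all bars (y starts at 0 so no bar is cut off)
plot(range(H$breaks), c(0, max(H$counts)), type = "n", xaxt = "n",
     xlab = "X", ylab = "Counts", main = "Histogram of X")

usr <- par("usr")                                # c(x1, x2, y1, y2) of the plot region
rect(usr[1], usr[3], usr[2], usr[4], col = "grey92", border = NA)  # grey panel

at <- seq(floor(usr[1]), ceiling(usr[2]))        # integer powers of ten
abline(v = at, h = pretty(H$counts), col = "white")  # grid lines over the panel
plot(H, add = TRUE, col = "blue")                # bars on top of background and grid
axis(1, at = at,
     labels = format(10^at, scientific = FALSE, drop0trailing = TRUE, trim = TRUE))
box()
```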
Check all the functions I used, and also ?par. It will allow you to build your own graphs. Hope this helps.
A dynamic graph would also help with this plot. Use the manipulate package from RStudio (it only works inside RStudio) to draw a histogram over a dynamically selected range:
library(manipulate)
data_dist <- table(data)
manipulate(barplot(data_dist[x:y]), x = slider(1,length(data_dist)), y = slider(10, length(data_dist)))
Then you can use the sliders to view the distribution within a dynamically selected range.
How do I change the axis length? For example:
library(data.table)
s <- data.table(school=rep(1:3,5), wave=c(rep(1,7), rep(2,8)), v1=rpois(15,10))
plot(s$wave, s$v1)
I get a scatter plot where the data are at the edges of the plot (there is a lot of white space in the graph). Changing the xaxp values doesn't help (I tried xaxp=c(-1, 2, 4), but nothing happened), and when I define wave as a factor I get a box plot. I know I can "squeeze" the plot when I save it to .png, but is there any other way?
I tried to upload pictures to convey the problem but I don't have enough reputation.
Edit: thanks to whoever uploaded it (although the axes are reversed: wave is x and v1 is y). The thing is that there is a lot of "free space" between the 1st and the 2nd wave. The position is perfect when I define wave as a factor (it's centered and each level gets half the axis length), but then it gives me a box plot!
You can pass many more arguments to plot(), such as the colour, the title, and also the limits of the axes.
Your code:
s <- data.frame(school=rep(1:3,5), wave=c(rep(1,7), rep(2,8)), v1=rpois(15,10))
plot(s$wave, s$v1)
And now just add some more:
plot(
  x = s$wave,
  y = s$v1,
  col = "red",
  main = "This is my title",
  xlab = "the label of the x-axis",
  ylab = "the label of the y-axis",
  xlim = c(0.5, 2.5), # the limits of the x-axis
  ylim = c(0, max(s$v1)) # the limits of the y-axis
)
You can add much more, such as the size and type of the points (cex and pch), just as jlhoward mentioned.
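To get the factor-style layout the question describes (each wave centred in its own half of the axis) with points instead of boxes, one option is to plot the factor's integer codes and set xlim half a unit beyond them. A minimal sketch using the question's data frame (set.seed added for reproducibility):

```r
set.seed(1)
s <- data.frame(school = rep(1:3, 5), wave = c(rep(1, 7), rep(2, 8)),
                v1 = rpois(15, 10))

g <- factor(s$wave)        # treat wave as categories...
xpos <- as.integer(g)      # ...but draw points at x = 1, 2, ... instead of boxes
plot(xpos, s$v1, xlim = c(0.5, nlevels(g) + 0.5), xaxt = "n",
     xlab = "wave", ylab = "v1")
axis(1, at = seq_len(nlevels(g)), labels = levels(g))  # one tick per wave
```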
I found a function in the "lattice" package that does exactly what I want: a boxplot without the box.
The function is called stripplot.
http://www.math.ucla.edu/~anderson/rw1001/library/base/html/stripplot.html
Thank you all for the help.
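Base R's counterpart to lattice's stripplot is stripchart(), which also draws one column of points per group without a box. A short sketch on the question's data (the jitter and point style are arbitrary choices):

```r
set.seed(1)
s <- data.frame(school = rep(1:3, 5), wave = c(rep(1, 7), rep(2, 8)),
                v1 = rpois(15, 10))

# One vertical strip of points per wave, jittered so ties don't overlap
stripchart(v1 ~ wave, data = s, vertical = TRUE, method = "jitter",
           pch = 19, xlab = "wave", ylab = "v1")

# The lattice version would be:
# lattice::stripplot(v1 ~ factor(wave), data = s)
```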
I'm constructing a plot using bargraph.CI from sciplot. The x-axis represents a categorical variable, so the values of this variable are the names for the different positions on the x-axis. Unfortunately these names are long, so at default settings, some of them just disappear. I solved this problem by splitting them into multiple lines by injecting "\n" where needed. This basically worked, but because the names are now multi-line, they look too close to the x-axis. I need to move them farther away. How?
I know I can do this with mgp, but that affects the y-axis too.
I know I can set axisnames=FALSE in my call to bargraph.CI, then use axis() to create a separate x-axis. (In fact, I'm already doing that, but only to make the x-axis extend farther than it would by default; see my code below.) Then I could give the x-axis its own mgp parameter that would not affect the y-axis. But as far as I can tell, axis() is well suited to ordinal or continuous variables and doesn't seem to work well for categorical ones. After some fiddling, I couldn't get it to put the names in the right locations (i.e. right under their corresponding bars).
Finally, I tried using mgp.axis.labels from Hmisc to set ONLY the x-axis mgp, which is precisely what I want, but as far as I could tell it had no effect on anything.
Ideas? Here's my code.
ylim = c(0.5,0.8)
yticks = seq(ylim[1],ylim[2],0.1)
ylab = paste(100*yticks,"%",sep="")
bargraph.CI(
response = D$accuracy,
ylab = "% Accuracy on Test",
ylim = ylim,
x.factor = D$training,
xlab = "Training Condition",
axes = FALSE
)
axis(
side = 1,
pos = ylim[1],
at = c(0,7),
tick = TRUE,
labels = FALSE
)
axis(
side = 2,
tick = TRUE,
at = yticks,
labels = ylab,
las = 1
)
axis() works fine with categories, but you need to set the right tick positions and use the pos parameter to shift the axis (and its labels) down. Here I use xvals, from the return value of bargraph.CI(), to place the axis tick marks.
Here a reproducible example:
library(sciplot)
# use the built-in ToothGrowth data
dat <- ToothGrowth
### create long labels
labels <- c('aaaaaaaaaa\naaaaaaaaaaa\nhhhhhhhhhhhhhhh',
'bbbbbbbbbb\nbbbbbbbbbbb\nhhhhhhhhhhhhhh',
'cccccccccc\nccccccccccc\ngdgdgdgdgd')
## I change factor labels
dat$dose <- factor(dat$dose,labels=labels)
ll <- bargraph.CI(x.factor = dose, response = len, data = dat,axisnames=FALSE)
## set at to xvals
axis(side=1,at=ll$xvals,labels=labels,pos=-2,tick=FALSE)
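If you would rather not hard-code a y coordinate with pos, axis() also has a line argument that moves the axis a given number of text lines into the margin, and according to ?axis, mgp can also be passed to a single axis() call, so only the x-axis is affected. The sketch below swaps in a plain base-R barplot() of the same ToothGrowth means (no sciplot, no confidence intervals) to show the idea; the label strings are made up.

```r
# Mean tooth length per dose from the built-in ToothGrowth data
means <- tapply(ToothGrowth$len, ToothGrowth$dose, mean)
labels <- c("low\ndose", "medium\ndose", "high\ndose")  # made-up multi-line names

op <- par(mar = c(7, 4, 2, 1))     # extra bottom margin for multi-line labels
mids <- barplot(means, ylim = c(0, 30), axisnames = FALSE, ylab = "len")
# Labels at the bar midpoints returned by barplot(); 'line' pushes only this
# axis further into the bottom margin, so the y-axis is unchanged.
axis(side = 1, at = mids, labels = labels, tick = FALSE, line = 1.5)
par(op)
```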