ggplot: plot title and plot overlap each other - r

I am new to R and am having trouble plotting with ggplot.
In the above diagram, if any of my bars has a high value (in this case, the green one with a value of 447), the plot and the plot title overlap. The values are normalised / scaled so that the y-axis values are always between 0 and 100, though the label may show a different number (the label is the actual count of occurrences, whereas the scaling is based on percentages).
I would like to know how to avoid the overlap between the plot and the plot title in all cases where the bar heights are very close to 100.
The ggplot call I am using is below.
my_plot <- ggplot(data_frame,
                  aes(x = as.factor(X_VAR), y = GROUP_VALUE, fill = GROUP_VAR)) +
  geom_bar(stat = "identity", position = "dodge") +
  # vjust is a fixed setting, so it sits outside aes(); the obsolete ymax aesthetic is dropped
  geom_text(aes(label = BAR_COUNT, y = GROUP_VALUE), vjust = -1,
            position = position_dodge(width = 1), size = 4) +
  theme(axis.text.y = element_blank(), axis.text.x = element_text(size = 12),
        legend.position = "right", legend.title = element_blank()) +
  ylab("Y-axis label") +
  scale_fill_discrete(breaks = c("GRP_PERCENTAGE", "NORMALIZED_COUNT"),
                      labels = c("Percentage", "Count of Jobs")) +
  ggtitle("Distribution based on Text Analysis 2nd Level Sub-Category") +
  theme(plot.title = element_text(lineheight = 1, face = "bold"))
Here is the ggsave command, with its dpi, height and width values, in case that is what causes the problem.
ggsave(filename = paste0(variable_name, "_my_plot.png"), plot = my_plot, dpi = 72, height = 6.75, width = 9)
Can anyone please suggest what needs to be done to get this right?
Many thanks

As Axeman suggests, ylim is useful. Have a look at the documentation here:
http://docs.ggplot2.org/0.9.3/xylim.html
In your code:
my_plot + ylim(0,110)
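If a hard-coded upper limit feels brittle, a possible alternative is to keep the 0-100 limits and ask the scale for proportional headroom. The sketch below is only an illustration under assumptions: the data frame `df` and its columns are made up, and `expansion()` requires ggplot2 3.3.0 or later.

```r
library(ggplot2)

# Hypothetical stand-in data: scaled bar heights (0-100) and raw counts for labels
df <- data.frame(
  x     = c("a", "b", "c"),
  value = c(40, 100, 75),
  n     = c(180, 447, 330)
)

p <- ggplot(df, aes(x = x, y = value)) +
  geom_col() +
  geom_text(aes(label = n), vjust = -1, size = 4) +
  # Keep the data limits at 0-100, but add 10% empty space above the bars
  # so the count labels stay inside the panel
  scale_y_continuous(limits = c(0, 100),
                     expand = expansion(mult = c(0, 0.1))) +
  ggtitle("Bars with headroom for labels")
```

With limits of 0-100 and a 10% upper expansion, the panel reaches roughly the same height as `ylim(0, 110)`, but the padding follows the limits if they change.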
Also, I find this intro to axis quite useful:
http://www.cookbook-r.com/Graphs/Axes_(ggplot2)/
Good luck!

Related

Is there a R function that makes y-axis percentage variable dependent instead of total observation dependent

First post here, so please tell me if I'm doing something wrong!
I'm just starting out learning R and am currently working through the Kaggle Titanic assignment. For one of the correlations I'm using bar plots to show it (this person survived due to ...).
library(ggplot2)
ggplot(data = train, aes(x = Pclass, fill = Survived)) +
  ggtitle("Class distribution of passengers") +
  scale_y_continuous(labels = scales::percent) +
  theme(plot.title = element_text(hjust = 0.5)) + # center title
  labs(y = "Count", x = "Class") + # naming X and Y axis
  geom_bar(position = "stack") # bars in plot aren't stacked but side by side
^ This is the code I am using, where I make the y-axis percentage-based instead of a total count.
I noticed that the % value is computed out of the total number of observations, and I wonder whether I can make the y-axis 100% for each bar, so that the % distribution depends only on the class (1, 2, 3).
So in essence it would become something like this (pardon my artistic skills):
Thanks in advance for the help! If you have any forum posting tips, please tell me as well, so I can make my posts more readable in the future.
The help page of geom_bar says:
By default, multiple bars occupying the same x position will be stacked atop one another by position_stack(). If you want them to be dodged side-to-side, use position_dodge() or position_dodge2(). Finally, position_fill() shows relative proportions at each x by stacking the bars and then standardising each bar to have the same height.
In other words, you need to use:
geom_bar(position = position_fill()) or equivalently geom_bar(position = "fill")
The advantage of using position_fill() rather than "fill" is that you can pass arguments if you need to tweak the position (which is quite uncommon). See ?position_fill.
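As a minimal sketch of what `position = "fill"` does, the example below uses the built-in `mtcars` data as a stand-in for the Titanic `train` data (which is not reproduced here): `cyl` plays the role of the class and `am` the role of the survival flag. Those substitutions and the axis labels are assumptions for illustration only.

```r
library(ggplot2)

# mtcars stands in for the Titanic data: cyl ~ Pclass, am ~ Survived
p <- ggplot(mtcars, aes(x = factor(cyl), fill = factor(am))) +
  geom_bar(position = "fill") +                 # each bar is rescaled to height 1
  scale_y_continuous(labels = scales::percent) + # show 0-1 as 0%-100%
  labs(x = "Class (here: cylinders)", y = "Share within class",
       fill = "Outcome (here: am)")

# Inspect the computed bar geometry: the top of every stacked bar
built <- layer_data(p, 1)
bar_tops <- tapply(built$ymax, as.integer(built$x), max)
```

Every entry of `bar_tops` is the height of one full bar, so the fill colours now show the split within each x category rather than a share of all observations.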

Vertical dodge in geom_jitter

I have an ethogram-like ggplot where I plot the value of a factor quadrant (1 to 4), which is plotted for each frame of a movie (frameID). The color is given by 3 animals that are being tracked.
I am fairly satisfied with the graph, but the number of points makes it difficult to read, even with alpha. I was wondering how to add position_dodge in a way that doesn't destroy the plot.
ggplot(dataframe,
       aes(frameID, quadrant, color=animal)) +
geom_jitter(alpha=0.5) +
scale_color_manual(values = c("#1334C1","#84F619", "#F43900")) +
theme_classic()+
theme(legend.position = 'none')
This link has useful info about dodging using geom_point.
R: How to spread (jitter) points with respect to the x axis?
I can change to geom_point with height, which works but it produces something awful.
+ geom_point(position = position_jitter(w = 0, h = 2))
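One way to get a deterministic vertical separation instead of random jitter is to shift each animal by a fixed offset before plotting. This is only a sketch under assumptions, not a built-in vertical dodge: the `tracks` data frame is randomly generated stand-in data, and the animal names and the offset size of 0.2 are made up.

```r
library(ggplot2)

set.seed(1)
# Hypothetical stand-in for the tracking data: 3 animals, 200 frames each
tracks <- data.frame(
  frameID  = rep(1:200, times = 3),
  animal   = rep(c("a1", "a2", "a3"), each = 200),
  quadrant = sample(1:4, 600, replace = TRUE)
)

# Fixed vertical offset per animal, so each quadrant becomes three thin lanes
offsets <- c(a1 = -0.2, a2 = 0, a3 = 0.2)
tracks$y_dodged <- tracks$quadrant + unname(offsets[as.character(tracks$animal)])

p <- ggplot(tracks, aes(frameID, y_dodged, color = animal)) +
  geom_point(alpha = 0.5) +
  scale_y_continuous(breaks = 1:4, name = "quadrant") +  # label lanes by quadrant
  scale_color_manual(values = c("#1334C1", "#84F619", "#F43900")) +
  theme_classic() +
  theme(legend.position = "none")
```

Because the offsets are constant, the same animal always sits in the same lane within a quadrant, which random jitter cannot guarantee.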
Update
The data lives on GitHub:
https://github.com/matiasandina/MLA2_Tracking/blob/master/demo_data/sample_data.csv
Lowering the alpha or changing the point size helps, but causes trouble when rescaling the image.
Update 2022
It's been a while since I first posted this. My original thoughts have changed and are better reflected here, but I am still looking for a ggplot2 way of doing this!

Extend X axis interval ggplot2

I am trying to plot data with lots of X axis values, and I do not want my points to overlap with geom_point. I found lots of discussions about "scale_x_continuous", "position = jitter or dodge" etc., but my problem remains every time because I need to keep my points aligned. Moreover, "scale_size_area" does not give a good result.
EDIT: Generated data, already melted, is at the end of the post.
I cannot post an image (Link to image), but to give the idea: I have 6 levels on my Y axis and 400 levels on my X axis. My points (shape = 1 = circle) are aligned on the Y levels and have different diameters depending on the value.
This is OK, but the circles overlap.
plot <- ggplot(data, aes(x_variable_400_levels, y_variable_6_levels)) +
  # value*100 because values are between 0 and 1 to have bigger circles
  geom_point(shape = 1, size = data$value*100) +
  # theme description
  theme(
    plot.title = element_text(lineheight=.8, face="bold", vjust=1),
    axis.title.x = element_text(vjust=-0.5),
    axis.title.y = element_text(vjust=0.3)
  )
So, my question is: can I modify the interval between two values on the X axis in order to avoid the overlap between circles? Jitter is not useful here because the noise prevents a good visualisation of the data, even when I tried to add only HORIZONTAL noise.
Any solution, link or tutorial that helps solve this would be appreciated.
EDIT: Generated data. Import with read.table, sep = "," and header = T. The point is that I have very small circles, and they are important too.
data <- read.table(text='"trf","sample","value"
36,"S1",0.143882104
38,"S1",0.025971979
47,"S1",0.016711593
56,"S1",0.027896069
67,"S1",0.025870577
93,"S1",0.07638307
100,"S1",0.022905895
102,"S1",0.019192547
104,"S1",0.018258923
107,"S1",0.005032219
114,"S1",0.028297368
123,"S1",0.007874848
131,"S1",0.024184004
36,"S2",0.115123666
38,"S2",0
47,"S2",0.00479275
56,"S2",0.029523128
67,"S2",0.030133055
93,"S2",0.044749246
100,"S2",0.032865979
102,"S2",0
104,"S2",0
107,"S2",0.013160255
114,"S2",0.052047248
123,"S2",0.007632445
131,"S2",0
36,"S3",0.179332128
38,"S3",0.046215267
47,"S3",0
56,"S3",0.070791832
67,"S3",0.050214857
93,"S3",0.074108014
100,"S3",0
102,"S3",0
104,"S3",0
107,"S3",0
114,"S3",0.081441849
123,"S3",0
131,"S3",0.100090456', header=T,sep=",")
I don't think changing the interval is the solution, as your x-axis is numeric. The plot would be more difficult to interpret if the space between, for instance, 1 and 2 were larger than the space between 9 and 10. And if you widened all intervals to fit the largest circle, the plot would be too wide. I also imagine it would be very cluttered if you have more data, which makes it harder to see patterns. Maybe a (faceted) barplot is the solution? It allows horizontal and vertical comparison, small values stay visible, and values are easily read off and compared. Here's a start:
p2 <- ggplot(data, aes(x=trf, y=value)) +
  geom_bar(stat="identity") +
  facet_grid(sample~.) +
  xlim(c(0,150)) + theme_bw()

Obtaining Percent Scales Reflective of Individual Facets with ggplot2

So I managed to get this far...
ggplot(init, aes(x=factor(ANGLE), fill=NETWORK)) +
  geom_bar(aes(y = (..count..)/sum(..count..))) +
  facet_wrap(~SHOW) + opts(legend.position = "top") +
  scale_y_continuous(labels = percent_format())
My problem is that the colored bars below represent the percent of ALL the camera ANGLE measurements for all the television programs in my study. For instance, the OREILLY graph has a bar that approaches 15% for ANGLE 2, which is 15% of all the ANGLE measurements in the chart, not solely those in the OREILLY facet. What I want each graph to show is the percentage of counts relative to just ONE television show (just that one facet), rather than all of them.
The idea is to compare the proportional use of camera angles among different shows, but with the way the graph is now, it is skewed to make the shows with more camera angle changes look as though they spend way more time at camera angle 2 than they actually do relative to the others.
The frustrating part of it all is that I spent an hour getting this to look the way I wanted, then I made the mistake of updating R. The packages updated along with it, and this happened.
A reduced size data table is available here.
EDIT: This doesn't work either. I tried putting "group=NETWORK" in either (and both) of the aes(...) terms, but nothing changed. I also tried the same thing with "group=SHOW", which I thought might have more of a chance, since I wanted the percentages for just one SHOW in each facet (hence, the scales for each facet should go up to about 80%, since so many of the shows are predominantly camera angle 2). Am I missing something?
ggplot(init, aes(x=factor(ANGLE), fill=NETWORK), group=SHOW) +
  geom_bar(aes(y = (..count..)/sum(..count..), group=NETWORK)) +
  facet_wrap(~SHOW) + opts(legend.position = "top") +
  scale_y_continuous(labels = percent_format())
Using the ..density.. stat rather than ..count.. seems to work for me:
ggplot(dat, aes(x=factor(ANGLE))) +
  geom_bar(aes(y = ..density.., group = SHOW, fill = NETWORK)) +
  facet_wrap(~SHOW) +
  opts(legend.position = "top") +
  scale_y_continuous(labels = percent_format())
At least, this produces a different result; I can't say for sure that it reflects what you want. Additionally, I'm not sure why the ..count.. stat was behaving that way.
This no longer works in newer versions of ggplot. The way to do it now is + stat_count(aes(y=..prop..))
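For readers on a recent ggplot2 (3.3.0 or later, an assumption of this sketch), the same idea can be written with `after_stat(prop)` and `theme()` in place of the removed `opts()`. The example uses the built-in `mtcars` data as a stand-in, since the original `init` table is not available: `cyl` plays the role of ANGLE and `gear` the role of SHOW.

```r
library(ggplot2)

# mtcars stands in for the camera-angle data: cyl ~ ANGLE, gear ~ SHOW
p <- ggplot(mtcars, aes(x = factor(cyl))) +
  # group = 1 makes prop a share of all bars in the panel;
  # stats are computed per facet, so each facet is normalised on its own
  geom_bar(aes(y = after_stat(prop), group = 1)) +
  facet_wrap(~gear) +
  scale_y_continuous(labels = scales::percent_format()) +
  theme(legend.position = "top")

# Sum of bar heights within each facet
built <- layer_data(p, 1)
panel_totals <- tapply(built$y, built$PANEL, sum)
```

`panel_totals` holds the summed bar heights per facet, which is a quick way to confirm that the percentages are relative to each facet rather than to the whole data set. Adding a `fill` mapping changes the default grouping, so the `group` aesthetic may need adjusting in that case.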

Fix for overflowing x-axis text in ggplot2

I've created custom, two-level x-axis entries that tend to work pretty well. The only problem is that when my y-axis variable, proportion, is close to one, these axis entries spill onto the chart area. When I use vjust to manually alter their vertical position, part of each entry is hidden by the chart boundary.
Any suggestions for how to make chart boundaries that dynamically adjust to accommodate large y-axis values and the full text of each entry (without running onto the chart)?
Have a look at the following example:
library(ggplot2)
GroupType <- rep(c("American", "European"), 2)
Treatment <- c(rep("Smurf", 2), rep("OompaLoompa", 2))
Proportion <- rep(1, length(GroupType))
PopulationTotal <- rep(2, length(GroupType))
# data.frame() keeps the numeric columns numeric; as.data.frame(cbind(...)) would coerce them to character
sampleData <- data.frame(GroupType, Treatment, Proportion, PopulationTotal)
hist_cut <- ggplot(sampleData, aes(x = GroupType, y = Proportion, fill = Treatment))
chartCall <- expression(print(hist_cut +
  geom_bar(stat = "identity", position = "dodge") +  # stat is a layer argument, not an aesthetic
  scale_x_discrete(breaks = NULL) +                  # NULL (not NA) hides the default axis labels
  geom_text(aes(label = paste(as.character(GroupType), "\n[N=", PopulationTotal, "]", sep = ""), y = -0.02), size = 4) +
  labs(x = "", y = "", fill = "")
))
dev.new(width = 860, height = 450)
eval(chartCall)
Any thoughts about how I can fix the sloppy x-axis text?
Many thanks in advance,
Aaron
Unfortunately you have to manage the y axis yourself - there's currently no way for ggplot2 to figure out how much extra space you need because the physical space required depends on the size of the plot. Use, e.g., expand_limits(y = -0.1) to budget a little extra space for the text.
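A minimal sketch of the `expand_limits()` suggestion follows. The data mirror the question's example; the value -0.1 comes from the answer, and whether it leaves enough room depends on the text size and the device size, so treat it as a starting point.

```r
library(ggplot2)

# Same toy data as in the question, built with data.frame() so numbers stay numeric
sample_data <- data.frame(
  GroupType       = rep(c("American", "European"), 2),
  Treatment       = rep(c("Smurf", "OompaLoompa"), each = 2),
  Proportion      = 1,
  PopulationTotal = 2
)

p <- ggplot(sample_data, aes(x = GroupType, y = Proportion, fill = Treatment)) +
  geom_col(position = "dodge") +
  geom_text(aes(label = paste0(GroupType, "\n[N=", PopulationTotal, "]"),
                y = -0.02), size = 4) +
  scale_x_discrete(breaks = NULL) +  # hide the default axis labels
  expand_limits(y = -0.1) +          # reserve space below zero for the text
  labs(x = "", y = "", fill = "")

# Trained y range after the expansion is taken into account
y_limits <- layer_scales(p)$y$get_limits()
```

`expand_limits()` only guarantees that the given value is included in the scale, so making the reserved band larger is a matter of passing a more negative value.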

Resources