Vertical dodge in geom_jitter - r

I have an ethogram-like ggplot where I plot the value of a factor quadrant (1 to 4), which is plotted for each frame of a movie (frameID). The color is given by 3 animals that are being tracked.
I am fairly satisfied with the graph, but the number of points makes it difficult to read, even with alpha. I was wondering how to add position_dodge in a way that doesn't destroy the plot.
ggplot(dataframe,
       aes(frameID, quadrant, color = animal)) +
  geom_jitter(alpha = 0.5) +
  scale_color_manual(values = c("#1334C1", "#84F619", "#F43900")) +
  theme_classic() +
  theme(legend.position = 'none')
This link has useful info about dodging using geom_point.
R: How to spread (jitter) points with respect to the x axis?
I can change to geom_point with height, which works but it produces something awful.
+ geom_point(position = position_jitter(w = 0, h = 2))
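One possible way to get a vertical "dodge" without random noise is to shift each animal by a fixed offset around the quadrant value before plotting. The following is only a sketch under assumptions: the data frame df is simulated as a stand-in for the real tracking data, and the animal labels and the offsets vector are hypothetical.

```r
library(ggplot2)

set.seed(1)
# Hypothetical stand-in for the tracking data (not the real CSV)
df <- data.frame(
  frameID  = rep(1:200, times = 3),
  animal   = rep(c("a", "b", "c"), each = 200),
  quadrant = sample(1:4, 600, replace = TRUE)
)

# Fixed vertical offset per animal (assumed values; tune to taste)
offsets <- c(a = -0.2, b = 0, c = 0.2)
df$y_dodged <- df$quadrant + unname(offsets[df$animal])

p <- ggplot(df, aes(frameID, y_dodged, color = animal)) +
  geom_point(alpha = 0.5) +
  # keep the axis labelled with the original quadrant values
  scale_y_continuous(breaks = 1:4, name = "quadrant") +
  scale_color_manual(values = c("#1334C1", "#84F619", "#F43900")) +
  theme_classic() +
  theme(legend.position = "none")
```

Each animal then occupies its own thin band inside every quadrant, so points from different animals no longer sit on top of each other.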
Update
Data lives on GitHub:
https://github.com/matiasandina/MLA2_Tracking/blob/master/demo_data/sample_data.csv
Lowering the alpha or changing the point size helps, but it causes trouble when rescaling the image.
Update 2022
It's been a while since I first posted this. My original thinking has changed and is better reflected here, but I am still looking for a ggplot2 way of doing this!

Related

r ggplot fill the background of selected facets [duplicate]

I'm using the "tips" data set (which ships with the reshape2 package) in ggplot2. If I do
sp = ggplot(tips,aes(x=total_bill, y = tip/total_bill)) +
geom_point(shape=1) +
facet_grid(sex ~ day)
The plot comes out fine. But I now want to change the panel background for just the plots under "Fri". Is there a way to do this?
Even better, can I conditionally change colors by passing parameters? For example if more than 3 points are below 0.1, then change panel background (for just that panel) to a certain color while all others remain the default light grey?
The general rule for doing anything in ggplot2 is to:
Create a data frame that encodes the information you want to plot
Pass that data frame to a geom
This is made a bit more complicated in this case because of the particular aspect of the plot you want to alter. The Powers That Be designed ggplot2 in a way that separates data elements of the plot (i.e. geom's) from non-data elements (i.e. theme's), and it so happens that the plot background falls under the "non-data" category.
There is always the option of modifying the underlying grid object manually but this is tedious and the details may change with different versions of ggplot2. Instead, we'll employ the "hack" that Hadley refers to in this question.
#Create a data frame with the faceting variables
# and some dummy data (that will be overwritten)
tp <- unique(tips[,c('sex','day')])
tp$total_bill <- tp$tip <- 1
#Just Fri
ggplot(tips,aes(x=total_bill, y = tip/total_bill)) +
geom_rect(data = subset(tp,day == 'Fri'),aes(fill = day),xmin = -Inf,xmax = Inf,
ymin = -Inf,ymax = Inf,alpha = 0.3) +
geom_point(shape=1) +
facet_grid(sex ~ day)
#Each panel
ggplot(tips,aes(x=total_bill, y = tip/total_bill)) +
geom_rect(data = tp,aes(fill = day),xmin = -Inf,xmax = Inf,
ymin = -Inf,ymax = Inf,alpha = 0.3) +
geom_point(shape=1) +
facet_grid(sex ~ day)
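The conditional part of the question (shade a panel only when more than 3 of its points fall below a threshold) follows the same rule: compute a data frame holding just the panels that qualify, then pass it to geom_rect. This is a sketch under assumptions, not part of the original answer: it uses the built-in mtcars data as a stand-in for tips, and the threshold (mpg below 20) and the fill colour are arbitrary.

```r
library(ggplot2)

# Count, per panel, how many points fall below the threshold
cars  <- transform(mtcars, low = mpg < 20)
flags <- aggregate(low ~ am + cyl, data = cars, FUN = sum)

# Keep only the panels with more than 3 such points
flags <- flags[flags$low > 3, ]

p <- ggplot(mtcars, aes(wt, mpg)) +
  # one row per flagged panel, so one rectangle per flagged panel
  geom_rect(data = flags,
            aes(xmin = -Inf, xmax = Inf, ymin = -Inf, ymax = Inf),
            fill = "steelblue", alpha = 0.3, inherit.aes = FALSE) +
  geom_point(shape = 1) +
  facet_grid(am ~ cyl)
```

Because flags carries the faceting variables (am and cyl), each rectangle lands only in its own panel and every other panel keeps the default background.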
I cannot comment yet, so here is an addition to joran's answer.
If you are having trouble with the transparency setting, like setting alpha = 0.2 but not noticing any difference, it might be because of the data that you give to ggplot.
"Thanks for clarifying your question. This was puzzling to me, so I went to google, and ended up learning something new (after working around some vagaries in their examples). Apparently what you are doing is drawing many rectangles on top of each other, effectively nullifying the semi-transparency you want. So, the only ways to overcome this are to hard-code the rectangle coordinates in a separate df"
This answer comes from
geom_rect and alpha - does this work with hard coded values?
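The overplotting described in the quote can be checked by counting how many rectangles a layer actually draws. This is a small sketch using the built-in mtcars data (an assumption, chosen only so the example is self-contained): when geom_rect inherits the full data set it draws one rectangle per row, whereas a one-row data frame draws exactly one.

```r
library(ggplot2)

# Inherits mtcars: one semi-transparent rectangle per row, all stacked
p_stacked <- ggplot(mtcars, aes(wt, mpg)) +
  geom_rect(xmin = -Inf, xmax = Inf, ymin = -Inf, ymax = Inf,
            fill = "red", alpha = 0.2) +
  geom_point()

# Separate one-row data frame: a single rectangle, so alpha behaves as expected
bg <- data.frame(xmin = -Inf, xmax = Inf, ymin = -Inf, ymax = Inf)
p_single <- ggplot(mtcars, aes(wt, mpg)) +
  geom_rect(data = bg,
            aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax),
            fill = "red", alpha = 0.2, inherit.aes = FALSE) +
  geom_point()

# Number of rectangles drawn by the first layer of each plot
n_stacked <- nrow(layer_data(p_stacked, 1))
n_single  <- nrow(layer_data(p_single, 1))
```

This is also why the tp data frame in joran's answer is built with unique(): it holds one row per panel rather than one row per observation.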


ggplot: plot title and plot overlap each other

I am a newbie to R, am having some problems plotting with ggplot, and need help.
In the above diagram, if any of my bars has a high value (in this case, the green one with a value of 447), the plot and the plot title overlap. The values here are normalised / scaled so that the y-axis values are always between 0 and 100, though the label might indicate a different number (this is the actual count of occurrences, whereas the scaling is done based on percentages).
I would like to know how to avoid the overlap of the plot with the plot title in all cases where the bar heights are very close to 100.
The ggplot function I am using is as below.
my_plot <- ggplot(data_frame,
                  aes(x = as.factor(X_VAR), y = GROUP_VALUE, fill = GROUP_VAR)) +
  geom_bar(stat = "identity", position = "dodge") +
  # vjust is a constant, so it is set outside aes(); the obsolete ymax aesthetic is dropped
  geom_text(aes(label = BAR_COUNT), vjust = -1,
            position = position_dodge(width = 1), size = 4) +
  theme(axis.text.y = element_blank(), axis.text.x = element_text(size = 12),
        legend.position = "right", legend.title = element_blank()) +
  ylab("Y-axis label") +
  scale_fill_discrete(breaks = c("GRP_PERCENTAGE", "NORMALIZED_COUNT"),
                      labels = c("Percentage", "Count of Jobs")) +
  ggtitle("Distribution based on Text Analysis 2nd Level Sub-Category") +
  theme(plot.title = element_text(lineheight = 1, face = "bold"))
Here is the ggsave command, in case that is creating the problem, with dpi, height and width values.
ggsave(filename = paste0(variable_name, "_my_plot.png"), plot = my_plot, dpi = 72, height = 6.75, width = 9)
Can anyone please suggest what needs to be done to get this right?
Many Thanks
As Axeman suggests, ylim is useful. Have a look at the documentation here:
http://docs.ggplot2.org/0.9.3/xylim.html
In your code:
my_plot + ylim(0,110)
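A variation on the same idea is to add proportional head room above the tallest bar instead of hard-coding the upper limit. This is a sketch under assumptions: the bars data frame is made up for illustration, and expansion() requires ggplot2 3.3.0 or later.

```r
library(ggplot2)

# Hypothetical data: bar heights already scaled to 0-100, n is the label
bars <- data.frame(
  x     = c("a", "b", "c"),
  value = c(35, 100, 60),
  n     = c(120, 447, 210)
)

p <- ggplot(bars, aes(x, value)) +
  geom_col() +
  geom_text(aes(label = n), vjust = -1, size = 4) +
  # no padding below zero, 15% extra space above the tallest bar
  scale_y_continuous(limits = c(0, NA),
                     expand = expansion(mult = c(0, 0.15)))
```

The extra space scales with the data, so the labels above the bars keep clear of the title even if the maximum changes.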
Also, I find this intro to axis quite useful:
http://www.cookbook-r.com/Graphs/Axes_(ggplot2)/
Good luck!

re-sizing ggplot geom_dotplot

I'm having trouble creating a figure with ggplot2. I am using geom_dotplot with center stacking to display my data which are discrete values for 4 categories.
For aesthetic reasons I want to customize the positions of the dots so that:
the empty space between dots along the y axis is reduced (i.e. the dots are 1 value large)
the distributions fit and don't overlap
I've adjusted the binwidth and dotsize to achieve aesthetic goal 1, but that requires me to fiddle with the ylim() parameter to make sure that the groups fit in the plot. This results in a plot with more white space and few numbers on the y axis.
Question: Can anyone explain a way to resize the empty space on this plot?
My code is below:
plot <- ggplot(figdata, aes(y=Counts, x=category, col=strain)) +
geom_dotplot(aes(fill=strain), dotsize=1, binwidth=.7,
binaxis= "y",stackdir ="centerwhole", stackratio=.7) +
ylim(18,59)
plot + scale_color_manual(values=c("#E69F00", "#56B4E9")) +
geom_errorbar(stat="hline", yintercept="mean",
aes( ymax=..y..,ymin=..y.., group = category, width = 0.5),
color="black")
Which produces:
EDIT: Incorporating jitter will allow all the data to fit, but I don't want to add noise to this data and would prefer to show it as discrete data.
Adjusting the binwidth and dotsize to 0.3 as suggested below also fits all the data; however, it leaves too much white space.
I think that I might have to transform my data so that the values are in steps smaller than 1, in order to get everything to fit horizontally and the dot sizes to be large enough to reduce white space.
I think the easiest way is using coord_cartesian:
plot + scale_color_manual(values=c("#E69F00", "#56B4E9")) +
geom_errorbar(stat="hline", yintercept="mean",
aes( ymax=..y..,ymin=..y.., group = category, width = 0.5),
color="black") +
coord_cartesian(ylim=c(17,40))
Which gives me this plot (with fake data that are not as neatly distributed as yours):
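The snippets above rely on stat = "hline", which is no longer available in current ggplot2 releases. A rough equivalent of the coord_cartesian approach for recent versions might look like the sketch below. It rests on assumptions: figdata is simulated here, the strain and category labels are hypothetical, and the fun, fun.min and fun.max arguments of stat_summary need ggplot2 3.3.0 or later.

```r
library(ggplot2)

set.seed(42)
# Hypothetical stand-in for figdata: discrete counts in 4 categories
figdata <- data.frame(
  category = rep(c("A", "B", "C", "D"), each = 20),
  strain   = rep(c("s1", "s2"), times = 40),
  Counts   = sample(20:40, 80, replace = TRUE)
)

p <- ggplot(figdata, aes(x = category, y = Counts)) +
  geom_dotplot(aes(fill = strain, colour = strain),
               binaxis = "y", stackdir = "centerwhole",
               binwidth = 0.7, dotsize = 1, stackratio = 0.7) +
  # a flat "error bar" at the mean of each category
  stat_summary(fun = mean, fun.min = mean, fun.max = mean,
               geom = "errorbar", width = 0.5, colour = "black") +
  scale_colour_manual(values = c("#E69F00", "#56B4E9")) +
  scale_fill_manual(values = c("#E69F00", "#56B4E9")) +
  # zoom the view without dropping any data, unlike ylim()
  coord_cartesian(ylim = c(17, 40))
```

coord_cartesian only changes the visible window, so the means are still computed from all the data, whereas ylim() discards values outside the limits before any statistics are calculated.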

Extend X axis interval ggplot2

I am trying to plot data with lots of X axis values, and I am trying to keep my points from overlapping with geom_point. I found lots of discussions about "scale_x_continuous", "position = jitter or dodge", etc., but every time my problem remains because I need to keep my points aligned. Moreover, "scale_size_area" does not give a good result.
EDIT: Generated data already melted at the end of the post.
I cannot post an image (Link to image), but to give the idea: I have 6 levels on my Y axis and 400 levels on my X axis. My points (shape = 1 = circle) are aligned on the Y levels and have different diameters depending on the value.
This is ok, but circles are overlapping.
plot <- ggplot(data, aes(x_variable_400_levels, y_variable_6_levels)) +
# value*100 because values are between 0 and 1 to have bigger circles
geom_point(shape = 1, size = data$value*100) +
# theme description
theme(
plot.title = element_text(lineheight=.8, face="bold", vjust=1),
axis.title.x = element_text(vjust=-0.5),
axis.title.y = element_text(vjust=0.3)
)
So, my question is: Can I modify the interval between two values of the X axis in order to avoid the overlap between circles? Jitter is not useful here because the noise does not allow a good visualisation of the data, even when I tried to add only HORIZONTAL noise.
Any kind of solution, link or tutorial to solve this will be appreciated.
EDIT: Generated data. Import with read.table, sep = "," and header = T. The point is that I have very small circles and they are important too.
data <- read.table(text='"trf","sample","value"
36,"S1",0.143882104
38,"S1",0.025971979
47,"S1",0.016711593
56,"S1",0.027896069
67,"S1",0.025870577
93,"S1",0.07638307
100,"S1",0.022905895
102,"S1",0.019192547
104,"S1",0.018258923
107,"S1",0.005032219
114,"S1",0.028297368
123,"S1",0.007874848
131,"S1",0.024184004
36,"S2",0.115123666
38,"S2",0
47,"S2",0.00479275
56,"S2",0.029523128
67,"S2",0.030133055
93,"S2",0.044749246
100,"S2",0.032865979
102,"S2",0
104,"S2",0
107,"S2",0.013160255
114,"S2",0.052047248
123,"S2",0.007632445
131,"S2",0
36,"S3",0.179332128
38,"S3",0.046215267
47,"S3",0
56,"S3",0.070791832
67,"S3",0.050214857
93,"S3",0.074108014
100,"S3",0
102,"S3",0
104,"S3",0
107,"S3",0
114,"S3",0.081441849
123,"S3",0
131,"S3",0.100090456', header=T,sep=",")
I don't think changing the interval is the solution, as your x-axis is numeric. It would be more difficult to interpret if the space between, for instance, 1 and 2 is larger than the space between 9 and 10. And if you changed all intervals to fit the largest circle, the plot would be too wide. I also imagine it would be very cluttered if you have more data, which makes it harder to see patterns. Maybe a (faceted) barplot is the solution? It allows for horizontal and vertical comparison, small values are visible, and values are easily extracted and compared. Here's a start:
p2 <- ggplot(data, aes(x=trf, y=value))+
geom_bar(stat="identity") +
facet_grid(sample~.) +
xlim(c(0,150)) + theme_bw()
