I am trying to plot data with lots of X-axis values, and I am trying to keep the points drawn with geom_point from overlapping. I found lots of discussions about "scale_x_continuous", "position = jitter or dodge", etc., but each time my problem remains because I need to keep my points aligned. Moreover, "scale_size_area" does not help.
EDIT: Generated data already melted at the end of the post.
I cannot post an image (Link to image), but to give the idea: I have 6 levels on my Y axis and 400 levels on my X axis. My points (shape = 1, a circle) are aligned on the Y levels and have different diameters depending on the value.
This is OK, but the circles overlap.
plot <- ggplot(data, aes(x_variable_400_levels, y_variable_6_levels)) +
# value*100 because values are between 0 and 1 to have bigger circles
geom_point(shape = 1, size = data$value*100) +
# theme description
theme(
plot.title = element_text(lineheight=.8, face="bold", vjust=1),
axis.title.x = element_text(vjust=-0.5),
axis.title.y = element_text(vjust=0.3)
)
So, my question is: can I modify the interval between two values on the X axis in order to avoid overlap between circles? Jitter is not useful here because the noise prevents a good visualisation of the data, even when I tried to add only HORIZONTAL noise.
Any kind of solution, link or tutorial to solve this will be appreciated.
EDIT: Generated data. Import with read.table, sep = "," and header = T. The point is that I have very small circles, and they are important too.
data <- read.table(text='"trf","sample","value"
36,"S1",0.143882104
38,"S1",0.025971979
47,"S1",0.016711593
56,"S1",0.027896069
67,"S1",0.025870577
93,"S1",0.07638307
100,"S1",0.022905895
102,"S1",0.019192547
104,"S1",0.018258923
107,"S1",0.005032219
114,"S1",0.028297368
123,"S1",0.007874848
131,"S1",0.024184004
36,"S2",0.115123666
38,"S2",0
47,"S2",0.00479275
56,"S2",0.029523128
67,"S2",0.030133055
93,"S2",0.044749246
100,"S2",0.032865979
102,"S2",0
104,"S2",0
107,"S2",0.013160255
114,"S2",0.052047248
123,"S2",0.007632445
131,"S2",0
36,"S3",0.179332128
38,"S3",0.046215267
47,"S3",0
56,"S3",0.070791832
67,"S3",0.050214857
93,"S3",0.074108014
100,"S3",0
102,"S3",0
104,"S3",0
107,"S3",0
114,"S3",0.081441849
123,"S3",0
131,"S3",0.100090456', header=T,sep=",")
I don't think changing the interval is the solution, as your x-axis is numeric. The plot would be more difficult to interpret if the space between, for instance, 1 and 2 were larger than the space between 9 and 10. And if you widened all intervals to fit the largest circle, the plot would be too wide. I also imagine it would be very cluttered if you have more data, which makes it harder to see patterns. Maybe a (faceted) barplot is the solution? It allows for horizontal and vertical comparison, small values are visible, and values are easily extracted and compared. Here's a start:
p2 <- ggplot(data, aes(x=trf, y=value))+
geom_bar(stat="identity") +
facet_grid(sample~.) +
xlim(c(0,150)) + theme_bw()
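A variant of the same idea, in case evenly spaced x positions are acceptable: this is only a sketch on made-up toy data (the `toy` data frame and its values are assumptions, not the asker's data). Converting `trf` to a factor gives every level the same width, and `geom_col()` is shorthand for `geom_bar(stat = "identity")` in ggplot2 2.2.0 and later.

```r
library(ggplot2)

# Toy data in the same long format as the question (values are made up).
toy <- data.frame(
  trf    = rep(c(36, 38, 47, 56), times = 2),
  sample = rep(c("S1", "S2"), each = 4),
  value  = c(0.14, 0.03, 0.02, 0.03, 0.12, 0.00, 0.005, 0.03)
)

# factor(trf) spaces the x levels evenly, whatever their numeric distance.
# One facet row per sample keeps small values readable on a shared y scale.
p_facets <- ggplot(toy, aes(x = factor(trf), y = value)) +
  geom_col() +
  facet_grid(sample ~ .) +
  labs(x = "trf") +
  theme_bw()
```

The trade-off is that the numeric distance between `trf` values is no longer visible, so this only suits cases where `trf` acts as a label rather than a measurement.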
I have a dataset of a type as shown
Seasons  A   B   C    A1   B1   C1
Winter   97  94  87   0.2  0.4  0.3
Summer   92  94  101  1    0.7  0.3
There are values for each season (summer, winter, autumn, spring), with variables from A to E and from A1 to E1. When drawing a barplot using ggplot2, the bars for A1 to E1 are very short because of their low values. I wish to move them to a secondary axis, but I don't know how to do that. Please suggest the code. Here is my code so far.
library(readxl)
library(ggplot2)
cell_viability_data <- read_excel("C:/Users/CEZ178522/Downloads/ananya/Cell_viability.xlsx")
cell_viability_data
plot1 <- ggplot(data=cell_viability_data, aes(x=Seasons, y= CellViability, fill= Types)) +
geom_bar(stat="identity", position=position_dodge()) +
labs(title = "Seasonal Cell Viability") +
theme(axis.text.x = element_text(colour = "grey1", size = 10),
axis.text.y = element_text(colour = "grey1", size = 10),
plot.title = element_text(hjust = 0.5))
plot1
I need the small bars to move to a secondary axis.
Secondary y-axes were banned in ggplot for a long time because they usually do more harm than good. The only option for now is to display an auxiliary, secondary y-axis which is a direct, one-to-one transformation of the primary y-axis. In other words, the secondary y-axis is a supplemental axis which displays the same information, but on a different scale (think Celsius and Fahrenheit).
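To illustrate what such a transformed axis looks like, here is a minimal sketch using `sec_axis()` with the Celsius/Fahrenheit case. The `temps` data frame and its values are made up for the example.

```r
library(ggplot2)

# Made-up seasonal temperatures in degrees Celsius.
temps <- data.frame(
  season  = c("Winter", "Spring", "Summer", "Autumn"),
  celsius = c(2, 12, 24, 11)
)

# The secondary axis is a pure re-labelling of the primary one:
# every Celsius tick position is converted with F = C * 9 / 5 + 32.
p_temp <- ggplot(temps, aes(x = season, y = celsius)) +
  geom_point(size = 3) +
  scale_y_continuous(
    name = "Celsius",
    sec.axis = sec_axis(~ . * 9 / 5 + 32, name = "Fahrenheit")
  )
```

Both axes describe the same points. Nothing is plotted "against" the right-hand axis only.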
What you are asking is to have a subset of data points inflated by some arbitrary value, so they are "on par" with the rest. Consider this: Can you, by choice of scaling constant, make values A1-E1 appear much higher than values A-E? Can you, by choice of scaling constant, make values A1-E1 appear much, much lower than values A-E? Can you, by choice of scaling constant, make values A1-E1 be "on par" with A-E, but always slightly lower? If the answer to any of these questions is yes, your data visualisation cannot be trusted.
Consider instead: What is the important comparison you are trying to make? Season-to-season for each type? A vs. A1? Take out a pen and paper, and try to sketch what you want to compare, and what issues you are encountering when making a comparison. Then you are ready to make the visualisation in R/ggplot.
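If the comparison of interest turns out to be season-to-season within each group, one possible alternative to a second axis is to give each group its own panel with its own y scale. This is only a sketch: the long layout with `Seasons`, `Types` and `CellViability` follows the column names in the question's code, the numbers are the two rows shown in the question, and the `Group` column is a hypothetical helper introduced here.

```r
library(ggplot2)

# The two example rows from the question, already in long format.
viability <- data.frame(
  Seasons       = rep(c("Winter", "Summer"), each = 6),
  Types         = rep(c("A", "B", "C", "A1", "B1", "C1"), times = 2),
  CellViability = c(97, 94, 87, 0.2, 0.4, 0.3,
                    92, 94, 101, 1, 0.7, 0.3)
)

# Hypothetical helper column: types ending in "1" form the low-valued group.
viability$Group <- ifelse(grepl("1$", viability$Types), "A1 to C1", "A to C")

# scales = "free_y" lets each panel choose its own y range, so the small
# values are readable without rescaling them against the large ones.
p_panels <- ggplot(viability, aes(x = Seasons, y = CellViability, fill = Types)) +
  geom_col(position = position_dodge()) +
  facet_wrap(~ Group, scales = "free_y") +
  labs(title = "Seasonal Cell Viability")
```

Each panel keeps an honest axis of its own, which avoids the arbitrary scaling constant discussed above.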
I am trying to create a scatterplot based on four values. My data is just lists of prices (BASIC, VALUE, DELUXE, ULTIMATE). I want VALUE and DELUXE to be the two axes (x, y), and the size and color of the points to represent the data in the other two columns.
It is hard to set up a reproducible example, because this is only an issue when I get a lot of values listed. I have about 300 points, with about 30 different color/value labels (for ULTIMATE) and 20 size/value labels (for BASIC).
> gg <- ggplot(d, aes(x=DELUXE_PRICE, y=VALUE_PRICE,color=ULTIMATE_PRICE,size=BASIC_PRICE)) + geom_point(alpha = 1)
> plot(gg)
My code does this well, and lists the colors/size with the corresponding value on the side. This is great, but I would like to alter how that is displayed, so that it is not cut off. I would like to be able to "wrap" the values into more columns, or shrink the display size of those so that they fit.
Currently, this lists ULTIMATE in three columns, to the right of the plot area, but cuts off the top of the labels (it extends well above the plot area)
This lists BASIC size/value labels to the right of the plot area, below ULTIMATE labels, in one column, so about half are cut off at the bottom.
I can increase the margins with:
> gg <- ggplot(d, aes(x=DELUXE_PRICE, y=VALUE_PRICE,color=ULTIMATE_PRICE,size=BASIC_PRICE)) + geom_point(alpha = 1) +theme(plot.margin = unit(c(4,2,4,2), "cm"))
> plot(gg)
This gets more of it in, but creates lots of white area and a smaller view of the plot. I would like to be able to just increase the right margin if necessary, and "wrap" the labels into more columns extending to the right (i.e. put ULTIMATE into 4 columns instead of 3, and put BASIC into 3-4 columns instead of 1), so that they are shorter and don't run out of the plot area.
I found some built-in functionality that does the required operation. It consists of adding a guides() call to the plot, specifying whether I am dealing with the color or size legend, and specifying the number of columns with "ncol = " (you can also specify rows). Giving each legend an order value lets you control the order in which they are displayed as well, so my resulting code was:
> gg <- ggplot(Table, aes(x=DELUXE_PRICE, y=VALUE_PRICE,color=ULTIMATE_PRICE,size=BASIC_PRICE)) + geom_point(alpha = 1) + guides(color = guide_legend(order = 0,ncol = 4),size = guide_legend(order = 1,ncol = 4))
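For readers without the original table, the same `guides()` call can be tried on simulated data. This is a sketch only: the `prices` data frame and its random values are assumptions, and ULTIMATE_PRICE is stored as a factor here so that it gets one legend key per distinct price.

```r
library(ggplot2)

set.seed(1)
# Simulated prices; the column names mirror the question, the values do not.
prices <- data.frame(
  DELUXE_PRICE   = runif(60, 50, 100),
  VALUE_PRICE    = runif(60, 20, 60),
  ULTIMATE_PRICE = factor(sample(seq(100, 215, by = 5), 60, replace = TRUE)),
  BASIC_PRICE    = sample(seq(5, 50, by = 5), 60, replace = TRUE)
)

# ncol wraps each legend into several columns.
# order decides which legend is drawn first (lower numbers come first).
p_legend <- ggplot(prices, aes(x = DELUXE_PRICE, y = VALUE_PRICE,
                               color = ULTIMATE_PRICE, size = BASIC_PRICE)) +
  geom_point() +
  guides(color = guide_legend(order = 1, ncol = 4),
         size  = guide_legend(order = 2, ncol = 2))
```

`guide_legend()` also accepts `nrow` when it is easier to fix the number of rows instead.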
I have an ethogram-like ggplot where I plot the value of a factor quadrant (1 to 4), which is plotted for each frame of a movie (frameID). The color is given by 3 animals that are being tracked.
I am fairly satisfied with the graph, but the number of points makes it difficult to read, even with alpha. I was wondering how to add position_dodge in a way that doesn't destroy the plot.
ggplot(dataframe, aes(frameID, quadrant, color=animal)) +
geom_jitter(alpha=0.5) +
scale_color_manual(values = c("#1334C1","#84F619", "#F43900")) +
theme_classic()+
theme(legend.position = 'none')
This link has useful info about dodging using geom_point.
R: How to spread (jitter) points with respect to the x axis?
I can change to geom_point with jitter in height only, which works but produces something awful.
+ geom_point(position = position_jitter(w = 0, h = 2))
Update
The data lives on GitHub:
https://github.com/matiasandina/MLA2_Tracking/blob/master/demo_data/sample_data.csv
Lowering the alpha or changing the size helps, but adds trouble when rescaling the image.
Update 2022
It's been a while since I posted this initially. My original thoughts have changed and are better reflected here, but I am still looking for a way of doing this in ggplot2!
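One possible way to get a dodge-like effect along the y axis without random noise is to shift each animal by a fixed amount. The sketch below uses simulated tracks: the `tracks` data frame, the animal names and the 0.2 step are all assumptions, and only the column names `frameID`, `quadrant` and `animal` come from the question.

```r
library(ggplot2)

set.seed(42)
# Simulated tracking data: 3 animals, 200 frames each, quadrant in 1..4.
tracks <- data.frame(
  frameID  = rep(1:200, times = 3),
  animal   = rep(c("a1", "a2", "a3"), each = 200),
  quadrant = sample(1:4, 600, replace = TRUE)
)

# Deterministic vertical offset per animal: -0.2, 0 and +0.2.
tracks$offset <- (as.integer(factor(tracks$animal)) - 2) * 0.2

# Each animal gets its own thin lane inside every quadrant band,
# so the points stay aligned instead of being scattered by jitter.
p_etho <- ggplot(tracks, aes(frameID, quadrant + offset, color = animal)) +
  geom_point(alpha = 0.5, shape = 15) +
  scale_y_continuous(breaks = 1:4, name = "quadrant") +
  scale_color_manual(values = c("#1334C1", "#84F619", "#F43900")) +
  theme_classic() +
  theme(legend.position = "none")
```

The offset has to stay below half the distance between quadrant levels, otherwise lanes from neighbouring quadrants would touch.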
I am a newbie to R and am having some problems plotting with ggplot, so I need help.
In the above diagram, if any of my bars has a high value (in this case, the green one with a value of 447), the plot and the plot title overlap. The values here are normalised/scaled such that the y-axis values are always between 0 and 100, though the label might show a different number (this is the actual count of occurrences, whereas the scaling is based on percentages).
I would like to know how to avoid the overlap of the plot with the plot title in all cases where the bar heights are very close to 100.
The ggplot call I am using is below.
my_plot<-ggplot(data_frame,
aes(x=as.factor(X_VAR),y=GROUP_VALUE,fill=GROUP_VAR)) +
geom_bar(stat="identity",position="dodge") +
geom_text(aes(label = BAR_COUNT, y=GROUP_VALUE), vjust = -1, position=position_dodge(width=1), size = 4) +
theme(axis.text.y=element_blank(),axis.text.x=element_text(size=12),legend.position = "right",legend.title=element_blank()) + ylab("Y-axis label") +
scale_fill_discrete(breaks=c("GRP_PERCENTAGE", "NORMALIZED_COUNT"),
labels=c("Percentage", "Count of Jobs")) +
ggtitle("Distribution based on Text Analysis 2nd Level Sub-Category") +
theme(plot.title = element_text(lineheight=1, face="bold"))
Here is the ggsave command, in case that is what is creating the problem, with the dpi, height and width values.
ggsave(my_plot,file=paste(paste(variable_name,"my_plot",sep="_"),".png",sep = ""),dpi=72, height=6.75,width=9)
Can anyone please suggest what needs to be done to get this right?
Many thanks
As Axeman suggests, ylim is useful. Have a look at the documentation here:
http://docs.ggplot2.org/0.9.3/xylim.html
In your code:
my_plot + ylim(0,110)
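A self-contained sketch of that suggestion on made-up data follows. The `jobs` data frame and its values are assumptions that only mimic the column names in the question. The second variant, `expansion()`, is an alternative that is not part of the original suggestion and needs ggplot2 3.3.0 or later. It adds proportional headroom instead of a hard-coded upper limit.

```r
library(ggplot2)

# Made-up data: heights are scaled to 0-100, labels carry the raw counts.
jobs <- data.frame(
  X_VAR       = rep(c("a", "b"), each = 2),
  GROUP_VAR   = rep(c("GRP_PERCENTAGE", "NORMALIZED_COUNT"), times = 2),
  GROUP_VALUE = c(40, 100, 60, 35),
  BAR_COUNT   = c(40, 447, 60, 150)
)

base_plot <- ggplot(jobs, aes(x = X_VAR, y = GROUP_VALUE, fill = GROUP_VAR)) +
  geom_col(position = position_dodge(width = 0.9)) +
  geom_text(aes(label = BAR_COUNT), vjust = -1,
            position = position_dodge(width = 0.9), size = 4)

# Variant 1: fixed limits, leaving room above 100 for the labels.
p_ylim <- base_plot + ylim(0, 110)

# Variant 2: no padding below zero, 10% extra space above the tallest bar.
p_expand <- base_plot +
  scale_y_continuous(expand = expansion(mult = c(0, 0.1)))
```

Only one of the two variants should be added to a given plot, since both set the y scale.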
Also, I find this intro to axes quite useful:
http://www.cookbook-r.com/Graphs/Axes_(ggplot2)/
Good luck!
I'm having trouble creating a figure with ggplot2. I am using geom_dotplot with center stacking to display my data, which are discrete values for 4 categories.
For aesthetic reasons I want to customize the positions of the dots so that:
1. The empty space between dots along the y axis is reduced (i.e. the dots are 1 value large).
2. The distributions fit and don't overlap.
I've adjusted the binwidth and dotsize to achieve aesthetic goal 1, but that requires me to fiddle with the ylim() parameter to make sure that the groups fit in the plot. This results in a plot with more white space and few numbers on the y axis.
Question: Can anyone explain a way to resize the empty space on this plot?
My code is below:
plot <- ggplot(figdata, aes(y=Counts, x=category, col=strain)) +
geom_dotplot(aes(fill=strain), dotsize=1, binwidth=.7,
binaxis= "y",stackdir ="centerwhole", stackratio=.7) +
ylim(18,59)
plot + scale_color_manual(values=c("#E69F00", "#56B4E9")) +
geom_errorbar(stat="hline", yintercept="mean",
aes( ymax=..y..,ymin=..y.., group = category, width = 0.5),
color="black")
Which produces:
EDIT: Incorporating jitter will allow all the data to fit, but I don't want to add noise to this data and would prefer to show it as discrete data.
Adjusting the binwidth and dotsize to 0.3, as suggested below, also fits all the data; however, it leaves too much white space.
I think that I might have to transform my data so that the values are in steps smaller than 1, in order to get everything to fit horizontally and to make the dots large enough to reduce white space.
I think the easiest way is using coord_cartesian:
plot + scale_color_manual(values=c("#E69F00", "#56B4E9")) +
geom_errorbar(stat="hline", yintercept="mean",
aes( ymax=..y..,ymin=..y.., group = category, width = 0.5),
color="black") +
coord_cartesian(ylim=c(17,40))
Which gives me this plot (with fake data that are not as neatly distributed as yours):
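One reason coord_cartesian is a safer choice than ylim() here is that it only zooms the view, while ylim() drops values outside the limits before any statistic (such as the mean bar) is computed. A minimal sketch on made-up data, assuming ggplot2 3.3.0 or later for the `fun` argument of `stat_summary()`; the `counts` data frame is hypothetical:

```r
library(ggplot2)

# One category with the values 1 to 10.
counts <- data.frame(category = "a", Counts = 1:10)

base <- ggplot(counts, aes(x = category, y = Counts)) +
  stat_summary(fun = mean, geom = "point")

# ylim() censors values above 5 first, so the mean is taken over
# the remaining values only (ggplot2 warns about the removed rows).
mean_ylim <- suppressWarnings(layer_data(base + ylim(0, 5))$y)

# coord_cartesian() keeps all ten values and only changes the visible range.
mean_zoom <- layer_data(base + coord_cartesian(ylim = c(0, 5)))$y
```

Comparing `mean_ylim` and `mean_zoom` shows that the two approaches summarise different subsets of the data, even though the visible axis range is the same.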