I have a dataset of the type shown below:
Seasons   A    B    C     A1    B1    C1
Winter    97   94   87    0.2   0.4   0.3
Summer    92   94   101   1     0.7   0.3
There are values for each season (summer, winter, autumn, spring) and for variables A to E and A1 to E1. When drawing a barplot with ggplot2, the bars for A1 to E1 come out very short because of their low values, and I would like to move them to a secondary axis, but I don't know how to do that. Please suggest the code. I am sharing my code so far.
library(readxl)
library(ggplot2)
cell_viability_data <- read_excel("C:/Users/CEZ178522/Downloads/ananya/Cell_viability.xlsx")
cell_viability_data
plot1 <- ggplot(data = cell_viability_data, aes(x = Seasons, y = CellViability, fill = Types)) +
  geom_bar(stat = "identity", position = position_dodge()) +
  labs(title = "Seasonal Cell Viability") +
  theme(axis.text.x = element_text(colour = "grey1", size = 10),
        axis.text.y = element_text(colour = "grey1", size = 10),
        plot.title = element_text(hjust = 0.5))
plot1
I need the small bars moved to a secondary axis.
Secondary y-axes were banned from ggplot2 for a long time because they usually do more damage than good. The only option for now is to display an auxiliary secondary y-axis that is a direct, proportional transformation of the primary axis. In other words, the secondary y-axis is a supplemental axis which displays the same information, but on a different scale (think Celsius and Fahrenheit).
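For reference, this is the only kind of secondary axis ggplot2 supports. A minimal sketch, assuming the long-format columns from the question (Seasons, CellViability, Types) and an arbitrary scaling factor of 100; note that it merely relabels the right-hand axis and leaves every bar exactly where it was:

library(ggplot2)

# sec_axis() takes a one-sided formula describing a fixed, proportional
# transformation of the primary axis; ~ . / 100 is an assumed factor.
ggplot(cell_viability_data, aes(Seasons, CellViability, fill = Types)) +
  geom_col(position = position_dodge()) +
  scale_y_continuous(
    name = "Cell viability (A-E)",
    sec.axis = sec_axis(~ . / 100, name = "Cell viability (A1-E1)")
  )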
What you are asking for is to have a subset of data points inflated by some arbitrary value so that they are "on par" with the rest. Consider this: can you, by choice of scaling constant, make the values A1-E1 appear much higher than A-E? Can you, by choice of scaling constant, make them appear much, much lower? Can you, by choice of scaling constant, make them "on par" with A-E, but always slightly lower? If the answer to any of these is yes, your data visualisation cannot be trusted.
Consider instead: what is the important comparison you are trying to make? Season-to-season for each type? A vs. A1? Take out pen and paper and sketch what you want to compare and what issues you run into when making that comparison. Then you are ready to build the visualisation in R/ggplot.
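If the within-group comparison is what matters, one possible alternative (an assumption about intent, using a hypothetical Group column derived from the type names) is to facet with free y scales, so each group gets an axis it can actually be read on:

library(ggplot2)

# Hypothetical split: types whose names end in "1" (A1-E1) vs. the rest (A-E).
cell_viability_data$Group <- ifelse(grepl("1$", cell_viability_data$Types),
                                    "A1-E1", "A-E")

ggplot(cell_viability_data, aes(Seasons, CellViability, fill = Types)) +
  geom_col(position = position_dodge()) +
  facet_wrap(~ Group, scales = "free_y")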
Related
I've been trying to standardise multiple bar plots so that the bars are all identical in width regardless of the number of bars. Note that this is over multiple distinct plots - faceting is not an option. It's easy enough to scale the plot area so that, for instance, a plot with 6 bars is 1.5 times the width of a plot with 4 bars. This would work perfectly, except that each plot has an expanded x axis by default, which I would like to keep.
"The defaults are to expand the scale by 5% on each side for continuous variables, and by 0.6 units on each side for discrete variables."
https://ggplot2.tidyverse.org/reference/scale_discrete.html
My problem is that I can't for the life of me work out what '0.6 units' actually means. I've manually measured the distance between the bars and the y axis in various design tools and got inconsistent answers, so I can't factor '0.6 units' into my calculations when working out what size the panel windows should be. Additionally, I can't find any answers on how many 'units' long a discrete x axis is - I assumed at first it would be 1 unit per category, but that doesn't fit the visuals at all. I've included an image of the two graphs that hopefully shows what I mean.
In this image, the top graph has a plot area exactly 1.5 times that of the bottom graph. Seeing as it has 6 bars compared with 4, each bar would be the same width, except that the extra space between the axis and the first bar throws this off. Setting expand = expansion(add = c(0, 0)) clears this up but results in not-so-pretty graphs. What I'd like is for the bars to be identical in width between the two plots, accounting for this extra space. I'm specifically looking for a general solution that I can use for future plots, not an individual fix for this sample. As such, what I'd really like to know is: how many 'units' long are these two x axes? Many thanks for any and all help!
Instead of using expansion for the axis, I would probably use the fact that categorical variables are actually plotted at the positive integers in Cartesian coordinates. This means that, provided you know the maximum number of columns you are going to use in your plots, you can set this as the range in coord_cartesian. There is a little arithmetic involved to keep the bars centred, but it should give consistent results.
We start with some reproducible data:
library(ggplot2)
set.seed(1)
df <- data.frame(group = letters[1:6], value = 100 * runif(6))
Now we set the value for the maximum number of bars we will need:
MAX_BARS <- 6
And the only thing "funny" about the plot code is the calculation of the x axis limits in coord_cartesian: the n bars sit at integer positions 1 to n, so padding (MAX_BARS - n)/2 units on each side makes every axis span exactly MAX_BARS units:
ggplot(df, aes(group, value)) +
  geom_col() +
  coord_cartesian(xlim = c(1 - (MAX_BARS - length(unique(df$group))) / 2,
                           MAX_BARS - (MAX_BARS - length(unique(df$group))) / 2))
Now let us remove one factor level and run the exact same plot code:
df <- df[-1,]
ggplot(df, aes(group, value)) +
  geom_col() +
  coord_cartesian(xlim = c(1 - (MAX_BARS - length(unique(df$group))) / 2,
                           MAX_BARS - (MAX_BARS - length(unique(df$group))) / 2))
And again:
df <- df[-1,]
ggplot(df, aes(group, value)) +
  geom_col() +
  coord_cartesian(xlim = c(1 - (MAX_BARS - length(unique(df$group))) / 2,
                           MAX_BARS - (MAX_BARS - length(unique(df$group))) / 2))
And again:
df <- df[-1,]
ggplot(df, aes(group, value)) +
  geom_col() +
  coord_cartesian(xlim = c(1 - (MAX_BARS - length(unique(df$group))) / 2,
                           MAX_BARS - (MAX_BARS - length(unique(df$group))) / 2))
You will see that the bars remain a constant width and stay centred, yet the panel size remains fixed.
Created on 2021-11-06 by the reprex package (v2.0.0)
I have an ethogram-like ggplot where I plot a factor, quadrant (1 to 4), for each frame of a movie (frameID). The colour identifies the 3 animals being tracked.
I am fairly satisfied with the graph, but the number of points makes it difficult to read, even with alpha. I was wondering how to add position_dodge in a way that doesn't destroy the plot.
library(ggplot2)

ggplot(dataframe,
       aes(frameID, quadrant, color = animal)) +
  geom_jitter(alpha = 0.5) +
  scale_color_manual(values = c("#1334C1", "#84F619", "#F43900")) +
  theme_classic() +
  theme(legend.position = "none")
This link has useful info about dodging using geom_point.
R: How to spread (jitter) points with respect to the x axis?
I can switch to geom_point with a jitter height, which works but produces something awful.
+ geom_point(position = position_jitter(w = 0, h = 2))
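One workaround that keeps the points aligned in x (a sketch using the columns from the plot above; the 0.25 band width is an arbitrary choice) is to dodge vertically by hand, giving each animal its own band around the quadrant value:

library(ggplot2)

# Manual vertical dodge: the 3 animals map to fixed offsets -0.25, 0, +0.25.
dataframe$offset <- (as.numeric(factor(dataframe$animal)) - 2) * 0.25

ggplot(dataframe, aes(frameID, quadrant + offset, color = animal)) +
  geom_point(alpha = 0.5, size = 0.5) +
  scale_color_manual(values = c("#1334C1", "#84F619", "#F43900")) +
  theme_classic() +
  theme(legend.position = "none")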
Update
The data lives on GitHub:
https://github.com/matiasandina/MLA2_Tracking/blob/master/demo_data/sample_data.csv
Lowering the alpha or changing the point size helps, but causes trouble when rescaling the image.
Update 2022
It's been a while since I posted this initially; my original thoughts have changed and are better reflected here, but I am still looking for a ggplot2 way of doing this!
This is a relatively straightforward question; however, I was unable to find an answer. Likewise, I am not used to posting on Stack Overflow, so I apologise for any errors.
I currently have a faceted plot that displays the variation in animal activity (Y) against day length at a seasonal level (in this case, two seasons).
As you can see on the x axis, there are numbers such as 0.45, 0.47, etc. These represent time in numeric form, not dates. I would like to convert 0.45 and so on to the hours they represent: 0.45 should read as 10, 0.47 as 10.2, etc. I attempted to do this manually in Excel (simply converting 10:02:00 to 10.2, so the values do not represent actual dates in R), but the scatter plots are, well, not very scattered when plotted that way.
Is there a way to either:
1. manually change the numeric daylength (i.e. 0.45) to the hours it represents, or
2. shorten the tick marks for the actual hours on both facets so that the points do not seem as scattered?
Likewise, is all of this possible while keeping both facets in place?
Here is the script that I use for the plot:
library(dplyr)
library(ggplot2)

# Build season first; factor() must come after the mutate, because
# ifelse() returns a character vector and would drop the factor levels.
ii <- ii %>%
  mutate(season = ifelse(month2 %in% c(6, 7, 8), "Winter",
                  ifelse(month2 %in% c(12, 1, 2), "Summer", "Error")))
ii$season <- factor(ii$season, levels = c("Winter", "Summer"))
Plot <- ggplot(ii, aes(daylength, newavx2)) +
  geom_point() +
  geom_smooth() +
  ggtitle("Activity and Daylength by Season") +
  xlab("Daylength") +
  ylab("Activity") +
  theme(plot.title = element_text(lineheight = .8, face = "bold", size = 20),
        text = element_text(size = 18))

Plot + facet_grid(. ~ season, scales = "free_x") +
  theme(strip.background = element_blank())
It should be noted that for the second plot, the variable daylength is simply replaced by 'hours'. Thank you so much for your help!
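For what it's worth, a sketch of the relabelling without touching the data: pass a conversion function as labels (and fewer breaks) to scale_x_continuous. to_hours() is hypothetical here, since the exact daylength-to-hours mapping isn't stated; x * 24 is only a fraction-of-a-day guess:

# Hypothetical conversion from numeric daylength to hours; replace with
# whatever mapping your daylength values actually encode.
to_hours <- function(x) round(x * 24, 1)

Plot + facet_grid(. ~ season, scales = "free_x") +
  scale_x_continuous(breaks = scales::pretty_breaks(n = 4),  # fewer ticks
                     labels = to_hours) +
  theme(strip.background = element_blank())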
I am trying to plot data with lots of x axis values, and I am trying to keep my points from overlapping with geom_point. I found lots of discussions about "scale_x_continuous", "position = jitter or dodge", etc., but every time my problem remains because I need to keep my points aligned. Moreover, "scale_size_area" does not fix it.
EDIT: generated data, already melted, is at the end of the post.
I cannot post an image (Link to image), but to give the idea: I have 6 levels on my y axis and 400 levels on my x axis. My points (shape = 1, circles) are aligned with the y levels and have different diameters depending on the value.
This is OK, but the circles are overlapping.
library(ggplot2)

plot <- ggplot(data, aes(x_variable_400_levels, y_variable_6_levels)) +
  # value * 100 because values are between 0 and 1, to get bigger circles
  geom_point(shape = 1, size = data$value * 100) +
  theme(
    plot.title = element_text(lineheight = .8, face = "bold", vjust = 1),
    axis.title.x = element_text(vjust = -0.5),
    axis.title.y = element_text(vjust = 0.3)
  )
So, my question is: can I modify the interval between two values of the x axis in order to avoid the overlap between circles? Jitter is not interesting here because the noise prevents a good reading of the data, even when I tried to add only horizontal noise.
Any kind of solution, link or tutorial will be appreciated.
EDIT: Generated data. Import with read.table, sep = "," and header = T. The point is that I have very small circles, and they are important too.
data <- read.table(text='"trf","sample","value"
36,"S1",0.143882104
38,"S1",0.025971979
47,"S1",0.016711593
56,"S1",0.027896069
67,"S1",0.025870577
93,"S1",0.07638307
100,"S1",0.022905895
102,"S1",0.019192547
104,"S1",0.018258923
107,"S1",0.005032219
114,"S1",0.028297368
123,"S1",0.007874848
131,"S1",0.024184004
36,"S2",0.115123666
38,"S2",0
47,"S2",0.00479275
56,"S2",0.029523128
67,"S2",0.030133055
93,"S2",0.044749246
100,"S2",0.032865979
102,"S2",0
104,"S2",0
107,"S2",0.013160255
114,"S2",0.052047248
123,"S2",0.007632445
131,"S2",0
36,"S3",0.179332128
38,"S3",0.046215267
47,"S3",0
56,"S3",0.070791832
67,"S3",0.050214857
93,"S3",0.074108014
100,"S3",0
102,"S3",0
104,"S3",0
107,"S3",0
114,"S3",0.081441849
123,"S3",0
131,"S3",0.100090456', header=T,sep=",")
I don't think changing the interval is the solution, as your x-axis is numeric. It would be more difficult to interpret if the space between, for instance, 1 and 2 were larger than the space between 9 and 10. And if you widened all intervals to fit the largest circle, the plot would be too wide. I also imagine it would be very cluttered once you have more data, which makes it harder to see patterns. Maybe a (faceted) barplot is the solution? It allows for horizontal and vertical comparison, small values are visible, and values are easily extracted and compared. Here's a start:
p2 <- ggplot(data, aes(x = trf, y = value)) +
  geom_bar(stat = "identity") +
  facet_grid(sample ~ .) +
  xlim(c(0, 150)) +
  theme_bw()
p2
I have data in percentages. I would like to use ggplot to create a graph, but I cannot get it to work the way I would like. Since the data is very skewed, a simple stacked column doesn't work well because the really small values don't show up. Here is a sample set:
   Actual  Predicted
a     0.5          5
b     9.5          5
c      90         90
On the left is an Excel plot and on the right is the R/ggplot version.
The problem is that in R the columns do not stack up to the same height.
Here is my R code:
a = c("a","b","c","a","b","c")
b = c("Actual","Actual","Actual","Predicted","Predicted","Predicted")
c = c(0.5,2.5,97,0.2,2.2,97.6)
c = c+1
dat = data.frame(Type=a, Case=b, Percentage=c)
ggplot(dat, aes(x=Case, y=Percentage, fill=Type)) + geom_bar(stat="identity") + scale_y_log10()
*In both Excel and R I add 1 to each value to deal with numbers between 0 and 1, so the y-axis is slightly off.
If I use:
ggplot(dat, aes(x = Case, y = Percentage, fill = Type)) +
  geom_bar(stat = "identity", position = "fill") +
  scale_y_log10()
The total heights match; however, the two blue portions do not match in size (they both represent 90%).
Just because two sets of numbers add up to the same value (103 in this case) doesn't mean the sums of their logs add up to the same value! When you stack the bars without "fill", you get different heights because the sums of the logs of the values differ. When you then scale everything to the same height, you have to squash the blue boxes down by different rates, so they look different.
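A quick numeric check of that point, using the +1-shifted values assumed from the sample table:

actual    <- c(0.5, 9.5, 90) + 1   # 1.5, 10.5, 91
predicted <- c(5, 5, 90) + 1       # 6, 6, 91

sum(actual); sum(predicted)                # both 103
sum(log10(actual)); sum(log10(predicted))  # ~3.16 vs ~3.52: not equal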
The Excel bar chart is deliberately misleading. The left red bar is the same size as the blue bar above it but represents a value about a tenth of the blue bar's. You can't make a bar chart of proportions on a log scale - it's just wrong.
There is a brilliant way to show small numbers without losing them or misrepresenting them. It's an amazing visualisation technique called 'writing the numbers in a table'.
I managed to get it to work like Excel. As Spacedman said, the plot is visually misleading but numerically correct. The reason is that we want to compare the actual heights of bar segments, when numerically you need to look at each segment's start and end values on the y axis. It's similar to bar charts whose y-axis doesn't start at zero. Here is an example.
I am not sure if I will use the method for visualizing my data, but I had to figure it out.
Here is the result:
Here is the code (I might clean it up as a function that can be called when you assign the y values in ggplot).
a = c("a","b","c","a","b","c")
b = c("Actual","Actual","Actual","Predicted","Predicted","Predicted")
c = c(0.5,9.5,90,5,5,90)
c = c+1
dat = data.frame(Type=a, Case=b, Percentage=c, Cumsum_L=c, Cumsum=c, Norm=c)
for(i in 1:length(dat$Percentage)){
cumsum=0
for(j in 1:i){
if(dat$Case[j]==dat$Case[i]){
cumsum=cumsum+(dat$Percentage[j])
}
}
dat$Cumsum_L[i]=cumsum-dat$Percentage[i]
dat$Cumsum[i]=cumsum
if(dat$Cumsum_L[i]==0){
dat$Cumsum_L[i]=1
}
dat$Norm[i] = log(dat$Cumsum[i])-log(dat$Cumsum_L[i])
}
# Put the axis breaks at log-spaced positions but label them in percent
intervals <- seq(from = 0, to = 100, by = 10)
intervals_log <- log(intervals)
intervals_log[1] <- 0  # log(0) is -Inf; pin the first break at the origin

ggplot(dat, aes(x = Case, y = Norm, fill = Type)) +
  geom_bar(stat = "identity") +
  scale_y_continuous(name = "Percent", breaks = intervals_log, labels = intervals)
*I also need to fix the end points (the +1 adjustment).
**I also might be butchering the maths.
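As a footnote to the "clean it up as a function" idea above, a sketch of the loop condensed into a grouped mutate (dplyr is my assumption; it should reproduce the same Cumsum, Cumsum_L and Norm columns):

library(dplyr)

log_stack <- function(dat) {
  dat %>%
    group_by(Case) %>%
    mutate(Cumsum   = cumsum(Percentage),
           Cumsum_L = ifelse(Cumsum - Percentage == 0, 1,  # avoid log(0)
                             Cumsum - Percentage),
           Norm     = log(Cumsum) - log(Cumsum_L)) %>%
    ungroup()
}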