Logarithmic y-axis issue in R/ggplot2

I plotted a histogram from a frequency distribution table using ggplot2. Here is some sample data:
dput(test_data)
structure(list(inst = c(5, 5, 5, 10, 10, 10, 15, 15, 15), equip = c("a",
"b", "c", "a", "b", "c", "a", "b", "c"), value = c(0.520670542493463,
0.7556017707102, 0.931902746669948, 0.206132101127878, 0.0114199279341847,
0.603053622646257, 0.315444506937638, 0.375196750741452, 0.983124621212482
)), class = "data.frame", row.names = c(NA, -9L))
When I use ggplot2 to plot the data, I get the following output:
library(ggplot2)
test_hist1 <- ggplot(test_data, aes(x = inst, y = value, fill = equip)) +
  geom_bar(width = 3, alpha = 1, stat = "identity", position = "stack") +  # stat = "identity" plots the values as given
  theme_bw() +
  xlab(expression(Value)) + ylab("value") +
  ggtitle(expression(test~data)) +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_fill_manual(values = c("#00FF00", "#FFD700", "#DC143C"))
But when I transform the y-axis to a log axis, the direction of the bars changes and so does their intensity.
test_hist2 <- ggplot(test_data, aes(x = inst, y = value, fill = equip)) +
  geom_bar(width = 3, alpha = 1, stat = "identity", position = "stack") +  # stat = "identity" plots the values as given
  theme_bw() +
  xlab(expression(Value)) + ylab("log_yaxis") +
  ggtitle(expression(test~data)) +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_fill_manual(values = c("#00FF00", "#FFD700", "#DC143C")) +
  scale_y_log10()
My second plot is wrong: the code simply converts each y value to log10(value) instead of drawing a true log axis like the one in the following answer (the plot in that answer shows the axis I am looking for). Can someone point me in the right direction? Thanks for the help.
R: Difference between log axis scale vs. manual log transformation?
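One way to see why the bars change direction is to look at the transformed values themselves. The sketch below uses only base R and the sample values from the question; it assumes, as is my understanding of ggplot2, that scale_y_log10() transforms the data before the bars are drawn, so bars start at 0 on the transformed scale (1 on the original scale).

```r
# Sample values copied from the dput() output in the question.
value <- c(0.520670542493463, 0.7556017707102, 0.931902746669948,
           0.206132101127878, 0.0114199279341847, 0.603053622646257,
           0.315444506937638, 0.375196750741452, 0.983124621212482)

# Every value is below 1, so every log10 value is negative.
log_value <- log10(value)
print(round(log_value, 3))

# TRUE: all bars would extend downwards from log10(1) = 0.
all_negative <- all(log_value < 0)
print(all_negative)
```

Since every value lies below 1, bars that point downwards on a log axis are consistent with the data rather than a sign of a plotting error.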

Related

two different legends from one dataset

I am trying to have two legends: one based on variable c and the other on variable d, each defined by its own shape and size. I do not know if this is possible in ggplot2. Maybe it does not fit the philosophy behind ggplot2. If I transform the data to long format, I can deal with the different shapes, but the sizes are confounded. The same happens if I use the facet_wrap option.
e <- structure(list(a = c(5, 6, 7), b = c(5, 6, 7), c = c(0.1, 0.5,
1), d = c(10, 5, 1)), .Names = c("a", "b", "c", "d"), row.names = c(NA,
-3L), class = "data.frame")
library(ggplot2)
plot <- ggplot() + geom_point(data=e,aes(x=a,y=b,size=c), shape=1,
color="black")
plot <- plot + geom_point(data=e,aes(x=a,y=b,size=d), shape=3, color="red")
plot
Any advice is more than welcome.
You can map shape and size inside aes(), like geom_point(aes(x=a,y=b,shape=factor(c))) + geom_point(aes(x=a,y=b,size=d), shape=3). For example,
library(ggplot2)
ggplot(mpg) + geom_point(aes(x=hwy,y=cty,shape=class)) +
geom_point(aes(x=hwy,y=cty,size=cyl), shape=3)

Values in gganimate col chart differs from original data values

I'm starting with animated charts and using gganimate package. I've found that when generating a col chart animation over time, values of variables change from original. Let me show you an example:
Data <- as.data.frame(cbind(c(1,1,1,2,2,2,3,3,3),
c("A","B","C","A","B","C","A","B","C"),
c(20,10,15,20,20,20,30,25,35)))
colnames(Data) <- c("Time","Object","Value")
Data$Time <- as.integer(Data$Time)
Data$Value <- as.numeric(Data$Value)
Data$Object <- as.character(Data$Object)
p <- ggplot(Data,aes(Object,Value)) +
stat_identity() +
geom_col() +
coord_cartesian(ylim = c(0,40)) +
transition_time(Time)
p
The chart obtained looks like this:
Values obtained in the Y-axis are between 1 and 6. It seems that the original value of 10 corresponds to a value of 1 in the Y-axis. 15 is 2, 20 is 3 and so on...
Is there a way for keeping the original values in the chart?
Thanks in advance
Your data changed when you coerced a factor variable into numeric: as.numeric() on a factor returns the internal level codes, not the original numbers. (See the Data section below for a more efficient way to define the data.frame.)
You were also missing position = "identity", which keeps the bars in the same place. I added fill = Time for illustration.
Code
p <- ggplot(Data, aes(Object, Value, fill = Time)) +
geom_col(position = "identity") +
coord_cartesian(ylim = c(0, 40)) +
transition_time(Time)
p
Data
Data <- data.frame(Time = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
Object = c("A", "B", "C", "A", "B", "C", "A", "B", "C"),
Value = c(20, 10, 15, 20, 20, 20, 30, 25, 35))
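The factor coercion described in this answer can be reproduced in base R. This is a minimal sketch that assumes the Value column ended up as a factor, as it would in R versions where strings were converted to factors by default; the variable names are illustrative only.

```r
# The Value column from the question, stored as a factor (assumed setup).
value_factor <- factor(c("20", "10", "15", "20", "20", "20", "30", "25", "35"))

# as.numeric() on a factor returns the level codes, not the numbers.
codes <- as.numeric(value_factor)
print(codes)

# Converting through character first recovers the original numbers.
values <- as.numeric(as.character(value_factor))
print(values)
```

The level codes run from 1 to 6, which matches the y-axis range reported in the question (10 shown as 1, 15 as 2, 20 as 3).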

How to point each plot to correct y axis (many plots, two y axes, in R with ggplot2)

So I have compared two groups with a third using a range of inputs. For each of the three groups I have a value and a confidence interval for a range of inputs. For the two comparisons I also have a p-value for that range of inputs. Now I would like to plot all five data series, but use a second axis for the p values.
I am able to do that except for one thing: how do I make sure that R knows which of the plots to assign to the second axis?
This is what it looks like now. The bottom two data series should be scaled up to the Y axis to the right.
ggplot(df) +
geom_pointrange(aes(x=x, ymin=minc, ymax=maxc, y=meanc, color="c")) +
geom_pointrange(aes(x=x, ymin=minb, ymax=maxb, y=meanb, color="b")) +
geom_pointrange(aes(x=x, ymin=mina, ymax=maxa, y=meana, color="a")) +
geom_point(aes(x=x, y=c, color="c")) +
geom_point(aes(x=x, y=b, color="b")) +
scale_y_continuous(sec.axis = sec_axis(~.*0.2))
df is a dataframe whose column names are all the variables you see listed above, all row values are the corresponding datapoints.
You can get what you want, staying true to Hadley's canon and the Grammar of Graphics gospel, if you transform your data frame from wide to long and use a different aesthetic (e.g. shape, color, fill) for the means and the CI.
You did not provide a reproducible example, so I employ my own. (Dput at the end of the post)
library(dplyr)
library(ggplot2)
df2 <- df %>%
  mutate(CatCI = if_else(is.na(CI), "", Cat)) # Create a category name to map the CI to the legend.
ggplot(df2, aes(x = x)) +
  geom_pointrange(aes(ymin = min, ymax = max, y = mean, color = Cat), shape = 16) +
  geom_point(data = dplyr::filter(df2, !is.na(CI)), # Drop the rows where CI is NA
             aes(y = (CI / 0.2), # Rescale the CI so its position matches the right-hand axis
                 fill = CatCI), # Map the CI to a second aesthetic (fill) to get a second legend
             shape = 25, size = 5, alpha = 0.25) + # I changed shape, size, and fill to help with visualization
  scale_y_continuous(sec.axis = sec_axis(~ . * 0.2, name = "P Value")) +
  labs(color = "Linerange\nSinister Axis", fill = "P value\nDexter Axis", y = "Mean")
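The rescaling in the code above works because the secondary axis is only a relabelling of the primary one: the data are divided by 0.2 before plotting, and sec_axis() multiplies the primary axis by 0.2 to print the labels. A minimal base R sketch of that round trip, using three CI values taken from the answer's data frame (the variable names are illustrative):

```r
# Factor used in sec_axis(~ . * 0.2) in the answer.
scale_factor <- 0.2

# Three CI values from the answer's data frame.
ci <- c(0.164201137643034, 0.177948094206453, 0.201541499248143)

# Position on the primary (left) axis where each point is actually drawn.
primary_position <- ci / scale_factor
print(primary_position)

# Label read off the secondary (right) axis at that position.
secondary_label <- primary_position * scale_factor
print(secondary_label)
```

The positions land between roughly 0.8 and 1.0, the same range as the means, while the right-hand labels read back the original CI values.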
Result:
Dataframe:
df <- structure(list(Cat = c("a", "b", "c", "a", "b", "c", "a", "b",
"c", "a", "b", "c", "a", "b", "c"), x = c(2, 2, 2, 2.20689655172414,
2.20689655172414, 2.20689655172414, 2.41379310344828, 2.41379310344828,
2.41379310344828, 2.62068965517241, 2.62068965517241, 2.62068965517241,
2.82758620689655, 2.82758620689655, 2.82758620689655), mean = c(0.753611797661977,
0.772340941644911, 0.793970086962944, 0.822424652072316, 0.837015408776649,
0.861417383841253, 0.87023105762465, 0.892894201949377, 0.930096326498796,
0.960862178366363, 0.966600321596147, 0.991206984637544, 1.00714201832596,
1.02025006679944, 1.03650896186786), max = c(0.869753641121797,
0.928067675294351, 0.802815304215019, 0.884750162053761, 1.03609814491961,
0.955909854315582, 1.07113399603486, 1.02170928767791, 1.05504846273091,
1.09491706586801, 1.20235615364205, 1.12035782960649, 1.17387406039167,
1.13909154635088, 1.0581878034897), min = c(0.632638511783381,
0.713943701135991, 0.745868763626567, 0.797491261486603, 0.743382797144923,
0.827693203320894, 0.793417962991821, 0.796917421637021, 0.92942504556723,
0.89124101157585, 0.813058838839382, 0.91701749675892, 0.943744642652422,
0.912869230576973, 0.951734254896252), CI = c(NA, 0.164201137643034,
0.154868406784159, NA, 0.177948094206453, 0.178360305763648,
NA, 0.181862670931493, 0.198447350829814, NA, 0.201541499248143,
0.203737532636542, NA, 0.205196077692786, 0.200992205838595),
CatCI = c("", "b", "c", "", "b", "c", "", "b", "c", "", "b",
"c", "", "b", "c")), .Names = c("Cat", "x", "mean", "max",
"min", "CI", "CatCI"), row.names = c(NA, 15L), class = "data.frame")

Stacked bar plot in violin plot shape

Maybe this is a stupid idea, or maybe it's a brain wave. I have a dataset of lipid classes in 4 different species. The data is proportional, and the sums are 1000. I want to visualise the differences in proportions for each class in each species. Generally a stacked bar would be the way to go here, but there are several classes, and it becomes uninterpretable since only the bottom class shares a baseline (see below).
And this appears to be the best option of a bad bunch, with pie and donut charts being nothing short of sneered at.
I was then inspired by this creation Symmetrical, violin plot-like histogram?, which creates a sort of stacked distribution violin plot (see below).
I am wondering if this could somehow be converted into a stacked violin, such that each segment represents a whole variable. In the case of my data, species A and D would be 'fat' around the TAG segment and 'skinnier' at the STEROL segment. This way the proportions are depicted horizontally and always have a common baseline. Thoughts?
Data:
DF <- structure(list(Sample = c("A", "A", "A", "B", "B", "B", "C",
"C", "C", "D", "D"), WAX = c(83.7179798600773, 317.364310355766,
20.0147496567679, 93.0194886619568, 78.7886829173726, 79.3445694220837,
91.0020522660375, 88.1542855137005, 78.3313314713951, 78.4449591023115,
236.150030864875), TAG = c(67.4640254081232, 313.243238213156,
451.287867136276, 76.308508343969, 40.127554151831, 91.1910102221636,
61.658394708941, 104.617259648364, 60.7502685224869, 80.8373642262043,
485.88633863193), FFA = c(41.0963382465756, 149.264019576272,
129.672579626868, 51.049208042632, 13.7282635713804, 30.0088572108344,
47.8878116348504, 47.9564218319094, 30.3836532949481, 34.8474205480686,
10.9218910757234), `DAG1,2` = c(140.35876401479, 42.4556176551009,
0, 0, 144.993393432366, 136.722412691012, 0, 140.027443968931,
137.579074961889, 129.935353616471, 46.6128854387559), STEROL = c(73.0144390122309,
24.1680929257195, 41.8258704279641, 78.906816661241, 67.5678558060943,
66.7150537517493, 82.4794113296791, 76.7443442992891, 68.9357008866253,
64.5444668132533, 29.8342694785768), AMPL = c(251.446564854412,
57.8713327050339, 306.155806819949, 238.853696442419, 201.783872969561,
175.935515655693, 234.169038776536, 211.986239116884, 196.931330316831,
222.658181144794, 73.8944654414811), PE = c(167.99718650752,
43.3839497916674, 22.1937177530762, 150.315149187176, 153.632530721031,
141.580725482114, 164.215442147509, 155.113323256627, 143.349000132624,
128.504657216928, 50.6281347160092), PC = c(174.904702096271,
52.2494387772846, 28.8494085790995, 191.038328534942, 190.183655117756,
175.33290326259, 199.2632149392, 175.400682364295, 176.64926273487,
163.075864395099, 66.071984352649), LPC = c(0, 0, 0, 120.508804125665,
109.194191312608, 103.16895230176, 119.324634197247, 0, 107.09037767833,
97.151732936871, 0)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -11L), .Names = c("Sample", "WAX", "TAG",
"FFA", "DAG1,2", "STEROL", "AMPL", "PE", "PC", "LPC"))
This is essentially a horizontal bar plot:
library(reshape2)
library(ggplot2)
DFm <- melt(DF, id.vars = "Sample")
DFm1 <- DFm
DFm1$value <- -DFm1$value
DFm <- rbind(DFm, DFm1)
ggplot(DFm, aes(x = "A", y = value / 10, fill = variable, color = variable)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
theme_minimal() +
facet_wrap(~ Sample, nrow = 1, switch = "x") +
theme(axis.text = element_blank(),
axis.title = element_blank(),
panel.grid = element_blank())
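The reshaping step in this answer (melt to long format, then append a negated copy so each bar extends both ways from zero) can be sketched in base R. The tiny data frame below is hypothetical and stands in for the lipid data; the names DF_small, long and mirrored are illustrative only.

```r
# Hypothetical two-sample, two-class table (not the asker's data).
DF_small <- data.frame(Sample = c("A", "B"),
                       WAX = c(80, 90),
                       TAG = c(70, 75))

# Wide to long: one row per Sample/class pair.
long <- data.frame(
  Sample   = rep(DF_small$Sample, times = 2),
  variable = rep(c("WAX", "TAG"), each = nrow(DF_small)),
  value    = c(DF_small$WAX, DF_small$TAG)
)

# Mirror: append a copy with negated values so bars are symmetric around 0.
mirrored <- rbind(long, transform(long, value = -value))
print(mirrored)
```

Each class then appears twice per sample, once on each side of zero, which is what gives the plot its violin-like outline.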

ggplot scatter plot of two groups with superimposed means with X and Y error bars

How can I generate a ggplot2 scatterplot of two groups with the means indicated together with X and Y error bars, like this?
Here is a reduced example (using dput to recreate the data.frame df) with two groups of cells and three measures. I'd like to plot, say, Peak against Rise, or Peak against Decay. That much is straightforward, but I would like to add points indicating the group means with X and Y error bars (+/- sem).
Is there a way to do this within ggplot2, or do I need to generate means and sem values first? This post drew my attention to geom_errorbarh but I'm still uncertain as to the best way to proceed.
library(ggplot2)
df<-structure(list(Group = c("A", "A", "A", "A", "A", "A", "A",
"A", "B", "B", "B", "B", "B", "B", "B", "B"), Peak = c(102.975,
37.805, 64.996, 66.36, 199.354, 7.425, 34.137, 366.59, 10.165,
14.833, 702.525, 39.086, 8.286, 122.783, 105.762, 37.018), Rise = c(0.346855,
0.24165, 0.24028, 0.461548, 0.194016, 0.164047, 0.484375, 0.307861,
0.438538, 0.488083, 0.549423, 0.365448, 0.511551, 0.33596, 0.331467,
0.270096), Decay = c(1.3874, 1.07407, 1.88787, 2.64408, 1.1462,
0.615963, 4.04641, 1.48701, 3.61397, 4.1838, 1.92746, 3.64329,
4.21354, 0.812695, 1.14611, 1.28279)), .Names = c("Group",
"Peak", "Rise", "Decay"), class = "data.frame", row.names = c(NA,
-16L))
ggplot(df, aes(Peak, Rise)) +
geom_point(aes(colour=Group)) +
theme_bw(14)
I have tried something like:
library(doBy)
sem <- function(x) sqrt(var(x)/length(x))
z<-summaryBy(Peak+Rise+Decay~Group, data=df, FUN=c(mean,sem))
z
to get the values, but easily (and flexibly) incorporating them into the ggplot code is defeating me.
I tend to use plyr for these kinds of summaries:
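library(plyr)
z <- ddply(df, .(Group), summarise,
           # Standard errors (sqrt(var / n), as in the question's sem()) are
           # computed before the means so that they use the raw values.
           PeakSE = sqrt(var(Peak) / length(Peak)),
           RiseSE = sqrt(var(Rise) / length(Rise)),
           Peak = mean(Peak),
           Rise = mean(Rise))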
ggplot(df,aes(x = Peak,y = Rise)) +
geom_point(aes(colour = Group)) +
geom_point(data = z,aes(colour = Group)) +
geom_errorbarh(data = z,aes(xmin = Peak - PeakSE,xmax = Peak + PeakSE,y = Rise,colour = Group,height = 0.01)) +
geom_errorbar(data = z,aes(ymin = Rise - RiseSE,ymax = Rise + RiseSE,x = Peak,colour = Group))
I confess I was a little disappointed that I had to manually tweak the crossbar height. But thinking about it, I guess that could be fairly challenging to implement.
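For readers who prefer to compute the summary without extra packages, here is a minimal base R sketch of the per-group mean and standard error of the mean, using the sem() definition from the question. The toy data frame and the column name PeakSE are assumptions made for illustration.

```r
# Standard error of the mean, as defined in the question.
sem <- function(x) sqrt(var(x) / length(x))

# Hypothetical toy data: two groups of three observations each.
toy <- data.frame(Group = c("A", "A", "A", "B", "B", "B"),
                  Peak  = c(2, 4, 6, 10, 10, 16))

# Per-group mean and SEM, collected into one summary data frame.
means <- tapply(toy$Peak, toy$Group, mean)
sems  <- tapply(toy$Peak, toy$Group, sem)
z_toy <- data.frame(Group  = names(means),
                    Peak   = as.numeric(means),
                    PeakSE = as.numeric(sems))
print(z_toy)
```

A summary frame of this shape can be passed as the data argument of the error bar layers in the same way as z above.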
