geom_qq() Logarithmic Transformation - R

I am trying to compare my data on Q-Q plots: first the untransformed variable, then the same variable after a log transformation.
However, I am getting the same plot (though the y-axis has a different range, of course). The x-axis is the same in both plots.
Here is the code for the regular plot:
library(ggplot2)
library(dplyr)  # for %>%
sbpqq <- wcgs %>%
  ggplot() + geom_qq(mapping = aes(sample = sbp))
sbpqq
And here is the code for the log plot:
sbpqq <- wcgs %>%
  ggplot() + geom_qq(mapping = aes(sample = log(sbp)))
sbpqq
As I said, the two plots look the same, although I would expect them to differ (on a histogram, the log transformation made the data follow a more normal distribution). Should the plots look the same, and am I just misinterpreting them?
Any help is appreciated.
Thank you!
Here are the plots (images not preserved): the Q-Q plot of sbp and the Q-Q plot of log(sbp).

After viewing the two plots side by side in RStudio, I realized that they are not identical: the points of the log plot lie closer to a straight line, so log(sbp) is closer to normal than the untransformed sbp.
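A small sketch of why the difference is easy to miss and how to make it visible. It assumes a simulated right-skewed variable (`sbp_sim`, hypothetical, drawn from a log-normal distribution) as a stand-in for `wcgs$sbp`, which is not reproduced here. The correlation between theoretical and sample quantiles is one rough measure of how straight a Q-Q plot is, and adding `geom_qq_line()` (available in ggplot2 3.0.0 and later) gives the eye a reference line to judge curvature against.

```r
# Hypothetical stand-in for wcgs$sbp: a right-skewed (log-normal) sample
set.seed(42)
sbp_sim <- rlnorm(500, meanlog = log(128), sdlog = 0.5)

# Straightness of a normal Q-Q plot: correlation between theoretical (x)
# and sample (y) quantiles; values near 1 mean the points lie on a line
qq_straightness <- function(v) {
  qq <- qqnorm(v, plot.it = FALSE)
  cor(qq$x, qq$y)
}

r_raw <- qq_straightness(sbp_sim)
r_log <- qq_straightness(log(sbp_sim))
print(c(raw = r_raw, log = r_log))

# Optional: side-by-side Q-Q plots with reference lines (needs ggplot2)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  long <- data.frame(
    scale = rep(c("raw", "log"), each = length(sbp_sim)),
    value = c(sbp_sim, log(sbp_sim))
  )
  plt <- ggplot(long, aes(sample = value)) +
    geom_qq() +
    geom_qq_line(colour = "red") +
    facet_wrap(~ scale, scales = "free_y")  # y ranges differ, x is shared
  print(plt)
}
```

Because both panels share the same theoretical quantiles on the x-axis, only the bend of the points relative to the red line distinguishes them.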

Related

How to create histogram plot in ggplot2 without data frame?

I am plotting two histograms in R using the following code.
x1<-rnorm(100)
x2<-rnorm(50)
h1<-hist(x1)
h2<-hist(x2)
plot(h1, col=rgb(0,0,1,.25), xlim=c(-4,4), ylim=c(0,0.6), main="", xlab="Index", ylab="Percent",freq = FALSE)
plot(h2, col=rgb(1,0,0,.25), xlim=c(-4,4), ylim=c(0,0.6), main="", xlab="Index", ylab="Percent",freq = FALSE,add=TRUE)
legend("topright", c("H1", "H2"), fill=c(rgb(0,0,1,.25),rgb(1,0,0,.25)))
The code produces the following output (image not preserved).
I need a better-looking (more stylish) version of the above plot, and I want to use ggplot2. I am looking for something like this (see the Change fill colors section). However, I think ggplot2 only works with data frames, and I do not have data frames in this case. How can I create a good-looking histogram in ggplot2? Thanks in advance.
You can (and should) put your data into a data.frame if you want to use ggplot. Ideally for ggplot, the data.frame should be in long format. Here's a simple example:
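library(ggplot2)
df1 = rbind(data.frame(grp='x1', x=x1), data.frame(grp='x2', x=x2))
# position='identity' overlays the groups (the default stacks them);
# after_stat(density) matches freq = FALSE in the base plot
ggplot(df1, aes(x, y=after_stat(density), fill=grp)) +
  geom_histogram(color='black', alpha=0.5, position='identity')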
There are lots of options to change the appearance however you like. If you want the histograms stacked or grouped, shown as percent versus count, shown as densities, etc., you will find many previous questions showing how to implement each of those options.
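A sketch of another way to get the long format without writing `rbind()` by hand: base R's `stack()` turns a named list of vectors into a data frame with a `values` column and an `ind` factor holding the list names. The colours and `binwidth = 0.5` below are arbitrary choices for illustration, and `after_stat()` assumes ggplot2 3.3.0 or later.

```r
# The two vectors from the question (seed added for reproducibility)
set.seed(1)
x1 <- rnorm(100)
x2 <- rnorm(50)

# stack() builds a long data frame: 'values' plus an 'ind' factor (x1 / x2)
df_long <- stack(list(x1 = x1, x2 = x2))
str(df_long)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  plt <- ggplot(df_long, aes(x = values, y = after_stat(density), fill = ind)) +
    geom_histogram(binwidth = 0.5, colour = "black", alpha = 0.5,
                   position = "identity") +          # overlay, do not stack
    scale_fill_manual(values = c(x1 = "steelblue", x2 = "firebrick")) +
    labs(x = "Index", y = "Density", fill = NULL)
  print(plt)
}
```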

How to plot 3 different time-series with "actual" values rather than a density plot as ridgeline plots (formerly known as Joyplots) in R?

I would like to draw ridgeline plots of 3 different time series on the same axes, showing the actual values rather than the density estimates that ridgeline plots usually show.
I tried using Henrik Lindberg's code here: https://github.com/halhen/viz-pub/tree/master/sports-time-of-day
It does what it is supposed to do, but it does not smooth the lines.
I also tried the code from the ggridges manual (below):
library(ggplot2)
library(ggridges)
ggplot(df, aes(x = time, y = activity, height = p)) + geom_density_ridges()
ggridges produces density plots, not the time series I want. Henrik's code produces the desired time series, but without the smoothing I want from a ridgeline plot.
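One possible approach, not taken from the thread: smooth each series yourself (here with `loess`, a swapped-in choice; a spline would also work) and then draw the smoothed values directly as ridgelines. The data frame below is hypothetical, shaped like `df` in the question (`time`, `activity`, `p`). The plotting step assumes ggridges' `geom_ridgeline()`, which, as I understand it, uses the `height` aesthetic as given instead of estimating a density; `span = 0.4` is an arbitrary smoothing level.

```r
# Hypothetical data shaped like the question's df: one row per (time, activity)
set.seed(7)
df <- expand.grid(time = 0:23,
                  activity = c("sleep", "work", "leisure"),
                  stringsAsFactors = FALSE)
df$p <- runif(nrow(df))

# Fit loess to each series and evaluate it on a fine time grid
grid <- seq(0, 23, length.out = 200)
smooth_one <- function(d) {
  fit <- loess(p ~ time, data = d, span = 0.4)
  pred <- as.numeric(predict(fit, newdata = data.frame(time = grid)))
  data.frame(time = grid,
             activity = d$activity[1],
             p = pmax(0, pred))  # clip negative overshoot to keep heights >= 0
}
smoothed <- do.call(rbind, lapply(split(df, df$activity), smooth_one))

# Draw the smoothed values as ridgelines (needs ggplot2 and ggridges)
if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("ggridges", quietly = TRUE)) {
  library(ggplot2)
  plt <- ggplot(smoothed, aes(x = time, y = activity, height = p)) +
    ggridges::geom_ridgeline(fill = "grey80")
  print(plt)
}
```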

Why do some of my violin plots look "wavy" for discrete scales?

I have overlaid violin plots comparing group A and group B scores for each section of a survey, faceted by section. The scores are discrete values from 1 to 7. In some of these violin plots, the smoothing works as expected. In others, one group or the other looks very "wavy" between the discrete scores.
I thought the problem might be a difference in group sizes, but then surely the waviness would appear in all the section plots.
Also, this doesn't explain to me why the plots "dip in" between scores, given that the values are discrete 1-7 values.
When I add the adjust parameter, it over-smooths the sections that were already smooth, so it's not ideal.
I use this code to create the plots
library(ggplot2)

create_violin_across_groups_by_section <- function(data, test_group = "first") {
  g <- ggplot(data) +
    aes(x = factor(nrow(data)), y = score, fill = group) +
    geom_violin(alpha = 0.5, position = "identity") +
    facet_wrap("section") +
    labs(
      title = paste("Comparison across groups for", test_group)
    )
  return(g)
}
which results in something like this (image not preserved).
In this case, "openness" is oddly wavy, while the others all appear to be smoothed as normal.
I thought it might have something to do with x=factor(nrow(data)), but again, surely the waviness would then appear in all the section plots.
I would expect either all of the plots to be wavy (though I still wouldn't understand why) or all of them to have the same smoothness.
How can I make all of the facet-wrapped plots have the same smoothness, and why are they different in the first place?
Thanks all
The shape of a violin plot is calculated with kernel density estimation. Kernel density estimation is designed for continuous data, not for discrete data like your scores. While you can feed discrete data to the kernel estimator, the result may not be pretty or even meaningful. You can try different values for the kernel and bw arguments of geom_violin, or you might consider a geom designed for discrete data, such as geom_dotplot:
+ geom_dotplot(binaxis = "y", stackdir = "center", position = "dodge")
See the corresponding example in the geom_dotplot documentation https://ggplot2.tidyverse.org/reference/geom_dotplot.html for a preview of what it can look like.
See the description of the kernel and bw arguments in the geom_violin documentation https://ggplot2.tidyverse.org/reference/geom_violin.html, which points to the density function https://www.rdocumentation.org/packages/stats/versions/3.6.1/topics/density for further information on how kernel density estimates are calculated.
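A sketch of why only some facets look wavy, assuming the wavy groups are ones with many responses and a narrow spread. By default the bandwidth is chosen per group with `bw.nrd0`, which shrinks as the sample grows and as the spread (sd or IQR) shrinks. When the bandwidth ends up much smaller than the 1-point gap between scores, the estimated density collapses into separate bumps at each integer. The two score vectors below are hypothetical. Passing one fixed numeric `bw` to `geom_violin` gives every group and facet the same smoothing; `0.5` is an arbitrary choice.

```r
# Two hypothetical groups of discrete 1-7 scores
x_narrow <- rep(c(3, 4, 5), times = c(100, 300, 100))       # n = 500, mostly 4s
x_wide   <- rep(1:7, times = c(10, 15, 20, 25, 20, 15, 10))  # n = 115, spread out

# Default bandwidth rule used by density(), computed separately per group
bw_narrow <- bw.nrd0(x_narrow)
bw_wide   <- bw.nrd0(x_wide)
print(c(narrow = bw_narrow, wide = bw_wide))

# Density halfway between two scores relative to the density at a score
dens_at <- function(v, at) {
  d <- density(v)                      # Gaussian kernel, bw = "nrd0"
  approx(d$x, d$y, xout = at)$y
}
dip_narrow <- dens_at(x_narrow, 4.5) / dens_at(x_narrow, 4)
dip_wide   <- dens_at(x_wide, 4.5) / dens_at(x_wide, 4)
print(c(narrow = dip_narrow, wide = dip_wide))  # small ratio = deep "wave"

# A fixed numeric bw gives both groups the same smoothing (needs ggplot2)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  scores <- data.frame(
    group = rep(c("narrow", "wide"), times = c(length(x_narrow), length(x_wide))),
    score = c(x_narrow, x_wide)
  )
  plt <- ggplot(scores, aes(x = group, y = score)) +
    geom_violin(bw = 0.5)
  print(plt)
}
```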

How to plot a graph of Probability density function using ggplot

Here is my data set
link to dataset
I want to plot the probability density function of the variable quality, separately for each type of wine.
I tried this:
library(ggplot2)
db <- dbeta(wines$quality, 1, 1)
qplot(wines$quality, db, geom="line")
but it plots a flat line.
OK, I think my code doesn't make any sense. I want to do something like this (example image not preserved):
x: quality of wines (dry, semi-dry, ...)
What can I do?
Is this what you want?
ggplot(wines) + geom_density(aes(quality))
EDIT:
I see your point, but probably you just need to rescale the y values (am I correct?).
So is this not what you're after? (I changed the image.)
ggplot(wines[-4381,]) + geom_density(aes(x=quality)) +
  facet_wrap(~sweetnes)
or all in one plot, with a different fill for each type (alpha keeps overlapping curves visible):
ggplot(wines) + geom_density(aes(x=quality, fill=sweetnes), alpha=0.5)
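A short sketch of why the `dbeta()` attempt gave a flat line and what `geom_density()` computes instead. `dbeta(x, 1, 1)` is the density of the Beta(1, 1) distribution, which is uniform on [0, 1]: it is 1 inside that interval and 0 outside it. So if the quality scores lie outside [0, 1] (an assumption, since the dataset itself is not shown), every value is 0. `geom_density()` instead estimates a kernel density from the observed scores. The `wines_demo` data frame below is hypothetical; it only reuses the `sweetnes` column name from the question.

```r
# 1) Beta(1, 1) is uniform on [0, 1]: density 1 inside, 0 outside
dbeta(c(0.2, 0.5, 0.9), 1, 1)   # 1 1 1
dbeta(c(3, 6, 9), 1, 1)         # 0 0 0 (outside [0, 1])

# 2) Kernel density of observed scores, one curve per group.
#    Hypothetical stand-in for the wines data:
set.seed(3)
wines_demo <- data.frame(
  quality  = round(c(rnorm(200, 6, 1), rnorm(150, 5.5, 0.8), rnorm(100, 6.5, 1.2))),
  sweetnes = rep(c("dry", "semi-dry", "sweet"), times = c(200, 150, 100))
)
dens <- lapply(split(wines_demo$quality, wines_demo$sweetnes), density)

# Each estimated density integrates to approximately 1 (Riemann sum on its grid)
areas <- sapply(dens, function(d) sum(d$y) * diff(d$x[1:2]))
print(areas)
```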

ggplot2 2d Density Weights

I'm trying to plot some data with 2d density contours using ggplot2 in R.
I'm getting one slightly odd result.
First I set up my ggplot object:
p <- ggplot(data, aes(x=Distance,y=Rate, colour = Company))
I then plot this with geom_point and geom_density2d. I want geom_density2d to be weighted by the organisation's size (the OrgSize variable). However, when I add OrgSize as a weighting variable, nothing changes in the plot:
This:
p+geom_point()+geom_density2d()
Gives an identical plot to this:
p+geom_point()+geom_density2d(aes(weight = OrgSize))
However, if I do the same with a loess line using geom_smooth, the weighting does make a clear difference.
This:
p+geom_point()+geom_smooth()
Gives a different plot to this:
p+geom_point()+geom_smooth(aes(weight=OrgSize))
I was wondering if I'm using geom_density2d inappropriately. Should I instead be using geom_contour and supplying OrgSize as the 'height'? If so, why does geom_density2d accept a weighting factor?
Code below:
library(ggplot2)
Company <- c("One","One","One","One","One","Two","Two","Two","Two","Two")
Store <- c(1,2,3,4,5,6,7,8,9,10)
Distance <- c(1.5,1.6,1.8,5.8,4.2,4.3,6.5,4.9,7.4,7.2)
Rate <- c(0.1,0.3,0.2,0.4,0.4,0.5,0.6,0.7,0.8,0.9)
OrgSize <- c(500,1000,200,300,1500,800,50,1000,75,800)
data <- data.frame(Company,Store,Distance,Rate,OrgSize)
p <- ggplot(data, aes(x=Distance,y=Rate))
# Difference is apparent between these two
p+geom_point()+geom_smooth()
p+geom_point()+geom_smooth(aes(weight = OrgSize))
# Difference is not apparent between these two
p+geom_point()+geom_density2d()
p+geom_point()+geom_density2d(aes(weight = OrgSize))
geom_density2d "accepts" the weight aesthetic, but does not pass it on to MASS::kde2d, since that function has no weights argument. As a consequence, you will need to use a different 2D density method.
(I realize my answer does not address why the help page says that geom_density2d "understands" the weight aesthetic, but when I have needed weighted 2D KDEs, I have had to use packages other than MASS. Maybe this is a TODO that @hadley put in the help page that then got overlooked?)
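As one illustration of "a different 2D density method", here is a hand-rolled weighted Gaussian product-kernel estimate on a grid, written in base R. `weighted_kde2d` is a hypothetical helper, not part of MASS or ggplot2, and the kernel standard deviations in `bw` were chosen by hand for this toy data rather than by any automatic rule. The resulting grid can be drawn with `geom_contour()`, which is close to what the question suggests about supplying a precomputed 'height'.

```r
# Data from the question
Distance <- c(1.5, 1.6, 1.8, 5.8, 4.2, 4.3, 6.5, 4.9, 7.4, 7.2)
Rate     <- c(0.1, 0.3, 0.2, 0.4, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9)
OrgSize  <- c(500, 1000, 200, 300, 1500, 800, 50, 1000, 75, 800)

# Hypothetical helper: weighted 2D Gaussian KDE evaluated on an n x n grid.
# bw = kernel standard deviations in x and y; lims = c(xmin, xmax, ymin, ymax)
weighted_kde2d <- function(x, y, w, bw, n = 100, lims) {
  w  <- w / sum(w)                                          # weights sum to 1
  gx <- seq(lims[1], lims[2], length.out = n)
  gy <- seq(lims[3], lims[4], length.out = n)
  kx <- outer(gx, x, function(g, xi) dnorm(g, xi, bw[1]))   # n x N
  ky <- outer(gy, y, function(g, yi) dnorm(g, yi, bw[2]))   # n x N
  # z[a, b] = sum_i w_i * K(gx[a] - x_i) * K(gy[b] - y_i)
  z <- kx %*% (w * t(ky))
  list(x = gx, y = gy, z = z)
}

bw   <- c(1, 0.15)
lims <- c(range(Distance) + c(-4, 4) * bw[1], range(Rate) + c(-4, 4) * bw[2])
dens_w <- weighted_kde2d(Distance, Rate, OrgSize, bw, lims = lims)
dens_u <- weighted_kde2d(Distance, Rate, rep(1, length(Rate)), bw, lims = lims)

# Total probability mass on the grid should be close to 1
cell <- diff(dens_w$x[1:2]) * diff(dens_w$y[1:2])
print(sum(dens_w$z) * cell)

# Contours of the weighted density with the points on top (needs ggplot2)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  grid_df <- expand.grid(Distance = dens_w$x, Rate = dens_w$y)
  grid_df$density <- as.vector(dens_w$z)   # x varies fastest, matching z
  plt <- ggplot(grid_df, aes(Distance, Rate)) +
    geom_contour(aes(z = density)) +
    geom_point(data = data.frame(Distance, Rate))
  print(plt)
}
```

Comparing `dens_w` with the unweighted `dens_u` shows the weighting actually changes the surface, unlike the `aes(weight = OrgSize)` call in the question.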
