Here is my data set
link to dataset
I want to plot a graph showing the probability density function of the variable quality, split by the type of wine.
I tried this:
library(ggplot2)
db <- dbeta(wines$quality, 1, 1)
qplot(wines$quality, db, geom="line")
but it plots a flat line.
OK, I think my code doesn't make any sense. I want to do something like this:
[Example image]
x = quality of wines (dry, semi-dry, ...)
What can I do?
Is this what you want? (Incidentally, your dbeta call produced a flat line because dbeta(x, 1, 1) is the density of the uniform Beta(1, 1) distribution, which is constant.)
ggplot(wines) + geom_density(aes(quality))
EDIT:
I see your point, but probably you just need to rescale the y values (am I correct?). So isn't this what you're after? I've changed the image:
ggplot(wines[-4381,]) + geom_density(aes(x=quality)) +
  facet_wrap(~sweetnes)
or all in one with different fill
ggplot(wines) + geom_density(aes(x=quality, fill=sweetnes))
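If the filled densities overlap heavily, adding some transparency keeps each one readable; a minimal tweak of the same call, assuming the wines data frame and sweetnes column from above:
ggplot(wines) + geom_density(aes(x=quality, fill=sweetnes), alpha=0.4)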
Related
I am trying to compare some data on a Q-Q plot with the regular distribution of the data and then a distribution with a log transformation of the same variable.
However, I am getting the same plot (though the y-axis has a different range, of course). The x-axis is the same in both plots.
Here is the code for the regular plot:
sbpqq <- wcgs %>%
  ggplot() + geom_qq(mapping = aes(sample = sbp))
sbpqq
And here is the code for the log plot:
sbpqq <- wcgs %>%
  ggplot() + geom_qq(mapping = aes(sample = log(sbp)))
sbpqq
The two plots, like I said, look the same, although I imagine they should look different (the log transformation on a histogram made the data follow a more normal distribution). Should the plots look the same, and am I just misinterpreting them?
Any help is appreciated.
Thank you!
Here are the plots:
[Normal plot and log plot images]
After viewing the two plots side-by-side in RStudio, I realized that they are not identical and that the log plot is more normal than the regular plot.
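For this kind of comparison, adding a reference line makes departures from normality much easier to judge by eye; a minimal sketch, assuming the wcgs data frame with its sbp column as in the question:
library(ggplot2)

# geom_qq_line() draws a reference line through the quartiles, so
# systematic curvature away from it signals non-normality
ggplot(wcgs, aes(sample = log(sbp))) +
  geom_qq() +
  geom_qq_line()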
I'm trying to plot line graphs using the ggplot2 library in R. I get a good plot, but I need to reduce the vertical space between the lines, because they come out widely separated.
This is my R script:
library(ggplot2)
library(reshape2)
data <- read.csv('/Users/keepo/Desktop/G.Con/Int18/input-int18.csv')
chart_data <- melt(data, id='NRO')
names(chart_data) <- c('NRO', 'leyenda', 'DTF')
ggplot() +
  geom_line(data = chart_data, aes(x = NRO, y = DTF, color = leyenda), size = 1) +
  xlab("iteraciones") +
  ylab("valores")
and this is my actual graph:
...the first line is very distant from the second. How can I reduce the spacing?
Regards.
The lines are far apart because the values of the variable plotted on the y-axis are far apart. If you need them closer together, you fundamentally have 3 options:
change the scale (e.g. convert the plot to a log scale), although this can make it harder for people to interpret the numbers. This can also change the behavior of each line, not just change the space between the lines. I'm guessing this isn't what you will want, ultimately.
normalize the data. If the actual value of the variable on the y-axis isn't important, just standardize the data (separately for each value of leyenda); see the sketch after this list.
As stated above, you can graph each line separately. The main drawback here is that you need 3 graphs where 1 might do.
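For the normalization option, here is a minimal sketch of per-group standardization, assuming the chart_data frame built in the question (dplyr used for the grouping):
library(dplyr)
library(ggplot2)

# Standardize DTF within each leyenda group (z-scores), so every
# line varies around 0 and shares a comparable vertical range
chart_data_std <- chart_data %>%
  group_by(leyenda) %>%
  mutate(DTF = as.vector(scale(DTF))) %>%
  ungroup()

ggplot(chart_data_std) +
  geom_line(aes(x = NRO, y = DTF, color = leyenda), size = 1) +
  xlab("iteraciones") +
  ylab("valores (estandarizados)")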
Not recommended:
I know that some graphs will have a "squiggle" to change scales or skip space. Generally, this is considered poor practice (and I doubt it's an option in ggplot2) because it masks the true separation between the data points. If you really do want a gap, I would look at this post: axis.break and ggplot2 or gap.plot? plot may be too complexe
In a nutshell, the answer here depends on what your numbers mean. What is the story you are trying to tell? Is the important feature of your plots the change between them (in which case, normalizing might be your best option), or the actual numbers themselves (in which case, the space is relevant).
You could use an axis transformation that maps your data to the screen in a non-linear fashion:
fun_trans <- function(x){
  # Anchor points defining the desired non-linear mapping
  d <- data.frame(x=c(800, 2500, 3100), y=c(800, 1950, 3100))
  model1 <- lm(y~poly(x,2), data=d)  # forward transformation
  model2 <- lm(x~poly(y,2), data=d)  # inverse transformation
  scales::trans_new("fun",
                    function(x) as.vector(predict(model1, data.frame(x=x))),
                    function(x) as.vector(predict(model2, data.frame(y=x))))
}
last_plot() + scale_y_continuous(trans = "fun")
Code:
require(ggplot2)
set.seed(0)
xvar <- rnorm(100)
ggplot(data.frame(xvar), aes(xvar)) + geom_density(fill="lightblue") + scale_y_log10()
The graph is something like this:
How can I make the graph shade on the right side of (viz. below) the density estimate?
The problem is that stat_density by default fills between the density and the y=0 line of the transformed data. So transformations that alter the y=0 line will fall victim to problems of this sort. I personally think this is a bug in ggplot2, although since graphical grammar experts probably argue that y-transformed densities are meaningless, the bug may not get a lot of attention.
A very kludgy workaround is to manually add an offset to ..density.., which you will have to explicitly invoke, and then change the breaks to make it look like you didn't do anything weird.
require(ggplot2)
require(scales)
set.seed(0)
xvar <- rnorm(100000)
quartz(height=4,width=6)
ggplot(data.frame(xvar), aes(x=xvar, y=log10(..density..)+4)) +
  geom_density(fill='lightblue') +
  scale_y_continuous(breaks=c(0,1,2,3,4),
                     labels=c('0.0001', '0.001', '0.01', '0.1', '1'), limits=c(0,4),
                     name='density')
quartz.save('StackOverflow_29111741_v2.png')
That code produces this graph:
This isn't a ggplot2 or even an R issue but is simply an issue with the tails of a probability distribution being undersampled for your sample sizes. The log axis can go down forever, taking infinitely long to "reach" zero, but no finite sample size can ever hope to cover the increasingly improbable regions of the distribution.
So, to make the plot pretty, you need to both (a) increase the number of points from 100 to 10,000 or higher, while (b) keeping the plot ylims the same. (Otherwise the extra data you draw in your rnorm call will sparsely populate the tails of the gaussian even farther away from the mean, convincing ggplot2 to make automatic y axis limits even lower, in the range of the poorly-sampled tails, and the noisiness that you don't like will return.)
require(ggplot2)
require(scales)
set.seed(0)
xvar <- rnorm(100000)
ggplot(data.frame(xvar), aes(xvar)) +
  geom_density(fill="lightblue") +
  scale_y_continuous(trans=log10_trans(), limits = c(0.01, 1))
This generates this plot, which I think is what you want.
I'm trying to plot some data with 2d density contours using ggplot2 in R.
I'm getting one slightly odd result.
First I set up my ggplot object:
p <- ggplot(data, aes(x=Distance,y=Rate, colour = Company))
I then plot this with geom_point and geom_density2d. I want geom_density2d to be weighted based on the organisation's size (the OrgSize variable). However, when I add OrgSize as a weighting variable, nothing changes in the plot:
This:
p+geom_point()+geom_density2d()
Gives an identical plot to this:
p+geom_point()+geom_density2d(aes(weight = OrgSize))
However, if I do the same with a loess line using geom_smooth, the weighting does make a clear difference.
This:
p+geom_point()+geom_smooth()
Gives a different plot to this:
p+geom_point()+geom_smooth(aes(weight=OrgSize))
I was wondering if I'm using density2d inappropriately; should I instead be using contour and supplying OrgSize as the 'height'? If so, then why does geom_density2d accept a weighting factor?
Code below:
require(ggplot2)
Company <- c("One","One","One","One","One","Two","Two","Two","Two","Two")
Store <- c(1,2,3,4,5,6,7,8,9,10)
Distance <- c(1.5,1.6,1.8,5.8,4.2,4.3,6.5,4.9,7.4,7.2)
Rate <- c(0.1,0.3,0.2,0.4,0.4,0.5,0.6,0.7,0.8,0.9)
OrgSize <- c(500,1000,200,300,1500,800,50,1000,75,800)
data <- data.frame(Company,Store,Distance,Rate,OrgSize)
p <- ggplot(data, aes(x=Distance,y=Rate))
# Difference is apparent between these two
p+geom_point()+geom_smooth()
p+geom_point()+geom_smooth(aes(weight = OrgSize))
# Difference is not apparent between these two
p+geom_point()+geom_density2d()
p+geom_point()+geom_density2d(aes(weight = OrgSize))
geom_density2d is "accepting" the weight parameter, but then not passing it on to MASS::kde2d, since that function has no weights argument. As a consequence, you will need to use a different 2D-density method.
(I realize my answer is not addressing why the help page says that geom_density2d "understands" the weight argument, but when I have tried to calculate weighted 2D-KDEs, I have needed to use other packages besides MASS. Maybe this is a TODO that #hadley put in the help page that then got overlooked?)
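As a rough illustration (my own workaround, not anything geom_density2d supports directly): you can approximate a weighted 2D KDE by replicating rows in proportion to OrgSize before computing the unweighted density. A sketch using the data frame from the question:
# Crude approximation: replicate each row in proportion to OrgSize
# (scaled by the smallest weight) so the unweighted KDE behaves
# roughly like a weighted one
reps <- round(data$OrgSize / min(data$OrgSize))
data_wtd <- data[rep(seq_len(nrow(data)), times = reps), ]

ggplot(data_wtd, aes(x = Distance, y = Rate)) +
  geom_density2d() +
  geom_point(data = data)  # overlay the original, unreplicated points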
I am creating a map (choropleth) as described on the ggplot2 wiki. Everything works like a charm, except that I am running into an issue mapping a continuous value to the polygon fill color via the scale_fill_brewer() function.
This question describes the problem I'm having. As in the answer, my workaround has been to pre-cut my data into bins using the gtools quantcut() function:
UPDATE: This first example is actually the right way to do this
require(gtools) # needed for quantcut()
...
fill_factor <- quantcut(fill_continuous, q=seq(0,1,by=0.25))
ggplot(mydata) +
  aes(long,lat,group=group,fill=fill_factor) +
  geom_polygon() +
  scale_fill_brewer(name="mybins", palette="PuOr")
This works, however, I feel like I should be able to skip the step of pre-cutting my data and do something like this with the breaks option:
ggplot(mydata) +
  aes(long,lat,group=group,fill=fill_continuous) +
  geom_polygon() +
  scale_fill_brewer(name="mybins", palette="PuOr", breaks=quantile(fill_continuous))
But this doesn't work. Instead I get an error something like:
Continuous variable (composite score) supplied to discrete scale_brewer.
Have I misunderstood the purpose of the "breaks" option? Or is breaks broken?
A major issue with pre-cutting continuous data is that there are three pieces of information used at different points in the code:
The Brewer palette -- determines the maximum number of colors available
The number of break points (or the bin width) -- has to be specified with the data
The actual data to be plotted -- influences the choice of the Brewer palette (sequential/diverging)
A true vicious circle. This can be broken by providing a function that accepts the data and the palette, automatically derives the number of break points and returns an object that can be added to the ggplot object. Something along the following lines:
fill_brewer <- function(fill, palette) {
  require(RColorBrewer)
  # Maximum number of colors available in the chosen Brewer palette
  n <- brewer.pal.info$maxcolors[palette == rownames(brewer.pal.info)]
  # Build the quantcut() call unevaluated so it is evaluated against the data
  discrete.fill <- call("quantcut", match.call()$fill, q=seq(0, 1, length.out=n))
  list(
    do.call(aes, list(fill=discrete.fill)),
    scale_fill_brewer(palette=palette)
  )
}
Use it like this:
ggplot(mydata) + aes(long,lat,group=group) + geom_polygon() +
  fill_brewer(fill=fill_continuous, palette="PuOr")
As Hadley explains, the breaks option moves the ticks but does not make the data discrete. Therefore pre-cutting the data, as in the first example in the question, is the right way to use the scale_fill_brewer command.
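For reference, the pre-cut can also be done with base R alone; a minimal sketch assuming the fill_continuous vector from the question (quartile bins, mirroring quantcut's default):
# cut() at the quantiles gives four equal-count bins, matching
# quantcut(fill_continuous, q=seq(0, 1, by=0.25))
fill_factor <- cut(fill_continuous,
                   breaks = quantile(fill_continuous, probs = seq(0, 1, by = 0.25)),
                   include.lowest = TRUE)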