How to control Y-axis units using sm.density.compare() in R

I have used sm.density.compare() to plot three density functions for data with values between -90 and +10. The Y axis is labeled "density" and runs from 0 to 1.0, as it would for proportions or probabilities.
I then plotted four density functions for data with values between 0 and 1.0. The plot is useful and the Y axis still reads "density", but the values are apparently counts, ranging from 0 to about 12.
The function sm.options() does not seem to offer control over which scale you get. I'd like both plots to show probabilities or proportions.
I'm new to R but have a substantial history with other software.
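The Y axis of a density plot shows probability density, which is neither a probability nor a count. The curve is scaled so that the area under it is 1, so its height depends on how spread out the data are: data spread over about 100 units give a low, flat curve, while data squeezed into 0..1 must give a tall curve. The sketch below uses base R's density() as a stand-in for the kernel estimates that sm.density.compare() draws, with hypothetical normal data in place of the real data sets.

```r
set.seed(1)
wide   <- rnorm(500, mean = -40, sd = 20)     # hypothetical data spread over roughly -90..+10
narrow <- rnorm(500, mean = 0.5, sd = 0.05)   # hypothetical data inside 0..1

# Trapezoid-rule area under a density curve (x/y pairs as returned by density())
trap <- function(d) sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)

d_wide   <- density(wide)
d_narrow <- density(narrow)

max(d_wide$y)     # well below 1: the curve is spread over a wide interval
max(d_narrow$y)   # well above 1: the curve is squeezed into a narrow interval
trap(d_wide)      # about 1
trap(d_narrow)    # about 1

# A probability is an area under the curve, e.g. P(0.45 < X < 0.55) for the narrow data
sel <- d_narrow$x > 0.45 & d_narrow$x < 0.55
p_mid <- trap(list(x = d_narrow$x[sel], y = d_narrow$y[sel]))
p_mid
```

So neither plot is "wrong": both Y axes are on the same density scale, and probabilities come from areas, not heights.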

Related

Making binned polar plots

Python newbie here
I have three data sets of size 1x130677: theta (0 to 360), R (0 to 30) and C. I want to create a polar plot binned in increments of 5 degrees in theta and 1 unit in R. The C values are indexed by theta and R (C is not a function of them): the first values of theta and R correspond to the first value of C, and so on. Each bin will contain multiple C values, which should be averaged, and the whole plot then color coded by the bin averages. This is a close example: Polar histogram in Python for given r, theta and z values
Any help making this plot is greatly appreciated!
Thank you
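The binning and averaging described above does not depend on the language. Below is a minimal base R sketch (R being the language of the rest of this page), using hypothetical random theta, R and C vectors of the stated length: cut() assigns each observation to a 5-degree by 1-unit bin, tapply() averages C within each bin, and each bin is then drawn as a colored polar wedge with polygon(). The wedge-drawing approach is just one illustrative way to render the result.

```r
set.seed(1)
n <- 130677
theta <- runif(n, 0, 360)   # hypothetical angles in degrees
r     <- runif(n, 0, 30)    # hypothetical radii
C     <- rnorm(n)           # hypothetical values attached to each (theta, R) pair

# Assign each observation to a bin: 72 angular bins of 5 degrees, 30 radial bins of 1 unit
theta_bin <- cut(theta, breaks = seq(0, 360, by = 5), include.lowest = TRUE)
r_bin     <- cut(r, breaks = seq(0, 30, by = 1), include.lowest = TRUE)

# Mean of C in each (theta, R) bin; empty bins would be NA
C_mean <- tapply(C, list(theta_bin, r_bin), mean)

# Draw every bin as an annular wedge colored by its mean
th_edges <- seq(0, 360, by = 5) * pi / 180
r_edges  <- 0:30
pal <- hcl.colors(100)
rng <- range(C_mean, na.rm = TRUE)
plot(NA, xlim = c(-30, 30), ylim = c(-30, 30), asp = 1,
     axes = FALSE, xlab = "", ylab = "")
for (i in seq_len(nrow(C_mean))) {
  for (j in seq_len(ncol(C_mean))) {
    v <- C_mean[i, j]
    if (is.na(v)) next
    a  <- seq(th_edges[i], th_edges[i + 1], length.out = 10)   # arc points of the wedge
    xs <- c(r_edges[j] * cos(a), r_edges[j + 1] * cos(rev(a)))
    ys <- c(r_edges[j] * sin(a), r_edges[j + 1] * sin(rev(a)))
    k  <- 1 + floor(99 * (v - rng[1]) / diff(rng))             # map the mean to a palette index
    polygon(xs, ys, col = pal[k], border = NA)
  }
}
```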

Density plot over the same range as my data?

I have a data frame, data, and I would like to draw a proper density plot for it. When I draw the plot, the x axis covers a wider range than my data.
input:
X Y
1 0.4078791 0.471845
2 0.2892282 0.205871
3 0.4254774 0.407548
4 0.4749196 0.396765
5 0.2763627 0.142572
6 0.3942402 0.457668
7 0.2427948 0.248003
8 0.3117754 0.322484
9 0.4350599 0.450679
10 0.4459200 0.338858
That's how kernel density estimation works: each observation contributes a smooth kernel that extends on both sides of it, so the estimate necessarily covers a wider range than your data. You can try different kernels and bandwidth algorithms or fiddle with the adjust parameter, but you actually want the estimate to extend past your data. Otherwise it wouldn't be a proper density estimate, because its area would be less than 1.
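To see this with the X column from the question: base R's density() by default evaluates the estimate up to cut = 3 bandwidths beyond the extremes of the data. Its from and to arguments can restrict the plotted interval to the data range, but they only clip the same curve; they do not rescale it, so the clipped curve no longer has area 1.

```r
# The X column of the data frame shown in the question
x <- c(0.4078791, 0.2892282, 0.4254774, 0.4749196, 0.2763627,
       0.3942402, 0.2427948, 0.3117754, 0.4350599, 0.4459200)

# Trapezoid-rule area under a density curve
trap <- function(d) sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)

full    <- density(x)                               # default grid extends past the data
clipped <- density(x, from = min(x), to = max(x))   # same estimate, evaluated only over range(x)

range(full$x)   # wider than range(x)
trap(full)      # close to 1
trap(clipped)   # clearly less than 1: the tails were cut off, not redistributed
plot(clipped)
```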

Understanding what the kde2d z values mean?

I have two data sets that I am comparing using a kde2d contour plot on a log10 scale.
Here is an example with simulated data sets:
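library(MASS)  # kde2d() is provided by the MASS package
b <- log10(rgamma(1000, 6, 3))
a <- log10(rweibull(1000, 8, 2))
dens <- kde2d(a, b, n = 100)  # 100 x 100 grid of density estimates
filled.contour(dens, color.palette = colorRampPalette(c('white', 'blue', 'yellow', 'red', 'darkred')))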
This produces the following plot:
Now my question is: what do the z values on the legend actually mean? I know they show where most of the data lies, but the 0-15 range confuses me. I thought it could be a percentage, but without the log10 scale my values range from 0 to 1. I have also produced plots with scales of 1-1.2 and 1-2 using my real data.
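The colors represent the values of the estimated density function, which here apparently range from 0 to 15. Just as with your other question about the odd-looking linear regression, I can relate to your confusion.
You just have to understand that a density's integral over the full domain has to be 1, so you can use it to calculate the probability of an observation falling into a specific region. Density values themselves are not probabilities and can be much larger than 1: taking log10 squeezes your data into a small region, so the density has to be correspondingly tall for its volume to stay 1.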
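A quick numerical check of this, as a sketch on the simulated data from the question: multiply the z values by the grid cell area and sum them. One assumption to note: kde2d() by default only evaluates the density over the range of the data, which cuts off part of the tails, so the grid below is widened (the padding of one bandwidth.nrd() on each side is an arbitrary choice) to capture nearly all of the volume.

```r
library(MASS)  # for kde2d() and bandwidth.nrd()

set.seed(1)
b <- log10(rgamma(1000, 6, 3))
a <- log10(rweibull(1000, 8, 2))

# Pad the grid beyond the data so (almost) all of the volume lies on it
lims <- c(range(a) + c(-1, 1) * bandwidth.nrd(a),
          range(b) + c(-1, 1) * bandwidth.nrd(b))
dens <- kde2d(a, b, n = 100, lims = lims)

# Riemann sum: density value at each grid point times the cell area
dx <- diff(dens$x[1:2])
dy <- diff(dens$y[1:2])
volume <- sum(dens$z) * dx * dy
volume        # close to 1
max(dens$z)   # far above 1, because the log10 data occupy a small region

# Probability of falling in a region, e.g. a < 0.3 and b < 0.3
in_region <- outer(dens$x < 0.3, dens$y < 0.3, "&")
p_region <- sum(dens$z[in_region]) * dx * dy
p_region
```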

How to spread points in boxplot in R?

I am working with a distribution that has the following values:
input<-read.table("infile",header=TRUE,sep="\t")
table(input)
0.786333 1 1.04453 1.06159 1.33277 1.53607 2.25893
49 938 1 1 36 16 166
If I draw a boxplot of it, I get a single line for the lowest datum, the highest datum and the median.
boxplot(input)
Is there any way to spread the points out, for example by normalization, so that I get a better boxplot with distinct lines for the lowest datum, the highest datum and the median?
You clearly have a bimodal distribution; I don't think a boxplot is a useful summary here.
A density plot is more useful:
zz <- input[[1]]  # the numeric column (assumes input has a single column)
plot(density(zz))
You could also consider a violin plot, which is a bit of a mix between a kernel density plot and a boxplot.
Using the vioplot package:
library(vioplot)
vioplot(zz)
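The collapsed boxplot follows directly from the counts printed by table(input): 938 of the 1207 values equal 1, so the lower hinge, median and upper hinge are all 1. The box therefore has zero width, and with the default whisker rule (1.5 times the box length) every other value is drawn as an outlier. This sketch rebuilds the data from those counts and checks it with boxplot.stats(), then uses stripchart() with jittering as one base R way to spread the points.

```r
# Rebuild the data from the counts printed by table(input) in the question
values <- c(0.786333, 1, 1.04453, 1.06159, 1.33277, 1.53607, 2.25893)
counts <- c(49, 938, 1, 1, 36, 16, 166)
x <- rep(values, counts)

s <- boxplot.stats(x)
s$stats        # whiskers, hinges and median: all five equal 1
length(s$out)  # 269: every value other than 1 is flagged as an outlier

plot(density(x))                   # shows the separate peaks the boxplot hides
stripchart(x, method = "jitter")   # jitters the points so repeated values are visible
```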

Density plot in R, ggplot2

I am trying to plot and compare two sets of decimal numbers between 0 and 1 using the R package ggplot2. When I plotted them using geom="density" in qplot, I noticed that the density curve goes past 1.0. I would like a density plot that does not exceed the value range of the data, i.e., all of the area stays between 0 and 1.
Is it possible to plot the density between the values 0 and 1, without going past 1 or 0? If so, how would I accomplish this? I need the area of the two plots to be equal between 0 and 1, the range of the data.
Here is the code I used to generate the plots.
Right: qplot(precision,data = compare, fill=factor(dataset),binwidth = .05,geom="density", alpha=I(0.5))+ xlim(-1,2)
Left: qplot(precision,data = compare, fill=factor(dataset),binwidth = .05,geom="density", alpha=I(0.5))
You might consider using a different tool to estimate the density (the built-in density functions do not consider bounds), then use ggplot2 to plot the estimated densities. The logspline package has tools that estimate densities (using a different algorithm than density does), and you can tell its functions that your density is bounded between 0 and 1, which they take into account when estimating the densities. Then use ggplot2 (or other code) to compare the estimated densities.
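If installing another package is not an option, a different, simpler boundary correction can be done in base R: the reflection method. This is not what logspline does; it mirrors the data at 0 and at 1, estimates the density of the combined sample with the original bandwidth, and keeps only the part on [0, 1], multiplied by 3 because the sample was tripled. The sketch below uses hypothetical uniform data and compares the area on [0, 1] with that of the uncorrected estimate.

```r
set.seed(1)
x <- runif(200)   # hypothetical data bounded in [0, 1]

# Trapezoid-rule area under a density curve
trap <- function(d) sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)

# Ordinary KDE evaluated on [0, 1]: part of its mass lies outside the bounds
plain <- density(x, from = 0, to = 1)

# Reflection: mirror the data at 0 and at 1, keep the original bandwidth,
# evaluate on [0, 1] and rescale by 3 for the tripled sample size
bw <- bw.nrd0(x)
refl <- density(c(x, -x, 2 - x), bw = bw, from = 0, to = 1)
refl$y <- refl$y * 3

trap(plain)   # noticeably below 1
trap(refl)    # close to 1
```

The resulting x/y pairs (for example data.frame(x = refl$x, y = refl$y) for each data set) can then be drawn with ggplot2's geom_line(), as the answer suggests for density estimates computed outside ggplot2.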
