I'm looking for a technique in R similar to the hold all command in Matlab.
In Matlab I generate some data:
x = normrnd(0,1,1000,1);
[a,b]=hist(x,20);
L=b(2)-b(1);
area=sum(L*a);
frequency=a/area;
bar(b,frequency,1);
hold all;
range=b(1):0.1:b(20);
f1=normpdf(range,0,1);
f2=normpdf(range,2,2);
plot1=plot(range,f1,'r');
plot2=plot(range,f2,'m');
hold off;
I would like to create something similar in R. I tried it this way:
x <- rnorm(1000)
h <- hist(x, breaks = 20)
a <- h$counts
b <- h$mids
L <- b[2] - b[1]
area <- sum(L*a)
frequency = a/area
range <- seq(b[1],b[20], by = 0.1)
f1 <- dnorm(range,0,1)
f2 <- dnorm(range,2,2)
barplot(frequency, names.arg = c(b))
And I stopped here, since I don't know how to add another graph to the current plot. I tried to use ggplot2, but I don't have much experience with it and I failed to create a barplot with this library.
If there is a way to do that with ggplot2, I would like to see it with an explanation, since I want to learn it. I would appreciate a solution with the traditional plot system as well.
P.S. I used barplot(frequency, names.arg = c(b)) because I read here that there is no equivalent in R for Matlab's bar function.
Sometimes it is better to tell us what you are trying to do, rather than how you are trying to do it. From the looks of your R code, your barplot is just a scaled histogram, and from the other R code and my guesses from the Matlab code, you want to add reference lines for normal distributions. If I am correct, then you are going about this the long way in R. The following R code is much simpler:
x <- rnorm(1000)
hist(x, prob=TRUE)
curve(dnorm(x,0,1), add=TRUE)
curve(dnorm(x,2,2), add=TRUE)
Even better would be to add col='blue' or similar to the curve calls. If you really feel the need to choose your own x values then you can replace the calls to curve with:
lines(range, dnorm(range, 0, 1) )
lines(range, dnorm(range, 2, 2) )
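Since the question also asks for a ggplot2 route, here is a minimal sketch of the same overlay. It assumes ggplot2 3.3.0 or later (for after_stat()); the data frame name d, the bin count and the colours are illustrative choices, not something taken from the original code.

```r
library(ggplot2)

set.seed(1)
d <- data.frame(x = rnorm(1000))   # hypothetical data frame holding the sample

p <- ggplot(d, aes(x = x)) +
  # after_stat(density) rescales the bar heights so the bars integrate to 1
  geom_histogram(aes(y = after_stat(density)), bins = 20,
                 fill = "grey80", colour = "black") +
  # each stat_function() layer draws one reference curve on the same axes
  stat_function(fun = dnorm, args = list(mean = 0, sd = 1), colour = "red") +
  stat_function(fun = dnorm, args = list(mean = 2, sd = 2), colour = "magenta")

print(p)
```

In ggplot2 every `+` adds a layer to the same plot, which plays the role of hold all in Matlab.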
If you really want to learn to add lines to a barplot then you should realize that the default locations for bars may not be what you expect. Look at the updateusr function in the TeachingDemos package for R for examples of adding lines to a barplot.
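As a rough illustration of that point using only base graphics: barplot() returns the x positions of the bar centres, and those positions are generally not the data values. The helper to_bar below is a hypothetical name for a linear map from data values to barplot coordinates; it is a sketch that assumes equally spaced bins.

```r
set.seed(1)
x <- rnorm(1000)
h <- hist(x, breaks = 20, plot = FALSE)   # compute the bins without drawing

# barplot() returns the x coordinates of the bar midpoints
mp <- barplot(h$density, names.arg = h$mids)

# Hypothetical helper: map data values onto the barplot's x scale,
# assuming equally spaced bins (linear map through first and last midpoint)
n <- length(mp)
to_bar <- function(v) {
  mp[1] + (v - h$mids[1]) * (mp[n] - mp[1]) / (h$mids[n] - h$mids[1])
}

grid <- seq(min(h$mids), max(h$mids), by = 0.1)
lines(to_bar(grid), dnorm(grid, 0, 1), col = "red")
lines(to_bar(grid), dnorm(grid, 2, 2), col = "magenta")
```

Without the remapping, the curves would be drawn against the bar index scale rather than the data scale, which is why hist() plus curve() is the simpler route.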
I'm trying to plot 18000 distributions as a heatmap-type plot in R.
One row can easily be plotted as a histogram, but as I need to represent so many, the only option I can think of is a heatmap.
This is not currently working, as all the heatmap/imaging functions seem to do some kind of clustering or comparison of the rows instead of just plotting the distribution as a histogram does.
Does anyone know how to get around the problem, or a better way of representing a large number of distributions?
library(plot3D)  # provides image2D()
matrix <- replicate(100, rnorm(100))
hist(matrix[1,],breaks = 60)
image2D(z=matrix, border="black")
image2D doesn't seem to do the trick...
Thanks
Edit 12/06/18:
Using
library(denstrip)
does the trick for anyone who needs to visualise differences between a large number of distributions.
You could overlay a lot of density plots using transparency to get a sense of overlap.
m <- replicate(100, rnorm(100))
plot(range(m), c(0, 0.5), type = 'n')
for (i in 1:ncol(m)) lines(density(m[, i]), col = rgb(0.5, 0.5, 0.5, 0.5))
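Closer to the heatmap idea in the question, one possible sketch is to bin every distribution on a common set of breaks and draw the resulting count matrix with base image(), which does no clustering or reordering. The number of bins, the variable names and the treatment of each row as one distribution are assumptions made for illustration.

```r
set.seed(42)
m <- replicate(100, rnorm(100))   # here each row is treated as one distribution

# Common breaks covering all values, so every row is binned identically
breaks <- seq(floor(min(m)), ceiling(max(m)), length.out = 61)
mids   <- head(breaks, -1) + diff(breaks) / 2

# One column of bin counts per distribution (60 bins x 100 distributions)
counts <- apply(m, 1, function(r) hist(r, breaks = breaks, plot = FALSE)$counts)

# x axis: value bins, y axis: distribution index, colour: count in the bin
image(x = mids, y = seq_len(ncol(counts)), z = counts,
      xlab = "value", ylab = "distribution (row index)")
```

Each horizontal strip is then one histogram seen from above, so thousands of rows can be stacked without any row being compared with another.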
How can I get the area under overlapping density curves?
How can I solve this problem with R? (There is a solution for Python here: Calculate overlap area of two functions )
set.seed(1234)
df <- data.frame(
sex=factor(rep(c("F", "M"), each=200)),
weight=round(c(rnorm(200, mean=55, sd=5),
rnorm(200, mean=65, sd=5)))
)
(Source: http://www.sthda.com/english/wiki/ggplot2-density-plot-quick-start-guide-r-software-and-data-visualization )
library(ggplot2)
ggplot(df, aes(x=weight, color=sex, fill=sex)) +
geom_density(aes(y=..density..), alpha=0.5)
"The points used in the plot are returned by ggplot_build(), so you can access them." So now I have the points, and I can feed them to approxfun, but my problem is that I don't know how to subtract the density functions.
Any help greatly appreciated! (And I believe in high demand, there is no solution for this readily available.)
I was looking for a way to do this for empirical data, and had the problem of multiple intersections as mentioned by user5878028. After some digging I found a very simple solution, even for a total R noob like me:
Install and load the libraries "overlapping" (which performs the calculation) and "lattice" (which displays the result):
library(overlapping)
library(lattice)
Then define a variable "x" as a list that contains the two density distributions you want to compare. For this example, the two datasets "data1" and "data2" are both columns in a text file called "yourfile":
x <- list(X1=yourfile$data1, X2=yourfile$data2)
Then just tell it to display the output as a plot which will also display the estimated % overlap:
out <- overlap(x, plot=TRUE)
I hope this helps someone like it helped me! Here's an example overlap plot
I will make a few base R plots, but the plots are not actually part of
the solution. They are just there to confirm that I am getting the right
answer.
You can get each of the density functions and solve for where they intersect.
## Create the two density functions and display
FDensity = approxfun(density(df$weight[df$sex=="F"], from=40, to=80))
MDensity = approxfun(density(df$weight[df$sex=="M"], from=40, to=80))
plot(FDensity, xlim=c(40,80), ylab="Density")
curve(MDensity, add=TRUE)
Now solve for the intersection
## Solve for the intersection and plot to confirm
FminusM = function(x) { FDensity(x) - MDensity(x) }
Intersect = uniroot(FminusM, c(40, 80))$root
points(Intersect, FDensity(Intersect), pch=20, col="red")
Now we can just integrate to get the area of the overlap.
integrate(MDensity, 40,Intersect)$value +
integrate(FDensity, Intersect, 80)$value
[1] 0.2952838
The two methods proposed above give different results.
If the same data (df from the question) is passed to the overlap function, it reports an overlap of about 0.18, whereas the integration approach gives about 0.29.
X1 = df$weight[df$sex=="F"]
X2 = df$weight[df$sex=="M"]
x=list(X1=X1, X2=X2)
out <- overlap(x, plot=TRUE)
out$OV
X1-X2
0.1754
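One way to cross-check the integration answer, which also copes with densities that cross more than once, is to integrate the pointwise minimum of the two estimated densities. The sketch below uses a simple Riemann sum on the grid returned by density(); the names dF, dM and overlap_area are illustrative, and the 40 to 80 range is carried over from the answer above. It does not explain the difference from the overlapping package.

```r
set.seed(1234)
df <- data.frame(
  sex    = factor(rep(c("F", "M"), each = 200)),
  weight = round(c(rnorm(200, mean = 55, sd = 5),
                   rnorm(200, mean = 65, sd = 5)))
)

# Evaluate both kernel density estimates on the same grid (512 points by default)
dF <- density(df$weight[df$sex == "F"], from = 40, to = 80)
dM <- density(df$weight[df$sex == "M"], from = 40, to = 80)

# The overlap is the area under the lower of the two curves at each x
step <- dF$x[2] - dF$x[1]
overlap_area <- sum(pmin(dF$y, dM$y)) * step
overlap_area
```

Because pmin() picks the lower curve everywhere, no intersection points have to be located first.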
I have 23 different groups, each consisting of 7 to 20 individual samples (approximately 350-400 observations in total) with their own x, y and z coordinates. I'd like to produce a 3D plot of this data with the plot3d function of the rgl R package. That is not a big deal in general. The problem is that I'd like each of the 23 groups to be easily distinguishable on the 3D plot. I tried to use a different color for each group, but unfortunately it's not possible to find 23 colors that the human eye can tell apart well. I was thinking about a pch parameter like the one in the plot function of base R, but as far as I can see there is no such option in the plot3d function. Besides, there are too many points in my data set, so adding a label to each point (e.g. with the text3d rgl function) is not a good idea: the labels would overlap and make a mess of the 3D plot. Is there a way to solve this (I guess it's a very common problem)? Thank you in advance!
Below is code of some toy example for better explanation:
# generate data
prefix=rep("ID",69)
suffix=rep(1:23,3)
suffix_2=as.character(suffix[order(suffix)])
titles_1=paste(prefix,suffix,sep="_")
titles_2=titles_1[order(titles_1)]
x=1:69
y=x+20
z=x+50
df=data.frame(titles_2,x,y,z)
# load rgl library
library('rgl')
# make 3D plot
plot3d(x,y,z)
If you like living on the bleeding edge, there's a new function rgl::pch3d() that draws symbols using the same codes as points() does in base graphics. It's in rgl 0.95.1475, available on R-forge (and within a few hours on GitHub; see How do I install the latest version of rgl?). It's not completely working with rglwidget() yet.
The example code
open3d()
i <- 0:25; x <- i %% 5; y <- rep(0, 26); z <- i %/% 5
pch3d(x, y, z, pch = i, bg = "green")
text3d(x, y, z + 0.3, i)
pch3d(x + 5, y, z, pch = LETTERS[i+1])
text3d(x + 5, y, z + 0.3, i+65)
produces this display (after some resizing and rotation):
It's not perfect, but how about using letters a-w to distinguish the groups?
with(df,plot3d(x,y,z))
with(df,text3d(x,y,z,texts=letters[as.integer(factor(titles_2))]))
Because I'm going to use the 3D plot for publication purposes, I used this solution for now. I don't claim it is the best one.
# generate data
prefix=rep("ID",69)
suffix=rep(1:23,3)
suffix_2=as.character(suffix[order(suffix)])
titles_1=paste(prefix,suffix,sep="_")
titles_2=titles_1[order(titles_1)]
x=1:69
y=x+20
z=x+50
df=data.frame(titles_2,x,y,z)
# load rgl library
library('rgl')
# load randomcoloR library
library(randomcoloR)
# create a custom palette
palette <- distinctColorPalette(23)
palette(palette)
# make 3D plot
plot3d(x,y,z,size = 10,col=suffix[order(suffix)])
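The two ideas above (colours and pch3d() symbols) can also be combined, so that only a handful of colours and a handful of symbols are needed to separate 23 groups. The sketch below builds such a mapping in base R. The particular colours and pch codes are arbitrary choices, and the rgl call at the end is an assumption about how pch3d() would be used per group (including its color argument); it only runs in an interactive session with rgl installed.

```r
# Toy data in the spirit of the question: 23 groups, 3 points each
groups <- factor(rep(paste0("ID_", 1:23), each = 3))
x <- 1:69
y <- x + 20
z <- x + 50

# 6 colours x 4 symbols = 24 distinct (colour, symbol) pairs; keep the first 23
cols  <- c("black", "red", "blue", "darkgreen", "orange", "purple")
pchs  <- c(15, 16, 17, 18)
combo <- expand.grid(col = cols, pch = pchs,
                     stringsAsFactors = FALSE)[seq_len(nlevels(groups)), ]

g <- as.integer(groups)        # group index of every point
point_col <- combo$col[g]      # per-point colour
point_pch <- combo$pch[g]      # per-point symbol code

# Assumed usage: draw one group at a time with its own symbol and colour
if (interactive() && requireNamespace("rgl", quietly = TRUE)) {
  rgl::open3d()
  for (k in seq_len(nlevels(groups))) {
    sel <- g == k
    rgl::pch3d(x[sel], y[sel], z[sel],
               pch = combo$pch[k], color = combo$col[k], bg = combo$col[k])
  }
  rgl::axes3d()
}
```

Six colours are far easier to tell apart than 23, and the symbol resolves the remaining ambiguity.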
With Mathematica I made a plot.
With R this plot can be made to look more elegant, I guess.
How can I make such a plot in R?
It concerns the functions M^2_\pm, defined by
M^2_\pm = \frac{y \pm \sqrt{14x + 6xy + y^2}}{2x}
The following is shown on the plot:
The curve M^2_+ = M^2_-
The curve M^2_+ = 0
The curve M^2_- = 0
The shaded region where both M^2_+ and M^2_- > 0
Some points with text
In the new plot
The axes should be on the outside of the plot as is usual in R
I would welcome a more elegant alternative for the text and the arrows in the pictures
P.S. With the help pages of R I tried to make such a plot, but I didn't get beyond the basic use of plot and curve.
Update: Maybe contour can do the job.
You could do something like this:
f <- function(x,y){x*y}
x <- seq(0.2,2,length=1000)
objective <- 0.5
y <- numeric(length(x))
for(i in seq_along(x)){
y[i] <- optimize(function(y){abs(f(x[i],y)-objective)},interval=c(0,4))$minimum
}
plot(x,y,type="l")
This plot shows the curve where x*y = 0.5 for x between 0.2 and 2. This isn't for your particular function, but I hope it's a useful start. Note that this is very hacky, since optimize is slow and for loops should generally be avoided in R whenever possible.
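Following the contour idea from the question's update, here is a sketch that evaluates the functions on a grid with outer() and lets contour() trace the zero level set. The plotting ranges for x and y are assumptions, since the original figure is not available, and the helper names disc and M2 are illustrative. Only two of the curves are drawn; the other branch can be passed in the same way with s = -1.

```r
# Expression under the square root; it is zero where M^2_+ equals M^2_-
disc <- function(x, y) 14 * x + 6 * x * y + y^2

# M^2_+ (s = 1) or M^2_- (s = -1); NaN where the square root is not real
M2 <- function(x, y, s) (y + s * suppressWarnings(sqrt(disc(x, y)))) / (2 * x)

# Assumed plotting window (x > 0 avoids the division by zero at x = 0)
x <- seq(0.05, 4, length.out = 300)
y <- seq(-15, 5, length.out = 300)

# Curve M^2_+ = M^2_-: zero level set of the discriminant
contour(x, y, outer(x, y, disc), levels = 0, drawlabels = FALSE,
        xlab = "x", ylab = "y")

# Curve M^2_+ = 0, added to the same plot
contour(x, y, outer(x, y, M2, s = 1), levels = 0, drawlabels = FALSE,
        col = "red", add = TRUE)
```

contour() accepts missing values in the grid, so the region where the square root is not real is simply left blank, and the axes are drawn on the outside as usual in base R.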
I have a couple of empirical cumulative distribution functions which I would like to plot on top of each other in order to illustrate the differences between the two curves. As was pointed out in a previous question, the function to draw the ECDF is simply plot(Ecdf()). After reading the fine manual page, I determined that I can plot multiple ECDFs on top of each other using something like the following:
require( Hmisc )
set.seed(3)
g <- c(rep(1, 20), rep(2, 20))
Ecdf(c( rnorm(20), rnorm(20)), group=g)
However, my curves sometimes overlap a bit and it can be hard to tell which is which, just like in the example above, which produces this graph:
I would really like to make the color of these two CDFs different. I can't figure out how to do that, however. Any tips?
If memory serves, I have done this in the past. As I recall, you needed to trick it, as Ecdf() is so darn parameterised. I think help(ecdf) hints that it is just a plot of step functions, so you could estimate two or more ecdfs, plot one and then annotate via lines().
Edit Turns out it is as easy as
R> Ecdf(c(rnorm(20), rnorm(20)), group=g, col=c('blue', 'orange'))
as the help page clearly documents the col= argument. But I have also found some scriptlets where I used plot.stepfun() explicitly.
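The step-function route described at the start of this answer can be sketched with base R's ecdf(), without Hmisc. The variable names and colours below are illustrative.

```r
set.seed(3)
a <- rnorm(20)
b <- rnorm(20)

# ecdf() returns a step function object for each sample
Fa <- ecdf(a)
Fb <- ecdf(b)

# Draw the first ECDF, then add the second to the same axes in another colour
plot(Fa, col = "blue", xlim = range(a, b), main = "",
     verticals = TRUE, do.points = FALSE)
plot(Fb, col = "orange", add = TRUE,
     verticals = TRUE, do.points = FALSE)
legend("topleft", legend = c("group 1", "group 2"),
       col = c("blue", "orange"), lty = 1)
```

Setting xlim from both samples keeps the second curve from being clipped by the range of the first.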
You can add each curve one at a time (each with its own style), e.g.
Ecdf(rnorm(20), lwd = 2)
Ecdf(rnorm(20),add = TRUE, col = 'red', lty = 1)
Without using Ecdf (doesn't look like Hmisc is available):
set.seed(3)
mat <- cbind(rnorm(20), rnorm(20))
matplot(apply(mat, 2, sort), seq(20)/20, type='s')