How can I make a graph like this in R? - r

I need to make a probability plot for a geochemical analysis (cumulative probability vs. element concentration on a log scale), like the one in the picture.
I have tried ppPlot from the 'qualityTools' package with the 'log-normal' argument, but for some elements it does not work. It says I need positive values for a log-normal distribution, but they are all positive; I've checked. I think the command uses the 'density' function in base R, and that its density estimate inadvertently produces negative concentration values.
The 'qqnorm' command in base R produces a different kind of plot.
How can I get around this?
Edit: Here's part of my magnesium data (I need to generate a similar graph with them):
mg <- c(51.400, 149.000, 276.000, 135.000, 179.000, 81.000, 116.000, 8.150, 7.770, 7.870, 8.840, 15.600, 13.400,
57.400, 7.440, 14.800, 40.800, 15.100, 21.400, 5.550, 3.390, 18.800, 20.100, 19.600, 11.600, 11.700,
12.200, 12.500, 11.700, 12.100, 13.000, 12.300, 13.300, 13.200, 12.600, 29.700, 25.400, 21.000, 11.100,
11.500, 11.000, 32.600, 17.500, 16.500, 18.100, 27.200, 21.200, 26.400, 18.800, 19.900, 32.000, 28.600,
29.400, 30.700, 2.370, 2.070, 1.850, 1.970, 24.900, 19.100, 17.400, 23.100, 50.100, 48.800, 18.000,
15.800, 27.100, 43.500, 4.820, 13.400, 14.600, 24.100, 22.700, 22.500, 43.500, 41.300, 43.700, 41.100,
40.800, 63.700, 7.700, 8.360, 60.000, 58.400, 63.100, 65.100, 219.000, 25.800, 4.940, 3.670, 13.800,
5.190, 14.700, 15.000, 13.100, 12.300, 10.700, 10.700, 11.100, 10.100, 10.600, 63.200, 19.800, 22.200,
17.600, 11.500, 10.600, 9.380, 3.190, 9.180, 10.800, 189.000, 190.000, 152.000, 119.000, 194.000, 56.100)

This is my solution, but I had to create a whole new data frame (N) with random numbers:
library(ggplot2)
# N holds the columns Prob (cumulative probability, %) and AS (As concentration)
ggplot() +
  geom_point(data = N, aes(x = Prob, y = AS)) +
  scale_x_log10(breaks = c(0.1, 1, 5, 20, 50, 80, 95, 99, 99.9)) +
  scale_y_log10(expand = c(0, 0),
                breaks = c(seq(0.01, 0.1, 0.01), seq(0.2, 1, 0.1), seq(2, 10, 1))) +
  ylab(paste0("As (", "\U00B5", "g/l)")) + xlab("Cumulative probability (%)") +
  theme_bw() + annotation_logticks()
For the reference lines, you can add segments and text in another layer, for example.
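A base R alternative, offered only as a sketch: a probability plot can be built by plotting the sorted concentrations against normal quantiles of their plotting positions, then relabelling the quantile axis in percent. The subset of `mg` values, the choice of `ppoints()` for plotting positions, and the axis orientation are assumptions made for illustration, not something taken from the reference picture.

```r
# Illustrative subset of the magnesium values from the question
mg <- c(51.4, 149, 276, 135, 179, 81, 116, 8.15, 7.77, 7.87, 8.84, 15.6, 13.4)

conc <- sort(mg)               # concentrations in increasing order
p <- ppoints(length(conc))     # plotting positions strictly inside (0, 1)
z <- qnorm(p)                  # normal quantiles give the probability-scaled axis

# Probabilities (in %) to show as tick labels on the quantile axis
probs <- c(1, 5, 20, 50, 80, 95, 99) / 100

plot(z, conc, log = "y", xaxt = "n",
     xlab = "Cumulative probability (%)", ylab = "Mg concentration")
axis(1, at = qnorm(probs), labels = probs * 100)
```

Because the concentration axis is logarithmic, log-normally distributed data should fall roughly on a straight line in this plot.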

Related

Empirical Cumulative Density Function - R software

I have a problem plotting an ECDF. I am trying to plot its complement, 1 - (the function),
because I want the curve to decrease along the x axis, like in my reference graph.
load("91-20.RData")
ts <- data.frame(dat91,dat92,dat93,dat94,dat95,dat96,dat97,
dat98,dat99,dat00,dat11,dat12,dat12,dat13,
dat14,dat15,dat16,dat17,dat18,dat19,dat20)
ts
tsclean <- na.omit(ts)
#--------------------------------------------------------
library(ggplot2)
# refer to the column by name inside aes() rather than via tsclean$
ggplot(tsclean, aes(dat91)) +
  stat_ecdf(geom = "step")
This is the graph I have, but I want to reproduce the one in the reference.
I think the graph you're looking for is called an "exceedance" graph. A web search finds some resources; try a web search for "R exceedance graph".
EDIT: This is more suitable as a comment than an answer, but my web browser is being unhelpful at the moment; sorry for the distraction.
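As a rough sketch of what an exceedance curve looks like in base R: it is 1 minus the ECDF, evaluated at the sorted data. The vector `x` below is simulated and only stands in for a column such as `dat91`; it is not the asker's data.

```r
set.seed(1)
x <- rexp(200)            # hypothetical data standing in for dat91

Fn <- ecdf(x)             # empirical CDF as a function
xs <- sort(x)
exceed <- 1 - Fn(xs)      # empirical probability of exceeding each value

plot(xs, exceed, type = "s",
     xlab = "Value", ylab = "Exceedance probability")
```

The curve starts near 1 at the smallest observation and decreases to 0 at the largest.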

Label outliers using mvOutlier from MVN in R

I'm trying to label outliers on a Chi-square Q-Q plot using mvOutlier() function of the MVN package in R.
I have managed to identify the outliers by their labels and get their x-coordinates. I tried placing the former on the plot using text(), but the x- and y-coordinates seem to be flipped.
Building on an example from the documentation:
library(MVN)
data(iris)
versicolor <- iris[51:100, 1:3]
# Mahalanobis distance
result <- mvOutlier(versicolor, qqplot = TRUE, method = "quan")
labelsO<-rownames(result$outlier)[result$outlier[,2]==TRUE]
xcoord<-result$outlier[result$outlier[,2]==TRUE,1]
text(xcoord,label=labelsO)
This produces the following:
I also tried text(x = xcoord, y = xcoord,label = labelsO), which is fine when the points are near the y = x line, but might fail when normality is not satisfied (and the points deviate from this line).
Can someone suggest how to access the Chi-square quantiles, or explain why the x-coordinate of the text() function doesn't seem to obey the input parameters?
Looking inside the mvOutlier function, it looks like it doesn't save the chi-squared values. Right now your text code is treating xcoord as a y-value, and assumes that the actual x value is 1:2. Thankfully the chi-squared value is a fairly simple calculation, as it is rank-based in this case.
result <- mvOutlier(versicolor, qqplot = TRUE, method = "quan")
labelsO<-rownames(result$outlier)[result$outlier[,2]==TRUE]
xcoord<-result$outlier[result$outlier[,2]==TRUE,1]
# recalculate the chi-squared quantiles for ranks 50 and 49, i.e.
# p = ((size:(size - n.outliers + 1)) - 0.5) / size, with df = n.variables = 3
chis <- qchisq(((50:49) - 0.5) / 50, df = 3)
text(xcoord, chis, label = labelsO)
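The same rank-based idea can be sketched without MVN, which shows where both coordinates come from. Note the assumption: this uses classical Mahalanobis distances from `stats::mahalanobis()`, whereas mvOutlier's "quan" method works with robust distances, so the points flagged here need not match the package output.

```r
versicolor <- iris[51:100, 1:3]
n <- nrow(versicolor)
p <- ncol(versicolor)

# Classical squared Mahalanobis distances (assumption: not the robust ones MVN uses)
d2 <- mahalanobis(versicolor, colMeans(versicolor), cov(versicolor))

# Chi-square quantile for every point, derived from the rank of its distance
q <- qchisq((rank(d2) - 0.5) / n, df = p)

plot(d2, q, xlab = "Squared Mahalanobis distance", ylab = "Chi-square quantile")

# Label the two most distant points, passing both x and y explicitly
top <- order(d2, decreasing = TRUE)[1:2]
text(d2[top], q[top], labels = rownames(versicolor)[top], pos = 2)
```

Passing both coordinates to text() avoids the fallback where a single vector is treated as y-values plotted against their index.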
As mentioned in the previous reply, the MVN package does not support labeling outliers. Although it is not strictly necessary, since it can be done manually, we might still consider adding a "labeling outliers" option to the mvOutlier(...) function. Thanks for your interest; we might include it in a future update of the package.
The web-based version of the MVN package can now label outliers (Advanced options under the Outlier detection tab). You can access this web tool at http://www.biosoft.hacettepe.edu.tr/MVN/

R, graph of binomial distribution

I have to write my own function to draw the density function of the binomial distribution and then draw
the appropriate graphs for n = 20 and p = 0.1, 0.2, ..., 0.9. I also need to comment on the graphs.
I tried this:
graph <- function(n,p){
x <- dbinom(0:n,size=n,prob=p)
return(barplot(x,names.arg=0:n))
}
graph(20,0.1)
graph(20,0.2)
graph(20,0.3)
graph(20,0.4)
graph(20,0.5)
graph(20,0.6)
graph(20,0.7)
graph(20,0.8)
graph(20,0.9)
#OR
graph(20,scan())
My first question: is there any way to avoid writing the line graph(20,p) several times, other than using scan()?
My second question:
I want to see the graphs on one device, or to hit ENTER to see the next graph. I wrote
par(mfcol=c(2,5))
graph(20,0.1)
graph(20,0.2)
graph(20,0.3)
graph(20,0.4)
graph(20,0.5)
graph(20,0.6)
graph(20,0.7)
graph(20,0.8)
graph(20,0.9)
but the graphs are too tiny. How can I present the graphs nicely, with a headline giving n=20 and p=the value I used to draw the graph? [It can be done by calling mtext() after calling the function graph, but then I have to write a similar line several times, so I want to do this inside the function graph.]
My last question:
About the comments. The graphs show that as the probability of success, p, increases, the graph tends to the right, that is, the graph is right skewed.
Is there any way to comment on the graphs using a program?
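This is a job for mapply, since you loop over two variables.
graph <- function(n, p) {
  x <- dbinom(0:n, size = n, prob = p)
  # put n and p in the title so each panel is labeled
  barplot(x, names.arg = 0:n,
          main = sprintf("Binomial: n = %d, p = %.1f", n, p))
}
par(mfcol = c(2, 5))
# p runs over 0.1, ..., 0.9 as in the question; invisible() hides the returned list
invisible(mapply(graph, 20, seq(0.1, 0.9, 0.1)))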
Plotting with base graphics is one of the times you often want to use a for loop. The reason is that most of the plotting functions return an object invisibly, but you're not interested in these; all you want is the side effect of plotting. A loop ignores the returned objects, whereas the *apply family will waste effort collecting and returning them.
par(mfrow = c(2, 5))
for (p in seq(0.1, 0.9, by = 0.1))
{
  x <- dbinom(0:20, size = 20, prob = p)
  barplot(x, names.arg = 0:20, space = 0,
          main = sprintf("n = 20, p = %.1f", p))
}
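On commenting programmatically: one possible approach, sketched here rather than offered as a complete answer, is to compute the skewness of each distribution and turn its sign into a label. The formula (1 - 2p) / sqrt(n p (1 - p)) is the standard skewness of a binomial distribution; the label wording and the tolerance are illustrative choices.

```r
n <- 20
p <- seq(0.1, 0.9, by = 0.1)

# Skewness of Binomial(n, p): positive means a longer right tail
skew <- (1 - 2 * p) / sqrt(n * p * (1 - p))

# Turn the sign into a short comment; the tolerance guards against rounding error
shape <- ifelse(abs(skew) < 1e-8, "symmetric",
                ifelse(skew > 0, "right-skewed", "left-skewed"))

data.frame(p = p, skewness = round(skew, 3), shape = shape)
```

The sign flips at p = 0.5: the distribution has its longer tail on the right for p < 0.5 and on the left for p > 0.5, even though the bulk of the mass moves to the right as p grows.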

need cube plot for 2 factors factorial design in R

Is there any R package that can produce cube plots for 2 factors? I want something similar to the first plot at the end of this page:
http://www.processma.com/resource/factorial_plots.htm
It is possible to obtain such plots in Minitab.
The FrF2 package has the command cubePlot, but only for 3 factors.
Of course I could use 2 identical factors, but I want images with nice squares (instead of cubes).
You can use cubePlot from the FrF2 package. It produces a cube plot for the combined effect of three factors. Here is an example:
library(FrF2)  # provides cubePlot
library(BsMD)  # provides the BM93.e3.data data set
data(BM93.e3.data) #from BsMD
iMdat <- BM93.e3.data[1:16,2:10] #only original experiment
colnames(iMdat) <- c("MoldTemp","Moisture","HoldPress","CavityThick","BoostPress",
                     "CycleTime","GateSize","ScrewSpeed", "y")
iM.lm <- lm(y ~ (.)^2, data = iMdat)
cubePlot(iM.lm, "MoldTemp", "HoldPress", "BoostPress")
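For a two-factor "square" version, one option is to draw it by hand with base graphics. This is only a sketch: the data frame `d`, its factor names A and B, and the response values are made up for illustration and are not from any package.

```r
# Hypothetical 2^2 factorial data: factors coded -1/+1, two replicates per cell
d <- expand.grid(A = c(-1, 1), B = c(-1, 1), rep = 1:2)
d$y <- c(20, 30, 25, 45, 22, 32, 27, 43)

# Mean response at each corner of the design
m <- aggregate(y ~ A + B, data = d, FUN = mean)

# Draw the square with the corner means as labels
plot(m$A, m$B, xlim = c(-1.5, 1.5), ylim = c(-1.5, 1.5), pch = 16,
     xaxt = "n", yaxt = "n", xlab = "A", ylab = "B", asp = 1)
axis(1, at = c(-1, 1))
axis(2, at = c(-1, 1))
rect(-1, -1, 1, 1)
text(m$A, m$B, labels = m$y, pos = 3)
```

Setting asp = 1 keeps the design region square regardless of the device size.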

1-D conditional slice from a 2-D probability density function in R using np package

Consider the example included in the np package for R,
on page 21 of the vignette for the np package.
npcdens returns a conditional density object and can plot the 2-D PDF and 2-D CDF, as shown. I wanted to know whether I can somehow extract the 1-D information (PDF / CDF) from the object if I specify one of the two parameters, for example as a vector. I am new to R and was not able to find out the format of the object.
Thanks for the help.
-Egon.
Here is the code as requested:
require(np)
data("Italy")
attach(Italy)
bw <- npcdensbw(formula=gdp~ordered(year), tol=.1, ftol=.1)
fhat <- npcdens(bws=bw)
summary(fhat)
npplot(bws=bw)
npplot(bws=bw, cdf=TRUE)
detach(Italy)
The fhat object contains all the needed info plus a whole lot more. To see what all is in there, do a str( fhat ) to see the structure.
I believe the values you are interested in are xeval, yeval, and condens (PDF density).
There are lots of ways to get at the values but I tend to like data frames. I'd pop the three vectors in a single data frame:
denDf <- cbind( year=as.character( fhat$xeval[,1] ), fhat$yeval, fhat$condens )
## had to do a dance around the year variable because it's a factor
Then I'd select the values I want with subset():
subset( denDf, year==1951 & gdp > 8 & gdp < 8.2)
Since gdp is a floating point value, it's very hard to select with the == operator, so a range is used instead.
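To illustrate why exact equality is fragile for floating point values, here is a small sketch. The vector `gdp_vals` is made up for the demonstration and the tolerance 1e-8 is an arbitrary choice.

```r
# Hypothetical values; the first is the result of a floating point sum
gdp_vals <- c(0.1 + 0.2, 8.1, 8.15)

exact <- gdp_vals == 0.3               # exact comparison misses the first value
near  <- abs(gdp_vals - 0.3) < 1e-8    # comparison within a tolerance finds it

exact
near
```

Selecting on a range, as in the subset() call, or on a tolerance like this avoids depending on the exact binary representation.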
The method suggested by JD Long will only extract the density for data points in the existing training set. If you want the density at other points (conditioning or conditional variables), you will need to use the predict() function. The following code extracts and plots the 1-D density distribution conditioned on year == 1999, a value not contained in the original data set.
First construct a data frame with the same components as the Italy data set, with gdp regularly spaced and with "1999" an ordered factor.
yr1999<- rep("1999", 100)
gdpVals <-seq(1,35, length.out=100)
nD1999 <- data.frame(year = ordered(yr1999), gdp = gdpVals)
Next use the predict function to extract the densities.
gdpDens1999 <-predict(fhat,newdata = nD1999)
The following code plots the density.
plot(gdpVals, gdpDens1999, type='l', col='red', xlab='gdp', ylab = 'p(gdp|yr = 1999)')
