R: Show Frequency of Values and dnorm curve - r

I have imported a CSV file with a single column of values:
a <- read.csv("/home/file.csv")
Now I want a histogram showing the frequency of each value:
h <- hist(as.matrix(a))
text(h$mids,h$counts,labels=h$counts, adj=c(0.5,-0.5))
What exactly does the argument adj=c(0.5,-0.5) do?
x <- seq(8,13, 1)
-> those are the first and last values (8, 13)
curve(dnorm(x, mean=mean(a$weight), sd=sd(a$weight)), add=T)
Now I have the curve, but the frequencies of the values are no longer shown.
I also tried it with: h <- hist(as.matrix(a), prob=T)
I also can't set xlab and ylab: h <- hist(as.matrix(a), xlab="..", ylab="..") still uses the default labels.
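A minimal sketch of one way to get both the counts and the curve, assuming the column is called weight and is roughly normal (the simulated data below is a hypothetical stand-in for a$weight). dnorm returns densities, so on a frequency histogram the curve has to be multiplied by the number of observations times the bin width. In text(), adj=c(0.5,-0.5) centers each label horizontally and pushes it half a label height above the point, so the counts sit just over the bars.

```r
# Hypothetical stand-in for a$weight
set.seed(1)
weight <- rnorm(200, mean = 10, sd = 1)

# Compute the histogram first so the y-axis can leave room for the labels
h <- hist(weight, plot = FALSE)
plot(h, xlab = "Weight", ylab = "Frequency", main = "",
     ylim = c(0, 1.15 * max(h$counts)))

# Counts above each bar: centered in x, shifted up in y
text(h$mids, h$counts, labels = h$counts, adj = c(0.5, -0.5))

# Rescale the density to the count scale: n * bin width
bin_width <- diff(h$breaks)[1]
scale_factor <- length(weight) * bin_width
curve(scale_factor * dnorm(x, mean = mean(weight), sd = sd(weight)),
      add = TRUE, col = "red")
```

With prob=T the axis switches to densities instead, so the curve fits without rescaling but labels placed at h$counts fall outside the plot; h$density would be the matching height in that case.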

Related

How to create a monomial plot in R?

I want to create a function whose result is a plot of monomials (degree less than "n").
I wrote this simple code:
Monomial=function(m){
x=1:100
y=1:100
for(i in m) x2=x^m
plot(y,x2,type="l",col="red",xlab="Arguments",ylab="Values",
main=expression("Monomials"))
}
But, for example, Monomial(3) gives me only the plot of x^3. I also need x^1 and x^2. How do I label each line?
Here is what you need:
Monomial <- function(m){
  x <- 1:100
  # One color per degree; rainbow(m) returns the color vector directly
  cols <- rainbow(m)
  plot(x, x, type="l", col=cols[1], xlab="Arguments", ylab="Values",
       main=expression("Monomials"))
  # seq_len(m)[-1] is empty when m is 1, so the loop is skipped safely
  for (d in seq_len(m)[-1]){
    lines(x, x^d, col=cols[d])
  }
  legend(90, 60, legend=paste0("x^", 1:m),
         col=cols, lty=1, cex=0.6)
}
You need to generate one color per curve, which is what the cols variable holds. lines adds a new curve to the existing axes. Finally, legend adds a legend to the plot.

R Statistics Distributions Plotting

I am having some trouble with a statistics homework assignment.
I am required to graphically represent the density and the distribution function in two inline plots, for a set of parameters of my choice (at least 4), for the Student, Fisher and chi-squared distributions.
Let's take only the Student distribution as an example.
From what I have found on the internet, I have come up with this:
First, I need to generate some random values.
x <- rnorm( 20, 0, 1 )
Question 1: Do I need to generate 4 of these?
Then I have to plot these values with:
plot(dt( x, df = 1))
plot(pt( x, df = 1))
But how do I do this for four sets of parameters? They should be shown in the same plot.
Is my approach so far a good one?
Please tell me if I'm wrong.
To plot several densities of a given distribution, you first need a support vector, in this case x below.
Then compute the values of the densities with the parameters of your choice.
Then plot them.
In the code that follows, I plot 4 Student-t pdfs, with degrees of freedom 1 to 4.
x <- seq(-5, 5, by = 0.01) # The support vector
y <- sapply(1:4, function(d) dt(x, df = d))
# Open an empty plot first
plot(1, type = "n", xlim = c(-5, 5), ylim = c(0, 0.5))
for(i in 1:4){
lines(x, y[, i], col = i)
}
Then you can make the graph prettier, by adding a main title, changing the axis titles, etc.
If you want other distributions, such as the F or Chi-squared, you will use x strictly positive, for instance x <- seq(0.0001, 10, by = 0.01).
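As a sketch of how the same idea extends to a strictly positive support and to the distribution function, the following draws chi-squared densities and CDFs side by side. The degrees of freedom in dfs are an arbitrary choice made for illustration, and matplot is used here in place of the explicit loop.

```r
x <- seq(0.0001, 10, by = 0.01)   # strictly positive support
dfs <- c(1, 2, 4, 8)              # assumed parameter set, pick your own

# One column per parameter value
dens <- sapply(dfs, function(d) dchisq(x, df = d))
cdf  <- sapply(dfs, function(d) pchisq(x, df = d))

op <- par(mfrow = c(1, 2))        # two plots in one row
# The df = 1 density is unbounded near 0, so cap the y-axis
matplot(x, dens, type = "l", lty = 1, col = 1:4, ylim = c(0, 0.5),
        xlab = "x", ylab = "Density", main = "Chi-squared densities")
legend("topright", legend = paste("df =", dfs), col = 1:4, lty = 1)
matplot(x, cdf, type = "l", lty = 1, col = 1:4,
        xlab = "x", ylab = "Probability", main = "Chi-squared CDFs")
legend("bottomright", legend = paste("df =", dfs), col = 1:4, lty = 1)
par(op)                           # restore the previous layout
```

Replacing dchisq/pchisq with dt/pt or df/pf (and adjusting x and the parameters) gives the Student and Fisher versions.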

New outliers appear after I remove existing ones using QQ Plot Results

I'm working on the PCA section from Julian Faraway's Linear Models with R (chapter 11, page 164).
PCA is sensitive to outliers and the Mahalanobis distance helps us identify them.
The author checks for outliers by plotting the Mahalanobis distance against the quantiles of a chi-squared distribution.
if (!require(faraway)) install.packages("faraway"); library(faraway)
library(MASS) # provides cov.rob
data(fat, package='faraway')
cfat <- fat[,9:18]
robfat <- cov.rob(cfat)
md <- mahalanobis(cfat, center=robfat$center, cov=robfat$cov)
n <- nrow(cfat); p <- ncol(cfat)
plot(qchisq(1:n/(n+1),p), sort(md),
xlab=expression(paste(chi^2, " quantiles")),
ylab = "Sorted Mahalanobis distances")
abline(0,1)
I identify the points:
identify(qchisq(1:n/(n+1),p), sort(md))
It appears that the outliers are in rows 242:252. I remove these outliers and re-create the QQ Plot:
cfat.mod <- cfat[-c(242:252),] #remove outliers
robfat <- cov.rob(cfat.mod)
md <- mahalanobis(cfat.mod, center=robfat$center, cov=robfat$cov)
n <- nrow(cfat.mod); p <- ncol(cfat.mod)
plot(qchisq(1:n/(n+1),p), sort(md),
xlab=expression(paste(chi^2, " quantiles")),
ylab = "Sorted Mahalanobis distances")
abline(0,1)
identify(qchisq(1:n/(n+1),p), sort(md))
Alas, a new set of points (rows 234:241) now appears as outliers. This keeps happening every time I remove additional outliers.
I look forward to understanding what I'm doing wrong.
To identify the points correctly, make sure the labels correspond to the positions of the points in the data. The functions order or sort with index.return=TRUE will give the sorted indices. Here is an example, arbitrarily removing the points with md greater than a threshold.
## Your data
data(fat, package='faraway')
cfat <- fat[, 9:18]
n <- nrow(cfat)
p <- ncol(cfat)
md <- sort(mahalanobis(cfat, colMeans(cfat), cov(cfat)), index.return=TRUE)
xs <- qchisq(1:n/(n+1), p)
plot(xs, md$x, xlab=expression(paste(chi^2, 'quantiles')))
## Use indices in data as labels for interactive identify
identify(xs, md$x, labels=md$ix)
## remove those with md>25, for example
inds <- md$x > 25
cfat.mod <- cfat[-md$ix[inds], ]
nn <- nrow(cfat.mod)
md1 <- mahalanobis(cfat.mod, colMeans(cfat.mod), cov(cfat.mod))
## Plot the new data
par(mfrow=c(1, 2))
plot(qchisq(1:nn/(nn+1), p), sort(md1), xlab='chisq quantiles', ylab='')
abline(0, 1, col='red')
car::qqPlot(md1, distribution='chisq', df=p, line='robust', main='With car::qqPlot')
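The index mapping used above is easier to see on a toy vector. The values in md_toy are made up for illustration; the point is that positions in the sorted vector are not row numbers, and $ix (or order) translates between the two.

```r
# Hypothetical distances for four rows, in row order
md_toy <- c(2.1, 9.7, 0.4, 5.0)

s <- sort(md_toy, index.return = TRUE)
s$x    # sorted distances: 0.4 2.1 5.0 9.7
s$ix   # the row each sorted value came from: 3 1 4 2

# The largest value sits at position 4 of the sorted vector,
# but it belongs to row 2 of the data
rows_to_drop <- s$ix[s$x > 6]
kept <- md_toy[-rows_to_drop]
```

Removing position 4 instead of row 2 would delete the wrong observation, which is why points identified on the sorted plot need labels=md$ix.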

how do i fit unique curves on each unique plot in a for loop

I have written the code below for my data frame kleaf.df. It draws one plot of the variable press_mV for each unique ID and combines the plots in a single figure.
I need some help fitting curves to my plots. When I run this code I get the same fitted curve (the one fitted for the first plot) on ALL the plots, whereas I want each plot to show its own fitted curve.
Thanks in advance for any help.
f <- function(t,a,b) {a * exp(b * t)}
par(mfrow = c(5, 8), mar = c(1,1,1,1), srt = 0, oma = c(1,6,5,1))
for (i in unique(kleaf.df$ID))
{
d <- subset(kleaf.df, kleaf.df$ID == i)
plot(c(1:length(d$press_mV)),d$press_mV)
#----tp:turning point. the last maximum value before the values start to decrease
tp <- tail(which( d$press_mV == max(d$press_mV) ),1)
#----set the end points(A,B) to fit the curve to
A <- tp+5
B <- A+20
#----t = time, p = press_mV
# n.b.: shift by 5 to accommodate the time before attachment
t <- A:B+5
p <- d$press_mV[A:B]
fit <- nls(p ~ f(t,a,b), start = c(a=d$press_mV[A], b=-0.01))
#----draw a curve on plot using the above coefficients
curve(f(x, a=co[1], b=co[2]), add = TRUE, col="green", lwd=2)
}
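Judging only from the code shown, one likely cause is that co is never assigned inside the loop, so curve() keeps using whatever co already holds in the workspace. A minimal sketch of the fix follows, with the coefficients extracted from each fit via coef(). The simulated kleaf.df is a hypothetical stand-in for the real data, and the turning-point logic is left out to keep the example short.

```r
set.seed(42)
f <- function(t, a, b) a * exp(b * t)

# Hypothetical stand-in for kleaf.df: three IDs with different decay rates
kleaf.df <- do.call(rbind, lapply(1:3, function(id) {
  t <- 1:40
  data.frame(ID = id,
             press_mV = f(t, a = 100, b = -0.01 * (id + 1)) + rnorm(40, sd = 0.5))
}))

coefs <- list()
op <- par(mfrow = c(1, 3))
for (i in unique(kleaf.df$ID)) {
  d <- subset(kleaf.df, ID == i)
  t <- seq_along(d$press_mV)
  p <- d$press_mV
  plot(t, p)
  fit <- nls(p ~ f(t, a, b), start = c(a = p[1], b = -0.01))
  co <- coef(fit)   # refreshed on every iteration
  curve(f(x, a = co[["a"]], b = co[["b"]]), add = TRUE, col = "green", lwd = 2)
  coefs[[as.character(i)]] <- co   # keep the estimates for later inspection
}
par(op)
```

Note also that curve() evaluates f on the plot's own x-axis, so the t used for fitting should be on the same scale as the x values passed to plot().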

quantile plot, two data - issues with fitting the line in R

I am trying to plot the p-values from two different data frames and compare them to the normal distribution in a QQ plot in R.
Here is the code that I am using:
## Taking values from 1st dataframe to plot
Rlogp = -log10(trialR$PVAL)
Rindex <- seq(1, nrow(trialR))
Runi <- Rindex/nrow(trialR)
Rloguni <- -log10(Runi)
## Taking values from 2nd dataframe to plot on existing plot
Nlogp = -log10(trialN$PVAL)
Nlogp = sort(Nlogp)
Nindex <- seq(1, nrow(trialN))
Nuni <- Nindex/nrow(trialN)
Nloguni <- -log10(Nuni)
Nloguni <- sort(Nloguni)
qqplot(Rloguni, Rlogp, xlim=range(0,6), ylim=range(0,6), col=rgb(100,0,0,50,maxColorValue=255), pch=19, lwd=2, bty="l",xlab ="", ylab ="")
qqline(Rloguni, Rlogp,distribution=qnorm, lty="dashed")
par(new=TRUE, cex.main=4.8, col.axis="white")
plot(Nloguni, Nlogp, xlim=range(0,6), ylim=range(0,6), col=rgb(0,0,100,50,maxColorValue=255), pch=19, lwd=2, bty="l",xlab ="", ylab ="")
The code plots the graph, but I am not sure about the qqline, as it seems a bit offset. Can someone tell me whether I am doing this correctly or whether something needs to change?
The TARGET plot should look something like this, but without the third data value.
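qqline() expects a single sample as its first argument and draws a line through its quartiles against normal quantiles, so it is not a natural reference here. Under the null hypothesis p-values are uniform on (0, 1), so a common alternative is to plot observed against expected -log10(p) and use the line y = x as the reference. The sketch below assumes that setup; the simulated trialR and trialN are hypothetical stand-ins, and qq_points is a helper invented for this example.

```r
set.seed(7)
# Hypothetical stand-ins for the two data frames
trialR <- data.frame(PVAL = runif(500))
trialN <- data.frame(PVAL = runif(800))

# Pair each sorted p-value with its expected uniform quantile
qq_points <- function(pvals) {
  n <- length(pvals)
  data.frame(expected = -log10(ppoints(n)),
             observed = -log10(sort(pvals)))
}

R <- qq_points(trialR$PVAL)
N <- qq_points(trialN$PVAL)
lim <- c(0, max(R$expected, R$observed, N$expected, N$observed))

plot(R$expected, R$observed, xlim = lim, ylim = lim, pch = 19, bty = "l",
     col = rgb(100, 0, 0, 50, maxColorValue = 255),
     xlab = "Expected -log10(p)", ylab = "Observed -log10(p)")
points(N$expected, N$observed, pch = 19,
       col = rgb(0, 0, 100, 50, maxColorValue = 255))
abline(0, 1, lty = "dashed")   # reference line under the uniform null
```

Using points() for the second data set keeps both series on the same axes without par(new=TRUE).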
