quantile plot, two data - issues with fitting the line in R - r

I am trying to plot the p-values from two different data frames on one Q-Q plot in R and compare them to the normal distribution.
Here is the code that I am using:
## Taking values from 1st dataframe to plot
Rlogp = -log10(trialR$PVAL)
Rindex <- seq(1, nrow(trialR))
Runi <- Rindex/nrow(trialR)
Rloguni <- -log10(Runi)
## Taking values from 2nd dataframe to plot on existing plot
Nlogp = -log10(trialN$PVAL)
Nlogp = sort(Nlogp)
Nindex <- seq(1, nrow(trialN))
Nuni <- Nindex/nrow(trialN)
Nloguni <- -log10(Nuni)
Nloguni <- sort(Nloguni)
qqplot(Rloguni, Rlogp, xlim=range(0,6), ylim=range(0,6), col=rgb(100,0,0,50,maxColorValue=255), pch=19, lwd=2, bty="l",xlab ="", ylab ="")
qqline(Rloguni, Rlogp,distribution=qnorm, lty="dashed")
par(new=TRUE, cex.main=4.8, col.axis="white")
plot(Nloguni, Nlogp, xlim=range(0,6), ylim=range(0,6), col=rgb(0,0,100,50,maxColorValue=255), pch=19, lwd=2, bty="l",xlab ="", ylab ="")
The code plots the graph, but I am not sure about the qqline, as it seems a bit offset. Can someone tell me whether I am doing this the correct way, or whether something needs to change?
The target plot will look something like this, without the third data series:
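One likely reason for the offset: qqline() takes a single sample and draws a line through its first and third quartiles against normal quantiles, so it is not the reference line for -log10 p-values plotted against uniform expectations. A minimal sketch, assuming simulated p-values stand in for trialR$PVAL and trialN$PVAL (the runif() data and the helper qq_points() are placeholders, not the poster's data or code), where the null reference is simply the line y = x:

```r
set.seed(42)
# Placeholder data: simulated p-values standing in for trialR$PVAL and trialN$PVAL
trialR <- data.frame(PVAL = runif(1000))
trialN <- data.frame(PVAL = runif(800))

# Hypothetical helper: pair sorted observed -log10(p) with the
# expected -log10 quantiles under the uniform null
qq_points <- function(p) {
  n <- length(p)
  data.frame(expected = -log10(seq_len(n) / n),          # decreasing by construction
             observed = sort(-log10(p), decreasing = TRUE))
}

R <- qq_points(trialR$PVAL)
N <- qq_points(trialN$PVAL)

plot(R$expected, R$observed, xlim = c(0, 6), ylim = c(0, 6),
     col = rgb(100, 0, 0, 50, maxColorValue = 255), pch = 19, bty = "l",
     xlab = "Expected -log10(p)", ylab = "Observed -log10(p)")
# points() adds the second data set to the same axes, so par(new = TRUE) is not needed
points(N$expected, N$observed,
       col = rgb(0, 0, 100, 50, maxColorValue = 255), pch = 19)
abline(0, 1, lty = "dashed")   # reference line under the null: observed = expected
```

Both x and y are sorted in the same direction here, which the original code does only for the second data frame.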

Related

R: Plot lines are very thick

When using matplot to plot a matrix using:
matplot(t, X[,1:4], col=1:4, lty = 1, xlab="Time", ylab="Stock Value")
my graph comes out as:
How do I reduce the line thickness? I previously used a different method and my graph was fine:
I have tried manipulating lwd, but to no avail.
Even tried plot(t, X[1:4097,1]), yet the line being printed is very thick. Something wrong with my R?
EDIT: Here is the code I used to produce the matrix X:
####Inputs mean return, volatility, time period and time step
mu=0.25; sigma=2; T=1; n=2^(12); X0=5;
#############Generating trajectories for stocks
##NOTE: Seed is fixed. Changing seed will produce
##different trajectories
dt=T/n
t=seq(0,T,by=dt)
set.seed(201)
X <- matrix(nrow = n+1, ncol = 4)
for(i in 1:4){
X[,i] <- c(X0,mu*dt+sigma*sqrt(dt)*rnorm(n,mean=0,sd=1))
X[,i] <- cumsum(X[,i])
}
colnames(X) <- paste0("Stock", seq_len(ncol(X)))
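Just needed to add type = "l" to matplot(....). By default matplot draws points (type = "p"), and 4097 overlapping point symbols look like one very thick line. Plots fine now.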
matplot(t, X[,1:4], col=1:4, type = "l", xlab="Time", ylab="Stock Value")

R: Show Frequency of Values and dnorm curve

I have imported a csv file with one column and its values.
a <- read.csv("/home/file.csv")
Now I want a histogram with frequencies of the values
h <- hist(as.matrix(a))
text(h$mids,h$counts,labels=h$counts, adj=c(0.5,-0.5))
What exactly does the argument adj=c(0.5,-0.5) do?
x <- seq(8,13, 1)
-> 8 and 13 are the first and last values
curve(dnorm(x, mean=mean(a$weight), sd=sd(a$weight)), add=T)
Now I have the curve, but the frequencies of the values are no longer shown.
I also tried it with: h <- hist(as.matrix(a), prob=T)
I also can't set ylab and xlab: h <- hist(as.matrix(a), xlab="..", ylab=".." ) still uses the default labels.
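On the adj question: adj = c(0.5, -0.5) sets the horizontal and vertical justification of each label, so 0.5 centres the text on the bar midpoint and -0.5 lifts it above the bar top. A minimal sketch of keeping the counts and the normal curve on one plot, assuming simulated weights in place of the CSV (the data, the axis labels and the helper variable scale_factor are placeholders); the density is multiplied by n times the bin width so that it matches the count axis:

```r
set.seed(1)
# Placeholder data standing in for read.csv("/home/file.csv")
a <- data.frame(weight = rnorm(200, mean = 10.5, sd = 0.8))

# Compute the histogram first, then plot it with headroom for the count labels
h <- hist(a$weight, plot = FALSE)
plot(h, ylim = c(0, 1.15 * max(h$counts)),
     xlab = "Weight", ylab = "Frequency", main = "Histogram of weight")
text(h$mids, h$counts, labels = h$counts, adj = c(0.5, -0.5))

# Scale the density to counts: n * bin width (assumes equal-width bins)
scale_factor <- length(a$weight) * diff(h$breaks)[1]
curve(scale_factor * dnorm(x, mean = mean(a$weight), sd = sd(a$weight)),
      add = TRUE)
```

With prob=T the y-axis switches to densities instead, which is why the unscaled curve fits there but the counts disappear.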

Visualize data using histogram in R

I am trying to visualize some data, and to do so I am using R's hist.
Below is my data:
jancoefabs <- as.numeric(as.vector(abs(Janmodelnorm$coef)))
jancoefabs
[1] 1.165610e+00 1.277929e-01 4.349831e-01 3.602961e-01 7.189458e+00
[6] 1.856908e-04 1.352052e-05 4.811291e-05 1.055744e-02 2.756525e-04
[11] 2.202706e-01 4.199914e-02 4.684091e-02 8.634340e-01 2.479175e-02
[16] 2.409628e-01 5.459076e-03 9.892580e-03 5.378456e-02
As the more cunning of you might have guessed, these are the absolute values of a model's coefficients.
What I need is a histogram with the following axes:
x will be the number (count or length) of coefficients which is 19 in total, along with their names.
y will show values of each column (as breaks?) having a ylim="" set, according to min and max of those values (or something similar).
Note that Janmodelnorm$coef simply produces the following
   (Intercept)           LON           LAT            ME           RAT
  1.165610e+00 -1.277929e-01 -4.349831e-01 -3.602961e-01 -7.189458e+00
            DS           DSA           DSI          DRNS          DREW
 -1.856908e-04  1.352052e-05  4.811291e-05 -1.055744e-02 -2.756525e-04
         ASPNS         ASPEW            SI           CUR     W_180_270
 -2.202706e-01 -4.199914e-02  4.684091e-02 -8.634340e-01 -2.479175e-02
       W_0_360      W_90_180       W_0_180          NDVI
  2.409628e-01  5.459076e-03 -9.892580e-03 -5.378456e-02
So far, after consulting ?hist, I have been playing with the code below without success, so I am starting again from scratch.
# hist(jancoefabs, col="lightblue", border="pink",
# breaks=8,
# xlim=c(0,10), ylim=c(20,-20), plot=TRUE)
When plot=FALSE is set, I get a bunch of somewhat useful info about the set. I also find it hard to use the breaks argument effectively.
Any suggestion will be appreciated. Thanks.
Rather than using hist, why not use a barplot or a standard plot? For example,
## Generate some data
set.seed(1)
y = rnorm(19, sd=5)
names(y) = c("Inter", LETTERS[1:18])
Then plot the coefficients
barplot(y)
Alternatively, you could use a scatter plot
plot(1:19, y, axes=FALSE, ylim=c(-10, 10))
axis(2)
axis(1, 1:19, names(y))
and add error bars to indicate the standard errors (see for example Add error bars to show standard deviation on a plot in R)
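Applied to the question's own numbers, the barplot suggestion might look like the sketch below. The values are copied from the Janmodelnorm$coef printout in the question; the margin, colour and label settings are illustrative choices, not part of the original answer.

```r
# Coefficient values copied from the Janmodelnorm$coef printout in the question
jancoef <- c("(Intercept)" = 1.165610e+00, LON = -1.277929e-01,
             LAT = -4.349831e-01, ME = -3.602961e-01, RAT = -7.189458e+00,
             DS = -1.856908e-04, DSA = 1.352052e-05, DSI = 4.811291e-05,
             DRNS = -1.055744e-02, DREW = -2.756525e-04,
             ASPNS = -2.202706e-01, ASPEW = -4.199914e-02,
             SI = 4.684091e-02, CUR = -8.634340e-01,
             W_180_270 = -2.479175e-02, W_0_360 = 2.409628e-01,
             W_90_180 = 5.459076e-03, W_0_180 = -9.892580e-03,
             NDVI = -5.378456e-02)
jancoefabs <- abs(jancoef)   # abs() keeps the coefficient names

op <- par(mar = c(7, 4, 2, 1))   # extra bottom margin for the rotated names
barplot(jancoefabs, las = 2, cex.names = 0.8,
        col = "lightblue", border = "pink",
        ylim = c(0, max(jancoefabs)), ylab = "|coefficient|")
par(op)                          # restore the previous margins
```

Because the values span several orders of magnitude, the smallest bars are barely visible on a linear axis; barplot's log = "y" argument is one option to try, with a strictly positive ylim.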
Are you sure you want a histogram for this? A lattice barchart might be pretty nice. An example with the mtcars built-in data set.
> coef <- lm(mpg ~ ., data = mtcars)$coef
> library(lattice)
> barchart(coef, col = 'lightblue', horizontal = FALSE,
ylim = range(coef), xlab = '',
scales = list(y = list(labels = coef),
x = list(labels = names(coef))))
A base R dotchart might be good too,
> dotchart(coef, pch = 19, xlab = 'value')
> text(coef, seq(coef), labels = round(coef, 3), pos = 2)

plot several linegraphs in one image using R

I am an absolute beginner in R, so this is probably a stupid question.
I have a table like this (csv format):
,1A+,2A+,3A-,3A+,5A-,5A+,6A-,6A+,7A-,7A+
6,4.530309305,5.520356001,3.437626731,5.146758132,,4.355022819,,4.191337618,,4.076583859
10,8.697814022,9.765817956,,9.636004092,3.725756716,8.600484774,3.457423715,8.358842335,2.246622784,7.244668991
12,,,8.176341701,,,,,,,
17,,,,,6.24785396,,5.077069513,,3.137524578
I want to create a line graph in R plotting all the different Y values (1A+, 2A+, etc.) vs the X values (6,10,12,17).
I am doing:
new_curves <- read.csv("new_curves_R.csv", as.is = TRUE)
g_range <- range(0,new_curves$X)
axis(2, las=1, at=4*0:g_range[2])
plot(new_curves$X1A.,new_curves$X,type="o", col="blue")
legend(1, g_range[2], c("new_curves$X1A."), cex=0.8, col=c("blue"));
title(xlab="Days", col.lab=rgb(0,0.5,0))
title(ylab="Total", col.lab=rgb(0,0.5,0))
However, this (obviously) only plots the first data series. (The legend is not working for some reason either.) I am guessing I need some sort of for loop to add each Y series to the graph in turn. Likewise, a loop would be needed to make the legend.
Thanks
dat <- read.table(text=", 1A+,2A+,3A-,3A+,5A-,5A+,6A-,6A+,7A-,7A+
6,4.530309305,5.520356001,3.437626731,5.146758132,,4.355022819,,4.191337618,,4.076583859
10,8.697814022,9.765817956,,9.636004092,3.725756716,8.600484774,3.457423715,8.358842335,2.246622784,7.244668991
12,,,8.176341701,,,,,,,
17,,,,,6.24785396,,5.077069513,,3.137524578", header=TRUE, sep=",", fill=TRUE)
matplot(dat[1], dat[-1])
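A slightly fuller sketch of the same matplot idea, with lines, axis labels and a legend and no explicit loop. The variable names csv_text and labels and the choice of rainbow() colours are illustrative assumptions; the data is the table from the question.

```r
csv_text <- ",1A+,2A+,3A-,3A+,5A-,5A+,6A-,6A+,7A-,7A+
6,4.530309305,5.520356001,3.437626731,5.146758132,,4.355022819,,4.191337618,,4.076583859
10,8.697814022,9.765817956,,9.636004092,3.725756716,8.600484774,3.457423715,8.358842335,2.246622784,7.244668991
12,,,8.176341701,,,,,,,
17,,,,,6.24785396,,5.077069513,,3.137524578"

# Keep the original series labels; read.csv would rewrite "1A+" as "X1A."
header <- strsplit(strsplit(csv_text, "\n")[[1]][1], ",")[[1]]
labels <- header[-1]

# fill = TRUE (the read.csv default) pads the short last row with NA
dat <- read.csv(text = csv_text)

cols <- rainbow(length(labels))
# One call draws every column of dat[-1] against the first column.
# Lines are only drawn between consecutive non-missing points.
matplot(dat[[1]], dat[-1], type = "b", pch = 1, lty = 1, col = cols,
        xlab = "Days", ylab = "Total")
legend("topleft", legend = labels, col = cols, lty = 1, pch = 1, cex = 0.8)
```

Series with a missing value between two observations (3A-, for example, has values at 6 and 12 but not at 10) show up as isolated points rather than a connected line.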

superpose a histogram and an xyplot

I'd like to superpose a histogram and an xyplot representing the cumulative distribution function using R's lattice package.
I've tried to accomplish this with custom panel functions, but can't seem to get it right--I'm getting hung up on one plot being univariate and one being bivariate I think.
Here's an example with the two plots I want stacked vertically:
set.seed(1)
x <- rnorm(100, 0, 1)
discrete.cdf <- function(x, decreasing=FALSE){
x <- x[order(x,decreasing=FALSE)]
result <- data.frame(rank=1:length(x),x=x)
result$cdf <- result$rank/nrow(result)
return(result)
}
my.df <- discrete.cdf(x)
chart.hist <- histogram(~x, data=my.df, xlab="")
chart.cdf <- xyplot(100*cdf~x, data=my.df, type="s",
ylab="Cumulative Percent of Total")
graphics.off()
trellis.device(width = 6, height = 8)
print(chart.hist, split = c(1,1,1,2), more = TRUE)
print(chart.cdf, split = c(1,2,1,2))
I'd like these superposed in the same frame, rather than stacked.
The following code doesn't work, nor do any of the simple variations of it that I have tried:
xyplot(cdf~x,data=cdf,
panel=function(...){
panel.xyplot(...)
panel.histogram(~x)
})
You were on the right track with your custom panel function. The trick is passing the correct arguments to the panel.* functions. For panel.histogram, this means not passing a formula and supplying an appropriate value to the breaks argument:
EDIT: Proper percent values on the y-axis and the correct plot types
xyplot(100*cdf~x,data=my.df,
panel=function(...){
panel.histogram(..., breaks = do.breaks(range(x), nint = 8),
type = "percent")
panel.xyplot(..., type = "s")
})
This answer is just a placeholder until a better answer comes.
The hist() function from the graphics package has an option called add. The following does what you want in the "classical" way:
plot( my.df$x, my.df$cdf * 100, type= "l" )
hist( my.df$x, add= T )
