Cumulative frequency on normal probability plot - r

How do I get the cumulative frequency for each unique X value plotted on a normal probability (non-linear) y axis? I have used probplot(x), but it doesn't accumulate identical values. My data is a data frame of blood results. probplot plots each individual blood result rather than a cumulative frequency for each unique result.
Small data example:
V1
7.1
7.2
7.2
7.6
6.8
6.9
6.8
7.4
7.0
I can calculate the cumulative frequency but not plot it with the correct normal probability axis:
tabvals <- table(data$V1)
tabvals <- cbind.data.frame(tabvals)
tabvals$frequency <- tabvals$Freq/sum(tabvals$Freq)
tabvals$kummulated <- NA
for (i in 1:nrow(tabvals)){
if (i == 1) {
tabvals$kummulated[i] <- tabvals$frequency[i]
} else {
tabvals$kummulated[i] <- tabvals$kummulated[i-1] + tabvals$frequency[i]
}}
plot(tabvals$Var1, tabvals$kummulated , type="l")
The only way to get the right Y axis is this:
library(e1071)
probplot(data$V1)
But this plots 7.2 and 7.2 as two different points rather than accumulating them.
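One possible approach, sketched here in base R rather than with probplot: compute the cumulative frequency per unique value with table() and cumsum(), transform it with qnorm(), and label the y axis in percent. The divisor n + 1 is an assumption of this sketch (a plotting-position choice, not something from the question); it keeps the last point below 100%, where qnorm() would otherwise return Inf.

```r
# Sample data from the question
data <- data.frame(V1 = c(7.1, 7.2, 7.2, 7.6, 6.8, 6.9, 6.8, 7.4, 7.0))

# One entry per unique value, sorted ascending
counts <- table(data$V1)
x <- as.numeric(names(counts))

# Cumulative frequency per unique value.
# Assumption: divide by n + 1 so the largest value stays below 1.
cum_freq <- cumsum(as.vector(counts)) / (sum(counts) + 1)

# Normal quantiles give the non-linear probability scale
z <- qnorm(cum_freq)

plot(x, z, type = "b", pch = 19, yaxt = "n",
     xlab = "V1", ylab = "Cumulative frequency (%)")

# Label the y axis with probabilities placed at their normal quantiles
probs <- c(0.01, 0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95, 0.99)
axis(2, at = qnorm(probs), labels = probs * 100, las = 2)
```

With this layout the two 7.2 results contribute a single point whose height reflects both observations.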

Related

R: Draw a 95% confidence ellipse and exclude all observations out of the ellipse

I have a data set that needs to be cleaned from mistakes. For that, I have a sub-data set that contains only observations that I know are correct ("Match"). I would like to draw a 95% confidence ellipse around those correct observations on a plot and exclude all observations out of the ellipse from my main data set.
I figured out how to draw it but now I would like to be able to take out data based on that.
I'm a beginner with R, so all of this is pretty new to me and I might not understand complicated code. :)
Thanks!
To add more details, my data are measurements of collembolas (a type of insect). They have this basic structure:
  replicate node day MajorAxisLength MinorAxisLength Data.type
1         1    1  50             2.1             0.4     Match
2         2    1  50             2.3             0.2   Unknown
Therefore, I want to validate measurements by excluding unrealistic aspect ratios (length/width). Using the subset that I know is correct (match observations), I want to determine a reasonable range of aspect ratios for collembola, and use it to remove any unrealistic observation. I was advised to use a 95% confidence ellipse for good observations and take out observations that don't fit in the ellipse.
The SIBER package has some functions to help you here.
library(SIBER)
Let's use the iris dataset, plotting sepal width vs length.
dat <- iris[,1:2]
plot(dat)
mu <- colMeans(dat)
Sigma <- cov(dat)
addEllipse(mu, Sigma, p.interval = 0.95, col = "blue", lty = 3)
Z <- pointsToEllipsoid(dat, Sigma, mu) # converts the data to ellipsoid coordinates
out <- !ellipseInOut(Z, p = 0.95) # logical vector
(outliers <- dat[out,]) # finds the points outside the ellipse
# Sepal.Length Sepal.Width
#16 5.7 4.4
#34 5.5 4.2
#42 4.5 2.3
#61 5.0 2.0
#118 7.7 3.8
#132 7.9 3.8
points(outliers, col="red", pch=19)
You can then use the out vector to remove unwanted rows.
dat.in <- dat[!out,]
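For the situation in the question, the ellipse would be fitted on the "Match" rows only and then applied to every row. Below is a minimal sketch of that idea using base R's mahalanobis() and a chi-squared cutoff instead of the SIBER helpers; this is a swapped-in technique, and it assumes the reference measurements are roughly bivariate normal. The data frame meas and its values are made up for illustration; only the column names follow the question.

```r
set.seed(1)

# Hypothetical measurements: 50 plausible rows plus two obvious mistakes
meas <- data.frame(
  MajorAxisLength = c(rnorm(50, mean = 2.2, sd = 0.2), 5.0, 0.5),
  MinorAxisLength = c(rnorm(50, mean = 0.4, sd = 0.05), 0.1, 1.5),
  Data.type       = c(rep("Match", 30), rep("Unknown", 22))
)

xy  <- as.matrix(meas[, c("MajorAxisLength", "MinorAxisLength")])
ref <- xy[meas$Data.type == "Match", ]

# Centre and covariance estimated from the trusted observations only
mu    <- colMeans(ref)
Sigma <- cov(ref)

# Squared Mahalanobis distance of every row to the reference centre
d2 <- mahalanobis(xy, center = mu, cov = Sigma)

# A point lies inside the 95% ellipse when d2 is below the
# chi-squared quantile with 2 degrees of freedom
inside  <- d2 <= qchisq(0.95, df = 2)
cleaned <- meas[inside, ]
```

Because the ellipse covers about 95% of the reference distribution, some correct observations will fall outside it and be removed as well.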

How to make a X-Y plot

I am not sure how to make an X-Y plot in R.
I have three datasets: A, B and C.
A dataset
ID Result
1.1 2
1.2 4
1.3 2.5
1.4 9
B dataset
ID Result
1.1 1
1.2 7
1.3 6
1.4 9
C dataset
ID Result
1.1 0.5
1.2 8
1.3 9
1.4 9
I want to make one plot with x = the result of A and y = the result of B, and another plot with x = the result of A and y = the result of C.
A should be represented by red points, B by black and C by blue, for example. So the point for ID 1.1 should be at x=2, y=1, in red (A) and black (B). The point (4,7) is ID 1.2 in red and black, and the point (9,9) is ID 1.4 in red and black.
I tried qqplots but I don't know how to set up the X and Y correctly.
Thanks
ggplot2 is an excellent library for producing plots and there are many reference manuals online. Below is an answer to your question using the ggplot approach. The A, B and C data frames are combined into a single data frame and geom_point() is used for the x-y plot. aes() sets the x and y coordinates (here you seem to want to plot 'result' as both x and y, if I understood the question?). The points are colored by the color column, which records whether each row came from A, B or C. This variable should be discrete (a factor or a character column), not numeric. The actual colors are set with the manual color scale.
library(ggplot2)
# One data frame per dataset; "color" records which dataset a row came from
dataA <- data.frame(ID=c(1.1,1.2,1.3,1.4),result=c(2,4,2.5,9),index=1:4,color="A")
dataB <- data.frame(ID=c(1.1,1.2,1.3,1.4),result=c(1,7,6,9),index=1:4,color="B")
dataC <- data.frame(ID=c(1.1,1.2,1.3,1.4),result=c(0.5,8,9,9),index=1:4,color="C")
data <- rbind(dataA,dataB,dataC)
data$color <- as.factor(data$color)
# size is a fixed setting rather than a mapped variable, so it goes outside aes()
ggplot(data) +
geom_point(aes(x=result,y=result,color=color),size=4) +
scale_color_manual(values=c("red", "black", "blue")) +
theme_bw()
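If the goal is instead to put A's result on the x axis and B's or C's result on the y axis, the three datasets first have to be lined up by ID. The following is a minimal base R sketch of that reading of the question; the data frame name wide and the choice of black for B and blue for C are assumptions made for illustration.

```r
# Data from the question
A <- data.frame(ID = c(1.1, 1.2, 1.3, 1.4), Result = c(2, 4, 2.5, 9))
B <- data.frame(ID = c(1.1, 1.2, 1.3, 1.4), Result = c(1, 7, 6, 9))
C <- data.frame(ID = c(1.1, 1.2, 1.3, 1.4), Result = c(0.5, 8, 9, 9))

# Line the three results up by ID, one row per ID
wide <- data.frame(
  ID = A$ID,
  A  = A$Result,
  B  = B$Result[match(A$ID, B$ID)],
  C  = C$Result[match(A$ID, C$ID)]
)

# B against A in black, C against A in blue, on shared axes
plot(wide$A, wide$B, pch = 19, col = "black",
     ylim = range(c(wide$B, wide$C)),
     xlab = "Result A", ylab = "Result B or C")
points(wide$A, wide$C, pch = 17, col = "blue")

# Label each B point with its ID
text(wide$A, wide$B, labels = wide$ID, pos = 3, cex = 0.8)
legend("topleft", legend = c("B vs A", "C vs A"),
       pch = c(19, 17), col = c("black", "blue"))
```

For ID 1.4 both series sit at (9, 9), so the two symbols overlap there.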

How to create two barplots of unequal height (different max values) in R but with the same units on the Y axis?

Is it possible to make barplots (two) of unequal size (different max values on Y axis) but equal units (count data)?
The data is count data of the number of nesting attempts per season. Each species has 7 seasons of data. My objective is to present the data as clearly as possible for the reader to show the increase in the number of each of the two species nesting season on season. Although the initial pattern of increase is similar for both species, the number of species 1 nesting rises more rapidly. Plotting both sets of data on the same barplot is not a good option because the 7 seasons of data are not concurrent for the two species - rather it is the first 7 years of colonisation for each species (eg the labels on the x axis are different for the two species)
I have tried par(fig) and layout but have not yet achieved what I need, and I am not sure which function is better suited. Any advice is welcome.
What I have: two barplots, one above the other, each taking up half the window. The Y units are the same for both graphs, but the maximum for one is 300 whilst the other is 900. When they are plotted, a count of 100 looks very different on the two graphs.
SPECIES1 <- c(2,12,44,153,451,857)
SPECIES2 <- c(4,15,35,54,63,243)
windows(11,12)
par(oma=c(3,0.1,1,0.1),mfrow=c(2,1),mar=c(2,6,2,2.1))
barplot(SPECIES2,space=c(0.1,0),ylim=c(0,300),col="black",axes=FALSE)
axis(2,at=seq(0,300,100),las=2, cex.axis=0.9)
barplot(SPECIES1,space=c(0.1,0),ylim=c(0,900), col="black",border=NA,axes=FALSE)
axis(2,at=seq(0,900,100),las=2,cex.axis=0.9)
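Here is how you could do it with the ggplot2 package, using this example data:
## supp dose len
## 1 VC D0.5 6.8
## 2 VC D1 15.0
## 3 VC D2 33.0
## 4 OJ D0.5 4.2
## 5 OJ D1 10.0
## 6 OJ D2 29.5
library(ggplot2)
# Rebuild the example data shown above
df2 <- data.frame(supp=rep(c("VC","OJ"), each=3),
dose=rep(c("D0.5","D1","D2"), times=2),
len=c(6.8,15,33,4.2,10,29.5))
# Dodged bars: one bar per supp within each dose
ggplot(data=df2, aes(x=dose, y=len, fill=supp)) +
geom_bar(stat="identity", position=position_dodge())
But you need a third, grouping variable (supp in the case above). Please provide the sample data you want to plot for a clearer answer.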
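An alternative that stays in base graphics is to give each panel a height proportional to its y range, so that 100 counts take up the same vertical distance in both plots. This is a sketch under stated assumptions rather than a definitive layout: the top and bottom margins of each panel are set to zero so the plot height equals the panel height, and an empty spacer panel (worth 100 counts of height) separates the two barplots.

```r
SPECIES1 <- c(2, 12, 44, 153, 451, 857)
SPECIES2 <- c(4, 15, 35, 54, 63, 243)
ymax <- c(300, 900)   # y-axis maxima taken from the question

# Zero top/bottom margins so panel height equals plot height;
# yaxs = "i" makes the axis span exactly 0..ymax
op <- par(oma = c(3, 0, 1, 0), mar = c(0, 6, 0, 2), yaxs = "i")

# Three stacked panels: small plot, spacer, large plot.
# Relative heights follow the y ranges.
layout(matrix(1:3, ncol = 1), heights = c(ymax[1], 100, ymax[2]))

barplot(SPECIES2, ylim = c(0, ymax[1]), col = "black", axes = FALSE)
axis(2, at = seq(0, ymax[1], 100), las = 2, cex.axis = 0.9)
pin_small <- par("pin")[2]   # plot height in inches

plot.new()                   # empty spacer panel

barplot(SPECIES1, ylim = c(0, ymax[2]), col = "black", axes = FALSE)
axis(2, at = seq(0, ymax[2], 100), las = 2, cex.axis = 0.9)
pin_large <- par("pin")[2]   # plot height in inches

# Restore the previous settings
par(op)
layout(1)
```

The second plot ends up three times as tall as the first, matching the 900:300 ratio of the axes.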

R plot matrixes and connect points of each line of second matrix

I have 2 matrices in R. One is called
j= matrix(c(1:8,1:8), nrow=2,ncol=8)
and the second:
B= matrix (c(Dav_Bou_k_med$r,Dav_Bou$r),nrow=2,ncol=8)
Both Dav_Bou_k_med$r and Dav_Bou$r are matrices with nrow=1 and ncol=8, so they look like this:
[1] 1.668 2.000 1.5 1.7 1.7 1.9 1.9 2.5
etc.
I used this plot:
plot(j,B)
but what I get is just the points for each of the values 1:8 in the first matrix (j), two points per value because B has two rows. What I want is to connect the points belonging to each row of B with a line, ideally using a different color for each row. Is there an easy way to achieve that?
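It's a little difficult to interpret exactly what you are looking for, but I imagine it's something like this? Note the byrow=TRUE: without it, matrix() fills column by column, so the rows of j would not each run from 1 to 8.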
j= matrix(c(1:8,1:8), nrow=2,ncol=8, byrow=TRUE)
fake_data <- sample(seq(1,3,0.2), 8, replace=TRUE)
more_fake_data <- sample(seq(1,3,0.2), 8, replace=TRUE)
B= matrix (c(fake_data, more_fake_data),nrow=2,ncol=8, byrow=TRUE)
plot(j, B)
lines(j[1,],B[1,])
lines(j[2,],B[2,], col="green")
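A more compact variant, offered as a sketch, uses matplot(), which draws one series per column. Transposing both matrices turns each row of B into its own connected, separately colored line. The random values stand in for the real data, as in the answer above.

```r
set.seed(42)

# Each row of j holds the x positions 1..8
j <- matrix(rep(1:8, times = 2), nrow = 2, byrow = TRUE)

# Fake y values standing in for Dav_Bou_k_med$r and Dav_Bou$r
B <- matrix(sample(seq(1, 3, 0.2), 16, replace = TRUE),
            nrow = 2, byrow = TRUE)

# matplot() plots columns against columns, so transpose first;
# type = "b" draws both the points and the connecting lines
matplot(t(j), t(B), type = "b", pch = 19, lty = 1,
        col = c("black", "green"), xlab = "j", ylab = "B")
legend("topleft", legend = c("row 1", "row 2"),
       col = c("black", "green"), lty = 1, pch = 19)
```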

Plot a character vector against a numeric vector in R

I have the following data frame in R:
>AcceptData
Mean.Rank Sentence.Type
1 2.5 An+Sp+a
2 2.6 An+Nsp+a
3 2.1 An+Sp-a
4 3.1 An+Nsp-a
5 2.4 In+Sp+a
6 1.7 In+Nsp+a
7 3.1 In+Sp-a
8 3.0 In+Nsp-a
I want to plot this with the Sentence.Type column on the x axis, using the actual name in each cell as a point on the x axis. I want the y axis to go from 1 to 4 in steps of 0.5.
So far I haven't been able to plot this, neither with plot() nor with hist(). I keep getting different types of errors, mainly because of the character column in the data.frame.
I know this should be easy for most, but I'm sort of noob with R still and after hours I can't get the plot right. Any help is much appreciated.
Edit:
Some of the errors I've gotten:
> hist(AcceptData$Sentence.Type,AcceptData$Mean.Rank)
Error in hist.default(AcceptData$Sentence.Type, AcceptData$Mean.Rank) :
'x' must be numeric
Or: (this doesn't give an error, but definitely not the graph I want. It has all the x values cramped to the left of the x axis)
plot(AcceptData$Sentence.Type,AcceptData$Mean.Rank,lty=5,lwd=2,xlim=c(1,16),ylim=c(1,4),xlab="Sentence Type",ylab="Mean Ranking",main="Mean Acceptability Ranking per Sentence")
The default plot function has a method that allows you to plot factors on the x-axis, but to use this, you have to convert your text data to a factor:
Here is an example:
x <- letters[1:5]
y <- runif(5, 0, 5)
plot(factor(x), y)
And with your sample data:
AcceptData <- read.table(text="
Mean.Rank Sentence.Type
1 2.5 An+Sp+a
2 2.6 An+Nsp+a
3 2.1 An+Sp-a
4 3.1 An+Nsp-a
5 2.4 In+Sp+a
6 1.7 In+Nsp+a
7 3.1 In+Sp-a
8 3.0 In+Nsp-a", stringsAsFactors=FALSE)
plot(Mean.Rank~factor(Sentence.Type), AcceptData, las=2,
xlab="", main="Mean Acceptability Ranking per Sentence")
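The question also asks for a y axis running from 1 to 4 in steps of 0.5. One way to get that, sketched below as an alternative rather than as part of the answer above, is to plot against the row positions, suppress the default axes, and draw both axes by hand with axis(). This also keeps the sentence types in their original order, whereas factor() sorts them alphabetically.

```r
AcceptData <- data.frame(
  Mean.Rank     = c(2.5, 2.6, 2.1, 3.1, 2.4, 1.7, 3.1, 3.0),
  Sentence.Type = c("An+Sp+a", "An+Nsp+a", "An+Sp-a", "An+Nsp-a",
                    "In+Sp+a", "In+Nsp+a", "In+Sp-a", "In+Nsp-a"),
  stringsAsFactors = FALSE
)

# Use the row positions as x coordinates
x <- seq_len(nrow(AcceptData))
y_ticks <- seq(1, 4, by = 0.5)

# Draw the points without the default axes
plot(x, AcceptData$Mean.Rank, pch = 19, ylim = c(1, 4),
     xaxt = "n", yaxt = "n", xlab = "", ylab = "Mean Ranking",
     main = "Mean Acceptability Ranking per Sentence")

# x axis labelled with the sentence types, y axis in steps of 0.5
axis(1, at = x, labels = AcceptData$Sentence.Type, las = 2)
axis(2, at = y_ticks, las = 1)
```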
