How to make an X-Y plot in R

I am not sure how to make an X-Y plot in R.
I have three datasets: A, B, and C.
A dataset
ID Result
1.1 2
1.2 4
1.3 2.5
1.4 9
B dataset
ID Result
1.1 1
1.2 7
1.3 6
1.4 9
C dataset
ID Result
1.1 0.5
1.2 8
1.3 9
1.4 9
I want to make one plot with x = result A and y = result B, and another plot with x = result A and y = result C.
A should be represented by red spots, B by black, and C by blue, for example. So the spot for ID 1.1 should be at x=2 and y=1, in red (A) and black (B). The spot (4, 7) is ID 1.2 in red and black, and the spot (9, 9) is ID 1.4 in red and black.
I tried qqplots, but I don't know how to set X and Y correctly.
Thanks

ggplot2 is an excellent library for producing plots, and there are many reference manuals online. Below is an answer to your question using the ggplot approach. The A, B, and C data frames are combined into a single data frame, and geom_point() is used to draw an x-y plot. aes() sets the x and y coordinates (here 'result' is plotted as both x and y, if I understood the question correctly). The points are colored by the 'color' column, which labels each row as A, B, or C. This variable must be discrete, so it is converted to a factor. The colors themselves are set by the manual color scale.
library(ggplot2)
# One data frame per dataset; 'color' records which dataset each row came from
dataA <- data.frame(ID=c(1.1,1.2,1.3),result=c(2,4,2.5),index=c(1,2,3),color="A")
dataB <- data.frame(ID=c(1.1,1.2,1.3),result=c(1,7,6),index=c(1,2,3),color="B")
dataC <- data.frame(ID=c(1.1,1.2,1.3),result=c(0.5,8,9),index=c(1,2,3),color="C")
data <- rbind(dataA,dataB,dataC)
data$color <- as.factor(data$color)
ggplot(data) +
  # size is a fixed value, not a data mapping, so it goes outside aes()
  geom_point(aes(x=result,y=result,color=color),size=10) +
  scale_color_manual(values=c("red", "black", "blue")) +
  theme_bw()
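If the goal is instead to plot the A result on the x axis against the B and C results on the y axis, one possible reading of the question, the datasets can be paired by ID first. The following is a minimal sketch under that assumption; the names `paired`, `dataset`, `Result.A`, and `Result.y` are made up for illustration, and the data are the four rows listed in the question.

```r
library(ggplot2)

# Data copied from the question (all four IDs)
A <- data.frame(ID = c(1.1, 1.2, 1.3, 1.4), Result = c(2, 4, 2.5, 9))
B <- data.frame(ID = c(1.1, 1.2, 1.3, 1.4), Result = c(1, 7, 6, 9))
C <- data.frame(ID = c(1.1, 1.2, 1.3, 1.4), Result = c(0.5, 8, 9, 9))

# Pair each B and C result with the A result that has the same ID
AB <- merge(A, B, by = "ID", suffixes = c(".A", ".y"))
AC <- merge(A, C, by = "ID", suffixes = c(".A", ".y"))
AB$dataset <- "B"
AC$dataset <- "C"
paired <- rbind(AB, AC)

# x is always the A result; colour tells which dataset supplies y
p <- ggplot(paired, aes(x = Result.A, y = Result.y, color = dataset)) +
  geom_point(size = 4) +
  geom_text(aes(label = ID), vjust = -1, show.legend = FALSE) +
  scale_color_manual(values = c(B = "black", C = "blue")) +
  labs(x = "Result A", y = "Result B or C") +
  theme_bw()
print(p)
```

Adding `facet_wrap(~ dataset)` to `p` would split this into two separate panels, one for A vs B and one for A vs C. Since A only supplies the x coordinate here, there are no separate red A points in this version.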

Related

R: Draw a 95% confidence ellipse and exclude all observations out of the ellipse [duplicate]

This question already has an answer here:
How to get the points inside of the ellipse in ggplot2?
I have a data set that needs to be cleaned from mistakes. For that, I have a sub-data set that contains only observations that I know are correct ("Match"). I would like to draw a 95% confidence ellipse around those correct observations on a plot and exclude all observations out of the ellipse from my main data set.
I figured out how to draw it but now I would like to be able to take out data based on that.
I'm a beginner with R, so all of this is pretty new to me and I might not understand complicated code. :)
Thanks !
To add more details, my data are measurements of collembolas (a type of insect). It has this basic structure:
  replicate node day MajorAxisLengtnh MinorAxisLength Data.type
1         1    1  50              2.1             0.4     Match
2         2    1  50              2.3             0.2   Unknown
Therefore, I want to validate measurements by excluding unrealistic aspect ratios (length/width). Using the subset that I know is correct (match observations), I want to determine a reasonable range of aspect ratios for collembola, and use it to remove any unrealistic observation. I was advised to use a 95% confidence ellipse for good observations and take out observations that don't fit in the ellipse.
The SIBER package has some functions to help you here.
library(SIBER)
Let's use the iris dataset, plotting sepal width vs length.
dat <- iris[,1:2]
plot(dat)
mu <- colMeans(dat)
Sigma <- cov(dat)
addEllipse(mu, Sigma, p.interval = 0.95, col = "blue", lty = 3)
Z <- pointsToEllipsoid(dat, Sigma, mu) # converts the data to ellipsoid coordinates
out <- !ellipseInOut(Z, p = 0.95) # logical vector
(outliers <- dat[out,]) # finds the points outside the ellipse
# Sepal.Length Sepal.Width
#16 5.7 4.4
#34 5.5 4.2
#42 4.5 2.3
#61 5.0 2.0
#118 7.7 3.8
#132 7.9 3.8
points(outliers, col="red", pch=19)
You can then use the out vector to remove unwanted rows.
dat.in <- dat[!out,]
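The question asks for the ellipse to be fitted on the known-good ("Match") rows only and then applied to every row. A base-R sketch of that idea is below. It swaps in `mahalanobis()` and a chi-square cutoff in place of the SIBER helpers, and it uses iris with the setosa rows standing in for the "Match" subset, which is purely an assumption for illustration.

```r
# Stand-in data: iris, with setosa playing the role of the trusted "Match" rows
dat <- iris[, c("Sepal.Length", "Sepal.Width")]
is_match <- iris$Species == "setosa"

# Fit the ellipse (centre and covariance) on the trusted rows only
mu <- colMeans(dat[is_match, ])
Sigma <- cov(dat[is_match, ])

# Squared Mahalanobis distance of every row from that centre
d2 <- mahalanobis(dat, center = mu, cov = Sigma)

# A row lies inside the 95% ellipse when d2 is at most the
# chi-square quantile with as many degrees of freedom as columns
inside <- d2 <= qchisq(0.95, df = ncol(dat))

# Keep only the rows that fall inside the ellipse
dat.in <- dat[inside, ]
```

With the real data, `is_match` would be something like `Data.type == "Match"` and `dat` would hold the two axis-length columns. The cutoff assumes the trusted measurements are roughly bivariate normal.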

R: How to detect and plot polygons edges from X Y data frame

I have a data frame that contains grid data, with columns for the X and Y coordinates and a factor "value".
I would like to detect areas with the same 'value' and plot their edges.
An example of my data:
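# melt() from the reshape package names the index columns X1 and X2
# (reshape2 would name them Var1 and Var2)
library(reshape)
dat = melt(volcano[26:40, 26:40])
dat$value=factor(round(dat$value/10))
dat[dat$X1==12 & dat$X2==6,"value"]=NA
dat[dat$X1==13 & dat$X2==6,"value"]=NA
dat=dat[7:nrow(dat),]
head(dat)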
My plot:
library(ggplot2)
p=ggplot(dat) +
geom_tile(aes(x=X1, y=X2, fill=value))+geom_text(aes(x=X1, y=X2, label=value))
p
My attempt would be to use p+geom_polygon(data = polys, aes(x = x , y = y , group = id),size=1, color = "black"), but I'm struggling with the step of building 'polys', the coordinates of the edge points. This data frame should look like this:
id x y
1 0.5 1.5
1 1.5 1.5
1 2.5 1.5
1 2.5 2.5
1 1.5 2.5
1 0.5 2.5
1 0.5 1.5
2 2.5 1.5
2 3.5 1.5
2 4.5 1.5
2 5.5 1.5
...
Here x and y are the coordinates of the corners of the tiles that I'm looking for, and 'id' is the grouping factor for the different polygons. For example, id=1 corresponds to the little purple rectangle in the bottom left corner.
Any idea how to automatically detect these edge points for each area, based on the "value" column?
Thanks
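One way to get at the edges without building closed polygons is to compare each tile with its right-hand and upper neighbour and draw a short line segment wherever the two values differ. The sketch below does that on the class matrix directly. It is a different technique from the geom_polygon() approach described in the question, and the names `m`, `segs`, and `tiles` are made up for illustration. Tiles with a missing value are simply skipped, so no edge is drawn next to them.

```r
library(ggplot2)

# Class matrix: row index plays the role of X1, column index the role of X2
m <- round(volcano[26:40, 26:40] / 10)
nx <- nrow(m)
ny <- ncol(m)

# TRUE where two neighbouring tiles have different values
diff_x <- m[-nx, ] != m[-1, ]   # tile (i, j) vs tile (i + 1, j)
diff_y <- m[, -ny] != m[, -1]   # tile (i, j) vs tile (i, j + 1)

ix <- which(diff_x, arr.ind = TRUE)
iy <- which(diff_y, arr.ind = TRUE)

# Tiles are centred on integer coordinates, so shared edges sit at +/- 0.5
segs <- rbind(
  data.frame(x = ix[, 1] + 0.5, xend = ix[, 1] + 0.5,
             y = ix[, 2] - 0.5, yend = ix[, 2] + 0.5),
  data.frame(x = iy[, 1] - 0.5, xend = iy[, 1] + 0.5,
             y = iy[, 2] + 0.5, yend = iy[, 2] + 0.5)
)

# Tiles in long format, then the boundary segments on top
tiles <- data.frame(X1 = as.vector(row(m)), X2 = as.vector(col(m)),
                    value = factor(as.vector(m)))
p <- ggplot(tiles) +
  geom_tile(aes(x = X1, y = X2, fill = value)) +
  geom_segment(data = segs, aes(x = x, y = y, xend = xend, yend = yend))
print(p)
```

This gives the outlines visually but not a per-area `id`. If closed polygons are really needed, converting the grid to a raster and polygonizing it with a spatial package would be the heavier alternative.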

How to create two barplots of unequal height (different max values) in R but with the same units on the Y axis?

Is it possible to make barplots (two) of unequal size (different max values on Y axis) but equal units (count data)?
The data is count data of the number of nesting attempts per season. Each species has 7 seasons of data. My objective is to present the data as clearly as possible for the reader to show the increase in the number of each of the two species nesting season on season. Although the initial pattern of increase is similar for both species, the number of species 1 nesting rises more rapidly. Plotting both sets of data on the same barplot is not a good option because the 7 seasons of data are not concurrent for the two species - rather it is the first 7 years of colonisation for each species (eg the labels on the x axis are different for the two species)
I have tried par(fig) and layout but have not yet achieved what I need, and I am not sure which function is better suited. Any advice is welcome.
The code below draws two barplots, one above the other, each taking up half the window. The Y units are the same for both graphs, but the maximum for one is 300 whilst the other is 900. When they are plotted, a count of 100 looks very different on the two graphs.
SPECIES1 <- c(2,12,44,153,451,857)
SPECIES2 <- c(4,15,35,54,63,243)
windows(11,12)  # opens a graphics window (Windows only)
par(oma=c(3,0.1,1,0.1),mfrow=c(2,1),mar=c(2,6,2,2.1))
barplot(SPECIES2,space=c(0.1,0),ylim=c(0,300),col="black",axes=FALSE)
axis(2,at=seq(0,300,100),las=2, cex.axis=0.9)
barplot(SPECIES1,space=c(0.1,0),ylim=c(0,900), col="black",border=NA,axes=FALSE)
axis(2,at=seq(0,900,100),las=2,cex.axis=0.9)
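Here is how you could do it with the ggplot2 package, using this example data:
## supp dose len
## 1 VC D0.5 6.8
## 2 VC D1 15.0
## 3 VC D2 33.0
## 4 OJ D0.5 4.2
## 5 OJ D1 10.0
## 6 OJ D2 29.5
library(ggplot2)
# Example data frame matching the table above
df2 <- data.frame(supp=rep(c("VC","OJ"), each=3),
                  dose=rep(c("D0.5","D1","D2"), 2),
                  len=c(6.8, 15, 33, 4.2, 10, 29.5))
ggplot(data=df2, aes(x=dose, y=len, fill=supp)) +
  geom_bar(stat="identity", position=position_dodge())
However, this needs a third variable (supp in the example above). Please provide the sample data you want to plot to get a clearer answer.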
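For the base-graphics version in the question, one option is to replace mfrow with layout() and give the two panels heights proportional to their y-axis maxima, so that a count of 100 takes up about the same distance in both. This is a rough sketch using the question's vectors. The margins are a fixed size in each panel, so the scales match only approximately, and the specific `mar` values are a guess.

```r
SPECIES1 <- c(2, 12, 44, 153, 451, 857)
SPECIES2 <- c(4, 15, 35, 54, 63, 243)

# Two stacked panels with heights in the ratio 300:900 (the two y-axis maxima)
layout(matrix(1:2, ncol = 1), heights = c(300, 900))
par(oma = c(3, 0.1, 1, 0.1), mar = c(1, 6, 1, 2.1))

# Top panel: species 2, axis from 0 to 300
barplot(SPECIES2, space = c(0.1, 0), ylim = c(0, 300),
        col = "black", axes = FALSE)
axis(2, at = seq(0, 300, 100), las = 2, cex.axis = 0.9)

# Bottom panel: species 1, axis from 0 to 900, three times as tall
barplot(SPECIES1, space = c(0.1, 0), ylim = c(0, 900),
        col = "black", border = NA, axes = FALSE)
axis(2, at = seq(0, 900, 100), las = 2, cex.axis = 0.9)
```

Keeping the top and bottom margins small and identical in both panels keeps the mismatch small. Each panel still gets its own x axis, so the differing season labels can be added separately with axis(1, ...) or the names.arg argument of barplot().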

warning: "adjust the group aesthetic" while visualizing clusters with R

I have created two clusters. When I run the code below to visualize them as a line chart, I get the message "Each group consists of only one observation. Do you need to adjust the group aesthetic?"
Code
ggplot(b,aes(A, avg, colour=cluster))+geom_line()
Dataset
A cluster avg
A1 1 0.2
A1 2 0.3
A2 1 0.3
A2 2 0.4
b <- data.frame(A=c("A1","A2","A3"),
cluster=c(1,2,1),
avg=c(0.2,0.3,0.3))
x11()
ggplot(b,aes(A, avg, colour=cluster))+geom_point()
Each group contains only one point, so lines cannot be drawn; use points instead. The group aesthetic is explained here: http://ggplot2.tidyverse.org/reference/aes_group_order.html
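If a line per cluster is what is wanted, setting the group aesthetic explicitly is the usual alternative. The sketch below uses the four-row dataset from the question; converting cluster to a factor for the colour legend is an added choice, not something the question specifies.

```r
library(ggplot2)

# Data from the question's table
b <- data.frame(A = c("A1", "A1", "A2", "A2"),
                cluster = c(1, 2, 1, 2),
                avg = c(0.2, 0.3, 0.3, 0.4))

# A is discrete, so by default ggplot2 groups rows by the discrete variables
# in the plot, which splits the data at every A value.
# group = cluster instead joins the points that belong to the same cluster.
p <- ggplot(b, aes(A, avg, colour = factor(cluster), group = cluster)) +
  geom_line() +
  geom_point() +
  labs(colour = "cluster")
print(p)
```

Each cluster now contributes one line running from A1 to A2.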

Plot a character vector against a numeric vector in R

I have the following data frame in R:
> AcceptData
  Mean.Rank Sentence.Type
1       2.5       An+Sp+a
2       2.6      An+Nsp+a
3       2.1       An+Sp-a
4       3.1      An+Nsp-a
5       2.4       In+Sp+a
6       1.7      In+Nsp+a
7       3.1       In+Sp-a
8       3.0      In+Nsp-a
I want to plot this with the Sentence.Type column on the x axis, using the actual text of each cell as a tick label. I want the y axis to go from 1 to 4 in steps of 0.5.
So far I haven't been able to plot this, neither with plot() nor with hist(). I keep getting different types of errors, mainly because of the character column in the data frame.
I know this should be easy for most, but I'm still fairly new to R and after hours I can't get the plot right. Any help is much appreciated.
Edit:
Some of the errors I've gotten:
> hist(AcceptData$Sentence.Type,AcceptData$Mean.Rank)
Error in hist.default(AcceptData$Sentence.Type, AcceptData$Mean.Rank) :
'x' must be numeric
Or (this doesn't give an error, but it is definitely not the graph I want; all the x values are cramped to the left of the x axis):
plot(AcceptData$Sentence.Type,AcceptData$Mean.Rank,lty=5,lwd=2,xlim=c(1,16),ylim=c(1,4),xlab="Sentence Type",ylab="Mean Ranking",main="Mean Acceptability Ranking per Sentence")
The default plot function has a method that allows you to plot factors on the x-axis, but to use it you have to convert your text data to a factor.
Here is an example:
x <- letters[1:5]
y <- runif(5, 0, 5)
plot(factor(x), y)
And with your sample data:
AcceptData <- read.table(text="
Mean.Rank Sentence.Type
1 2.5 An+Sp+a
2 2.6 An+Nsp+a
3 2.1 An+Sp-a
4 3.1 An+Nsp-a
5 2.4 In+Sp+a
6 1.7 In+Nsp+a
7 3.1 In+Sp-a
8 3.0 In+Nsp-a", stringsAsFactors=FALSE)
plot(Mean.Rank~factor(Sentence.Type), AcceptData, las=2,
xlab="", main="Mean Acceptability Ranking per Sentence")
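The question also asks for a y axis running from 1 to 4 in steps of 0.5. A sketch of one way to get that, and to keep the sentence types in their original order instead of the alphabetical order that factor() uses by default, is to plot against the factor's integer codes and draw both axes by hand. The variable name `types` and the margin values are choices made for this example.

```r
AcceptData <- data.frame(
  Mean.Rank = c(2.5, 2.6, 2.1, 3.1, 2.4, 1.7, 3.1, 3.0),
  Sentence.Type = c("An+Sp+a", "An+Nsp+a", "An+Sp-a", "An+Nsp-a",
                    "In+Sp+a", "In+Nsp+a", "In+Sp-a", "In+Nsp-a"),
  stringsAsFactors = FALSE
)

# Keep the levels in order of appearance (factor() alone sorts alphabetically)
types <- factor(AcceptData$Sentence.Type,
                levels = unique(AcceptData$Sentence.Type))

# Extra bottom margin so the rotated labels fit
par(mar = c(7, 4, 4, 2) + 0.1)

# Plot against the integer codes and suppress the default axes
plot(as.integer(types), AcceptData$Mean.Rank, pch = 19,
     ylim = c(1, 4), xaxt = "n", yaxt = "n",
     xlab = "", ylab = "Mean Ranking",
     main = "Mean Acceptability Ranking per Sentence")

# x axis: one tick per sentence type; y axis: 1 to 4 in steps of 0.5
axis(1, at = seq_along(levels(types)), labels = levels(types), las = 2)
axis(2, at = seq(1, 4, by = 0.5), las = 1)
```

Plotting the integer codes gives plain points. Passing the factor itself to plot(), as in the answer above, draws a box plot per level instead, which collapses to a single line when each level has one value.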

Resources