I have the following data frame in R:
>AcceptData
Mean.Rank Sentence.Type
1 2.5 An+Sp+a
2 2.6 An+Nsp+a
3 2.1 An+Sp-a
4 3.1 An+Nsp-a
5 2.4 In+Sp+a
6 1.7 In+Nsp+a
7 3.1 In+Sp-a
8 3.0 In+Nsp-a
Which I want to plot, with the Sentence.Type column in the x axis, with the actual name of each cell as a point in the x axis. I want the y axis to go from 1 to 4 in steps of .5
So far I haven't been able to plot this, neither with plot() not with hist(). I keep getting different types of errors, mainly because of the nature of the character column in the data.frame.
I know this should be easy for most, but I'm sort of noob with R still and after hours I can't get the plot right. Any help is much appreciated.
Edit:
Some of the errors I've gotten:
> hist(AcceptData$Sentence.Type,AcceptData$Mean.Rank)
Error in hist.default(AcceptData$Sentence.Type, AcceptData$Mean.Rank) :
'x' must be numeric
Or: (this doesn't give an error, but definitely not the graph I want. It has all the x values cramped to the left of the x axis)
plot(AcceptData$Sentence.Type,AcceptData$Mean.Rank,lty=5,lwd=2,xlim=c(1,16),ylim=c(1,4),xla b="Sentence Type",ylab="Mean Ranking",main="Mean Acceptability Ranking per Sentence")
The default plot function has a method that allows you to plot factors on the x-axis, but to use this, you have to convert your text data to a factor:
Here is an example:
x <- letters[1:5]
y <- runif(5, 0, 5)
plot(factor(x), y)
And with your sample data:
AcceptData <- read.table(text="
Mean.Rank Sentence.Type
1 2.5 An+Sp+a
2 2.6 An+Nsp+a
3 2.1 An+Sp-a
4 3.1 An+Nsp-a
5 2.4 In+Sp+a
6 1.7 In+Nsp+a
7 3.1 In+Sp-a
8 3.0 In+Nsp-a", stringsAsFactors=FALSE)
plot(Mean.Rank~factor(Sentence.Type), AcceptData, las=2,
xlab="", main="Mean Acceptability Ranking per Sentence")
Related
This question already has an answer here:
How to get the points inside of the ellipse in ggplot2?
(1 answer)
Closed 2 years ago.
I have a data set that needs to be cleaned from mistakes. For that, I have a sub-data set that contains only observations that I know are correct ("Match"). I would like to draw a 95% confidence ellipse around those correct observations on a plot and exclude all observations out of the ellipse from my main data set.
I figured out how to draw it but now I would like to be able to take out data based on that.
I'm a beginner with R so all of that is pretty new to me so I might not understand complicated coding. :)
Thanks !
To add more details, my data are measurements of collembolas (a type of insect). It has this basic structure:
replicate node day MajorAxisLengtnh MinorAxisLength Data.type
1 1 1 50 2.1 0.4 Match
2 2 1 50 2.3 0.2 Unknown
Therefore, I want to validate measurements by excluding unrealistic aspect ratios (length/width). Using the subset that I know is correct (match observations), I want to determine a reasonable range of aspect ratios for collembola, and use it to remove any unrealistic observation. I was advised to use a 95% confidence ellipse for good observations and take out observations that don't fit in the ellipse.
The SIBER package has some functions to help you here.
library(SIBER)
Let's use the iris dataset, plotting sepal width vs length.
dat <- iris[,1:2]
plot(dat)
mu <- colMeans(dat)
Sigma <- cov(dat)
addEllipse(mu, Sigma, p.interval = 0.95, col = "blue", lty = 3)
Z <- pointsToEllipsoid(dat, Sigma, mu) # converts the data to ellipsoid coordinates
out <- !ellipseInOut(Z, p = 0.95) # logical vector
(outliers <- dat[out,]) # finds the points outside the ellipse
# Sepal.Length Sepal.Width
#16 5.7 4.4
#34 5.5 4.2
#42 4.5 2.3
#61 5.0 2.0
#118 7.7 3.8
#132 7.9 3.8
points(outliers, col="red", pch=19)
You can then use the out vector to remove unwanted rows.
dat.in <- dat[!out,]
I am not sure how to make a X-Y plot by R.
I have A B C datasets.
A dataset
ID Result
1.1 2
1.2 4
1.3 2.5
1.4 9
B dataset
ID Result
1.1 1
1.2 7
1.3 6
1.4 9
C dataset
ID Result
1.1 0.5
1.2 8
1.3 9
1.4 9
I want to make a plot X=result A , y=the result B, the other plot x=result A and Y=result C....
then A represented by red spots, B is black and C is blue for example. So the spot 1.1 should be x=2 and y=1 in red (A) and block (B). the spot 4,7, it means it is ID 1.2 in red and block.... The spot 9,9 it means is is ID 1.4 in the red and block.....
I try qqplots but I dont know how to make the X and Y correctly.
Thanks
ggplot2 is an excellent library for producing plots and there are many reference manuals online. Below is an answer to your question using the ggplot approach. The A,B,C data frames are unified into a single frame and the geom_point() for an x-y plot is used. The aes() sets the x and y coordinates (here you seem to seek to plot 'result' as both the x and y, if I understood the question?). The points are scaled by color, which is defined in the data frame as attributes A,B,C. Importantly, this variable must be a factor. The colors are defined by the manual color scale.
library(ggplot2)
dataA <- data.frame(ID=c(1.1,1.2,1.3),result=c(2,4,2.5),index=c(1,2,3),color="A")
dataB <- data.frame(ID=c(1.1,1.2,1.3),result=c(1,7,6),index=c(1,2,3),color="B")
dataC <- data.frame(ID=c(1.1,1.2,1.3),result=c(0.5,8,9),index=c(1,2,3),color="C")
data <- rbind(dataA,dataB,dataC)
data$color <- as.factor(data$color)
ggplot(data) +
geom_point(aes(x=result,y=result,color=color,size=10)) +
scale_color_manual(values=c("red", "black", "blue")) +
theme_bw()
Is it possible to make barplots (two) of unequal size (different max values on Y axis) but equal units (count data)?
The data is count data of the number of nesting attempts per season. Each species has 7 seasons of data. My objective is to present the data as clearly as possible for the reader to show the increase in the number of each of the two species nesting season on season. Although the initial pattern of increase is similar for both species, the number of species 1 nesting rises more rapidly. Plotting both sets of data on the same barplot is not a good option because the 7 seasons of data are not concurrent for the two species - rather it is the first 7 years of colonisation for each species (eg the labels on the x axis are different for the two species)
I have tried par(fig) and layout but not yet achieved what I need and I am not sure which function is better suited to what I need. Any advice welcome
Two barplots, one above the other, each taking up half the window. The Y units are the same for both graphs but the maximum for one is 300 whilst the other is 900. When they are plotted a count of 100 looks very different on the two graphs
SPECIES1 <- c(2,12,44,153,451,857)
SPECIES2 <- c(4,15,35,54,63,243)
windows(11,12)
par(oma=c(3,0.1,1,0.1),mfrow=c(2,1),mar=c(2,6,2,2.1))
barplot(SPECIES2,space=c(0.1,0),ylim=c(0,300),col="black",axes=FALSE)
axis(2,at=seq(0,300,100),las=2, cex.axis=0.9)
barplot(SPECIES1,space=c(0.1,0),ylim=c(0,900), col="black",border=NA,axes=FALSE )axis(2,at=seq(0,900,100),las=2,cex.axis=0.9)
Here how you go by using ggplot package
## supp dose len
## 1 VC D0.5 6.8
## 2 VC D1 15.0
## 3 VC D2 33.0
## 4 OJ D0.5 4.2
## 5 OJ D1 10.0
## 6 OJ D2 29.5
ggplot(data=df2, aes(x=dose, y=len, fill=supp)) +
geom_bar(stat="identity", position=position_dodge())
But you need third variable(supp in above case). Please provide Sample data which you want to plot for clear answer.
So I have plotted a curve, and have had a look in both my book and on stack but can not seem to find any code to instruct R to tell me the value of y when along curve at 70 x.
curve(
20*1.05^x,
from=0, to=140,
xlab='Time passed since 1890',
ylab='Population of Salmon',
main='Growth of Salmon since 1890'
)
So in short, I would like to know how to command R to give me the number of salmon at 70 years, and at other times.
Edit:
To clarify, I was curious how to command R to show multiple Y values for X at an increase of 5.
salmon <- data.frame(curve(
20*1.05^x,
from=0, to=140,
xlab='Time passed since 1890',
ylab='Population of Salmon',
main='Growth of Salmon since 1890'
))
salmon$y[salmon$x==70]
1 608.5285
This salmon data.frame gives you all of the data.
head(salmon)
x y
1 0.0 20.00000
2 1.4 21.41386
3 2.8 22.92768
4 4.2 24.54851
5 5.6 26.28392
6 7.0 28.14201
If you can also use inequalities to check the number of salmon in given ranges using the syntax above.
It's also simple to answer the 2nd part of your question using this object:
salmon$z <- salmon$y*5 # I am using * instead of + to make the plot more clear
plot(x=salmon$x,y=salmon$z, xlab='Time passed since 1890', ylab='Population of Salmon',type="l")
lines(salmon$x,salmon$y, col="blue")
curve is plotting the function 20*1.05^x
so just plug any value you want in that function instead of x, e.g.
> 20*1.05^70
[1] 608.5285
>
20*1.05^(seq(from=0, to=70, by=10))
Was all I had to do, I had forgotten until Ed posted his reply that I could type a function directly into R.
I have a column from a data frame (that contains a set of estimated proportions of cell counts) for which class() returns "factor" and column (that contains the actual cell counts) from another for which class() returns "numeric". As I have to plot these against one another to see if there's a relationship between them. Hence I have to convert the factor entities to numerics:
> class(proportions$Neutrophils)
[1] "factor"
> head(proportions$Neutrophils)
[1] 2.3 14.9
178 Levels: #VALUE! 0.0 0.4 0.6 0.7 0.8 0.9 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 ... abs neutrophils
> head(as.numeric(proportions$Neutrophils)) #notice that the numbers are completely trans
[1] 1 1 82 57 1 1
> head(as.numeric(proportions$Neutrophils))
[1] 1 1 82 57 1 1
> max(as.numeric(proportions$Neutrophils)) #factors converted to numeric
[1] 176
I have arranged the patterns of the columns in such a way that the corresponding values align:
ptr<-match(sample.details$barcode1, proportions$barcode2)
proportions<-proportions[ptr,]
I convert the factor column to numeric and plot:
plot(proportions$Neutrophils, as.numeric(SPVs[,7]), pch=19, ylab= "proportion estimates", xlab="counts", main="Neutrophils Proportions Validation")
When I don't convert it to numeric though:
plot(proportions1$Neutrophils, SPVs[,7], pch=19, ylab= "proportion estimates", xlab="counts", main="Neutrophils Proportions Validation")
What is worrying about the graphs are analogous and yet the x-axis on the second graph is not arranged in ascending order...
All I want is an estimate of whether the two columns are related but if the order is mixed up there is no way of telling this...
How do I ensure that the order of the x-axis values are ascending?