Trouble producing discrete legend using ggplot for a scatterplot - r

I am fairly new to the ggplot function in R. Currently, I am struggling to produce a legend for a given data set that I have constructed by hand. For simplicity, suppose this was my data set:
rawdata<-data.frame(matrix(c(1,1,1,
2,1,-1,
3,-1,-1,
4,-1,1
4,-2,2),5,3,byrow=TRUE))
names(rawdata)<-c("Town","x-coordinate","y-coordinate")
rawdata[,1]<-as.factor(rawdata[,1])
Now, using ggplot, I am trying to figure out how to produce a legend on a scatterplot. So far I have done the following:
p1<-ggplot(data=rawdata,aes(x=x.coordinate,y=y.coordinate,fill=rawdata[,1])) +
geom_point(data=rawdata,aes(x=x.coordinate,y=y.coordinate))
Running the above code produces a scatterplot.
As you can see, the coordinates have been plotted and the legend has been constructed, but the points are all colored black.
I learned that to color the points, I need to pass the argument colour=rawdata[,1] to the geom_point function. However, when I try this, I get the following error:
Error: Aesthetics must be either length 1 or the same as the data (4): colour
I understand that this has something to do with the length of the vector, but as of right now, I have absolutely no idea how to tackle this small problem.

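geom_point() takes a colour, not a fill: with the default point shape only the colour aesthetic is drawn (fill applies only to the filled shapes 21 to 25). Also map the column by name (colour=Town) rather than as rawdata[,1]. And, having passed the data into ggplot(data = ..), there's no need to then pass it into geom_point() again.
I've also fixed two errors in the code that creates your data frame: the values were missing a comma after 4,-1,1, and the column names used hyphens (x-coordinate) while aes() refers to x.coordinate.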
rawdata<-data.frame(matrix(c(1,1,1,2,1,-1,3,-1,-1,4,-1,1,4,-2,2),5,3,byrow=TRUE))
names(rawdata)<-c("Town","x.coordinate","y.coordinate")
rawdata[,1]<-as.factor(rawdata[,1])
library(ggplot2)
ggplot(data=rawdata,aes(x=x.coordinate,y=y.coordinate,colour=Town)) +
geom_point()
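If you do want to use fill, it only has a visible effect with one of the filled point shapes. A minimal sketch, assuming shape 21 (a filled circle whose interior uses fill and whose outline uses colour); the size and outline colour are arbitrary choices:

```r
library(ggplot2)

# Same corrected data as in the answer
rawdata <- data.frame(matrix(c(1,1,1, 2,1,-1, 3,-1,-1, 4,-1,1, 4,-2,2), 5, 3, byrow = TRUE))
names(rawdata) <- c("Town", "x.coordinate", "y.coordinate")
rawdata$Town <- as.factor(rawdata$Town)

# shape = 21: interior drawn with `fill` (mapped to Town), outline with a fixed `colour`
p_fill <- ggplot(rawdata, aes(x = x.coordinate, y = y.coordinate, fill = Town)) +
  geom_point(shape = 21, size = 3, colour = "black")
print(p_fill)
```

Mapping colour=Town as in the answer is simpler; the filled-shape version is mainly useful when you want a fixed outline around coloured points.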

Related

ggplot2 error in plotting a scatter plot in R

I expect this kind of scatter plot.
However, whenever I apply it to my data, I get this instead.
I used just the code below on my data,
and I also confirmed that the variables are of numeric class.
ggplot(selected.df, aes(x, y))
Those variables were not actually numeric; with genuinely numeric columns, the same code makes the right plot.
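A common way this happens is numbers stored as factors or text, which ggplot2 then treats as discrete. A minimal sketch with a hypothetical selected.df (the real data isn't shown) of how to check the column classes and convert them; note that calling as.numeric() directly on a factor returns the internal level codes, not the values:

```r
# Hypothetical data: numbers that ended up stored as factors
selected.df <- data.frame(x = c("10", "2", "3"), y = c("5", "1", "4"),
                          stringsAsFactors = TRUE)

print(sapply(selected.df, class))  # both columns are factors, not numeric

# Factor levels sort as text ("10" < "2" < "3"), so the codes differ from the values
wrong <- as.numeric(selected.df$x)                # level codes
right <- as.numeric(as.character(selected.df$x))  # actual values
print(wrong)
print(right)

# Convert every column via character, keeping the data frame structure
selected.df[] <- lapply(selected.df, function(col) as.numeric(as.character(col)))
str(selected.df)
```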

Trouble producing a polygon on top of a scatterplot using ggplot

Currently, I am trying to transition my graphical knowledge from the plot function in R to the ggplot function. I have begun constructing scatterplots and corresponding legends for a given data set; however, I now want to add geom_polygon layers to my plots using ggplot.
Specifically, I want to capture a triangular region from the origin of a scatterplot. For reproducibility, say I have the following data set:
rawdata<-data.frame(matrix(c(1,1,1,
2,1,-1,
3,-1,-1,
4,-1,1,
4,-2,2),5,3,byrow=TRUE))
names(rawdata)<-c("Town","x.coordinate","y.coordinate")
rawdata[,1]<-as.factor(rawdata[,1])
To construct a scatterplot along with a legend, I have been told to do the following:
p1<-ggplot(data=rawdata,aes(x=x.coordinate,y=y.coordinate,colour=Town,shape=Town)) +
theme_bw() + geom_point()
The result is a scatterplot with a combined colour and shape legend for Town.
What I want to do now is add a polygon. To do so, I have constructed a data frame, polygondata (not shown), with columns xa and ya, and I use it in the following geom_polygon call:
geom_polygon(data=polygondata,aes(x = xa, y = ya),colour="darkslategray2",
fill = "darkslategray2",alpha=0.25)
However, when I combine this with p1, I get the following error:
Error in eval(expr, envir, enclos) : object 'Town' not found
Through some experimentation, I have noticed that when I omit the shape argument from the ggplot call, I can easily produce the desired output. However, I wish to keep the shape mapping for aesthetics.
I also get a similar problem when I try to draw arrows connecting points on the scatterplot using ggplot. I will address that problem separately, as the root cause may be the same.
Add the following to polygondata:
polygondata$Town = NA
Even though you're not using that variable in geom_polygon, ggplot expects it to be there if that column is used for an aesthetic in the main call to ggplot.
Alternatively, I think you could avoid the error if you move the aesthetic mapping in the initial plot to geom_point rather than the main ggplot call, like this:
p1 <- ggplot(data=rawdata) +
theme_bw() +
geom_point(aes(x=x.coordinate, y=y.coordinate, colour=Town, shape=Town))
In that case, you wouldn't need to add a Town column to polygondata.
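A third option is to tell the polygon layer not to inherit the mappings from the main ggplot() call, using the layer argument inherit.aes = FALSE. A minimal sketch; the triangle in polygondata is a hypothetical stand-in, since the question only shows that it has columns xa and ya:

```r
library(ggplot2)

rawdata <- data.frame(matrix(c(1,1,1, 2,1,-1, 3,-1,-1, 4,-1,1, 4,-2,2), 5, 3, byrow = TRUE))
names(rawdata) <- c("Town", "x.coordinate", "y.coordinate")
rawdata$Town <- as.factor(rawdata$Town)

# Hypothetical triangle with one vertex at the origin
polygondata <- data.frame(xa = c(0, 1, 1), ya = c(0, 1, -1))

p1 <- ggplot(rawdata, aes(x = x.coordinate, y = y.coordinate, colour = Town, shape = Town)) +
  theme_bw() +
  geom_point() +
  # inherit.aes = FALSE: this layer uses only its own aes(), so it never looks for Town
  geom_polygon(data = polygondata, aes(x = xa, y = ya), inherit.aes = FALSE,
               colour = "darkslategray2", fill = "darkslategray2", alpha = 0.25)
print(p1)
```

This keeps the colour and shape mappings in the main call, so any further point layers still pick them up.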

R plotting a graph with different groups of data

I have a dataset:
a<-c(1,2,3,4,5,6,7,8,9,10)
b<-c(2,2,2,2,4,5,6,8,4,1)
c<-c("red","red","red","blue","blue","blue","orange","orange","orange","orange")
data<-data.frame(a=a,b=b,c=c)
I now want to plot the data on a graph with each group having a different colour:
plot(a[c=="red"],b[c=="red"],col="red",xlim=c(min(a),max(a)),ylim=c(min(b),max(b)))
points(a[c=="blue"],b[c=="blue"],col="blue")
points(a[c=="orange"],b[c=="orange"],col="orange")
This works fine - however, say if I have 30 groups, the task of writing the code becomes tedious. I am wondering if there is a better way of writing the code such that R will automatically plot the graph and give different colours to different groups?
Also, I wonder if there is a quick way to display a legend in the graph.
Thank you for all your help.
Try this:
with(data,plot(a,b,col=c))
The col argument of plot() sets the colour of the points. It accepts a vector with one colour per point, so a column of colour names can be passed directly.
Additionally, you don't have to make a column just to define the color if the color-group relationship is not that important. For example, you could make column c a more meaningful column like this:
a<-c(1,2,3,4,5,6,7,8,9,10)
b<-c(2,2,2,2,4,5,6,8,4,1)
c<-c(rep('Group1',3),rep('Group2',3),rep('Group3',4))
data<-data.frame(a=a,b=b,c=factor(c)) # store c as a factor; since R 4.0, data.frame() no longer converts strings to factors
Then to plot, use:
with(data,plot(a,b,col=c))
To add a legend:
legend('topleft',legend = levels(data[,'c']),col=1:nlevels(data[,'c']),pch=1)
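With many groups (the question mentions 30), colouring by factor codes runs into the default palette(), which has only 8 colours, so colours would repeat. A minimal sketch, assuming hypothetical data with 30 groups and using hcl.colors() to generate one colour per group:

```r
# Hypothetical data: 30 groups of 3 points each
set.seed(1)
n_groups <- 30
grp <- rep(paste0("Group", seq_len(n_groups)), each = 3)
data <- data.frame(a = seq_along(grp),
                   b = rnorm(length(grp)),
                   c = factor(grp, levels = unique(grp)))  # keep Group1, Group2, ... order

# One colour per level, then look up each point's colour by its level index
pal <- hcl.colors(nlevels(data$c))
point_cols <- pal[as.integer(data$c)]

plot(data$a, data$b, col = point_cols, pch = 19, xlab = "a", ylab = "b")
legend("topleft", legend = levels(data$c), col = pal, pch = 19, cex = 0.6, ncol = 3)
```

Indexing the palette by as.integer(data$c) guarantees that the legend entries and the points use the same colour for each group.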
Try ggplot2
library(ggplot2)
ggplot(data=data, aes(x=a, y=b, colour=c)) + geom_point()

R: creating a barplot depicting a percentage of a percentage in ggplot2

I'm having a lot of trouble using my current dataset to create the barplot I need. It seems straightforward enough, but I am getting an error whenever I run my code.
Some background information on my data set:
Percent_Calls is calculated by Call/(Call+Noise)
Percent_Total is calculated by (Call+Noise)/(sum(Call)+sum(Noise))
PercentofCall is calculated by Percent_Calls*Percent_Total
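The three definitions can be checked numerically. A minimal sketch with made-up counts (the real data set isn't shown); it also illustrates that the (Call+Noise) factors cancel, so PercentofCall is simply Call/(sum(Call)+sum(Noise)):

```r
# Hypothetical counts, for illustration only
df <- data.frame(Call = c(30, 10, 5), Noise = c(20, 15, 20))

total <- sum(df$Call) + sum(df$Noise)
df$Percent_Calls <- df$Call / (df$Call + df$Noise)
df$Percent_Total <- (df$Call + df$Noise) / total
df$PercentofCall <- df$Percent_Calls * df$Percent_Total

# Each row's calls as a share of all observations
print(all.equal(df$PercentofCall, df$Call / total))
print(df)
```

Because PercentofCall can never exceed Percent_Total, it can be drawn as a highlighted portion of each Percent_Total bar.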
I am trying to create a barplot (with percentages on the y axis) with CRF_Score as the x-variable and the Percent_Total values as the bars. Eventually, I would like to highlight the portion of PercentofCall in Percent_Total.
require(ggplot2)
ggplot(FD2_CAna, aes(CRF_Score, fill=Percent_Total)) + geom_bar(binwidth=0.05)
The above code usually works for me, however I am getting this error instead:
Error in unit(tic_pos.c, "mm") : 'x' and 'units' must have length > 0
I have tried using as.factor(x) as suggested in another thread, but the graph output is not what I need.
This is more along the lines of what I want, except that it was made in JMP.
Sorry for the long explanation, what am I doing wrong here?
To get a plot similar to the JMP one, you should use Percent_Total as the y values rather than as the fill= values, and then use stat="identity" in geom_bar().
In your JMP plot it seems that Percent_Total is treated as a factor and not as a numeric variable; you can see this by comparing the heights of the bars with values 23 and 2, which are almost the same. If the file FD2_CAna.csv is imported properly, the values are numeric.
FD2_CAna<-read.csv2(file="FD2_CAna.csv",header=T,sep=",",dec=".")
ggplot(FD2_CAna, aes(CRF_Score, Percent_Total)) + geom_bar(stat="identity")
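For the eventual goal of highlighting PercentofCall within Percent_Total, one option is to draw two bar layers on the same x positions. A minimal sketch, assuming a hypothetical stand-in for FD2_CAna with the column names from the question; geom_col() is the ggplot2 shorthand for geom_bar(stat="identity"):

```r
library(ggplot2)

# Hypothetical stand-in for FD2_CAna
FD2_CAna <- data.frame(CRF_Score     = c(0.1, 0.2, 0.3),
                       Percent_Total = c(0.50, 0.25, 0.25),
                       PercentofCall = c(0.30, 0.10, 0.05))

p <- ggplot(FD2_CAna, aes(x = CRF_Score)) +
  # full bars: bar heights taken from Percent_Total
  geom_col(aes(y = Percent_Total), fill = "grey70") +
  # drawn on top: the call share, which is always <= Percent_Total
  geom_col(aes(y = PercentofCall), fill = "steelblue") +
  scale_y_continuous(labels = scales::percent) +
  labs(x = "CRF_Score", y = "Percent of total")
print(p)
```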

Setting breakpoints for data with scale_fill_brewer() function in ggplot2

I am creating a map (choropleth) as described on the ggplot2 wiki. Everything works like a charm, except that I am running into an issue mapping a continuous value to the polygon fill color via the scale_fill_brewer() function.
This question describes the problem I'm having. As in the answer, my workaround has been to pre-cut my data into bins using the gtools quantcut() function:
UPDATE: This first example is actually the right way to do this
require(gtools) # needed for quantcut()
...
fill_factor <- quantcut(fill_continuous, q=seq(0,1,by=0.25))
ggplot(mydata) +
aes(long,lat,group=group,fill=fill_factor) +
geom_polygon() +
scale_fill_brewer(name="mybins", palette="PuOr")
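If you'd rather not depend on gtools, roughly the same quartile pre-cutting can be done in base R with cut() and quantile(). This is a swapped-in alternative to quantcut(), and fill_continuous here is hypothetical random data:

```r
# Hypothetical continuous values standing in for fill_continuous
set.seed(42)
fill_continuous <- runif(100, min = -1, max = 1)

# quantile() returns the 0%, 25%, 50%, 75% and 100% cut points;
# include.lowest = TRUE keeps the minimum value in the first bin
fill_factor <- cut(fill_continuous,
                   breaks = quantile(fill_continuous, probs = seq(0, 1, by = 0.25)),
                   include.lowest = TRUE)

print(table(fill_factor))  # four bins of roughly equal size
```

The resulting factor can be mapped to fill and used with scale_fill_brewer() exactly as fill_factor is in the example above.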
This works, however, I feel like I should be able to skip the step of pre-cutting my data and do something like this with the breaks option:
ggplot(mydata) +
aes(long,lat,group=group,fill=fill_continuous) +
geom_polygon() +
scale_fill_brewer(name="mybins", palette="PuOr", breaks=quantile(fill_continuous))
But this doesn't work. Instead I get an error something like:
Continuous variable (composite score) supplied to discrete scale_brewer.
Have I misunderstood the purpose of the "breaks" option? Or is breaks broken?
A major issue with pre-cutting continuous data is that there are three pieces of information used at different points in the code:
The Brewer palette -- determines the maximum number of colors available
The number of break points (or the bin width) -- has to be specified with the data
The actual data to be plotted -- influences the choice of the Brewer palette (sequential/diverging)
A true vicious circle. This can be broken by providing a function that accepts the data and the palette, automatically derives the number of break points and returns an object that can be added to the ggplot object. Something along the following lines:
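fill_brewer <- function(fill, palette) {
require(RColorBrewer) # brewer.pal.info
require(gtools)       # quantcut()
# maximum number of colours available in the chosen Brewer palette
n <- brewer.pal.info$maxcolors[palette == rownames(brewer.pal.info)]
# build the unevaluated call quantcut(<fill>, q=...); n cut points give n - 1 bins
discrete.fill <- call("quantcut", match.call()$fill, q=seq(0, 1, length.out=n))
list(
do.call(aes, list(fill=discrete.fill)),
scale_fill_brewer(palette=palette)
)
}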
Use it like this:
ggplot(mydata) + aes(long,lat,group=group) + geom_polygon() +
fill_brewer(fill=fill_continuous, palette="PuOr")
As Hadley explains, the breaks option moves the ticks but does not make continuous data discrete. Therefore, pre-cutting the data as in the first example in the question is the right way to use scale_fill_brewer().
