R plotting a graph with different groups of data

I have a dataset:
a<-c(1,2,3,4,5,6,7,8,9,10)
b<-c(2,2,2,2,4,5,6,8,4,1)
c<-c("red","red","red","blue","blue","blue","orange","orange","orange","orange")
data<-data.frame(a=a,b=b,c=c)
I now want to plot the data on a graph with each group having a different colour:
plot(a[c=="red"],b[c=="red"],col="red",xlim=c(min(a),max(a)),ylim=c(min(b),max(b)))
points(a[c=="blue"],b[c=="blue"],col="blue")
points(a[c=="orange"],b[c=="orange"],col="orange")
This works fine. However, if I have, say, 30 groups, writing the code this way becomes tedious. Is there a better way to write it, so that R automatically plots the graph and gives a different colour to each group?
Also, I wonder if there is a quick way to display a legend in the graph.
Thank you for all your help.

Try this:
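with(data,plot(a,b,col=as.character(c)))
The col argument of plot() sets the point colour and accepts a vector with one colour per point, so the colour names stored in column c can be used directly. The as.character() call matters in R versions before 4.0, where data.frame() converts strings to factors by default and col would then receive the integer level codes instead of the names.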
Additionally, you don't have to make a column just to define the color if the color-group relationship is not that important. For example, you could make column c a more meaningful column like this:
a<-c(1,2,3,4,5,6,7,8,9,10)
b<-c(2,2,2,2,4,5,6,8,4,1)
c<-c(rep('Group1',3),rep('Group2',3),rep('Group3',4))
data<-data.frame(a=a,b=b,c=factor(c))  # factor, so col= and levels() below work
Then to plot, use:
with(data,plot(a,b,col=c))
To add a legend:
legend('topleft',legend = levels(data[,'c']),col=1:nlevels(data[,'c']),pch=1)
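For many groups, the same idea can be written so that both the point colours and the legend are derived from the factor levels. This is a minimal sketch, not the only way to do it: grp, pal and point_cols are names introduced here for illustration, and rainbow() is just one possible palette.

```r
# Same data as above, with the group stored as a factor.
a <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
b <- c(2, 2, 2, 2, 4, 5, 6, 8, 4, 1)
grp <- factor(c(rep("Group1", 3), rep("Group2", 3), rep("Group3", 4)))

# One colour per factor level; any palette of the right length would do.
pal <- rainbow(nlevels(grp))

# Look up each point's colour through the integer level code of its group.
point_cols <- pal[as.integer(grp)]

plot(a, b, col = point_cols, pch = 19)
legend("topleft", legend = levels(grp), col = pal, pch = 19)
```

Because pal is indexed by level, the legend entries and the point colours stay in sync however many groups there are.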

Try ggplot2
library(ggplot2)
ggplot(data=data, aes(x=a, y=b, colour=c)) + geom_point()
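Mapping colour=c makes ggplot2 draw the legend automatically, but it picks its own default palette, not the names in the column. If the column holds literal colour names, as in the question, one option is scale_colour_identity(). A minimal sketch, assuming that is the behaviour wanted:

```r
library(ggplot2)

# Data from the question; column c holds literal colour names.
data <- data.frame(
  a = 1:10,
  b = c(2, 2, 2, 2, 4, 5, 6, 8, 4, 1),
  c = c(rep("red", 3), rep("blue", 3), rep("orange", 4))
)

# scale_colour_identity() uses the values of c as the colours themselves;
# guide = "legend" asks for a legend, which this scale hides by default.
p <- ggplot(data, aes(x = a, y = b, colour = c)) +
  geom_point() +
  scale_colour_identity(guide = "legend")

print(p)
```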

Related

How to create histogram plot in ggplot2 without data frame?

I am plotting two histograms in R by using the following code.
x1<-rnorm(100)
x2<-rnorm(50)
h1<-hist(x1)
h2<-hist(x2)
plot(h1, col=rgb(0,0,1,.25), xlim=c(-4,4), ylim=c(0,0.6), main="", xlab="Index", ylab="Percent",freq = FALSE)
plot(h2, col=rgb(1,0,0,.25), xlim=c(-4,4), ylim=c(0,0.6), main="", xlab="Index", ylab="Percent",freq = FALSE,add=TRUE)
legend("topright", c("H1", "H2"), fill=c(rgb(0,0,1,.25),rgb(1,0,0,.25)))
The code produces the following output.
I need a better-looking version of the above plot, and I want to use ggplot2. I am looking for something like this (see Change fill colors section). However, I think ggplot2 only works with data frames, and I do not have a data frame in this case. How can I create a good-looking histogram plot in ggplot2? Please let me know. Thanks in advance.
You can (and should) put your data into a data.frame if you want to use ggplot. Ideally for ggplot, the data.frame should be in long format. Here's a simple example:
library(ggplot2)
df1 = rbind(data.frame(grp='x1', x=x1), data.frame(grp='x2', x=x2))
ggplot(df1, aes(x, fill=grp)) +
  geom_histogram(color='black', alpha=0.5)
There are lots of options to change the appearance however you like. If you want the histograms stacked or grouped, or shown as percent versus count, or as densities etc., you will find many resources in previous questions showing how to implement each of those options.
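As one illustration of those options, the sketch below overlays the two histograms instead of stacking them, and plots densities instead of counts, which is closer to the base-R plot in the question. The bin count and the seed are arbitrary choices made for this example.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed, only to make the example reproducible
x1 <- rnorm(100)
x2 <- rnorm(50)

# Long format: one row per observation, with a column naming the group.
df1 <- rbind(data.frame(grp = "x1", x = x1),
             data.frame(grp = "x2", x = x2))

# position = "identity" overlays the groups (the default stacks them);
# after_stat(density) rescales each group's histogram to a density.
p <- ggplot(df1, aes(x, fill = grp)) +
  geom_histogram(aes(y = after_stat(density)),
                 position = "identity", bins = 20,
                 color = "black", alpha = 0.5)

print(p)
```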

How to prevent geom_text_repel from labeling points on scatter plot with default number ordering list?

My dataset looks like this:
I'm trying to create a simple scatter plot with data labels that are names (first and last name).
I used geom_text_repel in ggrepel to create data labels, but the labels on the plot are just numbers in the order of the data points in my dataset.
For example, if you look at the first datapoint, instead of the label being "Stephen Curry" it is "1"
I have no idea why this is happening and I can't find anyone else who even has my problem, let alone a solution.
Code:
ggplot(gravity,
aes(TS., USG., label = rownames(gravity))) +
geom_point(aes(TS., USG.), color='black') +
geom_text_repel(aes(TS., USG., label = rownames(gravity)))
The image above shows the plot created by the code. As you can see, the labels are just the ordering numbers instead of the names. I don't see why this is happening, considering those ordering numbers are not part of the dataset I imported.
Thanks in advance
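A likely explanation is that an imported data frame gets the default row names "1", "2", ..., so rownames(gravity) returns exactly those numbers. The sketch below assumes the names live in an ordinary column, hypothetically called Player here; the actual column name in the dataset is not shown, and the numeric values are made up for illustration.

```r
library(ggplot2)
library(ggrepel)

# Hypothetical stand-in for the imported data (made-up values);
# `Player` is an assumed column name.
gravity <- data.frame(
  Player = c("Stephen Curry", "Player B", "Player C"),
  TS.    = c(0.65, 0.58, 0.61),
  USG.   = c(30.1, 25.4, 28.7)
)

# Default row names are just the row numbers as strings.
print(rownames(gravity))

# Map the label aesthetic to the name column, not to rownames().
p <- ggplot(gravity, aes(TS., USG., label = Player)) +
  geom_point(color = "black") +
  geom_text_repel()

print(p)
```

The aesthetics set in ggplot() are inherited by both layers, so they do not need to be repeated in geom_point() or geom_text_repel().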

Trouble producing discrete legend using ggplot for a scatterplot

I am fairly new to the ggplot function in R. Currently, I am struggling to produce a legend for a given data set that I have constructed by hand. For simplicity, suppose this was my data set:
rawdata<-data.frame(matrix(c(1,1,1,
2,1,-1,
3,-1,-1,
4,-1,1
4,-2,2),5,3,byrow=TRUE))
names(rawdata)<-c("Town","x-coordinate","y-coordinate")
rawdata[,1]<-as.factor(rawdata[,1])
Now, using ggplot, I am trying to figure out how to produce a legend on a scatterplot. So far I have done the following:
p1<-ggplot(data=rawdata,aes(x=x.coordinate,y=y.coordinate,fill=rawdata[,1]))
+geom_point(data=rawdata,aes(x=x.coordinate,y=y.coordinate))
I produce the following using the above code,
As you can see, the coordinates have been plotted and the legend has been constructed, but they are only colored black.
I learned that to color coordinates, I would have needed to use the argument colour=rawdata[,1] in the geom_point function to color in points. However, when I try this, I get the following error code:
Error: Aesthetics must be either length 1 or the same as the data (4): colour
I understand that this has something to do with the length of the vector, but as of right now, I have absolutely no idea how to tackle this small problem.
With the default point shape, geom_point() is coloured through the colour aesthetic, not fill (fill only affects the filled shapes 21 to 25). And, having passed the data into ggplot(data = ..), there's no need to pass it into geom_point() again.
I've also fixed an error in the creation of your df in your example.
rawdata<-data.frame(matrix(c(1,1,1,2,1,-1,3,-1,-1,4,-1,1,4,-2,2),5,3,byrow=TRUE))
names(rawdata)<-c("Town","x.coordinate","y.coordinate")
rawdata[,1]<-as.factor(rawdata[,1])
library(ggplot2)
ggplot(data=rawdata,aes(x=x.coordinate,y=y.coordinate,colour=Town)) +
  geom_point()

Change contour colours using directlabels

I'm fairly new to ggplot2, and I'm trying to create a contour plot of data that has missing values. Because there are missing values, I can't have the contours by themselves, so I'm combining a tile background with a contour. The problem is that the labels are the same colour as the background.
Suppose I have data like so:
DF1 <- data.frame(x=rep(1:3,3),y=rep(1:3,each=3),z=c(1,2,3,2,3,4,3,NA,NA))
I can make a plot like this:
require(ggplot2); require(directlabels)
plotDF <- ggplot(DF1,aes(x,y,z=z)) + geom_tile(aes(fill=z)) + stat_contour(aes(x,y,z=z,colour= ..level..),colour="white")
direct.label(plotDF)
This gives me a plot similar to what I want but I'd like to be able to change the colours of the labels to be black. Any ideas?
I spotted a similar post and thought this would be easy, something along the lines of direct.label(p, list("last.points", colour = "black")). Unfortunately, I could not make it work; I believe this is not directly supported.
I then decided to use black magic and managed to do the trick by manually overriding the colour scale:
direct.label(plotDF +
scale_colour_gradient(low="black", high="black"))

How to use ggplot in R to create a single graphic comprised of a bar plot and a line plot?

I am new to using qplot and ggplot, and basically want to make a figure that is just the combination of a bar plot and a line plot. I can do one or the other, but don't know how to do both at once!
Here is my data:
bulk = data.frame(x_pos=c(1,2,3,4,5,6,7,8),
y_line=c(3,7,6,8,14,16,18,12),
y_bar=c(0,0,10,0,0,0,10,0))
For a line graph, I just do qplot(x_pos, y_line, data=bulk, geom="line")
For a bar plot, I just do qplot(x_pos, y_bar, data=bulk)
But! How can I combine these at once into a single figure?? My real intention is to use several (maybe 6-10) different graphics techniques like this to generate complex figures, but it all starts with knowing how to do two at once. Thanks for any help!
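Don't use qplot for this. Build the plot with ggplot() and add one layer per geom:
library(ggplot2)
ggplot(bulk, aes(x=x_pos)) +
  geom_bar(aes(y=y_bar), stat="identity") +
  geom_line(aes(y=y_line), color="red", linewidth=2)  # linewidth replaces size for lines in ggplot2 >= 3.4.0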
