Can characters be graphed in a histogram in R? - r

So I have a data frame, which I will call DF. It looks something like this:
zep SEX AGE BMI
1 O F 3.416667 16.00000
2 O F 3.833333 14.87937
3 O G 3.416667 14.80223
4 O F 4.000000 15.09656
5 N G 3.666667 16.50000
6 O G 4.000000 16.49102
7 N G 3.916667 16.02413
With this data frame I want to draw multiple plots comparing different aspects, like how gender affects BMI. Like so:
par(mfrow=c(1,3))
boxplot(DF$BMI ~ DF$zep)
boxplot(DF$BMI ~ DF$SEX)
boxplot(DF$BMI ~ DF$AGE)
But for some reason the columns are stored as characters instead of factors.
Now I pose this: is there a way to plot these if they are characters? If not, what can I do?
Also, is there a way to change zep and SEX into logical vectors? Maybe like in zep if O then true (1), if not then false (0), and the same thing for SEX: if G then true (1), if not then false (0).

I have to plot categorical variables for my advanced data analysis class, so I can help you out. beed stands for border entry and employment data; don't steal my research plz.
The code I use to create factors is shown below as an example. I have a column called portname that holds dummy variables, and I use it to create a column with the names as labels. This is also how I would make the logical you describe; I've added that code to the larger code chunk below.
beed$portdisc <- as.numeric(beed$portname)
beed$portdisc[beed$portdisc==0] <- "Columbus Port of Entry"
beed$portdisc[beed$portdisc==1] <- "Santa Teresa Port of Entry"
beed$portdisc[beed$portdisc==2] <- "New Mexico All Ports Aggregate"
So what I've done here is take my dataframe beed and use the specific column containing my portname variables. I add a new column to my dataframe called beed$portdisc, then use [ ] to define which values get which label.
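The same relabelling can probably be done in one step with factor() and its levels and labels arguments. A minimal sketch, assuming portname holds the codes 0, 1, and 2; the small beed data frame here is a made-up stand-in, not the real data:

```r
# Hypothetical stand-in for the real beed data: port codes 0, 1, 2.
beed <- data.frame(portname = c(0, 1, 2, 1, 0))

# Map each code to its label in a single call; the result is a factor.
beed$portdisc <- factor(
  beed$portname,
  levels = c(0, 1, 2),
  labels = c("Columbus Port of Entry",
             "Santa Teresa Port of Entry",
             "New Mexico All Ports Aggregate")
)

# Count how many rows fall under each label.
print(table(beed$portdisc))
```

Because the labels are attached to explicit levels, a code that is missing from the data still shows up as an empty level instead of silently disappearing.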
In your case I think this should work (think, but I've tested by using the data you provided).
I have a hard time making the labels come out right with discrete variables. My apologies but this gets you very close.
library(ggplot2)
DF$SEX.factor <- as.character(DF$SEX)
DF$SEX.factor[DF$SEX.factor== "G"] <- "0"
DF$SEX.factor[DF$SEX.factor== "F"] <- "1"
DF$SEX.factor <- as.factor(DF$SEX.factor)
# geom_bar() needs a data frame as data; bar width is set with width (binwidth is for histograms).
bar <- ggplot(DF, aes(x = SEX.factor))
bar <- bar + geom_bar(width = 0.5) + xlab("Sex")
# Relabel the two factor levels; this assumes G (coded "0") is male and F (coded "1") is female.
bar <- bar + scale_x_discrete(labels = c("0" = "Male", "1" = "Female"))
bar
# DF.BMI5 = cut(DF$BMI,pretty(DF$BMI,5)) # Creates close to 5 ranges as factors; pretty() automatically chooses round break points.
# This would be good to compare, say, age and BMI; it works best with one discrete and one continuous variable.
p <- ggplot(DF, aes(x = SEX.factor, y = BMI))
p <- p + geom_boxplot(width = 0.25, alpha = 0.4)
p <- p + geom_jitter(position = position_jitter(width = 0.1), alpha = .35, color = "blue")
# diamond at mean for each group
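p <- p + stat_summary(fun = mean, geom = "point", shape = 18, size = 6,
colour = "red", alpha = 0.8) # fun replaces the deprecated fun.y as of ggplot2 3.3.0
# Relabel the factor levels; this assumes G (coded "0") is male and F (coded "1") is female.
p <- p + scale_x_discrete(labels = c("0" = "Male", "1" = "Female")) + xlab("Sex")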
p
Here is what I got when I ran this code on my own data. I think this is what you're looking to create; I've included the code above. It'll work with anything where x is a discrete variable: just use the as.factor() function on x and keep y as a continuous type.
If you need any more help just let me know, I like to help out people on here because it helps me hone my R skills. I'm more of a Visual Studio kind of guy, VBA is my friend.
Hope this helps!

If you ever need to change a character to a factor, you can always use as.factor('A'), for instance.
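To tie the pieces together, here is a small base R sketch using the seven rows from the question. It assumes the columns arrive as character; the new column names (zep_is_O, sex_is_G, sex_is_G_num) are made up for illustration. It converts the characters to factors, builds the logical and 0/1 versions the question asks about, and draws two of the boxplots:

```r
# Rebuild the sample data from the question, with character columns.
DF <- data.frame(
  zep = c("O", "O", "O", "O", "N", "O", "N"),
  SEX = c("F", "F", "G", "F", "G", "G", "G"),
  AGE = c(3.416667, 3.833333, 3.416667, 4.000000, 3.666667, 4.000000, 3.916667),
  BMI = c(16.00000, 14.87937, 14.80223, 15.09656, 16.50000, 16.49102, 16.02413),
  stringsAsFactors = FALSE
)

# Character -> factor.
DF$zep <- as.factor(DF$zep)
DF$SEX <- factor(DF$SEX, levels = c("F", "G"))

# Logical indicators: TRUE where the condition holds.
DF$zep_is_O <- DF$zep == "O"
DF$sex_is_G <- DF$SEX == "G"

# as.integer() turns TRUE/FALSE into 1/0 if numbers are preferred.
DF$sex_is_G_num <- as.integer(DF$sex_is_G)

# Boxplots of BMI by each grouping variable, side by side.
par(mfrow = c(1, 2))
boxplot(BMI ~ zep, data = DF, main = "BMI by zep")
boxplot(BMI ~ SEX, data = DF, main = "BMI by SEX")
```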

Related

ggplot2 does not plot multiple groups of a variable, only plots one line

I would like to make a plot with multiple lines corresponding to different groups of variable "Prob" (0.1, 0.5 and 0.9) using ggplot. However, when I run the code, it only plots one line instead of 3. Thanks for the help :)
Here my code:
Prob <- c(0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9,0.9)
nit <- c(0.9,0.902777775,0.90555555,0.908333325,0.9111111,0.913888875,0.91666665,0.919444425,0.9222222,0.924999975,0.92777775,0.930555525,0.9333333,0.936111075,0.93888885,0.941666625,0.9444444,0.947222175,0.94999995,0.952777725,0.9555555,0.958333275,0.96111105,0.963888825,0.9666666,0.969444375,0.97222215,0.974999925,0.9777777,0.980555475,0.98333325,0.986111025,0.9888888,0.991666575,0.99444435,0.997222125,0.9999999,0.9,0.902777775,0.90555555,0.908333325,0.9111111,0.913888875,0.91666665,0.919444425,0.9222222,0.924999975,0.92777775,0.930555525,0.9333333,0.936111075,0.93888885,0.941666625,0.9444444,0.947222175,0.94999995,0.952777725,0.9555555,0.958333275,0.96111105,0.963888825,0.9666666,0.969444375,0.97222215,0.974999925,0.9777777,0.980555475,0.98333325,0.986111025,0.9888888,0.991666575,0.99444435,0.997222125,0.9999999,0.9,0.902777775,0.90555555,0.908333325,0.9111111,0.913888875,0.91666665,0.919444425,0.9222222,0.924999975,0.92777775,0.930555525,0.9333333,0.936111075,0.93888885,0.941666625,0.9444444,0.947222175,0.94999995,0.952777725,0.9555555,0.958333275,0.96111105,0.963888825,0.9666666,0.969444375,0.97222215,0.974999925,0.9777777,0.980555475,0.98333325,0.986111025,0.9888888,0.991666575,0.99444435,0.997222125,0.9999999)
greek <- log((1-Prob)/Prob)/-10
italian <- ((0.997-nit)/(0.997-0.97))^3
Temp<-c(rep(25,111))
GT <- ((30-Temp)/(30-3.3))^3
GH <- 1-GT-italian
acid <- (-1*(((sign(GH)*(abs(GH)^(1/3)))*(7-5))-7))
Species<-c(rep("Case",111))
data <- as.data.frame(cbind(Prob,greek,GT,GH,italian, Temp,acid,nit, Species))
ggplot() +
geom_line(data = data, aes_string(x = acid, y = nit, group = Prob, color = factor(Prob)), size = 0.8)
The answer seems to be kind of two parts:
In your data frame data, the columns that should be numeric are not numeric.
The reason why you only see one line.
Fixing the Data Frame and Using aes() in place of aes_string()
I noticed something was odd when you had as.data.frame(cbind(... to make your data frame and are using aes_string(.. within the ggplot portion. If you do a quick check on data via str(data), you'll see all of your columns in data are characters, whereas in the environment the data prepared in the code for their respective columns are numeric. Ex. acid is numeric, yet data$acid is a character.
The reason for this is that you're binding the columns into a data frame by using as.data.frame(cbind(.... cbind() builds a matrix, and a matrix can hold only one type, so the character column Species forces every column to character and you lose the numeric nature of the data. This is also why you have to use aes_string(...) to make it work instead of aes(). To bind vectors together into a data frame, use data.frame(..., not as.data.frame(cbind(....
To fix all this, bind your columns together like this + the ggplot code:
data <- data.frame(Prob,greek,GT,GH,italian, Temp,acid,nit, Species)
# data <- as.data.frame(cbind(Prob,greek,GT,GH,italian, Temp,acid,nit, Species))
ggplot() +
geom_line(data=data, aes(x = acid, y = nit, group = Prob, color = factor(Prob)), size = 0.8)
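A quick way to see the coercion for yourself is to compare the column classes from both constructions. This is a small sketch with made-up vectors (num and lab are illustrative names, not from the question):

```r
num <- c(0.1, 0.5, 0.9)   # numeric vector
lab <- rep("Case", 3)     # character vector

# cbind() first builds a character matrix, so num is converted to text.
bad <- as.data.frame(cbind(num, lab), stringsAsFactors = FALSE)

# data.frame() keeps each column's own type.
good <- data.frame(num, lab)

print(sapply(bad, class))
print(sapply(good, class))

# If the data frame already exists, the column can be converted back.
bad$num <- as.numeric(bad$num)
```

str(bad) and str(good) show the same difference in a more verbose form.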
Why is there only one line?
The simple answer to why you only see one line is that the line for each of the values of data$Prob is identical. What you see is the effect of overplotting: the line for data$Prob == 0.1 is the same line as for data$Prob == 0.5 and data$Prob == 0.9.
To demonstrate this, let's separate each. I'm going to do this realizing that Prob could be created by repeating 0.1, 0.5, and 0.9 each 37 times in a row. I'll create a vector that I'll use as a multiplication factor for data$nit, which will result in separating out our lines:
my_factor <- rep(c(1,1.1,1.5), each=37) # our multiplication factor
data$nit <- data$nit * my_factor # new nit column
# same plot code
ggplot() +
geom_line(data=data, aes(x = acid, y = nit, group = Prob, color = factor(Prob)), size = 0.8)
There ya go. We have all lines there, you just could not see them due to overplotting. You can convince yourself of this without the multiplication business and the original data by comparing the plots for each data$Prob:
# use original dataset as above
ggplot() +
geom_line(data=data, aes(x = acid, y = nit, group = Prob, color = factor(Prob)), size = 0.8) +
facet_wrap(~Prob)
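The overplotting can also be checked numerically instead of visually. The sketch below rebuilds the plotted columns from the question, using rep() and seq() as shorthand for the long vectors (an assumption about how they were generated), and tests whether acid and nit are identical across the three Prob groups:

```r
# Shorthand reconstruction of the question's inputs (assumed equivalent).
Prob <- rep(c(0.1, 0.5, 0.9), each = 37)
nit  <- rep(seq(0.9, 0.9999999, length.out = 37), times = 3)
Temp <- rep(25, 111)

# Same formulas as in the question.
italian <- ((0.997 - nit) / (0.997 - 0.97))^3
GT <- ((30 - Temp) / (30 - 3.3))^3
GH <- 1 - GT - italian
acid <- -1 * ((sign(GH) * abs(GH)^(1/3)) * (7 - 5) - 7)

data <- data.frame(Prob, acid, nit)

# Split x and y by group and compare every group with the first one.
acid_by_prob <- split(data$acid, data$Prob)
nit_by_prob  <- split(data$nit, data$Prob)
same_acid <- sapply(acid_by_prob, identical, acid_by_prob[[1]])
same_nit  <- sapply(nit_by_prob, identical, nit_by_prob[[1]])

print(same_acid)
print(same_nit)
```

Neither acid nor nit is computed from Prob (only greek is, and it is not plotted), which is why the three groups trace exactly the same path.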

How to fix this in sorting

Hello stackoverflow community, I have a question regarding coding for ggplot. Here is my code, data format and output at the moment and below is my question.
Data format:
ID time var1 var2 var3
a 1 2 3 4
a 5 6 7 8
b 9 11 12 13
b 14 15 16 17
c . . . .
c . . . .
and so forth
Code:
gg1 <- ggplot() + geom_line(aes(x=TIME, y=Var1, col="red"), FILE) +
geom_line(aes(x=TIME, y=Var2, col="blue"), FILE) +
geom_point(aes(x=TIME, y=Var3), Model_20160806) + facet_wrap( ~ ID)+
xlab("Time (Hr)") + ylab("Concentration (ng/ml)") + ggtitle("x")
I have been struggling in making the plots in the right format and any help would be very much appreciated.
1. As you can see, col="red"/"blue" is displayed in the legend rather than used as the color. Is there a way to fix it?
2. How do I add legends for Var1, Var2, Var3 at the bottom of the output?
3. I have tried adding facet_wrap( ~ ID, ncol=3) to the code, but it doesn't work and returns a null. Is there a way to fix this?
4. Since there are a lot of cell samples, is there a way to spread the graphs over multiple pages so they are visible and interpretable?
5. Lastly, for better visualization of the transfection data, I tried using gg1+theme_bw(), but this does not work.
Without a reproducible example it is difficult to help you with these questions.
1. aes(..., col="blue") doesn't do what you expect. Inside aes(), values are treated as data to be mapped through a color scale, so the string "blue" becomes a legend entry instead of the line color. If you have a grouping variable in the dataframe, use that to define color. If you want everything to be just blue, define color outside of aes().
2. Something like scale_colour_manual(values=c("red","green","blue")). Possible duplicate of the question "Add legend to ggplot2 line plot".
3. Could you explain what you want to do with facet_wrap( ~ ID, ncol=3)?
4. Yes, that is possible. The easiest way to make multiple graphs is by splitting your x into groups of 10.
5. Again a reason why you need a reproducible example. The short answer is, theme_bw() works for me and I have no clue why it wouldn't work for you.
For example:
library(ggplot2)
data("diamonds") # diamonds ships with ggplot2, so library(car) is not needed
ggplot(diamonds, aes(x = carat, y = cut, color = color)) +
geom_point() +
theme_bw()
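For the colour and legend questions, one common pattern is to reshape the data to long format so that the series name becomes a column, map colour to that column, and then choose the colours with scale_colour_manual(). This is only a sketch: the FILE data frame below is invented to match the layout described in the question, and the column names series and value are hypothetical. The plot is skipped if ggplot2 is not installed.

```r
# Invented data in the wide layout described in the question.
FILE <- data.frame(
  ID   = rep(c("a", "b"), each = 3),
  TIME = rep(c(1, 5, 9), times = 2),
  Var1 = c(2, 6, 11, 3, 7, 12),
  Var2 = c(3, 7, 12, 4, 8, 13)
)

# Stack Var1 and Var2 into long format with base R.
long <- data.frame(
  ID     = rep(FILE$ID, times = 2),
  TIME   = rep(FILE$TIME, times = 2),
  series = rep(c("Var1", "Var2"), each = nrow(FILE)),
  value  = c(FILE$Var1, FILE$Var2)
)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  gg1 <- ggplot(long, aes(x = TIME, y = value, colour = series)) +
    geom_line() +
    facet_wrap(~ ID, ncol = 3) +
    # Colours are assigned to the series names here, not inside aes().
    scale_colour_manual(values = c(Var1 = "red", Var2 = "blue")) +
    xlab("Time (Hr)") + ylab("Concentration (ng/ml)") +
    theme_bw() +
    theme(legend.position = "bottom")
  print(gg1)
}
```

With the series as a column, the legend is generated automatically and theme(legend.position = "bottom") moves it under the panels.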
Edit: to give an example of splitting the dataframe into groups of 10:
# Example data
df = data.frame(x = factor(rep(1:30, each = 10)), y1 = rnorm(300), y2 = rnorm(300))
# Assume that df$x is the grouping variable consisting of too many groups
# Levels 1-10 of df$x become 0, levels 11-20 become 1, etc.
df$x2 = (as.numeric(df$x) - 1) %/% 10
# Split the dataframe based on this new grouping variable df$x2
dfSplit = split(df, df$x2)
# Loop over dfSplit
for (i in seq_along(dfSplit)) {
  dfForPlotting = dfSplit[[i]]
  # Inside a loop, a ggplot object must be print()ed explicitly to be drawn
  print(ggplot(data = dfForPlotting, aes(x = y1, y = y2, color = x)) + geom_line())
}
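To get the groups onto separate pages, one option is to open a PDF device and draw one page per chunk. A rough sketch in base graphics, with invented data (30 IDs, 5 time points each) and a temporary output file; the names ids, chunk, pages and out_file are all made up for the example:

```r
set.seed(1)

# Invented data: 30 IDs with 5 time points each.
ids <- sprintf("id%02d", 1:30)
df <- data.frame(
  ID    = rep(ids, each = 5),
  time  = rep(1:5, times = 30),
  value = rnorm(150)
)

# Assign IDs 1-10 to chunk 0, 11-20 to chunk 1, 21-30 to chunk 2.
chunk <- (match(df$ID, ids) - 1) %/% 10
pages <- split(df, chunk)

# Every chunk becomes one page with a 2 x 5 grid of panels.
out_file <- tempfile(fileext = ".pdf")
pdf(out_file, width = 11, height = 6)
for (page in pages) {
  par(mfrow = c(2, 5))
  for (id in unique(page$ID)) {
    one <- page[page$ID == id, ]
    plot(one$time, one$value, type = "l", main = id,
         xlab = "Time", ylab = "Value")
  }
}
invisible(dev.off())
```

The same loop should work with ggplot2 by replacing the inner plotting code with print() on a faceted plot of each chunk.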
Regarding question 2, the easiest way to do this is using the grid package and grid.text().
library(grid)
par(mar=c(6.5, 2, 2, 2))
plot(1:10,1:10)
grid.text(x=0.2, y = 0.05, "Var1 = Birds, Var2 = Bees")

Geom_tile using a scale fill gradient within each level rather than entire data set

I am recently exploring R and I am trying to do a geom_tile plot in ggplot where rather than set up the scale of high to low between all range of values, a comparison is made at each level of a variable for 4 factor levels.
An example of the data is:
data <- read.table(text = "
cam k1 k2 k3
n1 342232.6 112964 56589.85
n2 159472.8 54713.9 29480.88
n3 102048.4 38358.95 23376.48
n4 75924.33 32455.58 22504.05
", sep = "", header = TRUE)
I have written this simple code to get a colour map that uses the entire range of values.
library(reshape2)
library(ggplot2)
datam <- melt(data)
p <- ggplot(datam, aes(cam,variable))
(p + geom_tile(aes(fill=value), colour = "white") +
scale_fill_gradient(low="green",high="red"))
However, I would like to get a gradient fill scale that compares the range of values (i.e. 4 values) between factors cam but within each level of k. Basically to highlight the lowest value at each level of k.
I have many scenarios to plot, so a facet for each level of k is not an option.
Any suggestions would be highly appreciated. Thanks
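One possible approach (a sketch, not a tested answer from this thread) is to rescale value to the 0-1 range within each level of k and map the fill to the rescaled column. ave() applies a function group by group; the column name rescaled is made up here, and the long format is built with base R instead of melt() so the example has fewer dependencies. The plot is skipped if ggplot2 is not installed.

```r
data <- read.table(text = "
cam k1 k2 k3
n1 342232.6 112964 56589.85
n2 159472.8 54713.9 29480.88
n3 102048.4 38358.95 23376.48
n4 75924.33 32455.58 22504.05
", header = TRUE)

# Long format: one row per (cam, k) combination.
datam <- data.frame(
  cam      = rep(as.character(data$cam), times = 3),
  variable = rep(c("k1", "k2", "k3"), each = nrow(data)),
  value    = c(data$k1, data$k2, data$k3)
)

# Min-max rescale within each level of k: 0 = lowest, 1 = highest in that k.
datam$rescaled <- ave(datam$value, datam$variable,
                      FUN = function(v) (v - min(v)) / (max(v) - min(v)))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(datam, aes(cam, variable)) +
    geom_tile(aes(fill = rescaled), colour = "white") +
    scale_fill_gradient(low = "green", high = "red", name = "Within-k scale")
  print(p)
}
```

The legend then shows relative position within each k, not the raw values, so the raw numbers may need to be added as text labels if they matter.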

Visualize critical values / pairwise comparisons from posthoc Tukey in R

I'm trying to get a fine-grain visualisation of critical values I got from posthoc Tukey. There are some good guidelines out there for visualizing pairwise comparisons, but I need something more refined. The idea is that I would have a plot where each small square would represent a critical value from the matrix below, coded in such manner that:
if the value is higher or equal to 5.45 - it's a black square;
if the value is lower or equal to -5.45 - it's a gray square;
if the value is between -5.45 and 5.45 - it's a white square.
The data matrix is here.
Or maybe you would have better suggestion how to visualize those critical values?
EDIT: Following comments from @Aaron and @DWin I want to provide a bit more context for the above data and justification for my question. I am looking at the mean ratings of acceptability for seven virtual characters, each of them is animated on 5 different levels. So, I have two factors there - character (7 levels) and motion (5 levels). Because I have found interaction between those two factors, I decided to look at differences between the means for all the characters for all levels of motion, which resulted in this massive matrix, as an output of posthoc Tukey. It's probably too much detail now, but please don't throw me out to Cross Validated, they will eat me alive...
This is fairly straightforward with image:
d <- as.matrix(read.table("http://dl.dropbox.com/u/2505196/postH.dat"))
image(x=1:35, y=1:35, as.matrix(d), breaks=c(min(d), -5.45, 5.45, max(d)),
col=c("grey", "white", "black"))
For just half, set half to missing with d[upper.tri(d)] <- NA and add na.rm=TRUE to the min and max functions.
Here is a ggplot2 solution. I'm sure there are simpler ways to accomplish this -- I guess I got carried away!
library(ggplot2)
library(reshape2) # provides melt()
# Load data.
postH = read.table("~/Downloads/postH.dat")
names(postH) = paste("item", 1:35, sep="") # add column names.
postH$item_id_x = paste("item", 1:35, sep="") # add id column.
# Convert data.frame to long form.
data_long = melt(postH, id.vars="item_id_x", variable.name="item_id_y")
# Convert to factor, controlling the order of the factor levels.
data_long$item_id_y = factor(as.character(data_long$item_id_y),
levels=paste("item", 1:35, sep=""))
data_long$item_id_x = factor(as.character(data_long$item_id_x),
levels=paste("item", 1:35, sep=""))
# Create critical value labels in a new column.
data_long$critical_level = ifelse(data_long$value >= 5.45, "high",
ifelse(data_long$value <= -5.45, "low", "middle"))
# Convert to labels to factor, controlling the order of the factor levels.
data_long$critical_level = factor(data_long$critical_level,
levels=c("high", "middle", "low"))
# Named vector for ggplot's scale_fill_manual
critical_level_colors = c(high="black", middle="grey80", low="white")
# Calculate grid line positions manually.
x_grid_lines = seq(0.5, length(levels(data_long$item_id_x)), 1)
y_grid_lines = seq(0.5, length(levels(data_long$item_id_y)), 1)
# Create plot.
plot_1 = ggplot(data_long, aes(xmin=as.integer(item_id_x) - 0.5,
xmax=as.integer(item_id_x) + 0.5,
ymin=as.integer(item_id_y) - 0.5,
ymax=as.integer(item_id_y) + 0.5,
fill=critical_level)) +
theme_bw() +
theme(panel.grid.minor=element_blank(), panel.grid.major=element_blank()) + # theme() and element_*() replace the defunct opts() and theme_*()
coord_cartesian(xlim=c(min(x_grid_lines), max(x_grid_lines)),
ylim=c(min(y_grid_lines), max(y_grid_lines))) +
scale_x_continuous(breaks=seq(1, length(levels(data_long$item_id_x))),
labels=levels(data_long$item_id_x)) +
scale_y_continuous(breaks=seq(1, length(levels(data_long$item_id_y))),
labels=levels(data_long$item_id_y)) +
scale_fill_manual(name="Critical Values", values=critical_level_colors) +
geom_rect() +
geom_hline(yintercept=y_grid_lines, colour="grey40", size=0.15) +
geom_vline(xintercept=x_grid_lines, colour="grey40", size=0.15) +
theme(axis.text.y=element_text(size=9)) +
theme(axis.text.x=element_text(size=9, angle=90)) +
ggtitle("Critical Values Matrix")
# Save to pdf file.
pdf("plot_1.pdf", height=8.5, width=8.5)
print(plot_1)
dev.off()
If you set this up with findInterval as an index into the bg, col, and/or pch arguments (although they are all squares at the moment), you should find the code fairly compact and understandable.
You'll need to get the data in long format first; here's one way:
d <- as.matrix(read.table("http://dl.dropbox.com/u/2505196/postH.dat"))
dat <- within(as.data.frame(as.table(d)),
{ Var1 <- as.numeric(Var1)
Var2 <- as.numeric(Var2) })
Then the code is as follows; pch=22 uses filled squares, bg sets the fill color of the square, col sets the border color, and cex=1.5 just makes them a little bigger than the default.
plot(dat$Var1, dat$Var2,
bg = c("grey", "white", "black")[1+findInterval(dat$Freq, c(-5.45,5.45))],
col="white", cex=1.5, pch = 22)
You need the 1+ in there because the values would be 0,1,2 and your indices need to start with 1.
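To see what the index looks like, here is a tiny sketch with made-up values around the two thresholds (vals, idx and cols are illustrative names):

```r
# Made-up values on both sides of, and exactly on, the thresholds.
vals <- c(-7.2, -5.45, 0, 5.44, 5.45, 9.1)

# findInterval() returns 0 below the first break, 1 between the breaks,
# and 2 at or above the second break.
idx <- findInterval(vals, c(-5.45, 5.45))

# Shift by 1 so the result can index a colour vector.
cols <- c("grey", "white", "black")[1 + idx]

print(data.frame(vals, idx, cols))
```

Note that the intervals are closed on the left by default, so a value of exactly -5.45 lands in the middle (white) group here, while exactly 5.45 lands in the top (black) group. If the lower boundary must count as grey, the breaks need adjusting.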
To close this out: I used most of the suggestions from @DWin and @Aaron to create the plot below. The lightest level of gray stands for non-significant values. I also used rect to create lines above the axis names to better differentiate between conditions:
d <- as.matrix(read.table("http://dl.dropbox.com/u/2505196/postH.dat"))
#remove upper half of the values (as they are mirrored values)
d[upper.tri(d)] <- NA
dat <- within(as.data.frame(as.table(d)),{
Var1 <- as.numeric(Var1)
Var2 <- as.numeric(Var2)})
par(mar=c(6,3,3,6))
colPh=c("gray50","gray90","black")
plot(dat$Var1,dat$Var2,bg = colPh[1+findInterval(dat$Freq, c(-5.45,5.45))],
col="white",cex=1.2,pch = 21,axes=F,xlab="",ylab="")
labDis <- rep(c("A","B","C","D","E"),times=7)
labChar <- c(1:7)
axis(1,at=1:35,labels=labDis,cex.axis=0.5,tick=F,line=-1.4)
axis(1,at=seq(3,33,5),labels=labChar, tick=F)
#drawing lines above axis for better identification
rect(1,0,5,0,angle=90);rect(6,0,10,0,angle=90);rect(11,0,15,0,angle=90);
rect(16,0,20,0,angle=90);rect(21,0,25,0,angle=90);rect(26,0,30,0,angle=90);
rect(31,0,35,0,angle=90)
axis(4,at=1:35,labels=labDis,cex.axis=0.5,tick=F,line=-1.4)
axis(4,at=seq(3,33,5),labels=labChar,tick=F)
#drawing lines above axis for better identification
rect(36,1,36,5,angle=90);rect(36,6,36,10,angle=90);rect(36,11,36,15,angle=90);
rect(36,16,36,20,angle=90);rect(36,21,36,25,angle=90);rect(36,26,36,30,angle=90);
rect(36,31,36,35,angle=90)
legend("topleft",legend=c("not significant","p<0.01","p<0.05"),pch=16,
col=c("gray90","gray50","black"),cex=0.7,bty="n")

How to give color to each class in scatter plot in R?

In a dataset, I want to take two attributes and create supervised scatter plot. Does anyone know how to give different color to each class ?
I am trying to use col == c("red","blue","yellow") in the plot command but not sure if it is right as if I include one more color, that color also comes in the scatter plot even though I have only 3 classes.
Thanks
Here is a solution using traditional graphics (and Dirk's data):
> DF <- data.frame(x=1:10, y=rnorm(10)+5, z=sample(letters[1:3], 10, replace=TRUE))
> DF
x y z
1 1 6.628380 c
2 2 6.403279 b
3 3 6.708716 a
4 4 7.011677 c
5 5 6.363794 a
6 6 5.912945 b
7 7 2.996335 a
8 8 5.242786 c
9 9 4.455582 c
10 10 4.362427 a
> attach(DF); plot(x, y, col=c("red","blue","green")[z]); detach(DF)
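This relies on the fact that DF$z is a factor, so when subsetting by it, its values will be treated as integers. (Since R 4.0.0, data.frame() no longer converts strings to factors by default, so create z with factor(...) or pass stringsAsFactors=TRUE; a character z would index the color vector by name and return NA.) So the elements of the color vector will vary with z as follows: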
> c("red","blue","green")[DF$z]
[1] "green" "blue" "red" "green" "red" "blue" "red" "green" "green" "red"
You can add a legend using the legend function:
legend(x="topright", legend = levels(DF$z), col=c("red","blue","green"), pch=1)
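A variation that does not depend on the integer codes of a factor is to give the colour vector names and look the colours up by class label. A minimal sketch with simulated data; palette_by_class and point_cols are made-up names:

```r
set.seed(1)

# Simulated data with three classes a, b, c.
DF <- data.frame(
  x = 1:10,
  y = rnorm(10) + 5,
  z = factor(sample(letters[1:3], 10, replace = TRUE), levels = c("a", "b", "c"))
)

# Named colour vector: each class label maps to one colour.
palette_by_class <- c(a = "red", b = "blue", c = "green")

# Look up each point's colour by its class label (works for factor or character z).
point_cols <- unname(palette_by_class[as.character(DF$z)])

plot(DF$x, DF$y, col = point_cols, pch = 19)
legend("topright", legend = names(palette_by_class),
       col = palette_by_class, pch = 19)
```

Because the lookup is by name, the colours stay attached to the right classes even if the factor levels are reordered or one class is absent.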
Here is an example that I built based on this page.
library(e1071); library(ggplot2)
mysvm <- svm(Species ~ ., iris)
Predicted <- predict(mysvm, iris)
mydf = cbind(iris, Predicted)
qplot(Petal.Length, Petal.Width, colour = Species, shape = Predicted,
data = mydf) # use mydf, which holds the Predicted column
This gives you the output. You can easily spot the misclassified species from this figure.
One way is to use the lattice package and xyplot():
R> DF <- data.frame(x=1:10, y=rnorm(10)+5,
+    z=sample(letters[1:3], 10, replace=TRUE))
R> DF
x y z
1 1 3.91191 c
2 2 4.57506 a
3 3 3.16771 b
4 4 5.37539 c
5 5 4.99113 c
6 6 5.41421 a
7 7 6.68071 b
8 8 5.58991 c
9 9 5.03851 a
10 10 4.59293 b
R> library(lattice)
R> with(DF, xyplot(y ~ x, group=z))
By giving explicit grouping information via variable z, you obtain different colors. You can specify colors etc, see the lattice documentation.
Because z here is a factor variable for which we obtain the levels (== numeric indices), you can also do
R> with(DF, plot(x, y, col=z))
but that is less transparent (to me, at least :) than xyplot() et al.
Here is how I do it in 2018. Who knows, maybe an R newbie will see it one day and fall in love with ggplot2.
library(ggplot2)
ggplot(data = iris, aes(Petal.Length, Petal.Width, color = Species)) +
geom_point() +
scale_color_manual(values = c("setosa" = "red", "versicolor" = "blue", "virginica" = "yellow"))
If you have the classes separated in a data frame or a matrix, then you can use matplot. For example, if we have
dat<-as.data.frame(cbind(c(1,2,5,7),c(2.1,4.2,-0.5,1),c(9,3,6,2.718)))
plot.new()
plot.window(c(0,nrow(dat)),range(dat))
matplot(dat,col=c("red","blue","yellow"),pch=20)
Then you'll get a scatterplot where the first column of dat is plotted in red, the second in blue, and the third in yellow. Of course, if you want separate x and y values for your color classes, then you can have datx and daty, etc.
An alternate approach would be to tack on an extra column specifying what color you want (or keeping an extra vector of colors, filling it iteratively with a for loop and some if branches). For example, this will get you the same plot:
dat<-as.data.frame(
cbind(c(1,2,5,7,2.1,4.2,-0.5,1,9,3,6,2.718)
,c(rep("red",4),rep("blue",4),rep("yellow",4))))
dat[,1]=as.numeric(dat[,1]) #This is necessary because
#the second column consisting of strings confuses R
#into thinking that the first column must consist of strings, too
plot(dat[,1],pch=20,col=dat[,2])
Assuming the class variable is z, you can use:
with(df, plot(x, y, col = z))
however, it's important that z is a factor variable, as R internally stores factors as integers.
This way, 1 is 'black', 2 is 'red', 3 is 'green', ....
This article is old, but I spent a hot minute trying to figure this out so I figured I would post an updated response. My main source is this wonderful PowerPoint: http://www.lrdc.pitt.edu/maplelab/slides/14-Plotting.pdf. Okay, here's what I did:
In this example, my data set is called 'Data' and I was comparing 'Touch' data against 'Gaze' data. The subjects were divided into two groups: 'Red' and 'Blue'.
plot(Data$Touch[Data$Category == "Blue"], Data$Gaze[Data$Category == "Blue"], main = "Touch v Gaze", xlab = "Touch (s)", ylab = "Gaze (s)", col = "blue", pch = 20)
This set of code creates a scatterplot of Touch v Gaze of my Blue group
par(new = TRUE)
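This tells R not to clear the existing plot before the next plotting call, so the second plot is drawn on top of the first when you run all the code together. Note that each plot() call picks its own axis ranges, so pass the same xlim and ylim to both calls if the two groups need to share a scale.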
plot(Data$Touch[Data$Category == "Red"], Data$Gaze[Data$Category == "Red"], axes = FALSE, xlab = "", ylab = "", col = "red", pch = 2)
This is the second plot. I found when I was coding these that R didn't just lay the data points over the Blue plot; it also drew the axes, axis titles, and main title again.
To get rid of the annoying overlap problem, I used the axes argument to suppress the axes themselves and set the titles to be blank.
legend(x = 60, y = 50, legend = c("Blue", "Red"), col = c("blue", "red"), pch = c(20, 2))
Adding a pretty legend to round out the project
This way may be a bit longer than the pretty ggplots but I did not want to learn something completely new today, hope this helps someone!
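For anyone who wants to stay in base graphics but avoid par(new = TRUE), a possible alternative is to set up the axes once from the full data and then add each group with points(). A small sketch with simulated values; the Data frame below is invented and only mimics the columns described above:

```r
set.seed(42)

# Invented stand-in for the real data set.
Data <- data.frame(
  Touch    = runif(20, 0, 60),
  Gaze     = runif(20, 0, 50),
  Category = rep(c("Blue", "Red"), each = 10)
)

blue <- Data[Data$Category == "Blue", ]
red  <- Data[Data$Category == "Red", ]

# type = "n" draws axes sized to all the data but plots no points yet.
plot(Data$Touch, Data$Gaze, type = "n", main = "Touch v Gaze",
     xlab = "Touch (s)", ylab = "Gaze (s)")

# Add each group to the same coordinate system.
points(blue$Touch, blue$Gaze, col = "blue", pch = 20)
points(red$Touch, red$Gaze, col = "red", pch = 2)

legend("topright", legend = c("Blue", "Red"),
       col = c("blue", "red"), pch = c(20, 2))
```

Since both groups are drawn against one set of axes, there is no duplicated title to suppress and no risk of the two layers ending up on different scales.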
