I have Test data as below;
Test
x y
1 4324.3329 484.6496
3 3258.4572 499.9621
4 4462.8230 562.7703
7 5173.4353 572.9492
8 4188.0244 530.8349
9 3557.5385 494.6672
10 2353.1382 517.5235
11 4944.2605 537.7489
15 3335.6628 488.4479
16 4059.0555 534.5479
17 4694.1778 531.7709
18 3213.8639 496.0062
19 4119.5348 516.3399
20 4267.7457 537.1041
22 4284.2706 503.8527
23 3019.6271 498.8519
35 2549.8743 503.5473
36 4976.5386 566.5985
37 2717.9942 513.2320
38 3545.2092 448.4752
40 3352.3206 457.7265
41 3198.0481 560.4075
42 1387.7531 395.7657
43 957.6421 296.1419
44 3168.8167 489.5333
45 2717.1015 478.6760
46 3694.8913 455.2763
47 4131.9760 519.9161
48 4366.2339 502.5977
49 4314.1003 486.7103
50 3818.1977 461.5844
52 3745.0532 467.7885
I add scatter plot as follows;
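library(ggplot2)
library(ggExtra) # provides ggMarginal()
gg <- ggplot(Test, aes(x = x, y = y))+
geom_point()+
stat_ellipse()
# ggMarginal() returns a new object; keep it, otherwise print(gg) shows the plot without the marginal boxplots
ggm <- ggMarginal(
gg,
type = "boxplot",
margins = "both",
size = 5
)
print(ggm)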
It seems like there are two groups:
(1) at the top right, with a large number of points
(2) at the bottom left, with two points.
In this case, how can I divide the data into two groups?
I have tried k-means clustering as follows;
#k-mean
km <- kmeans(Test,2)
library(cluster)
clusplot(Test, km$cluster, color=TRUE, shade=TRUE, labels=2, lines=0)
But, this changes x-y coordinates into PC1 & PC2, which is not what I want in this case.
For example,
set.seed(42)
km <- kmeans(Test,2)
ggplot(Test, aes(x = x, y = y,colour = factor(km$cluster)))+
geom_point()+
stat_ellipse(type = "norm", linetype = 2)
gives,
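One thing worth checking with the k-means call above: x spans thousands while y spans a few hundred, so the unscaled Euclidean distance is dominated by x. Below is a minimal sketch, assuming it is acceptable to standardise both columns with scale() before clustering. The value nstart = 25 is an arbitrary choice, and whether the two lower-left points end up alone in their own cluster still depends on the data.

```r
# Test data from the question; the first column holds the row names
Test <- read.table(text = "
id x y
1 4324.3329 484.6496
3 3258.4572 499.9621
4 4462.8230 562.7703
7 5173.4353 572.9492
8 4188.0244 530.8349
9 3557.5385 494.6672
10 2353.1382 517.5235
11 4944.2605 537.7489
15 3335.6628 488.4479
16 4059.0555 534.5479
17 4694.1778 531.7709
18 3213.8639 496.0062
19 4119.5348 516.3399
20 4267.7457 537.1041
22 4284.2706 503.8527
23 3019.6271 498.8519
35 2549.8743 503.5473
36 4976.5386 566.5985
37 2717.9942 513.2320
38 3545.2092 448.4752
40 3352.3206 457.7265
41 3198.0481 560.4075
42 1387.7531 395.7657
43 957.6421 296.1419
44 3168.8167 489.5333
45 2717.1015 478.6760
46 3694.8913 455.2763
47 4131.9760 519.9161
48 4366.2339 502.5977
49 4314.1003 486.7103
50 3818.1977 461.5844
52 3745.0532 467.7885
", header = TRUE, row.names = 1)

set.seed(42)
# Cluster on standardised columns so x and y contribute equally
km <- kmeans(scale(Test), centers = 2, nstart = 25)
Test$cluster <- factor(km$cluster)
table(Test$cluster)

# Plot in the original x-y coordinates, coloured by cluster
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(Test, aes(x = x, y = y, colour = cluster)) + geom_point()
  # print(p) draws the plot
}
```

Because the cluster labels are stored as a column of Test, the plot stays in x and y rather than in principal components.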
I have colored a graph with ggplot2 based on a threshold value of 1. Surface scores greater than 1
are colored azure and surface scores less than 1 are colored beige. Here is my sample code.
library(ggplot2)
setwd("F:/SUST_mutation/Graph_input")
d <- read.csv(file = "N.csv", sep = ",", header = TRUE)
ggplot(d, aes(x= Position,y= wild_Score)) + xlab("Positions") + ylab("Scores") +
geom_ribbon(aes(ymin=pmin(wild_Score,1), ymax=1), fill="beige", alpha= 1) +
geom_ribbon(aes(ymin=1, ymax=pmax(wild_Score,1)), fill="azure", alpha= 1)
My problem is that where the curve crosses from the upper surface to the lower surface, I expect the two surfaces to meet along a single line.
As the figure shows, they do not. Around the threshold line, the lower surface does not meet the upper surface; instead it creates some extra surface. For convenience, I have marked these portions with red circles.
extra surface on the negative portion close to threshold:
Position Wild_Score
4 1.048
5 1.052
6 1.016
7 0.996
8 0.97
9 0.951
10 0.971
11 1.047
12 1.036
13 1.051
14 1.124
15 1.172
16 1.172
17 1.164
18 1.145
19 1.186
20 1.197
21 1.197
22 1.216
23 1.193
24 1.216
25 1.216
26 1.262
Problem-2:
I have a data frame like following.
Position Score_1 Score_2
4 1.048 1.048
5 1.052 1.052
6 1.016 1.016
7 0.996 1.433
8 0.97 1.432
9 0.951 1.567
10 0.971 1.231
11 1.047 1.055
12 1.036 1.036
13 1.051 1.051
14 1.124 1.124
15 1.172 1.172
16 1.172 1.172
17 1.164 1.164
I plot the surface for Position vs Score_1 using a tibble, and draw a line graph on that surface for the same positions vs Score_2, like the following,
desired graph
As the line differs only at some points, I subsetted the main dataset (both columns and rows).
I get the following error.
"Error: Aesthetics must be either length 1 or the same as the data (13): x" I guess this is because I used two different data frames for the graphs.
here is my code:
d <- read.csv(file = "E.csv", sep = ",", header = TRUE)
d1 <- tibble::tibble(
x = seq(min(d$Position), max(d$Position), length.out = 1000),
y = approx(d$Position, d$Score_1, xout = x)$y
)
ggplot(d1, aes(x= x,y= y)) + xlab("Positions") + ylab("Scores") +
geom_ribbon(aes(ymin=pmin(y,1), ymax=1), fill="red", alpha= 1) +
geom_ribbon(aes(ymin=1, ymax=pmax(y,1)), fill="blue", alpha= 1) +
geom_line(aes(y=1)) + geom_line(d = d[c(3:10), c(1,3)],aes(y =
Score_2), color = "blue", size = 1)
I want to know what is causing the problem and how should I deal with it?
It's because the negative surface between, for example, rows 3 and 4 starts from 1 and goes to 0.996, instead of going from 1.016 to 0.996. There is relevant discussion, with other examples, on ggplot2's issue tracker.
This problem is typically only visible if the number of observations is small-ish, so the typical way people overcome this problem is to interpolate the data. You can find an example of that below (I've omitted your colours because it was hard to see):
library(ggplot2)
# txt <- "your_example_table" # Omitted for brevity
df <- read.table(text = txt, sep = "\t", header = TRUE)
data2 <- tibble::tibble(
x = seq(min(df$Position), max(df$Position), length.out = 1000),
y = approx(df$Position, df$Wild_Score, xout = x)$y
)
ggplot(data2, aes(x= x,y= y)) + xlab("Positions") + ylab("Scores") +
geom_ribbon(aes(ymin=pmin(y,1), ymax=1, fill = "A")) +
geom_ribbon(aes(ymin=1, ymax=pmax(y,1), fill = "B"))
This is great for hiding the problem, but calculating the exact line intersection points is a bit of a pain. I apologise for the self-promotion but I ran into this too and wrapped my solution for finding these line intersection points in a function on the dev version of my package ggh4x, which you might find useful.
library(ggh4x) # devtools::install_github("teunbrand/ggh4x")
ggplot(df, aes(x= Position,y= Wild_Score)) +
stat_difference(aes(ymin = 1, ymax = Wild_Score))
Created on 2021-08-15 by the reprex package (v1.0.0)
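For Problem-2, a likely cause (an assumption, since E.csv is not available) is that the geom_line() layer inherits x = x from the top-level aes(). The subset of d has no column called x, so ggplot2 picks up some other x of a different length. Below is a minimal sketch that names the layer's data and maps x explicitly. The data frame is typed in from the table in the question, and d2 is a name introduced here for illustration.

```r
library(ggplot2)

d <- read.table(text = "
Position Score_1 Score_2
4 1.048 1.048
5 1.052 1.052
6 1.016 1.016
7 0.996 1.433
8 0.97 1.432
9 0.951 1.567
10 0.971 1.231
11 1.047 1.055
12 1.036 1.036
13 1.051 1.051
14 1.124 1.124
15 1.172 1.172
16 1.172 1.172
17 1.164 1.164
", header = TRUE)

# Densely interpolated curve for the ribbons, as in the answer above
xs <- seq(min(d$Position), max(d$Position), length.out = 1000)
d1 <- data.frame(x = xs, y = approx(d$Position, d$Score_1, xout = xs)$y)

# Rows 3 to 10: where Score_2 departs from Score_1, plus neighbouring rows
d2 <- d[3:10, c("Position", "Score_2")]

p <- ggplot(d1, aes(x = x, y = y)) +
  xlab("Positions") + ylab("Scores") +
  geom_ribbon(aes(ymin = pmin(y, 1), ymax = 1), fill = "red") +
  geom_ribbon(aes(ymin = 1, ymax = pmax(y, 1)), fill = "blue") +
  geom_hline(yintercept = 1) +
  # This layer has its own data, so it gets its own x and y mappings
  geom_line(data = d2, aes(x = Position, y = Score_2), colour = "blue")
# print(p) draws the plot
```

Mapping x = Position inside the layer means the layer no longer depends on a column named x existing in d2.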
I am following the tutorial over here : https://www.rpubs.com/loveb/som . This tutorial shows how to use the Kohonen Network (also called SOM, a type of machine learning algorithm) on the iris data.
I ran this code from the tutorial:
library(kohonen) #fitting SOMs
library(ggplot2) #plots
library(GGally) #plots
library(RColorBrewer) #colors, using predefined palettes
iris_complete <-iris[complete.cases(iris),]
iris_unique <- unique(iris_complete) # Remove duplicates
#scale data
iris.sc = scale(iris_unique[, 1:4]) #Levels/Factors cannot be scaled... But used in predictive SOM:s using xyf. Later.
#build grid
iris.grid = somgrid(xdim = 10, ydim=10, topo="hexagonal", toroidal = TRUE)
set.seed(33) #for reproducibility
iris.som <- som(iris.sc, grid=iris.grid, rlen=700, alpha=c(0.05,0.01), keep.data = TRUE)
#plot 1
plot(iris.som, type="count")
#plot2
var <- 1 #define the variable to plot
plot(iris.som, type = "property", property = getCodes(iris.som)[,var], main=colnames(getCodes(iris.som))[var], palette.name=terrain.colors)
The above code fits a Kohonen Network on the iris data. Each observation from the data set is assigned to one of the "colorful circles" (also called "neurons") in the pictures below.
My question: In these plots, how would you identify which observations were assigned to which circles? Suppose I wanted to know which observations belong in the circles outlined by the black triangles below:
Is it possible to do this? Right now, I am trying to use iris.som$classif to somehow trace which points are in which circle. Is there a better way to do this?
UPDATE: @Jonny Phelps showed me how to identify observations within a triangular form (see answer below). But I am still not sure if it is possible to identify irregularly shaped forms. E.g.
In a previous post (Labelling Points on a Plot (R Language)), a user showed me how to assign arbitrary numbers to each circle on the grid:
Based on the above plot, how could you use the "som$classif" statement to find out which observations were in circles 92,91,82,81,72 and 71?
Thanks
EDIT: Now with Shiny App!
A plotly solution is also possible, where you can mouse over individual neurons to display the associated iris rownames (called id here). Based on your iris.som data and Jonny Phelps' grid approach, you can just assign the row numbers as concatenated strings to the individual neurons and have these shown upon mouseover:
library(ggplot2)
library(plotly)
ga <- data.frame(g=iris.som$unit.classif,
sample=seq_len(dim(iris.som$data[[1]])[1]))
grid_pts <- as.data.frame(iris.som$grid$pts)
grid_pts$column <- rep(1:iris.som$grid$xdim, times=iris.som$grid$ydim)
grid_pts$row <- rep(1:iris.som$grid$ydim, each=iris.som$grid$xdim)
grid_pts$classif <- 1:nrow(grid_pts)
grid_pts$id <- sapply(seq_along(grid_pts$classif),
function(x) paste(ga$sample[ga$g==x], collapse=", "))
grid_pts$count <- sapply(seq_along(grid_pts$classif),
function(x) length(ga$sample[ga$g==x]))
grid_pts$count <- factor(grid_pts$count, levels=0:max(grid_pts$count))
p1 <- ggplot(grid_pts, aes(x=x, y=y, colour=count, row=row, column=column, id=id)) +
geom_point(size=8) +
scale_colour_manual(values=c("grey50", heat.colors(length(unique(grid_pts$count))))) +
theme_void() +
theme(plot.margin=unit(c(1,rep(.3, 3)),"cm"))
ggplotly(p1)
Here is a full Shiny app that allows lasso selection and shows a table with the data:
invisible(suppressPackageStartupMessages(
lapply(c("shiny","dplyr","ggplot2", "plotly", "kohonen", "GGally", "DT"),
require, character.only=TRUE)))
iris_complete <- iris[complete.cases(iris),]
iris_unique <- unique(iris_complete) # Remove duplicates
#scale data
iris.sc = scale(iris_unique[, 1:4]) #Levels/Factors cannot be scaled... But used in predictive SOM:s using xyf. Later.
#build grid
iris.grid = somgrid(xdim = 10, ydim=10, topo="hexagonal", toroidal = TRUE)
set.seed(33) #for reproducibility
iris.som <- som(iris.sc, grid=iris.grid, rlen=700, alpha=c(0.05,0.01), keep.data = TRUE)
ga <- data.frame(g=iris.som$unit.classif,
sample=seq_len(dim(iris.som$data[[1]])[1]))
grid_pts <- as.data.frame(iris.som$grid$pts)
grid_pts$column <- rep(1:iris.som$grid$xdim, times=iris.som$grid$ydim)
grid_pts$row <- rep(1:iris.som$grid$ydim, each=iris.som$grid$xdim)
grid_pts$classif <- 1:nrow(grid_pts)
grid_pts$id <- sapply(seq_along(grid_pts$classif),
function(x) paste(ga$sample[ga$g==x], collapse=", "))
grid_pts$count <- sapply(seq_along(grid_pts$classif),
function(x) length(ga$sample[ga$g==x]))
grid_pts$count <- factor(grid_pts$count, levels=0:max(grid_pts$count))
# Shiny app, adapted from https://gist.github.com/dgrapov/128e3be71965bf00495768e47f0428b9
ui <- fluidPage(
fluidRow(
column(12, plotlyOutput("plot", height = "600px")),
column(12, DT::dataTableOutput('data_table'))
)
)
server <- function(input, output){
output$plot <- renderPlotly({
req(data())
p <- ggplot(data = data()$data,
aes(x=x, y=y, classif=classif, colour=count, row=row, column=column, id=id)) +
geom_point(size=8) +
scale_colour_manual(
values=c("grey50", heat.colors(length(unique(grid_pts$count))))
) +
theme_void() +
theme(plot.margin=unit(c(1, rep(.3, 3)), "cm"))
obj <- data()$sel
if(nrow(obj) != 0) {
p <- p + geom_point(data=obj, mapping=aes(x=x, y=y, classif=classif,
count=count, row=row, column=column, id=id), color="blue",
size=5, inherit.aes=FALSE)
}
ggplotly(p, source="p1") %>% layout(dragmode = "lasso")
})
selected <- reactive({
event_data("plotly_selected", source = "p1")
})
output$data_table <- DT::renderDataTable(
data()$sel, filter='top', options=list(
pageLength=5, autoWidth=TRUE
)
)
data <- reactive({
tmp <- grid_pts
sel <- tryCatch(filter(grid_pts, paste(x, y, sep="_") %in%
paste(selected()$x, selected()$y, sep="_")),
error=function(e){NULL})
list(data=tmp, sel=sel)
})
}
shinyApp(ui,server)
From what I can see, using iris.som$unit.classif & iris.som$grid is the way to go in isolating circles within the plotting grid. I have made an assumption that the classifier value matches the row index of iris.som$grid so this will need some more validation. Let me know if this helps your problem :)
findTriangle <- function(top_row, top_column, side_length, iris.som,
reverse=FALSE){
# top_row: row index of the top most triangle value
# top_column: column index...
# side_length: how many rows does the triangle occupy?
# iris.som: the som object
# reverse: set to TRUE to flip the triangle
# make the grid
grid_pts <- as.data.frame(iris.som$grid$pts)
grid_pts$column <- rep(1:iris.som$grid$xdim, times=iris.som$grid$ydim)
grid_pts$row <- rep(1:iris.som$grid$ydim, each=iris.som$grid$xdim)
grid_pts$classif <- 1:nrow(grid_pts)
# starting point - top most point of the triangle
# use reverse for triangles the other way around
grid_pts$triangle <- FALSE
grid_pts[grid_pts$column == top_column & grid_pts$row == top_row, ][["triangle"]] <- TRUE
# loop through the remaining rows and fill out the triangle
value_row <- top_row
value_start_column <- grid_pts[grid_pts$triangle == TRUE,]$x
value_end_column <- grid_pts[grid_pts$triangle == TRUE,]$x
if(reverse){
row_move <- -1
}else{
row_move <- 1
}
# update triangle
for(row in seq_len(side_length-1)){ # seq_len() also handles side_length = 1
value_row <- value_row + row_move
value_start_column <- value_start_column - 0.5
value_end_column <- value_end_column + 0.5
grid_pts[grid_pts$row == value_row &
grid_pts$x >= value_start_column &
grid_pts$x <= value_end_column, ]$triangle <- TRUE
}
# visualise
pl <- ggplot(grid_pts, aes(x=x, y=rev(row), col=as.factor(triangle))) +
geom_point(size=7) +
scale_color_manual(values=c("grey", "indianred")) +
theme_void()
print(pl)
return(grid_pts)
}
# take the grid and pick out the triangle
top_row <- 2
top_column <- 6
side_length <- 4
reverse <- FALSE # set to TRUE to flip the triangle ie go from the bottom
grid_pts <- findTriangle(top_row, top_column, side_length, iris.som, reverse)
# now add the classifier and merge to get the co-ordinates
iris.sc2 <- as.data.frame(iris.sc)
iris.sc2$classif <- iris.som$unit.classif
iris.sc2 <- merge(iris.sc2, grid_pts, by=c("classif"), all.x=TRUE)
# filter to the points in the triangle
iris.sc2[iris.sc2$triangle==TRUE,]
Output data:
classif Sepal.Length Sepal.Width Petal.Length Petal.Width x y column row triangle
21 16 -1.01537328 0.5506423 -1.3287735 -1.3042249 6.0 1.732051 6 2 TRUE
22 16 -1.01537328 0.3214643 -1.4419091 -1.3042249 6.0 1.732051 6 2 TRUE
39 25 -0.89501479 1.0089981 -1.3287735 -1.3042249 5.5 2.598076 5 3 TRUE
40 25 -0.77465630 1.0089981 -1.2722057 -1.3042249 5.5 2.598076 5 3 TRUE
41 25 -0.77465630 0.7798202 -1.3287735 -1.3042249 5.5 2.598076 5 3 TRUE
42 25 -1.01537328 0.7798202 -1.2722057 -1.3042249 5.5 2.598076 5 3 TRUE
43 25 -0.89501479 0.7798202 -1.2722057 -1.3042249 5.5 2.598076 5 3 TRUE
44 26 -0.89501479 0.5506423 -1.1590702 -0.9108454 6.5 2.598076 6 3 TRUE
45 26 -1.01537328 0.7798202 -1.2156380 -1.0419719 6.5 2.598076 6 3 TRUE
58 36 -0.53393933 0.7798202 -1.2722057 -1.0419719 6.0 3.464102 6 4 TRUE
59 36 -0.41358084 1.0089981 -1.3853413 -1.3042249 6.0 3.464102 6 4 TRUE
60 36 -0.53393933 0.7798202 -1.1590702 -1.3042249 6.0 3.464102 6 4 TRUE
61 37 -1.01537328 1.0089981 -1.2156380 -0.7797188 7.0 3.464102 7 4 TRUE
62 37 -1.01537328 1.0089981 -1.3853413 -1.1730984 7.0 3.464102 7 4 TRUE
63 37 -0.89501479 1.0089981 -1.3287735 -1.1730984 7.0 3.464102 7 4 TRUE
74 44 0.06785311 0.3214643 0.5945312 0.7937995 4.5 4.330127 4 5 TRUE
75 46 -0.65429782 1.4673539 -1.2722057 -1.3042249 6.5 4.330127 6 5 TRUE
76 46 -0.53393933 1.4673539 -1.2722057 -1.3042249 6.5 4.330127 6 5 TRUE
77 47 -0.89501479 1.6965319 -1.0459346 -1.0419719 7.5 4.330127 7 5 TRUE
78 47 -0.89501479 1.6965319 -1.2156380 -1.3042249 7.5 4.330127 7 5 TRUE
79 47 -0.89501479 1.4673539 -1.2722057 -1.0419719 7.5 4.330127 7 5 TRUE
80 47 -0.89501479 1.6965319 -1.2722057 -1.1730984 7.5 4.330127 7 5 TRUE
Validation plotting on the grid:
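For an irregular selection such as circles 92, 91, 82, 81, 72 and 71, a direct lookup may be enough. Below is a minimal sketch, assuming the numbers drawn on the circles are the unit indices stored in iris.som$unit.classif (worth validating, as noted above). obs_in_units is a hypothetical helper, and the short vector is a stand-in for the real classification.

```r
# Return, for each requested SOM unit, the row indices of the
# observations mapped to it (empty if no observation landed there).
obs_in_units <- function(unit_classif, units) {
  hits <- which(unit_classif %in% units)
  split(hits, factor(unit_classif[hits], levels = units))
}

# Stand-in for iris.som$unit.classif: one unit index per observation
unit_classif <- c(92, 5, 71, 92, 81, 13, 71, 40)

res <- obs_in_units(unit_classif, c(92, 91, 82, 81, 72, 71))
res
```

With the fitted model you would pass iris.som$unit.classif as the first argument; the returned indices can then be used to subset iris_unique.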
I worked through an example in my post "R, SOM, Kohonen Package, Outlier Detection" (not on the iris data set, but I suppose that is no problem) and added code snippets you might need. They show:
How to generate data, add outliers and depict them on plots
How to train the SOM
How to do the clustering
How to use hierarchic clustering to add the cluster boundaries to the SOM plots
Finally, I added the clusters predicted by SOM to compare them with the real clusters in which I generated the data
I think this answers your questions. It would also be nice to compare the performance of SOM with t-SNE. I have only used SOM as an experiment on the data I generated and on the real wine data set. It would also be nice to prepare heat maps if you have more than 2 variables. All the best with your analysis!
I would like to eliminate the gap between the x and y axes in barplot and extend the predicted line back to intersect the y axis, preferably in base R. Is this possible? Thank you for any advice or suggestions.
my.data <- read.table(text = '
band mid.point count
1 0.5 74
2 1.5 73
3 2.5 79
4 3.5 70
5 4.5 78
6 5.5 63
7 6.5 59
8 7.5 60
', header = TRUE)
my.data
x <- my.data$mid.point^2
my.model <- lm(count ~ x, data = my.data)
my.plot <- barplot(my.data$count, ylim=c(0,100), space=0, col=NA)
axis(1, at=my.plot+0.5, labels=my.data$band)
lines(predict(my.model, data.frame(x=x), type="resp"), col="black", lwd = 1.5)
EDIT November 26, 2014
I just realized the two plots are not the same (the plot in the original post and the plot in my answer below). Compare the two curved lines closely, particularly at the right-side of the plot. Clearly the two curved lines intersect the top of the 8th bar in different locations. However, I have not yet had time to figure out why the plots differ.
Here is one way to extrapolate the predicted line back to the y axis. I incorporate rawr's suggestion regarding eliminating the gap between the y axis and the x axis.
setwd('c:/users/markm/simple R programs/')
jpeg(filename = "barplot_and_line.jpeg")
my.data <- read.table(text = '
band mid.point count
1 0.5 74
2 1.5 73
3 2.5 79
4 3.5 70
5 4.5 78
6 5.5 63
7 6.5 59
8 7.5 60
', header = TRUE)
x <- my.data$mid.point^2
my.model <- lm(count ~ x, data = my.data)
z <- seq(0,8,0.01)
y <- my.model$coef[1] + my.model$coef[2] * z^2
barplot(my.data$count, ylim=c(0,100), space=0, col=NA, xaxs = 'i')
points(z, y, type='l', col=1)
dev.off()
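A possible explanation for the difference noted in the edit above: when lines() is given a single vector, it plots the values against their index 1, 2, ..., 8. With space = 0 the bars are centred at 0.5, 1.5, ..., 7.5, so the curve in the original post is shifted half a bar to the right. Below is a minimal sketch that passes the x positions explicitly. pdf(NULL) is only there so the code runs without opening a window, and bar.mids and fitted.at.mids are names introduced here.

```r
my.data <- data.frame(band      = 1:8,
                      mid.point = seq(0.5, 7.5, by = 1),
                      count     = c(74, 73, 79, 70, 78, 63, 59, 60))
my.data$x <- my.data$mid.point^2
my.model <- lm(count ~ x, data = my.data)

pdf(NULL)  # null device; remove this line to see the plot on screen
# barplot() returns the x positions of the bar midpoints
bar.mids <- barplot(my.data$count, ylim = c(0, 100), space = 0,
                    col = NA, xaxs = "i")

# Fitted values at the midpoints, drawn at the midpoints (not at 1:8)
fitted.at.mids <- predict(my.model,
                          newdata = data.frame(x = my.data$mid.point^2))
lines(bar.mids, fitted.at.mids, lty = 2)

# Smooth curve extended back to the y axis at z = 0
z <- seq(0, 8, by = 0.01)
lines(z, predict(my.model, newdata = data.frame(x = z^2)))
dev.off()
```

Drawn this way, the dashed segments through the midpoints and the smooth curve should lie on top of each other.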
I am plotting a graph using the following piece of code:
library (ggplot2)
png (filename = "graph.png")
stats <- read.table("processed-r.dat", header=T, sep=",")
attach (stats)
stats <- stats[order(best), ]
sp <- stats$A / stats$B
index <- seq (1, sum (sp >= 1.0))
stats <- data.frame (x=index, y=sp[sp>=1.0])
ggplot (data=stats, aes (x=x, y=y, group=1)) + geom_line()
dev.off ()
1 - How can one add a vertical line to the plot at the point where the curve reaches a particular value of y (for example 2)?
2 - How can one make the y-axis start at 0.5 instead of 1?
You can add a vertical line with geom_vline(). In your case:
+ geom_vline(xintercept=2)
If you also want to see the number 0.5 on your y axis, add scale_y_continuous() and set limits= and breaks=
+ scale_y_continuous(breaks=c(0.5,1,2,3,4,5),limits=c(0.5,6))
Regarding the first question:
This answer is assuming that the value of Y you desire is specifically within your data set. First, let's create a reproducible example as I cannot access your data set:
set.seed(9999)
stats <- data.frame(y = sort(rbeta(250, 1, 10)*10 ,decreasing = TRUE), x = 1:250)
ggplot(data=stats, aes (x=x, y=y, group=1)) + geom_line()
What you need to do is to use the y column in your data frame to search for the specific value. Essentially you will need to use
ggplot(data=stats, aes (x=x, y=y, group=1)) + geom_line() +
geom_vline(xintercept = stats[stats$y == 2, "x"])
Using the data I generated above, here's an example. Since my data frame does not likely contain the exact value 2, I will use the trunc function to search for it:
stats[trunc(stats$y) == 2, ]
# y x
# 9 2.972736 9
# 10 2.941141 10
# 11 2.865942 11
# 12 2.746600 12
# 13 2.741729 13
# 14 2.693501 14
# 15 2.680031 15
# 16 2.648504 16
# 17 2.417008 17
# 18 2.404882 18
# 19 2.370218 19
# 20 2.336434 20
# 21 2.303528 21
# 22 2.301500 22
# 23 2.272696 23
# 24 2.191114 24
# 25 2.136638 25
# 26 2.067315 26
Now we know where all the values that truncate to 2 are. Since y is decreasing, the last of these rows is the one closest to 2:
tail(stats[trunc(stats$y) == 2, ], 1)
# y x
# 26 2.067315 26
And we can use that value to specify where the x intercept should be:
ggplot(data=stats, aes (x=x, y=y, group=1)) + geom_line() +
geom_vline(xintercept = rev(stats[trunc(stats$y) == 2, "x"])[1])
Hope that helps!
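If the exact value is not in the data, an alternative to trunc() is to take the row whose y is closest to the target. Below is a minimal sketch with a small made-up data frame; the values are illustrative and not from the question, and target, closest and x_at_target are names introduced here.

```r
# Illustrative decreasing data; the target value 2 is not present exactly
stats <- data.frame(x = 1:6, y = c(5.0, 3.2, 2.4, 2.07, 1.8, 1.0))

target <- 2
closest <- which.min(abs(stats$y - target))  # row index nearest to target
x_at_target <- stats$x[closest]
x_at_target

# Use it as the position of the vertical line (needs ggplot2)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(stats, aes(x = x, y = y)) + geom_line() +
    geom_vline(xintercept = x_at_target, linetype = 2)
  # print(p) draws the plot
}
```

Unlike the trunc() approach, this does not rely on the curve being monotone, although with several crossings it returns only the first of the closest points.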
I'm trying to fit regression lines to this relation angc~ext. Variable pch divides the data into two sets to each of which I want to fit a regression line
with its confidence intervals. Here's my data frame (C):
"ext" "angc" "pch"
25 3.76288002820208 0
29 4.44255895177431 0
21 2.45214044383301 0
35 4.01334352881766 0
35 9.86225452423762 0
28 19.9304126868056 1
32 25.6984064030981 1
20 5.10582966112880 0
36 5.75603291081328 0
11 4.62311785943305 0
33 4.94401591414043 0
27 8.10039123328465 0
29 16.3882499757369 1
30 29.3492784626796 1
29 3.85960848290140 0
32 5.35857680326963 0
26 4.86451443776053 0
16 8.22008387344697 0
30 10.2212259432413 0
32 17.2519440101067 1
29 27.5011256290209 1
My code:
c0 <- C[C$pch == 0, ]
c1 <- C[C$pch == 1, ]
prd0 <- as.data.frame( predict( lm(c0$angc ~ c0$ext), interval = c("confidence") ) )
prd1 <- as.data.frame( predict( lm(c1$angc ~ c1$ext), interval = c("confidence") ) )
dev.new()
plot( C$angc ~ C$ext, type = 'n' )
points( c0$angc ~ c0$ext, pch = 17 ) # triangles
abline(lm(c0$angc ~ c0$ext)) # regression line
lines(prd0$lwr) # lower CI
lines(prd0$upr) # upper CI
points( c1$angc ~ c1$ext, pch = 1 ) # circles
abline(lm(c1$angc ~ c1$ext))
lines(prd1$lwr, type = 'l', lty = 3 )
lines(prd1$upr, type = 'l', lty = 3 )
I have two problems:
How can I get the desired regression line for the circles? It should be an almost vertical line (check c1)
I don't get correct confidence intervals
Thank you for your help,
Santi
In ggplot2 you can do this rather efficiently:
ggplot(C, aes(x = ext, y = angc, shape = factor(pch))) + geom_point() +
geom_smooth(method = "lm")
This will create a scatterplot (geom_point()) of angc vs ext, where the shape of the points is based on pch. pch is wrapped in factor() because the shape aesthetic needs a discrete variable, and a numeric pch would raise an error. In addition, a regression line with its 95% confidence band (se = TRUE is the default) is drawn for each unique value of pch. The name geom_smooth() comes from the fact that it draws a smoothed version of the data, in this case a linear regression.
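The confidence bands in the base R code from the question are drawn against the row index, because lines() receives only the y values. Below is a minimal sketch of one way to fix that in base graphics: fit with data =, predict on a sorted grid of ext values, and pass that grid as x. The data are the question's, rounded to two decimals here for brevity. add_fit is a hypothetical helper, and pdf(NULL) only keeps the code from opening a window.

```r
C <- data.frame(
  ext  = c(25, 29, 21, 35, 35, 28, 32, 20, 36, 11, 33,
           27, 29, 30, 29, 32, 26, 16, 30, 32, 29),
  angc = c(3.76, 4.44, 2.45, 4.01, 9.86, 19.93, 25.70, 5.11, 5.76, 4.62, 4.94,
           8.10, 16.39, 29.35, 3.86, 5.36, 4.86, 8.22, 10.22, 17.25, 27.50),
  pch  = c(0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0,
           0, 1, 1, 0, 0, 0, 0, 0, 1, 1)
)

# Fit one group, draw the line and its confidence band on the current
# plot, and return the predictions over a sorted grid of ext values.
add_fit <- function(dat, ...) {
  fit  <- lm(angc ~ ext, data = dat)
  grid <- data.frame(ext = seq(min(dat$ext), max(dat$ext), length.out = 50))
  prd  <- predict(fit, newdata = grid, interval = "confidence")
  lines(grid$ext, prd[, "fit"], ...)
  lines(grid$ext, prd[, "lwr"], lty = 3)  # lower confidence limit
  lines(grid$ext, prd[, "upr"], lty = 3)  # upper confidence limit
  invisible(cbind(grid, prd))
}

pdf(NULL)  # null device; remove this line to see the plot on screen
plot(angc ~ ext, data = C, type = "n")
points(angc ~ ext, data = C[C$pch == 0, ], pch = 17)  # triangles
points(angc ~ ext, data = C[C$pch == 1, ], pch = 1)   # circles
prd0 <- add_fit(C[C$pch == 0, ])
prd1 <- add_fit(C[C$pch == 1, ])
dev.off()
```

Note that a least-squares fit of angc on ext always has a finite slope, so the line for the circles will not come out vertical however the points are arranged.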