R: How to plot letters of the alphabet?

I want a scatter plot whose points form letters of the alphabet. How can I do this programmatically? I could enter coordinates by hand to make the plot look like an 'A' or an 'S' or whatever, but can it be done more easily?

The pch argument of plot accepts values that can represent these characters. According to ?points, values 32:127 give the ASCII character set.
Within that range, values 65:90 correspond to capital letters and values 97:122 to lower-case letters.
To illustrate this, try
plot(1:10, 1:10, type="p", pch=97:107)
for example.
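?points also allows pch to be a single character instead of a numeric code, so letters can be passed directly. A minimal sketch that writes a word with one letter per point (the word "HELLO" is just an illustration):

```r
# Split a word into single characters; each one becomes the symbol for one point
word <- strsplit("HELLO", "")[[1]]

# pch is recycled per point, so point i is drawn as word[i]
plot(seq_along(word), seq_along(word), pch = word, cex = 2,
     xlab = "", ylab = "")
```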
Here is a plot of the whole Latin alphabet:
# blank canvas
plot(1:30, 1:30, type="n")
# upper case
points(1:26, 1:26, pch=65:90)
# lower case
points(1:26, 4:29, pch=97:122)
You could even build a mapping between these values for easier reference.
myRefUpper <- setNames(65:90, LETTERS)
myRefUpper
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
myRefLower <- setNames(97:122, letters)
myRefLower
a b c d e f g h i j k l m n o p q r s t u v w x y z
97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122
This way, you could refer to specific letters by name. For example, try
plot(1:10, 1:10, type="p", pch=c(myRefLower[c("a", "t", "q")], myRefUpper[LETTERS[10:16]]))
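The codes do not have to be typed by hand: utf8ToInt() returns the code of each character, which for A-Z and a-z equals its ASCII value. A sketch that builds the same lookup tables that way:

```r
# Name -> code lookups built from the characters themselves
myRefUpper <- setNames(utf8ToInt(paste(LETTERS, collapse = "")), LETTERS)
myRefLower <- setNames(utf8ToInt(paste(letters, collapse = "")), letters)

# Look letters up by name and use the codes as pch values
plot(1:4, 1:4, pch = c(myRefUpper[c("A", "S")], myRefLower[c("a", "s")]), cex = 2)
```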

There is now an R package on GitHub that provides coordinates for the Hershey fonts that Ben Bolker mentioned: hershey.
For example, we can get coordinates for the start and end of each stroke (line) in the letter A, for the Roman Simplex font (a simple font using minimal straight lines to create letters):
library(hershey)
coord <- subset(hershey, font == 'rowmans' & char == 'A')
coord
#> x y left right width stroke idx glyph font ascii char
#> 93723 0 12 -9 9 18 0 1 34 rowmans 65 A
#> 93724 -8 -9 -9 9 18 0 2 34 rowmans 65 A
#> 93725 0 12 -9 9 18 1 3 34 rowmans 65 A
#> 93726 8 -9 -9 9 18 1 4 34 rowmans 65 A
#> 93727 -5 -2 -9 9 18 2 5 34 rowmans 65 A
#> 93728 5 -2 -9 9 18 2 6 34 rowmans 65 A
We can use the base approx function to interpolate between the start and end points for each stroke, then plot the result, using the graphical parameter pty to set a square aspect ratio:
op <- par(pty = "s")
plot(coord[, 1:2], type = "n")
for (i in unique(coord$stroke)){
points(approx(subset(coord, stroke == i)))
}
To reset the default graphical parameters:
par(op)
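To try the approx() idea without installing hershey, the six stroke endpoints printed above for 'A' can be entered by hand. A sketch under that assumption; the n argument of approx() sets how many points are placed along each stroke (its default is 50):

```r
# Stroke endpoints for 'A' (rowmans), copied from the subset(hershey, ...) output
coord <- data.frame(
  x      = c(0, -8, 0, 8, -5, 5),
  y      = c(12, -9, 12, -9, -2, -2),
  stroke = c(0, 0, 1, 1, 2, 2)
)

op <- par(pty = "s")  # square plotting region
plot(coord[, c("x", "y")], type = "n")
for (i in unique(coord$stroke)) {
  # linear interpolation between the two endpoints of stroke i
  points(approx(subset(coord, stroke == i)[, c("x", "y")], n = 15), pch = 20)
}
par(op)
```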
For ggplot2 you can do the interpolation first as below:
library(dplyr)
library(ggplot2)
coord2 <- coord %>%
group_by(stroke) %>%
do(as_tibble(approx(.)))
ggplot(coord2, aes(x, y, group = stroke)) +
geom_point() +
coord_equal() +
theme_minimal()
Created on 2022-01-05 by the reprex package (v2.0.1)
Edit
approx won't work for strokes in which the same x value occurs with different y values, e.g. vertical strokes or strokes that bend back on themselves, because it treats y as a function of x. For this we can define our own linear interpolation function:
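interp <- function(coord, eps = 0.5) {
  y <- coord$y
  x <- coord$x
  n <- length(x)
  # number of steps of size (at most) eps needed along each segment;
  # ceiling() keeps the counts whole when coordinates are not multiples of eps
  x2 <- (x[-1] - x[-n])/eps
  y2 <- (y[-1] - y[-n])/eps
  p <- ceiling(pmax(abs(x2), abs(y2)))
  # step index 1..p[k] within each segment k
  id <- sequence(p)
  # start point, then evenly spaced points along every segment
  list(x = c(x[1], rep(x[-n], p) + rep((x[-1] - x[-n])/p, p)*id),
       y = c(y[1], rep(y[-n], p) + rep((y[-1] - y[-n])/p, p)*id))
}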
library(hershey)
coord <- subset(hershey, font == 'rowmans' & char == 'C')
op <- par(pty = "s")
plot(coord[, 1:2], type = "n")
for (i in unique(coord$stroke)){
points(interp(subset(coord, stroke == i)))
}
Created on 2022-01-05 by the reprex package (v2.0.1)
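A different technique, sketched here as an alternative (the function name interp_arclength is hypothetical): parametrise each stroke by its cumulative length and interpolate x and y separately with approx(). Because that parameter always increases along the stroke, vertical segments and strokes that double back pose no problem.

```r
# Interpolate a polyline at (roughly) even spacing along its length
interp_arclength <- function(x, y, step = 0.5) {
  # drop consecutive duplicate points (zero-length segments)
  keep <- c(TRUE, diff(x) != 0 | diff(y) != 0)
  x <- x[keep]
  y <- y[keep]
  if (length(x) < 2) return(list(x = x, y = y))
  # cumulative distance along the stroke: strictly increasing
  s <- c(0, cumsum(sqrt(diff(x)^2 + diff(y)^2)))
  # evenly spaced positions along the stroke, always including the end point
  s_out <- unique(c(seq(0, max(s), by = step), max(s)))
  list(x = approx(s, x, xout = s_out)$x,
       y = approx(s, y, xout = s_out)$y)
}

# A vertical stroke, which approx(x, y) on its own cannot handle
v <- interp_arclength(c(0, 0), c(0, 2))
```

With the hershey data loaded as above, the loop body would become points(interp_arclength(subset(coord, stroke == i)$x, subset(coord, stroke == i)$y)).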

Related

Smoothing Lines in ggplot between all data points

I have a data.frame similar to this example
SqMt <- "Sex Sq..Meters PDXTotalFreqStpy
1 M 129 22
2 M 129 0
3 M 129 1
4 F 129 35
5 F 129 42
6 F 129 5
7 M 557 20
8 M 557 0
9 M 557 15
10 F 557 39
11 F 557 0
12 F 557 0
13 M 1208 33
14 M 1208 26
15 M 1208 3
16 F 1208 7
17 F 1208 0
18 F 1208 8
19 M 604 68
20 M 604 0
21 M 604 0
22 F 604 0
23 F 604 0
24 F 604 0"
Data <- read.table(text=SqMt, header = TRUE)
I want to show the average PDXTotalFreqStpy for each Sq..Meters organized by Sex. This is what I use:
library(ggplot2)
ggplot(Data, aes(x=Sq..Meters, y=PDXTotalFreqStpy)) + stat_summary(fun="mean", geom="line", aes(group=Sex,color=Sex))
How do I smooth these lines so that they are not jagged but nice and curvy, while still going through all the data points? I have seen suggestions involving splines, but I have not gotten them to work.
See if this works for you:
library(dplyr)
# increase n if the result is not smooth enough
# (for this example, n = 50 looks sufficient to me)
n = 50
# manipulate data to calculate the mean for each sex at each x-value
# before passing the result to ggplot()
Data %>%
group_by(Sex, x = Sq..Meters) %>%
summarise(y = mean(PDXTotalFreqStpy)) %>%
ungroup() %>%
ggplot(aes(x, y, color = Sex)) +
# optional: show point locations for reference
geom_point() +
# optional: show original lines for reference
geom_line(linetype = "dashed", alpha = 0.5) +
# further data manipulation to calculate values for smoothed spline
geom_line(data = . %>%
group_by(Sex) %>%
summarise(x1 = list(spline(x, y, n)[["x"]]),
y1 = list(spline(x, y, n)[["y"]])) %>%
tidyr::unnest(cols = c(x1, y1)),
aes(x = x1, y = y1))
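A base R sketch of the same idea, for comparison: compute the group means with aggregate(), then draw an interpolating spline per sex with spline() and lines(). The data are the question's 24 rows re-entered column-wise; the colours are arbitrary.

```r
Data <- data.frame(
  Sex = rep(rep(c("M", "F"), each = 3), 4),
  Sq..Meters = rep(c(129, 557, 1208, 604), each = 6),
  PDXTotalFreqStpy = c(22, 0, 1, 35, 42, 5,  20, 0, 15, 39, 0, 0,
                       33, 26, 3, 7, 0, 8,  68, 0, 0, 0, 0, 0)
)

# mean of PDXTotalFreqStpy for each Sex at each Sq..Meters value
means <- aggregate(PDXTotalFreqStpy ~ Sex + Sq..Meters, data = Data, FUN = mean)

# one interpolating spline (50 points) per sex; it passes through every mean
sp <- lapply(split(means, means$Sex), function(d)
  spline(d$Sq..Meters, d$PDXTotalFreqStpy, n = 50))

plot(NA, xlim = range(means$Sq..Meters),
     ylim = range(c(unlist(lapply(sp, `[[`, "y")), means$PDXTotalFreqStpy)),
     xlab = "Sq..Meters", ylab = "mean PDXTotalFreqStpy")
cols <- c(F = "red", M = "blue")
for (s in names(sp)) {
  lines(sp[[s]], col = cols[[s]])
  d <- means[means$Sex == s, ]
  points(d$Sq..Meters, d$PDXTotalFreqStpy, col = cols[[s]], pch = 19)
}
legend("topleft", legend = names(cols), col = cols, lwd = 1)
```

Because the default "fmm" spline is forced through every mean, it can overshoot between points (for example dipping below zero); a smoother such as loess would not pass through the points exactly.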

R: sampling from a dataset based on a certain distribution centered around points in a different dataset

I am trying to sample rows from a set of points, df_map, in X-Y-Z space according to the distribution of the points on the X-Y plane. The mean and standard deviation of the distribution are in another dataset, df_pts.
My data looks like this
> df_map
X Y Z
A 6 0 103
B -4 2 102
C -2 15 112
D 13 6 105
E 1 -3 117
F 5 16 105
G 10 5 103
H 14 -7 119
I 8 14 107
J -8 -4 100
> df_pts
x y accuracy
a 5 18 -0.8464018
b 3 2 0.5695678
c -18 14 -0.4711559
d 11 13 -0.7306417
e -3 -10 2.1887011
f -9 -11 2.1523923
g 5 1 -0.9612284
h 12 -19 -0.4750582
i -16 20 -1.4554292
j 0 -8 3.4028887
I want to iterate through the rows in df_pts and, for each one, choose one row from df_map according to a Gaussian distribution of distances from the point (df_pts[i, x], df_pts[i, y]), with the 2D standard deviation given by df_pts[i, accuracy]. In other words, for each i = 1:10, I want to take a sample from df_map according to a normal distribution with mean df_pts[i, x]^2 + df_pts[i, y]^2 and 2D sd df_pts[i, accuracy].
I'd appreciate any suggestions for an efficient and elegant way of doing this. I'm relatively new to R, and coming from a C background, my approach to tasks like this involves too many basic loops and step-by-step calculations with basic operations, which makes the code extremely slow.
I apologize in advance if the question is too trivial or is not well-framed.
Easy-to-use data:
df_map <- data.frame(x = c(6,-4,-2,13,1,5,10,14,8,-8),
y= c(0,2,15,6,-3,16,5,-7,14,-4),
z= c(103,102,112,105,117,105,103,119,107,100))
df_pts <- data.frame(x = c(5,3,-18,11,-3,-9,5,12,-16,0),
y= c(18,2,14,13,-10,-11,1,-19,20,-8),
accuracy = c(-0.8464018, 0.5695678,-0.4711559,-0.7306417, 2.1887011, 2.1523923,-0.9612284,-0.4750582,-1.4554292,3.4028887))
What I think you are looking for is a nearest neighbour search. I have struggled A LOT with this in the past but here is the code I came up with:
library("FNN")
findNeighbour <- function(index){
first = df_pts[index,1:2]
hit = get.knnx(df_map[c("x","y")], first, k =1 )
hit_index = hit[[1]]
hit_result = df_map[hit_index,]
result = append(df_pts[index,], hit_result)
}
t <- do.call(rbind, lapply(seq_len(nrow(df_pts)), findNeighbour))
which results in:
x y accuracy x.1 y.1 z
1 5 18 -0.8464018 5 16 105
2 3 2 0.5695678 6 0 103
3 -18 14 -0.4711559 -2 15 112
4 11 13 -0.7306417 8 14 107
5 -3 -10 2.1887011 -8 -4 100
6 -9 -11 2.1523923 -8 -4 100
7 5 1 -0.9612284 6 0 103
8 12 -19 -0.4750582 14 -7 119
9 -16 20 -1.4554292 -2 15 112
10 0 -8 3.4028887 1 -3 117
As you can see, some rows of df_map are matched more than once in this example, so depending on your goal you might want to discard these or do a bidirectional search.
I hope this is what you are looking for
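For small data sets the same nearest-neighbour matching can be done in base R, without FNN, by computing all squared distances with outer() and taking the row-wise minimum with max.col(). A sketch using the easy-to-use data from the question (the accuracy column is left out because it is not needed for the match):

```r
df_map <- data.frame(x = c(6,-4,-2,13,1,5,10,14,8,-8),
                     y = c(0,2,15,6,-3,16,5,-7,14,-4),
                     z = c(103,102,112,105,117,105,103,119,107,100))
df_pts <- data.frame(x = c(5,3,-18,11,-3,-9,5,12,-16,0),
                     y = c(18,2,14,13,-10,-11,1,-19,20,-8))

# squared Euclidean distance from each df_pts row (rows) to each df_map row (columns)
d2 <- outer(df_pts$x, df_map$x, "-")^2 + outer(df_pts$y, df_map$y, "-")^2

# column index of the smallest distance in each row (max of the negated distances)
nn <- max.col(-d2, ties.method = "first")

result <- cbind(df_pts, map_x = df_map$x[nn], map_y = df_map$y[nn], map_z = df_map$z[nn])
```

This builds a full nrow(df_pts) x nrow(df_map) matrix, so for large data FNN::get.knnx() is the better choice; get.knnx() also accepts all query points at once, so the per-row lapply() above is not strictly necessary.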
Thank you for the suggestion.
I ended up doing the following
df_map <- data.frame(X = c(6,-4,-2,13,1,5,10,14,8,-8),
Y= c(0,2,15,6,-3,16,5,-7,14,-4),
Z= c(103,102,112,105,117,105,103,119,107,100))
df_pts <- data.frame(x = c(5,3,-18,11,-3,-9,5,12,-16,0),
y= c(18,2,14,13,-10,-11,1,-19,20,-8),
accuracy = c(-0.8464018, 0.5695678,-0.4711559,-0.7306417, 2.1887011, 2.1523923,-0.9612284,-0.4750582,-1.4554292,3.4028887))
library(dplyr)  # provides sample_n()
map.point2map <- function(map_in, pt_in) {
dists <- dist(rbind(cbind(x = pt_in['x'],
y = pt_in['y']),
cbind(x = map_in$X,
y = map_in$Y)))[1:dim(map_in)[1]]
mu <- mean(dists)
stddev <- abs(as.numeric(pt_in['accuracy']))
return(sample_n(tbl = map_in[, c('X', 'Y')],
size = 1,
replace = TRUE,
weight = dnorm(dists, mean = mu, sd = stddev)))
}
mapped <- apply(df_pts,
1,
function(x) map.point2map(map_in = df_map,
pt_in = x))
and mapped is a list of 10 points sampled from df_map as desired.
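An alternative reading of the question, sketched in base R: weight each df_map row by a normal density of its distance from the df_pts point, centred at 0 (so closer rows are more likely) with sd = |accuracy|. This differs from the solution above, which centres the density at mean(dists); which one is right depends on the intended model. sample_near is a hypothetical helper name.

```r
df_map <- data.frame(X = c(6,-4,-2,13,1,5,10,14,8,-8),
                     Y = c(0,2,15,6,-3,16,5,-7,14,-4),
                     Z = c(103,102,112,105,117,105,103,119,107,100))
df_pts <- data.frame(x = c(5,3,-18,11,-3,-9,5,12,-16,0),
                     y = c(18,2,14,13,-10,-11,1,-19,20,-8),
                     accuracy = c(-0.8464018, 0.5695678,-0.4711559,-0.7306417, 2.1887011,
                                  2.1523923,-0.9612284,-0.4750582,-1.4554292,3.4028887))

# Draw one df_map row index for the point (px, py)
sample_near <- function(px, py, accuracy) {
  d <- sqrt((df_map$X - px)^2 + (df_map$Y - py)^2)
  # log scale, so very small densities do not all underflow to 0
  logw <- dnorm(d, mean = 0, sd = abs(accuracy), log = TRUE)
  sample.int(nrow(df_map), size = 1, prob = exp(logw - max(logw)))
}

set.seed(42)
idx <- mapply(sample_near, df_pts$x, df_pts$y, df_pts$accuracy)
sampled <- cbind(df_pts, df_map[idx, ])
rownames(sampled) <- NULL
```

Subtracting the maximum log weight before exp() keeps at least one weight equal to 1, so sample.int() never sees an all-zero probability vector.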

Points in a scatterplot with individual ellipses using ggplot2 in R

My dataset has 4 columns, as shown below:
The two columns on the left represent the XY coordinates of a geographical structure, and the two on the right represent the size of each geographical unit (North-South and East-West diameters).
I would like to draw a scatterplot of all the coordinates, with an ellipse over each point whose axes are the diameters of that geographical unit.
Manually, and using only two points, the image should be like this one:
How can I do it using ggplot2?
You can download the data here
Use geom_ellipse() from ggforce:
library(ggplot2)
library(ggforce)
d <- data.frame(
x = c(10, 20),
y = c(10, 20),
ns = c(5, 8),
ew = c(4, 4)
)
ggplot(d, aes(x0 = x, y0 = y, a = ew/2, b = ns/2, angle = 0)) +
geom_ellipse() +
coord_fixed()
Created on 2019-06-01 by the reprex package (v0.2.1)
I'm not adding any new code to what Claus Wilke already posted above; all credit should go to Claus. I'm simply testing it with the actual data and showing the OP how to post data.
Loading the packages needed:
# install.packages(c("tidyverse"), dependencies = TRUE)
library(tidyverse)
Reading the data:
tbl <- read.table(
text = "
X Y Diameter_N_S Diameter_E_W
-4275 1145 77 96
-4855 1330 30 25
-4850 1612 45 90
-4990 1410 15 15
-5055 1230 60 50
-5065 1503 43 45
-5135 1305 40 50
-5505 1190 55 70
-5705 1430 90 40
-5645 1535 52 60
", header = TRUE, stringsAsFactors = FALSE) %>% as_tibble()
Showing the data:
tbl
#> # A tibble: 10 x 4
#> X Y Diameter_N_S Diameter_E_W
#> <int> <int> <int> <int>
#> 1 -4275 1145 77 96
#> 2 -4855 1330 30 25
#> 3 -4850 1612 45 90
#> 4 -4990 1410 15 15
#> 5 -5055 1230 60 50
#> 6 -5065 1503 43 45
#> 7 -5135 1305 40 50
#> 8 -5505 1190 55 70
#> 9 -5705 1430 90 40
#> 10 -5645 1535 52 60
Loading the other package needed:
library(ggforce) # devtools::install_github("thomasp85/ggforce")
Plotting:
ggplot(tbl, aes(x0 = X, y0 = Y, a = Diameter_E_W/2, b = Diameter_N_S/2, angle = 0)) +
geom_ellipse() + geom_point(aes(X, Y), size = .5) + coord_fixed() + theme_bw()
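Without ggforce, the same picture can be sketched in base graphics by drawing each ellipse as a polygon from its parametric form x0 + a*cos(t), y0 + b*sin(t). As with geom_ellipse(), a and b are semi-axes, so the diameters are halved. ellipse_xy is a hypothetical helper, and asp = 1 plays the role of coord_fixed():

```r
# Points on an axis-aligned ellipse centred at (x0, y0) with semi-axes a (x) and b (y)
ellipse_xy <- function(x0, y0, a, b, n = 100) {
  t <- seq(0, 2 * pi, length.out = n)
  list(x = x0 + a * cos(t), y = y0 + b * sin(t))
}

tbl <- data.frame(
  X = c(-4275, -4855, -4850, -4990, -5055, -5065, -5135, -5505, -5705, -5645),
  Y = c(1145, 1330, 1612, 1410, 1230, 1503, 1305, 1190, 1430, 1535),
  Diameter_N_S = c(77, 30, 45, 15, 60, 43, 40, 55, 90, 52),
  Diameter_E_W = c(96, 25, 90, 15, 50, 45, 50, 70, 40, 60)
)

# axis limits large enough to contain every ellipse
plot(tbl$X, tbl$Y, asp = 1, pch = 20, cex = 0.5, xlab = "X", ylab = "Y",
     xlim = range(tbl$X - tbl$Diameter_E_W / 2, tbl$X + tbl$Diameter_E_W / 2),
     ylim = range(tbl$Y - tbl$Diameter_N_S / 2, tbl$Y + tbl$Diameter_N_S / 2))
for (i in seq_len(nrow(tbl))) {
  e <- with(tbl[i, ], ellipse_xy(X, Y, a = Diameter_E_W / 2, b = Diameter_N_S / 2))
  polygon(e$x, e$y)
}
```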

R: applying a function on whole dataset to find points within a circle

I'm having difficulty applying my function to a data frame in R. I have a data.frame with three columns: the ID of a point, its location on the x axis and its location on the y axis. All I need to do is find, for a given point, the IDs of the points that lie in its neighbourhood. I've written a function that checks whether a point lies within a circle centred on the observed point and returns its ID if so.
Here is my code:
point_id <- locationdata$point_id
x_loc <- locationdata$x_loc
y_loc <- locationdata$y_loc
locdata <- data.frame(point_id, x_loc, y_loc)
# radius set to 1 km
incircle3 <- function(x_loc, y_loc, center_x, center_y, pointid, r = 1000000){
dx = (x_loc-center_x)
dy = (y_loc-center_y)
if (dx^2 + dy^2 <= r^2){
print(pointid)} ##else {print('')}
}
Unfortunately I don't know how to apply this function to the whole data frame, so that once I enter the location of the observed point it returns the IDs of all points that lie in its neighbourhood. Ideally I would find this relation for all points automatically, i.e. get the points in the neighbourhood of each point in the dataset. So far I have been entering center_x and center_y manually.
Thank you very much in advance for your advice!
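Since arithmetic and comparisons in R are vectorised, the distance test in incircle3 can be applied to all points at once instead of inside if(). A minimal sketch using the question's column names (point_id, x_loc, y_loc) on a small made-up data set; ids_in_circle is a hypothetical helper name and r = 1000 is in the same (unspecified) units as the coordinates:

```r
# IDs of all points within distance r of (center_x, center_y)
ids_in_circle <- function(df, center_x, center_y, r) {
  inside <- (df$x_loc - center_x)^2 + (df$y_loc - center_y)^2 <= r^2
  df$point_id[inside]
}

# toy data (made up) with the question's column names
locdata <- data.frame(point_id = c("p1", "p2", "p3", "p4"),
                      x_loc = c(0, 500, 3000, 200),
                      y_loc = c(0, 500, 0, -100),
                      stringsAsFactors = FALSE)

# one neighbour list per point, centred on that point (the point itself is included)
neighbours <- Map(function(cx, cy) ids_in_circle(locdata, cx, cy, r = 1000),
                  locdata$x_loc, locdata$y_loc)
names(neighbours) <- locdata$point_id
```

Each call is O(n), so covering all centres is O(n^2) in time, but unlike a full dist() matrix it never stores all pairwise distances at once.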
You can tackle this with R's dist function:
# set the random seed and create some dummy data
set.seed(101)
dummy <- data.frame(id=1:100, x=runif(100), y=runif(100))
> head(dummy)
id x y
1 1 0.37219838 0.12501937
2 2 0.04382482 0.02332669
3 3 0.70968402 0.39186128
4 4 0.65769040 0.85959857
5 5 0.24985572 0.71833452
6 6 0.30005483 0.33939503
Call the dist function which returns a dist object. The default distance metric is Euclidean which is what you have coded in your question.
dists <- dist(dummy[,2:3])
Loop over the distance matrix and return the indices for each id that are within some constant distance:
neighbors <- apply(as.matrix(dists), 1, function(x) which(x < 0.33))
> neighbors[[1]]
1 6 7 8 19 23 30 32 33 34 42 44 46 51 55 87 88 91 94 99
Here's a modification to handle arbitrary (non-sequential) ids:
set.seed(101)
dummy <- data.frame(id=sample(1:100, 100), x=runif(100), y=runif(100))
> head(dummy)
id x y
1 38 0.12501937 0.60567568
2 5 0.02332669 0.56259740
3 70 0.39186128 0.27685556
4 64 0.85959857 0.22614243
5 24 0.71833452 0.98355758
6 29 0.33939503 0.09838715
dists <- dist(dummy[,2:3])
neighbors <- apply(as.matrix(dists), 1, function(x) {
dummy$id[which(x < 0.33)]
})
names(neighbors) <- dummy$id
> neighbors[['38']]
[1] 38 5 55 80 63 76 17 71 47 11 88 13 41 21 36 31 73 61 99 59 39 89 94 12 18 3
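Two details of the apply() version may matter: the distance from a point to itself is 0, so every point appears in its own neighbour list (38 above), and apply() simplifies its result to a matrix whenever every point happens to have the same number of neighbours. A sketch that excludes the point itself and always returns a list:

```r
set.seed(101)
dummy <- data.frame(id = sample(1:100, 100), x = runif(100), y = runif(100))

d <- as.matrix(dist(dummy[, c("x", "y")]))
diag(d) <- Inf  # a point is never its own neighbour

# lapply() always returns a list, whatever the neighbour counts are
neighbors <- lapply(seq_len(nrow(d)), function(i) dummy$id[d[i, ] < 0.33])
names(neighbors) <- dummy$id
```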

Plot plate layout heatmap in R

I am trying to plot a plate layout heatmap in R. The plate layout is simply 8 (rows) x 12 (columns) circles (wells). Rows are labeled with letters and columns with numbers. Each well needs to be filled with a colour whose intensity depends on a qualitative or quantitative variable. The plate layout looks like this:
Here is small dataset:
set.seed(123)
platelay <- data.frame(rown = rep(letters[1:8], 12), coln = rep(1:12, each = 8),
colorvar = rnorm(96, 0.3, 0.2), stringsAsFactors = TRUE)
rown coln colorvar
1 a 1 0.187904871
2 b 1 0.253964502
3 c 1 0.611741663
4 d 1 0.314101678
5 e 1 0.325857547
6 f 1 0.643012997
7 g 1 0.392183241
8 h 1 0.046987753
9 a 2 0.162629430
10 b 2 0.210867606
11 c 2 0.544816359
12 d 2 0.371962765
13 e 2 0.380154290
14 f 2 0.322136543
15 g 2 0.188831773
16 h 2 0.657382627
17 a 3 0.399570096
18 b 3 -0.093323431
19 c 3 0.440271180
20 d 3 0.205441718
21 e 3 0.086435259
22 f 3 0.256405017
23 g 3 0.094799110
24 h 3 0.154221754
25 a 4 0.174992146
26 b 4 -0.037338662
27 c 4 0.467557409
28 d 4 0.330674624
29 e 4 0.072372613
30 f 4 0.550762984
31 g 4 0.385292844
32 h 4 0.240985703
33 a 5 0.479025132
34 b 5 0.475626698
35 c 5 0.464316216
36 d 5 0.437728051
37 e 5 0.410783531
38 f 5 0.287617658
39 g 5 0.238807467
40 h 5 0.223905800
41 a 6 0.161058604
42 b 6 0.258416544
43 c 6 0.046920730
44 d 6 0.733791193
45 e 6 0.541592400
46 f 6 0.075378283
47 g 6 0.219423033
48 h 6 0.206668929
49 a 7 0.455993024
50 b 7 0.283326187
51 c 7 0.350663703
52 d 7 0.294290649
53 e 7 0.291425909
54 f 7 0.573720457
55 g 7 0.254845803
56 h 7 0.603294121
57 a 8 -0.009750561
58 b 8 0.416922750
59 c 8 0.324770849
60 d 8 0.343188314
61 e 8 0.375927897
62 f 8 0.199535309
63 g 8 0.233358523
64 h 8 0.096284923
65 a 9 0.085641755
66 b 9 0.360705728
67 c 9 0.389641956
68 d 9 0.310600845
69 e 9 0.484453494
70 f 9 0.710016937
71 g 9 0.201793767
72 h 9 -0.161833775
73 a 10 0.501147705
74 b 10 0.158159847
75 c 10 0.162398277
76 d 10 0.505114274
77 e 10 0.243045399
78 f 10 0.055856458
79 g 10 0.336260696
80 h 10 0.272221728
81 a 11 0.301152837
82 b 11 0.377056080
83 c 11 0.225867994
84 d 11 0.428875310
85 e 11 0.255902688
86 f 11 0.366356393
87 g 11 0.519367803
88 h 11 0.387036298
89 a 12 0.234813683
90 b 12 0.529761524
91 c 12 0.498700771
92 d 12 0.409679392
93 e 12 0.347746347
94 f 12 0.174418785
95 g 12 0.572130490
96 h 12 0.179948083
Is there a package that can readily do this? Or is it possible to write a function in base R, ggplot2 or another package that achieves this?
Colour points of sufficient size with ggplot2. Note that I've implemented @TylerRinkler's suggestion, but within the call to ggplot. I've also removed the axis labels.
ggplot(platelay, aes(y = factor(rown, rev(sort(unique(rown)))), x = factor(coln))) +
geom_point(aes(colour = colorvar), size =18) +theme_bw() +
labs(x=NULL, y = NULL)
And a base graphics approach, which will let you have the x axis above the plot
# plot with grey colour dictated by rank, no axes or labels
with(platelay, plot(x = as.numeric(coln), y = 9 - as.numeric(factor(rown)), pch = 19, cex = 2,
col = grey(rank(platelay[['colorvar']]) / nrow(platelay)), axes = FALSE, xlab = '', ylab = ''))
# add circular outline
with(platelay, points(x = as.numeric(coln), y = 9 - as.numeric(factor(rown)), pch = 21, cex = 2))
# add the axes
axis(3, at =1:12, labels = 1:12)
axis(2, at = 1:8, labels = LETTERS[8:1])
# the background grid
grid()
# and a box around the outside
box()
And for giggles and Christmas cheer, here is a version using base R plotting functions, though there may well be a better solution.
dev.new(width=6,height=4)
rown <- unique(platelay$rown)
coln <- unique(platelay$coln)
plot(NA,ylim=c(0.5,length(rown)+0.5),xlim=c(0.5,length(coln)+0.5),ann=FALSE,axes=FALSE)
box()
axis(2,at=seq_along(rown),labels=rev(rown),las=2)
axis(3,at=seq_along(coln),labels=coln)
colgrp <- findInterval(platelay$colorvar,seq(min(platelay$colorvar),max(platelay$colorvar),length.out=10))
colfunc <- colorRampPalette(c("green", "blue"))
# one colour per interval index (1 to 10), even if some intervals are empty
collist <- colfunc(max(colgrp))
symbols(platelay$coln,
as.numeric(factor(platelay$rown, rev(sort(unique(platelay$rown))))),
circles=rep(0.2,nrow(platelay)),
add=TRUE,
inches=FALSE,
bg=collist[colgrp])
And the resulting image:
Here is a solution combining the ggplot2 solution of @mnel with grid.
Here is the code of that ggplot2 solution:
d <- ggplot(platelay, aes(y=rown,x=factor(coln))) +
geom_point(aes(colour = colorvar), size =18) + theme_bw()
I then use the data generated by ggplot:
data <- ggplot_build(d)$data[[1]]
x <- data$x
y <- data$y
library(grid)
grid.newpage()
pushViewport(plotViewport(c(4, 4, 2, 2)),
dataViewport(x, y))
grid.ellipse() is not part of grid itself; as far as I know it came from older (pre-2.0) versions of the gridExtra package:
grid.ellipse(x, y,size=20, ar = 2,angle=0,gp =gpar(fill=data$colour))
grid.xaxis(at = 1:12, label = 1:12, gp = gpar(cex = 2))
grid.yaxis(at = 1:8,label=rev(LETTERS[1:8]),gp=gpar(cex=2))
grid.roundrect(gp=gpar(fill=NA))
Then I add grid lines:
gpgrid <- gpar(col = 'grey', lty = 2)
grid.segments(unit(1:12, "native") ,unit(0, "npc"), unit(1:12, "native"),unit(1, "npc"),gp=gpgrid)
grid.segments(unit(0, "npc"), unit(1:8, "native"), unit(1, "npc"),unit(1:8, "native"),gp=gpgrid)
upViewport()
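If grid.ellipse() is not available, base grid's circleGrob() can draw the wells directly. A sketch under these assumptions: the platelay data from the question, row a at the top, an arbitrarily chosen white-to-darkblue palette, and colours assigned by cutting colorvar into 100 equal-width bins:

```r
library(grid)

set.seed(123)
platelay <- data.frame(rown = rep(letters[1:8], 12), coln = rep(1:12, each = 8),
                       colorvar = rnorm(96, 0.3, 0.2), stringsAsFactors = FALSE)

x <- platelay$coln
y <- 9 - match(platelay$rown, letters)  # row "a" at the top (y = 8)

# map colorvar onto a 100-step palette
pal <- colorRampPalette(c("white", "darkblue"))(100)
fill <- pal[cut(platelay$colorvar, breaks = 100, labels = FALSE)]

grid.newpage()
pushViewport(plotViewport(c(4, 4, 2, 2)))
pushViewport(dataViewport(xscale = c(0.5, 12.5), yscale = c(0.5, 8.5)))
# per ?grid.circle, a relative radius resolves to the smaller of its
# width/height interpretations, so the wells stay round
wells <- circleGrob(x, y, r = unit(0.4, "native"), default.units = "native",
                    gp = gpar(fill = fill))
grid.draw(wells)
grid.xaxis(at = 1:12, label = 1:12)
grid.yaxis(at = 1:8, label = rev(LETTERS[1:8]))
grid.rect(gp = gpar(fill = NA))
popViewport(2)
```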
This answer is an add-on to @thelatemail's answer, which covers the 96-well (8 x 12) plate format.
For the 1536-well (32 x 48) format, the single letters A-Z are not enough, so the row labels must continue with two letters (AA, AB, AC, AD ... ZZ); the same idea extends to three or more letters. The corresponding levels can be built by concatenating LETTERS as below:
levels = c(LETTERS, c(t(outer(LETTERS, LETTERS, paste, sep = ""))))
@thelatemail's answer can be adapted to two-letter row labels for the 1536-well format as follows:
rown = rep (c(LETTERS, c(t(outer(LETTERS[1], LETTERS[1:6], paste, sep = "")))),
symbols(platelay$coln,
factor(platelay$rown,
levels = rev(c(LETTERS, c(t(outer(LETTERS[1], LETTERS[1:6], paste, sep = "")))))),
circles=rep(0.45,nrow(platelay)),
add=TRUE,
inches=FALSE,
bg=collist[colgrp])
The levels inside the symbols() call must list the labels in plate order: all single letters (alphabetically), then the two-letter labels, then the three-letter labels, and so on.
If the levels are in the wrong order, such as the one below, the wells are drawn in the wrong rows and the colours no longer represent the data correctly.
Incorrect order:
A, AA, AB, AC, AD, AE, AF, B, C, D, ... Z
Correct order:
A, B, C, D, E, .....Z, AA, AB, AC, AD, AE, AF
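Rather than relying on sort(), which places "AA" between "A" and "B", the labels can be generated in plate order, spreadsheet style. A sketch (row_labels is a hypothetical helper) that works for any number of rows:

```r
# Spreadsheet-style labels: 1 -> "A", 26 -> "Z", 27 -> "AA", 28 -> "AB", ...
row_labels <- function(n) {
  vapply(seq_len(n), function(i) {
    s <- ""
    while (i > 0) {
      r <- (i - 1) %% 26            # 0-based letter index of the last position
      s <- paste0(LETTERS[r + 1], s)
      i <- (i - 1) %/% 26           # move to the next position to the left
    }
    s
  }, character(1))
}

rows32 <- row_labels(32)  # "A" ... "Z", "AA" ... "AF" for a 1536-well plate

# levels reversed so that row A is drawn at the top, as in the symbols() call above
rown_factor <- factor(rows32, levels = rev(rows32))
```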
