Smoothing lines in ggplot between all data points - r

I have a data.frame similar to this example:
SqMt <- "Sex Sq..Meters PDXTotalFreqStpy
1 M 129 22
2 M 129 0
3 M 129 1
4 F 129 35
5 F 129 42
6 F 129 5
7 M 557 20
8 M 557 0
9 M 557 15
10 F 557 39
11 F 557 0
12 F 557 0
13 M 1208 33
14 M 1208 26
15 M 1208 3
16 F 1208 7
17 F 1208 0
18 F 1208 8
19 M 604 68
20 M 604 0
21 M 604 0
22 F 604 0
23 F 604 0
24 F 604 0"
Data <- read.table(text=SqMt, header = TRUE)
I want to show the average PDXTotalFreqStpy for each Sq..Meters organized by Sex. This is what I use:
library(ggplot2)
ggplot(Data, aes(x = Sq..Meters, y = PDXTotalFreqStpy)) +
  stat_summary(fun = mean, geom = "line", aes(group = Sex, color = Sex))
How do I get these lines smoothed out so that they are not jagged, but nice and curvy, and still go through all the data points? I have seen suggestions to use splines, but I have not gotten those to work.

See if this works for you:
library(ggplot2)
library(dplyr)
# increase n if the result is not smooth enough
# (for this example, n = 50 looks sufficient to me)
n <- 50
# manipulate data to calculate the mean for each sex at each x-value
# before passing the result to ggplot()
Data %>%
group_by(Sex, x = Sq..Meters) %>%
summarise(y = mean(PDXTotalFreqStpy)) %>%
ungroup() %>%
ggplot(aes(x, y, color = Sex)) +
# optional: show point locations for reference
geom_point() +
# optional: show original lines for reference
geom_line(linetype = "dashed", alpha = 0.5) +
# further data manipulation to calculate values for smoothed spline
geom_line(data = . %>%
group_by(Sex) %>%
summarise(x1 = list(spline(x, y, n)[["x"]]),
y1 = list(spline(x, y, n)[["y"]])) %>%
tidyr::unnest(cols = c(x1, y1)),
aes(x = x1, y = y1))
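The smoothed lines come from spline(), whose default method ("fmm") fits an interpolating cubic spline, so each curve passes exactly through every group mean. A minimal base-R sketch of that behaviour follows; the group means are worked out by hand from the question's data, and x, y_m, y_f, f_m, f_f and grid are illustrative names. It checks the interpolation with splinefun() and also shows a caveat: with only four x-values and a sharp drop, the curve for F dips below zero between 604 and 1208 in this sketch, even though no observed value is negative.

```r
# Group means of PDXTotalFreqStpy at each Sq..Meters value, computed from the question's data
x   <- c(129, 557, 604, 1208)
y_m <- c(mean(c(22, 0, 1)), mean(c(20, 0, 15)), mean(c(68, 0, 0)), mean(c(33, 26, 3)))
y_f <- c(mean(c(35, 42, 5)), mean(c(39, 0, 0)), mean(c(0, 0, 0)), mean(c(7, 0, 8)))

# splinefun() builds the same kind of interpolating spline that spline() evaluates
f_m <- splinefun(x, y_m, method = "fmm")
f_f <- splinefun(x, y_f, method = "fmm")

# The curves reproduce every mean at the original x-values
isTRUE(all.equal(f_m(x), y_m))
isTRUE(all.equal(f_f(x), y_f))

# Evaluate on the same 50-point grid that spline(x, y, n = 50) uses
grid <- seq(min(x), max(x), length.out = 50)
min(f_f(grid)) < 0  # TRUE: the F curve overshoots below zero between 604 and 1208
```

So "goes through all the data points" and "never overshoots" cannot both be guaranteed with an interpolating spline on sparse data; keeping the points visible, as the answer does, makes any overshoot easy to spot.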


Circular line graph with groups

I have four data frames that look like this:
X score.i score.ii score.iii mm
1: 1 -0.3958555 -0.3750726 -0.3378881 10
2: 2 -0.3954955 -0.3799290 -0.3400876 15
3: 3 -0.3962514 -0.3776692 -0.3401180 20
4: 4 -0.4033265 -0.3764099 -0.3436115 25
5: 5 -0.4035860 -0.3753792 -0.3426287 30
---
186: 186 -0.4041035 -0.3767158 -0.3419871 80
187: 187 -0.4040643 -0.3767881 -0.3417620 85
188: 188 -0.4052228 -0.3766468 -0.3436883 90
189: 189 -0.4047009 -0.3767359 -0.3431591 95
190: 190 -0.4061497 -0.3766785 -0.3433624 100
How can I plot a circular line graph with aes(x = mm, y = score.i) for these four data frames, such that there is a gap between the lines of each data frame?
library(ggplot2)
library(dplyr)
library(tidyr)
df1 %>%
pivot_longer(-c(X, mm), names_to = "Variable", values_to = "Score") %>%
ggplot(., aes(x = mm, y = Score, color = Variable)) +
geom_line() +
coord_polar()
Data:
read.table(text =
"X score.i score.ii score.iii mm
1 -0.3958555 -0.3750726 -0.3378881 10
2 -0.3954955 -0.3799290 -0.3400876 15
3 -0.3962514 -0.3776692 -0.3401180 20
4 -0.4033265 -0.3764099 -0.3436115 25
5 -0.4035860 -0.3753792 -0.3426287 30
186 -0.4041035 -0.3767158 -0.3419871 80
187 -0.4040643 -0.3767881 -0.3417620 85
188 -0.4052228 -0.3766468 -0.3436883 90
189 -0.4047009 -0.3767359 -0.3431591 95
190 -0.4061497 -0.3766785 -0.3433624 100",
header = TRUE, stringsAsFactors = FALSE) -> df1
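The answer above plots the three score columns of a single data frame, so it does not yet put a gap between data frames. One possible approach, sketched here under assumptions, is to stack the four data frames with a label column and shift each one's mm values along x so that consecutive data frames are separated by an empty interval; coord_polar() then turns each interval into its own arc. The four data frames below are hypothetical random stand-ins, and make_df, gap, width, x_pos and source are illustrative names, not anything from the question.

```r
# Hypothetical stand-ins for df1..df4: same mm grid, made-up score.i values
set.seed(1)
make_df <- function() {
  mm <- seq(10, 100, by = 5)
  data.frame(mm = mm, score.i = -0.40 + rnorm(length(mm), sd = 0.005))
}
dfs <- list(df1 = make_df(), df2 = make_df(), df3 = make_df(), df4 = make_df())

# Place each data frame on its own stretch of the x-axis, separated by `gap`
gap   <- 20
width <- diff(range(dfs[[1]]$mm))  # 90 for mm from 10 to 100
combined <- do.call(rbind, lapply(seq_along(dfs), function(i) {
  d <- dfs[[i]]
  d$source <- names(dfs)[i]
  d$x_pos  <- (i - 1) * (width + gap) + (d$mm - min(d$mm))
  d
}))

# Plot only if ggplot2 is installed; group = source keeps the lines apart,
# and the x limit leaves a gap between the last arc and the first one
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(combined, aes(x = x_pos, y = score.i, group = source, color = source)) +
    geom_line() +
    scale_x_continuous(limits = c(0, length(dfs) * (width + gap))) +
    coord_polar()
  print(p)
}
```

With real data, the stand-ins would be replaced by the four actual data frames. Note that the x-axis now shows x_pos rather than mm, so the axis labels may need hiding or relabelling.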

How to add regression lines for each factor on a plot

I've created a model and I'm trying to add curves that fit the two parts of the data, insulation and no insulation. I was thinking about using the insulation coefficient as a true/false term, but I'm not sure how to translate that into code. Entries 1:56 are "w/o" and 57:101 are "w/". I'm not sure how to include the data I'm using, but here are the head and tail:
month year kwh days est cost avgT dT.yr kWhd.1 id insulation
1 8 2003 476 21 a 33.32 69 -8 22.66667 1 w/o
2 9 2003 1052 30 e 112.33 73 -1 35.05172 2 w/o
3 10 2003 981 28 a 24.98 60 -6 35.05172 3 w/o
4 11 2003 1094 32 a 73.51 53 2 34.18750 4 w/o
5 12 2003 1409 32 a 93.23 44 6 44.03125 5 w/o
6 1 2004 1083 32 a 72.84 34 3 33.84375 6 w/o
month year kwh days est cost avgT dT.yr kWhd.1 id insulation
96 7 2011 551 29 e 55.56 72 0 19.00000 96 w/
97 8 2011 552 27 a 61.17 78 1 20.44444 97 w/
98 9 2011 666 34 e 73.87 71 -2 19.58824 98 w/
99 10 2011 416 27 a 48.03 64 0 15.40741 99 w/
100 11 2011 653 31 e 72.80 53 1 21.06452 100 w/
101 12 2011 751 33 a 83.94 45 2 22.75758 101 w/
bill$id <- seq(1:101)
bill$insulation <- as.factor(ifelse(bill$id > 56, c("w/"), c("w/o")))
m1 <- lm(kWhd.1 ~ avgT + insulation + I(avgT^2), data=bill)
with(bill, plot(kWhd.1 ~ avgT, xlab="Average Temperature (F)",
ylab="Daily Energy Use (kWh/d)", col=insulation))
no_ins <- data.frame(bill$avgT[1:56], bill$insulation[1:56])
curve(predict(m1, no_ins=x), add=TRUE, col="red")
ins <- data.frame(bill$avgT[57:101], bill$insulation[57:101])
curve(predict(m1, ins=x), add=TRUE, lty=2)
legend("topright", inset=0.01, pch=21, col=c("red", "black"),
legend=c("No Insulation", "Insulation"))
ggplot2 makes this a lot easier than base plotting. Something like this should work:
library(ggplot2)
ggplot(bill, aes(x = avgT, y = kWhd.1, color = insulation)) +
geom_smooth(method = "lm", formula = y ~ x + I(x^2), se = FALSE) +
geom_point()
In base R, I'd create a data frame with the points you want to predict on, i.e. values of the model's predictors avgT and insulation, something like
pred_data = expand.grid(
  avgT = seq(min(bill$avgT), max(bill$avgT), length.out = 100),
  insulation = c("w/", "w/o")
)
pred_data$prediction = predict(m1, newdata = pred_data)
And then use lines to add the predictions to your plot. My base graphics is pretty rusty, so I'll leave that to you (or another answerer) if you want it.
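The base-R route can be sketched as follows. The bill data here is a hypothetical simulated stand-in (the real data was only shown as head and tail), while the model formula is the question's m1. The prediction grid is built over avgT, the predictor in m1, and lines() draws one curve per insulation level. Because m1 has a single quadratic in avgT plus an additive insulation term, its two fitted curves are vertical shifts of each other; the geom_smooth() answer above instead fits a separate quadratic to each group, so its curves can differ in shape.

```r
# Hypothetical stand-in for `bill`: quadratic in avgT, lower after row 56 ("w/")
set.seed(42)
bill <- data.frame(avgT = runif(101, 30, 80))
bill$id <- seq_len(101)
bill$insulation <- factor(ifelse(bill$id > 56, "w/", "w/o"))
bill$kWhd.1 <- 60 - 1.2 * bill$avgT + 0.01 * bill$avgT^2 -
  8 * (bill$insulation == "w/") + rnorm(101, sd = 2)

# The model from the question
m1 <- lm(kWhd.1 ~ avgT + insulation + I(avgT^2), data = bill)

# Prediction grid over the predictors of m1 (avgT and insulation)
pred_data <- expand.grid(
  avgT = seq(min(bill$avgT), max(bill$avgT), length.out = 100),
  insulation = levels(bill$insulation)
)
pred_data$prediction <- predict(m1, newdata = pred_data)

# Points coloured by the integer code of each insulation level
plot(kWhd.1 ~ avgT, data = bill, col = as.integer(insulation),
     xlab = "Average Temperature (F)", ylab = "Daily Energy Use (kWh/d)")

# One fitted curve per level; avgT is already sorted within each level of the grid
levs <- levels(bill$insulation)
for (i in seq_along(levs)) {
  sub <- pred_data[pred_data$insulation == levs[i], ]
  lines(sub$avgT, sub$prediction, col = i)
}
legend("topright", legend = levs, col = seq_along(levs), lty = 1)
```

If separate curvature per group is wanted in base R too, the model would need an interaction, e.g. `(avgT + I(avgT^2)) * insulation`, which is closer to what geom_smooth() fits.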
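In base R it's important to order the x-values, because lines() joins the points in the order they are given. Since this has to be done for each factor level, we can do it with by(), which gives a list L of sorted data frames, one per level.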
Since your example data is not complete, here's an example with iris where we consider Species as the "factor".
L <- by(iris, iris$Species, function(x) x[order(x$Petal.Length), ])
Now we can draw the plot and add the loess predictions as lines, one per level:
with(iris, plot(Sepal.Width ~ Petal.Length, col = Species))
for (i in seq_along(L)) {
  lines(L[[i]]$Petal.Length,
        predict(loess(Sepal.Width ~ Petal.Length, L[[i]], span = 1.1)),  # span = 1.1 for extra smoothing
        col = i)  # colour i matches the i-th Species level in the scatter plot
}

R find number of rows in a group and plot

I have a table of tennis matches. I want to group by winner_id and plot the number of wins against height, basically to check whether taller players have won more matches.
The data looks like this.
m_id winner_id winner_height
1 21 166
2 21 166
3 22 167
4 21 166
5 23 170
6 24 163
7 22 167
8 25 164
Here m_id is the match ID. I want to plot the number of matches each player has won against their height.
For example, player 21 has won 3 matches and is 166 cm tall.
How can I achieve this in ggplot?
My code below doesn't seem to work:
matches %>% group_by(winner_id) %>%
ggplot(., aes(x = winner_ht, y = nrow((winner_id)))) + geom_point()
Can anyone help?
Do you mean something like this?
library(tidyverse)
df %>%
group_by(winner_id, winner_height) %>%
summarise(n = n()) %>%
ggplot(aes(winner_height, n, label = winner_id)) +
geom_point() +
geom_text(position = position_nudge(y = -0.1))
Explanation: We count the number of games n per winner_id and winner_height and pass the summarised data to ggplot where we plot winner_height vs. n. We can also add labels to indicate the winner_id.
Sample data
df <- read.table(text =
"m_id winner_id winner_height
1 21 166
2 21 166
3 22 167
4 21 166
5 23 170
6 24 163
7 22 167
8 25 164", header = TRUE)
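For comparison, here is a base-R sketch of the same count and plot without dplyr: aggregate() with FUN = length counts the matches per winner_id and winner_height pair, and text() adds the id labels. The name wins is illustrative.

```r
# Same sample data as above
df <- data.frame(
  m_id = 1:8,
  winner_id = c(21, 21, 22, 21, 23, 24, 22, 25),
  winner_height = c(166, 166, 167, 166, 170, 163, 167, 164)
)

# Count the rows (matches won) for each winner_id / winner_height pair
wins <- aggregate(m_id ~ winner_id + winner_height, data = df, FUN = length)
names(wins)[names(wins) == "m_id"] <- "n"

# Matches won against height, labelled with the player id
plot(n ~ winner_height, data = wins, pch = 19,
     xlab = "Winner height (cm)", ylab = "Matches won")
text(wins$winner_height, wins$n, labels = wins$winner_id, pos = 1)
```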

Points in a scatterplot with individual ellipses using ggplot2 in R

My dataset has four columns.
The two columns on the left are the X and Y coordinates of a geographical structure, and the two on the right are the size of each geographical unit (its North-South and East-West diameters).
I would like to draw a scatterplot of all the coordinates, with an ellipse over each point whose axes are the diameters of that geographical unit.
Manually, and using only two points, the image should be like this one:
How can I do it using ggplot2?
You can download the data here
Use geom_ellipse() from ggforce:
library(ggplot2)
library(ggforce)
d <- data.frame(
x = c(10, 20),
y = c(10, 20),
ns = c(5, 8),
ew = c(4, 4)
)
ggplot(d, aes(x0 = x, y0 = y, a = ew/2, b = ns/2, angle = 0)) +
geom_ellipse() +
coord_fixed()
Created on 2019-06-01 by the reprex package (v0.2.1)
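In geom_ellipse(), a and b are the radii along the two axes, which is why the answer divides the diameters by 2. A rough base-R sketch of the same geometry traces each ellipse parametrically, x = x0 + (ew/2) cos t and y = y0 + (ns/2) sin t, and draws it with polygon(); ellipse_xy() is a hypothetical helper written only for this illustration.

```r
# Hypothetical helper: points on an axis-aligned ellipse centred at (x0, y0)
# with east-west diameter ew and north-south diameter ns
ellipse_xy <- function(x0, y0, ew, ns, n = 100) {
  t <- seq(0, 2 * pi, length.out = n)
  list(x = x0 + (ew / 2) * cos(t), y = y0 + (ns / 2) * sin(t))
}

# The two-point example from the answer above
d <- data.frame(x = c(10, 20), y = c(10, 20), ns = c(5, 8), ew = c(4, 4))

# asp = 1 keeps both axes on the same scale, like coord_fixed()
plot(d$x, d$y, pch = 19, asp = 1,
     xlim = range(d$x) + c(-5, 5), ylim = range(d$y) + c(-5, 5),
     xlab = "x", ylab = "y")
for (i in seq_len(nrow(d))) {
  e <- ellipse_xy(d$x[i], d$y[i], d$ew[i], d$ns[i])
  polygon(e$x, e$y, border = "black")
}
```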
I'm not adding any new code to what Claus Wilke already posted above; all credit should go to Claus. I'm simply testing it with the actual data and showing the OP how to post data.
Loading the packages needed:
# install.packages(c("tidyverse"), dependencies = TRUE)
library(tidyverse)
Reading the data:
tbl <- read.table(
text = "
X Y Diameter_N_S Diameter_E_W
-4275 1145 77 96
-4855 1330 30 25
-4850 1612 45 90
-4990 1410 15 15
-5055 1230 60 50
-5065 1503 43 45
-5135 1305 40 50
-5505 1190 55 70
-5705 1430 90 40
-5645 1535 52 60
", header = TRUE, stringsAsFactors = FALSE) %>% as_tibble()
Showing the data:
tbl
#> # A tibble: 10 x 4
#> X Y Diameter_N_S Diameter_E_W
#> <int> <int> <int> <int>
#> 1 -4275 1145 77 96
#> 2 -4855 1330 30 25
#> 3 -4850 1612 45 90
#> 4 -4990 1410 15 15
#> 5 -5055 1230 60 50
#> 6 -5065 1503 43 45
#> 7 -5135 1305 40 50
#> 8 -5505 1190 55 70
#> 9 -5705 1430 90 40
#> 10 -5645 1535 52 60
Loading the remaining package:
library(ggforce) # install.packages("ggforce")
Executing (as in Claus's answer, a and b are half of each diameter):
ggplot(tbl, aes(x0 = X, y0 = Y, a = Diameter_E_W / 2, b = Diameter_N_S / 2, angle = 0)) +
  geom_ellipse() +
  geom_point(aes(X, Y), size = .5) +
  coord_fixed() +
  theme_bw()

r - How to plot alphabets?

I want a scatter plot that looks like letters of the alphabet. How can I do this in code? I could enter coordinates by hand to make the plot look like an 'A' or an 'S', but can it be done in an easier way?
The pch argument of plot() accepts codes that draw characters: according to ?points, values 32:127 give the ASCII characters.
With a little messing around, values 65:90 correspond to capital letters, and values 97:122 correspond to lower case letters.
To illustrate this, try
plot(1:10, 1:10, type="p", pch=97:107)
for example.
Here is a plot of the whole Latin alphabet:
# blank canvas
plot(1:30, 1:30, type="n")
# upper case
points(1:26, 1:26, pch=65:90)
# lower case
points(1:26, 4:29, pch=97:122)
You could even build a mapping between these values for easier reference.
myRefUpper <- setNames(65:90, LETTERS)
myRefUpper
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
myRefLower <- setNames(97:122, letters)
myRefLower
a b c d e f g h i j k l m n o p q r s t u v w x y z
97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122
This way, you could refer to specific letters by name. For example, try
plot(1:10, 1:10, type="p", pch=c(myRefLower[c("a", "t", "q")], myRefUpper[LETTERS[10:16]]))
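Instead of finding the codes by trial and error, base R can convert between letters and their character codes with utf8ToInt() and intToUtf8(). This short sketch checks that 65:90 and 97:122 are exactly LETTERS and letters; the last line relies on pch also accepting single characters given as strings, which avoids the lookup altogether.

```r
# pch codes 32:127 follow ASCII, so a letter's pch code is its character code
utf8ToInt("A")  # 65
utf8ToInt("a")  # 97

# Convert the code ranges back to single characters
upper <- intToUtf8(65:90, multiple = TRUE)
lower <- intToUtf8(97:122, multiple = TRUE)
identical(upper, LETTERS)  # TRUE
identical(lower, letters)  # TRUE

# pch also accepts single characters directly, so no lookup table is needed
plot(1:5, 1:5, pch = c("R", "p", "l", "o", "t"))
```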
There is now an R package on GitHub, hershey, that provides coordinates for the Hershey fonts that Ben Bolker mentioned.
For example, we can get coordinates for the start and end of each stroke (line) in the letter A, for the Roman Simplex font (a simple font using minimal straight lines to create letters):
library(hershey)
coord <- subset(hershey, font == 'rowmans' & char == 'A')
coord
#> x y left right width stroke idx glyph font ascii char
#> 93723 0 12 -9 9 18 0 1 34 rowmans 65 A
#> 93724 -8 -9 -9 9 18 0 2 34 rowmans 65 A
#> 93725 0 12 -9 9 18 1 3 34 rowmans 65 A
#> 93726 8 -9 -9 9 18 1 4 34 rowmans 65 A
#> 93727 -5 -2 -9 9 18 2 5 34 rowmans 65 A
#> 93728 5 -2 -9 9 18 2 6 34 rowmans 65 A
We can use the base approx function to interpolate between the start and end points for each stroke, then plot the result, using the graphical parameter pty to set a square aspect ratio:
op <- par(pty = "s")
plot(coord[, 1:2], type = "n")
for (i in unique(coord$stroke)){
points(approx(subset(coord, stroke == i)))
}
To reset the default graphical parameters:
par(op)
For ggplot2 you can do the interpolation first as below:
library(dplyr)
library(ggplot2)
coord2 <- coord %>%
group_by(stroke) %>%
do(as_tibble(approx(.)))
ggplot(coord2, aes(x, y, group = stroke)) +
geom_point() +
coord_equal() +
theme_minimal()
Created on 2022-01-05 by the reprex package (v2.0.1)
Edit
approx() won't work for strokes in which one x value maps to several y values, e.g. vertical strokes or strokes that bend back on themselves. For these we can define our own linear interpolation function:
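# Linear interpolation along a stroke, with steps of at most eps in x and y.
# Unlike approx(), it works when x values repeat.
interp <- function(coord, eps = 0.5) {
  y <- coord$y
  x <- coord$x
  n <- length(x)
  # number of steps per segment (rounded up so each segment ends exactly on its endpoint)
  p <- ceiling(pmax(abs(x[-1] - x[-n]), abs(y[-1] - y[-n])) / eps)
  # step index within each segment: 1..p[1], 1..p[2], ...
  id <- sequence(p)
  list(x = c(x[1], rep(x[-n], p) + rep((x[-1] - x[-n])/p, p)*id),
       y = c(y[1], rep(y[-n], p) + rep((y[-1] - y[-n])/p, p)*id))
}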
library(hershey)
coord <- subset(hershey, font == 'rowmans' & char == 'C')
op <- par(pty = "s")
plot(coord[, 1:2], type = "n")
for (i in unique(coord$stroke)){
points(interp(subset(coord, stroke == i)))
}
Created on 2022-01-05 by the reprex package (v2.0.1)
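To see the limitation of approx() without needing the hershey package, here is a small sketch with two hand-made strokes: the first is stroke 0 of the letter A from the output above, the second is a hypothetical vertical stroke. The names diag_stroke, vert_stroke, pts and res are illustrative.

```r
# A diagonal stroke: approx() interpolates between its two endpoints
diag_stroke <- data.frame(x = c(0, -8), y = c(12, -9))
pts <- approx(diag_stroke, n = 5)
pts$x  # five evenly spaced x-values from -8 to 0

# A vertical stroke: both endpoints share x = 0, so after collapsing tied x
# values approx() is left with a single point and stops with an error
vert_stroke <- data.frame(x = c(0, 0), y = c(12, -9))
res <- tryCatch(approx(vert_stroke, n = 5), error = function(e) conditionMessage(e))
res
```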
