Repeating X axis labels and legend labels in ggplot2 - r

I'm not sure why I am having such a problem with my x-scale labels repeating as opposed to just labeling where there is a measured point. Additionally, my labels for my legend are not working.
FamIncome Ethnicity mean.bmi
1 1 1 28.54250
2 1 2 26.66300
3 1 3 26.62105
4 1 4 29.51396
5 1 5 25.66722
6 2 1 29.62404
7 2 2 28.08393
8 2 3 28.62215
9 2 4 28.97561
10 2 5 25.57714
11 3 1 29.52630
12 3 2 28.27235
13 3 3 29.67060
14 3 4 31.36768
15 3 5 26.13361
16 4 1 30.83368
17 4 2 30.80814
18 4 3 29.29594
19 4 4 29.18521
20 4 5 24.80550
21 5 1 29.76500
22 5 2 29.24404
23 5 3 28.89435
24 5 4 31.48172
25 5 5 28.02522
26 6 1 30.05087
27 6 2 29.88574
28 6 3 29.53793
29 6 4 30.97993
30 6 5 25.57857
31 7 1 30.31787
32 7 2 29.28055
33 7 3 28.50421
34 7 4 30.65427
35 7 5 26.66094
36 8 1 29.15000
37 8 2 29.02789
38 8 3 28.36507
39 8 4 33.51915
40 8 5 28.38263
41 9 1 28.17679
42 9 2 28.74731
43 9 3 28.06196
44 9 4 31.38483
45 9 5 26.96000
46 10 1 28.71633
47 10 2 33.44409
48 10 3 30.63048
49 10 4 30.22587
50 10 5 27.36375
51 14 1 30.78161
52 14 2 27.43575
53 14 3 28.96817
54 14 4 32.22378
55 14 5 25.62778
56 15 1 29.15982
57 15 2 27.42672
58 15 3 27.60567
59 15 4 30.05013
60 15 5 26.80271
code below:
a <- ggplot(nh1, aes(x=FamIncome, y=mean.bmi)) + geom_line(aes(group=Ethnicity, colour = Ethnicity)) + geom_point()
a = a + labs(list(title="Average BMI versus Family Income", x = "Family Income", y = "Average BMI"))
a = a + scale_x_discrete(breaks=c("1","2","3","4","5","6","7","8","9","10","14","15"),
labels = c("0-4,999", "5K-9,999", "10K-14,999", "15K-19,999", "20K-24,999", "25K-34,999", "35K-44,999", "45K-54,999", "55K-64,999", "65K-74,999", "75K-100K", "Over 100K"))
a = a + theme(axis.text.x=element_text(angle=-90))
a = a + scale_colour_continuous(name = "Ethnicity",
breaks=c("5","4","3","2","1"),
labels=c("Other Race/Multi", "Black","White","Other Hispanic", "Mexican-American"))
a
I cannot post a picture of the image that I'm getting until I get 2 more "reputation" points

Try converting your x variable to a factor:
a <- ggplot(nh1, aes(x=factor(FamIncome), y=mean.bmi)) + geom_line(aes(group=Ethnicity, colour = factor(Ethnicity)))
a = a + labs(title="Average BMI versus Family Income", x = "Family Income", y = "Average BMI")
a = a + scale_x_discrete("Family Income", labels = c("0-4,999", "5K-9,999", "10K-14,999", "15K-19,999", "20K-24,999", "25K-34,999", "35K-44,999", "45K-54,999", "55K-64,999", "65K-74,999", "75K-100K", "Over 100K"))
a = a + theme(axis.text.x=element_text(angle=-90))
a = a + scale_colour_discrete(name = "Ethnicity",
breaks=c("5","4","3","2","1"),
labels=c("Other Race/Multi", "Black","White","Other Hispanic", "Mexican-American"))
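With a numeric x variable, ggplot treats the axis as a continuous scale, when you really intended it to be categorical. The same applies to Ethnicity, which is why it is wrapped in factor() and paired with scale_colour_discrete instead of scale_colour_continuous. Also note the confusion between fill and colour: fill is for two-dimensional filled regions.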
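As an alternative sketch (not part of the answer above), the labels can be attached to the data itself by converting both columns to labelled factors, so that no breaks or labels need to be passed to the scales. The small nh1 below reuses a few rows from the question; the level-to-label mapping is assumed from the breaks and labels in the question's code, and the plot is only built if ggplot2 is installed.

```r
# A few rows copied from the question's table, enough to show the idea.
nh1 <- data.frame(
  FamIncome = c(1, 1, 2, 2, 14, 14),
  Ethnicity = c(1, 2, 1, 2, 1, 2),
  mean.bmi  = c(28.54250, 26.66300, 29.62404, 28.08393, 30.78161, 27.43575)
)

# Labels in the same order as the income codes used in the question.
income_levels <- c(1:10, 14, 15)
income_labels <- c("0-4,999", "5K-9,999", "10K-14,999", "15K-19,999",
                   "20K-24,999", "25K-34,999", "35K-44,999", "45K-54,999",
                   "55K-64,999", "65K-74,999", "75K-100K", "Over 100K")

# Converting to factors makes both variables categorical and labelled.
nh1$FamIncome <- factor(nh1$FamIncome, levels = income_levels,
                        labels = income_labels)
nh1$Ethnicity <- factor(nh1$Ethnicity, levels = 1:5,
                        labels = c("Mexican-American", "Other Hispanic",
                                   "White", "Black", "Other Race/Multi"))

# Build the plot only when ggplot2 is available.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(nh1, aes(x = FamIncome, y = mean.bmi,
                       group = Ethnicity, colour = Ethnicity)) +
    geom_line() +
    geom_point() +
    labs(title = "Average BMI versus Family Income",
         x = "Family Income", y = "Average BMI") +
    theme(axis.text.x = element_text(angle = -90))
}
```

With this approach the axis and legend text follow the factor labels, so the scale_x_discrete and scale_colour_discrete calls are only needed to change titles.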


identify sequences of approximately equivalent values in a series using R

I have a series of values that includes strings of values that are close to each other, for example the sequences below. Note that roughly around the places I have categorized the values in V1 with distinct values in V2, the range of the values changes. That is, all the values called 1 in V2 are within 20 points of each other. All the values marked 2 in V2 are within 20 points of each other. All the values marked 3 are within 20 points of each other, etc. Notice that the values are not identical (they are all different). But instead, they cluster around a common value.
I identified these clusters manually. How could I automate it?
V1 V2
1 399.710 1
2 403.075 1
3 405.766 1
4 407.112 1
5 408.458 1
6 409.131 1
7 410.477 1
8 411.150 1
9 412.495 1
10 332.419 2
11 330.400 2
12 329.054 2
13 327.708 2
14 326.363 2
15 325.017 2
16 322.998 2
17 319.633 2
18 314.923 2
19 288.680 3
20 285.315 3
21 283.969 3
22 281.950 3
23 279.932 3
24 276.567 3
25 273.875 3
26 272.530 3
27 271.857 3
28 272.530 3
29 273.875 3
30 274.548 3
31 275.894 3
32 275.894 3
33 276.567 3
34 277.240 3
35 278.586 3
36 279.932 3
37 281.950 3
38 284.642 3
39 288.007 3
40 291.371 3
41 294.063 4
42 295.409 4
43 296.754 4
44 297.427 4
45 298.100 4
46 299.446 4
47 300.792 4
48 303.484 4
49 306.848 4
50 327.708 5
51 309.540 6
52 310.213 6
53 309.540 6
54 306.848 6
55 304.156 6
56 302.811 6
57 302.811 6
58 304.156 6
59 305.502 6
60 306.175 6
61 306.175 6
62 304.829 6
I haven't tried anything yet, I don't know how to do this.
You can use dist and hclust with cutree to detect clusters, and then give each consecutive run of the same cluster its own label, so that a new level starts at every break.
hc <- hclust(dist(x))
cl <- cutree(hc, k=6)
data.frame(x, seq=cumsum(c(0, diff(cl)) != 0) + 1)
# x seq
# 1 399.710 1
# 2 403.075 1
# 3 405.766 1
# 4 407.112 1
# 5 408.458 1
# 6 409.131 1
# 7 410.477 1
# 8 411.150 1
# 9 412.495 1
# 10 332.419 2
# 11 330.400 2
# 12 329.054 2
# 13 327.708 2
# 14 326.363 2
# 15 325.017 2
# 16 322.998 2
# 17 319.633 3
# 18 314.923 3
# 19 288.680 4
# 20 285.315 4
# 21 283.969 4
# 22 281.950 4
# 23 279.932 4
# 24 276.567 5
# 25 273.875 5
# 26 272.530 5
# 27 271.857 5
# 28 272.530 5
# 29 273.875 5
# 30 274.548 5
# 31 275.894 5
# 32 275.894 5
# 33 276.567 5
# 34 277.240 5
# 35 278.586 6
# 36 279.932 6
# 37 281.950 6
# 38 284.642 6
# 39 288.007 6
# 40 291.371 6
# 41 294.063 7
# 42 295.409 7
# 43 296.754 7
# 44 297.427 7
# 45 298.100 7
# 46 299.446 7
# 47 300.792 7
# 48 303.484 7
# 49 306.848 7
# 50 327.708 8
# 51 309.540 9
# 52 310.213 9
# 53 309.540 9
# 54 306.848 9
# 55 304.156 9
# 56 302.811 9
# 57 302.811 9
# 58 304.156 9
# 59 305.502 9
# 60 306.175 9
# 61 306.175 9
# 62 304.829 9
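The reason six clusters turn into nine sequence labels is the cumsum(c(0, diff(cl)) != 0) + 1 step: it starts a new label every time the cluster id changes, even if the same cluster reappears later. A minimal sketch on made-up cluster labels, with an equivalent version based on rle for comparison:

```r
# Toy cluster labels (made up for illustration): cluster 1 occurs in two runs.
cl <- c(1, 1, 2, 2, 1, 1, 3)

# TRUE wherever the label differs from the previous one; cumsum counts the changes.
run_id <- cumsum(c(0, diff(cl)) != 0) + 1

# Same result from run-length encoding: number each run, repeat by its length.
r <- rle(cl)
run_id_rle <- rep(seq_along(r$lengths), r$lengths)

print(run_id)
# [1] 1 1 2 2 3 3 4
```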
However, the dendrogram suggests k=4 clusters rather than 6, although the choice of cut height is somewhat arbitrary.
plot(hc)
abline(h=30, lty=2, col=2)
abline(h=18.5, lty=2, col=3)
abline(h=14, lty=2, col=4)
legend('topright', lty=2, col=2:4, legend=paste(c(4, 5, 7), 'cluster'), cex=.8)
Data:
x <- c(399.71, 403.075, 405.766, 407.112, 408.458, 409.131, 410.477,
411.15, 412.495, 332.419, 330.4, 329.054, 327.708, 326.363, 325.017,
322.998, 319.633, 314.923, 288.68, 285.315, 283.969, 281.95,
279.932, 276.567, 273.875, 272.53, 271.857, 272.53, 273.875,
274.548, 275.894, 275.894, 276.567, 277.24, 278.586, 279.932,
281.95, 284.642, 288.007, 291.371, 294.063, 295.409, 296.754,
297.427, 298.1, 299.446, 300.792, 303.484, 306.848, 327.708,
309.54, 310.213, 309.54, 306.848, 304.156, 302.811, 302.811,
304.156, 305.502, 306.175, 306.175, 304.829)
This solution iterates over every value, checks the range of all values in the group up to that point, and starts a new group if the range is greater than a threshold.
maxrange <- 18
grp_start <- 1
grp_num <- 1
V3 <- numeric(length(dat$V1))
for (i in seq_along(dat$V1)) {
  grp <- dat$V1[grp_start:i]
  if (max(grp) - min(grp) > maxrange) {
    grp_num <- grp_num + 1
    grp_start <- i
  }
  V3[[i]] <- grp_num
}
cbind(dat, V3)
V1 V2 V3
1 399.710 1 1
2 403.075 1 1
3 405.766 1 1
4 407.112 1 1
5 408.458 1 1
6 409.131 1 1
7 410.477 1 1
8 411.150 1 1
9 412.495 1 1
10 332.419 2 2
11 330.400 2 2
12 329.054 2 2
13 327.708 2 2
14 326.363 2 2
15 325.017 2 2
16 322.998 2 2
17 319.633 2 2
18 314.923 2 2
19 288.680 3 3
20 285.315 3 3
21 283.969 3 3
22 281.950 3 3
23 279.932 3 3
24 276.567 3 3
25 273.875 3 3
26 272.530 3 3
27 271.857 3 3
28 272.530 3 3
29 273.875 3 3
30 274.548 3 3
31 275.894 3 3
32 275.894 3 3
33 276.567 3 3
34 277.240 3 3
35 278.586 3 3
36 279.932 3 3
37 281.950 3 3
38 284.642 3 3
39 288.007 3 3
40 291.371 3 4
41 294.063 4 4
42 295.409 4 4
43 296.754 4 4
44 297.427 4 4
45 298.100 4 4
46 299.446 4 4
47 300.792 4 4
48 303.484 4 4
49 306.848 4 4
50 327.708 5 5
51 309.540 6 6
52 310.213 6 6
53 309.540 6 6
54 306.848 6 6
55 304.156 6 6
56 302.811 6 6
57 302.811 6 6
58 304.156 6 6
59 305.502 6 6
60 306.175 6 6
61 306.175 6 6
62 304.829 6 6
A threshold of 18 reproduces your groups, except that group 4 starts one row earlier. You could use a higher threshold, but then group 6 would start later than you have it.
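To try different thresholds more easily, the loop can be wrapped in a function. This is only a sketch of the same logic; the name group_by_range and the toy vector are made up for illustration.

```r
# Assign a group number to each value; a new group starts when the range of
# the current group (including the new value) would exceed maxrange.
group_by_range <- function(v, maxrange) {
  grp_start <- 1
  grp_num <- 1
  out <- numeric(length(v))
  for (i in seq_along(v)) {
    grp <- v[grp_start:i]
    if (max(grp) - min(grp) > maxrange) {
      grp_num <- grp_num + 1
      grp_start <- i
    }
    out[i] <- grp_num
  }
  out
}

# Toy input: three tight runs separated by large jumps.
toy <- c(10, 12, 15, 40, 41, 60)
print(group_by_range(toy, maxrange = 5))
# [1] 1 1 1 2 2 3
```

With the data from the question this would be called as group_by_range(dat$V1, 18).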

Efficiently derive bins based on condition in R

I need to create bins for every completed rotation e.g. 360° and bins will be of varying lengths. I have created a for loop but with 100,000+ rows it is slow. I tried to implement using dplyr and/or other non-loop methods but am unclear where and how to declare the cutoffs. None of the examples I found for either dplyr or cut() seemed to address my problem.
Sample data:
x <- c(seq(90, .5, length.out = 3),
seq(359.5, .2, length.out = 5),
seq(358.9, .8, length.out = 8),
seq(359.2, .3, length.out = 11),
seq(358.3, .1, length.out = 15))
df <- data.frame(x)
df$bin <- NA
df[1,2] <- 1
For loop:
for(i in 2:nrow(df)) {
  if(df[i,1] < df[i-1,1]) {
    df[i,2] <- df[i-1,2]
  } else {
    df[i,2] <- df[i-1,2] + 1
  }
}
How are the results in df$bin achieved without using a loop?
It looks like you could do:
df$binnew <- cumsum(c(1, diff(df$x) > 0))
Compare:
x bin binnew
1 90.00000 1 1
2 45.25000 1 1
3 0.50000 1 1
4 359.50000 2 2
5 269.67500 2 2
6 179.85000 2 2
7 90.02500 2 2
8 0.20000 2 2
9 358.90000 3 3
10 307.74286 3 3
11 256.58571 3 3
12 205.42857 3 3
13 154.27143 3 3
14 103.11429 3 3
15 51.95714 3 3
16 0.80000 3 3
17 359.20000 4 4
18 323.31000 4 4
19 287.42000 4 4
20 251.53000 4 4
21 215.64000 4 4
22 179.75000 4 4
23 143.86000 4 4
24 107.97000 4 4
25 72.08000 4 4
26 36.19000 4 4
27 0.30000 4 4
28 358.30000 5 5
29 332.71429 5 5
30 307.12857 5 5
31 281.54286 5 5
32 255.95714 5 5
33 230.37143 5 5
34 204.78571 5 5
35 179.20000 5 5
36 153.61429 5 5
37 128.02857 5 5
38 102.44286 5 5
39 76.85714 5 5
40 51.27143 5 5
41 25.68571 5 5
42 0.10000 5 5
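This works because diff(x) > 0 is TRUE exactly where the angle jumps back up, and cumsum counts those jumps. One detail to be aware of: the loop starts a new bin whenever the value is not strictly smaller than the previous one, so two equal consecutive values also start a new bin. If ties can occur and should match the loop, >= 0 is the closer equivalent. A small sketch on made-up angles, including one tie at the end:

```r
# Toy angles (made up): three rotations, plus a repeated value at the end.
x <- c(90, 45, 0.5, 359.5, 180, 0.2, 358.9, 358.9)

# Loop version, same logic as in the question.
bin_loop <- numeric(length(x))
bin_loop[1] <- 1
for (i in 2:length(x)) {
  if (x[i] < x[i - 1]) {
    bin_loop[i] <- bin_loop[i - 1]
  } else {
    bin_loop[i] <- bin_loop[i - 1] + 1
  }
}

# Vectorised versions: strict (as in the answer) and tie-aware.
bin_strict <- cumsum(c(1, diff(x) > 0))
bin_ties   <- cumsum(c(1, diff(x) >= 0))

print(bin_loop)    # 1 1 1 2 2 2 3 4
print(bin_strict)  # 1 1 1 2 2 2 3 3
print(bin_ties)    # 1 1 1 2 2 2 3 4
```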

How to put axes behind the graph?

I created a graph using geom_line and geom_point via ggplot. I want my axes to meet at (0,0) and I want my lines and data points to be in front of the axes instead of behind as shown:
I've tried:
coord_cartesian(clip = 'off')
putting geom_line and geom_point at the end
creating a base graph then add geom_line and geom_point
playing around with the functions of coord_cartesian
manually setting xlim =c(-0.1, 25) and ylim=c(-0.1, 1500)
data7 is as follows:
Treatment Days N mean sd se
1 1 0 7 204.7000000 41.579963 15.7157488
2 1 2 7 255.0571429 41.116617 15.5406205
3 1 5 7 290.6000000 49.506498 18.7116974
4 1 8 7 330.8142857 49.044144 18.5369442
5 1 12 7 407.5142857 95.584194 36.1274294
6 1 15 7 540.8571429 164.299390 62.0993323
7 1 19 7 737.5285714 308.786359 116.7102736
8 1 21 7 978.4571429 502.506726 189.9296898
9 2 0 7 205.7428571 46.902482 17.7274721
10 2 2 7 227.5571429 47.099889 17.8020846
11 2 5 7 232.4857143 59.642922 22.5429054
12 2 8 7 247.9857143 66.478529 25.1265220
13 2 12 7 272.0428571 79.173162 29.9246423
14 2 15 7 289.1142857 82.847016 31.3132288
15 2 19 7 312.3857143 105.648591 39.9314140
16 2 21 7 334.7142857 121.569341 45.9488920
17 3 0 7 212.2285714 47.549263 17.9719320
18 3 2 7 235.4142857 52.689671 19.9148237
19 3 5 7 177.0714286 54.895225 20.7484447
20 3 8 7 205.2571429 72.611451 27.4445489
21 3 12 7 247.8142857 119.369558 45.1174522
22 3 15 7 280.4285714 140.825847 53.2271669
23 3 19 7 366.9142857 210.573799 79.5894149
24 3 21 7 451.0428571 289.240793 109.3227438
25 4 0 7 211.6857143 24.329161 9.1955587
26 4 2 7 227.8428571 28.762525 10.8712127
27 4 5 7 205.9428571 49.148919 18.5765451
28 4 8 7 153.1142857 25.189246 9.5206399
29 4 12 7 128.2571429 43.145910 16.3076210
30 4 15 7 104.1714286 45.161662 17.0695038
31 4 19 7 85.4714286 51.169708 19.3403318
32 4 21 7 66.9000000 52.724567 19.9280133
33 5 0 7 216.7857143 39.957829 15.1026398
34 5 2 7 212.2000000 27.037135 10.2190765
35 5 5 7 115.5000000 37.094070 14.0202405
36 5 8 7 46.1000000 34.925492 13.2005952
37 5 12 7 29.3142857 24.761222 9.3588621
38 5 15 6 10.0666667 13.441974 5.4876629
39 5 19 6 6.4000000 11.692733 4.7735382
40 5 21 6 5.3666667 12.662017 5.1692467
41 6 0 7 206.6857143 40.359155 15.2543269
42 6 2 7 197.0428571 40.608327 15.3485048
43 6 5 7 106.2142857 58.279654 22.0276388
44 6 8 7 46.0571429 62.373014 23.5747833
45 6 12 7 31.7571429 49.977457 18.8897031
46 6 15 7 28.1142857 45.437995 17.1739480
47 6 19 7 26.2857143 38.414946 14.5194849
48 6 21 7 32.7428571 53.203003 20.1088450
49 7 0 7 193.2000000 37.300447 14.0982437
50 7 2 7 133.2428571 26.462606 10.0019250
51 7 5 7 3.8142857 7.445900 2.8142857
52 7 8 7 0.7142857 1.496026 0.5654449
53 7 12 7 0.0000000 0.000000 0.0000000
54 7 15 7 0.0000000 0.000000 0.0000000
55 7 19 7 0.0000000 0.000000 0.0000000
56 7 21 7 0.0000000 0.000000 0.0000000
My code is as follows:
ggplot(data7, aes(Days, mean, color=Treatment)) +
geom_line() +
geom_errorbar(aes(ymin=mean-se, ymax=mean+se), width=0.5, size= 0.25) +
geom_point(size=2.5) +
scale_colour_hue(limits = c("1", "2", "3", "4", "5", "6", "7")) +
scale_x_continuous(expand = c(0, 0), limits = c(0, NA), breaks = scales::pretty_breaks(n = 10)) +
scale_y_continuous(expand = c(0, 0), limits = c(0, NA), breaks = scales::pretty_breaks(n = 8)) +
theme_classic() +
theme(axis.text = element_text(color = "#000000"), plot.title = element_text(hjust = 0.5)) +
coord_cartesian(clip = 'off')
Here's one approach that omits the axis lines/ticks and then explicitly layers them below the rest of the plot layers. Because the new lines/ticks are drawn as literal objects, they will then ignore any other theming you may later apply. With control comes responsibility ...
This method has the side-effect of a "simple" axis tick, just the + symbol, which shows as a cross-line at each point. This is in contrast to the standard way (typically just pointing outwards). I'm guessing that something more robust could be devised, but I thought "simple" up-front could be adapted in other ways.
Taking the literal code of your ggplot(...) + ... and storing as gg, no changes. First we'll extract the tick marks. If you are confident enough (or not OCD-enough) to determine the tick locations yourself, then feel free to hard-code it. This method (of using ggplot_build then extracting the ...$x$breaks) has the advantage of matching the tick and label locations, especially if they might change with different/updated data.
ticks <- with(ggplot_build(gg)$layout$panel_params[[1]],
              na.omit(rbind(
                data.frame(x = x$breaks, y = 0),
                data.frame(x = 0, y = y$breaks)
              )))
head(ticks,3); tail(ticks,3)
# x y
# 1 0 0
# 2 2 0
# 3 4 0
# x y
# 16 0 600
# 17 0 800
# 18 0 1000
From here, I'll take a cue from https://stackoverflow.com/a/20250185/3358272 and prepend some layers below all of the others. (This is where I identify the + symbol for axis ticks, using shape=3.)
gg$layers <- c(
  geom_hline(aes(yintercept = 0)),
  geom_vline(aes(xintercept = 0)),
  geom_point(data = ticks, aes(x, y), shape = 3, inherit.aes = FALSE),
  gg$layers)
Now we just plot the previously-generated gg, adding a cue to omit the theme axis lines/ticks.
gg + theme(axis.line = element_blank(), axis.ticks = element_blank())
Data, including converting Treatment to character (to avoid continuous/discrete warnings from scale_colour_hue):
data7 <- read.table(header=TRUE, text = "
Treatment Days N mean sd se
1 1 0 7 204.7000000 41.579963 15.7157488
2 1 2 7 255.0571429 41.116617 15.5406205
3 1 5 7 290.6000000 49.506498 18.7116974
4 1 8 7 330.8142857 49.044144 18.5369442
5 1 12 7 407.5142857 95.584194 36.1274294
6 1 15 7 540.8571429 164.299390 62.0993323
7 1 19 7 737.5285714 308.786359 116.7102736
8 1 21 7 978.4571429 502.506726 189.9296898
9 2 0 7 205.7428571 46.902482 17.7274721
10 2 2 7 227.5571429 47.099889 17.8020846
11 2 5 7 232.4857143 59.642922 22.5429054
12 2 8 7 247.9857143 66.478529 25.1265220
13 2 12 7 272.0428571 79.173162 29.9246423
14 2 15 7 289.1142857 82.847016 31.3132288
15 2 19 7 312.3857143 105.648591 39.9314140
16 2 21 7 334.7142857 121.569341 45.9488920
17 3 0 7 212.2285714 47.549263 17.9719320
18 3 2 7 235.4142857 52.689671 19.9148237
19 3 5 7 177.0714286 54.895225 20.7484447
20 3 8 7 205.2571429 72.611451 27.4445489
21 3 12 7 247.8142857 119.369558 45.1174522
22 3 15 7 280.4285714 140.825847 53.2271669
23 3 19 7 366.9142857 210.573799 79.5894149
24 3 21 7 451.0428571 289.240793 109.3227438
25 4 0 7 211.6857143 24.329161 9.1955587
26 4 2 7 227.8428571 28.762525 10.8712127
27 4 5 7 205.9428571 49.148919 18.5765451
28 4 8 7 153.1142857 25.189246 9.5206399
29 4 12 7 128.2571429 43.145910 16.3076210
30 4 15 7 104.1714286 45.161662 17.0695038
31 4 19 7 85.4714286 51.169708 19.3403318
32 4 21 7 66.9000000 52.724567 19.9280133
33 5 0 7 216.7857143 39.957829 15.1026398
34 5 2 7 212.2000000 27.037135 10.2190765
35 5 5 7 115.5000000 37.094070 14.0202405
36 5 8 7 46.1000000 34.925492 13.2005952
37 5 12 7 29.3142857 24.761222 9.3588621
38 5 15 6 10.0666667 13.441974 5.4876629
39 5 19 6 6.4000000 11.692733 4.7735382
40 5 21 6 5.3666667 12.662017 5.1692467
41 6 0 7 206.6857143 40.359155 15.2543269
42 6 2 7 197.0428571 40.608327 15.3485048
43 6 5 7 106.2142857 58.279654 22.0276388
44 6 8 7 46.0571429 62.373014 23.5747833
45 6 12 7 31.7571429 49.977457 18.8897031
46 6 15 7 28.1142857 45.437995 17.1739480
47 6 19 7 26.2857143 38.414946 14.5194849
48 6 21 7 32.7428571 53.203003 20.1088450
49 7 0 7 193.2000000 37.300447 14.0982437
50 7 2 7 133.2428571 26.462606 10.0019250
51 7 5 7 3.8142857 7.445900 2.8142857
52 7 8 7 0.7142857 1.496026 0.5654449
53 7 12 7 0.0000000 0.000000 0.0000000
54 7 15 7 0.0000000 0.000000 0.0000000
55 7 19 7 0.0000000 0.000000 0.0000000
56 7 21 7 0.0000000 0.000000 0.0000000")
data7$Treatment <- as.character(data7$Treatment)
A fairly straightforward way to do this is just to move the panel in front of the axes once the plot elements are created (i.e. once the plot has been converted to a gtable). The gtable contains a layout data frame which allows you to move plot elements forwards or backwards by adjusting their z component.
If you store your plot as p, then the code would be:
ggp <- ggplot_gtable(ggplot_build(p))
ggp$layout$z[which(ggp$layout$name == "panel")] <- max(ggp$layout$z) + 1
grid::grid.draw(ggp)
Plot code:
This is just the original plot, except that I have added a vline at 0 and an hline at 0 (in case bringing the panel forwards clips your axis lines).
p <- ggplot(data7, aes(Days, mean, color=Treatment)) +
geom_hline(aes(yintercept = 0)) +
geom_vline(aes(xintercept = 0)) +
geom_line() +
geom_errorbar(aes(ymin=mean-se, ymax=mean+se), width=0.5, size= 0.25) +
geom_point(size=2.5) +
scale_colour_hue(limits = c("1", "2", "3", "4", "5", "6", "7")) +
scale_x_continuous(expand = c(0, 0), limits = c(0, NA), breaks = scales::pretty_breaks(n = 10)) +
scale_y_continuous(expand = c(0, 0), limits = c(0, NA), breaks = scales::pretty_breaks(n = 8)) +
theme_classic() +
theme(axis.text = element_text(color = "#000000"), plot.title = element_text(hjust = 0.5)) +
coord_cartesian(clip = 'off')

Can I plot the number count in ggplot2 using geom_text instead of the size of points (geom_count)?

I'm trying to plot a ggplot graph and instead of the size of point indicating the count, I need to plot the overlapping count number. Can you help me?
https://imgur.com/a/pm1SsWd
Thank you very much!
My data:
ID CIM DD
1 8 8
2 8 8
3 8 4
4 4 4
5 2 2
6 8 8
7 8 8
8 8 8
9 2 2
10 2 2
11 2 4
12 4 4
13 8 4
14 2 2
15 4 4
16 4 8
17 2 4
18 16 8
19 8 16
20 16 16
21 2 4
22 16 8
23 8 8
24 8 8
25 8 8
26 4 4
27 1 2
28 4 8
29 8 8
30 2 4
31 8 8
32 2 2
33 1 2
34 4 8
35 8 8
36 16 8
37 8 8
38 4 4
39 4 8
40 4 8
41 8 8
42 8 8
43 2 2
I used the code below to make an overlapping count graph as shown in an image link:
https://imgur.com/a/pm1SsWd
breaks = c(1,2,4,8,16)
labels = as.character(breaks)
ggplot(data = Data,aes(CIM,DD)) +
geom_count()+
scale_x_continuous(limits = c(1, 32), breaks = breaks, labels = labels,name = "CIM")+
scale_y_continuous(limits = c(1, 32), breaks = breaks, labels = labels,name = "DD")
Take a look at this example:
Add count as label to points in geom_count
You could do the following with your data:
p <- ggplot(data = Data,aes(CIM,DD)) +
geom_count(show.legend = FALSE)+
scale_x_continuous(limits = c(1, 32), breaks = breaks, labels = labels,name = "CIM") +
scale_y_continuous(limits = c(1, 32), breaks = breaks, labels = labels,name = "DD") +
scale_size_continuous(range = c(10, 10))
p + geom_text(data = ggplot_build(p)$data[[1]], aes(x, y, label = n), color = "#ffffff")
You can adjust the range in scale_size_continuous if you wish to vary the size of points.
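Another option, sketched here as an alternative to pulling the counts out of ggplot_build, is to count the overlapping points yourself and plot the counts as text. The small Data below reuses the first five rows from the question, and the column name n is chosen to match the one used above; the plot is only built if ggplot2 is installed.

```r
# First five rows of the question's data, enough to show the counting step.
Data <- data.frame(CIM = c(8, 8, 8, 4, 2),
                   DD  = c(8, 8, 4, 4, 2))

# One row per distinct (CIM, DD) pair, with n = number of overlapping points.
counts <- aggregate(n ~ CIM + DD, data = cbind(Data, n = 1), FUN = sum)
print(counts)

breaks <- c(1, 2, 4, 8, 16)

# Draw the counts as text at each location instead of sized points.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p_counts <- ggplot(counts, aes(CIM, DD, label = n)) +
    geom_text() +
    scale_x_continuous(limits = c(1, 32), breaks = breaks, name = "CIM") +
    scale_y_continuous(limits = c(1, 32), breaks = breaks, name = "DD")
}
```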

How do I split y-axis disproportionally to better show data in my plot

I have a data.frame named final that looks like:
labels gvs order color f3
1 Adygei -2.3321916 1 1 353.0184
2 Basque -0.8519079 2 1 368.1515
3 French -0.9298674 3 1 365.2545
4 Italian -2.8859587 4 1 354.4481
5 Orcadian -1.4996229 5 1 350.9650
6 Russian -1.5597359 6 1 358.9736
7 Sardinian -1.4494841 7 1 355.1171
8 Tuscan -2.4279528 8 1 362.4717
9 Bedouin -3.1717421 9 2 319.3706
10 Druze -0.5058627 10 2 346.2211
11 Mozabite -2.6491331 11 2 299.5014
12 Palestinian -0.7819299 12 2 330.4576
13 Balochi -1.4095947 13 3 327.1238
14 Brahui -1.2534511 14 3 331.0927
15 Burusho 1.7958170 15 3 335.0919
16 Hazara 2.2810477 16 3 325.2444
17 Kalash -0.9258497 17 3 337.7116
18 Makrani -0.9007551 18 3 321.5726
19 Pathan 2.5543214 19 3 326.1923
20 Sindhi 2.6614486 20 3 318.7025
21 Uygur -1.2207974 21 3 322.0286
22 Cambodian 2.3706977 22 4 310.8989
23 Dai -0.9441980 23 4 305.5687
24 Daur -1.0325107 24 4 309.0984
25 Han -0.7381369 25 4 309.1198
26 Hezhen -2.7590587 26 4 296.9128
27 Japanese -0.5644325 27 4 297.9313
28 Lahu -0.8449225 28 4 307.0776
29 Miao -0.7237586 29 4 303.6593
30 Mongola -0.9452944 30 4 302.1380
31 Naxi -0.1625003 31 4 311.8019
32 Oroqen -1.2035258 32 4 308.7219
33 She -2.7758460 33 4 302.1271
34 Tu -0.7703779 34 4 307.3750
35 Tujia -1.0265275 35 4 303.5923
36 Xibo -1.1163019 36 4 295.5764
37 Yakut -3.2102686 37 4 315.0111
38 Yi -0.9614190 38 4 296.8134
39 Colombian -1.9659984 39 5 311.3134
40 Karitiana -0.9195156 40 5 300.8539
41 Maya 2.1239768 41 5 333.8995
42 Pima -3.0895998 42 5 325.3484
43 Surui -0.9377928 43 5 313.8505
44 Melanesian -1.6961014 44 6 294.5214
45 Papuan -0.7037952 45 6 286.7389
46 BantuKenya -1.9311354 46 7 152.9971
47 BantuSouthAfrica -1.8515908 47 7 133.6722
48 BiakaPygmy -1.7657017 48 7 117.5555
49 Mandenka -0.5423822 49 7 152.8525
50 MbutiPygmy -1.6244801 50 7 114.1691
51 San -0.9049735 51 7 0.0000
52 Yoruba 2.0949378 52 7 154.4460
I'm using the following code to make a graph
jpeg("F3.SCZ.Jul_22.jpg", 700,700)
final$color <- as.factor(final$color)
levels(final$color) <- c("blue","yellow3","red","pink","purple","green","orange")
plot(final$gvs, final$f3, cex=2,pch = 21, bg = as.character(final$color), xaxt="n", xlab="Genetic Values", ylab="F3", main="SCZ")
dev.off()
that looks like:
I would like to split the y-axis at 200, to have the y-values range from 0 to 200 to take up only 10% of the graph, while 200 to 400 to take up 90% of the y-axis. Is that possible?
EDIT:
Here is the data that is running into issues:
labels gvs order color f3
1 Adygei -2.3321916 1 1 0.09862109
2 Basque -0.8519079 2 1 0.09942770
3 French -0.9298674 3 1 0.10357547
4 Italian -2.8859587 4 1 0.09960179
5 Orcadian -1.4996229 5 1 0.10244666
6 Russian -1.5597359 6 1 0.10097691
7 Sardinian -1.4494841 7 1 0.10189642
8 Tuscan -2.4279528 8 1 0.09794686
9 Bedouin -3.1717421 9 2 0.09272493
10 Druze -0.5058627 10 2 0.09682272
11 Mozabite -2.6491331 11 2 0.08563901
12 Palestinian -0.7819299 12 2 0.09331649
13 Balochi -1.4095947 13 3 0.09227273
14 Brahui -1.2534511 14 3 0.09328593
15 Burusho 1.7958170 15 3 0.09396032
16 Hazara 2.2810477 16 3 0.09342432
17 Kalash -0.9258497 17 3 0.09666599
18 Makrani -0.9007551 18 3 0.09222257
19 Pathan 2.5543214 19 3 0.09468376
20 Sindhi 2.6614486 20 3 0.09172395
21 Uygur -1.2207974 21 3 0.09140727
22 Cambodian 2.3706977 22 4 0.08655821
23 Dai -0.9441980 23 4 0.08739080
24 Daur -1.0325107 24 4 0.08656669
25 Han -0.7381369 25 4 0.08764395
26 Hezhen -2.7590587 26 4 0.08802065
27 Japanese -0.5644325 27 4 0.08810874
28 Lahu -0.8449225 28 4 0.08609791
29 Miao -0.7237586 29 4 0.08700414
30 Mongola -0.9452944 30 4 0.08921706
31 Naxi -0.1625003 31 4 0.08646436
32 Oroqen -1.2035258 32 4 0.08719536
33 She -2.7758460 33 4 0.08656100
34 Tu -0.7703779 34 4 0.08818588
35 Tujia -1.0265275 35 4 0.08737680
36 Xibo -1.1163019 36 4 0.08806230
37 Yakut -3.2102686 37 4 0.08965344
38 Yi -0.9614190 38 4 0.08593454
39 Colombian -1.9659984 39 5 0.09114697
40 Karitiana -0.9195156 40 5 0.09040477
41 Maya 2.1239768 41 5 0.09068139
42 Pima -3.0895998 42 5 0.09084750
43 Surui -0.9377928 43 5 0.08925535
44 Melanesian -1.6961014 44 6 0.08430903
45 Papuan -0.7037952 45 6 0.08272786
46 BantuKenya -1.9311354 46 7 0.04668356
47 BantuSouthAfrica -1.8515908 47 7 0.03914248
48 BiakaPygmy -1.7657017 48 7 0.03546243
49 Mandenka -0.5423822 49 7 0.04612336
50 MbutiPygmy -1.6244801 50 7 0.03098719
51 San -0.9049735 51 7 0.00000000
52 Yoruba 2.0949378 52 7 0.04561542
You can do:
my_color <- as.factor(final$color)
levels(my_color) <- c("blue","yellow3","red","pink","purple","green","orange")
par(mfrow = c(1,2))
# original plot
pos <- seq(min(final$f3), max(final$f3), by = 25) ## y-axis tick marks position.
plot(final$gvs, final$f3, cex=2, pch=21, bg = as.character(my_color),
xaxt="n", yaxt="n", xlab="Genetic Values", ylab="F3", main="SCZ")
axis(2, at = pos, labels = pos) ## add y-axis
# new plot
threshold <- 260 ## cut off threshold
## some rescaling
## if f3 < threshold, we take new_f3 <- 0.1 * f3
## if f3 > threshold, we take new_f3 <- f3 - 0.9 * threshold
new_f3 <- ifelse(final$f3 < threshold, 0.1 * final$f3, final$f3 - threshold * 0.9)
## we apply the same transform to `pos` to get `new_pos`
new_pos <- ifelse(pos < threshold, 0.1 * pos, pos - threshold * 0.9)
plot(final$gvs, new_f3, cex=2, pch=21, bg = as.character(my_color),
xaxt="n", yaxt="n", xlab="Genetic Values", ylab="F3", main="SCZ")
abline(h = threshold * 0.1, lty = 3) # threshold line
axis(2, at = new_pos, labels = pos)
I would use trans_new() from the scales package to transform the y-axis. This should get you close. I prefer the continuously differentiable transform (first), but you can also do a step change in scale (second). H/T to Gregor for pointing out that pmin and pmax handle vectors and are correct here.
setwd("C:/Users/rherron1/Desktop/")
final <- read.table("Scratch2.txt", header=TRUE)
final$id <- NULL
# default y-scale
require(ggplot2)
a <- ggplot(final, aes(gvs, f3, color=factor(color)))
a <- a + geom_point()
a
# transform y-axis
require(scales)
skew <- function(x) x^2
iskew <- function(x) x^(1/2)
skew_trans <- function() trans_new("skew", "skew", "iskew")
b <- a + coord_trans(y="skew")
b
# transform y-axis
require(scales)
sku <- function(x) pmin(x, 200) + 9*pmax(x-200, 0)
isku <- function(x) pmax((x-200)/9, 0) + pmin(x, 200)
sku_trans <- function() trans_new("sku", "sku", "isku")
c <- a + coord_trans(y="sku")
c
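A quick way to sanity-check the second (step-change) transform without plotting is to confirm that isku really inverts sku and to see how much of the transformed axis the 0 to 200 range occupies. This sketch redefines the two functions so it runs on its own; the f3 values are taken from the question's table, and the 0 to 400 axis range is an assumption based on the question.

```r
# Same piecewise transform as above: slope 1 below 200, slope 9 above.
sku  <- function(x) pmin(x, 200) + 9 * pmax(x - 200, 0)
isku <- function(y) pmin(y, 200) + pmax((y - 200) / 9, 0)

# A few f3 values from the question, plus the assumed axis limits and the break.
f3 <- c(0, 114.1691, 154.4460, 200, 286.7389, 368.1515, 400)

# Round trip: the inverse should recover the original values.
round_trip_ok <- isTRUE(all.equal(isku(sku(f3)), f3))

# Share of the transformed axis taken by 0-200 when the axis runs 0-400.
share_below <- (sku(200) - sku(0)) / (sku(400) - sku(0))

print(round_trip_ok)  # TRUE
print(share_below)    # 0.1
```

So with a break at 200 and a slope ratio of 1 to 9, values from 0 to 200 take up 10% of the axis and values from 200 to 400 take up the remaining 90%, which is the split asked for in the question.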
