I'm not sure why I am having such a problem with my x-scale labels repeating as opposed to just labeling where there is a measured point. Additionally, my labels for my legend are not working.
FamIncome Ethnicity mean.bmi
1 1 1 28.54250
2 1 2 26.66300
3 1 3 26.62105
4 1 4 29.51396
5 1 5 25.66722
6 2 1 29.62404
7 2 2 28.08393
8 2 3 28.62215
9 2 4 28.97561
10 2 5 25.57714
11 3 1 29.52630
12 3 2 28.27235
13 3 3 29.67060
14 3 4 31.36768
15 3 5 26.13361
16 4 1 30.83368
17 4 2 30.80814
18 4 3 29.29594
19 4 4 29.18521
20 4 5 24.80550
21 5 1 29.76500
22 5 2 29.24404
23 5 3 28.89435
24 5 4 31.48172
25 5 5 28.02522
26 6 1 30.05087
27 6 2 29.88574
28 6 3 29.53793
29 6 4 30.97993
30 6 5 25.57857
31 7 1 30.31787
32 7 2 29.28055
33 7 3 28.50421
34 7 4 30.65427
35 7 5 26.66094
36 8 1 29.15000
37 8 2 29.02789
38 8 3 28.36507
39 8 4 33.51915
40 8 5 28.38263
41 9 1 28.17679
42 9 2 28.74731
43 9 3 28.06196
44 9 4 31.38483
45 9 5 26.96000
46 10 1 28.71633
47 10 2 33.44409
48 10 3 30.63048
49 10 4 30.22587
50 10 5 27.36375
51 14 1 30.78161
52 14 2 27.43575
53 14 3 28.96817
54 14 4 32.22378
55 14 5 25.62778
56 15 1 29.15982
57 15 2 27.42672
58 15 3 27.60567
59 15 4 30.05013
60 15 5 26.80271
Code below:
a <- ggplot(nh1, aes(x=FamIncome, y=mean.bmi)) + geom_line(aes(group=Ethnicity, colour = Ethnicity)) + geom_point()
a = a + labs(list(title="Average BMI versus Family Income", x = "Family Income", y = "Average BMI"))
a = a + scale_x_discrete(breaks=c("1","2","3","4","5","6","7","8","9","10","14","15"),
labels = c("0-4,999", "5K-9,999", "10K-14,999", "15K-19,999", "20K-24,999", "25K-34,999", "35K-44,999", "45K-54,999", "55K-64,999", "65K-74,999", "75K-100K", "Over 100K"))
a = a + theme(axis.text.x=element_text(angle=-90))
a = a + scale_colour_continuous(name = "Ethnicity",
breaks=c("5","4","3","2","1"),
labels=c("Other Race/Multi", "Black","White","Other Hispanic", "Mexican-American"))
a
I cannot post a picture of the image that I'm getting until I get 2 more "reputation" points
Try converting your x variable to a factor:
a <- ggplot(nh1, aes(x=factor(FamIncome), y=mean.bmi)) + geom_line(aes(group=Ethnicity, colour = factor(Ethnicity)))
a = a + labs(title = "Average BMI versus Family Income", x = "Family Income", y = "Average BMI")
a = a + scale_x_discrete("Family Income", labels = c("0-4,999", "5K-9,999", "10K-14,999", "15K-19,999", "20K-24,999", "25K-34,999", "35K-44,999", "45K-54,999", "55K-64,999", "65K-74,999", "75K-100K", "Over 100K"))
a = a + theme(axis.text.x = element_text(angle = -90))
a = a + scale_colour_discrete(name = "Ethnicity",
breaks=c("5","4","3","2","1"),
labels=c("Other Race/Multi", "Black","White","Other Hispanic", "Mexican-American"))
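With a numeric x variable, ggplot treats the axis as a continuous scale, when you really intended it to be categorical. The same applies to Ethnicity: wrapped in factor(), it gets a discrete colour scale, which is what scale_colour_discrete expects. Also note the confusion between fill and colour: fill is for two-dimensional filled regions, while lines and points take colour.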
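As a variation on the factor conversion above, the codes could be turned into labelled factors before plotting, so that the scales pick up the labels without separate breaks/labels arguments. A minimal base-R sketch, assuming a small stand-in for nh1 (the data frame nh1_toy and the helper vectors are hypothetical; the codes, values and labels are taken from the question):

```r
# Hypothetical stand-in for a few rows of nh1
nh1_toy <- data.frame(
  FamIncome = c(1, 1, 10, 14, 15),
  Ethnicity = c(1, 5, 2, 3, 4),
  mean.bmi  = c(28.54250, 25.66722, 33.44409, 28.96817, 30.05013)
)

# Income codes present in the data (11-13 are not used) and their labels
income_codes  <- c(1:10, 14, 15)
income_labels <- c("0-4,999", "5K-9,999", "10K-14,999", "15K-19,999",
                   "20K-24,999", "25K-34,999", "35K-44,999", "45K-54,999",
                   "55K-64,999", "65K-74,999", "75K-100K", "Over 100K")

# Ethnicity codes 1..5 and their labels, in code order
eth_labels <- c("Mexican-American", "Other Hispanic", "White",
                "Black", "Other Race/Multi")

# Convert the numeric codes to labelled factors
nh1_toy$FamIncome <- factor(nh1_toy$FamIncome,
                            levels = income_codes, labels = income_labels)
nh1_toy$Ethnicity <- factor(nh1_toy$Ethnicity,
                            levels = 1:5, labels = eth_labels)

str(nh1_toy)
```

With the columns stored this way, aes(x = FamIncome, colour = Ethnicity) should already be treated as discrete, and the level order controls the order on the axis and in the legend.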
I have a series of values that contains runs of values close to each other, as in the sequence below. I have labeled the runs in V1 with distinct values in V2; the range of the values shifts roughly where the label changes. All the values labeled 1 in V2 are within 20 points of each other, all the values labeled 2 are within 20 points of each other, all the values labeled 3 are within 20 points of each other, and so on. The values within a run are not identical; instead, they cluster around a common value.
I identified these clusters manually. How could I automate it?
V1 V2
1 399.710 1
2 403.075 1
3 405.766 1
4 407.112 1
5 408.458 1
6 409.131 1
7 410.477 1
8 411.150 1
9 412.495 1
10 332.419 2
11 330.400 2
12 329.054 2
13 327.708 2
14 326.363 2
15 325.017 2
16 322.998 2
17 319.633 2
18 314.923 2
19 288.680 3
20 285.315 3
21 283.969 3
22 281.950 3
23 279.932 3
24 276.567 3
25 273.875 3
26 272.530 3
27 271.857 3
28 272.530 3
29 273.875 3
30 274.548 3
31 275.894 3
32 275.894 3
33 276.567 3
34 277.240 3
35 278.586 3
36 279.932 3
37 281.950 3
38 284.642 3
39 288.007 3
40 291.371 3
41 294.063 4
42 295.409 4
43 296.754 4
44 297.427 4
45 298.100 4
46 299.446 4
47 300.792 4
48 303.484 4
49 306.848 4
50 327.708 5
51 309.540 6
52 310.213 6
53 309.540 6
54 306.848 6
55 304.156 6
56 302.811 6
57 302.811 6
58 304.156 6
59 305.502 6
60 306.175 6
61 306.175 6
62 304.829 6
I haven't tried anything yet, I don't know how to do this.
You can use dist and hclust with cutree to detect the clusters, and then give each consecutive run of the same cluster its own label, so that a new level starts at every break:
hc <- hclust(dist(x))
cl <- cutree(hc, k=6)
data.frame(x, seq=cumsum(c(0, diff(cl)) != 0) + 1)
# x seq
# 1 399.710 1
# 2 403.075 1
# 3 405.766 1
# 4 407.112 1
# 5 408.458 1
# 6 409.131 1
# 7 410.477 1
# 8 411.150 1
# 9 412.495 1
# 10 332.419 2
# 11 330.400 2
# 12 329.054 2
# 13 327.708 2
# 14 326.363 2
# 15 325.017 2
# 16 322.998 2
# 17 319.633 3
# 18 314.923 3
# 19 288.680 4
# 20 285.315 4
# 21 283.969 4
# 22 281.950 4
# 23 279.932 4
# 24 276.567 5
# 25 273.875 5
# 26 272.530 5
# 27 271.857 5
# 28 272.530 5
# 29 273.875 5
# 30 274.548 5
# 31 275.894 5
# 32 275.894 5
# 33 276.567 5
# 34 277.240 5
# 35 278.586 6
# 36 279.932 6
# 37 281.950 6
# 38 284.642 6
# 39 288.007 6
# 40 291.371 6
# 41 294.063 7
# 42 295.409 7
# 43 296.754 7
# 44 297.427 7
# 45 298.100 7
# 46 299.446 7
# 47 300.792 7
# 48 303.484 7
# 49 306.848 7
# 50 327.708 8
# 51 309.540 9
# 52 310.213 9
# 53 309.540 9
# 54 306.848 9
# 55 304.156 9
# 56 302.811 9
# 57 302.811 9
# 58 304.156 9
# 59 305.502 9
# 60 306.175 9
# 61 306.175 9
# 62 304.829 9
However, the dendrogram suggests k=4 clusters rather than 6, although the choice is somewhat arbitrary.
plot(hc)
abline(h=30, lty=2, col=2)
abline(h=18.5, lty=2, col=3)
abline(h=14, lty=2, col=4)
legend('topright', lty=2, col=2:4, legend=paste(c(4, 5, 7), 'cluster'), cex=.8)
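The dashed lines amount to cutting the tree at a given height instead of asking for a fixed number of clusters, and cutree() accepts an h argument for that. A small sketch on made-up values (toy and the cut height of 10 are illustrative assumptions, not the data from the question):

```r
# Made-up values forming three well-separated groups
toy <- c(1, 2, 3, 50, 51, 52, 100, 101)

hc_toy <- hclust(dist(toy))    # complete linkage is the default

# Cut at a height instead of specifying k
cl_h <- cutree(hc_toy, h = 10)

cl_h          # cluster label for each value
table(cl_h)   # cluster sizes
```

Whether a height or a cluster count is easier to justify depends on the data; for the series in the question, a height has a direct interpretation as the maximum spread allowed within a cluster under complete linkage.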
Data:
x <- c(399.71, 403.075, 405.766, 407.112, 408.458, 409.131, 410.477,
411.15, 412.495, 332.419, 330.4, 329.054, 327.708, 326.363, 325.017,
322.998, 319.633, 314.923, 288.68, 285.315, 283.969, 281.95,
279.932, 276.567, 273.875, 272.53, 271.857, 272.53, 273.875,
274.548, 275.894, 275.894, 276.567, 277.24, 278.586, 279.932,
281.95, 284.642, 288.007, 291.371, 294.063, 295.409, 296.754,
297.427, 298.1, 299.446, 300.792, 303.484, 306.848, 327.708,
309.54, 310.213, 309.54, 306.848, 304.156, 302.811, 302.811,
304.156, 305.502, 306.175, 306.175, 304.829)
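The seq column gives every run of identical cluster labels its own number, which is why k=6 clusters can show up as nine consecutive groups: a cluster that reappears later in the series starts a new run. A toy illustration with made-up labels (cl_toy is hypothetical):

```r
# Made-up cluster labels; cluster 1 appears in two separate runs
cl_toy <- c(1, 1, 2, 2, 1, 1, 3)

# TRUE wherever the label changes from the previous element;
# the cumulative sum of those changes numbers the runs
run_id <- cumsum(c(0, diff(cl_toy)) != 0) + 1
run_id

# The same numbering via run-length encoding
r <- rle(cl_toy)
run_id_rle <- rep(seq_along(r$lengths), r$lengths)
run_id_rle
```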
This solution iterates over every value, checks the range of all values in the group up to that point, and starts a new group if the range is greater than a threshold.
maxrange <- 18
grp_start <- 1
grp_num <- 1
V3 <- numeric(length(dat$V1))
for (i in seq_along(dat$V1)) {
grp <- dat$V1[grp_start:i]
if (max(grp) - min(grp) > maxrange) {
grp_num <- grp_num + 1
grp_start <- i
}
V3[[i]] <- grp_num
}
cbind(dat, V3)
V1 V2 V3
1 399.710 1 1
2 403.075 1 1
3 405.766 1 1
4 407.112 1 1
5 408.458 1 1
6 409.131 1 1
7 410.477 1 1
8 411.150 1 1
9 412.495 1 1
10 332.419 2 2
11 330.400 2 2
12 329.054 2 2
13 327.708 2 2
14 326.363 2 2
15 325.017 2 2
16 322.998 2 2
17 319.633 2 2
18 314.923 2 2
19 288.680 3 3
20 285.315 3 3
21 283.969 3 3
22 281.950 3 3
23 279.932 3 3
24 276.567 3 3
25 273.875 3 3
26 272.530 3 3
27 271.857 3 3
28 272.530 3 3
29 273.875 3 3
30 274.548 3 3
31 275.894 3 3
32 275.894 3 3
33 276.567 3 3
34 277.240 3 3
35 278.586 3 3
36 279.932 3 3
37 281.950 3 3
38 284.642 3 3
39 288.007 3 3
40 291.371 3 4
41 294.063 4 4
42 295.409 4 4
43 296.754 4 4
44 297.427 4 4
45 298.100 4 4
46 299.446 4 4
47 300.792 4 4
48 303.484 4 4
49 306.848 4 4
50 327.708 5 5
51 309.540 6 6
52 310.213 6 6
53 309.540 6 6
54 306.848 6 6
55 304.156 6 6
56 302.811 6 6
57 302.811 6 6
58 304.156 6 6
59 305.502 6 6
60 306.175 6 6
61 306.175 6 6
62 304.829 6 6
A threshold of 18 reproduces your groups, except that group 4 starts one row earlier. You could use a higher threshold, but then group 6 would start later than you have it.
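To make it easier to try different thresholds, the same loop could be wrapped in a function. This is only a sketch: the name group_by_range and the toy input are assumptions for illustration, and the logic is the loop above unchanged.

```r
# Assign consecutive group numbers; a new group starts whenever the
# range (max - min) of the current group would exceed maxrange
group_by_range <- function(v, maxrange) {
  grp_start <- 1
  grp_num <- 1
  out <- numeric(length(v))
  for (i in seq_along(v)) {
    grp <- v[grp_start:i]
    if (max(grp) - min(grp) > maxrange) {
      grp_num <- grp_num + 1
      grp_start <- i          # element i opens the new group
    }
    out[i] <- grp_num
  }
  out
}

# Made-up example values
group_by_range(c(10, 12, 15, 40, 41, 80), maxrange = 18)
```

With the data from the question, group_by_range(dat$V1, 18) should give the V3 column shown above, and other thresholds can be compared side by side.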
I need to create a bin for every completed rotation (i.e. 360°); the bins will be of varying lengths. I have written a for loop, but with 100,000+ rows it is slow. I tried to implement this with dplyr and other non-loop methods, but it is unclear to me where and how to declare the cutoffs. None of the examples I found for either dplyr or cut() seemed to address my problem.
Sample data:
x <- c(seq(90, .5, length.out = 3),
seq(359.5, .2, length.out = 5),
seq(358.9, .8, length.out = 8),
seq(359.2, .3, length.out = 11),
seq(358.3, .1, length.out = 15))
df <- data.frame(x)
df$bin <- NA
df[1,2] <- 1
For loop:
for(i in 2:nrow(df)) {
if(df[i,1] < df[i-1,1]) {
df[i,2] <- df[i-1,2]
} else {
df[i,2] <- df[i-1,2] + 1
}
}
How are the results in df$bin achieved without using a loop?
It looks like you could do:
df$binnew <- cumsum(c(1, diff(df$x) > 0))
Compare:
x bin binnew
1 90.00000 1 1
2 45.25000 1 1
3 0.50000 1 1
4 359.50000 2 2
5 269.67500 2 2
6 179.85000 2 2
7 90.02500 2 2
8 0.20000 2 2
9 358.90000 3 3
10 307.74286 3 3
11 256.58571 3 3
12 205.42857 3 3
13 154.27143 3 3
14 103.11429 3 3
15 51.95714 3 3
16 0.80000 3 3
17 359.20000 4 4
18 323.31000 4 4
19 287.42000 4 4
20 251.53000 4 4
21 215.64000 4 4
22 179.75000 4 4
23 143.86000 4 4
24 107.97000 4 4
25 72.08000 4 4
26 36.19000 4 4
27 0.30000 4 4
28 358.30000 5 5
29 332.71429 5 5
30 307.12857 5 5
31 281.54286 5 5
32 255.95714 5 5
33 230.37143 5 5
34 204.78571 5 5
35 179.20000 5 5
36 153.61429 5 5
37 128.02857 5 5
38 102.44286 5 5
39 76.85714 5 5
40 51.27143 5 5
41 25.68571 5 5
42 0.10000 5 5
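One detail worth checking: the loop keeps the bin only when the value strictly decreases, so two equal consecutive values would start a new bin in the loop but not with diff(x) > 0. The sample data has no ties, so both agree there; using >= 0 should match the loop in the tie case as well. A short check on part of the sample data (bin_loop and bin_vec are names chosen here for illustration):

```r
# First three rotations of the sample data from the question
x <- c(seq(90, .5, length.out = 3),
       seq(359.5, .2, length.out = 5),
       seq(358.9, .8, length.out = 8))

# Loop version, same rule as in the question:
# keep the bin while values decrease, otherwise start a new one
bin_loop <- numeric(length(x))
bin_loop[1] <- 1
for (i in 2:length(x)) {
  bin_loop[i] <- bin_loop[i - 1] + (x[i] >= x[i - 1])
}

# Vectorized version; >= 0 mirrors the loop's handling of ties
bin_vec <- cumsum(c(1, diff(x) >= 0))

all(bin_vec == bin_loop)

# With a tie, the strict and non-strict comparisons differ
ties <- c(10, 5, 5, 1)
cumsum(c(1, diff(ties) > 0))    # the tie stays in the same bin
cumsum(c(1, diff(ties) >= 0))   # the tie starts a new bin, as in the loop
```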
I created a graph using geom_line and geom_point via ggplot. I want my axes to meet at (0,0) and I want my lines and data points to be in front of the axes instead of behind as shown:
I've tried:
coord_cartesian(clip = 'off')
putting geom_line and geom_point at the end
creating a base graph then add geom_line and geom_point
playing around with the functions of coord_cartesian
manually setting xlim =c(-0.1, 25) and ylim=c(-0.1, 1500)
data7 is as follows:
Treatment Days N mean sd se
1 1 0 7 204.7000000 41.579963 15.7157488
2 1 2 7 255.0571429 41.116617 15.5406205
3 1 5 7 290.6000000 49.506498 18.7116974
4 1 8 7 330.8142857 49.044144 18.5369442
5 1 12 7 407.5142857 95.584194 36.1274294
6 1 15 7 540.8571429 164.299390 62.0993323
7 1 19 7 737.5285714 308.786359 116.7102736
8 1 21 7 978.4571429 502.506726 189.9296898
9 2 0 7 205.7428571 46.902482 17.7274721
10 2 2 7 227.5571429 47.099889 17.8020846
11 2 5 7 232.4857143 59.642922 22.5429054
12 2 8 7 247.9857143 66.478529 25.1265220
13 2 12 7 272.0428571 79.173162 29.9246423
14 2 15 7 289.1142857 82.847016 31.3132288
15 2 19 7 312.3857143 105.648591 39.9314140
16 2 21 7 334.7142857 121.569341 45.9488920
17 3 0 7 212.2285714 47.549263 17.9719320
18 3 2 7 235.4142857 52.689671 19.9148237
19 3 5 7 177.0714286 54.895225 20.7484447
20 3 8 7 205.2571429 72.611451 27.4445489
21 3 12 7 247.8142857 119.369558 45.1174522
22 3 15 7 280.4285714 140.825847 53.2271669
23 3 19 7 366.9142857 210.573799 79.5894149
24 3 21 7 451.0428571 289.240793 109.3227438
25 4 0 7 211.6857143 24.329161 9.1955587
26 4 2 7 227.8428571 28.762525 10.8712127
27 4 5 7 205.9428571 49.148919 18.5765451
28 4 8 7 153.1142857 25.189246 9.5206399
29 4 12 7 128.2571429 43.145910 16.3076210
30 4 15 7 104.1714286 45.161662 17.0695038
31 4 19 7 85.4714286 51.169708 19.3403318
32 4 21 7 66.9000000 52.724567 19.9280133
33 5 0 7 216.7857143 39.957829 15.1026398
34 5 2 7 212.2000000 27.037135 10.2190765
35 5 5 7 115.5000000 37.094070 14.0202405
36 5 8 7 46.1000000 34.925492 13.2005952
37 5 12 7 29.3142857 24.761222 9.3588621
38 5 15 6 10.0666667 13.441974 5.4876629
39 5 19 6 6.4000000 11.692733 4.7735382
40 5 21 6 5.3666667 12.662017 5.1692467
41 6 0 7 206.6857143 40.359155 15.2543269
42 6 2 7 197.0428571 40.608327 15.3485048
43 6 5 7 106.2142857 58.279654 22.0276388
44 6 8 7 46.0571429 62.373014 23.5747833
45 6 12 7 31.7571429 49.977457 18.8897031
46 6 15 7 28.1142857 45.437995 17.1739480
47 6 19 7 26.2857143 38.414946 14.5194849
48 6 21 7 32.7428571 53.203003 20.1088450
49 7 0 7 193.2000000 37.300447 14.0982437
50 7 2 7 133.2428571 26.462606 10.0019250
51 7 5 7 3.8142857 7.445900 2.8142857
52 7 8 7 0.7142857 1.496026 0.5654449
53 7 12 7 0.0000000 0.000000 0.0000000
54 7 15 7 0.0000000 0.000000 0.0000000
55 7 19 7 0.0000000 0.000000 0.0000000
56 7 21 7 0.0000000 0.000000 0.0000000
My code is as follows:
ggplot(data7, aes(Days, mean, color=Treatment)) +
geom_line() +
geom_errorbar(aes(ymin=mean-se, ymax=mean+se), width=0.5, size= 0.25) +
geom_point(size=2.5) +
scale_colour_hue(limits = c("1", "2", "3", "4", "5", "6", "7")) +
scale_x_continuous(expand = c(0, 0), limits = c(0, NA), breaks = scales::pretty_breaks(n = 10)) +
scale_y_continuous(expand = c(0, 0), limits = c(0, NA), breaks = scales::pretty_breaks(n = 8)) +
theme_classic() +
theme(axis.text = element_text(color = "#000000"), plot.title = element_text(hjust = 0.5)) +
coord_cartesian(clip = 'off')
Here's one approach that omits the axis lines/ticks and then explicitly layers them below the rest of the plot layers. Because the new lines/ticks are drawn as literal objects, they will then ignore any other theming you may later apply. With control comes responsibility ...
This method has the side-effect of a "simple" axis tick, just the + symbol, which shows as a cross-line at each point. This is in contrast to the standard way (typically just pointing outwards). I'm guessing that something more robust could be devised, but I thought "simple" up-front could be adapted in other ways.
Take the literal code of your ggplot(...) + ... and store it as gg, with no changes. First we'll extract the tick marks. If you are confident enough (or not OCD-enough) to determine the tick locations yourself, feel free to hard-code them. This method (using ggplot_build and then extracting the ...$x$breaks) has the advantage of matching the tick and label locations, especially if they might change with different or updated data.
ticks <- with(ggplot_build(gg)$layout$panel_params[[1]],
na.omit(rbind(
data.frame(x = x$breaks, y = 0),
data.frame(x = 0, y = y$breaks)
)))
head(ticks,3); tail(ticks,3)
# x y
# 1 0 0
# 2 2 0
# 3 4 0
# x y
# 16 0 600
# 17 0 800
# 18 0 1000
From here, I'll take a cue from https://stackoverflow.com/a/20250185/3358272 and prepend some layers below all of the others. (This is where I identify the + symbol for axis ticks, using shape=3.)
gg$layers <- c(
geom_hline(aes(yintercept = 0)),
geom_vline(aes(xintercept = 0)),
geom_point(data = ticks, aes(x, y), shape = 3, inherit.aes = FALSE),
gg$layers)
Now we just plot the previously-generated gg, adding a cue to omit the theme axis lines/ticks.
gg + theme(axis.line = element_blank(), axis.ticks = element_blank())
Data, including converting Treatment to character (to avoid continuous/discrete warnings from scale_colour_hue):
data7 <- read.table(header=TRUE, text = "
Treatment Days N mean sd se
1 1 0 7 204.7000000 41.579963 15.7157488
2 1 2 7 255.0571429 41.116617 15.5406205
3 1 5 7 290.6000000 49.506498 18.7116974
4 1 8 7 330.8142857 49.044144 18.5369442
5 1 12 7 407.5142857 95.584194 36.1274294
6 1 15 7 540.8571429 164.299390 62.0993323
7 1 19 7 737.5285714 308.786359 116.7102736
8 1 21 7 978.4571429 502.506726 189.9296898
9 2 0 7 205.7428571 46.902482 17.7274721
10 2 2 7 227.5571429 47.099889 17.8020846
11 2 5 7 232.4857143 59.642922 22.5429054
12 2 8 7 247.9857143 66.478529 25.1265220
13 2 12 7 272.0428571 79.173162 29.9246423
14 2 15 7 289.1142857 82.847016 31.3132288
15 2 19 7 312.3857143 105.648591 39.9314140
16 2 21 7 334.7142857 121.569341 45.9488920
17 3 0 7 212.2285714 47.549263 17.9719320
18 3 2 7 235.4142857 52.689671 19.9148237
19 3 5 7 177.0714286 54.895225 20.7484447
20 3 8 7 205.2571429 72.611451 27.4445489
21 3 12 7 247.8142857 119.369558 45.1174522
22 3 15 7 280.4285714 140.825847 53.2271669
23 3 19 7 366.9142857 210.573799 79.5894149
24 3 21 7 451.0428571 289.240793 109.3227438
25 4 0 7 211.6857143 24.329161 9.1955587
26 4 2 7 227.8428571 28.762525 10.8712127
27 4 5 7 205.9428571 49.148919 18.5765451
28 4 8 7 153.1142857 25.189246 9.5206399
29 4 12 7 128.2571429 43.145910 16.3076210
30 4 15 7 104.1714286 45.161662 17.0695038
31 4 19 7 85.4714286 51.169708 19.3403318
32 4 21 7 66.9000000 52.724567 19.9280133
33 5 0 7 216.7857143 39.957829 15.1026398
34 5 2 7 212.2000000 27.037135 10.2190765
35 5 5 7 115.5000000 37.094070 14.0202405
36 5 8 7 46.1000000 34.925492 13.2005952
37 5 12 7 29.3142857 24.761222 9.3588621
38 5 15 6 10.0666667 13.441974 5.4876629
39 5 19 6 6.4000000 11.692733 4.7735382
40 5 21 6 5.3666667 12.662017 5.1692467
41 6 0 7 206.6857143 40.359155 15.2543269
42 6 2 7 197.0428571 40.608327 15.3485048
43 6 5 7 106.2142857 58.279654 22.0276388
44 6 8 7 46.0571429 62.373014 23.5747833
45 6 12 7 31.7571429 49.977457 18.8897031
46 6 15 7 28.1142857 45.437995 17.1739480
47 6 19 7 26.2857143 38.414946 14.5194849
48 6 21 7 32.7428571 53.203003 20.1088450
49 7 0 7 193.2000000 37.300447 14.0982437
50 7 2 7 133.2428571 26.462606 10.0019250
51 7 5 7 3.8142857 7.445900 2.8142857
52 7 8 7 0.7142857 1.496026 0.5654449
53 7 12 7 0.0000000 0.000000 0.0000000
54 7 15 7 0.0000000 0.000000 0.0000000
55 7 19 7 0.0000000 0.000000 0.0000000
56 7 21 7 0.0000000 0.000000 0.0000000")
data7$Treatment <- as.character(data7$Treatment)
A fairly straightforward way to do this is just to move the panel in front of the axes once the plot elements are created (i.e. as a gtable). The gtable contains a layout data frame which allows you to move plot elements forwards or backwards by adjusting their z component.
If you store your plot as p, then the code would be:
ggp <- ggplot_gtable(ggplot_build(p))
ggp$layout$z[which(ggp$layout$name == "panel")] <- max(ggp$layout$z) + 1
grid::grid.draw(ggp)
Plot code:
This is just the original plot, except that I have added a vline at 0 and an hline at 0 (in case bringing the panel forwards clips your axis lines).
p <- ggplot(data7, aes(Days, mean, color=Treatment)) +
geom_hline(aes(yintercept = 0)) +
geom_vline(aes(xintercept = 0)) +
geom_line() +
geom_errorbar(aes(ymin=mean-se, ymax=mean+se), width=0.5, size= 0.25) +
geom_point(size=2.5) +
scale_colour_hue(limits = c("1", "2", "3", "4", "5", "6", "7")) +
scale_x_continuous(expand = c(0, 0), limits = c(0, NA), breaks = scales::pretty_breaks(n = 10)) +
scale_y_continuous(expand = c(0, 0), limits = c(0, NA), breaks = scales::pretty_breaks(n = 8)) +
theme_classic() +
theme(axis.text = element_text(color = "#000000"), plot.title = element_text(hjust = 0.5)) +
coord_cartesian(clip = 'off')
I'm trying to plot a ggplot graph and instead of the size of point indicating the count, I need to plot the overlapping count number. Can you help me?
https://imgur.com/a/pm1SsWd
Thank you very much!
My data:
ID CIM DD
1 8 8
2 8 8
3 8 4
4 4 4
5 2 2
6 8 8
7 8 8
8 8 8
9 2 2
10 2 2
11 2 4
12 4 4
13 8 4
14 2 2
15 4 4
16 4 8
17 2 4
18 16 8
19 8 16
20 16 16
21 2 4
22 16 8
23 8 8
24 8 8
25 8 8
26 4 4
27 1 2
28 4 8
29 8 8
30 2 4
31 8 8
32 2 2
33 1 2
34 4 8
35 8 8
36 16 8
37 8 8
38 4 4
39 4 8
40 4 8
41 8 8
42 8 8
43 2 2
I used the code below to make an overlapping count graph as shown in an image link:
https://imgur.com/a/pm1SsWd
breaks = c(1,2,4,8,16)
labels = as.character(breaks)
ggplot(data = Data,aes(CIM,DD)) +
geom_count()+
scale_x_continuous(limits = c(1, 32), breaks = breaks, labels = labels,name = "CIM")+
scale_y_continuous(limits = c(1, 32), breaks = breaks, labels = labels,name = "DD")
Take a look at this example:
Add count as label to points in geom_count
You could do the following with your data:
p <- ggplot(data = Data,aes(CIM,DD)) +
geom_count(show.legend = FALSE)+
scale_x_continuous(limits = c(1, 32), breaks = breaks, labels = labels,name = "CIM") +
scale_y_continuous(limits = c(1, 32), breaks = breaks, labels = labels,name = "DD") +
scale_size_continuous(range = c(10, 10))
p + geom_text(data = ggplot_build(p)$data[[1]], aes(x, y, label = n), color = "#ffffff")
You can adjust the range in scale_size_continuous if you wish to vary the size of points.
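The n used as the label is the number of rows sharing each (CIM, DD) pair. If it is useful to have those counts as a plain data frame, they can also be computed in base R; a sketch using only the first ten rows of the data above (Data10 and counts are names chosen here for illustration):

```r
# First ten rows of the data from the question
Data10 <- data.frame(
  CIM = c(8, 8, 8, 4, 2, 8, 8, 8, 2, 2),
  DD  = c(8, 8, 4, 4, 2, 8, 8, 8, 2, 2)
)

# Count how many rows fall on each (CIM, DD) combination
counts <- aggregate(n ~ CIM + DD,
                    data = transform(Data10, n = 1),
                    FUN = sum)
counts
```

A data frame like counts could then be passed to geom_text(aes(CIM, DD, label = n)) instead of pulling the numbers out of ggplot_build().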
I have a data.frame named final that looks like:
labels gvs order color f3
1 Adygei -2.3321916 1 1 353.0184
2 Basque -0.8519079 2 1 368.1515
3 French -0.9298674 3 1 365.2545
4 Italian -2.8859587 4 1 354.4481
5 Orcadian -1.4996229 5 1 350.9650
6 Russian -1.5597359 6 1 358.9736
7 Sardinian -1.4494841 7 1 355.1171
8 Tuscan -2.4279528 8 1 362.4717
9 Bedouin -3.1717421 9 2 319.3706
10 Druze -0.5058627 10 2 346.2211
11 Mozabite -2.6491331 11 2 299.5014
12 Palestinian -0.7819299 12 2 330.4576
13 Balochi -1.4095947 13 3 327.1238
14 Brahui -1.2534511 14 3 331.0927
15 Burusho 1.7958170 15 3 335.0919
16 Hazara 2.2810477 16 3 325.2444
17 Kalash -0.9258497 17 3 337.7116
18 Makrani -0.9007551 18 3 321.5726
19 Pathan 2.5543214 19 3 326.1923
20 Sindhi 2.6614486 20 3 318.7025
21 Uygur -1.2207974 21 3 322.0286
22 Cambodian 2.3706977 22 4 310.8989
23 Dai -0.9441980 23 4 305.5687
24 Daur -1.0325107 24 4 309.0984
25 Han -0.7381369 25 4 309.1198
26 Hezhen -2.7590587 26 4 296.9128
27 Japanese -0.5644325 27 4 297.9313
28 Lahu -0.8449225 28 4 307.0776
29 Miao -0.7237586 29 4 303.6593
30 Mongola -0.9452944 30 4 302.1380
31 Naxi -0.1625003 31 4 311.8019
32 Oroqen -1.2035258 32 4 308.7219
33 She -2.7758460 33 4 302.1271
34 Tu -0.7703779 34 4 307.3750
35 Tujia -1.0265275 35 4 303.5923
36 Xibo -1.1163019 36 4 295.5764
37 Yakut -3.2102686 37 4 315.0111
38 Yi -0.9614190 38 4 296.8134
39 Colombian -1.9659984 39 5 311.3134
40 Karitiana -0.9195156 40 5 300.8539
41 Maya 2.1239768 41 5 333.8995
42 Pima -3.0895998 42 5 325.3484
43 Surui -0.9377928 43 5 313.8505
44 Melanesian -1.6961014 44 6 294.5214
45 Papuan -0.7037952 45 6 286.7389
46 BantuKenya -1.9311354 46 7 152.9971
47 BantuSouthAfrica -1.8515908 47 7 133.6722
48 BiakaPygmy -1.7657017 48 7 117.5555
49 Mandenka -0.5423822 49 7 152.8525
50 MbutiPygmy -1.6244801 50 7 114.1691
51 San -0.9049735 51 7 0.0000
52 Yoruba 2.0949378 52 7 154.4460
I'm using the following code to make a graph
jpeg("F3.SCZ.Jul_22.jpg", 700,700)
final$color <- as.factor(final$color)
levels(final$color) <- c("blue","yellow3","red","pink","purple","green","orange")
plot(final$gvs, final$f3, cex=2,pch = 21, bg = as.character(final$color), xaxt="n", xlab="Genetic Values", ylab="F3", main="SCZ")
dev.off()
that looks like:
I would like to split the y-axis at 200, to have the y-values range from 0 to 200 to take up only 10% of the graph, while 200 to 400 to take up 90% of the y-axis. Is that possible?
EDIT:
Here is the data that is running into issues:
labels gvs order color f3
1 Adygei -2.3321916 1 1 0.09862109
2 Basque -0.8519079 2 1 0.09942770
3 French -0.9298674 3 1 0.10357547
4 Italian -2.8859587 4 1 0.09960179
5 Orcadian -1.4996229 5 1 0.10244666
6 Russian -1.5597359 6 1 0.10097691
7 Sardinian -1.4494841 7 1 0.10189642
8 Tuscan -2.4279528 8 1 0.09794686
9 Bedouin -3.1717421 9 2 0.09272493
10 Druze -0.5058627 10 2 0.09682272
11 Mozabite -2.6491331 11 2 0.08563901
12 Palestinian -0.7819299 12 2 0.09331649
13 Balochi -1.4095947 13 3 0.09227273
14 Brahui -1.2534511 14 3 0.09328593
15 Burusho 1.7958170 15 3 0.09396032
16 Hazara 2.2810477 16 3 0.09342432
17 Kalash -0.9258497 17 3 0.09666599
18 Makrani -0.9007551 18 3 0.09222257
19 Pathan 2.5543214 19 3 0.09468376
20 Sindhi 2.6614486 20 3 0.09172395
21 Uygur -1.2207974 21 3 0.09140727
22 Cambodian 2.3706977 22 4 0.08655821
23 Dai -0.9441980 23 4 0.08739080
24 Daur -1.0325107 24 4 0.08656669
25 Han -0.7381369 25 4 0.08764395
26 Hezhen -2.7590587 26 4 0.08802065
27 Japanese -0.5644325 27 4 0.08810874
28 Lahu -0.8449225 28 4 0.08609791
29 Miao -0.7237586 29 4 0.08700414
30 Mongola -0.9452944 30 4 0.08921706
31 Naxi -0.1625003 31 4 0.08646436
32 Oroqen -1.2035258 32 4 0.08719536
33 She -2.7758460 33 4 0.08656100
34 Tu -0.7703779 34 4 0.08818588
35 Tujia -1.0265275 35 4 0.08737680
36 Xibo -1.1163019 36 4 0.08806230
37 Yakut -3.2102686 37 4 0.08965344
38 Yi -0.9614190 38 4 0.08593454
39 Colombian -1.9659984 39 5 0.09114697
40 Karitiana -0.9195156 40 5 0.09040477
41 Maya 2.1239768 41 5 0.09068139
42 Pima -3.0895998 42 5 0.09084750
43 Surui -0.9377928 43 5 0.08925535
44 Melanesian -1.6961014 44 6 0.08430903
45 Papuan -0.7037952 45 6 0.08272786
46 BantuKenya -1.9311354 46 7 0.04668356
47 BantuSouthAfrica -1.8515908 47 7 0.03914248
48 BiakaPygmy -1.7657017 48 7 0.03546243
49 Mandenka -0.5423822 49 7 0.04612336
50 MbutiPygmy -1.6244801 50 7 0.03098719
51 San -0.9049735 51 7 0.00000000
52 Yoruba 2.0949378 52 7 0.04561542
You can do:
my_color <- as.factor(final$color)
levels(my_color) <- c("blue","yellow3","red","pink","purple","green","orange")
par(mfrow = c(1,2))
# original plot
pos <- seq(min(final$f3), max(final$f3), by = 25) ## y-axis tick marks position.
plot(final$gvs, final$f3, cex=2, pch=21, bg = as.character(my_color),
xaxt="n", yaxt="n", xlab="Genetic Values", ylab="F3", main="SCZ")
axis(2, at = pos, labels = pos) ## add y-axis
# new plot
threshold <- 260 ## cut off threshold
## some rescaling
## if f3 < threshold, we take new_f3 <- 0.1 * f3
## if f3 > threshold, we take new_f3 <- f3 - 0.9 * threshold
new_f3 <- ifelse(final$f3 < threshold, 0.1 * final$f3, final$f3 - threshold * 0.9)
## we apply the same transform to `pos` to get `new_pos`
new_pos <- ifelse(pos < threshold, 0.1 * pos, pos - threshold * 0.9)
plot(final$gvs, new_f3, cex=2, pch=21, bg = as.character(my_color),
xaxt="n", yaxt="n", xlab="Genetic Values", ylab="F3", main="SCZ")
abline(h = threshold * 0.1, lty = 3) # threshold line
axis(2, at = new_pos, labels = pos)
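The rescaling used for both the points and the tick positions can be pulled out into one helper, which makes it easier to see that the two pieces meet at the threshold. A minimal sketch (squash is a hypothetical name; the formula is the one in the comments above):

```r
# Compress values below the threshold to 10% of their size and shift
# values above it down so the two pieces join at 0.1 * threshold
squash <- function(y, threshold = 260) {
  ifelse(y < threshold, 0.1 * y, y - 0.9 * threshold)
}

squash(c(0, 100, 260, 360))

# Both branches give the same value at the threshold, so there is no jump
all.equal(0.1 * 260, 260 - 0.9 * 260)
```

Applying the same function to the data and to the tick positions, while labelling the ticks with the original values, is what keeps the axis honest.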
I would use trans_new() from the scales package to transform the y-axis. This should get you close. I prefer the continuously differentiable transform (first), but you can also do a step change in scale (second). H/T to Gregor for pointing out that pmin and pmax handle vectors and are correct here.
setwd("C:/Users/rherron1/Desktop/")
final <- read.table("Scratch2.txt", header=TRUE)
final$id <- NULL
# default y-scale
require(ggplot2)
a <- ggplot(final, aes(gvs, f3, color=factor(color)))
a <- a + geom_point()
a
# transform y-axis: smooth (continuously differentiable) version
require(scales)
skew <- function(x) x^2
iskew <- function(x) x^(1/2)
skew_trans <- function() trans_new("skew", "skew", "iskew")
b <- a + coord_trans(y="skew")
b
# transform y-axis: step change in scale at 200 (scales is already loaded above)
sku <- function(x) pmin(x, 200) + 9*pmax(x-200, 0)
isku <- function(x) pmax((x-200)/9, 0) + pmin(x, 200)
sku_trans <- function() trans_new("sku", "sku", "isku")
c <- a + coord_trans(y="sku")
c
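The step-change pair can be checked without plotting: the inverse should undo the forward transform, and the 0 to 200 range should end up as 10% of the transformed 0 to 400 range. The sketch below also parameterizes the break point, since a break at 200 only suits the first data set; for the rescaled f3 values in the EDIT, the break would have to be chosen on that scale. make_sku, brk and k are hypothetical names introduced for illustration.

```r
# Build a forward/inverse pair: identity below brk, stretched by k above it
make_sku <- function(brk = 200, k = 9) {
  list(
    forward = function(x) pmin(x, brk) + k * pmax(x - brk, 0),
    inverse = function(x) pmin(x, brk) + pmax(x - brk, 0) / k
  )
}

tr <- make_sku()              # same constants as sku/isku above
y <- c(0, 100, 200, 300, 400)

tr$forward(y)

# Round trip: inverse(forward(y)) should give y back
all.equal(tr$inverse(tr$forward(y)), y)

# Share of the transformed axis taken by the 0-200 range
tr$forward(200) / tr$forward(400)
```

The two functions could then be handed to trans_new() in place of sku and isku.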