Jitter doesn't seem to separate my points in ggplot scatter plot - r
I'm trying to offset two points in a ggplot scatter plot made with geom_point. I tried the default jitter as well as my own settings, but in both cases I don't see a difference in the plot. The plot always looks like the attached image, with Brazil and Mexico quite crowded together. What should I do differently?
Here's the code I've tried:
    # Attempt 1: default jitter
    p = ggplot(morning_night, aes(x = ascent_median, y = duration_median)) +
      geom_point(position = "jitter", aes(color=type, shape=region, size=30)) +
      geom_text(aes(label=country), hjust = 0, vjust = 1.2)
    # Attempt 2: custom jitter
    p = ggplot(morning_night, aes(x = ascent_median, y = duration_median)) +
      geom_point(position = position_jitter(w=.1, h=.1), aes(color=type, shape=region, size=30)) +
      geom_text(aes(label=country), hjust = 0, vjust = 1.2)
    # Attempt 3: custom jitter with a larger height
    p = ggplot(morning_night, aes(x = ascent_median, y = duration_median)) +
      geom_point(position = position_jitter(w=.1, h=.3), aes(color=type, shape=region, size=30)) +
      geom_text(aes(label=country), hjust = 0, vjust = 1.2)
Here's the data frame used for this graph:
country ascent_median duration_median type region
Sweden 41.6 1793 morning Scandinavia
Denmark 33 1960 night Scandinavia
Mexico 75 1877 morning LatinAmerica
Brazil 74.4 1928 night LatinAmerica
Indonesia 41.8 1492 morning SoutheastAsia
Malaysia 48.7 2208 night SoutheastAsia
Switzerland 64.3 2461 morning CentralEurope
Austria 67.9 2113 night CentralEurope
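One possible explanation, offered as a sketch rather than a confirmed diagnosis: the width and height given to position_jitter() are in data units, so 0.1 is negligible next to an x range of roughly 33 to 75 and a y range of roughly 1492 to 2461. The labels are also not jittered in the attempts above, so they stay at the original coordinates. The example below assumes the data frame shown above; the jitter amounts (width = 2, height = 60) and the seed are arbitrary illustrative values, not recommended defaults.

```r
library(ggplot2)

# Data copied from the question
morning_night <- read.table(header = TRUE, text = "
country ascent_median duration_median type region
Sweden 41.6 1793 morning Scandinavia
Denmark 33 1960 night Scandinavia
Mexico 75 1877 morning LatinAmerica
Brazil 74.4 1928 night LatinAmerica
Indonesia 41.8 1492 morning SoutheastAsia
Malaysia 48.7 2208 night SoutheastAsia
Switzerland 64.3 2461 morning CentralEurope
Austria 67.9 2113 night CentralEurope
")

# Jitter amounts are in data units; these values are illustrative guesses
# sized to the axis ranges. Reusing one position object with a fixed seed
# gives the points and their labels the same random offsets.
jit <- position_jitter(width = 2, height = 60, seed = 42)

p <- ggplot(morning_night, aes(x = ascent_median, y = duration_median)) +
  geom_point(aes(color = type, shape = region), size = 4, position = jit) +
  geom_text(aes(label = country), hjust = 0, vjust = 1.2, position = jit)

print(p)
```

Here size is set outside aes() so that it is a fixed point size rather than a mapped variable. If the goal is only to keep labels from overlapping, a label-repelling geom such as ggrepel::geom_text_repel() may be a better fit than jittering the data.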
Related
Two columns on x-axis and different grids in R
I need some help with a graph in R. This is what my data frame looks like:

    Footprint  Local Number  Remote Number  Location
    10.4       45            4              L1
    12.5       452           78             L9
    15.6       86            52             L5
    85.3       12            12             L4
    12.5       35            36             L2
    85.9       78            78             L3
    78.5       44            44             L6
    4.6        10            11             L7
    13.9       157           2              L8

What I want to achieve is a graph with the 'Footprint' column on the y-axis, the 'Local Number' column (on the x-axis) in the positive half of the graph and the 'Remote Number' column (on the x-axis) in the negative half. The data should be presented as dots and the lab name should be the label. So basically, I want to show the remote and local number of employees for each location. I am struggling to present the two columns on the x-axis. I appreciate the help!
Maybe you want something like this, where you use geom_point for both columns, one negated and one positive, and add labels using geom_text:

    df <- read.table(text = 'Footprint Local_Number Remote_Number Location
    10.4 45 4 L1
    12.5 452 78 L9
    15.6 86 52 L5
    85.3 12 12 L4
    12.5 35 36 L2
    85.9 78 78 L3
    78.5 44 44 L6
    4.6 10 11 L7
    13.9 157 2 L8
    ', header = TRUE)

    library(ggplot2)
    ggplot() +
      geom_point(df, mapping = aes(x = Footprint, y = Local_Number, color = '1')) +
      geom_point(df, mapping = aes(x = -Remote_Number, y = Local_Number, color = '2')) +
      geom_text(df, mapping = aes(x = Footprint, y = Local_Number, label = Location), hjust = 0, vjust = 0) +
      geom_text(df, mapping = aes(x = -Remote_Number, y = Local_Number, label = Location), hjust = 0, vjust = 0) +
      scale_color_manual('Legend', labels = c('Footprint', 'Remote number'), values = c('blue', 'red')) +
      labs(y = 'Local Number')

Created on 2022-10-14 with reprex v2.0.2

If you want to show it on only a positive axis, you could remove the negative sign like this:

    library(ggplot2)
    ggplot() +
      geom_point(df, mapping = aes(x = Footprint, y = Local_Number, color = '1')) +
      geom_point(df, mapping = aes(x = Remote_Number, y = Local_Number, color = '2')) +
      geom_text(df, mapping = aes(x = Footprint, y = Local_Number, label = Location), hjust = 0, vjust = 0) +
      geom_text(df, mapping = aes(x = Remote_Number, y = Local_Number, label = Location), hjust = 0, vjust = 0) +
      scale_color_manual('Legend', labels = c('Footprint', 'Remote number'), values = c('blue', 'red')) +
      labs(y = 'Local Number')

Created on 2022-10-14 with reprex v2.0.2
Here are two more suggestions for visualisation. This seems to be paired data: remote vs local number. That can be represented either as a scatter plot or as change. Footprint can then be encoded as color. Thanks +1 to Quieten for the data.

    library(tidyverse)
    df <- read.table(text = 'Footprint Local_Number Remote_Number Location
    10.4 45 4 L1
    12.5 452 78 L9
    15.6 86 52 L5
    85.3 12 12 L4
    12.5 35 36 L2
    85.9 78 78 L3
    78.5 44 44 L6
    4.6 10 11 L7
    13.9 157 2 L8
    ', header = TRUE)

    df %>%
      ggplot(aes(Local_Number, Remote_Number)) +
      ## use Number as x and y and color code by footprint value
      geom_point(aes(color = Footprint), size = 3) +
      ## label the points, best with repel
      ggrepel::geom_text_repel(aes(label = Location)) +
      ## optional add a line of equality to help intuitive recognition of change
      ## + keeping same limits helps intuitive comparison
      geom_abline(intercept = 0, slope = 1, lty = 2, size = .3) +
      coord_equal(xlim = range(c(df$Local_Number, df$Remote_Number)),
                  ylim = range(c(df$Local_Number, df$Remote_Number))) +
      ## optional change color scale
      scale_color_viridis_c(option = "magma")

    ## or, not to waste half of your graph (there is no positive value)
    ## you can show the difference instead
    df %>%
      mutate(change = Local_Number - Remote_Number) %>%
      ggplot() +
      ## now use Location as x variable, therefore no labels needed any more
      geom_point(aes(Location, change, color = Footprint), size = 3) +
      ## optional change color scale
      scale_color_viridis_c(option = "magma")

Created on 2022-10-14 by the reprex package (v2.0.1)
geom_text() to label two separate points from different plots in ggplot
I am trying to create individual plots faceted by 'iid' using 'facet_multiple', with the following dataset (first 3 rows of data):

      iid  Age iop   al baseIOP baseAGE baseAL agesurg
    1   1 1189  20 27.9      21     336   24.9     336
    2   2  877  11 21.5      16      98   20.3      98
    3   2 1198  15 21.7      16      98   20.3      98

and wrote the following code:

    # Install gg_plus from GitHub
    remotes::install_github("guiastrennec/ggplus")
    # Load libraries
    library(ggplot2)
    library(ggplus)
    # Generate ggplot object
    p <- ggplot(data_longF1, aes(x = Age, y = al)) +
      geom_point(alpha = 0.5) +
      geom_point(aes(x = baseAGE, y = baseAL)) +
      labs(x = 'Age (days)', y = 'Axial length (mm)', title = 'Individual plots of Axial length v time')
    p1 <- p + geom_vline(aes(xintercept = agesurg), linetype = "dotted", colour = "red", size = 1.0)
    p2 <- p1 + geom_text(aes(label = iop, hjust = -1, vjust = -1))
    p3 <- p2 + geom_text(aes(label = baseIOP, hjust = -1, vjust = -1))
    # Plot on multiple pages (output plot to R/Rstudio)
    facet_multiple(plot = p3, facets = 'iid', ncol = 1, nrow = 1, scales = 'free')

The main issue I am having is labeling the points. The points corresponding to (x = Age, y = al) get labelled fine, but the labels for the second group of points (x = baseAGE, y = baseAL) get put in the wrong place.

[Image: individual plot sample]

I have had a look at similar issues on Stack Overflow, e.g. "ggplot combining two plots from different data.frames", but have not been able to correct my code. Thanks for your help.
You need to define the x and y coordinates for the labels, or they will be inherited from the aesthetics set in the ggplot() call (x = Age, y = al). Thus the geom_text() definitions should look something like:

    data_longF1 <- read.table(header = TRUE, text = "iid Age iop al baseIOP baseAGE baseAL agesurg
    1 1 1189 20 27.9 21 336 24.9 336
    2 2 877 11 21.5 16 98 20.3 98
    3 2 1198 15 21.7 16 98 20.3 98")

    # Generate ggplot object
    p <- ggplot(data_longF1, aes(x = Age, y = al)) +
      geom_point(alpha = 0.5) +
      geom_point(aes(x = baseAGE, y = baseAL)) +
      labs(x = 'Age (days)', y = 'Axial length (mm)', title = 'Individual plots of Axial length v time')
    p1 <- p + geom_vline(aes(xintercept = agesurg), linetype = "dotted", colour = "red", size = 1.0)
    # Specify the x and y coordinates, or they are inherited from ggplot()
    p2 <- p1 + geom_text(aes(x = Age, y = al, label = iop, hjust = -1, vjust = -1))
    p3 <- p2 + geom_text(aes(x = baseAGE, y = baseAL, label = baseIOP, hjust = -1, vjust = -1))
    print(p3)
How to put gestational age in weeks.days on x-axis in ggplot
I am trying to plot the weight of a fetus over time. The y-axis is fetal weight in grams. The x-axis needs to be formatted as follows:

    27 weeks 3 days == 27.3
    29 weeks 6 days == 29.6
    etc.

My data (df) looks something like this:

    weight  age
    2013    22.4
    2302    25.6
    2804    27.2
    3011    29.1

I have tried something like this, but I am not sure how to adjust the scale:

    ggplot(df, aes(x = age, y = weight)) +
      geom_point() +
      scale_x_continuous()

If I compute the actual numeric value for the age (i.e. 22.4 == 22 weeks + 4/7 weeks == 22.57), is it possible to label the corresponding age value with the label I want? For example:

    weight  age.label  age.value
    2013    22.4       22.57
    2302    25.6       25.86
    2804    27.2       27.29
    3011    29.1       29.14

When I call this:

    df <- df %>% mutate(age.label = as.character(age.label))
    ggplot(df, aes(x = age.value, y = weight)) +
      geom_point() +
      scale_x_continuous(label = "age.label")

I get the following:

    Error in f(..., self = self) : Breaks and labels are different lengths

Any help much appreciated.
I borrowed from this answer and this one to create variable tick labels that use formatting to separate the days and the weeks. I have supplied three different methods.

1. Simply places ticks at every day point but does not number them.
2. Numbers the days and the weeks correctly and distinguishes between them by making weeks bold and days light grey.
3. Same as 2 but uses size. This method doesn't work very well, as it creates a large gap between the labels and the plot. It has been included for completeness... and in the hope somebody says how to fix it.

The plot below is the second method. I think the vertical tick lines could also be coloured so that some of them disappear, if you want.

    library(ggplot2)
    library(tidyverse)

    df <- read.table(header = TRUE, text = "weight age.label age.value
    2013 22.4 22.57
    2302 25.6 25.86
    2804 27.2 27.29
    3011 29.1 29.14")

    # have ticks for every day using 1/7 distance tick marks
    ggplot(df, aes(x = age.value, y = weight)) +
      geom_point() +
      scale_x_continuous(limits = c(22, 30),
                         minor_breaks = seq(from = 1, to = 33, by = 1/7),
                         breaks = 1:30)

    # create a df of tick mark labels containing day number and week number
    breaks_labels_df <- data.frame(breaks = seq(from = 1, to = 33, by = 1/7)) %>%
      mutate(minors = rep(0:6, length.out = nrow(.)),
             break_label = ifelse(minors == 0, breaks, minors))

    # plot both day number and week number, differentiating between them by the label formatting
    # remove the minor tick lines to reduce the busyness of the plot
    ggplot(df, aes(x = age.value, y = weight)) +
      geom_point() +
      scale_x_continuous(limits = c(22, 30),
                         breaks = seq(from = 1, to = 33, by = 1/7),
                         labels = breaks_labels_df$break_label) +
      theme(axis.text.x = element_text(color = c("grey60", "grey60", "black", rep("grey60", 4)),
                                       size = 8, angle = 0, hjust = .5, vjust = .5,
                                       face = c("plain", "plain", "bold", rep("plain", 4))),
            panel.grid.minor.x = element_blank()) +
      labs(title = "Baby weight in relation to age",
           x = "Age in weeks and days",
           y = "weight in grams")

    # Changing the font size places a large gap between the tick labels and the axis
    ggplot(df, aes(x = age.value, y = weight)) +
      geom_point() +
      scale_x_continuous(limits = c(22, 30),
                         breaks = seq(from = 1, to = 33, by = 1/7),
                         labels = breaks_labels_df$break_label) +
      theme(axis.text.x = element_text(vjust = 0, size = c(8, 8, 12, rep(8, 4)),
                                       margin = margin(t = 0), lineheight = 0))
To add labels to the plot, use the geom_text function from the ggplot2 package. Use hjust and vjust to fine-tune the placement.

    df <- read.table(header = TRUE, text = "weight age
    2013 22.4
    2302 25.6
    2804 27.2
    3011 29.1")

    library(dplyr)
    library(ggplot2)

    # calculate the proper decimal value for axis
    df <- df %>%
      mutate(age.value = floor(age) + (age - floor(age)) * 10 / 7) %>%
      round(2)

    ggplot(df, aes(x = age.value, y = weight)) +
      geom_point() +
      scale_x_continuous(limits = c(20, 30)) +
      geom_text(aes(label = age), hjust = -.2, vjust = .1)
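The error in the question most likely arises because a single string was passed as the labels; scale_x_continuous() expects breaks and labels to be vectors of equal length. The sketch below is one way to combine the conversion with matching tick labels. The helper name weeks_days_to_weeks is hypothetical (it does not come from any package), and placing breaks only at the observed ages is an assumption about what the axis should show.

```r
library(ggplot2)

df <- data.frame(
  weight = c(2013, 2302, 2804, 3011),
  age    = c(22.4, 25.6, 27.2, 29.1)  # weeks.days notation from the question
)

# Hypothetical helper: convert weeks.days (22.4 = 22 weeks 4 days)
# into decimal weeks (22 + 4/7).
weeks_days_to_weeks <- function(age) {
  weeks <- floor(age)
  days  <- round((age - weeks) * 10)  # round() guards against floating-point error
  weeks + days / 7
}

df$age.value <- weeks_days_to_weeks(df$age)
df$age.label <- format(df$age, nsmall = 1)

# breaks and labels are vectors of the same length, one entry per tick
p <- ggplot(df, aes(x = age.value, y = weight)) +
  geom_point() +
  scale_x_continuous(breaks = df$age.value, labels = df$age.label) +
  labs(x = "Gestational age (weeks.days)", y = "Weight (g)")

print(p)
```

Points are positioned by the true decimal age, while the axis shows the familiar weeks.days text at those positions.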
ggplot2: issues with dual y-axes and Loess smoothing
I'm a novice with R and ggplot. I recognize the power of R and the elegance of ggplot and am trying to learn. Normally I can find a solution online, but I have had no luck this time. I am trying to generate a chart in ggplot comparing Economic Freedom scores with Life Expectancy and Infant mortality using World Bank data (the csv data is included at the bottom of the post). I have had some success using this code (based on the example at https://rpubs.com/MarkusLoew/226759):

    p <- ggplot(mydata, aes(x = Score))
    p <- p + geom_point(aes(y = Longevity, colour = "Life Expectancy"))
    p <- p + geom_point(aes(y = Infant/1, colour = "Infant mortality (per capita)"))
    p <- p + scale_y_continuous(sec.axis = sec_axis(~.*1, name = "Infant mortality (per capita)"))
    p <- p + scale_colour_manual(values = c("blue", "red"))
    p <- p + labs(y = "Life Expectancy (years)", x = "Score", colour = " ")
    p

This has produced the following:

[Image: my messed up chart]

I can't manage to properly scale the primary y-axis. Scaling the graphs as in the example (link above) doesn't work: I just expand out or squash the Longevity data. I tried putting the Longevity data on the secondary y-axis but it still didn't work. The other issue is that I would like to add LOESS smooth trendlines to each set of data. I have tried following various examples but nothing works. If anyone has a solution it will be much appreciated!
Thanks

Data:

Country Name,Score,GDP,Infant,Longevity,,,,,,,,,
Afghanistan,48.9,585.850064,53.2,63.673,,,,,,,,,
Albania,64.4,4537.86249,8.1,78.345,,,,,,,,,
Algeria,46.5,4.12E+03,21,76.078,,,,,,,,,
Angola,48.5,4.17E+03,55.8,61.547,,,,,,,,,
Argentina,50.4,1.44E+04,9.7,76.577,,,,,,,,,
Armenia,70.3,3936.79832,11.9,74.618,,,,,,,,,
Australia,81,5.38E+04,3.1,82.5,,,,,,,,,
Austria,72.3,4.73E+04,3,80.8902439,,,,,,,,,
Azerbaijan,63.6,4131.61831,21.9,72.026,,,,,,,,,
Bahrain,68.5,23655.0356,6.4,76.9,,,,,,,,,
Bangladesh,55,1.52E+03,28.3,72.489,,,,,,,,,
Barbados,54.5,16788.6839,11.9,75.906,,,,,,,,,
Belarus,58.6,5726.02967,2.9,73.82682927,,,,,,,,,
Belgium,67.8,4.33E+04,3.1,80.99268293,,,,,,,,,
Belize,58.6,4905.50628,12.8,70.384,,,,,,,,,
Benin,59.2,829.797231,65.1,60.907,,,,,,,,,
Bhutan,58.4,3110.23011,26.5,70.197,,,,,,,,,
Bolivia,47.7,3393.95582,29,69.125,,,,,,,,,
Bosnia and Herzegovina,60.2,5180.6363,5.1,76.911,,,,,,,,,
Botswana,70.1,7595.59585,32.3,66.797,,,,,,,,,
Brazil,52.9,9.82E+03,14.6,75.509,,,,,,,,,
Brunei Darussalam,69.8,28290.5852,9,77.203,,,,,,,,,
Bulgaria,67.9,8031.59844,6.7,74.61463415,,,,,,,,,
Burkina Faso,59.6,670.705913,52.6,60.361,,,,,,,,,
Burundi,53.2,320.08687,44.1,57.481,,,,,,,,,
Cabo Verde,56.9,3209.69112,15.9,72.798,,,,,,,,,
Cambodia,59.5,1384.42319,26.3,68.981,,,,,,,,,
Cameroon,51.8,1446.70289,56.6,58.073,,,,,,,,,
Canada,78.5,4.50E+04,4.6,82.3005122,,,,,,,,,
Central African Republic,51.8,418.411287,89.2,52.171,,,,,,,,,
Chad,49,669.886426,75,52.903,,,,,,,,,
Chile,76.5,1.53E+04,6.6,79.522,,,,,,,,,
China,57.4,8.83E+03,8.6,76.252,,,,,,,,,
Colombia,69.7,6.30E+03,13.1,74.381,,,,,,,,,
Comoros,55.8,797.286368,53.6,63.701,,,,,,,,,
Costa Rica,65,11630.6684,8,79.831,,,,,,,,,
Cote d'Ivoire,63,1662.44247,66,53.582,,,,,,,,,
Croatia,59.4,13294.5149,4,78.02195122,,,,,,,,,
Cyprus,67.9,25233.571,2.2,80.508,,,,,,,,,
Czech Republic,73.3,2.04E+04,2.6,78.33170732,,,,,,,,,
Denmark,75.1,5.63E+04,3.7,80.70487805,,,,,,,,,
Djibouti,46.7,1927.58971,53,62.465,,,,,,,,,
Dominica,63.7,7609.61435,30.4,,,,,,,,,,
Dominican Republic,62.9,7052.25884,25.6,73.861,,,,,,,,,
Ecuador,49.3,6.20E+03,12.7,76.327,,,,,,,,,
"Egypt, Arab Rep.",52.6,2.41E+03,19.4,71.484,,,,,,,,,
El Salvador,64.1,3889.30877,12.9,73.512,,,,,,,,,
Equatorial Guinea,45,9850.01358,67.4,57.681,,,,,,,,,
Estonia,79.1,19704.655,2.3,77.73658537,,,,,,,,,
Ethiopia,52.7,767.563478,42.5,65.475,,,,,,,,,
Fiji,63.4,5589.38883,21.1,70.269,,,,,,,,,
Finland,74,4.57E+04,1.9,81.7804878,,,,,,,,,
France,63.3,3.85E+04,3.5,82.27317073,,,,,,,,,
Gabon,58.6,7220.68724,36.1,66.105,,,,,,,,,
Georgia,76,4078.25488,10.2,73.261,,,,,,,,,
Germany,73.8,4.45E+04,3.2,80.64146341,,,,,,,,,
Ghana,56.2,1641.48662,37.2,62.742,,,,,,,,,
Greece,55,1.86E+04,4.2,81.03658537,,,,,,,,,
Guatemala,63,4470.98957,23.9,73.409,,,,,,,,,
Guinea,47.6,825.34493,58.1,60.015,,,,,,,,,
Guinea-Bissau,56.1,723.658622,57.4,57.403,,,,,,,,,
Guyana,58.5,4725.31906,26.7,66.65,,,,,,,,,
Haiti,49.6,765.683925,55,63.33,,,,,,,,,
Honduras,58.8,2480.12593,16.2,73.575,,,,,,,,,
"Hong Kong SAR, China",88.6,4.62E+04,,84.22682927,,,,,,,,,
Hungary,65.8,1.42E+04,4.1,75.56829268,,,,,,,,,
Iceland,74.4,70056.8734,1.7,82.46829268,,,,,,,,,
This should give you a good start. You can play around with scale_ratio and dif if you want to.

    library(tidyverse)

    # `text` is assumed to hold the CSV data from the question as a single string
    mydata <- read_csv(text, col_types = paste0(c("c", rep("d", 4), rep("_", 9)), collapse = ""))
    mydata
    #> # A tibble: 67 x 5
    #>    `Country Name` Score    GDP Infant Longevity
    #>    <chr>          <dbl>  <dbl>  <dbl>     <dbl>
    #>  1 Afghanistan     48.9   586.   53.2      63.7
    #>  2 Albania         64.4  4538.    8.1      78.3
    #>  3 Algeria         46.5  4120    21        76.1
    #>  4 Angola          48.5  4170    55.8      61.5
    #>  5 Argentina       50.4 14400     9.7      76.6
    #>  6 Armenia         70.3  3937.   11.9      74.6
    #>  7 Australia       81   53800     3.1      82.5
    #>  8 Austria         72.3 47300     3        80.9
    #>  9 Azerbaijan      63.6  4132.   21.9      72.0
    #> 10 Bahrain         68.5 23655.    6.4      76.9
    #> # ... with 57 more rows

Calculate the ratios needed to scale the two y-axes:

    scale_ratio <- (max(mydata$Infant, na.rm = TRUE) - min(mydata$Infant, na.rm = TRUE)) /
      (max(mydata$Longevity, na.rm = TRUE) - min(mydata$Longevity, na.rm = TRUE))
    dif <- min(mydata$Longevity, na.rm = TRUE) - min(mydata$Infant, na.rm = TRUE)

    myColor <- c("#d95f02", "#1b9e77")

    p <- ggplot(mydata, aes(x = Score, y = Longevity)) +
      geom_point(aes(colour = "Life Expectancy"), shape = "triangle", alpha = 0.7, size = 2) +
      geom_point(aes(y = Infant/scale_ratio + dif, colour = "Infant mortality (per capita)"),
                 alpha = 0.7, size = 2) +
      scale_y_continuous(sec.axis = sec_axis(~ (. - dif) * scale_ratio,
                                             name = "Infant mortality (per capita)")) +
      scale_colour_manual(values = myColor) +
      theme_bw(base_size = 14) +
      labs(y = "Life Expectancy (years)", x = "Score", colour = " ") +
      guides(colour = guide_legend(title = "",
                                   override.aes = list(shape = c("circle", "triangle")))) +
      theme(legend.position = 'bottom') +
      NULL
    p

Add fitted lines and their corresponding equations/R2:

    ### https://docs.r4photobiology.info/ggpmisc/articles/user-guide.html
    library(ggpmisc)

    formula <- y ~ poly(x, 2, raw = TRUE)

    p +
      stat_smooth(aes(y = Longevity), method = "lm", formula = formula,
                  se = FALSE, size = 1, color = myColor[2]) +
      stat_smooth(aes(y = Infant/scale_ratio + dif), method = "lm", formula = formula,
                  se = FALSE, size = 1, color = myColor[1]) +
      stat_poly_eq(aes(y = Longevity,
                       label = paste(..eq.label.., ..adj.rr.label.., sep = "~~italic(\"with\")~~")),
                   geom = "text", alpha = 0.7, formula = formula, parse = TRUE,
                   color = myColor[2], label.x.npc = 0.5, label.y.npc = 0.95) +
      stat_poly_eq(aes(y = Infant/scale_ratio + dif,
                       label = paste(..eq.label.., ..adj.rr.label.., sep = "~~italic(\"with\")~~")),
                   geom = "text", alpha = 0.7, color = myColor[1], formula = formula, parse = TRUE,
                   label.x.npc = 0.75, label.y.npc = 0.15) +
      NULL

Created on 2018-10-07 by the reprex package (v0.2.1.9000)
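To make the scaling in the answer above easier to follow, here is a small base R check of the two transformations: the one applied to Infant before plotting, and its inverse used inside sec_axis(). It uses only five rows from the question's data for illustration, and the function names to_primary and to_secondary are hypothetical labels introduced here, not part of ggplot2.

```r
# Five rows taken from the question's data, for illustration only
infant    <- c(53.2, 8.1, 21.0, 55.8, 9.7)
longevity <- c(63.673, 78.345, 76.078, 61.547, 76.577)

# Same definitions as in the answer
scale_ratio <- diff(range(infant)) / diff(range(longevity))
dif         <- min(longevity) - min(infant)

# Forward map: applied to Infant so it can be drawn on the Longevity axis
to_primary <- function(v) v / scale_ratio + dif

# Inverse map: what sec_axis() uses to label the secondary axis
to_secondary <- function(y) (y - dif) * scale_ratio

plotted   <- to_primary(infant)
recovered <- to_secondary(plotted)

print(data.frame(infant, plotted, recovered))
```

The round trip recovers the original values, which is the property that keeps the secondary axis labels honest. With these definitions the plotted values span the same range width as Longevity but are not guaranteed to start at the same minimum, which may be why the answer suggests adjusting scale_ratio and dif. Since the question asked for LOESS, it may also be worth trying method = "loess" (with the formula argument left at y ~ x) in place of the polynomial lm fit.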
Creating GW contours using ggplot
I am a novice at R coding and am trying to plot groundwater (GW) contours in ggplot using X (Easting) and Y (Northing) coordinates and GW level (rswl) data. An example of the data that I am trying to plot is:

        X Obs_No      Season  Easting Northing  rswl
    1  56 ADE146 Winter 2017 275638.7  6131431  5.72
    2 113 YAT099 Winter 2017 271723.0  6133405  3.16
    4 227 YAT066 Winter 2017 276503.0  6135636  2.31
    5 292 YAT053 Winter 2017 277780.8  6139285 -2.30
    6 400 YAT129 Winter 2017 282065.1  6146759  5.60
    7 509 PTA040 Winter 2017 270868.0  6150199  1.68

An example of the code I have tried is:

    ggplot(data) +
      aes(x = Easting, y = Northing, z = rswl, fill = rswl) +
      geom_tile() +
      geom_contour(colour = "white", alpha = 0.5) +
      scale_fill_distiller(palette = "Spectral", na.value = "white") +
      theme_bw()

but it comes up with "Not possible to generate contour data". Something else I tried with one of my datasets is:

    ggplot(data, aes(x = Easting, y = Northing, z = rswl)) +
      geom_density_2d(colour = "black") +
      geom_point(aes(color = factor(Obs_No))) +
      theme(legend.title = element_blank()) +
      ggtitle("Tomw.T2 Winter 2017.csv")

This seems to produce contours based on the distribution of the points and has nothing to do with the GW level. Any tips would be greatly appreciated. Thanks
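A likely cause of the "Not possible to generate contour data" message is that geom_contour() expects z values on a regular grid, whereas the observation wells are scattered. One way around this, sketched below, is to interpolate rswl onto a grid first. The example uses a simple inverse-distance-weighted (IDW) interpolation written in base R; the idw helper, the power of 2, and the 50 x 50 grid are all illustrative assumptions, and a geostatistical method such as kriging may suit real groundwater data better.

```r
library(ggplot2)

# Six observations copied from the question
gw <- data.frame(
  Easting  = c(275638.7, 271723.0, 276503.0, 277780.8, 282065.1, 270868.0),
  Northing = c(6131431, 6133405, 6135636, 6139285, 6146759, 6150199),
  rswl     = c(5.72, 3.16, 2.31, -2.30, 5.60, 1.68)
)

# Hypothetical helper: inverse-distance-weighted estimate at each target point
idw <- function(x, y, z, x_out, y_out, power = 2) {
  mapply(function(xo, yo) {
    d <- sqrt((x - xo)^2 + (y - yo)^2)
    if (any(d == 0)) return(z[which.min(d)])  # target coincides with a well
    w <- 1 / d^power
    sum(w * z) / sum(w)
  }, x_out, y_out)
}

# Regular grid covering the observation extent (50 x 50 is an arbitrary choice)
grid <- expand.grid(
  Easting  = seq(min(gw$Easting),  max(gw$Easting),  length.out = 50),
  Northing = seq(min(gw$Northing), max(gw$Northing), length.out = 50)
)
grid$rswl <- idw(gw$Easting, gw$Northing, gw$rswl, grid$Easting, grid$Northing)

# Contours are now drawn from the gridded surface; wells are overlaid as points
p <- ggplot(grid, aes(x = Easting, y = Northing, z = rswl)) +
  geom_tile(aes(fill = rswl)) +
  geom_contour(colour = "white", alpha = 0.5) +
  geom_point(data = gw, aes(x = Easting, y = Northing), inherit.aes = FALSE) +
  scale_fill_distiller(palette = "Spectral") +
  theme_bw()

print(p)
```

With only a handful of wells the interpolated surface is very rough, so the contours should be read as indicative only.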