Visualising Categorical Data across a Time Frame - r
I'm still fairly new to R and have stepped away for a while, so please bear with me.
I have a set of data which describes the degree of mobility (categorical data) after an operation across 3 days. I have been looking for a way to demonstrate the flow across those 3 days.
I've tried geom_jitter with Day 1 and Day 2 on the x and y axes and Day 3 mapped to colour, but this doesn't really convey what I want to show. I've read about Sankey diagrams and parallel coordinates plots, but I don't understand them well enough to adapt the examples posted by others to my data.
This is what I've tried:
test %>% filter(!is.na(Mob_D1.factor) & !is.na(Mob_D2.factor) & !is.na(Mob_D3.factor)) %>%
ggplot(aes(x = Mob_D1.factor, y = Mob_D2.factor, colour = Mob_D3.factor)) +
geom_jitter(size = 5, alpha = 0.25, height = 0.25, width = 0.2) +
scale_colour_brewer(palette = "Dark2", name = "Mobilisation on Day 3") +
xlab("Mobilisation on Day 1") +
ylab("Mobilisation on Day 2") + theme_minimal()
As I said, not quite what I want.
This is a sample of the data:
structure(list(Mob_D1.factor = structure(c(2L, 2L, 2L, 2L, 4L,
1L, 2L, 2L, 1L, 4L, 2L, 4L, 2L, 1L, 2L, 4L, 4L, 2L, 4L, 4L, 2L,
4L, 2L, 2L, 4L, 2L, 1L, 4L, 4L, 3L, 4L, 2L, 3L, 2L, 2L, 2L, 2L,
2L, 4L, 4L, 2L, 4L, 4L, 2L, 2L, 4L, 2L, 4L, 4L, 4L), .Label = c("None",
"Bed", "Stand", "Assisted Walk"), class = "factor"), Mob_D2.factor = structure(c(2L,
3L, 2L, 4L, 4L, 1L, 3L, 4L, 4L, 4L, 3L, 4L, 2L, 2L, 2L, 4L, 4L,
4L, 4L, 4L, 1L, 4L, 2L, 2L, 4L, 2L, 1L, 4L, 4L, 4L, 4L, 2L, 3L,
2L, 2L, 2L, 4L, 4L, 2L, 4L, 3L, 4L, 4L, 2L, 2L, 4L, 4L, 4L, 4L,
4L), .Label = c("None", "Bed", "Stand", "Assisted Walk"), class = "factor"),
Mob_D3.factor = structure(c(2L, 3L, 2L, 4L, 4L, 1L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 2L, 4L, 4L, 4L, 4L, 4L, 4L, 2L, 4L, 2L,
2L, 4L, 4L, 1L, 4L, 4L, 4L, 4L, 2L, 4L, 4L, 2L, 2L, 4L, 4L,
3L, 4L, 4L, 4L, 4L, 2L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("None",
"Bed", "Stand", "Assisted Walk"), class = "factor")), row.names = c(NA,
-50L), class = c("tbl_df", "tbl", "data.frame"))
Thanks in advance to anyone who takes the time to reply. Any extended explanation would be appreciated as I am still learning.
Larry
I am not entirely sure what the expected result should be, but could a barplot be helpful?
Edit
I now think I understand what you need and I found the package ggalluvial that can help you with this.
Hope this helps.
library(tidyverse)
library(ggalluvial)
# Some data wrangling first. `df` is the sample data from the question.
# Add row_number() to give a unique ID for each patient
d <- df %>% mutate(Patient = row_number()) %>%
# transform it to longer format
pivot_longer(cols = -Patient, values_to = "Stage", names_to = "Day")
# Make the plot
ggplot(d,
aes(x = Day, stratum = Stage, alluvium = Patient,
fill = Stage, label = Stage)) +
scale_fill_brewer(type = "qual", palette = "Set2") +
geom_flow(stat = "alluvium", lode.guidance = "frontback",
color = "darkgray") +
geom_stratum()
Created on 2020-02-24 by the reprex package (v0.3.0)
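If it helps to see the numbers behind the ribbons: each flow in an alluvial plot is just a count of patients moving from one category on one day to a category on the next. Here is a minimal base R sketch of that idea. The data frame `toy` and its column names are made up for illustration and are not the data from the question.

```r
# Hypothetical toy data: mobility level of six patients on three days.
levels_mob <- c("None", "Bed", "Stand", "Assisted Walk")
toy <- data.frame(
  Mob_D1 = factor(c("Bed", "Bed", "None", "Stand", "Bed", "Assisted Walk"),
                  levels = levels_mob),
  Mob_D2 = factor(c("Bed", "Stand", "Bed", "Assisted Walk", "Assisted Walk",
                    "Assisted Walk"), levels = levels_mob),
  Mob_D3 = factor(c("Stand", "Assisted Walk", "Bed", "Assisted Walk",
                    "Assisted Walk", "Assisted Walk"), levels = levels_mob)
)

# Cross-tabulate consecutive days:
# rows = state on the earlier day, columns = state on the later day.
d1_to_d2 <- table(Day1 = toy$Mob_D1, Day2 = toy$Mob_D2)
d2_to_d3 <- table(Day2 = toy$Mob_D2, Day3 = toy$Mob_D3)

print(d1_to_d2)
print(d2_to_d3)
```

Each non-zero cell corresponds to one ribbon between two neighbouring axes, and the cell value is the ribbon's width. Printing these tables next to the plot can be a quick sanity check that the picture shows what you think it shows.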
Related
ggplot: how to add an outline around each fill in a stacked barchart but only partly
I have the following stacked barchart produced with geom_bar. Question: how to add an outline around each fill corresponding to the matched color in cols. The tricky part is, that the outline should not be in between each fill but around the "borders" and the top, exclusively (expected output below) I have Written with library(ggplot) cols = c("#E1B930", "#2C77BF","#E38072","#6DBCC3", "grey40","black") ggplot(i, aes(fill=uicc, x=n)) + theme_bw() + geom_bar(position="stack", stat="count") + scale_fill_manual(values=alpha(cols,0.5)) Expected output My data i i <- structure(list(uicc = structure(c(4L, 4L, 4L, 4L, 3L, 4L, 4L, 4L, 4L, 3L, 4L, 3L, 4L, 2L, 4L, 4L, 4L, 1L, 4L, 4L, 2L, 4L, 4L, 3L, 4L, 4L, 4L, 4L, 3L, 2L, 4L, 4L, 3L, 3L, 3L, 3L, 1L, 3L, 4L, 4L, 2L, 4L, 4L, 4L, 4L, 4L, 4L, 1L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 4L, 1L, 3L, 1L, 4L, 4L, 3L, 1L, 2L, 1L, 3L, 3L, 3L, 4L, 3L, 4L, 4L, 3L, 4L, 3L, 3L, 3L, 2L, 2L, 4L, 3L, 4L, 2L, 1L, 1L, 4L, 4L, 4L, 4L, 1L, 4L, 4L, 4L, 4L, 4L, 4L, 2L, 3L, 3L, 4L), .Label = c("1", "2", "3", "4"), class = "factor"), n = structure(c(4L, 4L, 4L, 4L, 2L, 1L, 4L, 1L, 4L, 2L, 4L, 2L, 4L, 1L, 4L, 5L, 2L, 1L, 1L, 5L, 1L, 1L, 2L, 2L, 1L, 4L, 3L, 4L, 2L, 1L, 5L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 3L, 2L, 1L, 4L, 2L, 1L, 4L, 1L, 4L, 1L, 2L, 2L, 2L, 4L, 1L, 4L, 2L, 4L, 1L, 1L, 1L, 4L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 4L, 1L, 1L, 4L, 2L, 2L, 2L, 1L, 1L, 3L, 2L, 5L, 1L, 1L, 1L, 4L, 4L, 4L, 5L, 1L, 4L, 4L, 1L, 4L, 2L, 1L, 1L, 2L, 2L, 4L), .Label = c("0", "1", "2", "3", "4", "5"), class = "factor")), row.names = c(NA, 100L), class = "data.frame")
Well, I found a way. There's no easy way to draw just the "outside" lines, so the approach I used was to draw all of the outlines with the first geom_bar call. The inner lines are then "erased" by drawing white rectangles over that layer, and the fill is drawn back in by a second geom_bar call with no outline (color=NA). To draw the rectangles over the initial geom_bar layer, I created a summary data frame of i that holds the total height of each bar.
i.sum <- i %>% group_by(n) %>% tally()
ggplot(i, aes(x=n)) + theme_bw() +
# draw lines
geom_bar(position='stack', stat='count', aes(color=uicc), fill=NA, size=1.5) +
# cover over those inner lines
geom_col(data=i.sum, aes(y=nn), fill='white') +
# put back in the fill
geom_bar(position='stack', stat='count', aes(fill=uicc), color=NA) +
scale_fill_manual(values=alpha(cols,0.5)) +
scale_color_manual(values=cols)
Note that the size= of the outline needs to be much larger than normal, since the white rectangle ends up covering about half the line.
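A side note on the summary step: as far as I know, tally() calls its count column nn here only because i already has a column named n, which is easy to trip over. A base R sketch that names the total explicitly could look like this. The data frame `toy` and the column name `total` are hypothetical.

```r
# Hypothetical toy data with the same two factor columns as `i`.
toy <- data.frame(
  uicc = factor(c(4, 4, 3, 1, 2, 4)),
  n    = factor(c(0, 0, 0, 1, 1, 3))
)

# Total bar height per x-level, with an explicitly named count column.
bar_totals <- as.data.frame(table(n = toy$n), responseName = "total")
print(bar_totals)
```

With a frame like this, the white covering layer would map y to `total` instead of `nn`.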
Normalization of data within ggplot
I have my data as melted.df <- structure(list(organisms = structure(c(1L, 1L, 1L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 1L, 1L, 1L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 1L, 1L, 1L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 1L, 1L, 1L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L), .Label = c("Botrytis cinerea", "Fusarium graminearum", "Human", "Mus musculus"), class = "factor"), types = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("AllMismatches", "mismatchType2", "MismatchesType1", "totalDNA"), class = "factor"), mutations = c(30501L, 12256L, 58357L, 366531L, 3475L, 186907L, 253453L, 222L, 24906L, 2775L, 247990L, 12324L, 4395L, 25324L, 77862L, 1862L, 112217L, 163117L, 100L, 17549L, 1057L, 20331L, 18177L, 7861L, 33033L, 288669L, 1613L, 74690L, 90336L, 122L, 7357L, 1718L, 227659L, 635951L, 229493L, 868052L, 2418724L, 65833L, 1081903L, 1339758L, 4318L, 59387L, 15199L, 2134229L )), row.names = c(NA, -44L), class = "data.frame") The values totalDNA in type column indicates total DNAs in the data whereas mismatches are the mutations. I would like to normalize this data based on totalDNA values and plot it. The way I am plotting right now doesn't give me the accurate picture of the data as todalDNA inflates the whole Y-axis and other three types(mismatchType2, mismatchesType1 and AllMismatches) are not properly visible with respect to totalDNA. What would be the better way to plot this? Should I first calculate the percentage? or Perhaps do log scaling? Thanks for helping me out. ggplot(melted.df, aes(x = types, y = mutations, color=types)) + geom_point()+ facet_grid(.~organisms)+ xlab("Types")+ ylab("Mismatches")+ theme(axis.title.x=element_blank(), axis.text.x=element_blank(), axis.ticks.x=element_blank())
Try a log scale?
ggplot(melted.df, aes(x = types, y = mutations, color=types)) +
geom_point()+
facet_grid(.~organisms)+
xlab("Types")+
ylab("Mismatches")+
# ylim(c(90,130))+
scale_y_log10()+ #add log scale
theme(axis.title.x=element_blank(),
axis.text.x=element_blank(),
axis.ticks.x=element_blank())
How would you normalise on total DNA? Would you use the (geometric) mean?
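One possible reading of "normalise on total DNA" is to divide each mismatch count by the totalDNA count of the same sample. The sketch below assumes there is a column identifying which rows belong together (called `sample` here). That column is hypothetical and does not exist in the posted data, where rows only seem to line up by position.

```r
# Hypothetical toy data: one row per sample and type.
toy <- data.frame(
  sample    = rep(c("s1", "s2"), times = 3),
  types     = rep(c("AllMismatches", "mismatchType2", "totalDNA"), each = 2),
  mutations = c(300, 120, 100, 40, 6000, 2000)
)

# Build a lookup vector: totalDNA count per sample.
is_total <- toy$types == "totalDNA"
totals <- toy$mutations[is_total]
names(totals) <- as.character(toy$sample[is_total])

# Divide every row by the totalDNA count of its own sample.
toy$fraction <- toy$mutations / unname(totals[as.character(toy$sample)])

# Drop the totalDNA rows themselves (their fraction is 1 by construction).
normalised <- toy[toy$types != "totalDNA", ]
print(normalised)
```

The `fraction` column could then go on the y axis in place of `mutations`, so totalDNA no longer stretches the scale. Whether a per-sample ratio is the right normalisation depends on what the rows actually represent.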
Gps heatmap with R
I would like to replicate a heatmap like this with R and my gps data cool heatmap I have a DB with activities: lat, long and ID. For the moment my "solution" is: heatmap <- ggplot(db) + geom_path(aes(lon,lat, group=id, colour = "white"), db %>% dplyr::filter(lon > 7.15, lon < 7.40, lat > 44.11, lat < 45.28), alpha = 0.3,size = 0.3, lineend = "round") + coord_map() + theme_black() + guides(colour=F) (theme_black it is my function to have black background). heatmap with geom_path But, sincerly, for the moment I don't like this solution because there aren't big differences between colour (there isn't red and white like desidered) but I don't know what I can do to improve. Any guesses? Ggplot isn't the best solution for this stuff? Thanks in advance!!! Here a sample of data (sorry but it is very large dataset) > dput(db[1:500,]) structure(list(id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), lat = c(45.129635038, 45.130085063, 45.131009, 45.131882059, 45.131590033, 45.131329021, 45.13112006, 45.131015035, 45.131039007, 45.131698998, 45.132489998, 45.133366997, 45.133759019, 45.134701983, 45.135683001, 45.136293036, 45.136023054, 45.135670009, 45.135460042, 45.135254015, 45.135128035, 45.135034996, 45.134912033, 45.134307027, 45.133504041, 45.133148984, 45.133015041, 45.132966007, 45.132905992, 45.132654032, 45.132439036, 45.131962022, 45.131192983, 45.130391002, 45.129920023, 45.129436052, 45.129070014, 45.129021986, 45.129591033, 45.129928992, 45.130368036, 45.13124604, 45.131983061, 45.132388996, 45.132566022, 45.13284405, 45.132908004, 45.133012023, 45.133217045, 45.133765054, 45.134631994, 45.134917062, 45.135047988, 45.135109008, 45.135300031, 45.135462054, 45.135700016, 45.136083991, 45.136218017, 45.135245046, 45.134240056, 45.133684001, 45.133936044, 45.134249025, 45.134536021, 45.134823017, 45.135122, 45.13459503, 45.133557015, 45.132596029, 45.131550052, 45.131247046, 45.131424994, 45.131285016, 45.130373987, 45.129641995, 45.128759046, 45.128316984, 45.129005055, 45.129666051, 45.122702031, 
45.122991039, 45.122958014, 45.122913003, 45.123416002, 45.124176994, 45.125018035, 45.12542506, 45.125733011, 45.12628102, 45.126707994, 45.127108062, 45.127236054, 45.127496983, 45.127754055, 45.128442042, 45.129054005, 45.129768059, 45.130593006, 45.130973042, 45.131786002, 45.132748999, 45.13345903, 45.133549052, 45.132284055, 45.131362046, 45.13043199, 45.129638056, 45.129016035, 45.129181997, 45.129563038, 45.129855985, 45.130079028, 45.130468032, 45.131136992, 45.131705033, 45.132078028, 45.132376005, 45.132479018, 45.132704994, 45.132877997, 45.132926025, 45.132952009, 45.133025015, 45.133105062, 45.133291057, 45.133701016, 45.134175012, 45.134924019, 45.134991997, 45.135074055, 45.135140021, 45.135174051, 45.135249991, 45.135427017, 45.135488038, 45.135602031, 45.13577604, 45.135962034, 45.136227992, 45.136352044, 45.136202008, 45.135484014, 45.134862999, 45.134246007, 45.13367302, 45.133871001, 45.134082057, 45.134269057, 45.134495033, 45.134673987, 45.134855036, 45.13503399, 45.135185032, 45.135087047, 45.134516994, 45.133909055, 45.133294996, 45.132439036, 45.131882059, 45.131336984, 45.130725021, 45.130062013, 45.129416019, 45.12933899, 45.128922996, 45.128330982, 45.128401055, 45.129221056, 45.130106018, 45.130646986, 45.13123506, 45.131795055, 45.132434007, 45.133362052, 45.134454046, 45.135240017, 45.135089059, 45.134865011, 45.13461699, 45.13438104, 45.134141988, 45.133906037, 45.133666985, 45.133435058, 45.133203047, 45.133004061, 45.13272905, 45.132522017, 45.132340046, 45.13216805, 45.131994041, 45.131811986, 45.131628003, 45.131385012, 45.131203041, 45.131021992, 45.130847062, 45.130666012, 45.130484041, 45.130299053, 45.130098055, 45.129903008, 45.129663034, 45.129481985, 45.129300014, 45.12912106, 45.12894898, 45.128773043, 45.128603058, 45.128431062, 45.128258059, 45.128033005, 45.127850028, 45.127668979, 45.127458007, 45.127250052, 45.127065985, 45.126884014, 45.126703049, 45.126526023, 45.126340028, 45.126152022, 45.125964016, 
45.125770981, 45.125583059, 45.125393041, 45.125204029, 45.125013006, 45.124807984, 45.124527023, 45.124449993, 45.124267016, 45.12408999, 45.123908019, 45.123734011, 45.123548016, 45.123355987, 45.123176028, 45.122982992, 45.122779983, 45.122572028, 45.12230498, 45.121796031, 45.121328991, 45.120846026, 45.120289049, 45.119771047, 45.119686055, 45.120316038, 45.120865053, 45.121838024, 45.122644028, 45.122594994, 45.122980059, 45.12336504, 45.123782039, 45.12420499, 45.124636993, 45.124972018, 45.125372002, 45.125795037, 45.12622, 45.126666001, 45.127089035, 45.127513998, 45.12776403, 45.128168038, 45.128481018, 45.128721998, 45.129045036, 45.129408056, 45.129672003, 45.129911054, 45.130124039, 45.130358061, 45.130662995, 45.130887043, 45.131170016, 45.131380989, 45.131610988, 45.13187703, 45.132096049, 45.132301992, 45.132506008, 45.132714047, 45.132897024, 45.13332098, 45.133744015, 45.134139054, 45.134440048, 45.134644986, 45.134853025, 45.135062991, 45.135276059, 45.135480997, 45.13562701, 45.135738992, 45.135847034, 45.135989024, 45.136178036, 45.136355062, 45.136536027, 45.136685979, 45.136855042, 45.136982028, 45.137129047, 45.13734706, 45.137550992, 45.137713014, 45.137870007, 45.137991041, 45.138122051, 45.138241996, 45.13832506, 45.138433019, 45.138613985, 45.138707024, 45.138897041, 45.139046994, 45.13924598, 45.138971054, 45.138756058, 45.138572997, 45.138494039, 45.138537039, 45.138555982, 45.13856998, 45.138490016, 45.138134037, 45.13817603, 45.138190028, 45.138119033, 45.138011996, 45.137893057, 45.137830025, 45.137340019, 45.136485987, 45.136036046, 45.136160015, 45.136310051, 45.136460003, 45.136604004, 45.136782036, 45.136945986, 45.137076995, 45.137200041, 45.137354017, 45.137520984, 45.137670015, 45.137808987, 45.137971009, 45.138123056, 45.138263034, 45.138408041, 45.138550031, 45.138677016, 45.138722027, 45.138562017, 45.13831802, 45.137910994, 45.138398989, 45.138588001, 45.138689003, 45.138574003, 45.138490016, 45.138418016, 45.138348027, 
45.138279044, 45.138208049, 45.138142, 45.138009985, 45.13792801, 45.137807059, 45.137701028, 45.137589045, 45.137460048, 45.137328033, 45.137216051, 45.137101051, 45.13703299, 45.136954032, 45.136867028, 45.13679201, 45.136722021, 45.136656056, 45.136593024, 45.136534015, 45.136470061, 45.136398061, 45.136333017, 45.136257999, 45.136191028, 45.136120033, 45.136053062, 45.135992041, 45.136270991, 45.136828052, 45.13735804, 45.137744027, 45.13783606, 45.137914012, 45.138001016, 45.138095983, 45.138160021, 45.138153986, 45.138158009, 45.138193045, 45.138021049, 45.137166011, 45.136320025, 45.135530031, 45.134769038, 45.134702988, 45.135192995, 45.135535982, 45.135733041, 45.135798, 45.135761036, 45.135694987, 45.135620053, 45.135560038, 45.135509998, 45.135451995, 45.135401033, 45.135342025, 45.135242029, 45.134860987, 45.134496039, 45.134132013, 45.13376103, 45.133521056, 45.133300026, 45.133006994, 45.132452028, 45.131868061, 45.131346036, 45.131149062, 45.131034984, 45.130921996, 45.130788053, 45.131065997, 45.131710062, 45.132181041, 45.132570046, 45.133207992, 45.133852058, 45.135253009, 45.135628015, 45.135519051, 45.135387036, 45.135186038, 45.134960061, 45.134741042, 45.134521017, 45.134305016, 45.134094043, 45.133896063, 45.133704033, 45.133504041, 45.133304049, 45.133121994, 45.132931054, 45.132731062, 45.132539033, 45.132356056, 45.13217903, 45.131998987, 45.131801006, 45.131603025, 45.131293063, 45.131077061, 45.130906992, 45.130640029, 45.130429056, 45.130185059, 45.129916, 45.129638056, 45.129436052, 45.129249052, 45.129059034, 45.128870022, 45.12869199, 45.128509014, 45.128316984, 45.128100982, 45.127882047, 45.127689012, 45.127498994, 45.127322052, 45.127143015, 45.126965989, 45.126790053, 45.126614033, 45.126442036, 45.126274063, 45.126093014, 45.125907019, 45.12567702, 45.125481051), lon = c(7.177825015, 7.178235979, 7.178516019, 7.178818019, 7.180071029, 7.181411044, 7.182807972, 7.184132984, 7.18553603, 7.186470026, 7.186401043, 7.186276991, 
7.187393963, 7.18766101, 7.187816998, 7.188647979, 7.189946001, 7.191276963, 7.192704988, 7.194035029, 7.195474034, 7.196896024, 7.198408035, 7.199529031, 7.200370993, 7.201671026, 7.202974999, 7.204132037, 7.205226965, 7.206691032, 7.208102041, 7.209466029, 7.210153009, 7.20987297, 7.21114199, 7.212511006, 7.213782038, 7.213086005, 7.212089983, 7.21093898, 7.209804993, 7.210080003, 7.209278023, 7.207956029, 7.206746017, 7.205330984, 7.203929027, 7.202396983, 7.200898969, 7.199761964, 7.199052017, 7.197729017, 7.196266962, 7.194900041, 7.193565977, 7.192243984, 7.190915033, 7.189523972, 7.188117992, 7.187784979, 7.187623962, 7.186670018, 7.185395969, 7.183994012, 7.182630024, 7.181356981, 7.180001963, 7.179446997, 7.179225044, 7.17899999, 7.178715005, 7.177729042, 7.176305962, 7.175184045, 7.175042977, 7.175579, 7.176279978, 7.177192013, 7.177936997, 7.177846975, 7.240954995, 7.239716988, 7.238274965, 7.237349016, 7.236500013, 7.235478008, 7.234529009, 7.233205003, 7.231871023, 7.230718009, 7.229324014, 7.228024987, 7.226928969, 7.22580504, 7.224867021, 7.224035034, 7.223248979, 7.222385978, 7.221774015, 7.220518993, 7.219154, 7.218150016, 7.216942016, 7.215428998, 7.215080982, 7.214733971, 7.214371035, 7.214068029, 7.213604007, 7.212810995, 7.211996022, 7.211213991, 7.210382003, 7.209782026, 7.210057037, 7.20995, 7.209150031, 7.208283007, 7.20733099, 7.206362042, 7.205403991, 7.204609973, 7.203629039, 7.202443, 7.201614029, 7.200698977, 7.199939996, 7.199496006, 7.199036007, 7.19783597, 7.196806002, 7.195955993, 7.195093998, 7.194239044, 7.193458018, 7.192641034, 7.191804017, 7.19099902, 7.190212043, 7.189392041, 7.18862099, 7.187857985, 7.187763018, 7.187660005, 7.187563026, 7.18674403, 7.185946995, 7.185042001, 7.184242032, 7.183256991, 7.182461967, 7.181659987, 7.18086203, 7.180093996, 7.179516986, 7.179345995, 7.17919797, 7.179040977, 7.178834028, 7.178706036, 7.178580979, 7.178419962, 7.178262969, 7.177921993, 7.177279017, 7.176531016, 7.176589019, 
7.177392006, 7.178126009, 7.178326001, 7.178472014, 7.178597994, 7.178737972, 7.178898988, 7.17914902, 7.179385976, 7.179649001, 7.180366994, 7.181394029, 7.182473031, 7.183556979, 7.184621984, 7.185677014, 7.186734978, 7.187767041, 7.188773037, 7.189650036, 7.190855018, 7.191781973, 7.192599963, 7.193384006, 7.194171989, 7.194980004, 7.195796989, 7.19684699, 7.19766104, 7.198472995, 7.199277993, 7.200095983, 7.200926043, 7.201757025, 7.20265699, 7.203524014, 7.204577032, 7.205392005, 7.206207983, 7.207015998, 7.207799036, 7.208567991, 7.209362009, 7.210172036, 7.211006036, 7.212032986, 7.212842008, 7.213678019, 7.214617965, 7.215551039, 7.216375986, 7.217187019, 7.218014983, 7.21882501, 7.219644006, 7.220502983, 7.221353998, 7.222200989, 7.223062984, 7.223912993, 7.224756045, 7.225593984, 7.226522028, 7.227319985, 7.228115009, 7.228928975, 7.229760963, 7.230591023, 7.23134003, 7.232190039, 7.233049016, 7.233908999, 7.234774012, 7.235642042, 7.236505964, 7.23738598, 7.238509994, 7.23919798, 7.239701984, 7.239998033, 7.240014042, 7.240470018, 7.240710998, 7.24098802, 7.241482971, 7.241980018, 7.236541001, 7.234863028, 7.233077012, 7.231303988, 7.229408001, 7.227598013, 7.225949041, 7.224104017, 7.222255975, 7.220355043, 7.218403987, 7.216424014, 7.214443035, 7.21335498, 7.211516996, 7.210116045, 7.209041988, 7.207609018, 7.206005979, 7.204825975, 7.203763987, 7.202810043, 7.20175803, 7.200392032, 7.199333984, 7.198078962, 7.197141027, 7.196108041, 7.19506902, 7.194110969, 7.193170016, 7.192237027, 7.191262044, 7.190321007, 7.188476989, 7.186560969, 7.18479901, 7.183440974, 7.182486023, 7.181530989, 7.180590037, 7.179652018, 7.178611992, 7.177518991, 7.176512995, 7.175524015, 7.17428299, 7.172614991, 7.17101497, 7.169426012, 7.168046016, 7.166526964, 7.165433963, 7.164168045, 7.162294019, 7.160434996, 7.159037985, 7.157677015, 7.156510003, 7.155374003, 7.154417041, 7.153558986, 7.152548045, 7.151016, 7.149832978, 7.148454994, 7.14724297, 7.145434994, 7.14365803, 
7.141911996, 7.14024098, 7.139148986, 7.137836044, 7.136638019, 7.135302027, 7.134236017, 7.132942019, 7.130933044, 7.129730995, 7.128744026, 7.127353971, 7.125574996, 7.124366996, 7.122896978, 7.121798027, 7.120348041, 7.118866037, 7.117006009, 7.114993011, 7.112969033, 7.110609024, 7.108462, 7.10662301, 7.104880999, 7.102604977, 7.10047304, 7.098388042, 7.096420977, 7.094558015, 7.092581981, 7.090666968, 7.088789003, 7.086752032, 7.085011027, 7.08365098, 7.082186997, 7.080838013, 7.08017601, 7.081466991, 7.082715978, 7.084399987, 7.086051976, 7.087146988, 7.088107973, 7.089052027, 7.090004965, 7.090939045, 7.091842027, 7.093496028, 7.094554998, 7.096346965, 7.097899964, 7.099425973, 7.101134037, 7.102866996, 7.104503981, 7.106013981, 7.107015032, 7.108085988, 7.109268004, 7.110304007, 7.111242026, 7.112125981, 7.112966016, 7.113799009, 7.114587998, 7.115390984, 7.116221044, 7.117093013, 7.117956014, 7.118815997, 7.119701964, 7.120579969, 7.121648997, 7.12239398, 7.123135024, 7.124066002, 7.125184986, 7.126317968, 7.127448016, 7.128650987, 7.129692019, 7.130693992, 7.131727984, 7.133294981, 7.134498035, 7.134834988, 7.135549964, 7.136335013, 7.136930044, 7.137892035, 7.139461966, 7.14130699, 7.142842974, 7.143689043, 7.14469001, 7.146057015, 7.147123025, 7.148335971, 7.14946099, 7.150665972, 7.151748998, 7.15290503, 7.153922006, 7.155127994, 7.156057966, 7.156963966, 7.157977003, 7.158876968, 7.159796044, 7.160927014, 7.162616973, 7.164022032, 7.16541502, 7.166560994, 7.167449979, 7.168308034, 7.169376978, 7.17003303, 7.170956045, 7.171747045, 7.17239899, 7.173422001, 7.173888035, 7.174727985, 7.176419034, 7.177881006, 7.178866969, 7.17985201, 7.180855995, 7.181842964, 7.182824988, 7.183801982, 7.184746036, 7.185595039, 7.186423003, 7.187330009, 7.188232991, 7.189049976, 7.189923035, 7.190823, 7.191693041, 7.192525029, 7.193338995, 7.194151034, 7.195040019, 7.195914, 7.197277987, 7.198237966, 7.198999965, 7.200217018, 7.20117004, 7.20226304, 7.203474979, 
7.204716004, 7.205629045, 7.206463044, 7.207314981, 7.208162978, 7.20899002, 7.209822008, 7.210690038, 7.211654041, 7.212608991, 7.213448019, 7.214296016, 7.215085005, 7.215888997, 7.216691983, 7.217486001, 7.218272978, 7.219039, 7.219812985, 7.220614044, 7.221435973, 7.222482035, 7.223379988)), row.names = c(NA, 500L), class = "data.frame")
You are trying to assign a single color whereas there are multiple groups in your data. This is why your colour = "white" is not taken. Note also that colour = "white" sits inside aes(), so ggplot2 treats it as a value to map rather than as a literal colour. As a kind of 'hack', I've colored by id and then set the color range to run from white to white. For clarity, I've moved the dplyr filtering out of the plotting call. Also, I don't have your theme_black function, so I've just added the black background manually:
db_plot <- db %>% filter(lon > 7.15, lon < 7.40, lat > 44.11, lat < 45.28)
ggplot(db_plot, aes(lon,lat, group = id, col = id)) +
geom_path(alpha = 0.3, size = 0.5, lineend = "round") +
coord_map() +
guides(colour=F) +
theme(panel.background=element_rect(fill="black"),
panel.grid.minor=element_blank(),
panel.grid.major=element_blank()) +
scale_color_continuous(low = "white", high = "white")
The image you posted seems to use different colors depending on how many times a path has been taken. To achieve that, I think you need to group your data by frequency of points. Then you can color by that group with a continuous range from red to white.
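One way to get a "frequency of points" is to snap every GPS point to a coarse grid cell and count how many points land in each cell. This is only a sketch on made-up coordinates. The grid size of 3 decimal places (on the order of 100 m) is an arbitrary assumption, as are the column names `cell` and `freq`.

```r
# Hypothetical toy track points; the two tracks share a stretch of road.
pts <- data.frame(
  id  = c(1, 1, 1, 2, 2, 2),
  lon = c(7.1801, 7.1812, 7.1823, 7.1802, 7.1811, 7.1950),
  lat = c(45.1301, 45.1302, 45.1303, 45.1301, 45.1302, 45.1400)
)

# Snap each point to a grid cell by rounding its coordinates.
pts$cell <- paste(round(pts$lon, 3), round(pts$lat, 3))

# Count the points in each cell and attach that count to every point.
pts$freq <- ave(seq_along(pts$cell), pts$cell, FUN = length)
print(pts)
```

Mapping `freq` to colour inside aes() and adding something like scale_colour_gradient(low = "red", high = "white") should then make heavily used stretches stand out. The right cell size depends on how dense the tracks are.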
Cannot remove leading/trailing white space with gsub or trimws
I am trying to work with a data set that requires significant cleaning. I have one subject name that I cannot seem to remove the leading white space from. Example data: Data <- dput(Data) structure(list(Teacher = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L ), .Label = c("Please.rate.teacher:.JOHN.DOE .Overall.rating.for.teacher", "Please.rate.teacher: Jane.Doe.Overall.rating.for.teacher"), class = "factor"), Overall_Rating = c(5L, 4L, 5L, 4L, 4L, 5L, 4L, 4L, 4L, 4L, 3L, 5L, 4L, 4L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 3L)), .Names = c("Teacher", "Overall_Rating"), class = "data.frame", row.names = c(NA, -22L )) My attempt at cleaning: Data_clean <- Data %>% mutate(Teacher = as.character(Teacher), Teacher = gsub("Please.rate.teacher|.Overall.rating.for.teacher|[:]", "", Teacher), Teacher = gsub("[.]", " ", Teacher), Teacher = trimws(Teacher), Teacher = tolower(Teacher), Teacher = tools::toTitleCase(Teacher)) Results in remaining leading and trailing white space, which also breaks the title case for the second name: unique(Data_clean$Teacher) [1] "John Doe " " jane Doe" The first name still has trailing white space and the second has leading white space. How can I remove that?
I suspect your data contains a non-ASCII space like "\u00A0". The trimws function will only remove ASCII space characters. Try running utf8::utf8_print(unique(Data_clean$Teacher), utf8 = FALSE) to see if this is the case. To handle non-ASCII spaces, replace trimws(x) in your code with gsub("(^[[:space:]]*)|([[:space:]]*$)", "", x)
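To illustrate the point about non-ASCII spaces, here is a small sketch with a made-up string padded by no-break spaces. It uses the `whitespace` argument of trimws, which I believe needs R 3.6.0 or later. As far as I can tell, whether [[:space:]] matches "\u00A0" can depend on platform and locale, so the explicit "[\\h\\v]" class may be the more portable choice.

```r
# A name padded with no-break spaces (U+00A0) instead of ordinary spaces.
x <- "\u00A0jane doe\u00A0"

# Default trimws() only strips space, tab, carriage return and newline.
default_trim <- trimws(x)

# "[\\h\\v]" covers Unicode horizontal and vertical white space.
unicode_trim <- trimws(x, whitespace = "[\\h\\v]")

print(nchar(default_trim))
print(nchar(unicode_trim))
```

If the two character counts differ, the padding was not plain ASCII white space.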
Here is a completely reproducible example with stringr and str_trim in particular since I don't know why trimws isn't working for you. Your posted code gave me the same output, correctly changing the case to title and removing the spaces. data <- structure(list(Teacher = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L ), .Label = c("Please.rate.teacher:.JOHN.DOE .Overall.rating.for.teacher", "Please.rate.teacher: Jane.Doe.Overall.rating.for.teacher"), class = "factor"), Overall_Rating = c(5L, 4L, 5L, 4L, 4L, 5L, 4L, 4L, 4L, 4L, 3L, 5L, 4L, 4L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 3L)), .Names = c("Teacher", "Overall_Rating"), class = "data.frame", row.names = c(NA, -22L )) library(tidyverse) data %>% mutate( Teacher = Teacher %>% str_remove_all("Please.rate.teacher:|.Overall.rating.for.teacher") %>% str_replace_all("\\.", " ") %>% str_trim() %>% str_to_title() ) %>% `[[`(1) %>% unique() #> [1] "John Doe" "Jane Doe" Created on 2018-03-15 by the reprex package (v0.2.0).
What about this? Data_clean <- Data %>% mutate(Teacher = gsub("Please.rate.teacher|\\s*\\.Overall.rating.for.teacher|:", "", Teacher), Teacher = gsub("\\.", " ", Teacher), Teacher = trimws(Teacher), Teacher = tolower(Teacher), Teacher = tools::toTitleCase(Teacher)) unique(Data_clean$Teacher); #[1] "John Doe" "Jane Doe" Explanation: Replace optional (>=0) whitespaces occurring before ".Overall.rating..." in Teacher. Sample data Data <- structure(list(Teacher = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L ), .Label = c("Please.rate.teacher:.JOHN.DOE .Overall.rating.for.teacher", "Please.rate.teacher: Jane.Doe.Overall.rating.for.teacher"), class = "factor"), Overall_Rating = c(5L, 4L, 5L, 4L, 4L, 5L, 4L, 4L, 4L, 4L, 3L, 5L, 4L, 4L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 3L)), .Names = c("Teacher", "Overall_Rating"), class = "data.frame", row.names = c(NA, -22L ))
Automatically adjusting ylim with stat_summary
ggplot2 adjust the ylim automatically for the data points. Is there any way to adjust the ylim for stat_summary too? df <- structure(list(Varieties = structure(c(2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L), .Label = c("F9917", "Hegari", "JS263", "JS2002"), class = "factor"), Priming = structure(c(2L, 2L, 2L, 2L, 5L, 5L, 5L, 5L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 4L, 4L, 4L, 4L, 2L, 2L, 2L, 2L, 5L, 5L, 5L, 5L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 4L, 4L, 4L, 4L, 2L, 2L, 2L, 2L, 5L, 5L, 5L, 5L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 4L, 4L, 4L, 4L), .Label = c("CaCl2", "Dry", "Hydropriming", "KNO3", "OnFarmpriming"), class = "factor"), PH = c(225.8, 224.26, 228.9, 215.82, 230.3, 227.7, 232.8, 221.1, 260.2, 230.8, 236.75, 230.5, 250.56, 230.74, 240.64, 226.7, 268.4, 233.4, 243.33, 232.7, 252.04, 233.1, 237.14, 220.6, 265.55, 234.93, 240.04, 218.21, 300.55, 245, 243.5, 234.65, 253.3, 233.5, 238.62, 225.93, 255.74, 233.64, 238.1, 230.93, 246, 240.33, 246.08, 221.7, 250.54, 242.87, 251, 225.32, 251.47, 245.4, 266.74, 227.73, 290.62, 246.68, 256.4, 225.83, 282.67, 240.58, 258.35, 235.87)), .Names = c("Varieties", "Priming", "PH"), class = "data.frame", row.names = c(NA, 60L )) p1 <- ggplot(data=df, aes(x=Varieties, y=PH, group=Priming, shape=Priming, colour=Priming))+ stat_summary(fun.y=mean, geom="point", size=2, aes(group=Priming, shape=Priming, colour=Priming))+ theme_bw() p1 <- p1 + stat_summary(fun.y=mean, geom="line", aes(group=Priming, shape=Priming, colour=Priming)) print(p1) See extra space in ylim for stat_summary values. Thanks in advance for your help and time.
Here is one approach, using plyr to prep the data before plotting:
library(plyr)
library(ggplot2)
# add the group mean of PH to every row
df <- ddply(df, .(Varieties, Priming), transform, meanPH = mean(PH))
ggplot(df, aes(Varieties, meanPH)) +
geom_point() +
geom_line(aes(group = Priming, color = Priming))
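The same pre-summarising idea can be sketched in base R with aggregate(), which returns one row per group instead of repeating the mean on every original row. The toy data below is made up and only mimics the shape of the question's data frame.

```r
# Hypothetical toy data in the same shape as the question's data frame.
toy <- data.frame(
  Varieties = rep(c("F9917", "Hegari"), each = 4),
  Priming   = rep(c("Dry", "KNO3"), times = 4),
  PH        = c(215, 232, 221, 233, 226, 268, 252, 256)
)

# One row per Varieties x Priming combination, holding the mean of PH.
means <- aggregate(PH ~ Varieties + Priming, data = toy, FUN = mean)
names(means)[names(means) == "PH"] <- "meanPH"
print(means)
```

Plotting `means` directly means the y axis is scaled from the summaries only, because the raw values never reach the plot.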
The current "official" answer for 0.8.9 is, I believe, that you can't, at least not automatically, and not without preprocessing the data as Ramnath indicates. Most people asking this question, or some variant of it, are pointed towards setting the limits manually using coord_cartesian. The reason stat_summary behaves this way is that it more or less assumes that you aren't going to plot just the summaries, but at least some of the underlying data as well, so it sets up the plotting area using the underlying data frame. However, I found this thread on the ggplot2 list that suggests this behavior might change in the upcoming 0.9.0 release. (The thread is a little vague, but I read it as implying that in the next version, if the only layer you add is from stat_summary then the plot limits will be calculated based on the summaries, not the original data. I could be wrong though.)
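For the manual route, a rough sketch of setting the limits from the group means with coord_cartesian might look like this. The toy data is made up, the plot is only built if ggplot2 is installed, and the `fun` argument assumes a reasonably recent ggplot2 (older versions used `fun.y`).

```r
# Hypothetical toy data in the same shape as the question's data frame.
toy <- data.frame(
  Varieties = rep(c("F9917", "Hegari"), each = 4),
  Priming   = rep(c("Dry", "KNO3"), times = 4),
  PH        = c(215, 232, 221, 233, 226, 268, 252, 256)
)

# Range of the group means, which is narrower than the range of the raw data.
group_means <- tapply(toy$PH, list(toy$Varieties, toy$Priming), mean)
lims <- range(group_means)
print(lims)

# Build the plot only if ggplot2 is available.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(toy, aes(Varieties, PH, group = Priming, colour = Priming)) +
    stat_summary(fun = mean, geom = "line") +
    coord_cartesian(ylim = lims)  # zooms the view without dropping any data
}
```

Because coord_cartesian only zooms the view, the means are still computed from all rows. Setting limits on the y scale instead would discard out-of-range values before the summary is calculated.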