ggplot2 graphic with several x variables? - r

I need help with an R graphics issue in ggplot2.
Let's take an example:
date <- c("oct", "dec")
min.national <- c(17, 20)
min.international <- c(11, 12)
min.roaming <- c(5, 7)
mb.national <- c(115, 150)
mb.international <- c(72, 75)
mb.roaming <- c(30, 40)
df <- data.frame(min.national, min.international, min.roaming, mb.national, mb.international, mb.roaming)
What I want is two graphics side by side, one for the minutes and one for the megabytes, each with bars for the three variables (e.g. national, international and roaming for the minutes) and fill = date.
Is that clear?
Thanks

I appreciate there may be a language barrier here, and it sounds like you're just getting started with ggplot2 and aren't sure where to begin, so I hope you find this useful.
It makes sense to treat the minutes and MB separately, since they're different units, so I'll just use the minutes as an example. What I understand you're trying to achieve is easy with the right approach and the tidyr package.
library(tidyr)
library(ggplot2)
# first get your data in a data frame
min.df <- data.frame(national = min.national, international = min.international, roaming = min.roaming, month = date)
# now use tidyr to create a long data frame: one row per month/region pair, which is the structure ggplot2 needs here
# (gather() still works but is superseded by pivot_longer() in tidyr >= 1.0)
min.df.long <- gather(min.df, "region", "minutes", 1:3)
ggplot(min.df.long) + geom_bar(aes(x = region, y = minutes, fill = month), stat = "identity")
If you want the months side by side, as I understand your question, then you could do:
ggplot(min.df.long) + geom_bar(aes(x = region, y = minutes, fill = factor(month, levels = c("oct", "dec"))), position = "dodge", stat = "identity") + labs(fill = "month")
The key parameter is position = "dodge"; the rest just keeps the months in order (oct before dec) and tidies up the legend title.

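library(reshape2)
library(tidyr)
library(ggplot2)
df <- data.frame(date, min.national, min.international, min.roaming, mb.national, mb.international, mb.roaming)
# melt to long format, then split names such as "min.national" into "min" and "national"
df.stk <- tidyr::separate(melt(df, id.vars = "date"), col = "variable", into = c("min_byte", "type"), sep = "\\.")
plt <- ggplot(df.stk, aes(type, value, fill = date)) +
  geom_bar(stat = "identity") +
  # one panel per unit; free y scales because minutes and MB differ in magnitude
  facet_wrap(~ min_byte, scales = "free_y")
print(plt)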
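A sketch of the same idea with tidyr's pivot_longer() (assumes tidyr >= 1.0), which does the melt and separate steps in a single call. The dodged bars and free y scales are assumptions about the layout you want: oct and dec side by side, and separate y axes because minutes and megabytes differ so much in magnitude.

```r
library(tidyr)
library(ggplot2)

# Same numbers as in the question, with the month column included
df <- data.frame(date = c("oct", "dec"),
                 min.national = c(17, 20), min.international = c(11, 12),
                 min.roaming = c(5, 7), mb.national = c(115, 150),
                 mb.international = c(72, 75), mb.roaming = c(30, 40))

# Split names such as "min.national" into unit = "min" and region = "national"
long <- pivot_longer(df, cols = -date, names_to = c("unit", "region"),
                     names_sep = "\\.", values_to = "value")
# Keep oct before dec in the legend and within each dodged pair of bars
long$date <- factor(long$date, levels = c("oct", "dec"))

p <- ggplot(long, aes(x = region, y = value, fill = date)) +
  geom_col(position = "dodge") +
  # One panel per unit, each with its own y axis
  facet_wrap(~unit, scales = "free_y")
print(p)
```

Note that scales = "free_y" is only effective with facet_wrap() here; in facet_grid() panels in the same row always share a y axis.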

Related

Shading different regions of the graph based on time period

I am creating a graph using ggplot2 that takes dates on the x-axis (e.g. 1000 years ago) and probabilities on the y-axis. I would like to distinguish different time periods by shading regions of the graph in different colors. I stored the following dates:
paleo.dates <- c(c(13500,8000), c(13500,10050) ,c(10050,9015),
c(9015,8000), c(8000,2500), c(8000,5500), c(5500,3500), c(3500,2500),
c(2500,1150), c(2500,2000), c(2000,1500), c(1500,1150), c(1150,500))
I would like to take a time period, say 13500 to 8000, and color code it until it overlaps with another date, such as the third entry.
I am using the ggplot2 cheatsheet, and I attempted to use aes(fill = paleo.dates), but this does not work because it is not the same length as my dataset. I was also thinking of using + geom_rect() to manually fill the areas, but that does not seem very elegant, and I am not sure it would even work.
Any advice is appreciated, thank you.
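You just need to assign each observation to a period. Here I created a sub vector of period IDs and converted it to a factor so it can be mapped to fill (the periods here are simply consecutive blocks of five observations; replace them with your own period boundaries).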
library(dplyr)
library(ggplot2)
df <- data.frame(paleo.dates = seq(500, 13000, 100),
p = runif(n = length(seq(500, 13000, 100)),
0, 1))
sub <- data.frame(sub = rep(1:(13000/500), each = 5))
sub <- sub %>%
dplyr::slice(1:nrow(df))
df <- df %>%
dplyr::mutate(period = sub$sub,
period = as.factor(period))
ggplot2::ggplot(df) +
geom_bar(aes(x = paleo.dates, y = p,
fill = period,
col = period),
show.legend = FALSE, stat = "identity") +
theme_bw()
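The geom_rect() idea from the question can also work directly with the period boundaries. Below is a sketch, assuming each pair in paleo.dates is an (older, younger) bound and using made-up probabilities in place of the real data; the labels P1 to P13 are placeholders. Periods are drawn semi-transparent, so nested periods (for example 13500 to 10050 inside 13500 to 8000) show up darker.

```r
library(ggplot2)

paleo.dates <- c(c(13500,8000), c(13500,10050), c(10050,9015),
                 c(9015,8000), c(8000,2500), c(8000,5500), c(5500,3500), c(3500,2500),
                 c(2500,1150), c(2500,2000), c(2000,1500), c(1500,1150), c(1150,500))

# c() flattens the pairs, so rebuild one row per period (column 1 = older, column 2 = younger)
bounds <- matrix(paleo.dates, ncol = 2, byrow = TRUE)
periods <- data.frame(older = bounds[, 1], younger = bounds[, 2],
                      # hypothetical labels; replace with the real period names
                      period = paste0("P", seq_len(nrow(bounds))))

# Made-up probabilities standing in for the real data set
set.seed(1)
obs <- data.frame(age = seq(500, 13500, by = 100))
obs$p <- runif(nrow(obs))

p <- ggplot(obs, aes(x = age, y = p)) +
  # Bands first so the data line is drawn on top; -Inf/Inf span the full y range
  geom_rect(data = periods,
            aes(xmin = younger, xmax = older, ymin = -Inf, ymax = Inf, fill = period),
            inherit.aes = FALSE, alpha = 0.2) +
  geom_line() +
  scale_x_reverse() +   # oldest dates on the left
  theme_bw()
print(p)
```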

How to generate grouped bar plot or pie chart from list of csv files?

I have a list of data frames that need to be classified. I manipulated these lists and finally exported them as CSV files to the default folder. However, to make the exported data more informative, I think it would be better to generate a grouped bar plot or pie chart for each data frame object. As a beginner, I am still learning the features of the ggplot2 package, so I have little idea how to do this easily. Can anyone suggest how to generate a grouped bar plot easily? How can I generate an informative bar plot for a list of files? Thanks in advance :)
Reproducible data:
savedDF <- list(
bar.saved = data.frame(start=sample(100, 15), stop=sample(150, 15), score=sample(36, 15)),
cat.saved = data.frame(start=sample(100, 20), stop=sample(100,20), score=sample(45,20)),
foo.saved = data.frame(start=sample(125, 24), stop=sample(140, 24), score=sample(32, 24))
)
dropedDF <- list(
bar.droped = data.frame(start=sample(60, 12), stop=sample(90,12), score=sample(35,12)),
cat.droped = data.frame(start=sample(75, 18), stop=sample(84,18), score=sample(28,18)),
foo.droped = data.frame(start=sample(54, 14), stop=sample(72,14), score=sample(25,14))
)
I then get the list of CSV files from this pipeline:
comb <- do.call("rbind", c(savedDF, dropedDF))
cn <- c("letter", "saved","seq")
DF <- cbind(read.table(text = chartr("_", ".", rownames(comb)), sep = ".", col.names = cn), comb)
DF <- transform(DF, updown = ifelse(score>= 12, "stringent", "weak"))
by(DF, DF[c("letter", "saved", "updown")],
function(x) write.csv(x[-(1:3)],
sprintf("%s_%s_%s.csv", x$letter[1], x$updown[1], x$saved[1])))
To better understand the exported data, I think generating a grouped bar plot and a pie chart for each data frame object would be much more informative.
In the desired plot, I want to see the number of features in each CSV file for each data frame object. Can anyone give me ideas for this task?
How can I do this easily with the ggplot2 package? Is there a more efficient way to get this done? Thanks a lot
If I understand correctly, this may work for you as a rough solution. Please comment to let me know if this is acceptable. In the future, it's a good idea to provide a rough sketch of what you're trying to achieve along with your data.
library(dplyr)
library(ggplot2)
plot_data <- DF %>%
group_by(letter, saved, updown) %>%
tally %>%
group_by(saved, updown) %>%
mutate(percentage = n/sum(n))
ggplot(plot_data, aes(x = saved, y = n, fill = saved)) +
geom_bar(stat = "identity") +
facet_wrap(~ letter + updown, ncol = 2)
You can always change the facet_wrap(~ letter + updown, ncol = 2) to an explicit facet_grid(letter ~ updown) if you wish.
Or you could view it this way:
ggplot(plot_data, aes(x = letter, y = n)) +
geom_bar(stat = "identity") +
facet_wrap(~updown+saved, ncol = 2)
For a pie (cleaning up and labeling is up to you):
ggplot(plot_data, aes(x = 1, y = percentage, fill = letter)) +
geom_bar(stat = "identity", width =1) +
facet_wrap(~updown+saved, ncol = 2) +
coord_polar(theta = "y") +
theme_void()
A single pie for one letter (here "bar"), split into the four saved/updown combinations, just requires some manipulation of your data:
library(dplyr)
library(tidyr)
library(ggplot2)
plot_data <- DF %>%
unite(interaction, saved, updown, sep = "-") %>%
group_by(letter, interaction) %>%
tally %>%
mutate(percentage = n/sum(n)) %>%
filter(letter == "bar")
ggplot(plot_data, aes(x = 1, y = percentage, fill = interaction)) +
geom_bar(stat = "identity", width =1) +
coord_polar(theta = "y") +
theme_void()
You should really look into the dplyr, tidyr and ggplot2 packages. Read their documentation and vignettes and work through the examples. The best way to learn is by doing.
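To make the counting step concrete: dplyr's count() is shorthand for group_by() followed by tally(), and the percentage column sums to 1 within each saved/updown group. The sketch below uses a tiny hand-made stand-in for DF (hypothetical values, one row per feature) so the numbers are easy to check.

```r
library(dplyr)

# Tiny stand-in for DF (hypothetical values): one row per feature
toy <- data.frame(letter = c("bar", "bar", "cat", "cat", "cat", "foo"),
                  saved  = c("saved", "droped", "saved", "saved", "droped", "saved"),
                  updown = c("stringent", "stringent", "weak", "weak", "stringent", "weak"))

# count() is shorthand for group_by(letter, saved, updown) %>% tally()
plot_data <- toy %>%
  count(letter, saved, updown) %>%
  group_by(saved, updown) %>%
  # share of each letter within its saved/updown combination
  mutate(percentage = n / sum(n)) %>%
  ungroup()

print(plot_data)
```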

How to plot the mean of a single factor in a barplot with

I'm having trouble creating a figure with ggplot2.
In this plot, I'm using geom_bar to plot three factors: for each "time" and "dose" I'm plotting two bars (two genotypes).
To be more specific, this is what I mean:
This is my code so far (I actually changed some settings, but I'm presenting only what is needed):
ggplot(data=data, aes(x=interaction(dose,time), y=b, fill=factor(genotype)))+
geom_bar(stat="identity", position="dodge")+
scale_fill_grey(start=0.3, end=0.6, name="Genotype")
Question: I want to add the mean of each time as points, placed in the middle of the bars for that time. How can I do this?
I tried to add these points using geom_dotplot and geom_point but I did not succeed.
library(dplyr)
# mean of b for each time, stored in a named column
time_data = data %>% group_by(time) %>% summarize(mean_b = mean(b))
data <- inner_join(data,time_data,by = "time")
This gives you the data with the means attached. Now make the plot:
ggplot(data=data, aes(x=interaction(dose,time), y=b,fill=factor(genotype)))+
geom_bar(stat="identity", position="dodge")+
scale_fill_grey(start=0.3, end=0.6, name="Genotype")+
geom_text(aes(y = mean_b, label = round(mean_b, 1)), vjust = 0)
You might need to fiddle with the hjust and vjust arguments in the geom_text call, and perhaps the aes mapping too; I didn't run the code, so I can't be sure.
It generally helps if you can give a reproducible example. Here, I made some of my own data.
sampleData <-
data.frame(
dose = 1:3
, time = rep(1:3, each = 3)
, genotype = rep(c("AA","aa"), each = 9)
, b = rnorm(18, 20, 5)
)
You need to calculate the means somewhere, and I chose to do that on the fly. Note that, instead of using points, I used a line to show that the mean is for all of those values. I also sorted somewhat differently, and used facet_wrap to cluster things together. Points would be a fair bit harder to place, particularly when using position_dodge, but you could likely modify this code to accomplish that.
ggplot(
sampleData
, aes(x = dose
, y = b
, fill = genotype)
) +
geom_bar(position = "dodge", stat = "identity") +
geom_hline(data =
sampleData %>%
group_by(time) %>%
summarise(meanB = mean(b)
, dose = NA, genotype = NA)
, aes(yintercept = meanB)
, col = "black"
) +
facet_wrap(~time)
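If you do want points rather than a line, one way with this faceted layout is to give each time's mean an x position in the middle of the doses. This is a sketch under that assumption, using the same kind of made-up sampleData (with a seed added) and geom_col(), which is equivalent to geom_bar(stat = "identity").

```r
library(dplyr)
library(ggplot2)

set.seed(42)
sampleData <- data.frame(dose = 1:3,
                         time = rep(1:3, each = 3),
                         genotype = rep(c("AA", "aa"), each = 9),
                         b = rnorm(18, 20, 5))

# x position in the middle of the doses (2 for doses 1:3)
mid_dose <- mean(range(sampleData$dose))

# One mean of b per time, placed at the middle dose of that time's panel
time_means <- sampleData %>%
  group_by(time) %>%
  summarise(meanB = mean(b)) %>%
  mutate(dose = mid_dose)

p <- ggplot(sampleData, aes(x = dose, y = b, fill = genotype)) +
  geom_col(position = "dodge") +
  # inherit.aes = FALSE so the points do not pick up the genotype fill
  geom_point(data = time_means, aes(x = dose, y = meanB),
             inherit.aes = FALSE, size = 3) +
  facet_wrap(~time)
print(p)
```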

Vertical profile in r plot()

I am building a vertical profile plot of water columns. My issue is that the dots are connected along the x observations, not the y observations. In ggplot, I know geom_path can do this, but I can't use ggplot because I want to add several x axes. Therefore I am using plot().
So here is what I tried:
Storfjorden <- read.delim("C:/Users/carvi/Desktop/Storfjorden.txt")
smooF=smooth.spline(Storfjorden$Fluorescence,Storfjorden$Depth,spar=0.50)
plot(Storfjorden$Fluorescence,Storfjorden$Depth,ylim=c(80,0),type="n")
lines(smooF)
Resulting plot
As you can see, the dots are connected along the x observations. But to show a vertical profile, I would like them connected along the y observations. I tried ordering them by depth (using order()) and it didn't affect the result. Does anyone have a clue?
If, as an alternative, someone has an idea how to plot different lines with different axes on a single plot (temperature, salinity, fluorescence), then I may use geom_path(). Thank you!
A follow-up question you may be able to answer: is there a way in ggplot to make a geom_smooth(), but with the observations connected in the order they appear instead of along the x axis?
ggplot(melteddf, aes(y = Depth, x = value)) + geom_path() +
  facet_wrap(~variable, nrow = 1, scales = "free_x") + scale_y_reverse() +
  geom_smooth(span = 0.5, se = FALSE)
I tried using smooth.spline, but geom_path didn't recognize the resulting object. Thanks!
There is a reason that ggplot2 makes it difficult to plot multiple x axes on a single plot: it generally leads to hard-to-read (or, worse, misleading) graphs. If you have a motivating example for why yours will not fall into one of those categories, knowing more details might allow us to help you more. Below, however, are two workarounds that might help.
Here is a quick MWE to address the question. It might be more helpful if you gave us something that looks like your actual data, but this at least gets things on very different scales (though, with no structure, the plots are rather messy).
Note that I am using dplyr for several manipulations and reshape2 to melt the data into a long format for easier plotting.
library(dplyr)
library(reshape2)
library(ggplot2)
df <-
data.frame(
depth = runif(20, 0, 100) %>% round %>% sort
, measureA = rnorm(20, 10, 3)
, measureB = rnorm(20, 50, 10)
, measureC = rnorm(20, 1000, 30)
)
meltedDF <-
df %>%
melt(id.vars = "depth")
The first option is to simply use facets to plot the data next to each other:
meltedDF %>%
ggplot(aes(y = depth
, x = value)) +
geom_path() +
facet_wrap(~variable
, nrow = 1
, scales = "free_x") +
scale_y_reverse()
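For the follow-up about geom_smooth(): a sketch assuming ggplot2 >= 3.3.0, where geom_smooth() accepts orientation = "y". That fits value as a function of depth, so the smooth follows the vertical profile instead of running along the x axis. The data are the same kind of random stand-in as above, with a seed added.

```r
library(dplyr)
library(reshape2)
library(ggplot2)

set.seed(1)
df <- data.frame(depth = runif(20, 0, 100) %>% round %>% sort,
                 measureA = rnorm(20, 10, 3),
                 measureB = rnorm(20, 50, 10),
                 measureC = rnorm(20, 1000, 30))
meltedDF <- melt(df, id.vars = "depth")

p <- ggplot(meltedDF, aes(y = depth, x = value)) +
  geom_path() +
  # orientation = "y": the formula y ~ x is applied with the axes flipped,
  # i.e. value is smoothed as a function of depth
  geom_smooth(orientation = "y", method = "loess", formula = y ~ x,
              span = 0.5, se = FALSE) +
  facet_wrap(~variable, nrow = 1, scales = "free_x") +
  scale_y_reverse()
print(p)
```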
The second is to standardize the data, then plot that. Here, I am using the z-score, though if you have a reason to use something else (e.g. scaled to center at the "appropriate" amount of whatever variable you are using) you could change that formula:
meltedDF %>%
group_by(variable) %>%
mutate(Standardized = (value - mean(value)) / sd(value) ) %>%
ggplot(aes(y = depth
, x = Standardized
, col = variable)) +
geom_path() +
scale_y_reverse()
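If you'd rather do the standardization without dplyr, a base-R sketch of the same per-variable z-score uses ave(); it also checks that the hand-written formula matches what scale() computes. The two-variable toy data are made up.

```r
set.seed(1)
long <- data.frame(variable = rep(c("measureA", "measureC"), each = 20),
                   value = c(rnorm(20, 10, 3), rnorm(20, 1000, 30)))

# Same formula as the mutate() above, applied within each variable
long$Standardized <- ave(long$value, long$variable,
                         FUN = function(v) (v - mean(v)) / sd(v))

# scale() centres by the mean and divides by the sample sd (it returns a matrix)
check <- ave(long$value, long$variable, FUN = function(v) as.numeric(scale(v)))
```

After standardizing, every variable has mean 0 and standard deviation 1, which is why the very different scales end up comparable on one axis.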
If you need to plot multiple sites, here is some sample data with sites:
df <-
data.frame(
depth = runif(60, 0, 100) %>% round %>% sort
, measureA = rnorm(60, 10, 3)
, measureB = rnorm(60, 50, 10)
, measureC = rnorm(60, 1000, 30)
, site = sample(LETTERS[1:3], 60, TRUE)
)
meltedDF <-
df %>%
melt(id.vars = c("site", "depth"))
You can either use facet_grid (my preference):
meltedDF %>%
ggplot(aes(y = depth
, x = value)) +
geom_path() +
facet_grid(site~variable
, scales = "free_x") +
scale_y_reverse()
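or standardize the values and add facet_wrap to the standardized plot:
meltedDF %>%
  group_by(variable) %>%
  mutate(Standardized = (value - mean(value)) / sd(value) ) %>%
  ggplot(aes(y = depth
             , x = Standardized
             , col = variable)) +
  geom_path() +
  facet_wrap(~site) +
  scale_y_reverse()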
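For the base-R plot() route in the question, the underlying problem is that smooth.spline(Fluorescence, Depth) fits depth as a function of fluorescence, and lines() then draws that curve in order of fluorescence. A sketch of one fix, on made-up data standing in for the Storfjorden file: fit fluorescence as a function of depth, then swap the fitted x and y when drawing.

```r
set.seed(1)
# Made-up profile standing in for the Storfjorden file
depth <- sort(runif(40, 0, 80))
fluorescence <- 2 + sin(depth / 15) + rnorm(40, sd = 0.2)

# Fit fluorescence as a function of depth (not depth as a function of fluorescence)
fit <- smooth.spline(depth, fluorescence, spar = 0.5)

plot(fluorescence, depth, ylim = c(80, 0), xlab = "Fluorescence", ylab = "Depth")
# fit$x holds the (sorted) depths and fit$y the smoothed fluorescence,
# so swap them to draw the curve along the depth axis
lines(fit$y, fit$x)
```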

ggplot2: plotting non-contiguous time durations as a bar chart

I'm using ggplot to plot various events as a function of the date (x-axis) and start time (y-axis) on which they began. The data/code are as follows:
date<-c("2013-06-05","2013-06-05","2013-06-04","2013-06-04","2013-06-04","2013-06-04","2013-06-04",
"2013-06-04","2013-06-04","2013-06-03","2013-06-03","2013-06-03","2013-06-03","2013-06-03",
"2013-06-02","2013-06-02","2013-06-02","2013-06-02","2013-06-02","2013-06-02","2013-06-02")
start <-c("07:36:00","01:30:00","22:19:00","22:12:00","20:16:00","19:19:00","09:00:00",
"06:45:00","01:03:00","22:15:00","19:05:00","08:59:00","08:01:00","07:08:00",
"23:24:00","20:39:00","18:53:00","16:57:00","15:07:00","14:33:00","13:24:00")
duration <-c(0.5,6.1,2.18,0.12,1.93,0.95,10.32,
2.25,5.7,2.78,3.17,9.03,0.95,0.88,
7.73,2.75,1.77,1.92,1.83,0.57,1.13)
event <-c("AF201","SS431","BE201","CD331","HG511","CD331","WQ115",
"CD331","SS431","WQ115","HG511","WQ115","CD331","AF201",
"SS431","WQ115","HG511","WQ115","CD331","AS335","CD331")
df<-data.frame(date,start,duration,event)
library(ggplot2)
library(scales)
p <- ggplot(df, aes(as.Date(date),as.POSIXct(start,format='%H:%M:%S'),color=event))
p <- p+geom_point(alpha = 0.6,size=5)
p + ylab("time (hr)") + xlab("date") + scale_x_date(labels = date_format("%m/%d")) +
  scale_y_datetime(labels = date_format("%H"))+
  scale_colour_hue(h=c(360, 90)) +
  theme(axis.text.x = element_text(hjust=1, angle=0))
The resulting plot looks like this:
Question: Instead of simply indicating the start time of the event with a single point (shown above), how can I plot a bar that spans the time duration of the event? As shown in the data frame above I have this duration data (in hours). Alternatively, I could supply a 'stop time' (not shown).
I'm imagining the solution would look something like a stacked bar chart. However, a bar chart isn't quite right as it assumes the bar starts at the bottom of the plot and that the vertically stacked events have no gaps between them. My events may be non-contiguous -- 'starting' and 'stopping' at various positions along the y-axis. The solution will also have to take into consideration that 1) some events may ultimately be concurrent (overlap in time) and 2) some events will span multiple days.
I'd be very grateful for any suggestions!
It's a bit unclear exactly what you want. @Michele's answer seemed good; I wasn't sure whether you wanted to use geom_rect because it would make for thicker lines (if so, just change the line width) or for another reason. I decided to give it a go using geom_rect to enable dodging. I've plotted it with the starting date on the x axis and the start and end times on y, which required setting up the data slightly differently. If you're after something different, try to make it explicit, but at least here's another option:
library(ggplot2)
df<-data.frame(date,start,duration,event)
# full start/end timestamps (both computed from the original "HH:MM:SS" start column)
df <- transform(df,
                start = as.POSIXct(paste(date, start)),
                end = as.POSIXct(paste(date, start)) + duration*3600)
df <- df[c("event", "start", "end")]
# integer day index (1 = earliest start date) used as the x position
df$day <- as.numeric(factor(format(df$start, "%Y-%m-%d")))
# integer event index used to offset (dodge) the rectangles within each day
df$event.int <- as.numeric(factor(df$event))
p <- ggplot(df) + geom_rect(aes(ymin = start, ymax = end,
                                xmin = (day - 0.45) + event.int/10,
                                xmax = (day - 0.35) + event.int/10,
                                fill = event)) +
  scale_x_continuous(breaks = 1:4, labels = sort(unique(date)),
                     name = "Start date") + ylab("Start and end time")
print(p)
Thanks (+1s) to @Michele and @alexwhan for your input. Using geom_rect I was able to get all of the events that occur on the same date at the same point on the x axis. (I'm anticipating that this data set may ultimately include many months of events.)
df<-data.frame(date,start,duration,event)
library(ggplot2)
library(scales)
p <- ggplot(df, aes(xmin=as.Date(date),xmax=as.Date(date)+1,
                    ymin=as.POSIXct(start,format='%H:%M:%S'),
                    ymax=as.POSIXct(start,format='%H:%M:%S')+duration*3600,
                    fill=event))
p <- p+geom_rect(alpha = 0.8)
p + ylab("time") + xlab("date") + scale_x_date(labels = date_format("%m/%d")) +
  scale_y_datetime(labels = date_format("%H"))+
  scale_fill_hue(h=c(360, 90)) +
  theme(axis.text.x = element_text(hjust=1, angle=0))
... resulting in this:
This is pretty close to what I was aiming for.
I think I can deal with the potential overplotting issue by adjusting the alpha.
Ideally I'd like the y axis to include just a single day (00 to 00). To do this I guess I'll probably need to reformat the data such that events with durations that extend beyond midnight are reallocated to the next day. (Not sure how to do this in R.)
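For reallocating the part of an event that runs past midnight to the next day, here is a base-R sketch (plus a ggplot2 segment plot) using three events from the question. It assumes all times are in UTC, to sidestep daylight-saving shifts, and split_at_midnight() is a hypothetical helper written for this example, not a library function.

```r
library(ggplot2)

# Three events from the question; SS431 and WQ115 run past midnight
ev <- data.frame(date = c("2013-06-02", "2013-06-03", "2013-06-05"),
                 start = c("23:24:00", "22:15:00", "07:36:00"),
                 duration = c(7.73, 2.78, 0.5),
                 event = c("SS431", "WQ115", "AF201"))
ev$start_dt <- as.POSIXct(paste(ev$date, ev$start), tz = "UTC")
ev$end_dt <- ev$start_dt + ev$duration * 3600

# Cut each event at every midnight it crosses, so each piece lies within one day
split_at_midnight <- function(d) {
  pieces <- list()
  for (i in seq_len(nrow(d))) {
    s <- d$start_dt[i]
    e <- d$end_dt[i]
    repeat {
      # Midnight (UTC) at the end of the day on which this piece starts
      midnight <- as.POSIXct(as.Date(s, tz = "UTC") + 1, tz = "UTC")
      piece_end <- if (e < midnight) e else midnight
      pieces[[length(pieces) + 1]] <- data.frame(event = d$event[i], from = s, to = piece_end)
      if (e <= midnight) break
      s <- midnight
    }
  }
  do.call(rbind, pieces)
}

pieces <- split_at_midnight(ev)
# Calendar day of each piece, and start/end as hours after that day's midnight (0 to 24)
pieces$day <- as.Date(pieces$from, tz = "UTC")
day_start <- as.POSIXct(pieces$day, tz = "UTC")
pieces$from_hr <- as.numeric(difftime(pieces$from, day_start, units = "hours"))
pieces$to_hr <- as.numeric(difftime(pieces$to, day_start, units = "hours"))

p <- ggplot(pieces, aes(x = day, xend = day, y = from_hr, yend = to_hr, colour = event)) +
  geom_segment() +
  scale_y_continuous(limits = c(0, 24), name = "hour of day")
print(p)
```

Each event keeps its total duration; only the portion after midnight moves to the following day's column.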
Try this method. It's probably different from what you planned, but I think it's quite a clear way to show your data:
df<-data.frame(date,start,duration,event)
df <- transform(df,
start = as.POSIXct(paste(date, start)),
end = as.POSIXct(paste(date, start)) + duration*3600)
df <- df[c("event", "start", "end")]
library(reshape2)
df <- melt(df, id.vars="event")
df$value <- as.POSIXct(df$value, origin=as.Date("1970-01-01"))
df <- df[order(df$event, df$value),]
df$eventID <- rep(seq(1, nrow(df)/2, 1), each=2)
library(ggplot2)
ggplot(df) +
geom_line(aes(value, event, group=eventID, color=event))
Combining the benefits of: (i) y-axis containing a single ~24 hour period; (ii) events not overlapping; (iii) events labelled within the graph in addition to the legend; and (iv) concise code.
library(dplyr)
library(lubridate)
library(ggplot2)
# Re-create data frame
df <- tibble(date, start, duration, event) %>%
  mutate(start_dt = as.POSIXct(paste(date, start), tz = 'UTC'),
         start_hr = hour(start_dt),
         end_dt = start_dt + duration * 3600,
         # hours after midnight of the start date (can exceed 24)
         end_hr = hour(end_dt) + as.numeric(as.Date(end_dt) - as.Date(start_dt)) * 24)
# Plot (linewidth needs ggplot2 >= 3.4.0)
df %>% ggplot() +
  geom_segment(aes(x = event, y = start_hr, xend = event, yend = end_hr,
                   color = event), linewidth = 1) +
  facet_wrap(~ date, nrow = 1)
Image of plot:
