Gantt style time line plot (in base R)

I have a dataframe that looks like this:
person n start end
1 sam 6 0 6
2 greg 5 6 11
3 teacher 4 11 15
4 sam 4 15 19
5 greg 5 19 24
6 sally 5 24 29
7 greg 4 29 33
8 sam 3 33 36
9 sally 5 36 41
10 researcher 6 41 47
11 greg 6 47 53
Here start and end are times (sam spoke from 0 to 6, greg from 6 to 11, and so on), and n is how long the person spoke (in this case, the number of words). I want to plot this as a time line in base R. (I may eventually ask a similar question using ggplot2, but this question is specific to base R; by "base" I mean the packages that come with a standard install.)
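Because each turn begins exactly where the previous one ends, and n equals end - start in every row of this example, the start and end columns can be rebuilt from n alone with cumsum(). A minimal sketch, assuming turns are back to back with no pauses; the data frame name `talk` is hypothetical:

```r
# Hypothetical data frame `talk`: speaker order and word counts from the question
talk <- data.frame(
  person = c("sam", "greg", "teacher", "sam", "greg", "sally",
             "greg", "sam", "sally", "researcher", "greg"),
  n = c(6, 5, 4, 4, 5, 5, 4, 3, 5, 6, 6)
)
# Assumes no gaps: each turn ends at the running total of words so far
talk$end <- cumsum(talk$n)
# ...and starts where the previous turn ended (the first turn starts at 0)
talk$start <- talk$end - talk$n
talk
```

This only holds while the time unit is "words spoken"; with real clock times that include pauses, start and end would have to be recorded separately.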
The y axis will be by person and the x axis will be time. Hopefully the final product looks something like this for the data above:
I would like to use base R to make this, but I'm not sure how to approach it. My first thought is to draw a dot plot, leave out the dots, and then draw square-ended segments over it. I'm not sure how this would work, though, since segments() needs numeric x and y coordinates and the y-axis is categorical. Another thought is to convert the factor to numeric (assign each level a number), draw a blank scatterplot, and then draw square-ended line segments over it. This could be a powerful tool in my field for looking at speech patterns.
I thank you in advance for your help.
PS: square-ended line segments are drawn with segments(..., lend = 2). I mention it to save time for anyone not familiar with all of the segments() arguments.

You say you want a base R solution, but you don't say why. Since this is one line of code in ggplot, I'll show it anyway (dat is the data frame from the question).
library(ggplot2)
ggplot(dat, aes(colour=person)) +
  geom_segment(aes(x=start, xend=end, y=person, yend=person), linewidth=3) + # use size=3 before ggplot2 3.4.0
  xlab("Duration")

Pretty similar to @John's approach, but since I did it, I'll post it :)
Here's a generic function to plot a Gantt chart (no dependencies):
plotGantt <- function(data, res.col='resources',
                      start.col='start', end.col='end', res.colors=rainbow(30))
{
  # slightly enlarge the left margin to make space for the labels;
  # on.exit() restores the original margins when the function returns
  op <- par('mar')
  on.exit(par(mar = op))
  par(mar = op + c(0, 1.2, 0, 0))
  minval <- min(data[, start.col], na.rm = TRUE)
  maxval <- max(data[, end.col], na.rm = TRUE)
  res.colors <- rev(res.colors)
  # decreasing sort puts the alphabetically first resource at the top
  resources <- sort(unique(data[, res.col]), decreasing = TRUE)
  plot(c(minval, maxval),
       c(0.5, length(resources) + 0.5),
       type = 'n', xlab = 'Duration', ylab = NA, yaxt = 'n')
  axis(side = 2, at = seq_along(resources), labels = resources, las = 1)
  for (i in seq_along(resources))
  {
    # rows belonging to this resource ('rows' avoids masking base::subset)
    rows <- data[data[, res.col] == resources[i], ]
    # recycle the colours if there are more resources than colours
    color <- res.colors[((i - 1) %% length(res.colors)) + 1]
    # rect() is vectorised: one bar per row, from start to end, 0.2 units tall
    rect(rows[, start.col], i - 0.1, rows[, end.col], i + 0.1, col = color)
  }
}
Usage example:
data <- read.table(text=
'"person","n","start","end"
"sam",6,0,6
"greg",5,6,11
"teacher",4,11,15
"sam",4,15,19
"greg",5,19,24
"sally",5,24,29
"greg",4,29,33
"sam",3,33,36
"sally",5,36,41
"researcher",6,41,47
"greg",6,47,53',sep=',',header=TRUE)
plotGantt(data, res.col='person',start.col='start',end.col='end',
res.colors=c('green','blue','brown','red','yellow'))
Result:

Although the y-axis is categorical, all you need to do is assign numbers to the categories (1:5) and keep track of them. Calling as.numeric() on a factor returns its level codes, and by default the levels are sorted alphabetically, but you should check anyway. Make your plot with the yaxt = 'n' argument, then use the axis() command to put in a y-axis:
axis(2, 1:5, myLabels)
Keep in mind that whenever you're plotting, the only way to place things is with a number. Categorical x or y values are always just the numbers 1:nCategories, with the category names as labels in place of the numbers on the axis.
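To check which number each category gets, compare the factor's levels with its integer codes: each code is the position of that row's value in levels(). A small sketch using the speaker names from the question; the variable names are illustrative:

```r
# Speaker names from the question, in speaking order
person <- c("sam", "greg", "teacher", "sam", "greg", "sally",
            "greg", "sam", "sally", "researcher", "greg")

f <- factor(person)      # levels sorted alphabetically by default
levels(f)                # -> "greg" "researcher" "sally" "sam" "teacher"
y <- as.integer(f)       # each row's y position = index of its level
y                        # -> 4 1 5 4 1 3 1 4 3 2 1

# To order the rows by first appearance instead, set the levels explicitly
f2 <- factor(person, levels = unique(person))
levels(f2)               # -> "sam" "greg" "teacher" "sally" "researcher"

# When drawing, label tick i with levels(f)[i], e.g.
# axis(2, at = seq_along(levels(f)), labels = levels(f), las = 1)
```

Using levels() for the axis labels, rather than a separately sorted vector, keeps the labels and the y positions in sync whatever level order you choose.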
Something like the following gets you close enough (assuming your data.frame object is called datf)...
# factor() is needed if person was read in as character (the default since R 4.0.0)
datf$person <- factor(datf$person)
datf$pNum <- as.numeric(datf$person)
plot(datf$pNum, xlim = c(0, 53), type = 'n', yaxt = 'n', xlab = 'Duration (words)', ylab = 'person', main = 'Speech Duration')
axis(2, 1:5, levels(datf$person), las = 2, cex.axis = 0.75)
with(datf, segments(start, pNum, end, pNum, lwd = 3, lend = 2))

Related

How do I get rid of the straight line from the first and last point?

How do I get rid of the straight line from the first and last points? Why aren't the lines colored? Is there a better way to make a line plot? I get a similar problem with ggplot2.
dat<-Data_for_Analysis_1_
dat$Enrichment<-factor(dat$Enrichment)
plot(TF~Minute, data=dat, col=Enrichment, pch=20,xlab="Minute",
ylab="No. of Tongue Flicks", cex.lab=1.5, cex.axis=1.5,
cex.main=1.5, cex.sub=1.6)
lines(TF~Minute, data=dat, col=Enrichment)
Assuming your data looks like the toy data I create below, the following should do it:
# Create some toy data
m <- 10
dat <- data.frame(Minute = rep(0:60, m))
dat$Enrichment <- rep(LETTERS[1:m], each = 61)
dat$TF <- c(replicate(m, cumsum(rnorm(61, mean = 0.3))))
head(dat)
# Minute Enrichment TF
#1 0 A 0.3203584
#2 1 A 0.9571599
#3 2 A 1.5361236
#4 3 A 1.7571507
# ...
#60 59 A 19.25409068
#61 60 A 20.68664549
#62 0 B -0.05674056
#63 1 B 0.64608473
## And so on...
# Your code:
dat$Enrichment <- factor(dat$Enrichment)
plot(TF ~ Minute, data=dat, col=Enrichment, pch=20,xlab="Minute",
ylab="No. of Tongue Flicks", cex.lab=1.5, cex.axis=1.5,
cex.main=1.5, cex.sub = 1.6)
# Draw line for each group (could also be done with a for-loop)
.tmp <- lapply(split(dat, dat$Enrichment),
function(d) lines(TF ~ Minute, data=d, col=Enrichment))
I suspect your problem is that the Minute and TF vectors you pass to lines() are interpreted as two long, unbroken vectors of values. As a result, the line "resets" (jumps back to Minute 0) after each new group, because your code ignores the grouping variable.
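Another base R route, offered as a sketch rather than taken from the answer above, is to reshape the data to one column per group with unstack() and let matplot() draw one line per column, so nothing joins one group to the next. It assumes every group has the same Minute values in the same order, which is true for toy data built like the example above:

```r
# Toy data in the same shape as above: 10 groups x 61 minutes
set.seed(42)
m <- 10
dat <- data.frame(Minute = rep(0:60, m),
                  Enrichment = rep(LETTERS[1:m], each = 61))
dat$TF <- c(replicate(m, cumsum(rnorm(61, mean = 0.3))))

# One TF column per group (assumes equal-length groups, ordered by Minute)
wide <- unstack(dat, TF ~ Enrichment)

# matplot() draws each column as its own line, each in its own colour
cols <- rainbow(m)
matplot(0:60, wide, type = "l", lty = 1, col = cols,
        xlab = "Minute", ylab = "No. of Tongue Flicks")
legend("topleft", legend = names(wide), col = cols, lty = 1, bty = "n")
```

If the groups have different lengths, unstack() returns a list instead of a data frame, and the per-group lines() approach above is the safer choice.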
Thanks guys,
You didn't answer my question directly but you informed me about the group function.
I went to Excel and made another column labelled "Instance", where each individual observation was labelled 1, 2, 3, etc., and then grouped the data by that.
ggplot(dat, aes(x=Minute, y=TF, group=Instance , col=Enrichment)) + geom_line() + geom_point() + labs(y="No. of Tongue Flicks")
I just have to give it a more appropriate key.
Thanks again, the replies are really fast and extensive.
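The Instance column can also be built in R instead of Excel. A minimal sketch, assuming each recording occupies consecutive rows and restarts at Minute 0; the toy data frame below is made up for illustration:

```r
# Hypothetical toy data: three recordings, each logged from Minute 0 upwards.
# Note that group "A" occurs in two separate recordings.
dat <- data.frame(Minute = c(0:3, 0:2, 0:4),
                  Enrichment = rep(c("A", "B", "A"), times = c(4, 3, 5)))

# A new instance starts every time Minute is 0, so a running count of the
# zeros numbers the recordings 1, 2, 3, ...
dat$Instance <- cumsum(dat$Minute == 0)
dat
```

Grouping by Enrichment alone would join the two "A" recordings with one line; grouping by Instance, as in `aes(group = Instance)`, keeps them apart.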

How can I have different color for each bar of stack barplots? in R

My question may be very simple, but I couldn't find the answer!
I have a matrix with 12 entries, and I made a stacked barplot with the barplot function in R.
With this code:
mydata <- matrix(nrow=2,ncol=6, rbind(sample(1:12, replace=T)))
barplot(mydata, xlim=c(0,25),horiz=T,
legend.text = c("A","B","C","D","E","F"),
col=c("blue","green"),axisnames = T, main="Stack barplot")
Here is the image from the code:
What I want to do is give each group (A to F, only the blue part) a different color, but I couldn't add more than two colors.
I would also like to know how I can start the plot from x = 2 instead of 0.
I know it's possible to choose the range of x with xlim = c(2, 25), but when I do that, part of my bars fall outside the plotting range and I get a picture like this:
What I want is to ignore the part of the bars below 2, start the x-axis at 2, and show the rest of the bars instead of letting them run out of range.
Thank you in advance,
As already mentioned in the other post, it is not entirely clear what your desired output is. Here is another option using ggplot2. I think the difficult part is reshaping the data (with reshape2); after that, the plot step is straightforward.
library(reshape2)
library(ggplot2)
## Set a seed to make your data reproducible
set.seed(1)
mydata <- matrix(nrow=2,ncol=6, rbind(sample(1:12, replace=T)))
## transform your matrix into a named data.frame
myData <- setNames(as.data.frame(mydata),LETTERS[1:6])
## put the data in the long format
dd <- melt(t(myData))
## transform the fill variable to get the desired behaviour.
## I used cumsum to be sure each Var2 == 1 bar (first matrix row) gets a unique
## value, while all Var2 == 2 bars share one colour.
## You may want to change this step if you want a different behaviour
## (see the other solution).
dd <- transform(dd,Var2 =ifelse(Var2==1,cumsum(Var2)+2,Var2))
## a simple bar plot
ggplot(dd) +
## use stat identity since you want to set the y aes
geom_bar(aes(x=Var1,fill=factor(Var2),y=value),stat='identity') +
## horizontal rotation and zooming
coord_flip(ylim = c(2, max(dd$value)*2)) +
theme_bw()
Another option, using the lattice package.
I like the formula notation in lattice and its flexibility, for example for flipping coordinates:
library(lattice)
barchart(Var1~value,groups=Var2,data=dd,stack=TRUE,
auto.key = list(space = "right"),
prepanel = function(x,y, ...) {
list(xlim = c(2, 2*max(x, na.rm = TRUE)))
})
You can do this with the "add" and "offset" arguments to barplot(), setting axes and axisnames to FALSE on the second call to avoid plotting them twice. (I'm also throwing in my color-blind-friendly palette, as I'm red-green color-blind.)
# Conservative 8-color palette adapted for color blindness, with first color = "black".
# Wong, Bang. "Points of view: Color blindness." Nature Methods 8.6 (2011): 441.
colorBlind.8 <- c(black="#000000", orange="#E69F00", skyblue="#56B4E9", bluegreen="#009E73",
yellow="#F0E442", blue="#0072B2", reddish="#D55E00", purplish="#CC79A7")
mydata <- matrix(nrow=2,ncol=6, rbind(sample(1:12, replace=T)))
cols <- colorBlind.8[1:ncol(mydata)]
bar2col <- colorBlind.8[8]
barplot(mydata[1,], xlim=c(0,25), horiz=T, col=cols, axisnames=T,
legend.text=c("A","B","C","D","E","F"), main="Stack barplot")
barplot(mydata[2,], offset=mydata[1,], add=T, axes=F, axisnames=F, horiz=T, col=bar2col)
For the second part of your question, use the "offset" argument for the first set of bars as well, change xlim, and use xaxp to adjust the x-axis tick marks. You must also shorten the first row of bars by the offset (but not below zero) so that they don't overshoot:
offset <- 2
h <- mydata[1,] - offset
h[h < 0] <- 0
barplot(h, offset=offset, xlim=c(offset,25), xaxp=c(offset,24,11), horiz=T,
legend.text=c("A","B","C","D","E","F"),
col=cols, axisnames=T, main="Stack barplot")
barplot(mydata[2,], offset=offset+h, add=T, axes=F, axisnames=F, horiz=T, col=bar2col)
I'm not entirely sure if this is what you're looking for: 'A' has two values (x1 and x2), but your legend seems to hint otherwise.
Here is a way to approach what you want with ggplot. First we set up the data.frame (required for ggplot):
set.seed(1)
df <- data.frame(
name = letters[1:6],
x1=sample(1:6, replace=T),
x2=sample(1:6, replace=T))
name x1 x2
1 a 5 3
2 b 3 5
3 c 5 6
4 d 3 2
5 e 5 4
6 f 6 1
Next, ggplot requires it to be in a long format:
# Make it into ggplot format
require(dplyr); require(reshape2)
df <- df %>%
melt(id.vars="name")
name variable value
1 a x1 5
2 b x1 3
3 c x1 5
4 d x1 3
5 e x1 5
6 f x1 6
...
Now, as you want some bars to be a different colour, we need to give them an alternate name so that we can assign their colour manually.
df <- df %>%
mutate(variable=ifelse(
name %in% c("b", "d", "f") & variable == "x1",
"highlight_x1",
as.character(variable)))
name variable value
1 a x1 2
2 b highlight_x1 3
3 c x1 4
4 d highlight_x1 6
5 e x1 2
6 f highlight_x1 6
7 a x2 6
8 b x2 4
...
Next, we build the plot. This uses the standard colours:
require(ggplot2)
p <- ggplot(data=df, aes(y=value, x=name, fill=factor(variable))) +
geom_bar(stat="identity", colour="black") +
theme_bw() +
coord_flip(ylim=c(1,10)) # Zooms in on y = c(1,10)
Note that I use coord_flip() (which builds on coord_cartesian()) with the ylim=c(1,10) parameter to "zoom in" on the data. This doesn't remove any data; it only restricts the visible range (unlike setting the limits on the scale, which drops out-of-range values before the bars are drawn). Now, if you specify the colours manually:
p + scale_fill_manual(values = c(
"x1"="coral3",
"x2"="chartreuse3",
"highlight_x1"="cornflowerblue"))
I would like to simplify the solution proposed by @tedtoal, which was the best one for me.
I wanted to create a barplot with a different color for each bar, without needing ggplot or lattice.
color_range <- c(black="#000000", orange="#E69F00", skyblue="#56B4E9", bluegreen="#009E73", yellow="#F0E442", blue="#0072B2", reddish="#D55E00", purplish="#CC79A7")
values <- c(1, 6, 2, 6, 1)
barplot(values, col = color_range[seq_along(values)]) # one palette colour per bar

Cumulative plot in R

I am trying to make a cumulative plot for a particular column of my data (for instance, the first). Example:
1 3
2 5
4 9
8 11
12 17
14 20
16 34
20 40
Then I want to overlay this plot with a cumulative plot of other data (for example, the second column) and save it as a PNG or JPG image.
I don't want to enter the vectors "by hand", as in Cumulative Plot with Given X-Axis, because with a very large dataset I can't do that.
I tried the following simple command:
A <- read.table("cumul.dat", header=TRUE)
This reads the file, but now I want the cumulative plot to be made from a particular column of this file.
The command is:
cdat1<-cumsum(dat1)
but this works on a vector dat1, which I need to take from the data read from cumul.dat.
Thanks
I couldn't follow your question so this is a shot in the dark answer based on key words I did get:
m <- read.table(text=" 1 3
2 5
4 9
8 11
12 17
14 20
16 34
20 40")
library(ggplot2)
m2 <- stack(m)
qplot(rep(1:nrow(m), 2), values, colour=ind, data=m2, geom="step")
EDIT: I decided I like this approach better:
library(ggplot2)
library(reshape2)
m$x <- seq_len(nrow(m))
m2 <- melt(m, id='x')
qplot(x, value, colour=variable, data=m2, geom="step")
I wasn't quite sure when the events were happening and what the observations were. I'm assuming the events are just at 1, 2, 3, 4, ... and the columns represent counts for the different groups. If that's the case, using lattice I would do:
require(lattice)
A<-data.frame(dat1=c(1,2,4,8,12,14,16,20), dat2=c(3,5,9,11,17,20,34,40))
dd<-do.call(make.groups, lapply(A, function(x) {data.frame(x=seq_along(x), y=cumsum(x))}))
xyplot(y~x,dd, groups=which, type="s", auto.key=T)
Which produces
With base graphics, this can be done by specifying type='s' in the plot call:
matplot(apply(A, 2, cumsum), type='s', xlab='x', ylab='y', las=1)
Note I've used matplot here, but you could also plot the series one at a time, the first with plot and the second with points or lines.
We could also add a legend with, for example:
legend('topleft', c('Series 1', 'Series 2'), bty='n', lty=1:2, col=1:2) # matches matplot's default lty = 1:5, col = 1:6
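The question also asks how to take one column from the table returned by read.table() and save the overlaid plot as an image. A base R sketch, with the example numbers from the question inlined via read.table(text = ...) so it runs without cumul.dat; the output path is a placeholder:

```r
# Example data from the question; header = FALSE gives columns V1 and V2.
# With the real file this would be read.table("cumul.dat", header = TRUE),
# and the columns would carry the names from its header line.
A <- read.table(text = "1 3
2 5
4 9
8 11
12 17
14 20
16 34
20 40", header = FALSE)

cdat1 <- cumsum(A[[1]])   # first column, selected by position
cdat2 <- cumsum(A$V2)     # second column, selected by name

# Placeholder path: replace with e.g. "cumulative.png" in your working directory
out <- tempfile(fileext = ".png")
if (capabilities("png")) {
  png(out, width = 600, height = 400)   # jpeg() works the same way
  plot(cdat1, type = "s", ylim = range(cdat1, cdat2),
       xlab = "x", ylab = "cumulative sum", las = 1)
  lines(cdat2, type = "s", lty = 2, col = 2)
  legend("topleft", c("column 1", "column 2"), lty = 1:2, col = 1:2, bty = "n")
  invisible(dev.off())                  # closes the device and writes the file
}
```

Setting ylim from both series up front matters here: plot() sizes the axes from the first series only, so the second, larger series would otherwise be cut off.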

R plot function - axes for a line chart

Assume the following frequency table in R, which comes from a survey:
1 2 3 4 5 8
m 5 16 3 16 5 0
f 12 25 3 10 3 1
NA 1 0 0 0 0 0
The rows stand for the gender of the survey respondent (male/female/no answer). The columns represent the answers to a question on a 5-point scale (let's say: 1 = agree fully, 2 = agree somewhat, 3 = neither agree nor disagree, 4 = disagree somewhat, 5 = disagree fully, 8 = no answer).
The data is stored in a data frame called "slm"; the gender variable is called "sex", and the other variable is called "tv_serien".
My problem is that I can't find a proper way (in my opinion) to create a line chart where the x-axis represents the 5-point scale (plus the "don't know" answers) and the y-axis represents the frequency for each point on the scale. Furthermore, I want to create two lines (one for males, one for females).
My solution so far is the following:
I create a plot without plotting the "content" and the x-axis:
plot(slm$tv_serien, xlim = c(1,6), ylim = c(0,100), type = "n", xaxt = "n")
The problem here is that specifying xlim = c(1,6) feels like cheating, because slm$tv_serien holds 100 raw values. I also tried to plot the variable via plot(factor(slm$tv_serien)...), but it still created a metric scale from 1 to 8 (because the "don't know" answer is coded 8).
So my first question is: how do I tell R to take the six distinct values (1 to 5 and 8) as the x-axis?
I create the new x axis with proper labels:
axis(1, 1:6, labels = c("1", "2", "3", "4", "5", "DK"))
At least that works pretty well. ;-)
Next I create the line for the males:
lines(1:5, table(slm$tv_serien[slm$sex == 1]), col = "blue")
The problem here is that there is no DK (= 8) answer, so I have to specify x = 1:5 manually instead of 1:6 as in the "normal" case. My question here is: how do I tell R to also draw the line for values that don't occur? For example, what would have happened if no male had answered 3, but I still want a continuous line?
At last I create the line for females, which works well:
lines(1:6, table(slm$tv_serien[slm$sex == 2]), col = "red")
To summarize:
How can I tell R to take the 6 distinct values of slm$tv_serien as the x axis?
How can I draw continuous lines even if some counts are 0?
Thanks for your help!
PS: Attached you'll find the current plot for the above-mentioned functions.
PPS: I tried to make a list from "1." to "4." but it seems that every new list element started again with "1.". Sorry.
Edit: Response to OP's comment.
This directly creates a line chart of OP's data. Below this is the original answer using ggplot, which produces a far superior output.
Given the frequency table you provided,
df <- data.frame(t(freqTable)) # transpose (more suitable for plotting)
df <- cbind(Response=rownames(df),df) # add row names as first column
# use positions 1..6; as.numeric(df$Response) gives 1,2,3,4,5,8 when Response
# is character (the default since R 4.0.0), which misaligns the axis labels
x <- seq_len(nrow(df))
plot(x,df$f,type="b",col="red",
xaxt="n", ylab="Count",xlab="Response")
lines(x,df$m,type="b",col="blue")
axis(1,at=c(1,2,3,4,5,6),labels=c("Str.Agr.","Sl.Agr","Neither","Sl.Disagr","Str.Disagr","NA"))
Produces this, which seems like what you were looking for.
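Both questions can also be handled where the counts are made: if tv_serien is turned into a factor that lists every possible answer as a level, table() reports zero counts for answers nobody gave, so each line always has six points that can be drawn at positions 1 to 6. A sketch, using toy responses reconstructed from the frequency table in the question (the real slm data isn't available):

```r
# Toy responses rebuilt from the question's frequency table
scale_codes <- c(1:5, 8)
male   <- rep(scale_codes, times = c(5, 16, 3, 16, 5, 0))
female <- rep(scale_codes, times = c(12, 25, 3, 10, 3, 1))

# Declaring every answer as a level makes table() include zero counts
m_counts <- as.vector(table(factor(male,   levels = scale_codes)))
f_counts <- as.vector(table(factor(female, levels = scale_codes)))

x <- seq_along(scale_codes)   # plot at positions 1..6, not at the codes 1..5, 8
plot(x, f_counts, type = "b", col = "red", xaxt = "n",
     ylim = c(0, max(m_counts, f_counts)), xlab = "Response", ylab = "Count")
lines(x, m_counts, type = "b", col = "blue")
axis(1, at = x, labels = c("1", "2", "3", "4", "5", "DK"))
```

With the real data, the same idea would be `table(factor(slm$tv_serien[slm$sex == 1], levels = c(1:5, 8)))`.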
Original Answer:
Not quite what you asked for, but converting your frequency table to a data frame, df
df <- data.frame(freqTable)
df <- cbind(Gender=rownames(df),df) # append rownames (Gender)
df <- df[-3,] # drop unknown gender
df
# Gender X1 X2 X3 X4 X5 X8
# m m 5 16 3 16 5 0
# f f 12 25 3 10 3 1
library(ggplot2)
library(reshape2)
gg=melt(df)
labels <- c("Agree\nFully","Somewhat\nAgree","Neither Agree\nnor Disagree","Somewhat\nDisagree","Disagree\nFully", "No Answer")
ggp <- ggplot(gg,aes(x=variable,y=value))
ggp <- ggp + geom_bar(aes(fill=Gender), position="dodge", stat="identity")
ggp <- ggp + scale_x_discrete(labels=labels)
ggp <- ggp + theme(axis.text.x = element_text(angle=90, vjust=0.5))
ggp <- ggp + labs(x="", y="Frequency")
ggp
Produces this:
Or, this, which is much better:
ggp + facet_grid(Gender~.)

Plot with colours and shapes and filling to match multiple variables in R

I have the following dataframe
Op.1 Op.2 Site diet Horse ICS
35 25 a 1 1 10
32 31 a 1 2 10
19 32 a 1 3 10
17 26 a 1 4 10
25 19 a 1 5 10
25 17 a 1 6 10
#... to 432 observations
I have done Bland-Altman plots using the following function:
BAplot <- function(x,y,yAxisLim=c(-50,50),xlab="Average", ylab="Difference") {
d <- ((x + y)/2)
diff <- x - y
plot(diff ~ d,ylim=yAxisLim,xlim=c(0,60),xlab=xlab,ylab=ylab)
abline(h=(mean(na.omit(diff))-c(-1.96,0,1.96)*sd(na.omit(diff))),lty=2) # limits of agreement: mean +/- 1.96 SD
}
The plot obtained is fine. Now I am trying to give colours according to data$Site (4 levels: 0,1,2,3) and shapes according to the levels of data$ICS (6 levels: 10,11,12,13,14,15)
I wrote the following code:
clr <- c("a"="red","b"="blue","c"="green","d"="yellow")[data$Site]
shape <- c("10"="0","11"="1","12"="2","13"="3","14"="4","15"="5")[data$ICS]
plot.ops<-BAplot(data$Op.1,data$Op.2,xlab="(Op1 vs Op 2)/2", ylab="Op1-mean of Op1+Op2",col=clr,pch=shape)
But it gives the error
Error in BAplot(data$Op.1, data$Op.2, xlab = "(Op1 vs Op 2)/2", ylab = "Op1-mean of Op1+Op2", :
unused arguments (col = clr, pch = shape)
I also tried shape <- c(10=0,11=1,12=2...), since 0, 1, 2, ... are different shape types in pch, but it still doesn't work. The same goes for clr.
What I ultimately wish to have is the plot with different colours for "site" and different shapes for "ICS".
This is meant to be something very simple, but I think there might be a basic conceptual error; nevertheless, I am stuck.
I would also like to show diet (2 levels) using filled or empty shapes, but I can't get to that stage until I get this sorted first!
Many thanks,
M
I tried to replicate your code, and the problem is that shape is all NA.
This is because data$ICS is numeric, not character, so c(...)[data$ICS] indexes by position (10, 11, ...) instead of by name, and those positions don't exist in a 6-element vector.
You can use this to solve the issue (note that I removed the quotes around the numbers; otherwise the characters "0", "1", ... themselves are used as plotting symbols, which is quite ugly):
shapes <- c("10"=0,"11"=1,"12"=2,"13"=3,"14"=4,"15"=5)[as.character(data$ICS)]
or, much simpler:
shapes <- data$ICS - 10 # ICS 10..15 -> pch 0..5
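The "unused arguments" error itself comes from BAplot() having no col or pch parameter. One common fix, sketched here rather than taken from the thread, is to add `...` to the signature and pass it on to plot(); the sketch also uses the conventional 1.96 multiplier for the 95% limits of agreement and plots with plot(x, y) rather than a formula. The six rows of toy data are the first rows shown in the question:

```r
# BAplot() that forwards extra graphical arguments (col, pch, bg, cex, ...)
# to plot() through `...`
BAplot <- function(x, y, yAxisLim = c(-50, 50), xlab = "Average",
                   ylab = "Difference", ...) {
  avg <- (x + y) / 2
  dif <- x - y
  plot(avg, dif, ylim = yAxisLim, xlim = c(0, 60), xlab = xlab, ylab = ylab, ...)
  # mean difference and 95% limits of agreement (mean +/- 1.96 SD)
  limits <- mean(dif, na.rm = TRUE) + c(-1.96, 0, 1.96) * sd(dif, na.rm = TRUE)
  abline(h = limits, lty = 2)
  invisible(limits)
}

# Toy subset: the first six rows from the question
data <- data.frame(Op.1 = c(35, 32, 19, 17, 25, 25),
                   Op.2 = c(25, 31, 32, 26, 19, 17),
                   Site = "a", ICS = 10)
clr    <- c(a = "red", b = "blue", c = "green", d = "yellow")[data$Site]
shapes <- data$ICS - 10                   # ICS 10..15 -> pch 0..5
lim <- BAplot(data$Op.1, data$Op.2, col = clr, pch = shapes)
```

Because col and pch arrive through `...`, the function no longer depends on global variables named cls or shapes.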
This is what made it for me in the end:
a<-ifelse(data$ICS==10,"a",ifelse(data$ICS==11,"b",ifelse(data$ICS==12,"c",ifelse(data$ICS==13,"d",ifelse(data$ICS==14,"e","f"))))) #ICS as characters
cls<-c(2,"orange",7,3,6,4) [factor(a)] #10-11-12-13-14-15: red,orange,yellow,green,purple,blue
b<-data$Site
shapes<-c(0,1,2,8)[factor(b)] #Square is RDC liv, Circle is RDC V, Triangle is RVC V, Star is RVC CCJ
BAplot <- function(x,y,yAxisLim=c(-50,50),xlab="Average", ylab="Difference",col=cls,pch=shapes) {
d <- ((x + y)/2)
diff <- x - y
plot(diff ~ d,ylim=yAxisLim,xlim=c(0,60),xlab=xlab,ylab=ylab,col=col,pch=pch) # use the arguments, not the globals
abline(h=(mean(na.omit(diff))-c(-1.96,0,1.96)*sd(na.omit(diff))),lty=2) # limits of agreement: mean +/- 1.96 SD
}
plot.ops<-BAplot(data$Op.1,data$Op.2,xlab="(Op1 vs Op 2)/2", ylab="Op1-mean of Op1+Op2",col=cls,pch=shapes)
title(main="Bland-Altman plots of Op1 vs Op2")
legend (34,53,legend=c("RDC Liver","RDC V","RVC V","RVC CCJ"), pch=c(0,1,2,8), pt.cex=2, y.intersp=0.8) #legend for shape
legend (49,53,legend=c("10th ICS","11th ICS","12th ICS","13th ICS","14th ICS","15th ICS"), pch=22, pt.cex=2, pt.bg=c(2,"orange",7,3,6,4), y.intersp=0.6) #legend for the colours
Not sure why but it would not work had I written
shapes<-c(0,1,2,8)[factor(data$Site)]
it only worked if I created
b<-data$Site
shapes<-c(0,1,2,8)[factor(b)]
Anyway, sorted now!
Many thanks,
Marco
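For the remaining wish (showing diet as filled versus empty symbols), one option is the fillable symbols pch = 21 to 25, whose border colour comes from col and whose fill comes from bg; bg = NA leaves a symbol open. This sketch uses made-up toy values, and it maps Site to 21 to 24 instead of 0, 1, 2, 8, because the star (pch 8) cannot be filled:

```r
# Hypothetical toy values with the three grouping variables
toy <- data.frame(avg  = c(20, 25, 30, 35, 40, 45),
                  dif  = c(-5, 3, 8, -2, 0, 6),
                  Site = c("a", "b", "c", "d", "a", "b"),
                  ICS  = c(10, 11, 12, 13, 14, 15),
                  diet = c(1, 2, 1, 2, 1, 2))

ics_cols <- c("red", "orange", "yellow", "green", "purple", "blue")
site_pch <- c(a = 21, b = 22, c = 24, d = 23)  # circle, square, triangle, diamond

pt_col <- ics_cols[toy$ICS - 9]                # colour by ICS (10..15 -> 1..6)
pt_pch <- site_pch[toy$Site]                   # shape by site
pt_bg  <- ifelse(toy$diet == 1, pt_col, NA)    # diet 1 filled, diet 2 open

plot(toy$avg, toy$dif, col = pt_col, pch = pt_pch, bg = pt_bg, cex = 1.5,
     xlab = "Average", ylab = "Difference")
```

A matching legend can use the same arguments, e.g. `legend("topright", c("diet 1", "diet 2"), pch = 21, pt.bg = c("grey", NA))`.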
