I am trying to make a cumulative plot for a particular column (for instance, the first) of my data. Example:
1 3
2 5
4 9
8 11
12 17
14 20
16 34
20 40
Then I want to overlay this plot with a cumulative plot of other data (for example, the second column) and save the result as a PNG or JPG image.
I do not want to build the vectors "by hand" as in Cumulative Plot with Given X-Axis, because that is not feasible with a very large dataset.
I tried the following simple commands:
A <- read.table("cumul.dat", header=TRUE)
This reads the file, but now I want the cumulative plot to be made from a particular column of this file.
The command is:
cdat1<-cumsum(dat1)
but this works on a vector dat1, which I need to extract from the data read from cumul.dat.
Thanks
I couldn't follow your question so this is a shot in the dark answer based on key words I did get:
m <- read.table(text=" 1 3
2 5
4 9
8 11
12 17
14 20
16 34
20 40")
library(ggplot2)
m2 <- stack(m)
qplot(rep(1:nrow(m), 2), values, colour=ind, data=m2, geom="step")
EDIT: I decided I like this approach better:
library(ggplot2)
library(reshape2)
m$x <- seq_len(nrow(m))
m2 <- melt(m, id='x')
qplot(x, value, colour=variable, data=m2, geom="step")
I wasn't quite sure when the events were happening and what the observations were. I'm assuming the events are just at 1, 2, 3, 4, ... and the columns represent sounds of the different groups. If that's the case, using lattice I would do:
require(lattice)
A<-data.frame(dat1=c(1,2,4,8,12,14,16,20), dat2=c(3,5,9,11,17,20,34,40))
dd<-do.call(make.groups, lapply(A, function(x) {data.frame(x=seq_along(x), y=cumsum(x))}))
xyplot(y~x,dd, groups=which, type="s", auto.key=T)
With base graphics, this can be done by specifying type='s' in the plot call:
matplot(apply(A, 2, cumsum), type='s', xlab='x', ylab='y', las=1)
Note I've used matplot here, but you could also plot the series one at a time, the first with plot and the second with points or lines.
We could also add a legend with, for example:
legend('topleft', c('Series 1', 'Series 2'), bty='n', lty=1:2, col=1:2)
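The question also asks about saving the figure. A minimal sketch, assuming base graphics and the example data from the question: open a png() device, draw the plot, then close the device with dev.off(). The column names V1 and V2 and the output path out_file are illustrative assumptions, not taken from the original post.

```r
# Example data from the question; the column names V1 and V2 are assumed
A <- data.frame(V1 = c(1, 2, 4, 8, 12, 14, 16, 20),
                V2 = c(3, 5, 9, 11, 17, 20, 34, 40))

# Column-wise cumulative sums, one column per series
cum <- apply(A, 2, cumsum)

# Hypothetical output path; replace with any file name you like
out_file <- file.path(tempdir(), "cumul.png")

png(out_file, width = 800, height = 600)   # open the file device
matplot(cum, type = "s", lty = 1:2, col = 1:2,
        xlab = "x", ylab = "cumulative sum", las = 1)
legend("topleft", colnames(cum), bty = "n", lty = 1:2, col = 1:2)
dev.off()                                   # close the device to write the file
```

For a JPG file, jpeg() can be used in place of png() in the same way.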
I am looking to scale the x axis on my barplot to time, so as to accurately represent when measurements were taken.
I have these data frames:
> Botcv
Date Average SE
1 2014-09-01 4.0 1.711307
2 2014-10-02 5.5 1.500000
> Botc1
Date Average SE
1 2014-10-15 2.125 0.7180703
2 2014-11-12 1.000 0.4629100
3 2014-12-11 0.500 0.2672612
> Botc2
Date Average SE
1 2014-10-15 3.375 1.3354708
2 2014-11-12 1.750 0.4531635
3 2014-12-11 0.625 0.1829813
I use this code to produce a grouped barplot:
covaverage <- c(Botcv$Average,NA,NA,NA)
c1average <- c(NA,NA, Botc1$Average)
c2average <- c(NA,NA, Botc2$Average)
date <- c(Botcv$Date, Botc1$Date)
averagematrix <- matrix(c(covaverage,c1average, c2average), nrow=3, ncol=5, byrow=TRUE)
barplot(averagematrix,date, xlab="Date", ylab="Average", axis.lty=1, space=NULL,width=3,beside=T, ylim=c(0.00,6.00))
R plots the bars equal distances apart by default and I have been trying to find a workaround for this. I have seen several other solutions that utilise ggplot2 but I am producing plots for my masters thesis and would like to keep the appearance of my barplots in line with other graphs that I have created using base R graphics. I also want to add error bars to the plot. If anyone could provide a solution then I would be very grateful!! Thanks!
Perhaps you can use this as a start. It is probably easier to use boxplots, as they can be placed at a given x position with the at argument. Base barplot cannot do this, but you can use rect instead to replicate the barplot look. Error bars can be added using arrows or segments.
bar_w = 1 # width of bars
offset = c(-1,1) # offset to avoid overlapping
cols = grey.colors(2) # colors for different types
# combine into a single data frame
d = data.frame(rbind(Botc1, Botc2), 'type' = c(1,1,1,2,2,2))
# set up empty plot with sensible x and y lims
plot(as.Date(d$Date), d$Average, type='n', ylim=c(0,4))
# draw data of data frame 1 and 2
for (i in unique(d$type)){
  dd = d[d$type==i, ]
  x = as.Date(dd$Date)
  y = dd$Average
  # rectangles
  rect(xleft=x-bar_w+offset[i], ybottom=0, xright=x+bar_w+offset[i], ytop=y, col=cols[i])
  # error bars
  arrows(x0=x+offset[i], y0=y-0.5*dd$SE, x1=x+offset[i], y1=y+0.5*dd$SE, col=1, angle=90, code=3, length = 0.1)
}
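To try the snippet above, the two data frames from the question can be rebuilt as follows (values copied from the question; the name y_max is hypothetical). The sketch also derives an upper y limit from the data, since the tallest bar plus its error bar slightly exceeds the fixed limit of 4 used above.

```r
# Data frames rebuilt from the values printed in the question
Botc1 <- data.frame(
  Date    = as.Date(c("2014-10-15", "2014-11-12", "2014-12-11")),
  Average = c(2.125, 1.000, 0.500),
  SE      = c(0.7180703, 0.4629100, 0.2672612)
)
Botc2 <- data.frame(
  Date    = as.Date(c("2014-10-15", "2014-11-12", "2014-12-11")),
  Average = c(3.375, 1.750, 0.625),
  SE      = c(1.3354708, 0.4531635, 0.1829813)
)

# Same combined frame as in the answer: type 1 = Botc1, type 2 = Botc2
d <- data.frame(rbind(Botc1, Botc2), type = rep(1:2, each = 3))

# Upper y limit that covers every bar plus its error bar
# (the answer draws error bars of +/- 0.5 * SE)
y_max <- max(d$Average + 0.5 * d$SE)

# Empty plot on a true date axis, ready for rect() and arrows()
plot(d$Date, d$Average, type = "n", ylim = c(0, y_max),
     xlab = "Date", ylab = "Average")
```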
If all you want is a theme that matches the look of base graphics, adding + theme_bw() in ggplot2 gets you most of the way:
data(mtcars)
require(ggplot2)
ggplot(mtcars, aes(factor(cyl), mpg)) +
geom_boxplot() +
theme_bw()
Alternative
boxplot(mpg~cyl,data=mtcars)
If, as you said, the only goal is a similar look and you already have a working plot in ggplot2, theme_bw() should produce plots that closely resemble the output of the standard plotting functions. If you feel so inclined, you can tweak details such as font sizes, the thickness of the plot border, or the display of outliers.
I have the following dataframe
Op.1 Op.2 Site diet Horse ICS
35 25 a 1 1 10
32 31 a 1 2 10
19 32 a 1 3 10
17 26 a 1 4 10
25 19 a 1 5 10
25 17 a 1 6 10
#... to 432 observations
I have done Bland-Altman plots using the following function:
BAplot <- function(x,y,yAxisLim=c(-50,50),xlab="Average", ylab="Difference") {
d <- ((x + y)/2)
diff <- x - y
plot(diff ~ d,ylim=yAxisLim,xlim=c(0,60),xlab=xlab,ylab=ylab)
abline(h=(mean(na.omit(diff))-c(-0.96,0,0.96)*sd(na.omit(diff))),lty=2)
}
The plot obtained is fine. Now I am trying to give colours according to data$Site (4 levels: 0,1,2,3) and shapes according to the levels of data$ICS (6 levels: 10,11,12,13,14,15)
I wrote the following code:
clr <- c("a"="red","b"="blue","c"="green","d"="yellow")[data$Site]
shape <- c("10"="0","11"="1","12"="2","13"="3","14"="4","15"="5")[data$ICS]
plot.ops<-BAplot(data$Op.1,data$Op.2,xlab="(Op1 vs Op 2)/2", ylab="Op1-mean of Op1+Op2",col=clr,pch=shape)
But it gives the error
Error in BAplot(data$Op.1, data$Op.2, xlab = "(Op1 vs Op 2)/2", ylab = "Op1-mean of Op1+Op2", :
unused arguments (col = clr, pch = shape)
I also tried changing shape to c(10=0,11=1,12=2...), where the values are different pch shape types, but it still doesn't work. The same goes for clr.
What I ultimately wish to have is the plot with different colours for "site" and different shapes for "ICS".
This is meant to be something very simple but I think there might be a basic conceptual error, nevertheless I am stuck.
I also would add diet (2 levels) by using filled or emptied shapes... but can not get to that stage until I get this sorted first!
Many thanks,
M
I tried to replicate your code, and the problem is that shape is all made of NA.
This is due to the fact that data$ICS is numeric, not string.
You can use this to solve the issue (note that I removed the quotes from the numbers, otherwise the digits themselves would be used as plotting symbols, which is quite ugly):
shapes <- c("10"=0,"11"=1,"12"=2,"13"=3,"14"=4,"15"=5)[as.character(data$ICS)]
or, much simpler
shapes <- (0:5)[data$ICS-9]
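The error message itself ("unused arguments") appears because BAplot, as defined in the question, has no col or pch parameters. One possible fix, sketched here under the assumption that the extra arguments should simply reach plot(), is to add ... to the signature and forward it. The toy data frame dat is hypothetical (its Site "b" and ICS 11 rows are made up for illustration). The 0.96 multiplier is kept exactly as posted, although Bland-Altman limits of agreement are conventionally drawn at mean ± 1.96 SD.

```r
# Sketch: same function as in the question, plus "..." forwarded to plot()
BAplot <- function(x, y, yAxisLim = c(-50, 50),
                   xlab = "Average", ylab = "Difference", ...) {
  avg <- (x + y) / 2
  dif <- x - y
  plot(avg, dif, ylim = yAxisLim, xlim = c(0, 60),
       xlab = xlab, ylab = ylab, ...)       # col, pch, etc. arrive via "..."
  # Reference lines as in the question (multiplier 0.96 kept as posted)
  abline(h = mean(dif, na.rm = TRUE) -
           c(-0.96, 0, 0.96) * sd(dif, na.rm = TRUE), lty = 2)
  invisible(data.frame(avg = avg, dif = dif))
}

# Hypothetical toy data: Op.1/Op.2 from the question, Site and ICS partly made up
dat <- data.frame(Op.1 = c(35, 32, 19, 17, 25, 25),
                  Op.2 = c(25, 31, 32, 26, 19, 17),
                  Site = c("a", "a", "a", "b", "b", "b"),
                  ICS  = c(10, 10, 10, 11, 11, 11))

# Look up by name, so convert the keys to character first
clr <- c(a = "red", b = "blue", c = "green", d = "yellow")[as.character(dat$Site)]
shp <- c("10" = 0, "11" = 1, "12" = 2, "13" = 3, "14" = 4, "15" = 5)[as.character(dat$ICS)]

res <- BAplot(dat$Op.1, dat$Op.2, col = clr, pch = shp)
```

Forwarding ... keeps the function signature short and lets any other graphical parameter (cex, main, and so on) pass through without further changes.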
This is what made it for me in the end:
a<-ifelse(data$ICS==10,"a",ifelse(data$ICS==11,"b",ifelse(data$ICS==12,"c",ifelse(data$ICS==13,"d",ifelse(data$ICS==14,"e","f"))))) #ICS as characters
cls<-c(2,"orange",7,3,6,4) [factor(a)] #10-11-12-13-14-15: red,orange,yellow,green,purple,blue
b<-data$Site
shapes<-c(0,1,2,8)[factor(b)] #Square is RDC liv, Circle is RDC V, Triangle is RVC V, Star is RVC CCJ
BAplot <- function(x,y,yAxisLim=c(-50,50),xlab="Average", ylab="Difference",col=cls,pch=shapes) {
  d <- ((x + y)/2)
  diff <- x - y
  # use the col and pch arguments rather than the global cls and shapes
  plot(diff ~ d,ylim=yAxisLim,xlim=c(0,60),xlab=xlab,ylab=ylab,col=col,pch=pch)
  abline(h=(mean(na.omit(diff))-c(-0.96,0,0.96)*sd(na.omit(diff))),lty=2)
}
plot.ops<-BAplot(data$Op.1,data$Op.2,xlab="(Op1 vs Op 2)/2", ylab="Op1-mean of Op1+Op2",col=cls,pch=shapes)
title(main="Bland-Altman plots of Op1 vs Op2")
legend (34,53,legend=c("RDC Liver","RDC V","RVC V","RVC CCJ"), pch=c(0,1,2,8), pt.cex=2, y.intersp=0.8) #legend for shape
legend (49,53,legend=c("10th ICS","11th ICS","12th ICS","13th ICS","14th ICS","15th ICS"), pch=22, pt.cex=2, pt.bg=c(2,"orange",7,3,6,4), y.intersp=0.6) #legend for the colours
I am not sure why, but it did not work when I wrote
shapes<-c(0,1,2,8)[factor(data$Site)]
it only worked if I created
b<-data$Site
shapes<-c(0,1,2,8)[factor(b)]
Anyway, sorted now!
Many thanks,
Marco
I want to draw a number of similar plots with a loop.
What I do is:
plot(0, 0, type="l", col="white", xlim=range(1,N), ylim=range(0.5, 2.5)) # provide axes, frame, ...
for(col in colors)
{
X <- generate_X() # vector of random numbers
lines(1:N, X, type="l", col=col)
}
The problem is that the random numbers sometimes fall outside range(0.5, 2.5), and I want the ylim range to grow accordingly. At the moment I plan to compute it with min and max before plotting, but there must be a much cleaner way that I can't find anywhere.
I think I'm missing something basic about plotting, but I couldn't find the solution.
Thanks
I think there are two quick answers to the OP's question:
1. calculate the plot range before initializing the plot (implied by OP), or
2. use a "cleaner" plotting wrapper function.
Setup: First we need to define the variables and functions the OP implies and then generate some data to work with.
# Initialize our N number of X points and
# colors vector.
N <- 20
colors <- c("yellow", "red", "blue", "green")
# Create function 'generate_X' to perform
# as implied by the OP.
generate_X <- function(.N){
  rnorm(n=.N, mean=0, sd=1)
}
# Generate the entire data frame
# using the 'matrix' function to shape
# the data quickly.
data <- data.frame(
  id=1:N,
  matrix(
    generate_X(N*length(colors)),
    ncol=length(colors)
  )
)
The above code simply initializes the variables, function, and data needed for the OP's example.
Method 1: Calculate the plot range and initialize the plot. This is pretty easy using the 'range' function. In the data frame we created, there is an "id" column for our x values, so we use the range of 'data$id' for our x. Then, we find the range of all the data across every column EXCEPT the first column (data[,-1]) to find the overall y range. We initialize with the color white, since our background is also white. Otherwise, we would have a point in the lower-left and upper-right corners. I added x and y labels just for looks.
plot(
range(data$id),
range(data[,-1]),
col="white",
xlab="x",
ylab="y")
Next we just loop through and plot the lines.
for(i in 1:length(colors)){
lines(data$id, data[, i + 1], type="l", col=colors[i])
}
This is essentially the same thing the OP demonstrated, but it's adapted slightly to accept a data frame as input. It's far easier to reference columns using an integer counter (i in this case) rather than the list of colors.
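A possibly cleaner base-graphics variant, sketched under the assumption that all series can be generated before plotting: store them as the columns of a matrix, then either open an empty frame with type = "n" (so no white dummy points are needed) or let matplot() work out the axis limits from the whole matrix. The matrix name X and the seed are illustrative.

```r
set.seed(1)                      # illustrative seed, for reproducibility only
N <- 20
colors <- c("yellow", "red", "blue", "green")

# One column per line; rnorm() stands in for the OP's generate_X()
X <- matrix(rnorm(N * length(colors)), nrow = N)

# Option A: empty frame sized to all series, then add the lines in a loop
ylim <- range(X)
plot(c(1, N), ylim, type = "n", xlab = "x", ylab = "y")
for (j in seq_along(colors)) {
  lines(seq_len(N), X[, j], col = colors[j])
}

# Option B: a single call; matplot() takes its limits from the whole matrix
matplot(X, type = "l", lty = 1, col = colors, xlab = "x", ylab = "y")
```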
Method 2: There are a lot of plot wrapper packages out there, and one of the most popular is the 'ggplot2' package, and for good reason. You can avoid a lot of the looping hassle with plots by feeding shaped data into a 'ggplot' function. The code here is much "cleaner" from a reading perspective.
# Load packages for shaping data and plotting.
library(reshape2)
library(ggplot2)
First, we need the 'reshape2' package, because we want to use "melted" data in our plot. This just makes the 'ggplot' code WAY cleaner. Then, we load up the 'ggplot2' package for the plotting.
For our plot, we initialize a plot without any instructions, so we can specify them in the geometry layer. If we were creating multiple layers from the same data, we would specify the options in the base plot layer, but for this, we are only creating a single geometry layer with lines. The + allows us to add plot layers.
Next, we choose a geometry layer ('geom_line' in this case) and specify the data as melt(data, id.vars="id"). This shapes our data for the 'ggplot' function to use with minimal code. We use the "id" column as the ID variable, since that contains our x values. The shaped data now looks more like this:
# id variable value
# 1 1 X1 -0.280035386
# 2 2 X1 -0.371020958
# 3 3 X1 -0.239889784
# 4 4 X1 0.450357442
# 5 5 X1 -0.801697283
# 6 6 X1 -0.453057841
# 7 7 X1 -0.451321958
# 8 8 X1 0.948124835
# 9 9 X1 2.724205279
# 10 10 X1 -0.725622824
# 11 11 X1 0.475545293
# 12 12 X1 0.533060822
# 13 13 X1 -1.928335572
# 14 14 X1 -0.466790259
# 15 15 X1 -1.606005895
# 16 16 X1 0.005678344
# 17 17 X1 -1.719827853
# 18 18 X1 0.601011314
# 19 19 X1 -2.056315661
# 20 20 X1 1.006169713
# 21 1 X2 -1.591227194
# ...
# 80 20 X4 -1.045224561
You don't need to get too hung up on the shaping. Just understand that "melted" data works better with the 'ggplot' functions. We specify our melted data as the data for our geometry layer, and then we use the 'aes' function to tell the geometry layer how to deal with our data. Our x values are in the "id" column, and our y values are in the "value" column. The next part is what removes the loops: we specify the color to be differentiated based on the "variable" column. In our melted data, the "variable" column contains the name of the column that the data originally came from, and using it to specify the color will tell 'ggplot' to automatically change the color for each new "variable" value.
ggplot() +
  geom_line(
    data=melt(data, id.vars="id"),
    aes(
      x=id,
      y=value,
      col=variable
    ),
    lwd=1,
    alpha=0.7)
I specified the line width ("lwd") and alpha values just to make the graph a little more readable.
I want to plot 2 graphs in 1 frame. Basically I want to compare the results.
Anyways, the code I tried is:
plot(male,pch=16,col="red")
lines(male,pch=16,col="red")
par(new=TRUE)
plot(female,pch=16,col="green")
lines(female,pch=16,col="green")
When I run it, I do get two plots in one frame, but it changes my y-axis. I added my plot below. The y-axis tick labels read -4, -4, -3, -3, ...
It's like both of the plots display their own axis.
Please help.
Thanks
You don't need the second plot. Just use
> plot(male,pch=16,col="red")
> lines(male, pch=16, col = "red")
> lines(female, pch=16, col = "green")
> points(female, pch=16, col = "green")
Note: that will set the frame boundaries based on the first data set, so some data from the second plot could be outside the boundaries of the plot. You can fix it by e.g. setting the limits of the first plot yourself.
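A minimal sketch of setting the limits yourself, assuming male and female are plain numeric vectors (the values below are hypothetical stand-ins, since the original data were not posted): compute ylim from both series and pass it to the first plot call.

```r
set.seed(42)                       # illustrative seed
male   <- rnorm(10, mean = 1)      # hypothetical stand-in data
female <- rnorm(10, mean = 1)      # hypothetical stand-in data

# y limits that cover both series
ylim <- range(male, female)

# type = "o" draws points and lines in one call
plot(male, type = "o", pch = 16, col = "red", ylim = ylim, ylab = "value")
lines(female, type = "o", pch = 16, col = "green")
legend("topleft", c("male", "female"), col = c("red", "green"),
       pch = 16, lty = 1, bty = "n")
```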
For this kind of plot I usually like the plotting with ggplot2 much better. The main reason: It generalizes nicely to more than two lines without a lot of code.
The drawback for your sample data is that it is not available as a data.frame, which ggplot2 requires. Furthermore, you always need an x-variable to plot against. So let us first create a data.frame from your data.
dat <- data.frame(index=rep(1:10, 2), vals=c(male, female), group=rep(c('male', 'female'), each=10))
Which leaves us with
> dat
index vals group
1 1 -0.4334269341 male
2 2 0.8829902521 male
3 3 -0.6052638138 male
4 4 0.2270191965 male
5 5 3.5123679143 male
6 6 0.0615821014 male
7 7 3.6280155376 male
8 8 2.3508890457 male
9 9 2.9824432680 male
10 10 1.1938052833 male
11 1 1.3151289227 female
12 2 1.9956491556 female
13 3 0.8229389822 female
14 4 1.2062726250 female
15 5 0.6633392820 female
16 6 1.1331669670 female
17 7 -0.9002109636 female
18 8 3.2137052284 female
19 9 0.3113656610 female
20 10 1.4664434215 female
Note that my command assumes you have 10 data values each. That command would have to be adjusted according to your actual data.
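One way to avoid hard-coding the length, shown as a sketch: build the index and group columns from the vectors themselves. The helper name make_long and the short example vectors are hypothetical.

```r
# Hypothetical helper: stack any number of named numeric vectors
# into a long data.frame with columns index, vals and group
make_long <- function(...) {
  series <- list(...)
  data.frame(
    index = unlist(lapply(series, seq_along), use.names = FALSE),
    vals  = unlist(series, use.names = FALSE),
    group = rep(names(series), lengths(series))
  )
}

# Works for vectors of different lengths (values are made up)
example_long <- make_long(male   = c(0.5, 1.2, -0.3),
                          female = c(1.1, 0.4, 2.0, 0.9))
```

With the real data, make_long(male = male, female = female) would give a frame with the same columns as dat.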
Now we may use the mighty power of ggplot2:
library(ggplot2)
ggplot(dat, aes(x=index, y=vals, color=group)) + geom_point() + geom_line()
The call above has three elements. ggplot initializes the plot, tells R to use dat as the data source, and defines the plot aesthetics, that is, which properties of the plot (such as color, position, size, etc.) are driven by your data. We use the x and y values as expected and also map the color aesthetic to the grouping variable, which makes ggplot automatically draw the two groups in different colors. Finally, we add two geometries that do what their names say: draw points and draw lines.
If you have your data saved in the standard way in R (in a data.frame), you end with one line of code. And if after some thousands years of evolution you want to add another gender, it is still one line of code.
I have a dataframe that looks like this:
person n start end
1 sam 6 0 6
2 greg 5 6 11
3 teacher 4 11 15
4 sam 4 15 19
5 greg 5 19 24
6 sally 5 24 29
7 greg 4 29 33
8 sam 3 33 36
9 sally 5 36 41
10 researcher 6 41 47
11 greg 6 47 53
Where start and end are times or durations (sam spoke from 0 to 6, greg from 6 to 11, etc.) and n is how long the person spoke (in this case, the number of words). I want to plot this as a timeline in base R (I may eventually ask a similar question using ggplot2, but this question is specific to base R; by base I mean the packages that come with a standard install).
The y axis will be by person and the x axis will be time. Hopefully the final product looks something like this for the data above:
I would like to use base R to make this, but I'm not sure how to approach it. One thought is to draw a dot plot but leave out the dots, then go over it with square-ended segments. I'm not sure how this would work, since segments need numeric x and y coordinates and the y axis is categorical. Another thought is to convert the factors to numeric (assign each factor level a number), plot a blank scatterplot, and then go over it with square-ended line segments. This could be a powerful tool in my field for looking at speech patterns.
I thank you in advance for your help.
PS the argument for square ended line segments is segments(... , lend=2) to save time looking this information up for those not familiar with all the segment arguments.
You say you want a base R solution, but you don't say why. Since this is one line of code in ggplot, I show this anyway.
library(ggplot2)
ggplot(dat, aes(colour=person)) +
geom_segment(aes(x=start, xend=end, y=person, yend=person), size=3) +
xlab("Duration")
Pretty similar to @John's approach, but since I had already written it, I will post it :)
Here's a generic function to plot a Gantt chart (no dependencies):
plotGantt <- function(data, res.col='resources',
                      start.col='start', end.col='end', res.colors=rainbow(30))
{
  #slightly enlarge Y axis margin to make space for labels
  op <- par('mar')
  par(mar = op + c(0,1.2,0,0))
  minval <- min(data[,start.col],na.rm=T)
  maxval <- max(data[,end.col],na.rm=T)
  res.colors <- rev(res.colors)
  resources <- sort(unique(data[,res.col]),decreasing=T)
  plot(c(minval,maxval),
       c(0.5,length(resources)+0.5),
       type='n', xlab='Duration',ylab=NA,yaxt='n' )
  axis(side=2,at=1:length(resources),labels=resources,las=1)
  for(i in 1:length(resources))
  {
    yTop <- i+0.1
    yBottom <- i-0.1
    subset <- data[data[,res.col] == resources[i],]
    for(r in 1:nrow(subset))
    {
      color <- res.colors[((i-1)%%length(res.colors))+1]
      start <- subset[r,start.col]
      end <- subset[r,end.col]
      rect(start,yBottom,end,yTop,col=color)
    }
  }
  par(mar=op) # reset the plotting margins
}
Usage example:
data <- read.table(text=
'"person","n","start","end"
"sam",6,0,6
"greg",5,6,11
"teacher",4,11,15
"sam",4,15,19
"greg",5,19,24
"sally",5,24,29
"greg",4,29,33
"sam",3,33,36
"sally",5,36,41
"researcher",6,41,47
"greg",6,47,53',sep=',',header=T)
plotGantt(data, res.col='person',start.col='start',end.col='end',
res.colors=c('green','blue','brown','red','yellow'))
While the y-axis is categorical, all you need to do is assign numbers to the categories (1:5) and track them. Using the default as.numeric() of the factor will usually number them alphabetically, but you should check anyway. Make your plot with the yaxt = 'n' argument. Then use the axis() command to put in a y-axis.
axis(2, 1:5, myLabels)
Keep in mind that whenever you're plotting the only way to place things is with a number. Categorical x or y values are always just the numbers 1:nCategories with category name labels in place of the numbers on the axis.
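To make the mapping concrete, here is a small sketch using the person column from the question: factor() sorts the distinct names, and as.integer() returns each row's position in that sorted list, which is the number used on the y-axis. The variable names person, f and codes are illustrative.

```r
# The person column from the question, as a plain character vector
person <- c("sam", "greg", "teacher", "sam", "greg", "sally",
            "greg", "sam", "sally", "researcher", "greg")

f <- factor(person)        # levels are the sorted distinct names
codes <- as.integer(f)     # numeric y position for every row

levels(f)                  # the labels to pass to axis()
codes                      # the y coordinates to pass to segments()
```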
Something like the following gets you close enough (assuming your data.frame object is called datf)...
datf$pNum <- as.numeric(factor(datf$person)) # factor() also handles a character column
plot(datf$pNum, xlim = c(0, 53), type = 'n', yaxt = 'n', xlab ='Duration (words)', ylab = 'person', main = 'Speech Duration')
axis(2, 1:5, sort(unique(datf$person)), las = 2, cex.axis = 0.75)
with(datf, segments(start, pNum, end, pNum, lwd = 3, lend=2))