I am relatively new to R. I am trying to plot a dataframe loaded from a csv file. The data consists of 6 columns like this:
xval,col1,col2,col3,col4,col5
The first column (xval) consists of a sequence of monotonically increasing positive integers (e.g. 10, 40, 60 etc.); the other columns, col1 to col5, consist of floating point numbers.
I want to create a plot in R as follows:
plot xval term on x axis
plot remaining columns (col1 ... col5) as lines
create a legend with col2, ... col5 renamed
The data to be plotted (col1, ... col5) are 'snapshot' values so although I want to plot them as lines, I want the lines to be smoothed (i.e. interpolated).
I am looking for a snippet that will help me create the plot once I have read the data into a dataframe. Any help will be appreciated.
Have a look at ggplot2
#create dummy data
n <- 200
dataset <- data.frame(xval = runif(n), col1 = rnorm(n), col2 = rnorm(n, sd = 2), col3 = rnorm(n, mean = seq(0, 2, length = n)), col4 = rnorm(n, sd = seq(0, 1, length = n)), col5 = rnorm(n, mean = 1))
#convert data to long format
library(reshape)
Molten <- melt(dataset, id.vars = "xval")
#plot it
library(ggplot2)
ggplot(Molten, aes(x = xval, y = value, colour = variable)) +
geom_smooth() + geom_point()
#some tweaking
ggplot(Molten, aes(x = xval, y = value, colour = variable)) +
geom_smooth(se = FALSE) + geom_point() + theme_bw() +
scale_x_continuous("the x label") + scale_y_continuous("the y label") +
scale_colour_discrete("")
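Note that geom_smooth() fits a smoother (loess by default for small data sets), so the curves need not pass through the snapshot values. If true interpolation is wanted, one possible sketch is to interpolate each column with base R's spline() before plotting and to rename the legend entries through the labels argument of the colour scale. The data values and the legend names below are made up for illustration.

```r
library(ggplot2)

# Hypothetical data in the shape described in the question
dataset <- data.frame(xval = c(10, 40, 60, 90, 120),
                      col1 = c(1.0, 2.5, 2.0, 3.5, 3.0),
                      col2 = c(0.5, 1.0, 3.0, 2.0, 4.0))

# Interpolate every value column onto a fine, equally spaced x grid
# with a cubic spline, and stack the results in long format
value_cols <- setdiff(names(dataset), "xval")
fine <- do.call(rbind, lapply(value_cols, function(nm) {
  s <- spline(dataset$xval, dataset[[nm]], n = 200)
  data.frame(xval = s$x, value = s$y, variable = nm)
}))

# Hypothetical display names, given in the (alphabetical) order of the series
legend_labels <- c("First series", "Second series")

p <- ggplot(fine, aes(x = xval, y = value, colour = variable)) +
  geom_line() +
  scale_colour_discrete(name = "", labels = legend_labels)
print(p)
```

The original points can be overlaid with geom_point(data = ...) on a long-format copy of the raw data if the snapshots should stay visible.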
ggplot(data = dat) + geom_line(aes(x=foo,y=bar)) +geom_line(aes(x=foo_land,y=bar_land))
which creates a plot like the following:
I want to try and indicate the maximum values on this plot as well as add corresponding labels to the axis like:
The data for the maximum x and y values is stored in the dat file.
I was attempting to use
geom_hline() + geom_vline()
but I couldn't get this to work. I also believe that these lines will continue through the rest of the plot, which is not what I am trying to achieve. I should note that I would like to indicate the maximum y-value and its corresponding x value. The x-value is not indicated here since it is already labelled on the axis.
Reproducible example:
library(ggplot2)
col1 <- c(1,2,3)
col2 <- c(2,9,6)
df <- data.frame(col1,col2)
ggplot(data = df) +
geom_line(aes(x=col1,y=col2))
I would like to include a line which travels up from 2 on the x-axis and horizontally to the y-axis indicating the point 9, the maximum value of this graph.
Here's a start, although it does not make the axis text red where that maximal point is:
MaxLines <- data.frame(col1 = c(rep(df$col1[which.max(df$col2)], 2),
-Inf),
col2 = c(-Inf, rep(max(df$col2), 2)))
MaxLines creates an object that says where each of three points should be for two segments.
ggplot(data = df) +
geom_line(aes(x=col1,y=col2)) +
geom_path(data = MaxLines, aes(x = col1, y = col2),
inherit.aes = F, color = "red") +
scale_x_continuous(breaks = c(seq(1, 3, by = 0.5), df$col1[which.max(df$col2)])) +
scale_y_continuous(breaks = c(seq(2, 9, by = 2), max(df$col2)))
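An alternative sketch of the same idea, assuming the reproducible example from the question: look up the row holding the maximum once, then draw the two guide lines as separate segments with annotate(). The name peak is only illustrative. Using -Inf as a start coordinate makes each segment begin at the edge of the panel, so the lines stop at the maximum instead of crossing the whole plot.

```r
library(ggplot2)

df <- data.frame(col1 = c(1, 2, 3), col2 = c(2, 9, 6))

# Row holding the maximum y-value and its corresponding x-value
peak <- df[which.max(df$col2), ]

p <- ggplot(df, aes(x = col1, y = col2)) +
  geom_line() +
  # vertical guide from the x-axis up to the maximum
  annotate("segment", x = peak$col1, xend = peak$col1,
           y = -Inf, yend = peak$col2, colour = "red") +
  # horizontal guide from the y-axis across to the maximum
  annotate("segment", x = -Inf, xend = peak$col1,
           y = peak$col2, yend = peak$col2, colour = "red")
print(p)
```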
I need to plot lines that show median and IQR for 3 replicates, across multiple samples.
Data:
sampleid <- rep(1:20, each = 3)
replicate <- rep(1:3, 20)
sample1 <- seq(120,197, length.out = 60)
sample2 <- seq(113, 167, length.out = 60)
sample3 <- seq(90,180, length.out = 60)
What I have done so far:
df <- as.data.frame(cbind(sampleid,replicate,sample1, sample2, sample3))
library(reshape2)
long <- melt(df,id.vars = c('sampleid', 'replicate'))
ggplot(data = long, aes(x = variable, y = value, colour = factor(replicate))) + stat_summary(fun.data=median_hilow, conf.int=.5)
However, the IQR lines for the three replicates overlap each other within each sample. I would like to find a way to "dodge" these 3 lines so that they are visible next to each other, without changing the other parameters of the plot. Is this achievable?
You have to introduce jitter to the lines:
ggplot(data = long, aes(x = variable, y = value, colour = factor(replicate))) +
stat_summary(fun.data=median_hilow, fun.args = list(conf.int=.5), position = "jitter")
Please note you also need to have your conf.int=.5 wrapped in a list passed to fun.args.
Alternatively, change your x to factor(replicate) and add facet_wrap:
ggplot(data = long, aes(x = factor(replicate), y = value, colour = factor(replicate))) +
stat_summary(fun.data=median_hilow, fun.args = list(conf.int=.5)) +
facet_wrap(~variable)
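Since the question asks specifically for dodging, here is a sketch of a deterministic variant. It assumes the same data as the question, computes the median and quartiles with base R instead of median_hilow (which needs the Hmisc package), and offsets the replicates with position_dodge(). The column names lower, middle and upper are chosen here for illustration.

```r
library(ggplot2)

# Same values as in the question, built directly in long format
long <- data.frame(
  sampleid  = rep(rep(1:20, each = 3), times = 3),
  replicate = rep(rep(1:3, 20), times = 3),
  variable  = rep(c("sample1", "sample2", "sample3"), each = 60),
  value     = c(seq(120, 197, length.out = 60),
                seq(113, 167, length.out = 60),
                seq(90, 180, length.out = 60))
)

# Quartiles and median for every sample/replicate combination
iqr <- aggregate(value ~ variable + replicate, data = long,
                 FUN = function(v) quantile(v, c(0.25, 0.5, 0.75)))
# aggregate() returns the three quantiles as a matrix column; flatten it
iqr <- data.frame(iqr[c("variable", "replicate")], iqr$value)
names(iqr)[3:5] <- c("lower", "middle", "upper")

# position_dodge() places the three replicates side by side
p <- ggplot(iqr, aes(x = variable, y = middle, ymin = lower, ymax = upper,
                     colour = factor(replicate))) +
  geom_pointrange(position = position_dodge(width = 0.5))
print(p)
```

stat_summary() also accepts a position argument, so passing position = position_dodge(width = 0.5) there instead of "jitter" should give a similar result without precomputing the summary.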
I have a data frame mydataAll with columns DESWC, journal, and highlight. To calculate the average and standard deviation of DESWC for each journal, I do
avg <- aggregate(DESWC ~ journal, data = mydataAll, mean)
stddev <- aggregate(DESWC ~ journal, data = mydataAll, sd)
Now I plot a horizontal stripchart with the values of DESWC along the x-axis and each journal along the y-axis. But for each journal, I want to indicate the standard deviation and average with a simple line. Here is my current code and the results.
stripchart2 <-
ggplot(data=mydataAll, aes(x=mydataAll$DESWC, y=mydataAll$journal, color=highlight)) +
geom_segment(aes(x=avg[1,2] - stddev[1,2],
y = avg[1,1],
xend=avg[1,2] + stddev[1,2],
yend = avg[1,1]), color="gray78") +
geom_segment(aes(x=avg[2,2] - stddev[2,2],
y = avg[2,1],
xend=avg[2,2] + stddev[2,2],
yend = avg[2,1]), color="gray78") +
geom_segment(aes(x=avg[3,2] - stddev[3,2],
y = avg[3,1],
xend=avg[3,2] + stddev[3,2],
yend = avg[3,1]), color="gray78") +
geom_point(size=3, aes(alpha=highlight)) +
scale_x_continuous(limit=x_axis_range) +
scale_y_discrete(limits=mydataAll$journal) +
scale_alpha_discrete(range = c(1.0, 0.5), guide='none')
show(stripchart2)
See the three horizontal geom_segments at the bottom of the image indicating the spread? I want to do that for all journals, but without handcrafting each one. I tried using the solution from this question, but when I put everything in a loop and remove the aes(), it gives me an error that says:
Error in x - from[1] : non-numeric argument to binary operator
Can anyone help me condense the geom_segment() statements?
I generated some dummy data to demonstrate. First, we use aggregate like you have done, then we combine those results to create a data.frame in which we create upper and lower columns. Then, we pass these to the geom_segment specifying our new dataset. Also, I specify x as the character variable and y as the numeric variable, and then use coord_flip():
library(ggplot2)
set.seed(123)
df <- data.frame(lets = sample(letters[1:8], 100, replace = TRUE),
vals = rnorm(100),
stringsAsFactors = FALSE)
means <- aggregate(vals~lets, data = df, FUN = mean)
sds <- aggregate(vals~lets, data = df, FUN = sd)
df2 <- data.frame(means, sds)
df2$upper = df2$vals + df2$vals.1
df2$lower = df2$vals - df2$vals.1
ggplot(df, aes(x = lets, y = vals))+geom_point()+
geom_segment(data = df2, aes(x = lets, xend = lets, y = lower, yend = upper))+
coord_flip()+theme_bw()
Here, the lets column would resemble your character variable.
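As a more compact variant, the mean and standard deviation can also be computed inside the plot with stat_summary(), so no separate summary data frame is needed. This is a sketch on the same dummy data; mean_sd_range is a helper name made up here, not a ggplot2 function.

```r
library(ggplot2)

set.seed(123)
df <- data.frame(lets = sample(letters[1:8], 100, replace = TRUE),
                 vals = rnorm(100))

# Hypothetical helper: returns the mean plus/minus one standard deviation
# in the y/ymin/ymax columns that stat_summary() expects from fun.data
mean_sd_range <- function(v) {
  data.frame(y = mean(v), ymin = mean(v) - sd(v), ymax = mean(v) + sd(v))
}

p <- ggplot(df, aes(x = lets, y = vals)) +
  # one grey range per group, computed from the plotted data itself
  stat_summary(fun.data = mean_sd_range, geom = "linerange", colour = "gray78") +
  geom_point() +
  coord_flip() + theme_bw()
print(p)
```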
I need to add a legend of the two lines (best fit line and 45 degree line) on TOP of my two plots. Sorry I don't know how to add plots! Please please please help me, I really appreciate it!!!!
Here is an example
type=factor(rep(c("A","B","C"),5))
xvariable=seq(1,15)
yvariable=2*xvariable+rnorm(15,0,2)
newdata=data.frame(type,xvariable,yvariable)
p = ggplot(newdata,aes(x=xvariable,y=yvariable))
p+geom_point(size=3)+ facet_wrap(~ type) +
geom_abline(intercept =0, slope =1,color="red",size=1)+
stat_smooth(method="lm", se=FALSE,size=1)
Here is another approach which uses aesthetic mapping to string constants to identify different groups and create a legend.
First an alternate way to create your test data (and naming it DF instead of newdata)
DF <- data.frame(type = factor(rep(c("A", "B", "C"), 5)),
xvariable = 1:15,
yvariable = 2 * (1:15) + rnorm(15, 0, 2))
Now the ggplot code. Note that for both geom_abline and stat_smooth, the colour is set inside an aes() call, which means each of the two values used will be mapped to a different colour and a guide (legend) will be created for that mapping.
ggplot(DF, aes(x = xvariable, y = yvariable)) +
geom_point(size = 3) +
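geom_abline(aes(intercept = 0, slope = 1, colour = "one-to-one"), size = 1) + # intercept/slope go inside aes(); recent ggplot2 versions ignore the mapping when they are passed as plain arguments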
stat_smooth(aes(colour="best fit"), method = "lm", se = FALSE, size = 1) +
facet_wrap(~ type) +
scale_colour_discrete("")
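The question asked for the legend on top of the plots. A sketch of how this approach could be completed, assuming a recent ggplot2 in which intercept and slope must sit inside aes() for the colour mapping of geom_abline to be kept; the specific colours are arbitrary choices.

```r
library(ggplot2)

set.seed(1)
DF <- data.frame(type = factor(rep(c("A", "B", "C"), 5)),
                 xvariable = 1:15,
                 yvariable = 2 * (1:15) + rnorm(15, 0, 2))

p <- ggplot(DF, aes(x = xvariable, y = yvariable)) +
  geom_point(size = 3) +
  # constants mapped inside aes() create legend entries
  geom_abline(aes(intercept = 0, slope = 1, colour = "one-to-one")) +
  stat_smooth(aes(colour = "best fit"), method = "lm", se = FALSE) +
  facet_wrap(~ type) +
  # named values tie each colour to its legend entry
  scale_colour_manual(name = "",
                      values = c("best fit" = "blue", "one-to-one" = "red")) +
  # move the legend above the panels
  theme(legend.position = "top")
print(p)
```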
Try this:
# original data
type <- factor(rep(c("A", "B", "C"), 5))
x <- 1:15
y <- 2 * x + rnorm(15, 0, 2)
df <- data.frame(type, x, y)
# create a copy of original data, but set y = x
# this data will be used for the one-to-one line
df2 <- data.frame(type, x, y = x)
# bind original and 'one-to-one data' together
df3 <- rbind.data.frame(df, df2)
# create a grouping variable to separate stat_smoothers based on original and one-to-one data
df3$grp <- as.factor(rep(1:2, each = nrow(df)))
# plot
# use original data for points
# use 'double data' for the best-fit and one-to-one lines, set colours by group
ggplot(df, aes(x = x, y = y)) +
geom_point(size = 3) +
facet_wrap(~ type) +
stat_smooth(data = df3, aes(colour = grp), method = "lm", se = FALSE, size = 1) +
scale_colour_manual(values = c("red","blue"),
labels = c("best fit", "one-to-one"), # group 1 is the original data, group 2 the y = x copy
name = "") +
theme(legend.position = "top")
# If you rather want to stack the two keys in the legend you can add:
# guide = guide_legend(direction = "vertical")
#...as argument in scale_colour_manual
Please note that this solution does not extrapolate the one-to-one line outside the range of your data, which seemed to be the case for the original geom_abline.
I am making a very wide chart that, when output as a PNG file, takes up several thousand pixels in the x-axis; there are about 20 years of daily data. (This may or may not be regarded as good practise, but it is for my own use, not for publication.) Because the chart is so wide, the y-axis disappears from view as you scroll through the chart. Accordingly I want to add labels to the plot at 2-yearly intervals to show the values on the y-axis. The resulting chart looks like the one below, except that in the interests of keeping it compact I have used only 30 days of fake data and put labels roughly every 10th day:
This works more or less as required, but I wonder if there is a better way of approaching it. In this chart (see code below) I have a column for each of the 3 y-axis values of 120, 140 and 160. The real data has many more levels, so I would end up with 15 calls to geom_text to put everything on the plot area.
Q. Is there a simpler way to splat all 20-odd dates, with 15 labels per date, on to the chart at once?
require(ggplot2)
set.seed(12345)
mydf <- data.frame(mydate = seq(as.Date('2012-01-01'), as.Date('2012-01-31'), by = 'day'),
price = runif(31, min = 100, max = 200))
mytext <- data.frame(mydate = as.Date(c('2012-01-10', '2012-01-20')),
col1 = c(120, 120), col2 = c(140,140), col3 = c(160,160))
p <- ggplot(data = mydf) +
geom_line(aes(x = mydf$mydate, y = mydf$price), colour = 'red', size = 0.8) +
geom_text(data = mytext, aes(x = mydate, y = col1, label = col1), size = 4) +
geom_text(data = mytext, aes(x = mydate, y = col2, label = col2), size = 4) +
geom_text(data = mytext, aes(x = mydate, y = col3, label = col3), size = 4)
print(p)
ggplot2 likes data to be in long format, so melt()ing your text into long format lets you make a single call to geom_text():
require(reshape2)
mytext.m <- melt(mytext, id.vars = "mydate")
Then your plotting command becomes:
ggplot(data = mydf) +
geom_line(aes(x = mydate, y = price), colour = 'red', size = 0.8) +
geom_text(data = mytext.m, aes(x = mydate, y = value, label = value), size = 4)
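If the same set of levels is wanted at every label date, the long-format label table can also be generated directly, skipping the wide table and the melt step. This is a sketch assuming the fake data from the question; labels_df is a name chosen here for illustration.

```r
library(ggplot2)

set.seed(12345)
mydf <- data.frame(mydate = seq(as.Date("2012-01-01"), as.Date("2012-01-31"), by = "day"),
                   price = runif(31, min = 100, max = 200))

# Every combination of label date and y-axis level, one row per label
labels_df <- expand.grid(mydate = as.Date(c("2012-01-10", "2012-01-20")),
                         value = c(120, 140, 160))

p <- ggplot(mydf, aes(x = mydate, y = price)) +
  geom_line(colour = "red") +
  # a single geom_text() call draws all labels
  geom_text(data = labels_df, aes(x = mydate, y = value, label = value), size = 4)
print(p)
```

For the real chart, the two vectors passed to expand.grid() would be the 20-odd label dates and the 15 levels, for example built with seq().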