R: ggplot: plot shows vertical lines instead of time course

I am trying to get a simple plot showing the time course of worry duration over 6 days for two groups. However, I get vertical lines instead of a line showing the time course.
This is what my data looks like:
> head(alldays_dur)
  ParticipantID Session Day  Time Worry_duration        group
1             1       2   1 71804             15 intervention
2             1       4   1 56095              5 intervention
3             2       2   1 36739             15 intervention
4             2       4   1 45013             10 intervention
5             2       5   1 51026              5 intervention
This is the structure of my data
> str(alldays_dur)
'data.frame': 2620 obs. of 10 variables:
$ ParticipantID : num 113 113 113 113 113 113 113 113 113 113 ...
$ Session : num 9 10 11 12 14 15 16 21 22 24 ...
$ Day : Factor w/ 6 levels "1","2","3","4",..: 2 2 2 2 2 2 2 3 3
$ Time : num 37350 42862 47952 51555 61499 ...
$ Worry_duration: num 5 5 5 5 10 0 5 5 5 5 ...
$ group : Factor w/ 2 levels "Intervention group",..: 1 1 1 1 1 1
I have tried the following code:
p <- ggplot(alldays_dur, aes(x=Day, y=Worry_duration, group=1)) +
geom_line() +
labs(x = "Day",
y = "Mean worry duration in minutes per day")
print(p)
However, the resulting plot shows only vertical lines.
I have included the group=1 in the code after reading some earlier posts on this topic. However, it didn't help me as I had hoped.
Do you maybe have some useful tips for me? Thank you in advance.
Ps. I am sorry if the post is unclear in any way, this is my first time ever posting on stackoverflow, so I am not quite familiar with all the 'post-options' yet.

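geom_line() connects every observation, and you have many observations per day, so each day is drawn as a vertical line. You need to summarize your data first (one mean per day and group), with ddply for example: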
require(plyr) # ddply
require(ggplot2) # ggplot
# Creating dataset
raw_data = data.frame(Day = sample(c(1:6),100, replace = T),
group = sample(c("group_1", "group_2"),100, replace = T),
Worry_duration = sample(seq(0,30,5), 100, replace = T))
# Summarize
DF = ddply(raw_data, c("Day", "group"), summarize,
Worry_duration.mean = mean(Worry_duration, na.rm = T))
# Plot
ggplot(DF, aes(x = Day, y = Worry_duration.mean, group = group, color = group)) +
geom_line()+ xlab("Day") + ylab("Mean worry duration in minutes per day")
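If you would rather not depend on plyr, the same idea can be sketched with base R's aggregate(), or by letting ggplot2 compute the means itself with stat_summary(). This is only a sketch: the data frame toy and the level name "Control group" are made up for illustration (the second level is cut off in the str() output), and the fun argument assumes ggplot2 3.3.0 or later (older versions call it fun.y).

```r
library(ggplot2)

# Made-up stand-in for alldays_dur: several observations per day and group.
# "Control group" is an assumed level name, not taken from the question.
set.seed(1)
toy <- expand.grid(Day = factor(1:6),
                   group = c("Intervention group", "Control group"),
                   obs = 1:10)
toy$Worry_duration <- sample(seq(0, 30, 5), nrow(toy), replace = TRUE)

# Option 1: summarise first (one mean per Day x group), then plot the means
daily_means <- aggregate(Worry_duration ~ Day + group, data = toy, FUN = mean)
p_means <- ggplot(daily_means,
                  aes(x = Day, y = Worry_duration, group = group, colour = group)) +
  geom_line() +
  labs(x = "Day", y = "Mean worry duration in minutes per day")

# Option 2: keep the raw rows and let ggplot2 compute the mean per day and group
p_stat <- ggplot(toy,
                 aes(x = Day, y = Worry_duration, group = group, colour = group)) +
  stat_summary(fun = mean, geom = "line") +
  labs(x = "Day", y = "Mean worry duration in minutes per day")
```

Both plots should draw one line per group with a single point per day, instead of a vertical line through all observations of that day.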

Related

How to change order for pyramid plots with ggplot2 to dataset order?

I have a dataset with climate suitability values (0-1) for tree species for both present and future.
I would like to visualise the data in a pyramid plot with the ggplot2 package, whereas present should be displayed on the left side of the plot and future on the right side and the tree species in the according order given in my raw dataset.
b2010<-read.csv("csi_before2010_abund_order.csv",header=T,sep = ";")
str(b2010)
'data.frame': 20 obs. of 7 variables:
$ species: Factor w/ 10 levels "Acer platanoides",..: 9 9 7 7 8 8 6 6 5 5 ...
$ time : Factor w/ 2 levels "future","present": 2 1 2 1 2 1 2 1 2 1 ...
$ grid1 : num 0.6001 0.5945 0.6366 0.0424 0.6941 ...
$ grid2 : num 0.6399 0.5129 0.6981 0.0399 0.711 ...
$ grid3 : num 0.6698 0.5212 0.6863 0.0446 0.6795 ...
$ mean : num 0.6366 0.5429 0.6737 0.0423 0.6949 ...
$ group : Factor w/ 1 level "before 2010": 1 1 1 1 1 1 1 1 1 1 ...
b2010$mean = ifelse(b2010$time == "future", b2010$mean * -1,b2010$mean)
head(b2010)
            species    time      grid1      grid2      grid3        mean       group
1    Tilia europaea present 0.60009009 0.63990200 0.66975713  0.63658307 before 2010
2    Tilia europaea  future 0.59452874 0.51294094 0.52115256 -0.54287408 before 2010
3 Sorbus intermedia present 0.63659602 0.69813931 0.68629903  0.67367812 before 2010
4 Sorbus intermedia  future 0.04242327 0.03990654 0.04460707 -0.04231229 before 2010
5     Tilia cordata present 0.69414478 0.71097034 0.67950863  0.69487458 before 2010
6     Tilia cordata  future 0.55790818 0.53918493 0.51979470 -0.53896260 before 2010
ggplot(b2010, aes(x = factor(species), y = mean, fill = time)) +
geom_bar(stat = "identity") +
facet_share(~time, dir = "h", scales = "free", reverse_num = T) +
coord_flip()
Now, future and present are in the wrong order and also the species are ordered alphabetically, even though they are clearly "factors" and should therefore be ordered according to my dataset. I would very much appreciate your help.
Thank you and kind regards
You are misunderstanding how factors work. Bars are plotted in the order given by levels(b2010$species), which is alphabetical by default and not the order of the rows. To change this order, you have to set the levels yourself, e.g.
b2010$species <- factor(b2010$species,
levels = c("Sorbus intermedia", "Tilia cordata"...))
The levels can also be derived from a statistic such as the mean. To do that, you would do something along the lines of
present <- b2010[b2010$time == "present", ]
myorder <- as.character(present$species[order(present$mean)])
b2010$species <- factor(b2010$species, levels = myorder)
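Since the question asks for the order in which the species appear in the dataset, the levels can also be taken from the row order with unique(). A minimal sketch, using a made-up three-species excerpt (b2010_toy is a hypothetical name; the values are rounded from the head() output above):

```r
# Made-up excerpt of b2010: species in dataset order, present/future alternating
b2010_toy <- data.frame(
  species = factor(c("Tilia europaea", "Tilia europaea",
                     "Sorbus intermedia", "Sorbus intermedia",
                     "Tilia cordata", "Tilia cordata")),
  time = factor(rep(c("present", "future"), times = 3)),
  mean = c(0.637, -0.543, 0.674, -0.042, 0.695, -0.539)
)

# Default levels are alphabetical, which is what ggplot2 uses for the axis
default_levels <- levels(b2010_toy$species)

# Use the order of first appearance in the data instead
b2010_toy$species <- factor(b2010_toy$species,
                            levels = unique(as.character(b2010_toy$species)))

# Same idea for time: put "present" before "future"
b2010_toy$time <- factor(b2010_toy$time, levels = c("present", "future"))
```

With coord_flip() the first level ends up at the bottom of the axis, so wrapping the levels in rev() may be needed if the first species should appear at the top.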

Single instead of multiple boxplots with ggplot

I would like to make a boxplot for a variable (Theta..vol..) depending on two factors (Tiefe) and (Ort).
> str(data)
'data.frame': 30 obs. of 6 variables:
$ Nummer : int 1 2 3 4 5 6 7 8 9 10 ...
$ Name : int 11 12 13 14 15 16 17 18 19 20 ...
$ Ort : Factor w/ 2 levels "NNW","S": 2 2 2 2 2 2 2 2 2 2 ...
$ Tiefe : int 20 20 20 20 20 50 50 50 50 50 ...
$ Gerät : int 2 2 2 2 2 2 2 2 2 2 ...
$ Theta..vol..: num 15 16.4 14.9 16.6 10.6 22.1 17.6 10 18 20.3 ...
My code is:
ggplot(data, aes(x = Tiefe, y = Theta..vol.., fill=Ort))+geom_boxplot()
Since the variable (Tiefe) has 3 levels and the variable (Ort) has 2 levels, I expect to see three pairs of boxplots (one pair for each value of Tiefe).
But I see just a single pair (one boxplot for one level of "Ort" and another boxplot for the second level of "Ort").
What should I change to get three pairs for each "Tiefe"? Thank you
In your code, Tiefe is being read as an integer not a factor.
Easy fix using dplyr with ggplot2:
First I made some dummy data:
library(dplyr)
library(ggplot2)
data <- tibble(
Ort = ifelse(runif(30) > 0.5, "NNW", "S"),
Tiefe = rep(c(20, 50, 75), times = 10),
Theta..vol.. = rnorm(30,15))
Next, we modify the Tiefe column before piping into the ggplot:
data %>%
mutate(Tiefe = factor(Tiefe)) %>%
ggplot(aes(x = Tiefe, y = Theta..vol.., fill = Ort)) +
geom_boxplot()
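The conversion can also be done directly inside aes(), without dplyr. The sketch below uses made-up data (toy is a hypothetical name) to contrast the two cases: with an integer x, ggplot2 treats Tiefe as continuous and only splits the boxes by fill; with factor(Tiefe), there is one box per Tiefe and Ort combination.

```r
library(ggplot2)

# Made-up data with the same column names as in the question
toy <- data.frame(
  Ort = rep(c("NNW", "S"), each = 15),
  Tiefe = rep(c(20L, 50L, 75L), times = 10),
  Theta..vol.. = rep(c(15, 16.4, 14.9, 16.6, 10.6, 22.1), times = 5)
)

# Integer x: Tiefe is continuous, so the data are grouped by Ort only
p_int <- ggplot(toy, aes(x = Tiefe, y = Theta..vol.., fill = Ort)) +
  geom_boxplot()

# Factor x: one box for each Tiefe x Ort combination
p_fac <- ggplot(toy, aes(x = factor(Tiefe), y = Theta..vol.., fill = Ort)) +
  geom_boxplot() +
  labs(x = "Tiefe")
```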

ggplot2 time series with an ordered factor on the x-axis

I'd be extremely grateful for your assistance with the following issue.
I wish to create a representative time series for different subjects who have undertaken a test at discrete intervals. The data frame is called Hayling.Impulsivity. Here is a sample of the data in wide format:
  Subject Baseline 2-weeks 6-weeks 3-months
1       1       15      23       5       NA
2       2       15      27       3        4
3       3        5       7       0       19
4       4        1       5       2        6
5       5        3       7      18       27
6       6        0       2      19        2
I then made Subject a factor:
Hayling.Impulsivity$Subject<-factor(Hayling.Impulsivity$Subject)
I then melted the data frame into long format using the reshape2 package:
Long.H.I.<-melt(Hayling.Impulsivity, id.vars="Subject", variable.name="Follow Up", value.name="Hayling AB Error Score")
I then ordered the measurement variables:
Long.H.I.$"Follow Up"<-factor(Long.H.I.$"Follow Up", levels=c("Baseline", "2-weeks", "6-weeks", "3-months"), ordered=TRUE)
Here's the structure of this data frame:
'data.frame': 52 obs. of 3 variables:
$ Subject : Factor w/ 13 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
$ Follow Up : Ord.factor w/ 4 levels "Baseline"<"2-weeks"<..: 1 1 1 1 1 1 1 1 1 1 ...
$ Hayling AB Error Score: num 15 15 5 1 3 0 3 0 0 33 ...
Now I try to construct the time series in ggplot:
ggplot(Long.H.I., aes("Follow Up", "Hayling AB Error Score", group=Subject, colour=Subject))+geom_line()
But all I get is an empty plot. I'm not permitted to post an image to show you but the x and y axes are labelled only with "Follow Up" and "Hayling AB Error Score" respectively. There are no actual scales / values / categories on either axis and no points have been plotted.
Where have I gone wrong?
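The spaces in your column names are causing the problem: aes("Follow Up", ...) treats each quoted string as a constant rather than as a column name, and aes_string does not cope with the spaces either. You could replace the spaces with underscores and then label the x and y axes explicitly. The code could look like: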
Hayling.Impulsivity$Subject<-factor(Hayling.Impulsivity$Subject)
Long.H.I.<-melt(Hayling.Impulsivity, id.vars="Subject",
variable.name="Follow_Up", value.name="Hayling_AB_Error_Score")
Long.H.I.$Follow_Up <-factor(Long.H.I.$"Follow_Up",
levels=c("Baseline","2-weeks","6-weeks","3-months"), ordered=TRUE)
ggplot(Long.H.I., aes(Follow_Up, Hayling_AB_Error_Score, group=Subject, colour=Subject))+
geom_line() +
labs(x="Follow Up", y="Hayling AB Error Score")
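If you want to keep the original column names, another option is to refer to them with backticks inside aes(), which marks them as column names rather than string constants. A minimal sketch, using the rows for subjects 2 to 4 from the sample data (long_toy is a hypothetical name):

```r
library(ggplot2)

follow_up_levels <- c("Baseline", "2-weeks", "6-weeks", "3-months")

# Long-format excerpt that keeps the column names with spaces
long_toy <- data.frame(
  Subject = factor(rep(2:4, times = 4)),
  `Follow Up` = factor(rep(follow_up_levels, each = 3),
                       levels = follow_up_levels, ordered = TRUE),
  `Hayling AB Error Score` = c(15, 5, 1, 27, 7, 5, 3, 0, 2, 4, 19, 6),
  check.names = FALSE  # stop data.frame() from replacing the spaces with dots
)

# Backticks refer to the columns; plain quotes would be read as constants
p <- ggplot(long_toy,
            aes(`Follow Up`, `Hayling AB Error Score`,
                group = Subject, colour = Subject)) +
  geom_line()
```

Here the axis labels come from the column names, so the explicit labs() call is not needed.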

How do I get faceted barplot values to show as negative

I'm struggling a bit to make a vertically faceted barplot. I added a 'thus far' version of my work below. My main issue is that the negative values aren't showing as I'd expect. Shouldn't there be some line, or tick, indicating 0, with negative bars registering below it? The code below should be fully reproducible. You can see several negative values in the final data set I'm trying to plot. I'm getting a rather verbose error beginning with 'Mapping a variable to y and also using stat="bin".' I sense it's likely related to my issue, but I'm not able to find or derive a concrete solution.
Also, as secondary points, if anyone has any advice past the current snag, my end goal would be to color the negative bars red and the positive ones green, to add the 'spdrNames' to the y axis, to label the bars with the actual value, and to remove the illegible values from the x axis.
require('ggplot2')
require('reshape')
require('tseries')
spdrTickers = c('XLY','XLP','XLE','XLF','XLV','XLI','XLB','XLK','XLU')
spdrNames = c('Consumer Discretionary','Consumer Staples', 'Energy',
'Financials','Health Care','Industrials','Materials','Technology',
'Utilities')
latestDate =Sys.Date()
dailyPrices = lapply(spdrTickers, function(ticker) get.hist.quote(instrument= ticker, start = "2012-01-01",
end = latestDate, quote="Close", provider = "yahoo", origin="1970-01-01", compression = "d", retclass="zoo"))
perf5Day = lapply(dailyPrices, function(x){(x-lag(x,k=-5))/lag(x,k=-5)})
perf20Day = lapply(dailyPrices, function(x){(x-lag(x,k=-20))/lag(x,k=-20)})
perf60Day = lapply(dailyPrices, function(x){(x-lag(x,k=-60))/lag(x,k=-60)})
names(perf5Day) = spdrTickers
names(perf20Day) = spdrTickers
names(perf60Day) = spdrTickers
perfsMerged = lapply(spdrTickers, function(spdr){merge(perf5Day[[spdr]],perf20Day[[spdr]],perf60Day[[spdr]])})
perfNames = c('1Week','1Month','3Month')
perfsMerged = lapply(perfsMerged, function(x){
names(x)=perfNames
return(x)
})
latestDataPoints = t(sapply(perfsMerged, function(x){return(x[nrow(x)])}))
latestDataPoints = data.frame(cbind(spdrTickers,latestDataPoints))
names(latestDataPoints) = c('Ticker', '1Week','1Month','3Month')
drm = melt(latestDataPoints, id.vars=c('Ticker'))
names(drm) = c('Ticker','Period','Value')
p = ggplot(drm, aes(x=Ticker,y=Value)) + geom_bar() + coord_flip() + facet_grid(. ~ Period)
Yields this:
Somehow you have converted your Value-values to a factor:
str(drm)
'data.frame': 27 obs. of 3 variables:
$ Ticker: Factor w/ 9 levels "XLB","XLE","XLF",..: 9 6 2 3 8 4 1 5 7 9 ...
$ Period: Factor w/ 3 levels "1Week","1Month",..: 1 1 1 1 1 1 1 1 1 2 ...
$ Value : Factor w/ 27 levels "0.0164396430248944",..: 2 4 5 1 8 3 7 6 9 11 ...
Probably happens here:
latestDataPoints = data.frame(cbind(spdrTickers,latestDataPoints))
> str( latestDataPoints )
'data.frame': 9 obs. of 4 variables:
$ Ticker: Factor w/ 9 levels "XLB","XLE","XLF",..: 9 6 2 3 8 4 1 5 7
$ 1Week : Factor w/ 9 levels "0.0164396430248944",..: 2 4 5 1 8 3 7 6 9
$ 1Month: Factor w/ 9 levels "-0.00139291932675571",..: 2 3 1 5 8 4 6 7 9
$ 3Month: Factor w/ 9 levels "-0.0110357512357742",..: 3 2 1 5 9 6 7 8 4
Since just before that step you had a numeric matrix from: t(sapply(perfsMerged, function(x){return(x[nrow(x)])}))
Then doing this:
latestDataPoints[2:4] <- lapply( latestDataPoints[2:4], function(x)
as.numeric(as.character(x)) )
drm = melt(latestDataPoints, id.vars=c('Ticker'))
names(drm) = c('Ticker','Period','Value')
p = ggplot(drm, aes(x=Ticker,y=Value)) + geom_bar() + coord_flip() +
facet_grid(. ~ Period)
png();print(p);dev.off()
Produces:
The construction data.frame(cbind(...)) is a real trap. I've seen it used by supposedly authoritative sources, and it is a recurrent source of puzzlement. I think R would be a safer language to use if the interpreter would simply highlight that combination in red (along with as.numeric applied to factors). When you cbind a character vector to a numeric matrix, you get an all-character matrix.
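The trap, and one way around it, can be sketched as follows. The tickers are taken from the question, but the returns are made-up numbers, and geom_col() (ggplot2 2.2.0 or later) is swapped in for geom_bar() because it plots the y values as given. Mapping fill to the sign of the value is one possible way to get the red and green bars asked about in the question.

```r
library(ggplot2)

tickers <- c("XLY", "XLP", "XLE")
# Made-up returns, one row per ticker and one column per period
perf <- matrix(c(0.016, -0.004, 0.021,
                 -0.001, 0.012, -0.030,
                 0.050, -0.011, 0.020),
               nrow = 3,
               dimnames = list(NULL, c("1Week", "1Month", "3Month")))

# The trap: cbind() coerces everything to character before data.frame() sees it
bad <- data.frame(cbind(tickers, perf))

# Passing the pieces to data.frame() separately keeps the numbers numeric
good <- data.frame(Ticker = tickers, perf, check.names = FALSE)

# Long format built by hand so that Value stays numeric
long <- data.frame(
  Ticker = rep(tickers, times = ncol(perf)),
  Period = factor(rep(colnames(perf), each = nrow(perf)), levels = colnames(perf)),
  Value = as.vector(perf)
)

p <- ggplot(long, aes(x = Ticker, y = Value, fill = Value >= 0)) +
  geom_col() +
  scale_fill_manual(values = c(`TRUE` = "darkgreen", `FALSE` = "red"),
                    guide = "none") +
  coord_flip() +
  facet_grid(. ~ Period)
```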

ggplot2: overlay control group line on graph panel set

I have a stacked areaplot made with ggplot2:
dists.med.areaplot<-qplot(starttime,value,fill=dists,facets=~groupname,
geom='area',data=MDist.median, stat='identity') +
labs(y='median distances', x='time(s)', fill='Distance Types')+
opts(title=subt) +
scale_fill_brewer(type='seq') +
facet_wrap(~groupname, ncol=2) + grect #grect adds the grey/white vertical bars
It looks like this:
I want to add an overlay of the profile of the control graph (bottom right) to all the graphs in the output (groupname==rowH is the control).
So far my best efforts have yielded this:
cline<-geom_line(aes(x=starttime,y=value),
data=subset(dists.med,groupname=='rowH'),colour='red')
dists.med.areaplot + cline
I need the 3 red lines to be 1 red line that skims the top of the dark blue section. And I need that identical line (the rowH line) to overlay each of the panels.
The dataframe looks like this:
> str(MDist.median)
'data.frame': 2880 obs. of 6 variables:
$ groupname: Factor w/ 8 levels "rowA","rowB",..: 1 1 1 1 1 1 1 1 1 1 ...
$ fCycle : Factor w/ 6 levels "predark","Cycle 1",..: 1 1 1 1 1 1 1 1 1 1 ...
$ fPhase : Factor w/ 2 levels "Light","Dark": 2 2 2 2 2 2 2 2 2 2 ...
$ starttime: num 0.3 60 120 180 240 300 360 420 480 540 ...
$ dists : Factor w/ 3 levels "inadist","smldist",..: 1 1 1 1 1 1 1 1 1 1 ...
$ value : num 110 117 115 113 114 ...
The red line should be calculated as the sum of the value at each starttime, where groupname='rowH'. I have tried creating cline the following ways. Each results in an error or incorrect output:
#sums the entire y for all points and makes horizontal line
cline<-geom_line(aes(x=starttime,y=sum(value)),data=subset(dists.med,groupname=='rowH'),colour='red')
#using related dataset with pre-summed y's
> cline<-geom_line(aes(x=starttime,y=tot_dist),data=subset(t.med,groupname=='rowH'))
> dists.med.areaplot + cline
Error in eval(expr, envir, enclos) : object 'dists' not found
Thoughts?
ETA:
It appears that the issue I was having with 'dists' not found has to do with the fact that the initial plot, dists.med.areaplot was created via qplot. To avoid this issue, I can't build on a qplot. This is the code for the working plot:
cline.data <- subset(
ddply(MDist.median, .(starttime, groupname), summarize, value = sum(value)),
groupname == "rowH")
cline<-geom_line(data=transform(cline.data,groupname=NULL), colour='red')
dists.med.areaplot<-ggplot(MDist.median, aes(starttime, value)) +
grect + nogrid +
geom_area(aes(fill=dists),stat='identity') +
facet_grid(~groupname)+ scale_fill_brewer(type='seq') +
facet_wrap(~groupname, ncol=2) +
cline
resulting in this graphset:
This Learning R blog post should be of some help:
http://learnr.wordpress.com/2009/12/03/ggplot2-overplotting-in-a-faceted-scatterplot/
It might be worth computing the summary outside of ggplot with plyr.
cline.data <- ddply(MDist.median, .(starttime, groupname), summarize, value = sum(value))
cline.data.subset <- subset(cline.data, groupname == "rowH")
Then add it to the plot with
last_plot() + geom_line(data = transform(cline.data.subset, groupname = NULL), color = "red")
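The reason groupname is dropped from the overlay data is that a layer whose data lacks the faceting variable is repeated in every panel. A self-contained sketch of that behaviour, with made-up values (toy and control are hypothetical names; only the column names and a few factor levels come from the question):

```r
library(ggplot2)

# Made-up stand-in for MDist.median: 3 groups, 2 distance types, 5 time points
toy <- expand.grid(starttime = seq(0, 240, 60),
                   dists = c("inadist", "smldist"),
                   groupname = c("rowA", "rowB", "rowH"))
toy$value <- seq_len(nrow(toy))

# Total per time point and group, then keep only the control group
control <- aggregate(value ~ starttime + groupname, data = toy, FUN = sum)
# Dropping groupname is what makes the line appear in every facet
control <- control[control$groupname == "rowH", c("starttime", "value")]

p <- ggplot(toy, aes(starttime, value)) +
  geom_area(aes(fill = dists)) +
  geom_line(data = control, colour = "red") +
  facet_wrap(~groupname, ncol = 2)
```

In the rowH panel the red line should follow the top of the stacked area, since both are the sum of value at each starttime.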
