How do I get faceted barplot values to show as negative - r

I'm struggling a bit to make a vertically faceted barplot. I added a 'thus far' version of my work below. My main issue is that the negative values aren't showing as I'd expect. Shouldn't there be some line, or tick, indicating 0, with negative bars registering below it? The code below should be fully reproducible. You can see several negative values in the final data set I'm trying to plot. I'm getting a rather verbose error beginning with 'Mapping a variable to y and also using stat="bin".' I sense it's likely related to my issue, but I'm not able to find or derive a concrete solution.
Also, as secondary points, if anyone has any advice past the current snag, my end goal is to color the negative bars red and the positive ones green, to add the 'spdrNames' to the y axis, to label the bars with the actual value, and to remove the illegible values from the x axis.
require('ggplot2')
require('reshape')
require('tseries')
spdrTickers = c('XLY','XLP','XLE','XLF','XLV','XLI','XLB','XLK','XLU')
spdrNames = c('Consumer Discretionary','Consumer Staples', 'Energy',
'Financials','Health Care','Industrials','Materials','Technology',
'Utilities')
latestDate =Sys.Date()
dailyPrices = lapply(spdrTickers, function(ticker) get.hist.quote(instrument= ticker, start = "2012-01-01",
end = latestDate, quote="Close", provider = "yahoo", origin="1970-01-01", compression = "d", retclass="zoo"))
perf5Day = lapply(dailyPrices, function(x){(x-lag(x,k=-5))/lag(x,k=-5)})
perf20Day = lapply(dailyPrices, function(x){(x-lag(x,k=-20))/lag(x,k=-20)})
perf60Day = lapply(dailyPrices, function(x){(x-lag(x,k=-60))/lag(x,k=-60)})
names(perf5Day) = spdrTickers
names(perf20Day) = spdrTickers
names(perf60Day) = spdrTickers
perfsMerged = lapply(spdrTickers, function(spdr){merge(perf5Day[[spdr]],perf20Day[[spdr]],perf60Day[[spdr]])})
perfNames = c('1Week','1Month','3Month')
perfsMerged = lapply(perfsMerged, function(x){
names(x)=perfNames
return(x)
})
latestDataPoints = t(sapply(perfsMerged, function(x){return(x[nrow(x)])}))
latestDataPoints = data.frame(cbind(spdrTickers,latestDataPoints))
names(latestDataPoints) = c('Ticker', '1Week','1Month','3Month')
drm = melt(latestDataPoints, id.vars=c('Ticker'))
names(drm) = c('Ticker','Period','Value')
p = ggplot(drm, aes(x=Ticker,y=Value)) + geom_bar() + coord_flip() + facet_grid(. ~ Period)
Yields this:

Somehow you have converted your Value-values to a factor:
str(drm)
'data.frame': 27 obs. of 3 variables:
$ Ticker: Factor w/ 9 levels "XLB","XLE","XLF",..: 9 6 2 3 8 4 1 5 7 9 ...
$ Period: Factor w/ 3 levels "1Week","1Month",..: 1 1 1 1 1 1 1 1 1 2 ...
$ Value : Factor w/ 27 levels "0.0164396430248944",..: 2 4 5 1 8 3 7 6 9 11 ...
Probably happens here:
latestDataPoints = data.frame(cbind(spdrTickers,latestDataPoints))
> str( latestDataPoints )
'data.frame': 9 obs. of 4 variables:
$ Ticker: Factor w/ 9 levels "XLB","XLE","XLF",..: 9 6 2 3 8 4 1 5 7
$ 1Week : Factor w/ 9 levels "0.0164396430248944",..: 2 4 5 1 8 3 7 6 9
$ 1Month: Factor w/ 9 levels "-0.00139291932675571",..: 2 3 1 5 8 4 6 7 9
$ 3Month: Factor w/ 9 levels "-0.0110357512357742",..: 3 2 1 5 9 6 7 8 4
Since just before that step you had a numeric matrix from: t(sapply(perfsMerged, function(x){return(x[nrow(x)])}))
Then doing this:
latestDataPoints[2:4] <- lapply( latestDataPoints[2:4], function(x)
as.numeric(as.character(x)) )
drm = melt(latestDataPoints, id.vars=c('Ticker'))
names(drm) = c('Ticker','Period','Value')
# stat="identity" plots the values as given instead of counting rows
p = ggplot(drm, aes(x=Ticker,y=Value)) + geom_bar(stat="identity") + coord_flip() +
facet_grid(. ~ Period)
png();print(p);dev.off()
Produces:
The construction data.frame(cbind(...)) is a real trap. I've seen it used by supposedly authoritative sources and it is a recurrent source of puzzlement. I think R would be a safer language to use if the interpreter would simply highlight that combination in red (along with as.numeric applied to factors). When you cbind a character vector to a numeric matrix, you get an all-character matrix.
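The coercion described above can be reproduced in a few lines of base R. This is only a sketch: the numbers are made up rather than real returns, and stringsAsFactors = TRUE is passed explicitly on the assumption that the question was run on an R version where that was the default.

```r
tickers <- c("XLY", "XLP", "XLE")
# Hypothetical returns standing in for the matrix built by t(sapply(...))
returns <- matrix(c(0.012, -0.004, 0.021,
                    0.030, -0.001, 0.015),
                  nrow = 3,
                  dimnames = list(NULL, c("1Week", "1Month")))

# cbind() of a character vector and a numeric matrix gives a character matrix
trap <- cbind(tickers, returns)
is.character(trap)   # TRUE

# data.frame() then turns every character column into a factor
bad <- data.frame(trap, stringsAsFactors = TRUE)
sapply(bad, class)

# as.numeric() on a factor returns the level codes, not the values
as.numeric(bad[[2]])
# going through as.character() first recovers the numbers
as.numeric(as.character(bad[[2]]))

# Building the data frame directly keeps the numeric columns numeric
good <- data.frame(Ticker = tickers, returns, check.names = FALSE)
sapply(good, class)
```

Building the data frame directly, as in the last step, avoids the round trip through character altogether.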

Related

Where should I reorder in a bar graph so that the bar groups follow the same sequence as the data frame?

I have a dataframe like this:
> str(mydata6)
'data.frame': 6 obs. of 4 variables:
$ Comparison : Factor w/ 6 levels "Decreased_Adult",..: 5 2 6 3 4 1
$ differential_IR_number: num 446 305 965 599 1799 ...
$ Stage : Factor w/ 3 levels "AdultvsE11","E14vsE11",..: 2 2 3 3 1 1
$ Change : Factor w/ 2 levels "Decrease","Increase": 2 1 2 1 2 1
Columns 1, 3, and 4 are factors and column 2 is numeric.
I used the following code to do a bargraph:
ggplot(mydata6, aes(x=Stage, y=differential_IR_number, fill=Change)) + #don't need to use "" for x= and y, comparing to the above code
geom_bar(stat = "identity", position = "stack") + #using stack to make decrease and increase stack with each other
theme(axis.text.x = element_text(angle = 90, hjust = 1)) + #using theme function to change the labeling to be vertical
geom_text(aes(label=differential_IR_number), position=position_stack(vjust=0.5))
The result is the following:
But I want the order to be E14vsE11, E18vsE11, and AdultvsE11. I tried to reorder/sort at different positions, but nothing works.
Why does it not follow the order of my data frame?
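ggplot2 draws the bars in the order of the factor's levels, which by default are sorted alphabetically, not in the order of the rows in the data frame. You can set the order you want as follows: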
mydata6$Stage <- factor(mydata6$Stage, levels = c("E14vsE11", "E18vsE11", "AdultvsE11"))
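The difference between row order and level order can be checked without plotting. The sketch below uses a small stand-in vector `stage` (not the real mydata6) and also shows one possible way to take the levels from the order of first appearance in the data instead of typing them out.

```r
# Stand-in for mydata6$Stage, in the row order of the data frame
stage <- factor(c("E14vsE11", "E14vsE11", "E18vsE11",
                  "E18vsE11", "AdultvsE11", "AdultvsE11"))

# Default levels are sorted alphabetically, regardless of row order
levels(stage)

# Option 1: spell the levels out explicitly
stage_explicit <- factor(stage,
                         levels = c("E14vsE11", "E18vsE11", "AdultvsE11"))

# Option 2: use the order in which the values first appear in the data
stage_as_data <- factor(stage, levels = unique(as.character(stage)))

levels(stage_explicit)
levels(stage_as_data)
```

Both options leave the values themselves unchanged; only the level order, and therefore the axis order, differs.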

How to change order for pyramid plots with ggplot2 to dataset order?

I have a dataset with climate suitability values (0-1) for tree species for both present and future.
I would like to visualise the data in a pyramid plot with the ggplot2 package, with present displayed on the left side of the plot, future on the right side, and the tree species in the order given in my raw dataset.
b2010<-read.csv("csi_before2010_abund_order.csv",header=T,sep = ";")
str(b2010)
'data.frame': 20 obs. of 7 variables:
$ species: Factor w/ 10 levels "Acer platanoides",..: 9 9 7 7 8 8 6 6 5 5 ...
$ time : Factor w/ 2 levels "future","present": 2 1 2 1 2 1 2 1 2 1 ...
$ grid1 : num 0.6001 0.5945 0.6366 0.0424 0.6941 ...
$ grid2 : num 0.6399 0.5129 0.6981 0.0399 0.711 ...
$ grid3 : num 0.6698 0.5212 0.6863 0.0446 0.6795 ...
$ mean : num 0.6366 0.5429 0.6737 0.0423 0.6949 ...
$ group : Factor w/ 1 level "before 2010": 1 1 1 1 1 1 1 1 1 1 ...
b2010$mean = ifelse(b2010$time == "future", b2010$mean * -1,b2010$mean)
head(b2010)
species time grid1 grid2 grid3 mean group
1 Tilia europaea present 0.60009009 0.63990200 0.66975713 0.63658307 before 2010
2 Tilia europaea future 0.59452874 0.51294094 0.52115256 -0.54287408 before 2010
3 Sorbus intermedia present 0.63659602 0.69813931 0.68629903 0.67367812 before 2010
4 Sorbus intermedia future 0.04242327 0.03990654 0.04460707 -0.04231229 before 2010
5 Tilia cordata present 0.69414478 0.71097034 0.67950863 0.69487458 before 2010
6 Tilia cordata future 0.55790818 0.53918493 0.51979470 -0.53896260 before 2010
ggplot(b2010, aes(x = factor(species), y = mean, fill = time)) +
geom_bar(stat = "identity") +
facet_share(~time, dir = "h", scales = "free", reverse_num = T) +
coord_flip()
Now, future and present are in the wrong order and also the species are ordered alphabetically, even though they are clearly "factors" and should therefore be ordered according to my dataset. I would very much appreciate your help.
Thank you and kind regards
You are misunderstanding how factors work. Bars are plotted in the order as printed by levels(b2010$species). In order to change this order, you'll have to manually reorder them, i.e.
b2010$species <- factor(b2010$species,
levels = c("Sorbus intermedia", "Tilia cordata", ...))
The levels can also be derived from some statistic, e.g. the mean. Subset to the "present" rows first and then sort them (combining order() with a logical condition via & would not sort anything):
present <- b2010[b2010$time == "present", ]
myorder <- as.character(present$species[order(present$mean)])
b2010$species <- factor(b2010$species, levels = myorder)
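Both orderings asked about (dataset order, and order by a statistic) can be sketched on a small made-up data frame. The name `toy` and its values are assumptions for illustration, not the real b2010 data.

```r
# Made-up stand-in for b2010; future means are already negated
toy <- data.frame(
  species = c("Tilia europaea", "Tilia europaea",
              "Sorbus intermedia", "Sorbus intermedia",
              "Tilia cordata", "Tilia cordata"),
  time = rep(c("present", "future"), times = 3),
  mean = c(0.64, -0.54, 0.67, -0.04, 0.60, -0.54),
  stringsAsFactors = TRUE
)

levels(toy$species)   # alphabetical by default

# (a) levels in the order the species first appear in the data
in_data_order <- factor(toy$species,
                        levels = unique(as.character(toy$species)))

# (b) levels sorted by the mean of the "present" rows
present <- toy[toy$time == "present", ]
by_mean <- factor(toy$species,
                  levels = as.character(present$species[order(present$mean)]))

# The same idea puts "present" before "future"
toy$time <- factor(toy$time, levels = c("present", "future"))
```

With coord_flip() the first level ends up at the bottom of the axis, so wrapping the levels in rev() may be needed to read the dataset order from top to bottom.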

How do you add jitter to a scatterplot matrix in ggpairs?

I want to add jitter to a scatterplot matrix. The question was addressed on the following page (and nowhere else) on stackoverflow:
How to produce a meaningful draftsman/correlation plot for discrete values
But both solutions to the jitter problem which were suggested there involve deprecated code (plotmatrix and params):
library(ggplot2)
plotmatrix(y) + geom_jitter(alpha = .2)
library(GGally)
ggpairs(y, lower = list(params = c(alpha = .2, position = "jitter")))
I would have simply commented asking for an update there so as to not create a new question, but that appears to require reputation points, and I'm new to the site. My apologies if I've done something wrong in posting the question.
EDIT:
Here's what the data looks like:
> str(EHRound4.subset)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 301 obs. of 22 variables:
$ Subject# : int 1 2 3 4 6 7 8 13 14 16 ...
$ Condition : Factor w/ 2 levels "CDR","Mturk": 1 1 1 1 1 1 1 1 1 1 ...
$ Launch4 : int 5 8 8 5 8 5 3 8 5 6 ...
$ NewSong4 : int 6 8 8 6 8 6 8 8 8 7 ...
$ StudCom5 : int 6 5 8 3 1 3 4 8 7 7 ...
$ Textbook5 : int 8 1 8 3 1 7 8 8 8 8 ...
And here are several attempts at getting jitter:
> ggpairs(EHRound4.subset, columns = 3:6,
ggplot2::aes(colour=Condition), lower = list(geom_jitter(alpha = .2)))
> ggpairs(EHRound4.subset, columns = 3:6,
ggplot2::aes(colour=Condition, alpha=.2), lower = list(geom_jitter()))
> ggpairs(EHRound4.subset, columns = 3:6,
ggplot2::aes(colour=Condition, alpha=.2, position="jitter"))
@user20650 answered the question in comments below the question. For completeness, here it is in the form of an answer:
Use wrap, such as:
library(GGally)
ggpairs(y, lower = list(continuous=wrap("points", position=position_jitter(height=3, width=3))))
By using position = position_jitter() instead of just position = "jitter" (which also works) the additional jitter parameters can also be controlled.

annotate a faceted geom_point

Using ggplot2, I would like to annotate my faceted geom_point plots: I am plotting some data per plant for 2 parameters, and I would like to annotate each facet with the population size of each plant that makes up the plot. Below is an example similar to my data.
Let's subset the CO2 dataset to make the example more relevant. I count the number of observations per plant for which the uptake is above 20 and rename the column:
require(plyr)
require(dplyr)
require(ggplot2)
CO2_mod<-subset(CO2,uptake>20)
COUNT<-ddply(.data=CO2_mod,
.variable=.(Plant,Treatment),
.fun=count)
names(COUNT)[3] <- c("PopSize")
Here is the code for faceted plots based on treatments:
p1<-ggplot(CO2_mod, aes(x=Plant, y=uptake))
p2<-p1+geom_point(aes())+
facet_grid(Treatment~., scales="free")
p2
Now I would like to annotate each faceted plot with the PopSize value per Plant and per Treatment from the COUNT df.
I have tried this code without success:
y<-max(CO2_mod$uptake)+1
COUNT<-mutate(COUNT,y=paste0(y))
p2<-p1+geom_point(aes())+
facet_grid(Treatment~., scales="free")+
geom_text(data=COUNT, aes(x=Plant, y=y, label=PopSize),
colour="black")
p2
The error message says: Error: Discrete value supplied to continuous scale
What would be the right way to do this?
thanks!
Inspecting COUNT shows that y is a character vector (paste0() always returns character), which ggplot2 treats as a discrete variable:
str(COUNT)
# 'data.frame': 10 obs. of 4 variables:
# $ Plant : Ord.factor w/ 12 levels "Qn1"<"Qn2"<"Qn3"<..: 1 2 3 4 5 6 7 8 9 12
# $ Treatment: Factor w/ 2 levels "nonchilled","chilled": 1 1 1 2 2 2 1 1 1 2
# $ PopSize : int 6 6 6 6 6 6 5 6 5 2
# $ y : chr "46.5" "46.5" "46.5" "46.5" ...
If we modify COUNT so that y is numeric:
COUNT<-mutate(COUNT,y=as.numeric(y))
we get this plot:
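The type change behind the error can also be checked without plotting. A minimal sketch in base R, using the built-in CO2 data and the same threshold as the question (the names y_chr and y_num are only for illustration):

```r
# Same label height as in the question: one unit above the largest uptake
y <- max(CO2$uptake[CO2$uptake > 20]) + 1

# paste0() converts the number to a character string ...
y_chr <- paste0(y)
class(y_chr)

# ... which a continuous y scale cannot use; as.numeric() converts it back
y_num <- as.numeric(y_chr)
class(y_num)
```

Skipping paste0() in the first place and storing y directly would avoid the conversion.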

R: ggplot: plot shows vertical lines instead of time course

I am trying to get a simple plot showing the time course of worry duration over 6 days for two groups. However, I get vertical lines instead of a line showing the time course.
This is what my data looks like:
> head(alldays_dur)
ParticipantID Session Day Time Worry_duration group
1 1 2 1 71804 15 intervention
2 1 4 1 56095 5 intervention
3 2 2 1 36739 15 intervention
4 2 4 1 45013 10 intervention
5 2 5 1 51026 5 intervention
This is the structure of my data
> str(alldays_dur)
'data.frame': 2620 obs. of 10 variables:
$ ParticipantID : num 113 113 113 113 113 113 113 113 113 113 ...
$ Session : num 9 10 11 12 14 15 16 21 22 24 ...
$ Day : Factor w/ 6 levels "1","2","3","4",..: 2 2 2 2 2 2 2 3 3
$ Time : num 37350 42862 47952 51555 61499 ...
$ Worry_duration: num 5 5 5 5 10 0 5 5 5 5 ...
$ group : Factor w/ 2 levels "Intervention group",..: 1 1 1 1 1 1
I have tried the following code:
p <- ggplot(alldays_dur, aes(x=Day, y=Worry_duration, group=1)) +
geom_line() +
labs(x = "Day",
y = "Mean worry duration in minutes per day")
print(p)
However, I get the following plot:
I have included the group=1 in the code after reading some earlier posts on this topic. However, it didn't help me as I had hoped.
Do you maybe have some useful tips for me? Thank you in advance.
Ps. I am sorry if the post is unclear in any way, this is my first time ever posting on stackoverflow, so I am not quite familiar with all the 'post-options' yet.
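Your data contain many observations per day, so geom_line() connects all of them within each day, which produces the vertical lines. You need to summarize your data first, with ddply for example: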
require(plyr) # ddply
require(ggplot2) # ggplot
# Creating dataset
raw_data = data.frame(Day = sample(c(1:6),100, replace = T),
group = sample(c("group_1", "group_2"),100, replace = T),
Worry_duration = sample(seq(0,30,5), 100, replace = T))
# Summarize
DF = ddply(raw_data, c("Day", "group"), summarize,
Worry_duration.mean = mean(Worry_duration, na.rm = T))
# Plot
ggplot(DF, aes(x = Day, y = Worry_duration.mean, group = group, color = group)) +
geom_line()+ xlab("Day") + ylab("Mean worry duration in minutes per day")
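If plyr is not available, the same summary can be sketched with base R's aggregate(). The data are simulated in the same way as above (with a seed added for reproducibility), so the column names are taken from this answer and not from the real alldays_dur data frame.

```r
set.seed(1)
# Simulated data: many observations per Day and group
raw_data <- data.frame(
  Day = sample(1:6, 100, replace = TRUE),
  group = sample(c("group_1", "group_2"), 100, replace = TRUE),
  Worry_duration = sample(seq(0, 30, 5), 100, replace = TRUE)
)

# Several rows share the same Day and group, which is why a line drawn
# through the raw data jumps up and down within each day
table(raw_data$Day, raw_data$group)

# One mean per Day and group
daily_means <- aggregate(Worry_duration ~ Day + group,
                         data = raw_data, FUN = mean)
daily_means
```

The resulting daily_means has at most one row per Day and group, so it can be passed to ggplot() with group = group in the same way as DF. If Day is stored as a factor, as in the question, the group aesthetic is what tells geom_line() which points to connect.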
