Vertical line chart - change line plotting direction to top-down in R

I am looking for a way to connect data points in top-down order to visualize a ranking. The y-axis represents the rank and the x-axis the attributes. With the default settings the line connects the points from left to right, so the points are connected in the wrong order.
With the data below, the line should go from (6,1) to (4,2), then to (5,3), etc. Ideally the ranking scale should also be inverted so that rank 1 is at the top.
data <- read.table(header=TRUE, text='
attribute rank
1 6
2 5
3 4
4 2
5 3
6 1
7 7
8 11
9 10
10 8
11 9
')
plot(data$attribute,data$rank,type="l")
Is there a way to change the line drawing direction? My second idea would be to rotate the graph or maybe you have better ideas.
The graph I am trying to achieve is somewhat similar to this one:
example vertical line chart

You can do this with ggplot:
library(ggplot2)
ggplot(data, aes(y = attribute, x = rank)) +
geom_line() +
coord_flip() +
scale_x_reverse()
It solves the problem exactly the way you suggested. The first part of the command (ggplot(...) + geom_line()) creates an "ordinary" line plot. Note that I have already switched the x- and y-coordinates, so rank is mapped to x; geom_line() connects points in order of x, which means they are joined in rank order. The next command (coord_flip()) flips the x- and y-axes, and the last one (scale_x_reverse()) reverses the x-axis (which is now drawn vertically) so that rank 1 is at the top.
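For comparison, the same idea can be sketched in base R, without ggplot2. This is only a minimal sketch, assuming the data from the question: the rows are sorted by rank so that the line is drawn in rank order, and the y-axis limits are reversed so that rank 1 ends up at the top. The name by_rank is introduced here for illustration.

```r
# Same data as in the question (attribute, rank).
data <- data.frame(
  attribute = 1:11,
  rank      = c(6, 5, 4, 2, 3, 1, 7, 11, 10, 8, 9)
)

# Sort rows by rank; type = "l" joins points in row order.
by_rank <- data[order(data$rank), ]

# A reversed ylim puts rank 1 at the top of the plot.
plot(by_rank$attribute, by_rank$rank, type = "l",
     ylim = rev(range(by_rank$rank)),
     xlab = "attribute", ylab = "rank")
points(by_rank$attribute, by_rank$rank)
```

The line now runs from attribute 6 (rank 1) to attribute 4 (rank 2) to attribute 5 (rank 3), as requested in the question.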
Just to show you that something like the example you linked in your question can be done with ggplot2, I add the following example:
library(tidyr)
data$attribute2 <- sample(data$attribute)
data$attribute3 <- sample(data$attribute)
plot_data <- pivot_longer(data, cols = -"rank")
ggplot(plot_data, aes(y = value, x = rank, colour = name)) +
geom_line() +
geom_point() +
coord_flip() +
scale_x_reverse()
If you intend to do your plots with R, learning ggplot2 is really worthwhile. You can find many examples on Cookbook for R.

Related

Log scale on bar plot breaks axis values [duplicate]

I'm making the following bar plot with ggplot:
df %>% ggplot( aes(x= group,y= cases,fill=color ) ) +
geom_bar(stat="identity") +
theme_minimal()
Which gives the following result:
The issue is that the smaller colors are not visible, hence I tried to use a log scale:
df %>% ggplot( aes(x= group,y= cases,fill=color ) ) +
geom_bar(stat="identity") +
scale_y_log10(labels = comma) +
theme_minimal()
But this completely broke the scales: now I'm getting a 10 MM value from nowhere and the bar sizes are wrong.
The data I'm using for this is the following:
index,group,color,cases
1,4,4,9
2,4,3,61
3,1,1,5000
4,4,2,138
5,4,1,246
6,3,1,359
7,2,1,2000
8,3,2,57
9,1,2,153
10,2,2,130
11,2,3,15
12,1,3,23
13,3,3,11
14,2,4,1
TL;DR: You cannot and should not use a log scale with a stacked barplot. If you want to use a log scale, use a "dodged" barplot instead. You'll also have better luck using geom_col instead of geom_bar here and setting your fill= variable as a factor.
geom_col vs. geom_bar
Try using geom_col in place of geom_bar. You can use coord_flip() if the direction is not to your liking. See here for reference, but the gist of the issue is that geom_bar should be used when you want to plot against "count", and geom_col should be used when you want to plot against "values". Here, your y-axis is "cases" (a value), so use geom_col.
The Problem with log scales and Stacked Barplots
With that being said, u/Dave2e is absolutely correct. The plot you are getting makes sense, because the underlying math being done to calculate the y-axis values is: log10(x) + log10(y) + log10(z) instead of what you expected, which was log10(x + y + z).
Let's use the numbers in your actual data frame for comparison here. In "group 1", you have the following:
index group color cases
3 1 1 5000
9 1 2 153
12 1 3 23
On a linear y-axis (no log scale), the total height of a stacked bar is the sum of its segments. In other words:
> 5000 + 153 + 23
[1] 5176
This means that each of the bars represents the correct relative size, and when you add them up (or stack them up), the total size of the bar is equivalent to the total sum. Makes sense.
Now consider the same case, but for a log10 scale:
> log10(5000) + log10(153) + log10(23)
[1] 7.245389
That is 10^7.245389, or about 17.6 million on the original scale, which is where the unexpected 10 MM axis value comes from. The total height of the bar is still the sum of all individual bars (because that's what a stacked barplot is), and you can still compare the relative sizes, but the sum of the individual logs does not equal the log of the sum:
> log10(5000 + 153 + 23)
[1] 3.713994
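The arithmetic above can be checked directly. The following is a small sketch using the three "group 1" values from the question; the variable names are introduced here for illustration. It shows that stacking on a log axis effectively multiplies the values instead of adding them.

```r
cases <- c(5000, 153, 23)   # group 1 from the question's data

stacked_log <- sum(log10(cases))   # what stacking on a log axis adds up
true_log    <- log10(sum(cases))   # what the reader expects to see

10^stacked_log   # the product of the values, 5000 * 153 * 23
10^true_log      # the sum of the values, 5000 + 153 + 23
```

Because log10(x) + log10(y) + log10(z) = log10(x * y * z), the top of the stacked bar lands at the product of the segment values, not their sum.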
Suggested Way to Change your Plot
Moral of the story: you can still use a log scale to "stretch out" the small bars, but don't stack them. Use position='dodge':
df %>% ggplot( aes(x= group,y= log10(cases),fill=as.factor(color) ) ) +
geom_col(position='dodge') +
theme_minimal()
Finally, position='dodge' (or position=position_dodge(width=...)) does not work with fill=color, since df$color is not a factor (it's numeric). This is also why your legend is showing a gradient for a categorical variable. That's why I used as.factor(color) in the ggplot call here, although you can also just apply that to the original dataset with df$color <- as.factor(df$color) and do the same thing.

Overlay two bar plots with geom_bar()

I'm trying to overlay two bar plots on top of each other, not beside.
The data is from the same dataset. I want 'Block' on the x-axis and 'Start' and 'End' as overlaying bar plots.
Block Start End
1 P1L 76.80 0.0
2 P1S 68.87 4.4
3 P2L 74.00 0.0
4 P2S 74.28 3.9
5 P3L 82.22 7.7
6 P3S 80.82 17.9
My script is
ggplot(data=NULL,aes(x=Block))+
geom_bar(data=my_data$Start,stat="identity",position ="identity",alpha=.3,fill='lightblue',color='lightblue4')+
geom_bar(data=my_data$End,stat="identity",position ="identity",alpha=.8,fill='pink',color='red')
I get Error: ggplot2 doesn't know how to deal with data of class numeric
I've also tried
ggplot(my_data,aes(x=Block,y=Start))+
geom_bar(data=my_data$End, stat="identity",position="identity",...)
Anyone know how I can make it happen? Thank you.
Edit:
How to get dodge overlaying bars?
I'm editing this post because my next question is related: it is the opposite of my original problem.
@P.merkle
I had to change my plot into four bars showing the mean values of all Blocks labeled L and S. L stands for littoral and S for sublittoral. They were exposed to two treatments: normal and reduced.
I've calculated the means and their standard deviations.
I need four bars with their respective error bars:
Normal/Littoral , Reduced/Littoral , Normal/Sublittoral , Reduced/Sublittoral.
The problem is that when I plot it, both littoral bars and both sublittoral bars overlay each other! So now I want them not to overlap!
How can I make it happen? I've tried all sorts of position = 'dodge' and position = position_dodge(newdata$Force), without luck...
My newdata contain this information:
Zonation Force N mean sd se
1 Litoral Normal 6 0.000000 0.000000 0.000000
2 Litoral Redusert 6 5.873333 3.562868 1.454535
3 Sublitoral Normal 6 7.280000 2.898903 1.183472
4 Sublitoral Redusert 6 21.461667 4.153535 1.695674
My script is this:
ggplot(data=cdata,aes(x=newdata$Force,y=newdata$mean))+
geom_bar(stat="identity",position ="dodge",
alpha=.4,fill='red', color='lightblue4',width = .6)+
geom_errorbar(aes(ymin=newdata$mean-sd,ymax=newdata$mean+sd),
width=.2, position=position_dodge(.9))
The outcome is unfortunately this
Judging by the error bars, there are clearly four bars, but they overlap. Please, how can I solve this?
If you don't need a legend, Solution 1 might work for you. It is simpler because it keeps your data in wide format.
If you need a legend, consider Solution 2. It requires your data to be converted from wide format to long format.
Solution 1: Without legend (keeping wide format)
You can refine your aesthetics specification on the level of individual geoms (here, geom_bar):
ggplot(data=my_data, aes(x=Block)) +
geom_bar(aes(y=Start), stat="identity", position ="identity", alpha=.3, fill='lightblue', color='lightblue4') +
geom_bar(aes(y=End), stat="identity", position="identity", alpha=.8, fill='pink', color='red')
Solution 2: Adding a legend (converting to long format)
To add a legend, first use reshape2::melt to convert your data frame from wide format into long format.
This gives you two columns: the variable column ("Start" vs. "End") and the value column.
Now use the variable column to define your legend:
library(reshape2)
my_data_long <- melt(my_data, id.vars = c("Block"))
ggplot(data=my_data_long, aes(x=Block, y=value, fill=variable, color=variable, alpha=variable)) +
geom_bar(stat="identity", position ="identity") +
scale_colour_manual(values=c("lightblue4", "red")) +
scale_fill_manual(values=c("lightblue", "pink")) +
scale_alpha_manual(values=c(.3, .8))
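For the follow-up in the question's edit (bars that overlap when they should sit side by side), one possible approach is sketched below. It assumes the summary table shown in the edit, rebuilt here by hand as newdata. The idea is that position = "dodge" only separates bars that belong to different groups, so one of the two factors has to be mapped to an aesthetic such as fill. Putting Zonation on the x-axis and Force on fill is a choice made for this sketch, not something taken from the original answer.

```r
library(ggplot2)

# Summary table as shown in the question's edit (values copied from there).
newdata <- data.frame(
  Zonation = c("Litoral", "Litoral", "Sublitoral", "Sublitoral"),
  Force    = c("Normal", "Redusert", "Normal", "Redusert"),
  mean     = c(0.000000, 5.873333, 7.280000, 21.461667),
  sd       = c(0.000000, 3.562868, 2.898903, 4.153535)
)

# Use the same dodge width for bars and error bars so they line up.
dodge <- position_dodge(width = 0.9)

p <- ggplot(newdata, aes(x = Zonation, y = mean, fill = Force)) +
  geom_col(position = dodge, alpha = 0.4, colour = "lightblue4") +
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd),
                width = 0.2, position = dodge)
p
```

Note that the columns are referenced by bare name inside aes() and the data frame is passed once to ggplot(), rather than writing newdata$mean inside aes().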

Plot multiple histograms in one using ggplot2 in R

I am fairly new to R and ggplot2 and am having some trouble plotting multiple variables in the same histogram plot.
My data is already grouped and just needs to be plotted. The data is by week and I need to plot the number for each category (A, B, C and D).
Date A B C D
01-01-2011 11 0 11 1
08-01-2011 12 0 3 3
15-01-2011 9 0 2 6
I want the Dates as the x axis and the counts plotted as different colors according to a generic y axis.
I am able to plot just one of the categories at a time, but am not able to find an example like mine.
This is what I use to plot one category. I am pretty sure I need to use position="dodge" to plot multiple as I don't want it to be stacked.
ggplot(df, aes(x=Date, y=A)) + geom_histogram(stat="identity") +
labs(title = "Number in Category A") +
ylab("Number") +
xlab("Date") +
theme(axis.text.x = element_text(angle = 90))
Also, this gives me a histogram with spaces in between the bars. Is there any way to remove this? I tried spaces=0 as you would do when plotting bar graphs, but it didn't seem to work.
I read some previous questions similar to mine, but the data was in a different format and I couldn't adapt it to fit my data.
This is some of the help I looked at:
Creating a histogram with multiple data series using multhist in R
http://www.cookbook-r.com/Graphs/Plotting_distributions_%28ggplot2%29/
I'm also not quite sure what the bin width is. I think it is how the data should be spaced or grouped, which doesn't apply to my question since it is already grouped. Please advise me if I am wrong about this.
Any help would be appreciated.
Thanks in advance!
You're not really plotting histograms, you're just plotting a bar chart that looks kind of like a histogram. I personally think this is a good case for faceting:
library(ggplot2)
library(reshape2) # for melt()
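melt_df <- melt(df, id.vars = "Date") # keep Date as the identifier column
head(melt_df) # so you can see it
ggplot(melt_df, aes(Date,value,fill=Date)) +
geom_bar(stat="identity") + # values are already counts, so plot them as-is
facet_wrap(~ variable)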
However, I think in general, that changes over time are much better represented by a line chart:
ggplot(melt_df,aes(Date,value,group=variable,color=variable)) + geom_line()
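If side-by-side bars are still preferred, as the question suggests with position="dodge", a minimal sketch could look like the following. It assumes the three rows of data shown in the question and uses tidyr::pivot_longer instead of melt; the column names category and count are introduced here for illustration.

```r
library(ggplot2)
library(tidyr)

# The three weeks of data shown in the question.
df <- data.frame(
  Date = c("01-01-2011", "08-01-2011", "15-01-2011"),
  A = c(11, 12, 9), B = c(0, 0, 0), C = c(11, 3, 2), D = c(1, 3, 6)
)

# Long format: one row per Date/category pair.
long_df <- pivot_longer(df, cols = -Date,
                        names_to = "category", values_to = "count")

# geom_col() plots the values as given; "dodge" places categories side by side.
p <- ggplot(long_df, aes(x = Date, y = count, fill = category)) +
  geom_col(position = "dodge") +
  labs(title = "Number per category", x = "Date", y = "Number") +
  theme(axis.text.x = element_text(angle = 90))
p
```

Since the data is already aggregated, no binning takes place, so the bin width mentioned in the question does not come into play here.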

How do you plot two vectors on x-axis and another on y-axis in ggplot2

I am trying to plot two vectors with different values, but equal length on the same graph as follows:
a<-23.33:52.33
b<-33.33:62.33
days<-1:30
df<-data.frame(a,b,days)
a b days
1 23.33 33.33 1
2 24.33 34.33 2
3 25.33 35.33 3
4 26.33 36.33 4
5 27.33 37.33 5
etc..
I am trying to use ggplot2 to plot a and b on the x-axis and the days on the y-axis. However, I can't figure out how to do it. I am able to plot them individually and combine the graphs, but I want just one graph with both the a and b vectors (in different colors) on the x-axis and the number of days on the y-axis.
What I have so far:
X<-ggplot(df, aes(x=a,y=days)) + geom_line(color="red")
Y<-ggplot(df, aes(x=b,y=days)) + geom_line(color="blue")
Is there any way to define the x-axis for both a and b vectors? I have also tried using the melt long function, but got stuck afterwards.
Any help is much appreciated. Thank you
I think the best way to do it is by melting the data (as you have mentioned), especially if you are going to add more vectors. This is the code:
library(reshape2)
library(ggplot2)
a<-23:52
b<-33:62
days<-1:30
df<-data.frame(x=a,y=b,days)
df_molten=melt(df,id.vars="days")
ggplot(df_molten) + geom_line(aes(x=value,y=days,color=variable))
You can also change the colors manually via scale_color_manual.
A simpler solution is to use only ggplot2, without reshaping. The following code should work in your case:
a<-23.33:52.33
b<-33.33:62.33
days<-1:30
df<-data.frame(a,b,days)
# a and b on the x-axis, days on the y-axis, as asked; refer to columns by name inside aes()
ggplot(data = df)+
geom_line(aes(x = a,y = days), color = "blue")+
geom_line(aes(x = b,y = days), color = "red")
I added the colors, you might want to use them to differentiate between your variables.

Multiple lines on one graph: Adding a text acronym ("legend") to the very end of each line (ggplot2)

I have generated the following graph:
http://i47.tinypic.com/s3dd0m.png
I have a fair amount of long data (that can be downloaded here: http://www.sendspace.com/file/lfd31r), and the data looks like:
head(data)
-10:130 variable value
1 -10 Utilities 0.001680609
2 -9 Utilities 0.004652453
3 -8 Utilities -0.002441692
4 -7 Utilities -0.018639594
5 -6 Utilities -0.007587632
6 -5 Utilities 0.004526066
The code I used to generate this graphic:
ggplot(data=data,
aes(x=-10:130, y=value, colour=variable)) +
geom_line()
I want something that looks like the following graphic:
i46.tinypic.com/2cmvfrq.png
with the legend gone, but the acronym of the category displayed right at the end of each line in the same colour as the line itself. This will be necessary because there are too many lines and colours for the reader to understand what's what. Once you geniuses help me figure out how to solve this, I will then make a 4-panel plot (using facet_grid), each with 10 lines.
Thank you :)
To remove the legend you can use
+ theme(legend.position = 'none')
To add text to a plot you can use
+ annotate("text",x=XPOSITION,y=YPOSITION,label="TEXTFORPLOT",size=3.5)
A quick dirty attempt at solving your problem
library(ggplot2)
## Read in the data from your link. You will have to change this.
dat <- read.csv("~/Downloads/demo.csv")
head(dat)
## Get the last x value of each variable - turns out they are all 130
label_summary <- aggregate(dat[,2], list(dat$variable), max)
## A quick method to shorten the names, by grabbing the first 3 characters
label_names <- substr(label_summary[,1],1,3)
## the x position of each label
label_x <- label_summary[,2]
# A method to get the last y value of each variable
label_y <- sapply(1:length(label_x), function(i) dat[with(dat, dat[, 2]==label_x[i]&dat[, 3]==label_summary[i,1]),"value"])
# Make the plot without legend and text (theme() replaces the removed opts(); x is taken from column 2 of dat)
p <- ggplot(data=dat,aes(x=dat[,2], y=value, colour=variable)) + geom_line() + theme(legend.position = 'none')
p
# Use lapply to add the labels one by one to the plot; a list of layers can be added with +.
p + lapply(1:length(label_x), function(i) annotate("text",x=label_x[i]+10,y=label_y[i],label=label_names[i],size=3.5))
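A more compact variant of the same idea is sketched below: build a small data frame with the last point of each series and hand it to geom_text, so the labels inherit the line colours. The data here is synthetic and only stands in for the downloaded file, and the column name day and the object last_pts are assumptions made for this sketch.

```r
library(ggplot2)

# Synthetic long-format data standing in for the downloaded file (assumption).
set.seed(1)
dat <- data.frame(
  day      = rep(-10:130, times = 3),
  variable = rep(c("Utilities", "Energy", "Financials"), each = 141),
  value    = as.vector(replicate(3, cumsum(rnorm(141, sd = 0.01))))
)

# One row per series: the observation at its largest x value.
last_pts <- do.call(rbind, lapply(split(dat, dat$variable),
                                  function(d) d[which.max(d$day), ]))
last_pts$label <- substr(last_pts$variable, 1, 3)

p <- ggplot(dat, aes(x = day, y = value, colour = variable)) +
  geom_line() +
  # Labels reuse x, y and colour from the plot-level aes().
  geom_text(data = last_pts, aes(label = label), hjust = -0.2, size = 3.5) +
  expand_limits(x = 140) +   # leave room for the labels
  theme(legend.position = "none")
p
```

Because the labels live in a data frame, the same code should carry over to a faceted plot as long as last_pts also contains the faceting variable.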
