Reordering columns by y-value in R?

I have a dataframe structured like this:
> head(df)
Zip Crimes Population CPC
1 78701 2103 6841 0.3074
2 78719 186 1764 0.1054
3 78702 1668 21334 0.0782
4 78723 2124 28330 0.0750
5 78753 3472 49301 0.0704
6 78741 2973 44935 0.0662
And I'm plotting it using this function:
p = ggplot(df, aes(x=Zip, y=CPC)) + geom_col() + theme(axis.text.x = element_text(angle = 90))
And this is the graph I get:
How can I order the plot by CPC, so that the Zip codes with the highest CPC are on the left?

Convert Zip to a factor ordered by negative CPC. E.g., try df$Zip <- reorder(df$Zip, -df$CPC) before plotting. Here's a small example:
d <- data.frame(
x = c('a', 'b', 'c'),
y = c(5, 15, 10)
)
library(ggplot2)
# Without reordering
ggplot(d, aes(x, y)) + geom_col()
# With reordering
d$x <- reorder(d$x, -d$y)
ggplot(d, aes(x, y)) + geom_col()
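To see why this works, it can help to inspect the factor levels directly, since ggplot2 draws a discrete axis in level order. The following is a minimal base R sketch using the same toy data as above; the variable names `default_levels` and `reordered_levels` are only illustrative.

```r
# Same toy data as in the answer above
d <- data.frame(
  x = c("a", "b", "c"),
  y = c(5, 15, 10)
)

# Default factor levels are alphabetical
default_levels <- levels(factor(d$x))

# reorder() sorts the levels by ascending score; negating y
# therefore puts the largest y first
reordered_levels <- levels(reorder(d$x, -d$y))

print(default_levels)
print(reordered_levels)
```

The second vector is the left-to-right order the bars would take on the x axis.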

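Sort your data frame in descending order of CPC, then store that order in the levels of Zip (sorting the rows alone does not change the axis order), and plot it:
library(dplyr)
df <- arrange(df, desc(CPC))
df$Zip <- factor(df$Zip, levels = unique(df$Zip))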
ggplot...

Related

ggplot2 not including each value on x-axis

I have the following data which I am trying to plot as a stacked area chart:
week Wildtype ARE
3 3770 3740
4 3910 3920
5 3660 3640
6 3750 3790
7 3940 3930
8 3940 3940
9 3830 3810
10 3710 3720
11 3730 3720
12 357 358
Using this code for a stacked area chart
library(reshape2)
library(ggplot2)
rm(list=ls())
df <- read.csv("Mo_data/mo_qpcr_data2.csv", comment.char = "#", sep=",")
df_melt <- melt(df, id=c("week"))
p1 <- ggplot() + geom_area(aes(y = value, x = week, fill = variable), data = df_melt)
p1
I get the plot that I want but it isn't quite right.
How do I change the plot so that the x-axis displays each week in the time series rather than just 5.0, 7.5 and 10.0?
I would add this to the code
+ scale_x_continuous(breaks= unique(df$week) )
library(reshape2)
library(ggplot2)
rm(list=ls())
df <- read.csv("Mo_data/mo_qpcr_data2.csv", comment.char = "#", sep=",")
df_melt <- melt(df, id=c("week"))
p1 <- ggplot() + geom_area(aes(y = value, x = week, fill = variable), data = df_melt) + scale_x_continuous(breaks= unique(df$week) )
p1
ggplot2 is treating your week column as a number (rightly so, because it is a number), so your x axis is continuous. If you want to treat weeks as discrete values you can:
1) Change the week column to a character or factor (with geom_area you will then probably also need group = variable in aes() so the areas are still drawn across weeks)
df_melt$week <- as.factor(df_melt$week)
2) Keep week numeric and tell the x axis where you want the breaks, e.g. with scale_x_continuous(breaks = ...); scale_x_discrete is the counterpart for a factor axis
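As a rough sketch of the numeric-axis option, the example below rebuilds the question's table inline (instead of reading the CSV, whose path is specific to the asker's machine) and reshapes it with base R `stack()` rather than `melt()`. The names `week_breaks` and `df_long` are illustrative, and the plot is only built if ggplot2 is installed.

```r
# Data copied from the question's table
df <- data.frame(
  week     = 3:12,
  Wildtype = c(3770, 3910, 3660, 3750, 3940, 3940, 3830, 3710, 3730, 357),
  ARE      = c(3740, 3920, 3640, 3790, 3930, 3940, 3810, 3720, 3720, 358)
)

# One axis break per observed week
week_breaks <- sort(unique(df$week))

# Long format: one row per (week, variable) pair
df_long <- data.frame(
  week = rep(df$week, times = 2),
  stack(df[c("Wildtype", "ARE")])
)
names(df_long) <- c("week", "value", "variable")

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # week stays numeric, so the continuous scale takes the breaks
  p1 <- ggplot(df_long, aes(x = week, y = value, fill = variable)) +
    geom_area() +
    scale_x_continuous(breaks = week_breaks)
}
```

Keeping week numeric preserves the spacing between weeks, which matters if some weeks are missing from the series.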

Exploded 180 degree pie chart in R ggplot or ggvis (image included)?

Given a dataset with a factor column (X1) and a subtotal column (X2)
X1 X2
1 1 12
2 2 200
3 3 23
4 4 86
5 5 141
I would like to create a graphic like this:
which gives x2 as a percentage of the X2 total, divided by X1.
Edit: clarity and adding dataset for reproducibility
For example
set.seed(1234)
df <- data.frame(x = 1:6)
df$y <- runif(nrow(df))
df$type <- sample(letters, nrow(df))
ggplot(df, aes(x - 0.5, y, fill = type)) +
geom_bar(stat="identity", width=1) +
coord_polar(start = pi/2) +
scale_x_continuous(limits = c(0, nrow(df)*2)) +
geom_text(aes(label=scales::percent(y))) +
ggthemes::theme_map() + theme(legend.position = c(0,.15))
gives you
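The answer above uses random values, so here is a small base R sketch of the percentage step for the dataset in the question: each X2 as a share of the X2 total, plus the angle each slice would span on a 180 degree chart. The column names `share`, `label` and `degrees` are assumptions made for this illustration.

```r
# Dataset from the question
df <- data.frame(
  X1 = factor(1:5),
  X2 = c(12, 200, 23, 86, 141)
)

# Share of the X2 total for each level of X1
df$share <- df$X2 / sum(df$X2)

# Text label for each slice, e.g. for geom_text()
df$label <- sprintf("%.1f%%", 100 * df$share)

# On a half circle the shares are spread over 180 degrees
df$degrees <- 180 * df$share

print(df)
```

These columns could then replace the random `y` and the `scales::percent(y)` label in the plotting code above.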

2x1 faceting with ggplot2

I'm trying to make a simple facet with histograms in ggplot2
data <- read.csv("/hist_distances.csv", check.names = FALSE, sep = ",")
mdata <- melt(data)
m <- ggplot(data, aes(x=Distance))
m + geom_histogram()
head(data)
Gives:
Times Distance
1 3.093060 260.8840
2 2.557780 187.4960
3 0.263611 10.6584
4 2.880000 184.5970
5 5.035000 281.3490
6 6.952780 251.4730
head(mdata)
gives:
variable value
1 Times 3.093060
2 Times 2.557780
3 Times 0.263611
4 Times 2.880000
5 Times 5.035000
6 Times 6.952780
and
tail(mdata)
gives:
variable value
1739 Distance 1.103670
1740 Distance 1.695610
1741 Distance 3.795020
1742 Distance 6.651960
1743 Distance 0.719843
1744 Distance 6.504050
This produces this graphic:
I have tried:
m <- ggplot(mdata, aes(x=value)) +
geom_histogram() +
m + facet_wrap(~ variable)
With no success.
How can I produce a faceted graph instead, with a histogram of the variable "Times" at the top and a histogram of the variable "Distance" at the bottom?
Use facet_grid(variable ~ .), where facet_grid(row ~ column):
library(reshape2)
library(ggplot2)
df <- data.frame(Time = rnorm(100),
Distance = rnorm(100)
)
dfm <- melt(df)
ggplot(dfm, aes(x=value)) + geom_histogram() + facet_grid(variable ~ .)
Edit for follow-up comment:
If your data are on different scales, use facet_grid(variable ~ ., scales = "free").
See help(facet_grid) for options.
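A possible alternative, sketched here under the same simulated data, is `facet_wrap()` with `ncol = 1`, which also stacks the panels vertically. The reshaping uses base R `stack()` instead of `melt()`, and the plot is only built if ggplot2 is installed.

```r
set.seed(1)
df <- data.frame(Time = rnorm(100), Distance = rnorm(100))

# stack() returns columns `values` and `ind`; rename them to match
# the melt() output used above
dfm <- stack(df)
names(dfm) <- c("value", "variable")

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # ncol = 1 puts one panel per row; panels follow the factor level
  # order of `variable`, so Time is on top
  p <- ggplot(dfm, aes(x = value)) +
    geom_histogram(bins = 30) +
    facet_wrap(~ variable, ncol = 1, scales = "free")
}
```

The panel order follows the levels of `variable`, so changing which histogram is on top is a matter of releveling that factor.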

Vary scale of geom_point size by facet

I'm using ggplot with facet_wrap to generate 3 side-by-side plots with linear models. In addition, I have another dimension (let's call it "z") I'd like to visualize by varying the size of the points on the plots.
Currently, the plots I generate keep the size of the points on the same scale across all 3 facets. I would instead like to scale the point sizes by facet - that way, one can quickly tell which point contains the highest "z" value for each facet.
Is there any way to do this without creating 3 separate plots? I've included a sample of my data and the code I used below:
x <- c(0.03,1.32,2.61,3.90,5.20,6.48,7.77,0.75,2.04,3.33,4.62,5.91,7.20,8.49,0.41,1.70,3.00,4.28,5.57,6.86,8.15)
y <- c(650,526,382,110,72,209,60,559,296,76,48,64,20,22,50,102,176,21,20,25,5)
z <- c(391174,244856,836435,46282,40351,27118,17411,26232,59162,9737,1917,20575,1484,450,12071,13689,133326,1662,711,728,412)
facet <- c("A","A","A","A","A","A","A","B","B","B","B","B","B","B","C","C","C","C","C","C","C")
df <- data.frame(x,y,z,facet)
ggplot(df, aes(x=x, y=y)) +
geom_point(aes(size=z)) +
geom_smooth(method="lm") +
facet_wrap(~facet)
The method below reassigns z to its z-score within its facet:
require(dplyr)
require(ggplot2)
require(magrittr)
require(scales)
x <- c(0.03,1.32,2.61,3.90,5.20,6.48,7.77,0.75,2.04,3.33,4.62,5.91,7.20,8.49,0.41,1.70,3.00,4.28,5.57,6.86,8.15)
y <- c(650,526,382,110,72,209,60,559,296,76,48,64,20,22,50,102,176,21,20,25,5)
z <- c(391174,244856,836435,46282,40351,27118,17411,26232,59162,9737,1917,20575,1484,450,12071,13689,133326,1662,711,728,412)
facet <- c("A","A","A","A","A","A","A","B","B","B","B","B","B","B","C","C","C","C","C","C","C")
df <- data.frame(x,y,z,facet)
df %<>%
group_by(facet) %>%
mutate(z = as.numeric(scale(z))) # z-score within group; as.numeric() drops the matrix that scale() returns
ggplot(df, aes(x=x, y=y, group = facet)) +
geom_point(aes(size=z)) +
geom_smooth(method="lm") +
facet_wrap(~facet )
Try to rescale size for each facet to take values in (0,1]:
df %>%
group_by(facet) %>%
mutate(newz = z/max(z)) %>%
ggplot(., aes(x=x, y=y)) +
geom_point(aes(size=newz)) +
geom_smooth(method="lm") +
facet_wrap(~facet)
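The same per-facet rescaling can be checked without dplyr. This is a base R sketch using `ave()`, with `z` and `facet` copied from the question; it only verifies that each facet's largest point ends up at exactly 1.

```r
z <- c(391174, 244856, 836435, 46282, 40351, 27118, 17411,
       26232, 59162, 9737, 1917, 20575, 1484, 450,
       12071, 13689, 133326, 1662, 711, 728, 412)
facet <- rep(c("A", "B", "C"), each = 7)

# ave() applies the function within each facet and returns a vector
# aligned with the original rows
newz <- ave(z, facet, FUN = function(v) v / max(v))

# Largest rescaled value per facet
facet_max <- tapply(newz, facet, max)
print(facet_max)
```

Because every facet now tops out at 1, the largest point in each panel is drawn at the same size.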
I would just take the mean of the df$z by each df$facet
AverageFacet <- df %>% group_by(facet) %>% summarize(meanwithinfacet= mean(z, na.rm=TRUE))
df <- merge(df, AverageFacet)
df$pointsize<- df$z - df$meanwithinfacet
Now each point size depends on the mean of the facets
> head(df,10)
facet x y z meanwithinfacet pointsize
1 A 0.03 650 391174 229089.57 162084.429
2 A 1.32 526 244856 229089.57 15766.429
3 A 2.61 382 836435 229089.57 607345.429
4 A 3.90 110 46282 229089.57 -182807.571
5 A 5.20 72 40351 229089.57 -188738.571
6 A 6.48 209 27118 229089.57 -201971.571
7 A 7.77 60 17411 229089.57 -211678.571
8 B 0.75 559 26232 17079.57 9152.429
9 B 2.04 296 59162 17079.57 42082.429
and plot
ggplot(df, aes(x=x, y=y)) +
geom_point(aes(size=pointsize)) +
geom_smooth(method="lm") +
facet_wrap(~facet)
Looks like this, not sure about the legend though.
Instead of the absolute difference from the mean, you could also use the number of standard deviations a given z lies from its facet mean:
AverageFacet <- df %>% group_by(facet) %>% summarize(meanwithinfacet= mean(z, na.rm=TRUE), sdwithinfacet= sd(z, na.rm=TRUE))
df <- merge(df, AverageFacet)
df$absoluteDiff<- df$z - df$meanwithinfacet
df$SDfromMean <- df$absoluteDiff / df$sdwithinfacet
ggplot(df, aes(x=x, y=y)) +
geom_point(aes(size=SDfromMean)) +
geom_smooth(method="lm") +
facet_wrap(~facet)
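This standard-deviation version should give the same values as the per-group `scale()` approach from the earlier answer. A short base R sketch, with `z` and `facet` copied from the question and the names `z_manual` and `z_scaled` chosen for illustration, compares the two:

```r
z <- c(391174, 244856, 836435, 46282, 40351, 27118, 17411,
       26232, 59162, 9737, 1917, 20575, 1484, 450,
       12071, 13689, 133326, 1662, 711, 728, 412)
facet <- rep(c("A", "B", "C"), each = 7)

# Manual z-score within each facet
z_manual <- ave(z, facet, FUN = function(v) (v - mean(v)) / sd(v))

# Same thing via scale(); as.numeric() drops the one-column matrix
z_scaled <- ave(z, facet, FUN = function(v) as.numeric(scale(v)))

same <- isTRUE(all.equal(z_manual, z_scaled))
print(same)
```

Note that z-scores are negative for points below their facet mean, so the size legend will show negative values.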

Stacked bar chart

I would like to create a stacked chart using ggplot2 and geom_bar.
Here is my source data:
Rank F1 F2 F3
1 500 250 50
2 400 100 30
3 300 155 100
4 200 90 10
I want a stacked chart where x is the rank and y is the values in F1, F2, F3.
# Getting Source Data
sample.data <- read.csv('sample.data.csv')
# Plot Chart
c <- ggplot(sample.data, aes(x = sample.data$Rank, y = sample.data$F1))
c + geom_bar(stat = "identity")
This is as far as I can get. I'm not sure how I can stack the rest of the field values.
Maybe my data.frame is not in a good format?
You said:
Maybe my data.frame is not in a good format?
Yes, this is true. Your data is in wide format; you need to put it in long format. Generally speaking, long format is better for comparing variables.
Using reshape2, for example, you do this with melt:
dat.m <- melt(dat, id.vars = "Rank") ## id.vars is needed here: plain melt(dat) would melt Rank as well
Then you get your barplot:
ggplot(dat.m, aes(x = Rank, y = value,fill=variable)) +
geom_bar(stat='identity')
With lattice and the formula notation of barchart, you don't need to reshape your data; just do this:
barchart(F1+F2+F3~Rank,data=dat)
You need to transform your data to long format and shouldn't use $ inside aes:
DF <- read.table(text="Rank F1 F2 F3
1 500 250 50
2 400 100 30
3 300 155 100
4 200 90 10", header=TRUE)
library(reshape2)
DF1 <- melt(DF, id.var="Rank")
library(ggplot2)
ggplot(DF1, aes(x = Rank, y = value, fill = variable)) +
geom_bar(stat = "identity")
Building on Roland's answer, using tidyr to reshape the data from wide to long:
library(tidyr)
library(ggplot2)
df <- read.table(text="Rank F1 F2 F3
1 500 250 50
2 400 100 30
3 300 155 100
4 200 90 10", header=TRUE)
df %>%
gather(variable, value, F1:F3) %>%
ggplot(aes(x = Rank, y = value, fill = variable)) +
geom_bar(stat = "identity")
You will need to melt your dataframe to get it into the so-called long format:
require(reshape2)
sample.data.M <- melt(sample.data, id.vars = "Rank")
Now your field values are represented by their own rows and identified through the variable column. This can now be leveraged within the ggplot aesthetics:
require(ggplot2)
c <- ggplot(sample.data.M, aes(x = Rank, y = value, fill = variable))
c + geom_bar(stat = "identity")
Instead of stacking you may also be interested in showing multiple plots using facets:
c <- ggplot(sample.data.M, aes(x = Rank, y = value))
c + facet_wrap(~ variable) + geom_bar(stat = "identity")
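For readers who want to see what the long format looks like without installing a reshaping package, here is a base R sketch using `stack()`. The data are copied from the question; `long` and `totals` are illustrative names. The per-rank totals correspond to the full height of each stacked bar.

```r
DF <- data.frame(
  Rank = 1:4,
  F1 = c(500, 400, 300, 200),
  F2 = c(250, 100, 155, 90),
  F3 = c(50, 30, 100, 10)
)

# stack() puts F1, F2, F3 under one another, so Rank is repeated
# once per stacked column
long <- data.frame(
  Rank = rep(DF$Rank, times = 3),
  stack(DF[c("F1", "F2", "F3")])
)
names(long) <- c("Rank", "value", "variable")

# Height of each stacked bar: sum of F1, F2 and F3 per rank
totals <- tapply(long$value, long$Rank, sum)
print(totals)
```

The resulting `long` data frame has the same three columns that the ggplot calls above expect (`Rank`, `value`, `variable`).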
