adjust y-axis for missings in ggplot2? - r

First some toy data:
df = read.table(text =
"id year value sex
1 2000 0 1
1 2001 1 0
1 2002 0 1
1 2003 0 0
2 2000 0 0
2 2002 0 0
2 2003 1 0
3 2002 0 1
4 2000 0 0
4 2001 0 1
4 2002 1 0
4 2003 0 1 ", sep = "", header = TRUE)
When I want to visualize year by id for sex==1, I do
df2 <- df[df$sex==1,]
p <- ggplot(df2, aes(y=id))
p <- p + geom_point(aes(x=year))
p
How can I hide id 2 from the graph so that the remaining ids are evenly spaced? Is there a general way to adjust the distance between two ticks on the y-axis when my breaks are ids?
Does the solution also work when using facets?
p <- ggplot(df, aes(y=id))
p <- p + geom_point(aes(x=year))
p <- p + facet_grid(sex ~.)

Edited based on OP's clarification
Create individual plots and use the gridExtra package.
I am not sure if this is what you are looking for, but the use of reorder() should help.
Just to test it out, I changed the "id" value of 4 to be 7 in your toy dataframe.
To drop levels in individual plots, you can create 2 plots and then place them side by side.
df2 <- df[df$sex==1,]
p1 <- ggplot(df2, aes(y=(reorder(id, id))))
p1 <- p1 + geom_point(aes(x=year))
p1
df3 <- df[df$sex==0,]
p2 <- ggplot(df3, aes(y=(reorder(id, id))))
p2 <- p2 + geom_point(aes(x=year))
Notice that ids without data are dropped. For example, p1 does not contain id=2.
Now, you can use the gridExtra package which is meant for this purpose, to print out both the plots p1 and p2.
require(gridExtra)
grid.arrange(p1, p2, ncol=2)
facet_grid() includes all levels by design
Using facet_grid() with its default fixed scales won't drop ids per panel, but this is by design. facet_grid() has drop=TRUE by default; notice that you are not seeing ids 5 or 6. If an id appears in any one panel, it is included in all the other panels to facilitate comparison.
p <- ggplot(df, aes(y=reorder(id, id)))
p <- p + geom_point(aes(x=year))
p <- p + facet_grid(sex ~.)
p
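A possible alternative that the answer above does not cover is to free the y scale per panel, which should let each facet keep only its own ids. The sketch below assumes id is converted to a factor (so the axis is discrete) and relies on the scales and space arguments of facet_grid(); it rebuilds the toy data from the question.

```r
library(ggplot2)

# Toy data from the question
df <- read.table(text =
"id year value sex
1 2000 0 1
1 2001 1 0
1 2002 0 1
1 2003 0 0
2 2000 0 0
2 2002 0 0
2 2003 1 0
3 2002 0 1
4 2000 0 0
4 2001 0 1
4 2002 1 0
4 2003 0 1", header = TRUE)

# factor(id) makes the y axis discrete; scales = "free_y" lets each panel
# show only the ids that occur in it, and space = "free_y" keeps the
# distance between ticks the same across panels
p <- ggplot(df, aes(x = year, y = factor(id))) +
  geom_point() +
  facet_grid(sex ~ ., scales = "free_y", space = "free_y")
p
```

With this data the sex==1 panel should list only ids 1, 3 and 4, while the sex==0 panel lists 1, 2 and 4.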

Related

In R ggplot, how do I stack two dotplots?

My data is,
$ Age : int 20 25 30 35 40 45 50 55 60
$ Test.Positive : int 1 0 1 1 2 2 0 1 0
$ Test.Negative : int 0 1 3 2 4 1 3 1 1
I am able to create individual dot plots for each as,
YM_R = rep(Age,YM)
df1 <- as.data.frame(YM_R)
YP_R = rep(Age,YP)
df2 <- as.data.frame(YP_R)
gm <- ggplot(df1) +
geom_dotplot(aes(x=df1$YM_R, y="Y-"), color='green', fill='green', binwidth = 2)
gm <- ggplot(df2) +
geom_dotplot(aes(x=df2$YP_R, y="Y+"), color='red', fill='red', binwidth = 2)
But I don't know how to combine them. A sample of what I want is in the attached image. Any pointers appreciated.
Instead of thinking about "combining" the plots, I suggest you "facet" them.
Using an example from ?geom_dotplot:
library(ggplot2)
ggplot(mtcars, aes(mpg)) +
geom_dotplot(method="histodot", binwidth=1.5)
By adding a single call to facet_grid (there's facet_wrap as well), we can break them out:
ggplot(mtcars, aes(mpg)) +
geom_dotplot(method="histodot", binwidth=1.5) +
facet_grid(cyl ~ .)
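Applied to the data in the question, the same idea might look like the sketch below. It assumes the counts shown in the str() output, and the names tests, long and result are made up for illustration: each age is repeated once per test result, the two groups are stacked into one long data frame, and the facet variable separates them.

```r
library(ggplot2)

# Counts per age, copied from the str() output in the question
tests <- data.frame(
  Age = seq(20, 60, by = 5),
  Test.Positive = c(1, 0, 1, 1, 2, 2, 0, 1, 0),
  Test.Negative = c(0, 1, 3, 2, 4, 1, 3, 1, 1)
)

# One row per individual test, with a label column to facet on
long <- rbind(
  data.frame(Age = rep(tests$Age, tests$Test.Positive), result = "Y+"),
  data.frame(Age = rep(tests$Age, tests$Test.Negative), result = "Y-")
)

p <- ggplot(long, aes(x = Age, fill = result)) +
  geom_dotplot(method = "histodot", binwidth = 2) +
  scale_fill_manual(values = c("Y+" = "red", "Y-" = "green")) +
  facet_grid(result ~ .)
p
```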

how to make the boxplots with dot points and labels?

I have a dataframe as below
G1 G2 G3 G4 group
S_1 0 269.067 0.0817233 243.22 N
S_2 0 244.785 0.0451406 182.981 N
S_3 0 343.667 0.0311259 351.329 N
S_4 0 436.447 0.0514887 371.236 N
S_5 0 324.709 0 293.31 N
S_6 0 340.246 0.0951976 393.162 N
S_7 0 382.889 0.0440337 335.208 N
S_8 0 368.021 0.0192622 326.387 N
S_9 0 267.539 0.077784 225.289 T
S_10 0 245.879 0.368655 232.701 T
S_11 0 17.764 0 266.495 T
S_12 0 326.096 0.0455578 245.6 T
S_13 0 271.402 0.0368059 229.931 T
S_14 0 267.377 0 248.764 T
S_15 0 210.895 0.0616382 257.417 T
S_16 0.0401525 183.518 0.0931699 245.762 T
S_17 0 221.535 0.219924 203.275 T
Now I want to make a multi-boxplot with all 4 genes in columns. The first 8 rows are normal samples and the remaining 9 rows are tumor samples, so for each gene I should be able to make 2 boxplots labeled by tissue. I am able to make individual boxplots, but how should I put all 4 genes in one plot, label the tissue for each boxplot, and overlay the stripchart points? Is there an easy way to do it? I can only make individual plots using the row and column names; I cannot label the boxplots by the group column or plot the points with the stripchart. Any help will be appreciated. Thanks
with facet_wrap:
head(df)
G1 G2 G3 G4 group
S_1 0 269.067 0.0817233 243.220 N
S_2 0 244.785 0.0451406 182.981 N
S_3 0 343.667 0.0311259 351.329 N
S_4 0 436.447 0.0514887 371.236 N
S_5 0 324.709 0.0000000 293.310 N
S_6 0 340.246 0.0951976 393.162 N
library(reshape2)
df <- melt(df)
library(ggplot2)
ggplot(df, aes(x = variable,y = value, group=group, col=group)) +
facet_wrap(~variable, scales = 'free') + geom_boxplot()
I'm not sure what you mean by stripchart points; I assumed you wanted to visualize the actual points overlaid on the boxplots. Would the following suffice?
library(ggplot2)
library(dplyr)
library(reshape2)
melt(df) %>%
ggplot(aes(x = variable, y = value, col = group)) +
geom_boxplot() +
geom_jitter()
where df is the data frame above.
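One caveat with plain geom_jitter() is that the points are not dodged along with the boxes, so the N and T points of a gene end up mixed together. A hedged variation is to use position_jitterdodge(), as sketched below on a small subset of the question's data (only G2 and G4, four samples per group; the names expr and expr_long are illustrative).

```r
library(ggplot2)
library(reshape2)

# Subset of the question's data: two genes, four N and four T samples
expr <- data.frame(
  G2 = c(269.067, 244.785, 343.667, 436.447, 267.539, 245.879, 17.764, 326.096),
  G4 = c(243.22, 182.981, 351.329, 371.236, 225.289, 232.701, 266.495, 245.6),
  group = rep(c("N", "T"), each = 4),
  row.names = c(paste0("S_", 1:4), paste0("S_", 9:12))
)

# Long format: one row per sample and gene
expr_long <- melt(expr, id.vars = "group")

p <- ggplot(expr_long, aes(x = variable, y = value, col = group)) +
  geom_boxplot(outlier.shape = NA) +  # hide outliers, the raw points are drawn anyway
  geom_point(position = position_jitterdodge(jitter.width = 0.2))
p
```

The colour legend then doubles as the tissue label for each box.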

barplots in R comparing data from two columns

I have the following:
> ArkHouse2014 <- read.csv(file="C:/Rwork/ar14.csv", header=TRUE, sep=",")
> ArkHouse2014
DISTRICT GOP DEM
1 AR-60 3,951 4,001
2 AR-61 3,899 4,634
3 AR-62 5,130 4,319
4 AR-100 6,550 3,850
5 AR-52 5,425 3,019
6 AR-10 3,638 5,009
7 AR-32 6,980 5,349
What I would like to do is make a barplot (or series of barplots) to compare the totals in the second and third columns on the y-axis while the x-axis would display the information in the first column.
It seems like this should be very easy to do, but most of the information on making barplots that I can find has you make a table from the data and then barplot that, e.g.,
> table(ArkHouse2014$GOP)
2,936 3,258 3,508 3,573 3,581 3,588 3,638 3,830 3,899 3,951 4,133 4,166 4,319 4,330 4,345 4,391 4,396 4,588
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
4,969 5,130 5,177 5,343 5,425 5,466 5,710 5,991 6,070 6,100 6,234 6,490 6,550 6,980 7,847 8,846
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
I don't want the counts of how many have each total, I'd like to just represent the quantities visually. I feel pretty stupid not being able to figure this out, so thanks in advance for any advice you have to offer me.
Here's an option using libraries reshape2 and ggplot2:
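I first read your data. The commas are thousands separators, not decimal marks, so I strip them before converting to numbers:
df <- read.table(header=TRUE, stringsAsFactors=FALSE, text="DISTRICT GOP DEM
1 AR-60 3,951 4,001
2 AR-61 3,899 4,634
3 AR-62 5,130 4,319
4 AR-100 6,550 3,850
5 AR-52 5,425 3,019
6 AR-10 3,638 5,009
7 AR-32 6,980 5,349")
# Remove the thousands separators so that 3,951 becomes 3951 rather than 3.951
df$GOP <- as.numeric(gsub(",", "", df$GOP))
df$DEM <- as.numeric(gsub(",", "", df$DEM))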
Then reshape it to long format:
library(reshape2)
df_long <- melt(df, id.vars = "DISTRICT")
Then create a barplot using ggplot:
library(ggplot2)
ggplot(df_long, aes(x = DISTRICT, y = value, fill = variable)) +
geom_bar(stat = "identity", position = "dodge")
or if you want the bars stacked:
ggplot(df_long, aes(x = DISTRICT, y = value, fill = variable)) +
geom_bar(stat = "identity")

Plotting a dot for every n observations

I want to achieve the following plot type using ggplot:
using the following data:
t <- read.table(header=T, row.names=NULL,
colClasses=c(rep("factor",3),"numeric"), text=
"week team level n.persons
1 A 1 50
1 A 2 20
1 A 3 30
1 B 1 50
1 B 2 20
2 A 2 20
2 A 3 40
2 A 4 20
2 B 3 30
2 B 4 20")
so far, by applying this transformation
t0 <- t[ rep(1:nrow(t), t$n.persons %/% 10 ) , ]
and plotting
ggplot(t0) + aes(x=week, y=level, fill=team) +
geom_dotplot(binaxis="y", stackdir="center",
position=position_dodge(width=0.2))
I could generate a plot, but two questions remain:
A: How can I make dots of different teams dodge each other vertically so that they do not overlap?
B: Is it possible to keep the whole pack of dots centered, i.e. no dodging occurs if there are only dots of one team in one place?
The following code stops the overlap:
t0 <- t[ rep(1:nrow(t), t$n.persons %/% 10 ) , ]
t0$level <- as.numeric(as.character(t0$level)) # Convert the factor labels to numbers so the y position can be shifted
t0$level <- ifelse(t0$team == "B", (t0$level+.1), t0$level) # Add .1 to the y position if the team is 'B'
ggplot(t0) + aes(x=week, y=level, fill=team) + geom_dotplot(binaxis="y", stackdir="center",
position=position_dodge(width=0.2))
Here is the output:
You could also subtract a value to move the dots downwards if you would prefer that.
If you want the line exactly between the dots this code should do it:
t0$level <- ifelse(t0$team == "B", (t0$level+.06), t0$level)
t0$level <- ifelse(t0$team == "A", (t0$level-.06), t0$level)
Output:
I'm not sure off the top of my head how to skip the above ifelse when there is only one team at a given coordinate. I'd imagine you'd need to do a count of unique team labels at each coordinate and only if that count was > 1 then run the code above.
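The counting idea described above could be sketched as follows. This is one possible approach, not a tested part of the original answer: it uses base R's ave() to count the distinct teams at each week/level position and applies the offset only where more than one team is present. The names persons, dots, shift and level_shifted are made up for illustration.

```r
# Toy data from the question ('persons' avoids masking base::t)
persons <- read.table(header = TRUE,
                      colClasses = c(rep("factor", 3), "numeric"), text =
"week team level n.persons
1 A 1 50
1 A 2 20
1 A 3 30
1 B 1 50
1 B 2 20
2 A 2 20
2 A 3 40
2 A 4 20
2 B 3 30
2 B 4 20")

# One row per dot (one dot for every 10 persons)
dots <- persons[rep(seq_len(nrow(persons)), persons$n.persons %/% 10), ]
dots$level <- as.numeric(as.character(dots$level))

# Number of distinct teams sharing each week/level position
n_teams <- ave(as.integer(dots$team), dots$week, dots$level,
               FUN = function(x) length(unique(x)))

# Shift only where two teams meet; a single team stays centered
dots$shift <- ifelse(n_teams > 1,
                     ifelse(dots$team == "B", 0.06, -0.06),
                     0)
dots$level_shifted <- dots$level + dots$shift
```

The level_shifted column can then be mapped to y in place of level in the geom_dotplot() call above.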

R for loop and a series of plots: obtaining always the same plot

I'm trying to get a series of plots from the following dataset and for loop:
> head(all5new[c(6,70,22:23)])#This is a snapshot of my dataset. There is more species, see below.
setID fishery blackdog smoothdog
11 1 TRAWL-PAND.BOR. 0 0
12 1 TRAWL-PAND.BOR. 0 0
13 1 TRAWL-REDFISH 0 0
14 1 TRAWL-PAND.BOR. 0 0
21 10 TRAWL-PAND.BOR. 0 0
22 10 TRAWL-PAND.BOR. 0 0
> elasmo #This is the list of the species for which I would like to have individual barplots
[1] "blackdog" "smoothdog" "spinydog" "mako" "porbeagle"
[6] "blue" "greenland" "portuguese" "greatwhite" "mackerelNS"
[11] "dogfish" "basking" "thresher" "deepseacat" "atlsharp"
[16] "oceanicwt" "roughsagre" "dusky" "sharkNS" "sand"
[21] "sandbar" "smoothhammer" "tiger" "wintersk" "abyssalsk"
[26] "arcticsk" "barndoorsk" "roundsk" "jensensk" "littlesk"
[31] "richardsk" "smoothsk" "softsk" "spinysk" "thorny"
[36] "whitesk" "stingrays" "skateNS" "manta" "briersk"
[41] "pelsting" "roughsting" "raysNS" "skateraysNS" "allSHARK"
[46] "allSKATE" "PELAGIC"
This is my for loop. The code works fine when I run it for one species, however when I run it for all, I always get the same barplot. I know it must be just a quick fix adding for example [[i]] somewhere in the code, but I tried different things without any success.
for (i in elasmo) {
# CALCULATE THE CATCH PER UNIT OF EFFORT (KG/SET) FOR ALL SPECIES FOR EACH FISHERY
test<-ddply(all5new,.(fishery),summarize, sets=length(as.factor(setID)),LOGcpue=log((sum(i)/length(as.factor(setID)))))
#TAKE THE FIRST 10 FISHERY WITH THE HIGHEST LOGcpue
x<-test[order(-test$LOGcpue)[1:10],]
#REORDER THE FISHERY FACTOR ACCORDINGLY (FOR GGPLOT2, TO HAVE EACH LEVEL IN ORDER)
list<-x$fishery
x$fishery <- factor(x$fishery, levels =list)
#BAR PLOT
graph<-ggplot(x, aes(fishery,LOGcpue)) + geom_bar() + coord_flip() +
geom_text(aes(label=sets,hjust=0.5,vjust=-1),size=4,angle = 270)
#SAVE GRAPH IN NEW DIR
ggsave(graph,filename=paste("barplot",i,".png",sep=""))
}
Here's a subset of my dataset after melting: mydata.
> data.melt<-melt(all5new, id.vars=c("tripID","setID","fishery"), measure.vars = c(22:23))
> head(data.melt);dim(data.melt)
tripID setID fishery variable value
1 1 1 TRAWL-PAND.BOR. blackdog 0
2 1 1 TRAWL-PAND.BOR. blackdog 0
3 1 1 TRAWL-REDFISH blackdog 0
4 1 1 TRAWL-PAND.BOR. blackdog 0
5 1 10 TRAWL-PAND.BOR. blackdog 0
6 1 10 TRAWL-PAND.BOR. blackdog 0
[1] 350100 5
Here's a workflow I use for generating lots of graphs, adapted to your dataset (or, my interpretation of it). This is a nice illustration of the power of plyr, I think. For your application, I don't think calculation times really matter. What is more important for you is generating easy-to-read code, and I think plyr is good for this.
#Load packages
require(plyr)
require(reshape)
require(ggplot2)
#Recreate your data set, with only two species
setID <- rep(1:5, each=4, times=1)
fishery <- gl(10, 2)
blackdog <- sample(1:5, size=20, replace=TRUE)
smoothdog <- sample(1:5, size=20, replace=TRUE)
df <- data.frame(setID, fishery, blackdog, smoothdog)
#Melt the data frame
dfm <- melt(df, id.vars = c("setID", "fishery"))
#Calculate LOGcpue for each fish at each fishery
cpueDF <- ddply(dfm, c("fishery", "variable"), summarise, LOGcpue = log(sum(value)/length(value)))
#Plot all the data in one (potentially huge) faceted plot.
#(I often use huge plots like this for onscreen analysis
# - obviously it can't be printed in practice, but you can get a visual overview of the data)
ggplot(cpueDF, aes(x=fishery, y=LOGcpue)) + geom_bar(stat="identity") + coord_flip() + facet_wrap(~variable)
ggsave("giant plot.pdf", height=30, width=30, units="in")
#Print each plot individually to screen, and save it, and put it in a list
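printGraph <- function(df) {
p <- ggplot(df, aes(x=fishery, y=LOGcpue)) +
geom_bar(stat="identity") + coord_flip()
print(p)
# paste0 avoids the space that paste() would put before ".png"
fn <- paste0(df$variable[1], ".png")
ggsave(fn, plot=p)
# Return the plot so that dlply collects it in the list
p
}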
plotList <- dlply(cpueDF, .(variable), printGraph)
#Now pick out the top n fisheries for each fish
cpueDFtopN <- ddply(cpueDF, .(variable), function(x) head(x[order(x$LOGcpue, decreasing=TRUE),], n=5))
ggplot(cpueDFtopN, aes(x=fishery, y=LOGcpue)) + geom_bar(stat="identity") +
coord_flip() + facet_wrap(~variable, scales="free")
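As for the original for loop, one likely culprit (an assumption, since the full data set is not available) is sum(i): inside the loop i holds the species name as a character string, not the column of catches. Looking the column up by name, for example with [[i]], is the usual fix. The sketch below shows this in base R on made-up data; catch, species and cpue_by_species are hypothetical names.

```r
set.seed(1)

# Made-up data with the same shape as the question's columns
catch <- data.frame(
  setID = rep(1:5, each = 4),
  fishery = gl(4, 5, labels = c("F1", "F2", "F3", "F4")),
  blackdog = sample(1:5, 20, replace = TRUE),
  smoothdog = sample(1:5, 20, replace = TRUE)
)
species <- c("blackdog", "smoothdog")

cpue_by_species <- lapply(species, function(sp) {
  # catch[[sp]] fetches the column named by the string in sp;
  # sum(sp) would try to sum the string itself
  total <- tapply(catch[[sp]], catch$fishery, sum)
  sets <- tapply(catch$setID, catch$fishery, length)
  data.frame(fishery = names(total),
             sets = as.vector(sets),
             LOGcpue = as.vector(log(total / sets)))
})
names(cpue_by_species) <- species
```

Each element of cpue_by_species is a per-fishery summary for one species and can be passed to ggplot() and ggsave() inside the same loop.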
