stack bars in plot without preserving label order - r

ggplot preserves the order of stacked bars according to labels:
d <- read.table(text='Day Location Length Amount
1 2 3 1
1 1 4 2
3 3 3 2
3 2 5 1',header=T)
d$Amount<-as.factor(d$Amount) # in real world is not numeric
ggplot(d, aes(x = Day, y = Length)) +
geom_bar(aes(fill = Amount), stat = "identity")
What I want is something similar to the plot produced without the as.factor line, that is, with the larger bars always on top. However, I cannot do that with my real data because Amount holds categories, not numbers.
Similar post: https://www.researchgate.net/post/R_ggplot2_Reorder_stacked_plot
A solution using another R package is fine.
Note: data.frame is only demonstrative.

I came up with this solution:
(1) First, sort data.frame by column of values in decreasing order
(2) Then duplicate column of values, as factor.
(3) In ggplot group by new factor (values)
d <- read.table(text='Day Length Amount
1 3 1
1 4 2
3 3 2
3 5 1',header=T)
d$Amount<-as.factor(d$Amount)
d <- d[order(d$Length, decreasing = TRUE),] # (1)
d$LengthFactor<-factor(d$Length, levels= unique(d$Length) ) # (2)
ggplot(d)+
geom_bar(aes(x=Day, y=Length, group=LengthFactor, fill=Amount), # (3)
stat="identity", color="white")
{
library(data.table)
sam<-data.frame(population=c(rep("PRO",8),rep("SOM",4)),
allele=c("alele1","alele2","alele3","alele4",rep("alele5",2),
rep("alele3",2),"alele2","alele3","alele3","alele2"),
frequency=rep(c(10,5,4,6,7,16),2) #,rep(1,6)))
)
sam <- setDT(sam)[, .(frequencySum=sum(frequency)), by=.(population,allele)]
sam <- sam[order(sam$frequencySum, decreasing = TRUE),] # (1)
# (2) refer to frequencySum explicitly; sam$frequency only resolved through partial matching
sam$frequency<-factor(sam$frequencySum, levels = unique(sam$frequencySum) )
library(ggplot2)
ggplot(sam)+
geom_bar(aes(x=population, y=frequencySum, group=frequency, fill=allele), # (3)
stat="identity", color="white")
}
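Steps (1) and (2) above can be wrapped into a small helper so the same recipe applies to any data frame. This is only a sketch: the function name add_stack_order and the column name stack_order are made up for illustration, and the plot is built only if ggplot2 is installed.

```r
# Hypothetical helper: sort rows by a value column (largest first) and add a
# factor whose levels follow that order, for use as the `group` aesthetic.
add_stack_order <- function(df, value_col, new_col = "stack_order") {
  df <- df[order(df[[value_col]], decreasing = TRUE), ]                       # step (1)
  df[[new_col]] <- factor(df[[value_col]], levels = unique(df[[value_col]]))  # step (2)
  df
}

d <- read.table(text = "Day Length Amount
1 3 1
1 4 2
3 3 2
3 5 1", header = TRUE)
d$Amount <- as.factor(d$Amount)
d <- add_stack_order(d, "Length")
levels(d$stack_order)

# Step (3): group by the new factor. Guarded so the sketch runs without ggplot2.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(d) +
    geom_bar(aes(x = Day, y = Length, group = stack_order, fill = Amount),
             stat = "identity", color = "white")
}
```

The helper only prepares the data; the stacking order itself still comes from the group aesthetic in the plot call.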

Related

Identify and plot datapoints surrounded by NAs

I am using ggplot2 and geom_line() to make a line plot of a large number of time series. The dataset has a high number of missing values, and I am generally happy that lines are not drawn across missing segments, as this would look awkward.
My problem is that single non-NA datapoints surrounded by NAs (or points at the beginning/end of the series with an NA on the other side) are not plotted. A potential solution would be adding geom_point() for all observations, but this increases my filesize tenfold, and makes the plot harder to read.
Thus, I want to identify only those datapoints that do not get shown with geom_line() and add points only for those. Is there a straightforward way to identify these points?
My data is currently in long format, and the following MWE can serve as an illustration. I want to identify rows 1 and 7 so that I can plot them:
library(ggplot2)
set.seed(1)
dat <- data.frame(time=rep(1:5,2),country=rep(1:2,each=5),value=rnorm(10))
dat[c(2,6,8),3] <- NA
ggplot(dat) + geom_line(aes(time,value,group=country))
> dat
time country value
1 1 1 -0.6264538
2 2 1 NA
3 3 1 -0.8356286
4 4 1 1.5952808
5 5 1 0.3295078
6 1 2 NA
7 2 2 0.4874291
8 3 2 NA
9 4 2 0.5757814
10 5 2 -0.3053884
You can use the zoo::rollapply function to create a new column that keeps only the values surrounded by NAs. Then you can simply plot those points. For example:
library(zoo)
library(ggplot2)
foo <- data.frame(time =c(1:11), value = c(1 ,NA, 3, 4, 5, NA, 2, NA, 4, 5, NA))
# Perform sliding window processing
val <- c(NA, NA, foo$value, NA, NA) # Add NA at the ends of vector
val <- rollapply(val, width = 3, FUN = function(x){
if (all(is.na(x) == c(TRUE, FALSE, TRUE))){
return(x[2])
} else {
return(NA)
}
})
foo$val_clean <- val[c(-1, -length(val))] # Remove first and last values
foo$val_clean
ggplot(foo) + geom_line(aes(time, value)) + geom_point(aes(time, val_clean))
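The rollapply example works on a single series. For the long-format data in the question, the same check can be sketched in base R per country. The helper name is_isolated and the column name isolated are assumptions, and the rows are assumed to be sorted by time within each country.

```r
set.seed(1)
dat <- data.frame(time = rep(1:5, 2), country = rep(1:2, each = 5), value = rnorm(10))
dat[c(2, 6, 8), 3] <- NA

# TRUE where a value is present but both neighbours in the same series are
# missing; positions beyond either end of a series count as missing.
is_isolated <- function(v) {
  n <- length(v)
  prev_na <- c(TRUE, is.na(v[-n]))
  next_na <- c(is.na(v[-1]), TRUE)
  !is.na(v) & prev_na & next_na
}

# ave() applies the check within each country; it returns 0/1, hence as.logical().
dat$isolated <- as.logical(ave(dat$value, dat$country, FUN = is_isolated))
which(dat$isolated)  # rows 1 and 7, the ones named in the question
```

The flagged rows can then be passed to geom_point() alone, for example with data = dat[dat$isolated, ], so that no points are added for values already covered by a line.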
Do you mean something like this?
library(tidyverse)
dat %>%
na.omit() %>%
ggplot() +
geom_line(aes(time, value, group = country))

Setting order of scale_x_discrete when there are repeated levels

I want to make an ordinary geom_point plot using ggplot, but some of the x values are repeated and I want them to appear again on the x axis. I tried scale_x_discrete, following the example in change-the-order-of-a-discrete-x-scale, but I was not able to get what I want.
Here is my example
x = c(seq(1,4),seq(2,4))
y= (seq(1,7))
ex=rep(c("ex1","ex2"),c(4,3))
df <- data.frame(x,y,ex)
x y ex
1 1 1 ex1
2 2 2 ex1
3 3 3 ex1
4 4 4 ex1
5 2 5 ex2
6 3 6 ex2
7 4 7 ex2
ggplot(df, aes(x=factor(x),y=y)) +
geom_point(size=4) +
scale_x_discrete(limits=c(seq(1,4),seq(2,4)))
With a discrete x that contains repeated values, the repeated values are not shown on the x axis. How can I repeat the values 2, 3, 4 after 1, 2, 3, 4 on the x axis?
Thanks
Because what you want on the x axis is not x itself but the combination of x and ex, the natural approach is to map that combination to aes(x) and relabel the axis.
ggplot(df, aes(x = interaction(x, ex), y = y)) +
geom_point(size=4) +
scale_x_discrete(labels = df$x)
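To see why the labels line up, it can help to inspect the levels that interaction() produces. The sketch below also builds a named label vector so the mapping from level to label is explicit rather than positional. The names key and axis_labels are made up for illustration, and the plot is built only if ggplot2 is installed.

```r
df <- data.frame(x = c(1:4, 2:4), y = 1:7, ex = rep(c("ex1", "ex2"), c(4, 3)))

# One level per (x, ex) pair that actually occurs; unused pairs are dropped.
df$key <- interaction(df$x, df$ex, drop = TRUE)
levels(df$key)

# Named vector: names are the axis breaks, values are the labels to print.
axis_labels <- setNames(as.character(df$x), as.character(df$key))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df, aes(x = key, y = y)) +
    geom_point(size = 4) +
    scale_x_discrete(labels = axis_labels)
}
```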

R: correctly reorder factor levels - avoid duplicated factor levels? {ggplot2}

The question about duplicated levels in factors, which results in:
Warning message:
In `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste0(labels, :
duplicated levels in factors are deprecated
has been addressed multiple times. However, I still can't figure out how to transform my data correctly so that no duplicated levels are introduced.
I have a data frame, want to make a plot, and want to change the order of the levels in that plot. That is where my duplicated levels are created, and I can't work out how to rewrite the ordering so that it does not introduce them. How do I write the reordering of my factor levels correctly?
df1<-data.frame(year = rep(2002:2005, 5),
rate = sample(30,20),
gridcode = rep(1:2, each = 10),
distance = rep(c(100,200), 10))
# change order - !!! how to write this correctly?
df1$gridcode <- factor(df1$gridcode,
levels=df1$gridcode[
order(df1$gridcode, decreasing = TRUE)])
# plot values
ggplot(df1,aes(x = distance,
y= rate,
fill = as.factor(gridcode))) +
geom_bar(position = "stack", stat = "identity") +
facet_grid(. ~ year)
You need to wrap a unique around your levels= specification, otherwise you are assigning the levels heaps of times:
unique(df1$gridcode)[order(unique(df1$gridcode), decreasing = TRUE)]
#[1] 2 1
vs.
df1$gridcode[order(df1$gridcode, decreasing = TRUE)]
#[1] 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1
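Put together, the corrected reordering might look like the sketch below. It uses sort(unique(...)) as a shorter equivalent of the expression above; the seed is only there to make the sampled rates reproducible.

```r
set.seed(42)  # arbitrary seed, not part of the original question
df1 <- data.frame(year = rep(2002:2005, 5),
                  rate = sample(30, 20),
                  gridcode = rep(1:2, each = 10),
                  distance = rep(c(100, 200), 10))

# Each level is listed exactly once, in decreasing order.
df1$gridcode <- factor(df1$gridcode,
                       levels = sort(unique(df1$gridcode), decreasing = TRUE))

levels(df1$gridcode)
anyDuplicated(levels(df1$gridcode))  # 0 means no duplicated levels
```

Since gridcode is now already a factor, the as.factor() call in the plot's fill aesthetic becomes redundant, though it does no harm.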

Pass a condition to a function so that function works for all the values of the condition passed....in R

This may be silly, but I can't figure out how to do it.
What I want?
My function goes like this.
plot_exp <-
function(i){
dat <- subset(dat.frame,En == i )
ggplot(dat,aes(x=hours, y=variable, fill = Mn)) +
geom_point(aes(x=hours, y=variable, fill = Mn),size = 3,color = Mi) + geom_smooth(stat= "smooth" , alpha = I(0.01))
}
ll <- lapply(seq_len(EXP), plot_exp)
do.call(grid.arrange, ll)
and I have two variables,
Var1 and Var2 (which will be passed in through the command line, so I can't hard-code them in the subset call).
I want to run the above function for var1 and var2. My function produces two plots for each complete execution, so it should now produce two plots for var1 and two plots for var2.
I just want to know how I can adapt the logic here to do what I want. Thank you.
This is what data.frame looks like
En Mn Hours var1 var2
1 1 1 0.1023488 0.6534707
1 1 2 0.1254325 0.5423215
1 1 3 0.1523245 0.2542354
1 2 1 0.1225425 0.2154533
1 2 2 0.1452354 0.4521255
1 2 3 0.1853324 0.2545545
2 1 1 0.1452369 0.2321542
2 1 2 0.1241241 0.2525212
2 1 3 0.0542232 0.2626214
2 2 1 0.8542154 0.2154522
2 2 2 0.0215420 0.5245125
2 2 3 0.2541254 0.2542512
I will take the above data.frame as input. I want to run my function once for var1 and produce two plots, then run the same function again for var2 and produce two more plots, then combine all of them using grid.arrange.
I have to read the variable names from the command line and then do the following to get the required data out of the main data frame.
subset((aggregate(cbind(variable1,variable2)~En+Mn+Hours,a, FUN=mean)))
After I read the names from the command line and store them in "variable1" and "variable2", calling them directly in the above command does not work. What should I do to use those two command-line values inside the command?
I made a few changes and ran it on your sample data. Basically, I just needed to use aes_string rather than aes so that a column name held in a variable can be used.
myvars<-c("var1", "var2")
plot_exp <- function(i, plotvar) {
dat <- subset(dat.frame,En == i )
ggplot(dat,aes_string(x="Hours", y=plotvar, fill = "Mn")) +
geom_point(aes(color=Mn), size = 3) +
geom_smooth(stat= "smooth" , alpha = I(0.01), method="loess")
}
ll <- do.call(Map, c(plot_exp, expand.grid(i=1:2, plotvar=myvars, stringsAsFactors=F)))
do.call(grid.arrange, ll)
(I'm not sure why the colors of the legends are messed up in the image, they look fine on screen)
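In recent ggplot2 releases aes_string() is deprecated, and the .data pronoun is the suggested way to refer to a column by a name held in a string. The sketch below is a hedged variant of the same idea using the sample data from the question; the names combos and plots are made up for illustration, the smoother is left out for brevity, and the plots are built only if ggplot2 is installed.

```r
dat.frame <- data.frame(
  En    = rep(1:2, each = 6),
  Mn    = rep(rep(1:2, each = 3), 2),
  Hours = rep(1:3, 4),
  var1  = c(0.1023488, 0.1254325, 0.1523245, 0.1225425, 0.1452354, 0.1853324,
            0.1452369, 0.1241241, 0.0542232, 0.8542154, 0.0215420, 0.2541254),
  var2  = c(0.6534707, 0.5423215, 0.2542354, 0.2154533, 0.4521255, 0.2545545,
            0.2321542, 0.2525212, 0.2626214, 0.2154522, 0.5245125, 0.2542512)
)

myvars <- c("var1", "var2")
# One row per plot: every En value combined with every variable name.
combos <- expand.grid(i = unique(dat.frame$En), plotvar = myvars,
                      stringsAsFactors = FALSE)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  plot_exp <- function(i, plotvar) {
    dat <- dat.frame[dat.frame$En == i, ]
    # .data[[plotvar]] looks up the column whose name is stored in plotvar.
    ggplot(dat, aes(x = Hours, y = .data[[plotvar]], colour = factor(Mn))) +
      geom_point(size = 3) +
      labs(y = plotvar, colour = "Mn")
  }
  plots <- Map(plot_exp, combos$i, combos$plotvar)
}
```

The resulting list can be handed to do.call(grid.arrange, plots) in the same way as ll above.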
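For the aggregation, keep the column names in a character vector and index with it instead of writing the names into a formula:
myvars <- c("var1", "var2")
aggregate(a[,myvars], a[,c("En","Mn","Hours")], FUN=mean)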
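A minimal sketch of how the names could arrive from the command line and then be used to pick the columns to average with base R's aggregate(). The data frame below is toy data invented for illustration (not the asker's), and the fallback to var1 and var2 when no arguments are given is an assumption.

```r
# Toy data, made up for this sketch: two rows per (En, Mn, Hours) group.
a <- data.frame(
  En    = c(1, 1, 1, 1, 2, 2, 2, 2),
  Mn    = c(1, 1, 2, 2, 1, 1, 2, 2),
  Hours = 1,
  var1  = c(0.1, 0.3, 0.2, 0.4, 0.5, 0.7, 0.6, 0.8),
  var2  = c(1, 3, 2, 4, 5, 7, 6, 8)
)

# e.g. Rscript script.R var1 var2
args <- commandArgs(trailingOnly = TRUE)
myvars <- if (length(args) > 0 && all(args %in% names(a))) args else c("var1", "var2")

# Columns are chosen by name, so nothing has to be typed into a formula.
res <- aggregate(a[, myvars, drop = FALSE],
                 by = a[, c("En", "Mn", "Hours")],
                 FUN = mean)
res
```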

Add a row in a sorted data frame : which solutions?

There is something I don't understand.
I've this data frame :
Var1 Freq
1 2008-05 1
2 2008-07 7
3 2008-08 5
4 2008-09 3
I need to insert a row in second position, for example:
2008-06 0
I followed this (Add a new row in specific place in a dataframe). First step : add an index column ; second step : append rows with an index number for each ; then, sort it.
df$ind <- seq_len(nrow(df))
df <- rbind(df,data.frame(Var1 = "2008-06", Freq = "0",ind=1.1))
df <- df[order(df$ind),]
Ok, everything seems good. Even if I don't know why a column called "row.names" has appeared, I get :
row.names Var1 Freq ind
1 1 2008-05 1 1
2 5 2008-06 0 1.1
3 2 2008-07 7 2
4 3 2008-08 5 3
5 4 2008-09 3 4
Now, I plot it, with ggplot2.
ggplot(df, aes(y = Freq, x = Var1)) + geom_bar()
Here is the problem. On the x axis, "2008-06" is placed at the end, after "2008-09" (i.e. at index 5). In other words, the data frame has not really been reordered as far as the plot is concerned, even though it appears to be.
Where am I going wrong? Thanks for your help.
Try this:
df$Var1 <- factor(df$Var1, df$Var1[order(df$ind)])
If you want ggplot2 to order labels, you have to specify the ordering yourself.
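A self-contained sketch of the whole sequence, assuming the data from the question with Var1 read as character and Freq kept numeric (0 rather than "0"):

```r
df <- data.frame(Var1 = c("2008-05", "2008-07", "2008-08", "2008-09"),
                 Freq = c(1, 7, 5, 3),
                 stringsAsFactors = FALSE)

df$ind <- seq_len(nrow(df))
df <- rbind(df, data.frame(Var1 = "2008-06", Freq = 0, ind = 1.1,
                           stringsAsFactors = FALSE))

# Fix the level order explicitly; this is the order the x axis will use.
df$Var1 <- factor(df$Var1, levels = df$Var1[order(df$ind)])
levels(df$Var1)
```

Sorting the rows changes only how the data frame prints; the axis order comes from the factor levels.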
You might also want to look into converting Var1 to some sort of date class, then dispensing with the index variable altogether. This would make things clearer, I think. The zoo package has a nice class for representing months of a given year, and you could use this for Var1. For example:
library(zoo)
df$Var1 <- as.yearmon(df$Var1)
df <- rbind(df,data.frame(Var1 = as.yearmon("2008-06"), Freq = 0)) # numeric 0 keeps Freq numeric
Now you can just order your data frame by Var1 without having to worry about keeping an index:
> df[order(df$Var1), ]
Var1 Freq
1 May 2008 1
5 Jun 2008 0
2 Jul 2008 7
3 Aug 2008 5
4 Sep 2008 3
A plot in ggplot2 will turn out as expected:
ggplot(df, aes(as.Date(Var1), Freq)) + geom_bar(stat="identity")
Though you do have to convert Var1 to Date, since ggplot2 doesn't understand yearmon objects.
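If adding zoo is not an option, a similar effect can be sketched in base R by appending a day to each "YYYY-MM" string and parsing the result with as.Date(). The column name month is made up for illustration.

```r
df <- data.frame(Var1 = c("2008-05", "2008-07", "2008-08", "2008-09"),
                 Freq = c(1, 7, 5, 3),
                 stringsAsFactors = FALSE)
df <- rbind(df, data.frame(Var1 = "2008-06", Freq = 0, stringsAsFactors = FALSE))

# "2008-06" becomes "2008-06-01", which as.Date() parses with its default format.
df$month <- as.Date(paste0(df$Var1, "-01"))
df <- df[order(df$month), ]
df
```

The month column can be mapped to x directly, since a Date column is placed on a continuous date axis in chronological order.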
It is because somewhere along the way you got a factor in the mix. This produces what you're after (without the rownames column):
df <- read.table(text=" Var1 Freq
1 2008-05 1
2 2008-07 7
3 2008-08 5
4 2008-09 3", header=TRUE, stringsAsFactors = FALSE)
df$ind <- seq_len(nrow(df))
df <- rbind(df,data.frame(Var1 = "2008-06", Freq = 0,ind=1.1, stringsAsFactors = FALSE)) # numeric 0, not "0"
df <- df[order(df$ind),]
ggplot(df, aes(y = Freq, x = Var1)) + geom_bar(stat = "identity") # stat = "identity" is needed when y is supplied
Notice the stringsAsFactors = FALSE?
As far as the order goes if you already have factors (as you do) you need to reorder the factor. If you want more detailed info see this post

Resources