ggplot2 - include one level of a factor in all facets - r

I have some time series data that is facet wrapped by a variable 'treatment'. One of the levels of this 'treatment' factor is a negative control, and I want to include it in every facet.
For example using R dataset 'Theoph':
data("Theoph")
head(Theoph)
Subject Wt Dose Time conc
1 1 79.6 4.02 0.00 0.74
2 1 79.6 4.02 0.25 2.84
3 1 79.6 4.02 0.57 6.57
4 1 79.6 4.02 1.12 10.50
5 1 79.6 4.02 2.02 9.66
6 1 79.6 4.02 3.82 8.58
Theoph$Subject <- factor(Theoph$Subject, levels = unique(Theoph$Subject)) # set factor order
ggplot(Theoph, aes(x=Time, y=conc, colour=Subject)) +
geom_line() +
geom_point() +
facet_wrap(~ Subject)
How could I include the data for Subject '1' (the control) in each facet? (And ideally remove the facet that contains Subject 1's data alone.)
Thank you!

To have a certain subject appear in every facet, we need to replicate its data for every facet. We'll create a new column called facet, replicate the Subject 1 data for each other value of Subject, and for Subject != 1, set facet equal to Subject.
every_facet_data = subset(Theoph, Subject == 1)
individual_facet_data = subset(Theoph, Subject != 1)
individual_facet_data$facet = individual_facet_data$Subject
every_facet_data = merge(every_facet_data,
data.frame(Subject = 1, facet = unique(individual_facet_data$facet)))
plot_data = rbind(every_facet_data, individual_facet_data)
library(ggplot2)
ggplot(plot_data, aes(x=Time, y=conc, colour=Subject)) +
geom_line() +
geom_point() +
facet_wrap(~ facet)
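An alternative sketch, not the approach above: ggplot2 repeats a layer in every panel when that layer's data does not contain the faceting variable. Relying on that behaviour, the control can be passed as a separate layer without the Subject column, which also drops the control-only facet. The names theoph, control and others are made up for this illustration.

```r
library(ggplot2)

# Plain data frame copy of the built-in dataset
theoph <- as.data.frame(Theoph)
theoph$Subject <- factor(theoph$Subject, levels = unique(theoph$Subject))

# Control rows without the faceting column, so the layer repeats in every panel
control <- subset(theoph, Subject == "1", select = c(Time, conc))
# Remaining subjects; droplevels() removes the now-empty level "1"
others <- droplevels(subset(theoph, Subject != "1"))

p <- ggplot(others, aes(x = Time, y = conc)) +
  geom_line(data = control, colour = "grey50") +  # control, shown in all facets
  geom_line(aes(colour = Subject)) +
  geom_point(aes(colour = Subject)) +
  facet_wrap(~ Subject)
# print(p) draws the plot
```

Because the control is drawn in a fixed grey rather than through the colour scale, it does not appear in the legend; whether that is desirable depends on the figure.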

Related

How to overplot geom_histogram with stat_bin or geom_line with multiple groupings?

I'm trying to create a plot which has:
histogram of values in "historic" time period, created from method "A"
histogram of values in "future" time period, created from method "A"
either stat_bin or geom_line of values in both "historic" and "future" time period, created from method "B"
example data:
draw method Parameter Value
1 A historic 0.99
1 A future 0.98
1 B historic 0.97
1 B future 0.96
2 A historic 0.9
2 A future 0.88
2 B historic 0.95
2 B future 0.94
3 A historic 0.97
3 A future 0.94
3 B historic 0.91
3 B future 0.89
ggplot(df,aes(x=Value,color=Parameter,fill=Parameter)) +
scale_color_discrete(name="Period",labels=c("historic","future")) +
scale_fill_discrete(name="Period",labels=c("historic","future"),guide="none") +
geom_histogram(aes(y=..density..),
breaks=seq(.8,1.0,by=0.01),
alpha=0.4,position="identity") +
theme(axis.title.x=element_blank(),axis.text.x=element_blank(),
axis.title.y=element_blank()) +
scale_x_continuous(breaks=seq(.8,1.00,by=0.01)) +coord_flip() +
theme(legend.position = "bottom") +
geom_line(data=subset(df,method == "B"),
aes(x=Value),stat="density")
In the image, it looks like the histogram is plotting all of the "method" values. But in the histograms, I only want method == "A" (with Parameter == "historic" and "future"). Is there any way to create different types of plots based on two types of groupings? geom_line should only plot method == "B" (Parameter == "historic", "future"), and geom_histogram should only plot method == "A" (Parameter == "historic", "future").
I'd like the final result to look like this: (either the left, with geom_line, or the right, with stat_bin)
Plotting what you requested, i.e. histogram bars from method "A" and line from method "B".
Reading in example data:
x <- '
draw method Parameter Value
1 A historic 0.99
1 A future 0.98
1 B historic 0.97
1 B future 0.96
2 A historic 0.9
2 A future 0.88
2 B historic 0.95
2 B future 0.94
3 A historic 0.97
3 A future 0.94
3 B historic 0.91
3 B future 0.89
'
df <- read.table(textConnection(x), header = TRUE)
Plotting:
library(ggplot2)
library(dplyr) # for %>% and filter()
ggplot() +
geom_histogram(data=df %>% filter(method=="A"),
aes(x=Value, y=..density.., fill=Parameter),
breaks=seq(.8, 1.0, by=0.01),
alpha=0.4, position="identity") +
geom_line(data=df %>% filter(method=="B"),
aes(x=Value, colour=Parameter), stat="density") +
# Parameter sorts alphabetically ("future", "historic"), so the labels follow that order
scale_fill_discrete(name="", labels=c("Future, A","Historic, A")) +
scale_colour_discrete(name="", labels=c("Future, B","Historic, B")) +
coord_flip()
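Labels given to a discrete scale are applied to the levels of Parameter in order, and a character column is ordered alphabetically (future before historic). One way to avoid mismatched legend labels, sketched here on a copy of the example data, is to set the level order explicitly before plotting. The names hist_data and line_data are illustrative, and after_stat() assumes ggplot2 3.3.0 or later (older versions use ..density..).

```r
library(ggplot2)

df <- data.frame(
  method    = rep(c("A", "A", "B", "B"), times = 3),
  Parameter = rep(c("historic", "future"), times = 6),
  Value     = c(0.99, 0.98, 0.97, 0.96, 0.90, 0.88,
                0.95, 0.94, 0.97, 0.94, 0.91, 0.89)
)
# Fix the level order so labels can be listed as historic, then future
df$Parameter <- factor(df$Parameter, levels = c("historic", "future"))

hist_data <- subset(df, method == "A")  # bars use method A only
line_data <- subset(df, method == "B")  # lines use method B only

p <- ggplot() +
  geom_histogram(data = hist_data,
                 aes(x = Value, y = after_stat(density), fill = Parameter),
                 breaks = seq(0.8, 1.0, by = 0.01),
                 alpha = 0.4, position = "identity") +
  geom_line(data = line_data,
            aes(x = Value, colour = Parameter), stat = "density") +
  scale_fill_discrete(name = "", labels = c("Historic, A", "Future, A")) +
  scale_colour_discrete(name = "", labels = c("Historic, B", "Future, B")) +
  coord_flip()
# print(p) draws the plot
```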

Improve the autoBinning Histogram

This is only my second time making an auto-binned histogram, and the result looks elementary. I'm seeking help to improve it.
What I have tried is:
DAta <- read.table(text="Species DNA LINE LTR SINE Helitron Unclassified Unmasked
darius 2.68 10.37 18.00 1.52 3.64 0.03 63.79
Derian 2.74 10.59 16.61 1.56 4.24 0.03 64.23
rats 2.77 10.97 15.20 1.57 4.69 0.03 64.77
Mouos 2.53 10.42 17.33 1.42 3.68 0.02 64.6", header=TRUE)
library(reshape2)
DF1 <- melt(DAta, id.var="Species")
library(ggplot2)
ggplot(DF1, aes(x = Species, y = value, fill = variable)) +
geom_bar(stat = "identity")
Output:
How can I make the species names italic?
The order of the bars should be the same as in the input, from left to right (darius, Derian, rats and Mouos).
How can I make the colours and style look better?
There are 3 questions here:
To change the axis labels to italics, adjust the axis.text.x theme element; see the question/answers referenced at the bottom.
To change the ordering of the axis labels, convert the variable Species to a factor whose levels are in the desired order.
Finally, to change the color scheme, use one of the scale_fill_*() functions. I like the ColorBrewer palettes (scale_fill_brewer()), which offer several good color schemes. A few other predefined scale_fill_*() options are available.
Note: this is a bar chart and not a histogram.
See the comments for additional details:
DAta <- read.table(text="Species DNA LINE LTR SINE Helitron Unclassified Unmasked
darius 2.68 10.37 18.00 1.52 3.64 0.03 63.79
Derian 2.74 10.59 16.61 1.56 4.24 0.03 64.23
rats 2.77 10.97 15.20 1.57 4.69 0.03 64.77
Mouos 2.53 10.42 17.33 1.42 3.68 0.02 64.6", header=TRUE)
#updated method to reshape data. tidyr is replacement for reshape2
library(tidyr)
DF1 <- pivot_longer(DAta, cols=-1, names_to = "Classification", values_to = "Value" )
#Set Species as factors defining the order of the labels
DF1$Species<-factor(DF1$Species, levels=c("darius", "Derian", "rats", "Mouos"))
library(ggplot2)
ggplot(DF1, aes(x = Species, y = Value, fill = Classification)) +
geom_bar(stat = "identity") +
scale_fill_brewer(palette = "Pastel1") +
theme(axis.text.x = element_text(face="italic"))
Option: If the number of rows or the names of the species can change, here is a potential option for keeping the Species names in their input order:
#retrieves the species names from the original dataframe, in input order
# assumes the species names are stored in the Species column
DF1$Species<-factor(DF1$Species, levels= unique(DAta$Species))
To adjust the axis labels here is a good reference:
Changing font size and direction of axes text in ggplot2

How to plot a graph with four lines?

I am doing an eye-tracking experiment to find out how two languages influence participants' fixation proportions on two different areas of interest (AOIs) over time.
My independent variables: Language (L1 vs. L2), AOI (AOI1 vs. AOI2), and time (divided into 50 time bins already). I want to plot a graph with four lines, each line stands for the fixation percentage of "L1 AOI1", "L1 AOI2", "L2 AOI1" and "L2 AOI2". An example of my data.frame is as follows:
Stimulus Bin Language AOI percentage
1 1 L1 AOI1 0.75
1 1 L1 AOI2 0.12
1 1 L2 AOI1 0.54
1 1 L2 AOI2 0.36
...
10 1 L1 AOI1 0.85
10 1 L1 AOI2 0.10
10 1 L2 AOI1 0.60
10 1 L2 AOI2 0.23
...
10 7 L1 AOI1 0.64
10 7 L1 AOI2 0.14
10 7 L2 AOI1 0.66
10 7 L2 AOI2 0.21
...
I think I do not need to melt my data, right? It is already in long format.
I have drawn two graphs with facet_wrap as follows, but how could I get ONE graph with all that information?
ggplot(data,aes(Bin, percentage, linetype = Language)) +
facet_wrap(~ AOI)+
stat_summary(fun.y = mean,geom = "line")+
stat_summary(fun.data = mean_se,geom = "ribbon",
color = NA, alpha = 0.3) +
theme_bw(base_size = 10) +
labs(x = "2000 ms since picture onset (50 time bins)",
y = "fixation proportion") +
scale_linetype_manual(values = c("solid","dashed"))
Any ideas would be of great help to me.
Thanks!
facet_wrap is not the function you need for that. Instead, add color = AOI inside aes() in the ggplot() call, in addition to linetype = Language. This gives different colors for AOI and different linetypes for Language, so 4 different lines on the same graph.
This post may interest you : https://stackoverflow.com/a/3777592/10580543
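As a rough sketch of that suggestion, with made-up data standing in for the real eye-tracking data frame (the values below are random, not results), mapping both colour and linetype yields one line per Language and AOI combination in a single panel. Note that fun = mean is the argument name in ggplot2 3.3.0 and later; older versions use fun.y.

```r
library(ggplot2)

set.seed(1)  # made-up data for illustration only
data <- expand.grid(Stimulus = 1:10, Bin = 1:50,
                    Language = c("L1", "L2"), AOI = c("AOI1", "AOI2"))
data$percentage <- runif(nrow(data))

p <- ggplot(data, aes(Bin, percentage, colour = AOI, linetype = Language)) +
  stat_summary(fun = mean, geom = "line") +          # one mean line per combination
  stat_summary(aes(fill = AOI), fun.data = mean_se, geom = "ribbon",
               colour = NA, alpha = 0.3) +           # standard-error band
  theme_bw(base_size = 10) +
  labs(x = "2000 ms since picture onset (50 time bins)",
       y = "fixation proportion") +
  scale_linetype_manual(values = c("solid", "dashed"))
# print(p) draws the plot
```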

Plot data from different csv files in one graph using R + ggplot

I have multiple .csv files, each of which has a column (called Data) that I want to compare across files. But first, I have to group the values in a column of each file. In the end I want one graph with multiple colored "lines", each showing the mean value of each group for one file. I will describe the process I use to get the graph I want below. This works for a single file, but I don't know how to add multiple "lines" from multiple files to one graph using ggplot.
This is what I got so far:
data = read.csv(file="my01data.csv",header=FALSE, sep=",")
A single .csv file looks like the following, but without the header line:
ID Data Range
1,63,5.01
2,61,5.02
3,65,5.00
4,62,4.99
5,62,4.98
6,64,5.01
7,71,4.90
8,72,4.93
9,82,4.89
10,82,4.80
11,83,4.82
10,85,4.79
11,81,4.80
After getting the data I group it with the following lines:
data["Group"] <- NA
data[(data$Range>4.95), "Group"] <- 5.0
data[(data$Range>4.85 & data$Range<4.95), "Group"] <- 4.9
data[(data$Range>4.75 & data$Range<4.85), "Group"] <- 4.8
The final data looks like this:
myTable <- "ID Data Range Group
1 63 5.01 5.00
2 61 5.02 5.00
3 65 5.00 5.00
4 62 4.99 5.00
5 62 4.98 5.00
6 64 5.01 5.00
7 71 4.90 4.90
8 72 4.93 4.90
9 72 4.89 4.90
10 82 4.80 4.80
11 83 4.82 4.80
10 85 4.79 4.80
11 81 4.80 4.80"
myData <- read.table(text=myTable, header = TRUE)
To plot this dataframe I use the following lines:
( pplot <- ggplot(data=myData, aes(x=Group, y=Data))
+ stat_summary(fun.y = mean, geom = "line", color='red')
+ xlab("Group")
+ ylab("Data")
)
Which results in a graph like this:
I assume you have the names of your .csv-files stored in a vector named file_names. Then you can run the following code and should get a different line for each file:
library(ggplot2)
# the files have no header row, so name the columns explicitly
data_list <- lapply(file_names, read.csv, header=FALSE, sep=",", col.names=c("ID", "Data", "Range"))
data_list <- lapply(seq_along(data_list), function(i){
df <- data_list[[i]]
df$Group <- round(df$Range, 1)
df$DataNumber <- i
df
})
finalTable <- do.call(rbind, data_list)
finalTable$DataNumber <- factor(finalTable$DataNumber)
ggplot(finalTable, aes(x=Group, y=Data, group = DataNumber, color = DataNumber)) +
stat_summary(fun.y = mean, geom = "line") +
xlab("Group") +
ylab("Data")
How it works
First the different datasets are read with read.csv into a list, data_list. Then each data.frame in that list is assigned a Group.
I used round here with digits = 1, which rounds to one decimal place (I figured that's what you are doing).
Then a unique number (in this case simply the index in the list) is assigned to each data.frame. After that the list is combined into one data.frame with rbind, and DataNumber is turned into a factor (prettier for plotting). Finally I added DataNumber as a group and color variable to the plot.
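The same steps can be sketched without any files on disk. Here two small in-memory data frames (invented values, standing in for the result of read.csv) go through the grouping, tagging and combining steps described above; group_means is an extra, hypothetical object added only to show the values the plotted lines would pass through.

```r
# Two made-up data frames standing in for two CSV files
data_list <- list(
  data.frame(ID = 1:4, Data = c(63, 61, 71, 82), Range = c(5.01, 4.98, 4.90, 4.79)),
  data.frame(ID = 1:4, Data = c(60, 64, 70, 85), Range = c(5.02, 4.93, 4.89, 4.82))
)

data_list <- lapply(seq_along(data_list), function(i) {
  df <- data_list[[i]]
  df$Group <- round(df$Range, digits = 1)  # bin Range to one decimal place
  df$DataNumber <- i                       # remember which "file" the rows came from
  df
})

finalTable <- do.call(rbind, data_list)
finalTable$DataNumber <- factor(finalTable$DataNumber)

# Mean of Data per file and group
group_means <- aggregate(Data ~ DataNumber + Group, data = finalTable, FUN = mean)
```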
You can add another line by using stat_summary again; you can define the data and aes argument to any other dataset:
#some pseudo data for testing
my_other_data <- myData
my_other_data$Data <- my_other_data$Data * 0.5
pplot <- ggplot(data=myData, aes(x=Group, y=Data)) +
stat_summary(fun.y = mean, geom = "line", color='red') +
stat_summary(data=my_other_data, aes(x=Group, y=Data),
fun.y = mean, geom = "line", color='green') +
xlab("Group") +
ylab("Data")
pplot
Why not create a classifying column ("Class")?
myTable1$Class <- "table1"
myTable1
"ID Data Range Group Class
1 63 5.01 5.00 table1
2 61 5.02 5.00 table1
3 65 5.00 5.00 table1"
myTable2$Class <- "table2"
myTable2
"ID Data Range Group Class
1 63 5.01 5.00 table2
2 61 5.02 5.00 table2
3 65 5.00 5.00 table2"
And merging the data frames
dfBIND <- rbind(myTable1, myTable2)
So that you can ggplot with a grouping or coloring variable
pplot <- ggplot(data=dfBIND, aes(x=Group, y=Data, group=Class, color=Class)) +
stat_summary(fun.y = mean, geom = "line") +
xlab("Group") +
ylab("Data")

Reorder stacks in horizontal stacked barplot (R)

I'm trying to make a horizontal stacked barplot using ggplot. Below are the actual values for three out of 300 sites in my data frame. Here's where I've gotten to so far, using info pulled from these previous questions which I admit I may not have fully understood.
df <- data.frame(id=c("AR001","AR001","AR001","AR001","AR002","AR002","AR002","AR003","AR003","AR003","AR003","AR003"),
landuse=c("agriculture","developed","forest","water","agriculture","developed","forest","agriculture","developed","forest","water","wetlands"),
percent=c(38.77,1.76,59.43,0.03,69.95,0.42,29.63,65.4,3.73,15.92,1.35,13.61))
df
id landuse percent
1 AR001 agriculture 38.77
2 AR001 developed 1.76
3 AR001 forest 59.43
4 AR001 water 0.03
5 AR002 agriculture 69.95
6 AR002 developed 0.42
7 AR002 forest 29.63
8 AR003 agriculture 65.40
9 AR003 developed 3.73
10 AR003 forest 15.92
11 AR003 water 1.35
12 AR003 wetlands 13.61
str(df)
'data.frame': 12 obs. of 3 variables:
$ id : Factor w/ 3 levels "AR001","AR002",..: 1 1 1 1 2 2 2 3 3 3 ...
$ landuse: Factor w/ 5 levels "agriculture",..: 1 2 3 4 1 2 3 1 2 3 ...
$ percent: num 38.77 1.76 59.43 0.03 69.95 ...
df <- transform(df,
landuse.ord = factor(
landuse,
levels=c("agriculture","forest","wetlands","water","developed"),
ordered =TRUE))
cols <- c(agriculture="maroon",forest="forestgreen",
wetlands="gold", water="dodgerblue", developed="darkorchid")
ggplot(df,aes(x = id, y = percent, fill = landuse.ord, order=landuse.ord)) +
geom_bar(position = "stack",stat = "identity", width=1) +
coord_flip() +
scale_fill_manual(values = cols)
which produces this graph.
What I would like to do is to reorder the bars so that they are in descending order by value for the agriculture category - in this example AR002 would be at the top, followed by AR003 then AR001. I tried changing the contents of aes to aes(x = reorder(landuse.ord, percent), but that eliminated the stacking and seemed to have maybe summed the percentages for each land use category:
I would like to have the stacks in order, from left to right: agriculture, forest, wetlands, water, developed. I tried doing that with the transform part of the code, which put it in the correct order in the legend, but not in the plot itself?
Thanks in advance... I have made a ton of progress based on answers to other peoples' questions, but seem to now be stuck at this point!
Update: here is the finished graph for all 326 sites!
Ok based on your comments, I believe this is your solution. Place these lines after cols<-...:
library(dplyr) # provides filter() for data frames
#create df to sort by agriculture's percentage
ag<-filter(df, landuse=="agriculture")
#use the df to sort and order df$id's levels
df$id<-factor(df$id, levels=ag$id[order(ag$percent)], ordered = TRUE)
#sort df, based on ordered ids and ordered landuse
df<-df[order(df$id, df$landuse.ord),]
ggplot(df,aes(x = id, y = percent, fill = landuse.ord, order=landuse.ord)) +
geom_bar(position = "stack",stat = "identity", width=1) +
coord_flip() +
scale_fill_manual(values = cols)
The comments should clarify the purpose of each line. This will reorder your original data frame; if that is a problem, I would create a copy and then operate on the copy.
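For reference, a sketch of the same reordering in base R on the three example sites. It also swaps in position_stack(reverse = TRUE), which is not part of the answer above: recent ggplot2 versions no longer use the order aesthetic, and this option is one way to make each bar start with agriculture on the left. Treat it as an illustration under those assumptions.

```r
library(ggplot2)

df <- data.frame(
  id = c(rep("AR001", 4), rep("AR002", 3), rep("AR003", 5)),
  landuse = c("agriculture", "developed", "forest", "water",
              "agriculture", "developed", "forest",
              "agriculture", "developed", "forest", "water", "wetlands"),
  percent = c(38.77, 1.76, 59.43, 0.03, 69.95, 0.42, 29.63,
              65.4, 3.73, 15.92, 1.35, 13.61)
)
df$landuse.ord <- factor(df$landuse,
  levels = c("agriculture", "forest", "wetlands", "water", "developed"))

# Order the sites by their agriculture percentage (ascending);
# after coord_flip() the last level is drawn at the top
ag <- df[df$landuse == "agriculture", ]
df$id <- factor(df$id, levels = ag$id[order(ag$percent)])

cols <- c(agriculture = "maroon", forest = "forestgreen",
          wetlands = "gold", water = "dodgerblue", developed = "darkorchid")

p <- ggplot(df, aes(x = id, y = percent, fill = landuse.ord)) +
  geom_bar(stat = "identity", width = 1,
           position = position_stack(reverse = TRUE)) +  # agriculture nearest the axis
  coord_flip() +
  scale_fill_manual(values = cols)
# print(p) draws the plot
```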

Resources