Aggregate data and plot in a bar plot in R - r

I have a data set with parameter_variations and a score. This score has four scales: like, anth, comf and ueq.

The bargraph.CI function accepts raw data, not aggregated data. So try the following:
library(sciplot) # provides bargraph.CI
bargraph.CI(parameter_variants, response=score, group=scale, data=dat,
            main="likeability", legend=TRUE)
This should give you one "two-way" plot. If you don't like the look of it, there are many arguments that make superficial adjustments. Check the help page for details.
To obtain separate plots for each of the four scales, I think you can do something like this:
library(dplyr)
dat %>%
filter(scale=="like") %>% # change the value here.
bargraph.CI(parameter_variants, response=score, data=., main="likeability")
Base R solution:
with(subset(dat, subset=scale=="like"),
bargraph.CI(parameter_variants, response=score, main="likeability")
)
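To make the raw-versus-aggregated point concrete, here is a minimal sketch in base R of the per-cell summaries that such a plot is built from. The data frame below is made up for illustration (its values, the levels A/B/C and the helper se are assumptions, not the asker's data). As far as I know, bargraph.CI by default draws the mean plus or minus one standard error, which is why it needs the individual rows.

```r
# Hypothetical raw data: one row per individual rating (not aggregated)
set.seed(1)
dat <- expand.grid(
  parameter_variants = c("A", "B", "C"),
  scale = c("like", "anth", "comf", "ueq"),
  rep = 1:5
)
dat$score <- runif(nrow(dat), min = 1, max = 5)

# Standard error of the mean (helper defined here for illustration)
se <- function(x) sd(x) / sqrt(length(x))

# Mean and standard error for every scale x parameter_variants cell
means <- with(dat, tapply(score, list(scale, parameter_variants), mean))
ses   <- with(dat, tapply(score, list(scale, parameter_variants), se))

print(round(means, 2))
print(round(ses, 2))
```

If only the table of means were kept, the standard errors could no longer be computed, so the error bars could not be drawn.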

Related

R: Cleaning GGally Plots

I am using the R programming language and I am new to the GGally library. I followed some basic tutorials online and ran the following code:
#load libraries
library(GGally)
library(survival)
library(plotly)
I changed some of the data types:
#manipulate the data
data(lung)
data = lung
data$sex = as.factor(data$sex)
data$status = as.factor(data$status)
data$ph.ecog = as.factor(data$ph.ecog)
Now I visualize:
#make the plots
#I dont know why, but this comes out messy
ggparcoord(data, groupColumn = "sex")
#Cleaner
ggparcoord(data)
Both ggparcoord() calls ran successfully, but the first one came out pretty messy (the axis labels seem to have been corrupted). Is there a way to fix the labels?
In the second graph, it is difficult to tell how the factor variables are labelled on their respective axes (e.g. for the "sex" column, is "male" the bottom point or is "female" the bottom point?). Does anyone know if there is a way to fix this?
Finally, is there a way to use the "ggplotly()" function for "ggally" objects?
e.g.
a = ggparcoord(data)
ggplotly(a)
Thanks
It looks like your data columns get converted to factors when you add the groupColumn. To prevent that, you could exclude the groupColumn from the columns to be plotted, as in the code below.
As for ggplotly(): I am not sure about the general case, but at least for ggparcoord it works.
library(GGally)
library(survival)
data(lung)
data = lung
data$sex = as.factor(data$sex)
data$status = as.factor(data$status)
data$ph.ecog = as.factor(data$ph.ecog)
#plot every column except the grouping column "sex"
ggparcoord(data, seq(ncol(data))[!names(data) %in% "sex"], groupColumn = "sex")

Set common y axis limits from a list of ggplots

I am running a function that returns a custom ggplot from input data (it is in fact a plot with several layers on it). I run the function over several different input data sets and obtain a list of ggplots.
I want to create a grid with these plots to compare them, but they all have different y axes.
I guess what I have to do is extract the maximum and minimum y-axis limits from the ggplot list and apply those to each plot in the list.
How can I do that? I guess it is through the use of ggplot_build. Something like this:
test = ggplot_build(plot_list[[1]])
> test$layout$panel_scales_x
[[1]]
<ScaleContinuousPosition>
Range:
Limits: 0 -- 1
I am not familiar with the structure of a ggplot_build and maybe this one in particular is not a standard one as it comes from a "custom" ggplot.
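For plain ggplot objects, the idea described above (collect each plot's y range, take the overall minimum and maximum, and apply them to every plot) can be sketched as follows. This is only a sketch under assumptions: the plots here are simple stand-ins, not gseaplot2 output, and it reads the same layout structure shown above but for y (panel_scales_y), which is an internal ggplot2 structure that may change between versions. A composite figure such as the one gseaplot2 returns with several subplots may need its sub-plots handled individually.

```r
library(ggplot2)

# Hypothetical stand-in for plot_list: three simple plots with different y ranges
plot_list <- lapply(c(1, 5, 10), function(k) {
  df <- data.frame(x = 1:10, y = (1:10) * k)
  ggplot(df, aes(x, y)) + geom_line()
})

# Trained y range (data range, before axis expansion) of each plot's first panel
y_ranges <- lapply(plot_list, function(p) {
  ggplot_build(p)$layout$panel_scales_y[[1]]$range$range
})

# Overall minimum and maximum across all plots
common_ylim <- range(unlist(y_ranges))

# Apply the common limits; coord_cartesian zooms without dropping data
plot_list_common <- lapply(plot_list, function(p) {
  p + coord_cartesian(ylim = common_ylim)
})
```

coord_cartesian() is used rather than ylim() so that no observations are removed before the layers are drawn.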
For reference, these plots are created with the gseaplot2 function from the enrichplot package.
I don't know how to "upload" an R object, but if that would help, let me know how to do it.
Thanks!
edit after comments (thanks for your suggestions!)
Here is an example of a gseaplot2 plot. GSEA stands for Gene Set Enrichment Analysis; it is a technique used in genomic studies. The gseaplot2 function calculates a running average and then plots it, with another bar plot at the bottom.
and here is the grid I create to compare the plots generated from different data:
I would like to have a common scale for the "Running Enrichment Score" part.
I guess I could try to recreate the gseaplot2 function and input all of the datasets and then create the grid by facet_wrap, but I was wondering if there was an easy way of extracting parameters from a plot list.
As a reproducible example (from the enrichplot package):
library(clusterProfiler)
library(enrichplot) # provides gseaplot2
data(geneList, package="DOSE")
gene <- names(geneList)[abs(geneList) > 2]
wpgmtfile <- system.file("extdata/wikipathways-20180810-gmt-Homo_sapiens.gmt", package="clusterProfiler")
wp2gene <- read.gmt(wpgmtfile)
wp2gene <- wp2gene %>% tidyr::separate(term, c("name","version","wpid","org"), "%")
wpid2gene <- wp2gene %>% dplyr::select(wpid, gene) #TERM2GENE
wpid2name <- wp2gene %>% dplyr::select(wpid, name) #TERM2NAME
ewp2 <- GSEA(geneList, TERM2GENE = wpid2gene, TERM2NAME = wpid2name, verbose=FALSE)
gseaplot2(ewp2, geneSetID=1, subplots=1:2)
And this is how I generate the plot list (there is probably a much more elegant way):
library(ggpubr) # provides ggarrange
plot_list = list()
for(i in 1:3) {
  fig_i = gseaplot2(ewp2,
                    geneSetID=i,
                    subplots=1:2)
  plot_list[[i]] = fig_i
}
ggarrange(plotlist=plot_list)

Plot LOESS (STL) decomposition using Ggvis

I want to be able to plot the three different elements of the Seasonal Trend Decomposition using Loess (STL) with Ggvis.
However, I receive this error:
Error: data_frames can only contain 1d atomic vectors and lists
I am using the nottem data set.
# The Seasonal Trend Decomposition using Loess (STL) with Ggvis
# Load nottem data set
library(datasets)
nottem <- nottem
# Decompose using stl()
nottem.stl = stl(nottem, s.window="periodic")
# Plot decomposition
plot(nottem.stl)
Now, this is the information I am interested in. In order to make this into a plot that I can play around with I transform this into a data frame using the xts-package. So far so good.
# Transform nottem.stl to a data.frame
library(xts)
df.nottem.stl <- as.data.frame(as.xts(nottem.stl$time.series))
# Add date to data.frame
df.nottem.stl$date <- data.frame(time = seq(as.Date("1920-01-01"), by = ("months"), length =240))
# Glimpse data
glimpse(df.nottem.stl)
# Plot simple line of trend
plot(df.nottem.stl$date, df.nottem.stl$trend, type = "o")
This is pretty much the plot I want. However, I want to be able to use it with Shiny and therefore Ggvis is preferable.
# Plot ggvis
df.nottem.stl%>%
ggvis(~date, ~trend)%>%
layer_lines()
This is where I get my error.
Any hints on what might go wrong?
First of all, the date column of your df.nottem.stl data.frame is itself a data.frame that holds the dates in a column called time, so you should be using date$time. Then use the layer_paths function instead of layer_lines to make it work. In my experience layer_paths works better than layer_lines.
So this will work:
library(ggvis)
df.nottem.stl%>%
ggvis(~date$time, ~trend)%>%
#for points
layer_points() %>%
#for lines
layer_paths()
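An alternative is to avoid the nested data.frame in the first place by storing the dates as a plain Date vector. Below is a minimal sketch in base R; the name df.nottem.fixed is made up here for illustration, and the xts conversion is skipped because as.data.frame() works directly on the decomposition's time series.

```r
# Decompose the built-in nottem series (monthly, 1920 to 1939, 240 values)
nottem.stl <- stl(nottem, s.window = "periodic")

# Columns: seasonal, trend, remainder
df.nottem.fixed <- as.data.frame(nottem.stl$time.series)

# Store the dates as a 1d Date vector, not as a data.frame inside a column
df.nottem.fixed$date <- seq(as.Date("1920-01-01"), by = "month", length.out = 240)

str(df.nottem.fixed)
```

With date stored this way, ~date refers to an ordinary 1d vector, so the "data_frames can only contain 1d atomic vectors and lists" error should no longer be triggered by that column. I have not checked layer_lines() against every ggvis version.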

Stacked bar in R

I have a table exported in csv from PostgreSQL and I'd like to create a stacked bar graph in R. It's my first project in R.
Here's my data and what I want to do:
It is the quality of the feeder bus service for a certain provider in the area. For each user of the train, we assign a service quality based on the synchronization between the bus and the train at the train stations, and calculate the percentage of users that have an ideal or very good service, a correct service, a deficient service or no service at all (linked to that question in gis.stackexchange)
So, I'd like to use my first column as my x-axis labels and my headers as my categories. The data is already normalized to 100% for each row.
In Excel, it's a couple of clicks, and I wouldn't mind typing a couple of lines of code since it's the final result of an already quite long plpgsql script... I'd prefer to continue to code instead of moving to Excel (I also have dozens of those to do).
So, I tried to create a stacked bar using the examples in Nathan Yau's "Visualize This" and the book "R in Action" and wasn't quite successful. Normally, their examples use data that they aggregate with R and use that. Mine is already aggregated.
So, I've finally come up with something that works in R:
but I had to transform my data quite a bit:
I had to transpose my table and remove my now-row (ex-column) identifier.
Here's my code:
# load libraries
library(ggplot2)
library(reshape2)
# load data
stl <- read.csv("D:/TEMP/rabat/_stl_rabattement_stats_mtl.csv", sep=";", header=TRUE)
# reshape for plotting
stl_matrix <- as.matrix(stl)
# make a quick plot
barplot(stl_matrix, border=NA, space=0.1, ylim=c(0, 100), xlab="Trains", ylab="%",
main="Qualité du rabattement, STL", las = 3)
Is there any way that I could use my original csv and have the same result?
I'm a little lost here...
Thanks!!!!
Try the ggplot2 and reshape2 libraries (melt comes from reshape2). You should be able to get the chart you want with
stl$train_order <- as.numeric(rownames(stl))
stl.r <- melt(stl, id.vars = c("train_no", "train_order"))
stl.r$train_no <- factor(
stl.r$train_no,
levels = stl$train_no[order(stl$train_order)])
ggplot(stl.r, aes(x = factor(train_no), y = value, fill = variable)) + geom_bar(stat = 'identity')
It appears that you transposed the matrix manually. This can be done in R with the t() function.
Add the following line after the as.matrix(stl) line:
stl_matrix <- t(stl_matrix)
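As a sketch of the whole route from an untransposed table to the stacked barplot, here is a small made-up example. The data frame stl_demo, its column names and its values are assumptions for illustration only, not the asker's CSV. The identifier column is dropped before converting to a matrix so the matrix stays numeric, and it is then reused for the bar labels.

```r
# Hypothetical table in the original layout: one row per train, rows sum to 100
stl_demo <- data.frame(
  train_no  = c("T1", "T2", "T3"),
  ideal     = c(40, 25, 10),
  correct   = c(30, 35, 20),
  deficient = c(20, 25, 30),
  none      = c(10, 15, 40)
)

# Drop the identifier column, then transpose: categories become rows
stl_matrix <- t(as.matrix(stl_demo[, -1]))

# Use the identifier column as bar labels
colnames(stl_matrix) <- stl_demo$train_no

# Each column is one stacked bar; rows are the stacked segments
barplot(stl_matrix, border = NA, space = 0.1, ylim = c(0, 100),
        xlab = "Trains", ylab = "%", las = 3, legend.text = TRUE)
```

barplot() stacks the rows of a matrix within each column, which is why the transpose is needed when the categories start out as columns.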

Sorting of categorical variables in ggplot

Good day, I wish to produce a graphic using ggplot2 that does not use its default sorting of the categorical variable (alphabetical; in the script: letters), but instead sorts by the associated value of a continuous variable (in the script: numbers).
Here is an example script:
library(ggplot2)
trial<-data.frame(letters=letters, numbers=runif(n=26,min=1,max=26))
trial<-trial[sample(1:26,26),]
trial.plot<-qplot(x=numbers, y=letters, data=trial)
trial.plot
trial<-trial[order(trial$numbers),]
trial.plot<-qplot(x=numbers, y=letters, data=trial)
trial.plot
trial.plot+stat_sort(variable=numbers)
The last line does not work.
I'm pretty sure stat_sort does not exist, so it's not surprising that it doesn't work as you think it should. Luckily, there's the reorder() function, which reorders the levels of a categorical variable depending on the values of a second variable. I think this should do what you want:
trial.plot <- qplot( x = numbers, y = reorder(letters, numbers), data = trial)
trial.plot
If you could be more specific about how you want it to look, I think the community could improve on my answer. Regardless, is this what you are looking for?
qplot(numbers, reorder(letters, numbers), data=trial)
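To see what reorder() does to the factor levels, here is a small sketch that rebuilds the trial data from the question (the seed and the names ordered_letters and levels_by_number are added here for illustration). It also shows the same plot written with ggplot() instead of qplot(), since qplot() is deprecated in recent ggplot2 releases.

```r
library(ggplot2)

set.seed(42)
trial <- data.frame(letters = letters, numbers = runif(n = 26, min = 1, max = 26))

# reorder() returns a factor whose levels are sorted by the second variable
ordered_letters <- reorder(trial$letters, trial$numbers)
levels_by_number <- levels(ordered_letters)
print(levels_by_number)

# Same plot as the qplot() call, written with ggplot()
trial.plot <- ggplot(trial, aes(x = numbers, y = reorder(letters, numbers))) +
  geom_point() +
  ylab("letters")
trial.plot
```

The row order of the data frame does not matter to ggplot2; only the order of the factor levels determines the order on the axis, which is why sorting trial with order() had no effect.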
