NP chart using ggplot2 - r

How can I generate an NP chart using ggplot2?
I wrote a simple R script that generates bar and point charts. I supply the data through a CSV file. How many columns do I need to specify, and which arguments do I need to pass to the ggplot functions?
I am very new to R and ggplot2.
EDIT :
This is what is meant by an NP chart.
Current code attempt:
#load library ggplot2
library(ggplot2)
#get arguments
args <- commandArgs(TRUE)
pdfname <- args[1]
graphtype <- args[2]
datafile <- args[3]
#read csv file
tasks <- read.csv(datafile, header = TRUE)
#name the pdf from passed arg 1
pdf(pdfname)
#main magic that generates the graph
qplot(x,y, data=tasks, geom = graphtype)
#clean up
dev.off()
The .csv file has two columns, x and y. I call the script with Rscript cne.R 11_16.pdf "point" "data.csv".
Thank you very much @mathematical.coffee, this is what I need, but:
1> I am reading the data from a CSV file, which contains the following:
Month,Rate
"Jan","37.50"
"Feb","32.94"
"Mar","25.00"
"Apr","33.33"
"May","33.08"
"Jun","29.09"
"Jul","12.00"
"Aug","10.00"
"Sep","6.00"
"Oct","23.00"
"Nov","9.00"
"Dec","14.00"
2> I want to display the value at each plotted point, display the values of UCL, CL and LCL, and give the x and y axes different labels.
Problem: when I read the data, it is not in the same order as in the CSV file. How can I fix this?

You combine ggplot(tasks,aes(x=x,y=y)) with geom_line and geom_point to get the lines connected by points.
If you additionally want the UCL/LCL/etc drawn you add in a geom_hline (horizontal line).
To add text to these lines you can use geom_text.
An example:
library(ggplot2)
# generate some data to use, say monthly up to a year from today.
n <- 12
tasks <- data.frame(
  x = seq(Sys.Date(), by = "month", length.out = n),
  y = runif(n))
CL  <- median(tasks$y)         # substitute however you calculate CL here
LCL <- quantile(tasks$y, .25)  # substitute however you calculate LCL here
UCL <- quantile(tasks$y, .75)  # substitute however you calculate UCL here
# keep the limits, their labels and the x position of the labels together
limits <- data.frame(x = tasks$x[n],
                     y = unname(c(UCL, CL, LCL)),
                     lbl = c('UCL', 'CL', 'LCL'))
p <- ggplot(tasks, aes(x = x, y = y)) +              # store x/y values
  geom_line() +                                      # add line
  geom_point(aes(colour = (y > LCL & y < UCL))) +    # add points, colour if outside limits
  theme(legend.position = 'none',                    # remove legend for colours
        axis.text.x = element_text(angle = 90))      # rotate x axis labels
# Now add in the limits.
# horizontal lines, with a different line type for each limit
# (opts() and theme_text() were removed from ggplot2; theme() and element_text() replace them)
p <- p + geom_hline(data = limits, aes(yintercept = y, linetype = lbl)) +   # draw lines
  geom_text(data = limits, aes(x = x, y = y, label = lbl),
            vjust = -0.2, size = 3)                                        # draw text
# display
print(p)
which gives: (image of the resulting chart)
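The follow-up points in the question (keeping the months in file order, printing the value at each point, labelling the limits with their values, and naming the axes) could be handled along the following lines. This is only a sketch, not part of the original answer: the CSV is inlined with `read.csv(text = ...)` for self-containment, the axis titles are placeholders, and the median/quartile limits are the same stand-ins used above rather than real NP-chart limits.

```r
library(ggplot2)

# Inline copy of the CSV from the question; in the script use read.csv(datafile)
csv_text <- 'Month,Rate
"Jan","37.50"
"Feb","32.94"
"Mar","25.00"
"Apr","33.33"
"May","33.08"
"Jun","29.09"
"Jul","12.00"
"Aug","10.00"
"Sep","6.00"
"Oct","23.00"
"Nov","9.00"
"Dec","14.00"'
tasks <- read.csv(text = csv_text, header = TRUE)
tasks$Rate <- as.numeric(tasks$Rate)

# ggplot2 orders a discrete axis by factor level (alphabetical by default),
# so set the levels to the order in which the months appear in the file
tasks$Month <- factor(tasks$Month, levels = unique(as.character(tasks$Month)))

# Placeholder limits (assumption): replace with your own UCL/CL/LCL formulas
limits <- data.frame(
  Month = tasks$Month[nrow(tasks)],   # put the limit labels at the last month
  lbl   = c("UCL", "CL", "LCL"),
  y     = unname(c(quantile(tasks$Rate, .75), median(tasks$Rate),
                   quantile(tasks$Rate, .25)))
)

p <- ggplot(tasks, aes(x = Month, y = Rate, group = 1)) +
  geom_line() +
  geom_point() +
  geom_text(aes(label = Rate), vjust = -0.8, size = 3) +           # value at each point
  geom_hline(data = limits, aes(yintercept = y, linetype = lbl)) +
  geom_text(data = limits,
            aes(y = y, label = sprintf("%s = %.2f", lbl, y)),      # value of each limit
            vjust = 1.5, size = 3) +
  labs(x = "Month of year", y = "Rate")                            # placeholder axis titles
print(p)
```

`group = 1` is needed because a discrete x axis would otherwise put every month in its own group and `geom_line()` would draw nothing.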

Related

Plot single block of color in R

I am trying to figure out how to plot a single block of color in R. I am trying to visualize a region of a genome with color. I am starting with a matrix that has 1 row and 6049 columns.
l1_canon <- matrix( nrow = 1, ncol = 6049, data = "_" )
Next, I have blocks that differentiate major regions of this element:
l1_canon[,1:909] <- "5' UTR"
l1_canon[,910:1923] <- "ORF1"
l1_canon[,1990:5814] <- "ORF2"
l1_canon[,49:420] <- "CPG"
l1_canon[,5815:6049] <- "3' UTR"
l1_canon[,211:225] <- "RXRA::VDR"
I have assigned colors to the different categories:
l1_colors <- list()
l1_colors[["5' UTR"]] <- "#26A064" # "#ea0064"
l1_colors[["ORF1"]] <- "#3095C7" # "#008a3f"
l1_colors[["ORF2"]] <- "#CA6BAA" # "#116eff"
l1_colors[["CPG"]] <- "#B38241" # "#cf00dc"
l1_colors[["3' UTR"]] <- "#CCCCCC" # "#dddddd"
l1_colors[["RXRA::VDR"]] <- "#FFFFFF"
l1_colors[["_"]] <- "#000000"
But I can't figure out how to plot this. I am looking for something like the color ramp functions in R, and have been trying to adapt that code, without success.
I tried assigning colors like so
for ( i in l1_canon ){
l1_color <- l1_colors[ l1_canon ]
}
and using it in the code that was used to generate the color ramp plots, but I am getting errors. I am aware that having 6000+ columns is going to make this weird visually, but, it's what I need! I am hoping I can make the individual color blocks small enough to fit on a screen. Eventually, this bar is going to be annotation above another image.
TY for your help! :)
I don't fully understand what you want, but you could use ggplot2 as follows:
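library(ggplot2)
# Find the run lengths of the regions
rle1 = rle(l1_canon[1,])
# Turn the run lengths into a data frame
df=data.frame(lengths=rle1$lengths, V=rle1$values)
# Align the colours with the regions
# (as.character() guards against V being a factor, as it is by default before R 4.0)
df$color <- unlist(l1_colors)[as.character(df$V)]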
# Plot a single stacked bar on its side with no annotation
ggplot(df, aes(x=1,group=seq_along(V),label=V, fill=color,y=lengths)) +
geom_bar(stat="identity",color="black")+
scale_fill_identity() +
theme_void() +
coord_flip()+
scale_y_reverse()
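The key step above is `rle()`, which collapses the 6049 per-position labels into a handful of contiguous blocks. A small sketch on a made-up six-position vector (the `region` vector and the `start`/`end` columns are illustrative, not from the question) shows what it returns:

```r
# Toy stand-in for l1_canon[1, ]: one label per position
region <- c("UTR", "UTR", "UTR", "ORF", "ORF", "UTR")

runs <- rle(region)
runs$lengths   # 3 2 1
runs$values    # "UTR" "ORF" "UTR"

# One row per contiguous block, with its start and end position
blocks <- data.frame(V = runs$values,
                     lengths = runs$lengths,
                     end = cumsum(runs$lengths))
blocks$start <- blocks$end - blocks$lengths + 1
blocks
```

Because a label that occurs in two separate stretches produces two rows, the bar has to be stacked by row (hence `group=seq_along(V)` in the plot) rather than by label.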

R : Create list of plots with for loop

I am trying to create a list of plots from my data, using a for loop to filter the data (="TAB_tmp2") and add each new plot to the list (="ListeGRAPH"). I think the problem comes from the filtered tables (="TAB_tmp2") having different sizes.
I have read several topics on the web about this, but I can't find a solution that works in this case.
My code :
rm(list=ls()) # delete objects
#====================================
# Create data for the example
#====================================
TAB = data.frame(Types_Mesures = c(rep(1,3),rep(2,5),rep(3,10)))
TAB$ID_mesuresParType=NA
TAB$Mesures=log(c(1:length(TAB$Types_Mesures)))
Nb_Types=length(unique(TAB$Types_Mesures)) # in the real data, the number of "Types_Mesures" can change
for (x in 1:Nb_Types) {
TAB_tmp=TAB[TAB$Types_Mesures==x,2]
TAB[TAB$Types_Mesures==x,2]=c(1:length(TAB_tmp))
}
#====================================
# List of plots
#====================================
library(gridExtra)
library(ggplot2)
INPUTDirectory= "D:/TEST/"
setwd(dir=INPUTDirectory)
ListeGRAPH <- list()
for (x in 1:Nb_Types) {
TAB_tmp2=TAB[TAB$Types_Mesures==x,]
ListeGRAPH[[x]] <- ggplot(data = TAB_tmp2) +
geom_line(aes(x = TAB_tmp2$ID_mesuresParType, y = TAB_tmp2$Mesures))
# #Save graph
# png(filename = paste("TAB_plot_T",x,".png", sep = ""))
# print(ListeGRAPH[[x]])
# graphics.off()
}
gridExtra::grid.arrange(grobs = ListeGRAPH)
When I run the code, I get this error:
Error: Aesthetics must be either length 1 or the same as the data (3):
x, y
It seems that grid.arrange doesn't accept plots of different dimensions?
How can I build the list of plots with this kind of table? In my real data the number of "Types_Mesures" can change.
Moreover, I think the for loop doesn't allow using a temporary variable (="TAB_tmp2") to create the list of plots, yet this code works when I save the plots to PNG files.
Thanks a lot for your help!
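The problem is actually not with grid.arrange. When you create plots with ggplot, refer to columns inside aes() by name rather than with $. The expressions in aes() are not evaluated when the plot is created but when it is drawn. By the time grid.arrange draws the first plot, TAB_tmp2 holds the last subset (10 rows), which no longer matches the 3 rows stored with that plot, hence the error. This is also why printing each plot to a PNG inside the loop works. So instead of: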
ListeGRAPH[[x]] <- ggplot(data = TAB_tmp2) +
geom_line(aes(x = TAB_tmp2$ID_mesuresParType, y = TAB_tmp2$Mesures))
you should use:
ListeGRAPH[[x]] <- ggplot(data = TAB_tmp2) +
geom_line(aes(x = ID_mesuresParType, y = Mesures))
and then you will be able to plot the results using grid.arrange.
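As an alternative to the for loop, the list can be built with `split()` and `lapply()`, so that each plot is created from its own subset and there is no shared temporary variable. This is a sketch under the assumption that one plot per value of Types_Mesures is wanted; the use of `ave()` to number the measurements and the `ggtitle()` text are additions for illustration.

```r
library(ggplot2)
library(gridExtra)

# Same example data as in the question
TAB <- data.frame(Types_Mesures = c(rep(1, 3), rep(2, 5), rep(3, 10)))
# Number the measurements within each type (1, 2, 3, ... per group)
TAB$ID_mesuresParType <- ave(TAB$Types_Mesures, TAB$Types_Mesures, FUN = seq_along)
TAB$Mesures <- log(seq_len(nrow(TAB)))

# One data frame per type, then one plot per data frame
ListeGRAPH <- lapply(split(TAB, TAB$Types_Mesures), function(d) {
  ggplot(d, aes(x = ID_mesuresParType, y = Mesures)) +
    geom_line() +
    ggtitle(paste("Type", d$Types_Mesures[1]))
})

gridExtra::grid.arrange(grobs = ListeGRAPH)
```

The number of plots follows the number of distinct types automatically, so nothing needs to change when the real data has more or fewer of them.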

Extracting the exact coordinates of a mouse click in an interactive plot

In short: I'm looking for a way to get the exact coordinates of a series of mouse positions (on-clicks) in an interactive x/y scatter plot rendered by ggplot2 and ggplotly.
I'm aware that plotly (and several other interactive plotting packages for R) can be combined with Shiny, where a box or lasso select can return a list of all data points within the selected subspace. This list will be HUGE in most of the datasets I'm analysing, however, and I need to be able to do the analysis reproducibly in an R Markdown format (writing out a few point coordinates, mostly fewer than 5-6, is much more readable). Furthermore, I have to know the exact positions of the clicks to be able to extract points within the same polygon in a different dataset, so a list of points within the selection in one dataset is not useful.
The grid.locator() function from the grid package does almost what I'm looking for (it is the one wrapped in, e.g., gglocator). However, I hope there is a way to do the same within an interactive plot rendered by plotly (or maybe something else that I don't know of?), as the datasets are often HUGE (see the plot below), so being able to zoom in and out interactively would be very much appreciated over several iterations of analysis.
Normally I have to rescale the axes several times to simulate zooming in and out which is exhausting when doing it MANY times. As you can see in the plot above, there is a LOT of information in the plots to explore (the plot is about 300MB in memory).
Below is a small reprex of how I'm currently doing it using grid.locator on a static plot:
library(ggplot2)
library(grid)
library(magrittr) # provides %>%
p <- ggplot(mtcars, aes(wt, mpg)) +
geom_point()
locator <- function(p) {
# Build ggplot object
ggobj <- ggplot_build(p)
# Extract coordinates
xr <- ggobj$layout$panel_ranges[[1]]$x.range
yr <- ggobj$layout$panel_ranges[[1]]$y.range
# Variable for selected points
selection <- data.frame(x = as.numeric(), y = as.numeric())
colnames(selection) <- c(ggobj$plot$mapping$x, ggobj$plot$mapping$y)
# Detect and move to plot area viewport
suppressWarnings(print(ggobj$plot))
panels <- unlist(current.vpTree()) %>%
grep("panel", ., fixed = TRUE, value = TRUE)
p_n <- length(panels)
seekViewport(panels, recording=TRUE)
pushViewport(viewport(width=1, height=1))
# Select point, plot, store and repeat
for (i in 1:10){
tmp <- grid.locator('native')
if (is.null(tmp)) break
grid.points(tmp$x,tmp$y, pch = 16, gp=gpar(cex=0.5, col="darkred"))
selection[i, ] <- as.numeric(tmp)
}
grid.polygon(x= unit(selection[,1], "native"), y= unit(selection[,2], "native"), gp=gpar(fill=NA))
#return a data frame with the coordinates of the selection
return(selection)
}
locator(p)
and from here use the point.in.polygon function to subset the data based on the selection.
A possible solution could be to add, say 100x100, invisible points to the plot and then use the plotly_click feature of event_data() in a Shiny app, but this is not at all ideal.
Thanks in advance for your ideas or solutions, I hope my question was clear enough.
-- Kasper
I used ggplot2. Besides the materials at https://shiny.rstudio.com/articles/plot-interaction.html, I'd like to mention the following:
Firstly, when you create the plot, don't use "print( )" within "renderPlot( )", or the coordinates would be wrong. For instance, if you have the following in UI:
plotOutput("myplot", click = "myclick")
The following in the Server would work:
output$myplot <- renderPlot({
p = ggplot(data = mtcars, aes(x=mpg, y=hp)) + geom_point()
p
})
But the clicking coordinates would be wrong if you do:
output$myplot <- renderPlot({
p = ggplot(data = mtcars, aes(x=mpg, y=hp)) + geom_point()
print(p)
})
Then, you could store the coordinates by adding to the Server:
mydata = reactiveValues(x_values = c(), y_values = c())
observeEvent(input$myclick, {
mydata$x_values = c(mydata$x_values, input$myclick$x)
mydata$y_values = c(mydata$y_values, input$myclick$y)
})
In addition to X-Y coordinates, when you use facet with ggplot2, you refer to the clicked facet panel by
input$myclick$panelvar1
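Put together, the fragments above could form a complete app along these lines. This is a minimal sketch: the `clicks` table output and the `req()` guard are additions for illustration, and the app is only built here, not launched.

```r
library(shiny)
library(ggplot2)

ui <- fluidPage(
  plotOutput("myplot", click = "myclick"),
  tableOutput("clicks")   # hypothetical output showing the stored clicks
)

server <- function(input, output, session) {
  # Accumulates the data-space coordinates of every click
  mydata <- reactiveValues(x_values = c(), y_values = c())

  observeEvent(input$myclick, {
    mydata$x_values <- c(mydata$x_values, input$myclick$x)
    mydata$y_values <- c(mydata$y_values, input$myclick$y)
  })

  output$myplot <- renderPlot({
    # Return the plot object; do not wrap it in print()
    ggplot(data = mtcars, aes(x = mpg, y = hp)) + geom_point()
  })

  output$clicks <- renderTable({
    req(length(mydata$x_values) > 0)   # nothing to show before the first click
    data.frame(x = mydata$x_values, y = mydata$y_values)
  })
}

app <- shinyApp(ui, server)
# To launch it from an interactive session: runApp(app)
```

The resulting x/y columns are the clicked positions in the plot's own units, so they can be copied into an R Markdown document and reused as polygon vertices on another dataset.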

Creating Hexbins with Dates in R hexbin()

I am trying to create hexbins where the x-axis is a date, using the hexbin function in the hexbin package in R. When I feed in my data, it seems to convert the dates to numeric values, which get displayed on the x-axis. I want to force the x-axis to show dates.
#Create Hex Bins
hbin <- hexbin(xData$Date, xData$YAxis, xbins = 80)
#Plot using rBokeh
figure() %>%
ly_hexbin(hbin)
This gives me:
Here's a brute force approach using the underlying grid plotting package. The axes are ugly; maybe someone with better grid skills than I could pretty them up.
library(hexbin)
library(grid)
# make some data
x = seq.Date(as.Date("2015-01-01"),as.Date("2015-12-31"),by='days')
y = sample(x)
# make the plot and capture the plot
p <- plot(hexbin(x,y),yaxt='n',xaxt='n')
# calculate the ticks: numeric positions, plus Date copies for the labels
x_ticks_date <-
  x_ticks <- axTicks(1, log = FALSE, usr = as.numeric(range(x)),
                     axp=c(as.numeric(range(x)) ,5))
class(x_ticks_date) <- 'Date'
y_ticks_date <-
  y_ticks <- axTicks(2, log = FALSE, usr = as.numeric(range(y)),
                     axp=c(as.numeric(range(y)) ,5))
class(y_ticks_date) <- 'Date'
# push the ticks to the view port.
pushViewport(p$plot.vp@hexVp.off)
grid.xaxis(at=x_ticks, label = format(x_ticks_date))
grid.yaxis(at=y_ticks, label = format(y_ticks_date))
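The tick calculation works because R stores a Date as the number of days since 1970-01-01, which is also why the dates show up as plain numbers on the axis in the first place. A short sketch of the round trip (the variable names are illustrative):

```r
d <- as.Date(c("2015-01-01", "2015-12-31"))

# A Date is a day count since 1970-01-01 underneath
num <- as.numeric(d)
num[1]      # 16436
diff(num)   # 364

# Converting the numbers back gives the original dates,
# which can then be formatted for use as axis labels
back <- as.Date(num, origin = "1970-01-01")
format(back, "%Y-%m-%d")   # "2015-01-01" "2015-12-31"
```

Setting `class(x) <- 'Date'` on the numeric tick positions, as the answer does, is a shortcut for the same conversion.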

How to annotate subplots with ggplot from rpy2?

I'm using Rpy2 to plot dataframes with ggplot2. I make the following plot:
p = ggplot2.ggplot(iris) + \
ggplot2.geom_point(ggplot2.aes_string(x="Sepal.Length", y="Sepal.Width")) + \
ggplot2.facet_wrap(Formula("~Species"))
p.plot()
r["dev.off"]()
I'd like to annotate each subplot with some statistics about the plot. For example, I'd like to compute the correlation between each x/y subplot and place it on the top right corner of the plot. How can this be done? Ideally I'd like to convert the dataframe from R to a Python object, compute the correlations and then project them onto the scatters. The following conversion does not work, but this is how I'm trying to do it:
# This does not work
#iris_df = pandas.DataFrame({"Sepal.Length": rpy2.robjects.default_ri2py(iris.rx("Sepal.Length")),
# "Sepal.Width": rpy2.robjects.default_ri2py(iris.rx("Sepal.Width")),
# "Species": rpy2.robjects.default_ri2py(iris.rx("Species"))})
# So we access iris using R to compute the correlation
x = iris_py.rx("Sepal.Length")
y = iris_py.rx("Sepal.Width")
# compute r.cor(x, y) and divide up by Species
# Assume we get a vector of length Species saying what the
# correlation is for each Species' Petal Length/Width
p = ggplot2.ggplot(iris) + \
ggplot2.geom_point(ggplot2.aes_string(x="Sepal.Length", y="Sepal.Width")) + \
ggplot2.facet_wrap(Formula("~Species")) + \
# ...
# How to project correlation?
p.plot()
r["dev.off"]()
But assuming I could actually access the R dataframe from Python, how could I plot these correlations? thanks.
The solution is to create a dataframe with one label for each subplot. The dataframe's faceting column should have the same name as the corresponding column in the dataframe with the original data. Then this can be plotted with:
p += ggplot2.geom_text(data=labels_df, mapping=ggplot2.aes_string(x="1", y="1", label="labels"))
where labels_df is the dataframe containing the labels and labels is the name of the column of labels_df that holds the labels to be plotted. (1,1) in this case is the coordinate position of the label in each subplot.
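For reference, the same idea written directly in R (not through rpy2) might look like the sketch below. It computes one correlation per Species and places it in the top-right corner of each panel. The `sprintf` label format and the use of `Inf` for corner placement are choices made for this illustration, not something the answer specifies.

```r
library(ggplot2)

# One row per facet: the faceting column (Species) plus the label text
labels_df <- do.call(rbind, lapply(split(iris, iris$Species), function(d) {
  data.frame(Species = d$Species[1],
             labels  = sprintf("r = %.2f", cor(d$Sepal.Length, d$Sepal.Width)))
}))

p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  facet_wrap(~Species) +
  # x = Inf, y = Inf pins the text to the top-right corner of each panel
  geom_text(data = labels_df,
            aes(x = Inf, y = Inf, label = labels),
            hjust = 1.1, vjust = 1.5)
print(p)
```

Because labels_df has a Species column, ggplot2 can route each label to the matching panel; without it the same label would be drawn in every panel.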
I found that @user248237dfsf's answer didn't work for me. ggplot got confused between the data frame I was plotting and the data frame I was using for labels.
Instead, I used
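import rpy2.robjects as robjects
import rpy2.robjects.lib.ggplot2 as ggplot2  # importing this loads the R package

# Environment of the attached ggplot2 package, used to look up R functions
ggplot2_env = robjects.baseenv['as.environment']('package:ggplot2')

class GBaseObject(robjects.RObject):
    @classmethod
    def new(*args, **kwargs):
        args_list = list(args)
        cls = args_list.pop(0)
        res = cls(cls._constructor(*args_list, **kwargs))
        return res

class Annotate(GBaseObject):
    _constructor = ggplot2_env['annotate']

annotate = Annotate.new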
Now, I have something that works just like the standard annotate.
annotate(geom = "text", x = 1, y = 1, label = "MPC")
One minor comment: I don't know if this will work with faceting.
