I am using the R package circlize to create a circos plot.
I am aiming to create something similar to Figure 2 in this paper: https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1004812.
I would like to shade specific parts of the chromosomes with manually chosen colours, but I am struggling.
Reproducible code:
### load packages
library("tidyverse")
library("circlize")
### Generate mock data
# Chromosome sizes - genome with 5 chromosomes size 1-5kb
chrom <- c(1,2,3,4,5)
start <- c(0,0,0,0,0)
end <- c(1000,1700,2200,3100,5000)
chr_sizes_df <- data.frame(chrom,start,end)
# Areas of interest - where I want 'shade_col' shading
chrom_num <- c(1,1,2,2,3,3,3,4,4,5,5,5)
chr <- c("chr1","chr1","chr2","chr2","chr3","chr3","chr3","chr4","chr4","chr5","chr5","chr5")
start <- c(0,900,0,1550,0,800,2000,0,2800,0,3000,4800)
end <- c(150,1000,185,1700,210,1000,2200,300,3100,400,3300,5000)
chr_regions_df <- data.frame(chr,start,end)
# Recombinations - to be depicted with lines connecting chromosomes
chr1 <- c(1,2,2,3,3,3,3,4,4,5,5,5,5)
chr1_pos <- c(100,150,170,20,2100,900,950,200,3000,100,3100,3300,4900)
chr2 <- c(1,4,2,1,3,3,5,5,4,3,5,4,2)
chr2_pos <- c(100,3000,170,100,100,900,3200,4800, 3050,10,3100,3300,40)
location <- c("Non coding", "Coding", "Non coding", "Non coding", "Coding", "Coding", "Coding", "Non coding", "Non coding", "Non coding", "Coding", "Coding", "Non coding")
sv_df <- data.frame(chr1,chr1_pos,chr2,chr2_pos,location)
# SNPs - to be depicted with dots or lines
chrom <- c(1,1,2,2,2,3,3,3,3,4,4,4,4,4,5,5,5,5,5,5)
pos <- c(350,600,200,650,700,300,1100,1500,2000,400,1500,1800,2000,2700,200,1000,1050,2000,2500,4950)
snp_df <- data.frame(chrom,pos)
### Prepare for plot
# Generate colour scheme
sv_df$location_col <- ifelse(sv_df$location=="Coding", "#FB8072",
ifelse(sv_df$location=="Non coding", "#80B1D3",
"#e9e9e9")
)
# Specify chromosome block shading
shade_col <- "#3F75AB"
# Format rearrangement data
nuc1 <- sv_df %>% select(chr1,chr1_pos) # Start positions
nuc2 <- sv_df %>% select(chr2,chr2_pos) # End positions
### Generating plot
## Basic circos graphic parameters
circos.clear()
circos.par(cell.padding=c(0,0,0,0),
track.margin=c(0,0.05),
start.degree = 90,
gap.degree = 3,
clock.wise = TRUE)
## Sector details
circos.initialize(factors = chr_sizes_df$chrom,
xlim = cbind(chr_sizes_df$start, chr_sizes_df$end))
## Generate basic outline with chromosomes
circos.track(ylim=c(0, 1), panel.fun=function(x, y) {
chr=CELL_META$sector.index
xlim=CELL_META$xlim
ylim=CELL_META$ylim
circos.text(mean(xlim), mean(ylim), chr)
},bg.col="#cde3f9", bg.border=TRUE, track.height=0.1)
## Add recombinations - coloured by coding vs non-coding etc
circos.genomicLink(nuc1, nuc2,
col=sv_df$location_col,
h.ratio=0.6,
lwd=3)
The above code produces the plot shown below:
I want to use chr_regions_df to specify the chromosome areas to shade with shade_col. I have tried a few things. draw.sector doesn't work well because it needs angles rather than genomic positions, and the angles are hard to work out. There are cytoband options via circos.initializeWithIdeogram(), but these seem to rely on pre-specified cytoband formats for certain species rather than custom shading areas as in my use case (which is also why I couldn't use the answer to "supplying user defined color in r circlize package").
Many thanks for your help.
To draw custom colored areas within chromosomes, use circos.genomicTrackPlotRegion, where you need to provide a bed-like data frame with an additional column specifying the color to be used for each area.
#the first column should match the chromosome names used in 'circos.initialize'
chrom_num <- c(1,1,2,2,3,3,3,4,4,5,5,5)
#chr <- c("chr1","chr1","chr2","chr2","chr3","chr3","chr3","chr4","chr4","chr5","chr5","chr5")
start <- c(0,900,0,1550,0,800,2000,0,2800,0,3000,4800)
end <- c(150,1000,185,1700,210,1000,2200,300,3100,400,3300,5000)
shade_col <- c("blue","red","blue","red","blue","red","blue","red","blue","red","blue","red")
chr_regions_df <- data.frame(chrom_num,start,end,shade_col)
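If the regions already exist with "chr"-prefixed names, as in the question, the first column can be derived rather than retyped. This is a minimal base R sketch, assuming the sectors were initialised with the bare numbers 1 to 5 and that a single shading colour is wanted. The name regions_bed is illustrative only and does not appear in the original code.

```r
# Regions as given in the question, with "chr"-prefixed names
chr   <- c("chr1","chr1","chr2","chr2","chr3","chr3","chr3",
           "chr4","chr4","chr5","chr5","chr5")
start <- c(0,900,0,1550,0,800,2000,0,2800,0,3000,4800)
end   <- c(150,1000,185,1700,210,1000,2200,300,3100,400,3300,5000)

# Strip the prefix so the names match the sector names (1 to 5)
chrom_num <- as.integer(sub("^chr", "", chr))

# A single colour is recycled to every row by data.frame()
regions_bed <- data.frame(chrom_num, start, end,
                          shade_col = "#3F75AB",
                          stringsAsFactors = FALSE)
head(regions_bed)
```

A data frame built this way has the same four-column layout as chr_regions_df above, so it could be passed to the plotting call in its place.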
After running circos.initialize, draw the chromosomes with their shaded area. In panel.fun, the first argument (region) contains the coordinates of each feature while the second (value) contains all but the first 3 columns of the data frame.
circos.genomicTrackPlotRegion(chr_regions_df, ylim = c(0, 1),
panel.fun = function(region, value, ...) {
col = value$shade_col
circos.genomicRect(region, value,
ybottom = 0, ytop = 1,
col = col, border = NA)
xlim = get.cell.meta.data("xlim")
circos.rect(xlim[1], 0, xlim[2], 1, border = "black")
ylim = get.cell.meta.data("ylim")
chr = get.current.sector.index()
circos.text(mean(xlim), mean(ylim), chr)
}, bg.col = "#cde3f9", bg.border=TRUE, track.height=0.1)
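To make the region/value split described above more concrete, here is a base R imitation of what panel.fun receives for one sector. It is only a sketch of the idea, not circlize's internal code, and split_for_sector is a hypothetical helper written for this illustration.

```r
# A small bed-like data frame: chromosome, start, end, then extra columns
chr_regions_df <- data.frame(
  chrom_num = c(1, 1, 2),
  start     = c(0, 900, 0),
  end       = c(150, 1000, 185),
  shade_col = c("blue", "red", "blue"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: mimic, for a single sector, the two arguments
# that panel.fun is given
split_for_sector <- function(bed, sector) {
  rows <- bed[[1]] == sector
  list(
    region = bed[rows, 2:3, drop = FALSE],     # start and end columns
    value  = bed[rows, -(1:3), drop = FALSE]   # everything after column 3
  )
}

parts <- split_for_sector(chr_regions_df, 1)
parts$region            # coordinates of the two regions on chromosome 1
parts$value$shade_col   # the colours that belong to those regions
```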
I'm doing some basic statistics in R and I'm trying to use a different color for each iteration of the loop. All the data points for i=1 should have the same color, all the data points for i=2 should have the same color, and so on. Ideally the colors would range from yellow to blue as i varies, for example. (I already tried colorRamp etc., but I didn't manage to get it working.)
Thanks for your help.
library(ggplot2)
#dput(thedata[,2])
#c(1.28994585412464, 1.1317747077577, 1.28029504741834, 1.41172820353708,
#1.13172920065253, 1.40276516298315, 1.43679599499374, 1.90618019359643,
#2.33626745030772, 1.98362330686504, 2.22606615548188, 2.40238822720322)
#dput(thedata[,4])
#c(NA, -1.7394747097211, 2.93081902519318, -0.33212717268786,
#-1.78796119503752, -0.5080871442002, -0.10110379236627, 0.18977632798691,
#1.7514277696687, 1.50275797771879, -0.74632159611221, 0.0978774103243802)
#OR
#dput(thedata[,c(2,4)])
#structure(list(LRUN74TTFRA156N = c(1.28994585412464, 1.1317747077577,
#1.28029504741834, 1.41172820353708, 1.13172920065253, 1.40276516298315,
#1.43679599499374, 1.90618019359643, 2.33626745030772, 1.98362330686504,
#2.22606615548188, 2.40238822720322), SELF = c(NA, -1.7394747097211,
#2.93081902519318, -0.33212717268786, -1.78796119503752, -0.5080871442002,
#-0.10110379236627, 0.18977632798691, 1.7514277696687, 1.50275797771879,
#-0.74632159611221, 0.0978774103243802)), row.names = c(NA, 12L
#), class = "data.frame")
x1=1
xn=x1+3
plot(0,0,col="white",xlim=c(0,12),ylim=c(-5,7.5))
for(i in 1:3){
y=thedata[x1:xn,4]
x=thedata[x1:xn,2]
reg<-lm(y~x)
points(x,y,col=colors()[i])
abline(reg,col=colors()[i])
x1=x1+4
xn=x1+3
}
The basic idea of colorRamp and colorRampPalette is that they are function factories: they are functions that return functions.
From the help page:
colorRampPalette returns a function that takes an integer argument (the required number of colors) and returns a character vector of colors (see rgb) interpolating the given sequence (similar to heat.colors or terrain.colors).
So, we'll get a yellow-to-blue palette function from colorRampPalette, and then we'll give it the number of colors we want along that ramp to actually get the colors:
# create the palette function
my_palette = colorRampPalette(colors = c("yellow", "blue"))
# test it out, see how it works
my_palette(3)
# [1] "#FFFF00" "#7F7F7F" "#0000FF"
my_palette(5)
# [1] "#FFFF00" "#BFBF3F" "#7F7F7F" "#3F3FBF" "#0000FF"
# Now on with our plot
x1 = 1
xn = x1 + 3
# Set the number of iterations (number of colors needed) as a variable:
nn = 3
# Get the colors from our palette function
my_cols = my_palette(nn)
# type = 'n' means nothing will be plotted, no points, no lines
plot(0, 0, type = 'n',
xlim = c(0, 12),
ylim = c(-5, 7.5))
# plot
for (i in 1:nn) {
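# thedata is assumed here to be the two-column data frame from
# dput(thedata[,c(2,4)]), so x is column 1 and y is column 2
y = thedata[x1:xn, 2]
x = thedata[x1:xn, 1]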
reg <- lm(y ~ x)
# use the ith color
points(x, y, col = my_cols[i])
abline(reg, col = my_cols[i])
x1 = x1 + 4
xn = x1 + 3
}
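As a variation on the loop above, the x1/xn bookkeeping can be replaced by a grouping vector, so that the points get their colours in one vectorised call. This is a sketch under the assumption that the data are the two-column frame from the question's dput() output, renamed to x and y here for readability, with four consecutive rows per group.

```r
# Two-column data from the question's dput() output (x first, y second)
thedata <- data.frame(
  x = c(1.28994585412464, 1.1317747077577, 1.28029504741834, 1.41172820353708,
        1.13172920065253, 1.40276516298315, 1.43679599499374, 1.90618019359643,
        2.33626745030772, 1.98362330686504, 2.22606615548188, 2.40238822720322),
  y = c(NA, -1.7394747097211, 2.93081902519318, -0.33212717268786,
        -1.78796119503752, -0.5080871442002, -0.10110379236627, 0.18977632798691,
        1.7514277696687, 1.50275797771879, -0.74632159611221, 0.0978774103243802)
)

nn      <- 3
grp     <- rep(seq_len(nn), each = 4)   # 1 1 1 1 2 2 2 2 3 3 3 3
my_cols <- colorRampPalette(c("yellow", "blue"))(nn)

# Indexing the palette by group colours all points at once
plot(thedata$x, thedata$y, col = my_cols[grp], pch = 19,
     xlab = "x", ylab = "y")

# One regression line per group, in the matching colour
for (i in seq_len(nn)) {
  reg <- lm(y ~ x, data = thedata[grp == i, ])
  abline(reg, col = my_cols[i])
}
```

The row with a missing y value is simply skipped by plot() and dropped by lm(), so the first group is fitted on three points.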
You can play with just visualizing the palette---try out the following code for different n values. You can also try out different options, maybe different starting colors. I like the results better with the space = "Lab" argument for the palette.
n = 10
my_palette = colorRampPalette(colors = c("yellow", "blue"), space = "Lab")
n_palette = my_palette(n)
plot(1:n, rep(1, n), col = n_palette, pch = 15, cex = 4)
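To make the "functions that return functions" point concrete, the short sketch below compares the two. colorRamp returns a function of positions in [0, 1] that yields RGB values, while colorRampPalette returns a function of a color count that yields hex strings. The variable names ramp and pal are illustrative.

```r
ramp <- colorRamp(c("yellow", "blue"))          # maps values in [0, 1] to RGB
pal  <- colorRampPalette(c("yellow", "blue"))   # maps a count to hex colors

# Both results are themselves functions
is.function(ramp)
is.function(pal)

# The ramp gives one RGB row per input value; the end points are the
# two colors we started from
end_points <- ramp(c(0, 1))
end_points

# Converting those rows to hex matches what the palette gives for n = 2
rgb(end_points, maxColorValue = 255)
pal(2)
```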
Besides lacking a reproducible example, your question seems to rest on some misconceptions.
First, the function colors doesn't take a numeric argument, see ?colors. So if you want to fetch a different color in each iteration, you need to call it like colors()[i]. The code should look something similar to this (in absence of a reproducible example):
for (i in 20:30){
plot(1:10, 1:10, col = colors()[i])
}
Please also bear in mind that using x1 and xn in the first two lines inside the for loop before they have been defined will cause an error too.
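As a quick check of how colors() behaves, the sketch below stores its result and indexes it. The names all_cols and picked are illustrative only.

```r
# colors() takes no index; it returns a character vector of color names
all_cols <- colors()
is.character(all_cols)
head(all_cols, 3)

# Index the vector to pick colors, here elements 20 to 30
picked <- all_cols[20:30]

# Show the picked colors side by side
plot(seq_along(picked), rep(1, length(picked)),
     col = picked, pch = 15, cex = 3,
     xlab = "index", ylab = "")
```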
I've taken this code from the site below to make a correlation matrix heatmap. How do I format the numbers in the heatmap so that they show only 2 decimal places?
http://blog.revolutionanalytics.com/2014/08/quantitative-finance-applications-in-r-8.html
library(xts)
library(Quandl)
my_start_date <- "1998-01-05"
SP500.Q <- Quandl("YAHOO/INDEX_GSPC", start_date = my_start_date, type = "xts")
RUSS2000.Q <- Quandl("YAHOO/INDEX_RUT", start_date = my_start_date, type = "xts")
NIKKEI.Q <- Quandl("NIKKEI/INDEX", start_date = my_start_date, type = "xts")
HANG_SENG.Q <- Quandl("YAHOO/INDEX_HSI", start_date = my_start_date, type = "xts")
DAX.Q <- Quandl("YAHOO/INDEX_GDAXI", start_date = my_start_date, type = "xts")
CAC.Q <- Quandl("YAHOO/INDEX_FCHI", start_date = my_start_date, type = "xts")
KOSPI.Q <- Quandl("YAHOO/INDEX_KS11", start_date = my_start_date, type = "xts")
# Depending on the index, the final price for each day is either
# "Adjusted Close" or "Close Price". Extract this single column for each:
SP500 <- SP500.Q[,"Adjusted Close"]
RUSS2000 <- RUSS2000.Q[,"Adjusted Close"]
DAX <- DAX.Q[,"Adjusted Close"]
CAC <- CAC.Q[,"Adjusted Close"]
KOSPI <- KOSPI.Q[,"Adjusted Close"]
NIKKEI <- NIKKEI.Q[,"Close Price"]
HANG_SENG <- HANG_SENG.Q[,"Adjusted Close"]
# The xts merge(.) function will only accept two series at a time.
# We can, however, merge multiple columns by downcasting to *zoo* objects.
# Remark: "all = FALSE" uses an inner join to merge the data.
z <- merge(as.zoo(SP500), as.zoo(RUSS2000), as.zoo(DAX), as.zoo(CAC),
as.zoo(KOSPI), as.zoo(NIKKEI), as.zoo(HANG_SENG), all = FALSE)
# Set the column names; these will be used in the heat maps:
myColnames <- c("SP500","RUSS2000","DAX","CAC","KOSPI","NIKKEI","HANG_SENG")
colnames(z) <- myColnames
# Cast back to an xts object:
mktPrices <- as.xts(z)
# Next, calculate log returns:
mktRtns <- diff(log(mktPrices), lag = 1)
head(mktRtns)
mktRtns <- mktRtns[-1, ] # Remove resulting NA in the 1st row
require(gplots)
generate_heat_map <- function(correlationMatrix, title)
{
heatmap.2(x = correlationMatrix, # the correlation matrix input
cellnote = correlationMatrix, # places correlation value in each cell
main = title, # heat map title
symm = TRUE, # configure diagram as standard correlation matrix
dendrogram="none", # do not draw a row dendrogram
Rowv = FALSE, # keep ordering consistent
trace="none", # turns off trace lines inside the heat map
density.info="none", # turns off density plot inside color legend
notecol="black") # set font color of cell labels to black
}
corr1 <- cor(mktRtns) * 100
generate_heat_map(corr1, "Correlations of World Market Returns, Jan 1998 - Present")
You might want the cell colors to be based on the full unrounded numbers while the labels show rounded numbers.
In that case, round only the cellnote argument:
generate_heat_map <- function(correlationMatrix, title)
{
heatmap.2(x = correlationMatrix, # the correlation matrix input
cellnote = round(correlationMatrix, 2), # places correlation value in each cell
main = title, # heat map title
symm = TRUE, # configure diagram as standard correlation matrix
dendrogram="none", # do not draw a row dendrogram
Rowv = FALSE, # keep ordering consistent
trace="none", # turns off trace lines inside the heat map
density.info="none", # turns off density plot inside color legend
notecol="black") # set font color of cell labels to black
}
If you want the colors to match the numbers shown exactly, leave the existing function alone and round the input instead:
corr1 <- round(cor(mktRtns) * 100, 2)
generate_heat_map(corr1, "Correlations of World Market Returns, Jan 1998 - Present")
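The difference between the two options can be checked on a small matrix without downloading any market data. The sketch below uses the built-in mtcars data as a stand-in for mktRtns, which is an assumption made purely for illustration. It also shows sprintf() as one way to keep trailing zeros in the labels, on the understanding that cellnote accepts a character matrix.

```r
# Stand-in correlation matrix, scaled to percent like corr1
corr <- cor(mtcars[, c("mpg", "hp", "wt")]) * 100

# Option 1: color from the full values, label with rounded values
note_num <- round(corr, 2)

# Option 2 would pass note_num as the heat map input as well,
# so that colors and labels come from identical numbers

# Character labels with exactly two decimals (keeps trailing zeros)
note_chr <- matrix(sprintf("%.2f", corr),
                   nrow = nrow(corr), dimnames = dimnames(corr))

note_num
note_chr
```

Rounding changes each value by at most 0.005, so the colors from the two options are unlikely to differ visibly.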
This is an advanced question.
I use my own layout for the chartSeries quantmod function, and I can even create my own newTA. Everything works fine. But ...
What I want to do but can't:
a) Manipulate the legend of each of the 3 charts:
- move it to another corner (from "topleft" to "topright")
- change the content
- remove it completely if needed ...
b) My indicator generates 2 legends:
value1
value2
Same as above: how can I modify them? How can I delete them?
c) Control the position and range of the y-axis (place it on the left or right, or even remove it). The same applies when there is a secondary axis on the graph.
d) Modify the main legend (the one in the top right, where the range of dates is written).
A working sample code:
# Load Library
library(quantmod)
# Get Data
getSymbols("SPY", src="yahoo", from = "2010-01-01")
# Create my indicator (30 values)
value1 <- rnorm(30, mean = 50, sd = 25)
value2 <- rnorm(30, mean = 50, sd = 25)
# merge with the first 30 rows of SPY
dataset <- merge(first(SPY, n = 30),
value1,
value2)
# **** data has now 8 columns:
# - Open
# - High
# - Low
# - Close
# - Volume
# - Adjusted
# - a (my indicator value 1)
# - b (my indicator value 2)
#
# create my TA function - This could also be achieved using the preFUN option of newTA
myTAfun <- function(a){
# input: a: function will receive whole dataset
a[,7:8] # just return my indicator values
}
# create my indicator to add to chartSeries
newMyTA <- newTA(FUN = myTAfun, # chartSeries will pass whole dataset,
# I just want to process the last 2 columns
lty = c("solid", "dotted"),
legend.name = "My_TA",
col = c("red", "blue")
)
# define my layout
layout(matrix(c(1, 2, 3), 3, 1),
heights = c(2.5, 1, 1.5)
)
# create the chart
chartSeries(dataset,
type = "candlesticks",
main = "",
show.grid = FALSE,
name = "My_Indicator_Name",
layout = NULL, # bypass internal layout
up.col = "blue",
dn.col = "red",
TA = c(newMyTA(),
addVo()
),
plot = TRUE,
theme = chartTheme("wsj")
)
I have tried using legend command, and also the option legend.name (with very limited control of the output).
I have had a look at the chob object returned by chartSeries, but I can't figure out what to do next ...
Image below:
After some time learning a little bit more about R internals, S3 and S4 objects, and quantmod package, I've come up with the solution. It can be used to change anything in the graph.
A) If the legend belongs to a secondary indicator window:
Do not plot the chartSeries (set the option plot = FALSE) and keep the returned "chob" object.
In one of the slots of the "chob" object there is a "chobTA" object with 2 params related to the legend. Set them to NULL.
Finally, call the hidden function chartSeries.chob.
In my case:
#get the chob object
my.chob <- chartSeries(dataset,
type = "candlesticks",
main = "",
show.grid = FALSE,
name = "My_Indicator_Name",
layout = NULL, # bypass internal layout
up.col = "blue",
dn.col = "red",
TA = c(newMyTA(),
addVo()
),
plot = FALSE, # do not plot, just get the chob
#plot = TRUE,
theme = chartTheme("wsj")
)
#if the legend is in a secondary window, and represents
#an indicator created with newTA(), this will work
#(slots of S4 objects are accessed with @):
my.chob@passed.args$TA[[1]]@params$legend <- NULL
my.chob@passed.args$TA[[1]]@params$legend.name <- NULL
quantmod:::chartSeries.chob(my.chob)
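The slot access pattern used above can be tried without quantmod. The sketch below defines two toy S4 classes, toyChob and toyTA, which are hypothetical stand-ins invented for this illustration and are not quantmod's real class definitions. It shows how an element of a list held in a nested slot can be removed by assigning NULL.

```r
library(methods)

# Toy stand-ins (hypothetical): an indicator object holding a params list,
# and a chart object holding the arguments it was called with
setClass("toyTA",   representation(params = "list"))
setClass("toyChob", representation(passed.args = "list"))

ta <- new("toyTA",
          params = list(legend = "auto", legend.name = "My_TA", col = "red"))
chob <- new("toyChob", passed.args = list(TA = list(ta)))

# Before: three params, two of them related to the legend
names(chob@passed.args$TA[[1]]@params)

# Assigning NULL to a list element removes it, even through nested slots
chob@passed.args$TA[[1]]@params$legend      <- NULL
chob@passed.args$TA[[1]]@params$legend.name <- NULL

# After: only the non-legend param is left
names(chob@passed.args$TA[[1]]@params)
```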
B) In any other case, it is possible to modify "chartSeries.chob", "chartTA", "chartBBands", etc and then call chartSeries.chob
In my case:
fixInNamespace("chartSeries.chob", ns = "quantmod")
quantmod:::chartSeries.chob(my.chob)
In the editor that opens, it is enough to comment out the lines related to legend() by adding "#" at the beginning of each one.
That's it.