R - Add trace in Plotly if condition is met

I am creating a scatter plot and I would like the points to be connected by a trace when a condition is met. My data is split into sequences: if X and Y coordinates belong to the same sequence, I would like them connected by a trace. My sample data and code snippet are below.
Sample Data:
X Y Seq
1 3 1
2 5 1
1 4 1
3 1 2
4 5 2
6 3 3
3 4 3
In this example I would like points (1, 3), (2, 5), (1, 4) traced, points (3, 1), (4, 5) traced, and points (6, 3), (3, 4) traced. There should be a break in the trace if a new sequence starts.
Code:
plot_ly (data, x = data$X , y = data$Y,
type = "scatter",
mode="markers")%>%
add_trace(data$Seq==shift(data$Seq, type="lag"), mode="lines")
Here is an image of the plot that my actual data is giving me. You can see the points are being plotted but there is no break.

The problem lies in your use of add_trace. Its first argument, p, expects an existing plot or trace, and because you're piping with %>%, that argument is already filled by the plot. What I assume is meant as a subset of your data therefore ends up as an extra unnamed argument and is not used to subset anything: the new trace inherits the original data and draws one line through every point.
Note that the code below gives the same plot even though the variable NO has nothing to do with the plot:
library(plotly)
X <- c(1, 2, 1, 3, 4, 6, 3)
Y <- c(3, 5, 4, 1, 5, 3, 4)
seq <- c(1, 1, 1, 2, 2, 3, 3)
dataX <- data.frame(X, Y, seq)
NO <- "this won't work"
plot_ly(dataX, x = dataX$X, y = dataX$Y,
        type = "scatter",
        mode = "markers") %>%
  add_trace(NO, mode = "lines")
You can stop the new trace from inheriting the original data with inherit = FALSE, but then it still won't work: NO is not a plot or trace (and neither is your subset), so the new trace has no x or y to draw:
plot_ly (dataX, x = dataX$X , y = dataX$Y,
type = "scatter",
mode="markers") %>%
add_trace(NO, mode="lines", inherit=FALSE)
## No trace type specified:
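To see where the extra argument goes, here is a base-R sketch using a hypothetical stand-in, fake_add_trace(), that copies only the shape of add_trace's signature, assumed here to be add_trace(p, ..., data = NULL, inherit = TRUE). Piping x %>% f(y) is equivalent to calling f(x, y), so the plot fills p and the logical condition lands unnamed in `...`:

```r
# Hypothetical stand-in: same argument shape as plotly's add_trace(),
# assumed to be add_trace(p, ..., data = NULL, inherit = TRUE).
fake_add_trace <- function(p, ..., data = NULL, inherit = TRUE) {
  list(p = p, dots = list(...), data = data, inherit = inherit)
}

seq_id <- c(1, 1, 1, 2, 2, 3, 3)
# Lagged comparison, similar in spirit to shift(type = "lag")
is_same_seq <- seq_id == c(NA, head(seq_id, -1))

# p %>% add_trace(is_same_seq, mode = "lines") is equivalent to this call:
res <- fake_add_trace("existing plot", is_same_seq, mode = "lines")

str(res$dots)      # the condition sits unnamed in ..., next to mode
is.null(res$data)  # it was never used as the trace's data
```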
When you add traces, be explicit about x = and y =. You can then let the trace inherit the rest from the previous plot/trace, or specify one. For what you're trying to do, you could build the plot up with a loop:
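# make the plot (markers only)
p <- plot_ly(dataX, x = dataX$X, y = dataX$Y,
             type = "scatter",
             mode = "markers")
# build it up: one line trace per sequence
for (i in unique(dataX$seq)) {
  # subset the data for this sequence
  dataFilt <- dataX[dataX$seq == i, ]
  # add it with explicit x and y; line = list(color = ...) sets a literal
  # colour, whereas color = 'yellow' would be mapped through a palette
  p <- add_trace(p, x = dataFilt$X, y = dataFilt$Y, mode = "lines",
                 line = list(color = "yellow"))
}
p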
This adds a new series for each sequence, so it's a bit of a workaround. If you hide the legend, it looks correct:
p %>%
layout(showlegend = FALSE)
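Another option keeps a single line trace and inserts an NA row after each sequence, assuming plotly's default connectgaps = FALSE so that lines are not drawn across missing values. insert_breaks() is a hypothetical helper, not part of plotly; the sketch below only builds the data, so it runs without plotly installed:

```r
# Hypothetical helper: append an all-NA row after each group so that a
# single line trace is broken between groups.
insert_breaks <- function(df, group) {
  pieces <- split(df, df[[group]])           # one data frame per sequence
  with_gaps <- lapply(pieces, function(piece) {
    gap <- piece[1, , drop = FALSE]
    gap[1, ] <- NA                           # blank row = break in the line
    rbind(piece, gap)
  })
  out <- do.call(rbind, with_gaps)
  rownames(out) <- NULL
  out
}

X <- c(1, 2, 1, 3, 4, 6, 3)
Y <- c(3, 5, 4, 1, 5, 3, 4)
seq_id <- c(1, 1, 1, 2, 2, 3, 3)
dataX <- data.frame(X, Y, seq_id)

broken <- insert_breaks(dataX, "seq_id")
```

You would then plot broken with something like plot_ly(broken, x = ~X, y = ~Y, type = "scatter", mode = "lines+markers"). Base R's lines() also stops at NA, so lines(broken$X, broken$Y) previews the same breaks.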

Related

Percentage stacked barplot in Julia

I would like to create a percentage stacked barplot in Julia. In R we may do the following:
set.seed(7)
data <- matrix(sample(1:30,6), nrow=3)
colnames(data) <- c("A","B")
rownames(data) <- c("V1","V2","V3")
library(RColorBrewer)
cols <- brewer.pal(3, "Pastel1")
df_percentage <- apply(data, 2, function(x){x*100/sum(x,na.rm=T)})
barplot(df_percentage, col=cols, border="white", xlab="group")
Created on 2022-12-29 with reprex v2.0.2
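For reference, the column normalization in the R code above can also be written with base R's prop.table(); with margin = 2 it divides each column by its column total. A sketch, with the caveat that prop.table() has no na.rm option, so it only matches the apply() version when there are no NAs:

```r
set.seed(7)
data <- matrix(sample(1:30, 6), nrow = 3)
colnames(data) <- c("A", "B")
rownames(data) <- c("V1", "V2", "V3")

# Each column (bar) rescaled to percentages of its own total
df_percentage <- prop.table(data, margin = 2) * 100

colSums(df_percentage)  # each bar sums to 100
```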
I can already label the axis in percentages, but I can't make it a stacked percentage plot where each bar sums to 100%, like the one above. Here is some reproducible code:
using StatsPlots
measles = [38556, 24472]
mumps = [20178, 23536]
chickenPox = [37140, 32169]
ticklabel = ["A", "B"]
foo = @. measles + mumps + chickenPox
my_range = LinRange(0, maximum(foo), 11)
groupedbar(
[measles mumps chickenPox],
bar_position = :stack,
bar_width=0.7,
xticks=(1:2, ticklabel),
yticks=(my_range, 0:10:100),
label=["measles" "mumps" "chickenPox"]
)
Output:
This is almost what I want. So I was wondering if anyone knows how to make a stacked percentage barplot like above in Julia?
You just need to set the upper limit of the LinRange to the maximum bar height (1 in this case, since each bar now sums to 1), and plot the proportion of each segment instead of the raw counts:
my_range = LinRange(0, 1, 11)
foo = @. measles + mumps + chickenPox
groupedbar(
[measles./foo mumps./foo chickenPox./foo],
bar_position = :stack,
bar_width=0.7,
xticks=(1:2, ["A", "B"]),
yticks=(my_range, 0:10:100),
label=["measles" "mumps" "chickenPox"],
legend=:outerright
)
If you want to have the percentages on each segment, then you can use the following function:
function percentages_on_segments(data)
first_phase = permutedims(data)[end:-1:1, :]
a = [0 0;first_phase]
b = accumulate(+, 0.5*(a[1:end-1, :] + a[2:end, :]), dims=1)
c = vec(b)
annotate!(
repeat(1:size(data, 1), inner=size(data, 2)),
c,
["$(round(100*item, digits=1))%" for item=vec(first_phase)],
:white
)
end
percentages_on_segments([measles./foo mumps./foo chickenPox./foo])
Note that [measles./foo mumps./foo chickenPox./foo] is the same data passed to the groupedbar function above.
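The label heights in percentages_on_segments are the midpoints of the stacked segments: the running total up to each segment minus half of that segment. The same arithmetic in R (the language used for the other sketches here), with a hypothetical helper segment_midpoints() and rows assumed to be in bottom-to-top stacking order, might look like this:

```r
measles    <- c(38556, 24472)
mumps      <- c(20178, 23536)
chickenPox <- c(37140, 32169)

# Rows = segments (assumed bottom-to-top stack order), columns = bars A and B
counts <- rbind(measles, mumps, chickenPox)
colnames(counts) <- c("A", "B")
props <- prop.table(counts, margin = 2)   # each column sums to 1

# Hypothetical helper: vertical midpoint of each segment within its bar
segment_midpoints <- function(p) {
  apply(p, 2, function(col) cumsum(col) - col / 2)
}

mids <- segment_midpoints(props)
labels <- sprintf("%.1f%%", 100 * props)  # same column-major order as mids
```

The Julia version reaches the same values by averaging adjacent cumulative boundaries, 0.5*(a[1:end-1, :] + a[2:end, :]), and then accumulating.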

How to shade custom blocks in Circlize package in R

I am using the R package circlize to create a circos plot.
I am aiming to create something similar to Figure 2 in this paper: https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1004812.
I would like to shade custom parts of the chromosomes with different, manually entered colours, but I am struggling.
Reproducible code:
### load packages
library("tidyverse")
library("circlize")
### Generate mock data
# Chromosome sizes - genome with 5 chromosomes size 1-5kb
chrom <- c(1,2,3,4,5)
start <- c(0,0,0,0,0)
end <- c(1000,1700,2200,3100,5000)
chr_sizes_df <- data.frame(chrom,start,end)
# Areas of interest - where I want 'shade_col' shading
chrom_num <- c(1,1,2,2,3,3,3,4,4,5,5,5)
chr <- c("chr1","chr1","chr2","chr2","chr3","chr3","chr3","chr4","chr4","chr5","chr5","chr5")
start <- c(0,900,0,1550,0,800,2000,0,2800,0,3000,4800)
end <- c(150,1000,185,1700,210,1000,2200,300,3100,400,3300,5000)
chr_regions_df <- data.frame(chr,start,end)
# Recombinations - to be depicted with lines connecting chromosomes
chr1 <- c(1,2,2,3,3,3,3,4,4,5,5,5,5)
chr1_pos <- c(100,150,170,20,2100,900,950,200,3000,100,3100,3300,4900)
chr2 <- c(1,4,2,1,3,3,5,5,4,3,5,4,2)
chr2_pos <- c(100,3000,170,100,100,900,3200,4800, 3050,10,3100,3300,40)
location <- c("Non coding", "Coding", "Non coding", "Non coding", "Coding", "Coding", "Coding", "Non coding", "Non coding", "Non coding", "Coding", "Coding", "Non coding")
sv_df <- data.frame(chr1,chr1_pos,chr2,chr2_pos,location)
# SNPs - to be depicted with dots or lines
chrom <- c(1,1,2,2,2,3,3,3,3,4,4,4,4,4,5,5,5,5,5,5)
pos <- c(350,600,200,650,700,300,1100,1500,2000,400,1500,1800,2000,2700,200,1000,1050,2000,2500,4950)
snp_df <- data.frame(chrom,pos)
### Prepare for plot
# Generate colour scheme
sv_df$location_col <- ifelse(sv_df$location=="Coding", "#FB8072",
ifelse(sv_df$location=="Non coding", "#80B1D3",
"#e9e9e9")
)
# Specify chromosome block shading
shade_col <- "#3F75AB"
# Format rearrangement data
nuc1 <- sv_df %>% select(chr1,chr1_pos) # Start positions
nuc2 <- sv_df %>% select(chr2,chr2_pos) # End positions
### Generating plot
## Basic circos graphic parameters
circos.clear()
circos.par(cell.padding=c(0,0,0,0),
track.margin=c(0,0.05),
start.degree = 90,
gap.degree = 3,
clock.wise = TRUE)
## Sector details
circos.initialize(factors = chr_sizes_df$chrom,
xlim = cbind(chr_sizes_df$start, chr_sizes_df$end))
## Generate basic outline with chromosomes
circos.track(ylim=c(0, 1), panel.fun=function(x, y) {
chr=CELL_META$sector.index
xlim=CELL_META$xlim
ylim=CELL_META$ylim
circos.text(mean(xlim), mean(ylim), chr)
},bg.col="#cde3f9", bg.border=TRUE, track.height=0.1)
## Add recombinations - coloured by coding vs non-coding etc
circos.genomicLink(nuc1, nuc2,
col=sv_df$location_col,
h.ratio=0.6,
lwd=3)
The above code produces the plot shown below:
I want to use chr_regions_df to specify the chromosome areas to shade with shade_col. I have tried a few things: draw.sector doesn't work well because it needs angles rather than genomic positions, which are hard to work out. There are cytoband options via circos.initializeWithIdeogram(), but these seem to use pre-specified cytoband formats for certain species rather than custom areas for shading as in my use case (which is also why I couldn't use the answers to "supplying user defined color in r circlize package").
Many thanks for your help.
To draw custom colored areas within chromosomes, use circos.genomicTrackPlotRegion, where you need to provide a bed-like data frame with an additional column specifying the color to be used for each area.
#the first column should match the chromosome names used in 'circos.initialize'
chrom_num <- c(1,1,2,2,3,3,3,4,4,5,5,5)
#chr <- c("chr1","chr1","chr2","chr2","chr3","chr3","chr3","chr4","chr4","chr5","chr5","chr5")
start <- c(0,900,0,1550,0,800,2000,0,2800,0,3000,4800)
end <- c(150,1000,185,1700,210,1000,2200,300,3100,400,3300,5000)
shade_col <- c("blue","red","blue","red","blue","red","blue","red","blue","red","blue","red")
chr_regions_df <- data.frame(chrom_num,start,end,shade_col)
After running circos.initialize, draw the chromosomes with their shaded areas. In panel.fun, the first argument (region) contains the coordinates of each feature, while the second (value) contains all but the first 3 columns of the data frame.
circos.genomicTrackPlotRegion(chr_regions_df, ylim = c(0, 1),
panel.fun = function(region, value, ...) {
col = value$shade_col
circos.genomicRect(region, value,
ybottom = 0, ytop = 1,
col = col, border = NA)
xlim = get.cell.meta.data("xlim")
circos.rect(xlim[1], 0, xlim[2], 1, border = "black")
ylim = get.cell.meta.data("ylim")
chr = get.current.sector.index()
circos.text(mean(xlim), mean(ylim), chr)
}, bg.col = "#cde3f9", bg.border=TRUE, track.height=0.1)
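To make the region/value split concrete, the base-R sketch below mimics how a bed-like data frame is divided per chromosome, which is why col = value$shade_col gives each rectangle its own colour. split_bed() is a hypothetical helper for illustration only, not circlize's own code:

```r
# Bed-like data from the answer above
chrom_num <- c(1,1,2,2,3,3,3,4,4,5,5,5)
start <- c(0,900,0,1550,0,800,2000,0,2800,0,3000,4800)
end <- c(150,1000,185,1700,210,1000,2200,300,3100,400,3300,5000)
shade_col <- rep(c("blue", "red"), 6)
chr_regions_df <- data.frame(chrom_num, start, end, shade_col,
                             stringsAsFactors = FALSE)

# Hypothetical helper: per chromosome, `region` holds start/end and
# `value` holds every column after the first three.
split_bed <- function(bed) {
  lapply(split(bed, bed[[1]]), function(df) {
    list(region = df[, 2:3], value = df[, -(1:3), drop = FALSE])
  })
}

parts <- split_bed(chr_regions_df)
parts[["3"]]$region           # the three shaded intervals on chromosome 3
parts[["3"]]$value$shade_col  # their colours, row by row
```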

How to enter data into ggplots

I'm trying to enter the below data into a data frame, to make a ggplot line graph.
#functions for the hh budget and utility functions
pqxf <- function(y)(1*y) # replace p with price of y
pqyf <- function(x)(-1.25*x)+20 # -1.25 is the wage rate
utilityf <- function(x)80*(1/(x)) # 80 is the utility provided
hours <- c(0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,20)
#functions are turned into data frames
pqy <- data.frame("consumption" =
pqxf(c(0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,20)))
pqx <- data.frame("leisure" =
pqxf(c(0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,20)))
utility <- data.frame("utility" =
utilityf(c(0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,20)))
#each data frame is combined into a single data frame, that will be used for tables and charts
hh <- data.frame(pqx, pqy, utility, hours)
print(hh)
#this shows the utility, and the cost of x and y, one data frame
library(ggplot2)
ggplot(hh, aes(x=pqx, y=hours))+
xlim(0,20)+ylim(0,20)+ # limits set for the assignment
labs(x = "leisure(hours)",y="consumption(units)")+
geom_line(aes(x = pqx, y = pqy))+
geom_line(aes(x = pqx, y = utility))+
geom_point(aes(x=8,y=10))+ #values of x and y of tangent point
geom_hline(yintercept = 10,linetype="dotted")+ # y of tangent point
geom_vline(xintercept = 8,linetype = "dotted")+ #x of tangent point
geom_text(label="E", x=8,y=10,hjust=-1,size=2)+
geom_text(label="-1.25(units/hour)= -w = MRS", x=9,y=2,hjust=.02,size=2)+
geom_text(label="U=80", x=4,y=19,hjust=1,size=2)
when I enter I get the following message:
Error in is.finite(x) : default method not implemented for type 'list'
Should I store the data in a different format than a data frame, format my data frame differently, or set up ggplot differently so that it can handle lists?
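Inside aes(), use the column names of hh rather than the names of the data frames you built it from: replace pqx with leisure and pqy with consumption. data.frame() takes the column names from its inputs, so hh has no column called pqx; ggplot then finds the data frame pqx in your workspace instead, and since a data frame is a list, is.finite() fails.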
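A quick check of the column names, as a sketch. It also assumes the consumption column was meant to come from pqyf() (the question applies pqxf() to both), which places the tangent point E at (8, 10). The ggplot part only runs if ggplot2 is installed, and uses ggplot_build() to evaluate the layers without drawing:

```r
hours <- c(0:16, 20)
pqxf <- function(y) 1 * y
pqyf <- function(x) -1.25 * x + 20   # budget line: -1.25 is the wage rate
utilityf <- function(x) 80 / x       # indifference curve for U = 80

pqx <- data.frame(leisure = pqxf(hours))
pqy <- data.frame(consumption = pqyf(hours))  # assumption: pqyf intended
utility <- data.frame(utility = utilityf(hours))
hh <- data.frame(pqx, pqy, utility, hours)

names(hh)     # columns are named after the inner names, not pqx/pqy
is.list(pqx)  # a data frame is a list, hence the is.finite() error

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(hh, aes(x = leisure)) +
    geom_line(aes(y = consumption)) +
    geom_line(aes(y = utility)) +
    xlim(0, 20) + ylim(0, 20)
  built <- ggplot_build(p)  # evaluates aes() against hh; print(p) to draw
}
```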

R print groups of data points in different colors

I'm doing some basic statistics in R and I'm trying to use a different color for each iteration of the loop. So all the data points for i=1 should have the same color, all the data points for i=2 should have the same color, etc. Ideally the colors for the different values of i would range from yellow to blue, for example. (I already tried colorRamp etc. but didn't manage to get it working.)
Thanks for your help.
library(ggplot2)
#dput(thedata[,2])
#c(1.28994585412464, 1.1317747077577, 1.28029504741834, 1.41172820353708,
#1.13172920065253, 1.40276516298315, 1.43679599499374, 1.90618019359643,
#2.33626745030772, 1.98362330686504, 2.22606615548188, 2.40238822720322)
#dput(thedata[,4])
#c(NA, -1.7394747097211, 2.93081902519318, -0.33212717268786,
#-1.78796119503752, -0.5080871442002, -0.10110379236627, 0.18977632798691,
#1.7514277696687, 1.50275797771879, -0.74632159611221, 0.0978774103243802)
#OR
#dput(thedata[,c(2,4)])
#structure(list(LRUN74TTFRA156N = c(1.28994585412464, 1.1317747077577,
#1.28029504741834, 1.41172820353708, 1.13172920065253, 1.40276516298315,
#1.43679599499374, 1.90618019359643, 2.33626745030772, 1.98362330686504,
#2.22606615548188, 2.40238822720322), SELF = c(NA, -1.7394747097211,
#2.93081902519318, -0.33212717268786, -1.78796119503752, -0.5080871442002,
#-0.10110379236627, 0.18977632798691, 1.7514277696687, 1.50275797771879,
#-0.74632159611221, 0.0978774103243802)), row.names = c(NA, 12L
#), class = "data.frame")
x1=1
xn=x1+3
plot(0,0,col="white",xlim=c(0,12),ylim=c(-5,7.5))
for(i in 1:3){
y=thedata[x1:xn,4]
x=thedata[x1:xn,2]
reg<-lm(y~x)
points(x,y,col=colors()[i])
abline(reg,col=colors()[i])
x1=x1+4
xn=x1+3
}
The basic idea of colorRamp and colorRampPalette is that they are function factories: functions that return functions.
From the help page:
colorRampPalette returns a function that takes an integer argument (the required number of colors) and returns a character vector of colors (see rgb) interpolating the given sequence (similar to heat.colors or terrain.colors).
So, we'll get a yellow-to-blue palette function from colorRampPalette, and then we'll give it the number of colors we want along that ramp to actually get the colors:
# create the palette function
my_palette = colorRampPalette(colors = c("yellow", "blue"))
# test it out, see how it works
my_palette(3)
# [1] "#FFFF00" "#7F7F7F" "#0000FF"
my_palette(5)
# [1] "#FFFF00" "#BFBF3F" "#7F7F7F" "#3F3FBF" "#0000FF"
# Now on with our plot (thedata is the two-column data frame from dput(thedata[,c(2,4)]) in the question)
x1 = 1
xn = x1 + 3
# Set the number of iterations (number of colors needed) as a variable:
nn = 3
# Get the colors from our palette function
my_cols = my_palette(nn)
# type = 'n' means nothing will be plotted, no points, no lines
plot(0, 0, type = 'n',
xlim = c(0, 12),
ylim = c(-5, 7.5))
# plot
for (i in 1:nn) {
y = thedata[x1:xn, 2]
x = thedata[x1:xn, 1]
reg <- lm(y ~ x)
# use the ith color
points(x, y, col = my_cols[i])
abline(reg, col = my_cols[i])
x1 = x1 + 4
xn = x1 + 3
}
You can play with just visualizing the palette: try the following code for different n values. You can also try other options, such as different starting colors. I like the results better with the space = "Lab" argument for the palette.
n = 10
my_palette = colorRampPalette(colors = c("yellow", "blue"), space = "Lab")
n_palette = my_palette(n)
plot(1:n, rep(1, n), col = n_palette, pch = 15, cex = 4)
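colorRamp() is the other function factory mentioned above: the function it returns takes numbers in [0, 1] and gives back a matrix of RGB values on a 0-255 scale, one row per input, which rgb() turns into hex colors. A sketch mapping each loop index i to an evenly spaced point on the yellow-to-blue ramp (nn is assumed to be at least 2):

```r
ramp <- colorRamp(c("yellow", "blue"))

nn <- 3
positions <- (seq_len(nn) - 1) / (nn - 1)  # 0, 0.5, 1 for nn = 3
rgb_matrix <- ramp(positions)              # nn rows: red, green, blue columns
my_cols <- rgb(rgb_matrix, maxColorValue = 255)
my_cols
```

my_cols can then be indexed with my_cols[i] inside the loop, exactly as with the colorRampPalette version.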
Besides lacking a reproducible example, you seem to have some misconceptions.
First, the function colors doesn't take a numeric argument, see ?colors. So if you want to fetch a different color in each iteration, you need to call it like colors()[i]. The code should look something similar to this (in absence of a reproducible example):
for (i in 20:30){
plot(1:10, 1:10, col = colors()[i])
}
Also bear in mind that using x1 and xn in the first two lines inside the for loop before defining them will cause an error too.

R quantmod chartSeries newTA chob - modify legend and axis (primary and secondary)

This is an advanced question.
I use my own layout for the chartSeries quantmod function, and I can even create my own newTA. Everything works fine. But ...
What I want to do but I can't:
a) Manipulate the legend of each of the 3 charts:
- move it to another corner (from "topleft" to "topright")
- change its content
- remove it completely if needed
b) My indicator generates 2 legends (value1 and value2). As above, how could I modify or delete them?
c) Control the position and range of the y-axis (place it on the left or right, or remove it entirely), and do the same for a secondary axis when the graph has one
d) Modify the main legend (the one in the top right, where the range of dates is written)
A working code sample:
# Load Library
library(quantmod)
# Get Data
getSymbols("SPY", src="yahoo", from = "2010-01-01")
# Create my indicator (30 values)
value1 <- rnorm(30, mean = 50, sd = 25)
value2 <- rnorm(30, mean = 50, sd = 25)
# merge with the first 30 rows of SPY
dataset <- merge(first(SPY, n = 30),
value1,
value2)
# **** data has now 8 columns:
# - Open
# - High
# - Low
# - Close
# - Volume
# - Adjusted
# - value1 (my indicator value 1)
# - value2 (my indicator value 2)
#
# create my TA function - This could also be achieved using the preFUN option of newTA
myTAfun <- function(a){
# input: a: function will receive whole dataset
a[,7:8] # just return my indicator values
}
# create my indicator to add to chartSeries
newMyTA <- newTA(FUN = myTAfun, # chartSeries will pass whole dataset,
# I just want to process the last 2 columns
lty = c("solid", "dotted"),
legend.name = "My_TA",
col = c("red", "blue")
)
# define my layout
layout(matrix(c(1, 2, 3), 3, 1),
heights = c(2.5, 1, 1.5)
)
# create the chart
chartSeries(dataset,
type = "candlesticks",
main = "",
show.grid = FALSE,
name = "My_Indicator_Name",
layout = NULL, # bypass internal layout
up.col = "blue",
dn.col = "red",
TA = c(newMyTA(),
addVo()
),
plot = TRUE,
theme = chartTheme("wsj")
)
I have tried using legend command, and also the option legend.name (with very limited control of the output).
I have had a look at the chob object returned by chartSeries, but I can't figure out what to do next ...
Image below:
After spending some time learning a bit more about R internals, S3 and S4 objects, and the quantmod package, I've come up with a solution. It can be used to change anything in the graph.
A) If the legend belongs to a secondary indicator window:
Call chartSeries with plot = FALSE so that nothing is drawn, and keep the returned "chob" object.
One of the slots of the "chob" object holds a "chobTA" object with 2 legend-related params (legend and legend.name). Set them to NULL.
Finally, redraw the chart with the unexported function quantmod:::chartSeries.chob.
In my case:
#get the chob object
my.chob <- chartSeries(dataset,
type = "candlesticks",
main = "",
show.grid = FALSE,
name = "My_Indicator_Name",
layout = NULL, # bypass internal layout
up.col = "blue",
dn.col = "red",
TA = c(newMyTA(),
addVo()
),
plot = FALSE, # do not plot, just get the chob
#plot = TRUE,
theme = chartTheme("wsj")
)
#if the legend is in a secondary window, and represents
#an indicator created with newTA(), this will work:
my.chob@passed.args$TA[[1]]@params$legend <- NULL
my.chob@passed.args$TA[[1]]@params$legend.name <- NULL
quantmod:::chartSeries.chob(my.chob)
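The @ in those assignments is S4 slot access: chob objects are S4 objects, and passed.args and params are slots. The sketch below uses hypothetical toy classes (toyChob and toyTA, not quantmod's real definitions, which have many more slots) to show that this nested assignment removes the legend entries while leaving the other params intact:

```r
library(methods)

# Hypothetical toy classes that mimic only the nesting used above
setClass("toyTA", slots = c(params = "list"))
setClass("toyChob", slots = c(passed.args = "list"))

ta <- new("toyTA", params = list(legend = "value1", legend.name = "My_TA",
                                 col = c("red", "blue")))
chob <- new("toyChob", passed.args = list(TA = list(ta)))

# Same assignment pattern as in the answer; NULL drops the list element
chob@passed.args$TA[[1]]@params$legend <- NULL
chob@passed.args$TA[[1]]@params$legend.name <- NULL

names(chob@passed.args$TA[[1]]@params)  # only "col" remains
```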
B) In any other case, you can edit "chartSeries.chob", "chartTA", "chartBBands", etc. inside the quantmod namespace and then call chartSeries.chob.
In my case:
fixInNamespace("chartSeries.chob", ns = "quantmod")
quantmod:::chartSeries.chob(my.chob)
It is enough to add "#" at the beginning of the lines that call legend().
That's it.
