Renko Chart in R - r

I am trying to construct a Renko chart using data obtained from Yahoo Finance and was wondering whether there is a package for this. I looked at most of the financial packages but could only find candlestick charts.
For more information on Renko charts use the link given here

Really cool question! Apparently, there is really nothing of that sort available for R. There were some attempts to do similar things (e.g., waterfall charts) on various sites, but they all don't quite hit the spot. Soooo... I made a little weekend project out of it with data.table and ggplot.
rrenko
There are still bugs, instabilities, and visual things that I would love to optimize (and the code is full of commented out debug notes), but the main idea should be there. Open for feedback and points for improvement.
Caveats: There are still cases where the data transformation screws up, especially if the size is very small or very large. This should be fixable in the near future. Also, the renko() function currently expects a data frame with two columns: date (x-axis) and close (y-axis).
Installation
devtools::install_github("RomanAbashin/rrenko")
library(rrenko)
Code
renko(df, size = 5, style = "modern") +
scale_y_continuous(breaks = seq(0, 150, 10)) +
labs(x = "", y = "")
renko(df, size = 5, style = "classic") +
scale_y_continuous(breaks = seq(0, 150, 10)) +
labs(x = "", y = "")
Data
set.seed(1702)
df <- data.frame(date = seq.Date(as.Date("2014-05-02"), as.Date("2018-05-04"), by = "week"),
close = abs(100 + cumsum(sample(seq(-4.9, 4.9, 0.1), 210, replace = TRUE))))
> head(df)
date close
1: 2014-05-02 104.0
2: 2014-05-09 108.7
3: 2014-05-16 111.5
4: 2014-05-23 110.3
5: 2014-05-30 108.9
6: 2014-06-06 106.5
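For readers who want to see the underlying idea without installing a package, here is a minimal base-R sketch of how closing prices could be turned into bricks. It is not rrenko's implementation: `renko_bricks` and its output columns are names made up for this illustration, and it assumes the classic rule in which a new brick is drawn only once the close has moved a full `size` beyond the edge of the most recent brick.

```r
# Hypothetical helper, not part of rrenko: converts closes into Renko bricks.
renko_bricks <- function(close, size) {
  stopifnot(is.numeric(close), length(close) > 0, size > 0)
  start <- floor(close[1] / size) * size
  lo <- start; hi <- start                 # edges of the most recent brick
  idx <- integer(0); dir <- integer(0); bottom <- numeric(0)
  for (i in seq_along(close)) {
    while (close[i] >= hi + size) {        # up brick stacked on the last one
      lo <- hi; hi <- hi + size
      idx <- c(idx, i); dir <- c(dir, 1L); bottom <- c(bottom, lo)
    }
    while (close[i] <= lo - size) {        # down brick placed below the last one
      hi <- lo; lo <- lo - size
      idx <- c(idx, i); dir <- c(dir, -1L); bottom <- c(bottom, lo)
    }
  }
  data.frame(index = idx, direction = dir, bottom = bottom, top = bottom + size)
}

# Toy input: two up bricks at observation 3, one down brick at observation 5
bricks <- renko_bricks(c(100, 104, 111, 108, 99), size = 5)
print(bricks)
```

Each row is one brick; `index` records which observation triggered it, so a single large move can produce several bricks at the same index.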

I'm an R investment developer, and I used some parts of Roman's code to optimize a few lines of my own Renko code. Roman's ggplot skills are awesome; the plot function was only possible because of his code.
If anyone is interested:
https://github.com/Kinzel/k_rrenko
It needs the packages xts, ggplot2 and data.table.
"Ativo" needs to be an xts object with one of its columns named "close".
EDIT:
At TeeKea's request, here is how to use it:
"Ativo" is a 15-minute EURUSD xts series from 2015-01-01 to 2015-06-01. If no "close" column is found, the last column is used.
> head(Ativo)
Open High Low Close
2015-01-01 20:00:00 1.20965 1.21022 1.20959 1.21006
2015-01-01 20:15:00 1.21004 1.21004 1.20979 1.21003
2015-01-01 20:30:00 1.21033 1.21041 1.20982 1.21007
2015-01-01 20:45:00 1.21006 1.21007 1.20978 1.21002
2015-01-01 21:00:00 1.21000 1.21002 1.20983 1.21002
2015-01-02 00:00:00 1.21037 1.21063 1.21024 1.21037
How to use krenko_plot:
krenko_plot(Ativo, 0.01,withDates = F)
Link to image krenko_plot
Compared to plot.xts
plot.xts(Ativo, type='candles')
Link to image plot.xts
There are two main variables: size and threshold.
"size" is the size of the bricks. It is required.
"threshold" is the threshold for a new brick. The default is 1.
The first brick is removed to ensure reliability.

Here's a quick and dirty solution, adapted from a python script here.
# Get some test data
library(rvest)
url <- read_html("https://coinmarketcap.com/currencies/bitcoin/historical-data/?start=20170602&end=20181126")
df <- url %>% html_table() %>% as.data.frame()
# Make sure to have your time sequence the right way up
data <- apply(df[nrow(df):1, 3:4], 1, mean)
# Build the renko function
renko <- function(data, delta){
pre <- data[1]
xpos <- NULL
ypos <- NULL
xneg <- NULL
yneg <- NULL
for(i in 1:length(data)){
increment <- data[i] - pre
incrementPerc <- increment / pre
pre <- data[i]
if(incrementPerc > delta){
xpos <- c(xpos, i)
ypos <- c(ypos, data[i])
}
if(incrementPerc < -delta){
xneg <- c(xneg, i)
yneg <- c(yneg, data[i])
}
}
signal <- list(xpos = xpos,
ypos = unname(ypos),
xneg = xneg,
yneg = unname(yneg))
return(signal)
}
# Apply the renko function and plot the outcome
signals <- renko(data = data, delta = 0.05)
plot(1:length(data), data, type = "l")
points(signals$xneg, signals$yneg, col = "red", pch = 19)
points(signals$xpos, signals$ypos, col = "yellowgreen", pch = 19)
NOTE: This is not a Renko chart (thanks to @Roman). Only buy and sell signals are displayed. See the reference mentioned above...
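The loop above compares each value with the one before it, so the same signals can be computed without a loop. The following is a sketch of that vectorized form; `pct_signals` is a hypothetical name, and the toy price vector is made up for illustration.

```r
# Hypothetical vectorized equivalent of the loop-based signal function above.
pct_signals <- function(data, delta) {
  data <- unname(data)
  # relative change versus the previous observation; the first one has no predecessor
  change <- c(0, diff(data) / head(data, -1))
  up <- change > delta
  down <- change < -delta
  list(xpos = which(up), ypos = data[up],
       xneg = which(down), yneg = data[down])
}

toy_prices <- c(100, 110, 104, 90)
sig <- pct_signals(toy_prices, delta = 0.05)
print(sig)
```

As in the original, these are points where a single step moved more than `delta` in relative terms, not Renko bricks.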

Related

Generate multiple plots in base R with loop function then concatenate by matching group variables

I have a data frame (below, my apologies for the verbose code, this is my first attempt at generating reproducible random data) that I'd like to loop through and generate individual plots in base R (specifically, ethograms) for each subject's day and video clip (e.g. subj-1/day1/clipB). After generating n graphs, I'd like to concatenate a PDF for each subj that includes all days + clips, and have each row correspond to a single day. I haven't been able to get past the generating individual graphs, however, so any help would be greatly appreciated!
Data frame
n <- 20000
library(stringi)
test <- as.data.frame(sprintf("%s", stri_rand_strings(n, 2, '[A-Z]')))
colnames(test)<-c("Subj")
test$Day <- sample(1:3, size=length(test$Subj), replace=TRUE)
test$Time <- sample(0:600, size=length(test$Subj), replace=TRUE)
test$Behavior <- as.factor(sample(c("peck", "eat", "drink", "fly", "sleep"), size = length(test$Time), replace=TRUE))
test$Vid_Clip <- sample(c("Clip_A", "Clip_B", "Clip_C"), size = length(test$Time), replace=TRUE)
Sample data from data frame:
> head(test)
Subj Day Time Behavior Vid_Clip
1 BX 1 257 drink Clip_B
2 NP 2 206 sleep Clip_B
3 ZF 1 278 peck Clip_B
4 MF 2 391 sleep Clip_A
5 VE 1 253 fly Clip_C
6 ID 2 359 eat Clip_C
After adapting this code, I am able to successfully generate a single plot (one at a time):
Subset single subj/day/clip:
single_subj_day_clip <- test[test$Vid_Clip == "Clip_B" & test$Subj == "AA" & test$Day == 1,]
After which, I can generate the graph I'm after by running the following lines:
beh_numb <- nlevels(single_subj_day_clip$Behavior)
mar.default <- c(5,4,4,2) + 0.1
par(mar = mar.default + c(0, 4, 0, 0))
plot(single_subj_day_clip$Time,
xlim=c(0,max(single_subj_day_clip$Time)), ylim=c(0, beh_numb), type="n",
ann=F, yaxt="n", frame.plot=F)
for (i in 1:length(single_subj_day_clip$Behavior)) {
ytop <- as.numeric(single_subj_day_clip$Behavior[i])
ybottom <- ytop - 0.5
rect(xleft=single_subj_day_clip$Time[i], xright=single_subj_day_clip$Time[i+1],
ybottom=ybottom, ytop=ytop, col = ybottom)}
axis(side=2, at = (1:beh_numb -0.25), labels=levels(single_subj_day_clip$Behavior), las = 1)
mtext(text="Time (sec)", side=1, line=3, las=1)
Example graph from randomly generated data (sorry for the link; I'm a new SO user, so until I reach 10 reputation points I can't embed an image directly)
Example graph from actual data
Ideal per subject graph
Thank you all in advance for your input.
Cheers,
Dan
New and hopefully correct answer
The code is too long to post it here, so there is a link to the Dropbox folder with data and code. You can check this html document or run this .Rmd file on your machine. Please check if all required packages are installed. There is the output of the script.
There is an additional problem in the analysis: some events are registered only once, at a single time point between other events, so such bars have no "width". I assigned these events a width of 1000 ms, so some of them (around 100 per 20000 observations) are out of scale if they occur at the beginning or at the end of the experiment (and if the width for such events is equal to zero). You can play with the code to fix this behavior.
Another problem is the different colors for the same factors on the different plots. I need some fresh air to fix it as well.
Looking at the graphs, you may notice that some very short observations seem to overlap with other observations. If you zoom the PDF to the maximum, you will see that they do not, and that there are 'holes' in the underlying intervals where they are supposed to be.
Lines, connecting the intervals for different kinds of behavior are helping to follow the timecourse of the experiment. You can uncomment corresponding parts of the code, if you wish.
Please let me know if it works.
Old answer
I am not sure it is the best way to do it, but probably you can use split() and after that lapply through your tables:
Split your data.frame by Subj, Day, and Vid_clip:
testl <- split(test, test[, c(1, 2, 5)], drop = T)
testl[[1123]]
# Subj Day Time Behavior Vid_Clip
#8220 ST 2 303 fly Clip_A
#9466 ST 2 463 fly Clip_A
#9604 ST 2 32 peck Clip_A
#10659 ST 2 136 peck Clip_A
#13126 ST 2 47 fly Clip_A
#14458 ST 2 544 peck Clip_A
Loop through the list with your data and plot to .pdf:
nbeh = nlevels(test$Behavior)
pdf("plots.pdf")
# set the margins after opening the device: par() only affects the current device
mar.default <- c(5,4,4,2) + 0.1
par(mar = mar.default + c(0, 4, 0, 0))
invisible(
lapply(testl, function(l){
plot(x = l$Time, xlim = c(0, max(l$Time)), ylim = c(0, nbeh),
type = "n", ann = F, yaxt = "n", frame.plot = F)
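# one rectangle per observation, running from its time to the next observation's time
lapply(seq_len(nrow(l) - 1), function(i){
ytop <- as.numeric(l$Behavior[i]); ybot <- ytop - .5
rect(l$Time[i], ybot, l$Time[i + 1], ytop, col = ybot)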
})
axis(side = 2, at = 1:nbeh - .25, labels = levels(l$Behavior), las = 1)
mtext(text = "Time (sec)", side = 1, line = 3, las = 1)
})
)
dev.off()
You should probably check the output here before you run the code on your PC. I didn't edit your plotting code much, so please double-check it.
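The question also asks for one PDF per subject with one row per day. A sketch of how that layout could be produced in base graphics with `par(mfrow = ...)` follows. The helper names `plot_ethogram` and `write_subject_pdfs`, the page dimensions, and the toy data are assumptions for illustration, and each bar is assumed to last until the next observation within the same clip.

```r
# Hypothetical helper: one ethogram panel for a single subject/day/clip subset.
plot_ethogram <- function(d, beh_levels, max_time) {
  d <- d[order(d$Time), ]                      # bars need time-ordered rows
  y <- as.numeric(factor(d$Behavior, levels = beh_levels))
  plot(NA, xlim = c(0, max_time), ylim = c(0, length(beh_levels)),
       ann = FALSE, yaxt = "n", frame.plot = FALSE)
  n <- nrow(d)
  if (n > 1) {
    # each behavior runs from its own time stamp to the next observation
    rect(d$Time[-n], y[-n] - 0.5, d$Time[-1], y[-n], col = y[-n], border = NA)
  }
  axis(side = 2, at = seq_along(beh_levels) - 0.25, labels = beh_levels, las = 1)
}

# Hypothetical driver: one PDF per subject, one row per day, one column per clip.
write_subject_pdfs <- function(test, out_dir = tempdir()) {
  beh_levels <- levels(factor(test$Behavior))
  days  <- sort(unique(test$Day))
  clips <- sort(unique(test$Vid_Clip))
  files <- character(0)
  for (s in as.character(unique(test$Subj))) {
    f <- file.path(out_dir, paste0("subj_", s, ".pdf"))
    pdf(f, width = 4 * length(clips), height = 3 * length(days))
    par(mfrow = c(length(days), length(clips)), mar = c(4, 6, 2, 1))
    for (dy in days) for (cl in clips) {
      d <- test[test$Subj == s & test$Day == dy & test$Vid_Clip == cl, ]
      plot_ethogram(d, beh_levels, max(test$Time))
      title(main = paste(s, "day", dy, cl), xlab = "Time (sec)")
    }
    dev.off()
    files <- c(files, f)
  }
  files
}

# Small made-up data set with the same columns as in the question
set.seed(1)
toy <- data.frame(
  Subj     = sample(c("AA", "BB"), 200, replace = TRUE),
  Day      = sample(1:3, 200, replace = TRUE),
  Time     = sample(0:600, 200, replace = TRUE),
  Behavior = sample(c("peck", "eat", "drink", "fly", "sleep"), 200, replace = TRUE),
  Vid_Clip = sample(c("Clip_A", "Clip_B", "Clip_C"), 200, replace = TRUE)
)
files <- write_subject_pdfs(toy)
print(files)
```

Using a fixed `beh_levels` and a common `max_time` keeps the axes identical across panels, which makes days and clips directly comparable.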

PCA analysis wrong output

That's the example data:
structure(c(368113, 87747.35, 508620.5, 370570.5, 87286.5, 612728,
55029, 358521, 2802880, 2045399.5, 177099, 317974.5, 320687.95,
6971292.55, 78949, 245415.95, 50148.5, 67992.5, 97634, 56139.5,
371719.2, 80182.7, 612078.5, 367822.5, 80691, 665190.65, 28283.5,
309720, 2853241.5, 1584324, 135482.5, 270959, 343879.1, 6748208.5,
71534.9, 258976, 28911.75, 78306, 56358.7, 46783.5, 320882.85,
53098.3, 537383.5, 404505.5, 89759.7, 624120.55, 40406, 258183.5,
3144610.45, 1735583.5, 122013.5, 249741, 362585.35, 5383869.15,
23172.2, 223704.45, 40543.7, 68522.5, 43187.05, 29745, 356058.5,
89287.25, 492242.5, 452135.5, 97253.55, 575661.95, 65739.5, 334703.5,
3136065, 1622936.5, 131381.5, 254362, 311496.3, 5627561, 68210.6,
264610.1, 45851, 65010.5, 32665.5, 39957.5, 362476.75, 59451.65,
548279, 345096.5, 93363.5, 596444.2, 11052.5, 252812, 2934035,
1732707.55, 208409.5, 208076.5, 437764.25, 16195882.45, 77461.25,
205803.85, 30437.5, 75540, 49576.75, 48878, 340380.5, 43785.35,
482713, 340315, 64308.5, 517859.85, 11297, 268993.5, 3069028.5,
1571889, 157561, 217596.5, 400610.65, 5703337.6, 50640.65, 197477.75,
40070, 66619, 81564.55, 41436.5, 367592.3, 64954.9, 530093, 432025,
87212.5, 553901.65, 20803.5, 333940.5, 3027254.5, 1494468, 195221,
222895.5, 494429.45, 7706885.75, 60633.35, 192827.1, 29857.5,
81001.5, 112588.65, 68904.5, 338822.5, 56868.15, 467350, 314526.5,
105568, 749456.1, 19597.5, 298939.5, 2993199.2, 1615231.5, 229185.5,
280433.5, 360156.15, 5254889.1, 79369.5, 175434.05, 40907.05,
70919, 65720.15, 53054.5), .Dim = c(20L, 8L), .Dimnames = list(
c("Anne", "Greg", "thomas", "Chris", "Gerard", "Monk", "Mart",
"Mutr", "Aeqe", "Tor", "Gaer", "Toaq", "Kolr", "Wera", "Home",
"Terlo", "Kulte", "Mercia", "Loki", "Herta"), c("Day_Rep1",
"Day_Rep2", "Day_Rep3", "Day_Rep4", "Day2_Rep1", "Day2_Rep2",
"Day2_Rep3", "Day2_Rep4")))
I would like to perform a nice PCA analysis. I expect that replicates from Day will be nicely correlated with each other and replicates from Day2 together. I was trying to perform some analysis using the code below:
## log transform
data_log <- log(data[, 1:8])
#vec_EOD_EON
dt_PCA <- prcomp(data_log,
center = TRUE,
scale. = TRUE)
library(devtools)
install_github("vqv/ggbiplot")
library(ggbiplot)
g <- ggbiplot(dt_PCA, obs.scale = 1, var.scale = 1,
groups = colnames(dt_PCA), ellipse = TRUE,
circle = TRUE)
g <- g + scale_color_discrete(name = "")
g <- g + theme(legend.direction = 'horizontal',
legend.position = 'top')
print(g)
However, the output is not what I am looking for:
but I am looking for something more like that:
I would like to use dots for each row in the data and different colors for each of the replicate. Would be cool to use the similar colors for Day replicates and as well for Day2.
Obtained data with ggplot:
Let's imagine you save your data into df.
library(ggplot2)
pc_df <- prcomp(t(df), scale.=TRUE)
pc_table <- as.data.frame(pc_df$x[,1:2]) # extracting 1st and 2nd component
experiment_regex <- '(^[^_]+)_Rep(\\d+)' # extracting replicate and condition from your experiment names
pc_table$replicate <- as.factor(sub(experiment_regex,'\\2', rownames(pc_table)))
pc_table$condition <- as.factor(sub(experiment_regex,'\\1', rownames(pc_table)))
ggplot(pc_table, aes(PC1, PC2, color=condition, shape=replicate)) +
geom_point() +
xlab(sprintf('PC1 - %.1f%%', # extracting the percentage of each PC and print it on the axes
summary(pc_df)$importance[2,1] * 100)) +
ylab(sprintf('PC2 - %.1f%%',
summary(pc_df)$importance[2,2] * 100))
The first thing you have to do to get your data into the correct shape is to transpose it using t(), so that each replicate becomes a row (an observation) rather than a column. This might already be what you are looking for.
I prefer to do the plots with my own function and I wrote the steps down to get a nice plot with ggplot2.
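To make the effect of the transposition concrete, here is a small sketch on made-up data with the same shape as in the question (20 rows, 8 replicate columns). The matrix `m` is randomly generated for illustration only. It also shows that the percentages printed on the axes can be derived directly from `sdev`, which is the same quantity that `summary()` reports as "Proportion of Variance".

```r
set.seed(1)
# Made-up 20 x 8 matrix shaped like the question's data
m <- matrix(rlnorm(160, meanlog = 10), nrow = 20,
            dimnames = list(NULL, c(paste0("Day_Rep", 1:4), paste0("Day2_Rep", 1:4))))

# After t(), each replicate is one observation, so pc$x has one row per replicate
pc <- prcomp(t(log(m)), scale. = TRUE)
print(rownames(pc$x))

# Proportion of variance explained by each component
var_explained <- pc$sdev^2 / sum(pc$sdev^2)
print(round(100 * var_explained[1:2], 1))
```

Without `t()`, the 20 named rows would be the observations and the replicates would be the variables, which is why the original biplot showed the replicates as arrows instead of points.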
UPDATE:
Since you were asking in the comments. Here is an example where an experiment was repeated on a different day. Replicate 1 and 2 on one day, and a few days later replicate 3 and 4.
The differences between the two days are larger than the changes between the conditions (day explains 49% of the variance, the experiment only 20%).
This is not a good experiment and should be repeated!

Trying to program trading signals in R

I am new to R and am trying to program a pair trading strategy in R.
I have already written the code for downloading the data, created additional columns, and prepared the data. Now I need to calculate the trading signals.
My signal rules are as follows.
- If Z-Score is greater than 2.25 , Sell the pair; Buy back when Z-Score is less than 0.25.
- If Z-Score is less than -2.25 , Buy the pair; sell (Exit) when z-score is above -0.25.
- close any open position if there is a change in signal.
When we sell a pair, we sell the first stock and buy the second stock. In this case, we sell ACC and Buy Ambujacem.
When we buy a pair, we buy the first stock and sell the second stock. In this case, we buy ACC and Sell Ambujacem.
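The rules above depend on the current position as well as on the z-score, so they can be read as a small state machine. The sketch below is one possible reading of those rules, not the method used in the answer that follows. `pair_position` is a hypothetical name, the position is coded as -1 (pair sold), 1 (pair bought) and 0 (flat), and the z-score vector is made up.

```r
# Hypothetical state machine for the entry/exit rules listed above.
pair_position <- function(z, entry = 2.25, exit = 0.25) {
  pos <- integer(length(z))
  state <- 0L
  for (i in seq_along(z)) {
    if (!is.na(z[i])) {
      if (state == -1L && z[i] < exit)  state <- 0L   # pair was sold: buy back
      if (state ==  1L && z[i] > -exit) state <- 0L   # pair was bought: exit
      if (state ==  0L && z[i] > entry)  state <- -1L # sell the pair
      if (state ==  0L && z[i] < -entry) state <- 1L  # buy the pair
    }
    pos[i] <- state                                   # NA z-scores keep the previous state
  }
  pos
}

z_toy <- c(0, 2.5, 1, 0.2, -2.4, -1, -0.1, NA, 3)
pos <- pair_position(z_toy)
print(pos)
```

Because the exit checks run before the entry checks, a z-score that jumps straight from one extreme to the other closes the open position and opens the opposite one on the same bar, which matches the third rule.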
Could anyone help me with the coding for the trading signals?
Enclosing the code.
Regards,
Subash
# Trading Code
library(quantmod)
getSymbols("ACC.NS", from=as.Date('2007-01-01'), to=as.Date('2015-07-24'))
getSymbols("AMBUJACEM.NS", from=as.Date('2007-01-01'), to=as.Date('2015-07-24'))
acc=ACC.NS[,6]
amb=AMBUJACEM.NS[,6]
t.zoo <- merge(acc, amb, all=TRUE)
t.zoo=as.data.frame(t.zoo)
typeof(t.zoo)
t.zoo=na.omit(t.zoo)
#adding columns
t.zoo$spread <- 0
t.zoo$adfTest <- 0
t.zoo$mean <- 0
t.zoo$stdev <- 0
t.zoo$zScore <- 0
t.zoo$signal <- 0
t.zoo$BuyPrice <- 0
t.zoo$SellPrice <- 0
t.zoo$LongReturn <- 0
t.zoo$ShortReturn <- 0
t.zoo$Slippage <- 0
t.zoo$TotalReturn <- 0
#preparing the data
#Calculating the pair ratio
t.zoo$pairRatio <- t.zoo$ACC.NS.Adjusted/t.zoo$AMBUJACEM.NS.Adjusted
#Calculate the log prices of the two time series
t.zoo$LogA <- log10(t.zoo$ACC.NS.Adjusted)
t.zoo$LogB <- log10(t.zoo$AMBUJACEM.NS.Adjusted)
#Calculating the spread
t.zoo$spread <- t.zoo$ACC.NS.Adjusted/t.zoo$AMBUJACEM.NS.Adjusted
#Calculating the mean
# Computes the mean using the SMA function
# choose the number of days for calculating the mean
SMAdays = 20
t.zoo$mean <- SMA(t.zoo$spread,SMAdays)
#Calculating the Std Deviation
t.zoo$stdev <- rollapply(t.zoo$spread,20,sd, fill=NA, align='right')
#Calculating the Z Score
t.zoo$zScore <- (t.zoo$pairRatio - t.zoo$mean)/t.zoo$spread
View(t.zoo)
#Calculation of trading signals and trading prices
#Trigger sell or buy signal if Z Score moves above 2.25 or below -2.25.
# Close position if Z Score reaches 0.25 or -0.25.
# close any open position if there is a change in signal.
I think the main issue was to come up with trading signals for a strategy that depends not only on the current level of indicator but also on the direction from which the indicator is crossed.
There were a number of problems with the code posted in the comments, including the use of a single = for comparisons, so I've worked it afresh.
Here's my attempt at solving this. It seems to work. I've added some plotting code to eyeball the results. I suggest you check the result over different periods.
This code comes after the code in the original question. The only difference is that I have kept t.zoo as an xts/zoo object and not converted it to a data.frame. Also, I've multiplied the z-scores by 100.
It generates trigger dates and also a column depicting the state of the strategy. Calculating returns should be easy from there.
colnames(t.zoo)
#t.zoo must be an xts object
#working on a separate xts object
sigs<- t.zoo[, c("ACC.NS.Adjusted", "AMBUJACEM.NS.Adjusted" , "zScore")]
# creating my own triggers as there are not enough good values
# buyTrig<- mean(t.zoo$zScore ,na.rm = T) - 1*sd(t.zoo$zScore ,na.rm = T)
# sellTrig<- (-1) * buyTrig
# sqOffTrig<- mean(t.zoo$zScore ,na.rm = T) - 0.5*sd(t.zoo$zScore ,na.rm = T)
# Another approach: scaling the t.zoo z-scores to fit your criterion
sigs$zScore<- sigs$zScore*100
buyTrig<- (-2.25)
sellTrig<- (-1) * buyTrig
sqOffTrig<- 0.25
cat ( buyTrig, sellTrig , sqOffTrig)
hist(sigs$zScore, breaks = 40)
abline(v=c(buyTrig,sellTrig), col="red")
abline(v=c(-sqOffTrig, sqOffTrig), col="green")
sum(sigs$zScore >= -sqOffTrig & sigs$zScore<= sqOffTrig , na.rm = T) # 139
sigs$action<- 0
sigs$mode <- NA
sigs$zLag<- lag.xts(sigs$zScore,1)
sigs[19:22,]
#these are not the real trigger dates, but they will serve our purpose
# along with na.locf
buyTrigDays<- time(sigs[sigs$zScore<= buyTrig & sigs$zLag > buyTrig, ])
sellTrigDays<- time(sigs[sigs$zScore>= sellTrig & sigs$zLag < sellTrig, ])
#square offs
buySqOffDays<- time( sigs[sigs$zScore>= (-1*sqOffTrig) & sigs$zLag < (-1*sqOffTrig), ] )
buySqOffDays
sellSqOffDays<- time( sigs[sigs$zScore<= (sqOffTrig) & sigs$zLag > (sqOffTrig), ] )
sellSqOffDays
sigs$mode[buyTrigDays]=1 ; sigs$mode[sellTrigDays]= -1;
sigs$mode[buySqOffDays]=0 ; sigs$mode[sellSqOffDays]= 0;
sigs$mode
# use na.locf (last observation carried forward) to repeat each triggered
# position into the future until another non-NA value is met
sigs$mode<- na.locf(sigs$mode, fromLast = F)
plot((sigs$zScore["2015"] ))
points(sigs$zScore[sigs$mode==1], col="red", on=1, pch = 19)
points(sigs$zScore[sigs$mode==-1], col="green", on=1 , pch = 19)
points(sigs$zScore[sigs$mode==0], col="blue", on=1)
sum(is.na(sigs$mode))
#now to get the real dates when square off is triggered
trigdays<- time( sigs[diff(sigs$mode,1) != 0, ] ) #when the value changes
squareOffTrigger_real<- time(sigs[sigs$mode==0][trigdays])
buyTrigger_real<- time(sigs[sigs$mode==1] [trigdays])
sellTrigger_real<- time(sigs[sigs$mode==-1][trigdays])
#check
length(sellTrigger_real) + length(buyTrigger_real) == length(squareOffTrigger_real)
plot(sigs$zScore["2015"])
points(sigs$zScore[buyTrigger_real] , col="blue", pch = 19, on=1)
points(sigs$zScore[sellTrigger_real] , col="red", pch = 19, on=1)
points(sigs$zScore[squareOffTrigger_real] , col="green", pch = 19, on=1)
abline(h=c(-sqOffTrig, sqOffTrig) , col= "green" )
# further calculations can be easily made using either the mode
# column or the trigger dates computed at the end

Add a line to coplot {graphics}, classic approaches don't work

I found coplot {graphics} very useful for my plots. However, I would like to include not just one line but add a second one. With basic graphics I just need add = TRUE to add another line, or I can use plot(..) and lines(..). With {lattice} I can save my plots as objects
a<-xyplot(..)
b<-xyplot(..)
and display them simply with a + as.layer(b). None of these approaches works for coplot(), apparently because creating objects with a<-coplot() doesn't produce a trellis graphic but a NULL object.
Is there any way to add a second data line in coplot()? I really like its graphics, so I would like to keep it. Thank you!
My example data are here: http://ulozto.cz/xPfS1uRH/repr-exemple-csv
My code:
sub.tab<-read.csv("repr_exemple.csv", header = T, sep = "")
attach(sub.tab)
cells.f<-factor(cells, levels=c(2, 25, 100, 250, 500), # unique(cells.in.cluster)???
labels=c("size2", "size25", "size100", "size250", "size500"))
perc.f<-factor(perc, levels=c(5, 10), # unique(cells.in.cluster)???
labels=c("perc5", "perc10"))
# how to put these plots together?
a<- coplot(max_dist ~ time |cells.f + perc.f, data = sub.tab,
xlab = "ticks", type = "l", col = "black", lwd = 1)
b<- coplot(mean_dist ~ time |cells.f * perc.f, data = sub.tab,
xlab = "ticks", type = "l", col = "grey", lwd = 1)
a + as.layer(b) # this doesn't work
Please, how to merge these two plots (grey and black lines)? I couldn't figure it out... Thank you !
Linking to sample data isn't really that helpful. Here's a randomly created sample data set
set.seed(15)
dd <- do.call("rbind",
do.call("Map", c(list(function(a,b) {
cbind.data.frame(a,b, x=1:5,
y1=cumsum(rpois(5,7)),
y2=cumsum(rpois(5,9)))
}),
expand.grid(a=letters[1:5], b=letters[20:22])))
)
head(dd)
# a b x y1 y2
# 1 a t 1 8 16
# 2 a t 2 13 28
# 3 a t 3 25 35
# 4 a t 4 33 45
# 5 a t 5 39 57
# 6 b t 1 4 12
Note that coplot is a base graphics function, not Lattice. But it does have a panel= parameter, and you can have coplot() take care of subsetting your data for you (well, calculating the indexes at least). Like other base graphics functions, though, plotting different groups isn't exactly trivial. You can do it in this case with
coplot(y~x|a+b,
# make a fake y col to cover range of all y1 and y2 values
cbind(dd, y=seq(min(dd$y1, dd$y2), max(dd$y1, dd$y2), length.out=nrow(dd))),
#request subscripts to be sent to panel function
subscripts=TRUE,
panel=function(x,y,subscripts, ...) {
# draw group 1
lines(x, dd$y1[subscripts])
# draw group 2
lines(x, dd$y2[subscripts], col="red")
})
This gives
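As a variation on the approach above, the same panel function can be written with the column names from the question. The sketch below runs on randomly generated stand-in data (the real CSV is not available here), and instead of a fake y column it assumes that passing `ylim` to `coplot()` is enough to make room for both series.

```r
set.seed(42)
# Made-up stand-in for the question's data: one time series per cells/perc combination
sub.tab <- expand.grid(time = 1:10, cells = c(2, 25, 100), perc = c(5, 10))
sub.tab$mean_dist <- runif(nrow(sub.tab), 0, 5)
sub.tab$max_dist  <- sub.tab$mean_dist + runif(nrow(sub.tab), 1, 3)
sub.tab$cells.f <- factor(sub.tab$cells, levels = c(2, 25, 100),
                          labels = c("size2", "size25", "size100"))
sub.tab$perc.f  <- factor(sub.tab$perc, levels = c(5, 10),
                          labels = c("perc5", "perc10"))

panels_drawn <- 0
coplot(max_dist ~ time | cells.f + perc.f, data = sub.tab,
       # common y range so that both series fit into every panel
       ylim = range(sub.tab$mean_dist, sub.tab$max_dist),
       subscripts = TRUE,
       panel = function(x, y, subscripts, ...) {
         lines(x, y, col = "black")                              # max_dist
         lines(x, sub.tab$mean_dist[subscripts], col = "grey")   # mean_dist
         panels_drawn <<- panels_drawn + 1
       })
```

The `subscripts` argument gives the row indices of the current panel, which is what allows a second column from the same data frame to be drawn alongside the formula's response.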

R: Mapping multiple Routes using ggmap

I have been trying to make a different version of this lovely flowing data visualization for some time this week but keep hitting a snag in the final implementation.
Here is the data set I'm using. I have melded it into a data frame with the three key bits of information I want to display: startinglatlong, endinglatlong, and number of trips.
I got closest using the idea posted here, but hit a snag on two items:
1) making the size of the line change based on the number of trips
2) getting the google api to allow me to call this many rows (I have 55,704 in my data set).
counts is the name of my full df, which looks like so:
head(counts)
X from_station_id.x to_station_id.x From_Station_Lat From_Station_Long End_Station_Lat End_Station_Long n eichel
1 1 5 5 41.87396 -87.62774 41.87396 -87.62774 275 41.87395806 -87.62773949
2 2 5 13 41.87396 -87.62774 41.93250 -87.65268 1 41.93250008 -87.65268082
3 3 5 14 41.87396 -87.62774 41.85809 -87.65107 12 41.858086 -87.651073
4 4 5 15 41.87396 -87.62774 41.85645 -87.65647 19 41.856453 -87.656471
5 5 5 16 41.87396 -87.62774 41.91033 -87.67252 7 41.910329 -87.672516
6 6 5 17 41.87396 -87.62774 41.90332 -87.67273 5 41.90332 -87.67273
thomas
1 41.87395806 -87.62773949
2 41.87395806 -87.62773949
3 41.87395806 -87.62773949
4 41.87395806 -87.62773949
5 41.87395806 -87.62773949
6 41.87395806 -87.62773949
Then I set about making an easier df for the function in the idea post, a la:
start<-c(counts[1:10,9])
dest<-c(counts[1:10,10])
I thought I might add in numbers into the function so I tagged on n (maybe not the best naming convention, but stick with me here).
n <- c(counts[1:10, 8])
then the route searching function:
leg <-function(start, dest){
r<- route(from=start,to=dest,mode = c("bicycling"),structure = c("legs"))
c<- geom_leg(aes(x = startLon, y = startLat, xend = endLon, yend = endLat),
alpha = 2/4, size = 2, data = r, colour = 'blue')
return (c)
}
base map:
a<-qmap('Chicago', zoom = 12, maptype="roadmap", color="bw")
now the magic:
for (n in 1:10){
#l<-leg(start[n], dest[n])
l<-leg(as.character(df[n,1]), as.character(df[n,2]))
a<-a+l
}
a
This worked.
Unfortunately, when I tried to run it on a larger subset it would run for a little bit and then go:
Information from URL : http://maps.googleapis.com/maps/api/directions/json? origin=41.88871604+-87.64444785&destination=41.87395806+-87.62773949&mode=bicycling&units=metric&alternatives=false&sensor=false
Error: (list) object cannot be coerced to type 'integer'
I understand from searching here and elsewhere that this can be due to Google gating api calls, and so tried adding in Sys.sleep(1), but that would break, so went to Sys.sleep(1.5) and frankly that still seems to. Even that is a pretty expensive call, given that for +55k rows you're looking at +23 hours of calls. My code was:
for (n in 1:30){
#l<-leg(start[n], dest[n])
l<-leg(as.character(df[n,1]), as.character(df[n,2]))
Sys.sleep(1.5)
a <- a + l
a}
This seemed to run, but when I entered "a" I got:
Error in eval(expr, envir, enclos) : object 'startLon' not found
Finally, as mentioned, I'd like to visualize thicker lines for more heavily used routes. Typically I'd do this via the aes, doing something like:
geom_path(
aes(x = lon, y = lat), colour = 'red', size = n/100,
data = df, lineend = 'round'
)
so it would read column n and assign a size based on the number of trips. For that to work here I need that number to be bound to the directions route, so I wrote a second function like this:
leg <-function(start, dest, n){
r<- route(from=start,to=dest,mode = c("bicycling"),structure = c("route"))
c<- geom_leg(aes(x = startLon, y = startLat, xend = endLon, yend = endLat),
alpha = 2/4, size = n/10, data = r, colour = 'blue')
return (c)
}
for (n in 1:55704){
#l<-leg(start[n], dest[n])
l<-leg(as.character(df[n,1]), as.character(df[n,2]), as.numeric(df[n,3]))
Sys.sleep(1)
a <- a+l
}
This ran for a minute and then died on the error:
Error: (list) object cannot be coerced to type 'integer'
But a shorter version got tantalizingly close:
for (n in 2:6){
#l<-leg(start[n], dest[n])
l<-leg(as.character(df[n,1]), as.character(df[n,2]), as.numeric(df[n,3]))
Sys.sleep(1)
a <- a+l
}
It worked, as far as I can tell, but not for more than about 30 rows. Sadly, the longer version just kind of runs out. Basically I think that if I can get past the error message I'm almost there; I just don't want to have to spend days running the query. All help and input welcome. Thank you for your time.
OK, so after a lot of noodling and modifying the above, I finally settled on a looping solution that works:
leg <-function(start, dest, n){
r<- route(from=start,to=dest,mode = c("walking"),structure = c("route"))
c<- geom_path(aes(x = lon, y = lat),
alpha = 2/4, size = as.numeric(n)/500, data = r, colour = 'blue')
Sys.sleep(runif(1, 3.0, 7.5))
return (c)
}
a <- qmap('Chicago', zoom = 12, maptype = 'road', color="bw")
for (n in 101:200){
l<-leg(as.character(df[n,1]), as.character(df[n,2]),as.character(df[n,3]))
a<-a+l
}
a
This worked fairly well. The only bumps were when the Google API would reject the call. After I added the randomized Sys.sleep in there it worked without a hitch. That said, I still never tried more than 150 at a go (I limited my mapping to a sample of the top 10% of routes for ease of visualization and for function). Finally, after some happy Illustrator time, I ended up with a good-looking map. Thanks to the community for the interest and for providing the looping idea.
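The randomized pause described above can be combined with a retry, so that a single rejected call does not abort a long loop. The sketch below shows the pattern in base R only. `with_retry` and `flaky_route` are hypothetical names, and `flaky_route` merely simulates a rejected request; it does not call ggmap or the Google API.

```r
# Hypothetical helper: call f() again after a random pause if it throws an error.
with_retry <- function(f, max_tries = 5, min_wait = 3, max_wait = 7.5) {
  for (attempt in seq_len(max_tries)) {
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    if (attempt < max_tries) Sys.sleep(runif(1, min_wait, max_wait))
  }
  stop("all ", max_tries, " attempts failed: ", conditionMessage(result))
}

# Simulated request that is rejected twice before succeeding
calls <- 0
flaky_route <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("request rejected")
  "route data"
}

# Waits are set to zero here so the example finishes immediately
res <- with_retry(flaky_route, max_tries = 5, min_wait = 0, max_wait = 0)
print(res)
print(calls)
```

In the loop above, the body of `leg()` would be the function passed to `with_retry`, with the wait bounds left at values similar to the 3 to 7.5 seconds that worked in practice.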

Resources