I have been trying to make a different version of this lovely flowing data visualization for some time this week but keep hitting a snag in the final implementation.
Here is the data set I'm using. I have melded it into a frame with the three key bits of information I want to display: startinglatlong, endinglatlong, and number of trips.
I got closest using the idea posted here, but hit a snag on two items:
1) making the size of the line change based on the number of trips
2) getting the google api to allow me to call this many rows (I have 55,704 in my data set).
counts is the name of my full df, which looks like so:
head(counts)
X from_station_id.x to_station_id.x From_Station_Lat From_Station_Long End_Station_Lat End_Station_Long n eichel
1 1 5 5 41.87396 -87.62774 41.87396 -87.62774 275 41.87395806 -87.62773949
2 2 5 13 41.87396 -87.62774 41.93250 -87.65268 1 41.93250008 -87.65268082
3 3 5 14 41.87396 -87.62774 41.85809 -87.65107 12 41.858086 -87.651073
4 4 5 15 41.87396 -87.62774 41.85645 -87.65647 19 41.856453 -87.656471
5 5 5 16 41.87396 -87.62774 41.91033 -87.67252 7 41.910329 -87.672516
6 6 5 17 41.87396 -87.62774 41.90332 -87.67273 5 41.90332 -87.67273
thomas
1 41.87395806 -87.62773949
2 41.87395806 -87.62773949
3 41.87395806 -87.62773949
4 41.87395806 -87.62773949
5 41.87395806 -87.62773949
6 41.87395806 -87.62773949
Then I set about making an easier df for the function in the idea post, a la:
start<-c(counts[1:10,9])
dest<-c(counts[1:10,10])
I thought I might add in numbers into the function so I tagged on n (maybe not the best naming convention, but stick with me here).
n <- c(counts[1:10, 8])
then the route searching function:
leg <- function(start, dest){
  r <- route(from = start, to = dest, mode = c("bicycling"), structure = c("legs"))
  c <- geom_leg(aes(x = startLon, y = startLat, xend = endLon, yend = endLat),
                alpha = 2/4, size = 2, data = r, colour = 'blue')
  return (c)
}
base map:
a<-qmap('Chicago', zoom = 12, maptype="roadmap", color="bw")
now the magic:
for (n in 1:10){
#l<-leg(start[n], dest[n])
l<-leg(as.character(df[n,1]), as.character(df[n,2]))
a<-a+l
}
a
This worked.
Unfortunately, when I tried to run it on a larger subset, it would run for a little bit and then fail with:
Information from URL : http://maps.googleapis.com/maps/api/directions/json? origin=41.88871604+-87.64444785&destination=41.87395806+-87.62773949&mode=bicycling&units=metric&alternatives=false&sensor=false
Error: (list) object cannot be coerced to type 'integer'
I understand from searching here and elsewhere that this can be due to Google gating API calls, so I tried adding Sys.sleep(1), but that still broke. I then went to Sys.sleep(1.5), and frankly that still seems to break. Even that is a pretty expensive call: for 55k+ rows you're looking at 23+ hours of calls. My code was:
for (n in 1:30){
#l<-leg(start[n], dest[n])
l<-leg(as.character(df[n,1]), as.character(df[n,2]))
Sys.sleep(1.5)
a <- a + l
a}
This seemed to run, but when I entered "a" I got:
Error in eval(expr, envir, enclos) : object 'startLon' not found
Finally, as mentioned, I'd like to draw thicker lines for more heavily used routes. Typically I'd do this via the aes, with something like:
geom_path(
aes(x = lon, y = lat), colour = 'red', size = n/100,
data = df, lineend = 'round'
)
so that it would read column n and set the size based on the number of trips. For that to work here I need that number bound to the directions route, so I wrote a second function like this:
leg <-function(start, dest, n){
r<- route(from=start,to=dest,mode = c("bicycling"),structure = c("route"))
c<- geom_leg(aes(x = startLon, y = startLat, xend = endLon, yend = endLat),
alpha = 2/4, size = n/10, data = r, colour = 'blue')
return (c)
}
for (n in 1:55704){
#l<-leg(start[n], dest[n])
l<-leg(as.character(df[n,1]), as.character(df[n,2]), as.numeric(df[n,3]))
Sys.sleep(1)
a <- a+l
}
This ran for a minute and then died on the error:
Error: (list) object cannot be coerced to type 'integer'
but a shorter version got tantalizingly close:
for (n in 2:6){
#l<-leg(start[n], dest[n])
l<-leg(as.character(df[n,1]), as.character(df[n,2]), as.numeric(df[n,3]))
Sys.sleep(1)
a <- a+l
}
It worked, as far as I can tell, but not for anything more than about 30 rows. Sadly the longer version just kind of runs out. Basically I think that if I can get past the error message I'm almost there; I just don't want to have to spend days running the query. All help and input welcome. Thank you for your time.
OK, so after a lot of noodling and modifying the above, I finally settled on a looping solution that works:
leg <- function(start, dest, n){
  # fetch the route, then draw it with a line width scaled by the trip count n
  r <- route(from = start, to = dest, mode = c("walking"), structure = c("route"))
  c <- geom_path(aes(x = lon, y = lat),
                 alpha = 2/4, size = as.numeric(n)/500, data = r, colour = 'blue')
  # random pause between calls to the Google API
  Sys.sleep(runif(1, 3.0, 7.5))
  return (c)
}
a <- qmap('Chicago', zoom = 12, maptype = 'road', color = "bw")
for (n in 101:200){
  l <- leg(as.character(df[n,1]), as.character(df[n,2]), as.character(df[n,3]))
  a <- a + l
}
a
This worked fairly well. The only bumps were when the Google API would reject the call. After I added the randomized Sys.sleep in there it worked without a hitch. That said, I never tried more than 150 at a go (I limited my mapping to a sample of the top 10% of routes for ease of visuals and for function). Finally, after some happy Illustrator time, I ended up with a good-looking map. Thanks to the community for the interest and for providing the looping idea.
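A minimal sketch of another way to survive a rejected call, assuming the rejection surfaces as an ordinary R error: wrap each request so that it is retried after a pause instead of stopping the whole loop. The names with_retry and flaky are hypothetical helpers for illustration, not part of ggmap, and the tiny wait times are only there so the example runs quickly.

```r
# Retry a function call a few times, pausing between attempts.
# `f` is any zero-argument function; `wait` returns the pause in seconds.
with_retry <- function(f, max_tries = 3, wait = function() runif(1, 0, 0.01)) {
  for (attempt in seq_len(max_tries)) {
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    Sys.sleep(wait())
  }
  stop("all ", max_tries, " attempts failed: ", conditionMessage(result))
}

# Stand-in for an API call that is rejected twice and then succeeds.
calls <- 0
flaky <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("request rejected")
  "ok"
}

out <- with_retry(flaky)
print(out)
print(calls)
```

In the loop above, the call to leg() could be passed in as function() leg(...), with a wait of a few seconds rather than milliseconds.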
Consider the following code that creates two ts time-series foo & bar:
x = 1:22
foo = ts(x, start = 1.5, end = 106.5, frequency = 0.2)
bar = ts(x, start = 2.5, end = 107.5, frequency = 0.2)
The foo and bar objects are identical except for the start and end values: for bar they are both larger by 1. Neither start/end values are exactly multiples of the frequency, but that shouldn't pose a problem.
Indeed, for both foo and bar, a window of an arbitrary size can be successfully extracted:
stats::window(foo, start = 20, end = 30) # works fine
stats::window(bar, start = 20, end = 30) # works fine too
But if I try to assign values to these windows, only foo works:
window(foo, start = 20, end = 30) <- NA # works fine
window(bar, start = 20, end = 30) <- NA # ERROR!!!
Error in attr(y, "tsp") <- c(ystart, yend, xfreq) :
invalid time series parameters specified
The internal working of window<-.ts basically calls the stats::window function, so it should work just as well as calling the window() function explicitly.
My understanding is that in the ts definition, 'start' and 'end' are just in arbitrary units, eg: seconds. So ts(x, start = 1.5, end = 106.5, frequency = 0.2) may mean: a series that starts at second 1.5 and ends at second 106.5, where each number represents 5 seconds (1/frequency).
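To make that reading concrete, here is a small check of the time grid the two series sit on. It only uses ts() and time() from base R; the variable names foo_times and bar_times are introduced here for illustration.

```r
x <- 1:22
foo <- ts(x, start = 1.5, end = 106.5, frequency = 0.2)
bar <- ts(x, start = 2.5, end = 107.5, frequency = 0.2)

# One observation every 1 / 0.2 = 5 time units
foo_times <- as.numeric(time(foo))
bar_times <- as.numeric(time(bar))

print(head(foo_times, 5))
print(head(bar_times, 5))
```

Both grids step by 5 and differ only by the offset of 1 between the two start values.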
The stats::window function then just selects the values that are within its start-end boundaries, e.g. from 20 to 30 seconds. And indeed the time() for both windows is the same and seems to confirm this:
time(window(foo, start = 20, end = 30))
[1] 21.5 26.5
time(window(bar, start = 20, end = 30))
[1] 22.5 27.5
The fact that one series starts at 1.5s and the other at 2.5s has absolutely no impact on the windowing procedure. Yet when assigning values to it, my logic breaks.
Things get even wilder by removing one cycle from bar:
qux = ts(1:21, start = 2.5, end = 102.5, frequency = 0.2)
window(qux, start = 20, end = 30) <- NA #ERROR!!
Error in `window<-.ts`(`*tmp*`, start = 20, end = 30, value = NA) :
times to be replaced do not match
A different error! I think I am failing to understand some fundamental concept.
So what am I missing here?
As you say window<-.ts() uses window() internally to create the new object. It's done via eval.parent(), where window() is called with extend=TRUE, and this is when the error occurs. As such we can simplify our analysis by instead considering the following pair
window(foo, start=20, end=30, extend=TRUE) # works fine
# Time Series:
# Start = 20
# End = 25
# Frequency = 0.2
# [1] 5 6
window(bar, start=20, end=30, extend=TRUE) # error
# Error in attr(y, "tsp") <- c(ystart, yend, xfreq) :
# invalid time series parameters specified
the attr(y, "tsp") <- c(ystart, yend, xfreq) mentioned in the error message happens at the very end of stats:::window.default. The actual values used are
# for foo
y <- 5:6
attr(y, "tsp") <- c(20, 25, 0.2)
# for bar
y <- 5:6
attr(y, "tsp") <- c(20, 30, 0.2)
The 25/30 discrepancy is due to rounding earlier in the code, but why does one work while the other doesn't? It's the frequency. A frequency of less than one seems a bit odd to me. If you set the frequency to 12, for example, you have 12 samples per period or cycle. That might be once a month or every second hour, but every 12th sample should have something in common. If you set the frequency to 1, you effectively have no period: you sample once a year and know of no significant cycles spanning multiple years. What would a frequency of less than 1 mean? I guess a frequency of, say, 0.5 could mean that sampling is done every second year, and 0.2 every fifth year. Maybe that's informative, I don't know.
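But why the error? While both 25-20 and 30-20 divide by 5 just fine, the latter is one element short: a series running from 20 to 30 at frequency 0.2 needs three values (at 20, 25 and 30), and y has only two. Use y <- 5:7 instead and it works just fine.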
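The three cases can be checked side by side with a small wrapper that reports whether the tsp assignment succeeds. set_tsp is a hypothetical helper written for this illustration; the assignment inside it is the same one quoted in the error message.

```r
# Try to attach a tsp attribute to y and report "ok" or the error message.
set_tsp <- function(y, tsp) {
  tryCatch({
    attr(y, "tsp") <- tsp
    "ok"
  }, error = function(e) conditionMessage(e))
}

res_foo  <- set_tsp(5:6, c(20, 25, 0.2))  # two values for 20 and 25
res_bar  <- set_tsp(5:6, c(20, 30, 0.2))  # two values for 20, 25 and 30
res_bar3 <- set_tsp(5:7, c(20, 30, 0.2))  # three values for 20, 25 and 30

print(c(foo = res_foo, bar = res_bar, bar3 = res_bar3))
```

Only the middle case fails, because the number of values does not match the span implied by start, end and frequency.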
Why does it work for foo but not bar? Because rounding.
But why though? Probably because no-one tested the function using time series with frequency of less than one.
A possible fix could be to use zoo class time series, instead of standard ts. It works well in this case, but I haven't done any other tests.
library(zoo)
foo.z <- as.zoo(foo)
bar.z <- as.zoo(bar)
window(foo.z, start = 20, end = 30) <- NA # works fine
window(bar.z, start = 20, end = 30) <- NA # also works fine
I'm new to R and programming in general, and I'm struggling with a for-loop for building the lx function in a life table.
I have the age function x, the death function qx (the probability that someone aged exactly x will die before reaching age x+1), and the surviving function px = 1 - qx.
I want to write a function that returns a vector with all the lx values from first to last age in my table. The function is simple...
I've defined cohort = 1000000. The first age in my table is x = 5, so, considering x = 5...
l_(x) = cohort
And, from now on, l_(x+n) = l_(x+n-1)*p_(x+n-1)
I've searched about for-loops, and I can only get my code working for lx[1] and lx[2], and I get nothing for lx[n] if n > 2.
I wrote that function:
living_x <- function(px, cohort){
  result <- vector("double", length(px))
  l_x <- vector("double", length(px))
  for (i in 1:length(px)){
    if (i == 1){
      l_x[i] = cohort
    }
    else l_x[i] = l_x[i-1]*px[i-1]
    result[i] = l_x
    print(result)
  }
}
When I run it, I get several outputs (more than length(px)) and "There were 50 or more warnings (use warnings() to see the first 50)".
When I run warnings(), I get "In result[i] <- l_x : number of items to replace is not a multiple of replacement length" for every number.
Also, everything I try besides it give me different errors or only calculate lx for lx[1] and lx[2]. I know there's something really wrong with my code, but I still couldn't identify it. I'd be glad if someone could give me a hint to find out what to change.
Thank you!
Here's an approach using dplyr from the tidyverse packages, to use px to calculate lx. This can be done similarly in base R with cumprod(), for example excerpt$lx_calc = 100000 * cumprod(c(1, head(1 - excerpt$qx_given, -1))).
lx is provided in the babynames package, so we can check our work:
library(tidyverse)
library(babynames)
# Get excerpt with age, qx, and lx.
excerpt <- lifetables %>%
filter(year == 2010, sex == "F") %>%
select(x, qx_given = qx, lx_given = lx)
excerpt
# A tibble: 120 x 3
x qx_given lx_given
<dbl> <dbl> <dbl>
1 0 0.00495 100000
2 1 0.00035 99505
3 2 0.00022 99471
4 3 0.00016 99449
5 4 0.00012 99433
6 5 0.00011 99421
7 6 0.00011 99410
8 7 0.0001 99399
9 8 0.0001 99389
10 9 0.00009 99379
# ... with 110 more rows
Using that data to estimate lx_calc:
est_lx <- excerpt %>%
mutate(px = 1 - qx_given,
cuml_px = cumprod(lag(px, default = 1)),
lx_calc = cuml_px * 100000)
And finally, comparing visually the given lx with the one calculated based on px. They match exactly.
est_lx %>%
gather(version, val, c(lx_given, lx_calc)) %>%
ggplot(aes(x, val, color = version)) + geom_line()
I could do it in a very simple way after thinking for some minutes more.
lx = c()
for (i in 2:length(px)){
  lx[1] = 10**6
  lx[i] = lx[i-1]*px[i-1]
}
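For comparison, a sketch of the same recursion written as a function without an explicit loop, assuming px is a numeric vector of survival probabilities ordered by age. The values in px below are made up for illustration.

```r
# l[1] = cohort, l[i] = l[i-1] * px[i-1], expressed as a cumulative product
living_x <- function(px, cohort) {
  cohort * cumprod(c(1, px[-length(px)]))
}

px <- c(0.9, 0.8, 0.5)          # hypothetical survival probabilities
lx_vec <- living_x(px, 1000)
print(lx_vec)                   # 1000 900 720

# Same result with the explicit loop
lx_loop <- numeric(length(px))
lx_loop[1] <- 1000
for (i in 2:length(px)) lx_loop[i] <- lx_loop[i - 1] * px[i - 1]
```

The last element of px is not used, since it describes survival beyond the last age in the table.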
I am trying to construct a Renko chart using the data obtained from Yahoo Finance and was wondering if there is any package to do so. I had a look at most of the financial packages but was only able to find candlestick charts.
For more information on Renko charts use the link given here
Really cool question! Apparently, there is really nothing of that sort available for R. There were some attempts to do similar things (e.g., waterfall charts) on various sites, but they all don't quite hit the spot. Soooo... I made a little weekend project out of it with data.table and ggplot.
rrenko
There are still bugs, instabilities, and visual things that I would love to optimize (and the code is full of commented out debug notes), but the main idea should be there. Open for feedback and points for improvement.
Caveats: There are still cases where the data transformation screws up, especially if the size is very small or very large. This should be fixable in the near future. Also, the renko() function at the moment expects a data frame with two columns: date (x-axis) and close (y-axis).
Installation
devtools::install_github("RomanAbashin/rrenko")
library(rrenko)
Code
renko(df, size = 5, style = "modern") +
scale_y_continuous(breaks = seq(0, 150, 10)) +
labs(x = "", y = "")
renko(df, size = 5, style = "classic") +
scale_y_continuous(breaks = seq(0, 150, 10)) +
labs(x = "", y = "")
Data
set.seed(1702)
df <- data.frame(date = seq.Date(as.Date("2014-05-02"), as.Date("2018-05-04"), by = "week"),
close = abs(100 + cumsum(sample(seq(-4.9, 4.9, 0.1), 210, replace = TRUE))))
> head(df)
date close
1: 2014-05-02 104.0
2: 2014-05-09 108.7
3: 2014-05-16 111.5
4: 2014-05-23 110.3
5: 2014-05-30 108.9
6: 2014-06-06 106.5
I'm an R investment developer. I used some parts of Roman's code to optimize some lines of my Renko code. Roman's ggplot skills are awesome; the plot function was only possible because of Roman's code.
If anyone is interested:
https://github.com/Kinzel/k_rrenko
It will need the packages: xts, ggplot2 and data.table
"Ativo" needs to be an xts object with one of its columns named "close" to work.
EDIT:
After TeeKea's request, here is how to use it:
"Ativo" is a 15-minute EURUSD xts series from 2015-01-01 to 2015-06-01. If the "close" column is not found, the last column is used.
> head(Ativo)
Open High Low Close
2015-01-01 20:00:00 1.20965 1.21022 1.20959 1.21006
2015-01-01 20:15:00 1.21004 1.21004 1.20979 1.21003
2015-01-01 20:30:00 1.21033 1.21041 1.20982 1.21007
2015-01-01 20:45:00 1.21006 1.21007 1.20978 1.21002
2015-01-01 21:00:00 1.21000 1.21002 1.20983 1.21002
2015-01-02 00:00:00 1.21037 1.21063 1.21024 1.21037
How to use krenko_plot:
krenko_plot(Ativo, 0.01,withDates = F)
Link to image krenko_plot
Compared to plot.xts
plot.xts(Ativo, type='candles')
Link to image plot.xts
There are two main variables: size and threshold.
"size" is the size of the bricks. It is required.
"threshold" is the threshold for a new brick. The default is 1.
The first brick is removed to ensure reliability.
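As a rough illustration of what a brick size means, independent of either package above, here is a deliberately simplified sketch: it emits one brick each time the close moves a full size away from the last brick level. It ignores the threshold and any reversal rule, so it is not how rrenko or k_rrenko compute their bricks; renko_bricks is a hypothetical name.

```r
# Return +1 for each up brick and -1 for each down brick, in order.
renko_bricks <- function(close, size) {
  base <- floor(close[1] / size) * size   # level of the last completed brick
  dirs <- integer(0)
  for (p in close) {
    while (p >= base + size) {            # price cleared one more brick upward
      base <- base + size
      dirs <- c(dirs, 1L)
    }
    while (p <= base - size) {            # price cleared one more brick downward
      base <- base - size
      dirs <- c(dirs, -1L)
    }
  }
  dirs
}

bricks <- renko_bricks(c(100, 104, 111, 109, 98), size = 5)
print(bricks)   # 1 1 -1 -1
```

Moves smaller than size (100 to 104, 111 to 109) produce no brick, which is the filtering effect a Renko chart is meant to show.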
Here's a quick and dirty solution, adapted from a python script here.
# Get some test data
library(rvest)
url <- read_html("https://coinmarketcap.com/currencies/bitcoin/historical-data/?start=20170602&end=20181126")
df <- url %>% html_table() %>% as.data.frame()
# Make sure to have your time sequence the right way up
data <- apply(df[nrow(df):1, 3:4], 1, mean)
# Build the renko function
renko <- function(data, delta){
  pre <- data[1]
  xpos <- NULL
  ypos <- NULL
  xneg <- NULL
  yneg <- NULL
  for(i in 1:length(data)){
    # relative change against the previous observation
    increment <- data[i] - pre
    incrementPerc <- increment / pre
    pre <- data[i]
    # record moves larger than delta in either direction
    if(incrementPerc > delta){
      xpos <- c(xpos, i)
      ypos <- c(ypos, data[i])
    }
    if(incrementPerc < -delta){
      xneg <- c(xneg, i)
      yneg <- c(yneg, data[i])
    }
  }
  signal <- list(xpos = xpos,
                 ypos = unname(ypos),
                 xneg = xneg,
                 yneg = unname(yneg))
  return(signal)
}
# Apply the renko function and plot the outcome
signals <- renko(data = data, delta = 0.05)
plot(1:length(data), data, type = "l")
points(signals$xneg, signals$yneg, col = "red", pch = 19)
points(signals$xpos, signals$ypos, col = "yellowgreen", pch = 19)
NOTE: This is not a renko chart (thanks to #Roman). Buy and sell signals are displayed only. See reference mentioned above...
High level question is in the subject title: what can you do to debug linear optimisation when using R lp.
The detailed issue is that I have a working program adapted from: http://pena.lt/y/2014/07/24/mathematically-optimising-fantasy-football-teams/
Based on player data it chooses an optimal 15 man squad - handy for start of year or when you can change all players
I have changed it to:
1) Read player data from an Excel file (which I can supply - just tell me how)
2) Add 2 constraints to show players I definitely want to include in team and those I definitely don't.
Player data has the following columns:
web_name
team_name
type_name
now_cost
total_points
InTeam
In
Out
Good start, so I went on to model the normal weeks, when you can only transfer 1 player. I think I have the right constraint, but now lp chooses about 200 players for me, not 15. Something is very wrong, but I can't see how it gets there.
I have tried going back from my new code to strip out the new feature and it still works.
I have tried removing the In/Out constraints and keeping the new "1 change" constraint. Same result.
I have upgraded the packages and moved to the latest R.
Any pointers?
Code is
#Straight lift from Web - http://pena.lt/y/2014/07/24/mathematically-optimising-fantasy-football-teams/
# plus extra constraints to exclude and include specific players via Excel In/Out columns
# This variant looks to limit changes (typically 1 or 2) for a normal week
library(gdata)
library(lpSolve)
library(stringr)
library(RCurl)
library(jsonlite)
library(plyr)
excelfile<-"C:/Users/mike/Documents/FF/Start2015R.xlsx"
df=read.xls(excelfile)
# Constants
num_teams = 20
num_constraints = 8
# InTeam,In,Out,Cost + 4 positions
#Create the constraints
num_gk = 2
num_def = 5
num_mid = 5
num_fwd = 3
team_size = num_gk + num_def + num_mid + num_fwd
#max_cost = 1000
max_cost = 998
#max_cost = 2000
max_changes = 2
min_same = team_size - max_changes
# Create vectors to constrain by position
df$Goalkeeper = ifelse(df$type_name == "Goalkeeper", 1, 0)
df$Defender = ifelse(df$type_name == "Defender", 1, 0)
df$Midfielder = ifelse(df$type_name == "Midfielder", 1, 0)
df$Forward = ifelse(df$type_name == "Forward", 1, 0)
# Create vector to constrain by max number of players allowed per team
team_constraint = unlist(lapply(unique(df$team_name), function(x, df){
ifelse(df$team_name==x, 1, 0)
}, df=df))
# next we need the constraint directions. First is for MinSame
const_dir <- c(">=","=","=","=", "=", "=", "=", rep("<=", 21))
# The vector to optimize against
objective = df$total_points
# Put the complete matrix together
# nrow is number of constraints
const_mat = matrix(c(df$Inteam,df$In,df$Out,df$Goalkeeper, df$Defender, df$Midfielder, df$Forward,
df$now_cost, team_constraint),
nrow=( num_constraints + length(unique(df$team_name))),
byrow=TRUE)
const_rhs = c(min_same ,sum(df$In),0,num_gk, num_def, num_mid, num_fwd, max_cost, rep(3, num_teams))
# And solve the linear system
x = lp ("max", objective, const_mat, const_dir, const_rhs, all.bin=TRUE, all.int=TRUE)
print(arrange(df[which(x$solution==1),], desc(Goalkeeper), desc(Defender), desc(Midfielder), desc(Forward), desc(total_points)))
print (df[which(x$solution==1),"web_name",drop=FALSE], row.names = FALSE)
# what changed
df[which(x$solution != df$InTeam),"web_name",drop=FALSE]
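One debugging step that may help with code like this is to validate every constraint row before it goes into matrix(), because matrix() recycles or drops values silently when the lengths do not add up. In particular, R column names are case-sensitive and `$` on a data frame returns NULL for a name that does not exist, so a spelling such as df$Inteam versus the column InTeam is worth checking; whether that is the cause here is an assumption, not something confirmed above. The sketch below uses a made-up players table and a hypothetical helper build_const_mat.

```r
# Toy stand-in for the player table (hypothetical data)
players <- data.frame(
  web_name = c("A", "B", "C", "D"),
  InTeam   = c(1, 0, 1, 0),
  now_cost = c(50, 60, 45, 70)
)

# Build the constraint matrix from named rows, failing loudly on a missing
# or mis-sized row instead of letting matrix() reshape the values silently.
build_const_mat <- function(rows, n_vars) {
  for (nm in names(rows)) {
    if (is.null(rows[[nm]])) {
      stop("constraint '", nm, "' is NULL (misspelled column?)")
    }
    if (length(rows[[nm]]) != n_vars) {
      stop("constraint '", nm, "' has length ", length(rows[[nm]]),
           ", expected ", n_vars)
    }
  }
  do.call(rbind, rows)
}

good <- build_const_mat(
  list(in_team = players$InTeam, cost = players$now_cost),
  n_vars = nrow(players)
)
print(dim(good))

# A misspelled column name is caught instead of shifting every later row
bad <- tryCatch(
  build_const_mat(list(in_team = players$Inteam), n_vars = nrow(players)),
  error = function(e) conditionMessage(e)
)
print(bad)
```

Comparing dim(const_mat) with length(const_dir) and length(const_rhs) before calling lp() is a similar cheap check.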
While trying to krige benzene values from
WELL.ID X Y BENZENE
1 MW-02 268.8155 282.83 0.00150
2 IW-06 271.6961 377.01 0.00050
3 IW-07 251.0236 300.41 0.01040
4 IW-08 278.9238 300.37 0.03190
5 MW-10 281.4008 414.15 2.04000
6 MW-12 391.3973 449.40 0.01350
7 MW-13 309.5307 335.55 0.01940
8 MW-15 372.8967 370.04 0.01620
9 MW-17 250.0000 428.04 0.01900
10 MW-24 424.4025 295.69 0.00780
11 MW-28 419.3205 250.00 0.00100
12 MW-29 352.9197 277.27 0.00031
13 MW-31 309.3174 370.92 0.17900
I generate a grid (the property these wells reside on) like so:
setwd("C:/.....")
getwd()
require(geoR)
require(ggplot2)
a <- read.table("krigbenz_loc.csv", sep = ",", header = TRUE)
b <- data.matrix(a)
c <- as.geodata(b, coords.col = 2:3, data.col = 4, )
ggplot(a, aes(x= X, y= Y, colour="green", label=WELL.ID)) + geom_point() + geom_text(aes(label=WELL.ID),hjust=0, vjust=0)
x.range <- as.integer(range(a[,2]))
y.range <- as.integer(range(a[,3]))
x = seq(from=x.range[1], to=x.range[2], by=1)
y = seq(from=y.range[1], to=y.range[2], by=1)
length(x)
length(y)
xv <- rep(x,length(y))
yv <- rep(y, each=length(x))
in_mat <- as.matrix(cbind(xv, yv))
Look at the variogram (not very pretty, but working on it):
### variogram ###
## on geo-object
v1 <- variog(c)
length(v1$n)
v1.summary <- cbind(c(1:11), v1$v, v1$n)
colnames(v1.summary) <- c("lag", "semi-variance", "# of pairs")
v1.summary
plot(v1, type = "b", main = "Variogram: BENZENE at CRAIG BP")
Use ksline to generate the kriged values:
## variance of benzene readings = sd^2
sd <- sd(a$BENZENE)
var = sd^2
fitted_model <- variofit(vario=v1, ini.cov.pars=c(var, .29), cov.model='exp')
q <- ksline(c, cov.model=fitted_model$cov.model, cov.pars=fitted_model$cov.pars,
nugget=fitted_model$nugget, locations=in_mat)
But then, hold the phones: an error when I try to image the results!
> image(q, val = q$predict)
Error in eval(x$call$geodata, envir = attr(x, "parent.env"))$borders :
object of type 'builtin' is not subsettable
This seems to come completely out of left field, as I have gone over it several times. I googled the error, and it seems that I am trying to subset a function; the answer 90% of the time is that the syntax is wrong somewhere, but I have checked everything and cannot figure it out. Any help would be greatly appreciated.
thanks
ZR
This looks like a bad evaluation situation in geoR. I mean, a bug!
If you rename your c object to something else, it works:
ccc =c
q <- ksline(ccc, cov.model=fitted_model$cov.model, cov.pars=fitted_model$cov.pars,
nugget=fitted_model$nugget, locations=in_mat)
image(q) # now works
This would be because image.kriging is trying to get something from the original c object, but it's not evaluating it in the right context, so it gets the base R c function (the word "builtin" in the error was my clue here).
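The clue can be reproduced without geoR: subsetting the base function c with `$` gives the same wording. This only shows where the message comes from, not how geoR ends up looking in the wrong place.

```r
# With no user-defined object named `c` in scope, `c` is the base function
kind <- typeof(c)
print(kind)

# Subsetting it the way image.kriging subsets the geodata object
msg <- tryCatch(c$borders, error = function(e) conditionMessage(e))
print(msg)
```

Any name that is also a base function (c, t, df and so on) can run into this kind of lookup surprise, which is one reason to avoid such names for data objects.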
ksline help also says
The function ‘krige.conv’ should be preferred, unless moving
neighborhood is to be used.
so maybe you should try that - it might not have the same problem! Note that it has a different set of arguments to ksline.