Repeat function until index value exceeds threshold in R

I need to apply a function until an index limit (distance, in this case) is met. I'm trying to figure out a way to apply the function repeatedly while avoiding recursion issues.
Example:
I want to apply the following code until total_dist equals flight_distance (2500 km). The distance traveled in a given flight depends on the energy available. The flight proceeds as a series of jumps and stops, which expend and replenish energy, respectively. If there is enough energy at the start, the flight can be finished in only two jumps (with one stop). Sometimes two or more stops are necessary, and I can't know this ahead of time.
So how can I modify jump_metrics to get it to repeat until the total distance covered is 2500?
flight_distance = 2500
flight_markers = 1:flight_distance
TD_cost_km = rnorm(flight_distance, 5, 1)   # energy cost of each km
potential_stops = 1:(flight_distance - 1)
# cumulative cost of flying from each potential stop to the end of the flight
cumulative_flight_cost = vector("list", length = length(potential_stops))
for (i in seq_along(potential_stops)) {
  cumulative_flight_cost[[i]] = cumsum(TD_cost_km[flight_markers > potential_stops[i]])
}
max_fuel_quantiles = seq(0, 1, length = flight_distance)
jump_metrics = function() {
  start_fuel_prob = qbeta(runif(flight_distance, pbeta(max_fuel_quantiles, 1, 1),
                                pbeta(max_fuel_quantiles, 1, 1)), 1.45 * 2, 1)
  start_energy_level_est =
    rnorm(1, sample(1544 + (start_fuel_prob * 7569), 1, replace = TRUE), 0.25)
  # energy at departure, never below the 1544 floor
  start_max_energy = max(start_energy_level_est, 1544)
  # first jump: km flown before fuel falls below 20% of its highest level
  fuel_level = start_max_energy - cumulative_flight_cost[[1]]
  dist_traveled = length(fuel_level[fuel_level > (max(fuel_level) * 0.2)])
  dist_left = flight_distance - dist_traveled
  partial_dist = 1 + dist_traveled
  # max energy by km marker: constant for the first dist_left km,
  # then declining linearly to 1544
  dist_dep_max_energy = c(rep(start_max_energy, dist_left),
                          seq(start_max_energy, 1544,
                              length.out = length((dist_left + 1):flight_distance)))
  # second jump, starting from the first stop
  next_max_energy = dist_dep_max_energy[partial_dist]
  next_fuel_level = next_max_energy - cumulative_flight_cost[[partial_dist]]
  next_dist_traveled = length(next_fuel_level[next_fuel_level >
                                                (max(next_fuel_level) * 0.2)])
  total_dist = next_dist_traveled + partial_dist
  list(partial_dist, next_dist_traveled, total_dist)
}
jump_metrics()
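The repetition does not need recursion. A plain loop is enough: it calls one jump, adds the distance covered, and stops once the running total reaches the flight distance. Below is a minimal sketch of that pattern in Python, under stated assumptions. `fly` and `one_jump` are hypothetical names. `one_jump` stands in for a single jump of jump_metrics: it receives the km marker where the jump starts and returns the km covered.

```python
import random
from typing import Callable, List


def fly(one_jump: Callable[[int], int], flight_distance: int = 2500,
        max_jumps: int = 10_000) -> List[int]:
    """Call one_jump repeatedly until flight_distance km have been covered.

    one_jump(start_km) is a hypothetical stand-in for a single jump of
    jump_metrics: it gets the km marker where the jump starts and returns
    the km covered. Returns the km marker reached after each jump.
    """
    total_dist = 0
    markers: List[int] = []
    while total_dist < flight_distance:
        if len(markers) >= max_jumps:
            raise RuntimeError("max_jumps reached before the flight was finished")
        covered = one_jump(total_dist)
        if covered <= 0:
            raise ValueError("a jump covered no distance, so the loop would never end")
        # the last jump is cut at the destination instead of overshooting it
        total_dist = min(total_dist + covered, flight_distance)
        markers.append(total_dist)
    return markers


# usage with a made-up jump length drawn at random for each jump
rng = random.Random(1)
stops = fly(lambda start_km: rng.randint(300, 1200))
print(stops)
```

The starting point of each jump is carried in total_dist, which plays the role of partial_dist in jump_metrics. In R, the same idea could wrap the body of jump_metrics in while (total_dist < flight_distance) { ... }, so the number of stops no longer has to be known in advance.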

Related

Weighted average cost of short-term stock trading

I'm new to programming and trying to learn Julia. I tried to compute the weighted average cost of short-term stock trading activities, as I did before in R. I rewrote the code in Julia, but unfortunately it returns an incorrect result in the data frame.
I tried to investigate the result of each iteration step by replacing return vwavg with println([volume[i], s, unitprice[i], value[i], t, vwavg[i], u]), and that output is correct. Is it a problem with rounding?
I'd really appreciate your help.
using DataFrames
# create trial dataset
df = DataFrame(qty = [3, 2, 2, -7, 4, 4, -3, -2, 4, 4, -2, -3],
               price = [100.0, 99.0, 101.0, 103.0, 95.0, 93.0, 90.0, 90.0, 93.0, 95.0, 93.0, 92.0])
# create function for weighted average cost of stock price
function vwacost(volume, unitprice)
    value = Vector{Float64}(undef, length(volume))
    vwavg = Vector{Float64}(undef, length(volume))
    for i in 1:length(volume)
        s = 0
        t = 0
        u = 0
        if volume[i] > 0
            value[i] = (volume[i] * unitprice[i]) + t
            volume[i] = volume[i] + s
            vwavg[i] = value[i] / volume[i]
            u = vwavg[i]
            s = volume[i]
            t = value[i]
        else
            volume[i] = volume[i] + s
            value[i] = u * volume[i]
            s = volume[i]
            t = value[i]
            vwavg[i] = u
        end
        return vwavg
    end
end
out = transform(df, [:qty, :price] => vwacost)
Simple error:
for i in 1:length(volume)
    ...
    return vwavg
end
should be:
for i in 1:length(volume)
    ...
end
return vwavg
You are currently returning the result after the first loop iteration. That is why your resulting vwavg vector has only one (the first) calculated entry, with all other entries being zero or whatever was in memory when you created the vwavg vector in the first place.
OK, the second problem is that the original df changes, which gives incorrect results on later runs. vwacost writes into volume in place, so the qty column passed to it is modified. This can be solved with copy(df):
out = select(copy(df), [:qty, :price] => vwacost => :avgcost)
This way the original df does not change and the result stays consistent across runs.
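For comparison, here is a minimal Python sketch of the same running weighted-average-cost logic: buys update the average, and sells keep it while reducing the position. It illustrates two points that also apply to the Julia version.

First, the running state (the question's s, t and u) is initialised once, before the loop. The question's code resets it to 0 at the top of every iteration, which discards the running totals even after the early return is fixed.

Second, the inputs are not modified, so no copy is needed.

The function name is illustrative, and the sketch assumes the position never goes short.

```python
from typing import List, Sequence


def weighted_avg_cost(qty: Sequence[float], price: Sequence[float]) -> List[float]:
    """Running weighted average cost per row; assumes the position never goes short."""
    position = 0.0    # shares currently held (s in the question)
    book_value = 0.0  # cost basis of the shares held (t in the question)
    avg = 0.0         # current average cost (u in the question)
    out: List[float] = []
    for q, p in zip(qty, price):
        if q > 0:
            # a buy adds to the cost basis and moves the average
            book_value += q * p
            position += q
            avg = book_value / position
        else:
            # a sale reduces the position at the current average cost
            position += q
            book_value = avg * position
        out.append(avg)
    return out


qty = [3, 2, 2, -7, 4, 4, -3, -2, 4, 4, -2, -3]
price = [100.0, 99.0, 101.0, 103.0, 95.0, 93.0, 90.0, 90.0, 93.0, 95.0, 93.0, 92.0]
costs = weighted_avg_cost(qty, price)
print([round(c, 2) for c in costs])
# [100.0, 99.6, 100.0, 100.0, 95.0, 94.0, 94.0, 94.0, 93.43, 94.0, 94.0, 94.0]
```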

Improving the speed

Here is my code in Julia. I would like to improve its speed, since it is slow for large datasets. I provided a small example so the code can be executed and produce results. I think the bottleneck is the findall calls in the loop, which make the code very slow, but I don't know how to replace them with something faster.
A = [[1,2,3,4,5], [2,3,4,5,6,7,8], [4,7,8,9], [9,10], [2,3,4,5]]
mx = maximum(maximum, A)    # largest node index over all lists
idx_new = zeros(Int, mx)
flag = ones(Int, mx);
Hscore = rand(1, length(A))
thresh = 0.2 * sum(Hscore)
acc_q = 0
val = 0
pos = sortperm(vec(Hscore))
iter = 1
while acc_q < thresh
    acc_q = acc_q + Hscore[pos[iter]]
    nd = A[pos[iter]]
    fd_flag = flag[nd]
    cc = in.(fd_flag, 2)
    node = nd[findall(x->x==0, cc)]
    dd = nd[findall(x->x!=0, cc)]
    TF = isempty(dd)
    if TF == true
        q_val = Hscore[pos[iter]]
        acc_q = acc_q + q_val
        idx_new[vec(node)] .= (val + 1)
        flag[node] .= 2
        val = val + 1;
        iter = iter + 1
    end # end of if TF
end ## end of while loop
While "please improve my code" is not the right kind of question for StackOverflow, in general, when you search many times for an element among many options, these are the first two approaches to consider:
Sort the list of elements (with sort!) and use searchsorted to find the desired element.
Use Set(mylist) to create a hash set and then search within the set.
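A small Python sketch of the two options listed above follows; the Julia counterparts are sort!/searchsorted and Set/in.

- A sorted list searched with bisect costs O(log n) per lookup, after one O(n log n) sort.
- A hash set gives average O(1) lookups.

The data and helper names are illustrative.

```python
import bisect
from typing import List, Set


def contains_sorted(sorted_items: List[int], x: int) -> bool:
    """Binary search in an already sorted list (the idea behind Julia's searchsorted)."""
    i = bisect.bisect_left(sorted_items, x)
    return i < len(sorted_items) and sorted_items[i] == x


def contains_hashed(item_set: Set[int], x: int) -> bool:
    """Average O(1) membership test in a hash set (like `x in Set(mylist)` in Julia)."""
    return x in item_set


items = [9, 2, 7, 4, 10, 3]
sorted_items = sorted(items)  # sort once, outside any loop
item_set = set(items)         # build once, outside any loop

queries = [4, 5, 10]
print([contains_sorted(sorted_items, q) for q in queries])  # [True, False, True]
print([contains_hashed(item_set, q) for q in queries])      # [True, False, True]
```

The important part is building the sorted list or the set once, outside the loop. Rebuilding it on every iteration would cost more than the linear scan it is meant to replace.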

(R) Error in optim - attempt to apply non-function, when function is defined

Not sure what I'm doing wrong here. I'm trying to get a cross-validation score for a mixture-of-two-gammas model.
llikGammaMix2 = function(param, x) {
if (any(param < 0) || param["p1"] > 1) {
return(-Inf)
} else {
return(sum(log(
dgamma(x, shape = param["k1"], scale = param["theta1"]) *
param["p1"] + dgamma(x, shape = param["k2"], scale = param["theta2"]) *
1
(1 - param["p1"])
)))
}
}
initialParams = list(
theta1 = 1,
k1 = 1.1,
p1 = 0.5,
theta2 = 10,
k2 = 2
)
for (i in 1:nrow(cichlids)) {
SWS1_training <- cichlids$SWS1 - cichlids$SWS1[i]
SWS1_test <- cichlids$SWS1[i]
MLE_training2 <-
optim(
par = initialParams,
fn = llikGammaMix2,
x = SWS1_training,
control = list(fnscale = -1)
)$par
LL_test2 <-
optim(
par = MLE_training2,
fn = llikGammaMix2,
x = SWS1_test,
control = list(fnscale = -1)
)$value
}
print(LL_test2)
This runs until it gets to the first optim(), then spits out Error in fn(par, ...) : attempt to apply non-function.
My first thought was a silly spelling error somewhere, but that doesn't seem to be the case. Any help is appreciated.
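I believe the issue is in the return statement. It's unclear whether you meant to multiply or add the last quantity, (1 - param["p1"]), to the rest of the expression. Since this is a mixture, I'm guessing you meant to multiply it. As written, there is no operator in front of it. R therefore reads the value before it, followed by (1 - param["p1"]), as a function call and tries to call a number. That is exactly the "attempt to apply non-function" error: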
return(sum(log(dgamma(x, shape = param["k1"], scale = param["theta1"]) *
param["p1"] +
dgamma(x, shape = param["k2"], scale = param["theta2"]) *
(1 - param["p1"])))) ## ISSUE HERE: Is this what you meant?
There could be other issues with the code. I would double-check that the function you are optimizing is what you think it ought to be. It's also hard to tell without a reproducible example we can run. Try to clear up the above issue and let us know if there are still problems.
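To check that the objective is what you expect, it can help to write the mixture log-likelihood out explicitly. Below is a Python sketch using only the standard library. The gamma density with shape k and scale θ is x^(k-1) e^(-x/θ) / (Γ(k) θ^k), and the parameter names mirror the question's. The validity check requires strictly positive shapes and scales and 0 <= p1 <= 1, which is slightly stricter than the question's guard.

The sketch also shows a leave-one-out split. If the intent is leave-one-out cross-validation, the training set should drop observation i (in R, cichlids$SWS1[-i]). By contrast, cichlids$SWS1 - cichlids$SWS1[i] subtracts the i-th value from every observation. The held-out point is then typically scored at the training estimates rather than re-optimised.

The fitting step is omitted here: `params` holds the question's starting values, not fitted estimates, and the data are made up.

```python
import math
from typing import Dict, List, Sequence, Tuple


def gamma_logpdf(x: float, shape: float, scale: float) -> float:
    """Log density of Gamma(shape, scale) at x > 0."""
    return ((shape - 1) * math.log(x) - x / scale
            - math.lgamma(shape) - shape * math.log(scale))


def llik_gamma_mix2(param: Dict[str, float], x: Sequence[float]) -> float:
    """Log-likelihood of p1 * Gamma(k1, theta1) + (1 - p1) * Gamma(k2, theta2).

    Returns -inf outside the valid region (shapes and scales > 0, 0 <= p1 <= 1).
    All observations must be > 0.
    """
    k1, theta1 = param["k1"], param["theta1"]
    k2, theta2 = param["k2"], param["theta2"]
    p1 = param["p1"]
    if min(k1, theta1, k2, theta2) <= 0 or not 0 <= p1 <= 1:
        return -math.inf
    total = 0.0
    for xi in x:
        # mixture density: both components, each weighted by its mixing proportion
        dens = (p1 * math.exp(gamma_logpdf(xi, k1, theta1))
                + (1 - p1) * math.exp(gamma_logpdf(xi, k2, theta2)))
        if dens <= 0:  # both components underflowed
            return -math.inf
        total += math.log(dens)
    return total


def leave_one_out(x: Sequence[float], i: int) -> Tuple[List[float], float]:
    """Return (every observation except x[i], x[i])."""
    return list(x[:i]) + list(x[i + 1:]), x[i]


params = {"theta1": 1.0, "k1": 1.1, "p1": 0.5, "theta2": 10.0, "k2": 2.0}
data = [0.8, 2.5, 4.0, 12.0, 20.0]
train, test = leave_one_out(data, 1)
# ... fit params on `train` (not shown), then score the held-out point:
held_out_ll = llik_gamma_mix2(params, [test])
print(held_out_ll)
```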

Calculate RSI indicator according to tradingview?

I would like to calculate RSI 14 in line with the TradingView chart.
According to their wiki, this should be the solution:
https://www.tradingview.com/wiki/Talk:Relative_Strength_Index_(RSI)
I implemented this in an object called RSI:
Calling within object RSI:
self.df['rsi1'] = self.calculate_RSI_method_1(self.df, period=self.period)
Implementation of the code the calculation:
def calculate_RSI_method_1(self, ohlc: pd.DataFrame, period: int = 14) -> pd.Series:
    delta = ohlc["close"].diff()
    ohlc['up'] = delta.copy()
    ohlc['down'] = delta.copy()
    ohlc['up'] = pd.to_numeric(ohlc['up'])
    ohlc['down'] = pd.to_numeric(ohlc['down'])
    # .loc avoids chained assignment, which may silently not write back
    ohlc.loc[ohlc['up'] < 0, 'up'] = 0
    ohlc.loc[ohlc['down'] > 0, 'down'] = 0
    # This one below is not correct, but why?
    ohlc['_gain'] = ohlc['up'].ewm(com=(period - 1), min_periods=period).mean()
    ohlc['_loss'] = ohlc['down'].abs().ewm(com=(period - 1), min_periods=period).mean()
    ohlc['RS`'] = ohlc['_gain'] / ohlc['_loss']
    ohlc['rsi'] = pd.Series(100 - (100 / (1 + ohlc['RS`'])))
    self.currentvalue = round(self.df['rsi'].iloc[-1], 8)
    print(self.currentvalue)
    self.exportspreadsheetfordebugging(ohlc, 'calculate_RSI_method_1', self.symbol)
    # the caller assigns the result to self.df['rsi1'], so return the series
    return ohlc['rsi']
I tested several other solutions, e.g. the ones below, but none of them returns a correct value:
https://github.com/peerchemist/finta
https://gist.github.com/jmoz/1f93b264650376131ed65875782df386
Therefore I created a unit test based on:
https://school.stockcharts.com/doku.php?id=technical_indicators:relative_strength_index_rsi
I created an input file (see Excel image below)
and an output file (see Excel image below).
Running the unit test (unit test code not included here) should produce the following, although it only checks the last value:
if result == 37.77295211:
    log.info("Unit test 001 - PASSED")
    return True
else:
    log.error("Unit test 001 - NOT PASSED")
    return False
But again, I cannot pass the test.
I checked all values with the help of Excel.
So now I'm a little lost.
I also tried following this question:
Calculate RSI indicator from pandas DataFrame?
But this does not give any value for the gain.
a) How should the calculation be done in order to pass the unit test?
b) How should the calculation be done in order to align with TradingView?
Here is a Python implementation of the current RSI indicator version in TradingView:
https://github.com/lukaszbinden/rsi_tradingview/blob/main/rsi.py
I had the same issue when calculating RSI: my result was different from TradingView's.
I found the RSI "Step 2" formula described on Investopedia and changed the code as below:
N = 14
close_price0 = float(klines[0][4])
gain_avg0 = loss_avg0 = close_price0
for kline in klines[1:]:
    close_price = float(kline[4])
    if close_price > close_price0:
        gain = close_price - close_price0
        loss = 0
    else:
        gain = 0
        loss = close_price0 - close_price
    close_price0 = close_price
    gain_avg = (gain_avg0 * (N - 1) + gain) / N
    loss_avg = (loss_avg0 * (N - 1) + loss) / N
    rsi = 100 - 100 / (1 + gain_avg / loss_avg)
    gain_avg0 = gain_avg
    loss_avg0 = loss_avg
N is the number of periods used to calculate the RSI (14 by default).
The code runs in a loop, so it calculates an RSI value for every bar in the series.
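For reference, here is a sketch of Wilder's smoothing, seeded the way the StockCharts article linked in the question describes it:

- The first average gain and average loss are simple averages over the first N price changes.
- Every later value is (previous average × (N − 1) + current value) / N.

This is the same recurrence as the loop above; only the starting values differ. The loop above starts both averages at the first close price, so its early values are off until that seed decays.

The sketch works on a plain list of closes rather than klines. Its handling of a window with no losses (100) or no movement at all (50) is a convention chosen here. With a short history, the first values may still differ from a chart that has more bars loaded.

```python
from typing import List, Optional, Sequence


def _rsi_from_averages(avg_gain: float, avg_loss: float) -> float:
    """Convert average gain/loss into an RSI value in [0, 100]."""
    if avg_loss == 0:
        # no losses in the window: RSI is 100; a completely flat window is
        # reported as 50 here (a convention chosen for this sketch)
        return 100.0 if avg_gain > 0 else 50.0
    return 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)


def wilder_rsi(closes: Sequence[float], n: int = 14) -> List[Optional[float]]:
    """RSI with Wilder smoothing; the first n entries are None (not enough data)."""
    if n < 1:
        raise ValueError("n must be positive")
    rsi: List[Optional[float]] = [None] * len(closes)
    if len(closes) <= n:
        return rsi
    changes = [b - a for a, b in zip(closes, closes[1:])]
    gains = [max(c, 0.0) for c in changes]
    losses = [max(-c, 0.0) for c in changes]

    # seed: simple averages of the first n changes (they end at close n)
    avg_gain = sum(gains[:n]) / n
    avg_loss = sum(losses[:n]) / n
    rsi[n] = _rsi_from_averages(avg_gain, avg_loss)

    # Wilder smoothing for every later change; change i ends at close i + 1
    for i in range(n, len(changes)):
        avg_gain = (avg_gain * (n - 1) + gains[i]) / n
        avg_loss = (avg_loss * (n - 1) + losses[i]) / n
        rsi[i + 1] = _rsi_from_averages(avg_gain, avg_loss)
    return rsi


values = wilder_rsi([1, 2, 1, 2, 1], n=2)
print([None if v is None else round(v, 6) for v in values])
# [None, None, 50.0, 75.0, 37.5]
```

With the default n=14, wilder_rsi(closes)[-1] is the value to compare against the last value checked in the unit test.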
For those who are experiencing the same problem:
My raw data contained ticks where the volume is zero. Filtering out these OHLCV rows directly gave the correct results.

SI model in RHadoop

I want to measure the diffusion of information on my graph using the SI model. I define a set of initially infected nodes. I based my code on this one: Susceptible-Infected model for network diffusion, and adapted it to my case. But when I run my code on a graph of 5000 nodes, it runs for hours. Here is my code:
get_infected1 = function(g, transmission_rate, diffusers){
  infected = list()
  Susceptible <- setdiff(V(g)$name, diffusers)
  # number of successful infection trials out of freq tosses
  toss = function(freq) {
    tossing = NULL
    coins = c(1, 0)
    probabilities = c(transmission_rate, 1 - transmission_rate)
    for (i in 1:freq) tossing[i] = sample(coins, 1, replace = TRUE, prob = probabilities)
    tossing = sum(tossing)
    return(tossing)
  }
  infected[[1]] = diffusers
  update_diffusers = function(diffusers){
    nearest_neighbors <- data.frame()
    for (i in 1:length(diffusers)){
      L <- as.character(diffusers[i])
      Nei1 <- unique(neighbors(g, (V(g)$name == L), 1))
      Nei1 <- intersect(Susceptible, Nei1)
      nearest_neighbors1 = data.frame(table(unlist(Nei1)))
      nearest_neighbors = unique(rbind(nearest_neighbors, nearest_neighbors1))
    }
    nearest_neighbors = subset(nearest_neighbors, !(nearest_neighbors[, 1] %in% diffusers))
    keep = unlist(lapply(nearest_neighbors[, 2], toss))
    new = as.numeric(as.character(nearest_neighbors[, 1][keep >= 1]))
    # map vertex ids to names (vectorised; also works when nothing new is infected)
    new_infected = V(g)$name[new]
    diffusers = unique(c(diffusers, new_infected))
    return(diffusers)
  }
  # get infected nodes
  total_time = 1
  node_number = vcount(g)
  while (length(Susceptible) > 0){
    infected[[total_time + 1]] = sort(update_diffusers(infected[[total_time]]))
    Susceptible <- setdiff(Susceptible, infected[[total_time + 1]])
    total_time = total_time + 1
  }
  # return the infected nodes list
  return(infected)
}
Each initially infected node infects its neighbors with some probability, so the output is the list of infected nodes at each step.
I want to adapt this code to run on an RHadoop system, but I am new to RHadoop. I don't know where exactly I should modify it, or how to load my graph into Hadoop. Any suggestions?
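The diffusion step described above can be sketched without data frames. Keep the infected nodes in a set; at each step, give every susceptible neighbour of every infected node an independent chance transmission_rate of becoming infected. Below is a minimal single-machine Python sketch of that standard SI step, under stated assumptions:

- The graph is an undirected adjacency dict, standing in for the igraph object.
- The names are illustrative.
- It does not address the RHadoop part of the question.

Sets keep the neighbour and membership checks cheap. The loop stops when no susceptible node touches an infected one, so it also ends when some nodes can never be reached. The while(length(Susceptible) > 0) loop above keeps running in that case.

```python
import random
from typing import Dict, Hashable, Iterable, List, Set


def si_spread(adj: Dict[Hashable, Set[Hashable]], seeds: Iterable[Hashable],
              transmission_rate: float, rng: random.Random,
              max_steps: int = 1000) -> List[Set[Hashable]]:
    """Infected set after each step of a discrete-time SI process.

    adj maps each node to the set of its neighbours (undirected graph assumed).
    history[0] is the seed set; the loop stops when no susceptible node touches
    an infected one, or after max_steps steps.
    """
    infected = set(seeds)
    history = [set(infected)]
    for _ in range(max_steps):
        newly_infected = set()
        has_contact = False
        for node in infected:
            for nb in adj.get(node, ()):
                if nb in infected:
                    continue
                has_contact = True
                # one independent infection trial per infected-susceptible contact
                if rng.random() < transmission_rate:
                    newly_infected.add(nb)
        if not has_contact:
            break  # nothing left that can be reached
        infected |= newly_infected
        history.append(set(infected))
    return history


# usage: a 4-node path 0-1-2-3 with node 0 as the only initial diffuser
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
steps = si_spread(path, {0}, transmission_rate=0.5, rng=random.Random(42))
print(len(steps), steps[-1])
```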
