How to conduct a blackjack simulation with replacement - R

How would I write code to simulate 5,000 repetitions of two cards being dealt to a player, where the cards are drawn with replacement? Using the relative frequency of blackjack incidence in the 5,000 repetitions of two cards being drawn (as at a casino, for example), I want an estimate of the probability of attaining a blackjack.
I've tried something like this:
set.seed(5000)
handValue = function(cards) {
  value = sum(cards)
  # Check for an Ace and add 10 if it doesn't bust the hand
  if (any(cards == 1) && value <= 11)
    value = value + 10
  # Check bust (set to 0); check blackjack (set to 21.5)
  if (value > 21)
    0
  else if (value == 21 && length(cards) == 2)
    21.5  # Blackjack
  else
    value
}
But I'm not sure how exactly to simulate drawing with replacement; this code is only a rough idea, so it may be well off the mark.

How to simulate drawing 2 cards without replacement:
sample(c(1:9, rep(10, 4)), 2, replace = F)
Edit: with replacement:
sample(c(1:9, rep(10, 4)), 2, replace = T)
Edit 2: an example with 5 replications:
replicate(5, {sample(c(1:9, rep(10, 4)), 2, replace = T)})
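Putting the pieces together, here is a minimal sketch of the full estimate. It assumes the handValue function from the question and the 13-value deck above, in which 10 appears four times to stand in for the ten and the three face cards:

set.seed(5000)
# Deal 5,000 two-card hands with replacement; each column is one hand
hands <- replicate(5000, sample(c(1:9, rep(10, 4)), 2, replace = TRUE))
# handValue() scores a blackjack as 21.5, so test each hand (column) for that value
isBlackjack <- apply(hands, 2, function(cards) handValue(cards) == 21.5)
# The relative frequency of blackjacks estimates the probability
mean(isBlackjack)

As a sanity check, the exact probability with this 13-value deck is 2 * (1/13) * (4/13) = 8/169, roughly 0.047, so the simulated estimate should land close to that.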

Related

Max Recursion Depth Error With Grid Search Problem

I've written out a potential solution to a LeetCode problem, but I get this error involving maximum recursion depth. I'm really unsure what I'm doing wrong. Here's what I've tried writing:
def orangesRotting(grid):
    R, C = len(grid), len(grid[0])
    seen = set()
    min_time = 0
    def fresh_search(r, c, time):
        if ((r, c, time) in seen or r < 0 or c < 0 or r >= R or c >= C or grid[r][c] == 0):
            return
        elif grid[r][c] == 2:
            seen.add((r, c, 0))
        elif grid[r][c] == 1:
            seen.add((r, c, time + 1))
        fresh_search(r + 1, c, time + 1)
        fresh_search(r - 1, c, time + 1)
        fresh_search(r, c + 1, time + 1)
        fresh_search(r, c - 1, time + 1)
    for i in range(R):
        for j in range(C):
            if grid[i][j] == 2:
                fresh_search(i, j, 0)
    for _, _, t in list(seen):
        min_time = max(min_time, t)
    return min_time
This happens even on a simple input like grid = [[2,1,1], [1,1,0], [0,1,1]]. The offending line always appears to be the if statement:
if ((r,c,time) in seen or r < 0 or c < 0 or r >= R or c >= C or grid[r][c] == 0):
Please note, I'm not looking for help in solving the problem, just understanding why I'm running into this massive recursion issue. For reference, here is the link to the problem. Any help would be appreciated.
So let's trace through what you are doing here. You iterate through the entire grid, and if the value of a cell is 2 you call fresh_search for that cell. We'll start with [0,0].
In fresh_search you then add the cell with time = 0 to your set.
Now fresh_search is called for all neighboring cells, so we'll just look at r+1. For r+1 (called with time = 1), fresh_search adds that cell to your set and then calls fresh_search again on all of its neighbors.
Next we'll just look at r-1, which is our origin. fresh_search is now called on this cell with time = 2, and this tuple isn't in the set yet, because (0,0,0) != (0,0,2); so it passes the guard, re-adds the origin, and again calls fresh_search on the r+1 cell, now with time = 3,
and so on and so forth until the maximum recursion depth is reached. Because time is part of every key stored in seen, each revisit looks up a tuple such as (0,0,2) or (0,0,4) that has never been stored, so the visited-check can never terminate the recursion.

I want to calculate the time difference between two times

I want to calculate the difference of two columns of a dataframe containing times. Since the value in the same column is not always the bigger/later one, I have to do a workaround with an if-clause:
counter = 1
while (counter <= nrow(data)) {
  if (data$time_end[counter] - data$time_begin[counter] < 0) {
    data$chargingDuration[counter] = 1 - abs(data$time_end[counter] - data$time_begin[counter])
  }
  if (data$time_end[counter] - data$time_begin[counter] > 0) {
    data$chargingDuration[counter] = data$time_end[counter] - data$time_begin[counter]
  }
  counter = counter + 1
}
The output I get is a decimal value smaller than 1 (e.g. 0.53322, meaning roughly half a day). However, if I use my console and calculate the time difference manually for a single line, I get my desired result, looking like 02:12:03.
Thanks for the help guys :)
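A hedged sketch of one vectorized way to get hh:mm:ss output, assuming time_begin and time_end are POSIXct and that a negative difference means the interval crossed midnight (both are assumptions about data not shown in the question):

# Assumption: time_begin / time_end are POSIXct; a negative difference
# means charging ran past midnight, so wrap by one day (86400 seconds)
secs <- as.numeric(difftime(data$time_end, data$time_begin, units = "secs"))
secs <- as.integer(round(ifelse(secs < 0, secs + 86400, secs)))
# Format the duration as hh:mm:ss
data$chargingDuration <- sprintf("%02d:%02d:%02d",
                                 secs %/% 3600,
                                 (secs %% 3600) %/% 60,
                                 secs %% 60)

The fraction-of-a-day values in the question suggest the columns may instead hold numeric day fractions; in that case, multiplying the difference by 86400 before formatting achieves the same result.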

Find local minima in noisy data with exact criteria and without for-loop

I have a time series and would like to find minima which fulfill specific criteria. These are: the number of points within a valley (below the red line) between 2 peaks should exceed a certain value, and the number of points above the red line should also exceed a certain value for each peak neighboring the valley. In addition, the lower of the two peaks should be resolved at a value lower than 50% of its height (meaning that the max(intensity) of the lower of the two peaks should be at least 2-fold the lowest intensity within the valley between the two peaks, as calculated below in the code sample). I drew the red line here at a specific height, but in reality those unresolved peaks can have any height and can also be separated by any distance. So what I am doing at the moment is to "scan" with the red line over each point of the time series, which is of course very slow and inefficient.
So here is the for-loop I am using at the moment:
detect_double_peaks <- function(pot.doubleP.v, Min.PpP = 10) {
  peak.dt <-
    data.table(
      breakP = NA,
      breakH = NA,
      resolved = NA
    )
  # "Scan" the red line over every candidate height below 80% of the maximum
  for (point in pot.doubleP.v[pot.doubleP.v > 0 &
                              pot.doubleP.v < 0.8 * max(pot.doubleP.v)]) {
    doublePeak.rle <- S4Vectors::Rle(pot.doubleP.v > point)
    doublePeak.rle.dt <-
      data.table(
        idx = as.numeric(seq.int(length(doublePeak.rle@values))),
        values = doublePeak.rle@values,
        lengths = doublePeak.rle@lengths,
        start = start(doublePeak.rle),
        end = end(doublePeak.rle)
      )
    # Runs above the line that are long enough to count as peaks
    doublePeak.rle.dt_p <-
      doublePeak.rle.dt[values == TRUE & lengths > Min.PpP]
    if (nrow(doublePeak.rle.dt_p) > 1) {
      for (peak in 1:(nrow(doublePeak.rle.dt_p) - 1)) {
        doublePeak.rle.dt_v <- doublePeak.rle.dt[idx > doublePeak.rle.dt_p[peak]$idx &
                                                 idx < doublePeak.rle.dt_p[peak + 1]$idx]
        if (sum(doublePeak.rle.dt_v[values == FALSE]$lengths) >=
            max(max(doublePeak.rle.dt_p[peak]$lengths,
                    doublePeak.rle.dt_p[peak + 1]$lengths) * 0.5, Min.PpP)) {
          dp.p_height_h <-
            max(max(pot.doubleP.v[(doublePeak.rle.dt_p[peak]$start):(doublePeak.rle.dt_p[peak]$end)]),
                max(pot.doubleP.v[(doublePeak.rle.dt_p[peak + 1]$start):(doublePeak.rle.dt_p[peak + 1]$end)]))  # - baseL
          dp.p_height_l <-
            min(max(pot.doubleP.v[(doublePeak.rle.dt_p[peak]$start):(doublePeak.rle.dt_p[peak]$end)]),
                max(pot.doubleP.v[(doublePeak.rle.dt_p[peak + 1]$start):(doublePeak.rle.dt_p[peak + 1]$end)]))  # - baseL
          breakH <-
            min(pot.doubleP.v[min(doublePeak.rle.dt[idx > doublePeak.rle.dt_p[peak]$idx]$start):max(doublePeak.rle.dt[idx < doublePeak.rle.dt_p[peak + 1]$idx]$end)])  # - baseL
          resolved <-
            breakH / dp.p_height_l * 100
          breakP <-
            which.min(pot.doubleP.v[min(doublePeak.rle.dt[idx > doublePeak.rle.dt_p[peak]$idx]$start):max(doublePeak.rle.dt[idx < doublePeak.rle.dt_p[peak + 1]$idx]$end)]) +
            doublePeak.rle.dt_p[peak]$end
          peak.dt <- rbind(peak.dt,
                           data.table(breakP = breakP,
                                      breakH = breakH,
                                      resolved = resolved))
        }
      }
    }
  }
  if (nrow(peak.dt) == 1) {
    return(NULL)
  } else {
    return(na.omit(unique(peak.dt, by = "breakP")))
  }
}
Here are some example data:
testvector <- c(13126.177734375, 12040.060546875, 10810.6171875, 10325.94140625,
13492.8359375, 33648.5703125, 14402.603515625, 29920.12890625,
24316.224609375, 36019.26171875, 34492.4609375, 53799.82421875,
45988.72265625, 47930.453125, 67438.9140625, 61231.83984375,
56710.9140625, 62301.6796875, 54844.7578125, 70913.578125, 81028.1640625,
75234.203125, 59611.05078125, 79240.4375, 52313.3828125, 78758.2734375,
87918.5859375, 80764.7421875, 108035.5390625, 76263.875, 72401.6796875,
83167.640625, 76173.96875, 66241.4296875, 68687.4375, 52107.83984375,
45672.5390625, 51907.33203125, 39967.453125, 58856.90625, 52402.53125,
36980.3125, 43365.76171875, 40480.75, 39057.96484375, 31622.58984375,
23830.455078125, 27393.30078125, 30675.208984375, 27327.48046875,
25150.08984375, 23746.212890625, 9637.625, 19065.58984375, 21367.40625,
6789.0625, 9892.7490234375, 26820.685546875, 19965.353515625,
28281.462890625, 25495.0703125, 28808.416015625, 40244.03125,
35159.421875, 35257.984375, 39971.8046875, 34710.4453125, 60987.73828125,
50620.06640625, 58757.69140625, 52998.97265625, 55601.96484375,
69057.9453125, 58486.52734375, 66115.4765625, 80801.7578125,
77444.6015625, 43545.48828125, 79545.0703125, 50352.484375, 77401.8671875,
85118.421875, 80521.9296875, 68945.8125, 93098.0234375, 83065.8046875,
95970.8203125, 74141.8828125, 90298.75, 81251.0234375, 99658.3359375,
88812.2578125, 81939.4921875, 82632.1015625, 100125.0078125,
71627.84375, 70560.1484375, 77831.765625, 68122.328125, 79049.140625,
88000.890625, 64897.4453125, 57333.3046875, 68185.3046875, 67742.3515625,
58941.85546875, 63184.8671875, 36998.67578125, 45416.58984375,
31547.3359375, 32141.58203125, 35292.9765625, 30511.861328125,
25419.716796875, 23901.431640625, 15616.8759765625, 14469.16015625,
15026.0009765625, 18321.42578125, 15820.861328125, 19532.056640625,
13230.6240234375, 14586.76953125, 14912.642578125, 8541.5224609375,
21740.98046875, 19588.986328125, 18603.662109375, 19656.5625,
10812.94921875, 18379.3359375, 31242.716796875, 25626.0390625,
42446.71875, 27782.294921875, 38450.703125, 39070.97265625, 52914.375,
56484.47265625, 47741.88671875, 52397.18359375, 79378.2109375,
77866.078125, 55902.09765625, 66988.2265625, 63571.01171875,
66192.53125, 79989.8046875, 57204.59765625, 51172.9921875, 49612.16015625,
60508.0390625, 69518.09375, 48079.5625, 48691.0390625, 33679.12890625,
30697.470703125, 31209.359375, 49656.16796875, 32041.912109375,
13851.48828125, 29316.44921875, 31586.216796875, 45422.19921875,
24208.515625, 31496.083984375, 26250.646484375, 14318.302734375
)
For this vector, the minima at positions 56 and 125 should be returned.
Those should be returned because, when scanning with the red line through the points of the vector, there is at least one iteration at which there are more than Min.PpP = 10 consecutive points above the red line on each side of the valley, and with the same red line there are also more than Min.PpP = 10 points in the valley. The reason why point 4 should not be returned as a minimum is that, no matter where we put the red line, the valley will never exceed 3 points (fewer than Min.PpP = 10), and the peak on the left side of that minimum would only be 1 point (also fewer than Min.PpP = 10).
I am aware of functions like pastecs::turnpoints. However, they do not seem to allow implementing criteria the way I want to use them.
So is there any other, more efficient way to achieve that?
P.S.:
I have also put another criterion in the example code which says that there should be at least half as many points in the valley as there are in the peak with the smaller number of points, even when Min.PpP is exceeded:
if(sum(doublePeak.rle.dt_v[values == FALSE]$lengths) >= max(max(doublePeak.rle.dt_p[peak]$lengths, doublePeak.rle.dt_p[peak+1]$lengths) * 0.5, Min.PpP))
However, I guess that's not really important for this problem.
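Not a full answer to the criteria above, but a loop-free starting point (my own sketch, not code from the question): strict local minima of a vector can be located from sign changes of the first difference, and those candidates can then be filtered with the valley/peak rules.

# Strict local minima: the first difference changes sign from - to +
local_minima <- function(x) which(diff(sign(diff(x))) == 2) + 1
# Candidate break points to filter with the double-peak criteria
local_minima(testvector)

Note that flat valley bottoms produce a zero difference and are missed by this test, so plateaus would need separate handling.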

R: Update/generate variables without loop

How can I generate or update variables without using a loop? mutate doesn't work here (at least I don't know how to get it to work for this problem), because I need to calculate values from multiple rows in another data set.
I'm supposed to replicate the regression results of an academic paper, and I'm trying to generate some variables required in the regression. The following is what I need.
I have 2 relevant data sets for this question, subset (containing
geocoded residential property transactions) and sch_relocs (containing the date
of school relocation events as well as their locations)
I need to calculate the distance between each residential property and the nearest (relocated) school
If the closest school is one that relocated to the area near the residential property, the dummy variable new should be 1 (if the school relocated away from the area, then new should be 0)
If the relocated school moved only a small distance, and a house is within the overlapping portion of the respective 2km radii around the school locations, the dummy variable overlap should be 1, otherwise 0
If the distance to the nearest school is <= 2km, the dummy variable in_zone should be 1. If the distance is between 2km and 4km, these transactions are considered controls, and hence in_zone should be 0. If the distance is greater than 4km, I should drop the observations from the data
I have tried to do this using a for loop, but it's taking ages to run (it's still not done after running overnight), so I need a better way. Here's my code (very messy; I think the explanation above is a lot easier to follow if you want to figure out what I'm trying to do):
for (i in 1:as.integer(tally(subset))) {
  # dist to new sch locations
  for (j in 1:as.integer(tally(sch_relocs))) {
    dist = distHaversine(c(subset[i,]$longitude, subset[i,]$latitude),
                         c(sch_relocs[j,]$new_lon, sch_relocs[j,]$new_lat)) / 1000
    if (dist < subset[i,]$min_dist_new) {
      subset[i,]$min_dist_new = dist
      subset[i,]$closest_new_sch = sch_relocs[j,]$school_name
      subset[i,]$date_new_loc = sch_relocs[j,]$date_reloc
    }
  }
  # dist to old sch locations
  for (j in 1:as.integer(tally(sch_relocs))) {
    dist = distHaversine(c(subset[i,]$longitude, subset[i,]$latitude),
                         c(sch_relocs[j,]$old_lon, sch_relocs[j,]$old_lat)) / 1000
    if (dist < subset[i,]$min_dist_old) {
      subset[i,]$min_dist_old = dist
      subset[i,]$closest_old_sch = sch_relocs[j,]$school_name
      subset[i,]$date_old_loc = sch_relocs[j,]$date_reloc
    }
  }
  # generate dummy "new"
  if (subset[i,]$min_dist_new < subset[i,]$min_dist_old) {
    subset[i,]$new = 1
    subset[i,]$date_move = subset[i,]$date_new_loc
  }
  else if (subset[i,]$min_dist_new >= subset[i,]$min_dist_old) {
    subset[i,]$date_move = subset[i,]$date_old_loc
  }
  # find overlaps
  if (subset[i,]$closest_old_sch == subset[i,]$closest_new_sch &
      subset[i,]$min_dist_old <= 2 &
      subset[i,]$min_dist_new <= 2) {
    subset[i,]$overlap = 1
  }
  # find min dist
  subset[i,]$min_dist = min(subset[i,]$min_dist_old, subset[i,]$min_dist_new)
  # zoning
  if (subset[i,]$min_dist <= 2) {
    subset[i,]$in_zone = 1
  }
  else if (subset[i,]$min_dist <= 4) {
    subset[i,]$in_zone = 0
  }
  else {
    subset[i,]$in_zone = 2
  }
}
Here's what the data sets look like (just the relevant variables):
subset data set with desired result (first 2 rows):
sch_relocs data set (full with only relevant columns)
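A hedged sketch of a vectorized alternative, assuming the column names used in the loop above: geosphere::distm computes the whole property-by-school distance matrix in one call, after which every dummy falls out of ordinary vectorized operations (the date_move bookkeeping is omitted for brevity):

library(geosphere)

# All property-to-school distances in km, one matrix per school location set;
# rows = properties, columns = schools
d_new <- distm(subset[, c("longitude", "latitude")],
               sch_relocs[, c("new_lon", "new_lat")],
               fun = distHaversine) / 1000
d_old <- distm(subset[, c("longitude", "latitude")],
               sch_relocs[, c("old_lon", "old_lat")],
               fun = distHaversine) / 1000

# Row-wise index of the nearest new/old school location
j_new <- max.col(-d_new, ties.method = "first")
j_old <- max.col(-d_old, ties.method = "first")
i <- seq_len(nrow(subset))

subset$min_dist_new <- d_new[cbind(i, j_new)]
subset$min_dist_old <- d_old[cbind(i, j_old)]
subset$closest_new_sch <- sch_relocs$school_name[j_new]
subset$closest_old_sch <- sch_relocs$school_name[j_old]

# Dummies, all vectorized
subset$new <- as.integer(subset$min_dist_new < subset$min_dist_old)
subset$overlap <- as.integer(subset$closest_new_sch == subset$closest_old_sch &
                             subset$min_dist_new <= 2 & subset$min_dist_old <= 2)
subset$min_dist <- pmin(subset$min_dist_new, subset$min_dist_old)
subset$in_zone <- ifelse(subset$min_dist <= 2, 1, 0)
subset <- subset[subset$min_dist <= 4, ]  # drop observations beyond 4 km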

Percentage from negative target

I have this set of targets and actuals:
Actual: "-20" / Target: "-10"
Actual: "50" / Target: "-5"
Actual: "-10" / Target: "30"
Target values are anticipated values for each of the 3 categories and actual values are year to date actual values.
For the first category, it was anticipated that there would be -10 sales compared to the previous period. It turned out to be -20 at the end of the current period. The answer could be -100% or -200%. Neither percentage makes sense, since a percentage completed shouldn't be a negative amount. Another reason the percentages are unreasonable is that I cannot perceive the difference between 100% and -100% in this case.
For the 2nd category, it was anticipated that there would be 5 fewer sales in the current period, but it turns out there were actually 50 sales. The answer should be +1100% if we agree that every amount of 5 is 100%.
EDIT: As above, the answer for the third category should be -133%.
I want to see how much of the target is fulfilled. If actual = target, then the answer is 100%, although this doesn't make sense if both the actual and the target are negative amounts.
If I use (actual/target)*100, negative amounts always come out wrong. I need a general formula to calculate the correct answer. I don't mind if the formula has many conditional definitions. How can I do this?
When negative amounts are involved, you should always know exactly what you are looking for.
Example 1:
If you use the absolute value, you should agree that target = 10 and actual = -5 is, and should be, 50%. However, the 'pure' mathematical way to look at it is -50%.
A logical explanation for this is that actual = 0 is, as logic predicts, 0%, and -5 is even worse! Since not only was no progress made but a regression occurred, -50% is an understandable result.
Example 2:
When both are negative, then for target = -10 and actual = -20, since the anchoring point is 0, the 'pure' mathematical result is 200% and is correct (depending on your point of view, of course), since you wanted a decrease of 10 and got a decrease of 20.
Note: if you want to define your desired output differently, do so and we will try to come up with a 'custom' percentage calculation method.
Edit:
Here is what you could try in your case (although I must say I don't agree with this approach):
if target > 0 and actual > 0 (the usual):
  (actual / target) * 100
if target < 0 and actual < 0 (the usual negative):
  if target > actual (actual is worse than expected):
    -(actual / target) * 100
  if target < actual (actual is better than expected):
    (actual / target) * 100
if target > 0 and actual < 0:
  ((actual - target) / target) * 100
which corresponds to target = 50, actual = -100 -> result = -300%
if target < 0 and actual > 0:
  ((abs(target) + actual) / abs(target)) * 100
so that for target = -50 and actual = 100 -> result = 300%
I believe that covers your options and suits your needs.
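For concreteness, here is my own R sketch of those cases (an illustration, not code from the original answer; zero targets and actuals are not covered by the cases above and are left out here too):

percent_fulfilled <- function(target, actual) {
  if (target > 0 && actual > 0) {
    (actual / target) * 100                       # the usual case
  } else if (target < 0 && actual < 0) {
    if (target > actual) {
      -(actual / target) * 100                    # actual worse than expected
    } else {
      (actual / target) * 100                     # actual better than expected
    }
  } else if (target > 0 && actual < 0) {
    ((actual - target) / target) * 100
  } else {
    ((abs(target) + actual) / abs(target)) * 100  # target < 0, actual > 0
  }
}

percent_fulfilled(50, -100)  # -300
percent_fulfilled(-50, 100)  #  300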
Edit:
A good approach to your issue, from my point of view, is to look at absolute values rather than differential values:
Let's say your sales in month A are 200 and you want a 10% increase in month A+1 -> set your target at 220 and compare the actual to it. You can also compare it to month A's actual. Overall, a report would use the absolute values for comparison; those are always positive and can be understood more clearly.
Now this:
target = -10%, actual = +5%, and a base value of 100 from last month
simply becomes this:
target = 90, actual = 105 => an overall performance of 105/90, or (105/90) - 1 higher than expected.
If you want to treat Actual “-50” / Target “50” as 100% fulfilled, you should use the absolute value function in your formula.
| ((actual / target) * 100) |
How you use it in your code depends on the language. In JavaScript, your formula would be like this:
Math.abs((actual / target) * 100)
If this is not how you want your scoring to work, please provide an example of what the score should be when the target or actual is negative.
Based on your edit with more details about what you want, here is some JavaScript that implements that formula:
function percent_difference(target, actual) {
    if (target === 0 || actual === 0) {
        if (actual > target) {
            return 100;
        } else if (actual < target) {
            return -100;
        } else {
            return 0;
        }
    }
    var relative_to = Math.min(Math.abs(actual), Math.abs(target));
    var distance_from_target_to_actual = actual - target;
    var fraction_difference = distance_from_target_to_actual / relative_to;
    return 100 * fraction_difference;
}
I tried to avoid unnecessary if statements to keep the code simple.
The function passes these tests:
function test_percent_difference() {
    console.log("percent_difference(-10, -20)", percent_difference(-10, -20), "should be", -100);
    console.log("percent_difference(-5, 50)", percent_difference(-5, 50), "should be", 1100);
    console.log("percent_difference(30, -10)", percent_difference(30, -10), "should be", -400);
    console.log("percent_difference(15, 0)", percent_difference(15, 0), "should be", -100);
    console.log("percent_difference(0, 0)", percent_difference(0, 0), "should be", 0);
}
You can run it for yourself in your browser in this jsFiddle.
Here is a solution in R.
Assume your data is in sample, with Target and Actual columns:
sample <- structure(list(Actual = c(-20L, 50L, -10L), Target = c(-10L,
  -5L, 30L)), .Names = c("Actual", "Target"), row.names = c(NA,
  -3L), class = "data.frame")

> sample
  Actual Target
1    -20    -10
2     50     -5
3    -10     30
# First I compute the percentage deviation as ((Actual-Target)/Actual)*100.
# Then I apply the following two conditions:
# if Actual < 0 and Target > 0, multiply by -1;
# if Actual < 0 and Target < 0 and abs(Actual) > abs(Target), multiply by -1, else keep the original percentage.
sample$diff <- with(sample, ((Actual - Target) / Actual) * 100)
> sample
  Actual Target diff
1    -20    -10   50
2     50     -5  110
3    -10     30  400
sample$percent <- with(sample, ifelse((Actual < 0 & Target > 0), diff * (-1),
                         ifelse((Actual < 0 & Target < 0),
                                ifelse((abs(Actual) > abs(Target)), diff * (-1), diff),
                                diff)))
> sample
  Actual Target diff percent
1    -20    -10   50     -50
2     50     -5  110     110
3    -10     30  400    -400
# delete the diff column
sample$diff <- NULL
# your final output
> sample
  Actual Target percent
1    -20    -10     -50
2     50     -5     110
3    -10     30    -400
Updated:
To match your answers:
sample$diff <- with(sample, ((Actual - Target) / Target) * 100)
sample$percent <- with(sample, ifelse((Actual < 0 & Target > 0), diff,
                         ifelse((Actual < 0 & Target < 0),
                                ifelse((abs(Actual) > abs(Target)), diff * (-1), diff),
                                diff * (-1))))
> sample
  Actual Target       diff    percent
1    -20    -10   100.0000  -100.0000
2     50     -5 -1100.0000  1100.0000
3    -10     30  -133.3333  -133.3333
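To reuse this without rebuilding the columns each time, the updated formula can be wrapped in a function (my own wrapper around the logic above, not part of the original answer):

percent_of_target <- function(actual, target) {
  diff <- ((actual - target) / target) * 100
  ifelse(actual < 0 & target > 0, diff,
         ifelse(actual < 0 & target < 0,
                ifelse(abs(actual) > abs(target), -diff, diff),
                -diff))
}

percent_of_target(c(-20, 50, -10), c(-10, -5, 30))  # -100.0000 1100.0000 -133.3333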
