Draw a plot in R as the function varies - r

I have this function named prob.
prob<-function(k){
start=1
for(i in 1:k-1){
cumm=start*(1-i/365)
start=cumm
}
return(start)
}
Then, I created this function, opp.
opp<-function(a){
1-prob(a)
}
Now, I want to plot opp from 1 to 25.
For example, I want to see opp(1), opp(2), opp(3), opp(4)...., opp(25)
I have tried
plot(opp(a=x),from=1,to=25)
or
plot(opp,from=1,to=25)
or
plot(1:25,opp[1:25])
...
none of these work...
so frustrated... please help!

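I would vectorize the opp function before plotting: prob loops up to a single k, so opp can only handle one value at a time, while plot needs a whole vector of values.
For a quick fix, you can just wrap it with Vectorize(opp):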
prob<-function(k){
start=1
for(i in 0:(k-1)){ # same values as the original 1:k-1, which R parses as (1:k)-1
cumm=start*(1-i/365)
start=cumm
}
return(start)
}
opp<-function(a){
1-prob(a)
}
vopp <- Vectorize(opp)
vopp(1:25)
# [1] 0.000000000 0.002739726 0.008204166 0.016355912 0.027135574 0.040462484
# [7] 0.056235703 0.074335292 0.094623834 0.116948178 0.141141378 0.167024789
# [13] 0.194410275 0.223102512 0.252901320 0.283604005 0.315007665 0.346911418
# [19] 0.379118526 0.411438384 0.443688335 0.475695308 0.507297234 0.538344258
# [25] 0.568699704
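Alternatively, a more idiomatic R approach uses cumprod. Note that it relies on being called with the full sequence 1:n, since each cumulative product needs all the earlier terms: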
opp2 <- function(k) 1 - cumprod(1 - (k - 1) / 365)
opp2(1:25)
# [1] 0.000000000 0.002739726 0.008204166 0.016355912 0.027135574 0.040462484
# [7] 0.056235703 0.074335292 0.094623834 0.116948178 0.141141378 0.167024789
# [13] 0.194410275 0.223102512 0.252901320 0.283604005 0.315007665 0.346911418
# [19] 0.379118526 0.411438384 0.443688335 0.475695308 0.507297234 0.538344258
# [25] 0.568699704
Both give the same plot:
plot(opp2(1:25))
plot(vopp(1:25))
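If you prefer not to wrap the function, a minimal sketch of the same plot calls opp once per value with sapply. The names ks and vals are just for illustration, and the loop is written as 0:(k - 1), which is what 1:k-1 evaluates to:

```r
# Probability that k people all have different birthdays
prob <- function(k) {
  p <- 1
  for (i in 0:(k - 1)) p <- p * (1 - i / 365)
  p
}
# Probability that at least two of k people share a birthday
opp <- function(a) 1 - prob(a)

ks <- 1:25
vals <- sapply(ks, opp)  # one call to opp per element of ks
plot(ks, vals, type = "b", xlab = "k", ylab = "opp(k)")
```

Passing both ks and vals to plot keeps the x axis labelled with the actual k values rather than the vector index.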

How to add new value to existing dataset so that only the range changes but mean remains the same in R?

Hi, I'm a student studying statistics. My textbook covers the basic calculations but not much R coding, so I would like to ask: is there a way in R to add numbers to an existing generated set so that it keeps a specific mean and reaches a specific range?
1(a) Apply R to simulate a set of 100 numbers, with mean value of 20 and standard deviation of 2. List out the set of numbers.
> x <- rnorm(100,20,2)
> print(x)
[1] 20.59256 20.66069 12.68841 21.13575 24.09587 21.69535 20.18661 21.71236 20.92864 19.63182 22.12583 19.06238
[13] 18.73813 22.59813 17.30012 16.98957 20.74050 21.28319 19.75426 20.62065 20.20814 18.16406 22.24261 22.05673
[25] 21.27086 18.78538 21.86479 18.03242 21.00538 20.27731 22.59440 23.24389 20.20846 19.73281 19.50040 20.51712
[37] 20.16493 23.56715 21.25884 18.37542 19.84470 19.81911 16.94701 19.06637 17.74580 18.03151 19.57144 16.45314
[49] 20.89975 21.86249 17.42996 23.52514 21.17759 20.20160 18.11839 21.69716 16.93685 20.62335 20.37935 22.46131
[61] 17.78489 19.90424 17.67674 20.20571 21.60567 20.41897 20.25134 22.44366 19.06513 20.62692 24.04101 24.03634
[73] 20.15566 20.33157 20.22881 20.54014 19.49401 17.34388 19.94099 18.71450 19.24386 19.91813 18.71863 20.94027
[85] 17.55676 17.18079 24.96868 24.09565 19.87488 20.06114 19.21374 18.39874 21.01435 18.38329 20.91788 21.45158
[97] 20.43168 21.80438 20.50405 23.07149
(b) Add another 2 numbers to the set simulated in Question 1(a), such that the new set now has (same) mean of 20, but range becomes 200. List out the set of numbers.
First create reproducible data:
set.seed(42)
x <- rnorm(100,20,2)
mean(x)
# [1] 20.06503
range(x)
# [1] 14.01382 24.57329
(x2 <- mean(x) + c(-100, 100))
# [1] -79.93497 120.06503
To keep the mean the same we need to add points 100 above the mean and 100 below the mean. Fortunately these points lie beyond the original range.
mean(c(x, x2))
# [1] 20.06503
diff(range(c(x, x2)))
# [1] 200
The mean is the same and the range is now 200.
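The same idea can be wrapped in a small function. This is only a sketch: extend_range is a hypothetical helper name, and it assumes the two new points must fall outside the existing data, otherwise they would not define the range.

```r
# Hypothetical helper: append two values placed symmetrically about the mean
extend_range <- function(x, target_range) {
  m <- mean(x)
  half <- target_range / 2
  # the new points only set the range if they lie outside the current data
  stopifnot(m - half <= min(x), m + half >= max(x))
  c(x, m - half, m + half)
}

set.seed(42)
x <- rnorm(100, 20, 2)
y <- extend_range(x, 200)
mean(y) - mean(x)   # zero up to floating-point error
diff(range(y))      # 200
```

Because the two added values are equally far above and below the mean, their deviations cancel and the mean is unchanged.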
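As you need a range of 200, each added value should lie desired_range/2 = 100 away from a central value, one above and one below. Note that the transcript below offsets from the current range width (about 12.26) rather than from the mean, so the range becomes 200 but the mean shifts from 19.99774 to 19.846; offsetting from mean(x) instead keeps the mean unchanged.
Solution in code: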
> x <- rnorm(100,20,2)
>
> x
[1] 17.84671 19.02797 23.83426 21.28975 20.35738 19.35365 22.57753 15.09991 18.18989 21.61537 20.97786 20.74412 20.95964
[14] 20.00677 13.79552 16.65435 23.48840 19.50842 25.10979 21.10134 19.15891 22.58312 23.65634 17.89358 17.98529 22.33547
[27] 20.84291 21.28044 22.37447 16.89740 19.95510 17.67625 19.64634 18.07762 21.50655 18.62182 18.59671 15.53542 12.85074
[40] 19.06638 19.90743 18.64610 20.71322 22.78706 22.33449 22.30899 17.09384 21.57055 19.88208 18.85795 18.52198 23.70028
[53] 22.91794 20.24993 20.63627 19.01672 19.34706 17.42375 21.88536 20.91214 21.16099 23.54738 21.40821 21.06485 23.95725
[66] 21.09893 16.15641 21.28983 19.27113 17.89774 23.24801 23.23136 22.67976 23.21619 20.17257 21.09512 16.83565 22.17975
[79] 20.50282 23.86079 14.97483 16.91109 18.66540 21.79649 21.01789 18.81188 19.77038 25.04698 17.69211 20.04085 17.29910
[92] 18.98335 16.37297 19.78979 18.83341 16.60093 19.41327 17.85721 22.55003 16.67850
>
> mean(x)
[1] 19.99774
>
> sd(x)
[1] 2.494173
>
> range <- range(x)[2]-range(x)[1]
>
> range
[1] 12.25905
>
> x <- c(x,range+100,range-100)
>
> mean(x)
[1] 19.846
>
> sd(x)
[1] 14.3276
>
> range <- range(x)[2]-range(x)[1]
>
> range
[1] 200
>

How to output in R all possible deviations of a word for a fixed distance value?

I have a word and want to output in R all possible deviations (deletion, substitution, insertion) for a fixed distance value into a vector.
For instance, the word "Cat" and a fixed distance value of 1 results in a vector with the elements "cot", "at", ...
I'm going to assume that you want all actual words, not just permutations of the characters with an edit distance of 1 that would include non-words such as "zat".
We can do this using adist() to compute the edit distance between your target word and all eligible English words, taken from some word list. Here, I used the English syllable dictionary from the quanteda package (you did tag this question as quanteda after all) but this could have been any vector of English dictionary words from any other source as well.
To narrow things down, we first exclude all words whose length differs from the target word's by more than your distance value.
distfn <- function(word, distance = 1) {
# select eligible words for efficiency
eligible_y_words <- names(quanteda::data_int_syllables)
wordlengths <- nchar(eligible_y_words)
eligible_y_words <- eligible_y_words[wordlengths >= (nchar(word) - distance) &
wordlengths <= (nchar(word) + distance)]
# compute Levenshtein distance
distances <- utils::adist(word, eligible_y_words)[1, ]
# return only those for the requested distance value
eligible_y_words[distances == distance]
}
distfn("cat", 1)
## [1] "at" "bat" "ca" "cab" "cac" "cad" "cai" "cal" "cam" "can"
## [11] "cant" "cao" "cap" "caq" "car" "cart" "cas" "cast" "cate" "cato"
## [21] "cats" "catt" "cau" "caw" "cay" "chat" "coat" "cot" "ct" "cut"
## [31] "dat" "eat" "fat" "gat" "hat" "kat" "lat" "mat" "nat" "oat"
## [41] "pat" "rat" "sat" "scat" "tat" "vat" "wat"
This also works on longer words and with other distance values:
distfn("coffee", 1)
## [1] "caffee" "coffeen" "coffees" "coffel" "coffer" "coffey" "cuffee"
## [8] "toffee"
distfn("coffee", 2)
## [1] "caffey" "calfee" "chafee" "chaffee" "cofer" "coffee's"
## [7] "coffelt" "coffers" "coffin" "cofide" "cohee" "coiffe"
## [13] "coiffed" "colee" "colfer" "combee" "comfed" "confer"
## [19] "conlee" "coppee" "cottee" "coulee" "coutee" "cuffe"
## [25] "cuffed" "diffee" "duffee" "hoffer" "jaffee" "joffe"
## [31] "mcaffee" "moffet" "noffke" "offen" "offer" "roffe"
## [37] "scoffed" "soffel" "soffer" "yoffie"
(Yes, according to the CMU pronunciation dictionary, those are all actual words...)
EDIT: Extended to all strings of letters, not just actual words
This involves generating strings over the alphabet that have the fixed edit distance from the input word. Here I've done it not particularly efficiently: form all strings of letters within the eligible length range, compute their edit distance from the target word, and select the matches. So it's a variation of the above, except that instead of a dictionary it uses generated strings.
distfn2 <- function(word, distance = 1) {
result <- character()
# start with deletions
for (i in max((nchar(word) - distance), 0):(nchar(word) - 1)) {
result <- c(
result,
combn(unlist(strsplit(word, "", fixed = TRUE)), i,
paste,
collapse = "", simplify = TRUE
)
)
}
# now for changes and insertions
for (i in (nchar(word)):(nchar(word) + distance)) {
# all possible edits
edits <- apply(expand.grid(rep(list(letters), i)),
1, paste0,
collapse = ""
)
# remove original word
edits <- edits[edits != word]
# get all distances, add to result
distances <- utils::adist(word, edits)[1, ]
result <- c(result, edits[distances == distance])
}
result
}
For the OP example:
distfn2("cat", 1)
## [1] "ca" "ct" "at" "caa" "cab" "cac" "cad" "cae" "caf" "cag"
## [11] "cah" "cai" "caj" "cak" "cal" "cam" "can" "cao" "cap" "caq"
## [21] "car" "cas" "aat" "bat" "dat" "eat" "fat" "gat" "hat" "iat"
## [31] "jat" "kat" "lat" "mat" "nat" "oat" "pat" "qat" "rat" "sat"
## [41] "tat" "uat" "vat" "wat" "xat" "yat" "zat" "cbt" "cct" "cdt"
## [51] "cet" "cft" "cgt" "cht" "cit" "cjt" "ckt" "clt" "cmt" "cnt"
## [61] "cot" "cpt" "cqt" "crt" "cst" "ctt" "cut" "cvt" "cwt" "cxt"
## [71] "cyt" "czt" "cau" "cav" "caw" "cax" "cay" "caz" "cata" "catb"
## [81] "catc" "catd" "cate" "catf" "catg" "cath" "cati" "catj" "catk" "catl"
## [91] "catm" "catn" "cato" "catp" "catq" "catr" "cats" "caat" "cbat" "acat"
## [101] "bcat" "ccat" "dcat" "ecat" "fcat" "gcat" "hcat" "icat" "jcat" "kcat"
## [111] "lcat" "mcat" "ncat" "ocat" "pcat" "qcat" "rcat" "scat" "tcat" "ucat"
## [121] "vcat" "wcat" "xcat" "ycat" "zcat" "cdat" "ceat" "cfat" "cgat" "chat"
## [131] "ciat" "cjat" "ckat" "clat" "cmat" "cnat" "coat" "cpat" "cqat" "crat"
## [141] "csat" "ctat" "cuat" "cvat" "cwat" "cxat" "cyat" "czat" "cabt" "cact"
## [151] "cadt" "caet" "caft" "cagt" "caht" "cait" "cajt" "cakt" "calt" "camt"
## [161] "cant" "caot" "capt" "caqt" "cart" "cast" "catt" "caut" "cavt" "cawt"
## [171] "caxt" "cayt" "cazt" "catu" "catv" "catw" "catx" "caty" "catz"
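This also runs with other edit distances, although it becomes very slow for longer words. Note that for distance > 1 the deletion loop adds every shorter subsequence, so the result below also contains strings such as "ca", "ct" and "at" that are only at distance 1.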
d2 <- distfn2("cat", 2)
set.seed(100)
c(head(d2, 50), sample(d2, 50), tail(d2, 50))
## [1] "c" "a" "t" "ca" "ct" "at" "aaa" "baa"
## [9] "daa" "eaa" "faa" "gaa" "haa" "iaa" "jaa" "kaa"
## [17] "laa" "maa" "naa" "oaa" "paa" "qaa" "raa" "saa"
## [25] "taa" "uaa" "vaa" "waa" "xaa" "yaa" "zaa" "cba"
## [33] "aca" "bca" "cca" "dca" "eca" "fca" "gca" "hca"
## [41] "ica" "jca" "kca" "lca" "mca" "nca" "oca" "pca"
## [49] "qca" "rca" "cnts" "cian" "pcatb" "cqo" "uawt" "hazt"
## [57] "cpxat" "aaet" "ckata" "caod" "ncatl" "qcamt" "cdtp" "qajt"
## [65] "bckat" "qcatr" "cqah" "rcbt" "cvbt" "bbcat" "vcaz" "ylcat"
## [73] "cahz" "jcgat" "mant" "jatd" "czlat" "cbamt" "cajta" "cafp"
## [81] "cizt" "cmaut" "qwat" "jcazt" "hdcat" "ucant" "hate" "cajtl"
## [89] "caaty" "cix" "nmat" "cajit" "cmnat" "caobt" "catoi" "ncau"
## [97] "ucoat" "ncamt" "jath" "oats" "chatz" "ciatz" "cjatz" "ckatz"
## [105] "clatz" "cmatz" "cnatz" "coatz" "cpatz" "cqatz" "cratz" "csatz"
## [113] "ctatz" "cuatz" "cvatz" "cwatz" "cxatz" "cyatz" "czatz" "cabtz"
## [121] "cactz" "cadtz" "caetz" "caftz" "cagtz" "cahtz" "caitz" "cajtz"
## [129] "caktz" "caltz" "camtz" "cantz" "caotz" "captz" "caqtz" "cartz"
## [137] "castz" "cattz" "cautz" "cavtz" "cawtz" "caxtz" "caytz" "caztz"
## [145] "catuz" "catvz" "catwz" "catxz" "catyz" "catzz"
This could be sped up by replacing the brute-force formation of all strings followed by adist() with an algorithmic approach: generate the substitutions and insertions for a known edit distance directly from letters.
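As a rough sketch of that algorithmic approach for distance 1 only, one can split the word at every position and build the deletions, substitutions and insertions directly. The function name edits1 and its alphabet argument are made up for this illustration; they are not part of any package.

```r
# Hypothetical helper: all strings at edit distance exactly 1 from `word`
edits1 <- function(word, alphabet = letters) {
  n <- nchar(word)
  idx <- seq_len(n)
  # every split of the word into a left and a right part
  left  <- substring(word, 1, 0:n)        # for "cat": "", "c", "ca", "cat"
  right <- substring(word, 1:(n + 1), n)  # for "cat": "cat", "at", "t", ""
  # drop the first character of the right part
  deletes <- paste0(left[idx], substring(right[idx], 2))
  # replace the first character of the right part with each letter
  subs <- unlist(lapply(idx, function(i)
    paste0(left[i], alphabet, substring(right[i], 2))))
  # insert each letter between the left and right parts
  inserts <- unlist(lapply(seq_len(n + 1), function(i)
    paste0(left[i], alphabet, right[i])))
  # remove duplicates and the word itself (a same-letter substitution)
  setdiff(c(deletes, subs, inserts), word)
}

res <- edits1("cat")
length(res)
all(utils::adist("cat", res) == 1)
```

This avoids building every string of a given length, so the work grows with the word length times the alphabet size rather than exponentially. Larger distances could be reached by applying it repeatedly and filtering with adist(), at the cost of more candidates.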

Finding first N consecutive composite numbers

The following is guaranteed to return N consecutive composite numbers:
(N+1)! + 2, (N+1)! + 3, ..., (N+1)! + (N+1)
I used this to find 5 consecutive composite numbers in R using:
N=5
for(i in 2:6){a=factorial(N+1)+i;print(a);}
# [1] 722
# [1] 723
# [1] 724
# [1] 725
# [1] 726
However, I want first 'N' consecutive composite numbers, which this code is not guaranteed to return. For instance, for N=5, I want 24,25,26,27,28.
You can generate a list of primes with numbers:::Primes or numbers:::primeSieve (thanks to @Nicola for pointing to this function in a comment!), compute the gaps between consecutive primes with diff, and then use seq to return N numbers starting just after the first prime whose gap to the next prime is at least N+1:
library(numbers)
primes <- as.integer(numbers:::primeSieve(100000000)) # About 9 seconds
d <- diff(primes)
firstNComposite <- function(N) {
valid <- which(d >= N+1)
if (length(valid) == 0) {
stop("Need to generate more prime numbers")
} else {
seq(primes[valid[1]]+1, length.out=N)
}
}
firstNComposite(5)
# [1] 24 25 26 27 28
firstNComposite(200)
# [1] 20831324 20831325 20831326 20831327 20831328 20831329 20831330 20831331 20831332 20831333 20831334
# [12] 20831335 20831336 20831337 20831338 20831339 20831340 20831341 20831342 20831343 20831344 20831345
# [23] 20831346 20831347 20831348 20831349 20831350 20831351 20831352 20831353 20831354 20831355 20831356
# [34] 20831357 20831358 20831359 20831360 20831361 20831362 20831363 20831364 20831365 20831366 20831367
# [45] 20831368 20831369 20831370 20831371 20831372 20831373 20831374 20831375 20831376 20831377 20831378
# [56] 20831379 20831380 20831381 20831382 20831383 20831384 20831385 20831386 20831387 20831388 20831389
# [67] 20831390 20831391 20831392 20831393 20831394 20831395 20831396 20831397 20831398 20831399 20831400
# [78] 20831401 20831402 20831403 20831404 20831405 20831406 20831407 20831408 20831409 20831410 20831411
# [89] 20831412 20831413 20831414 20831415 20831416 20831417 20831418 20831419 20831420 20831421 20831422
# [100] 20831423 20831424 20831425 20831426 20831427 20831428 20831429 20831430 20831431 20831432 20831433
# [111] 20831434 20831435 20831436 20831437 20831438 20831439 20831440 20831441 20831442 20831443 20831444
# [122] 20831445 20831446 20831447 20831448 20831449 20831450 20831451 20831452 20831453 20831454 20831455
# [133] 20831456 20831457 20831458 20831459 20831460 20831461 20831462 20831463 20831464 20831465 20831466
# [144] 20831467 20831468 20831469 20831470 20831471 20831472 20831473 20831474 20831475 20831476 20831477
# [155] 20831478 20831479 20831480 20831481 20831482 20831483 20831484 20831485 20831486 20831487 20831488
# [166] 20831489 20831490 20831491 20831492 20831493 20831494 20831495 20831496 20831497 20831498 20831499
# [177] 20831500 20831501 20831502 20831503 20831504 20831505 20831506 20831507 20831508 20831509 20831510
# [188] 20831511 20831512 20831513 20831514 20831515 20831516 20831517 20831518 20831519 20831520 20831521
# [199] 20831522 20831523
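If the numbers package is not available, the same prime-gap idea can be sketched in base R with a simple sieve of Eratosthenes. The function name first_n_composite and the limit argument are assumptions for this example, and the default limit only covers small N:

```r
# Hypothetical base-R version: first run of N consecutive composite numbers
first_n_composite <- function(N, limit = 1e6) {
  # sieve of Eratosthenes up to `limit`
  is_prime <- rep(TRUE, limit)
  is_prime[1] <- FALSE
  for (p in 2:floor(sqrt(limit))) {
    if (is_prime[p]) is_prime[seq(p * p, limit, by = p)] <- FALSE
  }
  primes <- which(is_prime)
  # a gap of N + 1 between consecutive primes leaves N composites in between
  hit <- which(diff(primes) >= N + 1)
  if (length(hit) == 0) stop("no gap found, increase 'limit'")
  seq(primes[hit[1]] + 1, length.out = N)
}

first_n_composite(5)
```

The sieve is rebuilt on every call here to keep the sketch self-contained; for repeated queries it would make sense to compute the primes once, as in the answer above.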

Appending or storing a for loop in r in a list

I'm essentially calculating probabilities that I hope to plot. My goal was to put these in a list or a vector, but I've been having trouble doing so. I wrote this for loop and I can print out all the values, but I can't seem to store them in a list.
probs <- list()
for (i in seq(0,1,length=100)) {
append(probs,1 - mean(rbinom(100000,6,i)==4))
}
I've been able to print my results (see below), but not store them.
for (i in seq(0,1,length=100)) {
print(1 - mean(rbinom(100000,6,i)==4))
}
When I see what is in the list of probabilities I get an empty list. Why is this the case?
append doesn't mutate the list, it just returns a new list, which gets lost because it's not saved anywhere.
Try this:
probs <- list()
for (i in seq(0,1,length=100)) {
probs <- append(probs,1 - mean(rbinom(100000,6,i)==4))
}
or use a vector instead:
probs <- vector()
for (i in seq(0,1,length=100)) {
probs <- append(probs,1 - mean(rbinom(100000,6,i)==4))
}
probs # obviously it varies per execution
[1] 1.00000 1.00000 1.00000 0.99999 0.99994 0.99988 0.99981 0.99971 0.99946
[10] 0.99921 0.99864 0.99824 0.99758 0.99674 0.99573 0.99466 0.99271 0.99031
[19] 0.98862 0.98669 0.98389 0.98074 0.97781 0.97457 0.97028 0.96523 0.96225
[28] 0.95483 0.94936 0.94447 0.93965 0.93278 0.92486 0.91704 0.90908 0.90305
[37] 0.89365 0.88608 0.87596 0.86834 0.85762 0.84787 0.83998 0.82830 0.81918
[46] 0.80843 0.80078 0.78881 0.77933 0.77061 0.76032 0.75416 0.74201 0.73520
[55] 0.72575 0.71545 0.70855 0.70364 0.69809 0.69209 0.68604 0.68294 0.67610
[64] 0.67491 0.67463 0.67226 0.67048 0.66950 0.67413 0.67556 0.67900 0.68065
[73] 0.68591 0.69526 0.70135 0.70930 0.71619 0.72765 0.73874 0.75130 0.76355
[82] 0.77658 0.79165 0.80553 0.82090 0.83728 0.84947 0.86944 0.88331 0.90126
[91] 0.91595 0.92875 0.94334 0.95669 0.96918 0.97930 0.98813 0.99400 0.99856
[100] 1.00000
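Growing a vector with append inside a loop works, but a loop-free sketch with vapply returns the whole vector in one step. The names p_grid, probs_sim and probs_exact are illustrative, and the dbinom line is an addition that gives the exact value the simulation is estimating:

```r
p_grid <- seq(0, 1, length.out = 100)

set.seed(1)  # only to make the simulated values repeatable
# simulated estimate of 1 - P(X = 4) for X ~ Binomial(6, p), one value per p
probs_sim <- vapply(
  p_grid,
  function(p) 1 - mean(rbinom(100000, 6, p) == 4),
  numeric(1)
)

# exact value of the same quantity, vectorized over p
probs_exact <- 1 - dbinom(4, size = 6, prob = p_grid)

plot(p_grid, probs_sim, xlab = "p", ylab = "1 - P(X = 4)")
lines(p_grid, probs_exact)
```

vapply also checks that every result is a single number, which catches mistakes that append would silently accept.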

When to use approxfun vs. approx

The documentation for approxfun states that it is "often more useful than approx". I'm struggling to get my head around approxfun. When would approxfun be more useful than approx (and when would approx be more useful)?
approx returns the value of the approximated function at (either) specified points or at a given number of points. approxfun returns a function which can then be evaluated at some specific points. If you need the approximation at points that you know at the time of making the approximation, approx will do that for you. If you need a function (in the mathematical sense) which will return the value of the approximation for some argument given later, approxfun is what you need.
Here are some examples.
dat <- data.frame(x=1:10, y=(1:10)^2)
The output from approx and approxfun using this data
> approx(dat$x, dat$y)
$x
[1] 1.000000 1.183673 1.367347 1.551020 1.734694 1.918367 2.102041
[8] 2.285714 2.469388 2.653061 2.836735 3.020408 3.204082 3.387755
[15] 3.571429 3.755102 3.938776 4.122449 4.306122 4.489796 4.673469
[22] 4.857143 5.040816 5.224490 5.408163 5.591837 5.775510 5.959184
[29] 6.142857 6.326531 6.510204 6.693878 6.877551 7.061224 7.244898
[36] 7.428571 7.612245 7.795918 7.979592 8.163265 8.346939 8.530612
[43] 8.714286 8.897959 9.081633 9.265306 9.448980 9.632653 9.816327
[50] 10.000000
$y
[1] 1.000000 1.551020 2.102041 2.653061 3.204082 3.755102
[7] 4.510204 5.428571 6.346939 7.265306 8.183673 9.142857
[13] 10.428571 11.714286 13.000000 14.285714 15.571429 17.102041
[19] 18.755102 20.408163 22.061224 23.714286 25.448980 27.469388
[25] 29.489796 31.510204 33.530612 35.551020 37.857143 40.244898
[31] 42.632653 45.020408 47.408163 49.918367 52.673469 55.428571
[37] 58.183673 60.938776 63.693878 66.775510 69.897959 73.020408
[43] 76.142857 79.265306 82.551020 86.040816 89.530612 93.020408
[49] 96.510204 100.000000
> approxfun(dat$x, dat$y)
function (v)
.C(C_R_approxfun, as.double(x), as.double(y), as.integer(n),
xout = as.double(v), as.integer(length(v)), as.integer(method),
as.double(yleft), as.double(yright), as.double(f), NAOK = TRUE,
PACKAGE = "stats")$xout
<bytecode: 0x05244854>
<environment: 0x030632fc>
More examples of usage:
a <- approx(dat$x, dat$y)
af <- approxfun(dat$x, dat$y)
plot(dat)
points(a, pch=2)
plot(dat)
curve(af, add=TRUE)
or another example where a function is needed:
> uniroot(function(x) {af(x)-4}, interval=c(1,10))
$root
[1] 1.999994
$f.root
[1] -1.736297e-05
$iter
[1] 24
$estim.prec
[1] 6.103516e-05
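To make the relationship concrete, here is a small sketch showing that the two agree when the evaluation points are known up front. The vector xs is just an illustrative choice of points:

```r
dat <- data.frame(x = 1:10, y = (1:10)^2)
xs <- c(1.5, 2.5, 7.25)

# approx: interpolate right now at the points given in xout
a <- approx(dat$x, dat$y, xout = xs)$y

# approxfun: build the interpolating function once, evaluate it later
af <- approxfun(dat$x, dat$y)
b <- af(xs)

all.equal(a, b)
# [1] TRUE
```

So approx is convenient for a one-off evaluation, while approxfun is the one to pass to things like curve, uniroot or integrate that expect a function.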
