Creating Group Constraints in PortfolioAnalytics for R - r

I am working to put together portfolio optimizations with 11 securities using the PortfolioAnalytics package in R. Of the 11, 5 are equity funds, 2 are preferred stock funds, 3 are fixed income, and 1 is a money market fund. I would like to set my asset class allocations to 55% equity, 10% preferred, 30% fixed income, and 5% money market, fully invested with no leverage and no turnover. What I hope to see as output is a variety of portfolio permutations with static asset class allocations.
I have tried to use the add.constraint function to achieve this and I've used the following code:
port <- add.constraint(portfolio = port, type="group",
groups= list(c(1:5),(6:7),c(8:10),c(11)),
group_min=c(0.55, 0.1, 0.3, 0.05),
group_max=c(0.55, 0.1, 0.3, 0.05),
group_pos= c(1,1,1,1))
When I attempt to generate random portfolios I get the following error message:
rportfolios <- random_portfolios(port, permutations = 5000, rp_method = "sample")
Error in rp_transform(w = tmp_group_w, min_sum = cLO[j], max_sum = cUP[j], :
Infeasible portfolio created, perhaps increase max_permutations and/or adjust your parameters.
Any thoughts on where I am going wrong?

William, I think the problem is caused by the hard group constraints and the way that the package's random portfolio generator works. Since the portfolios are randomly created, it is rare for the generator to produce portfolios that exactly match your criteria given the small number of permutations that are tried (i.e. 5000).
This may not be ideal for your problem, but if you provide a little bit of wiggle room in each group's min-max range, the random generator is more likely to create a portfolio that falls within it. For example, instead of setting min=max=0.55, try min=0.5495 and max=0.555, and at the same time increase the permutations to 10k or more. I had the same problem and resolved it this way.
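If the asset class weights have to stay exactly fixed, one alternative is to bypass the package's sampler and draw the within-group weights directly. The following is a minimal base-R sketch, not PortfolioAnalytics code: the helper random_group_portfolio is hypothetical, and normalizing exponential draws within each group is just one assumed way to randomize the split.

```r
set.seed(42)

# Asset positions per class and the fixed class allocations from the question
groups  <- list(equity = 1:5, preferred = 6:7, fixed_income = 8:10, money_market = 11)
targets <- c(equity = 0.55, preferred = 0.10, fixed_income = 0.30, money_market = 0.05)

# Hypothetical helper: random long-only weights whose group sums equal the targets
random_group_portfolio <- function(groups, targets) {
  w <- numeric(sum(lengths(groups)))
  for (g in seq_along(groups)) {
    raw <- rexp(length(groups[[g]]))                  # positive random draws
    w[groups[[g]]] <- targets[[g]] * raw / sum(raw)   # rescale to the group target
  }
  w
}

# 5000 portfolios, one per row, 11 weight columns
rp <- t(replicate(5000, random_group_portfolio(groups, targets)))
head(round(rp, 4))
```

Every row sums to 1 and each class sum equals its target by construction, so no draw is rejected as infeasible. The single-asset money market group always receives exactly 5%.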

Related

Confused for the Code for over-sampling with R

The code below is about oversampling houses with over 10 rooms, may I ask what does prob = ifelse(housing.df$ROOMS>10, 0.9, 0.01) mean? Thanks a lot.
s <- sample(row.names(housing.df), 5, prob = ifelse(housing.df$ROOMS>10, 0.9, 0.01))
housing.df[s, ]
I imagine the purpose of this code is to first check whether a given house in the data set has more than ten rooms. If that is the case it gets a sampling weight of 0.9, otherwise it gets a weight of 0.01.
sample will then sample from the given house names using these weights, thus favouring the houses with more than ten rooms. This creates your oversample.
Is this what you mean?
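To see the weighting in action, here is a small sketch on made-up data. The housing.df below is a hypothetical stand-in with only a ROOMS column, not the original data set. The values passed to prob are weights rather than true probabilities, so they do not need to sum to one.

```r
set.seed(1)

# Hypothetical stand-in for housing.df: 20 houses, five with more than 10 rooms
housing.df <- data.frame(ROOMS = c(4, 5, 6, 11, 7, 12, 5, 6, 13, 4,
                                   8, 6, 5, 11, 7, 6, 5, 4, 12, 6))

# Weight 0.9 for large houses, 0.01 for all others
w <- ifelse(housing.df$ROOMS > 10, 0.9, 0.01)

# Normalized weights: the selection probabilities for the first draw
round(w / sum(w), 4)

# Draw 5 row names without replacement, favouring the large houses
s <- sample(row.names(housing.df), 5, prob = w)
housing.df[s, , drop = FALSE]
```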

How to calculate and plot a "beta-delta discounting model"?

My code for getting a proper plot in R does not seem to work (I am new to R and I am having difficulties with coding).
Basically, using the concept of temporal discounting in the form of beta-delta model, we are supposed to calculate the subjective value for $10 at every delay from 0 to 365.
The context of the homework is that we have to account for the important exception that if a reward is IMMEDIATE, there’s no discount, but if it occurs at any delay there’s both an exponential discount and a delay penalty.
I created a variable called BetaDeltaValuesOf10, which is 366 elements long and represents the subjective value for $10 at each delay from 0 to 365.
The code needs to have the following properties using for-loops and an if-else statement:
1) IF the delay is 0, the subjective value is the objective magnitude (and should be saved to the appropriate element of BetaDeltaValuesOf10).
2) OTHERWISE, calculate the subjective value at the exponentially discounted rate, assuming 𝛿 = .98 and apply a delay penalty of .8, then save it to the appropriate element of BetaDeltaValuesOf10.
The standard code given to us to help us in creating the code is as follows:
BetaDeltaValuesOf10 = 0
Delays = 0:365
Code(Equation) to get subjective value/preference using exponential discounting model:
ExponentialDecayValuesOf10 = .98^Delays*10
0.98 is the discount rate which ranges between 0 and 1.
Delays is the number of time periods in the future when the later reward will be delivered
10 is the subjective value of $10
Code(Equation) to get subjective value using beta-delta model:
0.8*0.98^Delays*10
0.8 is the delay penalty
The code I came up with in trying to satisfy the above mentioned properties is as follows:
for(t in 1:length(Delays)){BetaDeltaValuesOf10 = 0.98^0*10
if(BetaDeltaValuesOf10 == 0){0.98^t*10}
else {0.8*0.98^t*10}
}
So, I tried the code and did not get any error. But, when I try to plot the outcome of the code, my plot comes up blank.
To plot I used the code:
plot(BetaDeltaValuesOf10,type = 'l', ylab = 'DiscountedValue')
I believe that my code is actually faulty and that is why I am not getting a proper outcome for my plot.
Please let me know of the amendments to the code and if the community needs any clarification, I will try to clarify as soon as I can.
result <- double(length=366)
delays <- 0:365
val <- 10
delta <- 0.98
penalty <- 0.8
for(t in seq_along(delays)) {
result[t] <- val * delta^delays[t] * penalty^(delays[t]>0)
}
plot(x=delays, y=result, pch=20)
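Since the assignment asks for a for-loop with an if-else statement, the same calculation can also be written with an explicit branch. A minimal sketch, reusing the assignment's names BetaDeltaValuesOf10 and Delays and assuming delta = .98 and a delay penalty of .8 as stated in the question:

```r
Delays <- 0:365
BetaDeltaValuesOf10 <- numeric(length(Delays))   # pre-allocate 366 elements

for (i in seq_along(Delays)) {
  if (Delays[i] == 0) {
    # Immediate reward: no discount, value is the objective magnitude
    BetaDeltaValuesOf10[i] <- 10
  } else {
    # Delayed reward: exponential discount times the delay penalty
    BetaDeltaValuesOf10[i] <- 0.8 * 0.98^Delays[i] * 10
  }
}

plot(Delays, BetaDeltaValuesOf10, type = "l",
     xlab = "Delay", ylab = "DiscountedValue")
```

The attempt in the question overwrote BetaDeltaValuesOf10 with a single number on every iteration and never stored the result of either branch, so there was only one point left to draw as a line.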

prop.test alternative statement usage

I am testing whether sending consumers information about a promotion convinces them to buy anything. Out of 100k consumers we randomly selected 90% of them and sent them catalogs. After some time we tracked who had bought.
To recreate the problem let's use:
set.seed(1)
got <- rbinom(n=100000, size=1, prob=0.1)
bought <- rbinom(n=100000, size=1, prob=0.05)
table(got, bought)
   bought
got     0     1
  0 85525  4448
  1  9567   460
As I read here, I should use the prop.test(table(got, bought), correct=FALSE) function, but I want to check not only whether the proportions are equal, but whether the proportion of buyers during the promotion was greater in the group who got the leaflet than in the group who didn't.
Should I use the argument alternative = "less" or alternative = "greater"? And does the order of got and bought matter?
You usually want to use a two sided alternative (for all you know sending promotion annoys people and they are less likely to purchase).
By default prop.test runs a two-sided chi-squared test, which does not look at which group is bigger.
You could do a t.test like this
t.test(bought ~ got, data = data.frame(got = got, bought = bought))
Depending on your typical conversion rate and sample size and alpha you can get confidence intervals implying negative conversion rates so a Bootstrapping or Bayesian approach may be better suited.
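For reference, prop.test does accept an alternative argument when exactly two groups are compared. Below is a hedged sketch using the counts from the table in the question. Passing the number of buyers and the group sizes explicitly avoids depending on the row and column order of table(got, bought). With alternative = "greater" the alternative hypothesis is that the first group's proportion is larger than the second's, so the group that got the catalog is listed first.

```r
# Counts read off the table in the question
buyers     <- c(got = 460, not_got = 4448)                   # bought == 1 per group
group_size <- c(got = 9567 + 460, not_got = 85525 + 4448)    # all consumers per group

# One-sided test: is the purchase rate higher among those who got the catalog?
res <- prop.test(x = buyers, n = group_size,
                 alternative = "greater", correct = FALSE)

res$estimate   # observed purchase rate in each group
res$p.value    # one-sided p-value
```

In this simulated data got and bought are independent, so no real effect should be expected.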

What is the probability of a TERM for a specific TOPIC in Latent Dirichlet Allocation (LDA) in R

I'm working in R, package "topicmodels". I'm trying to work out and better understand the code/package. In most of the tutorials, documentation I'm reading I'm seeing people define topics by the 5 or 10 most probable terms.
Here is an example:
library(topicmodels)
data("AssociatedPress", package = "topicmodels")
lda <- LDA(AssociatedPress[1:20,], k = 5)
topics(lda)
terms(lda)
terms(lda,5)
So the last part of the code returns the 5 most probable terms associated with each of the 5 topics I've defined.
In the lda object, I can access the gamma element, which contains, per document, the probability of belonging to each topic. So based on this I can extract the topics with a probability greater than any threshold I prefer, instead of having the same number of topics for every document.
But my second step would then be to know which words are most strongly associated with the topics. I can use the terms(lda) function to pull this out, but this only gives me the top N terms.
In the output I've also found the
lda@beta
which contains the beta per word per topic, but this is a beta value, which I'm having a hard time interpreting. The values are all negative, and though I see some values around -6 and others around -200, I can't interpret this as a probability or as a measure of which words associate with a topic and how strongly. Is there a way to pull out or calculate anything that can be interpreted as such a measure?
many thanks
Frederik
The beta matrix has dimension #topics x #terms. The values are log probabilities, so you exponentiate them (exp) to get probabilities. The resulting probabilities are of the type
P(word|topic), and they only add up to 1 if you sum over the words, not over the topics: P(all words|topic) = 1 and NOT P(word|all topics) = 1.
What you are searching for is P(topic|word) but I actually do not know how to access or calculate it in this context. You will need P(word) and P(topic) I guess. P(topic) should be:
colSums(lda@gamma)/sum(lda@gamma)
This becomes more obvious if you look at the gamma matrix, which is #documents x #topics. The given probabilities are P(topic|document) and can be interpreted as "what is the probability of topic x given document y". The sum over all topics should be 1 but not the sum over all documents.
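One way to get from P(word|topic) to P(topic|word) is Bayes' rule. The sketch below uses small made-up matrices in place of the fitted model's beta and gamma slots, so log_beta and gamma here are hypothetical stand-ins. It also assumes that P(topic) can be estimated by averaging gamma over documents, which weights every document equally regardless of its length.

```r
# Toy stand-ins: 2 topics x 3 terms (log scale) and 3 documents x 2 topics
log_beta <- log(rbind(c(0.7, 0.2, 0.1),
                      c(0.1, 0.3, 0.6)))
gamma    <- rbind(c(0.9, 0.1),
                  c(0.4, 0.6),
                  c(0.2, 0.8))

# P(word | topic): exponentiate the log values; each row sums to 1
p_word_given_topic <- exp(log_beta)

# P(topic): average topic share across documents (assumption, see above)
p_topic <- colSums(gamma) / sum(gamma)

# Joint P(word, topic): row i is scaled by p_topic[i]
joint <- p_word_given_topic * p_topic

# P(topic | word): divide each column by P(word) so columns sum to 1
p_topic_given_word <- sweep(joint, 2, colSums(joint), "/")
round(p_topic_given_word, 3)
```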

Error probability function

I have DNA amplicons with base mismatches which can arise during the PCR amplification process. My interest is, what is the probability that a sequence contains errors, given the error rate per base, number of mismatches and the number of bases in the amplicon.
I came across an article [Cummings, S. M. et al (2010). Solutions for PCR, cloning and sequencing errors in population genetic analysis. Conservation Genetics, 11(3), 1095–1097. doi:10.1007/s10592-009-9864-6]
that proposes this formula to calculate the probability mass function in such cases.
I implemented the formula with R as shown here
pcr.prob <- function(k,N,eps){
v = numeric(k)
for(i in 1:k) {
v[i] = choose(N,k-i) * (eps^(k-i)) * (1 - eps)^(N-(k-i))
}
1 - sum(v)
}
From the article: suppose we analysed an 800 bp amplicon using a PCR of 30 cycles with 1.85e-5 misincorporations per base per cycle, and found 10 unique sequences that are each 3 bp different from their most similar sequence. The probability that a novel sequence was generated by three independent PCR errors equals P = 0.0011.
However when I use my implementation of the formula I get a different value.
pcr.prob(3,800,0.0000185)
[1] 5.323567e-07
What could I be doing wrong in my implementation? Am I misinterpreting something?
Thanks
I think they've got the right number (0.00113), but badly explained in their paper.
The calculation you want to be doing is:
pbinom(3, 800, 1-(1-1.85e-5)^30, lower=FALSE)
I.e. what's the probability of seeing more than three modifications in 800 independent bases, given 30 amplifications that each have a 1.85e-5 chance of going wrong. The third argument is the per-base probability that a base doesn't stay correct through all 30 cycles.
Somewhat statsy, may be worth a move…
Thinking about this more, you will start to see floating-point inaccuracies when working with very small probabilities here. I.e. a 1-x where x is a small number will start to go wrong when the absolute value of x is less than about 1e-10. Working with log-probabilities is a good idea at this point, specifically the log1p function is a great help. Using:
pbinom(3, 800, 1-exp(log1p(-1.85e-5)*30), lower=FALSE)
will continue to work even when the error incorporation rate is very low.
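The floating-point point can be checked directly. In the sketch below the variable names are mine, and using expm1 for the final subtraction is my own variation on the log1p form above rather than something taken from the answer. It avoids the remaining 1 - exp(...) step, which can itself round to zero for extremely small rates.

```r
eps     <- 1.85e-5   # misincorporations per base per cycle
cycles  <- 30
n_bases <- 800

# Per-base probability of at least one error over all cycles
p_naive  <- 1 - (1 - eps)^cycles
p_stable <- -expm1(cycles * log1p(-eps))

# Probability of more than three erroneous bases in the amplicon
p_more_than_3 <- pbinom(3, n_bases, p_stable, lower.tail = FALSE)
p_more_than_3

# With a far smaller (purely illustrative) rate the naive form collapses to zero
eps_tiny    <- 1e-18
naive_tiny  <- 1 - (1 - eps_tiny)^cycles
stable_tiny <- -expm1(cycles * log1p(-eps_tiny))
c(naive = naive_tiny, stable = stable_tiny)
```

At the article's error rate both forms agree closely, so the difference only matters for much smaller rates.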
