Use of Bayesian formula in R? [closed]

I am looking for a package in R that can help me calculate the posterior probability of an event. Is there one?
I am working on a data set like this:
age     education  grade  pass
group1  primary    50     no
group2  tertiary   20     no
group1  secondary  70     yes
group2  secondary  67     yes
group1  secondary  55     yes
group1  secondary  49     no
group1  secondary  76     yes
The prior probability of a student passing the exam is 0.6. Now I need to get the posterior probability that a student passes given their age, education level, and grade.
I know I should first get P(age=group1 | pass=yes) * P(education=primary | pass=yes) * P(grade>50 | pass=yes).
But this has to be done for each case (row), and I have a data set with 1000 rows.
So I thought there might be a function that could help me with this!

There are many packages/functions available. You should check out the CRAN Bayesian Inference Task View.
Also, if your prior isn't a full distribution but only a point estimate of a probability, you're probably not actually doing Bayesian inference, just applying Bayes' rule within a frequentist framework. Those are very different things, but now we're getting into Cross Validated territory.
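For the calculation the question describes (multiplying P(age | pass), P(education | pass), and P(grade | pass) for every row), a naive Bayes classifier does exactly that, so one option is a sketch along the lines below using the e1071 package. This assumes your data frame is called mydata with the columns shown above; note that naiveBayes estimates the class prior from the class frequencies in the data rather than taking a user-supplied prior such as 0.6.
# a minimal sketch, assuming a data frame 'mydata' with columns age, education, grade, pass
library(e1071)
mydata$pass      <- factor(mydata$pass)
mydata$age       <- factor(mydata$age)
mydata$education <- factor(mydata$education)
# fit a naive Bayes model; the numeric predictor (grade) is modelled as Gaussian
fit <- naiveBayes(pass ~ age + education + grade, data = mydata)
# posterior probabilities of pass = no / yes for every row
posterior <- predict(fit, mydata, type = "raw")
head(posterior)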

This is the answer for only one predictor (education) and the outcome (pass):
# prior probabilities (the question states P(pass = yes) = 0.6)
prior_no  <- 0.4
prior_yes <- 0.6
prior <- c(prior_no, prior_yes)  # order must match the table's row order ("no", "yes")
# contingency table: pass (rows) by education (columns)
edu_table <- with(mydata, table(pass, education))
# row totals (number of "no" and "yes" cases)
tots <- apply(edu_table, 1, sum)
# matrices of the same shape as the table, filled in by the loop below
ppn  <- edu_table * 0
post <- edu_table * 0
# conditional probabilities P(education = j | pass = i) and posteriors P(pass = i | education = j)
for (i in 1:nrow(edu_table)) {
  for (j in 1:ncol(edu_table)) {
    ppn[i, j]  <- edu_table[i, j] / tots[i]
    post[i, j] <- prior[i] * ppn[i, j] / (prior[1] * ppn[1, j] + prior[2] * ppn[2, j])
  }
}
ppn   # probability of education = j given pass = i
post  # posterior probability of pass = i given education = j

Related

How to perform a t test in R with a 90% confidence level [closed]

A hard drive manufacturer is required to ensure that the mean time between failures for its new hard drive is 1 million hours. A stress test is designed that can simulate the workload at a much faster rate; the test is set up so that a test lasting 10 days is equivalent to the hard drive lasting 1 million hours. In stress tests of 15 hard drives, the average is 9.5 days with a standard deviation of 1 day. Does a 90% confidence interval include 10 days?
t <- (9.5 - 10) / (1 / sqrt(15))   # -1.94
I think the next step is finding the critical t value, but I am not sure how to do that with a confidence level of 90%.
The critical value for a two-tailed t-test at alpha = 0.1 (90% confidence level) with (say) 15 degrees of freedom is:
alpha <- 0.1
df <- 15
qt(1-alpha/2, df = df)
## 1.753
Compute 1 - alpha/2 to get the upper-tail cutoff such that the probability of lying in the upper tail is alpha/2 (the division by two is because we're doing a two-tailed test), then apply the quantile function (inverse CDF) of the t distribution with the appropriate number of degrees of freedom.
You could look up the number in the back of a stats book, too, but these days it might be easier to find a computer ... you could also get this from an online calculator (use 0.05 as your alpha level since it gives a one-tailed critical value).
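As a follow-up sketch (not part of the original answer): for a one-sample t statistic with n = 15 drives the usual degrees of freedom would be n - 1 = 14, and the 90% confidence interval can be computed directly to see whether it contains 10 days:
# a sketch using the numbers from the question: mean 9.5 days, sd 1 day, n = 15
xbar <- 9.5
s    <- 1
n    <- 15
alpha  <- 0.1
t_crit <- qt(1 - alpha / 2, df = n - 1)        # two-tailed critical value with df = n - 1
ci <- xbar + c(-1, 1) * t_crit * s / sqrt(n)   # 90% confidence interval
round(ci, 3)
## 9.045 9.955   -> 10 days is not inside the interval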

doing likelihood plot in R for binomial model [closed]

Can anybody tell me how to plot the maximum likelihood values L(θ̂_M, M) versus M for a suitable range of M values for the count data provided in frogs, and then estimate the total number of frogs living in the pond and the probability of appearance, in R?
The assignment questions were attached as screenshots in the original post; I have answered parts (a) and (b).
I have the pmf for my model and have found the likelihood and log-likelihood of my binomial model, and you can see how much code I have written so far (my attempted solutions to (a), (b) and (c) were also attached as screenshots). Please help!
# load packages (ggplot2 is already part of the tidyverse)
library(tidyverse)
# load the data
load("~/Statistical Modelling and Inference/aut2020.RData")
# number of observed counts
n <- length(frogs$counts)
n
# MLE of theta for a given M: theta_hat(M) = sum(y_i) / (n * M)
theta_hat <- function(M) sum(frogs$counts) / (n * M)
# log-likelihood for the binomial model y_i ~ Binomial(M, theta)
loglik <- function(theta, M, y) {
  # sum of log binomial pmf terms: log C(M, y_i) + y_i*log(theta) + (M - y_i)*log(1 - theta)
  sum(dbinom(y, size = M, prob = theta, log = TRUE))
}
The data (the frogs data frame with a counts column) were shown as screenshots in the original post.
Since you have already found the likelihood function in your answer (a), you can see that it is a function of M and theta - both unknown.
After estimating theta you have the MLE estimator - let's call it theta_hat.
In the data frame frogs you have all the count observations y_i (known). So, using the known data and the ML estimate theta_hat, the likelihood can be plotted over some (reasonable) range of values of M (you might need to try different ranges): plot L(theta_hat, M) as a function of M. Bear in mind, though, that the estimate theta_hat will change as you change M, so take that into account. The point where L(theta_hat, M) is maximised gives your ML estimates for theta and M.
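A minimal sketch of that plot, assuming the binomial model y_i ~ Binomial(M, theta) and using n and frogs$counts from the code above (the range of M values is an arbitrary choice you may need to adjust):
# profile log-likelihood L(theta_hat, M) over a range of M values
M_range <- max(frogs$counts):100   # assumed range; M cannot be smaller than the largest count
profile_ll <- sapply(M_range, function(M) {
  theta_hat <- sum(frogs$counts) / (n * M)   # theta_hat changes with M
  sum(dbinom(frogs$counts, size = M, prob = theta_hat, log = TRUE))
})
plot(M_range, profile_ll, type = "b",
     xlab = "M", ylab = "profile log-likelihood")
M_range[which.max(profile_ll)]                            # ML estimate of M
sum(frogs$counts) / (n * M_range[which.max(profile_ll)])  # corresponding theta_hat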

Weighting Brand Entropy using Frequency of Purchases [closed]

I have a list of purchases for every customer and I am trying to determine brand loyalty. Based on this list I have calculated each customer's brand entropy, which I am using as a proxy for brand loyalty. For example, if a customer only purchases brand_a then their entropy will be 0 and they are very brand loyal. However, if the customer purchases brand_a, brand_b and others, then their entropy will be high and they are not very brand loyal.
# Dummy Data
CUST_ID <- c("c_X", "c_X", "c_X", "c_Y", "c_Y", "c_Z")
BRAND   <- c("brand_a", "brand_a", "brand_a", "brand_a", "brand_b", "brand_a")
PURCHASES <- data.frame(CUST_ID, BRAND)
# Casting from PURCHASES to grouped_by CUST_ID
library(plyr)
library(dplyr)
library(data.table)
ENTROPY <- PURCHASES %>%
  group_by(CUST_ID, BRAND) %>%
  summarise(count = n()) %>%
  dcast(CUST_ID ~ BRAND, value.var = "count")
ENTROPY[is.na(ENTROPY)] <- 0
# Calculating Entropy (one value per customer; columns 2:3 hold the brand counts)
library(entropy)
ENTROPY$entropy <- NA
for (i in 1:nrow(ENTROPY)) {
  ENTROPY[i, 4] <- entropy(as.numeric(as.vector(ENTROPY[i, 2:3])), method = "ML")
}
# Calculating Frequency
ENTROPY$frequency <- ENTROPY$brand_a + ENTROPY$brand_b
ENTROPY
However, my problem is that entropy does not account for the quantity of purchases of each customer. Consider the following cases:
1) Customer_X has made 3 purchases, each time it is brand_a. Their entropy is 0.
2) Customer_Z has made 1 purchase, it is brand_a. Their entropy is 0.
Naturally, we are more sure that Customer_X is brand loyal than Customer_Z. Therefore, I would like to weight the entropy calculations by the purchase frequency. However, Customer_X: 0/3 = 0 and Customer_Z: 0/1 = 0.
Essentially, I want a clever way to give Customer_X a low value on my brand-loyalty measure and Customer_Z a higher value. One thought was to use a CART/decision tree/random forest model, but if it can be done using clever math, that would be ideal.
I think the index that you want is entropy normalised by some expectation for the entropy given the number of purchases. Essentially, fit a curve to the graph of entropy versus number of purchases, and then divide each entropy by the expectation given by the curve.
Now, this doesn't solve your problem with super-loyal customers who have 0 entropy. But I think the question there is subtly different: is the apparent loyalty due to chance (low count) or is it real? This is a distinct question from how loyal that customer is. Essentially, you want to know the probability of observing such a data point.
You could compute the probability of only having bought a single brand given the number of purchases from your data, if the 0-entropy events are your only pain point.
Alternatively, you could determine the full joint probability distribution for entropy and number of purchases (instead of just the mean), e.g. by density estimation, and then compute the conditional probability of observing a given entropy given the number of purchases.
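As a small sketch of the "probability of only having bought a single brand" idea, assuming purchases are treated as independent draws from the overall brand shares estimated from PURCHASES (an assumption for illustration, not part of the original answer):
# overall brand shares across all purchases
brand_share <- prop.table(table(PURCHASES$BRAND))
# probability that all n purchases are of the same (any single) brand under random choice
p_single_brand <- function(n) sum(brand_share ^ n)
p_single_brand(1)   # = 1: a single purchase is always "loyal", so it carries no evidence
p_single_brand(3)   # much smaller: three identical purchases are far less likely by chance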

distribution from percentage with R [closed]

I have the distribution of a parameter (natural gas mixture composition) expressed in percent. How can I test such data for distribution parameters (it should be a gamma, normal or lognormal distribution) and generate random compositions based on those parameters in R?
This might be a better question for CrossValidated, but:
it is not generally a good idea to choose from among a range of possible distributions according to goodness of fit. Instead, you should choose according to the qualitative characteristics of your data (the original answer linked a decision chart for this).
Frustratingly, that chart doesn't actually have the best choice for your data (a composition: continuous, bounded between 0 and 1 [or 0 and 100]), which is a Beta distribution (although there are technical issues if you have values of exactly 0 or 100 in your sample).
In R:
## some arbitrary data
z <- c(2,8,40,45,56,58,70,89)
## fit (beta values must be in (0,1), not (0,100), so divide by 100)
(m <- MASS::fitdistr(z/100,"beta",start=list(shape1=1,shape2=1)))
## sample 1000 new values
z_new <- 100 * rbeta(n = 1000, shape1 = m$estimate["shape1"],
                     shape2 = m$estimate["shape2"])
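A quick visual check of the fit could look like the following sketch (using z and m from the code above; the histogram/density overlay is just one way to eyeball the fit):
## overlay the fitted Beta density on a histogram of the (rescaled) data
hist(z / 100, freq = FALSE, xlim = c(0, 1),
     main = "Beta fit", xlab = "proportion")
curve(dbeta(x, shape1 = m$estimate["shape1"], shape2 = m$estimate["shape2"]),
      add = TRUE, col = "blue")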

Adjusting regression weight based on feedback [closed]

Let's say I want to predict a dependent variable D, where:
D<-rnorm(100)
I cannot observe D, but I know the values of three predictor variables:
I1<-D+rnorm(100,0,10)
I2<-D+rnorm(100,0,30)
I3<-D+rnorm(100,0,50)
I want to predict D by using the following regression equation:
w1*I1 + w2*I2 + w3*I3 ≈ D
However, I do not know the correct values of the weights (w), so I would like to fine-tune them by repeating my estimate:
In the first step I use equal weights:
w1 = .33, w2 = .33, w3 = .33
and I estimate D using these weights:
EST <- I1 * .33 + I2 * .33 + I3 * .33
I receive feedback, which is a difference score between D and my estimate (diff = D - EST).
I use this feedback to modify my original weights and fine-tune them to eventually minimize the difference between D and EST.
My question is:
Is the difference score sufficient for being able to fine-tune the weights?
What are some ways of manually fine-tuning the weights? (E.g., can I look at the correlation between diff and I1, I2, I3 and use that as a weight?)
The following command,
coefficients(lm(D ~ I1 + I2 + I3))
will give you the ideal weights to minimize diff.
Your defined diff will not tell you enough to manually manipulate the weights correctly, as there is no way to isolate the error component of each I.
The correlation between D and the I's is not sufficient either, as it only tells you the strength of each predictor, not its weight. If the noise added to your I's is truly independent (of each other and of D; a strong assumption, but true here since each I uses its own rnorm call), you could try manipulating one weight at a time and noting how it affects diff, but using a linear regression model is the simplest way to do it.
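A minimal sketch putting the question's simulation and the suggested regression together (the seed and the mean-squared-error comparison are illustrative additions, not part of the original answer):
set.seed(1)   # any seed, just for reproducibility
D  <- rnorm(100)
I1 <- D + rnorm(100, 0, 10)
I2 <- D + rnorm(100, 0, 30)
I3 <- D + rnorm(100, 0, 50)
# regression-based weights (the first coefficient is an intercept)
w <- coefficients(lm(D ~ I1 + I2 + I3))
w
# compare the regression estimate with the equal-weight estimate
EST_lm <- w[1] + w[2] * I1 + w[3] * I2 + w[4] * I3
EST_eq <- (I1 + I2 + I3) / 3
mean((D - EST_lm)^2)   # smaller mean squared difference
mean((D - EST_eq)^2)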
