Formatting regression line equation using ggplot2 in R

I am curious how one would edit the following solution from Jayden so that the equation is formatted as y = bx + a or y = bx - a. I want to make it look as clean as possible.
lm_eqn = function(m) {
l <- list(a = format(coef(m)[1], digits = 2),
b = format(abs(coef(m)[2]), digits = 2),
r2 = format(summary(m)$r.squared, digits = 3));
if (coef(m)[2] >= 0) {
eq <- substitute(italic(y) == a + b %.% italic(x)*","~~italic(r)^2~"="~r2,l)
} else {
eq <- substitute(italic(y) == a - b %.% italic(x)*","~~italic(r)^2~"="~r2,l)
}
as.character(as.expression(eq));
}
I have tried eliminating the %.%, but that throws an error. I have also tried inverting the order, but I am having issues with the syntax in the if/else section of the function. I would also like the equation to be formatted so that the coefficient (a) is presented without the negative sign; abs(a) returns |a|. Thanks for any input! It is appreciated!
This follows from another thread (Adding Regression Line Equation and R2 on graph).

If you want it in b*x + a form, then just use:
if (coef(m)[2] >= 0) {
eq <- substitute(italic(y) ==
b %.% italic(x) + a*","~~italic(r)^2~"="~r2, l)
} else {
eq <- substitute(italic(y) ==
- b %.% italic(x) + a *"," ~~ italic(r)^2 ~"="~r2, l)
}
Writing R expressions requires understanding one syntax rule: token, separator, token. You can, however, use either "+" or "-" as a unary operator in front of a token. The upper portion of the plotmath symbol table in ?plotmath lists the acceptable separators. Spaces and line feeds are ignored.
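To make the token/separator rule concrete, here is a small base-R sketch. It shows that `%.%` (like any binary operator) needs an operand on each side, that a unary minus in front of the first term parses, and that plotmath's `*` separator, which juxtaposes its operands, is one way to drop the centred dot instead of deleting `%.%` outright. The strings "0.38" and "3.3" are placeholders chosen for illustration, not output from anyone's model.

```r
# Binary operators such as %.% need an operand on both sides.
with_dot <- quote(italic(y) == b %.% italic(x) + a)

# A unary minus in front of the first term is also valid syntax.
neg_slope <- quote(italic(y) == -b %.% italic(x) + a)

# Simply deleting %.% leaves two tokens with no separator: a parse error.
no_sep <- try(parse(text = "italic(y) == b italic(x) + a"), silent = TRUE)
inherits(no_sep, "try-error")

# In plotmath, '*' juxtaposes its operands, so b * italic(x) renders as "bx".
# Placeholder values, for illustration only.
no_dot <- substitute(italic(y) == b * italic(x) + a,
                     list(a = "3.3", b = "0.38"))
deparse(no_dot)
```

In the `lm_eqn()` functions in this thread, swapping `b %.% italic(x)` for `b * italic(x)` in both branches should therefore give "bx" without the dot while keeping the expression syntactically valid.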

What error are you seeing? This works for me to give bx ± a as requested. You have to move the abs() to the definition of a instead of b, and test coef(m)[1] instead of coef(m)[2]:
lm_eqn = function(m) {
l <- list(a = format(abs(coef(m)[1]), digits = 2),
b = format(coef(m)[2], digits = 2),
r2 = format(summary(m)$r.squared, digits = 3));
if (coef(m)[1] >= 0) {
eq <- substitute(italic(y) == b %.% italic(x) + a*","~~italic(r)^2~"="~r2,l)
} else {
eq <- substitute(italic(y) == b %.% italic(x) - a*","~~italic(r)^2~"="~r2,l)
}
as.character(as.expression(eq));
}
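For a self-contained version, the sketch below combines this answer with the unname() fix from the related question further down: without unname(), format() keeps the coefficient names, which can leak into the label as c(...). The function name and formatting choices follow the question; the example data are the x and y vectors from that related question, and the negative-intercept case simply refits with -y.

```r
# y = bx + a / y = bx - a label for a fitted lm, as a plotmath string (sketch)
lm_eqn <- function(m) {
  cf <- unname(coef(m))  # drop names so substitute() inserts plain strings
  l <- list(a = format(abs(cf[1]), digits = 2),
            b = format(cf[2], digits = 2),
            r2 = format(summary(m)$r.squared, digits = 3))
  eq <- if (cf[1] >= 0) {
    substitute(italic(y) == b %.% italic(x) + a * "," ~ ~italic(r)^2 ~ "=" ~ r2, l)
  } else {
    substitute(italic(y) == b %.% italic(x) - a * "," ~ ~italic(r)^2 ~ "=" ~ r2, l)
  }
  as.character(as.expression(eq))
}

# Example data taken from the related question below
x <- c(5, 3, 6, 8, 2, 6)
y <- c(2, 6, 3, 7, 4, 9)
lab_pos <- lm_eqn(lm(y ~ x))   # positive intercept: "... + a"
lab_neg <- lm_eqn(lm(-y ~ x))  # negative intercept: "... - |a|"
lab_pos
lab_neg
```

In ggplot2 the string is meant to be parsed, for example with `annotate("text", x = 3, y = 8, label = lab_pos, parse = TRUE)`; the coordinates there are arbitrary.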

Related

R substitute function prints c() concatenated into equation

I had an old function that worked like a charm:
lm_eqn = function(m) {
l <- list(a = format(coef(m)[1], digits = 2),
b = format(abs(coef(m)[2]), digits = 2),
r2 = format(summary(m)$r.squared, digits = 3));
eq <- substitute(italic(C)[i] == a + b %.% italic(I)[i]*","~~italic(r)^2~"="~r2,l)
as.character(as.expression(eq));
}
where m was an lm model. This would produce an equation like the following:
y = 0.3 + 4.4x, r² = 0.67
which could then be used in a ggplot to show the model formula with its graph. The problem is that the same equation now incorporates uncalled-for symbols:
y = c(0.3) + c(4.4)x, r² = 0.67
The concatenated c() is now included for each variable from the list I am accruing - and I don't know why. Does anyone know how to
a) prevent this, or
b) correct it?
Note: the problem seems to emerge in the substitution; the output of eq is:
"italic(y) == c(`(Intercept)` = \"0.3\") + c(x = \"4.4\") %.% italic(x) * \",\" ~ ~italic(r)^2 ~ \"=\" ~ \"0.67\""
It looks like substitute's output includes the c() for the intercept and slope.
Edit:
m in this case is a generic lm object. For example:
x <- c(5,3,6,8,2,6)
y <- c(2,6,3,7,4,9)
test.lm <- lm(y~x)
lm_eqn(test.lm)
[1] "italic(C)[i] == c(`(Intercept)` = \"3.3\") + c(x = \"0.37\") %.% italic(I)[i] * \",\" ~ ~italic(r)^2 ~ \"=\" ~ \"0.0969\""
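You apparently need to unname the coef() values. format() keeps the names of coef(m), so each element of l is a named character vector, and a named vector deparses as c(name = "value"). Here is a version using bquote():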
lm_eqn = function(m) {
l <- list(a = format(unname(coef(m))[1], digits = 2),
b = format(abs(unname(coef(m))[2]), digits = 2),
r2 = format(summary(m)$r.squared, digits = 3));
eq <- bquote( italic(C)[i] == .(l$a) + .(l$b) %.% italic(I)[i]*","~~italic(r)^2~"="~.(l$r2))
as.character(as.expression(eq));
}
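The effect of unname() can be seen in isolation. This minimal sketch uses a made-up named value that stands in for one element of coef(m); it shows that format() keeps the name and that the name survives into the substituted plotmath expression as c(x = ...).

```r
v <- c(x = 0.3712)                  # stands in for coef(m)[2]
named <- format(v, digits = 2)      # format() keeps the name "x"
plain <- format(unname(v), digits = 2)

deparse(named)  # the c(x = ...) wrapper that leaks into the label
deparse(plain)  # just the quoted number

# Substituting each into a plotmath template shows the same difference
deparse(substitute(b %.% italic(x), list(b = named)))
deparse(substitute(b %.% italic(x), list(b = plain)))
```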
I also think you need to clarify exactly what you are hoping to see. At the moment you are creating an expression vector with two elements and then converting that to a character value. The fact that ggplot requires character values for its "expressions" makes it quite difficult to look at a character value and figure out what will be displayed, so you should probably expand your test code to include the manner in which this value will be delivered. (It's much easier to look at a real R expression.) I think there are mechanisms that allow unevaluated expressions to be passed to ggplot annotations and titles, but they seem incredibly convoluted to my eyes.
You could also use substitute(), which requires its env argument to be a list with named elements:
lm_eqn = function(m) {
l <- list(a = format(unname(coef(m))[1], digits = 2),
b = format(abs(unname(coef(m))[2]), digits = 2),
r2 = format(summary(m)$r.squared, digits = 3));
eq <- substitute( italic(C)[i] == a + b %.% italic(I)[i]*","~~italic(r)^2 == r2, env=l)
as.character(as.expression(eq));
}
lm_eqn(test.lm)
[1] "italic(C)[i] == \"3.3\" + \"0.37\" %.% italic(I)[i] * \",\" ~ ~italic(r)^2 == \"0.0969\""

Sampling a log-concave distribution using the adaptive rejection sampling method (R)

I am not very familiar with R. I have been trying to use the implementation of the adaptive rejection sampling method in R (the ars package) in order to sample from the following distribution, whose log-density is coded as f1 below.
Here is my R code:
library(ars)
g1 <- function(x,r){(1./r)*((1-x)^r)}
f1 <- function(x,a,k) {
add<-0
for(i in 1:k) {
add<- add+g1(x,i)
}
res <- (a* add)+(a-1)*log(x)+k*log(1-x)
return(res)
}
g2 <- function(x,r){(1-x)^(r-1)}
f1prima <- function(x,a,k) {
add<-0
for(i in 1:k) {
add<- add-g2(x,i)
}
res <- (a* add)+(a-1)/x-k/(1-x)
return(res)
}
mysample1<-ars(20,f1,f1prima,x=c(0.001,0.09),m=2,emax=128,lb=TRUE,xlb=0.0, ub=TRUE, xub=1,a=0.5,k=100)
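In formula form, reading them off the code with k the number of terms, the log-density coded in f1 and the derivative coded in f1prima are:

$$\log f(x) = a\sum_{i=1}^{k}\frac{(1-x)^{i}}{i} + (a-1)\log x + k\log(1-x)$$

$$\frac{d}{dx}\log f(x) = -a\sum_{i=1}^{k}(1-x)^{i-1} + \frac{a-1}{x} - \frac{k}{1-x}$$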
The function is log-concave, but I get different error messages when I run ars, and fiddling around with the input parameters doesn't help. Any suggestion would be appreciated.
First, as you already noticed, your log-density is not well defined at x = 0 and x = 1.0 (log(x) and log(1 - x) diverge there). So a useful interval would be something like 0.01...0.99, not 0.0...1.0.
Second, I don't like the idea of computing hundreds of terms in your summation.
So a good idea might be to express the sum in closed form, starting with the derivative.
$\sum_{i=0}^{N-1} q^i$ is obviously a geometric series and can be replaced with
$(1-q^N)/(1-q)$, where $q = 1-x$.
This is the derivative (with respect to q), so to get the corresponding term in the function itself, just integrate it.
http://www.wolframalpha.com/input/?i=integrate+(1-q%5EN)%2F(1-q)+dq returns the Gauss hypergeometric function 2F1 plus a logarithm:
$-\frac{q^{N+1}\,{}_2F_1(1, N+1; N+2; q)}{N+1} - \log(1-q)$
NB: It is the same integral as Beta before, but dealing with it was a bit more cumbersome
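Both identities can be checked numerically in base R. This is a sketch under stated assumptions: q = 0.99 and N = 100 are arbitrary test values, and 2F1(1, N+1; N+2; q) is evaluated from its power series (truncated at 5000 terms) as a stand-in for gsl's hyperg_2F1(), so no extra package is needed.

```r
q <- 0.99   # q = 1 - x, an arbitrary test point
N <- 100

# Geometric series: sum_{i=0}^{N-1} q^i versus (1 - q^N) / (1 - q)
geo_direct <- sum(q^(0:(N - 1)))
geo_closed <- (1 - q^N) / (1 - q)

# 2F1(1, N+1; N+2; q) from its power series: (1)_j / j! = 1 and
# (N+1)_j / (N+2)_j = (N+1) / (N+1+j); truncated at j = 5000
j <- 0:5000
f21 <- sum((N + 1) / (N + 1 + j) * q^j)

# Integrated sum: sum_{i=1}^{N} q^i / i versus the closed form
sum_direct <- sum(q^(1:N) / (1:N))
sum_closed <- -q^(N + 1) * f21 / (N + 1) - log(1 - q)

c(geo_direct - geo_closed, sum_direct - sum_closed)  # both close to 0
```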
So, code to compute those terms:
library(gsl)
library(ars)
library(ggplot2)
Gauss2F1 <- function(a, b, c, x) {
ifelse(x >= 0.0 & x < 1.0, hyperg_2F1(a, b, c, x), hyperg_2F1(c - a, b, c, 1.0 - 1.0/(1.0 - x))/(1.0 - x)^b)
}
f1sum <- function(x, N) {
q <- 1.0 - x
- q^(N+1) * Gauss2F1(1, N+1, N+2, q)/(N+1) - log(1.0 - q)
}
f1sum.1 <- function(x, N) {
q <- 1.0 - x
res <- rep(0.0, length.out = length(x))
s <- rep(1.0, length.out = length(x))
for(k in 1:N) {
s <- s * q # s now holds q^k
res <- res + s / k # add the k-th term q^k / k
}
res
}
f1 <- function(x, a, N) {
a * f1sum(x, N) + (a - 1.0)*log(x) + N*log(1.0 - x)
}
f1.1 <- function(x, a, N) {
a * f1sum.1(x, N) + (a - 1.0)*log(x) + N*log(1.0 - x)
}
f1primesum <- function(x, N) {
q <- 1.0 - x
# derivative w.r.t. x is -(sum_{i=0}^{N-1} q^i), because dq/dx = -1
-(1.0 - q^N)/(1.0 - q)
}
f1primesum.1 <- function(x, N) {
q <- 1.0 - x
res <- rep(0.0, length.out = length(x))
s <- rep(1.0, length.out = length(x))
for(k in 1:N) {
res <- res + s
s <- s * q
}
-res
}
f1prime <- function(x, a, N) {
a* f1primesum(x, N) + (a - 1.0)/x - N/(1.0 - x)
}
f1prime.1 <- function(x, a, N) {
a* f1primesum.1(x, N) + (a - 1.0)/x - N/(1.0 - x)
}
p <- ggplot(data.frame(x = c(0, 1)), aes(x = x)) +
stat_function(fun = f1, args = list(0.5, 100), colour = "#4271AE") +
stat_function(fun = f1.1, args = list(0.5, 100), colour = "#1F3552") +
scale_x_continuous(name = "X", breaks = seq(0, 1, 0.2), limits=c(0.001, 0.5)) +
scale_y_continuous(name = "F") +
ggtitle("Log-concave function")
p
As you can see, I've implemented both versions: one using direct summation and another using the analytical form of the sums. The data are computed for a = 0.5, N = 100.
First, there is a bit of a difference between the direct sum and 2F1; I attribute it to precision loss in the summation.
Second, and more important: the function is NOT log-concave, so it is no wonder that ars() fails left and right. See the plot produced by the code above.
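The lack of log-concavity can also be checked analytically. Differentiating the log-density twice gives

$$\frac{d^2}{dx^2}\log f(x) = a\sum_{i=1}^{N}(i-1)(1-x)^{i-2} - \frac{a-1}{x^2} - \frac{N}{(1-x)^2}$$

For a < 1 the middle term, (1 - a)/x², is positive and blows up as x goes to 0, while the first term is never negative, so the second derivative cannot be non-positive everywhere on (0, 1). The sketch below evaluates this expression for a = 0.5 and N = 100 (the values used above); the function name d2logf is made up for illustration.

```r
# Second derivative of the log-density from the question (sketch):
# a * sum_{i=1}^{N} (i - 1) (1 - x)^(i - 2) - (a - 1) / x^2 - N / (1 - x)^2
d2logf <- function(x, a = 0.5, N = 100) {
  i <- 1:N
  vapply(x, function(xx) {
    q <- 1 - xx
    a * sum((i - 1) * q^(i - 2)) - (a - 1) / xx^2 - N / (1 - xx)^2
  }, numeric(1))
}

# Log-concavity requires this to be <= 0 on the whole support
grid <- seq(0.01, 0.99, by = 0.01)
d2 <- d2logf(grid)
range(grid[d2 > 0])  # where the function is log-convex instead
```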

Inverse Association Rules

Association rules are a very common technique when you want to figure out which events happen together (like burgers and bread, which mostly sell together). In marketing, this technique is used to find complementary products.
I am looking for a technique to extract "substitute products"; to me it is like inverse association rules, finding out which events are less likely to happen together. Is there any algorithm or technique available in Spark, R, Python, etc. for this?
Thanks,
Amir
I've done a very practical R implementation of Substitution Rule Mining based on Teng, Hsieh and Chen (2002). Maybe it can help you:
# Used packages:
library(arules)
SRM <- function(TransData, MinSup, MinConf, pMin, pChi, itemLabel, nTID){
# Packages ----------------------------------------------------------------
if (sum(search() %in% "package:arules") == 0) {
stop("Please load package arules")
}
# Checking Input data -----------------------------------------------------
if (missing(TransData)) {
stop("Transaction data is missing")
}
if (is.numeric(nTID) == FALSE) {
stop("nTID has to be one numeric number for the count of Transactions")
}
if (length(nTID) > 1) {
stop("nTID has to be one number for the count of Transactions")
}
if (is.character(itemLabel) == FALSE) {
stop("itemLabel has to be a character")
}
# Concrete Item sets ---------------------------------------------------
# adding complements to transaction data
compl_trans <- addComplement(TransData,labels = itemLabel)
compl_tab <- crossTable(compl_trans,"support")
compl_tab_D <- as.data.frame(compl_tab)
# ordering matrix
compl_tab_D <- compl_tab_D[order(rownames((compl_tab))),order(colnames((compl_tab)))]
# Chi Value ---------------------------------------------------------------
# empty data frame for loop
complement_data <- data.frame(Chi = as.numeric(),
Sup_X.Y = as.numeric(),
X = as.character(),
Sup_X = as.numeric(),
Y = as.character(),
Sup_Y = as.numeric(),
CX = as.character(),
SupCX = as.numeric(),
CY = as.character(),
Sup_CY = as.numeric(),
Conf_X.CY = as.numeric(),
Sup_X.CY = as.numeric(),
Conf_Y.CX = as.numeric(),
SupY_CX = as.numeric())
# first loop for one item
for ( i in 1 : (length(itemLabel) - 1)) {
# second loop combines it with all other items
for (u in (i + 1) : length(itemLabel)) {
# getting chi value from Teng
a <- itemLabel[i]
b <- itemLabel[u]
ca <- paste0("!", itemLabel[i])
cb <- paste0("!", itemLabel[u])
chiValue <- nTID * (
compl_tab[ca, cb] ^ 2 / (compl_tab[ca, ca] * compl_tab[cb, cb]) +
compl_tab[ca, b] ^ 2 / (compl_tab[ca, ca] * compl_tab[b, b]) +
compl_tab[a, cb] ^ 2 / (compl_tab[a, a] * compl_tab[cb, cb]) +
compl_tab[a, b] ^ 2 / (compl_tab[a, a] * compl_tab[b, b]) - 1)
# condition to be dependent
if (compl_tab[a, b] > compl_tab[a, a] * compl_tab[b, b] && chiValue >= qchisq(pChi, 1) &&
compl_tab[a, a] >= MinSup && compl_tab[b, b] >= MinSup ) {
chi_sup <- data.frame(Chi = chiValue,
Sup_X.Y = compl_tab[a, b],
X = a,
Sup_X = compl_tab[a, a],
Y = b,
Sup_Y = compl_tab[b, b],
CX = ca,
SupCX = compl_tab[ca, ca],
CY = cb,
Sup_CY = compl_tab[cb, cb],
Conf_X.CY = compl_tab[a, cb] / compl_tab[a, a],
Sup_X.CY = compl_tab[a, cb],
Conf_Y.CX = compl_tab[ca, b] / compl_tab[b, b],
SupY_CX = compl_tab[ca, b])
try(complement_data <- rbind(complement_data, chi_sup))
}
}
}
if (nrow(complement_data) == 0) {
stop("No complement item sets could have been found")
}
# changing mode of X and Y to character
complement_data$X <- as.character(complement_data$X)
complement_data$Y <- as.character(complement_data$Y)
# calculating support for concrete itemsets with all others and their complements -------------------
## with complements
matrix_trans <- as.data.frame(as(compl_trans, "matrix"))
sup_three <- data.frame(Items = as.character(),
Support = as.numeric())
setCompl <- names(matrix_trans)
# 1. extract all items other than X, Y and their complements
for (i in 1 : nrow(complement_data)) {
value <- setCompl[ !setCompl %in% c(complement_data$X[i],
complement_data$Y[i],
paste0("!", complement_data$X[i]),
paste0("!",complement_data$Y[i]))]
# 2. calculation of support
for (u in value) {
count <- sum(rowSums(matrix_trans[, c(complement_data$X[i], complement_data$Y[i], u )]) == 3)
sup <- count / nTID
sup_three_items <- data.frame(Items = paste0(complement_data$X[i], complement_data$Y[i], u),
Support=sup)
sup_three <- rbind(sup_three, sup_three_items)
}
}
# Correlation of single items-------------------------------------------------------------
# all items of concrete itemsets should be mixed for correlation
combis <- unique(c(complement_data$X, complement_data$Y))
# empty object
rules<- data.frame(
Substitute = as.character(),
Product = as.character(),
Support = as.numeric(),
Confidence = as.numeric(),
Correlation = as.numeric())
# first loop for one item
for (i in 1 : (length(combis) - 1)) {
# second loop combines it with all other items
for (u in (i + 1) : length(combis)) {
first <- combis[i]
second <- combis[u]
corXY <- (compl_tab[first, second] - (compl_tab[first, first] * compl_tab[second, second])) /
(sqrt((compl_tab[first, first] * (1 - compl_tab[first,first])) *
(compl_tab[second, second] * (1 - compl_tab[second, second]))))
# confidence
conf1 <- compl_tab[first, paste0("!", second)] / compl_tab[first, first]
conf2 <- compl_tab[second, paste0("!", first)] / compl_tab[second, second]
two_rules <- data.frame(
Substitute = c(paste("{", first, "}"),
paste("{", second, "}")),
Product = c(paste("=>", "{", second, "}"),
paste("=>", "{", first, "}")),
Support = c(compl_tab[first, paste0("!", second)], compl_tab[second, paste0("!", first)]),
Confidence = c(conf1, conf2),
Correlation = c(corXY, corXY)
)
# conditions
try({
if (two_rules$Correlation[1] < pMin) {
if (two_rules$Support[1] >= MinSup && two_rules$Confidence[1] >= MinConf) {
rules <- rbind(rules, two_rules[1, ])
}
if (two_rules$Support[2] >= MinSup && two_rules$Confidence[2] >= MinConf) {
rules <- rbind(rules, two_rules[2, ])
}
} })
}
}
# Correlation of concrete item pairs with single items --------------------
# adding variable for loop
complement_data$XY <- paste0(complement_data$X, complement_data$Y)
# combination of items
for (i in 1 : nrow(complement_data)){
# set of combinations from dependent items with single items
univector <- c(as.vector(unique(complement_data$X)), as.vector(unique(complement_data$Y)))
univector <- univector[!univector %in% c(complement_data$X[i], complement_data$Y[i])]
combis <- c(complement_data[i,"XY"], univector)
for (u in 2 : length(combis)) {
corXYZ <-(sup_three[sup_three$Items == paste0(combis[1], combis[u]),2] -
complement_data[complement_data$XY == combis[1],"Sup_X.Y"] *
compl_tab[combis[u],combis[u]]) /
(sqrt((complement_data[complement_data$XY == combis[1],"Sup_X.Y"] *
(1 - complement_data[complement_data$XY == combis[1],"Sup_X.Y"]) *
compl_tab[combis[u],combis[u]] * (1 - compl_tab[combis[u],combis[u]]))))
dataXYZ <- data.frame(
Substitute = paste("{", combis[1], "}"),
Product = paste("=>", "{", combis[u], "}"),
Support = sup_three[sup_three$Items == paste0(combis[1], "!", combis[u]),2],
Confidence = sup_three[sup_three$Items == paste0(combis[1], "!", combis[u]),2] /
complement_data[complement_data$XY == combis[1],"Sup_X.Y"],
Correlation = corXYZ)
# conditions
if (dataXYZ$Correlation < pMin && dataXYZ$Support >= MinSup && dataXYZ$Confidence >= MinConf) {
try(rules <- rbind(rules, dataXYZ))
}
}
}
if (nrow(rules) == 0) {
message("Sorry no rules could have been calculated. Maybe change input conditions.")
} else {
return(rules)
}
# end
}
A more detailed explanation is on my blog:
http://mattimeyer.github.io/2016-12-21-Substitution-Rule-Mining/
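To make the pairwise step of SRM() concrete, here is a stripped-down base-R sketch that works on a plain 0/1 transaction matrix instead of arules objects. It computes the chi-square statistic in the same form as chiValue, the correlation in the same form as corXY, and the support and confidence of "X bought, Y not bought". The toy transactions and item names are made up for illustration, and the sketch leaves out the three-item (concrete itemset) part of the original function.

```r
# Hypothetical toy data: 8 transactions x 3 items (1 = item bought)
trans <- matrix(c(1, 0, 1,
                  1, 0, 0,
                  0, 1, 1,
                  0, 1, 0,
                  1, 0, 1,
                  0, 1, 1,
                  1, 0, 0,
                  0, 1, 0),
                ncol = 3, byrow = TRUE,
                dimnames = list(NULL, c("coke", "pepsi", "chips")))

# Support of "x present/absent and y present/absent"
sup <- function(trans, x, y, x_in = TRUE, y_in = TRUE) {
  mean(((trans[, x] == 1) == x_in) & ((trans[, y] == 1) == y_in))
}

# Chi-square of the 2x2 table built from supports (same form as chiValue)
item_chi <- function(trans, x, y) {
  total <- 0
  for (x_in in c(TRUE, FALSE)) {
    for (y_in in c(TRUE, FALSE)) {
      px <- mean((trans[, x] == 1) == x_in)
      py <- mean((trans[, y] == 1) == y_in)
      total <- total + sup(trans, x, y, x_in, y_in)^2 / (px * py)
    }
  }
  nrow(trans) * (total - 1)
}

# Correlation of two items from their supports (same form as corXY)
item_corr <- function(trans, x, y) {
  sx <- mean(trans[, x])
  sy <- mean(trans[, y])
  (sup(trans, x, y) - sx * sy) / sqrt(sx * (1 - sx) * sy * (1 - sy))
}

# Candidate substitution rule x => y: correlation below p_min and enough
# support/confidence for "x bought, y not bought"
is_substitute <- function(trans, x, y, min_sup, min_conf, p_min) {
  s <- sup(trans, x, y, TRUE, FALSE)
  conf <- s / mean(trans[, x])
  item_corr(trans, x, y) < p_min && s >= min_sup && conf >= min_conf
}

item_chi(trans, "coke", "pepsi") >= qchisq(0.95, 1)  # dependent pair?
is_substitute(trans, "coke", "pepsi", min_sup = 0.2, min_conf = 0.5, p_min = 0)
```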

Simplifying Output of Deriv and Extracting the Coefficients in R

I have two calculations for partial derivatives of an equation in R.
p_deriv_m <- Deriv(eqn, 'm')
#"-(2 * (6 - (b + m)) + 4 * (5 - (2 * m + b)) + 6 * (7 - (3 * m + b)) + 8 * (10 - (4 * m + b)))"
p_deriv_b <- Deriv(eqn, 'b')
#"-(2 * (10 - (4 * m + b)) + 2 * (5 - (2 * m + b)) + 2 * (6 - (b + m)) + 2 * (7 - (3 * m + b)))"
I would like to...
(1) Simplify these equations into something of the form (making up the coefficients here) p_deriv_m = 8 + 9b - 10m and p_deriv_b = 10 + 15b + 8m
(2) Extract the Coefficients from these partial derivative equations so I can solve for m, b when the partial derivatives both equal 0. Using the examples I made up in (1) above...
9b - 10m = -8
15b + 8m = -10
Pop those numbers into a matrix and solve it as in this solution (Solving simultaneous equations with R), outputting m and b.
If anyone knows how I can do (1) and/or (2), help would be greatly appreciated.
Rest of my code for reference:
library(Ryacas)
library(Deriv)
x_p <- c(1,2,3,4)
y_p <- c(6,5,7,10)
# Turn m and b into symbols
b <- Ryacas::Sym("b")
m <- Ryacas::Sym("m")
# Create a function
rss <- function(b,m,x_points, y_points)
(y_points[1] - (b + x_points[1]*m))^2 +
(y_points[2] - (b + x_points[2]*m))^2 +
(y_points[3] - (b + x_points[3]*m))^2 +
(y_points[4] - (b + x_points[4]*m))^2
# Create the equation
eqn <- rss(b,m,x_p,y_p)
p_deriv_m <- Deriv(eqn, 'm')
p_deriv_b <- Deriv(eqn, 'b')
ANSWER: I ended up doing it manually.
sls_manual.R
# Doing a linear regression manually - want to find m and b
# Such that rss is minimized
library(Ryacas)
library(Deriv)
source('get_coeff.R')
# Sample Points - keeping the number of points small for now for
# the purposes of this example
x_p <- c(1,2,3,4)
y_p <- c(6,5,7,10)
b <- Ryacas::Sym("b")
m <- Ryacas::Sym("m")
# Create a function
rss <- function(b,m,x_points, y_points)
(y_points[1] - (b + x_points[1]*m))^2 +
(y_points[2] - (b + x_points[2]*m))^2 +
(y_points[3] - (b + x_points[3]*m))^2 +
(y_points[4] - (b + x_points[4]*m))^2
# Create the equation
eqn <- rss(b,m,x_p,y_p)
p_deriv_m <- Deriv(eqn, 'm')
p_deriv_b <- Deriv(eqn, 'b')
simplified_m_deriv <- yacas(Expand(p_deriv_m))
simplified_b_deriv<- yacas(Expand(p_deriv_b))
row_1_coeff <- get_coefficients(simplified_m_deriv)
row_2_coeff <- get_coefficients(simplified_b_deriv)
r_1_coeff <- c(row_1_coeff[[1]][1], row_1_coeff[[2]][1], row_1_coeff[[3]][1])
r_2_coeff <- c(row_2_coeff[[1]][1], row_2_coeff[[2]][1], row_2_coeff[[3]][1])
A <- matrix(data=c(r_1_coeff[1], r_1_coeff[2]
,r_2_coeff[1], r_2_coeff[2])
,nrow=2, ncol=2, byrow=TRUE)
b <- matrix(data=c((-1*r_1_coeff[3]),(-1*r_2_coeff[3]))
,nrow=2, ncol=1, byrow=TRUE)
result <- solve(A,b)
m_coeff = result[1]
b_coeff = result[2]
# Last step is to verify that this does the same thing as lm:
# fit <- lm(y_p ~ x_p)
# fit
get_coeff.R
get_coefficients <- function(exp) {
# Take out the whitespace
g <- gsub(" ", "", as.character(exp))
# Sub the minuses for a +-
g2 <- gsub("-", "+-", g)
g3 <- gsub("[()]", "", g2)
# break at the plusses
g4 <- strsplit(g3, "[//+]")
b_coeff = 0
m_coeff = 0
other_coeff = 0
i = 1
while(i <= 3)
{
piece <- as.character(g4[[1]][i])
contains_b = grepl("b",piece)
contains_m = grepl("m",piece)
contains_both = contains_b & contains_m
if (contains_b == TRUE){
b_coeff = as.numeric(gsub("[//*b|b//*]", "", piece))
} else if (contains_m == TRUE){
m_coeff = as.numeric(gsub("[//*m|m//*]", "", piece))
} else if (contains_both == FALSE) {
other_coeff = as.numeric(piece)
} else {
}
i = i + 1
}
output <- list(m_coeff,b_coeff, other_coeff)
return(output)
}
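For this particular problem the symbolic step can also be skipped. Setting both partial derivatives of the RSS to zero gives the normal equations, which are linear in m and b:

$$m\sum x_i^2 + b\sum x_i = \sum x_i y_i, \qquad m\sum x_i + b\,n = \sum y_i$$

The expanded Deriv output is these equations up to a factor of 2, which does not change the solution. A base-R sketch that builds and solves this 2x2 system for the same x_p and y_p and cross-checks it against lm():

```r
x_p <- c(1, 2, 3, 4)
y_p <- c(6, 5, 7, 10)

# Normal equations: [sum(x^2) sum(x); sum(x) n] %*% c(m, b) = c(sum(x*y), sum(y))
A <- matrix(c(sum(x_p^2), sum(x_p),
              sum(x_p),   length(x_p)),
            nrow = 2, byrow = TRUE)
rhs <- c(sum(x_p * y_p), sum(y_p))

sol <- solve(A, rhs)  # c(m, b)
m_coeff <- sol[1]
b_coeff <- sol[2]

# Cross-check against lm(): coef() returns c(intercept, slope)
fit <- lm(y_p ~ x_p)
all.equal(unname(coef(fit)), c(b_coeff, m_coeff))
```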

Writing the results from a nested loop into another vector in R

I'm pretty new to R, and am struggling a bit with it. I have the following code:
repeat {
if (t > 1000)
break
else {
y1 <- rpois(50, 15)
y2 <- rpois(50, 15)
y <- c(y1, y2)
p_0y <- matrix(nrow = max(y) - min(y), ncol = 1)
i = min(y)
while (i <= max(y)) {
p_0y[i - min(y), ] = (length(which(y1 == i))/50)
i <- i + 1
}
p_y <- matrix(nrow = max(y) - min(y), ncol = 1)
j = min(y)
while (j <= max(y)) {
p_y[j - min(y), ] = (length(which(y == j))/100)
j <- j + 1
}
p_0yx <- p_0y[rowSums(p_0y == 0) == 0]
p_yx <- p_y[rowSums(p_0y == 0) == 0]
g = 0
logvect <- matrix(nrow = (length(p_yx)), ncol = 1)
while (g <= (length(p_yx))) {
logvect[g, ] = (p_0yx[g])/(p_yx[g])
g <- g + 1
}
p_0yx %*% (log2(logvect))
print(p_0yx %*% (log2(logvect)))
t <- t + 1
}
}
I am happy with everything up to the last line, but instead of printing the value of p_0yx %*% (log2(logvect)) to the screen, I would like to store it in another vector. Any ideas? I have tried doing it in a similar way as in the nested loop, but it doesn't seem to work.
Thanks
The brief answer is to first declare a variable. Put it before everything you've posted here. I'm going to call it temp. It will hold all of the values.
temp <- numeric(1000)
Then, instead of your print line use
temp[t] <- p_0yx %*% log2(logvect)
As an aside, your code is doing some weird things. Look at the first index of p_0y: when i equals min(y), i - min(y) is 0, so you are assigning to row 0 of that matrix, which R silently ignores because R starts indexing at 1. When you create the number of rows in that matrix you use max(y) - min(y). If the max is 10 and the min is 1, then there are only 9 rows. I'm betting you really wanted to add one. Also, your code is very un-R-like with all of the unnecessary while loops. For example, your whole last loop (and the initialization of logvect) can be replaced with:
logvect = (p_0yx)/(p_yx)
But back to the errors, and some more R-ness: could the following code...
p_0y <- matrix(nrow = max(y) - min(y), ncol = 1)
i = min(y)
while (i <= max(y)) {
p_0y[i - min(y), ] = (length(which(y1 == i))/50)
i <- i + 1
}
be replaced, more correctly, with this?
p_0y <- numeric(max(y) - min(y) + 1)
p_0y[sort(unique(y1)) - min(y) + 1] = table(y1)/50
p_0y <- matrix(p_0y, ncol = 1)
(similar rethinking of the rest of your code could eliminate the rest of the loops as well)
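Following that idea, here is one possible loop-free sketch of the whole simulation. It assumes the goal is the quantity computed by p_0yx %*% log2(logvect), i.e. the sum of p_0y * log2(p_0y / p_y) over values where p_0y > 0 (which has the form of a Kullback-Leibler divergence in bits). The helper name one_run is made up. Counts are taken over the full range min(y):max(y), so the off-by-one in the row count does not arise, and vapply() collects the 1000 results into a vector instead of printing them.

```r
set.seed(42)  # only to make the run reproducible

# One simulation: compare the distribution of y1 with that of y = c(y1, y2)
one_run <- function(n = 50, lambda = 15) {
  y1 <- rpois(n, lambda)
  y2 <- rpois(n, lambda)
  y <- c(y1, y2)
  vals <- min(y):max(y)  # every value in the range, endpoints included
  p_0y <- tabulate(match(y1, vals), nbins = length(vals)) / length(y1)
  p_y  <- tabulate(match(y, vals),  nbins = length(vals)) / length(y)
  keep <- p_0y > 0       # mirrors the rowSums(p_0y == 0) == 0 filter
  sum(p_0y[keep] * log2(p_0y[keep] / p_y[keep]))
}

# Store all 1000 results in a vector instead of printing them
temp <- vapply(seq_len(1000), function(i) one_run(), numeric(1))
summary(temp)
```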
