I am a beginner in R. Here's the formula I'm trying to code to find the lambda that maximizes the log likelihood of some bigrams. When the bigrams are not found, the P_b (bigram) function fails, but the P_u (unigram) function should provide the unigram result (lambda = 0).
It works for bigrams that are found. When they're not found, tho, I only get numeric(0), not the unigram result.
p.mix <- function(w2, w1) {
(1-lambda) * uni.dfrm$prob[uni.dfrm$token==w2] + lambda * p.bi(w2,w1)
}
The p.bi() function looks complicated because of the indexing so I'm reluctant to post it but it does work when the bigrams are found. It just looks up the count of times w' appears after w and divides it by the times w appears, but I have to go through another vector of vocabulary words so it looks ugly.
When w' is never found occurring after w, instead of a zero count, there's no row at all, which is what apparently causes the numeric(0) result. That's what the mixed model is supposed to solve, but I can't get it to work. Any ideas how this can work?
You can add a test for the case where w2 is numeric(0) for example :
p.mix <- function(w2, w1) {
if(length(w2)>0){
res <- (1-lambda) * uni.dfrm$prob[uni.dfrm$token==w2] +
lambda * p.bi(w2,w1)
}else res <- 0
res
}
EDIT
p.mix <- function(w2, w1) {
if(length(w2) && length(uni.dfrm$prob[uni.dfrm$token==w2]) > 0)
(1-lambda) * uni.dfrm$prob[uni.dfrm$token==w2] + lambda * p.bi(w2,w1)
else 0
}
Related
I am trying to integrate the next function with respect x
integrand <- function(x) {
f1 <- pnorm((1/sqrt(u/x))*( sqrt((t*u*v)/x) - sqrt(x/(t*u*v)) ))}
where,
v=10
u=5
However, I need to integrate considering different values of t, so tried defining a sequence of values as:
t=seq(0,100,0.1)
And used the sapply function as:
data=sapply(t, function(x) integrate(integrand, lower = 0 , upper = 10000)$value )
I got these errors:
Error in integrate(integrand, lower = 0, upper = 10000) :
evaluation of function gave a result of wrong length
In addition: Warning messages:
1: In (t * u * v)/x : longer object length is not a multiple of shorter object length
2: In x/(t * u * v) : longer object length is not a multiple of shorter object length
3: In (1/sqrt(u/x)) * (sqrt((t * u * v)/x) - sqrt(x/(t * u * v))) :
longer object length is not a multiple of shorter object length
I haven't had any luck.
I would greatly appreciate any help.
Regards!
You can still use sapply like so:
sapply(t, function(t) {
integrate(function(x) {
pnorm((1/sqrt(u/x))*( sqrt((t*u*v)/x) - sqrt(x/(t*u*v)) ))
}, lower = 0, upper = 1000)$value
})
Output
[1] 0.000000 5.416577 10.251273 15.146418 20.084907 25.049283 ...
A previous post have a similar problem with an specific solution here
the code would result as:
t=seq(0,100,0.1)
fu<- list()
int<- numeric()
for(i in 1:length(t))
{
fu[[i]] = function(x){
f1 <- pnorm((1/sqrt(u/x))*( sqrt((t[i]*u*v)/x) - sqrt(x/(t[i]*u*v)) ));
}
int[i] = integrate(h[[i]], lower=0, upper=1000)$value
}
int
my project needs the EM algorithm below, where is all the code. The error is in the while loop, which is where the hope and maximization steps are. The error message is "Error in while (abs (Elogv [r] - Elogv [r - 1])> = 1e-06) {: missing value where TRUE / FALSE needed". How do I resolve this error if the while loop contains no true and false commands, and if I have already checked in detail that there are no errors in the commands and no NA's value? Grateful for the attention, who can save me.
n=100
u<-runif(n)
QUANTIL <- function(u){
Q <- rep(NA, length(u))
for (i in 1:length(u)) {
if(u[i] < 0.2634253829){
Q[i] <- 1*tan(pi*(0.9490353482*u[i]-0.5))+0
}
if(u[i]>=0.2634253829 && u[i] < 0.7365746171){
Q[i] <- 1*qnorm(1.4428629504*u[i]-0.2214315)+0
}
if(u[i]>0.7365746171){
Q[i] <- 1*tan(pi*(0.9490353482*u[i]-0.4490353))+0
}
}
return(Q)
}
x<-QUANTIL(u)
y<-c(sort(x))
i<-seq(1,n)
v<-c(i/(n+1))
t<-QUANTIL(v)
mi<-median(y)
s<-c(y[26:73])
sigma<-sqrt(sum((s-mi)^2)/(n-1))
p=0.4731492342
alpha<-(2*t^3)/(1+t^2)^2
beta<-(1-t^2)/(1+t^2)^2
eta<-(t^4-t^2)/(1+t^2)^2
lambda<-2*t/(1+t^2)^2
gama<-(-t^2)
delta<-2*t
k<-((p*0.6930665173/sigma*sqrt(2*pi))*exp((-1/2*sigma^2)*((y-mi)^2)))/(((p*0.6930665173/sigma*sqrt(2*pi))*exp((-1/2*sigma^2)*(y-mi)^2))+((((1-p)*1.0537015317/sigma*pi))*(1/(1+((y-mi)/sigma)^2))))
r<-2
Elogv<-sum(k*((-1/2)*((y-mi)/sigma)^2))-sum(k*log(sigma*sqrt(2*pi)))-sum((1-k)*log(sigma*pi))-sum((1-k)*log(1+((y-mi)/sigma)^2))+sum(k*log(p))+(n-sum(k))*log(1-p)+log(0.6930665173)*sum(k)+log(1.0537015317)*sum(1-k)
Elogv[1]<-0
while (abs(Elogv[r]-Elogv[r-1])>=0.000001) {
w<-(2*beta-2*k*beta+k)
q<-k*delta+2*lambda*(1-k)
sigma<-(sum(y*w)*sum(q)-sum(w)*sum(y*q))/(-2*sum(alpha*(1-k))*sum(q)+sum(w)*sum(k*gama-1)+2*sum(w)*sum(eta*(1-k)))
mi<-(sum(y*w)+2*sigma*sum(alpha*(1-k)))/sum(w)
k<-((p*0.6930665173/sigma*sqrt(2*pi))*exp((-1/2*sigma^2)*((y-mi)^2)))/(((p*0.6930665173/sigma*sqrt(2*pi))*exp((-1/2*sigma^2)*(y-mi)^2))+((((1-p)*1.0537015317/sigma*pi))*(1/(1+((y-mi)/sigma)^2))))
Elogv[r]<-sum(k*((-1/2)*((y-mi)/sigma)^2))-sum(k*log(sigma*sqrt(2*pi)))-sum((1-k)*log(sigma*pi))-sum((1-k)*log(1+((y-mi)/sigma)^2))+sum(k*log(p))+(n-sum(k))*log(1-p)+log(0.6930665173)*sum(k)+log(1.0537015317)*sum(1-k)
r<-r+1
It looks to me that the length of Elogv is 1? Thus Elogv[r] has no entry (r is 2!), i.e. evaluates to NA, thus the abs(Elogv[r]-Elogv[r-1]) is NA.
You need Elogv[2] <- 0 before starting the loop?
I am experimenting with the functional programming paradigm in R. I have defined a function that sums a sequence of integers from n to m. When I use sum() the function returns the expected result:
sumRange <- function(n, m) {
if (n <= m) {
return(sum(n, sumRange((n + 1), m)))
}
}
sumRange(1, 10)
# [1] 55
However, when I use the + operator the function returns numeric(0):
sumRange <- function(n, m) {
if (n <= m) {
return(n + sumRange((n + 1), m))
}
}
sumRange(1, 10)
# numeric(0)
Why does the operator + not work in this recursive function? Is there a way to rewrite the function so that it does?
The issue is that you never specify an else condition, hence at the end of the recursion it appears that R is returning NULL when the if condition fails. Returning 0 as the else condition fixes your problem:
sumRange <- function(n, m) return(ifelse (n <= m, (n + sumRange((n+1), m)), 0))
sumRange(1, 10)
[1] 55
Note that this is essentially defining a base case for your recursion. A base case, when hit, ends the recursion and causes the calls on the stack to be unwound.
To see the issue with the way you phrased your code, try writing out your function explicitly:
sumRange <- function(n, m) {
if (n <= m) {
return(n + sumRange((n+1), m))
}
// but what gets returned if n > m ?
// this is undefined behavior
}
I'm not an R guru, but my understanding is that R was written in C, and C might allow a recursion like this with no else condition. But the behavior is not well defined and you should not be relying on it.
Demo
If there is no return (using a explicit or implicit return statement) is executed, then R functions seems to return a NULL object.
If you add numerical value to a this object, it will simply return numeric(0).
So, what happens in the second case is that when n reaches 11, it returns a NULL object, and goes back adding values to it. But NULL + 10 + 9 .. = numeric(0).
Check this with
no_ret <- function ()
{
# just return nothing
}
obj <- no_ret()
obj
# NULL
class(obj)
# "NULL
new_obj <- obj + 10
new_obj
# numeric(0)
When the first function is executed, the what the sum statement get is
a vector with a NULL in it. For example,
vec <- c(NULL, 10, 9,...) which is actually vec <- c(10, 9, ...), so you get the expected outcome.
> c(NULL, 10:1)
[1] 10 9 8 7 6 5 4 3 2 1
> sum(NULL, 10:1)
[1] 55
> NULL + 10:1
integer(0)
I did some programming work on R language to do the bubble sort. Sometimes it works perfectly without any error message, but sometimes, it shows "Error in if (x[i] > x[i + 1]) { : argument is of length zero". Can any one help me check whats wrong with it? I have attached my code below
example <- function(x) {
n <- length(x)
repeat {
hasChanged <- FALSE
n <- n - 1
for(i in 1:n) {
if ( x[i] > x[i+1] ) {
temp <- x[i]
x[i] <- x[i+1]
x[i+1] <- temp
hasChanged <- TRUE
cat("The current Vector is", x ,"\n")
}
}
if ( !hasChanged ) break;
}
}
x <-sample(1:10,5)
cat("The original Vector is", x ,"\n")
example(x)
The error occurs because you are iteratively decreasing n. Depending on the original vector's order (or lack thereof), n can reach the value of 1 after the last change. In that case, a further reduction of n in the next iteration step addresses the value x[0], which is undefined.
With a minimal correction your code will work properly, without giving error messages. Try to replace the line
if ( !hasChanged ) break;
with
if ( !hasChanged | n==1 ) break
Basically you have two termination criteria: Either nothing has been changed in the previous iteration or n is equal to one. In both cases, a further iteration won't change the vector since it is already ordered.
By the way, in R programming you don't need a semicolon at the end of a command. It is tolerated/ignored by the interpreter, but it clutters the code and is not considered good programming style.
Hope this helps.
I continue to get an error on my function, possibly I'm overlooking something simple. I cannot run the code without getting an error when applying the function.
k.nn <- function(k,p1,p) {
k > 0
K <-length(k)
p=matrix()
for (i in p) {
matrix <- cbind(p,p1[1],p1[2])
d <- sqrt((matrix[,1]-matrix[,3])^2+(matrix[,2]-matrix[,4])^2)
}
##use the sort function to find the smallest distance from 1:k and return all nearest k values
sort.d <- function(x) { #implement bubble sort
N=length(x)
N>0
c=class(x)
for (n in length(x):2) { #distinguish the last term in the vector, name it, much be of x length, consists an error of length 1. Error if you compute n in length(x):1, cover length of 1
if(length(x)<2)
return(x)
for (m in 1:(n - 1)) { #distinguish the first term in the vector, name it
if(x[m]>x[m + 1]) { #begin comparing each term to neighboring term
swap<-x[m]
x[m]<-x[m + 1]
x[m + 1]<-swap
}
}
}
return(x)
}
sorted=sort.d(d)
for (n in k){
print(sorted[1:k])}
}
p=matrix(c(6.9,7.6,7.1,.4,6.2,1.8,2.5,2.3,5.7,6.9,.9,4.4,5.2,1.9,.6,7.4,1.2,6.6,3.3,4.9),nrow=10,ncol=2) #given matrix
p1=c(6,6)
k=3 nn.3=k.nn(k,p1,p)
print(nn.3)
There's a missing carriage return or ";" in the penultimate line that is throwing an error. If you remove tha last line so that you can use traceback() it tells you that k.nn throws a " subscript out of bounds" error when a matrix index is 4.
Debugging 101 tells you to put in print functions to see where the function fails and putting in a print after
c=class(x) ; print(c)
... ives you a result, but putting another one in the sort.d function does not get executed. Looking at the code upstream from that point we see:
d <- sqrt((matrix[,1]-matrix[,3])^2+(matrix[,2]-matrix[,4])^2)
So looking at the function and the matrix you have given, ... my guess is that you passed a two-column matrix to a function that expected a four-column argument.