I am trying to plot a histogram from R output that is not a data frame. Below is my code and the output.
x <- replicate(1000, {
  y <- rpois(200, 1)
  lambda0 <- 1
  for (i in 1:1) {
    if (i == 1) cat(sprintf("%15s %15s %15s %15s\n", "LogL", "Score", "Information", "New Estimate"))
    logL <- sum((-lambda0) + y * log(lambda0))
    score <- sum((y / lambda0) - 1)
    information <- sum(y / lambda0^2)
    lambda1 <- lambda0 + score / information
    cat(sprintf("%15.4f %15.4f %15.4f %15.5f\n", logL, score, information, lambda1))
    lambda0 <- lambda1
  }
})
Below is my output.
I'm trying to take the New Estimate values from the output and create a histogram. Can you please help?
Thank you.
You need to store the value of New Estimate during your loop so you can retrieve the results after the loop has finished. Normally when using a loop, you create a variable in advance in which you save the result of each iteration. E.g.:
numbers <- 1:3
result <- vector("list", length(numbers))  # pre-allocate the list
for (i in seq_along(numbers)) {
  result[[i]] <- numbers[[i]] + 1
}
In this example there is a vector of three numbers, and you want to add one to each number and save the result. You do this by creating a list of length 3 (pre-allocating the length is better, but not strictly necessary) and, in the i-th iteration, saving the result in the i-th element of the list.
After the loop finishes you can retrieve the results from the result variable; the i-th result is accessed with double square brackets: result[[i]].
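Applied to your code, a minimal sketch might look like this: it keeps your Newton-Raphson step and lets each replicate return its final lambda1, since replicate() stores the value of the last expression in the block.
# A minimal sketch: the last expression in the block is what
# replicate() stores for each of the 1000 runs.
x <- replicate(1000, {
  y <- rpois(200, 1)
  lambda0 <- 1
  for (i in 1:1) {
    score <- sum((y / lambda0) - 1)
    information <- sum(y / lambda0^2)
    lambda0 <- lambda0 + score / information
  }
  lambda0  # the New Estimate from this replicate
})
hist(x, main = "Histogram of New Estimates", xlab = "lambda")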
I have a simple issue with a for loop in R: I am trying to make it run over the entire dataset, but it only returns a result for the last row. I'm working with quite complex datasets, both of which are shapefiles, and I am testing the intersection of the geometries, which is why I can't easily make a reproducible example here.
Nevertheless, this is my code:
for (i in 1:nrow(data1)) {
  # get intersections between data2 and data1 for specific years
  output = st_join(
    x = data1[i, ],
    y = data2[which(data2$year %in% data1$lag.year[i]:data1$year[i]), ],
    join = st_intersects
  )
  # get area of intersections
  output = transform(output,
                     inter_area = mapply(function(x, y) {
                       as.numeric(sf::st_area(sf::st_intersection(x, y)))
                     }, x = geometry, y = geom_2))
  ## obtaining the proportion of area in data1 intersected by data2
  output = transform(output, prop_inter = inter_area / area)
  # get cycle-specific values
  output <- output %>%
    group_by(code, year.x) %>%
    dplyr::summarise(prop_inter = sum(prop_inter),
                     end_date = max(end_date),
                     start_date = max(start_date)) %>%
    ungroup()
  return(output)
}
As you can see, I am testing the intersections of data2 with data1 to see what percentage of data1 is intersected, depending on the values of year and lag.year. The issue is that when I run this, it only returns the desired outcome for the last row instead of for the entire data1 object. I've tested all the different bits of code inside the loop separately and they all do what I want, but once I try to scale it up to the entire data frame it only works for the last row.
So I assume this must be some simple mistake I am making in the loop.
Thanks!
You keep overwriting the output object; you may want to create a list of length nrow(data1) and assign each result to its i-th element. I don't think this relates to {sf} or GIS in general; it is more about how for loops and assignment work in R. Consider this example:
for (i in 1:50) {
output <- i # rewriting output object 50 times
}
print(output) # this will be a single element for last row (50)
output <- numeric(50)
for (i in 1:50) {
output[i] <- i # storing result in a new element of output for each i
}
print(output) # this will be 1:50 as expected
You may want to consider something along these lines (hard to be certain without access to your data, but it should get you started):
result <- vector("list", nrow(data1)) # init a list, since each iteration yields a data frame
for (i in 1:nrow(data1)) {
  # get intersections between data2 and data1 for specific years
  output = st_join(
    x = data1[i, ],
    y = data2[which(data2$year %in% data1$lag.year[i]:data1$year[i]), ],
    join = st_intersects
  )
  # get area of intersections
  output = transform(output,
                     inter_area = mapply(function(x, y) {
                       as.numeric(sf::st_area(sf::st_intersection(x, y)))
                     }, x = geometry, y = geom_2))
  ## obtaining the proportion of area in data1 intersected by data2
  output = transform(output, prop_inter = inter_area / area)
  # get cycle-specific values
  result[[i]] <- output %>% # store in the i-th element of result instead
    group_by(code, year.x) %>%
    dplyr::summarise(prop_inter = sum(prop_inter),
                     end_date = max(end_date),
                     start_date = max(start_date)) %>%
    ungroup()
  # return(output) # no need for return unless you are in a function
}
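After the loop, the per-row summaries can be stacked into a single data frame; a minimal sketch, assuming each element of result is a compatible data frame:
# Combine the stored summaries: dplyr::bind_rows() stacks a list of
# data frames with matching columns into one data frame.
final <- dplyr::bind_rows(result)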
I am a beginner with R, so hopefully this will be an easy fix.
I am trying to use a for loop on a dataset for neuron firing direction in order to:
Incrementally add the next value from the dataset to a vector
Run a Rayleigh test on that vector and save it to a variable
Test if the Rayleigh test I just ran has a larger statistic than the Rayleigh test from the previous loop iteration, as well as a p-value of less than .05
If the value is larger, save the statistic value, so that the next loop can compare to it
If the value is larger, save the vector
So far I have this code, and after going through it for a long time I'm at a loss as to why it's not working. Every time I run it, the for loop goes all the way to the end and just reports the Rayleigh value and vector for the whole dataset, which I know for sure isn't correct.
(I'm using the circular package for the Rayleigh test function.)
# This first line just creates a low initial Rayleigh statistic to compare against in the loop
best_rayleigh <- rayleigh.test(1:10)
data_vector <- c()
for (i in firing_directions) {
  data_vector <- append(data_vector, i)
  ray_lee_test <- rayleigh.test(data_vector)
  if ((ray_lee_test$statistic > best_rayleigh$statistic) & (ray_lee_test$p.value <= .05)) {
    best_rayleigh <- ray_lee_test
    best_rayleigh_vector <- data_vector
  } else {
    NULL
  }
}
Any help is appreciated. Thank you!
Update: I tried using && instead of a single & in the if statement; however, it returned the same result.
The following code doesn't give warnings and selects the vector with the highest test statistic and a "significant" p-value:
library(circular)
set.seed(2020)
firing_directions <- rvonmises(n = 25, mu = circular(pi), kappa = 2)
plot(firing_directions)
best_rayleigh <- rayleigh.test(circular(1:10))
for (i in seq_along(firing_directions)) {
  dv <- firing_directions[seq_len(i)]
  rltest <- rayleigh.test(dv)
  if ((rltest$statistic > best_rayleigh$statistic) && (rltest$p.value <= 0.05)) {
    best_rayleigh <- rltest
    best_rayleigh_vector <- dv
  }
}
best_rayleigh
#
# Rayleigh Test of Uniformity
# General Unimodal Alternative
#
#Test Statistic: 0.8048
#P-value: 0.0298
best_rayleigh_vector
#Circular Data:
#Type = angles
#Units = radians
#Template = none
#Modulo = asis
#Zero = 0
#Rotation = counter
#[1] 4.172219 2.510826 2.997495 4.095335 3.655613
I think the NULL is throwing up some issues; I'm not sure what happens when you throw a bare NULL in there. You only need to update the vector when it passes your criteria:
library(circular)
firing_directions <- rvonmises(n = 25, mu = circular(pi), kappa = 2)
best_rayleigh <- rayleigh.test(1:10)
data_vector <- c()
for (i in firing_directions) {
  data_vector <- c(data_vector, i)
  ray_lee_test <- rayleigh.test(data_vector)
  if ((ray_lee_test$statistic > best_rayleigh$statistic) & (ray_lee_test$p.value <= .05)) {
    best_rayleigh <- ray_lee_test
    best_rayleigh_vector <- data_vector
  }
}
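As an aside on the update about && versus &: both operands in these conditions are length-one logical values, and for scalars the two operators give the same result, which is why switching made no difference. For example:
# With length-one operands, & and && behave identically:
(0.9 > 0.5) & (0.03 <= 0.05)   # TRUE
(0.9 > 0.5) && (0.03 <= 0.05)  # TRUE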
I have a maximum likelihood estimation function and I am using the optim function. I would like to plot the iteration steps against the log-likelihood values.
Here is a very similar example to my more complex function:
y <- rnorm(1000, 2, 2)
myfunc <- function(xvec){
  fn <- function(theta) { sum(0.5 * (xvec - theta[1])^2 / theta[2] + 0.5 * log(theta[2])) }
  optim(c(0, 5), fn, hessian = TRUE, method = "L-BFGS-B", lower = c(0, 0), control = list(trace = 1))
}
The output is:
iter 10 value 12.001318
final value 12.001318
iter 10 is the iteration step.
value 12.001318 is the log-likelihood value.
My function returns 100 of them. I know that I need to store them first and then plot them, but how do I do that in R?
Any help, please?
There are a couple of options. Option 1: add REPORT = 1 to the control list and the function value will be printed at each step. You would then have to process this printed output somehow, for example by capturing it with sink() and removing the extra text. Option 2: run optim one iteration at a time, storing the log-likelihood value at each step, and then plot the stored values. Partial code for both options is presented below.
# generating random values
set.seed(10)
y <- rnorm(1000,2,2)
#### option 1 ####
# intermediate results printed
myfunc <- function(xvec){
  fn <- function(theta) { sum(0.5 * (xvec - theta[1])^2 / theta[2] + 0.5 * log(theta[2])) }
  optim(c(0, 5), fn, hessian = TRUE, method = "L-BFGS-B", lower = c(0, 0), control = list(trace = 1, REPORT = 1))
}
# running optimization with input y
myfunc(y)
# would need to copy values or otherwise post process to make plot
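One hedged way to do that post-processing (assuming the printed L-BFGS-B trace lines look like "iter 10 value 12.001318", as in the output above) is to capture the trace with capture.output() and parse it:
# A minimal sketch: capture the printed trace, keep the "iter" lines,
# and pull the numeric value from the end of each one.
trace_lines <- capture.output(myfunc(y))
value_lines <- grep("^iter", trace_lines, value = TRUE)
loglvals <- as.numeric(sub(".*value ", "", value_lines))
plot(loglvals, type = "b", xlab = "iteration", ylab = "log likelihood")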
#### option 2 ####
# running optimization one iteration at a time
fn <- function(theta) { sum ( 0.5*(y - theta[1])^2/theta[2] + 0.5* log(theta[2]) ) }
# storing log likelihood values
loglvals <- fn(c(0,5))
# initializing variables
temp1par <- c(0,0)
# running the loop
for (i in 1:100) {
  # start from c(0, 5) on the first iteration, then from the previous parameter estimates
  temp1 <- optim(if (i == 1) c(0, 5) else temp1par, fn, hessian = TRUE,
                 method = "L-BFGS-B", lower = c(0, 0),
                 control = list(trace = 1, REPORT = 1, maxit = 1))
  temp1par <- temp1$par
  loglvals <- c(loglvals, temp1$value)
}
# plotting results
# trimming the length of loglvals because the function converged
# before the loop calling optim stopped; unique() is used to detect
# when optim stopped returning new values
plot(seq(0, length(unique(loglvals)), 1),
     loglvals[seq(1, length(unique(loglvals)) + 1, 1)],
     ylab = 'log likelihood',
     xlab = 'iteration')
I have code that successfully calculates VaR based on Extreme Value Theory using historical data. I'm trying to run this same code on multiple simulated price paths (i.e. calculating a VaR for each path) and then taking the median or average of those VaRs.
Every example I could find online had the simulation function return the price at the end of the period and then replicated that function X many times. That makes sense to me, except that I essentially need to calculate value-at-risk for each simulated path. Below is the code I have so far. The code works when using historical data (i.e. the "evt" function works fine and the data table is populated correctly when the lossOnly, u, and evtVar lines aren't inside a function). However, I've been trying to implement the simulation in the second function with various combinations, all of which have failed.
library('RODBC')
library('nor1mix')
library('fExtremes')
library('QRM')
library('fGarch')
# function for computing the EVT VaR
evt <- function(data, u){
  # fit excess returns to GPD to get estimates
  gpdfit = tryCatch({
    gpdfit <- gpdFit(data, u, type = "mle")
  }, warning = function(w) {
    gpdfit <- gpdFit(data, u, type = "mle", optfunc = "nlminb")
    return(gpdfit)
  }, error = function(e) {
    gpdfit <- gpdFit(data, u, type = "pwm", optfunc = "nlminb")
    return(gpdfit)
  }, finally = {})
  # now calculate VaRs
  xi <- gpdfit@fit$par.ests["xi"]
  beta <- gpdfit@fit$par.ests["beta"]
  Nu <- length(gpdfit@data$exceedances)
  n <- length(data)
  evtVar95 <- (u + ((beta/xi) * (((n/Nu) * .05)^(-xi) - 1.))) * 100
  evtVar99 <- (u + ((beta/xi) * (((n/Nu) * .01)^(-xi) - 1.))) * 100
  evtVar997 <- (u + ((beta/xi) * (((n/Nu) * .003)^(-xi) - 1.))) * 100
  evtVar999 <- (u + ((beta/xi) * (((n/Nu) * .001)^(-xi) - 1.))) * 100
  # return calculations
  return(cbind(evtVar95, evtVar99, evtVar997, evtVar999, u, xi, beta, Nu, n))
}
#data <- read.table("pricedata.txt")
prices <- data$V1
returns <- diff(log(prices)) #or returns <- log(prices[-1]/prices[-n])
xi <- mean(returns)
std <- sd(returns)
N <- length(prices)
lstval <- prices[N]
options(scipen = 999)
p <- c(lstval, rep(NA, N-1))
gen.path <- function(){
  N <- length(prices)
  for (i in 2:N)
    p[i] <- p[i-1] * exp(rnorm(1, xi, std))
  # plot(p, type = "l", col = "brown", main = "Simulated Price")
  # EVT calculation
  # first get only the losses and then make them absolute
  lossOnly <- abs(p[p < 0])
  # get threshold
  u <- quantile(lossOnly, probs = 0.9, names = FALSE)
  evtVar <- evt(lossOnly, u)
  return(evtVar)
}
runs <- 10
sim.evtVar <- replicate(runs, gen.path())
evtVar <- mean(sim.evtVar)
#add data to total table
VaR <- c(evtVar[1],evtVar[2],evtVar[3],evtVar[4],evtVar[5],evtVar[6],evtVar[7],evtVar[8],evtVar[9])
DF <- data.frame(VaR, row.names=c("evtVar95","evtVaR_99","evtVaR_997","evtVaR_999","u","xi","beta","Nu","n"))
In short, I'm trying to run the value-at-risk function (the first function) within the Monte Carlo function (the second function) and put the average simulated values into a data table. I know the first function works, but it's the second function that's driving me crazy. These are the errors I'm getting:
> sim.evtVar <- replicate(runs, gen.path())
Error in if (xi > 0.5) { : missing value where TRUE/FALSE needed
Called from: .gpdpwmFit(x, u)
Browse[1]> evtVar <- mean(sim.evtVar)
Error during wrapup: object 'sim.evtVar' not found
Browse[1]>
> #add data to total table
> VaR <- c(evtVar[1],evtVar[2],evtVar[3],evtVar[4],evtVar[5],evtVar[6],evtVar[7],evtVar[8],evtVar[9])
Error: object 'evtVar' not found
> DF <- data.frame(VaR, row.names=c("evtVar95","evtVaR_99","evtVaR_997","evtVaR_999","u","xi","beta","Nu","n"))
Error in as.data.frame.default(x[[i]], optional = TRUE) :
cannot coerce class ""function"" to a data.frame
Any help you can provide is greatly appreciated! Thank you in advance!
I think the problem is this row:
lstval <- prices[N]
because a stock price can never be negative, this row in your function produces an empty vector:
lossOnly <- abs(p[p<0])
You should try instead:
lstval <- min(returns)
if you want the most negative return of your dataset.
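Building on that diagnosis, a hedged sketch of one fix: since it is the log returns, not the prices, that can go negative, the losses could be computed from the simulated returns instead (assuming p is the simulated price path built inside gen.path):
# A minimal sketch: take losses from the simulated log returns, which
# can be negative, rather than from the always-positive prices.
sim_returns <- diff(log(p))
lossOnly <- abs(sim_returns[sim_returns < 0])
u <- quantile(lossOnly, probs = 0.9, names = FALSE)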
I am trying to split my data set using two parameters, the fraction of missing values and "maf", and store the sub-data sets in a list. Here is what I have done (it's not working); any help will be appreciated.
Thanks.
library(BLR)
library(missForest)
data(wheat)
X2<- prodNA(X, 0.4) ### creating missing values
dim(X2)
fd<-t(X2)
MAF <- function(geno){ ## markers are in the rows
  geno[(geno != 0) & (geno != 1) & (geno != -1)] <- NA
  geno <- as.matrix(geno)
  ## count genotypes for the allele frequencies
  n0 <- apply(geno == 0, 1, sum, na.rm = TRUE)
  n1 <- apply(geno == 1, 1, sum, na.rm = TRUE)
  n2 <- apply(geno == -1, 1, sum, na.rm = TRUE)
  n <- n0 + n1 + n2
  ## calculate allele frequencies
  p <- ((2 * n0) + n1) / (2 * n)
  q <- 1 - p
  maf <- pmin(p, q)
  maf
}
frac.missing <- apply(fd, 1, function(z){ length(which(is.na(z))) / length(z) })
maf <- MAF(fd)
lst <- matrix()
for (i in seq(0.2, 0.7, by = 0.2)) {
  for (j in seq(0, 0.2, by = 0.005)) {
    lst = fd[(maf > j) | (frac.missing < i), ]
  }
}
It sounds like you want the results that the split function provides.
If "frac.missing" and "maf" are vectors defined from the values in "fd" (each with length equal to the number of rows of "fd"), then this would provide the split you are looking for:
spl.fd <- split(fd, list(maf, frac.missing) )
If you want to "group" the fd values basesd on of maf(fd) and frac.missing within the bands specified by your for-loop, then the same split-construct may do what your current code is failing to accomplish:
lst <- split(fd, list(cut(maf, breaks = seq(0, 0.2, by = 0.005),
                          include.lowest = TRUE),
                      cut(frac.missing, breaks = seq(0.2, 0.7, by = 0.2),
                          right = TRUE, include.lowest = TRUE)))
The right argument accommodates the desire to have the splits based on a "<" comparison, whereas the default operation of cut presumes a ">" comparison against the 'breaks'. The other function that provides a similar facility is by.
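For reference, the resulting list is named by pasting the two cut() band labels together, so a quick way to inspect the groups (the exact labels shown are just illustrative) is:
# Inspect the split: each element holds the fd values that fall in one
# (maf band, missing-fraction band) combination.
names(lst)          # e.g. "(0.005,0.01].(0.2,0.4]" and so on
sapply(lst, length) # size of each group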
The code below gives me exactly what I need:
Y <- t(GBS.binary)
nn <- colnames(Y)
fd <- Y
maf <- as.matrix(MAF(Y))
dff <- cbind(frac.missing, maf, Y)
colnames(dff) <- c("fm", "maf", nn)
dff <- as.data.frame(dff)
for (i in seq(0.1, 0.6, by = 0.1)) {
  for (j in seq(0, 0.2, by = 0.005)) {
    assign(paste("fm_", i, "maf_", j, sep = ""),
           (subset(dff, maf > j & fm < i))[, -c(1, 2)])
  }
}
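Since the original goal was to store the sub-data sets in a list, a hedged variant of the same loop that fills a named list instead of creating one variable per combination with assign() could be (assuming the same dff as above):
# A minimal sketch: keep each subset in a named list rather than
# assigning many separate variables into the workspace.
lst <- list()
for (i in seq(0.1, 0.6, by = 0.1)) {
  for (j in seq(0, 0.2, by = 0.005)) {
    nm <- paste0("fm_", i, "maf_", j)
    lst[[nm]] <- subset(dff, maf > j & fm < i)[, -c(1, 2)]
  }
}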