Non linear regression for exponential decay model in R - r

I have the following problem:
I asked 5 people (i=1, ..., 5) to forecast next period's return of 3 different stocks. This gives me the following data:
S_11_i_c <-read.table(text = "
i c_1 c_2 c_3
1 0.150 0.70 0.190
2 0.155 0.70 0.200
3 0.150 0.75 0.195
4 0.160 0.80 0.190
5 0.150 0.75 0.180
",header = T)
In words, in period t=10 participant i=1 expects the return of stock c_1 to be 0.15 in period t=11.
The forecasts are based on past returns of the stocks. These are the following:
S_t_c <-read.table(text = "
time S_c_1 S_c_2 S_c_3
1 0.020 0.015 0.040
2 0.045 0.030 0.050
3 0.060 0.045 0.060
4 0.075 0.060 0.060
5 0.090 0.070 0.060
6 0.105 0.070 0.090
7 0.120 0.070 0.120
8 0.125 0.070 0.140
9 0.130 0.070 0.160
10 0.145 0.070 0.180
",header = T)
In words, stock c=1 had a return of 0.145 in period 10.
So, the variables in table S_11_i_c are the dependent variables.
The variables in table S_t_c are the independet variables.
The model I want to estimate is the following:
My problem with coding this is as follows:
I do only know how to express
with the help of a loop. As in:
Sum_S_t_c <- data.frame(
s = seq(1:9),
c_1 = rnorm(9)
c_2 = rnorm(9)
c_3 = rnorm(9)
)
Sum_S_t_c = 0
for (c in 2:4) {
for (s in 0:9) {
Sum_S_t_c[s,c] <- Sum_S_t_c + S_t_c[10-s, c]
Sum_S_t_c = Sum_S_t_c[s,c]
}
}
However, loops within a regression are not possible. So, my other solution would be to rewrite the sum to
However, as my actual problem has a much larger n, this isn*t realy working for me.
Any ideas?

Related

How to calculate the Bonferroni Lower and Upper limits in R?

With the following data, I am trying to calculate the Chi Square and Bonferroni lower and upper Confidence intervals. The column "Data_No" identifies the dataset (as calculations needs to be done separately for each dataset).
Data_No Area Observed
1 3353 31
1 2297 2
1 1590 15
1 1087 16
1 817 2
1 847 10
1 1014 28
1 872 29
1 1026 29
1 1215 21
2 3353 31
2 2297 2
2 1590 15
3 1087 16
3 817 2
The code I used is
library(dplyr)
setwd("F:/GIS/July 2019/")
total_data <- read.csv("test.csv")
result_data <- NULL
for(i in unique(total_data$Data_No)){
data <- total_data[which(total_data$Data_No == i),] data <- data %>%
mutate(RelativeArea = Area/sum(Area), Expected = RelativeArea*sum(Observed), OminusE = Observed-Expected, O2 = OminusE^2, O2divE = O2/Expected, APU = Observed/sum(Observed), Alpha = 0.05/2*count(Data_No),
Zvalue = qnorm(Alpha,lower.tail=FALSE), lower = APU-Zvalue*sqrt(APU*(1-APU)/sum(Observed)), upper = APU+Zvalue*sqrt(APU*(1-APU)/sum(Observed)))
result_data <- rbind(result_data,data) }
write.csv(result_data,file='final_result.csv')
And the error message I get is:
Error in UseMethod("summarise_") : no applicable method for
'summarise_' applied to an object of class "c('integer', 'numeric')"
The column that I am calling "Alpha" is the alpha value of 0.05/2k, where K is the number of categories - in my example, I have 10 categories ("Data_No" column) for the first dataset, so "Alpha" needs to be 0.05/20 = 0.0025, and it's corresponding Z value is 2.807. The second dataset has 3 categories (so 0.05/6) and the third has 2 categories (0.05/4) in my example table (Data_No" column). Using the values from the newly calculated "Alpha" column, I then need to calculate the ZValue column (Zvalue = qnorm(Alpha,lower.tail=FALSE)) which I then use to calculate the lower and upper confidence intervals.
From the above data, here are the results that I should get, but note that I have had to manually calculate Alpha column and Zvalue, rather than insert those calculations within the R code:
Data_No Area Observed RelativeArea Alpha Z value lower upper
1 3353 31 0.237 0.003 2.807 0.092 0.247
1 2297 2 0.163 0.003 2.807 -0.011 0.033
1 1590 15 0.113 0.003 2.807 0.025 0.139
1 1087 16 0.077 0.003 2.807 0.029 0.146
1 817 2 0.058 0.003 2.807 -0.011 0.033
1 847 10 0.060 0.003 2.807 0.007 0.102
1 1014 28 0.072 0.003 2.807 0.078 0.228
1 872 29 0.062 0.003 2.807 0.083 0.234
1 1026 29 0.073 0.003 2.807 0.083 0.234
1 1215 21 0.086 0.003 2.807 0.049 0.181
2 3353 31 0.463 0.008 2.394 0.481 0.811
2 2297 2 0.317 0.008 2.394 -0.027 0.111
2 1590 15 0.220 0.008 2.394 0.152 0.473
3 1087 16 0.571 0.013 2.241 0.723 1.055
3 817 2 0.429 0.013 2.241 -0.055 0.277
Please note that I only included some of the columns generated from the code.
# You need to check the closing bracket for lower c.f. sqrt value. Following code should work.
data <- read.csv("test.csv")
data <- data %>% mutate(RelativeArea =
Area/sum(Area), Expected = RelativeArea*sum(Observed), OminusE =
Observed-Expected, O2 = OminusE^2, O2divE = O2/Expected, APU =
Observed/sum(Observed), lower =
APU-2.394*sqrt(APU*(1-APU)/sum(Observed)), upper =
APU+2.394*sqrt(APU*(1-APU)/sum(Observed)))
#Answer to follow-up question.
#Sample Data
Data_No Area Observed
1 3353 31
1 2297 2
2 1590 15
2 1087 16
#Code to run
total_data <- read.csv("test.csv")
result_data <- NULL
for(i in unique(total_data$Data_No)){
data <- total_data[which(total_data$Data_No == i),]
data <- data %>% mutate(RelativeArea =
Area/sum(Area), Expected = RelativeArea*sum(Observed), OminusE =
Observed-Expected, O2 = OminusE^2, O2divE = O2/Expected, APU =
Observed/sum(Observed), lower =
APU-2.394*sqrt(APU*(1-APU)/sum(Observed)), upper =
APU+2.394*sqrt(APU*(1-APU)/sum(Observed)))
result_data <- rbind(result_data,data)
}
write.csv(result_data,file='final_result.csv')
#Issue in calculating Alpha. I have updated the code.
library(dplyr)
setwd("F:/GIS/July 2019/")
total_data <- read.csv("test.csv")
#Creating the NO_OF_CATEGORIES column based on your question.
total_data$NO_OF_CATEGORIES <- 0
total_data[which(total_data$Data_No==1),]$NO_OF_CATEGORIES <- 10
total_data[which(total_data$Data_No==2),]$NO_OF_CATEGORIES <- 3
total_data[which(total_data$Data_No==3),]$NO_OF_CATEGORIES <- 2
#Actual code
result_data <- NULL
for(i in unique(total_data$Data_No)){
data <- total_data[which(total_data$Data_No == i),]
data <- data %>%
mutate(RelativeArea = Area/sum(Area), Expected = RelativeArea*sum(Observed), OminusE = Observed-Expected, O2 = OminusE^2, O2divE = O2/Expected, APU = Observed/sum(Observed), Alpha = 0.05/(2*(unique(data$NO_OF_CATEGORIES))),
Zvalue = qnorm(Alpha,lower.tail=FALSE), lower = APU-Zvalue*sqrt(APU*(1-APU)/sum(Observed)), upper = APU+Zvalue*sqrt(APU*(1-APU)/sum(Observed)))
result_data <- rbind(result_data,data) }
write.csv(result_data,file='final_result.csv')

multiple columns plot with correlation value in ggplot2

Hi I have a dataframe df as below.
I would like to make a facet plot that shows relation between columns A & B, A & C, A & D , B & C and C & D and overlay a regression line and person's correlation coefficient value.
I am trying to make a facet plot to show relation between each of these column could not figure out exactly how.
Any help would be appreciated. This question is unique in SO as there are not any ans for plot among columns.
df<- read.table(text =c("A B C D
0.451 0.333 0.034 0.173
0.491 0.27 0.033 0.207
0.389 0.249 0.084 0.271
0.425 0.819 0.077 0.281
0.457 0.429 0.053 0.386
0.436 0.524 0.049 0.249
0.423 0.27 0.093 0.279
0.463 0.315 0.019 0.204
"), header = T)
df
pl<-ggplot(data=df) + geom_point(aes(x=A,y=B,size=10)) +
geom_point(aes(x=A,y=C,size=10)) +
geom_point(aes(x=A,y=C,size=10)) +
geom_point(aes(x=A,y=D,size=10)) +
geom_smooth(method = "lm", se=FALSE, color="black")
pl

Convert column headers into new columns

My data frame consists of time series financial data from many public companies. I purposely set companies' weights as their column headers while cleaning the data, and I also calculated log returns for each of them in order to calculate weighted returns in the next step.
Here is an example. There are four companies: A, B, C and D, and their corresponding weights in the portfolio are 0.4, 0.3, 0.2, 0.1 separately. So the current data set looks like:
df1 <- data.frame(matrix(vector(),ncol=9, nrow = 4))
colnames(df1) <- c("Date","0.4","0.4.Log","0.3","0.3.Log","0.2","0.2.Log","0.1","0.1.Log")
df1[1,] <- c("2004-10-29","103.238","0","131.149","0","99.913","0","104.254","0")
df1[2,] <- c("2004-11-30","104.821","0.015","138.989","0.058","99.872","0.000","103.997","-0.002")
df1[3,] <- c("2004-12-31","105.141","0.003","137.266","-0.012","99.993","0.001","104.025","0.000")
df1[4,] <- c("2005-01-31","107.682","0.024","137.08","-0.001","99.782","-0.002","105.287","0.012")
df1
Date 0.4 0.4.Log 0.3 0.3.Log 0.2 0.2.Log 0.1 0.1.Log
1 2004-10-29 103.238 0 131.149 0 99.913 0 104.254 0
2 2004-11-30 104.821 0.015 138.989 0.058 99.872 0.000 103.997 -0.002
3 2004-12-31 105.141 0.003 137.266 -0.012 99.993 0.001 104.025 0.000
4 2005-01-31 107.682 0.024 137.08 -0.001 99.782 -0.002 105.287 0.012
I want to create new columns that contain company weights so that I can calculate weighted returns in my next step:
Date 0.4 0.4.W 0.4.Log 0.3 0.3.W 0.3.Log 0.2 0.2.W 0.2.Log 0.1 0.1.W 0.1.Log
1 2004-10-29 103.238 0.400 0.000 131.149 0.300 0.000 99.913 0.200 0.000 104.254 0.100 0.000
2 2004-11-30 104.821 0.400 0.015 138.989 0.300 0.058 99.872 0.200 0.000 103.997 0.100 -0.002
3 2004-12-31 105.141 0.400 0.003 137.266 0.300 -0.012 99.993 0.200 0.001 104.025 0.100 0.000
4 2005-01-31 107.682 0.400 0.024 137.080 0.300 -0.001 99.782 0.200 -0.002 105.287 0.100 0.012
We can try
v1 <- grep("^[0-9.]+$", names(df1), value = TRUE)
df1[paste0(v1, ".w")] <- as.list(as.numeric(v1))

find the index of max value in data frame and add the value

This is my data frame:
>head(dat)
geno P1 P2 P3 P4 dif
1 G1 0.015 0.007 0.026 0.951 0.001
2 G2 0.008 0.006 0.015 0.970 0.001
3 G3 0.009 0.006 0.017 0.968 0.000
4 G4 0.011 0.007 0.017 0.965 0.000
5 G5 0.013 0.005 0.021 0.961 0.000
6 G6 0.009 0.006 0.007 0.977 0.001
Here, I need to find max in each row and add dat$dif to the max.
when i used which.max(dat[,-1]), I am getting error:
Error in which.max(dat[,-1]) :
(list) object cannot be coerced to type 'double'
A previous answer (by Scriven) gives most of it but as others have stated, it incorrectly includes the last column. Here is one method that works around it:
idx <- (! names(dat) %in% c('geno','dif'))
dat$dif + apply(dat[,idx], 1, max)
# 1 2 3 4 5 6
# 0.952 0.971 0.968 0.965 0.961 0.978
You can easily put the idx stuff directly into the dat[,...] subsetting, but I broke it out here for clarity.
idx can be defined by numerous things here, such as "all but the first and last columns": idx <- names(dat)[-c(1, ncol(dat))]; or "anything that looks like P#": idx <- grep('^P[0-9]+', names(dat)).
There's an app, eh function for that :-).
max.col finds the index of the maximum position for each row of a matrix. Take note, that as max.col expects a matrix (numeric values only) you have to exclude the “geno” column when applying this function.
sapply(1:6,function(x) dat[x,max.col(dat[,2:5])[x] +1]) + dat$dif
[1] 0.952 0.971 0.968 0.965 0.961 0.978

counting values within a certain range within a sliding window

The problem that I like to solve is a sliding window going over the measurement data with a defined window width and a controllable stepwidth (there 1).
Within the window I need to detect a number of values within a certain range of the
first value expl. 2.2 +- 0.3 and count the number of such values in a row
expl. 2.2, 2.3, 2.1 , 1.8, 2.2, 2.5, 2.1 --> 3,1,3
d <- read.table(text="Number Time.s Potential.V Current.A
1 0.0000 0.075 -0.7653
2 0.0285 0.074 -0.7597
3 0.0855 0.076 -0.7549
17 0.8835 0.074 -0.7045
18 0.9405 0.073 -0.5983
19 0.9975 0.071 -0.1370
19 1.0175 0.070 -0.1370
20 1.0545 0.072 0.1295
21 1.1115 0.073 0.2680
8013 1.6555 0.076 -1.1070
8014 1.7125 0.075 -1.1850
8015 1.7695 0.073 -1.2610
8016 1.8265 0.072 -1.3460
8017 1.8835 0.071 -1.4380
8018 1.9405 0.070 -1.4350
8019 1.9975 0.061 -1.0720
8020 2.1045 0.062 -0.8823
8021 2.1115 0.058 -0.7917
8022 2.1685 0.060 -0.7481", header=TRUE)
rle(round(diff(d$Time.s[d$Time.s>1 & d$Time.s<2]),digits=2))
I can't use rle, because there is no acceptance interval one could define. Working with
a for loop is possible, but seams very un'R'ish.
width=4
bound.low <- 0.00
bound.high <- 0.03
Ergebnis <- data.frame(
Potential.V=seq(1,(nrow(d)-width),by=1),count=seq(1,(nrow(d)-width),by=1))
for (a in 1:(nrow(d)-width)) {
temp <- d[a:(a+width),c("Time.s","Potential.V")]
counter=0
for (b in 1:nrow(temp)){
if (temp$Potential.V[1] >= (temp$Potential.V[b] - bound.low ) &
temp$Potential.V[1] <= (temp$Potential.V[b] + bound.high) ){
(counter=counter+1)
} else { break() }
}
Ergebnis$Potential.V[a] <- temp$Potential.V[1]
Ergebnis$count[a] <- counter
}
print(Ergebnis)
Result
Potential.V count
1 0.075 2
2 0.074 1
3 0.076 5
4 0.074 5
5 0.073 5
6 0.071 2
7 0.070 1
8 0.072 1
9 0.073 1
10 0.076 5
11 0.075 5
12 0.073 5
13 0.072 5
14 0.071 5
15 0.070 5
rle(Ergebnis$count)
Run Length Encoding
lengths: int [1:6] 1 1 3 1 3 6
values : num [1:6] 2 1 5 2 1 5
So I find the needed counts in the lengths vector.
Is there a more elegant way of solving such problems ? My experiments with xts and zoo didn't worked out like I thought
best regards,
IInatas
P.S.
The reason for this data analysis is log data from an experiment which has a degrading problem with an increasing severity in relation to certain voltages. In the end there is a lifetime account and I try to calculate the rest that is left, based on this log data.
Here's a solution using zoo::rollapply to calculate Ergebnis, but you still need to run rle on the result.
# the function we're going to apply to each window
f <- function(x, upper=0.03, lower=0.00) {
# logical test
l <- x[1] >= (x-lower) & x[1] <= (x+upper)
# first FALSE value
m <- if(any(!l)) which.min(l) else length(l)
c(Potential.V=x[1],count=sum(l[1:m]))
}
Ergebnis <- data.frame(rollapply(d$Potential.V, 5, f, align='left'))
rle(Ergebnis$count)

Resources