Returns to prices in R

Here is sample data:
set.seed(13)
x1 <- runif(10, -0.05, 0.05)
x2 <- runif(10, -0.05, 0.05)
x3 <- runif(10, -0.05, 0.05)
x4 <- runif(10, -0.05, 0.05)
df <- as.data.frame(cbind(x1, x2, x3, x4))
Let's say these are returns, and I would like to convert them to prices with a starting value of 100. There is an answer for turning one vector of returns into prices here: How to convert returns to prices? I have tried the following:
index <- as.data.frame(Reduce(function(x,y) {x * exp(y)}, df, init=100, accumulate=T))
but that won't work on a data frame. I also tried apply, but couldn't get anything reasonable out of it.

You can expand that answer to your data frame by running it column-wise:
index <- sapply(colnames(df), function(col) {
  Reduce(function(x, y) x * exp(y),
         df[[col]], init = 100, accumulate = TRUE)
})
index
#x1 x2 x3 x4
#[1,] 100.00000 100.0000 100.00000 100.00000
#[2,] 102.12550 101.6243 96.43574 99.23404
#[3,] 99.56554 105.5431 96.88956 98.29784
#[4,] 98.47272 109.7467 98.62877 102.50103
#[5,] 94.53007 110.4766 98.90613 105.71522
#[6,] 99.00045 111.5149 94.90222 106.13989
#[7,] 94.27516 110.0142 96.04782 102.05241
#[8,] 94.97819 108.4567 91.65382 101.58857
#[9,] 97.52289 109.4531 91.30083 97.13752
#[10,] 101.23305 113.5271 89.76203 99.68356
#[11,] 96.69209 115.5952 90.96857 95.62000
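Note that sapply() simplifies the result to a matrix here; if you want a data frame again, wrap it:
index <- as.data.frame(index)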

Use cumsum, which works on data frames.
R> index <- exp(cumsum(df)) * 100
R> index
x1 x2 x3 x4
1 102.12550 101.6243 96.43574 99.23404
2 99.56554 105.5431 96.88956 98.29784
3 98.47272 109.7467 98.62877 102.50103
4 94.53007 110.4766 98.90613 105.71522
5 99.00045 111.5149 94.90222 106.13989
6 94.27516 110.0142 96.04782 102.05241
7 94.97819 108.4567 91.65382 101.58857
8 97.52289 109.4531 91.30083 97.13752
9 101.23305 113.5271 89.76203 99.68356
10 96.69209 115.5952 90.96857 95.62000
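One difference from the Reduce() version: cumsum starts at the first return, so the initial row of 100s is dropped. To get identical output, prepend the starting value (a small sketch):
index <- rbind(rep(100, ncol(df)), exp(cumsum(df)) * 100)
rownames(index) <- NULL  # renumber rows 1..11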

t.test applied in pairs to more than two samples in R

I have 44 samples for which I only have the sample size, mean, and standard deviation. I asked about the possibility of doing a t-test from these summaries, and the following answer came back:
T.test <- function(n, mean, sd) {
  s <- sum((n - 1) * sd^2) / (sum(n) - 2)              # pooled (weighted) variance
  t <- sqrt(prod(n) / sum(n)) * (diff(mean) / sqrt(s)) # t statistic
  df <- sum(n) - 2                                     # degrees of freedom
  p <- (1 - pt(abs(t), df)) * 2                        # two-sided p value
  c(t = t, p = p)
}
dat <- data.frame(mean = c(6.62, 6.31), sd = c(.52, .49), n = c(10, 12))
# mean sd n
# 1 6.62 0.52 10
# 2 6.31 0.49 12
T.test(dat$n, dat$mean, dat$sd)
# t p
# -1.4373111 0.1660915
However, I would like to perform a t-test on every single pair of samples. With 44 samples, doing that by hand would be a very hard task.
Imagine I have 5 samples:
# mean sd n
# 1 6.62 0.52 10
# 2 6.31 0.49 12
# 3 5.95 0.76 34
# 4 5.67 0.56 23
# 5 6.12 0.35 16
I would like to perform a t-test between 1-2, 1-3, 1-4, 1-5, 2-3, 2-4, 2-5, 3-4, 3-5, 4-5 and get a table with all the results at once.
The resulting table would have the sample names in the first row and first column, so half of the table would be redundant. Both results (t statistic and p-value) should appear. It should look something like this:
# 1 2 3 4 5
# 1 - test1 p-value1 test2 p-value2 test3 p-value3 test4 p-value4
# 2 - - test5 p-value5 test6 p-value6 test7 p-value7
# 3 - - - test8 p-value8 test9 p-value9
# 4 - - - - test0 p-value0
# 5 - - - - -
Can anyone show me R code that obtains this automatically from the data above? I could then adapt it to my full set of samples.
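For reference, the T.test helper above is just the pooled two-sample t-test computed from summary statistics: s^2 = ((n1-1)*s1^2 + (n2-1)*s2^2) / (n1+n2-2) and t = (m2 - m1) / sqrt(s^2 * (1/n1 + 1/n2)) with n1+n2-2 degrees of freedom. For the two samples above, s^2 = (9*0.52^2 + 11*0.49^2)/20 ≈ 0.2537 and t = (6.31 - 6.62)/sqrt(0.2537*(1/10 + 1/12)) ≈ -1.437, matching the output shown.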
We can use the pairwise.t.test function, which also provides p-value adjustments for the pairwise comparisons. This is needed because, among so many t-tests, you will encounter significant differences just by chance.
We first set up the original data.frame and a data.frame to be filled.
df = data.frame(n = c(10,12,34,23,16), mean = c(6.62,6.31,5.95,5.67,6.12), sd = c(0.52,0.49,0.76,0.56,0.35))
sample_distributions = data.frame(values = numeric(0), sample = integer(0))
We then use the values in df to build normal distributions with the provided parameters and append them on sample_distributions.
for (i in 1:nrow(df)) {
  values <- rnorm(df$n[i], df$mean[i], df$sd[i])
  sample <- rep(i, length(values))
  sample_distributions <- rbind(sample_distributions, data.frame(values, sample))
}
Finally, we use these simulated samples to perform the pairwise t-tests. Note that because the samples are random draws from rnorm, the p-values vary from run to run unless you call set.seed() beforehand; the table below came from one unseeded run.
pairwise.t.test(x = sample_distributions$values, g = sample_distributions$sample, p.adjust.method = "bonferroni")
which yields:
Pairwise comparisons using t tests with pooled SD
data: sample_distributions$values and sample_distributions$sample
1 2 3 4
2 1.0000 - - -
3 0.0051 0.1524 - -
4 0.0099 0.2309 1.0000 -
5 0.9955 1.0000 0.4172 0.6055
P value adjustment method: bonferroni
We could use outer to do the T.test on all the combinations of rows.
res <- outer(1:nrow(dat), 1:nrow(dat), FUN = Vectorize(function(i, j) {
  x1 <- dat[c(i, j), ]
  T.test(x1$n, x1$mean, x1$sd)[[2]]
}))
If we need only the upper triangular p-values, we can assign the lower.tri elements to NA.
res[lower.tri(res, diag=TRUE)] <- NA
res
# [,1] [,2] [,3] [,4] [,5]
#[1,] NA 0.1660915 0.01270188 7.317558e-05 0.007149738
#[2,] NA NA 0.13401244 2.075498e-03 0.241424622
#[3,] NA NA NA 1.368773e-01 0.399642479
#[4,] NA NA NA NA 0.007203030
#[5,] NA NA NA NA NA
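If you want the rows and columns labelled with the sample names, as in the table sketched in the question, dimnames can be added (a small addition using res and dat from above):
dimnames(res) <- list(rownames(dat), rownames(dat))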
This could also be done using combn to return a vector of p-values:
v1 <- combn(1:nrow(dat), 2, FUN = function(i) {
  x1 <- dat[i, ]
  T.test(x1$n, x1$mean, x1$sd)
})[2, ]
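If both the t statistic and the p-value are wanted, drop the trailing [2, ] subset and keep the full two-row result (a minimal variation of the same call; the matrix-filling steps below still use v1):
tp <- combn(1:nrow(dat), 2, FUN = function(i) {
  x1 <- dat[i, ]
  T.test(x1$n, x1$mean, x1$sd)  # row 1: t statistics, row 2: p values
})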
If we need the matrix output, we can create a matrix with NA values
res2 <- matrix(NA, 5, 5)
then fill the elements in the matrix based on the logical index returned from lower.tri.
res2[lower.tri(res2, diag=FALSE)] <- v1
and transpose (t) to return the same output as in outer.
t(res2)
# [,1] [,2] [,3] [,4] [,5]
#[1,] NA 0.1660915 0.01270188 7.317558e-05 0.007149738
#[2,] NA NA 0.13401244 2.075498e-03 0.241424622
#[3,] NA NA NA 1.368773e-01 0.399642479
#[4,] NA NA NA NA 0.007203030
#[5,] NA NA NA NA NA
data
dat <- structure(list(mean = c(6.62, 6.31, 5.95, 5.67, 6.12),
                      sd = c(0.52, 0.49, 0.76, 0.56, 0.35),
                      n = c(10L, 12L, 34L, 23L, 16L)),
                 .Names = c("mean", "sd", "n"), class = "data.frame",
                 row.names = c("1", "2", "3", "4", "5"))

NA issues with Linear Regression

I want to regress y1 on x, then y2 on x, and so on, saving the slope, intercept, r2 value, p-value, etc. to a vector, so that the final vector contains an entry for every y1...yn even when that entry is NA.
The following is how my data looks:
y1 y2 y3 y4 y5 x
NA 89 86 91 69 1
NA 86 NA 88 NA 2
NA 86 83 88 66 3
NA 100 NA 102 80 4
With the following code, slope values are not calculated for columns where all values of y[,i] are NA, but they are still calculated when only some of the values are NA.
slope <- 0
for (i in 1:length(names(y))) {
  if (all(is.na(y[, i])) == FALSE) {
    m <- lm(y[, i] ~ x)
    slope <- c(slope, m$coefficients[2])
  }
}
However, I still cannot figure out a way to maintain the positional information of all y[,i], so that my final vector output would look something like this:
slope
1 NA
2 9.362637e-01
3 8.461538e-01
4 3.450549e-01
5 6.593407e-01
Any help will be much appreciated.
sapply over the non-x columns of DF, returning the coefficients when the dependent variable (y) has any non-NA values and returning NAs otherwise:
t(sapply(DF[-6], function(y) if (any(!is.na(y))) coef(lm(y ~ x, DF)) else c(NA, NA)))
This gives the following where column 1 is the intercepts and column 2 is the slopes:
[,1] [,2]
y1 NA NA
y2 82.00000 3.300000
y3 87.50000 -1.500000
y4 84.00000 3.300000
y5 63.85714 2.928571
If only the slopes are needed then:
matrix(sapply(DF[-6], function(y) if (any(!is.na(y))) coef(lm(y ~ x, DF))[2] else NA))
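The question also asks for r2 and p-values; these can be pulled from summary() in the same pattern (a sketch along the same lines; columns that are all NA get NA rows, and columns with too few complete cases will show NaN for the p-value):
t(sapply(DF[-6], function(y) {
  if (all(is.na(y))) return(c(slope = NA, r.squared = NA, p.value = NA))
  fit <- summary(lm(y ~ x, DF))
  c(slope = fit$coefficients[2, 1],    # slope estimate
    r.squared = fit$r.squared,         # R^2 of the fit
    p.value = fit$coefficients[2, 4])  # p-value of the slope
}))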
# This is for the slope only.
nn <- lapply(paste0("y", 1:5), function(i) {
  if (all(is.na(y[[i]])) == FALSE) {
    bb <- lm(y[[i]] ~ x, data = y)
    return(bb[[1]][2])
  } else {
    return(NA)
  }
})
do.call(rbind, nn)
x
[1,] NA
[2,] 3.300000
[3,] -1.500000
[4,] 3.300000
[5,] 2.928571

Applying gsub to various columns

What is the most efficient way to apply gsub to various columns?
The following does not work:
x1=c("10%","20%","30%")
x2=c("60%","50%","40%")
x3 = c(1,2,3)
x = data.frame(x1,x2,x3)
per_col = c(1,2)
x = gsub("%","",x[,per_col])
How can I most efficiently drop the "%" sign in the specified columns?
Can I apply it to the whole dataframe? This would be useful in the case where I don't know where the percentage columns are.
You can use apply to apply it to the whole data.frame (note that apply returns a matrix, not a data frame):
apply(x, 2, function(y) as.numeric(gsub("%", "", y)))
x1 x2 x3
[1,] 10 60 1
[2,] 20 50 2
[3,] 30 40 3
Or, you could try the lapply solution, which leaves the values as text:
as.data.frame(lapply(x, function(y) gsub("%", "", y)))
x1 x2 x3
1 10 60 1
2 20 50 2
3 30 40 3
To clean the % out you can do:
x[per_col] <- lapply(x[per_col], function(y) as.numeric(gsub("%", "", y)))
x
x1 x2 x3
1 10 60 1
2 20 50 2
3 30 40 3
The first answer works, but be careful if your data.frame contains string columns: docendo discimus's answer will turn them into NAs. If you want to keep the contents of such columns as strings, just remove the as.numeric and convert the result back into a data frame afterwards:
as.data.frame(apply(x, 2, function(y) gsub("%", "", y)))
  x1 x2 x3
1 10 60  1
2 20 50  2
3 30 40  3
We can unlist the per_col columns, remove the "%" symbol, and convert the result to numeric.
x[per_col] <- as.numeric(gsub("%","", unlist(x[per_col])))
#In this case using sub would be enough too as we have only 1 % symbol to replace
#x[per_col] <- as.numeric(sub("%","", unlist(x[per_col])))
x
# x1 x2 x3
#1 10 60 1
#2 20 50 2
#3 30 40 3
To add to docendo discimus's answer, here is an extension with non-adjacent columns that returns a data.frame:
x1 <- c("10%", "20%", "30%")
x2 <- c("60%", "50%", "40%")
x3 <- c(1, 2, 3)
x4 <- c("60%", "50%", "40%")
x <- data.frame(x1, x2, x3, x4)
x[, c(1:2, 4)] <- as.data.frame(apply(x[, c(1:2, 4)], 2, function(x) {
  as.numeric(gsub("%", "", x))
}))
> x
x1 x2 x3 x4
1 10 60 1 60
2 20 50 2 50
3 30 40 3 40
> class(x)
[1] "data.frame"

Constructing derivative of a conditional density

I am using npcdens from the np package to construct a conditional density of y given covariates x. However, I need the derivative of the log of this density with respect to y. Is there some way in R to get this?
bw <- npcdensbw(formula=y ~ x1+x2+x3)
fhat <- npcdens(bws=bw,gradients=TRUE)
grad.fhat <- gradients(fhat)
which returns the gradients with respect to x1, x2 and x3.
Can we use this example dataset?
dta = data.frame(expand.grid(x1=1:5,x2=2:6,x3=5:10))
dta$y = with(dta,x1+2*x2 + 3*x3^2)
head(dta)
x1 x2 x3 y
1 1 2 5 80
2 2 2 5 81
3 3 2 5 82
4 4 2 5 83
5 5 2 5 84
6 1 3 5 82
Here y is the value of the "density". Estimate a conditional bandwidth object:
bw <- npcdensbw(formula = y ~ x1+x2+x3,data=dta)
and look at the gradients
head(gradients(npcdens(bws=bw,gradients=TRUE)))
[,1] [,2] [,3]
[1,] -2.024422e-15 -2.048994e-50 -1.227563e-294
[2,] -1.444541e-15 -1.994174e-50 -1.604693e-294
[3,] -1.017979e-31 -1.201719e-50 -1.743784e-294
[4,] 1.444541e-15 -6.753912e-64 -1.604693e-294
[5,] 2.024422e-15 1.201719e-50 -1.227563e-294
[6,] -2.024422e-15 -3.250713e-50 -1.227563e-294
What do you mean by "derivative with respect to y"? This is a function g(x1, x2, x3), so you can only take derivatives w.r.t. those 3 dimensions. Concerning the "log of y" part of your question, could this be it?
bw <- npcdensbw(formula = log(y) ~ x1 + x2 + x3,data=dta)
I've never used this package, so these are the thoughts of a non-practitioner. I guess you looked at the examples in help(npcdensbw)?
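If the goal really is d/dy log f(y|x) at the observed points, one workaround is a numerical central difference: evaluate the fitted conditional density at y + h and y - h and difference the logs. A sketch, assuming a fit fhat <- npcdens(bws = bw) on the dta example above, and assuming predict() accepts newdata for condensity objects as it does for other np fits; h is an arbitrary small step:
fhat <- npcdens(bws = bw)  # bw from the formula fit on dta above
h <- 1e-4                  # step size for the central difference
f_up <- predict(fhat, newdata = transform(dta, y = y + h))
f_dn <- predict(fhat, newdata = transform(dta, y = y - h))
dlogf_dy <- (log(f_up) - log(f_dn)) / (2 * h)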

How to fill off-diagonals and ignore diagonals in matrix in R?

I am trying to fill a matrix in R so that the final result ignores the diagonal entries and the other values are filled in around the diagonal. As a simple example of what I mean, take the 3x3 matrix shown below:
ab <- c(1:9)
mat <- matrix(ab,nrow=3,ncol=3)
colnames(mat)<- paste0("x", 1:3)
rownames(mat)<- paste0("y", 1:3)
mat
x1 x2 x3
y1 1 4 7
y2 2 5 8
y3 3 6 9
What I want to achieve is to fill the diagonal with 0 and shift all the other values around it. So, for example, if I just use diag(mat) <- 0, I get this:
x1 x2 x3
y1 0 4 7
y2 2 0 8
y3 3 6 0
Whereas, the result I'm looking for is something like this (where the values get wrapped around the diagonal):
x1 x2 x3
y1 0 3 5
y2 1 0 6
y3 2 4 0
I'm not worried about the values that are pushed out of the matrix (i.e., 7,8,9).
Any suggestions?
Thanks
EDIT: The upvoted solution below seems to have solved the problem.
One solution that works for your example is to first declare a matrix full of ones except on the diagonal:
M <- 1 - diag(3)
And then replace all the ones with the desired off-diagonal values:
M[M == 1] <- 1:6
M
# [,1] [,2] [,3]
# [1,] 0 3 5
# [2,] 1 0 6
# [3,] 2 4 0
A more complicated scenario (e.g. diagonal coefficients that are not 0, or an unknown number of off-diagonal elements) might need a little bit of additional work.
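For instance, the same trick generalises to any square size and any off-diagonal fill (a sketch; fill_offdiag is a hypothetical helper name):
fill_offdiag <- function(vals, n, diag_val = 0) {
  M <- matrix(diag_val, n, n)
  M[row(M) != col(M)] <- vals  # fills column-major, skipping the diagonal
  M
}
fill_offdiag(1:6, 3)  # reproduces the matrix above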
You may need a loop:
n <- 9
seqs <- seq_len(n)
mats <- matrix(0, nrow = 3, ncol = 3)
ind <- 0
for (i in 1:nrow(mats)) {
  for (j in 1:nrow(mats)) {
    if (i == j) {
      mats[i, j] <- 0
    } else {
      ind <- ind + 1
      mats[j, i] <- seqs[ind]  # note the [j, i]: fills down column i, skipping the diagonal
    }
  }
}
Resulting in:
> mats
[,1] [,2] [,3]
[1,] 0 3 5
[2,] 1 0 6
[3,] 2 4 0
This will work OK for your example. I'm not sure I needed both n1 and n2; they could be collapsed into one value if the matrix is always square.
# original data
ab <- c(1:9)
n1 <- 3
n2 <- 3
# You could add the 0's to the diagonal, by adding a 0 before every n1 split
# of the data e.g. 0,1,2,3 & 0,4,5,6 & 0,7,8,9
split_ab <- split(ab, ceiling((1:length(ab))/n1))
update_split_ab <- lapply(split_ab, function(x) c(0, x))
new_ab <- unlist(update_split_ab)
mat <- matrix(new_ab, nrow=n1, ncol=n2)
colnames(mat)<- paste0("x", 1:n2)
rownames(mat)<- paste0("y", 1:n1)
mat
# turn this in to a function
makeShiftedMatrix <- function(ab = 1:9, n1 = 3, n2 = 3) {
  split_ab <- split(ab, ceiling((1:length(ab)) / n1))
  update_split_ab <- lapply(split_ab, function(x) c(0, x))
  new_ab <- unlist(update_split_ab)
  mat <- matrix(new_ab, nrow = n1, ncol = n2)
  colnames(mat) <- paste0("x", 1:n2)
  rownames(mat) <- paste0("y", 1:n1)
  return(mat)
}
# default
makeShiftedMatrix()
# to read in original matrix and shift:
old_mat <- matrix(ab, nrow=n1, ncol=n2)
makeShiftedMatrix(ab=unlist(old_mat))
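As a quick check on a larger input (here 1:16 on a 4x4; the trailing values 13:16 get pushed out, just as 7, 8, 9 were in the 3x3 case):
makeShiftedMatrix(ab = 1:16, n1 = 4, n2 = 4)
#    x1 x2 x3 x4
# y1  0  4  7 10
# y2  1  0  8 11
# y3  2  5  0 12
# y4  3  6  9  0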
