I am using npcdens from the np package to construct a conditional density of y on covariates x. However, I need the derivative of the log of this density with respect to y. Is there some way in R to get this?
bw <- npcdensbw(formula=y ~ x1+x2+x3)
fhat <- npcdens(bws=bw,gradients=TRUE)
grad.fhat <- gradients(fhat)
which returns the gradient with respect to x1, x2 and x3
Can we use this example dataset?
dta = data.frame(expand.grid(x1=1:5,x2=2:6,x3=5:10))
dta$y = with(dta,x1+2*x2 + 3*x3^2)
head(dta)
x1 x2 x3 y
1 1 2 5 80
2 2 2 5 81
3 3 2 5 82
4 4 2 5 83
5 5 2 5 84
6 1 3 5 82
y is the value of the "density". Estimate a conditional bandwidth object:
bw <- npcdensbw(formula = y ~ x1+x2+x3,data=dta)
and look at the gradients
head(gradients(npcdens(bws=bw,gradients=TRUE)))
[,1] [,2] [,3]
[1,] -2.024422e-15 -2.048994e-50 -1.227563e-294
[2,] -1.444541e-15 -1.994174e-50 -1.604693e-294
[3,] -1.017979e-31 -1.201719e-50 -1.743784e-294
[4,] 1.444541e-15 -6.753912e-64 -1.604693e-294
[5,] 2.024422e-15 1.201719e-50 -1.227563e-294
[6,] -2.024422e-15 -3.250713e-50 -1.227563e-294
What do you mean by "derivative with respect to y"? This is a function g(x1,x2,x3), so you can only take derivatives w.r.t. those 3 dimensions. Concerning the "log of y" part of your question, could this be it?
bw <- npcdensbw(formula = log(y) ~ x1 + x2 + x3,data=dta)
I've never used this package, so these are the thoughts of a non-practitioner. I guess you looked at the examples in help(npcdensbw)?
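Since the derivative of log f(y|x) in y is (∂f/∂y)/f(y|x), one option is a numerical derivative. Below is a minimal sketch, assuming the dta and bw objects from above and the exdat/eydat evaluation arguments documented in help(npcdens); the step size h is an arbitrary choice, not tuned:
library(np)
# central finite difference of log f(y|x) in y, at the sample points;
# exdat/eydat supply the evaluation data, h is a small arbitrary step
h <- 1e-3 * sd(dta$y)
X <- dta[, c("x1","x2","x3")]
f.up   <- fitted(npcdens(bws=bw, txdat=X, tydat=dta["y"], exdat=X, eydat=dta["y"] + h))
f.down <- fitted(npcdens(bws=bw, txdat=X, tydat=dta["y"], exdat=X, eydat=dta["y"] - h))
dlogf.dy <- (log(f.up) - log(f.down)) / (2*h)
head(dlogf.dy)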
I'm very new to R. I have two matrices of different dimensions, C (3 rows, 79 columns) and T (3 rows, 215 columns). I want my code to calculate the Spearman correlation between the first column of C and all the columns of T, and return the maximum correlation along with the index of the column where it occurs; then the second column of C and all the columns of T, and so on. In fact, I want to find the columns of the two matrices which are most correlated. Hope that is clear.
What I did was a nested for loop, but the result is not what I am looking for.
for (i in 1:79){
  for (j in 1:215){
    print(max(cor(C[,i], T[,j], method = "spearman")))
  }
}
You don't have to loop over the columns.
x <- cor(C,T,method = c("spearman"))
out <- data.frame(MaxCorr = apply(x,1,max), T_ColIndex=apply(x,1,which.max),C_ColIndex=1:nrow(x))
head(out)
gives,
MaxCorr T_ColIndex C_ColIndex
1 1 8 1
2 1 1 2
3 1 2 3
4 1 1 4
5 1 11 5
6 1 4 6
Fake Data:
C <- matrix(rnorm(3*79),nrow=3)
T <- matrix(rnorm(3*215),nrow=3)
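As an aside, don't be surprised that every MaxCorr is exactly 1 here: with only 3 rows, Spearman's rho between two columns can only take the values -1, -0.5, 0.5 and 1, so among 215 candidate columns a perfect rank agreement is almost guaranteed. A quick check with the fake data above:
# rho is restricted to a handful of values when n = 3
table(round(cor(C, T, method = "spearman"), 2))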
Maybe something like the function below can solve the problem.
pairwise_cor <- function(x, y, method = "spearman"){
  ix <- seq_len(ncol(x))
  iy <- seq_len(ncol(y))
  # for each column of x, correlate against every column of y
  # and keep the index and value of the best match
  t(sapply(ix, function(i){
    m <- sapply(iy, function(j) cor(x[,i], y[,j], method = method))
    setNames(c(i, which.max(m), max(m)), c("col_x", "col_y", "max"))
  }))
}
set.seed(2021)
C <- matrix(rnorm(3*5), nrow=3)
T <- matrix(rnorm(3*7), nrow=3)
pairwise_cor(C, T)
# col_x col_y max
#[1,] 1 1 1.0
#[2,] 2 2 1.0
#[3,] 3 2 1.0
#[4,] 4 3 0.5
#[5,] 5 5 1.0
I have a function:
f <- function(x, y){ return(x + y) }
I have to make a 2-D plot (not 3-D) with c(30:200) on both the X and Y axes. So I have to map both x and y through the function and, based on its result, color the point, e.g. color (xi, yi) if f(xi, yi) > ?, and so on. How would I achieve this?
I tried:
range <- c(30:200)
ys <- matrix(nrow = 171, ncol = 171)
for (i in range){
  for (y in range){
    ys[i-29, y-29] <- f(i, y) # example: if f(i, y) < 0.5, color (i, y) red
  }
}
df <- data.frame(x = c(30:200), y = c(30:200))
Now the x and y axes are correct, but how would I plot this? I can't just bind ys to the y axis, and using ys this way doesn't seem like the right approach. How would I do this?
Thanks for the help.
Here's a sample given a small matrix.
First, I'll generate the matrix ... you use whatever data you want.
m <- matrix(1:25, nr=5)
m
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1 6 11 16 21
# [2,] 2 7 12 17 22
# [3,] 3 8 13 18 23
# [4,] 4 9 14 19 24
# [5,] 5 10 15 20 25
Now, convert it to the "long" format that ggplot2 prefers:
library(dplyr)
library(tidyr)
longm <- cbind(m, x = seq_len(nrow(m))) %>%
  as.data.frame() %>%
  gather(y, val, -x) %>%
  mutate(y = as.integer(gsub("\\D", "", y)))
head(longm)
# x y val
# 1 1 1 1
# 2 2 1 2
# 3 3 1 3
# 4 4 1 4
# 5 5 1 5
# 6 1 2 6
And a plot:
library(ggplot2)
ggplot(longm, aes(x, y, fill=val)) + geom_tile()
# or, depending on other factors, otherwise identical
ggplot(longm, aes(x, y)) + geom_tile(aes(fill=val))
It's notable (to me) that the top-left value in the matrix (m[1,1]) is actually the bottom-left in the heatmap. This can be adjusted with scale_y_reverse(). From here, it should be primarily aesthetics.
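For the specific setup in the question you can also skip the matrix stage entirely and build the long data frame directly with expand.grid, coloring by a condition on f. A minimal sketch, assuming f and the 30:200 range from the question; the cutoff of 230 is an arbitrary stand-in for the question's threshold:
library(ggplot2)
f <- function(x, y) x + y
grid <- expand.grid(x = 30:200, y = 30:200) # every (x, y) pair
grid$val <- f(grid$x, grid$y)               # f is vectorized, so no loop is needed
ggplot(grid, aes(x, y, fill = val < 230)) + # arbitrary cutoff for illustration
  geom_tile()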
I am trying to calculate Within And Total Sum of Squares and Cross-Product Matrices (W) in one-way MANOVA.
I have a treatment matrix tm:
n x1 x2
1 6 7
1 5 9
1 8 6
...
2 3 3
2 1 6
2 2 3
...
3 2 3
3 2 3
3 5 1
...
I also have each individual observation in its own variable, for example:
x111 = x[1,1]
x112 = x[2,1]
...
which are then used to build the following vectors:
# creating vectors
t11 = c(x111, x111_2) # 6,7
t12 = c(x112, x112_2) # 5,9
t13 = c(x113, x113_2) # 8,6
t14 = c(x114, x114_2) # 4,9
t15 = c(x115, x115_2) # 7,9
t21 = c(x211, x211_2) # 3,3
t22 = c(x212, x212_2) # 1,6
t23 = c(x213, x213_2) # 2,3
t31 = c(x311, x311_2) # 2,3
t32 = c(x312, x312_2) # 5,1
t33 = c(x313, x313_2) # 3,1
t34 = c(x314, x314_2) # 2,3
> dput(t11)
c(6, 7)
I am trying to calculate W (Within And Total Sum of Squares and Cross-Product Matrices).
The means are
> x1 # treatment 1
[1] 6 8
> x2 # treatment 2
[1] 2 4
> x3 # treatment 3
[1] 3 2
> x # overall mean
X1 X2
[1,] 4 5
The code I have is:
W = (t(t11)-t(x1))*(t11-x1)
+(t(t12)-t(x1))%*%(t12-x1)
+(t(t13)-t(x1))%*%(t13-x1)
+(t(t14)-t(x1))%*%(t14-x1)
+(t(t15)-t(x1))%*%(t15-x1)
+(t(t21)-t(x2))%*%(t21-x2)
+(t(t22)-t(x2))%*%(t22-x2)
+(t(t23)-t(x2))%*%(t23-x2)
+(t(t31)-t(x3))%*%(t31-x3)
+(t(t32)-t(x3))%*%(t32-x3)
+(t(t33)-t(x3))%*%(t33-x3)
+(t(t34)-t(x3))%*%(t34-x3)
The result I get is:
Error in (t(t11) - t(x1)) * (t11 - x1) + (t(t12) - t(x1)) %*% :
non-conformable arrays
When I isolated each statement, I got this:
> (t(t11)-t(x1))%*%(t11-x1)
[,1]
[1,] 1
> (t(t12)-t(x1))%*%(t12-x1)
[,1]
[1,] 2
Why do these statements evaluate to 1x1 matrices? When I calculate the 2x1 and 1x2 operations (subtraction and multiplication) manually, I get a 2x2 result for both, and an online calculator agrees.
It can be confusing to work with vectors in R when you want matrix multiplication. A vector in R (say, x = c(1,2)) is printed as though it might be a row vector, but in a matrix product R treats it as a column vector: t(x) is a 1x2 row matrix, so t(x) %*% x is 1x1, which is exactly what happened in your isolated statements.
With that in mind, to get the 2x2 matrix you want, do
t11 = c(6, 7)
x1 = c(6, 8)
(t11 - x1) %*% t(t11 - x1)
No need for too many transposes.
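For what it's worth, two separate things are going on in the original code: the very first term uses elementwise * (yielding a 1x2 matrix that can't be added to the 1x1 %*% terms, hence "non-conformable arrays"), and the transposes sit on the wrong side, so even the %*% terms collapse to scalars. With the orientation fixed as above, W is just a sum of outer products. A sketch, assuming the t* vectors and the group means x1, x2, x3 from the question:
# each observation contributes the outer product of its deviation
# from its group mean; W is the sum of those 2x2 matrices
obs   <- list(t11, t12, t13, t14, t15, t21, t22, t23, t31, t32, t33, t34)
means <- list(x1, x1, x1, x1, x1, x2, x2, x2, x3, x3, x3, x3)
W <- Reduce(`+`, Map(function(ti, mi) (ti - mi) %*% t(ti - mi), obs, means))
W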
My question concerns the calculation of Cramér's V to detect correlation between categorical variables. I've got a dataset with missing values, but for illustration I created a fake dataset with two variables a and b, one of them containing two NAs.
a <- factor(c("M","","F","F","","M","F","F"))
a2 <- factor(a, levels = c('M','F'),labels =c('Male','Female'))
b <- factor(c("y","y","","y","n","n","n","y"))
b2 <- factor(b, levels=c("y","n"),labels=c("yes","no"))
df<-cbind(a2,b2)
The assocstats function gives me the result for Cramér's V:
require(vcd)
> tab <-table(a,b)
> assocstats(tab)
X^2 df P(> X^2)
Likelihood Ratio 1.7261 4 0.78597
Pearson 1.3333 4 0.85570
Phi-Coefficient : 0.408
Contingency Coeff.: 0.378
Cramer's V : 0.289
Now I want to drop the NA's from the levels
a[a==""]<-NA
a3 <- droplevels(a)
levels(a3)
tab <-table(a,b)
assocstats(tab)
But every time I remove NAs the result looks like this:
X^2 df P(> X^2)
Likelihood Ratio 0.13844 2 0.93312
Pearson NaN 2 NaN
Phi-Coefficient : NaN
Contingency Coeff.: NaN
Cramer's V : NaN
Also, because I have a large dataset, I would like to calculate a matrix of Cramér's V results. I found this code here on Stack Overflow and it seems to work...
get.V <- function(y){
  col.y <- ncol(y)
  V <- matrix(ncol = col.y, nrow = col.y)
  for(i in 1:col.y){
    for(j in 1:col.y){
      V[i,j] <- assocstats(table(y[,i], y[,j]))$cramer
    }
  }
  return(V)
}
get.V(tab)
Only the result is different from the one the assocstats function gives:
[,1] [,2] [,3]
[1,] 1.0 0.5 1
[2,] 0.5 1.0 1
[3,] 1.0 1.0 1
This cannot be right, because I get this result every time, even when changing the number of observations... what is wrong with this code?
Conclusion: I don't know which of the results is right. I have a large dataset with a lot of NAs in it. The first assocstats result and the code give different results, although the difference is not big, because the code only builds a matrix of the same statistic. The second assocstats call gives only NaN. I can't detect any errors... Can somebody help me?
You don't have to replace the "" with NA if you are using factors: any unique value that you don't define in levels will be converted to NA by factor.
a <- factor(c("M","","F","F","","M","F","F"))
a2 <- factor(a, levels = c('M','F'),labels =c('Male','Female'))
a
# [1] M F F M F F
# Levels: F M
a2
# [1] Male <NA> Female Female <NA> Male Female Female
# Levels: Male Female
b <- factor(c("y","y","","y","n","n","n","y"))
b2 <- factor(b, levels=c("y","n"),labels=c("yes","no"))
(df <- cbind(a2,b2))
# a2 b2
# [1,] 1 1
# [2,] NA 1
# [3,] 2 NA
# [4,] 2 1
# [5,] NA 2
# [6,] 1 2
# [7,] 2 2
# [8,] 2 1
Above, you're creating a matrix which loses all the labels that you created with factor. I think you want a data frame:
(df <- data.frame(a2,b2))
# a2 b2
# 1 Male yes
# 2 <NA> yes
# 3 Female <NA>
# 4 Female yes
# 5 <NA> no
# 6 Male no
# 7 Female no
# 8 Female yes
require('vcd')
(tab <- table(a2,b2, useNA = 'ifany'))
# b2
# a2 yes no <NA>
# Male 1 1 0
# Female 2 1 1
# <NA> 1 1 0
(tab <- table(a2,b2))
# b2
# a2 yes no
# Male 1 1
# Female 2 1
You need to explicitly tell table if you want to see NA values in the table. Otherwise, it will drop them by default so that you are already "excluding" them when you use assocstats:
assocstats(tab)
# X^2 df P(> X^2)
# Likelihood Ratio 0.13844 1 0.70983
# Pearson 0.13889 1 0.70939
#
# Phi-Coefficient : 0.167
# Contingency Coeff.: 0.164
# Cramer's V : 0.167
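As far as I can tell, this also explains the NaN in your second attempt: you replaced "" with NA in a but then tabled a itself rather than the dropped-level copy a3, so the unused "" level survives as an all-zero row, and the chi-square behind assocstats divides by zero expected counts. For example:
a[a == ""] <- NA
table(a, b)             # the unused "" level remains as an all-zero row, so assocstats gives NaN
table(droplevels(a), b) # dropping the unused level removes the zero row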
For get.V just pass the data frame or matrix, not the table:
get.V <- function(y) {
  col.y <- ncol(y)
  V <- matrix(ncol = col.y, nrow = col.y)
  for(i in 1:col.y){
    for(j in 1:col.y){
      V[i,j] <- assocstats(table(y[,i], y[,j]))$cramer
    }
  }
  return(V)
}
get.V(df)
# [,1] [,2]
# [1,] 1.0000000 0.1666667
# [2,] 0.1666667 1.0000000
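If you ever want the missings counted as their own category instead, a hypothetical variant (get.V.na is my name, not part of the original code) can pass useNA through to table:
get.V.na <- function(y) {
  col.y <- ncol(y)
  V <- matrix(ncol = col.y, nrow = col.y)
  for (i in 1:col.y) {
    for (j in 1:col.y) {
      # useNA = "ifany" keeps NA as an extra row/column when present
      V[i, j] <- assocstats(table(y[, i], y[, j], useNA = "ifany"))$cramer
    }
  }
  V
}
get.V.na(df)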
I want to regress y1 on x, then y2 on x and so on, and save the slope, intercept, r2, p values etc. to a vector, such that the final vector contains values for y1...yn even if some value is NA.
The following is how my data looks
y1 y2 y3 y4 y5 x
NA 89 86 91 69 1
NA 86 NA 88 NA 2
NA 86 83 88 66 3
NA 100 NA 102 80 4
Using the following code, the slope is not calculated for a column where all the values of y[,i] are NA, but it is still calculated if only some of the values are NA.
slope <- 0
for(i in 1:length(names(y))){
  if (all(is.na(y[,i])) == FALSE) {
    m <- lm(y[,i] ~ time)
    slope <- c(slope, m$coefficients[2])
  }
}
However, I still cannot figure out a way to maintain the positional information of all y[,i], such that my final vector output would look something like this:
slope
1 NA
2 9.362637e-01
3 8.461538e-01
4 3.450549e-01
5 6.593407e-01
Any help will be much appreciated.
sapply over the non-x columns of DF returning the coefficients if there are any non-NAs in the dependent variable (y) and returning NAs otherwise:
t(sapply(DF[-6], function(y) if (any(!is.na(y))) coef(lm(y ~ x, DF)) else c(NA, NA)))
This gives the following where column 1 is the intercepts and column 2 is the slopes:
[,1] [,2]
y1 NA NA
y2 82.00000 3.300000
y3 87.50000 -1.500000
y4 84.00000 3.300000
y5 63.85714 2.928571
If only the slopes are needed then:
matrix(sapply(DF[-6], function(y) if (any(!is.na(y))) coef(lm(y ~ x, DF))[2] else NA))
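Since the question also asked for the intercept, r2 and p-values, here is a sketch extending the same idea, assuming the DF and x above; summary.lm supplies all of them, and the all-NA column again falls back to NAs:
stats <- t(sapply(DF[-6], function(y) {
  if (all(is.na(y))) return(c(intercept = NA, slope = NA, r2 = NA, p = NA))
  s <- summary(lm(y ~ x, DF))
  c(intercept = coef(s)[1, 1], slope = coef(s)[2, 1],
    r2 = s$r.squared, p = coef(s)[2, 4])
}))
stats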
# This is for the slope only.
nn <- lapply(paste0("y", 1:5), function(i){
  if (all(is.na(y[[i]])) == FALSE) {
    bb <- lm(y[[i]] ~ x, data = y)
    return(bb[[1]][2])
  } else {
    return(NA)
  }
})
do.call(rbind, nn)
x
[1,] NA
[2,] 3.300000
[3,] -1.500000
[4,] 3.300000
[5,] 2.928571