I have the following data:
x1 = sample(1:10, 100, replace=T)
x2 = sample(1:3, 100, replace=T)
x3 = sample(50:100, 100, replace=T)
y1 = sample(50:100, 100, replace=T)
y2 = sample(50:100, 100, replace=T)
mydf = data.frame(x1,x2,x3,y1,y2)
head(mydf)
  x1 x2 x3  y1 y2
1  2  2 96 100 73
2  5  2 77  93 52
3 10  1 86  54 80
4  3  2 98  59 94
5  2  2 85  94 85
6  9  2 56  79 99
I want to compute correlations and produce the following output:
   x1                x2                x3
y1 r.value; p.value  r.value; p.value  r.value; p.value
y2 r.value; p.value  r.value; p.value  r.value; p.value
The r value needs to be rounded to 2 digits and the p-value to 3 digits.
How can this be done? Thanks for your help.
I tried the following:
library(Hmisc)
res = rcorr(as.matrix(mydf), type="pearson")
res
      x1    x2    x3    y1    y2
x1  1.00 -0.01 -0.16 -0.28 -0.21
x2 -0.01  1.00 -0.20 -0.10 -0.13
x3 -0.16 -0.20  1.00  0.14 -0.09
y1 -0.28 -0.10  0.14  1.00  0.12
y2 -0.21 -0.13 -0.09  0.12  1.00

n= 100

P
   x1     x2     x3     y1     y2
x1        0.9520 0.1089 0.0047 0.0364
x2 0.9520        0.0444 0.3463 0.1887
x3 0.1089 0.0444        0.1727 0.3948
y1 0.0047 0.3463 0.1727        0.2482
y2 0.0364 0.1887 0.3948 0.2482
matrix(paste0(round(res[[1]][,1:3],2),';',round(res[[3]][1:2,],4)),ncol=3)
[,1] [,2] [,3]
[1,] "1;NA" "-0.01;0.0444" "-0.16;NA"
[2,] "-0.01;0.952" "1;0.0047" "-0.2;0.952"
[3,] "-0.16;0.952" "-0.2;0.3463" "1;0.952"
[4,] "-0.28;NA" "-0.1;0.0364" "0.14;NA"
[5,] "-0.21;0.1089" "-0.13;0.1887" "-0.09;0.1089"
But the combination is not correct.
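One likely reason the pairing goes wrong: `res[[1]][,1:3]` selects 5 rows by 3 columns (15 values), while `res[[3]][1:2,]` selects 2 rows by 5 columns (10 values). `paste0` silently recycles the shorter vector, so r and p values from different cells get glued together. The sketch below shows this with small stand-in matrices instead of the real `rcorr` output (the names `r` and `p` are assumptions made for illustration).

```r
# Stand-ins for res[[1]] (r values) and res[[3]] (p values); not real correlations
vars <- c("x1", "x2", "x3", "y1", "y2")
r <- matrix(1:25, 5, 5, dimnames = list(vars, vars))
p <- r / 100

a <- r[, 1:3]   # 5 x 3 block: 15 values
b <- p[1:2, ]   # 2 x 5 block: 10 values
c(length(a), length(b))

# b is recycled to length 15, so most pairs come from different cells
bad <- matrix(paste0(a, ";", b), ncol = 3)

# Subsetting both matrices with the same rows and columns keeps pairs aligned
good <- matrix(paste0(r[4:5, 1:3], ";", p[4:5, 1:3]), nrow = 2,
               dimnames = list(vars[4:5], vars[1:3]))
good
```

Using identical row and column indices on both matrices is the key point; the answers that follow do exactly that.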
You can also do the following, which doesn't require you to specify the positions of the rows/columns you need:
matrix(paste(unlist(round(res[[1]],2)),unlist(round(res[[3]],3)),sep=";"),
nrow=nrow(res[[1]]),dimnames=dimnames(res[[1]]))
Update: I added a dimnames argument so that the row and column names carry over to the result matrix.
For example, with the random sample I had, you get:
x1 x2 x3 y1 y2
x1 "1;NA" "-0.2;0.052" "0.02;0.833" "-0.04;0.674" "0.02;0.819"
x2 "-0.2;0.052" "1;NA" "-0.13;0.202" "-0.01;0.896" "0.05;0.653"
x3 "0.02;0.833" "-0.13;0.202" "1;NA" "-0.05;0.636" "-0.13;0.185"
y1 "-0.04;0.674" "-0.01;0.896" "-0.05;0.636" "1;NA" "-0.02;0.858"
y2 "0.02;0.819" "0.05;0.653" "-0.13;0.185" "-0.02;0.858" "1;NA"
Try
r2 <- matrix(0, ncol=3, nrow=2,
dimnames=list( paste0('y',1:2), paste0('x',1:3)))
r2[] <- paste(round(res$r[4:5,1:3],2), round(res$P[4:5,1:3],4), sep="; ")
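If Hmisc is not available, the same y-by-x table can be sketched with base R's `cor.test`, here with the p-value rounded to 3 digits as the question asks. This is only an illustration under assumptions: the seed, the helper name `cell` and the vectors `xs`/`ys` are made up for the example, and the numbers will differ from the output shown elsewhere in this thread.

```r
# Assumed example data with the same shape as in the question
set.seed(1)
mydf <- data.frame(
  x1 = sample(1:10, 100, replace = TRUE),
  x2 = sample(1:3, 100, replace = TRUE),
  x3 = sample(50:100, 100, replace = TRUE),
  y1 = sample(50:100, 100, replace = TRUE),
  y2 = sample(50:100, 100, replace = TRUE)
)

xs <- c("x1", "x2", "x3")  # columns of the result
ys <- c("y1", "y2")        # rows of the result

# Hypothetical helper: one "r; p" cell for a given pair of column names
cell <- function(y, x) {
  ct <- cor.test(mydf[[x]], mydf[[y]], method = "pearson")
  paste(round(unname(ct$estimate), 2), round(ct$p.value, 3), sep = "; ")
}

# outer() builds every y/x combination; Vectorize() lets cell() take vectors
out <- outer(ys, xs, Vectorize(cell))
dimnames(out) <- list(ys, xs)
out
```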
Update
You could create a function like below
f1 <- function(df){
df1 <- df[order(colnames(df))]
indx <- sub('\\d+', '', colnames(df1))
indx1 <- which(indx[-1]!= indx[-length(indx)])
indx2 <- (indx1+1):ncol(df1)
r2 <- matrix(0, ncol=indx1, nrow=(ncol(df1)-indx1),
dimnames=list(colnames(df1)[indx2], colnames(df1)[1:indx1]))
r1 <- rcorr(as.matrix(df1), type='pearson')
r2[] <- paste(round(r1$r[indx2,1:indx1],2), round(r1$P[indx2,1:indx1],4),
sep="; ")
r2
}
f1(mydf) #using your dataset (`set.seed` is different)
# x1 x2 x3
#y1 "0.07; 0.4773" "0.02; 0.84" "0.21; 0.0385"
#y2 "-0.08; 0.4363" "0.08; 0.4146" "0.02; 0.8599"
Testing with an unordered dataset (mydf1, built in the data section below)
f1(mydf1)
# x1 x2 x3 x4
#y1 "-0.08; 0.4086" "0.17; 0.0945" "-0.25; 0.0112" "-0.16; 0.1025"
#y2 "0.07; 0.5174" "-0.1; 0.3054" "0.03; 0.7478" "-0.06; 0.5776"
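The index bookkeeping inside f1 can be looked at in isolation. A minimal sketch, assuming column names made of a letter prefix followed by digits (`nm` is a made-up vector of names, not taken from the data above):

```r
# Column names in arbitrary order, then sorted as f1 does with order(colnames(df))
nm <- sort(c("y1", "x3", "x1", "y2", "x2", "x4"))

# Strip the digits to get the variable group: "x" or "y"
prefix <- sub("\\d+", "", nm)

# Position of the last "x" column: where neighbouring prefixes differ
boundary <- which(prefix[-1] != prefix[-length(prefix)])

x_cols <- nm[1:boundary]                 # become the columns of the result
y_cols <- nm[(boundary + 1):length(nm)]  # become the rows of the result
list(x = x_cols, y = y_cols)
```

This only works when there are exactly two prefixes, since `boundary` must be a single number.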
Update2
If you want a function that takes numeric column indices as arguments:
f2 <- function(df, v1, v2){
r2 <- matrix(0, nrow=length(v2), ncol=length(v1),
dimnames=list(colnames(df)[v2], colnames(df)[v1]))
r1 <- rcorr(as.matrix(df), type='pearson')
r2[] <- paste(round(r1$r[v2,v1],2), round(r1$P[v2,v1],4), sep="; ")
r2
}
f2(mydf, 1:3, 4:5)
f2(mydf, c(1,3), c(2,4,5))
data
set.seed(29)
x1 = sample(1:10, 100, replace=T)
x2 = sample(1:3, 100, replace=T)
x3 = sample(50:100, 100, replace=T)
x4 <- sample(40:80, 100, replace=TRUE)
y1 = sample(50:100, 100, replace=T)
y2 = sample(50:100, 100, replace=T)
mydfN = data.frame(x1,x2,x3,x4, y1,y2)
set.seed(25)
mydf1 <- mydfN[sample(colnames(mydfN))]
Related
I'm trying to build a new matrix in R using values from another matrix, matching the row and column names while importing the values. This is what I'm trying to do:
I have two matrices;
X1 X2 X3 X4
X1 0 9 8 0
X2 1 2 3 5
X4 6 1 2 4
X1 X2 X3 X4
X1 NA NA NA NA
X2 NA NA NA NA
X3 NA NA NA NA
X4 NA NA NA NA
I want to get:
X1 X2 X3 X4
X1 0 9 8 0
X2 1 2 3 5
X3 NA NA NA NA
X4 6 1 2 4
These matrices are just simple examples of my dataset, my real data is more complicated.
Many thanks,
Checking for matching rownames and colnames in both matrices prevents a "subscript out of bounds" error. See below.
mat2[rownames(mat2) %in% rownames(mat1),
colnames(mat2) %in% colnames(mat1)] <- mat1[rownames(mat1) %in% rownames(mat2),
colnames(mat1) %in% colnames(mat2)]
mat2
# X1 X2 X3 X4
# X1 0 9 8 0
# X2 1 2 3 5
# X3 NA NA NA NA
# X4 6 1 2 4
Data:
mat1 <- read.table(text = ' X1 X2 X3 X4
X1 0 9 8 0
X2 1 2 3 5
X4 6 1 2 4', header = TRUE)
mat1 <- as.matrix(mat1)
mat2 <- matrix(NA, nrow = 4, ncol = 4, dimnames = list(paste0("X", 1:4),
paste0("X", 1:4)))
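A variation on the same idea indexes by name instead of by logical mask. The `%in%` version assumes the shared names appear in the same order in both matrices, whereas name-based indexing does not. This is a sketch on the example data only; `rn` and `cn` are names introduced here for illustration.

```r
# Example matrices from the question, built directly
mat1 <- matrix(c(0, 1, 6,  9, 2, 1,  8, 3, 2,  0, 5, 4), nrow = 3,
               dimnames = list(c("X1", "X2", "X4"), paste0("X", 1:4)))
mat2 <- matrix(NA_real_, nrow = 4, ncol = 4,
               dimnames = list(paste0("X", 1:4), paste0("X", 1:4)))

# Names present in both matrices; intersect() avoids "subscript out of bounds"
rn <- intersect(rownames(mat1), rownames(mat2))
cn <- intersect(colnames(mat1), colnames(mat2))

# Copy by name, so the order of rows/columns in either matrix does not matter
mat2[rn, cn] <- mat1[rn, cn]
mat2
```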
If I understood your question you can do this:
# Building your matrices
mat1 <- matrix(runif(12), nrow = 3, ncol = 4)
mat2 <- matrix(NA, nrow = 4, ncol = 4)
labs <- paste0("x", 1:4)
colnames(mat1) <- colnames(mat2) <- labs
rownames(mat2) <- labs
rownames(mat1) <- labs[c(1:2, 4)]
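# Combine: one row per row name found in either matrix, filled in by name
rows <- sort(unique(c(rownames(mat1), rownames(mat2))))
result <- matrix(NA, nrow = length(rows), ncol = ncol(mat1),
dimnames = list(rows, colnames(mat1)))
result[match(rownames(mat1), rows), ] <- mat1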
As one option for model selection with MCMCglmm (see also this related question), I am trying out model averaging with the MuMIn package. It doesn't seem to work (see the output below). Any ideas why? The output looks like nonsense: many z values are NA, and those that are not NA are all exactly 1. This may stem from the fact that all but one model have been assigned a weight of 0, which again seems unrealistic.
Note that in the documentation for MuMIn, it is listed as being compatible with MCMCglmm objects.
Reproducible example:
set.seed(1234)
library(MCMCglmm)
data(bird.families)
n <- Ntip(bird.families)
# Create some dummy variables
d <- data.frame(taxon = bird.families$tip.label,
X1 = rnorm(n),
X2 = rnorm(n),
X3 = sample(c("A", "B", "C"), n, replace = T),
X4 = sample(c("A", "B", "C"), n, replace = T))
# Simulate a phenotype composed of phylogenetic, fixed and residual effects
d$phenotype <- rbv(bird.families, 1, nodes="TIPS") +
d$X1*0.7 +
ifelse(d$X3 == "B", 0.5, 0) +
ifelse(d$X3 == "C", 0.8, 0) +
rnorm(n, 0, 1)
# Inverse matrix of shared phylogenetic history
Ainv <- inverseA(bird.families)$Ainv
# Set priors
prior <- list(R = list(V = 1, nu = 0.002),
G = list(G1 = list(V = 1, nu = 0.002)))
uMCMCglmm <- updateable(MCMCglmm)
model <- uMCMCglmm(phenotype ~ X1 + X2 + X3 + X4,
random = ~taxon,
ginverse = list(taxon=Ainv),
data = d,
prior = prior,
verbose = FALSE)
# Explore possible simplified models
options(na.action = "na.fail")
dred <- dredge(model)
# Calculate a model average
avg <- model.avg(dred)
summary(avg)
Output:
Call:
model.avg(object = dred)
Component model call:
uMCMCglmm(fixed = phenotype ~ <16 unique rhs>, random = ~taxon, data = d,
prior = prior, verbose = FALSE, ginverse = list(taxon = Ainv))
Component models:
df logLik AICc delta weight
3 5 -49.24 108.93 0.00 1
4 5 -71.18 152.82 43.89 0
(Null) 3 -76.98 160.13 51.20 0
34 7 -90.35 195.56 86.63 0
23 6 -95.03 202.71 93.78 0
24 6 -105.79 224.22 115.29 0
1 4 -134.87 278.04 169.11 0
123 7 -137.36 289.59 180.66 0
2 4 -154.82 317.93 209.00 0
234 8 -162.69 342.51 233.58 0
13 6 -167.74 348.12 239.19 0
124 7 -171.06 356.99 248.05 0
14 6 -172.53 357.70 248.77 0
134 8 -171.60 360.33 251.40 0
12 5 -181.16 372.78 263.84 0
1234 9 -189.33 398.07 289.14 0
Term codes:
X1 X2 X3 X4
1 2 3 4
Model-averaged coefficients:
(full average)
Estimate Std. Error z value Pr(>|z|)
(Intercept) -7.642e-01 NA NA NA
X3B 6.708e-01 6.708e-01 1 0.317
X3C 9.802e-01 9.802e-01 1 0.317
X4B -9.505e-11 9.505e-11 1 0.317
X4C -7.822e-11 7.822e-11 1 0.317
X2 -3.259e-22 3.259e-22 1 0.317
X1 1.378e-37 1.378e-37 1 0.317
(conditional average)
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.76421 NA NA NA
X3B 0.67078 NA NA NA
X3C 0.98025 NA NA NA
X4B -0.32229 NA NA NA
X4C -0.26522 NA NA NA
X2 -0.07528 NA NA NA
X1 0.72300 NA NA NA
Relative variable importance:
X3 X4 X2 X1
Importance: 1 <0.01 <0.01 <0.01
N containing models: 8 8 8 8
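The weight column can be checked by hand. Akaike weights are computed from the delta column as w_i = exp(-delta_i / 2) / sum_j exp(-delta_j / 2), so a delta of 43.89 already gives a weight on the order of 1e-10, which prints as 0. The sketch below only redoes that arithmetic on the deltas printed above (it does not rerun the models). It suggests where the tiny full-average estimates come from, but it does not explain the NA standard errors.

```r
# Delta AICc values copied from the "Component models" table above
delta <- c(0.00, 43.89, 51.20, 86.63, 93.78, 115.29, 169.11, 180.66,
           209.00, 233.58, 239.19, 248.05, 248.77, 251.40, 263.84, 289.14)

# Akaike weights: relative likelihoods exp(-delta/2), normalised to sum to 1
w <- exp(-delta / 2) / sum(exp(-delta / 2))
signif(w[1:3], 3)

# The second model (term code 4) dominates the models containing X4, so the
# full-average X4B estimate is roughly its weight times the conditional estimate
approx_full_X4B <- w[2] * -0.32229
approx_full_X4B
```

The product comes out close to the -9.505e-11 reported for X4B; the small difference is consistent with the deltas being rounded to two decimals.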
I would like to ask for help with distance measures for continuous variables.
Here is an example:
x1 = (0,0)
x2 = (1,0)
x3 = (5,5)
The task is to find the distance matrices for the L1 norm and the L2 (Euclidean) norm.
I don't know how to compute this in R. I tried the following, but it didn't work as expected:
y2 <- c(0,0)
y3 <- c(1,0)
y4 <- c(5,5)
y5 <- rbind(y2,y3,y4)
dist(y5)
y2 <- c(0,0)
y3 <- c(1,0)
y4 <- c(5,5)
mat <- rbind(y2, y3, y4)
d1 <- dist(mat, upper=TRUE, diag=TRUE, method="manhattan")
d1
# y2 y3 y4
# y2 0 1 10
# y3 1 0 9
# y4 10 9 0
d2 <- dist(mat, upper=TRUE, diag=TRUE)^2
d2
# y2 y3 y4
# y2 0 1 50
# y3 1 0 41
# y4 50 41 0
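Note that d2 above holds squared Euclidean distances because of the `^2`. If the plain L2 norm is what is wanted, a sketch without the squaring looks like this, with two entries checked by hand (the names `l1` and `l2` are made up for the example):

```r
mat <- rbind(y2 = c(0, 0), y3 = c(1, 0), y4 = c(5, 5))

# Full symmetric matrices are easier to index than "dist" objects
l1 <- as.matrix(dist(mat, method = "manhattan"))  # L1: sum of |differences|
l2 <- as.matrix(dist(mat, method = "euclidean"))  # L2: sqrt of sum of squares

# The same two quantities by hand for one pair of points each
l1_by_hand <- sum(abs(mat["y2", ] - mat["y4", ]))       # |0-5| + |0-5|
l2_by_hand <- sqrt(sum((mat["y3", ] - mat["y4", ])^2))  # sqrt(16 + 25)

round(l2, 3)
```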
I have two data frames:
>df1
type id1 id2 id3 count1 count2 count3
a x1 y1 z1 10 20 0
b x2 y2 z2 20 0 30
c x3 y3 z3 10 10 10
>df2
id prop
x1 10
x2 5
x3 100
y1 0
y2 50
y3 80
z1 10
z2 20
z3 30
The count* columns act as weights. I want to join the tables so that TotalProp is the sum of each id's prop weighted by the corresponding count.
For example, for the first row of df1: TotalProp = 10 (prop for x1) * 10 (count1) + 0 (prop for y1) * 20 (count2) + 10 (prop for z1) * 0 (count3) = 100
So my final table should look like this:
>result
type id1 id2 id3 TotalProp
a x1 y1 z1 100
b x2 y2 z2 700
c x3 y3 z3 2100
Any idea how I can do this?
Thanks.
A one-line solution first, followed by a step-by-step explanation.
df1
## type id1 id2 id3 count1 count2 count3
## 1 a x1 y1 z1 10 20 0
## 2 b x2 y2 z2 20 0 30
## 3 c x3 y3 z3 10 10 10
df2
## id prop
## x1 x1 10
## x2 x2 5
## x3 x3 100
## y1 y1 0
## y2 y2 50
## y3 y3 80
## z1 z1 10
## z2 z2 20
## z3 z3 30
rownames(df2) <- df2$id
result <- data.frame(type = df1$type, TotalProp = rowSums(matrix(df2[unlist(df1[, c("id1", "id2", "id3")]), "prop"], nrow = nrow(df1)) * as.matrix(df1[,
c("count1", "count2", "count3")])))
result
## type TotalProp
## 1 a 100
## 2 b 700
## 3 c 2100
Stepwise explanation
First we get all the id values in a vector for which we want to fetch corresponding prop values from df2
Step 1
unlist(df1[, c("id1", "id2", "id3")])
## id11 id12 id13 id21 id22 id23 id31 id32 id33
## "x1" "x2" "x3" "y1" "y2" "y3" "z1" "z2" "z3"
Step 2
We name the rows of df2 with df2$id.
rownames(df2) <- df2$id
Step 3
Then using result from step 1, we get corresponding prop values
df2[unlist(df1[, c("id1", "id2", "id3")]), "prop"]
## [1] 10 5 100 0 50 80 10 20 30
Step 4
Convert the vector from step 3 back to 2 dimensional form
matrix(df2[unlist(df1[, c("id1", "id2", "id3")]), "prop"], nrow = nrow(df1))
## [,1] [,2] [,3]
## [1,] 10 0 10
## [2,] 5 50 20
## [3,] 100 80 30
Step 5
Multiply result of Step 4 with counts from df1
as.matrix(df1[, c("count1", "count2", "count3")])
## count1 count2 count3
## [1,] 10 20 0
## [2,] 20 0 30
## [3,] 10 10 10
matrix(df2[unlist(df1[, c("id1", "id2", "id3")]), "prop"], nrow = nrow(df1)) *
as.matrix(df1[, c("count1", "count2", "count3")])
## count1 count2 count3
## [1,] 100 0 0
## [2,] 100 0 600
## [3,] 1000 800 300
Step 6
Apply rowSums to result from step 5 to get desired TotalProp values
rowSums(matrix(df2[unlist(df1[,c('id1','id2','id3')]),'prop'], nrow=nrow(df1)) * as.matrix(df1[,c('count1', 'count2', 'count3')]))
## [1] 100 700 2100
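The same steps can also be written with `match()`, which looks each id up by value and so does not depend on setting row names on df2. This is a sketch that rebuilds the example data and keeps the id columns in the result, as in the question's desired output; `ids`, `props` and `counts` are names introduced here.

```r
df1 <- data.frame(type = c("a", "b", "c"),
                  id1 = c("x1", "x2", "x3"),
                  id2 = c("y1", "y2", "y3"),
                  id3 = c("z1", "z2", "z3"),
                  count1 = c(10, 20, 10),
                  count2 = c(20, 0, 10),
                  count3 = c(0, 30, 10),
                  stringsAsFactors = FALSE)
df2 <- data.frame(id = c("x1", "x2", "x3", "y1", "y2", "y3", "z1", "z2", "z3"),
                  prop = c(10, 5, 100, 0, 50, 80, 10, 20, 30),
                  stringsAsFactors = FALSE)

# Look up prop for every id, keeping the 3 x 3 layout of the id columns
ids    <- as.matrix(df1[, c("id1", "id2", "id3")])
props  <- matrix(df2$prop[match(ids, df2$id)], nrow = nrow(df1))
counts <- as.matrix(df1[, c("count1", "count2", "count3")])

# Weighted sum per row
df1$TotalProp <- rowSums(props * counts)
result <- df1[, c("type", "id1", "id2", "id3", "TotalProp")]
result
```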
My solution relies on the data structure, so it is not universal, but short.
m1 <- as.matrix(df1[, tail(names(df1), 3)])  # the three count columns
m2 <- matrix(df2$prop, 3)
rowSums(m1 * m2)
[1] 100 700 2100
It does not use ids whatsoever, so be careful!
And another way...
TotalProp <- apply(df1,1,function(x) {
sapply(x[2:4],function(x)df2[df2$id==x,]$prop) %*% as.numeric(x[5:7])
})
result <- cbind(df1[1:4],TotalProp)
%*% is the matrix product operator; applied to two vectors it gives their inner product (the sum of the element-wise products), so this is somewhat like the rowSums approach in @ChinmayPatil's answer. The steps are:
1. For each row in df1, extract the elements of df2 which have id = cols 2:4 of df1
2. Form the inner product of the vector from step 1 with the vector formed from cols 5:7 of df1
3. Repeat for each row of df1 [apply(df1,1, ...)]
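As a small check of the inner-product step, here are the numbers for the first row of df1 (prop values 10, 0, 10 and counts 10, 20, 0, taken from the tables above); the variable names are made up for the example.

```r
props_row1  <- c(10, 0, 10)   # prop for x1, y1, z1
counts_row1 <- c(10, 20, 0)   # count1, count2, count3 for row "a"

ip <- props_row1 %*% counts_row1       # inner product, returned as a 1 x 1 matrix
by_sum <- sum(props_row1 * counts_row1)  # the same value as a plain number

c(as.numeric(ip), by_sum)
```

Both give 100, the TotalProp for row a.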
I have a data set with three variables, x, y and z: x is a list of places, y is a list of times, and z is a list of names. The names do not start at the same initial time across the places:
x y z
x1 1 NA
x1 2 z2
x1 3 z3
x1 4 z1
x2 1 NA
x2 2 NA
x2 3 z5
x2 4 z3
x3 1 z3
x3 2 z1
x3 3 z2
x3 4 z2
How do I find the first z for every x? I want the output matrix or dataframe to be:
x z
x1 z2
x2 z5
x3 z3
EDITED, after example data was supplied
You can use the function ddply() from the plyr package:
dat <- "x y z
x1 1 NA
x1 2 z2
x1 3 z3
x1 4 z1
x2 1 NA
x2 2 NA
x2 3 z5
x2 4 z3
x3 1 z3
x3 2 z1
x3 3 z2
x3 4 z2"
df <- read.table(textConnection(dat), header=TRUE, stringsAsFactors=FALSE)
library(plyr)
ddply(df, .(x), function(x)x[!is.na(x$z), ][1, "z"])
x V1
1 x1 z2
2 x2 z5
3 x3 z3
If you don't want to use plyr
t(data.frame(lapply(split(df, as.factor(df$x)), function(k) head(k$z[!is.na(k$z)], 1))))
[,1]
x1 "z2"
x2 "z5"
x3 "z3"
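Another base R option returns a data frame with x and z columns, as asked for in the question. A minimal sketch, assuming "first" means the smallest y with a non-missing z (`ok` and `first` are names made up here):

```r
df <- read.table(text = "x y z
x1 1 NA
x1 2 z2
x1 3 z3
x1 4 z1
x2 1 NA
x2 2 NA
x2 3 z5
x2 4 z3
x3 1 z3
x3 2 z1
x3 3 z2
x3 4 z2", header = TRUE, stringsAsFactors = FALSE)

# Drop rows with missing z, then sort by place and time
ok <- df[!is.na(df$z), ]
ok <- ok[order(ok$x, ok$y), ]

# Keep the first remaining row for each place
first <- ok[!duplicated(ok$x), c("x", "z")]
rownames(first) <- NULL
first
```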