I have a table similar to this one, and want to calculate the ratio between column A and B. For example:
A B C D E F
[1,] 187 174 183 115 101 104
[2,] 451 166 177 842 101 133
[3,] 727 171 187 12803 98 134
[4,] 1532 181 196 730 98 108
[5,] 4139 188 214 20358 105 159
[6,] 689 185 211 1633 110 162
[7,] 1625 184 195 2283 109 114
[8,] 771 181 190 904 105 110
[9,] 950 177 190 1033 106 112
[10,] 703 180 191 463 106 110
[11,] 2052 178 188 2585 100 105
[12,] 1161 178 187 2874 99 110
[13,] 214 175 184 173 98 110
[14,] 473 184 191 971 104 111
[15,] 756 185 193 14743 107 114
I want to create a new matrix that has all of those previous rows as new rows and columns (15 rows and 15 columns) like so (values in parentheses are placeholders for the calculated ratios):
[,1] [,2] [,3] [,4]
[1,] (A1:B1) (A1:B2) (A1:B3) (A1:B4) ...
[2,]
[3,]
[4,]
...
That is maybe not the best example, but I hope it is not too confusing.
To calculate the ratios A1:B1, A2:B2, A3:B3 I could do something like:
data.matrix(data["A"]/data["B"])
And to do it for all, I would do something like:
data.matrix(data[1,]/data[1,1])
data.matrix(data[1,]/data[1,2])
...
and so on.
This seems to be a lot of work and maybe someone knows a quicker and more efficient method.
EDIT
I thought the combn function would work, but it turns out it doesn't do what I need. When I have a two-column matrix, such as:
A B
[1,] 187 115
[2,] 451 842
[3,] 727 12803
[4,] 1532 730
[5,] 4139 20358
[6,] 689 1633
[7,] 1625 2283
[8,] 771 904
[9,] 950 1033
[10,] 703 463
[11,] 2052 2585
[12,] 1161 2874
[13,] 214 173
[14,] 473 971
[15,] 756 14743
And I use the combn function to calculate all possible ratios (A1:B1, A1:B2, ... A2:B1, A2:B2...) I get just the result for A1 vs all values of B.
> combn(ncol(data), 2, function(x) data[,x[1]]/data[,x[2]])
[,1]
[1,] 1.62608696
[2,] 0.53562945
[3,] 0.05678357
[4,] 2.09863014
[5,] 0.20331074
[6,] 0.42192284
[7,] 0.71178274
[8,] 0.85287611
[9,] 0.91965150
[10,] 1.51835853
[11,] 0.79381044
[12,] 0.40396660
[13,] 1.23699422
[14,] 0.48712667
[15,] 0.05127857
Or maybe I just don't understand the combn function and I am doing something wrong here.
You can achieve what you want by using the expand.grid, apply and matrix functions, as shown below.
I am assuming the matrix you want looks like this:
A1/B1 A1/B2 A1/B3 ...
A2/B1 A2/B2 A2/B3 ...
... ... ... ...
... ... ... ...
Here is the code to do that; the explanation is in the comments.
txt <- "A B C D E F\n187 174 183 115 101 104\n451 166 177 842 101 133\n727 171 187 12803 98 134\n1532 181 196 730 98 108\n4139 188 214 20358 105 159\n689 185 211 1633 110 162\n1625 184 195 2283 109 114\n771 181 190 904 105 110\n950 177 190 1033 106 112\n703 180 191 463 106 110\n2052 178 188 2585 100 105\n1161 178 187 2874 99 110\n214 175 184 173 98 110\n473 184 191 971 104 111\n756 185 193 14743 107 114"
data <- as.matrix(read.table(textConnection(txt), header = TRUE))
# expand.grid : creates every combination of one element each from column A and
# B with elements of B repeated first
# apply : calls function(x) { x[1]/x[2]) } for every combination outputted by
# expand.grid
# matrix : converts the result of apply into matrix. dimnames arguments sets
# rownames and colnames for easy verification for us
result <- matrix(apply(expand.grid(data[, "A"], data[, "B"]), 1, function(x) x[1]/x[2]),
nrow = nrow(data), dimnames = list(data[, "A"], data[, "B"]))
# note that we have set rownames for result to be values of A and colnames for
# result to be value of B
result
## 174 166 171 181 188 185 184
## 187 1.074713 1.126506 1.093567 1.033149 0.9946809 1.010811 1.016304
## 451 2.591954 2.716867 2.637427 2.491713 2.3989362 2.437838 2.451087
## 727 4.178161 4.379518 4.251462 4.016575 3.8670213 3.929730 3.951087
## 1532 8.804598 9.228916 8.959064 8.464088 8.1489362 8.281081 8.326087
## 4139 23.787356 24.933735 24.204678 22.867403 22.0159574 22.372973 22.494565
## 689 3.959770 4.150602 4.029240 3.806630 3.6648936 3.724324 3.744565
## 1625 9.339080 9.789157 9.502924 8.977901 8.6436170 8.783784 8.831522
## 771 4.431034 4.644578 4.508772 4.259669 4.1010638 4.167568 4.190217
## 950 5.459770 5.722892 5.555556 5.248619 5.0531915 5.135135 5.163043
## 703 4.040230 4.234940 4.111111 3.883978 3.7393617 3.800000 3.820652
## 2052 11.793103 12.361446 12.000000 11.337017 10.9148936 11.091892 11.152174
## 1161 6.672414 6.993976 6.789474 6.414365 6.1755319 6.275676 6.309783
## 214 1.229885 1.289157 1.251462 1.182320 1.1382979 1.156757 1.163043
## 473 2.718391 2.849398 2.766082 2.613260 2.5159574 2.556757 2.570652
## 756 4.344828 4.554217 4.421053 4.176796 4.0212766 4.086486 4.108696
## 181 177 180 178 178 175 184
## 187 1.033149 1.056497 1.038889 1.050562 1.050562 1.068571 1.016304
## 451 2.491713 2.548023 2.505556 2.533708 2.533708 2.577143 2.451087
## 727 4.016575 4.107345 4.038889 4.084270 4.084270 4.154286 3.951087
## 1532 8.464088 8.655367 8.511111 8.606742 8.606742 8.754286 8.326087
## 4139 22.867403 23.384181 22.994444 23.252809 23.252809 23.651429 22.494565
## 689 3.806630 3.892655 3.827778 3.870787 3.870787 3.937143 3.744565
## 1625 8.977901 9.180791 9.027778 9.129213 9.129213 9.285714 8.831522
## 771 4.259669 4.355932 4.283333 4.331461 4.331461 4.405714 4.190217
## 950 5.248619 5.367232 5.277778 5.337079 5.337079 5.428571 5.163043
## 703 3.883978 3.971751 3.905556 3.949438 3.949438 4.017143 3.820652
## 2052 11.337017 11.593220 11.400000 11.528090 11.528090 11.725714 11.152174
## 1161 6.414365 6.559322 6.450000 6.522472 6.522472 6.634286 6.309783
## 214 1.182320 1.209040 1.188889 1.202247 1.202247 1.222857 1.163043
## 473 2.613260 2.672316 2.627778 2.657303 2.657303 2.702857 2.570652
## 756 4.176796 4.271186 4.200000 4.247191 4.247191 4.320000 4.108696
## 185
## 187 1.010811
## 451 2.437838
## 727 3.929730
## 1532 8.281081
## 4139 22.372973
## 689 3.724324
## 1625 8.783784
## 771 4.167568
## 950 5.135135
## 703 3.800000
## 2052 11.091892
## 1161 6.275676
## 214 1.156757
## 473 2.556757
## 756 4.086486
Edit: I seem to have misunderstood the question. The answer is even simpler using outer:
# gives the same 15*15 matrix as geektrader's
outer(mm[,1], mm[,2], '/')
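To make the outer() approach concrete, here is a tiny self-contained sketch; the 3-row matrix is made up for illustration, standing in for the question's data:

```r
# toy two-column matrix standing in for the question's data
mm <- cbind(A = c(10, 20, 40), B = c(2, 5, 8))

# outer() divides every element of A by every element of B,
# producing the full ratio matrix in one call
res <- outer(mm[, 1], mm[, 2], "/")
res
#      [,1] [,2] [,3]
# [1,]    5    2 1.25
# [2,]   10    4 2.50
# [3,]   20    8 5.00
```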
Old answer (not correct):
You should use combn:
# combn(ncol(mm), 2) gives you all possible combinations
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15]
# [1,] 1 1 1 1 1 2 2 2 2 3 3 3 4 4 5
# [2,] 2 3 4 5 6 3 4 5 6 4 5 6 5 6 6
# it also accepts a function argument. we can use it to divide
# respective columns
mm.div <- combn(ncol(mm), 2, function(x) mm[,x[1]]/mm[,x[2]])
# set column names for the matrix
colnames(mm.div) <- combn(colnames(mm), 2, paste, collapse="")
I might be completely missing the point here, but why not just use a couple of for loops? I wrote a quick function that you could pass the pairs to.
For example:
A <- rnorm(15)
B <- rnorm(15)
data <- data.frame(A,B)
ratio <- function(input1, input2) {
  out <- matrix(0, nrow = length(input1), ncol = length(input1))
  for (i in seq_along(input1)) {
    for (j in seq_along(input1)) {
      out[i, j] <- input1[i] / input2[j]
    }
  }
  return(out)
}
ratio(data$A, data$B)
EDIT
Another thought. To then use the function to do all possible pairs of ratios, you could simply add another for loop, like this:
combs <- combn(1:4, 2)
out <- list()
for (i in 1:ncol(combs)) {
out[[i]] <- ratio(data[,combs[1,i]], data[,combs[2,i]])
}
Hope that helps!
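For what it's worth, the double loop in ratio() fills out[i, j] = input1[i] / input2[j], which is exactly what base R's outer() computes in one call; a quick sketch with made-up vectors:

```r
A <- c(10, 20, 30)
B <- c(2, 4, 5)

# outer() builds a length(A) x length(B) matrix of A[i] / B[j],
# matching what the ratio() loop above produces
out <- outer(A, B, "/")
out[2, 3]  # 20 / 5 = 4
```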
I am trying to calculate my data's means by populating a new dataframe with data corrected by my experiment's blank.
So far, I have created my new data frame:
data_mean <- data.frame(matrix(ncol = 17, # As many columns as experimental conditions plus one for "Time(h)"
nrow = nrow(data)))
Copied the data corresponding to time:
data_mean[,1] <- data[,1]
And attempted to populate the dataframe by assigning the mean of every condition minus the mean of the blanks to each column:
data_mean[,2] <- rowMeans(data[,5:8])-rowMeans(data[,2:4])
data_mean[,3] <- rowMeans(data[,9:12])-rowMeans(data[,2:4])
data_mean[,4] <- rowMeans(data[,13:16])-rowMeans(data[,2:4])
data_mean[,5] <- rowMeans(data[,17:20])-rowMeans(data[,2:4])
and so on.
Is there an easier way to do this rather than typing the same code over and over?
res <- sapply(split.default(data[, -1], seq(ncol(data) - 1)%/%4), rowSums)
res[,-1] - res[,1] # all the differences, one column per condition (this uses rowSums; swap in rowMeans above to get blank-corrected means as in your code)
example:
data <- data.frame(matrix(1:200, 10))
res <- sapply(split.default(data[, -1], seq(ncol(data) - 1)%/%4), rowSums)
res[,-1] - res[,1]
1 2 3 4
[1,] 161 321 481 641
[2,] 162 322 482 642
[3,] 163 323 483 643
[4,] 164 324 484 644
[5,] 165 325 485 645
[6,] 166 326 486 646
[7,] 167 327 487 647
[8,] 168 328 488 648
[9,] 169 329 489 649
[10,] 170 330 490 650
and you can check:
rowSums(data[, 5:8]) - rowSums(data[,2:4])
[1] 161 162 163 164 165 166 167 168 169 170 # first column
rowSums(data[, 9:12]) - rowSums(data[,2:4])
[1] 321 322 323 324 325 326 327 328 329 330 # second column
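Since the original code used rowMeans rather than rowSums, the same split.default() trick works unchanged with rowMeans; a sketch using the example data above:

```r
data <- data.frame(matrix(1:200, 10))

# group the non-time columns: block 0 collects the three blank
# columns (2:4); each later block collects four condition columns
res_m <- sapply(split.default(data[, -1], seq(ncol(data) - 1) %/% 4), rowMeans)

# each condition's mean minus the blank mean, as in the question's code
data_mean_cols <- res_m[, -1] - res_m[, 1]
```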
I am having a problem with summing the rows of my matrices. I have a list formed by 30 matrices
Matrix<-matrix(1:45, ncol=9)
List<-list(lapply(seq_len(30), function(X) Matrix))
The idea is to create 30 matrices of size 5*9. Firstly, I need to sum the columns in groups (1:3, 4:6, 7:9), such that the result for each matrix will be the following 5*3 matrix:
[,1] [,2] [,3]
[1,] 18 63 108
[2,] 21 66 111
[3,] 24 69 114
[4,] 27 72 117
[5,] 30 75 120
I am trying to get this matrix using this code:
Y<-lapply(List, function(x) rowSums(x[, 1:3]))
But it only allows me to sum the first 3 columns.
After this, I need to sum across the list and obtain only one 5*3 matrix. I think the command final <- Reduce(`+`, Y) could help:
540 1890 3240
630 1980 3330
720 2070 3420
810 2160 3510
900 2250 3600
Thank you for your help
You need to find some way to group your columns by threes, for example:
grp = (1:ncol(Matrix) -1) %/% 3
or if you know the dimensions:
grp = rep(0:2,each=3)
To do rowSums in columns of threes, we can do this with a function:
SumCols = function(M,col_grp){
sapply(unique(col_grp),function(i)rowSums(M[,col_grp==i]))
}
SumCols(Matrix,grp)
[,1] [,2] [,3]
[1,] 18 63 108
[2,] 21 66 111
[3,] 24 69 114
[4,] 27 72 117
[5,] 30 75 120
So, apply this across your list of matrices (note List[[1]]: as defined above, List is a list wrapping the list of 30 matrices):
Reduce("+",lapply(List[[1]],SumCols,grp))
[,1] [,2] [,3]
[1,] 540 1890 3240
[2,] 630 1980 3330
[3,] 720 2070 3420
[4,] 810 2160 3510
[5,] 900 2250 3600
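An alternative way to express the same grouped row sums is matrix multiplication with a 0/1 indicator matrix; this is just a sketch of the same idea:

```r
Matrix <- matrix(1:45, ncol = 9)
grp <- (1:ncol(Matrix) - 1) %/% 3

# G[j, g] is 1 when column j belongs to group g, 0 otherwise
G <- outer(grp, unique(grp), "==") + 0

# multiplying by the indicator matrix sums each group of columns
gsum <- Matrix %*% G
gsum
#      [,1] [,2] [,3]
# [1,]   18   63  108
# [2,]   21   66  111
# [3,]   24   69  114
# [4,]   27   72  117
# [5,]   30   75  120
```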
Here is another base R solution
out <- Reduce(
  `+`,
  Map(function(x)
        do.call(cbind,
                Map(rowSums,
                    split.default(data.frame(x), ceiling(seq(ncol(x)) / 3)))),
      List[[1]])
)
such that
> out
0 1 2
[1,] 540 1890 3240
[2,] 630 1980 3330
[3,] 720 2070 3420
[4,] 810 2160 3510
[5,] 900 2250 3600
I am trying to develop a hierarchical Dirichlet-multinomial process hidden Markov model in JAGS to estimate multiparty, primary voting intention based on opinion poll results. I also use the primary vote estimate to calculate a two-party preferred vote-share under Australia's preferential voting system.
A dmulti() multinomial distribution fails with the run-time error message: Unable to find appropriate sampler. I have a work-around that uses a series of binomial distributions and a sum-to-N constraint. Theoretically, this should produce the same result, but it makes the model less space- and time-efficient.
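As a plain-R sanity check on the "theoretically the same" claim (outside JAGS, with made-up probabilities): a multinomial draw can be decomposed into successive conditional binomials plus a sum-to-N constraint. Note this is the exact decomposition; the JAGS kludge below uses unconditional binomials with a constraint, which only approximates it.

```r
set.seed(42)
p <- c(0.4, 0.3, 0.2, 0.1)  # hypothetical party shares
N <- 1000

# draw each category as a binomial on what remains,
# renormalising the probability over the remaining categories
draw <- integer(length(p))
remaining <- N
ptail <- 1
for (k in seq_len(length(p) - 1)) {
  draw[k] <- rbinom(1, remaining, p[k] / ptail)
  remaining <- remaining - draw[k]
  ptail <- ptail - p[k]
}
draw[length(p)] <- remaining  # the sum-to-N constraint

sum(draw)  # always exactly N
```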
My question is whether there is something I can do in the hidden, temporal part of the model below to make the multinomial distribution work.
The model (and surrounding R code) follows:
data = list(PERIOD = PERIOD,
HOUSECOUNT = HOUSECOUNT,
NUMPOLLS = NUMPOLLS,
PARTIES = PARTIES,
primaryVotes = primaryVotes,
pollWeek = df$Week,
house = as.integer(df$House),
# manage rounding issues with df$Sample ...
n = rowSums(primaryVotes),
preference_flows = preference_flows
)
print(data)
# ----- JAGS model ...
library(rjags)
model <- "
model {
#### -- observational model
for(poll in 1:NUMPOLLS) { # for each poll result - rows
adjusted_poll[poll, 1:PARTIES] <- walk[pollWeek[poll], 1:PARTIES] +
houseEffect[house[poll], 1:PARTIES]
primaryVotes[poll, 1:PARTIES] ~ dmulti(adjusted_poll[poll, 1:PARTIES], n[poll])
}
#### -- temporal model (a weekly walk where this week is much like last week)
#tightness <- 30000 # KLUDGE: value selected by trial and error to look like DLM
t ~ dunif(1000, 100000) # less kludgy - let the model decide
tightness <- round(t)
for(week in 2:PERIOD) { # rows
# This results in a JAGS runtime error: Unable to find appropriate sampler
#multinomial[week, 1:PARTIES] ~ dmulti( walk[week-1, 1:PARTIES], tightness)
# This is the KLUDGE to approximate the above ...
# Should be the same theoretically ...
# but results in a larger directed acyclic graph (DAG)
for(party in 2:PARTIES) {
multinomial[week, party] ~ dbin(walk[week-1, party], tightness)
}
multinomial[week, 1] <- tightness - sum(multinomial[week, 2:PARTIES])
# The other part of the Dirichlet-Multinomial process
walk[week, 1:PARTIES] ~ ddirch(multinomial[week, 1:PARTIES])
}
## -- weakly informative priors for first week in the temporal model
for (party in 1:2) { # for each major party
alpha[party] ~ dunif(250, 600) # majors between 25% and 60%
}
for (party in 3:PARTIES) { # for each minor party
alpha[party] ~ dunif(10, 250) # minors between 1% and 25%
}
walk[1, 1:PARTIES] ~ ddirch(alpha[])
## -- estimate a Coalition TPP from the primary votes
for(week in 1:PERIOD) {
CoalitionTPP[week] <- sum(walk[week, 1:PARTIES] *
preference_flows[1:PARTIES])
}
#### -- sum-to-zero constraints on house effects
for (party in 2:PARTIES) { # for each party ...
# house effects across houses sum to zero
# NOTE: ALL MUST SUM TO ZERO
houseEffect[1, party] <- -sum( houseEffect[2:HOUSECOUNT, party] )
}
for(house in 1:HOUSECOUNT) { # for each house ...
# house effects across the parties sum to zero
houseEffect[house, 1] <- -sum( houseEffect[house, 2:PARTIES] )
}
# but note, we do not apply a double constraint to houseEffect[1, 1]
monitorHouseEffectOneSumParties <- sum(houseEffect[1, 1:PARTIES])
monitorHouseEffectOneSumHouses <- sum(houseEffect[1:HOUSECOUNT, 1])
## -- vague normal priors for house effects - centred on zero
for (party in 2:PARTIES) { # for each party (cols)
for(house in 2:HOUSECOUNT) { # (rows)
houseEffect[house, party] ~ dnorm(0, pow(0.1, -2))
}
}
}
"
jags <- jags.model(textConnection(model),
data = data,
n.chains=4,
n.adapt=n_adapt
)
The input data for the model over six months follows.
$PERIOD
[1] 27
$HOUSECOUNT
[1] 5
$NUMPOLLS
[1] 37
$PARTIES
[1] 4
$primaryVotes
Coalition Labor Greens Other
[1,] 390 375 120 115
[2,] 407 407 143 143
[3,] 532 574 154 140
[4,] 560 518 168 154
[5,] 350 410 115 125
[6,] 439 450 139 127
[7,] 385 385 95 135
[8,] 375 395 120 110
[9,] 1465 1483 417 325
[10,] 504 602 154 140
[11,] 532 560 154 154
[12,] 504 602 154 140
[13,] 355 415 120 110
[14,] 412 483 141 141
[15,] 1345 1450 392 312
[16,] 375 405 100 120
[17,] 448 448 142 142
[18,] 588 504 168 140
[19,] 390 380 115 115
[20,] 441 453 139 128
[21,] 380 400 110 110
[22,] 471 425 126 126
[23,] 957 979 278 205
[24,] 405 360 125 110
[25,] 546 532 182 126
[26,] 471 413 126 138
[27,] 385 380 120 115
[28,] 1008 995 301 228
[29,] 400 375 115 110
[30,] 457 410 141 164
[31,] 690 656 185 151
[32,] 603 491 182 126
[33,] 415 355 125 105
[34,] 464 429 139 128
[35,] 1307 1218 385 273
[36,] 410 370 130 90
[37,] 479 433 152 105
$pollWeek
[1] 1 1 2 2 3 3 7 9 9 10 10 11 11 11 11 13 13 14 15 15 17 17 18 19 20
[26] 20 21 22 23 23 25 25 25 25 25 27 27
$house
[1] 3 4 1 2 3 4 3 3 5 1 2 1 3 4 5 3 4 2 3 4 3 4 5 3 2 4 3 5 3 4 1 2 3 4 5 3 4
$n
[1] 1000 1100 1400 1400 1000 1155 1000 1000 3690 1400 1400 1400 1000 1177 3499
[16] 1000 1180 1400 1000 1161 1000 1148 2419 1000 1386 1148 1000 2532 1000 1172
[31] 1682 1402 1000 1160 3183 1000 1169
$preference_flows
[1] 1.0000 0.0000 0.1697 0.5330
A comparison of the output (compared with other models I have) follows. The red line in the next chart was generated from the above.
Let's say I have created the following matrix:
> x <- matrix(1:20000,nrow=100)
> x[1:10,1:10]
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 1 101 201 301 401 501 601 701 801 901
[2,] 2 102 202 302 402 502 602 702 802 902
[3,] 3 103 203 303 403 503 603 703 803 903
[4,] 4 104 204 304 404 504 604 704 804 904
[5,] 5 105 205 305 405 505 605 705 805 905
[6,] 6 106 206 306 406 506 606 706 806 906
[7,] 7 107 207 307 407 507 607 707 807 907
[8,] 8 108 208 308 408 508 608 708 808 908
[9,] 9 109 209 309 409 509 609 709 809 909
[10,] 10 110 210 310 410 510 610 710 810 910
What are the methods in R to change row and column names? For example, I would like the row names to be SS1, SS2, ..., SS100 and the column names to be M1, M2, ..., M200. I usually work with data with thousands of rows and columns, so I need an efficient method. Some people use something like attributes(x)$dimnames <- list(...) and some use rownames <- paste(...). What are all the possible methods?
My second question is, can I use the same methods after I convert the matrix to a data frame?
From comment to answer:
row.names(x) <- paste("SS", 1:nrow(x), sep="")
colnames(x) <- paste("M" , 1:ncol(x), sep="")
As @doug wrote, it works for both matrices and data frames.
Yes same methods will work (matrix/data.frame)--see below:
A = matrix(1:12, nrow=4)
colnames(A) = c("col1", "col2", "col3")
row.names(A) = c("row1", "row2", "row3", "row4")
dfA = as.data.frame(A)
row.names(dfA) = c("r1", "r2", "r3", "r4")
colnames(dfA) = c("C1", "C2", "C3")
And to save time, you can do this:
x = rep("col", dim(M)[2])
y = 1:dim(M)[2]
colnames(M) = paste(x, y, sep="")
If it is within a list you can do:
dimnames(x)[[1]]<-paste("SS", 1:nrow(x), sep="")
dimnames(x)[[2]]<-paste("M" , 1:ncol(x), sep="")
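One more variant worth knowing: a single dimnames<- assignment sets both row and column names at once, and it also works after converting to a data frame:

```r
x <- matrix(1:20000, nrow = 100)

# one assignment sets row and column names together
dimnames(x) <- list(paste0("SS", 1:nrow(x)), paste0("M", 1:ncol(x)))
x["SS1", "M1"]  # 1
```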
I have two vectors, subject and target. I want to create a new vector based on comparisons between the two existing vectors, with elements being compared lagged. I've solved this okay using the loop below, but I'm essentially wondering whether there's a more elegant solution using apply?
subject <- c(200, 195, 190, 185, 185, 185, 188, 189, 195, 200, 210, 210)
perweek <- 2 # not defined in the original post; value inferred from the example output below
target <- c(subject[1], subject[1] - cumsum(rep(perweek, length(subject) - 1)))
adjtarget <- target
for (i in 1:(length(subject) - 1)) {
  if (subject[i] > adjtarget[i]) {
    adjtarget[i + 1] <- adjtarget[i]
  } else {
    adjtarget[i + 1] <- adjtarget[i] - perweek
  }
}
This doesn't exactly solve your problem, but may point in a helpful direction. I'm disregarding the interplay between changing adjtarget and comparing to it, and show a similar problem, where we compare to the constant target. Then it's possible to change the if in the loop to a vector comparison:
but.last <- function(v) v[-length(v)] # helper assumed here: drop the last element
lv <- but.last(subject) > but.last(target)
ind <- which(lv)
Prepare the result vector (I'll call it x, as it won't be the same result as your adjtarget) as a shifted copy of target and assign the changes to it:
x <- c(target[1], but.last(target)) # corresponds to the true branch of the `if`
x[ind+1] <- target[ind] - perweek # corresponds to the false branch
Alternatively,
x <- c(target[1], but.last(target) - (!lv) * perweek)
As I said, this doesn't solve your problem, but perhaps we could start from here.
Just for clarification, if I understand your code, this is the kind of result you're looking for...
> (goal <- cbind(subject,target,adjtarget))
subject target adjtarget
[1,] 200 200 200
[2,] 195 198 198
[3,] 190 196 196
[4,] 185 194 194
[5,] 185 192 192
[6,] 185 190 190
[7,] 188 188 188
[8,] 189 186 186
[9,] 195 184 186
[10,] 200 182 186
[11,] 210 180 186
[12,] 210 178 186
If I'm right, then the challenge to vectorizing this is the repeated assignment of 186 in adjtarget. Vectorized code will evaluate the right hand side (RHS) before assigning it to the left hand side (LHS). So, the vectorized code won't see the updated value in adjtarget at row 9 until after the assignment is finished.
> ind <- which(head(subject, -1) > head(target, -1)) # not shown in the original; needed below
> y <- ifelse(subject > target, 1, 0) # matches TRUE case
> x <- target
> x[ind+1] <- target[ind]
> cbind(goal, x, y)
subject target adjtarget x y
[1,] 200 200 200 200 0
[2,] 195 198 198 198 0
[3,] 190 196 196 196 0
[4,] 185 194 194 194 0
[5,] 185 192 192 192 0
[6,] 185 190 190 190 0
[7,] 188 188 188 188 0
[8,] 189 186 186 186 1
[9,] 195 184 186 186 1 # assigned correctly (?)
[10,] 200 182 186 184 1 # incorrect x; should be 186
[11,] 210 180 186 182 1 # incorrect x; should be 186
[12,] 210 178 186 180 1 # incorrect x; should be 186
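For completeness, the sequential dependence that defeats plain vectorization can still be written without an explicit loop via Reduce() with accumulate = TRUE, which threads the updated value through each step. This sketch assumes perweek = 2, inferred from the target column in the goal table (it drops by 2 each week):

```r
subject <- c(200, 195, 190, 185, 185, 185, 188, 189, 195, 200, 210, 210)
perweek <- 2  # assumed: inferred from the example output

# each step sees the previous adjusted target, so the repeated 186 propagates
adjtarget2 <- Reduce(
  function(prev, s) if (s > prev) prev else prev - perweek,
  head(subject, -1),
  init = subject[1],
  accumulate = TRUE
)
adjtarget2
# 200 198 196 194 192 190 188 186 186 186 186 186
```

This reproduces the adjtarget column of the goal table above without mutating a vector inside a loop.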