Calculate MLE for ß parameters - r

I'm new to R and am stuck on an assignment. I would be really grateful if someone could help!
This is the task:
"Use a linear regression model to calculate the MLE(ß^) for the three ß parameters when using the linear regression model to relate each of the L genotype markers with the phenotype (i.e. your R code must include the formula for the MLE).
Plot histograms for each of the parameter estimates ß calculated for the L genotypes (i.e. three histograms!)"
I have 200 individuals and my Xd vector and Xa vector are:
> Xd
[1] 0 0 0 0 1 0 0 NA 0 0 0 0 0 0 0 NA 0 NA 0 1 0 0 0 1 0
[26] 0 0 0 0 0 0 0 0 -1 0 0 1 -1 0 NA -1 0 0 0 0 0 0 NA 0 1
[51] 0 0 0 -1 0 0 0 NA 0 0 -1 -1 0 0 0 0 -1 0 0 -1 0 0 0 0 0
[76] 0 1 0 0 0 0 1 0 0 0 NA NA 0 0 0 NA 0 0 NA 0 -1 0 0 0 -1
[101] 0 0 0 NA NA 0 NA 0 0 0 0 0 0 0 0 0 0 -1 0 0 0 0 0 0 0
[126] 0 0 0 0 0 0 0 0 0 0 -1 0 0 0 0 0 0 0 0 0 0 NA 1 0 0
[151] -1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 NA 0 1 1 0 -1 0 -1
[176] 0 0 0 0 NA 0 0 0 1 0 0 0 0 0 0 0 -1 0 0 0 0 NA 0 0
> Xa
[1] 1 1 1 1 -1 1 1 NA 1 1 1 1 1 1 1 NA 1 NA 1 -1 1 1 1 -1 1
[26] 1 1 1 1 1 1 1 1 -1 1 1 -1 -1 1 NA -1 1 1 1 1 1 1 NA 1 -1
[51] 1 1 1 -1 1 1 1 NA 1 1 -1 -1 1 1 1 1 -1 1 1 -1 1 1 1 1 1
[76] 1 -1 1 1 1 1 -1 1 1 1 NA NA 1 1 1 NA 1 1 NA 1 -1 1 1 1 -1
[101] 1 1 1 NA NA 1 NA 1 1 1 1 1 1 1 1 1 1 -1 1 1 1 1 1 1 1
[126] 1 1 1 1 1 1 1 1 1 1 -1 1 1 1 1 1 1 1 1 1 1 NA -1 1 1
[151] -1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 NA 1 -1 -1 1 -1 1 -1
[176] 1 1 1 1 NA 1 1 1 -1 1 1 1 1 1 1 1 -1 1 1 1 1 NA 1 1
>
What I did was:
> Xmu=rep(1,200)
> X=cbind(Xmu, Xa, Xd)
Then I get the following warning:
In cbind(Xmu, Xa, Xd) :
number of rows of result is not a multiple of vector length (arg 2)
What does that mean? And how do I calculate the MLE for my ß parameters? I would have proceeded like this:
Y <- 1 + Xa*1 + Xd*0 + rnorm(200,0,sqrt(1))
betas <- solve(t(X)%*% X) %*% t(X)%*% Y
beta_mu <- betas[1]
beta_a <- betas[2]
beta_d <- betas[3]
The "your code must include the MLE formula" part also confuses me. Thanks!
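A few observations that may help, offered as a sketch rather than a definitive answer. The printed Xd and Xa vectors appear to hold 199 values (the last printed line starts at index 176 and shows 24 entries), while Xmu has length 200. cbind() recycles shorter vectors, and 200 is not a multiple of 199, which is what the message is reporting. Building Xmu from length(Xa) avoids the mismatch. Under normally distributed errors, the MLE of the ß vector coincides with the least-squares solution (X'X)^-1 X'Y, so the solve() line above already is "the MLE formula"; it only needs rows with NA removed first. In the code below, the genotype matrices, the number of markers L, the phenotype and the helper mle_beta() are all simulated or hypothetical stand-ins, not the assignment's data.

```r
set.seed(42)
n <- 199   # number of individuals (assumed from the printed vectors)
L <- 50    # number of markers (hypothetical)

# Simulated genotypes: one column per marker, coded -1 / 0 / 1
G <- matrix(sample(c(-1, 0, 1), n * L, replace = TRUE), nrow = n)

# Simulated phenotype driven by the first marker
Y <- 1 + 1 * G[, 1] + rnorm(n)

# Introduce some missing genotype calls, then derive the second coding
Xa_all <- G
Xa_all[sample(n * L, 200)] <- NA
Xd_all <- 1 - 2 * abs(Xa_all)   # assumed coding: 1 if Xa is 0, else -1

# Hypothetical helper: MLE of (beta_mu, beta_a, beta_d) for one marker
mle_beta <- function(xa, xd, y) {
  X <- cbind(Xmu = rep(1, length(xa)), Xa = xa, Xd = xd)
  keep <- complete.cases(X, y)        # drop individuals with any NA
  X <- X[keep, , drop = FALSE]
  y <- y[keep]
  drop(solve(t(X) %*% X) %*% t(X) %*% y)   # (X'X)^-1 X'y
}

# 3 x L matrix: one column of estimates per marker
betas <- sapply(seq_len(L), function(j) mle_beta(Xa_all[, j], Xd_all[, j], Y))

# Three histograms, one per parameter
par(mfrow = c(1, 3))
for (k in rownames(betas)) {
  hist(betas[k, ], main = k, xlab = "estimate")
}
```

For a single marker, the result can be compared against coef(lm(Y ~ Xa + Xd)), which drops rows with NA by default and should agree up to rounding.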

Related

Count occurences of teams in matrix in R

I have a 1000*16 matrix from a simulation, with team names stored as characters. I want to count the number of occurrences of each team across all 16 columns.
I know I could do apply(test, 2, table), but that makes the data hard to work with afterward, since not every team appears in every column.
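If you have a vector of all the unique team names, you could do something like this. Converting each slice to a factor with the full set of levels makes table() report a zero for any team (in this case, letter) that does not appear. Note that the code below uses MARGIN = 1, so it counts within each row; use MARGIN = 2 to count within each column instead.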
set.seed(15)
letter_mat <- matrix(
  sample(
    LETTERS,
    size = 1000*16,
    replace = TRUE
  ),
  ncol = 16,
  nrow = 1000
)
output <- t(
  apply(
    letter_mat,
    1,
    function(x) table(factor(x, levels = LETTERS))
  )
)
head(output)
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
[1,] 1 2 0 1 1 1 1 0 0 0 1 0 0 0 0 1 1 1 1 1 0 1 1 0 0 1
[2,] 0 1 0 2 2 1 0 1 0 0 0 0 0 1 0 0 0 0 0 1 1 1 0 2 2 1
[3,] 1 1 0 0 1 0 1 2 1 0 0 0 0 0 1 0 1 0 1 1 0 0 3 0 1 1
[4,] 0 1 0 0 0 1 0 0 0 2 0 1 0 0 1 1 1 1 2 0 2 3 0 0 0 0
[5,] 2 1 0 0 0 0 0 2 0 2 1 1 1 0 0 2 0 2 1 0 0 1 0 0 0 0
[6,] 0 0 0 0 0 1 3 1 0 0 0 0 1 1 3 0 1 0 0 1 0 0 0 1 0 3
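Since the question mentions columns, here is a minimal sketch of the same factor-levels idea applied per column and over the whole matrix. The names teams, team_mat, per_column and overall are made up for illustration, and the letters again stand in for real team names.

```r
set.seed(15)
teams <- LETTERS   # stand-in for the vector of unique team names
team_mat <- matrix(sample(teams, 1000 * 16, replace = TRUE), ncol = 16)

# Counts within each column: a 26 x 16 matrix, one row per team
per_column <- apply(team_mat, 2, function(x) table(factor(x, levels = teams)))

# Counts over the entire matrix: one number per team
overall <- table(factor(team_mat, levels = teams))

dim(per_column)
head(overall)
```

Because every column is tabulated against the same levels, the results line up as a regular matrix instead of a ragged list.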

Calculate mean values of multiple measurements in a table with two categorical variables and a single continuous variable

I have this puzzle to solve.
This is the given data:
# A tibble: 351 x 3
# Groups: expcode [?]
expcode rank distributpermm.3
<chr> <int> <dbl>
1 ER02 1 892.325
2 ER02 2 694.030
3 ER02 3 917.110
4 ER02 4 991.475
5 ER02 5 1487.210
6 ER02 6 892.325
7 ER02 7 694.030
8 ER02 8 1710.290
9 ER02 9 1090.620
10 ER02 10 1288.915
# ... with 341 more rows
When I call table on this data like this:
table(ranktab$expcode, ranktab$rank)
I get an ordinary table:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41
ER02 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ER03 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ER04 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ER05 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ER07 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
ER11 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ER12 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ER14 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
ER16 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0
ER18 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
ER19 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ER22 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ER23 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ER26 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0
Now I would like to get a matrix that looks like the table above, but instead of the count of cases I would like each cell to hold the value of the third variable in the data frame; if there are two observations for a cell, then their mean.
Let's assume that your initial data is in a data frame called df:
df1 <- with(df, aggregate(distributpermm.3, by = list(expcode, rank), mean))
colnames(df1) <- colnames(df)
#this will give you final output in the desired format
xtabs(distributpermm.3 ~ expcode + rank, df1)
Hope this helps!
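One caveat with xtabs() is that cells with no observations show up as 0, which can be mistaken for a measured value. As an alternative sketch, tapply() with two grouping factors returns the matrix of means directly and leaves empty cells as NA. The small data frame below is invented for illustration; only the column names follow the question.

```r
# Toy data with the question's column names (values are made up)
df <- data.frame(
  expcode = c("ER02", "ER02", "ER02", "ER03", "ER03"),
  rank = c(1L, 1L, 2L, 1L, 3L),
  distributpermm.3 = c(10, 20, 30, 40, 50)
)

# Rows are expcode, columns are rank, cells are means; empty cells are NA
m <- with(df, tapply(distributpermm.3, list(expcode, rank), mean))
m
```

Here the two ER02 observations at rank 1 are averaged into a single cell, and combinations that never occur stay NA.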
If you just want to obtain the mean of distributpermm.3 for each expcode, you can use the aggregate function.
Try this:
expcode = c(rep("ER02", 3), rep("ER03", 4), "ER04", rep("ER05", 2))
rank = c(1, 2, 3, 1, 2, 3, 4, 1, 1, 2)
distributpermm.3 = c(892.325, 694.030, 917.110, 991.475, 1487.210, 892.325, 694.030, 1710.290, 1090.620, 1288.915)
data = data.frame(expcode, rank, distributpermm.3)
# mean of the third column (distributpermm.3) within each expcode
res = aggregate(data[, 3], list(data$expcode), mean)
colnames(res) = c("expcode", "mean (distributpermm.3)")
res
# > res
# expcode mean (distributpermm.3)
# 1 ER02 834.4883
# 2 ER03 1016.2600
# 3 ER04 1710.2900
# 4 ER05 1189.7675
If you want to keep rank in some way, please clarify what you want to obtain.

Create block diagonal data frame in R

I have a data set that looks like this:
Person Team
114 1
115 1
116 1
117 1
121 1
122 1
123 1
214 2
215 2
216 2
217 2
221 2
222 2
223 2
"Team" ranges from 1 to 33, and teams vary in size (i.e., there can be 5, 6, or 7 members, depending on the team). I need to turn this data set into something that looks like this:
1 1 1 1 1 1 1 0 0 0 0 0 0 0
1 1 1 1 1 1 1 0 0 0 0 0 0 0
1 1 1 1 1 1 1 0 0 0 0 0 0 0
1 1 1 1 1 1 1 0 0 0 0 0 0 0
1 1 1 1 1 1 1 0 0 0 0 0 0 0
1 1 1 1 1 1 1 0 0 0 0 0 0 0
1 1 1 1 1 1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 1 1 1 1 1 1
0 0 0 0 0 0 0 1 1 1 1 1 1 1
0 0 0 0 0 0 0 1 1 1 1 1 1 1
0 0 0 0 0 0 0 1 1 1 1 1 1 1
0 0 0 0 0 0 0 1 1 1 1 1 1 1
0 0 0 0 0 0 0 1 1 1 1 1 1 1
0 0 0 0 0 0 0 1 1 1 1 1 1 1
The sizes of the individual blocks are given by the number of people in a team. How can I do this in R?
You could use bdiag from the package Matrix. For example:
> bdiag(matrix(1,ncol=7,nrow=7),matrix(1,ncol=7,nrow=7))
Another idea, although I guess it is less efficient and less elegant than RStudent's:
DF = data.frame(Person = sample(100, 21), Team = rep(1:5, c(3,6,4,5,3)))
DF
lengths = tapply(DF$Person, DF$Team, length)
mat = matrix(0, sum(lengths), sum(lengths))
mat[do.call(rbind,
            mapply(function(a, b) arrayInd(seq_len(a ^ 2), c(a, a)) + b,
                   lengths, cumsum(c(0, lengths[-length(lengths)])),
                   SIMPLIFY = FALSE))] = 1
mat
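The bdiag suggestion above hard-codes two 7-by-7 blocks. A minimal sketch of how it might be generalized to varying team sizes follows, assuming the rows are already sorted by team. The vector team is a made-up stand-in for the Team column. As far as I know, bdiag() also accepts a single list of matrices and returns a sparse matrix, hence the as.matrix() call. A base R one-liner using outer() is shown for comparison; it does not depend on the row order.

```r
library(Matrix)

# Hypothetical Team column: three teams of sizes 3, 2 and 4
team <- rep(1:3, times = c(3, 2, 4))

# One all-ones block per team, sized by the number of members
sizes <- as.vector(table(team))
blocks <- lapply(sizes, function(n) matrix(1, nrow = n, ncol = n))
B <- as.matrix(bdiag(blocks))

# Base R alternative: 1 where two people share a team, 0 otherwise
B2 <- outer(team, team, "==") * 1

B
```

When the data is sorted by team, both approaches give the same matrix; outer() is shorter but builds the full dense matrix, which may matter for large inputs.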

Retrieve values in each cluster in R

I have successfully run the DBSCAN algorithm (here is the stripped down command):
results <- dbscan(data,MinPts=15, eps=0.01)
and plotted my clusters:
plot(results, data)
results$cluster returns a vector of numeric values. The value at each index is the cluster to which the original data point at that index belongs:
[1] 0 1 2 1 0 0 2 1 0 0 0 1 2 0 2 0 2 0 0 1 2 0 2 2 0 1 2 0 1 0 1 0 2 0 0 0 1 1 0 1 2 0 0 0 1 0 0 1 1 0 1
[52] 0 2 2 0 0 1 2 2 0 2 1 0 0 0 1 0 1 0 0 0 0 0 1 1 0 1 0 2 2 2 2 2 0 0 0 0 0 2 1 2 1 0 2 0 0 1 1 1 0 0 1
[103] 2 1 1 0 1 0 1 1 0 0 0 0 1 2 0 0 1 1 1 1 0 0 0 1 0 0 2 2 1 1 0 1 2 1 0 0 1 0 1 2 0 0 2 0 0 2 2 2 2 0 1
However, how can I retrieve the values of the original data that are in each cluster? For example, how can I get all the values from the original data that are in cluster #2?
Okay, this should do the trick for, e.g., cluster #2:
data[results$cluster==2,]
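To get every cluster at once instead of one at a time, split() can partition the rows by the cluster labels. The sketch below uses invented toy data and a hand-written cluster vector in place of results$cluster, so it runs without a clustering package. To my knowledge, the R DBSCAN implementations in fpc and dbscan label noise points as 0, which is why that label is filtered out in the last step; treat that as an assumption to check against your package's documentation.

```r
# Toy stand-ins for the original data and for results$cluster
data <- data.frame(
  x = c(0.10, 0.20, 5.00, 5.10, 9.00, 0.15),
  y = c(1.00, 1.10, 3.00, 3.20, 8.00, 1.05)
)
cluster <- c(1, 1, 2, 2, 0, 1)

# A named list with one data frame per cluster label
by_cluster <- split(data, cluster)
by_cluster[["2"]]   # same rows as data[cluster == 2, ]

# Keep only the points assigned to a real cluster (assumes 0 means noise)
clustered_only <- data[cluster != 0, ]
```

The list names are the cluster labels as character strings, so cluster 2 is accessed with "2" rather than by position.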

Convert Survival Data from Wide to Long

I am reading http://www.uk.sagepub.com/books/Book233417 in which the Rcmdr is used to transform the Rossi data http://cran.r-project.org/doc/contrib/Fox-Companion/Rossi.txt from wide to long format for time-varying survival analysis.
The Rcmdr script to do the transformation is:
.CovSets <-structure(list(covariate.1 = c("emp1", "emp2", "emp3", "emp4", "emp5", "emp6", "emp7", "emp8", "emp9", "emp10", "emp11", "emp12", "emp13", "emp14", "emp15", "emp16", "emp17", "emp18", "emp19", "emp20", "emp21", "emp22", "emp23", "emp24", "emp25", "emp26", "emp27", "emp28", "emp29","emp30", "emp31", "emp32", "emp33", "emp34", "emp35", "emp36", "emp37", "emp38", "emp39", "emp40", "emp41", "emp42", "emp43", "emp44", "emp45", "emp46", "emp47", "emp48", "emp49", "emp50", "emp51", "emp52")), .Names = "covariate.1")
Rossi.long <- unfold(Rossi, time="week", event="arrest", cov=.CovSets,
cov.names=c("covariate.1"))
remove(.CovSets)
However, this script does not run if the Rcmdr is not loaded.
The Rcmdr script transforms the Rossi data frame from
> head(Rossi,20)
week arrest fin age race wexp mar paro prio educ emp1 emp2 emp3 emp4 emp5 emp6 emp7 emp8 emp9 emp10 emp11 emp12 emp13 emp14 emp15 emp16 emp17 emp18 emp19 emp20 emp21 emp22 emp23 emp24 emp25
1 20 1 0 27 1 0 0 1 3 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 NA NA NA NA NA
2 17 1 0 18 1 0 0 1 8 4 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 NA NA NA NA NA NA NA NA
3 25 1 0 19 0 1 0 1 13 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
4 52 0 1 23 1 1 1 1 1 5 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0
5 52 0 0 19 0 1 0 1 3 3 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
6 52 0 0 24 1 1 0 0 2 4 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1
7 23 1 0 25 1 1 1 1 0 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 NA NA
8 52 0 1 21 1 1 0 1 4 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 1
9 52 0 0 22 1 0 0 0 6 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 0 0 0
10 52 0 0 20 1 1 0 0 0 5 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
11 52 0 1 26 1 0 0 1 3 3 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0
12 52 0 0 40 1 1 0 0 2 5 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
13 37 1 0 17 1 1 0 1 5 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
14 52 0 0 37 1 1 0 0 2 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1
15 25 1 0 20 1 0 0 1 3 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 0 0
16 46 1 1 22 1 1 0 1 2 3 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
17 28 1 0 19 1 0 0 0 7 3 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0
18 52 0 0 20 1 0 0 0 2 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1
19 52 0 0 25 1 0 0 1 12 3 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 0 0 0
20 52 0 0 24 0 1 0 1 1 3 0 1 1 0 0 0 0 0 1 1 1 1 1 1 0 0 0 1 1 1 1 1 1 1 1
emp26 emp27 emp28 emp29 emp30 emp31 emp32 emp33 emp34 emp35 emp36 emp37 emp38 emp39 emp40 emp41 emp42 emp43 emp44 emp45 emp46 emp47 emp48 emp49 emp50 emp51 emp52 id
1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 1
2 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 2
3 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 3
4 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 4
5 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5
6 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 6
7 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 7
8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 8
9 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 9
10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 10
11 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 11
12 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 12
13 0 0 1 1 0 0 0 0 0 0 0 0 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 13
14 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 14
15 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 15
16 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 NA NA NA NA NA NA 16
17 0 0 0 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 17
18 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 18
19 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 19
20 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 20
to the Rossi.long data:
> head(Rossi.long,30)
start stop arrest.time week arrest fin age race wexp mar paro prio educ id covariate.1
1.1 0 1 0 20 1 0 27 1 0 0 1 3 3 1 0
1.2 1 2 0 20 1 0 27 1 0 0 1 3 3 1 0
1.3 2 3 0 20 1 0 27 1 0 0 1 3 3 1 0
1.4 3 4 0 20 1 0 27 1 0 0 1 3 3 1 0
1.5 4 5 0 20 1 0 27 1 0 0 1 3 3 1 0
1.6 5 6 0 20 1 0 27 1 0 0 1 3 3 1 0
1.7 6 7 0 20 1 0 27 1 0 0 1 3 3 1 0
1.8 7 8 0 20 1 0 27 1 0 0 1 3 3 1 0
1.9 8 9 0 20 1 0 27 1 0 0 1 3 3 1 0
1.10 9 10 0 20 1 0 27 1 0 0 1 3 3 1 0
1.11 10 11 0 20 1 0 27 1 0 0 1 3 3 1 0
1.12 11 12 0 20 1 0 27 1 0 0 1 3 3 1 0
1.13 12 13 0 20 1 0 27 1 0 0 1 3 3 1 0
1.14 13 14 0 20 1 0 27 1 0 0 1 3 3 1 0
1.15 14 15 0 20 1 0 27 1 0 0 1 3 3 1 0
1.16 15 16 0 20 1 0 27 1 0 0 1 3 3 1 0
1.17 16 17 0 20 1 0 27 1 0 0 1 3 3 1 0
1.18 17 18 0 20 1 0 27 1 0 0 1 3 3 1 0
1.19 18 19 0 20 1 0 27 1 0 0 1 3 3 1 0
1.20 19 20 1 20 1 0 27 1 0 0 1 3 3 1 0
2.1 0 1 0 17 1 0 18 1 0 0 1 8 4 2 0
2.2 1 2 0 17 1 0 18 1 0 0 1 8 4 2 0
2.3 2 3 0 17 1 0 18 1 0 0 1 8 4 2 0
2.4 3 4 0 17 1 0 18 1 0 0 1 8 4 2 0
2.5 4 5 0 17 1 0 18 1 0 0 1 8 4 2 0
2.6 5 6 0 17 1 0 18 1 0 0 1 8 4 2 0
2.7 6 7 0 17 1 0 18 1 0 0 1 8 4 2 0
2.8 7 8 0 17 1 0 18 1 0 0 1 8 4 2 0
2.9 8 9 0 17 1 0 18 1 0 0 1 8 4 2 0
2.10 9 10 0 17 1 0 18 1 0 0 1 8 4 2 1
Is it possible to perform this exact transformation using the reshape or any other data transformation package?
UPDATE: The Rcmdr script is runnable only within Rcmdr
The 'unfold' function is located here (as documented in the PDF you linked to):
http://socserv.mcmaster.ca/jfox/Books/Companion/scripts/appendix-cox.R
The script does not require Rcmdr. It does require car (which in turn loads MASS and nnet, but if you have Rcmdr then you must have car), and it loads survival, which is a recommended package and should be available in all installations. It runs to completion without error in R 3.0.0 beta, and I strongly suspect it would have run to completion in R 2.15.x.
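For readers who want to see what the transformation does without sourcing that script, here is a rough base R sketch on a tiny invented data set. It is not the unfold function and does not reproduce all of its options. It only mimics the layout shown in the question: one row per subject-week, with start, stop, an event indicator that is nonzero only in the final interval, and the employment value for that week. The data frame wide, its three emp columns and the output column name employed are all hypothetical.

```r
# Tiny invented wide data set: follow-up in weeks, event flag, weekly covariate
wide <- data.frame(
  id = 1:2,
  week = c(3, 2),
  arrest = c(1, 0),
  emp1 = c(0, 1),
  emp2 = c(0, 1),
  emp3 = c(1, NA)
)
emp_cols <- paste0("emp", 1:3)

# Build one block of rows per subject, then stack the blocks
long <- do.call(rbind, lapply(seq_len(nrow(wide)), function(i) {
  n <- wide$week[i]   # number of one-week intervals for this subject
  data.frame(
    id = wide$id[i],
    start = 0:(n - 1),
    stop = 1:n,
    # the event can only occur in the subject's last interval
    arrest.time = c(rep(0, n - 1), wide$arrest[i]),
    # covariate value for the week ending at 'stop'
    employed = unlist(wide[i, emp_cols[1:n], drop = FALSE], use.names = FALSE)
  )
}))
long
```

Time-constant columns such as age or race could be carried along by adding them to the inner data.frame() call in the same way as id.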
