Validating a model in a new dataset - R

I have a dataset (d) in which I am looking at the prediction of hospital mortality (0 or 1) using tn1 and three other variables (rms package). I have built the model, and now I would like to validate it in a second dataset using the same model coefficients. The variables have the same names, etc., but I don't know how to keep the coefficients from f1 rather than letting the model estimate new coefficients for the second dataset.
I would be grateful for your expertise. Many thanks, Annemarie
library(rms)
f1 <- lrm(outcomehosp ~ I(log2(tn1 + 0.001)) + apscore_ad + emsurg +
            corrapiidiag, data = d)
record_id corrapiidiag tn1 emsurg apscore_ad outcomehosp
7 3 0.27 1 24 1
8 9 0 1 21 0
9 7 0.11 0 22 0
11 9 0 0 13 0
12 9 13.9 0 17 0
13 22 5.02 0 37 0
21 9 9.6 0 34 0
25 9 0 0 10 0
27 9 0 0 33 1
28 25 0 0 18 1
30 9 0 0 19 0
31 9 0.16 0 26 1
32 9 0 0 13 1
34 7 0 0 18 0
35 9 0 0 20 0
36 9 3.03 0 41 1
37 9 0 0 18 0
38 9 0 0 18 0
39 9 0 0 17 0
40 9 0.14 0 23 0
41 9 0 0 10 0
42 9 0 0 8 0
43 9 2.45 0 11 0
45 9 0 1 12 0
46 9 0.16 1 17 0
49 9 0 1 22 0
50 9 0 0 15 0
51 9 0.05 1 16 0
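A minimal sketch, assuming the goal is external validation with f1's coefficients held fixed: `predict()` applies the stored coefficients to new rows without refitting anything. It uses base R `glm()` as a stand-in for `rms::lrm()` and simulated placeholder data (`make_data` and `d2` are hypothetical); with rms, `predict(f1, newdata = d2, type = "fitted")` should likewise return predicted probabilities, and `val.prob()` summarises their calibration and discrimination.

```r
# Sketch only: simulated placeholder data, base R glm() standing in for rms::lrm()
set.seed(1)
make_data <- function(n) {
  x <- data.frame(
    tn1          = rexp(n, rate = 0.5),
    apscore_ad   = rpois(n, 20),
    emsurg       = rbinom(n, 1, 0.3),
    corrapiidiag = sample(c(3, 7, 9, 22, 25), n, replace = TRUE)
  )
  # hypothetical outcome model, only so the example has something to fit
  x$outcomehosp <- rbinom(n, 1, plogis(-4 + 0.15 * x$apscore_ad))
  x
}
d  <- make_data(300)   # development data
d2 <- make_data(200)   # hypothetical validation data

# Fit once on the development data
f1 <- glm(outcomehosp ~ I(log2(tn1 + 0.001)) + apscore_ad + emsurg + corrapiidiag,
          family = binomial, data = d)

# Score the validation data with f1's coefficients (no re-estimation)
p2 <- predict(f1, newdata = d2, type = "response")

# Same thing by hand: linear predictor = design matrix %*% fixed coefficients
X2 <- model.matrix(delete.response(terms(f1)), data = d2)
p2_manual <- plogis(drop(X2 %*% coef(f1)))

# Simple external-validation summaries
y2    <- d2$outcomehosp
brier <- mean((p2 - y2)^2)              # Brier score
r     <- rank(p2)                       # c-statistic via Mann-Whitney ranks
n1    <- sum(y2 == 1); n0 <- sum(y2 == 0)
cstat <- (sum(r[y2 == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
c(brier = brier, c_statistic = cstat)
```

The manual `model.matrix()` step is only there to make visible that the new predictions use exactly `coef(f1)`; in practice `predict()` alone is enough.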


Could anyone explain the error below from the R germinationmetrics package?

I would like to compute cumulative germination counts, compute germination indices, and plot FPHF curves.
My data structure is the following:
concentration temp rep Day01 Day02 Day03 Day04 Day05 Day06 Day07
1 0.0 10 1 0 0 0 0 0 0 0
2 0.5 10 1 0 0 0 0 6 6 6
3 0.3 10 1 0 0 0 0 8 8 8
4 0.1 10 1 0 0 0 0 6 6 6
5 0.0 10 2 0 0 0 0 0 0 0
6 0.5 10 2 0 0 0 0 9 9 9
7 0.3 10 2 0 0 0 0 8 8 8
8 0.1 10 2 0 0 0 0 6 6 6
9 0.0 10 3 0 0 0 0 0 0 0
10 0.5 10 3 0 0 0 0 5 5 5
11 0.3 10 3 0 0 0 0 8 8 8
12 0.1 10 3 0 0 0 0 2 2 2
13 0.0 20 1 0 0 0 0 0 7 7
14 0.5 20 1 0 0 0 0 17 17 17
15 0.3 20 1 0 0 0 0 21 21 21
16 0.1 20 1 0 0 0 0 20 20 20
17 0.0 20 2 0 0 0 0 0 7 10
18 0.5 20 2 0 0 0 0 13 13 13
19 0.3 20 2 0 0 0 0 18 18 18
20 0.1 20 2 0 0 0 0 22 22 22
21 0.0 20 3 0 0 0 0 0 14 14
22 0.5 20 3 0 0 0 0 15 15 15
23 0.3 20 3 0 0 0 0 15 15 15
24 0.1 20 3 0 0 0 0 14 14 14
25 0.0 30 1 0 0 0 0 0 0 0
26 0.5 30 1 0 0 0 0 0 0 0
27 0.3 30 1 0 0 0 0 0 0 0
28 0.1 30 1 0 0 0 0 0 0 0
29 0.0 30 2 0 0 0 0 0 0 0
30 0.5 30 2 0 0 0 0 0 0 0
31 0.3 30 2 0 0 0 0 0 0 0
32 0.1 30 2 0 0 0 0 0 0 0
33 0.0 30 3 0 0 0 0 0 0 0
34 0.5 30 3 0 0 0 0 0 0 0
35 0.3 30 3 0 0 0 0 0 0 0
36 0.1 30 3 0 0 0 0 0 0 0
Day08 Day09 Day10 Day11 Day12 Day13 Day14 Day15 Day16 Day17 Day18
1 0 0 1 1 1 1 1 1 1 1 1
2 18 18 18 18 20 20 20 20 20 20 20
3 18 18 18 18 20 20 20 20 20 20 20
4 16 16 16 16 18 18 18 19 19 19 19
5 0 0 1 1 1 1 1 1 1 1 1
6 22 22 22 22 23 23 23 23 23 23 23
7 22 22 22 22 23 23 23 23 23 23 23
8 18 18 18 18 19 19 19 19 19 19 19
9 0 0 2 2 2 4 4 4 4 4 4
10 20 20 20 20 21 21 21 21 21 21 21
11 17 17 17 17 20 20 20 20 20 20 20
12 22 22 22 22 23 23 23 23 23 23 23
13 7 7 7 7 7 7 7 7 7 7 7
14 23 23 23 23 23 23 23 23 23 23 23
15 24 24 24 24 24 24 24 24 24 24 24
16 24 24 24 24 24 24 24 24 24 24 24
17 10 10 10 10 10 10 10 10 10 10 10
18 25 25 25 25 25 25 25 25 25 25 25
19 23 23 23 23 23 23 23 23 23 23 23
20 23 23 23 23 23 23 23 23 23 23 23
21 14 14 14 14 14 14 14 14 14 14 14
22 23 23 23 23 23 23 23 23 23 23 23
23 21 21 21 21 21 21 21 21 21 21 21
24 20 20 20 20 20 20 20 20 20 20 20
25 0 0 0 0 0 0 0 0 0 0 0
26 0 0 0 0 0 0 0 0 0 0 0
27 0 0 0 0 0 0 0 0 0 0 0
28 0 0 0 0 0 0 0 0 0 0 0
29 0 0 0 0 0 0 0 0 0 0 0
30 0 0 0 0 0 0 0 0 0 0 0
31 0 0 0 0 0 0 0 0 0 0 0
32 0 0 0 0 0 0 0 0 0 0 0
33 0 0 0 0 0 0 0 0 0 0 0
34 0 0 0 0 0 0 0 0 0 0 0
35 0 0 0 0 0 0 0 0 0 0 0
36 0 0 0 0 0 0 0 0 0 0 0
Day19 Total.Seeds
1 1 25
2 20 25
3 20 25
4 19 25
5 1 25
6 23 25
7 23 25
8 19 25
9 4 25
10 21 25
11 20 25
12 23 25
13 7 25
14 23 25
15 24 25
16 24 25
17 10 25
18 25 25
19 23 25
20 23 25
21 14 25
22 23 25
23 21 25
24 20 25
25 0 25
26 0 25
27 0 25
28 0 25
29 0 25
30 0 25
31 0 25
32 0 25
33 0 25
34 0 25
35 0 25
36 0 25
I receive the following warning:
data(gcdata1)
Warning message:
In data(gcdata1) : data set ‘gcdata1’ not found
I created the following vector for counts.per.intervals:
counts.per.intervals <- c("Day01", "Day02", "Day03", "Day04", "Day05",
                          "Day06", "Day07", "Day08", "Day09", "Day10",
                          "Day11", "Day12", "Day13", "Day14", "Day15", "Day16", "Day17", "Day18", "Day19")
and the following call for the indices:
indices<-germination.indices(gcdata1, total.seeds.col = "Total.Seeds",
counts.intervals.cols = counts.per.intervals,
intervals = 1:19, partial = FALSE, max.int = 5)
I received the following error:
Error in if (nearest[2] == nearest[1]) { :
missing value where TRUE/FALSE needed
In addition: There were 50 or more warnings (use warnings() to see the first 50)
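The thread has no answer; what follows are unverified suggestions. `data()` only loads datasets shipped with an installed package, so `data(gcdata1)` warns for a user-created object, which can simply be passed to `germination.indices()` as it is. With `partial = FALSE` the Day columns are read as cumulative counts, which this table appears to be. A plausible (unconfirmed) source of the `NA` in the `if` condition is the rows that never germinate (all the temp 30 rows), for which time-based indices such as time to 50% germination are undefined. The base R sketch below uses three rows copied from the table to check both properties and to derive per-interval counts; `gdat`, `partial_counts` and the other object names are hypothetical.

```r
counts.per.intervals <- sprintf("Day%02d", 1:19)   # "Day01", ..., "Day19"

# Three rows copied from the table above (rows 2, 13 and 25)
cum <- rbind(
  c(0, 0, 0, 0, 6, 6, 6, 18, 18, 18, 18, 20, 20, 20, 20, 20, 20, 20, 20),
  c(0, 0, 0, 0, 0, 7, rep(7, 13)),
  rep(0, 19)
)
colnames(cum) <- counts.per.intervals
gdat <- data.frame(cum, Total.Seeds = 25)   # hypothetical name for the user's data

m <- as.matrix(gdat[, counts.per.intervals])

# Cumulative counts never decrease from one interval to the next
is_cumulative <- apply(m, 1, function(r) all(diff(r) >= 0))

# Per-interval (partial) counts recovered from the cumulative counts
partial_counts <- t(apply(m, 1, function(r) c(r[1], diff(r))))

# Rows with no germination at all: time-based indices are undefined for these
never_germinated <- m[, ncol(m)] == 0
is_cumulative
never_germinated
```

If the all-zero rows turn out to be the problem, running the indices on `gdat[!never_germinated, ]` is one way to test that hypothesis.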

Summing up different elements in a matrix in R

I'm trying to perform calculations on different elements of a matrix in R. My matrix is 18x18, and I would like to get, e.g., the mean of each 6x6 submatrix (9 submatrices in total). My desired submatrices would be:
A1 <- df[1:6,1:6]
A2 <- df[1:6,7:12]
A3 <- df[1:6,13:18]
B1 <- df[7:12,1:6]
B2 <- df[7:12,7:12]
B3 <- df[7:12,13:18]
C1 <- df[13:18,1:6]
C2 <- df[13:18,7:12]
C3 <- df[13:18,13:18]
The matrix looks like this:
5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90
5 14 17 9 10 8 4 10 12 18 9 13 14 NA NA 19 15 10 10
10 30 32 23 27 17 28 25 12 28 29 28 26 19 25 34 24 11 17
15 16 16 16 9 17 27 17 16 30 13 18 13 15 13 19 8 7 9
20 15 12 18 18 18 6 4 6 9 11 10 10 13 11 8 10 15 15
25 7 13 21 7 3 5 2 5 5 4 3 2 3 5 2 1 5 6
30 5 9 1 7 7 4 4 12 8 9 2 0 5 2 1 0 2 6
35 3 0 2 0 0 4 4 7 4 4 5 2 0 0 1 0 0 0
40 0 4 0 0 0 1 3 9 10 10 1 0 0 0 1 0 1 0
45 0 0 0 0 0 3 10 9 17 9 1 0 0 0 0 0 0 0
50 0 0 2 0 0 0 2 8 20 0 0 0 0 0 1 0 0 0
55 0 0 0 0 0 0 7 3 21 0 0 0 0 0 0 0 0 0
60 0 0 0 0 3 4 10 2 2 0 0 1 0 0 0 0 0 0
65 0 0 0 0 0 4 8 4 8 11 0 0 0 0 0 0 0 0
70 0 0 0 0 0 6 2 5 14 0 0 0 0 0 0 0 0 0
75 0 0 0 0 0 4 0 5 9 0 0 0 0 0 0 0 0 0
80 0 0 0 0 0 4 4 0 4 2 0 0 0 0 0 0 0 0
85 0 0 0 0 0 0 0 4 1 1 0 0 0 0 0 0 0 0
90 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Is there a clean way to solve this issue with a loop?
Thanks a lot in advance,
Paul
Given your matrix, e.g.
x <- matrix(1:(18*18), ncol=18)
try, for example, with 6x6 submatrices:
step <- 6
nx <- nrow(x)
if ((nx %% step) != 0) stop("nx %% step should be 0")
indI <- seq(1, nx, by = step)    # first index of each block
nbStep <- length(indI)
for (blockRow in 1:nbStep) {
  for (blockCol in 1:nbStep) {
    name <- paste0(LETTERS[blockRow], blockCol)         # A1, A2, ..., C3
    theRows <- indI[blockRow]:(indI[blockRow] + step - 1)
    theCols <- indI[blockCol]:(indI[blockCol] + step - 1)
    assign(name, sum(x[theRows, theCols]))              # use mean() for block means
  }
}
You'll get your results in A1, A2, A3, ...
This is the idea. Adapt the code for non-square matrices, different submatrix sizes, etc.
Here's one way:
# generate fake data
set.seed(47)
n = 18
m = matrix(rpois(n * n, lambda = 5), nrow = n)
# generate starting indices
n_array = 6
start_i = seq(1, n, by = n_array)
arr_starts = expand.grid(row = start_i, col = start_i)
# calculate sums
with(arr_starts, mapply(function(x, y) sum(m[(x + 1:n_array) - 1, (y + 1:n_array) - 1]), row, col))
# [1] 158 188 176 201 188 201 197 206 204
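Since the question asks for means and the real matrix contains `NA`s, here is a loop-free sketch (not taken from either answer) that labels every cell with its block row (A, B, C) and block column (1, 2, 3) and averages with `tapply()`, ignoring missing values. `block_means` is a hypothetical helper name.

```r
block_means <- function(m, size) {
  stopifnot(nrow(m) %% size == 0, ncol(m) %% size == 0)
  rb <- (row(m) - 1) %/% size + 1   # block row of each cell: 1, 2, 3, ...
  cb <- (col(m) - 1) %/% size + 1   # block column of each cell
  # LETTERS limits this labelling to 26 block rows
  tapply(m, list(LETTERS[rb], cb), mean, na.rm = TRUE)
}

x <- matrix(1:(18 * 18), ncol = 18)
x[1, 13] <- NA                      # mimic the NAs in the real data
bm <- block_means(x, 6)
bm["A", "1"]   # mean of x[1:6, 1:6]: 48.5
```

The result is a 3x3 matrix indexed like the question's names, so `bm["B", "2"]` corresponds to `B2 <- df[7:12, 7:12]`.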

Fitting a Weibull distribution to censored data

I would like to estimate the maximum likelihood parameters of a Weibull distribution for the following data, with the given censoring vector, in R:
data= 9 2 11 49 7 5 3 36 30 6 62 5 3 29 29 1 13 1 24 11 9 4 7 15 11 15 1 1 1 1 1 2 6 12 12 28 14 14 57 17 4 2 3 6 21 6 16 19 28 18 19 9 59 12 3 27 8 26 19 47 68 17 15 25 25 6 54 1 2 11 4 1 36 2 5 5 3 38 3 1 10 69 1 8 3 17 21 19 11 1 6 1 1 18 2 51 6 12 11 13 3 19 16 18 28 10 26 32 6 25 1 44
cens= 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
I would be very thankful if anyone could help me.
Use the abrem package (hosted on R-Forge):
install.packages("abrem", repos="http://R-Forge.R-project.org")
You may need to manually install an older version of RcppArmadillo if you run into issues like I did:
install.packages("https://cran.r-project.org/src/contrib/Archive/RcppArmadillo/RcppArmadillo_0.6.100.0.0.tar.gz", repos=NULL, type="source")
Then have at it:
library(abrem)
a = Abrem(fail = c(2, 11, 49, ...), susp = c(9, 44))
a = abrem.fit(a, dist = 'weibull', method.fit = 'mle')
a = abrem.conf(a) # add 90% confidence bands
plot.abrem(a) # plot the points and fit distribution
print.abrem(a) # print the results, which includes the fitted parameters
I may have mixed up your failures and suspensions, but hopefully the example makes clear where each goes.
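If installing abrem from R-Forge is a problem, the censored Weibull likelihood is short enough to maximise directly in base R. This sketch assumes `cens == 1` marks a right-censored (suspended) observation and that only the first and last values are censored, which is how the answer above reads the data; flip the coding if it is the other way round. Failures contribute the log density and censored times the log survival probability.

```r
# Observed times from the question (112 values)
time <- c(9, 2, 11, 49, 7, 5, 3, 36, 30, 6, 62, 5, 3, 29, 29, 1, 13, 1, 24, 11,
          9, 4, 7, 15, 11, 15, 1, 1, 1, 1, 1, 2, 6, 12, 12, 28, 14, 14, 57, 17,
          4, 2, 3, 6, 21, 6, 16, 19, 28, 18, 19, 9, 59, 12, 3, 27, 8, 26, 19, 47,
          68, 17, 15, 25, 25, 6, 54, 1, 2, 11, 4, 1, 36, 2, 5, 5, 3, 38, 3, 1,
          10, 69, 1, 8, 3, 17, 21, 19, 11, 1, 6, 1, 1, 18, 2, 51, 6, 12, 11, 13,
          3, 19, 16, 18, 28, 10, 26, 32, 6, 25, 1, 44)
# Assumption: 1 = right-censored (suspension), 0 = observed failure,
# with only the first and last observations censored
cens <- c(1, rep(0, length(time) - 2), 1)

# Negative log-likelihood on the log scale so shape and scale stay positive
negll <- function(logpar) {
  shape <- exp(logpar[1]); scale <- exp(logpar[2])
  fail  <- cens == 0
  -(sum(dweibull(time[fail], shape, scale, log = TRUE)) +
      sum(pweibull(time[!fail], shape, scale, lower.tail = FALSE, log.p = TRUE)))
}

fit <- optim(c(0, log(mean(time))), negll)   # Nelder-Mead by default
shape_hat <- exp(fit$par[1])
scale_hat <- exp(fit$par[2])
c(shape = shape_hat, scale = scale_hat)
```

If the survival package is available, `survreg(Surv(time, 1 - cens) ~ 1, dist = "weibull")` should give the same fit, with shape = `1 / fit$scale` and scale = `exp(coef(fit))`.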

Create one column out of several columns (RStudio or Excel)

I've tried to find an existing question on this, but I couldn't, so I'm asking here.
Summary:
I want to make ONE column out of several columns. The values should keep their original order, and the columns should be stacked one below another.
Description and details
Below is an example of what my CSV file looks like. Note that there are more than 400 columns, which is why I don't want to do this manually in, for example, Excel. All columns have 24 rows.
X1 X2 X3 X4 X5 X6 ... X470
0 1 5 10 8 0 7
0 0 0 0 0 0 0
2 3 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
I want to "stack" all the columns into one column, as briefly described in the summary.
Note: "..." below stands for the remaining values from that column.
VALUE FROM COLUMN
0 X1
0 X1
2 X1
...
1 X2
0 X2
3 X2
...
5 X3
...
10 X4
...
8 X5
...
0 X6
...
7 X470
...
So in the end, instead of having 486 columns with 24 rows each, I will have 1 column with 11664 rows. It would be good if the originating column were written in a new column alongside (as shown above), but this is not required.
Note that this df only shows in general what I want to achieve, so clear and understandable commands are appreciated, as I will apply them to my own df.
It doesn't matter whether the solution is in R or Excel, as long as it is easy to do.
I hope my description is clear; otherwise, please let me know and I'll try to describe it again.
Many thanks for suggestions and help.
Kind regards, Elin
We can use stack to get the values in one column and the colnames in the next.
stack(df)
Or use unlist
data.frame(VALUE=unlist(df),
fromColumn= rep(names(df), each=nrow(df)))
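A self-contained sketch of the `stack()` approach at the size described in the question (24 rows by 486 columns), using random placeholder values; with the real file, `df` would come from `read.csv()` (the file names in the comments are placeholders). `stack()` returns the values in a column called `values` and the source column names in a factor called `ind`, which are renamed here.

```r
# Placeholder data: 24 rows x 486 columns named X1 ... X486
# With the real file: df <- read.csv("yourfile.csv")   (hypothetical file name)
set.seed(1)
df <- as.data.frame(matrix(rpois(24 * 486, 1), nrow = 24))
names(df) <- paste0("X", seq_len(ncol(df)))

long <- stack(df)                                   # columns: values, ind
names(long) <- c("VALUE", "FROM_COLUMN")
long$FROM_COLUMN <- as.character(long$FROM_COLUMN)  # factor -> plain text

dim(long)   # 11664 rows, 2 columns
# write.csv(long, "stacked.csv", row.names = FALSE)  # hypothetical output file
```

The columns are stacked in their original order, so the first 24 rows hold X1, the next 24 hold X2, and so on.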
Here's a VBA user-defined function to do the job:
Function ConcatCols(Colrange As Variant) As Variant
    Dim LongCol() As Variant, i As Long, j As Long, k As Long
    Dim NumCols As Long, NumRows As Long, NumRows2 As Long
    ' Convert a worksheet range to a 2-D array of values
    If TypeName(Colrange) = "Range" Then Colrange = Colrange.Value2
    NumRows = UBound(Colrange)
    NumCols = UBound(Colrange, 2)
    NumRows2 = NumRows * NumCols
    ReDim LongCol(1 To NumRows2, 1 To 1)
    k = 1
    ' Walk column by column, placing each column's rows below the previous ones
    For i = 1 To NumCols
        For j = 1 To NumRows
            LongCol(k, 1) = Colrange(j, i)
            k = k + 1
        Next j
    Next i
    ConcatCols = LongCol
End Function
Enter the code in a VBA module, then enter =ConcatCols(A1:RL24) as an array formula (Ctrl+Shift+Enter) in column RM (or wherever you want) to view the entire concatenated column, or call it from a VBA sub to write the data to the spreadsheet.
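The following is pretty simple but requires the reshape2 package, which is not part of base R and has to be installed from CRAN (install.packages("reshape2")). As suggested above, stack() gives similar output, but with the columns in the opposite order (values first, then the source column name).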
library(reshape2)
df <- data.frame("A" = 1:21, "B" = 21:41, "C" = 40:60)
> df
A B C
1 1 21 40
2 2 22 41
3 3 23 42
4 4 24 43
5 5 25 44
6 6 26 45
7 7 27 46
8 8 28 47
9 9 29 48
10 10 30 49
11 11 31 50
12 12 32 51
13 13 33 52
14 14 34 53
15 15 35 54
16 16 36 55
17 17 37 56
18 18 38 57
19 19 39 58
20 20 40 59
21 21 41 60
> melt(df)
No id variables; using all as measure variables
variable value
1 A 1
2 A 2
3 A 3
4 A 4
5 A 5
6 A 6
7 A 7
8 A 8
9 A 9
10 A 10
11 A 11
12 A 12
13 A 13
14 A 14
15 A 15
16 A 16
17 A 17
18 A 18
19 A 19
20 A 20
21 A 21
22 B 21
23 B 22
24 B 23
25 B 24
26 B 25
27 B 26
28 B 27
29 B 28
30 B 29
31 B 30
32 B 31
33 B 32
34 B 33
35 B 34
36 B 35
37 B 36
38 B 37
39 B 38
40 B 39
41 B 40
42 B 41
43 C 40
44 C 41
45 C 42
46 C 43
47 C 44
48 C 45
49 C 46
50 C 47
51 C 48
52 C 49
53 C 50
54 C 51
55 C 52
56 C 53
57 C 54
58 C 55
59 C 56
60 C 57
61 C 58
62 C 59
63 C 60

How to replace values from a matrix with another matrix based on column/row names?

I have a small matrix:
SMALL<-matrix(c(1:9),3, 3)
colnames(SMALL)<-c("25","36","48")
rownames(SMALL)<-c("18","25","48")
looks like:
25 36 48
18 1 4 7
25 2 5 8
48 3 6 9
And a large matrix:
LARGE<-matrix(0,4, 4)
colnames(LARGE)<-c("12","25","36","48")
rownames(LARGE)<-c("18","25","38","48")
looks like:
12 25 36 48
18 0 0 0 0
25 0 0 0 0
38 0 0 0 0
48 0 0 0 0
I would like to replace values from the large matrix by those from the small one based on the column/row names.
Looking for this result:
12 25 36 48
18 0 1 4 7
25 0 2 5 8
38 0 0 0 0
48 0 3 6 9
Any ideas?
Assuming there is a match for each col and row name of SMALL in LARGE:
i <- match(rownames(SMALL), rownames(LARGE))
j <- match(colnames(SMALL), colnames(LARGE))
LARGE[i,j] <- SMALL
# 12 25 36 48
#18 0 1 4 7
#25 0 2 5 8
#38 0 0 0 0
#48 0 3 6 9
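The answer assumes every row and column name of SMALL exists in LARGE; otherwise `match()` returns `NA` and the assignment fails. A sketch that copies only the overlapping names by indexing with the dimnames directly; the extra row "99" in SMALL is hypothetical and is skipped because LARGE has no such row.

```r
SMALL <- matrix(1:12, 4, 3,
                dimnames = list(c("18", "25", "48", "99"), c("25", "36", "48")))
LARGE <- matrix(0, 4, 4,
                dimnames = list(c("18", "25", "38", "48"), c("12", "25", "36", "48")))

# Keep only the names present in both matrices
rows <- intersect(rownames(SMALL), rownames(LARGE))
cols <- intersect(colnames(SMALL), colnames(LARGE))
LARGE[rows, cols] <- SMALL[rows, cols]
LARGE   # row "18" becomes 0 1 5 9; row "99" of SMALL is ignored
```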
