ctree to produce predictions other than the existing categories - r

I am fitting a ctree. In my training set, the response variable has 3 categories: 0, 1, 99.
However, the tree produces additional predicted values between 0 and 1, as this cross-tabulation of predictions against actual values shows:
                       Actual
Prediction                  0     1    99
0                        6281     0     0
0.0869565217391304         63     6     0
0.288888888888889          32    13     0
0.529411764705882          24    27     0
0.588235294117647          35    50     0
0.625                       9    15     0
0.641891891891892          53    95     0
0.684014869888476          85   184     0
0.807692307692308           5    21     0
0.853035143769968          46   267     0
0.864406779661017           8    51     0
0.892018779342723          23   190     0
0.896103896103896           8    69     0
0.95668549905838           23   508     0
0.98695652173913            3   227     0
1                           0    58     0
99                          0     0  3018
Does anyone know how this is possible?
Thank you!
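Each fractional value in the Prediction column equals the share of 1s in its row (for example, 6 / (63 + 6) = 0.0869...). That is what a regression tree returns as a node mean when the response is stored as a number rather than as a factor. Below is a minimal base R sketch of that check, followed by the likely fix of converting the response to a factor before fitting. The data frame `train` and its column `y` are hypothetical names, not taken from the question.

```r
# Counts copied from three rows of the table above
# (Actual = 0 and Actual = 1 columns).
n0 <- c(63, 32, 24)
n1 <- c(6, 13, 27)

# Mean of a 0/1 response within each group = share of 1s.
node_mean <- n1 / (n0 + n1)
print(node_mean)

# Hypothetical training data with a numeric response (assumption).
train <- data.frame(y = c(0, 1, 99, 1, 0))

# Converting to a factor tells tree-fitting functions to treat
# 0, 1 and 99 as classes instead of as numbers to be averaged.
train$y <- factor(train$y)
print(levels(train$y))
```

If the response in the real training set is numeric, refitting after a conversion like this should give predictions limited to the three classes.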

Related

Producing dataframe in R with rows summing to same number, including all possible combinations of numbers in each column

I am trying to create a dataframe in R.
I have 4 categories (e.g. Blue, Red, Yellow, Green) and I would like each row to sum to 100%. For each category I want to step in increments of 5% and produce a dataframe containing all possible combinations of values (to the nearest 5%) across the 4 categories. I realise I am not explaining this well, so I have tried to show what I mean in the following table:
Blue Red Yellow Green
  95   5      0     0
  95   0      5     0
  95   0      0     5
   5  95      0     0
   0  95      5     0
   0  95      0     5
   5   0     95     0
   0   5     95     0
   0   0     95     5
   5   0      0    95
   0   5      0    95
   0   0      5    95
  90  10      0     0
  90   0     10     0
  90   0      0    10
  10  90      0     0
   0  90     10     0
   0  90      0    10
  10   0     90     0
   0  10     90     0
   0   0     90    10
  10   0      0    90
   0  10      0    90
   0   0     10    90
  90   5      5     0
  90   5      0     5
  90   0      5     5
   5  90      5     0
   5  90      0     5
   0  90      5     5
   5   5     90     0
   5   0     90     5
   0   5     90     5
   5   5      0    90
   5   0      5    90
   0   5      5    90
  85  15      0     0
  85   0     15     0
  85   0      0    15
  15  85      0     0
   0  85     15     0
   0  85      0    15
  15   0     85     0
   0  15     85     0
   0   0     85    15
  15   0      0    85
   0  15      0    85
   0   0     15    85
  85  10      5     0
  85  10      0     5
  85   5     10     0
  85   0     10     5
  85   5      0    10
  85   0      5    10
  10  85      5     0
  10  85      0     5
   5  85     10     0
   0  85     10     5
   5  85      0    10
   0  85      5    10
  10   5     85     0
  10   0     85     5
   5  10     85     0
   0  10     85     5
   5   0     85    10
   0   5     85    10
  10   5      0    85
  10   0      5    85
   5  10      0    85
   0  10      5    85
   5   0     10    85
   0   5     10    85
  85   5      5     5
I am struggling to know where to start here...
You could nest three for loops and bind the results together:
target_df <- data.frame()
for (i in seq(95, 0, by = -5)) {
  for (j in seq(100 - i, 0, by = -5)) {
    for (k in seq(100 - i - j, 0, by = -5)) {
      target_df <- rbind(target_df, data.frame(Blue = i, Red = j, Yellow = k, Green = 100 - i - j - k))
    }
  }
}
This returns
Blue Red Yellow Green
1 95 5 0 0
2 95 0 5 0
3 95 0 0 5
4 90 10 0 0
5 90 5 5 0
6 90 5 0 5
7 90 0 10 0
8 90 0 5 5
9 90 0 0 10
10 85 15 0 0
You might want to remove three rows containing 100 in columns Red, Yellow and Green.
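As an alternative to the nested loops, the same table can be built in a vectorised way. The sketch below is not part of the answer above: it uses `expand.grid()` to enumerate three of the colours, derives the fourth from the row total, and drops the rows that contain 100. The names `steps`, `grid` and `combos` are made up for illustration.

```r
# All multiples of 5 from 0 to 100.
steps <- seq(0, 100, by = 5)

# Every combination of the first three colours.
grid <- expand.grid(Blue = steps, Red = steps, Yellow = steps)

# Green is whatever is left to reach 100.
grid$Green <- 100 - grid$Blue - grid$Red - grid$Yellow

# Keep only combinations where the remainder is not negative.
combos <- grid[grid$Green >= 0, ]

# Drop the rows in which a single colour takes the full 100.
combos <- combos[apply(combos, 1, max) < 100, ]
rownames(combos) <- NULL

print(head(combos))
print(nrow(combos))
```

This avoids growing a data frame with `rbind()` inside a loop, at the cost of first building the full 21 x 21 x 21 grid, which is small enough here.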

How to make a random strata sample in R?

I have a data.frame called "per" with three variables: nrodocumento, cod_jer (42 groups) and grupo_fict (8 groups). I would like to draw a random sample (as a data.frame) within each combination of cod_jer and grupo_fict.
> dput(head(per))
structure(list(nrodocumento = c(49574917L, 54692750L, 54731807L,
57364176L, 57364198L, 46867674L), cod_jer = c(1146L, 32L, 0L,
0L, 0L, 0L), grupo_fict = c(3L, 1L, 8L, 1L, 1L, 1L)), .Names =
c("nrodocumento",
"cod_jer", "grupo_fict"), row.names = c(NA, 6L), class = "data.frame")
> head(per,n=100)
nrodocumento cod_jer grupo_fict
1 49574917 1146 3
2 54692750 32 1
3 54731807 0 8
4 57364176 0 1
5 57364198 0 1
6 46867674 0 1
7 46867668 0 1
8 57364201 0 1
9 53767871 0 1
10 55339012 0 1
11 49204318 0 8
12 53743017 0 1
13 47622958 0 1
14 49019862 0 1
15 50167428 0 2
16 48783260 0 4
17 52020945 433 5
18 54486680 236 4
19 51402916 0 4
20 48543242 0 2
21 54671603 0 1
22 50644599 0 8
23 53293608 0 1
24 52742799 0 4
25 49815210 0 8
26 50967719 236 3
27 51938997 0 8
28 50057188 324 3
29 52754706 0 6
30 55322102 0 3
31 53040748 0 1
32 50321642 0 5
33 51621354 236 8
34 49611806 0 7
35 53347667 0 8
36 52462498 0 3
37 54158570 0 8
38 54034849 0 8
39 52507674 321 3
40 50218598 317 7
41 45078442 432 7
42 51491066 0 8
43 53278953 0 2
44 52661658 0 2
45 50092873 236 3
46 50308064 0 7
47 51941635 0 7
48 53527966 0 1
49 49614579 0 1
50 49450678 318 8
51 52953427 1146 7
52 52133221 0 8
53 53363128 0 7
54 52819643 0 1
55 47516589 0 1
56 52563137 0 3
57 49511296 0 7
58 54154013 0 2
59 50822420 1349 4
60 50822408 1349 4
61 50822414 1349 6
62 52339683 0 1
63 50026113 0 7
64 47328586 0 7
65 56041961 0 7
66 47756955 432 8
67 53158397 0 7
68 53151167 0 7
69 54710039 0 3
70 54408844 114 4
71 46286323 114 4
72 50310877 0 1
73 50929135 0 7
74 49817218 0 1
75 53604540 0 8
76 52812736 1147 1
77 53726314 1147 1
78 50835936 0 8
79 55429334 0 1
80 48421020 329 8
81 49800217 0 3
82 52818263 0 1
83 45884978 0 1
84 50203385 0 1
85 53433610 0 2
86 54515938 0 1
87 50263935 0 8
88 52439152 0 2
89 48424129 236 3
90 47031563 0 8
91 53577610 11 1
92 48759083 11 1
93 50344731 432 1
94 51164013 0 3
95 52026977 163 7
96 50965482 0 3
97 45947594 433 8
98 53357234 0 7
99 48367529 0 8
100 54286153 0 3
> table(per$cod_jer,per$grupo_fict)
1 2 3 4 5 6 7 8
0 3990 2296 1743 1453 356 250 2031 2051
11 149 85 29 34 14 6 34 25
13 2 4 1 0 0 0 1 1
14 3 1 0 0 0 0 0 1
32 37 12 13 10 3 1 23 13
101 19 12 6 5 3 0 6 12
102 2 0 0 0 0 0 0 0
103 11 10 3 3 0 1 3 0
104 17 8 1 7 2 1 7 9
105 11 12 3 3 3 0 6 10
106 147 57 30 29 8 1 43 42
107 33 37 5 9 3 2 8 9
108 6 10 2 3 0 2 3 4
109 44 37 11 9 6 2 14 14
111 112 81 26 28 8 3 22 18
112 21 8 4 8 2 0 3 2
113 94 61 14 16 4 1 17 24
114 60 52 10 14 9 5 8 20
115 72 24 21 13 5 1 11 16
125 5 4 1 0 1 0 0 1
138 15 5 2 2 1 0 2 0
163 50 35 26 26 7 12 43 41
234 51 43 31 32 10 7 49 53
236 78 29 46 35 7 7 39 37
317 44 28 21 13 7 2 28 21
318 20 27 5 10 4 3 12 14
319 45 21 25 19 1 2 26 21
321 6 4 9 3 0 3 8 1
322 43 30 24 16 5 3 16 34
323 30 14 25 15 3 4 24 22
324 59 29 31 27 8 5 28 27
325 15 12 6 5 1 2 8 11
326 18 12 17 13 4 2 20 15
327 45 28 23 26 7 6 25 40
328 52 49 33 32 5 9 31 35
329 42 36 26 20 2 3 23 30
431 6 2 4 1 2 0 2 6
432 39 18 27 24 5 1 28 34
433 139 92 90 89 18 13 61 66
1146 97 49 26 14 7 5 24 29
1147 56 33 26 25 9 0 19 20
1349 15 9 11 10 0 1 10 3
1544 62 33 20 32 4 3 25 43
1545 37 13 22 14 1 3 14 31
1848 16 27 11 15 3 0 10 12
On the other hand, I have a data.frame with vacancies, that is, the sample size I need within each group.
> dput(head(vacantes))
structure(list(cod_jer = c(101L, 316L, 325L, 1349L, 1544L, 102L
), vacantes = c(132, 180, 54, 63, 45, 0), vac1 = c(27, 36, 11,
13, 9, 0), vac2 = c(27, 36, 11, 13, 9, 0), vac3 = c(24, 33, 10,
12, 9, 0), vac4 = c(24, 33, 10, 12, 9, 0), vac5 = c(8, 11, 4,
4, 3, 0), vac6 = c(8, 11, 4, 4, 3, 0), vac7 = c(7, 10, 3, 3,
2, 0), vac8 = c(7, 10, 3, 3, 2, 0)), .Names = c("cod_jer", "vacantes",
"vac1", "vac2", "vac3", "vac4", "vac5", "vac6", "vac7", "vac8"
), row.names = c(NA, 6L), class = "data.frame")
> vacantes
cod_jer vacantes vac1 vac2 vac3 vac4 vac5 vac6 vac7 vac8
1 101 132 27 27 24 24 8 8 7 7
2 316 180 36 36 33 33 11 11 10 10
3 325 54 11 11 10 10 4 4 3 3
4 1349 63 13 13 12 12 4 4 3 3
5 1544 45 9 9 9 9 3 3 2 2
6 102 0 0 0 0 0 0 0 0 0
7 103 0 0 0 0 0 0 0 0 0
8 104 0 0 0 0 0 0 0 0 0
9 105 0 0 0 0 0 0 0 0 0
10 106 0 0 0 0 0 0 0 0 0
11 107 0 0 0 0 0 0 0 0 0
12 108 0 0 0 0 0 0 0 0 0
13 109 0 0 0 0 0 0 0 0 0
14 110 0 0 0 0 0 0 0 0 0
15 111 0 0 0 0 0 0 0 0 0
16 112 0 0 0 0 0 0 0 0 0
17 113 0 0 0 0 0 0 0 0 0
18 114 0 0 0 0 0 0 0 0 0
19 115 0 0 0 0 0 0 0 0 0
20 137 0 0 0 0 0 0 0 0 0
21 138 0 0 0 0 0 0 0 0 0
22 139 0 0 0 0 0 0 0 0 0
23 140 0 0 0 0 0 0 0 0 0
24 234 0 0 0 0 0 0 0 0 0
25 236 0 0 0 0 0 0 0 0 0
26 317 0 0 0 0 0 0 0 0 0
27 318 0 0 0 0 0 0 0 0 0
28 319 0 0 0 0 0 0 0 0 0
29 320 0 0 0 0 0 0 0 0 0
30 321 0 0 0 0 0 0 0 0 0
31 322 0 0 0 0 0 0 0 0 0
32 323 0 0 0 0 0 0 0 0 0
33 324 0 0 0 0 0 0 0 0 0
34 326 0 0 0 0 0 0 0 0 0
35 327 0 0 0 0 0 0 0 0 0
36 328 0 0 0 0 0 0 0 0 0
37 329 0 0 0 0 0 0 0 0 0
38 431 0 0 0 0 0 0 0 0 0
39 432 0 0 0 0 0 0 0 0 0
40 433 0 0 0 0 0 0 0 0 0
41 1146 0 0 0 0 0 0 0 0 0
42 1147 0 0 0 0 0 0 0 0 0
43 1545 0 0 0 0 0 0 0 0 0
44 1630 0 0 0 0 0 0 0 0 0
45 1848 0 0 0 0 0 0 0 0 0
I would like to draw a stratified sample within each combination of cod_jer and grupo_fict; where the number of vacancies is 0, the sample size should be 0.
I was trying this:
size <- subset(vacantes, select = c(vac1, vac2, vac3, vac4, vac5, vac6, vac7, vac8))
size <- as.matrix(size)
size <- as.vector(size)
for (i in 1:length(size)) {
  if (size[i] > 0) {
    s <- strata(per, c("cod_jer", "grupo_fict"), size = size,
                method = "srswor")
  } else {
    s <- "0"
  }
}
But I can't get it to work :(
Any suggestions?
Thanks!
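One possible approach, sketched here in base R rather than with `strata()`, is to turn the vacancy table into one row per (cod_jer, grupo_fict) pair and then sample row indices within each pair. The toy `per` and `vacantes` below are made-up stand-ins with only two vac columns, and the helper `draw_stratum` is a hypothetical name; capping the sample at the stratum size is also an assumption about the desired behaviour.

```r
set.seed(1)

# Hypothetical toy data mimicking the structure in the question.
per <- data.frame(
  nrodocumento = 1:60,
  cod_jer      = rep(c(101L, 325L, 102L), each = 20),
  grupo_fict   = rep(1:2, times = 30)
)
vacantes <- data.frame(
  cod_jer = c(101L, 325L, 102L),
  vac1    = c(3, 2, 0),
  vac2    = c(4, 1, 0)
)

# One row per stratum: cod_jer, grupo_fict and the requested size n.
vac_cols <- c("vac1", "vac2")
sizes <- data.frame(
  cod_jer    = rep(vacantes$cod_jer, times = length(vac_cols)),
  grupo_fict = rep(seq_along(vac_cols), each = nrow(vacantes)),
  n          = unlist(vacantes[vac_cols], use.names = FALSE)
)

# Simple random sample without replacement inside one stratum.
draw_stratum <- function(jer, grp, n) {
  rows <- which(per$cod_jer == jer & per$grupo_fict == grp)
  n <- min(n, length(rows))          # cannot draw more than available
  if (n == 0) return(per[0, ])       # zero vacancies: empty sample
  per[rows[sample.int(length(rows), n)], ]
}

muestra <- do.call(
  rbind,
  Map(draw_stratum, sizes$cod_jer, sizes$grupo_fict, sizes$n)
)

print(table(muestra$cod_jer, muestra$grupo_fict))
```

With the real data, `vac_cols` would list vac1 to vac8, and strata that appear in `vacantes` but not in `per` would simply yield empty samples.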

validating model in new dataset

I have a dataset (d) in which I am looking at the prediction of hospital mortality (0 or 1) using tn1 and 3 other variables (rms package). I have built the model and now I would like to validate it in a second dataset using the same model coefficients. The variables have the same names etc, but I don’t know how to keep the coefficients from f1, rather than letting the model generate new coefficients for the second dataset.
I would be grateful for your expertise, many thanks, Annemarie
f1 <- lrm(outcomehosp ~ I(log2((tn1+0.001))) + apscore_ad + emsurg +
corrapiidiag, data = d)
record_id corrapiidiag tn1 emsurg apscore_ad outcomehosp
7 3 0.27 1 24 1
8 9 0 1 21 0
9 7 0.11 0 22 0
11 9 0 0 13 0
12 9 13.9 0 17 0
13 22 5.02 0 37 0
21 9 9.6 0 34 0
25 9 0 0 10 0
27 9 0 0 33 1
28 25 0 0 18 1
30 9 0 0 19 0
31 9 0.16 0 26 1
32 9 0 0 13 1
34 7 0 0 18 0
35 9 0 0 20 0
36 9 3.03 0 41 1
37 9 0 0 18 0
38 9 0 0 18 0
39 9 0 0 17 0
40 9 0.14 0 23 0
41 9 0 0 10 0
42 9 0 0 8 0
43 9 2.45 0 11 0
45 9 0 1 12 0
46 9 0.16 1 17 0
49 9 0 1 22 0
50 9 0 0 15 0
51 9 0.05 1 16 0
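In R, a fitted model object keeps its coefficients, and calling `predict()` with a `newdata` argument applies those stored coefficients to new rows without refitting. The sketch below illustrates this with base R's `glm()` as a stand-in for `lrm()`, on simulated data with only two of the predictors. The generator `make_data`, the second dataset `d2` and the Brier score are assumptions added for illustration. As far as I know, the `predict` method for `lrm` fits in rms accepts `newdata` in the same way, but that is worth checking in the package documentation.

```r
set.seed(42)

# Simulated stand-in data (assumption): same variable names as the question.
make_data <- function(n) {
  tn1        <- rexp(n)
  apscore_ad <- round(runif(n, 5, 40))
  lp         <- -3 + 0.2 * log2(tn1 + 0.001) + 0.08 * apscore_ad
  data.frame(tn1, apscore_ad, outcomehosp = rbinom(n, 1, plogis(lp)))
}
d  <- make_data(300)   # development data
d2 <- make_data(200)   # validation data

# Fit once on the development data.
f1 <- glm(outcomehosp ~ I(log2(tn1 + 0.001)) + apscore_ad,
          family = binomial, data = d)

# Apply the stored coefficients to the validation data; nothing is refitted.
p2 <- predict(f1, newdata = d2, type = "response")

# Manual check: the same probabilities from coef(f1) and d2's design matrix.
X2     <- model.matrix(~ I(log2(tn1 + 0.001)) + apscore_ad, data = d2)
manual <- plogis(drop(X2 %*% coef(f1)))
print(all.equal(unname(p2), unname(manual)))

# One simple external-validation summary: the Brier score on d2.
brier <- mean((d2$outcomehosp - p2)^2)
print(brier)
```

The predicted probabilities for the second dataset can then be compared with the observed outcomes using whatever discrimination and calibration measures are preferred.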

Create one column out of several columns Rstudio or Excel

I've tried to find an existing question on this but couldn't, which is why I'm asking here:
Summary:
I want to make ONE column out of several columns. The values in each column should keep their current order, and the columns should be stacked below each other.
Description and details
Below is an example of what my CSV file looks like. Note that there are >400 columns, which is why I don't want to do this manually in, for example, Excel. ALL columns have 24 rows each.
X1 X2 X3 X4 X5 X6 ... X470
0 1 5 10 8 0 7
0 0 0 0 0 0 0
2 3 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
I want to "stack" all the columns in one column, as I've shortly described in the summary:
Info: The sign "..." below implies the rest of the values from that column.
VALUE FROM COLUMN
0 X1
0 X1
2 X1
...
1 X2
0 X2
3 X2
...
5 X3
...
10 X4
...
8 X5
...
0 X6
...
7 X470
...
So in the end, instead of having 486 columns with 24 rows each, I will have 1 column with 11664 rows. It would be good if the column of origin were written in a new column alongside (as shown above), but this is not required.
Note: this df only shows in general what I want to achieve, so clear and understandable commands are appreciated, as I will apply them to my own df.
It doesn't matter whether the solution is in R or Excel, as long as it is easy to do.
I hope my description is clear, otherwise please let me know so I can try to describe again.
Many thanks for suggestions and help.
Kind regards, Elin
We can use stack to get the values in one column and the colnames in the next.
stack(df)
Or use unlist
data.frame(VALUE = unlist(df),
           fromColumn = rep(names(df), each = nrow(df)))
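To make the `stack()` suggestion concrete, here is a small sketch on a made-up three-column data frame; the values and the new column names `VALUE` and `FROM_COLUMN` are chosen for illustration only.

```r
# Toy data frame standing in for the wide CSV (assumption).
df <- data.frame(X1 = c(0, 0, 2), X2 = c(1, 0, 3), X3 = c(5, 0, 0))

# stack() puts all values in one column and the source column name in another.
long <- stack(df)
names(long) <- c("VALUE", "FROM_COLUMN")

print(long)
```

The columns are stacked in their original order, and within each column the row order is preserved.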
Here's a VBA user defined function to do the job:
Function ConcatCols(Colrange As Variant) As Variant
    Dim LongCol() As Variant, i As Long, j As Long, k As Long
    Dim NumCols As Long, NumRows As Long, NumRows2 As Long
    If TypeName(Colrange) = "Range" Then Colrange = Colrange.Value2
    NumRows = UBound(Colrange)
    NumCols = UBound(Colrange, 2)
    NumRows2 = NumRows * NumCols
    ReDim LongCol(1 To NumRows2, 1 To 1)
    k = 1
    For i = 1 To NumCols
        For j = 1 To NumRows
            LongCol(k, 1) = Colrange(j, i)
            k = k + 1
        Next j
    Next i
    ConcatCols = LongCol
End Function
Enter the code in a VBA module then enter =ConcatCols(A1:RL24) as an array function (Ctrl-Shift-Enter) in column RM (or wherever you want) to view the entire concatenated column, or call from a VBA sub to write the data to the spreadsheet.
The following is pretty simple but requires loading the reshape2 package, which is not part of base R and has to be installed from CRAN first. As suggested above, stack() gives similar output, but with the columns in the reverse order.
library(reshape2)
df <- data.frame("A" = 1:21, "B" = 21:41, "C" = 40:60)
> df
A B C
1 1 21 40
2 2 22 41
3 3 23 42
4 4 24 43
5 5 25 44
6 6 26 45
7 7 27 46
8 8 28 47
9 9 29 48
10 10 30 49
11 11 31 50
12 12 32 51
13 13 33 52
14 14 34 53
15 15 35 54
16 16 36 55
17 17 37 56
18 18 38 57
19 19 39 58
20 20 40 59
21 21 41 60
> melt(df)
No id variables; using all as measure variables
variable value
1 A 1
2 A 2
3 A 3
4 A 4
5 A 5
6 A 6
7 A 7
8 A 8
9 A 9
10 A 10
11 A 11
12 A 12
13 A 13
14 A 14
15 A 15
16 A 16
17 A 17
18 A 18
19 A 19
20 A 20
21 A 21
22 B 21
23 B 22
24 B 23
25 B 24
26 B 25
27 B 26
28 B 27
29 B 28
30 B 29
31 B 30
32 B 31
33 B 32
34 B 33
35 B 34
36 B 35
37 B 36
38 B 37
39 B 38
40 B 39
41 B 40
42 B 41
43 C 40
44 C 41
45 C 42
46 C 43
47 C 44
48 C 45
49 C 46
50 C 47
51 C 48
52 C 49
53 C 50
54 C 51
55 C 52
56 C 53
57 C 54
58 C 55
59 C 56
60 C 57
61 C 58
62 C 59
63 C 60

How to show all threads on a specific CPU on Solaris?

Some process (or thread) is hammering CPU0, as you can see in the output of mpstat 30 2:
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 0 0 0 13 0 2 7 0 151 0 4250 99 1 0 0
1 114 0 2 197 84 5220 5 10 109 0 10518 30 2 0 67
2 79 0 1 184 83 5208 5 5 89 0 9788 30 2 0 68
3 67 0 1 181 84 5150 5 4 87 0 9510 30 2 0 69
4 53 0 3 171 72 12238 4 7 183 0 22214 3 3 0 94
5 43 0 3 135 7 218 2 6 16 0 162 0 1 0 99
6 110 0 2 172 79 4918 5 3 164 0 9553 34 2 0 64
7 120 0 1 180 80 4873 4 4 194 0 9494 32 2 0 66
8 53 0 1 23 2 28665 5 7 494 0 62023 12 9 0 79
9 43 0 0 34 2 21469 6 8 676 0 58090 10 13 0 77
10 59 0 1 210 2 33462 4 4 227 0 63500 7 16 0 78
11 93 0 2 16940 16627 1261 2 6 1027 0 2043 0 10 0 90
12 17 0 1 65 3 59 0 3 3 0 19 0 0 0 100
13 6 0 1 89 4 104 0 3 2 0 9 0 0 0 100
14 4 0 10 65 5 54 0 3 1 0 12 0 0 0 100
15 4 0 1 66 6 56 0 3 2 0 21 0 0 0 100
16 2 0 0 91 16 78 0 3 2 0 30 0 0 0 100
17 17 0 1 80 15 70 0 4 2 0 79 0 0 0 100
18 76 0 3 14946 14928 25 0 4 24 0 102 0 4 0 96
19 57 0 0 20 2 17 0 3 15 0 107 0 0 0 100
20 18 0 0 26 0 25 0 3 10 0 21 0 0 0 100
21 0 0 0 106 70 46 0 3 4 0 40 0 1 0 99
22 13 0 0 31 3 28 0 3 4 0 49 0 0 0 100
23 0 0 0 35 5 24 0 3 5 0 54 0 0 0 100
But with prstat -P0 I only see ndbmtd running, with around 15% on CPU0:
PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP
20028 root 77G 75G cpu0 40 0 8369:33:0 15% ndbmtd/44
660 root 6200K 3700K sleep 59 0 0:00:53 0.0% inetd/4
159 daemon 4540K 2408K sleep 59 0 0:00:09 0.0% kcfd/3
11 root 11M 10M sleep 59 0 0:00:58 0.0% svc.configd/15
Is there a way to show all processes and threads on CPU0?
To show all processes and threads (LWPs) on CPU0:
prstat -P0 -L
