Estimating a transition matrix with a low observation count in R

I am building a Markov model with a relatively low number of observations for a given number of states.
Are there methods other than the cohort method to estimate the true transition probabilities? In particular, I want to ensure that the probabilities decrease with increasing distance from the current state. The pair (11, 14) does not behave that way, and the underlying model would not support this.
2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
2 4 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 1 2 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4 0 1 2 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5 0 0 2 10 8 0 0 0 0 0 0 0 0 0 0 0 0 0
6 0 0 0 9 53 13 2 0 0 0 0 0 0 0 0 0 0 0
7 0 0 0 0 17 42 17 0 0 0 0 0 0 0 0 0 0 0
8 0 0 0 0 0 21 71 21 0 0 0 0 0 0 0 0 0 0
9 0 0 0 0 0 0 23 102 21 3 0 0 0 0 0 0 0 0
10 0 0 0 0 0 0 0 23 57 33 0 0 0 0 0 0 0 0
11 0 0 0 0 0 0 0 1 34 142 28 1 3 0 0 0 0 0
12 0 0 0 0 0 0 0 0 1 28 127 27 0 0 0 0 0 0
13 0 0 0 0 0 0 0 0 0 0 28 134 27 0 0 0 0 0
14 0 0 0 0 0 0 0 0 0 0 0 27 93 20 2 0 0 0
15 0 0 0 0 0 0 0 0 0 0 0 0 23 133 19 0 0 0
16 0 0 0 0 0 0 0 0 0 0 0 0 0 22 114 20 0 0
17 0 0 0 0 0 0 0 0 0 0 0 0 0 0 21 192 19 0
18 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 21 263 21
19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 24 827
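For context, the cohort method applied to this table is just row normalization of the counts. A minimal sketch in R, with a toy 3-state count matrix standing in for the table above, plus add-one (Laplace) smoothing as one simple low-count variant; the toy counts and the smoothing constant are illustrative choices, not part of the question:

```r
# Toy 3-state count matrix standing in for the 18x18 table above.
counts <- matrix(c(4, 1,  0,
                   1, 2,  1,
                   0, 2, 10), nrow = 3, byrow = TRUE)

# Cohort / maximum-likelihood estimator: P[i, j] = N[i, j] / sum_j N[i, j].
P_mle <- counts / rowSums(counts)

# Add-one (Laplace) smoothing keeps rare transitions strictly positive,
# which helps with low observation counts; alpha is a tuning choice.
alpha <- 1
P_smooth <- (counts + alpha) / rowSums(counts + alpha)

rowSums(P_mle)     # each row sums to 1
rowSums(P_smooth)  # likewise, but with no exact zeros
```

Smoothing alone does not enforce monotone decay away from the diagonal; for that, a parametric row model would be needed, which is exactly what the question is asking about.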
Thanks


Spreading data based on column match

I have an empty data frame I am trying to populate.
Df1 looks like this:
col1 col2 col3 col4 important_col
1 82 193 104 86 120
2 85 68 116 63 100
3 78 145 10 132 28
4 121 158 103 15 109
5 48 175 168 190 151
6 91 136 156 180 155
Df2 looks like this:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
A data frame full of 0's.
I combine the data frames to make df_fin.
What I am trying to do now is something similar to a dummy-variable approach: I want to spread important_col out, so that if important_col = 28, a 1 is placed in column 28.
How can I go about creating this?
EDIT: I added a comment to illustrate what I am trying to achieve, and I paste it here as well.
Say that important_col holds countries; then the column names would be all the countries in the world, in this example all 241 of them. However, the data I have already collected might contain only 200 of these countries. One-hot encoding here would give me 200 columns, so I would be missing potentially 41 countries. If a new user from a country not currently in the data then inputs their country, it wouldn't be recognised.
Smaller example:
col1 col2 col3 col4 important_col 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
1 11 14 3 11 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 1 1 19 15 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 3 17 10 10 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4 13 10 8 17 10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5 18 5 3 18 19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
6 11 10 9 5 17 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
7 5 11 18 16 17 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
8 5 8 13 8 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
9 10 1 7 16 12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
10 4 17 17 3 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Expected output:
col1 col2 col3 col4 important_col 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
1 11 14 3 11 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 1 1 19 15 4 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 3 17 10 10 6 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4 13 10 8 17 10 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
5 18 5 3 18 19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
6 11 10 9 5 17 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
7 5 11 18 16 17 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
8 5 8 13 8 6 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
9 10 1 7 16 12 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
10 4 17 17 3 4 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
The number of columns is greater than the number of potential entries into important_col. Using the countries example the columns would be all countries in the world and the important_col would consist of a subset of these countries.
Code to generate the above:
df1 <- data.frame(replicate(5, sample(1:20, 10, rep=TRUE)))
colnames(df1) <- c("col1", "col2", "col3", "col4", "important_col")
df2 <- data.frame(replicate(20, sample(0:0, nrow(df1), rep=TRUE)))
colnames(df2) <- gsub("X", "", colnames(df2))
df_fin <- cbind(df1, df2)
df_fin
Does this solve the problem?
Data:
set.seed(123)
df1 <- data.frame(replicate(5, sample(1:20, 10, rep=TRUE)))
colnames(df1) <- c("col1", "col2", "col3", "col4", "important_col")
df2 <- data.frame(replicate(20, sample(0:0, nrow(df1), rep=TRUE)))
colnames(df2) <- gsub("X", "", colnames(df2))
df_fin <- cbind(df1, df2)
Result:
vecp <- colnames(df2)    # the full set of target column names
imp_col <- df1$important_col
# Repeat the column names across rows, then flag matches with 1.
m <- matrix(vecp, byrow = TRUE, nrow = length(imp_col), ncol = length(vecp))
d <- ifelse(m == imp_col, 1, 0)    # imp_col is recycled down the rows
df_fin <- cbind(df1, d)
Output:
col1 col2 col3 col4 important_col 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
1 6 20 18 20 3 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 16 10 14 19 9 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
3 9 14 13 14 9 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
4 18 12 20 16 8 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
5 19 3 14 1 4 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
6 1 18 15 10 3 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
7 11 5 11 16 5 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
8 18 1 12 5 10 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
9 12 7 6 7 6 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
10 10 20 3 5 18 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
What you are trying to do is one-hot encoding, which you can easily achieve using model.matrix.
The example below should point you in the right direction:
df <- data.frame(important_col = as.factor(c(1:3)))
df
important_col
1 1
2 2
3 3
as.data.frame(model.matrix(~.-1, df))
important_col1 important_col2 important_col3
1 1 0 0
2 0 1 0
3 0 0 1
As Sonny mentioned, model.matrix() should do the job. One potential problem is that you have to add back columns for values that did not show up in your important_col, as in the following case:
df <- data.frame(important_col = as.factor(c(1:3, 5)))
df
important_col
1 1
2 2
3 3
4 5
as.data.frame(model.matrix(~.-1, df))
important_col1 important_col2 important_col3 important_col5
1 1 0 0 0
2 0 1 0 0
3 0 0 1 0
4 0 0 0 1
important_col4 is missing in the second example because important_col does not include the value 4. You have to add that column back if you need it for your analysis.
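An alternative that avoids patching columns afterwards: declare the full set of levels on the factor before calling model.matrix(), so absent categories still produce an (all-zero) indicator column. A sketch assuming the full universe of values is 1 to 5:

```r
# Declaring every possible level up front makes model.matrix() emit a column
# per level, including levels absent from the data (here: 4).
df <- data.frame(important_col = factor(c(1, 2, 3, 5), levels = 1:5))
mm <- as.data.frame(model.matrix(~ . - 1, df))
colnames(mm)  # important_col1 ... important_col5
colSums(mm)   # the important_col4 column exists but is all zeros
```

In the countries example, levels would be the full list of 241 countries, so a new user's country always has a matching column.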

GARCH(1,1) with dummy variables

I am trying to use GARCH(1,1) in R to estimate the influence of the day of the week, and later other parameters, on the log return (ln(Pt/Pt-1)) of product sales.
I have everything set up in a CSV file, with a dummy variable for each day (D1, ..., D7) taking the value 1 or 0.
I am building the following model in R:
#Bind Data
ext.reg.D1 <- mydata$D1
ext.reg.D2 <- mydata$D2
ext.reg.D3 <- mydata$D3
ext.reg.D4 <- mydata$D4
ext.reg.D5 <- mydata$D5
ext.reg.D6 <- mydata$D6
ext.reg.D7 <- mydata$D7
ext.reg <- cbind(ext.reg.D1, ext.reg.D2, ext.reg.D3,ext.reg.D4,ext.reg.D5,ext.reg.D6)
y <- mydata$log_return
fit.spec <- ugarchspec(variance.model = list(model = "sGARCH", garchOrder = c(1, 1),
                                             submodel = NULL, external.regressors = NULL,
                                             variance.targeting = FALSE),
                       mean.model = list(armaOrder = c(0, 0), external.regressors = ext.reg),
                       distribution.model = "norm",
                       start.pars = list(), fixed.pars = list())
fit <- ugarchfit(data = y, spec = fit.spec)
I get the following error:
In .sgarchfit(spec = spec, data = data, out.sample = out.sample, : ugarchfit-->warning: solver failer to converge.
Any ideas on how to solve this?
Thanks
Sample data:
log_return D5 D6 D7 D1 D2 D3 D4
1 -0.02979189 1 0 0 0 0 0 0
2 17.43188265 0 1 0 0 0 0 0
3 -9.12727223 0 0 1 0 0 0 0
4 2.77744081 0 0 0 1 0 0 0
5 9.62597392 0 0 0 0 1 0 0
6 -0.11614358 0 0 0 0 0 1 0
7 10.81279075 0 0 0 0 0 0 1
8 -1.03825650 1 0 0 0 0 0 0
9 -5.49109661 0 1 0 0 0 0 0
10 -16.81177602 0 0 1 0 0 0 0
11 9.74292804 0 0 0 1 0 0 0
12 15.22583595 0 0 0 0 1 0 0
13 -1.79578436 0 0 0 0 0 1 0
14 0.40559431 0 0 0 0 0 0 1
15 -2.38281092 1 0 0 0 0 0 0
16 -4.88853323 0 1 0 0 0 0 0
17 -16.98493635 0 0 1 0 0 0 0
18 7.57998016 0 0 0 1 0 0 0
19 17.56008274 0 0 0 0 1 0 0
20 -0.46754932 0 0 0 0 0 1 0
21 -1.27007966 0 0 0 0 0 0 1
22 -1.79234966 1 0 0 0 0 0 0
23 -5.79461986 0 1 0 0 0 0 0
24 -17.82636881 0 0 1 0 0 0 0
25 9.48124679 0 0 0 1 0 0 0
26 17.64277207 0 0 0 0 1 0 0
27 -0.71191725 0 0 0 0 0 1 0
28 -1.14937870 0 0 0 0 0 0 1
29 -1.62331777 1 0 0 0 0 0 0
30 -5.52787401 0 1 0 0 0 0 0
31 -18.50034717 0 0 1 0 0 0 0
32 10.31502542 0 0 0 1 0 0 0
33 16.21997258 0 0 0 0 1 0 0
34 -1.09910695 0 0 0 0 0 1 0
35 -0.57416519 0 0 0 0 0 0 1
36 -1.83623328 1 0 0 0 0 0 0
37 -5.48021232 0 1 0 0 0 0 0
38 -20.02869823 0 0 1 0 0 0 0
39 11.48799875 0 0 0 1 0 0 0
40 17.55356524 0 0 0 0 1 0 0
41 -1.45430558 0 0 0 0 0 1 0
42 -2.15287757 0 0 0 0 0 0 1
43 -4.91058837 1 0 0 0 0 0 0
44 -4.35107354 0 1 0 0 0 0 0
45 -19.40533612 0 0 1 0 0 0 0
46 6.47785167 0 0 0 1 0 0 0
47 16.54500844 0 0 0 0 1 0 0
48 1.43266482 0 0 0 0 0 1 0
49 1.91234500 0 0 0 0 0 0 1
50 -1.44926252 1 0 0 0 0 0 0
51 -5.69296574 0 1 0 0 0 0 0
52 -14.21241905 0 0 1 0 0 0 0
53 9.85180551 0 0 0 1 0 0 0
54 16.72072000 0 0 0 0 1 0 0
55 -1.04381003 0 0 0 0 0 1 0
56 -1.49048390 0 0 0 0 0 0 1
57 -2.57835848 1 0 0 0 0 0 0
58 -2.93456505 0 1 0 0 0 0 0
59 -21.27981318 0 0 1 0 0 0 0
60 14.27747712 0 0 0 1 0 0 0
61 15.20376637 0 0 0 0 1 0 0
62 -2.36474181 0 0 0 0 0 1 0
63 -0.12825700 0 0 0 0 0 0 1
64 -2.17755007 1 0 0 0 0 0 0
65 -6.50236487 0 1 0 0 0 0 0
66 -20.40159745 0 0 1 0 0 0 0
67 10.12381534 0 0 0 1 0 0 0
68 19.34672964 0 0 0 0 1 0 0
69 -0.18663788 0 0 0 0 0 1 0
70 -1.26430704 0 0 0 0 0 0 1
71 -2.17712050 1 0 0 0 0 0 0
72 -5.20850527 0 1 0 0 0 0 0
73 -19.00303225 0 0 1 0 0 0 0
74 10.78960865 0 0 0 1 0 0 0
75 16.50911599 0 0 0 0 1 0 0
76 -1.20629718 0 0 0 0 0 1 0
77 -0.92077350 0 0 0 0 0 0 1
78 -2.13818901 1 0 0 0 0 0 0
79 -6.39795596 0 1 0 0 0 0 0
80 -16.89947946 0 0 1 0 0 0 0
81 11.84070286 0 0 0 1 0 0 0
82 16.76126417 0 0 0 0 1 0 0
83 -2.32992683 0 0 0 0 0 1 0
84 -0.04347497 0 0 0 0 0 0 1
85 -1.58421553 1 0 0 0 0 0 0
86 -5.11294741 0 1 0 0 0 0 0
87 -22.94382512 0 0 1 0 0 0 0
88 12.08906834 0 0 0 1 0 0 0
89 18.59588505 0 0 0 0 1 0 0
90 -0.66190281 0 0 0 0 0 1 0
91 -3.35891858 0 0 0 0 0 0 1
92 -5.56096067 1 0 0 0 0 0 0
93 -19.12946131 0 1 0 0 0 0 0
94 -2.45717082 0 0 1 0 0 0 0
95 -6.00314421 0 0 0 1 0 0 0
96 16.87403882 0 0 0 0 1 0 0
97 16.72700765 0 0 0 0 0 1 0
98 -1.80683941 0 0 0 0 0 0 1
99 -2.08228231 1 0 0 0 0 0 0
100 -5.98864409 0 1 0 0 0 0 0
101 -14.91991224 0 0 1 0 0 0 0
I think the problem is that the explanatory variables are all dummy variables. You should include another, non-dummy variable alongside D1...D7; the model does not make sense without one. You cannot estimate y (a continuous variable) with only dummy regressors. Try, for example, adding the lagged return y(t-1) to
ext.reg <- cbind(ext.reg.D1, ext.reg.D2, ext.reg.D3, ext.reg.D4, ext.reg.D5, ext.reg.D6)
Good luck.
Change your ext.reg to this:
ext.reg <- cbind(ext.reg.D1, ext.reg.D2, ext.reg.D3, ext.reg.D4,
                 ext.reg.D5, ext.reg.D6, ext.reg.D7)
That should solve the exercise.
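A side note on the convergence failure: if all seven day-of-week dummies are kept, they sum to 1 in every row and duplicate the intercept of the mean model, which can make the optimizer struggle. A minimal sketch of one possible specification, assuming mydata as in the question; dropping the mean intercept (include.mean = FALSE) and trying the "hybrid" solver are my assumptions, not the original poster's code:

```r
library(rugarch)

# Assumption: all seven dummies are used, so the mean-model intercept is
# dropped to avoid perfect collinearity (D1 + ... + D7 == 1 in every row).
ext.reg <- as.matrix(mydata[, c("D1", "D2", "D3", "D4", "D5", "D6", "D7")])

fit.spec <- ugarchspec(
  variance.model = list(model = "sGARCH", garchOrder = c(1, 1)),
  mean.model     = list(armaOrder = c(0, 0), include.mean = FALSE,
                        external.regressors = ext.reg),
  distribution.model = "norm")

# Trying a different solver is a common first step when the default fails.
fit <- ugarchfit(data = mydata$log_return, spec = fit.spec, solver = "hybrid")
```

Alternatively, keep the intercept and only six dummies, as in the question, and interpret the dummy coefficients relative to the omitted day.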

raster plot with dataframes

I am new to raster plots, and I am not sure of the fastest and most appropriate way to create one from my data.
I have a data frame with 64 rows (locations) and 202 columns (time points). The values in the data frame can be 0, 1 or 2. I would like to create a raster plot (time on the x axis, location on the y axis) in which values of 0 are drawn as white rectangles, values of 1 as black rectangles and values of 2 as grey rectangles.
X 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200
fp1 0 0 0 2 2 2 2 2 2 2 2 2 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
fp2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
f3 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
f4 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
c3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
c4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
p3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
p4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
o1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
02 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 2 2 2 2 2 2 2 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
library(raster)
# mtcars is only a stand-in matrix here; use as.matrix() on your own data frame.
r <- raster(as.matrix(mtcars))
# Fill the cells with example values 0/1/2 (cell indices run row by row).
r[0:95] <- 0
r[96:180] <- 2
r[181:352] <- 1
# One more breakpoint than colors, so each of 0, 1 and 2 gets its own bin.
breakpoints <- c(-0.5, 0.5, 1.5, 2.5)
colors <- c("white", "black", "grey")
plot(r, breaks = breakpoints, col = colors, axes = FALSE, legend = FALSE)
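A sketch adapting the same approach to the 64 x 202 data frame described in the question, with simulated stand-in data (the values are placeholders); note that plot() wants one more breakpoint than colors:

```r
library(raster)

# Simulated stand-in for the 64-location x 202-time data frame of 0/1/2 values.
set.seed(1)
df <- as.data.frame(matrix(sample(0:2, 64 * 202, replace = TRUE),
                           nrow = 64, ncol = 202))

r <- raster(as.matrix(df))  # rows map to y (location), columns to x (time)
plot(r,
     breaks = c(-0.5, 0.5, 1.5, 2.5),   # bins centred on 0, 1 and 2
     col = c("white", "black", "grey"), # 0 = white, 1 = black, 2 = grey
     axes = FALSE, legend = FALSE)
```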

Turn a long data structure into a wide matrix structure

I have the following data structure:
ID value
1 1 1
2 1 63
3 1 2
4 1 58
5 2 3
6 2 4
7 3 34
8 3 25
Now I want to turn it into a kind of dyadic data structure: every pair of IDs with the same value should have a relationship.
I tried several options, and
df_wide <- dcast(df, ID ~ value)
got me a long way down the road:
ID 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 39 40
1 1001 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 1006 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 1007 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 2 0 0
4 1011 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5 1018 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
6 1020 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
7 1030 0 0 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0
8 1036 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Now my main problem is to turn it into a proper matrix so I can get an igraph object out of it.
df_wide_matrix <- data.matrix(df_wide)
df_aus_wide_g <- graph.edgelist(df_wide_matrix, directed = TRUE)
does not get me there.
I also tried to transform it into an adjacency matrix:
df_wide_matrix <- get.adjacency(graph.edgelist(as.matrix(df_wide), directed = FALSE))
but that did not work either.
If you want to create an edge between all IDs with the same value, try something like this instead. First, merge the data frame onto itself by value. Then remove the value column and drop all (undirected) edges that are duplicates or self-loops. Finally, convert to a two-column matrix and create the edges.
res <- merge(df, df, by='value', all=FALSE)[,c('ID.x','ID.y')]
res <- res[res$ID.x<res$ID.y,]
resg <- graph.edgelist(as.matrix(res))
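Applied end to end, the steps look like this; the toy data below is hypothetical (the sample in the question happens to have no values shared across IDs), and graph_from_edgelist is the current name of graph.edgelist:

```r
library(igraph)

# Hypothetical data: IDs 1 and 2 share value 63, IDs 1 and 3 share value 1.
df <- data.frame(ID = c(1, 1, 2, 2, 3), value = c(1, 63, 63, 4, 1))

# 1. Self-merge on value: every pair of rows sharing a value becomes one row.
res <- merge(df, df, by = "value", all = FALSE)[, c("ID.x", "ID.y")]
# 2. Keep one direction only, which also drops self-pairs.
res <- res[res$ID.x < res$ID.y, ]
# 3. Build the graph from the two-column edge list.
g <- graph_from_edgelist(as.matrix(res), directed = FALSE)
```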

Losing observations when I use reshape in R

I have data set
> head(pain_subset2, n= 50)
PatientID RSE SE SECODE
1 1001-01 0 0 0
2 1001-01 0 0 0
3 1001-02 0 0 0
4 1001-02 0 0 0
5 1002-01 0 0 0
6 1002-01 1 2a 1
7 1002-02 0 0 0
8 1002-02 0 0 0
9 1002-02 0 0 0
10 1002-03 0 0 0
11 1002-03 0 0 0
12 1002-03 1 1 1
> dim(pain_subset2)
[1] 817 4
> table(pain_subset2$RSE)
0 1
788 29
> table(pain_subset2$SE)
0 1 2a 2b 3 4 5
788 7 5 1 6 4 6
> table(pain_subset2$SECODE)
0 1
788 29
I want to create a matrix with n rows and 6 columns (n = number of PatientIDs, columns = the 6 levels of SE).
When I use reshape, I lose many observations:
> dim(p)
[1] 246 9
My code:
p <- reshape(pain_subset2, timevar = "SE", idvar = c("PatientID","RSE"),v.names = "SECODE", direction = "wide")
p[is.na(p)] <- 0
> table(p$RSE)
0 1
226 20
Compared with the table of RSE, I lost 9 patients having RSE = 1.
This is the output I have:
PatientID RSE SECODE.0 SECODE.2a SECODE.1 SECODE.5 SECODE.3 SECODE.2b SECODE.4
1 1001-01 0 0 0 0 0 0 0 0
3 1001-02 0 0 0 0 0 0 0 0
5 1002-01 0 0 0 0 0 0 0 0
6 1002-01 1 0 1 0 0 0 0 0
7 1002-02 0 0 0 0 0 0 0 0
10 1002-03 0 0 0 0 0 0 0 0
12 1002-03 1 0 0 1 0 0 0 0
13 1002-04 0 0 0 0 0 0 0 0
15 1003-01 0 0 0 0 0 0 0 0
18 1003-02 0 0 0 0 0 0 0 0
21 1003-03 0 0 0 0 0 0 0 0
24 1003-04 0 0 0 0 0 0 0 0
27 1003-05 0 0 0 0 0 0 0 0
30 1003-06 0 0 0 0 0 0 0 0
32 1003-07 0 0 0 0 0 0 0 0
35 1004-01 0 0 0 0 0 0 0 0
36 1004-01 1 0 0 0 1 0 0 0
40 1004-02a 0 0 0 0 0 0 0 0
Does anyone know what is happening? I would really appreciate any help.
Thanks.
Try:
library(dplyr)
library(tidyr)
pain_subset2 %>%
  spread(SE, SECODE)
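One caveat with spread(): rows that repeat the same (PatientID, RSE, SE) combination trigger a duplicate-identifiers error. A sketch of the newer pivot_wider() with an explicit rule for duplicates; the toy data and the choice of max as the aggregation are my assumptions:

```r
library(dplyr)
library(tidyr)

# Toy stand-in for pain_subset2, with duplicated PatientID/SE rows on purpose.
pain <- data.frame(
  PatientID = c("1001-01", "1001-01", "1002-01", "1002-01"),
  RSE       = c(0, 0, 0, 1),
  SE        = c("0", "0", "0", "2a"),
  SECODE    = c(0, 0, 0, 1))

# One row per (PatientID, RSE); duplicates are collapsed with max,
# and combinations that never occur are filled with 0.
p <- pain %>%
  pivot_wider(names_from = SE, values_from = SECODE,
              values_fn = max, values_fill = 0)
```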
