This question was migrated from Stack Overflow because it can be answered on Cross Validated.
Migrated 24 days ago.
Suppose several categorical variables are included in a LASSO regression.
For a categorical variable with more than two levels, dummy variables must be created.
For example, take vaccination status (Vacc_Stat), which has three categories: 1 = not vaccinated, 2 = partially vaccinated, and 3 = fully vaccinated.
Using the model.matrix function on the vaccination status variable yields two dummy columns, because the value 1 = not vaccinated is the reference level.
Suppose the final LASSO regression coefficients are as follows:
Vacc_Stat1 .
Vacc_Stat2 .
Vacc_Stat3 -4.208877e-01
Should we keep only Vacc_Stat3, or the Vacc_Stat variable as a whole?
I am planning to run a LASSO regression followed by a logistic regression on the variables selected by the LASSO.
Thank you in advance.
My expectation is that if any one of the dummy variables is selected by the LASSO, then the original categorical variable should be used as a whole.
The following is a minimal reproducible dataset:
structure(c(0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1,
1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1,
1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0,
1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0,
1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0,
0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1,
1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0,
0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0,
0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1,
1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1,
0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0,
1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1,
0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1,
1, 1, 1, 1, 1, 0, 1, 0, 0.34, 0.49, 38, 0.58, 0.2, 0.49, 0.65,
40.57, 2.08, 49.52, 50.77, 38.04, 76.55, 55.95, 53.23, 38.04,
99.72, 80.04, 92.41, 47, 66, 70, 52, 36, 39, 67, 42, 23, 66,
37.109375, 31.22945431, 26.2345679, 20.76124567, 35.3798127,
26.44628099, 23.87511478, 24.8015873, 21.49959688, 22.47120876,
110, 159, 127, 100, 120, 115, 100, 112, 130, 119, 72, 78, 80,
72, 80, 73, 76, 75, 80, 78, 84, 86, 88, 103, 90, 91, 90, 82,
88, 105, 36, 37, 36.5, 36, 38, 38, 36, 36.4, 37, 36, 20, 40,
24, 20, 22, 24, 18, 20, 22, 20, 90, 99, 98, 99, 96, 90, 98, 99,
99, 90, 7, 5, 0, 2, 7, 10, 3, 3, 2, 2, 11.7, 13.8, 13, 10.9,
11.6, 14.5, 15, 16.2, 12.3, 14.2, 3.9, 4.2, 3.6, 4.7, 4, 3.2,
4.4, 5.1, 3, 3.78, 15.7, 28.8, 6, 7.8, 37.6, 13.9, 26.6, 27.2,
33, 23, 138, 139, 121, 135, 139, 132, 133, 138, 137, 128, 75,
64.5, 87.4, 88.9, 47.1, 78, 61.8, 62.52, 56.3, 63.2, 753, 305,
250, 267, 315, 207, 285, 293, 366, 307, 8.7, 8.1, 11.2, 75.9,
13.7, 10.03, 42.2, 10, 9, 10.6, 11.07, 6.8, 1.18, 23.18, 4.33,
5.25, 8.73, 7.44, 8.01, 10.37), dim = c(10L, 76L), dimnames =
list(
c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10"),
c("TEST.Year2",
"TEST.Year3", "TEST.Gender2", "TEST.Vacc_Stat1",
"TEST.Vacc_Stat2",
"TEST.Vacc_Stat3", "TEST.Risk_AI2", "TEST.Risk_Obesity2",
"TEST.Risk_Smoking2",
"TEST.Risk_HT2", "TEST.Risk_DM2", "TEST.Risk_Asthma2",
"TEST.Risk_CHD2",
"TEST.Risk_CVD2", "TEST.Risk_COPD2", "TEST.Risk_TBC2",
"TEST.Risk_CKD2",
"TEST.Risk_CLD2", "TEST.Risk_Brain2", "TEST.Risk_HIV2",
"TEST.Risk_Cancer2",
"TEST.Symptom_Fever2", "TEST.Symptom_Cough2",
"TEST.Symptom_Sore_Throat2",
"TEST.Symptom_Rinnorrhea2", "TEST.Symptom_Anosmia2",
"TEST.Symptom_Myalgia2",
"TEST.Symptom_Headache2", "TEST.Symptom_Malaise2",
"TEST.Symptom_Anorexia2",
"TEST.Symptom_Diarrhea2", "TEST.Symptom_Nausea2",
"TEST.Symptom_Vomitting2",
"TEST.Symptom_Abd_Pain2", "TEST.Symptom_Dyspneu2",
"TEST.Symptom_Chest_Pain2",
"TEST.Symptom_LOC2", "TEST.Lab_RT_PCR2", "TEST.CXR_Proj2",
"TEST.CXR_Proj3",
"TEST.CXR_Pneumonia1", "TEST.CXR_Pneumonia2",
"TEST.CXR_Effusion2",
"TEST.Co_Septic2", "TEST.Co_Septic_Shock2", "TEST.Co_ARDS2",
"TEST.Co_Sx_Infection2", "TEST.Severity_Adm2",
"TEST.Severity_Adm3",
"TEST.Severity_Adm4", "TEST.Severity_Adm_Cat_12",
"TEST.Severity_Adm_Cat_22",
"TEST.Severity_Adm_Cat_32", "TEST.Severity_Worst2",
"TEST.Severity_Worst3",
"TEST.Severity_Worst4", "TEST.Progression2", "TEST.CXR_ALA_Num",
"TEST.CXR_Prob_Num", "TEST.Age", "TEST.BMI", "TEST.Vital_SBP",
"TEST.Vital_DBP", "TEST.Vital_PR", "TEST.Vital_Temp",
"TEST.Vital_RR",
"TEST.Vital_SpO2", "TEST.Symptom_Onset", "TEST.Lab_Hb",
"TEST.Lab_K",
"TEST.Lab_Lim", "TEST.Lab_Na", "TEST.Lab_Neu", "TEST.Lab_Tr",
"TEST.Lab_Ur", "TEST.Lab_WBC")))
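As a minimal sketch of the dummy coding described in the question (the toy data frame `toy`, the outcome `y`, and the `Age` values are made up for illustration; only the name `Vacc_Stat` comes from the question): with an intercept, `model.matrix` drops the first level as the reference, and a follow-up logistic regression can take the factor as a whole rather than a single dummy column. Whether keeping the whole factor is the right choice is an assumption here, not something the LASSO output settles.

```r
# Toy data: values are invented; only the variable name Vacc_Stat mirrors the question.
set.seed(1)
toy <- data.frame(
  Vacc_Stat = factor(rep(1:3, times = 20)),   # 3 levels, level 1 = reference
  Age       = round(runif(60, 20, 80))
)
toy$y <- rbinom(60, 1, 0.5)                   # made-up binary outcome

# With an intercept, the reference level is absorbed: two dummy columns remain.
X <- model.matrix(~ Vacc_Stat + Age, data = toy)
print(colnames(X))

# Refit a logistic regression with the factor entered as a whole,
# so that both non-reference levels are estimated together.
refit <- glm(y ~ Vacc_Stat + Age, family = binomial, data = toy)
print(names(coef(refit)))
```

A penalty that keeps or drops all dummy columns of a factor together (a group lasso) is another option; it is not shown here.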
I am trying to properly model minimum temperature using maximum temperature, precipitation, and month of year. I know there are lots of questions on how to use a factor in a linear model, but honestly, none of them seem to answer my questions. The way R treats and uses dummy variables is very confusing for me. Here is a tiny sample of my data, with code following.
data <- structure(list(month = c(5, 6, 9, 8, 9, 9, 10, 10, 1, 3, 6, 4,
11, 1, 3, 12, 8, 5, 12, 3, 10, 12, 9, 1, 1, 10, 12, 4, 7, 7,
11, 8, 10, 3, 7, 1, 3, 9, 10, 11, 5, 1, 7, 10, 9, 11, 7, 4, 6,
12, 10, 11, 11, 7, 5, 7, 5, 1, 6, 6, 5, 1, 1, 5, 5, 11, 12, 6,
10, 6, 2, 6, 4, 11, 9, 6, 11, 3, 8, 12, 6, 2, 6, 3, 10, 9, 4,
4, 5, 11, 11, 11, 1, 8, 4, 4, 10, 12, 9, 8), tmax = c(54, 84,
74, 82, 63, 87, 68, 59, -4, 17, 69, 42, 46, 29, 38, 42, 95, 67,
22, 48, 50, 34, 74, 40, 1, 71, 49, 32, 89, 74, 56, 92, 69, 23,
86, 49, 47, 84, 48, 73, 62, 8, 83, 60, 69, 17, 90, 69, 77, 37,
55, 43, 38, 93, 52, 84, 73, 35, 75, 83, 53, 33, 33, 81, 68, 55,
31, 98, 72, 80, 13, 85, 71, 48, 68, 85, 53, 48, 92, 4, 61, 34,
89, 62, 50, 62, 73, 63, 63, 33, 31, 57, 7, 72, 45, 64, 63, 31,
65, 85), tmin = c(0.04, 0.21, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0.01, 0, 0, 0, 0.14, 0.18, NA, 0.13, 0, 0.15, NA, 0.02, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0.5, 0, 0, 0, 0.38, 0, 0, 0, 0.01, 0, 0.42,
NA, 0, NA, 0, NA, 0, 0, 0, 0, 0, 0, 0.25, 0, 0, 0.84, 0.03, 0,
0, 0, 0, 0, 0, 0, 0.01, 0, NA, 0.26, 0, 0, 0, 0.32, 0, 0, 0,
0, 0.2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.1, 0, 0, 0, 0, NA,
0.02, 0), precip = c(0.04, 0.21, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0.01, 0, 0, 0, 0.14, 0.18, NA, 0.13, 0, 0.15, NA, 0.02, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0.5, 0, 0, 0, 0.38, 0, 0, 0, 0.01, 0,
0.42, NA, 0, NA, 0, NA, 0, 0, 0, 0, 0, 0, 0.25, 0, 0, 0.84, 0.03,
0, 0, 0, 0, 0, 0, 0, 0.01, 0, NA, 0.26, 0, 0, 0, 0.32, 0, 0,
0, 0, 0.2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.1, 0, 0, 0, 0,
NA, 0.02, 0)), row.names = c(11604L, 32822L, 32919L, 35089L,
40958L, 3690L, 34052L, 19787L, 26818L, 14839L, 21143L, 32761L,
14364L, 14043L, 30552L, 30077L, 5846L, 2486L, 25352L, 13369L,
21268L, 6355L, 16844L, 26847L, 35593L, 20523L, 10359L, 9379L,
6200L, 26647L, 23129L, 19388L, 38057L, 12637L, 42724L, 15875L,
1314L, 7352L, 34397L, 12146L, 27310L, 20622L, 8026L, 12121L,
26709L, 7409L, 1091L, 11587L, 23699L, 31917L, 14328L, 19458L,
10322L, 351L, 43747L, 23350L, 31329L, 8939L, 42693L, 34279L,
18541L, 25011L, 37791L, 17834L, 2845L, 12519L, 19848L, 3978L,
5907L, 28075L, 15177L, 3616L, 32037L, 9955L, 1498L, 17858L, 10700L,
27624L, 4768L, 24624L, 20036L, 5683L, 43408L, 37485L, 21255L,
15747L, 15234L, 7933L, 27690L, 24227L, 17286L, 30781L, 2358L,
9885L, 28380L, 35327L, 8851L, 14743L, 37314L, 8057L), class = "data.frame")
If I use the following code, January is missing from the output (the output I have below is using the entire data set of 42000 rows). Does this mean that the intercept represents January?
tmin_model <- lm(data$tmin ~ data$tmax + data$precip + as.factor(data$month))
Call:
lm(formula = data$tmin ~ data$tmax + data$precip + as.factor(data$month))
Residuals:
Min 1Q Median 3Q Max
-41.663 -4.827 0.182 5.110 22.489
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -13.524700 0.148019 -91.371 < 2e-16 ***
data$tmax 0.674834 0.003098 217.837 < 2e-16 ***
data$precip 6.671204 0.164683 40.509 < 2e-16 ***
as.factor(data$month)2 1.090986 0.187072 5.832 5.52e-09 ***
as.factor(data$month)3 5.868886 0.189904 30.904 < 2e-16 ***
as.factor(data$month)4 7.325417 0.209629 34.945 < 2e-16 ***
as.factor(data$month)5 10.453276 0.230197 45.410 < 2e-16 ***
as.factor(data$month)6 14.364899 0.250073 57.443 < 2e-16 ***
as.factor(data$month)7 15.382325 0.260707 59.002 < 2e-16 ***
as.factor(data$month)8 14.269489 0.256420 55.649 < 2e-16 ***
as.factor(data$month)9 10.729316 0.238739 44.942 < 2e-16 ***
as.factor(data$month)10 7.209093 0.214178 33.659 < 2e-16 ***
as.factor(data$month)11 5.950449 0.192669 30.884 < 2e-16 ***
as.factor(data$month)12 2.752499 0.183948 14.963 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 7.286 on 39784 degrees of freedom
(4411 observations deleted due to missingness)
Multiple R-squared: 0.8929, Adjusted R-squared: 0.8929
F-statistic: 2.553e+04 on 13 and 39784 DF, p-value: < 2.2e-16
Do I need to create "dummy" variables for each month to do this properly? Also, how do I run predict on this model for just a couple of data points? I always get the full 42000-odd rows back when all I want is predictions for the few points I supply. For example, for a single January observation, why does the following code return 42000 rows?
predict.lm(tmin_model, newdata = data.frame(tmax = rnorm(1, 20, 13), month = 1, precip = 0, tmin = NA))
Thank you.
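Refer to the columns by bare name and pass the data frame through the data argument, so that predict can match the variables in newdata. With data$tmax in the formula, newdata is not matched and the fitted values for all rows are returned.
data$month <- factor(data$month)
tmin_model <- lm(tmin ~ tmax + precip + month, data = data)
With the model fitted this way, the following returns only one row: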
predict.lm(tmin_model, newdata =
data.frame(tmax = rnorm(1, 20, 13), month = factor(1), precip = 0, tmin = NA))
1
-7.905385e-18
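To illustrate both points on made-up data (the data frame `toy` and its coefficients are assumptions, not the asker's 42000-row data set): with R's default treatment contrasts the first factor level, here month 1, is the reference and is absorbed into the intercept, and `predict` returns one value per row of `newdata` when the formula uses bare column names.

```r
set.seed(42)
toy <- data.frame(
  month  = factor(rep(1:12, each = 10)),
  tmax   = rnorm(120, mean = 55, sd = 20),
  precip = rexp(120, rate = 5)
)
toy$tmin <- 0.7 * toy$tmax - 10 + rnorm(120, sd = 3)

fit <- lm(tmin ~ tmax + precip + month, data = toy)

# No "month1" coefficient: January is the baseline inside the intercept,
# and month2..month12 are differences from January.
print(names(coef(fit)))

# Two new observations give exactly two predictions.
new_obs <- data.frame(
  tmax   = c(20, 75),
  precip = c(0, 0.1),
  month  = factor(c(1, 7), levels = levels(toy$month))
)
pred <- predict(fit, newdata = new_obs)
print(pred)
```

Giving the new factor the same levels as the training factor avoids surprises when a level in `newdata` was never seen during fitting.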
I need to build an automatic shift-planning tool, and I plan to use the R package lpSolve. However, the code below does not give me the optimal output: the hourly supply (accumulated over all available shifts) does not meet the demand. For example, the demand for one hour is 46 but the supply is only 25, so 21 units of demand are not satisfied.
Background:
The objective
To minimize the difference between total supply and total demand.
To minimize the difference between supply and demand for each hour.
Constraints
Shift constraint: each hour may have several shifts available to be assigned.
Max capacity: supply is capped. The total across all shifts cannot exceed the cap in any hour (46 is the cap in this example).
In the constr matrix, the 24 rows represent 24 hours starting from 7 am, and the 9 columns are the shifts I have.
constr.val holds the hourly demand.
Please let me know if anything is unclear. Thanks!
library(lpSolve)
obj.fun<-c(1,1,1,1,1,1,1,1,1)
constr<-matrix(c(
1, 0, 0, 0, 0, 0, 0, 0, 0,
1, 0, 0, 0, 0, 0, 0, 0, 0,
1, 0, 0, 0, 0, 0, 0, 0, 0,
1, 0, 0, 1, 0, 0, 0, 0, 0,
1, 0, 0, 1, 0, 1, 0, 0, 0,
1, 0, 0, 1, 0, 1, 0, 1, 0,
1, 0, 0, 1, 0, 1, 0, 1, 0,
1, 0, 0, 1, 0, 1, 0, 1, 0,
0, 1, 0, 1, 0, 1, 0, 1, 0,
0, 1, 0, 1, 0, 1, 0, 1, 0,
0, 1, 0, 1, 0, 1, 0, 1, 0,
0, 1, 0, 0, 1, 1, 0, 1, 0,
0, 1, 0, 0, 1, 0, 1, 1, 0,
0, 1, 0, 0, 1, 0, 1, 0, 1,
0, 1, 0, 0, 1, 0, 1, 0, 1,
0, 1, 0, 0, 1, 0, 1, 0, 1,
0, 0, 1, 0, 1, 0, 1, 0, 1,
0, 0, 1, 0, 1, 0, 1, 0, 1,
0, 0, 1, 0, 1, 0, 1, 0, 1,
0, 0, 1, 0, 0, 0, 1, 0, 1,
0, 0, 1, 0, 0, 0, 0, 0, 1,
0, 0, 1, 0, 0, 0, 0, 0, 0,
0, 0, 1, 0, 0, 0, 0, 0, 0,
0, 0, 1, 0, 0, 0, 0, 0, 0), nrow = 24, byrow = TRUE)
constr.dir <- rep("<=", 24)
constr.val <- c(24, 20, 21, 22, 26, 34, 40, 44,
46, 46, 46, 46, 46, 46, 46, 46,
46, 46, 46, 46, 46, 46, 41, 27)
day.shift <- lp("max", obj.fun, constr, constr.dir,
constr.val, compute.sens = TRUE)
day.shift$solution
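In the formulation above, supply is constrained to stay at or below demand in every hour, so the lowest-demand hour inside a shift limits that shift for all of its hours. One way to express "minimize unmet demand subject to a capacity cap" instead is to add a non-negative shortfall variable per hour. The sketch below is an assumption about the intended model, not the asker's code: it uses a made-up 4-hour, 2-shift example (`cover`, `demand`, and `cap` are hypothetical) and relies on `lp` treating all variables as non-negative.

```r
library(lpSolve)

# Hypothetical coverage matrix: rows = hours, columns = shifts (1 = shift covers hour).
cover <- matrix(c(1, 0,
                  1, 1,
                  1, 1,
                  0, 1), nrow = 4, byrow = TRUE)
demand <- c(3, 5, 6, 2)   # made-up hourly demand
cap    <- 5               # made-up hourly capacity cap

n_hours  <- nrow(cover)
n_shifts <- ncol(cover)

# Decision variables: staff per shift (x), then shortfall per hour (u).
objective <- c(rep(0, n_shifts), rep(1, n_hours))   # minimize total shortfall

const_mat <- rbind(
  cbind(cover, diag(n_hours)),                      # supply + shortfall >= demand
  cbind(cover, matrix(0, n_hours, n_hours))         # supply <= cap
)
const_dir <- c(rep(">=", n_hours), rep("<=", n_hours))
const_rhs <- c(demand, rep(cap, n_hours))

res <- lp("min", objective, const_mat, const_dir, const_rhs)

staff     <- res$solution[seq_len(n_shifts)]
shortfall <- res$solution[n_shifts + seq_len(n_hours)]
print(staff)
print(shortfall)
```

If staff counts must be whole numbers, `lp` also accepts `all.int = TRUE`; whether total shortfall is the right objective for the real planning problem is a modelling choice left open here.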
I have a negative binomial regression model where I predict Twitter messages' retweet count based on their use of certain word types (ME words, Moral words, and Emotional words):
M1 <- glm.nb(retweetCount ~ ME_words + Moral_words + Emo_words, data = Tweets)
I now want to sample with bootstrapping (for instance, samples of 1,000 with replacement from the dataframe's original 500,000 messages) from the large dataset, Tweets, to run iterations of the model and analyse the variance of the coefficients. What is the best approach to doing this? I am assuming I need to use the boot package, but I am a bit lost with where to start.
Ideally, I would like to create a for loop that can run a number of iterations, and then store the coefficients of each model iteration in a new dataframe. This would be extremely useful for future analyses.
Here is some reproducible data from the much larger data frame Tweets:
dput(head(Tweets, 100))
structure(list(retweetCount = c(1388, 762, 748, 436, 342, 320,
312, 295, 264, 251, 196, 190, 175, 167, 165, 163, 149, 148, 148,
146, 133, 132, 126, 124, 122, 122, 121, 120, 118, 118, 114, 113,
112, 110, 108, 107, 104, 101, 100, 96, 95, 94, 93, 92, 90, 90,
89, 89, 87, 86, 84, 83, 83, 83, 82, 82, 82, 82, 78, 78, 78, 76,
76, 76, 76, 74, 74, 73, 73, 72, 72, 71, 70, 70, 70, 70, 69, 69,
69, 68, 68, 67, 65, 65, 65, 65, 63, 62, 62, 61, 61, 61, 61, 60,
60, 59, 59, 59, 59, 58), ME_words = c(2, 2, 2, 0, 0, 1, 1, 0,
1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1,
0, 3, 0, 1, 0, 1, 1, 4, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0,
0, 0, 2, 2, 0, 0, 1, 0, 1, 0, 0, 2, 0, 0, 0, 1, 0, 1, 1, 0, 0,
0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 1, 0, 0, 0, 1, 0, 0), Moral_words = c(0, 0, 1, 1, 1, 2, 0,
0, 0, 1, 0, 1, 2, 0, 1, 1, 1, 2, 0, 1, 0, 1, 1, 0, 2, 0, 1, 1,
1, 0, 1, 1, 1, 1, 0, 2, 0, 1, 1, 1, 2, 0, 1, 1, 1, 1, 0, 1, 0,
0, 5, 1, 1, 1, 1, 2, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 2, 0, 0, 0,
1, 1, 2, 0, 0, 0, 0, 0, 1, 3, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1,
1, 1, 0, 0, 2, 2, 1, 0, 0), Emo_words = c(0, 0, 1, 1, 0, 0, 2,
0, 1, 0, 2, 0, 1, 0, 1, 2, 0, 3, 1, 1, 2, 0, 0, 0, 0, 0, 1, 1,
1, 2, 0, 1, 0, 0, 0, 1, 0, 1, 0, 2, 0, 0, 1, 0, 1, 1, 2, 0, 0,
1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 3, 0, 0, 2, 0, 0, 1, 0,
1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 2, 2, 1, 0, 0, 0, 0, 2, 1, 0, 0,
1, 0, 0, 1, 2, 2, 0, 0, 0)), row.names = c(NA, -100L), class = c("tbl_df",
"tbl", "data.frame"))
You can use the boot package, but for simple versions of the bootstrap it's almost simpler to roll your own.
fit initial model
library(MASS)
M1 <- glm.nb(retweetCount ~ ME_words + Moral_words +
Emo_words, data = Tweets)
set up data structure for results
nboot <- 1000
bres <- matrix(NA, nrow = nboot,
               ncol = length(coef(M1)),
               dimnames = list(rep = seq(nboot),
                               coef = names(coef(M1))))
bootstrap
set.seed(101)
bootsize <- 200
for (i in seq(nboot)) {
  bdat <- Tweets[sample(nrow(Tweets), size = bootsize, replace = TRUE), ]
  bfit <- update(M1, data = bdat)  ## refit with new data
  bres[i, ] <- coef(bfit)
}
structure output
data.frame(mean_est = colMeans(bres),
           t(apply(bres, 2, quantile, c(0.025, 0.975))))
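For comparison, here is a minimal sketch of the same idea using the boot package. The data frame `Tweets_sim` is simulated as a stand-in for `Tweets`, and `nb_coefs` is a hypothetical helper name. One difference from the hand-rolled loop: `boot` resamples as many rows as the data frame has, rather than a smaller `bootsize`.

```r
library(MASS)   # glm.nb
library(boot)   # boot, boot.ci

set.seed(101)
n <- 500
Tweets_sim <- data.frame(
  ME_words    = rpois(n, 0.5),
  Moral_words = rpois(n, 0.8),
  Emo_words   = rpois(n, 0.6)
)
mu <- with(Tweets_sim,
           exp(3 + 0.3 * ME_words + 0.2 * Moral_words + 0.1 * Emo_words))
Tweets_sim$retweetCount <- rnbinom(n, size = 2, mu = mu)

# Statistic for boot(): refit on the resampled rows and return the coefficients.
nb_coefs <- function(d, idx) {
  resampled <- d[idx, ]
  coef(glm.nb(retweetCount ~ ME_words + Moral_words + Emo_words,
              data = resampled))
}

b <- boot(Tweets_sim, statistic = nb_coefs, R = 200)

# One row per replicate, one column per coefficient.
boot_coefs <- as.data.frame(b$t)
names(boot_coefs) <- names(b$t0)
print(head(boot_coefs))

# Percentile interval for the second coefficient (ME_words).
print(boot.ci(b, type = "perc", index = 2))
```

The `boot_coefs` data frame plays the same role as `bres` in the loop, so later analyses can work from either.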
I'm trying to do something a little bit complicated for a beginner in programming.
I have a 16x16 matrix and I want to plot the values as a heatmap using image() in R.
How can I plot the "0" (zeros) in blue when the sum (row index + column index) is <= 15? Is that possible?
example matrix:
x <- c(3045, 893, 692, 830, 617, 155, 246, 657, 105, 60, 18, 7, 7, 4, 2, 11234,
2985, 2242, 2471, 1575, 366, 503, 1283, 170, 79, 32, 6, 4, 1, 3, 19475, 4756,
3233, 3251, 1810, 409, 575, 1210, 139, 41, 11, 4, 2, 0, 0, 20830, 4739, 2990,
2531, 1346, 298, 325, 612, 60, 17, 1, 0, 1, 0, 0, 15304, 3196, 1885, 1440, 610,
117, 115, 185, 14, 2, 0, 0, 0, 0, 0, 8026, 1535, 806, 539, 223, 33, 37, 39, 0,
0, 0, 0, 0, 0, 0, 3300, 562, 286, 141, 45, 14, 5, 12, 0, 0, 0, 0, 0, 0, 0, 1067,
160, 65, 40, 14, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0, 277, 47, 6, 2, 1, 0, 1, 0, 0, 0,
0, 0, 0, 0, 0, 72, 6, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 11, 5, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
xmat <- matrix(x, ncol = 12)
xmat <- cbind(xmat, rep(0,16), rep(0,16), rep(0,16), rep(0,16))
xmat <- rbind(xmat, rep(0,16))
dimnames(xmat) = list(0:15, 0:15)
xmat
Thanks!
Vitor
Plot the cases meeting the criteria as blue.
xmat.new <- xmat
xmat.new[!((row(xmat) + col(xmat) <= 15) & xmat==0)] <- NA
image(xmat.new,col="blue")
Plot the cases not meeting the criteria as normal. Notice the add=TRUE
xmat.new <- xmat
xmat.new[((row(xmat) + col(xmat) <= 15) & xmat==0)] <- NA
image(xmat.new,add=TRUE)
Edited to include @Marek's suggestion to simplify the statements.
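The masking logic in this answer can be checked on a small made-up matrix (`m` and `limit` are hypothetical stand-ins for `xmat` and 15). Note that `row()` and `col()` return 1-based positions, whereas the dimnames in the question run from 0 to 15, so the threshold may need shifting by 2 if the sum is meant in terms of those labels.

```r
# Made-up 4 x 4 matrix standing in for the 16 x 16 xmat.
m <- matrix(c(5, 0, 2, 0,
              0, 3, 0, 0,
              1, 0, 0, 0,
              0, 0, 0, 0), nrow = 4, byrow = TRUE)
limit <- 5  # plays the role of 15 in the 16 x 16 case

# TRUE where the cell is zero AND its (1-based) row + column position is <= limit.
blue_mask <- (row(m) + col(m) <= limit) & m == 0
print(blue_mask)

# Two layers: zeros meeting the criterion, and everything else.
blue_layer <- ifelse(blue_mask, 1, NA)
rest_layer <- ifelse(blue_mask, NA, m)

image(blue_layer, col = "blue")   # first layer: qualifying zeros in blue
image(rest_layer, add = TRUE)     # second layer: remaining cells, default palette
```

Splitting the matrix into two complementary layers keeps every cell in exactly one of the two `image()` calls, which is the same idea as the `NA` assignments above.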