I need your help!
I am trying to pull out rows of the second matrix based on IDs from the first matrix. To check that my function (which is not provided here) works correctly, I run the following code (CritMat is the second matrix and parms is the first):
results <- matrix(0, nrow = 15, ncol = 8)
colnames(results) <- c("alpha", "beta", "omega", "T=64", "T=128", "T=256", "T=512", "T=1024")
for (r in 1:15) {
  results[r, ] <- CritMat[CritMat[, 1] == parms[r, 2] &
                          CritMat[, 2] == parms[r, 1] &
                          CritMat[, 3] == parms[r, 3], ]
  print(results[r, ])
}
The loop works for the first four iterations, but the fifth fails with the following error message:
*Error in results[r, ] <- CritMat[CritMat[, 1] == parms[r, 2] & CritMat[, :
replacement has length zero*
Any idea why this happens, and how to fix it?
Many thanks
AA
parms matrix:
beta alpha omega
1 0.005 0.005 0.990
2 0.240 0.005 0.755
3 0.490 0.005 0.505
4 0.740 0.005 0.255
5 0.990 0.005 0.005
6 0.005 0.250 0.745
7 0.240 0.250 0.510
8 0.490 0.250 0.260
9 0.740 0.250 0.010
10 0.005 0.500 0.495
11 0.240 0.500 0.260
12 0.490 0.500 0.010
13 0.005 0.750 0.245
14 0.240 0.750 0.010
15 0.005 0.990 0.005
CritMat matrix:
alpha beta omega T.64 T.128 T.256 T.512 T.1024
1 0.005 0.005 0.990 -2.956420 -2.919654 -2.921704 -2.886429 -2.879443
2 0.005 0.240 0.755 -2.959242 -2.917744 -2.923356 -2.885018 -2.881905
3 0.005 0.490 0.505 -2.959395 -2.915798 -2.927405 -2.886637 -2.885186
4 0.005 0.740 0.255 -2.957763 -2.912088 -2.934518 -2.890182 -2.889484
5 0.005 0.990 0.005 -2.937999 -2.857668 -2.864637 -2.819950 -2.820588
6 0.250 0.005 0.745 -2.987160 -2.986864 -2.897846 -2.865875 -2.911572
7 0.250 0.240 0.510 -3.034868 -2.979375 -2.924888 -2.875446 -2.898752
8 0.250 0.490 0.260 -3.052279 -2.995942 -2.969414 -2.926178 -2.918958
9 0.250 0.740 0.010 -3.197169 -3.263336 -3.258011 -3.202253 -3.248068
10 0.500 0.005 0.495 -3.031267 -3.038585 -2.936348 -2.921126 -2.908868
11 0.500 0.240 0.260 -3.142031 -3.086536 -3.026555 -3.079825 -2.871080
12 0.500 0.490 0.010 -3.383052 -3.410789 -3.431221 -3.367462 -3.332024
13 0.750 0.005 0.245 -3.209441 -3.170385 -3.112472 -3.141569 -2.925559
14 0.750 0.240 0.010 -3.452131 -3.517234 -3.428402 -3.477691 -3.178128
15 0.990 0.005 0.005 -3.427804 -3.491805 -3.298037 -3.290127 -3.087541
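One possible cause, offered as a guess since the code that builds the matrices is not shown: if some entries (for example omega) were computed arithmetically, such as 1 - alpha - beta, they can differ from the printed value in the last few bits, so == finds no matching row and the right-hand side has length zero. The sketch below uses a two-row toy version of the data and a hypothetical helper near() to compare with a tolerance instead of exact equality; the tolerance 1e-8 is an arbitrary choice.

```r
# Toy data: rows 4 and 5 of the posted matrices.
# ASSUMPTION: omega in parms was computed as 1 - beta - alpha.
parms <- cbind(beta  = c(0.740, 0.990),
               alpha = c(0.005, 0.005),
               omega = c(1 - 0.740 - 0.005, 1 - 0.990 - 0.005))
CritMat <- cbind(alpha = c(0.005, 0.005),
                 beta  = c(0.740, 0.990),
                 omega = c(0.255, 0.005),
                 T.64  = c(-2.957763, -2.937999))

# Hypothetical helper: TRUE where a and b agree to within tol
near <- function(a, b, tol = 1e-8) abs(a - b) < tol

results <- matrix(NA_real_, nrow = nrow(parms), ncol = ncol(CritMat),
                  dimnames = list(NULL, colnames(CritMat)))
for (r in seq_len(nrow(parms))) {
  hit <- near(CritMat[, "alpha"], parms[r, "alpha"]) &
         near(CritMat[, "beta"],  parms[r, "beta"]) &
         near(CritMat[, "omega"], parms[r, "omega"])
  # Assign only when exactly one row matches; otherwise leave NA
  if (sum(hit) == 1) results[r, ] <- CritMat[hit, ]
}
print(results)
```

Indexing by column name rather than position also avoids mixing up the alpha and beta columns, which are in a different order in the two matrices.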
Following Taylor and Tibshirani (2015), I'm applying the selectiveinference package in R after a Lasso Logit fit with glmnet. Specifically, I'm interested in inference for the lasso with a fixed lambda.
Below I report the code:
First, I standardized the X matrix (as suggested in https://cran.r-project.org/web/packages/selectiveInference/selectiveInference.pdf).
Then I fit glmnet and extracted the beta coefficients for a lambda previously picked with LOOCV.
X.std <- std(X1[1:2833,])
fit = glmnet(X.std, Y[1:2833],alpha=1, family=c("binomial"))
fit$lambda
lambda=0.00431814
n=2833
beta_hat = coef(fit, x=X.std, y=Y[1:2833], s=lambda/n, exact=TRUE)
beta_hat
out = fixedLassoInf(X.std, Y[1:2833],beta_hat,lambda,family="binomial")
out
After running the code, this is what I get. I understand that the warning relates to the KKT conditions and that the problem is specific to the lasso logit: when I try family="gaussian", I get no warnings or errors.
Warning message:
In fixedLogitLassoInf(x, y, beta, lambda, alpha = alpha, type = type, :
Solution beta does not satisfy the KKT conditions (to within specified tolerances)
> res
Call:
fixedLassoInf(x = X.std, y = Y[1:2833], beta = b, lambda = lam,
family = c("binomial"))
Testing results at lambda = 0.004, with alpha = 0.100
Var Coef Z-score P-value LowConfPt UpConfPt LowTailArea UpTailArea
1 58.558 6.496 0.000 46.078 124.807 0.049 0.050
2 -8.008 -2.815 0.005 -13.555 -3.106 0.049 0.049
3 -18.514 -6.580 0.000 -31.262 -14.153 0.049 0.048
4 -1.070 -0.390 0.447 -22.976 19.282 0.050 0.050
5 -0.320 -1.231 0.610 -0.660 1.837 0.050 0.000
6 -0.448 -1.906 0.619 -2.378 5.056 0.050 0.050
7 -47.732 -9.624 0.000 -161.370 -44.277 0.050 0.050
8 -39.023 -8.378 0.000 -54.988 -31.510 0.050 0.048
10 23.827 1.991 0.181 -20.151 42.867 0.049 0.049
11 -2.454 -0.522 0.087 -269.951 9.345 0.050 0.050
12 0.045 0.018 0.993 -Inf -14.962 0.000 0.050
13 -18.647 -1.143 0.156 -149.623 25.464 0.050 0.050
14 -3.508 -1.140 0.305 -8.444 7.000 0.049 0.049
15 -0.620 -0.209 0.846 -3.486 46.045 0.050 0.050
16 -3.960 -1.288 0.739 -6.931 47.641 0.049 0.050
17 -8.587 -3.010 0.023 -42.700 -2.474 0.050 0.049
18 2.851 0.986 0.031 2.745 196.728 0.050 0.050
19 -6.612 -1.258 0.546 -14.967 37.070 0.049 0.000
20 -11.621 -2.291 0.021 -29.558 -2.536 0.050 0.049
21 -76.957 -0.980 0.565 -186.701 483.180 0.049 0.050
22 -13.556 -5.053 0.000 -126.367 -13.274 0.050 0.049
23 -4.836 -0.388 0.519 -109.667 125.933 0.050 0.050
24 11.355 0.898 0.492 -55.335 30.312 0.050 0.049
25 -1.118 -0.146 0.919 -4.439 232.172 0.049 0.050
26 -7.776 -1.298 0.200 -17.540 8.006 0.050 0.049
27 0.678 0.234 0.515 -42.265 38.710 0.050 0.050
28 32.938 1.065 0.335 -77.314 82.363 0.050 0.049
Does anyone know how to resolve this warning?
I would like to understand which "tolerances" I should specify.
Thanks for the help.
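To make the warning concrete, here is a minimal base-R sketch of what a KKT check for the lasso logit looks like. It assumes the "sum" form of the objective, -loglik(beta0, beta) + lambda * sum(abs(beta)), which is the scale the selectiveInference documentation describes (glmnet divides the loss by n, hence s = lambda/n in coef()). The function kkt_check and its tol argument are hypothetical names for illustration, not part of either package; the package's own tolerances are, as far as I can tell from its documentation, the tol.beta and tol.kkt arguments of fixedLassoInf.

```r
# Hypothetical helper: TRUE if (beta0, beta) satisfies the lasso-logit KKT
# conditions for  -loglik + lambda * sum(abs(beta))  to within tol.
kkt_check <- function(X, y, beta0, beta, lambda, tol = 1e-6) {
  p_hat <- 1 / (1 + exp(-(beta0 + drop(X %*% beta))))  # fitted probabilities
  g <- drop(crossprod(X, y - p_hat))                    # score per column
  active <- beta != 0
  # Active coefficients: score equals lambda * sign(beta)
  ok_active <- all(abs(g[active] - lambda * sign(beta[active])) <= tol)
  # Inactive coefficients: |score| is at most lambda
  ok_inactive <- all(abs(g[!active]) <= lambda + tol)
  ok_active && ok_inactive
}

# Simulated data (not the data from the question)
set.seed(1)
X <- scale(matrix(rnorm(150), nrow = 50, ncol = 3))
y <- rbinom(50, 1, 0.5)

# With all slopes at zero, the intercept-only fit is optimal exactly when
# lambda is at least max |X'(y - mean(y))|.
beta0 <- qlogis(mean(y))
lambda_max <- max(abs(crossprod(X, y - mean(y))))
kkt_at_lambda_max  <- kkt_check(X, y, beta0, rep(0, 3), lambda_max)
kkt_at_half_lambda <- kkt_check(X, y, beta0, rep(0, 3), lambda_max / 2)
```

If the lambda passed to fixedLassoInf and the s passed to coef() are not on matching scales (for instance, a cross-validated lambda taken directly from glmnet and then divided by n again), or if glmnet standardizes internally while X is already standardized, a check of this kind can fail, so those are the first things worth ruling out.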
I have the following data:
library(tidyverse)
age_grp <- c(10,9,8,7,6,5,4,3,2,1)
start <- c(0.420,0.420,0.420,0.420,0.420,0.420,0.420,0.420,0.420,0.420)
change <- c(0.020,0.033,0.029,0.031,0.027,0.032,0.032,0.030,0.027,0.034)
final_outcome <- c(0.400,0.367,0.338,0.307,0.28,0.248,0.216,0.186,0.159,0.125)
my_data <- data.frame(age_grp,start,change,final_outcome)
my_data1 <- my_data %>%
dplyr::arrange(age_grp)
I would like to subtract the values in the variable change from the values in the variable start so that the result decreases iteratively from the oldest age group to the youngest. The final values I am looking for are in the variable final_outcome. For example, starting with age_grp 10, I want to subtract 0.020 from 0.420 to get 0.400. Then I would like to subtract 0.033 from 0.400 to get 0.367, and so forth. I am struggling with how to store those differences. I have made an attempt below, but I don't know how to store each difference and then carry the subtraction forward (or backward, depending on how you look at it). Any advice or suggestions would be appreciated.
my_data1$attempt <- NA
#calculating the decreases
for(i in 2:nrow(my_data1))
my_data1$attempt[i] <- round(my_data1$start[i] - my_data1$change[i-1], 4)
If we need the same output as in final_outcome, a cumulative sum does it:
library(dplyr)
my_data %>%
mutate(attempt = start - cumsum(change)) %>%
arrange(age_grp)
-output
# age_grp start change final_outcome attempt
#1 1 0.42 0.034 0.125 0.125
#2 2 0.42 0.027 0.159 0.159
#3 3 0.42 0.030 0.186 0.186
#4 4 0.42 0.032 0.216 0.216
#5 5 0.42 0.032 0.248 0.248
#6 6 0.42 0.027 0.280 0.280
#7 7 0.42 0.031 0.307 0.307
#8 8 0.42 0.029 0.338 0.338
#9 9 0.42 0.033 0.367 0.367
#10 10 0.42 0.020 0.400 0.400
my_data$final <- my_data$start - cumsum(my_data$change)
library(tidyverse)
my_data %>%
mutate(attempt = accumulate(change, ~ .x - .y, .init = start[1])[-1])
Note: accumulate is from the purrr library that's part of the tidyverse. It also has a .dir argument where you can go "forward" or "backward".
Or in base R using Reduce:
within(my_data, attempt <- Reduce("-", change, init = start[1], accumulate = TRUE)[-1])
Reduce has an argument right that can also do the computation forwards or backwards.
Output
age_grp start change final_outcome attempt
1 10 0.42 0.020 0.400 0.400
2 9 0.42 0.033 0.367 0.367
3 8 0.42 0.029 0.338 0.338
4 7 0.42 0.031 0.307 0.307
5 6 0.42 0.027 0.280 0.280
6 5 0.42 0.032 0.248 0.248
7 4 0.42 0.032 0.216 0.216
8 3 0.42 0.030 0.186 0.186
9 2 0.42 0.027 0.159 0.159
10 1 0.42 0.034 0.125 0.125
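For the part of the question about how to store each difference, an explicit loop equivalent to the vectorized answers might look like the sketch below. The key change from the original attempt is that each step subtracts from the previously stored result rather than from start. It assumes the rows are ordered from the oldest age group to the youngest, as in my_data before arrange().

```r
my_data <- data.frame(
  age_grp = 10:1,
  start   = rep(0.420, 10),
  change  = c(0.020, 0.033, 0.029, 0.031, 0.027,
              0.032, 0.032, 0.030, 0.027, 0.034)
)

attempt <- numeric(nrow(my_data))
# First row: start minus its own change
attempt[1] <- my_data$start[1] - my_data$change[1]
for (i in 2:nrow(my_data)) {
  # Later rows: previous stored result minus this row's change
  attempt[i] <- attempt[i - 1] - my_data$change[i]
}
my_data$attempt <- round(attempt, 3)
```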
I have a data frame like this:
> mydata <- read.csv("mydata.csv", header=T, stringsAsFactors=F)
> tbl_df(mydata)
# A tibble: 16,499 x 60
SRC_035_01 SRC_035_01.1 SRC_035_02 SRC_035_02.1
1 Force Time Force Time
2 -0.0037 0.000 0.0041 0.000
3 0.0000 0.004 0.0073 0.004
4 0.0079 0.008 0.0156 0.008
5 0.0150 0.012 0.0228 0.012
6 0.0177 0.016 0.0262 0.016
7 0.0141 0.020 0.0236 0.020
8 0.0103 0.024 0.0206 0.024
9 0.0080 0.028 0.0193 0.028
10 0.0102 0.032 0.0226 0.032
I need to combine the first row with the header, column by column, resulting in this output:
# A tibble: 16,498 x 60
SRC_035_01_Force SRC_035_01.1_Time SRC_035_02_Force SRC_035_02.1_Time
2 -0.0037 0.000 0.0041 0.000
3 0.0000 0.004 0.0073 0.004
4 0.0079 0.008 0.0156 0.008
5 0.0150 0.012 0.0228 0.012
6 0.0177 0.016 0.0262 0.016
7 0.0141 0.020 0.0236 0.020
8 0.0103 0.024 0.0206 0.024
9 0.0080 0.028 0.0193 0.028
10 0.0102 0.032 0.0226 0.032
Could anyone give me a code hint?
Thanks a lot!
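One way this could be done in base R, sketched on a two-column toy version of the data: paste the first row onto the column names, drop that row, then convert the columns back to numeric. The conversion step assumes every column was read as character because its first row held text ("Force", "Time").

```r
# Toy stand-in for the first columns of mydata.csv
mydata <- data.frame(
  SRC_035_01   = c("Force", "-0.0037", "0.0000"),
  SRC_035_01.1 = c("Time", "0.000", "0.004"),
  stringsAsFactors = FALSE
)

# Combine each header with the value in the first row, column by column
names(mydata) <- paste(names(mydata), unlist(mydata[1, ]), sep = "_")

# Drop the first row, then turn the remaining character values into numbers
mydata <- mydata[-1, ]
mydata[] <- lapply(mydata, as.numeric)
```

The resulting names here are SRC_035_01_Force and SRC_035_01.1_Time, matching the layout asked for in the question.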
Given the data
Step A B C D E F G I J
1 1 0.158 0.011 0.099 6.504 5.914 0.000 0.100 0.330 0.000
2 2 0.345 0.016 0.102 6.050 5.285 0.000 0.102 0.316 0.001
3 1 0.324 0.015 0.100 7.146 6.426 0.000 0.101 0.293 0.000
4 2 0.264 0.015 0.099 5.864 5.202 0.000 0.101 0.296 0.000
5 1 0.346 0.022 0.101 5.889 5.027 0.000 0.101 0.411 0.000
6 2 0.397 0.022 0.130 6.061 5.311 0.000 0.131 0.220 0.000
7 1 0.337 0.015 0.048 7.417 6.839 0.000 0.110 0.129 0.000
8 2 0.362 0.016 0.143 5.726 4.951 0.001 0.144 0.268 0.000
9 1 0.178 0.011 0.099 5.831 5.290 0.000 0.100 0.261 0.000
d <- read.table('sample.txt', header=T) gives me a data frame, and boxplot(d$A ~ d$Step) yields a reasonable graph, but I cannot seem to get all the plots on the same graph. Something like boxplot(d ~ d$Step) is what I expected to work, but I get the following error:
Error in model.frame.default(formula = d ~ d$Step) :
invalid type (list) for variable 'd'
I've tried making Step a factor with d$Step <- as.factor(d$Step), but that seems to have no effect.
An alternative is to plot these in base R, each on its own scale, like this:
par(mfrow=c(3,3))
for (i in 2:10) {
  boxplot(d[, i] ~ d$Step, main = names(d)[i])
}
We can do this with tidyverse
library(tidyverse)
gather(d, Var, Val, -Step) %>%
mutate(Step=factor(Step)) %>%
ggplot(., aes(x=Var, y = Val, fill=Step)) +
geom_boxplot() +
scale_fill_manual(values = c("red", "blue"))
I've got data in the following format.
P10_neg._qn P11_neg._qn P12_neg._qn P14_neg._qn P17_neg._qn P24_neg._qn P25_neg._qn
1 -0.025 -0.037 -0.032 -0.061 -0.176 0.033 -0.011
2 -0.029 -0.125 0.003 -0.098 0.117 0.039 0.087
3 0.033 -0.127 0.042 0.014 0.097 0.105 0.048
4 0.033 -0.127 0.042 0.014 0.097 0.105 0.048
5 -0.029 -0.125 0.003 -0.098 0.117 0.039 0.087
6 -0.029 -0.125 0.003 -0.098 0.117 0.039 0.087
What is the best way by which I can check, for every row, how many entries are greater than 0.1, for instance and return a vector of counts?
You can use the rowSums function for this task. Assuming that dat is your matrix:
rowSums(dat > 0.1)
Using the sample data provided we have :
dat <- read.table(text = ' P10_neg._qn P11_neg._qn P12_neg._qn P14_neg._qn P17_neg._qn P24_neg._qn P25_neg._qn
1 -0.025 -0.037 -0.032 -0.061 -0.176 0.033 -0.011
2 -0.029 -0.125 0.003 -0.098 0.117 0.039 0.087
3 0.033 -0.127 0.042 0.014 0.097 0.105 0.048
4 0.033 -0.127 0.042 0.014 0.097 0.105 0.048
5 -0.029 -0.125 0.003 -0.098 0.117 0.039 0.087
6 -0.029 -0.125 0.003 -0.098 0.117 0.039 0.087',
row.names = 1, header = TRUE)
rowSums(dat > 0.1)
## 1 2 3 4 5 6
## 0 1 1 1 1 1
apply(dat, 1, function(x) sum(x>.1))
# [1] 0 1 1 1 1 1
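Here is an Rcpp version:
#include <Rcpp.h>
#include <algorithm>
using namespace Rcpp;

// The lambda below needs C++11.
// [[Rcpp::plugins(cpp11)]]
// Count, for each row of M, the entries strictly greater than val.
// [[Rcpp::export]]
IntegerVector countGreaterThan2(NumericMatrix M, double val) {
  IntegerVector res(M.nrow());  // preallocate instead of growing with push_back
  for (int i = 0; i < M.nrow(); i++) {
    NumericVector row = M(i, _);
    res[i] = std::count_if(row.begin(), row.end(),
                           [val](double x) { return x > val; });
  }
  return res;
}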
But rowSums is hard to beat:
system.time(rowSums(dfx>0.2))
user system elapsed
0.01 0.00 0.02
> system.time(countGreaterThan2(dfx,0.2))
user system elapsed
0.06 0.00 0.06