I need to run a DESeq2 analysis on my dataset for a homework assignment, but I'm new to this package (I have never used it before).
When I try to create the DESeq2 dataset object:
counts <- read.table("ProstateCancerCountData.txt",sep="", header=TRUE, row.names=1)
metadat<- read.table("mart_export.txt",sep=",", header=TRUE, row.names=1)
counts <- as.matrix(counts)
dds <- DESeqDataSetFromMatrix(countData = counts, colData = metadat, design = ~ GC.content+ Gene.type)
I get this error:
Error in DESeqDataSetFromMatrix(countData = counts, colData = metadat, :
ncol(countData) == nrow(colData) is not TRUE
I don't know how to fix it.
These are the two datasets I have to use for the analysis:
head(counts)
N_10 T_10 N_11 T_12 N_13 T_13 N_14 T_14 N_1 T_1 N_2 T_2 N_3
ENSG00000000003 401 442 1155 1095 788 754 852 938 774 520 808 648 891
ENSG00000000005 0 7 23 9 5 2 45 5 11 10 56 8 7
ENSG00000000419 112 96 424 468 385 452 751 491 247 222 509 363 706
ENSG00000000457 13 121 327 165 40 204 290 199 70 121 104 151 352
ENSG00000000460 24 66 162 137 71 159 174 156 86 94 120 91 166
ENSG00000000938 96 128 218 372 126 129 538 320 117 129 157 238 177
T_3 N_4 N_5 T_6 N_7 T_7 N_8 T_8 N_9 T_9
ENSG00000000003 1071 2059 737 1006 1146 653 1299 1306 1522 490
ENSG00000000005 0 18 0 7 1 4 1 2 0 3
ENSG00000000419 622 988 307 402 294 323 535 518 573 322
ENSG00000000457 333 328 58 153 138 115 179 200 86 85
ENSG00000000460 152 162 100 100 101 148 128 78 83 109
ENSG00000000938 86 113 410 230 64 76 93 61 121 68
head(metadat)
Chromosome.scaffold.name Gene.start..bp. Gene.end..bp.
ENSG00000271782 1 50902700 50902978
ENSG00000232753 1 103817769 103828355
ENSG00000225767 1 50927141 50936822
ENSG00000202140 1 50965430 50965529
ENSG00000207194 1 51048076 51048183
ENSG00000252825 1 51215968 51216025
GC.content Gene.type
ENSG00000271782 35.48 lincRNA
ENSG00000232753 33.99 lincRNA
ENSG00000225767 38.99 antisense
ENSG00000202140 43.00 misc_RNA
ENSG00000207194 37.96 snRNA
ENSG00000252825 36.21 snRNA
Thank you for your help and for shedding some light on this.
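The error says that colData must have one row per sample, that is, one row per column of counts; metadat instead has one row per gene. Below is a minimal sketch of building a sample table from the column names. It assumes (this is not stated in the question) that the N_ and T_ prefixes mark normal and tumor samples and that the number after the underscore identifies the patient; `sample_info`, `condition`, and `patient` are illustrative names only.

```r
# Toy count matrix reusing two genes and four samples from head(counts).
counts <- matrix(
  c(401L, 0L, 442L, 7L, 1155L, 23L, 1095L, 9L),
  nrow = 2,
  dimnames = list(c("ENSG00000000003", "ENSG00000000005"),
                  c("N_10", "T_10", "N_11", "T_12"))
)

# One row per sample, derived from the column names of the count matrix.
# ASSUMPTION: prefix N/T = normal/tumor, suffix = patient identifier.
sample_info <- data.frame(
  condition = factor(sub("_.*$", "", colnames(counts)), levels = c("N", "T")),
  patient   = factor(sub("^[^_]+_", "", colnames(counts))),
  row.names = colnames(counts)
)

# This is the condition that failed in the question.
stopifnot(ncol(counts) == nrow(sample_info))

# With DESeq2 installed, the object could then be built along these lines:
# library(DESeq2)
# dds <- DESeqDataSetFromMatrix(countData = counts,
#                               colData   = sample_info,
#                               design    = ~ condition)
```

Gene-level annotation such as GC.content or Gene.type describes rows of the count matrix, not samples, so it would not normally go into the design formula.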
EDIT:
Thank you for your previous answer.
I switched to another dataset for this homework, but I have run into another error.
This is my new dataset:
head(mycounts)
R1L1Kidney R1L2Liver R1L3Kidney R1L4Liver R1L6Liver
ENSG00000177757 2 1 0 0 1
ENSG00000187634 49 27 43 34 23
ENSG00000188976 73 34 77 56 45
ENSG00000187961 15 8 15 13 11
ENSG00000187583 1 0 1 1 0
ENSG00000187642 4 0 5 0 2
R1L7Kidney R1L8Liver R2L2Kidney R2L3Liver R2L6Kidney
ENSG00000177757 2 0 1 1 3
ENSG00000187634 41 35 42 25 47
ENSG00000188976 68 55 70 42 82
ENSG00000187961 13 12 12 20 15
ENSG00000187583 3 0 0 2 3
ENSG00000187642 12 1 9 4 9
head(myfactors)
Tissue TissueRun
R1L1Kidney Kidney Kidney_1
R1L2Liver Liver Liver_1
R1L3Kidney Kidney Kidney_1
R1L4Liver Liver Liver_1
R1L6Liver Liver Liver_1
R1L7Kidney Kidney Kidney_1
When I build my DESeq object, I would like to include both Tissue and TissueRun in the design to account for the batch effect. But I get an error:
dds2 <- DESeqDataSetFromMatrix(countData = mycounts, colData = myfactors, design = ~ Tissue + TissueRun)
Error in checkFullRank(modelMatrix) :
the model matrix is not full rank, so the model cannot be fit as specified.
One or more variables or interaction terms in the design formula are linear
combinations of the others and must be removed.
Please read the vignette section 'Model matrix not full rank':
vignette('DESeq2')
Thank you for your help
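A minimal base R sketch of why this design is not full rank: every TissueRun level belongs to exactly one Tissue, so the Tissue column can be rebuilt from the TissueRun columns. The toy `myfactors` below is hypothetical; in particular, the levels Kidney_2 and Liver_2 and the derived `Run` column are assumptions, since head(myfactors) only shows run 1.

```r
# Hypothetical sample table in the style of the question.
myfactors <- data.frame(
  Tissue    = factor(c("Kidney", "Liver", "Kidney", "Liver",
                       "Kidney", "Liver", "Kidney", "Liver")),
  TissueRun = factor(c("Kidney_1", "Liver_1", "Kidney_1", "Liver_1",
                       "Kidney_2", "Liver_2", "Kidney_2", "Liver_2"))
)

# Design from the question: Tissue is a linear combination of TissueRun levels.
mm_full <- model.matrix(~ Tissue + TissueRun, data = myfactors)
ncol(mm_full)     # coefficients requested
qr(mm_full)$rank  # coefficients that can actually be estimated (fewer)

# One possible alternative: keep only the run number as the batch variable.
myfactors$Run <- factor(sub("^.*_", "", myfactors$TissueRun))
mm_ok <- model.matrix(~ Run + Tissue, data = myfactors)
qr(mm_ok)$rank == ncol(mm_ok)  # full rank
```

If the run number really is the batch, a design such as ~ Run + Tissue would be the corresponding DESeq2 formula under these assumptions.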
Trying to use R cv.glmnet() for cross validation on loans data.
I have a loan data set (from Kaggle) that I have already split into train and test sets.
I separated the response y from the predictor variables using select(1) and select(-1).
I converted both to matrices to avoid the "Error in storage.mode(y) <- "double" : 'list' object cannot be coerced to type 'double'" problem I hit earlier.
I am now trying to run cv.glmnet() for cross-validation, but this error stops me:
"Error in y - predmat : non-numeric argument to binary operator"
The error complains about a non-numeric argument, yet all my data is numeric, except for the response y, which is a factor.
As a side question, what does predmat in "y - predmat" refer to?
x_vars <- as.matrix(data.sample.train.split %>% select(-1))
y_resp <- as.matrix(data.sample.train.split %>% select(1))
cv_output <- cv.glmnet(x_vars, y_resp, type.measure = "deviance", nfolds = 5)
cv_output <- cv.glmnet(x_vars, y_resp,
type.measure = "deviance",
lambda = NULL,
nfolds = 5)
I am also considering trying this:
ddd.lasso <- cv.glmnet(x_vars, y_resp, alpha = 1, family = "binomial")
ddd.model <- glmnet(x_vars, y_resp, alpha = 1, family = "binomial", lambda = ddd.lasso$lambda.min)
Data sample is as follows, just some of the columns:
c("loan_amnt", "funded_amnt",
"funded_amnt_inv", "grade", "emp_length", "annual_inc", "dti",
"mths_since_last_delinq", "mths_since_last_record", "open_acc",
"pub_rec", "revol_bal", "revol_util", "total_acc", "out_prncp",
"out_prncp_inv", "total_pymnt", "total_pymnt_inv", "total_rec_prncp",
"total_rec_int", "total_rec_late_fee", "recoveries", "collection_recovery_fee",
"last_pymnt_amnt", "collections_12_mths_ex_med", "acc_now_delinq"
)))
loan_amnt funded_amnt funded_amnt_inv grade emp_length annual_inc dti
3 10000 10000 10000.000 60 10 49200.00 20.00
10 10000 10000 10000.000 60 4 42000.00 18.60
14 20250 20250 19142.161 60 3 43370.00 26.53
17 15000 15000 15000.000 80 2 92000.00 29.44
18 4000 4000 4000.000 80 10 106000.00 5.63
31 4400 4400 4400.000 40 10 55000.00 20.01
35 10000 10000 10000.000 100 10 60000.00 12.74
37 25600 25600 25350.000 80 9 110000.00 15.71
41 10000 10000 10000.000 80 1 39000.00 18.58
64 9200 9200 9200.000 80 2 60000.00 19.96
72 7000 7000 7000.000 80 4 39120.00 21.01
74 3500 3500 3500.000 100 10 83000.00 2.31
77 9500 9500 9500.000 100 7 50000.00 8.18
89 10000 10000 10000.000 100 1 43000.00 25.26
98 7000 7000 7000.000 80 1 30000.00 15.80
112 21600 21600 20498.266 20 8 60000.00 16.74
117 7200 7200 7200.000 80 5 48000.00 17.43
118 12000 12000 11975.000 60 1 57000.00 10.86
125 10000 10000 10000.000 100 5 70000.00 16.78
126 8000 8000 8000.000 60 3 28000.00 12.60
128 6000 6000 6000.000 60 10 94800.00 24.53
138 35000 35000 35000.000 80 2 168000.00 3.17
144 14000 14000 14000.000 100 10 66000.00 11.15
149 3000 3000 3000.000 60 5 71000.00 21.84
152 12000 12000 11975.000 80 2 60000.00 15.50
153 6000 6000 6000.000 100 3 34000.00 14.51
155 7000 7000 7000.000 80 7 82000.00 12.00
166 24250 18100 18075.000 -1 7 120000.00 12.96
170 2500 2500 2500.000 80 7 29000.00 18.70
172 4225 4225 4225.000 80 5 55200.00 17.61
180 6000 6000 6000.000 60 5 50000.00 15.58
192 5000 5000 5000.000 80 5 38004.00 23.78
193 8000 8000 8000.000 80 3 31000.00 16.22
199 12000 12000 12000.000 80 4 40000.00 22.20
203 3200 3200 3200.000 80 9 61200.00 2.16
209 5000 5000 5000.000 80 2 70000.00 20.06
220 13250 13250 13250.000 40 10 52000.00 23.70
224 12000 12000 12000.000 100 10 68000.00 7.08
mths_since_last_delinq mths_since_last_record open_acc pub_rec revol_bal revol_util
3 35 59 10 0 5598 21.0
10 61 114 14 0 24043 70.2
14 18 107 8 0 17813 85.6
17 54 79 8 0 13707 93.9
18 18 97 12 0 6110 37.7
31 68 119 7 0 25237 99.0
35 37 93 11 0 14019 19.5
37 11 118 11 0 26088 62.0
41 58 17 5 0 12874 72.7
64 39 95 8 0 23299 78.7
72 26 33 8 1 9414 52.4
74 35 59 6 0 3092 13.4
77 46 118 8 0 13422 60.5
89 59 105 8 0 8215 37.2
98 68 101 7 0 15455 47.6
112 23 26 6 0 13354 78.1
117 24 19 7 0 16450 80.2
118 47 87 7 0 9273 81.5
125 32 92 9 0 10770 69.0
126 66 112 8 0 6187 54.3
128 10 101 13 0 71890 95.9
138 22 97 16 0 1099 1.4
144 26 102 7 0 12095 35.4
149 59 103 4 0 15072 88.7
152 46 94 7 0 12168 85.7
153 70 81 9 0 13683 64.8
155 79 83 6 0 25334 71.6
166 66 118 7 0 31992 99.0
170 63 99 5 0 2668 66.7
172 69 104 6 0 4055 73.7
180 49 94 8 0 7361 83.6
192 5 85 12 0 10023 57.3
193 28 77 13 0 2751 34.4
199 78 109 9 0 16273 55.5
203 79 113 5 1 2795 33.3
209 27 62 14 0 13543 54.2
220 70 86 8 0 15002 91.5
224 21 70 7 0 15433 55.6
total_acc out_prncp out_prncp_inv total_pymnt total_pymnt_inv total_rec_prncp
3 37 0 0 12226.302 12226.30 10000.00
10 28 0 0 12519.260 12519.26 10000.00
14 22 0 0 27663.043 25417.68 20250.00
17 31 0 0 15823.480 15823.48 15000.00
18 44 0 0 4484.790 4484.79 4000.00
31 11 0 0 5626.893 5626.89 4400.00
35 18 0 0 10282.670 10282.67 10000.00
37 27 0 0 29695.623 29405.63 25600.00
41 10 0 0 11474.760 11474.76 10000.00
64 19 0 0 10480.840 10480.84 9200.00
72 26 0 0 7932.300 7932.30 7000.00
74 28 0 0 3834.661 3834.66 3500.00
77 13 0 0 10493.710 10493.71 9500.00
89 16 0 0 11264.010 11264.01 10000.00
98 11 0 0 8452.257 8452.26 7000.00
112 21 0 0 27580.750 24853.63 21600.00
117 10 0 0 8677.156 8677.16 7200.00
118 11 0 0 14396.580 14366.62 12000.00
125 18 0 0 10902.910 10902.91 10000.00
126 11 0 0 8636.820 8636.82 8000.00
128 30 0 0 7215.050 7215.05 6000.00
138 22 0 0 38059.760 38059.76 35000.00
144 46 0 0 15450.084 15450.08 14000.00
149 14 0 0 3723.936 3723.94 3000.00
152 21 0 0 13919.414 13890.44 12000.00
153 16 0 0 6857.261 6857.26 6000.00
155 31 0 0 8290.730 8290.73 7000.00
166 20 0 0 22188.250 22157.63 18100.00
170 13 0 0 2894.740 2894.74 2500.00
172 12 0 0 5081.023 5081.02 4225.00
180 14 0 0 7325.299 7325.30 6000.00
192 17 0 0 6534.430 6534.43 5000.00
193 29 0 0 8306.470 8306.47 8000.00
199 23 0 0 14006.680 14006.68 12000.00
203 17 0 0 3709.193 3709.19 3200.00
209 26 0 0 5501.160 5501.16 5000.00
220 18 0 0 15650.390 15650.39 13250.00
224 34 0 0 12554.010 12554.01 12000.00
total_rec_int total_rec_late_fee recoveries collection_recovery_fee last_pymnt_amnt
3 2209.33 16.97000 0 0 357.48
10 2519.26 0.00000 0 0 370.46
14 7413.04 0.00000 0 0 6024.09
17 823.48 0.00000 0 0 2447.05
18 484.79 0.00000 0 0 2638.77
31 1226.89 0.00000 0 0 162.44
35 282.67 0.00000 0 0 8762.05
37 4095.62 0.00000 0 0 838.27
41 1474.76 0.00000 0 0 5803.94
64 1280.84 0.00000 0 0 365.48
72 932.30 0.00000 0 0 4235.03
74 334.66 0.00000 0 0 107.86
77 993.71 0.00000 0 0 5378.43
89 1264.01 0.00000 0 0 4.84
98 1452.26 0.00000 0 0 238.06
112 5980.75 0.00000 0 0 17416.49
117 1462.16 15.00000 0 0 19.26
118 2396.58 0.00000 0 0 5359.38
125 902.91 0.00000 0 0 4152.52
126 636.82 0.00000 0 0 6983.56
128 1215.05 0.00000 0 0 1960.88
138 3059.76 0.00000 0 0 272.59
144 1450.08 0.00000 0 0 2133.17
149 723.94 0.00000 0 0 107.29
152 1919.41 0.00000 0 0 395.05
153 857.26 0.00000 0 0 198.16
155 1290.73 0.00000 0 0 2454.29
166 4088.25 0.00000 0 0 16499.75
170 394.74 0.00000 0 0 1168.50
172 856.02 0.00000 0 0 146.48
180 1325.30 0.00000 0 0 215.51
192 1534.43 0.00000 0 0 1561.93
193 306.47 0.00000 0 0 7778.22
199 2006.68 0.00000 0 0 5971.51
203 509.19 0.00000 0 0 317.41
209 501.16 0.00000 0 0 3833.62
220 2400.39 0.00000 0 0 9026.78
224 554.01 0.00000 0 0 473.95
collections_12_mths_ex_med acc_now_delinq
3 0 0
10 0 0
14 0 0
17 0 0
18 0 0
31 0 0
35 0 0
37 0 0
41 0 0
64 0 0
72 0 0
74 0 0
77 0 0
89 0 0
98 0 0
112 0 0
117 0 0
118 0 0
125 0 0
126 0 0
128 0 0
138 0 0
144 0 0
149 0 0
152 0 0
153 0 0
155 0 0
166 0 0
170 0 0
172 0 0
180 0 0
192 0 0
193 0 0
199 0 0
203 0 0
209 0 0
220 0 0
224 0 0
It looks like I had the wrong glmnet family: I accidentally left cv.glmnet at its default family ("gaussian"), when in fact my response is binomial. My next problem is to figure out the warning "Convergence for 1th lambda value not reached after maxit=100000 iterations; solutions for larger lambdas returned".
Code that improved the solution:
cv.lasso <- cv.glmnet(x_vars, y_resp, alpha = 1, family = "binomial", nfolds = 5)
cv.model <- glmnet(x_vars, y_resp, alpha = 1, relax=TRUE, family="binomial", lambda=cv.lasso$lambda.min)
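A small base R sketch of where the original error most likely came from: calling as.matrix() on a data frame whose only column is a factor yields a character matrix, and subtracting anything numeric from it fails. As far as I can tell, predmat is the matrix of cross-validated predictions that cv.glmnet builds internally, so with the default gaussian family it ends up computing y - predmat on that character matrix. The column name `loan_status` and its labels are hypothetical, since the question does not show the response column.

```r
# Toy training data; loan_amnt and dti values reuse rows from the question.
train <- data.frame(
  loan_status = factor(c("paid", "default", "paid", "paid")),  # hypothetical response
  loan_amnt   = c(10000, 20250, 15000, 4000),
  dti         = c(20.00, 26.53, 29.44, 5.63)
)

# What the question did: a one-column factor data frame becomes a character matrix.
y_bad <- as.matrix(train[, 1, drop = FALSE])
is.character(y_bad)

# Arithmetic on a character matrix raises the "non-numeric argument" error.
err <- tryCatch(y_bad - 0.5, error = function(e) e)
inherits(err, "error")

# Two safer options for family = "binomial": pass the factor itself,
# or an explicit 0/1 numeric vector.
y_factor <- train$loan_status
y_num    <- as.numeric(train$loan_status == "default")
x_vars   <- as.matrix(train[, -1])
```

With y_factor or y_num as the response, a call such as cv.glmnet(x_vars, y_num, family = "binomial") no longer sees character data.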
I have the following data frame in R (heavily truncated, obviously):
X PLAYER_ID PLAYER_NAME LOC_X LOC_Y SHOT_MADE_FLAG
1 0 201935 James Harden 14 55 0
2 1 201935 James Harden 0 24 0
3 2 201935 James Harden 50 74 0
4 3 201935 James Harden 160 215 0
5 4 201935 James Harden 22 21 1
6 5 201935 James Harden -43 278 1
7 6 201935 James Harden 221 6 0
8 7 201935 James Harden -27 21 0
9 8 201935 James Harden -119 235 0
10 9 201935 James Harden -223 101 0
I named this data frame shots in R, and then tried the following:
league_model_19 <- gam(SHOT_MADE_FLAG ~ ti(LOC_X) +
ti(LOC_Y) +
ti(LOC_X, LOC_Y),
data = shots)
But this gives the error shown in the title:
Error in model.frame.default(formula = SHOT_MADE_FLAG ~ ti(LOC_X) + ti(LOC_Y) + : invalid type (list) for variable 'ti(LOC_X)'
I checked the type of each value in LOC_X, and all of them are integers. I understand the column itself is a list/series, but shouldn't this work anyways?
Any way to avoid getting an error here would be great. Thanks in advance.
Edit:
str(shots) outputs:
> str(shots)
'data.frame': 1456 obs. of 6 variables:
$ X : int 0 1 2 3 4 5 6 7 8 9 ...
$ PLAYER_ID : int 201935 201935 201935 201935 201935 201935 201935 201935 201935 201935 ...
$ PLAYER_NAME : chr "James Harden" "James Harden" "James Harden" "James Harden" ...
$ LOC_X : int 14 0 50 160 22 -43 221 -27 -119 -223 ...
$ LOC_Y : int 55 24 74 215 21 278 6 21 235 101 ...
$ SHOT_MADE_FLAG: int 0 0 0 0 1 1 0 0 0 0 ...
and dput(shots) outputs (truncated):
> dput(shots)
structure(list(X = 0:1455, PLAYER_ID = c(201935L, 201935L, 201935L,
201935L, 201935L, 201935L, 201935L, 201935L,....)
Edit 2:
> shots[c('LOC_X','LOC_Y','SHOT_MADE_FLAG')]
LOC_X LOC_Y SHOT_MADE_FLAG
1 14 55 0
2 0 24 0
3 50 74 0
4 160 215 0
5 22 21 1
6 -43 278 1
7 221 6 0
8 -27 21 0
9 -119 235 0
10 -223 101 0
11 60 238 0
12 232 74 0
13 -136 239 0
14 -14 7 1
15 192 168 0
16 157 206 0
17 172 189 0
18 -168 194 0
19 10 4 1
20 -5 10 0
21 -127 228 0
22 -63 259 0
23 -95 241 1
24 227 27 0
25 -12 12 1
26 192 160 0
27 -144 236 0
28 6 281 0
29 40 250 1
30 29 52 0
31 -120 224 0
32 3 10 1
33 -131 221 0
34 32 276 0
35 -18 10 1
36 30 258 0
37 -31 251 0
38 141 210 0
39 13 77 0
40 -6 19 1
41 -18 62 0
42 47 140 0
43 210 132 0
44 139 217 0
45 163 196 1
46 -30 16 1
47 27 256 1
48 -195 173 0
49 0 251 0
50 2 22 1
51 157 203 0
52 54 249 0
53 -132 233 0
54 20 1 1
55 197 210 0
56 -147 224 1
57 1 29 1
58 -31 22 0
59 -128 236 0
60 -35 13 0
61 -29 14 1
62 234 89 0
63 196 172 0
64 -149 220 0
65 94 252 1
66 -92 269 0
67 14 20 1
68 -6 17 1
69 -171 190 1
70 163 205 0
71 2 4 1
72 11 14 1
73 117 256 0
74 0 35 0
75 -3 0 1
76 -232 120 0
77 -8 7 1
78 -2 298 1
79 -6 16 1
80 135 243 1
81 4 17 0
82 149 222 0
83 -6 31 1
84 122 97 1
85 -27 15 1
86 -2 34 0
87 -179 227 1
88 0 17 1
89 -55 310 0
90 223 159 0
91 3 -1 1
92 -11 17 0
93 -78 247 1
94 19 20 0
95 -9 16 0
96 -44 248 0
97 122 220 0
98 -15 -2 0
99 0 5 1
100 101 237 0
101 237 17 0
102 -32 250 1
103 -5 11 1
104 109 216 0
105 -228 45 0
106 18 29 1
107 -25 11 1
108 128 117 1
109 -38 246 0
110 39 57 0
111 -111 284 0
112 -44 83 0
113 111 223 0
114 -14 101 1
115 2 86 0
116 90 232 0
117 98 228 0
118 0 34 1
119 8 57 0
120 130 209 0
121 81 244 1
122 5 15 1
123 -18 27 0
124 -6 17 0
125 210 122 0
126 199 149 0
127 120 221 0
128 -142 221 0
129 -11 12 0
130 226 60 1
131 -152 212 0
132 -188 181 0
133 -1 4 1
134 -22 253 0
135 -16 253 0
136 -6 23 1
137 -120 275 1
138 8 52 1
139 -15 8 0
140 -237 8 1
141 190 159 0
142 9 8 0
143 -50 74 0
144 -17 17 0
145 -143 231 0
146 -136 222 1
147 200 147 1
148 -176 186 1
149 -229 48 0
150 -16 17 1
151 -65 11 0
152 -54 89 0
153 6 16 1
154 -9 8 0
155 -11 2 1
156 -94 251 0
157 -141 246 1
158 45 239 0
159 230 82 1
160 7 4 0
161 -16 13 1
162 131 227 1
163 125 235 0
164 20 260 0
165 -46 252 0
166 144 211 0
167 -2 4 1
168 104 253 1
169 8 16 0
170 109 226 0
171 -1 71 1
172 -201 147 0
173 76 241 0
174 0 2 1
175 21 52 0
176 -79 16 0
177 -196 157 0
178 168 194 1
179 -131 236 1
180 2 6 1
181 42 254 1
182 -50 262 1
183 -6 0 1
184 57 278 0
185 229 14 1
186 127 226 1
187 139 230 0
188 -234 111 0
189 157 211 0
190 82 255 0
191 -216 138 0
192 230 101 0
193 14 2 0
194 -47 252 0
195 -12 10 1
196 113 231 0
197 -28 264 1
198 2 248 0
199 -17 4 1
200 -235 13 1
201 48 253 1
202 20 256 0
203 -1 18 0
204 -109 233 0
205 -107 238 0
206 -116 14 0
207 14 57 0
208 63 240 1
209 96 246 0
210 13 47 0
211 -188 172 1
212 24 252 1
213 123 257 1
214 -144 206 0
215 11 61 1
216 77 247 1
217 158 201 1
218 107 315 0
219 11 283 0
220 161 209 0
221 1 45 0
222 -105 225 0
223 9 30 1
224 27 -6 1
225 3 58 1
226 -19 3 0
227 -165 208 0
228 3 55 1
229 -176 201 1
230 -18 11 0
231 -13 78 1
232 -16 7 1
233 -27 254 0
234 -117 263 0
235 95 233 0
236 211 146 0
237 -6 46 1
238 3 20 1
239 -22 79 1
240 -8 270 1
241 161 190 1
242 20 70 0
243 -14 10 1
244 144 249 1
245 14 0 1
246 -22 29 1
247 8 53 0
248 4 104 0
249 236 15 0
250 113 222 1
251 -40 260 1
252 51 115 0
253 -181 178 0
254 -144 218 0
255 62 49 0
256 116 257 1
257 131 237 1
258 38 81 1
259 -6 114 1
260 -21 21 0
261 111 266 0
262 -36 14 0
263 103 272 0
264 -27 41 0
265 3 80 0
266 -20 17 1
267 -44 316 1
268 152 253 0
269 -65 255 1
270 76 243 1
271 -35 284 0
272 5 69 1
273 -114 249 0
274 32 127 0
275 192 172 1
276 -159 205 0
277 0 13 0
278 200 155 0
279 11 10 0
280 -11 38 1
281 -98 256 1
282 87 239 0
283 -88 259 1
284 8 71 0
285 1 8 1
286 -6 82 1
287 -27 55 1
288 -14 267 0
289 0 262 0
290 -36 69 1
291 -11 59 1
292 -177 271 1
293 -62 277 1
294 -4 21 0
295 84 243 0
296 -47 48 1
297 -36 48 0
298 -94 236 0
299 -6 20 0
300 -1 34 0
301 -88 251 1
302 2 11 0
303 -225 29 0
304 -25 31 0
305 3 76 1
306 -16 31 1
307 -36 262 1
308 -164 221 0
309 5 21 0
310 -1 11 1
311 -1 264 1
312 6 260 0
313 0 28 0
314 -98 239 0
315 -6 91 1
316 -98 251 0
317 85 244 1
318 -51 273 0
319 14 24 0
320 -9 21 0
321 57 253 1
322 50 251 1
323 -231 1 0
324 -5 15 1
325 -143 214 0
326 -51 248 0
327 147 219 0
328 39 258 0
329 92 150 0
330 72 282 0
331 0 -1 0
332 13 32 1
333 -22 258 1
So far I still can't replicate this. This should be a comment but is a bit too long.
Read in the data you've provided:
shots <- read.csv(text=
"X,PLAYER_ID,PLAYER_NAME,LOC_X,LOC_Y,SHOT_MADE_FLAG
0,201935,James Harden,14,55,0
1,201935,James Harden,0,24,0
2,201935,James Harden,50,74,0
3,201935,James Harden,160,215,0
4,201935,James Harden,22,21,1
5,201935,James Harden,-43,278,1
6,201935,James Harden,221,6,0
7,201935,James Harden,-27,21,0
8,201935,James Harden,-119,235,0
9,201935,James Harden,-223,101,0
")
The numeric variables here come out as numeric, but we'll convert to integer (to match your data as closely as possible) below.
Extend the data (randomly) to the size of your full data set:
sfun <- function(x) sample(x,replace=TRUE,size=1456)
set.seed(101)
shots2 <- with(shots,
data.frame(SHOT_MADE_FLAG=sfun(SHOT_MADE_FLAG),
LOC_X=as.integer(sfun(LOC_X)),
LOC_Y=as.integer(sfun(LOC_Y))))
Fit:
library(mgcv)
league_model_19 <- gam(SHOT_MADE_FLAG ~ ti(LOC_X) +
ti(LOC_Y) +
ti(LOC_X, LOC_Y),
data = shots2)
This works fine. You should try this code on your system and see if it works or not.
I also tried this with the data you provided above (333 rows) and it works fine.
So either (1) there's something weird about the remainder of your data that you haven't shown us, or (2) there's something weird about your R environment/package versions.
For (1), it would be helpful if you can post your full data set somewhere. (It should be sufficient to post shots[c('LOC_X','LOC_Y','SHOT_MADE_FLAG')], after checking that you still get the same issue with that subset of the data. We might not need all of the rows to reproduce, but we definitely need more than we have already.)
For (2), you could give us the results of sessionInfo() (or devtools::session_info()). It would be helpful to start from a clean R session to minimize the number of additional packages you have loaded, and to minimize the possibility that you have weird function definitions masking the ones in the package. (The results of find("ti"), find("gam") would also be useful; they should both be package:mgcv ...)
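One way to run the checks suggested above, as a minimal sketch. The toy `shots` data frame reuses three rows from the question, and `list_cols`, `where_gam`, and `where_ti` are illustrative names. A common cause of this kind of error (an assumption here, not something confirmed for this case) is that another attached package provides its own gam(), which masks the one from mgcv and does not understand ti() terms.

```r
# Three rows from the question, kept as integers like the original data.
shots <- data.frame(
  LOC_X          = c(14L, 0L, 50L),
  LOC_Y          = c(55L, 24L, 74L),
  SHOT_MADE_FLAG = c(0L, 0L, 0L)
)

# 1. Is any column stored as a list rather than an atomic vector?
list_cols <- vapply(shots, is.list, logical(1))
print(list_cols)

# 2. Which attached packages or environments define gam() and ti()?
#    With only mgcv attached, both should report "package:mgcv";
#    character(0) means nothing on the search path defines that name.
where_gam <- find("gam")
where_ti  <- find("ti")
print(where_gam)
print(where_ti)

# 3. R and package versions, for comparison with a working setup.
print(sessionInfo())
```

If where_gam lists another package ahead of package:mgcv, calling mgcv::gam(...) explicitly is one way to make sure the intended function is used.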
I have a data set that lists the percentiles for a set of scores like this:
> percentiles
Score Percentile
1 231 0
2 385 1
3 403 2
4 413 3
5 418 4
6 424 5
7 429 6
8 434 7
9 437 8
10 441 9
11 443 10
I would like the "Score" column to run from 100 to 500. That is, I would like Scores 100 to 231 to be associated with a Percentile of 0, Scores 232 to 385 to be associated with a Percentile of 1, etc. Is there a simple way to fill in the values that do not appear in the sequence of "Score" values so it looks like the below data set?
> percentiles
Score Percentile
1 100 0
2 101 0
3 102 0
4 103 0
5 104 0
6 105 0
7 106 0
8 107 0
9 108 0
10 109 0
--------------------
130 229 0
131 230 0
132 231 0
133 232 1
134 233 1
135 234 1
136 235 1
137 236 1
138 237 1
139 238 1
140 239 1
If you convert percentiles to a data.table, you could do a rolling join with a new table of all scores 100:500. The rolling join with roll = -Inf gives a fill-backward behavior by itself, but still the 444:500 values are NA so a forward nafill is added at the end.
library(data.table)
setDT(percentiles)
percentiles[data.table(Score = 100:500), on = .(Score), roll = -Inf
][, Percentile := nafill(Percentile, 'locf')]
# Score Percentile
# 1: 100 0
# 2: 101 0
# 3: 102 0
# 4: 103 0
# 5: 104 0
# ---
# 397: 496 10
# 398: 497 10
# 399: 498 10
# 400: 499 10
# 401: 500 10
You might think about this differently: instead of treating it as a data frame to fill, treat the scores as a set of breaks for binning. Use the scores as the breaks, with -Inf prepended as the lower bound. If you need something different to happen for scores above the highest break, add Inf to the end of the breaks, but you'll need to come up with an additional label.
library(dplyr)
dat <- data.frame(Score = 100:500) %>%
mutate(Percentile = cut(Score, breaks = c(-Inf, percentiles$Score),
labels = percentiles$Percentile,
right = TRUE, include.lowest = FALSE))
Taking a look at a few of the breaking points:
slice(dat, c(129:135, 342:346))
#> Score Percentile
#> 1 228 0
#> 2 229 0
#> 3 230 0
#> 4 231 0
#> 5 232 1
#> 6 233 1
#> 7 234 1
#> 8 441 9
#> 9 442 10
#> 10 443 10
#> 11 444 <NA>
#> 12 445 <NA>
We could use complete
library(dplyr)
library(tidyr)
out <- complete(percentiles, Score = 100:500) %>%
fill(Percentile, .direction = "updown")
out %>%
slice(c(1:10, 130:140)) %>%
as.data.frame
# Score Percentile
#1 100 0
#2 101 0
#3 102 0
#4 103 0
#5 104 0
#6 105 0
#7 106 0
#8 107 0
#9 108 0
#10 109 0
#11 229 0
#12 230 0
#13 231 0
#14 232 1
#15 233 1
#16 234 1
#17 235 1
#18 236 1
#19 237 1
#20 238 1
#21 239 1
data
percentiles <- structure(list(Score = c(231L, 385L, 403L, 413L, 418L, 424L,
429L, 434L, 437L, 441L, 443L), Percentile = 0:10), class = "data.frame",
row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11"))
In base R you could use the findInterval function to break up your sequence 100:500 into buckets determined by the Score, then index into the Percentile column:
x <- 100:500
ind <- findInterval(x, percentiles$Score, left.open = TRUE)
output <- data.frame(Score = x, Percentile = percentiles$Percentile[ind + 1])
Values of x above 443 will receive a percentile of NA.
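If scores above 443 should instead inherit the highest percentile (as in the other answers), one possible tweak is to cap the index at the last row. This is a sketch under that assumption; the only change from the findInterval approach is the pmin() call.

```r
percentiles <- data.frame(
  Score      = c(231L, 385L, 403L, 413L, 418L, 424L, 429L, 434L, 437L, 441L, 443L),
  Percentile = 0:10
)

x <- 100:500

# Row index of the first break that is >= each score (intervals closed on the right).
ind <- findInterval(x, percentiles$Score, left.open = TRUE) + 1L

# Scores above the last break would index past the table; cap them at the last row.
ind <- pmin(ind, nrow(percentiles))

output <- data.frame(Score = x, Percentile = percentiles$Percentile[ind])
```

Whether scores above the highest break should get percentile 10 or stay NA depends on what the table is meant to represent, so treat the capping as a choice rather than a requirement.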
Here is a base R solution built on cut() and match():
df <- data.frame(Score = (x <- 100:500),
percentile = percentiles$Percentile[match(s <-cut(x,c(0,percentiles$Score)),levels(s))])
such that
> df
Score percentile
1 100 0
2 101 0
3 102 0
4 103 0
5 104 0
6 105 0
7 106 0
8 107 0
9 108 0
10 109 0
11 110 0
12 111 0
13 112 0
14 113 0
15 114 0
16 115 0
17 116 0
18 117 0
19 118 0
20 119 0
21 120 0
22 121 0
23 122 0
24 123 0
25 124 0
26 125 0
27 126 0
28 127 0
29 128 0
30 129 0
31 130 0
32 131 0
33 132 0
34 133 0
35 134 0
36 135 0
37 136 0
38 137 0
39 138 0
40 139 0
41 140 0
42 141 0
43 142 0
44 143 0
45 144 0
46 145 0
47 146 0
48 147 0
49 148 0
50 149 0
51 150 0
52 151 0
53 152 0
54 153 0
55 154 0
56 155 0
57 156 0
58 157 0
59 158 0
60 159 0
61 160 0
62 161 0
63 162 0
64 163 0
65 164 0
66 165 0
67 166 0
68 167 0
69 168 0
70 169 0
71 170 0
72 171 0
73 172 0
74 173 0
75 174 0
76 175 0
77 176 0
78 177 0
79 178 0
80 179 0
81 180 0
82 181 0
83 182 0
84 183 0
85 184 0
86 185 0
87 186 0
88 187 0
89 188 0
90 189 0
91 190 0
92 191 0
93 192 0
94 193 0
95 194 0
96 195 0
97 196 0
98 197 0
99 198 0
100 199 0
101 200 0
102 201 0
103 202 0
104 203 0
105 204 0
106 205 0
107 206 0
108 207 0
109 208 0
110 209 0
111 210 0
112 211 0
113 212 0
114 213 0
115 214 0
116 215 0
117 216 0
118 217 0
119 218 0
120 219 0
121 220 0
122 221 0
123 222 0
124 223 0
125 224 0
126 225 0
127 226 0
128 227 0
129 228 0
130 229 0
131 230 0
132 231 0
133 232 1
134 233 1
135 234 1
136 235 1
137 236 1
138 237 1
139 238 1
140 239 1
141 240 1
142 241 1
143 242 1
144 243 1
145 244 1
146 245 1
147 246 1
148 247 1
149 248 1
150 249 1
151 250 1
152 251 1
153 252 1
154 253 1
155 254 1
156 255 1
157 256 1
158 257 1
159 258 1
160 259 1
161 260 1
162 261 1
163 262 1
164 263 1
165 264 1
166 265 1
167 266 1
168 267 1
169 268 1
170 269 1
171 270 1
172 271 1
173 272 1
174 273 1
175 274 1
176 275 1
177 276 1
178 277 1
179 278 1
180 279 1
181 280 1
182 281 1
183 282 1
184 283 1
185 284 1
186 285 1
187 286 1
188 287 1
189 288 1
190 289 1
191 290 1
192 291 1
193 292 1
194 293 1
195 294 1
196 295 1
197 296 1
198 297 1
199 298 1
200 299 1
201 300 1
202 301 1
203 302 1
204 303 1
205 304 1
206 305 1
207 306 1
208 307 1
209 308 1
210 309 1
211 310 1
212 311 1
213 312 1
214 313 1
215 314 1
216 315 1
217 316 1
218 317 1
219 318 1
220 319 1
221 320 1
222 321 1
223 322 1
224 323 1
225 324 1
226 325 1
227 326 1
228 327 1
229 328 1
230 329 1
231 330 1
232 331 1
233 332 1
234 333 1
235 334 1
236 335 1
237 336 1
238 337 1
239 338 1
240 339 1
241 340 1
242 341 1
243 342 1
244 343 1
245 344 1
246 345 1
247 346 1
248 347 1
249 348 1
250 349 1
251 350 1
252 351 1
253 352 1
254 353 1
255 354 1
256 355 1
257 356 1
258 357 1
259 358 1
260 359 1
261 360 1
262 361 1
263 362 1
264 363 1
265 364 1
266 365 1
267 366 1
268 367 1
269 368 1
270 369 1
271 370 1
272 371 1
273 372 1
274 373 1
275 374 1
276 375 1
277 376 1
278 377 1
279 378 1
280 379 1
281 380 1
282 381 1
283 382 1
284 383 1
285 384 1
286 385 1
287 386 2
288 387 2
289 388 2
290 389 2
291 390 2
292 391 2
293 392 2
294 393 2
295 394 2
296 395 2
297 396 2
298 397 2
299 398 2
300 399 2
301 400 2
302 401 2
303 402 2
304 403 2
305 404 3
306 405 3
307 406 3
308 407 3
309 408 3
310 409 3
311 410 3
312 411 3
313 412 3
314 413 3
315 414 4
316 415 4
317 416 4
318 417 4
319 418 4
320 419 5
321 420 5
322 421 5
323 422 5
324 423 5
325 424 5
326 425 6
327 426 6
328 427 6
329 428 6
330 429 6
331 430 7
332 431 7
333 432 7
334 433 7
335 434 7
336 435 8
337 436 8
338 437 8
339 438 9
340 439 9
341 440 9
342 441 9
343 442 10
344 443 10
345 444 NA
346 445 NA
347 446 NA
348 447 NA
349 448 NA
350 449 NA
351 450 NA
352 451 NA
353 452 NA
354 453 NA
355 454 NA
356 455 NA
357 456 NA
358 457 NA
359 458 NA
360 459 NA
361 460 NA
362 461 NA
363 462 NA
364 463 NA
365 464 NA
366 465 NA
367 466 NA
368 467 NA
369 468 NA
370 469 NA
371 470 NA
372 471 NA
373 472 NA
374 473 NA
375 474 NA
376 475 NA
377 476 NA
378 477 NA
379 478 NA
380 479 NA
381 480 NA
382 481 NA
383 482 NA
384 483 NA
385 484 NA
386 485 NA
387 486 NA
388 487 NA
389 488 NA
390 489 NA
391 490 NA
392 491 NA
393 492 NA
394 493 NA
395 494 NA
396 495 NA
397 496 NA
398 497 NA
399 498 NA
400 499 NA
401 500 NA
A bit hacky Base R:
# Create a dataframe with all score values in the range:
score_range <- merge(data.frame(Score = c(100:500)), percentiles, by = "Score", all.x = TRUE)
# Reverse the order of the dataframe:
score_range <- score_range[rev(order(score_range$Score)),]
# Change the first NA to the maximum percentile:
score_range$Percentile[which(is.na(score_range$Percentile))][1] <- max(score_range$Percentile, na.rm = TRUE)
# Replace all NAs with the value before them:
score_range$Percentile <- na.omit(score_range$Percentile)[cumsum(!is.na(score_range$Percentile))]
Data:
percentiles <- structure(list(Score = c(231L, 385L, 403L, 413L, 418L, 424L,
429L, 434L, 437L, 441L, 443L),
Percentile = 0:10), class = "data.frame",
row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11"))
I have two matrices (or data frames) that share the same column names and the same row names, apart from a prefix.
The row names of matrix 1 carry the prefix dataTissue, and the row names of matrix 2 carry the prefix dataSerum. I would like to combine the two matrices so that matching rows are placed adjacent to each other (one on top of the other). Please see my desired output below.
I was thinking of using rbind, but I am not sure how to get this structure.
matrix 1:
> head(TumorTissue3)
020 045 080 082 084 086 088 090 091 092 094 096 1018 102 1065
dataTissue.hsa-let-7a-2-3p 1 0 1 0 0 0 1 1 0 2 0 0 0 5 0
dataTissue.hsa-let-7a-3p 2 0 0 0 1 0 1 1 0 2 1 1 1 1 0
dataTissue.hsa-let-7a-5p 67 12 25 34 40 115 42 33 26 58 22 149 64 178 52
dataTissue.hsa-let-7b-3p 11 5 10 15 1 34 29 59 16 30 11 44 11 65 3
dataTissue.hsa-let-7b-5p 4289 689 902 3340 3947 7326 3146 6249 2032 5664 1657 6619 1577 21132 720
dataTissue.hsa-let-7c-3p 1 0 0 2 0 9 2 13 2 10 2 13 5 9 0
1068 1104 112 113 1167 1196 120 121 1222 1237 1241 1302 1304 1322 134
dataTissue.hsa-let-7a-2-3p 2 11 0 0 0 0 3 0 2 1 0 0 3 5 1
dataTissue.hsa-let-7a-3p 0 0 1 0 0 1 0 0 1 4 1 0 2 3 0
dataTissue.hsa-let-7a-5p 70 266 60 8 29 99 90 37 102 93 28 22 156 214 176
dataTissue.hsa-let-7b-3p 14 15 24 12 8 8 43 25 14 33 9 12 16 38 11
dataTissue.hsa-let-7b-5p 1780 4185 5797 1168 1039 1006 10818 3269 2893 8847 3136 4990 1798 10142 3248
dataTissue.hsa-let-7c-3p 5 7 5 2 1 3 3 3 1 10 27 1 17 11 3
1372 140 145 146 1474 1532 1540 157 158 1588 1604 161 1743 176
dataTissue.hsa-let-7a-2-3p 0 0 0 0 6 1 10 0 6 1 0 1 0 2
dataTissue.hsa-let-7a-3p 1 0 1 0 0 1 0 0 3 0 0 2 0 1
dataTissue.hsa-let-7a-5p 18 1 53 17 129 54 110 2 165 70 51 165 81 77
dataTissue.hsa-let-7b-3p 3 0 22 12 46 3 60 0 79 9 15 40 3 50
dataTissue.hsa-let-7b-5p 931 245 3707 3632 16730 2653 13619 93 27568 3485 6202 18206 3094 11185
dataTissue.hsa-let-7c-3p 1 0 12 0 10 0 5 0 20 10 8 7 2 9
1808 1809 185 1859 186 1894 192 201 204 21 215 2218 236 27 32
dataTissue.hsa-let-7a-2-3p 2 1 1 0 1 0 0 0 3 0 2 6 0 3 5
dataTissue.hsa-let-7a-3p 0 0 0 0 1 0 0 0 1 0 3 3 0 1 0
dataTissue.hsa-let-7a-5p 33 160 56 16 92 63 90 3 119 58 116 46 37 137 40
dataTissue.hsa-let-7b-3p 11 1 23 10 18 3 48 14 34 16 54 23 12 96 33
dataTissue.hsa-let-7b-5p 3497 548 5575 2886 6664 1030 5895 604 8151 4076 14150 11132 2154 24793 5654
dataTissue.hsa-let-7c-3p 3 3 5 6 6 2 4 2 6 9 20 6 2 11 6
38 39 45 46 bf33 d10 HEP014 HEP015 mm7 s26 TxHEP-014 TxHEP-015
dataTissue.hsa-let-7a-2-3p 0 0 6 0 0 0 1 2 2 4 0 1
dataTissue.hsa-let-7a-3p 0 0 0 2 0 2 2 0 1 2 0 1
dataTissue.hsa-let-7a-5p 18 75 192 41 41 88 55 119 24 112 223 25
dataTissue.hsa-let-7b-3p 6 16 56 11 17 24 8 29 12 29 7 18
dataTissue.hsa-let-7b-5p 1648 2805 19275 1769 4554 5316 1552 7605 2369 7495 33820 2144
dataTissue.hsa-let-7c-3p 3 1 14 2 2 3 6 18 1 23 3 2
TxHEP-018 vs29
dataTissue.hsa-let-7a-2-3p 1 1
dataTissue.hsa-let-7a-3p 0 0
dataTissue.hsa-let-7a-5p 13 50
dataTissue.hsa-let-7b-3p 23 47
dataTissue.hsa-let-7b-5p 1631 4990
dataTissue.hsa-let-7c-3p 1 11
matrix 2:
> head(Serum3)
020 045 080 082 084 086 088 090 091 092 094 096 1018 102 1065 1068
dataSerum.hsa-let-7a-2-3p 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
dataSerum.hsa-let-7a-3p 4 2 0 2 329 1 186 0 2 4 1 6 13 7 15 3
dataSerum.hsa-let-7a-5p 988 2033 587 1480 4035 1167 4641 761 668 4118 6040 2660 10368 5802 5668 2709
dataSerum.hsa-let-7b-3p 9 8 4 18 76 3 62 1 5 24 9 9 41 10 30 6
dataSerum.hsa-let-7b-5p 1499 849 108 868 3197 202 2411 273 224 1309 943 822 5819 1594 3335 1164
dataSerum.hsa-let-7c-3p 0 0 0 0 29 0 11 0 0 0 5 0 2 0 0 1
1104 112 113 1167 1196 120 121 1222 1237 1241 1302 1304 1322 134 1372
dataSerum.hsa-let-7a-2-3p 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
dataSerum.hsa-let-7a-3p 15 2 0 8 5 2 0 11 13 51 4 0 1 7 0
dataSerum.hsa-let-7a-5p 30222 1836 1518 3902 5122 4597 983 3809 6310 3165 4023 434 2489 1496 600
dataSerum.hsa-let-7b-3p 57 2 1 14 19 14 2 14 35 162 10 0 10 11 6
dataSerum.hsa-let-7b-5p 11314 329 354 2169 2277 747 256 1157 3328 3662 1057 274 1267 991 305
dataSerum.hsa-let-7c-3p 0 0 0 0 0 0 0 0 0 10 0 0 0 0 0
140 145 146 1474 1532 1540 157 158 1588 1604 161 1743 176 1808 1809
dataSerum.hsa-let-7a-2-3p 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
dataSerum.hsa-let-7a-3p 27 10 0 2 2 5 0 40 0 0 19 0 1 4 4
dataSerum.hsa-let-7a-5p 5364 12731 670 473 1045 767 927 49689 535 8 78671 757 1502 1146 539
dataSerum.hsa-let-7b-3p 63 37 1 3 14 5 10 59 1 0 56 6 3 12 6
dataSerum.hsa-let-7b-5p 2262 3209 88 363 759 459 309 13482 234 3 15113 1064 545 587 569
dataSerum.hsa-let-7c-3p 6 0 0 0 0 0 1 0 0 0 0 0 1 0 0
185 1859 186 1894 192 201 204 21 215 2218 236 27 32
dataSerum.hsa-let-7a-2-3p 0 0 0 0 0 3 0 0 0 0 0 0 0
dataSerum.hsa-let-7a-3p 16 0 101 3 7 346 10 1 93 0 305 6 12
dataSerum.hsa-let-7a-5p 42694 528 18730 498 3410 20484 11907 1031 474051 2185 299085 14576 9218
dataSerum.hsa-let-7b-3p 24 7 164 14 16 85 29 12 111 9 145 32 12
dataSerum.hsa-let-7b-5p 5454 216 4647 182 1149 8973 2645 147 72681 807 46354 4672 2375
dataSerum.hsa-let-7c-3p 0 0 9 0 0 32 0 0 0 0 3 1 0
38 39 45 46 bf33 d10 HEP014 HEP015 mm7 s26 TxHEP-014 TxHEP-015
dataSerum.hsa-let-7a-2-3p 0 0 0 1 0 0 0 0 0 0 0 0
dataSerum.hsa-let-7a-3p 9 17 6 119 27 0 1 5 0 10 7 1
dataSerum.hsa-let-7a-5p 2395 4382 8361 9747 6440 616 2981 5851 291 1386 3709 2494
dataSerum.hsa-let-7b-3p 18 28 24 104 33 6 12 24 5 36 2 11
dataSerum.hsa-let-7b-5p 690 1756 3425 3972 2330 136 1035 2152 235 638 555 1409
dataSerum.hsa-let-7c-3p 1 3 2 10 1 0 0 0 0 2 0 0
TxHEP-018 vs29
dataSerum.hsa-let-7a-2-3p 0 0
dataSerum.hsa-let-7a-3p 0 2
dataSerum.hsa-let-7a-5p 397 266
dataSerum.hsa-let-7b-3p 0 9
dataSerum.hsa-let-7b-5p 67 182
dataSerum.hsa-let-7c-3p 0 0
Output:
dataTissue.hsa-let-7a-2-3p 23 24 35 ....
dataSerum.hsa-let-7a-2-3p 42 535 54 ....
dataTissue.hsa-let-7a-3p 234 224 35 ....
dataSerum.hsa-let-7a-3p 2 33 54 ....
We can rbind both the matrices and then order by the rownames after removing the prefix part 'dataTissue./dataSerum.' from the row names using sub.
res <- rbind(TumorTissue3, Serum3)
nm1 <- sub('^[^.]+\\.', '', row.names(res))
res[order(nm1),]
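A self-contained sketch of the same idea on two tiny matrices, so the interleaving can be checked by eye. The counts are made up; only the row name pattern follows the question. The extra seq_along() key is an explicit tie-breaker that keeps the tissue row ahead of the serum row for each miRNA.

```r
# Toy matrices with the same columns and prefixed row names (values are invented).
TumorTissue3 <- matrix(
  1:4, nrow = 2,
  dimnames = list(c("dataTissue.hsa-let-7a-2-3p", "dataTissue.hsa-let-7a-3p"),
                  c("020", "045"))
)
Serum3 <- matrix(
  5:8, nrow = 2,
  dimnames = list(c("dataSerum.hsa-let-7a-2-3p", "dataSerum.hsa-let-7a-3p"),
                  c("020", "045"))
)

# Stack the matrices, then strip everything up to the first dot to get the miRNA name.
res <- rbind(TumorTissue3, Serum3)
nm1 <- sub("^[^.]+\\.", "", rownames(res))

# Order by miRNA name; ties keep the rbind order (tissue first, then serum).
interleaved <- res[order(nm1, seq_along(nm1)), ]
rownames(interleaved)
```

This relies on the miRNA names themselves containing no dot, which holds for the names shown in the question.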