I have been trying to generate a dummy variable from the date column that marks a fixed interval in each year.
Sample data
Date <- seq(as.Date("1988-01-01"), as.Date("2018-12-31"), by="1 day")
DATASET <- data.frame(rnorm(11323), Date)
I would like to flag the interval from 20-04 to 20-08 (20 April to 20 August) in each year, coded as 1. I would be grateful for a hint with code for doing this.
You could compare the day of the year. In base R that would be
DATASET$day_of_year <- as.integer(format(DATASET$Date, "%j"))
DATASET$flag <- +(with(DATASET, ifelse(as.integer(format(Date, "%Y")) %% 4 == 0 ,
day_of_year %in% 111:233, day_of_year %in% 110:232)))
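In leap years, 20-04 is the 111th day of the year and 20-08 is the 233rd; in all other years they are the 110th and 232nd. We assign 1 when the day of the year lies between those two values, inclusive. The %% 4 == 0 test is enough for 1988 to 2018, but it is not the full leap-year rule (it would misclassify 1900 or 2100).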
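As a quick check of those day numbers, the bounds can also be computed per year from the dates themselves, so that no leap-year test has to be hard-coded. This is only a sketch: the column name value and the helper names yr, lower and upper are introduced here for illustration and are not part of the original answer.

```r
# Rebuild the sample data from the question (column name `value` is illustrative)
Date <- seq(as.Date("1988-01-01"), as.Date("2018-12-31"), by = "1 day")
DATASET <- data.frame(value = rnorm(length(Date)), Date)

# Day of year of each observation
DATASET$day_of_year <- as.integer(format(DATASET$Date, "%j"))

# Per-row bounds: day of year of 20 April and 20 August in the same year.
# as.Date() does the calendar arithmetic, so leap years are handled for us.
yr <- format(DATASET$Date, "%Y")
lower <- as.integer(format(as.Date(paste0(yr, "-04-20")), "%j"))
upper <- as.integer(format(as.Date(paste0(yr, "-08-20")), "%j"))

DATASET$flag <- as.integer(DATASET$day_of_year >= lower &
                           DATASET$day_of_year <= upper)

# Day numbers quoted in the answer
leap_days    <- format(as.Date(c("1988-04-20", "1988-08-20")), "%j")
nonleap_days <- format(as.Date(c("1989-04-20", "1989-08-20")), "%j")
```

Each year contributes 123 flagged days (20 April to 20 August inclusive), which gives an easy total to check against sum(DATASET$flag).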
Alternatively, you can try the following code, which compares month and day as a number (420 to 820) to code the interval between 20-04 and 20-08 in each year:
DATASET <- within(DATASET,
code <- ave(as.numeric(format(DATASET$Date,"%m%d")),
as.numeric(format(DATASET$Date,"%Y")),
FUN = function(x) ifelse(x>=420 & x <=820,1,0)))
A small part of the result is shown below:
> DATASET
rnorm.11323. Date code
1 -0.326546058 1988-01-01 0
2 -0.561589735 1988-01-02 0
3 -0.417091199 1988-01-03 0
4 -0.482488496 1988-01-04 0
5 0.039820482 1988-01-05 0
6 -0.285270230 1988-01-06 0
7 -1.301004464 1988-01-07 0
8 1.835118221 1988-01-08 0
9 -0.207213889 1988-01-09 0
10 1.695089989 1988-01-10 0
11 -0.618905489 1988-01-11 0
12 1.689917961 1988-01-12 0
13 -0.272349252 1988-01-13 0
14 0.585059685 1988-01-14 0
15 -0.793666725 1988-01-15 0
16 -0.276084733 1988-01-16 0
17 -0.474363507 1988-01-17 0
18 1.703568414 1988-01-18 0
19 0.011776841 1988-01-19 0
20 0.029492096 1988-01-20 0
21 -1.313446231 1988-01-21 0
22 -0.127952381 1988-01-22 0
23 -0.203861769 1988-01-23 0
24 -0.365669967 1988-01-24 0
25 -0.239937083 1988-01-25 0
26 0.620562975 1988-01-26 0
27 0.652111601 1988-01-27 0
28 -0.869191381 1988-01-28 0
29 0.130085565 1988-01-29 0
30 0.059768397 1988-01-30 0
31 0.349921562 1988-01-31 0
32 -1.087277224 1988-02-01 0
33 -1.250976040 1988-02-02 0
34 -0.970337410 1988-02-03 0
35 2.063232550 1988-02-04 0
36 -0.294777997 1988-02-05 0
37 0.535559649 1988-02-06 0
38 -0.229363577 1988-02-07 0
39 -1.819158790 1988-02-08 0
40 1.020335484 1988-02-09 0
41 0.102285275 1988-02-10 0
42 1.254992570 1988-02-11 0
43 1.584044869 1988-02-12 0
44 -0.629629933 1988-02-13 0
45 -1.073561540 1988-02-14 0
46 1.273920124 1988-02-15 0
47 -0.376367657 1988-02-16 0
48 1.331066300 1988-02-17 0
49 0.694872356 1988-02-18 0
50 0.863826292 1988-02-19 0
51 -1.411795778 1988-02-20 0
52 0.388793450 1988-02-21 0
53 -0.216112938 1988-02-22 0
54 -0.196632011 1988-02-23 0
55 0.558895841 1988-02-24 0
56 0.818765192 1988-02-25 0
57 -1.250469812 1988-02-26 0
58 0.803231988 1988-02-27 0
59 0.002634810 1988-02-28 0
60 0.252328475 1988-02-29 0
61 -0.958851197 1988-03-01 0
62 -1.448732431 1988-03-02 0
63 0.647314543 1988-03-03 0
64 0.644802476 1988-03-04 0
65 -0.087973096 1988-03-05 0
66 1.088076864 1988-03-06 0
67 -0.293465532 1988-03-07 0
68 0.141825697 1988-03-08 0
69 0.413649305 1988-03-09 0
70 -1.877052966 1988-03-10 0
71 -2.200275448 1988-03-11 0
72 -0.025524427 1988-03-12 0
73 1.236501510 1988-03-13 0
74 -0.872516837 1988-03-14 0
75 -1.063727523 1988-03-15 0
76 0.264564444 1988-03-16 0
77 0.971958801 1988-03-17 0
78 0.102470655 1988-03-18 0
79 1.369131551 1988-03-19 0
80 -0.041148284 1988-03-20 0
81 -2.476135538 1988-03-21 0
82 0.836740451 1988-03-22 0
83 0.078102241 1988-03-23 0
84 -0.949778901 1988-03-24 0
85 -0.975874102 1988-03-25 0
86 2.011305586 1988-03-26 0
87 1.441333862 1988-03-27 0
88 1.404182762 1988-03-28 0
89 -0.425158054 1988-03-29 0
90 1.250722900 1988-03-30 0
91 0.060629220 1988-03-31 0
92 -1.593162931 1988-04-01 0
93 0.475640908 1988-04-02 0
94 0.102547315 1988-04-03 0
95 -2.350611181 1988-04-04 0
96 0.185065822 1988-04-05 0
97 0.463470128 1988-04-06 0
98 1.722202344 1988-04-07 0
99 -1.344383635 1988-04-08 0
100 0.858491817 1988-04-09 0
101 -0.008338174 1988-04-10 0
102 0.572599035 1988-04-11 0
103 0.138858045 1988-04-12 0
104 -1.808541857 1988-04-13 0
105 1.308927384 1988-04-14 0
106 -2.374371017 1988-04-15 0
107 1.134519340 1988-04-16 0
108 1.604437740 1988-04-17 0
109 -0.109549779 1988-04-18 0
110 -0.011355562 1988-04-19 0
111 -1.462229758 1988-04-20 1
112 1.006583367 1988-04-21 1
113 -0.124824926 1988-04-22 1
114 1.611795681 1988-04-23 1
115 0.818715370 1988-04-24 1
116 -0.440445043 1988-04-25 1
117 0.024114452 1988-04-26 1
118 -1.418044894 1988-04-27 1
119 -0.632317886 1988-04-28 1
120 0.599948691 1988-04-29 1
121 1.055118998 1988-04-30 1
122 0.301676490 1988-05-01 1
123 -0.662547532 1988-05-02 1
124 0.425191055 1988-05-03 1
125 1.715003304 1988-05-04 1
126 -0.298346044 1988-05-05 1
127 -1.043983256 1988-05-06 1
128 -1.194283503 1988-05-07 1
129 -1.517810914 1988-05-08 1
130 0.386735460 1988-05-09 1
131 0.742102056 1988-05-10 1
132 0.953762078 1988-05-11 1
133 -0.602941007 1988-05-12 1
134 1.469329252 1988-05-13 1
135 -0.233230972 1988-05-14 1
136 0.663378860 1988-05-15 1
137 -0.749108544 1988-05-16 1
138 0.591009181 1988-05-17 1
139 0.013732152 1988-05-18 1
140 -0.774612526 1988-05-19 1
141 -1.707183964 1988-05-20 1
142 -0.808360648 1988-05-21 1
143 1.420371293 1988-05-22 1
144 0.603838459 1988-05-23 1
145 0.743964804 1988-05-24 1
146 0.059498235 1988-05-25 1
147 -0.597795793 1988-05-26 1
148 0.867167938 1988-05-27 1
149 0.441291857 1988-05-28 1
150 1.348769636 1988-05-29 1
151 -1.768938126 1988-05-30 1
152 1.070400122 1988-05-31 1
153 0.321542409 1988-06-01 1
154 -0.495030342 1988-06-02 1
155 -0.740337974 1988-06-03 1
156 -1.887552572 1988-06-04 1
157 0.805602475 1988-06-05 1
158 -0.824104379 1988-06-06 1
159 0.801460489 1988-06-07 1
160 -0.912871263 1988-06-08 1
161 -0.422677222 1988-06-09 1
162 0.126785279 1988-06-10 1
163 -0.598578319 1988-06-11 1
164 -1.535492985 1988-06-12 1
165 0.018486996 1988-06-13 1
166 -1.156209268 1988-06-14 1
167 0.656276068 1988-06-15 1
168 0.045640396 1988-06-16 1
169 0.627538985 1988-06-17 1
170 2.640792582 1988-06-18 1
171 -0.383475408 1988-06-19 1
172 -2.631633446 1988-06-20 1
173 0.772980776 1988-06-21 1
174 1.930884904 1988-06-22 1
175 2.026248604 1988-06-23 1
176 -0.134588724 1988-06-24 1
177 -0.593768442 1988-06-25 1
178 -0.427553478 1988-06-26 1
179 0.303955588 1988-06-27 1
180 -0.195481230 1988-06-28 1
181 1.231190798 1988-06-29 1
182 -0.871672993 1988-06-30 1
183 -1.002028081 1988-07-01 1
184 -0.912352588 1988-07-02 1
185 -0.714319398 1988-07-03 1
186 0.053181016 1988-07-04 1
187 0.865163557 1988-07-05 1
188 0.474865269 1988-07-06 1
189 -1.105410939 1988-07-07 1
190 -0.110529764 1988-07-08 1
191 -0.805821554 1988-07-09 1
192 -1.550774659 1988-07-10 1
193 -0.508057551 1988-07-11 1
194 -0.755394814 1988-07-12 1
195 0.993023957 1988-07-13 1
196 -0.342427853 1988-07-14 1
197 -1.481690158 1988-07-15 1
198 -0.095168751 1988-07-16 1
199 1.320208464 1988-07-17 1
200 -0.340080090 1988-07-18 1
201 -1.545902324 1988-07-19 1
202 0.389589474 1988-07-20 1
203 -0.734778233 1988-07-21 1
204 0.296933278 1988-07-22 1
205 -0.024469569 1988-07-23 1
206 1.261660247 1988-07-24 1
207 -0.136786252 1988-07-25 1
208 0.908519533 1988-07-26 1
209 1.576193030 1988-07-27 1
210 0.413044482 1988-07-28 1
211 -0.601938271 1988-07-29 1
212 0.495905040 1988-07-30 1
213 0.440665366 1988-07-31 1
214 -0.804152825 1988-08-01 1
215 -1.065705237 1988-08-02 1
216 0.149246056 1988-08-03 1
217 -0.530891226 1988-08-04 1
218 -0.879233155 1988-08-05 1
219 -0.262727374 1988-08-06 1
220 -2.244552614 1988-08-07 1
221 -1.531707789 1988-08-08 1
222 1.498847169 1988-08-09 1
223 0.810096179 1988-08-10 1
224 -1.690822775 1988-08-11 1
225 0.303456055 1988-08-12 1
226 -0.874022497 1988-08-13 1
227 0.244933676 1988-08-14 1
228 1.220193574 1988-08-15 1
229 -0.456840188 1988-08-16 1
230 1.083075786 1988-08-17 1
231 -1.769152445 1988-08-18 1
232 -1.038850200 1988-08-19 1
233 0.963345582 1988-08-20 1
234 0.036574589 1988-08-21 0
235 -2.613751531 1988-08-22 0
236 1.441930677 1988-08-23 0
237 -1.927433949 1988-08-24 0
238 -0.045661284 1988-08-25 0
239 0.974935858 1988-08-26 0
240 -1.457985965 1988-08-27 0
241 0.914085417 1988-08-28 0
242 -0.004152904 1988-08-29 0
243 1.653886738 1988-08-30 0
244 0.972947047 1988-08-31 0
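Because the month-day comparison is already vectorized, the grouping by year via ave() is not strictly needed. A minimal sketch of the same idea without it, assuming the sample data from the question (the column name value and the helper md are illustrative):

```r
# Rebuild the sample data from the question
Date <- seq(as.Date("1988-01-01"), as.Date("2018-12-31"), by = "1 day")
DATASET <- data.frame(value = rnorm(length(Date)), Date)

# Month and day as one integer, e.g. 20 April -> 420, 20 August -> 820
md <- as.integer(format(DATASET$Date, "%m%d"))

# 1 inside the interval (inclusive), 0 elsewhere; no leap-year handling needed
DATASET$code <- as.integer(md >= 420 & md <= 820)

# Number of flagged and unflagged days
table(DATASET$code)
```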
Good morning,
I need to read the following .data file: https://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/cleveland.data
For this, I tried the following without success:
f <-file("https://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/cleveland.data", open="r" ,encoding="UTF-16LE")
data <- read.table(f, dec=",", header=F)
Thank you very much for your help!
I would try to use the coatless/ucidata package to access the data.
https://github.com/coatless/ucidata
Here you can see how the package loads and processes the data file:
https://github.com/coatless/ucidata/blob/master/data-raw/heart_disease_build.R
If you wish to try out the package, you will need devtools installed. Here is what you can try:
# install.packages("devtools")
devtools::install_github("coatless/ucidata")
# load data
data("heart_disease_cl", package = "ucidata")
# show beginning rows of data
head(heart_disease_cl)
Output
age sex cp trestbps chol fbs restecg thalach exang oldpeak slope ca thal num
1 63 Male typical angina 145 233 1 probable/definite hypertrophy 150 No 2.3 downsloping 0 fixed defect 0
2 67 Male asymptomatic 160 286 0 probable/definite hypertrophy 108 Yes 1.5 flat 3 normal 2
3 67 Male asymptomatic 120 229 0 probable/definite hypertrophy 129 Yes 2.6 flat 2 reversable defect 1
4 37 Male non-anginal pain 130 250 0 normal 187 No 3.5 downsloping 0 normal 0
5 41 Female atypical angina 130 204 0 probable/definite hypertrophy 172 No 1.4 upsloping 0 normal 0
6 56 Male atypical angina 120 236 0 normal 178 No 0.8 upsloping 0 normal 0
I found another solution with RCurl (shown here with a different UCI dataset, a CSV file):
library(RCurl)
download <- getURL("http://archive.ics.uci.edu/ml/machine-learning-databases/00519/heart_failure_clinical_records_dataset.csv")
data <- read.csv(text = download)
head(data)
#Output :
age anaemia creatinine_phosphokinase diabetes ejection_fraction high_blood_pressure platelets serum_creatinine
1 75 0 582 0 20 1 265000 1.9
2 55 0 7861 0 38 0 263358 1.1
3 65 0 146 0 20 0 162000 1.3
4 50 1 111 0 20 0 210000 1.9
5 65 1 160 1 20 0 327000 2.7
6 90 1 47 0 40 1 204000 2.1
serum_sodium sex smoking time DEATH_EVENT
1 130 1 0 4 1
2 136 1 0 6 1
3 129 1 1 7 1
4 137 1 0 7 1
5 116 0 0 8 1
6 132 1 1 8 1
I am new to R and have a data frame very similar to the one below. I would love to find a general way to count how many times the number "0" appears for each country (Intro4) and id, plus 1.
Intro4 number id
221 TAN 0 19
222 TAN 0 73
223 TAN 0 73
224 TOG 0 37
225 TOG 0 58
226 UGA 0 96
227 UGA 0 112
228 UGA 0 96
229 ZAM 0 40
230 ZAM 0 99
231 ZAM 0 139
I can do it by hand, but it is a big data frame and that would take forever. count() gives me the frequency but does not split it by country. I have found a way to do it, but I would have to select and filter for each individual country (Intro4) and add 1 to the result. I was wondering if there is a quicker way to do it. The code I have tried is this:
projects <- finalr %>% select (Intro4,number,id)
projects1<-projects %>% filter (str_detect (number, "0"))
projects2<-projects1 %>%arrange (Intro4)
projects3<-sum(projects2$Intro4 == "TAN", na.rm = TRUE)
projects4<-sum(projects2$Intro4=="UGA",na.rm=TRUE)
I would be extremely grateful for any help, thank you :)
You can also do it as follows:
library(dplyr)
dat <- read.table(header = T, text =
"Intro4 number id
TAN 0 19
TAN 0 73
TAN 0 73
TOG 0 37
TOG 0 58
UGA 0 96
UGA 0 112
UGA 0 96
ZAM 0 40
ZAM 0 99
ZAM 0 139", stringsAsFactors = F)
dat %>% group_by(Intro4, id, number) %>% tally()
Which produces:
Intro4 id number n
<chr> <int> <int> <int>
1 TAN 19 0 1
2 TAN 73 0 2
3 TOG 37 0 1
4 TOG 58 0 1
5 UGA 96 0 2
6 UGA 112 0 1
7 ZAM 40 0 1
8 ZAM 99 0 1
9 ZAM 139 0 1
Assuming number can take values such as 0, 1, 2, etc., one can count the occurrences of 0 with sum(number == 0). A solution using dplyr:
library(dplyr)
df %>% group_by(Intro4, id) %>%
summarise(count = sum(number==0))
# # A tibble: 9 x 3
# # Groups: Intro4 [?]
# Intro4 id count
# <chr> <int> <int>
# 1 TAN 19 1
# 2 TAN 73 2
# 3 TOG 37 1
# 4 TOG 58 1
# 5 UGA 96 2
# 6 UGA 112 1
# 7 ZAM 40 1
# 8 ZAM 99 1
# 9 ZAM 139 1
Data:
df <- read.table(text="
Intro4 number id
221 TAN 0 19
222 TAN 0 73
223 TAN 0 73
224 TOG 0 37
225 TOG 0 58
226 UGA 0 96
227 UGA 0 112
228 UGA 0 96
229 ZAM 0 40
230 ZAM 0 99
231 ZAM 0 139",
header = TRUE, stringsAsFactors = FALSE)
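The question asks for the count plus 1, which neither answer adds. A minimal base R sketch of that last step, assuming the sample data above; the result column name zeros_plus_one is made up here for illustration:

```r
# Sample data from the question
df <- data.frame(
  Intro4 = c("TAN", "TAN", "TAN", "TOG", "TOG", "UGA", "UGA", "UGA",
             "ZAM", "ZAM", "ZAM"),
  number = 0L,
  id     = c(19, 73, 73, 37, 58, 96, 112, 96, 40, 99, 139)
)

# For each country/id pair: number of zeros, plus 1
counts <- aggregate(number ~ Intro4 + id, data = df,
                    FUN = function(x) sum(x == 0) + 1)
names(counts)[names(counts) == "number"] <- "zeros_plus_one"

# Sort by country, then id, for readability
counts <- counts[order(counts$Intro4, counts$id), ]
counts
```

With dplyr, the same adjustment would amount to adding 1 inside summarise(), for example count = sum(number == 0) + 1.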
I have started working on species distribution modeling with GLMs. I am using BIOCLIM environmental data as predictors (for example Bio10, Bio15, Bio16, Bio17) and the following data (stored in an object presausTrain):
ID bioclim_10 bioclim_11 bioclim_15 bioclim_16 pres longitude latitude
2 2 225.00000 105.00000000 22.206624 299.18014 1 -58.8786 -34.2269
3 3 228.97882 112.97809077 27.000000 319.94470 1 -59.5144 -33.7806
4 4 219.00000 104.57779206 16.000000 265.57779 1 -57.2555 -35.2549
6 6 188.00000 83.00000000 18.000000 260.42379 1 -57.5419 -38.0551
9 9 224.58419 104.73418836 23.000000 320.08305 1 -58.9186 -34.4132
10 10 243.60300 94.16917531 64.561824 85.17573 1 -68.6146 -32.8886
11 11 224.58433 104.73658836 23.000000 320.09025 1 -58.9187 -34.4133
12 12 253.00000 97.00000000 68.608231 71.99121 1 -68.5041 -32.3345
13 13 224.60863 104.75578836 23.000000 320.02305 1 -58.9195 -34.4128
15 15 245.44112 94.58706179 64.849824 84.25853 1 -68.6026 -32.8416
16 16 264.02281 151.00000000 54.022813 393.34787 1 -60.7727 -28.6506
17 17 244.67617 128.19141384 48.323829 366.28249 1 -60.6717 -31.6380
18 18 263.00000 149.49003689 53.490037 391.42668 1 -60.7500 -28.7500
19 19 272.04463 181.06767992 43.272909 412.80043 1 -58.1522 -25.1102
20 20 250.00000 132.00000000 49.877386 358.92412 1 -60.8829 -31.2539
21 21 268.54597 165.00000000 32.000000 418.09660 1 -58.0293 -28.0340
26 26 263.03251 149.36775948 53.286986 392.57182 1 -60.7333 -28.7333
27 27 262.00000 149.00000000 52.954712 394.07047 1 -60.6666 -28.7857
28 28 194.26954 91.54652958 113.000000 221.44775 1 -70.8308 -33.2159
29 29 195.00139 91.98381950 113.000000 219.30565 1 -70.8255 -33.2179
30 30 194.71515 92.34394042 113.000000 219.32903 1 -70.8312 -33.1968
31 31 194.87274 92.25693323 113.000000 218.64974 1 -70.8271 -33.2033
32 32 262.51488 149.00000000 53.000000 391.44238 1 -60.7334 -28.7999
33 33 236.09116 148.19977261 21.050265 543.87328 1 -53.9738 -25.8564
34 34 244.17649 128.15908399 47.077874 363.03794 1 -60.6339 -31.6890
36 36 249.80369 132.80368760 47.196312 364.22593 1 -60.2472 -31.2462
37 37 268.00000 164.88563766 32.000000 414.86622 1 -58.0654 -28.0482
38 38 268.00000 164.86220565 32.000000 414.68268 1 -58.0699 -28.0454
39 39 256.00000 142.51301366 48.000000 358.57247 1 -60.5333 -29.7500
40 40 255.02037 143.12581264 46.732643 438.70468 1 -59.7161 -29.3281
41 41 264.00000 151.00000000 54.000000 394.65955 1 -60.7500 -28.6500
42 42 254.54615 164.95675375 19.200389 502.30639 1 -54.4563 -25.6887
43 43 272.00000 173.71328176 36.025171 467.51253 1 -58.1000 -26.5833
44 44 286.97773 208.08168096 56.000000 292.08590 1 -59.5522 -21.2787
45 45 224.22325 78.22324976 38.606279 185.39521 1 -63.5337 -37.7471
46 46 248.74987 159.74987480 27.453648 559.43635 1 -54.2713 -25.6734
47 47 209.41746 124.45790111 107.988728 331.33831 1 -71.6073 -33.5050
48 48 244.38027 128.36415875 49.000000 369.61503 1 -60.6817 -31.5992
49 49 162.85989 96.36235347 118.491117 443.99917 1 -71.5244 -33.1645
50 50 130.32560 17.41336935 68.079547 360.58826 1 -71.1000 -40.9500
51 51 139.05510 25.70054673 69.765255 389.11327 1 -71.0837 -40.9810
52 52 209.13482 124.35046642 107.868234 332.58278 1 -71.6089 -33.5031
53 53 256.00458 165.33361100 21.301138 511.40500 1 -54.4162 -25.6967
54 54 271.00000 170.00000000 60.000000 362.54198 1 -60.4542 -25.9167
56 56 229.00000 112.35964626 25.000000 301.35039 1 -59.0210 -33.6877
57 57 119.99753 15.10747321 54.000000 471.71589 1 -71.7248 -42.7099
58 58 135.70071 20.70070732 72.827280 349.44457 1 -71.0065 -41.0595
59 59 264.00000 174.43120494 23.910081 420.64503 1 -57.0766 -26.0751
60 60 262.52382 173.72329246 25.077236 432.73019 1 -57.0500 -26.0167
62 62 179.34210 80.87832470 86.102021 594.32138 1 -72.6524 -37.8537
63 63 154.27204 63.26968212 83.647330 756.03579 1 -72.7667 -37.6333
64 64 170.36894 82.95671452 76.716261 582.33120 1 -72.9125 -38.0167
65 65 255.29339 141.05130937 44.000000 362.34977 1 -59.6919 -30.0224
68 68 244.00000 126.00000000 47.000000 373.97578 1 -60.7068 -31.8564
70 70 169.65447 81.75782454 60.138823 575.48334 1 -72.6000 -38.7333
71 71 280.00000 209.22244349 60.000000 311.98601 1 -60.0000 -20.0000
74 74 173.06376 91.94939494 86.649798 808.16328 1 -72.9333 -37.1667
75 75 93.88276 -3.88122756 123.993938 122.31361 1 -65.7049 -23.1626
77 77 244.73709 128.25037481 48.750699 368.01469 1 -60.7000 -31.6333
78 78 238.25716 118.42208981 26.120460 317.28934 1 -58.5249 -33.0121
79 79 264.68778 215.00000000 54.000000 469.93021 1 -63.0000 -17.0000
81 81 132.00000 77.00000000 37.000000 770.18289 1 -74.1167 -43.3500
82 82 204.24999 73.75029357 31.762202 275.78719 1 -60.2000 -37.3000
84 84 230.00000 113.03251305 23.367766 283.85559 1 -58.7333 -33.4833
85 85 239.68529 122.46175316 12.326192 327.46175 1 -55.7766 -32.5428
86 86 192.89750 78.09241708 19.000000 252.85173 1 -58.0658 -37.8406
87 87 127.72334 35.72334013 73.511696 1099.73574 1 -71.8167 -38.2167
90 90 225.43089 107.43089205 22.000000 304.29268 1 -58.3902 -34.8034
91 91 134.53429 72.02286008 40.000000 865.02286 1 -73.6167 -43.1167
92 92 225.07390 102.00000000 39.238986 337.55187 1 -60.7313 -34.2004
93 93 255.09615 141.09614673 15.688826 373.75971 1 -56.4500 -30.4300
95 95 168.91143 99.08857071 84.088571 593.54574 1 -73.0167 -36.7333
96 96 241.33689 219.33688825 75.000000 952.96431 1 -59.0000 -13.0000
97 97 267.51799 180.35046950 87.000000 353.46857 1 -63.0700 -20.8700
98 98 262.97274 210.03301311 61.289635 734.77698 1 -63.6667 -17.4500
99 99 217.00000 98.96529301 18.652335 283.29342 1 -57.9995 -35.5728
102 102 229.00000 107.00000000 28.000000 311.07590 1 -59.8228 -34.3834
104 104 225.00000 104.96487882 22.610418 318.55003 1 -59.0000 -34.4000
105 105 259.00000 147.04660936 24.512470 410.87977 1 -57.0944 -29.7149
107 107 244.31221 120.02550008 33.687788 366.17838 1 -59.0000 -31.8333
108 108 208.64289 87.07940941 14.000000 206.82547 1 -57.1347 -37.0029
109 109 248.30467 157.12855496 18.887519 493.23855 1 -54.3167 -25.9000
112 112 259.00000 151.00000000 22.000000 434.28496 1 -56.6444 -29.1753
113 113 227.87889 110.74950291 22.000000 310.87978 1 -58.3934 -34.7014
114 114 188.31179 83.92218970 17.311789 259.70139 1 -57.8431 -38.2656
116 116 224.69761 106.55401173 23.302389 327.10194 1 -58.5911 -34.4191
117 117 222.10785 105.00227249 25.889880 343.91302 1 -58.7179 -34.5725
118 118 200.28610 81.28609587 20.000000 248.04349 1 -58.2529 -37.8511
119 119 254.42257 162.36128278 19.840207 503.72360 1 -55.6030 -27.4438
120 120 249.80225 132.00000000 50.000000 358.77755 1 -61.0000 -31.0000
124 124 231.00000 142.00000000 15.000000 389.98814 1 -51.0936 -31.2872
126 126 235.43596 148.97624749 10.953114 404.58779 1 -50.9919 -29.9444
127 127 234.99430 153.85461324 11.073286 388.21653 1 -51.7181 -29.9433
128 128 233.60352 152.60352054 8.912486 383.80058 1 -51.3247 -29.7000
129 129 244.81880 184.81879611 42.716919 1025.62246 1 -46.4197 -23.8900
131 131 213.06989 159.02007109 60.544541 617.99241 1 -46.6339 -23.5503
133 133 212.30438 154.80780343 67.000000 636.61384 1 -46.8800 -23.1803
137 137 223.21176 165.93543578 70.980998 654.28145 1 -46.9797 -22.7003
22 2 194.00099 73.00099051 26.001083 276.00325 0 -59.1797 -37.3630
310 3 205.99766 62.99766278 20.000000 65.00000 0 -66.6797 -40.7797
410 4 267.99982 119.00012978 90.000314 163.00107 0 -67.0130 -30.1547
66 6 218.00083 127.00051598 16.000000 423.99824 0 -52.6380 -31.4047
8 8 272.99900 256.99786347 80.002135 769.99245 0 -48.2213 -10.5714
910 9 258.99943 245.00000000 20.999083 908.00078 0 -75.9297 -1.0297
1010 10 280.00232 267.00116165 86.000000 651.00936 0 -65.6380 8.3036
1110 11 279.00000 174.00000000 87.000000 465.99622 0 -63.2213 -23.1130
121 12 249.00582 217.00581833 70.999999 704.98944 0 -55.4297 -15.3214
14 14 273.00147 251.00146645 83.000000 809.99861 0 -51.3463 -12.9880
151 15 246.00666 221.00665863 85.001131 906.97968 0 -49.9713 -15.6964
161 16 263.00137 249.00250902 50.000000 835.00547 0 -71.0130 -8.1964
171 17 224.99969 99.99969124 43.000000 335.99883 0 -61.1797 -34.1547
181 18 228.99874 203.99940734 80.999335 669.99256 0 -47.6380 -15.4880
191 19 268.99981 254.99946750 98.000347 827.99056 0 -38.3880 -3.9047
201 20 76.98821 -0.01070989 17.000000 132.00969 0 -67.3463 -54.6964
25 25 229.00100 147.99999952 18.000000 521.00148 0 -53.5963 -25.3630
261 26 264.00251 247.00271798 54.000207 966.99373 0 -55.5130 0.1786
271 27 187.01335 46.01069668 35.998369 53.99662 0 -69.4713 -38.4047
281 28 228.00046 213.99999953 73.000000 815.00304 0 -60.7213 -12.6130
291 29 268.00058 262.00000000 45.000000 940.99818 0 -63.5547 -5.3630
301 30 228.01359 218.01360884 54.000000 1191.84478 0 -73.0547 9.4286
311 31 267.99977 257.99977009 12.001378 956.99770 0 -73.5130 -2.5714
321 32 253.00035 243.00034548 70.000000 897.99648 0 -51.8047 -4.6547
331 33 259.00000 242.99977322 58.000000 746.00023 0 -71.0130 -10.3214
35 35 266.00115 234.00023820 129.998376 343.98683 0 -80.1380 -2.8214
361 36 239.00091 158.00203076 9.999796 490.99858 0 -53.3880 -27.0714
371 37 256.00107 223.00214169 82.000471 710.99743 0 -49.6797 -18.4880
381 38 264.99942 250.99884783 29.998848 1096.99310 0 -70.4297 1.0536
391 39 15.01118 -12.98915222 81.002712 506.99793 0 -75.9713 -12.2797
401 40 259.00048 148.99941053 29.000000 407.99775 0 -57.8047 -29.1130
411 41 271.00000 261.99999933 39.998855 874.99310 0 -62.3047 -4.3214
421 42 245.99772 232.99637349 74.000000 1106.00622 0 -54.6380 -9.0714
431 43 270.00000 254.99884138 37.001159 1035.03518 0 -58.9713 5.6786
441 44 210.99715 171.99887466 77.000001 800.00466 0 -47.0547 -19.8214
451 45 258.99980 247.00000000 67.000000 783.00282 0 -53.5130 -4.5714
461 46 290.00000 265.00000000 85.999424 911.00865 0 -68.5547 7.2203
471 47 278.00101 181.99999886 68.998878 276.00123 0 -61.2213 -23.4464
481 48 262.99508 220.99508159 72.002646 313.03257 0 -36.8463 -9.9047
491 49 268.00000 247.99999990 59.000000 800.00140 0 -67.8880 -10.6130
501 50 248.99900 185.99899706 47.000000 460.00236 0 -52.2213 -22.2797
511 51 263.00095 228.00095374 92.001329 343.99639 0 -39.7630 -8.0714
521 52 266.00199 258.00132718 41.000444 959.00026 0 -65.0130 3.1370
531 53 251.00102 214.00036761 23.000651 262.00170 0 -39.2630 -12.6130
55 55 258.00020 248.00020222 74.999546 910.00354 0 -51.3880 -4.7380
561 56 223.00509 177.00376930 87.000000 680.00092 0 -42.7630 -18.0714
571 57 232.00792 162.00508410 91.000000 439.99713 0 -64.1797 -20.3214
581 58 122.97838 92.97610938 109.995439 423.17753 0 -73.3047 -15.0714
591 59 219.00298 160.00429666 73.998683 707.97064 0 -45.2213 -22.9047
61 61 278.00059 265.99955351 73.000000 994.99690 0 -60.5547 1.3870
621 62 264.99965 251.99965249 68.999652 931.99988 0 -54.5547 -3.0297
631 63 262.00057 252.99885628 31.000000 784.00057 0 -71.3047 -5.6964
641 64 276.00000 188.99909010 43.999678 324.00000 0 -58.9297 -23.7797
67 67 140.99782 16.99759906 42.000450 52.99995 0 -69.7630 -44.2380
69 69 91.02975 5.03139201 43.000502 727.18200 0 -72.5547 -43.3630
701 70 238.99996 150.99995612 22.999681 546.00091 0 -53.9297 -25.7797
73 73 156.99829 31.99828703 24.000743 55.00000 0 -70.3880 -48.0297
741 74 231.00043 215.00021260 75.000213 952.00132 0 -59.7630 -12.7380
76 76 187.00000 58.99890366 25.001097 58.00110 0 -65.6797 -43.5714
771 77 267.99832 252.99831923 105.998772 684.01876 0 -38.5547 -4.1130
781 78 269.99889 131.00022074 82.999779 177.00133 0 -66.3880 -30.1964
80 80 262.00102 254.00102180 41.000000 892.99807 0 -67.9713 -5.3214
811 81 255.00000 142.00000000 36.000000 376.99893 0 -58.1797 -29.7380
83 83 193.00076 91.00030596 94.000000 539.00085 0 -72.3880 -36.2797
841 84 278.99976 263.99975911 50.000000 1121.00142 0 -56.5547 -1.0297
851 85 250.00067 227.00067291 82.000000 917.00517 0 -55.3880 -13.7380
861 86 260.00000 250.99977489 23.000000 824.99751 0 -69.7630 -3.6130
89 89 282.00038 256.00019033 99.000532 413.00270 0 -41.8880 -7.5297
901 90 270.00000 225.99941948 57.998288 982.01583 0 -65.1797 -15.3214
911 91 267.99899 253.99898537 46.000000 491.01211 0 -61.5963 7.4703
921 92 219.00059 87.99889395 39.000000 319.99779 0 -61.3047 -36.3214
94 94 225.99978 86.99977939 72.999470 271.99907 0 -66.1380 -33.5297
951 95 116.00396 4.00263725 30.998909 74.99845 0 -71.5130 -48.0714
961 96 208.00089 134.00089034 13.000000 504.99850 0 -52.0130 -28.4880
971 97 259.99964 235.00031380 74.000666 869.99228 0 -57.2213 -14.6547
981 98 231.00000 79.00000000 49.999780 166.99961 0 -66.0130 -37.1547
101 101 245.99606 236.99707761 29.998848 1009.00190 0 -65.4713 1.5536
103 103 259.99783 246.99885797 54.001024 1039.00717 0 -68.7213 -7.5714
1041 104 244.00153 195.00152628 76.999074 658.99242 0 -49.1797 -21.0297
106 106 263.99796 253.99796462 71.001017 1446.01159 0 -64.2213 6.1786
1071 107 173.99788 103.99765761 173.996402 14.00164 0 -69.5130 -19.6130
1081 108 240.00781 232.00780741 34.000000 791.98642 0 -63.4713 3.1786
111 111 216.99799 105.99696451 108.002055 399.00018 0 -71.5963 -34.3630
1121 112 264.00119 249.00118836 57.000594 916.99807 0 -54.0547 1.1370
1131 113 265.99668 250.99667928 40.000000 687.99853 0 -74.0130 -8.0297
115 115 83.00514 44.00536119 101.999774 432.98950 0 -70.2630 -15.8630
1161 116 238.00000 118.00000000 17.000590 353.99879 0 -57.5547 -33.2797
1171 117 251.00441 219.00441215 91.999012 314.99232 0 -40.7213 -8.3214
1181 118 271.00022 262.00021686 33.000000 883.99764 0 -62.8880 -3.1964
1191 119 262.00000 252.99942668 25.000000 766.99989 0 -70.9297 -5.0297
123 123 276.00021 263.99964610 71.000215 887.00057 0 -61.5130 1.9703
125 125 247.99556 205.99494311 57.999540 274.00241 0 -36.8047 -9.1130
1261 126 249.99882 231.99882490 71.999271 452.98961 0 -67.4297 10.0120
1271 127 278.00000 236.00000000 69.000000 578.00059 0 -56.9297 -16.5714
1281 128 288.00021 198.99989033 65.999786 306.99978 0 -60.7630 -22.2797
130 130 283.00046 259.00091135 91.999846 722.00153 0 -42.1797 -5.7380
132 132 191.01856 177.01912717 68.003215 1108.93325 0 -73.5547 11.0953
136 136 261.00000 246.00102657 59.000000 726.00457 0 -71.2213 -9.6130
With the following expression for the model structure:
model <- pres ~ bioclim_10 + I(bioclim_10^2) + bioclim_11 + I(bioclim_11^2) + bioclim_15 + I(bioclim_15^2) + bioclim_16 + I(bioclim_16^2)
and the following call to fit the GLM:
GLM <- glm(model, family=binomial(link=logit), data=presausTrain)
the results contain many negative values for the projected suitability of the species:
projecaoSuitability <- predict(predictors, GLM)
plot(projecaoSuitability, main='Myocastor coypus')
For example, at the point with coordinates (-41.55306, -12.39342), the model predicts:
pointXY = data.frame(-41.55306, -12.39342)
suitabAtPoint = extract(predictors,pointXY)
predictedSuitabilityAtPoint = predict(GLM, as.data.frame(suitabAtPoint))
-2.167515
I think the negative values occur because of some mistake of mine, as they make no sense, and my Random Forest, Maxent and Bioclim models return values ranging from 0 to 1.
Can someone help me, please?
You need to pass type = 'response' in your call to predict (the default is type = 'link'). This gives you fitted probabilities between 0 and 1 rather than values on the scale of the linear predictor, which for a logit link are log-odds and can be negative.
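To illustrate the difference between the two scales, here is a small self-contained sketch on simulated data (the data, the variable names and the model are invented for the example and have nothing to do with the species data above). It shows that the response-scale prediction is the inverse logit, plogis(), of the link-scale prediction.

```r
set.seed(1)

# Simulated presence/absence data with one predictor (illustrative only)
d <- data.frame(x = rnorm(200))
d$pres <- rbinom(200, 1, plogis(0.5 + 1.2 * d$x))

fit <- glm(pres ~ x + I(x^2), family = binomial(link = "logit"), data = d)

new_points <- data.frame(x = c(-2, 0, 2))

# Default: linear predictor (log-odds), can be any real number
eta <- predict(fit, new_points)

# Probabilities in [0, 1]
p <- predict(fit, new_points, type = "response")

# The two are linked by the inverse logit
all.equal(as.numeric(plogis(eta)), as.numeric(p))

# The value from the question, converted to a probability (about 0.10)
p_question <- plogis(-2.167515)

# Not run: with the raster package, extra arguments are passed on to the
# model's predict method, so the map could be produced on the same scale:
# projecaoSuitability <- predict(predictors, GLM, type = "response")
```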
I am experimenting with PCA in R. I have the following data:
V2 V3 V4 V5 V6 V7 V8 V9 V10 V11
2454 0 168 290 45 1715 61 551 245 30 91
222 188 94 105 60 3374 615 7 294 0 169
552 0 0 465 0 3040 0 0 771 0 0
2872 0 0 0 0 3380 0 289 0 0 0
2938 0 56 56 0 2039 538 311 113 0 254
2849 0 0 332 0 2548 0 332 0 0 221
3102 0 0 0 0 2690 0 0 0 807 807
3134 0 0 0 0 2897 289 144 144 144 0
558 0 0 0 0 3453 0 0 0 0 0
2893 0 262 175 0 2452 350 1138 262 87 175
552 0 0 351 0 3114 0 0 678 0 0
2874 0 109 54 0 2565 272 1037 109 0 0
1396 0 0 407 0 1730 0 0 305 0 0
2866 0 71 179 0 2403 358 753 35 107 143
449 0 0 0 0 2825 0 0 0 0 0
2888 0 0 523 0 2615 104 627 209 0 0
2537 0 57 0 0 1854 0 0 463 0 0
2873 0 0 342 0 3196 0 114 0 0 114
720 0 0 365 4 2704 0 4 643 4 0
218 125 31 94 219 2479 722 0 219 0 94
to which I apply the following code:
fit <- prcomp(data)
ev <- fit$rotation # pc loadings
As a test, I wanted to see which data matrix I get back when I keep all the components:
numberComponentsKept = 10
featureVector = ev[,1:numberComponentsKept]
newData <- as.matrix(data)%*%as.matrix(featureVector)
The newData matrix should be the same as the original one, but instead, I get a very different result:
PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 PC10
2454 1424.447 867.5986 514.0592 -155.4783720 -574.7425 85.38724 -86.71887 90.872507 4.305168 92.08284
222 3139.681 1020.4150 376.3165 471.8718398 -796.9549 142.14301 -119.86945 32.919950 -31.269467 32.55846
552 2851.544 539.6075 883.3969 -93.3579153 -908.6689 68.34030 -40.97052 -13.856931 23.133566 89.00851
2872 3111.317 1210.0187 433.0382 -144.4065362 -381.2305 -20.08927 -49.03447 9.569258 44.201571 70.13113
2938 1788.334 945.8162 189.6526 308.7703509 -593.5577 124.88484 -109.67276 -115.127348 14.170615 99.19492
2849 2291.839 978.1819 374.7567 -243.6739292 -496.8707 287.01065 -126.22501 -18.747873 54.080763 62.80605
3102 2530.989 814.7548 -510.5978 -410.6295894 -1015.3228 46.85727 -21.20662 14.696831 23.687923 72.37691
3134 2679.430 970.1323 311.8627 124.2884480 -536.4490 -26.23858 83.86768 -17.808390 -28.802387 92.09583
558 3268.599 988.2515 353.6538 -82.9155988 -342.5729 12.96219 -60.94886 18.537087 7.291126 96.14917
2893 1921.761 1664.0084 631.0800 -55.6321469 -864.9628 -28.11045 -104.78931 37.797727 -12.078535 104.88374
552 2927.108 607.6489 799.9602 -79.5494412 -827.6994 14.14625 -50.12209 -14.020936 29.996639 86.72887
2874 2084.285 1636.7999 621.6383 -49.2934502 -577.4815 -67.27198 -11.06071 -7.167577 47.395309 51.02962
1396 1618.171 337.4320 488.2717 -100.1663625 -469.8857 212.37199 -1.19409 13.531485 -23.332701 64.58806
2866 2007.261 1387.6890 395.1586 0.8640971 -636.1243 133.41074 12.34794 -26.969634 5.506828 74.13767
449 2674.136 808.5174 289.3345 -67.8356695 -280.2689 10.60475 -49.86404 15.165731 5.965083 78.66244
2888 2254.171 1162.4988 749.7230 -206.0215007 -652.2364 302.36320 40.76341 -1.079259 17.635956 57.86999
2537 1747.098 371.8884 429.1309 9.3761544 -480.7130 -196.25019 -81.31580 2.819608 24.089379 56.91885
2873 2973.872 974.3854 433.7282 -197.0601947 -478.3647 301.96576 -81.81105 14.516646 -1.191972 100.79057
720 2537.535 504.4124 744.5909 -78.1162036 -771.1396 38.17725 -36.61446 -9.079443 25.488688 78.21597
218 2292.718 800.5257 260.6641 603.3295960 -641.9296 187.38913 11.71382 70.011487 78.047216 96.10967
What did I do wrong?
I think this is a PCA problem rather than an R problem. You multiply the original data by the rotation matrix and then wonder why newData != data. The two would be equal only if the rotation matrix were the identity matrix.
What you probably meant to do is the following:
# Run PCA:
fit <- prcomp(USArrests)
ev <- fit$rotation # pc loadings
# Reversed PCA:
head(fit$x%*% t(as.matrix(ev)))
# Centered Original data:
head(t(apply(USArrests,1,'-',colMeans(USArrests))))
In the last step you have to center the data, because prcomp centers the columns by default.
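To get back the original (uncentered) data, the column means stored in fit$center can be added again after rotating back. A minimal sketch using the same USArrests example; the names X, Xc, recon, k and approx_k are illustrative:

```r
fit <- prcomp(USArrests)   # centers by default, does not scale
X <- as.matrix(USArrests)

# The scores are the centered data multiplied by the rotation matrix
Xc <- sweep(X, 2, fit$center, "-")
scores_ok <- all.equal(Xc %*% fit$rotation, fit$x, check.attributes = FALSE)

# Full reconstruction: rotate back, then add the column means
recon <- sweep(fit$x %*% t(fit$rotation), 2, fit$center, "+")
recon_ok <- all.equal(recon, X, check.attributes = FALSE)

# Approximation that keeps only the first k components
k <- 2
approx_k <- sweep(fit$x[, 1:k] %*% t(fit$rotation[, 1:k]), 2, fit$center, "+")
max(abs(approx_k - X))     # reconstruction error of the rank-k approximation
```

With all components kept the reconstruction is exact up to floating-point error; with fewer components it is only an approximation.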
Hi, I am using a matrix of gene expression fragment counts to find differentially expressed genes. I would like to know how to remove the rows whose values are all 0. My data set will then be more compact, and the downstream analysis I do with this matrix will give fewer spurious results.
Input
gene ZPT.1 ZPT.0 ZPT.2 ZPT.3 PDGT.1 PDGT.0
XLOC_000001 3516 626 1277 770 4309 9030
XLOC_000002 342 82 185 72 835 1095
XLOC_000003 2000 361 867 438 454 687
XLOC_000004 143 30 67 37 90 236
XLOC_000005 0 0 0 0 0 0
XLOC_000006 0 0 0 0 0 0
XLOC_000007 0 0 0 0 1 3
XLOC_000008 0 0 0 0 0 0
XLOC_000009 0 0 0 0 0 0
XLOC_000010 7 1 5 3 0 1
XLOC_000011 63 10 19 15 92 228
Desired output
gene ZPT.1 ZPT.0 ZPT.2 ZPT.3 PDGT.1 PDGT.0
XLOC_000001 3516 626 1277 770 4309 9030
XLOC_000002 342 82 185 72 835 1095
XLOC_000003 2000 361 867 438 454 687
XLOC_000004 143 30 67 37 90 236
XLOC_000007 0 0 0 0 1 3
XLOC_000010 7 1 5 3 0 1
XLOC_000011 63 10 19 15 92 228
For now I only want to remove the rows in which all the fragment count columns are 0. If a row has some values that are 0 and others that are nonzero, I would like to keep that row intact, as you can see in my example above.
Please let me know how to do this.
df[apply(df[,-1], 1, function(x) !all(x==0)),]
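A vectorized variant of the same filter uses rowSums() instead of apply(), which tends to be faster on large count matrices. A minimal sketch on the sample data from the question, assuming the first column holds the gene names (the names keep and df_nonzero are illustrative):

```r
df <- read.table(header = TRUE, text = "
gene ZPT.1 ZPT.0 ZPT.2 ZPT.3 PDGT.1 PDGT.0
XLOC_000001 3516 626 1277 770 4309 9030
XLOC_000002 342 82 185 72 835 1095
XLOC_000003 2000 361 867 438 454 687
XLOC_000004 143 30 67 37 90 236
XLOC_000005 0 0 0 0 0 0
XLOC_000006 0 0 0 0 0 0
XLOC_000007 0 0 0 0 1 3
XLOC_000008 0 0 0 0 0 0
XLOC_000009 0 0 0 0 0 0
XLOC_000010 7 1 5 3 0 1
XLOC_000011 63 10 19 15 92 228
")

# TRUE for rows with at least one nonzero count (column 1 is the gene name)
keep <- rowSums(df[, -1] != 0) > 0

df_nonzero <- df[keep, ]
df_nonzero
```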
Many options for doing this within the tidyverse have been posted here: How to remove rows where all columns are zero using dplyr pipe
My preferred option is rowwise() (replace col1, col2, col3 with the names of your count columns):
library(tidyverse)
df <- df %>%
rowwise() %>%
filter(sum(c(col1,col2,col3)) != 0)