A sample of my data is available here.
I am trying to calculate the growth rate (change in weight (wt) over time) for each squirrel.
When I have my data in wide format:
squirrel fieldBirthDate date1 date2 date3 date4 date5 date6 age1 age2 age3 age4 age5 age6 wt1 wt2 wt3 wt4 wt5 wt6 litterid
22922 2017-05-13 2017-05-14 2017-06-07 NA NA NA NA 1 25 NA NA NA NA 12 52.9 NA NA NA NA 7684
22976 2017-05-13 2017-05-16 2017-06-07 NA NA NA NA 3 25 NA NA NA NA 15.5 50.9 NA NA NA NA 7692
22926 2017-05-13 2017-05-16 2017-06-07 NA NA NA NA 0 25 NA NA NA NA 10.1 48 NA NA NA NA 7719
I am able to calculate growth rate with the following code:
library(dplyr)
#growth rate between weight 1 and weight 3, divided by age when weight 3 is recorded
growth <- growth %>%
mutate (g.rate=((wt3-wt1)/age3))
#growth rate between weight 1 and weight 2, divided by age when weight 2 is recorded
merge.growth <- merge.growth %>%
mutate (g.rate=((wt2-wt1)/age2))
However, when the data is in long format (a format needed for the analysis I am running afterwards):
squirrel litterid date age wt
22922 7684 2017-05-13 0 NA
22922 7684 2017-05-14 1 12
22922 7684 2017-06-07 25 52.9
22976 7692 2017-05-13 1 NA
22976 7692 2017-05-16 3 15.5
22976 7692 2017-06-07 25 50.9
22926 7719 2017-05-14 0 10.1
22926 7719 2017-06-08 25 48
I cannot use the mutate function I used above. I am hoping to create a new column that includes growth rate as follows:
squirrel litterid date age wt g.rate
22922 7684 2017-05-13 0 NA NA
22922 7684 2017-05-14 1 12 NA
22922 7684 2017-06-07 25 52.9 1.704
22976 7692 2017-05-13 1 NA NA
22976 7692 2017-05-16 3 15.5 NA
22976 7692 2017-06-07 25 50.9 1.609
22926 7719 2017-05-14 0 10.1 NA
22926 7719 2017-06-08 25 48 1.516
22758 7736 2017-05-03 0 8.8 NA
22758 7736 2017-05-28 25 43 1.368
22758 7736 2017-07-05 63 126 1.860
22758 7736 2017-07-23 81 161 1.879
22758 7736 2017-07-26 84 171 1.930
I have been calculating the growth rates (growth between each wt and the first time it was weighed) in excel, however I would like to do the calculations in R instead since I have a large number of squirrels to work with. I suspect if else loops might be the way to go here, but I am not well versed in that sort of coding. Any suggestions or ideas are welcome!
You can use group_by to calculate this for each squirrel:
group_by(df, squirrel) %>%
mutate(g.rate = (wt - nth(wt, which.min(is.na(wt)))) /
(age - nth(age, which.min(is.na(wt)))))
That leaves NaNs where the age term is zero, but you can change those to NAs if you want with df$g.rate[is.nan(df$g.rate)] <- NA.
alternative using data.table and its function "shift" that takes the previous row
library(data.table)
df= data.table(df)
df[,"growth":=(wt-shift(wt,1))/age,by=.(squirrel)]
Related
I need to get a weather dataset ready as input to keras. I have 1096 entries over 3 years of daily data of which first month is missing. I got one of the columns filled in for temperature from a nearby weather station. However, to check which imputation fits best, I deleted these 30 values and kept all columns as NA for first month. Then, I tried various imputing packages including 1. Mice - gave continuous values but too high average; 2. KNN (VIM) gave a constant value too high 3.MissForest - gave constant value too high; 4. imputeTS_interpolation - gave continuous value slightly low; 5. imputeTS_seasonal - gave constant value slight low.
Therefore, I selected imputeTS_interpolation. And used this to now impute the remaining columns after filling the temperature column with actual values. However, I cannot seem to get the seasonality in imputeTS working.
Any idea why? Please find below the data file and code used:
Code:
# #MICE did not match with historical data too high avg of 10.2
# impute <- mice(gf, method = "pmm")
# print(impute)
# xyplot(impute, Temp ~ Reco | .imp, pch = 20, cex =1.4)
# mf <- complete(impute, 3)
# mf <- cbind(mf, Date = df$Date.Time)
# write.csv(mf, "Mice_imputed.csv", row.names=TRUE)
# View(mf)
#
# View(gf_impo)
# ##Using Miss Forest ;) too high contstant 9.3
# gf_impo <- missForest(gf, maxiter = 100, ntree = 500)
# gf_impo$ximp
# gf_impo <- cbind(gf_impo, Date = df$Date.Time)
# write.csv(gf_impo$ximp, "Val_MissForest_imputed.csv", row.names=TRUE)
# class(gf_impo)
##KNN using VIM too high constant 13
imp_knn <- kNN(gf, k = 500)
aggr(imp_knn, delimiter = "imp")
View(imp_knn)
imp_knn <- cbind(imp_knn, Date = df$Date.Time)
write.csv(imp_knn, "Val_KNN_imputed.csv", row.names=TRUE)
View(imp_seas)
#imputeTS
#for seasonal imputation
imp_seas <- gf
imp_seas <- cbind(imp_seas, Date = df$Date.Time)
View(imp_seas)
View(imp_TS_intn)
imp_TS_intn <- na_interpolation(imp_seas, option = "spline") #avg of 2.83 close to real 4.1
# imp_TS_seas <- na_seasplit(imp_seas, algorithm = "interpolation", find_frequency = FALSE, maxgap = Inf)
#const 2.7
write.csv(imp_TS_intn, "ML_impTS_interpolate.csv", row.names=TRUE)
DATA:
A B C D E Date
1 NA NA NA 5.4000000 NA 2018-01-01
2 NA NA NA 5.7500000 NA 2018-01-02
3 NA NA NA 6.8000000 NA 2018-01-03
4 NA NA NA 6.3500000 NA 2018-01-04
5 NA NA NA 3.3500000 NA 2018-01-05
6 NA NA NA 3.0500000 NA 2018-01-06
7 NA NA NA 2.2000000 NA 2018-01-07
8 NA NA NA 0.6500000 NA 2018-01-08
9 NA NA NA 2.8500000 NA 2018-01-09
10 NA NA NA 2.2000000 NA 2018-01-10
11 NA NA NA 2.3500000 NA 2018-01-11
12 NA NA NA 5.1000000 NA 2018-01-12
13 NA NA NA 6.5500000 NA 2018-01-13
14 NA NA NA 5.0000000 NA 2018-01-14
15 NA NA NA 5.7500000 NA 2018-01-15
16 NA NA NA 2.0000000 NA 2018-01-16
17 NA NA NA 5.0500000 NA 2018-01-17
18 NA NA NA 3.8500000 NA 2018-01-18
19 NA NA NA 2.4500000 NA 2018-01-19
20 NA NA NA 5.1500000 NA 2018-01-20
21 NA NA NA 6.7500000 NA 2018-01-21
22 NA NA NA 9.2500000 NA 2018-01-22
23 NA NA NA 9.5000000 NA 2018-01-23
24 NA NA NA 6.4500000 NA 2018-01-24
25 NA NA NA 5.4000000 NA 2018-01-25
26 NA NA NA 5.3500000 NA 2018-01-26
27 NA NA NA 6.5500000 NA 2018-01-27
28 NA NA NA 10.1000000 NA 2018-01-28
29 NA NA NA 6.6000000 NA 2018-01-29
30 NA NA NA 3.8500000 NA 2018-01-30
31 NA NA NA 2.9000000 NA 2018-01-31
32 0.05374951 0.041144312 0.0023696211 5.9902083 0.068784302 2018-02-01
33 0.07565470 0.012326176 0.0057481689 10.5280417 0.176209125 2018-02-02
34 0.04476314 0.113718139 0.0089845444 12.8125000 0.176408788 2018-02-03
35 0.01695546 0.060965133 -0.0034163682 16.9593750 0.000000000 2018-02-04
36 0.09910202 0.090170142 -0.0111946461 10.4867292 0.088337951 2018-02-05
37 0.08514839 0.026061013 -0.0029183210 7.1662500 0.085590326 2018-02-06
38 0.06724108 0.104761909 -0.0416036605 6.9130417 0.134828348 2018-02-07
39 0.07638534 0.097570813 -0.0192784571 3.3840000 0.029682717 2018-02-08
40 0.02568162 0.008244304 -0.0288903610 12.0282292 0.055817103 2018-02-09
41 0.02752688 0.088544666 -0.0172136911 6.8694792 0.098169954 2018-02-10
42 0.06643098 0.063321337 -0.0347752292 7.4539792 0.034110652 2018-02-11
43 0.09743445 0.057502178 0.0162851223 13.9365208 0.264168082 2018-02-12
44 0.09189575 0.034429904 0.0020940613 13.8687292 0.162341764 2018-02-13
45 0.07857244 0.009406862 0.0075904680 11.7800000 0.101283946 2018-02-14
46 0.01987263 0.024783795 -0.0088742973 4.4463750 0.063949011 2018-02-15
47 0.02332892 0.010138857 0.0091396448 5.6452292 0.034708981 2018-02-16
48 0.02022396 0.014207518 0.0036018714 14.2862500 0.043205299 2018-02-17
49 0.07043020 0.075317793 0.0036760070 5.5940208 0.171898590 2018-02-18
50 0.02120779 0.010461857 -0.0277470177 13.6131250 0.061486533 2018-02-19
51 0.06405819 0.034185344 0.0173606568 7.0551042 0.052148976 2018-02-20
52 0.09428869 0.026957653 0.0016863903 6.7955000 0.085888435 2018-02-21
53 0.04248937 0.048782786 0.0004039921 17.5706250 0.000000000 2018-02-22
54 0.02076763 0.038094949 -0.0003671638 14.8379167 0.000000000 2018-02-23
55 0.01343260 0.118003726 -0.0214988345 6.4564583 0.053353606 2018-02-24
56 0.05231647 0.054454132 -0.0098012290 7.8568333 0.183326943 2018-02-25
57 0.02476706 0.087501472 0.0031839472 15.7493750 0.210616272 2018-02-26
58 0.07358998 0.023558218 0.0031618607 10.8001250 0.241602571 2018-02-27
59 0.02042573 0.009268439 0.0088051496 7.2967500 0.251608940 2018-02-28
60 0.02107772 0.083567750 -0.0037223644 6.2674375 0.062221630 2018-03-01
61 0.05830801 0.029456683 0.0114978078 13.0810417 0.193765948 2018-03-02
62 0.02923587 0.070533843 0.0068299668 14.4095833 0.244310193 2018-03-03
63 0.02570283 0.058270093 0.0137174366 3.8527917 0.120846709 2018-03-04
64 0.01434395 0.014637405 0.0051951050 9.6877083 0.112579011 2018-03-05
65 0.06426214 0.078872579 0.0068664343 4.6763750 0.000000000 2018-03-06
66 0.04782772 0.011762501 0.0086182870 12.7027083 0.129606106 2018-03-07
67 0.01809136 0.105398844 0.0231671305 10.8052083 0.017683908 2018-03-08
68 0.04427582 0.020397435 -0.0009758693 6.5983333 0.041148864 2018-03-09
69 0.05123687 0.115984361 -0.0372104856 6.5021250 0.180013174 2018-03-10
70 0.01913266 0.005981014 -0.0159701842 8.9844375 0.095262921 2018-03-11
71 0.04407234 0.009142247 -0.0031640496 7.7638333 0.000000000 2018-03-12
72 0.09108709 0.038174205 0.0005654564 5.3772083 0.044105747 2018-03-13
73 0.05488394 0.115153937 0.0192819858 8.9182917 0.039993864 2018-03-14
74 0.03726892 0.067983475 -0.0311367032 2.4423333 0.066108171 2018-03-15
75 0.05563102 0.003831231 -0.0011148743 10.7100000 0.217461791 2018-03-16
76 0.04922930 0.055446609 0.0075246331 5.0829375 0.149530704 2018-03-17
77 0.02972858 0.061966039 -0.0392014211 12.3645625 0.060670492 2018-03-18
78 0.02812688 0.018183092 0.0134514770 9.0172292 0.158435250 2018-03-19
79 0.03066101 0.007622504 -0.0249482114 6.2709792 0.118487919 2018-03-20
80 0.06801767 0.083261012 0.0133423296 13.3683333 0.196053774 2018-03-21
81 0.04178157 0.093600914 0.0116253865 10.0024167 0.020835522 2018-03-22
82 0.04725052 0.018187748 -0.0115718535 10.3528333 0.097352796 2018-03-23
83 0.02042339 0.081504844 -0.0380958738 17.2006250 0.010500742 2018-03-24
84 0.06674396 0.098739090 -0.0108474961 17.5437500 0.119415595 2018-03-25
85 0.07049507 0.016286614 -0.0007817195 16.8800000 0.060452087 2018-03-26
86 0.01244906 0.018100693 -0.0266155999 8.8651458 0.018144668 2018-03-27
87 0.05271711 0.015368632 -0.0477885811 7.2415417 0.092797451 2018-03-28
88 0.01610886 0.014919094 0.0023487944 7.7914792 0.062818728 2018-03-29
89 0.08847253 0.059397043 0.0130362880 10.9732708 0.087451484 2018-03-30
90 0.02938725 0.044473745 0.0091253257 6.0241458 0.025488946 2018-03-31
91 0.08599249 0.043160908 0.0082536160 8.8211875 0.012975783 2018-04-01
92 0.05747667 0.017709243 -0.0090965038 6.3249375 0.065731818 2018-04-02
93 0.05772051 0.085210524 -0.0013533831 13.4166667 0.067790160 2018-04-03
94 0.01699834 0.020657341 0.0039885065 3.2999792 0.076302652 2018-04-04
95 0.03565076 0.110372607 -0.0313309140 12.7822083 0.184844707 2018-04-05
96 0.02050401 0.078943608 -0.0062322339 4.3233125 0.067820413 2018-04-06
97 0.06186790 0.013147512 0.0203249289 6.3953750 0.034104318 2018-04-07
98 0.06304988 0.012997642 0.0061171825 9.7322708 0.021220516 2018-04-08
99 0.03799006 0.012420760 0.0054724563 8.8472083 0.068664033 2018-04-09
100 0.01610225 0.061182804 0.0031002885 7.5622708 0.085766429 2018-04-10
101 0.05937683 0.008333173 -0.0053972689 7.8848542 0.058386726 2018-04-11
102 0.02190115 0.037843227 0.0089823372 8.3339792 0.055761391 2018-04-12
103 0.01179665 0.016899394 -0.0016533437 5.5101667 0.099133313 2018-04-13
104 0.02464707 0.021231270 -0.0212016846 15.5106250 0.126661378 2018-04-14
105 0.01906818 0.065273389 0.0081694393 7.6616667 0.032939519 2018-04-15
106 0.05418785 0.074619385 -0.0355680586 11.3618750 0.057768261 2018-04-16
107 0.06508988 0.014345229 0.0080423912 14.7137500 0.032709791 2018-04-17
108 0.06101126 0.060624597 -0.0399526978 17.2754167 0.230982139 2018-04-18
109 0.02226268 0.010230837 0.0001617419 2.9382083 0.000000000 2018-04-19
110 0.03884772 0.014218453 0.0039652960 10.7261875 0.179962834 2018-04-20
111 0.09054488 0.025711098 -0.0115944362 4.4734583 0.011442318 2018-04-21
112 0.03072171 0.076530730 0.0032123501 9.4128750 0.033174489 2018-04-22
113 0.04361276 0.101151670 0.0249408843 14.5804167 0.024238883 2018-04-23
114 0.03877568 0.049142846 0.0080689866 8.3168750 0.084570611 2018-04-24
115 0.05564027 0.076917047 0.0033447160 15.7308333 0.199762524 2018-04-25
116 0.04752544 0.019655228 -0.0063218138 15.7302083 0.020449908 2018-04-26
117 0.01718916 0.026132806 -0.0261027525 10.0887500 0.128898351 2018-04-27
118 0.04144832 0.034526516 0.0117868820 6.0784375 0.014449565 2018-04-28
119 0.03255833 0.113650910 -0.0123724759 11.8654167 0.085410171 2018-04-29
120 0.03656535 0.043333607 0.0230071368 7.0974167 0.035725321 2018-04-30
121 0.04570760 0.093595938 -0.0329915968 5.4016458 0.013467946 2018-05-01
122 0.07271528 0.061923504 0.0130002656 9.1602292 0.018299062 2018-05-02
123 0.02646133 0.007506529 -0.0276898846 0.2338125 0.246100834 2018-05-03
124 0.02379895 0.067273612 0.0112587565 19.1260417 0.120707266 2018-05-04
125 0.05925152 0.075768053 0.0050178925 16.2114583 0.162884739 2018-05-05
126 0.01858152 0.040845398 0.0164467420 12.9156250 0.028823967 2018-05-06
127 0.06994835 0.059457560 -0.0181926787 7.7316042 0.035106399 2018-05-07
128 0.05926409 0.038623605 0.0167222227 13.5464583 0.055665220 2018-05-08
129 0.03104010 0.006805893 -0.0141792029 14.5006250 0.012099383 2018-05-09
130 0.06631012 0.059314975 -0.0228020931 13.3711875 0.073114370 2018-05-10
131 0.03794480 0.015615642 0.0034917459 16.6675208 0.191141576 2018-05-11
132 0.03532917 0.050988581 0.0079455282 14.7375208 0.214172062 2018-05-12
133 0.08512617 0.063322454 0.0224309652 11.6861250 0.166425889 2018-05-13
134 0.04498265 0.012386160 -0.0051629339 7.2488333 0.280120908 2018-05-14
135 0.06383512 0.126840241 -0.0172296864 17.3852083 0.020363429 2018-05-15
136 0.06932861 0.026819550 -0.0109061610 20.9152083 0.099516538 2018-05-16
137 0.04020292 0.021831228 -0.0007211804 6.7122292 0.069831669 2018-05-17
138 0.02037474 0.020931810 0.0088341962 15.8758333 0.130548701 2018-05-18
139 0.01704143 0.105810563 -0.0243003529 10.7339583 0.038013440 2018-05-19
140 0.01266417 0.013985439 0.0091359503 6.5119375 0.196746897 2018-05-20
141 0.03623625 0.057182212 -0.0136101306 18.6637500 0.009431062 2018-05-21
142 0.03938695 0.054879146 0.0091277482 15.5393750 0.115389187 2018-05-22
143 0.05995812 0.061925644 -0.0029137774 11.8191667 0.015729774 2018-05-23
144 0.06548692 0.095240991 0.0055356839 4.3011875 0.081309326 2018-05-24
145 0.01582489 0.015264434 -0.0020079231 9.3315833 0.105132636 2018-05-25
146 0.06834050 0.028756388 -0.0512068435 13.6035417 0.212930829 2018-05-26
147 0.08354736 0.023524928 0.0041989465 4.5111250 0.227197329 2018-05-27
148 0.05738595 0.011159952 -0.0225834032 12.9385417 0.090503870 2018-05-28
149 0.07817132 0.103507587 -0.0222426051 13.4047292 0.034928812 2018-05-29
150 0.04773356 0.035856991 -0.0191600449 9.6657708 0.019893986 2018-05-30
Disclaimer: I am looking for a co-author for help in validating my work with keras / tensor flow
Maybe you can try the following things:
Using Seasonal Decomposition na_seadec() instead of seasonal split
Manually setting seasonality
Setting find frequency = TRUE
You can set the seasonality manually with the following actions:
Suppose you have a vector x and you have monthly values - this would mean freq = 12
# Create time series with seasonality information
x <-c(2,3,4,5,6,6,6,6,6,4,4,4,4,5,6,4,3,3,5,6,4,3,5,3,5,3,5,3,4,4,4,4,4,4,2,4,2,2,4,5,6,7,8,9,0,0,5,2,4)
x_with_freq <- ts(x, frequency = 12)
# imputation for the series
na_seadec(x_with_freq)
If you have daily values your frequency should be 365.
Other than this, you could also try to run na_seadec(x, find_frequency = T) then imputeTS tries to automatically find the seasonality for you.
But in the end, I don't know your data, could very well be, that the seasonal patterns aren't too strong.
I have a quarterly time series. I am trying to apply a function which is supposed calculate the year-to-year growth and year-to-year difference and multiply a variable by (-1).
I already used a similar function for calculating quarter-to-quarter changes and it worked.
I modified this function for yoy changes and it does not have any effect on my data frame. And any error popped up.
Do you have any suggestion how to modify the function or how to accomplish to apply the yoy change function on a time series?
Here is the code:
Date <- c("2004-01-01","2004-04-01", "2004-07-01","2004-10-01","2005-01-01","2005-04-01","2005-07-01","2005-10-01","2006-01-01","2006-04-01","2006-07-01","2006-10-01","2007-01-01","2007-04-01","2007-07-01","2007-10-01")
B1 <- c(3189.30,3482.05,3792.03,4128.66,4443.62,4876.54,5393.01,5885.01,6360.00,6930.00,7430.00,7901.00,8279.00,8867.00,9439.00,10101.00)
B2 <- c(7939.97,7950.58,7834.06,7746.23,7760.59,8209.00,8583.05,8930.74,9424.00,9992.00,10041.00,10900.00,11149.00,12022.00,12662.00,13470.00)
B3 <- as.numeric(c("","","","",140.20,140.30,147.30,151.20,159.60,165.60,173.20,177.30,185.30,199.30,217.10,234.90))
B4 <- as.numeric(c("","","","",-3.50,-14.60,-11.60,-10.20,-3.10,-16.00,-4.90,-17.60,-5.30,-10.90,-12.80,-8.40))
df <- data.frame(Date,B1,B2,B3,B4)
The code will produce following data frame:
Date B1 B2 B3 B4
1 2004-01-01 3189.30 7939.97 NA NA
2 2004-04-01 3482.05 7950.58 NA NA
3 2004-07-01 3792.03 7834.06 NA NA
4 2004-10-01 4128.66 7746.23 NA NA
5 2005-01-01 4443.62 7760.59 140.2 -3.5
6 2005-04-01 4876.54 8209.00 140.3 -14.6
7 2005-07-01 5393.01 8583.05 147.3 -11.6
8 2005-10-01 5885.01 8930.74 151.2 -10.2
9 2006-01-01 6360.00 9424.00 159.6 -3.1
10 2006-04-01 6930.00 9992.00 165.6 -16.0
11 2006-07-01 7430.00 10041.00 173.2 -4.9
12 2006-10-01 7901.00 10900.00 177.3 -17.6
13 2007-01-01 8279.00 11149.00 185.3 -5.3
14 2007-04-01 8867.00 12022.00 199.3 -10.9
15 2007-07-01 9439.00 12662.00 217.1 -12.8
16 2007-10-01 10101.00 13470.00 234.9 -8.4
And I want to apply following changes on the variables:
# yoy absolute difference change
abs.diff = c("B1","B2")
# yoy percentage change
percent.change = c("B3")
# make the variable negative
negative = c("B4")
This is the fuction that I am trying to use for my data frame.
transformation = function(D,abs.diff,percent.change,negative)
{
TT <- dim(D)[1]
DData <- D[-1,]
nms <- c()
for (i in c(2:dim(D)[2])) {
# yoy absolute difference change
if (names(D)[i] %in% abs.diff)
{ DData[,i] = (D[5:TT,i]-D[1:(TT-4),i])
names(DData)[i] = paste('a',names(D)[i],sep='') }
# yoy percent. change
if (names(D)[i] %in% percent.change)
{ DData[,i] = 100*(D[5:TT,i]-D[1:(TT-4),i])/D[1:(TT-4),i]
names(DData)[i] = paste('p',names(D)[i],sep='') }
#CA.deficit
if (names(D)[i] %in% negative)
{ DData[,i] = (-1)*D[1:TT,i] }
}
return(DData)
}
This is what I would like to get :
Date pB1 pB2 aB3 B4
1 2004-01-01 NA NA NA NA
2 2004-04-01 NA NA NA NA
3 2004-07-01 NA NA NA NA
4 2004-10-01 NA NA NA NA
5 2005-01-01 39.33 -2.26 NA 3.5
6 2005-04-01 40.05 3.25 NA 14.6
7 2005-07-01 42.22 9.56 NA 11.6
8 2005-10-01 42.54 15.29 11.0 10.2
9 2006-01-01 43.13 21.43 19.3 3.1
10 2006-04-01 42.11 21.72 18.3 16.0
11 2006-07-01 37.77 16.99 22.0 4.9
12 2006-10-01 34.26 22.05 17.7 17.6
13 2007-01-01 30.17 18.3 19.7 5.3
14 2007-04-01 27.95 20.32 26.1 10.9
15 2007-07-01 27.04 26.1 39.8 12.8
16 2007-10-01 27.84 23.58 49.6 8.4
Grouping by the months, i.e. 6th and 7th substring using ave and do the necessary calculations. With sapply we may loop over the columns.
f <- function(x) {
g <- substr(Date, 6, 7)
l <- length(unique(g))
o <- ave(x, g, FUN=function(x) 100/x * c(x[-1], NA) - 100)
c(rep(NA, l), head(o, -4))
}
cbind(df[1], sapply(df[-1], f))
# Date B1 B2 B3 B4
# 1 2004-01-01 NA NA NA NA
# 2 2004-04-01 NA NA NA NA
# 3 2004-07-01 NA NA NA NA
# 4 2004-10-01 NA NA NA NA
# 5 2005-01-01 39.32901 -2.259202 NA NA
# 6 2005-04-01 40.04796 3.250329 NA NA
# 7 2005-07-01 42.21960 9.560688 NA NA
# 8 2005-10-01 42.54044 15.291439 NA NA
# 9 2006-01-01 43.12655 21.434066 13.83738 -11.428571
# 10 2006-04-01 42.10895 21.720063 18.03279 9.589041
# 11 2006-07-01 37.77093 16.986386 17.58316 -57.758621
# 12 2006-10-01 34.25636 22.050356 17.26190 72.549020
# 13 2007-01-01 30.17296 18.304329 16.10276 70.967742
# 14 2007-04-01 27.95094 20.316253 20.35024 -31.875000
# 15 2007-07-01 27.03903 26.102978 25.34642 161.224490
# 16 2007-10-01 27.84458 23.577982 32.48731 -52.272727
I am new to the world of Statistics, So some simple suggestions will be acknowledged ...
I have a data frame in R
Ganeeshan
Year General OBC SC ST VI VacancySC VacancyGen VacancyOBC Banks Participated VacancyST VacancyHI
1 2016 52.5 52.5 41.75 31.50 37.5 1338 4500 2319 20 665 154
2 2015 76.0 76.0 50.00 47.75 36.0 1965 6146 3454 23 1050 270
3 2014 82.0 80.0 70.00 56.00 38.0 2496 8212 4482 23 1531 458
4 2013 61.0 60.0 50.00 26.00 27.0 3208 10846 5799 21 1827 458
5 2012 135.0 135.0 127.00 106.00 127.0 3409 11058 6062 21 1886 436
VacancyOC VacancyVI
1 113 102
2 358 242
3 323 321
4 208 390
5 257 345
and want to built a linear Model taking dependent variable as "General", I used the following command
GaneeshanModel1 <- lm(General ~ ., data = Ganeeshan)
I get " NA " instead of values in summary of model
Call:
lm(formula = General ~ ., data = Ganeeshan)
Residuals:
ALL 5 residuals are 0: no residual degrees of freedom!
Coefficients: (9 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6566.6562 NA NA NA
Year -3.2497 NA NA NA
OBC 0.5175 NA NA NA
SC -0.2167 NA NA NA
ST 0.6078 NA NA NA
VI NA NA NA NA
VacancySC NA NA NA NA
VacancyGen NA NA NA NA
VacancyOBC NA NA NA NA
`Banks Participated` NA NA NA NA
VacancyST NA NA NA NA
VacancyHI NA NA NA NA
VacancyOC NA NA NA NA
VacancyVI NA NA NA NA
why I am not getting any data here
This can happen if you don't do data preprocessing correctly first. It seems that your 'Bank' column is empty (NaN) and you should think about what to do with it (I am not sure if this is the whole file or there are other non-empty values inside your 'Bank' column). In general, before starting to use your data, you need to replace the NaN (empty) values in your columns with some numerical values (usually it is mean or median value of a column). In R, for your column 'Banks' (in case it has other non-empty values) for example you can do it like this:
dataset$Banks = ifelse(is.na(dataset$Banks),
ave(dataset$Banks, FUN = function(x) mean(x, na.rm = TRUE)),
dataset$Banks)
Otherwise, depending on your data set, if some of your values are represented by a period (or any other non number value) you can import your csv as
dataset = read.csv("data.csv", header = TRUE, c(" ", ".", "NA"))
to change 'period' and 'empty' values to NaN (NA) and after that use the line above to replace the NA (NaN) with mean/median/something else.
I have the following data.table:
Month Day Lat Long Temperature
1: 10 01 80.0 180 -6.383330333333309
2: 10 01 77.5 180 -6.193327999999976
3: 10 01 75.0 180 -6.263328333333312
4: 10 01 72.5 180 -5.759997333333306
5: 10 01 70.0 180 -4.838330999999976
---
117020: 12 31 32.5 310 11.840003833333355
117021: 12 31 30.0 310 13.065001833333357
117022: 12 31 27.5 310 14.685003333333356
117023: 12 31 25.0 310 15.946669666666690
117024: 12 31 22.5 310 16.578336333333358
For every location (given by Lat and Long), I have a temperature for each day from 1 October to 31 December.
There are 1,272 locations consisting of each pairwise combination of Lat:
Lat
1 80.0
2 77.5
3 75.0
4 72.5
5 70.0
--------
21 30.0
22 27.5
23 25.0
24 22.5
and Long:
Long
1 180.0
2 182.5
3 185.0
4 187.5
5 190.0
---------
49 300.0
50 302.5
51 305.0
52 307.5
53 310.0
I'm trying to create a data.table that consists of 1,272 rows (one per location) and 92 columns (one per day). Each element of that data.table will then contain the temperature at that location on that day.
Any advice about how to accomplish that goal without using a for loop?
Here we use ChickWeights as the data, where we use "Chick-Diet" as the equivalent of your "lat-lon", and "Time" as your "Date":
dcast.data.table(data.table(ChickWeight), Chick + Diet ~ Time)
Produces:
Chick Diet 0 2 4 6 8 10 12 14 16 18 20 21
1: 18 1 1 1 NA NA NA NA NA NA NA NA NA NA
2: 16 1 1 1 1 1 1 1 1 NA NA NA NA NA
3: 15 1 1 1 1 1 1 1 1 1 NA NA NA NA
4: 13 1 1 1 1 1 1 1 1 1 1 1 1 1
5: ... 46 rows omitted
You will likely need to lat + lon ~ Month + Day or some such for your formula.
In the future, please make your question reproducible as I did here by using a built-in data set.
First create a date value using the lubridate package (I assumed year = 2014, adjust as necessary):
library(lubridate)
df$datetext <- paste(df$Month,df$Day,"2014",sep="-")
df$date <- mdy(df$datetext)
Then one option is to use the tidyr package to spread the columns:
library(tidyr)
spread(df[,-c(1:2,6)],date,Temperature)
Lat Long 2014-10-01 2014-12-31
1 22.5 310 NA 16.57834
2 25.0 310 NA 15.94667
3 27.5 310 NA 14.68500
4 30.0 310 NA 13.06500
5 32.5 310 NA 11.84000
6 70.0 180 -4.838331 NA
7 72.5 180 -5.759997 NA
8 75.0 180 -6.263328 NA
9 77.5 180 -6.193328 NA
10 80.0 180 -6.383330 NA
I have ordered a set of rows to get this:
2 1983 TRI-COUNTY TRAUTH 0.1495 0.1395 NA 452 0.0764 4 0 06/02/83
4 1983 TRI-COUNTY TRAUTH 0.1193 0.1113 NA 32 0.0764 4 2 07/20/83
14 1983 TRI-COUNTY TRAUTH 0.1064 0.1064 NA 26 0.0763 6 2 08/03/83
17 1983 TRI-COUNTY TRAUTH 0.1110 0.1010 0.1010 176 0.0763 7 4 08/08/83
24 1983 TRI-COUNTY TRAUTH 0.1293 0.1215 NA 452 0.0763 4 0 09/12/83
41 1984 TRI-COUNTY TRAUTH 0.1325 0.1225 NA 452 0.0740 4 0 06/20/84
45 1984 TRI-COUNTY TRAUTH 0.1425 0.1325 NA 32 0.0741 4 2 07/17/84
47 1984 TRI-COUNTY TRAUTH 0.1395 0.1395 0.1250 91 0.0741 14 11 07/16/84
But I want to renumber these such that its 1,2,3,4,etc...
Can someone please help?
Are you just looking for something like this?:
row.names(datasetname) <- 1:nrow(datasetname)
Alternatively, if the first column in your example data is a variable (say V1) in a dataframe and not the row.names, this will work:
datasetname$V1 <- 1:nrow(datasetname)
This is the easiest way to do it:
rownames(dataset) = NULL
Another solution, normally used when binding rows:
dataset <- rbind( dataset , make.rows.names=FALSE )