How can I split a multiply imputed dataset created in Amelia?

I have imputed missing values using Amelia, creating 5 multiply imputed datasets. Now I would like to split these datasets by year, e.g. one set for year >= 1990 and one set for year <= 1990. Any ideas how I can do so? Many thanks!
library(Amelia)
data(freetrade)
freetrade$year #splitting variable
#Imputation of missing data
a.out <- amelia(freetrade, m=5, ts="year", cs="country")
#split of created dataset?

Amelia returns an object that contains a list of data frames (one for each imputation). You can see the structure of this object with str().
> library(Amelia)
> data(freetrade)
>
> a.out <- amelia(freetrade, m=5, ts="year", cs="country")
-- Imputation 1 --
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
-- Imputation 2 --
1 2 3 4 5 6 7 8 9 10 11 12 13
-- Imputation 3 --
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
-- Imputation 4 --
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
-- Imputation 5 --
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
> str(a.out)
List of 12
$ imputations:List of 5
..$ imp1:'data.frame': 171 obs. of 10 variables:
.. ..$ year : int [1:171] 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 ...
.. ..$ country : chr [1:171] "SriLanka" "SriLanka" "SriLanka" "SriLanka" ...
.. ..$ tariff : num [1:171] 30.6 22.4 41.3 26.8 31 ...
.. ..$ polity : num [1:171] 6 5 5 5 5 5 5 5 5 5 ...
.. ..$ pop : num [1:171] 14988000 15189000 15417000 15599000 15837000 ...
.. ..$ gdp.pc : num [1:171] 461 474 489 508 526 ...
.. ..$ intresmi: num [1:171] 1.94 1.96 1.66 2.8 2.26 ...
.. ..$ signed : num [1:171] 0 0 1 0 0 0 0 1 0 0 ...
.. ..$ fiveop : num [1:171] 12.4 12.5 12.3 12.3 12.3 ...
.. ..$ usheg : num [1:171] 0.259 0.256 0.266 0.299 0.295 ...
..$ imp2:'data.frame': 171 obs. of 10 variables:
.. ..$ year : int [1:171] 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 ...
.. ..$ country : chr [1:171] "SriLanka" "SriLanka" "SriLanka" "SriLanka" ...
.. ..$ tariff : num [1:171] 33.6 59.7 41.3 18.2 31 ...
.. ..$ polity : num [1:171] 6 5 5 5 5 5 5 5 5 5 ...
.. ..$ pop : num [1:171] 14988000 15189000 15417000 15599000 15837000 ...
.. ..$ gdp.pc : num [1:171] 461 474 489 508 526 ...
.. ..$ intresmi: num [1:171] 1.94 1.96 1.66 2.8 2.26 ...
.. ..$ signed : num [1:171] 0 0 1 0 0 0 0 1 0 0 ...
.. ..$ fiveop : num [1:171] 12.4 12.5 12.3 12.3 12.3 ...
.. ..$ usheg : num [1:171] 0.259 0.256 0.266 0.299 0.295 ...
..$ imp3:'data.frame': 171 obs. of 10 variables:
.. ..$ year : int [1:171] 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 ...
.. ..$ country : chr [1:171] "SriLanka" "SriLanka" "SriLanka" "SriLanka" ...
.. ..$ tariff : num [1:171] 48.5 32.9 41.3 47.2 31 ...
.. ..$ polity : num [1:171] 6 5 5 5 5 5 5 5 5 5 ...
.. ..$ pop : num [1:171] 14988000 15189000 15417000 15599000 15837000 ...
.. ..$ gdp.pc : num [1:171] 461 474 489 508 526 ...
.. ..$ intresmi: num [1:171] 1.94 1.96 1.66 2.8 2.26 ...
.. ..$ signed : num [1:171] 0 0 1 0 0 0 0 1 0 0 ...
.. ..$ fiveop : num [1:171] 12.4 12.5 12.3 12.3 12.3 ...
.. ..$ usheg : num [1:171] 0.259 0.256 0.266 0.299 0.295 ...
..$ imp4:'data.frame': 171 obs. of 10 variables:
.. ..$ year : int [1:171] 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 ...
.. ..$ country : chr [1:171] "SriLanka" "SriLanka" "SriLanka" "SriLanka" ...
.. ..$ tariff : num [1:171] 18.4 45.5 41.3 16.9 31 ...
.. ..$ polity : num [1:171] 6 5 5 5 5 5 5 5 5 5 ...
.. ..$ pop : num [1:171] 14988000 15189000 15417000 15599000 15837000 ...
.. ..$ gdp.pc : num [1:171] 461 474 489 508 526 ...
.. ..$ intresmi: num [1:171] 1.94 1.96 1.66 2.8 2.26 ...
.. ..$ signed : num [1:171] 0 0 1 0 0 0 0 1 0 0 ...
.. ..$ fiveop : num [1:171] 12.4 12.5 12.3 12.3 12.3 ...
.. ..$ usheg : num [1:171] 0.259 0.256 0.266 0.299 0.295 ...
..$ imp5:'data.frame': 171 obs. of 10 variables:
.. ..$ year : int [1:171] 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 ...
.. ..$ country : chr [1:171] "SriLanka" "SriLanka" "SriLanka" "SriLanka" ...
.. ..$ tariff : num [1:171] 15.3 44.4 41.3 40.1 31 ...
.. ..$ polity : num [1:171] 6 5 5 5 5 5 5 5 5 5 ...
.. ..$ pop : num [1:171] 14988000 15189000 15417000 15599000 15837000 ...
.. ..$ gdp.pc : num [1:171] 461 474 489 508 526 ...
.. ..$ intresmi: num [1:171] 1.94 1.96 1.66 2.8 2.26 ...
.. ..$ signed : num [1:171] 0 0 1 0 0 0 0 1 0 0 ...
.. ..$ fiveop : num [1:171] 12.4 12.5 12.3 12.3 12.3 ...
.. ..$ usheg : num [1:171] 0.259 0.256 0.266 0.299 0.295 ...
..- attr(*, "class")= chr [1:2] "mi" "list"
$ m : num 5
$ missMatrix : logi [1:171, 1:10] FALSE FALSE FALSE FALSE FALSE FALSE ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:10] "year" "country" "tariff" "polity" ...
$ overvalues : NULL
$ theta : num [1:9, 1:9, 1:5] -1 -0.08456 -0.03404 -0.00193 0.06483 ...
$ mu : num [1:8, 1:5] -0.08456 -0.03404 -0.00193 0.06483 -0.11178 ...
$ covMatrices: num [1:8, 1:8, 1:5] 0.7881 -0.1869 -0.0531 0.2121 -0.0819 ...
$ code : num 1
$ message : chr "Normal EM convergence."
$ iterHist :List of 5
..$ : num [1:15, 1:3] 44 34 25 28 26 25 24 22 20 14 ...
..$ : num [1:13, 1:3] 44 27 24 22 22 21 18 17 14 11 ...
..$ : num [1:19, 1:3] 44 34 29 27 26 26 25 24 23 21 ...
..$ : num [1:15, 1:3] 44 34 27 28 23 24 23 23 19 19 ...
..$ : num [1:20, 1:3] 44 32 30 27 24 23 23 23 23 21 ...
$ arguments :List of 22
..$ idvars : NULL
..$ logs : NULL
..$ ts : num 1
..$ cs : num 2
..$ empri : NULL
..$ tolerance : num 1e-04
..$ polytime : NULL
..$ splinetime : NULL
..$ lags : NULL
..$ leads : NULL
..$ intercs : logi FALSE
..$ sqrts : NULL
..$ lgstc : NULL
..$ noms : NULL
..$ ords : NULL
..$ priors : NULL
..$ autopri : num 0.05
..$ bounds : NULL
..$ max.resample: num 100
..$ startvals : num 0
..$ overimp : NULL
..$ emburn : num [1:2] 0 0
..- attr(*, "class")= chr [1:2] "ameliaArgs" "list"
$ orig.vars : chr [1:10] "year" "country" "tariff" "polity" ...
- attr(*, "class")= chr "amelia"
From here you can see that the "imputations" element of your a.out object contains your data frames, so you can reference each of your imputations from there. For example, a.out$imputations[[1]]$year will give you the years from your first imputation. If you'd like to do that across every imputation, you can use an apply function or a loop. To illustrate this, consider:
> sapply(a.out$imputations,function(x) head(x$year))
imp1 imp2 imp3 imp4 imp5
[1,] 1981 1981 1981 1981 1981
[2,] 1982 1982 1982 1982 1982
[3,] 1983 1983 1983 1983 1983
[4,] 1984 1984 1984 1984 1984
[5,] 1985 1985 1985 1985 1985
[6,] 1986 1986 1986 1986 1986
EDIT: I just re-read your question and saw that you're actually looking for something more specific. You can take what's above and apply it to make subsets of each data frame, doing something like lapply(a.out$imputations,function(x) x[x$year > 1990,]). Note that > 1990 and < 1990 both leave out 1990 itself; use >= or <= on one side if that year should be kept. I'm not sure how you would like to combine these imputed datasets (split by years greater than/less than 1990), but if you just want to append all rows together, rbind() will do the trick (if not, let me know how you'd like to combine them and I can probably recommend a solution):
> df1 <- do.call(rbind,lapply(a.out$imputations,function(x) x[x$year > 1990,]))
> df2 <- do.call(rbind,lapply(a.out$imputations,function(x) x[x$year < 1990,]))
> head(df1)
year country tariff polity pop gdp.pc intresmi signed fiveop usheg
imp1.11 1991 SriLanka 26.9000 5 17247000 597.6987 2.285213 1.000000 12.8 0.2589872
imp1.12 1992 SriLanka 25.0000 5 17405000 618.3329 2.877877 0.515665 13.1 0.2623017
imp1.13 1993 SriLanka 24.2000 5 17628420 652.6205 4.280361 0.000000 13.2 0.2812928
imp1.14 1994 SriLanka 26.0000 5 17865000 680.0408 4.389912 0.000000 13.2 0.2783585
imp1.15 1995 SriLanka 20.0000 5 18112000 707.6591 3.995919 0.000000 13.2 0.2627195
imp1.16 1996 SriLanka 20.5646 5 18300000 727.0039 3.676763 0.000000 13.2 0.2681700
> head(df2)
year country tariff polity pop gdp.pc intresmi signed fiveop usheg
imp1.1 1981 SriLanka 30.56693 6 14988000 461.0236 1.937347 0 12.4 0.2593112
imp1.2 1982 SriLanka 22.39382 5 15189000 473.7634 1.964430 0 12.5 0.2558008
imp1.3 1983 SriLanka 41.30000 5 15417000 489.2266 1.663936 1 12.3 0.2655022
imp1.4 1984 SriLanka 26.81580 5 15599000 508.1739 2.797462 0 12.3 0.2988009
imp1.5 1985 SriLanka 31.00000 5 15837000 525.5609 2.259116 0 12.3 0.2952431
imp1.6 1986 SriLanka 17.76314 5 16117000 538.9237 1.832549 0 12.5 0.2886563
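If the two periods are going to be analysed separately and the results combined across imputations later, it may be more convenient to keep each split as a list of m data frames instead of stacking them straight away. The following is only a sketch under assumptions: `imps` is a small made-up list standing in for a.out$imputations so that it runs without Amelia, and the helpers split_by_year and stack_imps are hypothetical names, not part of any package.

```r
# Toy stand-in for a.out$imputations: a named list of data frames
# with the same rows but different imputed values.
set.seed(1)
make_imp <- function() {
  data.frame(year = 1988:1992,
             country = "SriLanka",
             tariff = round(runif(5, 15, 45), 1))
}
imps <- list(imp1 = make_imp(), imp2 = make_imp(), imp3 = make_imp())

# Hypothetical helper: split every imputation at a cutoff year.
# Rows with year >= cutoff go to 'late', all others to 'early',
# so no year is dropped.
split_by_year <- function(imputations, cutoff = 1990) {
  list(
    early = lapply(imputations, function(x) x[x$year <  cutoff, ]),
    late  = lapply(imputations, function(x) x[x$year >= cutoff, ])
  )
}

parts <- split_by_year(imps)

# Each part is still a list with one data frame per imputation.
sapply(parts$early, nrow)
sapply(parts$late, nrow)

# Hypothetical helper: stack a list of imputations into one data frame
# while recording which imputation each row came from.
stack_imps <- function(imputations) {
  tagged <- Map(function(x, id) cbind(.imp = id, x),
                imputations, names(imputations))
  do.call(rbind, tagged)
}
late_stacked <- stack_imps(parts$late)
```

Keeping the .imp column makes it possible to recover the individual imputations from the stacked data frame later, e.g. with split(late_stacked, late_stacked$.imp).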

Related

Loop through list programmatically

I have a list in R that I want to loop through all the elements.
This is the structure of the object:
> str(AAPL.OPT[c])
List of 1
$ jun.12.2020:List of 2
..$ calls:'data.frame': 52 obs. of 7 variables:
.. ..$ Strike: num [1:52] 180 185 200 210 240 ...
.. ..$ Last : num [1:52] 123 118 131 120 85 ...
.. ..$ Chg : num [1:52] 0 0 7.61 9.48 0 ...
.. ..$ Bid : num [1:52] 149 144 129 119 89 ...
.. ..$ Ask : num [1:52] 153.3 148.5 133.5 123.7 93.5 ...
.. ..$ Vol : int [1:52] NA 15 16 2 1 1 3 36 1 2 ...
.. ..$ OI : int [1:52] 0 15 25 4 50 3 4 36 6 10 ...
..$ puts :'data.frame': 56 obs. of 7 variables:
.. ..$ Strike: num [1:56] 150 165 170 180 185 190 195 200 205 210 ...
.. ..$ Last : num [1:56] 0.05 0.02 0.14 0.05 0.03 0.02 0.01 0.02 0.01 0.01 ...
.. ..$ Chg : num [1:56] 0 0 0 0 0 0 0 0 0 0 ...
.. ..$ Bid : num [1:56] NA 0 0 0 0 0 0 0 0 0 ...
.. ..$ Ask : num [1:56] 2.13 0.11 0.11 1.8 1.87 0.01 1.88 0.5 1.88 2.13 ...
.. ..$ Vol : int [1:56] NA 1 1 2 1 16 1 17 1 21 ...
.. ..$ OI : int [1:56] 1 10 7 9 76 201 113 314 92 264 ...
I cannot access the next level of the object programmatically (by indexing the value)
I want to do something like this:
AAPL.OPT[c][1]
instead of this
AAPL.OPT[c]$jun.12.2020
Sample data of AAPL.OPT[c]
$`jun.12.2020`$`calls`
Strike Last Chg Bid Ask Vol OI
AAPL200612C00180000 180.0 123.29 0.00000000 149.00 153.35 NA 0
AAPL200612C00185000 185.0 117.60 0.00000000 144.00 148.50 15 15
AAPL200612C00200000 200.0 131.15 7.60999300 129.00 133.50 16 25
AAPL200612C00210000 210.0 119.95 9.47999600 119.30 123.65 2 4
....

AAPL.OPT[c] gives a list of length 1 whose single element is itself a list of two data frames. If we use [[c]] instead, it gives that inner list of length 2, and to access each data frame you can subset further using [[, so AAPL.OPT[[c]][[1]] and AAPL.OPT[[c]][[2]].

We can use
AAPL.OPT[[c]]$calls
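The difference between [ and [[ described above can be tried out on a small made-up list with the same shape as the one in the question. This is only an illustration: `opt` and the index `i` are stand-ins for AAPL.OPT and c, and the numbers are copied from the sample rows just to have something to print.

```r
# Toy nested list with the same shape as AAPL.OPT in the question.
opt <- list(
  jun.12.2020 = list(
    calls = data.frame(Strike = c(180, 185), Last = c(123.29, 117.60)),
    puts  = data.frame(Strike = c(150, 165), Last = c(0.05, 0.02))
  )
)
i <- 1  # position of the expiry to inspect (called c in the question)

class(opt[i])         # "list": still a list of length 1 wrapping the expiry
names(opt[[i]])       # "calls" "puts": the contents of the expiry itself
opt[[i]][[1]]         # first data frame (calls), selected by position
opt[[i]][["puts"]]    # same level, selected by name

# Loop over every expiry and every data frame inside it by position.
for (e in seq_along(opt)) {
  for (k in seq_along(opt[[e]])) {
    cat(names(opt)[e], names(opt[[e]])[k], nrow(opt[[e]][[k]]), "\n")
  }
}
```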

Multiclass classification in H2O randomForest

I'm trying to use H2O randomForest for multiclass classification in R, but when I run the code, the randomForest always comes out as a regression model, despite the target variable being a factor. I am trying to predict 'Gradient', a factor with 5 levels, from one other factor, 'Period', with 4 levels, and 21 numerical predictors.
Any help would be appreciated. Code below....
>str(df)
Class 'H2OFrame' <environment: 0x000001f6b361abe0>
- attr(*, "op")= chr ":="
- attr(*, "eval")= logi TRUE
- attr(*, "id")= chr "RTMP_sid_aecc_35"
- attr(*, "nrow")= int 63878
- attr(*, "ncol")= int 22
- attr(*, "types")=List of 22
- attr(*, "data")='data.frame': 10 obs. of 22 variables:
..$ Gradient: Factor w/ 5 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1
..$ Period : Factor w/ 4 levels "Dawn","Day","Dusk",..: 2 2 2 2 2 2 2 2 2 2
..$ AC1 : num 1792 1793 1790 1790 1797 ...
..$ AC2 : num 316 316 318 317 324 ...
..$ AC3 : num 972 972 974 975 979 ...
etc for remaining numerical predictors.
>splits <- h2o.splitFrame(df, c(0.6,0.2), seed=1234)
>train <- h2o.assign(splits[[1]], "train.hex")
>valid <- h2o.assign(splits[[2]], "valid.hex")
>test <- h2o.assign(splits[[3]], "test.hex")
>str(train)
Class 'H2OFrame' <environment: 0x000002266fac7d40>
- attr(*, "op")= chr "assign"
- attr(*, "id")= chr "train.hex"
- attr(*, "nrow")= int 38259
- attr(*, "ncol")= int 22
- attr(*, "types")=List of 22
- attr(*, "data")='data.frame': 10 obs. of 22 variables:
..$ Gradient: Factor w/ 5 levels "LB","LU","PB",..: 1 1 1 1 1 1 1 1 1 1
..$ Period : Factor w/ 4 levels "Dawn","Day","Dusk",..: 2 2 2 2 2 2 2 2 2 2
..$ AC1 : num 1793 1797 1796 1805 1803 ...
..$ AC2 : num 316 324 322 322 323 ...
..$ AC3 : num 972 979 979 988 986 ...
..$ AC4 : num 663 662 664 673 670 ...
..$ AC5 : num 828 825 824 824 825 ...
..$ AD1 : num 1.22 1.42 1.73 2.25 1.99 ...
..$ AD2 : num 1.1 1.27 1.35 1.38 1.38 ...
..$ AD3 : num 1.22 1.42 1.72 2.24 1.99 ...
..$ AD4 : num 1.87 1.53 2.07 2.03 1.78 ...
..$ AD5 : num 2.33 2.33 2.33 2.33 2.33 ...
..$ AE1 : num 0.877 0.849 0.794 0.636 0.72 ...
..$ AE2 : num 0.3687 0.2332 0.1369 0.0433 0.0546 ...
..$ AE3 : num 0.774 0.723 0.624 0.335 0.487 ...
..$ AE4 : num 0.574 0.697 0.44 0.477 0.605 ...
..$ AE5 : num 0.542 0.542 0.554 0.543 0.542 ...
..$ BI1 : num 53 71.9 64 75.4 74.6 ...
..$ BI2 : num 6.51 5.88 4.54 2.3 2.34 ...
..$ BI3 : num 22.2 26 21.5 27.9 28 ...
..$ BI4 : num 7.86 9.58 8.59 12.17 12.5 ...
..$ BI5 : num 11.3 17.9 16.4 18.1 17.5 ...
> train[1:5,] ## rows 1-5, all columns
Gradient Period AC1 AC2 AC3 AC4 AC5 AD1 AD2 AD3 AD4 AD5 AE1 AE2 AE3 AE4 AE5
1 LB Day 1792.97 316.4038 972.4288 663.2612 827.6400 1.217491 1.104860 1.217491 1.866627 2.332115 0.876794 0.368712 0.774123 0.574168 0.541993
2 LB Day 1796.78 324.3562 979.2218 662.2341 824.6436 1.421910 1.274373 1.421910 1.526506 2.331810 0.848660 0.233177 0.722544 0.696906 0.542409
3 LB Day 1796.09 321.9081 978.7464 664.1776 824.4437 1.726798 1.345030 1.721740 2.066543 2.326278 0.794230 0.136892 0.624107 0.440458 0.553766
4 LB Day 1805.14 322.0390 987.9472 673.2841 824.3146 2.248474 1.381644 2.239061 2.028538 2.331881 0.636007 0.043267 0.334964 0.477149 0.542572
5 LB Day 1803.15 323.1540 985.6376 669.7603 824.6003 1.992025 1.380468 1.992004 1.782532 2.331971 0.720153 0.054578 0.486951 0.604876 0.542420
BI1 BI2 BI3 BI4 BI5
1 53.03567 6.506536 22.23446 7.862767 11.32708
2 71.94775 5.879407 26.04130 9.579798 17.94337
3 63.98763 4.535041 21.50727 8.590985 16.38780
4 75.38319 2.301110 27.89600 12.165991 18.06316
5 74.60517 2.342853 28.02568 12.499122 17.52902
rf1 <- h2o.randomForest(
training_frame = train,
validation_frame = valid,
x=2:22,
y=1,
ntrees = 200,
stopping_rounds = 2,
score_each_iteration = T,
seed = 1000000)
>perf <- h2o.performance(rf1, valid)
>h2o.mcc(perf)
Error in h2o.metric(object, thresholds, "absolute_mcc") :
No absolute_mcc for H2OMultinomialMetrics
h2o.accuracy(perf)
Error in h2o.metric(object, thresholds, "accuracy") :
No accuracy for H2OMultinomialMetrics
and an excerpt from the model summary:
H2OMultinomialMetrics: drf
** Reported on training data. **
** Metrics reported on Out-Of-Bag training samples **
Training Set Metrics:
=====================
Extract training frame with `h2o.getFrame("train.hex")`
MSE: (Extract with `h2o.mse`) 0.2499334
RMSE: (Extract with `h2o.rmse`) 0.4999334
Logloss: (Extract with `h2o.logloss`) 0.9987891
Mean Per-Class Error: 0.2941914
R^2: (Extract with `h2o.r2`) 0.8683096

mcc is specifically for binary classifiers; your factor has more than 2 levels.
You can tell you have successfully done a multinomial classification, rather than a regression, because the error message says "No absolute_mcc for H2OMultinomialMetrics".
h2o.accuracy() and h2o.logloss() are available for multinomial models.
UPDATE: ...well, the docs say h2o.accuracy() is available, but a quick check on the iris dataset gives me the same error you see; must be related to that warning in the docs (which I didn't understand).
Anyway, h2o.confusionMatrix(rf1) is likely to be more useful; the overall error shown in the bottom right is 1 - accuracy. h2o.confusionMatrix(rf1, valid=T) and h2o.confusionMatrix(rf1, test) report the same for the validation and test frames.
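The relationship between a confusion matrix and accuracy mentioned above can be checked in base R without H2O. This is a sketch on made-up labels (the class names are borrowed from the question, the values are invented), and it does not reproduce H2O's own output format.

```r
# Made-up actual and predicted classes for a 3-class problem.
actual    <- factor(c("LB", "LB", "LU", "LU", "PB", "PB", "PB", "LB"))
predicted <- factor(c("LB", "LU", "LU", "LU", "PB", "LB", "PB", "LB"),
                    levels = levels(actual))

# Rows are actual classes, columns are predicted classes.
cm <- table(actual = actual, predicted = predicted)

# Overall accuracy: correctly classified cases (the diagonal) over all cases.
accuracy <- sum(diag(cm)) / sum(cm)
overall_error <- 1 - accuracy

# Per-class error: share of each actual class that was misclassified.
per_class_error <- 1 - diag(cm) / rowSums(cm)

print(cm)
accuracy
per_class_error
```

Here 6 of the 8 cases sit on the diagonal, so accuracy is 0.75 and the overall error is 0.25.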

'Incorrect number of dimensions' when running Zelig 'arima' on imputed data

I'm getting an error when I try to run an arima model with the zelig package. I'm using MI data with 20 imputations that were created with Amelia. Here is a short summary of my id and response variables:
$ imp20:'data.frame': 442 obs. of 50 variables:
..$ region : Factor w/ 4 levels "Central Africa",..: 3 3 3 3 3 3 3 3 3 3 ...
..$ subregionid : Factor w/ 4 levels "FC","FE","FS",..: 3 3 3 3 3 3 3 3 3 3 ...
..$ country : Factor w/ 34 levels "Angola","Benin",..: 1 1 1 1 1 1 1 1 1 1 ...
..$ ISO2 : Factor w/ 34 levels "AO","BF","BJ",..: 1 1 1 1 1 1 1 1 1 1 ...
..$ ISO3 : Factor w/ 34 levels "AGO","BEN","BFA",..: 1 1 1 1 1 1 1 1 1 1 ...
..$ year : num [1:442] 2002 2003 2004 2005 2006 ...
..$ cap.lat : num [1:442] -8.5 -8.5 -8.5 -8.5 -8.5 -8.5 -8.5 -8.5 -8.5 -8.5 ...
..$ cap.long : num [1:442] 13.2 13.2 13.2 13.2 13.2 ...
..$ NGDP_RPCH : num [1:442] 14.53 5.25 10.88 18.26 20.73 ...
..$ NGDPD : num [1:442] 3.18 3.31 3.38 3.44 3.48 ...
..$ NGDPDPC : num [1:442] 2.68 2.69 2.72 2.75 2.78 ...
..$ NGSD_NGDP : num [1:442] 10.62 7.77 12.63 26.98 40.94
...
..$ PIKE.regional : num [1:442] 0.225 0.295 0.287 0.358 0.357 ...
..$ Definite.Probable : num [1:442] 36 36 36 36 36.1 ...
..$ Elephant.range : num [1:442] 406006 433613 511662 456046 459418 ...
..$ Change.by.year : num [1:442] 0.000463 0.000463 0.000463 0.000463 0.000463 ...
..$ Diff.from.expected : num [1:442] -0.0415 -0.0415 -0.0415 -0.0415 -0.0415 ...
Diff.from.expected is my response variable. And here is the code that I've run along with the error I'm getting.
z1 <- zarima$new()
> z1$zelig(Diff.from.expected~GNI, order=c(1,0,1), model="arima",
+ data = a.coVarsTrans.more, ts="year", cs="country")
Error in data[, cs] : incorrect number of dimensions
So it appears to me that there is an issue with the cs='country' call, but I'm not sure what the issue is. I'm planning to add more independent variables, but want to make sure that a basic model works first, which clearly it doesn't.
Here is the link to my saved Amelia .Rdata file.

How Can I Quickly Inspect Built-in Data Sets (PSA)?

One of the best ways to make a question reproducible is to use one of the built in data sets. Using data(), however, is frustrating because no information about the structure of the data set is provided.
How can I quickly view the structure of available data sets?

The following function may help:
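dataStr <- function(fun = function(x) TRUE) {
  # Fetch every object listed by data(); names that are not objects come back as NULL
  objs <- mget(data()$results[, "Item"], inherits = TRUE, ifnotfound = list(NULL))
  # Drop the NULLs, keep the objects matching 'fun', and print their structure
  str(Filter(fun, Filter(Negate(is.null), objs)))
}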
It accepts a filtering function, applies it to all the data sets, and prints out the structure of the matching data sets. For example, if we're looking for matrices:
> dataStr(is.matrix)
List of 8
$ WorldPhones : num [1:7, 1:7] 45939 60423 64721 68484 71799 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:7] "1951" "1956" "1957" "1958" ...
.. ..$ : chr [1:7] "N.Amer" "Europe" "Asia" "S.Amer" ...
$ occupationalStatus : 'table' int [1:8, 1:8] 50 16 12 11 2 12 0 0 19 40 ...
..- attr(*, "dimnames")=List of 2
.. ..$ origin : chr [1:8] "1" "2" "3" "4" ...
.. ..$ destination: chr [1:8] "1" "2" "3" "4" ...
$ volcano : num [1:87, 1:61] 100 101 102 103 104 105 105 106 107 108 ...
--- 5 entries omitted ---
Or for data frames (also omitting entries):
> dataStr(is.data.frame)
List of 42
$ BOD :'data.frame': 6 obs. of 2 variables:
..$ Time : num [1:6] 1 2 3 4 5 7
..$ demand: num [1:6] 8.3 10.3 19 16 15.6 19.8
..- attr(*, "reference")= chr "A1.4, p. 270"
$ CO2 :Classes ‘nfnGroupedData’, ‘nfGroupedData’, ‘groupedData’ and 'data.frame': 84 obs. of 5 variables:
..$ Plant : Ord.factor w/ 12 levels "Qn1"<"Qn2"<"Qn3"<..: 1 1 1 1 1 1 1 2 2 2 ...
..$ Type : Factor w/ 2 levels "Quebec","Mississippi": 1 1 1 1 1 1 1 1 1 1 ...
..$ Treatment: Factor w/ 2 levels "nonchilled","chilled": 1 1 1 1 1 1 1 1 1 1 ...
..$ conc : num [1:84] 95 175 250 350 500 675 1000 95 175 250 ...
..$ uptake : num [1:84] 16 30.4 34.8 37.2 35.3 39.2 39.7 13.6 27.3 37.1 ...
--- 40 entries omitted ---
Or even for simple vectors:
> dataStr(function(x) is.atomic(x) && is.vector(x) && !is.ts(x))
List of 4
$ euro : Named num [1:11] 13.76 40.34 1.96 166.39 5.95 ...
..- attr(*, "names")= chr [1:11] "ATS" "BEF" "DEM" "ESP" ...
$ islands: Named num [1:48] 11506 5500 16988 2968 16 ...
..- attr(*, "names")= chr [1:48] "Africa" "Antarctica" "Asia" "Australia" ...
$ precip : Named num [1:70] 67 54.7 7 48.5 14 17.2 20.7 13 43.4 40.2 ...
..- attr(*, "names")= chr [1:70] "Mobile" "Juneau" "Phoenix" "Little Rock" ...
$ rivers : num [1:141] 735 320 325 392 524 ...
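When the full str() output is more than needed, the same Filter/mget idea can be used to get just the class of each data set. The helper below, dataClasses, is a hypothetical name introduced for illustration; the sketch assumes the package is attached (as datasets is by default) and looks only in that package's environment.

```r
# Hypothetical helper: return the first class of every object that
# data() lists for a package, instead of printing str() for each.
dataClasses <- function(package = "datasets") {
  items <- data(package = package)$results[, "Item"]
  # Entries such as "beaver1 (beavers)" are not object names on their own,
  # so mget() returns NULL for them and they are dropped below.
  objs <- mget(items,
               envir = as.environment(paste0("package:", package)),
               ifnotfound = list(NULL))
  objs <- Filter(Negate(is.null), objs)
  vapply(objs, function(x) class(x)[1], character(1))
}

cls <- dataClasses()
head(sort(table(cls), decreasing = TRUE))  # most common classes
names(cls)[cls == "matrix"]                # data sets stored as plain matrices
```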

Plot NYC Citi Bike Data, Lat and Long Not Displayed on Map

I'm trying to plot the NYC citi bike station data on top of a map of NYC.
I downloaded zipcode data from here:
Here is what I've done:
> bike.loc<-bike.train
> nyc.zip<-readShapePoly("nyc_zipcta.shp")
> coordinates(bike.loc)<-c("start.station.id","end.station.id")
> class(bike.loc)
[1] "SpatialPointsDataFrame"
attr(,"package")
[1] "sp"
> str(bike.loc)
Formal class 'SpatialPointsDataFrame' [package "sp"] with 5 slots
..# data :'data.frame': 150000 obs. of 15 variables:
.. ..$ tripduration : int [1:150000] 970 711 549 883 88 535 975 307 218 542 ...
.. ..$ starttime : Factor w/ 41673 levels "7/1/13 0:00",..: 8738 12152 27602 11984 9651 21822 24531 13946 17666 20150 ...
.. ..$ stoptime : Factor w/ 41752 levels "7/1/13 0:04",..: 8774 12769 27646 12006 9647 21866 24585 13961 17689 18964 ...
.. ..$ start.station.name : Factor w/ 329 levels "1 Ave & E 15 St",..: 9 277 298 321 267 329 71 197 266 182 ...
.. ..$ start.station.latitude : num [1:150000] 40.7 40.7 40.8 40.7 40.7 ...
.. ..$ start.station.longitude: num [1:150000] -74 -74 -74 -74 -74 ...
.. ..$ end.station.name : Factor w/ 329 levels "1 Ave & E 15 St",..: 193 124 6 159 267 73 76 73 116 227 ...
.. ..$ end.station.latitude : num [1:150000] 40.7 40.7 40.8 40.7 40.7 ...
.. ..$ end.station.longitude : num [1:150000] -74 -74 -74 -74 -74 ...
.. ..$ bikeid : int [1:150000] 15301 17873 17596 15864 19005 17230 15476 19805 18494 18104 ...
.. ..$ usertype : Factor w/ 2 levels "Customer","Subscriber": 2 2 2 2 1 2 1 2 2 2 ...
.. ..$ birth.year : Factor w/ 79 levels "\\N","1899","1900",..: 67 37 70 67 1 55 1 45 73 63 ...
.. ..$ gender : int [1:150000] 1 1 1 2 0 1 0 1 1 1 ...
.. ..$ hour : int [1:150000] 19 1 2 8 12 14 15 17 11 9 ...
.. ..$ day : int [1:150000] 15 18 28 17 16 24 26 19 21 22 ...
..# coords.nrs : int [1:2] 4 8
..# coords : num [1:150000, 1:2] 528 466 495 328 212 430 358 323 482 406 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : NULL
.. .. ..$ : chr [1:2] "start.station.id" "end.station.id"
..# bbox : num [1:2, 1:2] 72 72 3002 3002
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:2] "start.station.id" "end.station.id"
.. .. ..$ : chr [1:2] "min" "max"
..# proj4string:Formal class 'CRS' [package "sp"] with 1 slots
.. .. ..# projargs: chr NA
I can produce a plot of NYC
plot(nyc.zip)
But I cannot plot the coordinates on top.
plot(bike.loc, add=T, col= "red", pch=15)
I've tried:
EPSG <- make_EPSG()
NY <- with(EPSG,EPSG[grepl("New York",note) & code==2263,]$prj4)
Based on this post, but haven't got it to work.
How do I get the lat/longs plotted over the map?
