R: zoo object vector explanation. Referring to ..$ : NULL compared to ..$ : chr

I have two zoo objects and I am wondering why the first zoo object shows
..$ : NULL
when the second one shows
..$ : chr [1:2266] "1" "2" "3" "4" ...
instead.
What does this mean in this case? And how could I change
"..$ : chr [1:2266] "1" "2" "3" "4" ..." to "..$ : NULL" in the second object?
Background: I am conducting an event study (package eventstudies) and I get the error:
Error in rval[i, j, drop = drop., ...] : subscript out of bounds
One possible cause is the formatting of the data. The object "OtherReturns" is used as example data in the package. I am hoping to resolve the error by understanding more about zoo objects, but I cannot work out the formatting difference between the following two objects:
> str(OtherReturns)
‘zoo’ series from 2010-07-01 to 2013-03-28
Data: num [1:720, 1:4] -1.1568 -0.2727 -0.0229 1.01 -0.9107 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:4] "NiftyIndex" "CallMoneyRate" "SP500" "USDINR"
Index: Date[1:720], format: "2010-07-01" "2010-07-02" "2010-07-05" "2010-07-06" "2010-07-07" "2010-07-08" "2010-07-09" "2010-07-12" ...
> str(zoo_SP500)
‘zoo’ series from 2007-01-03 to 2015-12-31
Data: num [1:2266, 1:2] 1417 1418 1410 1413 1412 ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:2266] "1" "2" "3" "4" ...
..$ : chr [1:2] "SP500_Closing" "Index_return"
Index: Date[1:2266], format: "2007-01-03" "2007-01-04" "2007-01-05" "2007-01-08" "2007-01-09" "2007-01-10" "2007-01-11" "2007-01-12" ...
At first glance, both objects look the same in this regard:
> head(OtherReturns,3)
NiftyIndex CallMoneyRate SP500 USDINR
2010-07-01 -1.15678265 NA NA 0.3009785
2010-07-02 -0.27267977 NA -1.341017 0.0000000
2010-07-05 -0.02291607 NA NA 0.4070272
> head(zoo_SP500,3)
SP500_Closing Index_return
2007-01-03 1416.60 -0.1198618
2007-01-04 1418.34 0.1228293
2007-01-05 1409.71 -0.6084578
Any help and explanations much appreciated!
Thank you!
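The two `..$` lines under `attr(*, "dimnames")` are the row names (first element) and the column names (second element), so `NULL` means "no row names" and `chr [1:2266] "1" "2" ...` means row names are present. A minimal sketch with a plain base R matrix follows. The name `m` and its values are made up, and it assumes the dimnames of a zoo object behave like those of an ordinary matrix. Whether the row names are what triggers the eventstudies error is not established here.

```r
# Toy matrix standing in for the data part of zoo_SP500 (illustrative values)
m <- matrix(c(1416.60, 1418.34, -0.12, 0.12), nrow = 2,
            dimnames = list(c("1", "2"), c("SP500_Closing", "Index_return")))
str(m)               # first dimnames element is chr: row names are present

rownames(m) <- NULL  # drop the row names, keep the column names
str(m)               # first dimnames element is now NULL
```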

Related

Expand nested dataframe into parent

I have a dataframe nested within a dataframe that I'm getting from Mongo. The number of rows matches in each, so that when viewed it looks like a typical dataframe. My question: how do I expand the nested dataframe into the parent so that I can run dplyr selects? See the layout below:
'data.frame': 10 obs. of 2 variables:
$ _id : int 1551 1033 1061 1262 1032 1896 1080 1099 1679 1690
$ personalInfo:'data.frame': 10 obs. of 2 variables:
..$ FirstName :List of 10
.. ..$ : chr "Jack"
.. ..$ : chr "Yogesh"
.. ..$ : chr "Steven"
.. ..$ : chr "Richard"
.. ..$ : chr "Thomas"
.. ..$ : chr "Craig"
.. ..$ : chr "David"
.. ..$ : chr "Aman"
.. ..$ : chr "Frank"
.. ..$ : chr "Robert"
..$ MiddleName :List of 10
.. ..$ : chr "B"
.. ..$ : NULL
.. ..$ : chr "J"
.. ..$ : chr "I"
.. ..$ : chr "E"
.. ..$ : chr "A"
.. ..$ : chr "R"
.. ..$ : NULL
.. ..$ : chr "J"
.. ..$ : chr "E"
As per suggestion, here's how you recreate the data
id <- c(1551, 1033, 1061, 1262, 1032, 1896, 1080, 1099, 1679, 1690)
fname <- list("Jack","Yogesh","Steven","Richard","Thomas","Craig","David","Aman","Frank","Robert")
mname <- list("B",NULL,"J","I","E","A","R",NULL,"J","E")
sub <- as.data.frame(cbind(fname, mname))
master <- as.data.frame(id)
master$personalInfo <- sub
We could loop over 'personalInfo', change the NULL elements of each list to NA, and convert the result to a regular data frame with 3 columns:
library(tidyverse)
out <- master %>%
  pull(personalInfo) %>%
  map_df(~ map_chr(.x, ~ replace(.x, is.null(.x), NA))) %>%
  bind_cols(master %>% select(id), .)
str(out)
#'data.frame': 10 obs. of 3 variables:
# $ id : num 1551 1033 1061 1262 1032 ...
# $ fname: chr "Jack" "Yogesh" "Steven" "Richard" ...
# $ mname: chr "B" NA "J" "I" ...
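The same idea (turn each NULL into NA, then flatten the list into an ordinary character column) can be sketched in base R. This is an illustration on a shortened copy of the example data, not the code from the answer above; the helper name `null_to_na` is made up.

```r
# Shortened version of the example data
id    <- c(1551, 1033, 1061)
fname <- list("Jack", "Yogesh", "Steven")
mname <- list("B", NULL, "J")

# Hypothetical helper: replace NULL elements with NA and return a character vector
null_to_na <- function(x) {
  vapply(x, function(el) if (is.null(el)) NA_character_ else el, character(1))
}

flat <- data.frame(id = id,
                   fname = null_to_na(fname),
                   mname = null_to_na(mname),
                   stringsAsFactors = FALSE)
str(flat)
```

Because every column of `flat` is an atomic vector, dplyr verbs such as select() apply to it directly.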
While @akrun's answer is probably more practical and probably the way to tidy your data, I think this output is closer to what you describe.
I create a new environment and copy the data.frame's columns into it, then copy the columns of the problematic nested column into the same environment, and finally wrap everything back into a data.frame.
I use a strange hack with cbind because as.data.frame is annoying with list columns. tibble::as_tibble works fine, however.
new_env <- new.env()
list2env(master,new_env)
list2env(new_env$personalInfo,new_env)
rm(personalInfo,envir = new_env)
res <- as.data.frame(do.call(cbind,as.list(new_env))) # or as_tibble(as.list(new_env))
rm(new_env)
res
# fname id mname
# 1 Jack 1551 B
# 2 Yogesh 1033 NULL
# 3 Steven 1061 J
# 4 Richard 1262 I
# 5 Thomas 1032 E
# 6 Craig 1896 A
# 7 David 1080 R
# 8 Aman 1099 NULL
# 9 Frank 1679 J
# 10 Robert 1690 E
str(res)
# 'data.frame': 10 obs. of 3 variables:
# $ fname:List of 10
# ..$ : chr "Jack"
# ..$ : chr "Yogesh"
# ..$ : chr "Steven"
# ..$ : chr "Richard"
# ..$ : chr "Thomas"
# ..$ : chr "Craig"
# ..$ : chr "David"
# ..$ : chr "Aman"
# ..$ : chr "Frank"
# ..$ : chr "Robert"
# $ id :List of 10
# ..$ : num 1551
# ..$ : num 1033
# ..$ : num 1061
# ..$ : num 1262
# ..$ : num 1032
# ..$ : num 1896
# ..$ : num 1080
# ..$ : num 1099
# ..$ : num 1679
# ..$ : num 1690
# $ mname:List of 10
# ..$ : chr "B"
# ..$ : NULL
# ..$ : chr "J"
# ..$ : chr "I"
# ..$ : chr "E"
# ..$ : chr "A"
# ..$ : chr "R"
# ..$ : NULL
# ..$ : chr "J"
# ..$ : chr "E"

Make calculations into a variable within a list

I would like to make some specific calculation within a large dataset.
This is my MWE using an API call (takes 3-4 sec ONLY to Download)
devtools::install_github('mingjerli/IMFData')
library(IMFData)
fdi_asst <- c("BFDA_BP6_USD","BFDAD_BP6_USD","BFDAE_BP6_USD")
databaseID <- "BOP"
startdate <- "1980-01-01"
enddate <- "2016-12-31"
checkquery <- FALSE
FDI_ASSETS <- as.data.frame(CompactDataMethod(databaseID, list(CL_FREA = "Q", CL_AREA_BOP = "", CL_INDICATOR_BOP= fdi_asst), startdate, enddate, checkquery))
My dataframe 'FDI_ASSETS' looks like this (I provide a picture instead of head() for convenience).
The last column is a list, and each of its elements contains up to three more variables:
head(FDI_ASSETS$Obs)
[[1]]
#TIME_PERIOD #OBS_VALUE #OBS_STATUS
1 1980-Q1 30.0318922812441 <NA>
2 1980-Q2 23.8926174547104 <NA>
3 1980-Q3 26.599634375058 <NA>
4 1980-Q4 32.7522451203517 <NA>
5 1981-Q1 44.124979234001 <NA>
6 1981-Q2 35.9907120805994 <NA>
MY SCOPE
I want to do the following:
if/when the "#UNIT_MULT == 6" then divide the "#OBS_VALUE" in FDI_ASSETS$Obs by 1000
if/when the "#UNIT_MULT == 3" then divide the "#OBS_VALUE" in FDI_ASSETS$Obs by 1000000
UPDATE
Structure of FDI_ASSETS looks like this:
str(FDI_ASSETS)
'data.frame': 375 obs. of 6 variables:
$ #FREQ : chr "Q" "Q" "Q" "Q" ...
$ #REF_AREA : chr "FI" "MX" "MX" "TO" ...
$ #INDICATOR : chr "BFDAE_BP6_USD" "BFDAD_BP6_USD" "BFDAE_BP6_USD" "BFDAD_BP6_USD" ...
$ #UNIT_MULT : chr "6" "6" "6" "3" ...
$ #TIME_FORMAT: chr "P3M" "P3M" "P3M" "P3M" ...
$ Obs :List of 375
..$ :'data.frame': 147 obs. of 3 variables:
.. ..$ #TIME_PERIOD: chr "1980-Q1" "1980-Q2" "1980-Q3" "1980-Q4" ...
.. ..$ #OBS_VALUE : chr "30.0318922812441" "23.8926174547104" "26.599634375058" "32.7522451203517" ...
.. ..$ #OBS_STATUS : chr NA NA NA NA ...
..$ :'data.frame': 60 obs. of 2 variables:
.. ..$ #TIME_PERIOD: chr "2001-Q1" "2001-Q3" "2002-Q1" "2002-Q2" ...
.. ..$ #OBS_VALUE : chr "9.99999999748979E-05" "9.99999997475243E-05" "9.8999999998739E-05" "-9.90000000342661E-05" ...
..$ :'data.frame': 63 obs. of 2 variables:
.. ..$ #TIME_PERIOD: chr "2001-Q1" "2001-Q2" "2001-Q3" "2001-Q4" ...
.. ..$ #OBS_VALUE : chr "130.0149" "189.627" "3453.8319" "630.483" ...
..$ :'data.frame': 17 obs. of 2 variables:
I downloaded your data and it is quite complicated. I have removed my wrong answer so that you can get it answered by @akrun or someone similar :) I don't have the time to parse through it right now.
I found the following solution
list_assets<-list(FDI_ASSETS=FDI_ASSETS, Portfolio_ASSETS=Portfolio_ASSETS, other_invest_ASSETS=other_invest_ASSETS, fin_der_ASSETS=fin_der_ASSETS, Reserves=Reserves)
for (nm in names(list_assets)) {
  df <- list_assets[[nm]]
  for (i in seq_along(df$"#UNIT_MULT")) {
    if (df$"#UNIT_MULT"[i] == "6") {
      # values are stored as character, so convert before dividing
      df$Obs[[i]]$"#OBS_VALUE" <- as.numeric(df$Obs[[i]]$"#OBS_VALUE") / 1000
    } else if (df$"#UNIT_MULT"[i] == "3") {
      df$Obs[[i]]$"#OBS_VALUE" <- as.numeric(df$Obs[[i]]$"#OBS_VALUE") / 1000000
    }
  }
  # `df` is only a copy, so store the modified data frame back in the list
  list_assets[[nm]] <- df
}
Please let me know how I can modify the code in order to make it more efficient and avoid these loops.
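One way to drop the explicit loops is to look up a divisor per row and apply it with Map(). The following is a sketch on a made-up stand-in for FDI_ASSETS; the names `toy` and `rescale_obs` are hypothetical, and the column names are written as they appear in the question. Unlike the loop version, rows with any other #UNIT_MULT are also converted to numeric (and divided by 1).

```r
# Made-up stand-in for FDI_ASSETS: one row per series, observations in a list column
toy <- data.frame(unit = c("6", "3", "0"), stringsAsFactors = FALSE)
names(toy) <- "#UNIT_MULT"
toy$Obs <- list(
  data.frame("#OBS_VALUE" = c("1000", "2000"), check.names = FALSE, stringsAsFactors = FALSE),
  data.frame("#OBS_VALUE" = "5000000", check.names = FALSE, stringsAsFactors = FALSE),
  data.frame("#OBS_VALUE" = "7", check.names = FALSE, stringsAsFactors = FALSE)
)

# Hypothetical helper: rescale every element of the Obs list column
rescale_obs <- function(df) {
  # divisor per row: 1000 for "6", 1e6 for "3", 1 for anything else
  divisor <- unname(c("6" = 1e3, "3" = 1e6)[df[["#UNIT_MULT"]]])
  divisor[is.na(divisor)] <- 1
  df$Obs <- Map(function(obs, d) {
    obs[["#OBS_VALUE"]] <- as.numeric(obs[["#OBS_VALUE"]]) / d
    obs
  }, df$Obs, divisor)
  df
}

toy_scaled <- rescale_obs(toy)
```

To process several data frames at once, something like `lapply(list_assets, rescale_obs)` returns a list of rescaled copies.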

Data frame with matrix embedded in one variable

I'm using the 'pls' package, and for this I need to produce a dataframe with a structure a bit different from what I'm used to.
Data frame structure I need to work with 'pls': gasoline
library(pls)
gasoline
Example showing what my data looks like: gasoline2
Background info - To load data into R, I usually transcribe the data into an .xls file and then convert that file to a .txt file, which is then loaded into R.
When my data is loaded it looks like this:
gasoline2 <- as.data.frame(as.matrix(gasoline))
Question
How can I convert the structure of gasoline2 into the structure of gasoline?
Thanks a lot in advance for your help!
You're looking for I(), which lets you store other data structures (like lists or matrices) as single columns in a data.frame:
## Assume you are starting with this:
X <- as.data.frame(as.matrix(gasoline))
## Create a new object where column 1 is the same as the first
## column in your existing data frame, and column 2 is a matrix
## of the remaining columns
newGas <- cbind(X[1], NIR = I(as.matrix(X[-1])))
str(gasoline)
# 'data.frame': 60 obs. of 2 variables:
# $ octane: num 85.3 85.2 88.5 83.4 87.9 ...
# $ NIR : AsIs [1:60, 1:401] -0.050193 -0.044227 -0.046867 -0.046705 -0.050859 ...
# ..- attr(*, "dimnames")=List of 2
# .. ..$ : chr "1" "2" "3" "4" ...
# .. ..$ : chr "900 nm" "902 nm" "904 nm" "906 nm" ...
str(newGas)
# 'data.frame': 60 obs. of 2 variables:
# $ octane: num 85.3 85.2 88.5 83.4 87.9 ...
# $ NIR : AsIs [1:60, 1:401] -0.050193 -0.044227 -0.046867 -0.046705 -0.050859 ...
# ..- attr(*, "dimnames")=List of 2
# .. ..$ : chr "1" "2" "3" "4" ...
# .. ..$ : chr "NIR.900 nm" "NIR.902 nm" "NIR.904 nm" "NIR.906 nm" ...
There's a slight difference in the column naming, but I think that can easily be taken care of by stripping the literal "NIR." prefix:
> colnames(newGas$NIR) <- sub("^NIR\\.", "", colnames(newGas$NIR))
> identical(gasoline, newGas)
[1] TRUE
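The same pattern can be tried without the pls package. A small sketch on made-up numbers (the names `flat` and `nested` and all values are illustrative): here the matrix is assigned with `$<-` rather than cbind(), which is a variant of the approach above, so no prefix is added to the column names.

```r
# Made-up flat data frame: one response column plus two "wavelength" columns
flat <- data.frame(octane = c(85.3, 85.2, 88.5),
                   "900 nm" = c(-0.050, -0.044, -0.047),
                   "902 nm" = c(-0.046, -0.040, -0.041),
                   check.names = FALSE)

# Keep the first column and store all remaining columns as one matrix column
nested <- data.frame(octane = flat$octane)
nested$NIR <- I(as.matrix(flat[-1]))

str(nested)
```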

Extracting component from all objects within an indexed list

I have an indexed list containing several objects, each of which contains 3 matrices ($tab, $nobs and $other). There are a hundred such objects in the list. The objective is to extract only the $tab matrix from each object and transpose it.
genfreqT <- lapply(genfreq[[1:100]]$tab, function(x) t(x))
This does not seem to work.
Here is how the genfreq object is structured. This was created with R package adegenet.
> str(genfreq[[1]])
List of 3
$ tab : num [1:30, 1:1974] 0.6 0.5 0.325 0.675 0.6 0.5 0.5 0.375 0.55 0.475 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : Named chr [1:30] "1" "2" "3" "4" ...
.. .. ..- attr(*, "names")= chr [1:30] "01" "02" "03" "04" ...
.. ..$ : chr [1:1974] "L0001.1" "L0001.2" "L0002.1" "L0002.2" ...
$ nobs: num [1:30, 1:1000] 40 40 40 40 40 40 40 40 40 40 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : Named chr [1:30] "1" "2" "3" "4" ...
.. .. ..- attr(*, "names")= chr [1:30] "01" "02" "03" "04" ...
.. ..$ : Named chr [1:1000] "L0001" "L0002" "L0003" "L0004" ...
.. .. ..- attr(*, "names")= chr [1:1000] "L0001" "L0002" "L0003" "L0004" ...
$ call: language makefreq(x = x, truenames = TRUE)
genfreqT <-lapply(lapply(genfreq, "[[", "tab"),function(x) t(x))
The package developer for 'Adegenet' provided this solution:
> genfreqT <- lapply(genfreq, function(e) t(e$tab))
> summary(genfreqT)
Length Class Mode
data1.str 59220 -none- numeric
data2.str 59220 -none- numeric
data3.str 59220 -none- numeric
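The original attempt fails because `[[` with a vector index descends recursively (`x[[c(1, 2)]]` means `x[[1]][[2]]`), so `genfreq[[1:100]]` does not select a hundred objects. A small sketch on a made-up list (the name `genfreq_toy` and its contents are illustrative) shows the per-element extraction instead:

```r
# Toy stand-in for genfreq: three objects, each holding a 2 x 3 'tab' matrix
genfreq_toy <- lapply(1:3, function(i) {
  list(tab = matrix(1:6, nrow = 2), nobs = matrix(40, nrow = 2, ncol = 3))
})

# Visit each object, pull out its 'tab' component, and transpose it
tabs_t <- lapply(genfreq_toy, function(e) t(e$tab))

dim(tabs_t[[1]])  # each transposed matrix is now 3 x 2
```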

cramer.test: NAs introduced by coercion

I know there is a lot of information in Google about this problem, but I could not solve it.
I have a data frame:
> str(myData)
'data.frame': 1199456 obs. of 7 variables:
$ A: num 3064 82307 4431998 1354 193871 ...
$ B: num 6067 403916 2709997 2743 203434 ...
$ C: num 299 11752 33282 170 2748 ...
$ D: num 105 6676 7065 20 1593 ...
$ E: num 8 572 236 3 170 ...
$ F: num 0 21 95 0 13 ...
$ G: num 583 18512 961328 348 42728 ...
Then I convert it to a matrix in order to apply the Cramer-von Mises test from the "cramer" package:
> myData = as.matrix(myData)
> str(myData)
num [1:1199456, 1:7] 3064 82307 4431998 1354 193871 ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:1199456] "8" "32" "48" "49" ...
..$ : chr [1:7] "A" "B" "C" "D" ...
After that, when I call "cramer.test(myData[x1:y1,], myData[x2:y2,])" I get the following error:
Error in rep(0, (RVAL$m + RVAL$n)^2) : invalid 'times' argument
In addition: Warning message:
In matrix(rep(0, (RVAL$m + RVAL$n)^2), ncol = (RVAL$m + RVAL$n)) :
NAs introduced by coercion
I also tried to convert the data frame to a matrix like this, but the error is the same:
> myData = as.matrix(sapply(myData, as.numeric))
> str(myData)
num [1:1199456, 1:7] 3064 82307 4431998 1354 193871 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:7] "A" "B" "C" "D" ...
Your problem is that your data set is too large for the algorithm that cramer.test is using (at least the way it's coded). The code tries to create a lookup table according to
lookup <- matrix(rep(0, (RVAL$m + RVAL$n)^2),
ncol = (RVAL$m + RVAL$n))
where RVAL$m and RVAL$n are the number of rows of the two samples. The standard maximum length of an R vector is 2^31-1 on a 32-bit platform: since your samples have equal numbers of rows N, you'll be trying to create a vector of length (2*N)^2, which in your case is 5.754779e+12 -- probably too big even if R would let you create the vector.
You may have to look for another implementation of the test, or another test.
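The arithmetic can be checked directly. This sketch assumes, as the answer does, that both samples have N = 1199456 rows (the actual x1:y1 and x2:y2 ranges are not given in the question). The integer overflow it demonstrates is a plausible source of the "NAs introduced by coercion" warning, though that is an inference rather than something confirmed from the package source.

```r
N <- 1199456          # rows in myData, assumed here for both samples
cells <- (2 * N)^2    # number of cells in the (m + n) x (m + n) lookup matrix
cells                 # 5.754779e+12

# The count exceeds the largest R integer, so coercing it to integer gives NA
cells > .Machine$integer.max
is.na(suppressWarnings(as.integer(cells)))

# Rough memory needed if each cell is an 8-byte double, in terabytes (about 46)
cells * 8 / 1e12
```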
