My data is stored as a matrix and as a list at the same time? - r

I am using the tabular() function to produce tables in R (tables package).
I want to compute CIs from the data in the output (let mytable be the output from tabular()). Simple enough, I thought, except when I go to use a value from the matrix in a calculation, I get the error Error in mytable[1, i] - 1 : non-numeric argument to binary operator. I thought this was odd, because when I call up a particular cell of the matrix (is.matrix returned TRUE for mytable), for example mytable[1, i] for some i, I get an integer. I then tried is.list on mytable and got TRUE as well, so I am not sure what this means. I guess the tabular() function stores the results as a special kind of matrix.
I am only trying to pull out the mean, sdev, and n, which I am able to do just by typing the cell location; for example, mytable[1, i] would return an 86. However, when I try to use the value in qt(.975,df=(mytable[1,i]-1)), for example, I get the error above. I am not really sure how to approach this except to manually enter the values into another matrix (which I would like to avoid). Or, if I can compute CIs directly in the tabular() function, that would work also. Cheers.

I shall quote for you the Value section of the documentation on the function ?tabular:
An object of S3 class "tabular". This is a matrix of mode list, whose
entries are computed summary values, with the following attributes:
rowLabels - A matrix of labels for the rows. This will have the same
number of rows as the main matrix, but may have multiple columns for
different nested levels of labels. If a label covers multiple rows, it
is entered in the first row, and NA is used to fill following rows.
colLabels - Like rowLabels, but labelling the columns.
table - The original table expression being displayed. A list of the
original format specifications are attached as a "fmtlist" attribute.
formats - A matrix of the same shape as the main result, containing NA
for default formatting, or an index into the format list.
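As the documentation says, the result is a matrix of mode list, so each cell is stored as a list element rather than as a plain number. If your tabular object is called tab, type tab[1,1] and you should see one of your table values, still wrapped rather than a bare number; that wrapper is why arithmetic such as mytable[1, i] - 1 fails. If I wanted to modify that value, I would probably do something like:
tab[[1,1]] <- value
just like you would modify an element of any other list. Double brackets are also how you read the bare value back out, e.g. tab[[1,1]].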
Type attributes(tab) and you'll see the items listed above, containing a lot of the formatting information and row/col headers.
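To make the "matrix of mode list" idea concrete, here is a minimal sketch in base R. It does not use the tables package: cells is a hypothetical stand-in built by hand, and the numbers are made up. I have not checked it against a real tabular object, whose own [ method may behave differently, so treat it only as an illustration of why single brackets fail in arithmetic and double brackets do not.

```r
# Hypothetical stand-in for a "matrix of mode list" (not a real tabular object).
# matrix() fills column by column: n, then mean, then sd.
cells <- matrix(list(86, 90, 12.4, 11.8, 3.1, 2.7), nrow = 2,
                dimnames = list(c("a", "b"), c("n", "mean", "sd")))

is.matrix(cells)  # TRUE
is.list(cells)    # TRUE

# Single brackets keep the list wrapper, so arithmetic on the result fails.
single <- cells[1, 1]
err <- tryCatch(single - 1, error = function(e) conditionMessage(e))

# Double brackets return the bare value, which can be used in a calculation.
n <- cells[[1, 1]]
m <- cells[[1, 2]]
s <- cells[[1, 3]]

# 95% confidence interval for the mean from n, mean and sd.
half_width <- qt(0.975, df = n - 1) * s / sqrt(n)
ci <- c(lower = m - half_width, upper = m + half_width)
ci
```

The same pattern (pull out n, mean and sd with [[ ]], then compute) avoids retyping the values into a second matrix.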

Related

Filtering data, comma vs not comma

I have the following code
#abnormal return
exp.ret <- lm((RET-rf)~mkt.rf+smb+hml, data=tesla[tesla$period=="estimation.period",])
tesla$abn.ret <- (tesla$RET-tesla$rf)-predict(exp.ret,tesla)
#CAR during event window
CAR <- sum(tesla$abn.ret[tesla$period=="event.period",])
The first section runs fine, but the second gives this error:
Error in tesla$abn.ret[tesla$period == "event.period", ] :
incorrect number of dimensions
I know that the solution is to remove the last comma:
#CAR during event window
CAR <- sum(tesla$abn.ret[tesla$period=="event.period"])
I am just wondering what the right pedagogical way of understanding this is: why do I need a comma at the end in some cases but not in others when I'm filtering for only part of the data frame?
The $ sign, [[]] and [] have different meanings.
In short:
$ and [[]] extract a single column of a data frame or a single item of a list.
Extracting a column from a data frame gives a vector, while extracting an item from a list gives an object of the same class as that item, which can be a data frame, another list, etc.
It's important to note that $ doesn't accept a column index (only a column name) and that you cannot select two columns with $ or [[]].
[] slices a data frame or a list, selecting one or more elements.
With a single index, the class of the output is the same as that of the original object:
if you slice a data frame using df[i], the output is a data frame; the same applies to lists, etc.
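The distinction can be checked on a small example. This is only a sketch; df and lst are made-up objects, not the asker's data.

```r
# Hypothetical data frame and list for illustration.
df  <- data.frame(id = 1:3, score = c(2.5, 3.5, 4.5))
lst <- list(a = 1:3, b = "x")

# $ and [[ ]] extract the column itself: a plain numeric vector.
col_dollar <- df$score
col_double <- df[["score"]]

# [ ] with a single index keeps the container: still a data frame.
col_single <- df["score"]

class(col_dollar)  # "numeric"
class(col_single)  # "data.frame"

# The same holds for lists: [ ] returns a list, [[ ]] returns the item.
class(lst["a"])    # "list"
class(lst[["a"]])  # "integer"
```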
In your specific case, you used $ to extract a column, which is a vector. You then tried to slice that vector using [ , ], but a plain vector has only one dimension, so R raised an error. You should slice the vector using [] with a single index and no comma (the output will again be a vector); [[]] only works when you want exactly one element.
Possible ways to subset tesla as you wish:
tesla$abn.ret[tesla$period == "event.period"]
tesla[["abn.ret"]][tesla$period == "event.period"]
tesla[tesla$period == "event.period", "abn.ret"]
You would achieve the same result using tesla[["period"]] instead of tesla$period.
For some extra details/examples, refer to An introduction to R, published by CRAN.
I hope it helped you somehow..!
tesla$abn.ret is one-dimensional. Each comma separates a dimension, so yours implies 2 dimensions.
Alternatively you could run
tesla[tesla$period=="event.period", "abn.ret"]
And get the same results, since tesla is 2-d.
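If you look at the documentation with the command ?'[', you will find that for matrices and data frames the default behaviour of x[i, j] is to drop any dimension that is left with only one level, which is why a single selected column comes back as a plain vector.
If you want to disable the dropping of the dimension, you have to write x[i, j, drop = FALSE] explicitly.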
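A small sketch of the comma rule and of drop, using a made-up data frame called events in place of tesla (the column names mirror the question, the values are invented):

```r
# Hypothetical stand-in for the tesla data frame.
events <- data.frame(
  period  = c("estimation.period", "event.period", "event.period"),
  abn.ret = c(0.01, -0.02, 0.03)
)
in_event <- events$period == "event.period"

# One dimension (a vector): one index, no comma.
v <- events$abn.ret[in_event]

# A comma on a vector asks for a second dimension that does not exist.
err <- tryCatch(events$abn.ret[in_event, ],
                error = function(e) conditionMessage(e))

# Two dimensions (a data frame): rows before the comma, columns after it.
w <- events[in_event, "abn.ret"]                # dropped to a vector
k <- events[in_event, "abn.ret", drop = FALSE]  # stays a data frame

car <- sum(v)
```

Here v and w hold the same numbers, while k keeps its row and column structure.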

How can I access data in a nested R list?

I want to learn how to access data from a nested list in R. I am relatively new to the R programming language, so I am unsure how to proceed.
The data is a 'Large list' (947 elements, 654.9 Mb) and takes the form:
The numbers within the datalist refer to station numbers and when I click on one (in Rstudio) it looks like this:
I want to know how I can access the data within 'doy', for example. I have tried:
data[[1]]
which returns all the data for the first element of the list (site, location, doy,ltm etc). So clearly the number used within the square brackets is interpreted as an index for the list, as opposed to an identifier for the elements/station in the list.
Then I tried:
data$1
but it returned the error:
Error: unexpected numeric constant in "data$1"
Then I tried:
data[data$1==doy]
But was returned this:
Error: unexpected numeric constant in "data[data$1"
So at this point, I realise that it is not construing the number of the station as a category/factor within the list. It's just reading it as a number. So I thought I'd put some quotes around it to see if that changed what happened:
data[data$"1"=="doy"]
This returned
named list()
But when I looked at it in the environment, it was a list of 0.
I looked at some of the similar question here on Stack (like: accessing nested lists in R) and tried:
data[data$"1"=="doy",][[1]]
But just got:
Error in data[data$"1" == "doy", ] : incorrect number of dimensions
How can I access this data? It reminds me of a structure in Matlab, but it doesn't seem to be indexed in a similar fashion in R.
Let's look at some ways to do what you want:
data[[1]]
This returns the first element of the list, which is itself a list. You can use the $ subsetting shorthand, but the name of the first element is nonstandard. R prefers names that start with letters and include only alphanumeric characters, periods and underscores. You can escape this behavior with backticks:
data$`1`
If you want to access one of the elements of list 1 in your list of lists, you need to subset further. To get to doy, which is the third element of 1, you can do that in four ways.
data[[1]][[3]]
data$`1`[[3]]
data[[1]]$doy
data$`1`$doy
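As a quick check that these forms agree, here is a sketch on a tiny made-up list; stations and its contents are hypothetical and only mimic the structure described in the question. It also shows the difference between indexing by position and by station name, and one way to pull doy out of every station at once.

```r
# Hypothetical list of stations, named by station number.
stations <- list(
  `1` = list(site = "A", location = c(51.5, -0.1), doy = 1:3),
  `2` = list(site = "B", location = c(48.9, 2.4),  doy = 4:6)
)

# [[1]] is the first element by position; [["1"]] is the element named "1".
by_position <- stations[[1]][[3]]
by_name     <- stations[["1"]][["doy"]]
by_dollar   <- stations$`1`$doy

# Collect doy from every station, keeping the station names.
all_doy <- lapply(stations, function(st) st$doy)
names(all_doy)  # "1" "2"
```

Position and name only coincide here because the station named "1" happens to come first; in a list whose names are not in order, [["1"]] is the safer choice.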
One way (in addition to what Ben Norris has shown):
our_list[[c("1", "doy")]]
Reproducible example data (please provide next time)
our_list <- list(`1` = list(site = "x", doy = 3))

How to select column of dataframe using numbers

At the moment I am selecting columns the usual way:
`df$column1`
But I want to loop through the columns of my dataframe into a plot function, and when I use the method:
`df[,1]`
I get the error:
Don't know how to automatically pick scale for object of type data.frame. Defaulting to continuous.
Error in is.finite(x) : default method not implemented for type 'list'
So what would be the best way to select columns of the dataframe so that I can easily loop through the columns and avoid this error?
You can access data frame columns using double square brackets, e.g. df[[1]] returns the first column as a vector. For more info on why/how/etc., see Dave Tang's blog post: https://davetang.org/muse/2013/08/16/double-square-brackets-in-r
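A minimal sketch of looping over columns with double brackets, using a made-up data frame and base R only (mean() stands in for the plot call). One possible explanation for the error in the question, which is an assumption on my part, is that df was a tibble: tibbles keep df[, 1] as a one-column data frame, whereas a base data frame would drop it to a vector.

```r
# Hypothetical data frame for illustration.
df <- data.frame(height = c(1.5, 1.7, 1.6), weight = c(55, 70, 62))

# df[1] is still a data frame (a list underneath); df[[1]] is a plain vector.
is.data.frame(df[1])  # TRUE
is.numeric(df[[1]])   # TRUE

# Loop over column positions and hand each column on as a vector.
col_means <- setNames(numeric(ncol(df)), names(df))
for (i in seq_along(df)) {
  column <- df[[i]]           # a vector, safe for functions that reject lists
  col_means[i] <- mean(column)
}
col_means
```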

R in BERT won't use na.rm=TRUE in sum function

I installed BERT (R-language to Excel interface). In the functions.R file that is included, I modified the included Add function to use the na.rm argument, as follows.
Add <- function( ... ) {
sum(..., na.rm=TRUE);
}
However, it appears that the na.rm argument is ignored. That is, the Add() function works fine in Excel if all values in the range are present.
[that is =R.Add(A1:A5) in Excel works fine if all of cells A1:A5 contain values]
But if I delete any value in the range (so the Excel cell is blank), I get #NULL! returned.
Is it possible to utilize the na.rm argument, using BERT so that for R-language functions that have the na.rm argument, it is taken into account and blank cells within the Excel range still compute on the remaining values and do not return #NULL!?
This is a little complicated because the behavior is different if you are passing in one argument (a range) or multiple arguments (individual cells). But in the case of a single argument, if you pass in a range that has empty cells, this will be passed as a list. In that case, you will need to call unlist, e.g.
Add <- function( ... ) {
sum(unlist(...), na.rm=TRUE);
}
Excel can have ranges that include different types (e.g. strings and numbers), but R can't. So when BERT passes data from Excel to R that has mixed types, it uses a list of lists.
There is a hint about this in the console when it runs, which is a good place to start -- it says that the argument is an invalid type (list).
As I said it's complicated because the three-dots argument could refer to multiple arguments, each of which could be a list or a single value (a scalar), each of which could be different types. In that case you'd need to use one of the apply functions to unlist different arguments. But try the above first.
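For the multi-argument case, one possible sketch in plain R is to collect everything into a single list first and flatten that. The name add_flat is hypothetical, and the inputs below only simulate what BERT might pass: I am assuming blank cells arrive as NA inside a list, which I have not verified against BERT itself.

```r
# sum() rejects a list outright, which matches the console hint.
err <- tryCatch(sum(list(1, NA, 3), na.rm = TRUE),
                error = function(e) conditionMessage(e))

# Hypothetical variant of Add: gather all arguments, flatten, then sum.
add_flat <- function(...) {
  values <- unlist(list(...))  # handles one range or several separate arguments
  sum(values, na.rm = TRUE)
}

add_flat(list(1, NA, 3))     # simulated single range with a blank cell
add_flat(1, list(2, NA), 3)  # simulated mix of single cells and a range
```

This only works for numeric input; if a range also contains strings, unlist() would coerce everything to character and sum() would fail.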

Extract formula from Excel Data Table (What-If Analysis)

I am faced with rewriting an Excel project in R. I see a table in which a cell {= TABLE (F2, C2)} is shown. I understand how to create a Table like this (What-If Analysis, Data Table...).
As I have to understand this to rewrite in R, how can I find the original formula which stands behind that cell?
EXAMPLE: I have created a Data Table as shown here and the sheet looks like this:
In my case, I don't know how the sheet was created, and I want to know the initial formula. Now this is shown as {=TABLE(,C4)}.
(In the example I know the answer, it is in the cell (D10), but where is reference for this cell in Data Table?)
I'm using Excel 2007 but have no reason to believe things differ in other versions.
@Stanislav was right to reject my comment suggestion that TABLE was a name; it is an Excel function. But it is a very strange function :-}
There isn't any help on the TABLE function in the local help, it isn't listed in "List of worksheet functions (alphabetical)".
You can't manually enter or edit the TABLE function; error "That function is not valid".
Copy/Pasting cells containing the TABLE function pastes their values, not their formulae, even when you specify Paste Special > Formulas
You can't insert rows/columns immediately above/left of cells containing the TABLE function; error "Cannot change part of a data table".
Pace @pnuts, using Formulas > Formula Auditing on cells containing the TABLE function shows no precedents, and no cells show them as dependents. However, in a VBA sheet auditing tool which I use, the Range.DirectDependents property finds the "formula range" dependent on the "margin" cells containing the formulas, but not on those containing the values (see below for explanation of those terms).
I haven't been able to find anything I regard as decent documentation of TABLE(). I have found lots of illustrations of how to produce and use that function, but nothing clearly specifying the arguments and result. The best I've found is https://support.office.com/en-us/article/Calculate-multiple-results-by-using-a-data-table-e95e2487-6ca6-4413-ad12-77542a5ea50b. I'd be pleased if anyone can point me to better documentation.
I deduce the behaviour as described here:
TABLE(Rowinp,Colinp) is an array formula in a contiguous array of cells. I'll refer to that contiguous array as the "formula range" of the data table.
The cells immediately above/left of the formula range are also part of the data table, even though they do not contain a TABLE() function and can be edited; I'll refer to those cells as the "margins" of the data table.
Rowinp and Colinp must be blank or references to single cells.
Rowinp and Colinp must be different (or error "Input cell reference is not valid"), and they must not both be blank.
The values in the formula range are calculated by taking formula(s) from the margin(s) and substituting references to Rowinp and/or Colinp with values from the margin(s).
There are three mutually exclusive possibilities, corresponding to which of Rowinp and Colinp is blank.
TABLE(Rowinp, ) Colinp blank. The formula is that in the left margin of the same row with instances of Rowinp replaced by values from the upper margin of the same column.
TABLE( , Colinp) Rowinp blank. The formula is that in the top margin of the same column with instances of Colinp replaced by values from the left margin of the same row.
TABLE(Rowinp, Colinp) Neither blank. The formula is that in the cell at the intersection of the left and top margins with instances of Rowinp replaced by values from the upper margin of the same column and instances of Colinp replaced by values from the left margin of the same row.
I think that should let you work out what the effective formula is in each cell of the formula range.
But I wouldn't be surprised to learn that any of the above is wrong :-0
I welcome pointers to anything more authoritative.
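Since the goal is a rewrite in R, the two-input case TABLE(Rowinp, Colinp) as deduced above could be emulated roughly like this. It is a sketch only: cell_formula is a made-up placeholder for whatever formula sits in the corner cell, and the margin values are invented.

```r
# Hypothetical placeholder for the formula in the corner cell,
# written as a function of the two input cells.
cell_formula <- function(row_in, col_in) row_in * 10 + col_in

top_margin  <- c(1, 2, 3)  # values substituted for Rowinp, one per column
left_margin <- c(0.5, 1.5) # values substituted for Colinp, one per row

# outer() evaluates the formula for every (left, top) combination,
# giving one row per left-margin value and one column per top-margin value.
formula_range <- outer(
  left_margin, top_margin,
  function(col_in, row_in) cell_formula(row_in, col_in)
)
dimnames(formula_range) <- list(as.character(left_margin),
                                as.character(top_margin))
formula_range
```

outer() requires the function to be vectorised; a formula that is not could be wrapped in Vectorize() first. The one-input cases reduce to sapply() over a single margin.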
I think in your example F2 and C2 are effectively only the addresses of parameters for a function (TABLE), which may be located anywhere, with the associated formula in the table's top left cell.
So I suggest go to C2, FORMULAS > Formula Auditing and click Trace Dependents, repeat for F2 and see where the arrows converge.
