How to display coefficients in scientific notation with stargazer - r

I want to compare the results of different models (lm, glm, plm, pglm) in a table in R using stargazer or a similar tool.
However, I can't find a way to display the coefficients in scientific notation. This is a problem because the intercept is rather large (about a million) while other coefficients are small (about 1e-7), which results in lots of useless zeros that make the table harder to read.
I found a similar question here: Format model display in texreg or stargazer R as scientific.
But the results there require rescaling the variables and since I use count data I wouldn't want to rescale it.
I am grateful for any suggestions.

Here's a reproducible example:
m1 <- lm(Sepal.Length ~ Petal.Length * Sepal.Width,
         transform(iris, Sepal.Length = Sepal.Length + 1e6,
                   Petal.Length = Petal.Length * 10, Sepal.Width = Sepal.Width * 100))
# Coefficients:
# (Intercept) Petal.Length Sepal.Width Petal.Length:Sepal.Width
# 1.000e+06 7.185e-02 8.500e-03 -7.701e-05
I don't believe stargazer has easy support for this.
You could try other alternatives like xtable or any of the many options here (I have not tried them all):
library(xtable)
xtable(m1, display=rep('g', 5)) # or there's `digits` too; see `?xtable`
Or if you're using knitr or pandoc, I quite like pander, which already applies scientific notation automagically (note: this is pandoc output, which looks like markdown rather than tex output; you then knit or pandoc it to latex/pdf):
library(pander)
pander(m1)

It's probably worth making a feature request to the package maintainer to include this option.
In the meantime, you can replace numbers in the output with scientific notation automagically. There are a few things to be careful about when replacing numbers. It is important not to reformat numbers that are part of the latex encoding. Also, be careful not to replace characters that are part of variable names; for example, the . in Sepal.Width could easily be mistaken for part of a number by a regex. The following code should deal with most common situations. But if someone, for example, calls their variable X_123456789, it might be renamed to X_1.23e+09, depending on the scipen setting. So some caution is needed, and a more robust solution would probably need to be implemented within the stargazer package itself.
Here's an example stargazer table to demonstrate on (shamelessly copied from @mathematical.coffee):
library(stargazer)
library(gsubfn)
m1 <- lm(Sepal.Length ~ Petal.Length * Sepal.Width,
         transform(iris, Sepal.Length = Sepal.Length + 1e6,
                   Petal.Length = Petal.Length * 10, Sepal.Width = Sepal.Width * 100))
star = stargazer(m1, header = F, digit.separator = '')
Now a helper function to reformat the numbers. You can play around with the digits and scipen parameters to control the output format. If you want to force scientific format more often use a smaller (more negative) scipen. Otherwise we can have it automatically use scientific format only for very small or large numbers by using a larger scipen. The cutoff parameter is there to prevent reformatting of numbers represented by only a few characters.
replace_numbers = function(x, cutoff=4, digits=3, scipen=-7) {
  ifelse(nchar(x) < cutoff, x, prettyNum(as.numeric(x), digits=digits, scientific=scipen))
}
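For instance, calling the helper on its own (repeated here so the snippet is self-contained) shows the effect of cutoff and scipen:

```r
# the helper from above: short tokens pass through untouched,
# longer ones are re-rendered, with a negative scipen favouring scientific format
replace_numbers = function(x, cutoff=4, digits=3, scipen=-7) {
  ifelse(nchar(x) < cutoff, x, prettyNum(as.numeric(x), digits=digits, scientific=scipen))
}

replace_numbers("123")          # shorter than cutoff, left alone: "123"
replace_numbers("1000000.000")  # long enough, reformatted: "1e+06"
```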
And apply that to the stargazer output using gsubfn::gsubfn:
gsubfn("([0-9.]+)", ~replace_numbers(x), star)

Another robust way to get scientific notation using stargazer is to hack the decimal.mark parameter. This option allows the user to specify the character that separates decimals (usually a period . in most locales). We can usurp this parameter to insert a uniquely identifiable string into any number that we want to be able to find by regex. The advantage of searching for numbers this way is that we will only find numbers that correspond to numeric values in the stargazer output, i.e. there is no risk of also matching numbers that are part of variable names (e.g. X_12345) or part of the latex formatting code (e.g. \hline \\[-1.8ex]).
In the following I use the string ::::, but any unique character string (such as a hash) that will not be found elsewhere in the table will do. It's probably best to avoid special regex characters in the identifier mark, as these complicate things slightly.
Using the example model m1 from this other answer.
mark = '::::'
star = stargazer(m1, header = F, decimal.mark = mark, digit.separator = '')
replace_numbers = function(x, low=0.01, high=1e3, digits=3, scipen=-7, ...) {
  x = gsub(mark, '.', x)
  x.num = as.numeric(x)
  ifelse(
    (x.num >= low) & (x.num < high),
    round(x.num, digits = digits),
    prettyNum(x.num, digits = digits, scientific = scipen, ...)
  )
}
reg = paste0("([0-9.\\-]+", mark, "[0-9.\\-]+)")
cat(gsubfn(reg, ~replace_numbers(x), star), sep='\n')
Update
If you want to ensure that trailing zeros are retained in the scientific notation, then we can use sprintf instead of prettyNum.
Like this:
replace_numbers = function(x, low=0.01, high=1e3, digits = 3) {
  x = gsub(mark, '.', x)
  x.num = as.numeric(x)
  form = paste0('%.', digits, 'e')
  ifelse(
    (abs(x.num) >= low) & (abs(x.num) < high),
    round(x.num, digits = digits),
    sprintf(form, x.num)
  )
}


mutate function produces unexpected results for numeric column in table (huxtable), but not in dataframe

I am trying to learn how to produce pretty tables using the package huxtable. It's a learning curve, but so far I am really impressed. However, I have run into a few problems that I can't seem to solve.
Firstly, I am trying to format numbers so that there is a comma separator in the thousands position (using the mutate_at function from the dplyr package, and prettyNum). It works well except that, for columns with class numeric, internal zeros are excised (e.g., 1001 becomes 1,1 instead of the desired 1,001). If the column class is integer, then the desired output is produced. Also, the correct output is produced if the input data is a dataframe rather than a huxtable, regardless of whether the column is numeric or integer.
Secondly, when I add other table formatting (in particular, a caption), the caption does not seem to be carried over when I write the table to a Word file. Additionally, a note is produced:
Note: zip::zip() is deprecated, please use zip::zipr() instead
Below is some example code that I think illustrates the issue.
My questions are:
1) Why does the mutate function produce the odd result for numeric column in huxtables, but not in data frames, and how can I ensure that it does work? I could, of course, do the number formatting before converting the dataframe to a table, but I'd still like to know what is going on here.
2) Why is the table formatting not preserved in the output file?
3) What does the note about using zipr mean, and could that issue it references also be responsible for the failure to export table properties?
Thanks,
Glenn
library(dplyr)
library(flextable)
library(huxtable)
test=data.frame(var1=1918:1925,var2=c(9009,1000:1006),var3 = 1100:1107)
str(test)
HUX <- hux(test)
number_format(HUX)
number_format(HUX[,2]) <- 0
# works as expected on data frame
mutate_at(test,-1,.funs=list(~prettyNum(.,big.mark=",")))
# does not work as expected on huxtable, for var2 of class numeric
mutate_at(HUX,-1,.funs=list(~prettyNum(., big.mark=",")))
# add caption, borders, and colnames
set_caption(HUX,"Example table") %>%
set_caption_pos("topleft") %>%
set_top_border(1,,1) %>%
set_bottom_border(final(1), , 1) %>%
add_colnames()
# write out the table (this produces a note about zipr)
quick_docx(HUX)
Re the note about using zipr: see https://github.com/awalker89/openxlsx/issues/454
Re mutate_at: your data is being transformed correctly, but huxtable is displaying it wrongly. It is recognising each number, before and after the comma, as separate. (Number recognition is hard, let's go shopping…) I would suggest using number_format instead of transforming the data directly:
number_format(HUX)[,2:3] <- list(function(x) prettyNum(x, big.mark=","))
Finally, your second problem has a simple solution: you are changing all of the features of HUX, but you're not saving the result back to the original variable. Remember that R is a functional language; objects are very rarely modified in place. Add HUX <- to the start of your dplyr chain.
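That is, the chain from the question becomes (same calls, just with the result assigned back before writing out):

```r
# assign the formatted table back to HUX so the changes persist
HUX <- set_caption(HUX, "Example table") %>%
  set_caption_pos("topleft") %>%
  set_top_border(1, , 1) %>%
  set_bottom_border(final(1), , 1) %>%
  add_colnames()
quick_docx(HUX)  # the caption and borders now travel with HUX into the Word file
```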

I want to be able to manipulate objects in class 'phylo' - ie. round/ turn my bootstrap values from decimals (.998) into percentages (99%)

I am using RStudio, programs ape and phytools. I've generated a tree with 500 bootstrap replicates stored in an object of class phylo.
Where cw is the name of my tree, I've tried the following:
round(cw, digits = 2)
and I get the following error message:
Error in round(cw, digits = 2) :
non-numeric argument to mathematical function
I feel like it's probably a very simple manipulation but I'm not sure how to get there.
Hard to tell without a reproducible example but I guess that your bootstrap scores are probably stored in the $node.label subset of your tree.
You can try the following:
## Are the bootstraps in the $node.label object?
if(!is.null(cw$node.label)) {
## Are they as character or numeric?
class(cw$node.label)
}
If they are numeric values:
cw$node.label <- round(cw$node.label, digits = 2)
If they are characters, you can probably coerce them (note that this can produce some NAs):
cw$node.label <- round(as.numeric(cw$node.label), digits = 2)
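And if you want the percentage display from the question (e.g. 0.998 shown as 99.8%), one option, assuming the labels are proportions between 0 and 1, is to build strings; note this makes the labels character, so it should be the last step. Shown here with a stand-in vector rather than a real tree:

```r
node.label <- c("0.998", "0.75", "1")  # stand-in for cw$node.label
paste0(round(as.numeric(node.label) * 100, 1), "%")
# "99.8%" "75%" "100%"
```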

R Sweave: digits number in xtable of prop.table

I'm making an xtableFtable in R Sweave and can't find a way to suppress the digits with this code. What am I doing wrong? I've read that this can happen if your values aren't numeric but factor or character, but is prop.table making them non-numeric? I'm lost...
library(xtable)
a <- ftable(prop.table(table(mtcars$mpg, mtcars$hp), margin=2)*100)
b <- xtableFtable(a, method = "compact", digits = 0)
print.xtableFtable(b, rotate.colnames = TRUE)
I've already tried with digits=c(0,0,0,0...) too.
You could use options(digits) to control how many digits will print. Try something like options(digits = 4) as the first line of your code (change 4 to whatever value you want between 1 and 22). See ?options for more information.
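For example:

```r
options(digits = 4)  # affects how many significant digits R prints by default
pi       # prints 3.142
sqrt(2)  # prints 1.414
```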
Or round the values before printing
a = round(ftable(prop.table(table(mtcars$mpg, mtcars$hp), margin=2)*100), 2)
b = xtableFtable(a, method = "compact")
print.xtableFtable(b, rotate.colnames = TRUE)
The "digits" argument to xtableFtable seems to be unimplemented (as of my version, which is 1.8.3), since after playing around with it for half an hour nothing seems to make any difference.
There's a hint to this effect in the function documentation:
It is not recommended that users change the values of align, digits or display. First of all, alternative values have not been tested. Secondly, it is most likely that to determine appropriate values for these arguments, users will have to investigate the code for xtableFtable and/or print.xtableFtable.
It's probably just carried over from the xtable function (on which xtableFtable is surely based) as a TODO which the maintainer hasn't gotten around to yet.

A UPGMA cluster in R with NoData values

I have a matrix of sites. I want to develop a UPGMA aglomerative cluster. I want to use R and the vegan library for that. My matrix has sites in which not all the variables were measured.
Following a similar matrix of data:
Variable 1;Variable 2;Variable 3;Variable 4;Variable 5
0.5849774671338231;0.7962161133598957;0.3478909861199184;0.8027122599553912;0.5596553797833573
0.5904142034898171;0.18185393432022612;0.5503250366728479;NA;0.05657408486342197
0.2265148074206368;0.6345513807275411;0.8048128547418062;0.3303602674038131;0.8924461773052935
0.020429460126217602;0.18850489885886157;0.26412619465769416;0.8020472793070729;NA
0.006945970735023677;0.8404983401121199;0.058385134042814646;0.5750066564897788;0.737599672122899
0.9909722313946067;0.22356808747617019;0.7290078902086897;0.5621006367587756;0.3387823531518016
0.5932907022602052;0.899773235815933;0.5441346748937264;0.8045695319247985;0.6183003409599681
0.6520679140573288;0.5419713133237936;NA;0.7890033752744002;0.8561828607592286
0.31285906479192593;0.3396351688936058;0.5733594373520889;0.03867689654415574;0.1975784885854912
0.5045966366726562;0.6553489439611587;0.029929403932252963;0.42777351534900676;0.8787135401098227
I am planing to do it with the following code:
library(vegan)
# env <- read.csv("matrix_of_sites.csv")
env.norm <- decostand(env, method = "normalize") # Normalizing data here
env.ch <- vegdist(env.norm, method = "euclidean")
env.ch.UPGMA <- hclust(env.ch, method="average")
plot(env.ch.UPGMA)
After I run the second line, I get this error:
Error in x^2 : non-numeric argument to binary operator
I am not familiar with R, so I am not sure if this is due to the cells with no data. How can I solve this?
R does not think the data in your matrix are numeric: at least some of them were interpreted as character variables and changed to factors. Inspect your data after reading it into R. If all your data are numbers, then sum(env) gives a numeric result. Use the str() or summary() functions for detailed inspection.
From R's point of view, your data file has mixed formatting. The R function read.csv assumes that items are separated by a comma (,) and the decimal separator is a period (.), while read.csv2 assumes that items are separated by a semicolon (;) and the decimal separator is a comma (,). You mix these two conventions. You can read data formatted like that, but you may have to give both the sep and dec arguments.
If you get your data correctly in R, then decostand will stop with error: it does not accept missing values if you do not add na.rm = TRUE. The same also with the next vegdist command: it also needs na.rm = TRUE to analyse your data.
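Putting those two fixes together, a sketch (the file name is taken from the commented line in the question; the separators are assumptions based on the sample shown):

```r
library(vegan)

# semicolon-separated fields with '.' as the decimal separator,
# so both sep and dec must be given explicitly
env <- read.csv("matrix_of_sites.csv", sep = ";", dec = ".")
str(env)  # all five columns should now be numeric

env.norm <- decostand(env, method = "normalize", na.rm = TRUE)  # tolerate the NAs
env.ch <- vegdist(env.norm, method = "euclidean", na.rm = TRUE)
env.ch.UPGMA <- hclust(env.ch, method = "average")
plot(env.ch.UPGMA)
```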

Loop through column names in Fixed Effects Regression

I am trying to code a fixed effects regression, but I have MANY dummy variables. Basically, I have 184 variables on the RHS of my equation. Instead of writing this out, I am trying to create a loop that will pass through each column (I have named each column with a number).
This is the code i have so far, but the paste is not working. I may be totally off base using paste, but I wasn't sure how else to approach this. However, I am getting an error (see below).
FE.model <- plm(avg.kw ~ 0 + (for (i in 41:87) {
paste("hour.dummy",i,sep="") + paste("dummy.CDH",i,sep="")
+ paste("dummy.MA",i,sep="") + paste("DR.variable",i,sep="")
}),
data = data.reg,
index=c('Site.ID','date.hour'),
model='within',
effect='individual')
summary(FE.model)
As an example for the column names, when i=41 the names should be "hour.dummy41" "dummy.CDH41", etc.
I'm getting the following error:
Error in paste("hour.dummy", i, sep = "") + paste("dummy.CDH", i, sep = "") : non-numeric argument to binary operator
So I'm not sure if it's the paste function that is not appropriate here, or if it's the loop. I can't seem to find a way to loop through column names easily in R.
Any help is much appreciated!
Ignoring worries about fitting a model with so many terms for the moment, you probably want to generate a string and then cast it as a formula:
#create a data.frame where rows are the parts of the variable names, then collapse it
# create a data.frame where rows are the parts of the variable names, then collapse it
rhs <- do.call(paste, c(as.list(expand.grid(c("hour.dummy", "dummy.CDH"), 41:87)), sep = "", collapse = " + "))
fml <- as.formula(sprintf("avg.kw ~ %s", rhs))
FE.model <- plm(fml, ...
I've only put in two of the 'dummy's in the second line- but you should get the idea
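Extending that idea to all four dummy families from the question (a sketch using the question's naming: hour.dummy, dummy.CDH, dummy.MA and DR.variable, each over 41:87):

```r
prefixes <- c("hour.dummy", "dummy.CDH", "dummy.MA", "DR.variable")

# expand.grid crosses every prefix with every index; paste0 glues each pair
# into names like "hour.dummy41", "dummy.CDH41", ...
terms <- do.call(paste0, expand.grid(prefixes, 41:87))

# 0 + suppresses the intercept, as in the question's original call
fml <- as.formula(paste("avg.kw ~ 0 +", paste(terms, collapse = " + ")))
```

fml can then be passed straight to plm as in the question: FE.model <- plm(fml, data = data.reg, index = c('Site.ID', 'date.hour'), model = 'within', effect = 'individual').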
