Can I force R to use regular numbers instead of using the e+10-like notation? I have:
1.810032e+09
# and
4
within the same vector and want to see:
1810032000
# and
4
I am creating output for an old fashioned program and I have to write a text file using cat.
That works fine so far but I simply can't use the e+10 notation there.
This is a bit of a grey area. You need to recall that R will always invoke a print method, and these print methods listen to some options, including 'scipen', a penalty for scientific display. From help(options):
‘scipen’: integer. A penalty to be applied when deciding to print
numeric values in fixed or exponential notation. Positive
values bias towards fixed and negative towards scientific
notation: fixed notation will be preferred unless it is more
than ‘scipen’ digits wider.
Example:
R> ran2 <- c(1.810032e+09, 4)
R> options("scipen"=-100, "digits"=4)
R> ran2
[1] 1.81e+09 4.00e+00
R> options("scipen"=100, "digits"=4)
R> ran2
[1] 1810032000 4
That said, I still find it fudgeworthy. The most direct way is to use sprintf() with an explicit format, e.g. sprintf("%.5f", ran2), which fixes the number of decimal places.
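For the original use case (writing a text file with cat), a minimal sketch could look like the following. The file name and the choice of zero decimal places are assumptions made for illustration, not part of the question.

```r
ran2 <- c(1.810032e+09, 4)

# "%.0f" prints fixed notation with no decimal places.
as_text <- sprintf("%.0f", ran2)

# format() with scientific = FALSE also avoids the exponent;
# trim = TRUE drops the padding used to align the elements.
as_text2 <- format(ran2, scientific = FALSE, trim = TRUE)

# Write one value per line to a temporary file with cat().
out_file <- tempfile(fileext = ".txt")
cat(paste0(as_text, "\n"), file = out_file, sep = "")

readLines(out_file)
```

Both approaches produce character strings, so the formatting no longer depends on the scipen or digits options at the moment cat() is called.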
It can be achieved by disabling scientific notation in R.
options(scipen = 999)
My favorite answer:
format(1810032000, scientific = FALSE)
# [1] "1810032000"
This gives what you want without having to muck about in R settings.
Note that it returns a character string rather than a number object.
Put options(scipen = 999) in your .Rprofile file so it gets auto-executed by default. (Do not rely on doing it manually.)
(This says something different from the other answers: how to apply the setting, not which setting to use.)
This keeps things sane when you switch between multiple projects and multiple languages on a daily or monthly basis. Remembering to type in your per-project settings is error-prone and not scalable. You can have a global ~/.Rprofile or a per-project .Rprofile. Note that if both exist, R reads only the per-project one at startup, so source the global file from it if you want both.
Keeping all your config in a project-wide or global .Rprofile auto-executes it. This is useful for e.g. default package loads, data.table configuration, environment etc. Again, that config can run to a page of settings, and there's zero chance you'll remember those and their syntax and type them in.
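As a rough sketch of both styles: the commented line is what would go into an .Rprofile, and the code below it shows the opposite approach, changing the option temporarily and restoring it afterwards. The variable names are made up for the example.

```r
# In ~/.Rprofile (or a per-project .Rprofile) the setting is one line:
#   options(scipen = 999)

# Temporary alternative: options() returns the previous values,
# so they can be restored after the formatting is done.
before <- getOption("scipen")
old <- options(scipen = 999)
fixed <- format(1.810032e+09)
options(old)
restored <- getOption("scipen")

fixed
```

Restoring the old value is useful in scripts and functions that should not change the session settings of whoever runs them.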
I have been trying to read a file which has date field and a numeric field. I have the data in an excel sheet and looks something like below -
Date X
1/25/2008 0.0023456
12/23/2008 0.001987
When I read this in R using the readxl::read_xlsx function, the data in R looks like below -
Date X
1/25/2008 0.0023456000000000
12/23/2008 0.0019870000000000
I have tried limiting the digits using functions like round, format (nsmall = 7), etc. but nothing seems to work. What am I doing wrong? I also tried saving the data as a csv and a txt and read it using read.csv and read.delim but I face the same issue again. Any help would be really appreciated!
As noted in the comments to the OP and the other answer, this problem is due to the way floating point math is handled on the processor being used to run R, and its interaction with the digits option.
To illustrate, we'll create an Excel spreadsheet with the data from the OP, and demonstrate what happens as we adjust the options(digits=) option.
Next, we'll write a short R script to illustrate what happens when we adjust the digits option.
> # first, display the number of significant digits set in R
> getOption("digits")
[1] 7
>
> # Next, read data file from Excel
> library(xlsx)
>
> theData <- read.xlsx("./data/smallNumbers.xlsx",1,header=TRUE)
>
> head(theData)
Date X
1 2008-01-25 0.0023456
2 2008-12-23 0.0019870
>
> # change digits to larger number to replicate SO question
> options(digits=17)
> getOption("digits")
[1] 17
> head(theData)
Date X
1 2008-01-25 0.0023456000000000002
2 2008-12-23 0.0019870000000000001
>
However, the behavior of printing significant digits varies by processor / operating system, as setting options(digits=16) results in the following on a machine running an Intel i7-6500U processor with Microsoft Windows 10:
> # what happens when we set digits = 16?
> options(digits=16)
> getOption("digits")
[1] 16
> head(theData)
Date X
1 2008-01-25 0.0023456
2 2008-12-23 0.0019870
>
library(formattable)
x <- formattable(x, digits = 7, format = "f")
or you may want to add this to get the default formatting from R:
options(defaultPackages = "")
Then restart R.
Perhaps the problem isn't your source file as you say this happens with .csv and .txt as well.
Try checking to see the current value of your display digits option by running options()$digits
If the result is e.g. 14 then that is likely the problem.
In which case, try running the R command options(digits=8), which will set the display digits to 8 for the session.
Then, simply reprint your dataframe to see the change has already taken effect with respect to how the decimals are displayed by default to the screen.
Consult ?options for more info about digits display setting and other session options.
Edit to improve original answer and to clarify for future readers:
Changing options(digits=x) either up or down does not change the value that is stored or read into internal memory for floating point variables. The digits session option merely changes how the floating point values print, i.e. display on the screen, for common print functions, per the ?options documentation:
digits: controls the number of significant digits to print when printing numeric values.
What the OP showed as the problem he was having (R displaying more decimals after last digit in a decimal number than the OP expected to see) was not caused by the source file having been read from Excel - i.e. given the OP had the same problem with CSV and TXT the import process didn't cause a problem.
If you are seeing more decimals than you want by default in your printed/displayed output (e.g. for dataframes and numeric variables) try checking options()$digits and understand that option is simply the default for the number of digits used by R's common display and printing methods. HOWEVER, it does not affect floating point storage on any of your data or variables.
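A small sketch of the point that digits only affects display. It uses the digits argument of format() rather than the session option, so nothing global is changed; the value is taken from the question.

```r
x <- 0.0023456

# Same stored value, formatted with two different digit settings.
short <- format(x, digits = 7)
long  <- format(x, digits = 17)  # may expose floating point noise; platform dependent

# The stored value is untouched by how it was formatted.
same_value <- identical(x, 0.0023456)

# A fixed number of decimals, independent of the digits option.
fixed7 <- sprintf("%.7f", x)

c(short = short, long = long, fixed7 = fixed7)
```

If the goal is output with a fixed number of decimals, formatting to a string at the point of output is usually more predictable than changing the session option.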
Regarding floating point numbers though, another answer here shows how setting options(digits=n) higher than the default can help demonstrate some precision/display idiosyncrasies that are related to floating point precision. That is a separate problem to what the OP displayed in his example but it's well worth understanding.
For a much more detailed and topic specific discussion of floating point precision than would be appropriate to rehash here, it's well worth reading this definitive SO question+answer: Why are these numbers not equal?
That other question+answer+discussion covers issues specifically around floating point precision and contains a long, well presented list of references that you will find helpful if you need more information on the subject.
Piston_Rings<-diameter[1:25,]
I want my quality control graph NOT to have the underscore in the object name.
At the moment there is an underscore (not a hyphen) in that object name. It is possible to construct objects whose names have spaces in them but in order to access them you will then always need to use backticks in order to get the interpreter to understand what you want:
> `Piston Rings` <- list(1,2)
> `Piston Rings`[[1]]
[1] 1
> `Piston Rings`[[2]]
[1] 2
The problem you incur is cluttering up your code, at least relative to obeying the usual conventions in R where a space is a token-ending marker to the parser. Hyphens (at least short-hyphens) are actually minus signs.
If on the other hand you only want to use a modified version of a name that contains an underscore as the title for a graph, then try something like this:
Piston_Rings <- list() # just for testing purposes so there will be an object.
plot( 1:10,10:1, main = sub("_", " ", quote(Piston_Rings)) )
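The same idea can be wrapped in a small helper. This is only a sketch: label_from_name is a hypothetical function name and the data values are made up.

```r
# Turn the name of the object passed in into a display label
# by replacing every underscore with a space.
label_from_name <- function(x) {
  gsub("_", " ", deparse(substitute(x)), fixed = TRUE)
}

Piston_Rings <- c(74.030, 74.002, 73.992)  # made-up values for illustration
label <- label_from_name(Piston_Rings)
label

# Only draw when running interactively, so the sketch is safe to source.
if (interactive()) {
  plot(seq_along(Piston_Rings), Piston_Rings, main = label, ylab = label)
}
```

gsub() is used instead of sub() so that names with more than one underscore are handled as well.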
@BondedDust's answer is correct, but (guessing, since you haven't been very specific) a simpler way to get what you want is just to specify xlab or ylab arguments to the plot() function. Let's say you have variables stuff (x) and Piston_Rings (y). If you just
plot(stuff,Piston_Rings)
then the plot will have "Piston_Rings" as the y-axis label. But if you
plot(stuff,Piston_Rings,ylab="Piston Rings")
you'll get the label you want. You can also include lots more information this way:
plot(stuff,Piston_Rings,
xlab="Important stuff (really)",
ylab="Piston Rings (number per segment)")
See ?plot.default for many more options.
When displaying a number with more than four digits using inline code, like
`r 21645`
the result in the knitted HTML file is 2.1645 × 10^{4} (in reality, a calculation inside the inline code produces 21645), even though I just want it to print the number: 21645. I can easily fix this for one instance by wrapping it in as.integer, format or print, but how do I set an option for the whole knitr document so that it prints whole integers as such (all I need is to print 5 digits)? Doing this by hand gets very annoying. Setting options(digits = 7) doesn't help. I am guessing I would have to set some chunk option or define a hook, but I have no idea how.
I already solved it: including the following line of code inside the setoptions chunk at the beginning of the knitr document
options(scipen=999)
resolves the issue, as one can read in this answer from @Paul Hiemstra:
https://stackoverflow.com/a/25947542/4061993
from the documentation of ?options:
scipen: integer. A penalty to be applied when deciding to print
numeric values in fixed or exponential notation. Positive values bias
towards fixed and negative towards scientific notation: fixed notation
will be preferred unless it is more than scipen digits wider.
If you don't want to display scientific notation in this instance, but also don't want to disable it completely for your knitr report, you can use format() and set scientific=FALSE:
`r format(21645, scientific=FALSE)`
Note that if you type your numeric as integer it will be well formatted:
`r 21645L`
Of course, you can always set an inline hook for more flexibility (even though it is better to set the options globally, as in your answer):
```{r}
inline_hook <- function(x) {
if (is.numeric(x)) {
format(x, digits = 2)
} else x
}
knitr::knit_hooks$set(inline = inline_hook)
```
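A variant of the hook that targets the scientific notation problem directly might look like this. It is a sketch, not the answer's code: inline_plain is a hypothetical name, and the hook is only registered if knitr is installed.

```r
# Format numeric inline results in fixed notation; pass everything else through.
inline_plain <- function(x) {
  if (is.numeric(x)) {
    format(x, scientific = FALSE, trim = TRUE)
  } else {
    x
  }
}

# Register the hook only when knitr is available.
if (requireNamespace("knitr", quietly = TRUE)) {
  knitr::knit_hooks$set(inline = inline_plain)
}

inline_plain(21645)
```

Unlike options(scipen = 999), this only changes inline output and leaves chunk output and the rest of the session alone.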
When editing an Sweave document in LaTeX (using the Noweb mode), Emacs knows to "ignore" code that is in <<>>= blocks. However, for interstitial \Sexpr{} blocks, this isn't the case. Given that R references by columns via '$' and LaTeX uses $ to set-off equations, these \Sexpr{} blocks often break the syntax highlighting, like so:
I have a very rudimentary understanding of elisp and Emacs syntax highlighting, but my hope is that it might be possible to add something to .emacs that will disable any parsing/$ detection within \Sexpr{}'s.
I thought emacs with ESS has correct syntax highlighting for Sweave?
Anyway, the easiest "fix" is to just not use the $ operator but [[ instead. For example:
foo$p.value
foo[['p.value']]
Should give the same result. I think foo$p.value is just short for foo[["p.value",exact=FALSE]]
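A quick check of that equivalence, using a made-up list in place of a real test result:

```r
foo <- list(p.value = 0.03)  # stand-in for a test result

dollar  <- foo$p.value         # $ access
bracket <- foo[["p.value"]]    # [[ access with the full name

# $ matches names partially, [[ does not unless exact = FALSE is given.
partial_dollar  <- foo$p
partial_bracket <- foo[["p"]]
partial_inexact <- foo[["p", exact = FALSE]]

identical(dollar, bracket)
```

With the full name both forms return the same element, so switching to [[ inside \Sexpr{} should not change the results.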
I don't have a fix either, but I'll pass along my workaround, which is to never (well, rarely) do any processing in \Sexpr chunks but instead to store things I want to use in \Sexpr in variables, and to do so in the same chunk I do the main calculations in.
<<echo=FALSE, results=hide>>=
t1 <- chisq.test(someVar)
p1 <- formatC(t1$p.value, format="f", digits=2)
@
\dots with a $p$-value of \Sexpr{p1}.
While there are some downsides to this, I find it helps me to better keep track of what I want to present, and how I want to present it.
As an aside, consider using formatC instead of round as it can keep significant zeros (ie, 0.10 instead of 0.1).
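To illustrate the difference with a made-up p-value:

```r
p <- 0.1

# round() returns a number, so the trailing zero is lost when it is printed.
rounded <- as.character(round(p, 2))

# formatC() returns a string with exactly two decimals.
kept <- formatC(p, format = "f", digits = 2)

# sprintf() gives the same result.
kept2 <- sprintf("%.2f", p)

c(rounded = rounded, kept = kept, kept2 = kept2)
```

Note that formatC() and sprintf() return character strings, which is fine for \Sexpr{} but means the result should not be used in further calculations.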
I have no good answer for you as I am not an Emacs hacker, so I usually do one of two things:
Either add a simple % $ comment at the end of the line to "close" the math expression from $ to $,
Or rewrite the expression to not use $-based subsetting:
round(as.numeric(chisq.test(someVar)["p.value"]), 2).
I would like to set column widths (for all the 3 columns) in this data set, as: anim=1-10; sireid=11-20; damid=21-30. Some columns have missing values.
anim=c("1A038","1C467","2F179","38138","030081")
sireid=c("NA","NA","1W960","1W960","64404")
damid=c("NA","NA","1P119","1P119","63666")
mydf=data.frame(anim,sireid,damid)
From reading your question as well as your comments to previous answers, it seems to me that you are trying to create a fixed width file with your data. If this is the case, you can use the function write.fwf in package gdata:
Load the package and create a temporary output file:
library(gdata)
ff <- tempfile()
Write your data in fixed width format to the temporary file:
write.fwf(mydf, file=ff, width=c(10,10,10), colnames=FALSE)
Read the file with scan and print the results (to demonstrate fixed width output):
zz <- scan(ff, what="character", sep="\n")
cat(zz, sep="\n")
1A038 NA NA
1C467 NA NA
2F179 1W960 1P119
38138 1W960 1P119
030081 64404 63666
Delete the temporary file:
unlink(ff)
You can also write fixed width output for numbers and strings using the sprintf() function, which derives from C's counterpart.
For instance, to pad integers with 0s:
sprintf("%012d",99)
To pad with spaces:
sprintf("%12d",123)
And to pad strings:
sprintf("%20s","hello world")
The options for formatting are found via ?sprintf and there are many guides to formatting C output for fixed width.
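Applied to the data from the question, a sketch with sprintf() could build each 30-character record directly. Left-justifying with "%-10s" is an assumption; the question does not say how values should be aligned within their columns.

```r
anim   <- c("1A038", "1C467", "2F179", "38138", "030081")
sireid <- c("NA", "NA", "1W960", "1W960", "64404")
damid  <- c("NA", "NA", "1P119", "1P119", "63666")

# "%-10s" left-justifies each value in a field of 10 characters,
# so anim occupies columns 1-10, sireid 11-20 and damid 21-30.
rows <- sprintf("%-10s%-10s%-10s", anim, sireid, damid)

# Every record has the same width.
nchar(rows)

# Write the records to a temporary file, one per line.
out_file <- tempfile(fileext = ".txt")
writeLines(rows, out_file)
```

Using "%10s" instead would right-justify the values in the same column positions.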
It sounds like you're coming from a SAS background, where character variables should have explicit lengths specified to avoid unexpected truncations. In R, you don't need to worry about this. A character string has exactly as many characters as it needs, and automatically expands and contracts as its contents change.
One thing you should be aware of, though, is the silent conversion of character variables to factors in a data frame, which is the default in versions of R before 4.0.0 (stringsAsFactors = TRUE). However, unless you change the contents at a later point in time, you should be able to live with the default.
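Whether character columns become factors is controlled by the stringsAsFactors argument of data.frame(). A minimal sketch that makes the choice explicit, so it behaves the same regardless of the default in the installed R version:

```r
anim <- c("1A038", "1C467", "2F179", "38138", "030081")

# Keep the column as character strings.
df_chr <- data.frame(anim, stringsAsFactors = FALSE)

# Convert the column to a factor.
df_fct <- data.frame(anim, stringsAsFactors = TRUE)

c(character = class(df_chr$anim), factor = class(df_fct$anim))
```

Keeping IDs such as these as character strings avoids surprises when new values are added later, because a factor only accepts values that are already among its levels.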