Truncate decimal to specified places - r

This seems like it should be a fairly easy problem to solve but I am having some trouble locating an answer.
I have a vector which contains long decimals and I want to truncate it to a specific number of decimals. I do not wish to round it, but rather just remove the values beyond my desired number of decimals.
For example I would like 0.123456789 to return 0.1234 if I desired 4 decimal digits. This is not an issue of printing a specific number of digits but rather returning the original value truncated to a given number.
Thanks.

trunc(x*10^4)/10^4
yields 0.1234, as expected.
More generally,
# this wrapper masks base::trunc, so it calls base::trunc explicitly
trunc <- function(x, ..., prec = 0) base::trunc(x * 10^prec, ...) / 10^prec
print(trunc(0.123456789, prec = 4)) # 0.1234
print(trunc(14035, prec = -2))      # 14000
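A caveat worth knowing with the multiply-and-truncate approach: the scaling happens in binary floating point, so 0.29 * 100 is stored as 28.999999999999996 and truncates to 0.28. The sketch below, using the hypothetical helper name trunc_dec, rounds the scaled value to 10 decimal places before truncating to absorb that error. The tolerance of 10 places is an assumption, not a standard: a value within about 1e-10 of the next step would be pushed up to it.

```r
# Hypothetical helper: truncate toward zero to `digits` decimal places.
# round(..., 10) first removes tiny representation errors such as
# 0.29 * 100 == 28.999999999999996 (the 10-place tolerance is an assumption).
trunc_dec <- function(x, digits = 0) {
  scaled <- round(x * 10^digits, 10)
  trunc(scaled) / 10^digits
}

trunc(0.29 * 100) / 100        # 0.28: the floating-point pitfall
trunc_dec(0.29, 2)             # 0.29
trunc_dec(0.123456789, 4)      # 0.1234
trunc_dec(-0.123456789, 4)     # -0.1234 (toward zero; floor() would give -0.1235)
```

trunc() is used rather than floor() so that negative values are cut toward zero, which matches "just remove the values beyond my desired number of decimals".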

I used the techniques above for a long time. One day I ran into issues when copying the results to a text file, and I solved the problem this way:
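trunc_number_n_decimals <- function(numberToTrunc, nDecimals){
  # nudge the value up by 10^-(nDecimals+5) so that representation error
  # just below a digit boundary does not drop a digit when the string is cut
  numberToTrunc <- numberToTrunc + (10^-(nDecimals+5))
  # split the full-precision, non-scientific string at the decimal point
  splitNumber <- strsplit(x=format(numberToTrunc, digits=20, scientific=FALSE), split="\\.")[[1]]
  # keep only the first nDecimals digits after the point
  decimalPartTrunc <- substr(x=splitNumber[2], start=1, stop=nDecimals)
  truncatedNumber <- as.numeric(paste0(splitNumber[1], ".", decimalPartTrunc))
  return(truncatedNumber)
}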
print(trunc_number_n_decimals(9.1762034354551236, 6), digits=14)
[1] 9.176203
print(trunc_number_n_decimals(9.1762034354551236, 7), digits=14)
[1] 9.1762034
print(trunc_number_n_decimals(9.1762034354551236, 8), digits=14)
[1] 9.17620343
print(trunc_number_n_decimals(9.1762034354551236, 9), digits=14)
[1] 9.176203435
This solution is very handy when you need to write a number with many decimals, such as 16, to a file.
Just remember to convert the number to a string with format() before writing it to the file:
numberToWrite <- format(trunc_number_n_decimals(9.1762034354551236, 9), digits=20)

Not the most elegant way, but it'll work.
# old_numbers is your numeric vector; sprintf() formats it with 9 decimals (rounding at the 9th)
string_it<-sprintf("%06.9f", old_numbers)
pos_list<-gregexpr(pattern="\\.", string_it)
pos<-unlist(lapply(pos_list, '[[', 1)) # position of the first "." in each string
# keep everything up to 4 digits after the "."; change pos+4 for another precision
new_number<-as.numeric(substring(string_it, 1, pos+4))
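A vectorized variant of the same string idea, sketched with a regular expression instead of locating the decimal point by position. The helper name truncate_str and the choice of 15 decimals in sprintf() are assumptions; because sprintf() rounds at the 15th decimal, that last digit is rounded rather than truly truncated, which also happens to hide representation error such as 0.29 being stored as 0.28999...

```r
# Hypothetical helper: format with many decimals, cut the string after
# `digits` decimal places, then convert back to numeric.
truncate_str <- function(x, digits) {
  s <- sprintf("%.15f", x)
  # capture the "." plus the first `digits` decimals, drop the rest
  pattern <- paste0("(\\.[0-9]{", digits, "})[0-9]*$")
  as.numeric(sub(pattern, "\\1", s))
}

truncate_str(c(0.123456789, 12.987654321, -3.14159), 4)
```

Unlike the position-based version, this keeps every digit before the decimal point and handles negative numbers.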

Related

Why does a subtraction give a different result when computed from a data frame in R?

I have been checking my data and realized that a subtraction was wrong; when I computed it manually, I got the correct result.
My question is why the subtraction gives a result ending in .5 when I compute it from the data.frame, but the correct result when I compute it with just the numbers.
I tried to reproduce the problem on another computer, and here is a toy example of the problem:
a <- data.frame(matrix(nrow = 1, ncol = 4))
a[1,] <- c(101292229, 101298224, 101190466, 101195700)
colnames(a) <- c("start_I","end_I","start_II","end_II")
a$MI <- (a$start_I + a$end_I)/2
a$MII <- (a$start_II + a$end_II)/2
a
start_I end_I start_II end_II MI MII
1 101292229 101298224 101190466 101195700 101295226 101193083
a$MII-a$MI
[1] -102143.5
101193083-101295226
[1] -102143
As Lyngbakr pointed out, a$MI is actually 101295226.5, but only appears to be 101295226 because the default printing setting (digits = 7 significant digits) cuts off the decimal place.
If you override those settings with format(), you'll see the extra .5 appear:
format(a, digits = 10)
start_I end_I start_II end_II MI MII
1 101292229 101298224 101190466 101195700 101295226.5 101193083
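The same effect can be reproduced without a data frame. This sketch recomputes the two midpoints from the numbers in the question and prints them with more digits:

```r
# Midpoints from the toy example in the question
mi  <- (101292229 + 101298224) / 2   # odd sum, so the result ends in .5
mii <- (101190466 + 101195700) / 2   # even sum, so the result is whole

print(mi)                 # default digits = 7 hides the .5
print(mi, digits = 10)    # shows 101295226.5
mii - mi                  # -102143.5
```

So the data frame is not doing anything different; the .5 was there all along and only the printed value was shortened.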
Try
as.integer(a$MII-a$MI)
to get the result you are expecting. Note that as.integer() truncates toward zero, so it simply drops the .5.

Concatenate multiple strings in R

I want to concatenate the URL parts below, and I have written the following function to do it:
library(datetime)
library(lubridate)
get_thredds_url<- function(mon, hr){
a <-"http://abc.co.in/"
b <-"thredds/path/"
c <-paste0("%02d", ymd_h(mon))
d <-paste0(strftime(datetime_group, format="%Y%m%d%H"))
e <-paste0("/gfs.t%sz.pgrb2.0p25.f%03d",(c, hr))
url <-paste0(a,b,b,d)
return (url)
}
mon = datetime(2017, 9, 26, 0)
hr = 240
url = get_thredds_url(mon,hr)
print (url)
But I get the following error when I run the definition of get_thredds_url():
Error: unexpected ',' in:
" d<-paste0(strftime(datetime_group, format="%Y%m%d%H"))
e<-paste0("/gfs.t%sz.pgrb2.0p25.f%03d",(c,"
url <-paste0(a,b,b,d)
Error in paste0(a, b, b, d) : object 'a' not found
return (url)
Error: no function to return from, jumping to top level
}
Error: unexpected '}' in "}"
What is wrong with my function and how can I solve this?
The final output should be:
http://abc.co.in/thredds/path/2017092600/gfs.t00z.pgrb2.0p25.f240
Using sprintf() gives more control over how values are inserted into the string:
library(lubridate)
get_thredds_url<- function(mon, hr){
sprintf("http://abc.co.in/thredds/path/%s/gfs.t%02dz.pgrb2.0p25.f%03d",
strftime(mon, format = "%Y%m%d%H", tz = "UTC"),
hour(mon),
hr)
}
mon <- make_datetime(2017, 9, 26, 0, tz = "UTC")
hr <- 240
get_thredds_url(mon, hr)
[1] "http://abc.co.in/thredds/path/2017092600/gfs.t00z.pgrb2.0p25.f240"
It was a bit hard to figure out what you're trying to do. There seem to be quite a few contradictory pieces in your code, especially compared to your desired final output. Therefore, I decided to focus on the desired output and the inputs you provided in your variables.
get_thredds_url <- function(yr, mnth, day, hrs1, hrs2){
part1 <- "http://abc.co.in/"
part2 <- "thredds/path/"
ymdh <- c(yr, formatC(c(mnth, day, hrs1), width=2, flag="0"))
part3 <- paste0(ymdh, collapse="")
pre4 <- formatC(hrs1, width=2, flag="0")
part4 <- paste0("/gfs.t", pre4, "z.pgrb2.0p25.f", hrs2)
return(paste0(part1, part2, part3, part4))
}
get_thredds_url(2017, 9, 26, 0, 240)
# [1] "http://abc.co.in/thredds/path/2017092600/gfs.t00z.pgrb2.0p25.f240"
The key is using paste0() appropriately and I think formatC() may be new to some people (including me).
formatC() is used here to pad zeros in front of the number you provide, and thus makes sure that 9 is converted to 09, whereas 12 remains 12.
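A quick check of the zero padding. For integers, sprintf("%02d", ...), as used in the sprintf() answer, gives the same strings. The width = 3 line is an extra illustration: the answer pastes hrs2 as-is, so a forecast hour of 6 would become "f6", whereas padding to width 3 would give "006".

```r
# Pad to width 2 with leading zeros
formatC(c(9L, 12L, 0L), width = 2, flag = "0")
sprintf("%02d", c(9L, 12L, 0L))

# Width 3, as the "%03d" in the sprintf() answer does for the forecast hour
formatC(c(6L, 240L), width = 3, flag = "0")
```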
Note that this answer is in base R and does not require additional packages.
Also note that you should not use url and c as variable names. These names are already used by base R functions (url() and c()). By using them as variable names, you mask those functions, which can (and will) lead to confusion and problems at some point down the road.

Forcing R output to be scientific notation with at most two decimals

I would like to have consistent output for a particular R script. In this case, I would like all numeric output to be in scientific notation with exactly two decimal places.
Examples:
0.05 --> 5.00e-02
0.05671 --> 5.67e-02
0.000000027 --> 2.70e-08
I tried using the following options:
options(scipen = 1)
options(digits = 2)
This gave me the results:
0.05 --> 0.05
0.05671 --> 0.057
0.000000027 --> 2.7e-08
I obtained the same results when I tried:
options(scipen = 0)
options(digits = 2)
Thank you for any advice.
I think it would probably be best to use formatC rather than change global settings.
For your case, it could be:
numb <- c(0.05, 0.05671, 0.000000027)
formatC(numb, format = "e", digits = 2)
Which yields:
[1] "5.00e-02" "5.67e-02" "2.70e-08"
Another option is to use the scientific() function from the scales package.
library(scales)
numb <- c(0.05, 0.05671, 0.000000027)
# digits = 3 is the default but I am setting it here to be explicit,
# and draw attention to the fact this is different than the formatC
# solution.
scientific(numb, digits = 3)
## [1] "5.00e-02" "5.67e-02" "2.70e-08"
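Note that digits is set to 3 here, not 2 as with formatC: scientific() counts significant digits, whereas formatC(format = "e") counts digits after the decimal point.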
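The same distinction can be seen in base R without extra packages. In this sketch, format() also counts significant digits, while formatC(format = "e") and sprintf("%.2e") count digits after the decimal point, so all three produce the same strings for this input:

```r
numb <- c(0.05, 0.05671, 0.000000027)

formatC(numb, format = "e", digits = 2)      # 2 digits after the decimal point
format(numb, digits = 3, scientific = TRUE)  # 3 significant digits
sprintf("%.2e", numb)                        # C-style: 2 digits after the point
```

None of these change global options, so other output in the script is unaffected.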

Converting geo coordinates from degree to decimal

I want to convert my geographic coordinates from degrees to decimal degrees. My data are as follows:
lat long
105252 30°25.264 9°01.331
105253 30°39.237 8°10.811
105255 31°37.760 8°06.040
105258 31°41.190 8°06.557
105259 31°41.229 8°06.622
105260 31°38.891 8°06.281
I have this code, but I cannot see why it does not work:
convert<-function(coord){
tmp1=strsplit(coord,"°")
tmp2=strsplit(tmp1[[1]][2],"\\.")
dec=c(as.numeric(tmp1[[1]][1]),as.numeric(tmp2[[1]]))
return(dec[1]+dec[2]/60+dec[3]/3600)
}
don_convert=don1
for(i in 1:nrow(don1)){don_convert[i,2]=convert(as.character(don1[i,2])); don_convert[i,3]=convert(as.character(don1[i,3]))}
The convert function works but the code where I am asking the loop to do the job for me does not work.
Any suggestion is appreciated.
Use the measurements package from CRAN, which already has a unit conversion function, so you don't need to write your own:
x = read.table(text = "
lat long
105252 30°25.264 9°01.331
105253 30°39.237 8°10.811
105255 31°37.760 8°06.040
105258 31°41.190 8°06.557
105259 31°41.229 8°06.622
105260 31°38.891 8°06.281",
header = TRUE, stringsAsFactors = FALSE)
Once your data.frame is set up then:
# change the degree symbol to a space
x$lat = gsub('°', ' ', x$lat)
x$long = gsub('°', ' ', x$long)
# convert from decimal minutes to decimal degrees
x$lat = measurements::conv_unit(x$lat, from = 'deg_dec_min', to = 'dec_deg')
x$long = measurements::conv_unit(x$long, from = 'deg_dec_min', to = 'dec_deg')
Resulting in the end product:
lat long
105252 30.4210666666667 9.02218333333333
105253 30.65395 8.18018333333333
105255 31.6293333333333 8.10066666666667
105258 31.6865 8.10928333333333
105259 31.68715 8.11036666666667
105260 31.6481833333333 8.10468333333333
Try using the char2dms function in the sp package. The package has other functions that will additionally do decimal conversion.
library("sp")
?char2dms
A bit of vectorization and matrix manipulation will make your function much simpler:
x <- read.table(text="
lat long
105252 30°25.264 9°01.331
105253 30°39.237 8°10.811
105255 31°37.760 8°06.040
105258 31°41.190 8°06.557
105259 31°41.229 8°06.622
105260 31°38.891 8°06.281",
header=TRUE, stringsAsFactors=FALSE)
x
The function itself makes use of:
strsplit() with the regex pattern "[°\\.]" - this does the string split in one step
sapply to loop over the vector
Try this:
convert<-function(x){
z <- sapply((strsplit(x, "[°\\.]")), as.numeric)
z[1, ] + z[2, ]/60 + z[3, ]/3600
}
Try it:
convert(x$long)
[1] 9.108611 8.391944 8.111111 8.254722 8.272778 8.178056
Disclaimer: I didn't check your math. Use at your own discretion.
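Checking the math: the values are degrees and decimal minutes (30°25.264 means 30 degrees and 25.264 minutes), so treating the digits after the "." as seconds gives 9.108611 for the first longitude instead of roughly 9.022183 (9 + 1.331/60). A vectorized sketch for decimal minutes, assuming every value looks like degrees°minutes.fraction; dm_to_dec is a hypothetical name:

```r
# Hypothetical helper: "30°25.264" (degrees + decimal minutes) -> decimal degrees
dm_to_dec <- function(x) {
  # replace the degree sign (any run of non-numeric characters) with a space
  parts <- strsplit(trimws(gsub("[^0-9.-]+", " ", x)), " ")
  z <- vapply(parts, as.numeric, numeric(2))   # row 1: degrees, row 2: minutes
  z[1, ] + z[2, ] / 60
}

# "\u00b0" is the degree sign, written as an escape to avoid encoding issues
dm_to_dec(c("30\u00b025.264", "9\u00b001.331"))
```

The results agree with the measurements::conv_unit output shown earlier (30.4210666666667 and 9.02218333333333).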
Thanks for the answers by Gord Stephen and CephBirk. They sure helped me out.
I'd just mention that I also found that measurements::conv_unit doesn't handle "E/W" or "N/S" suffixes; it requires positive/negative degrees.
My coordinates come as character strings such as "1 1 1W", which first need to be converted to "-1 1 1".
I thought I'd share my solution for that.
df <- c("1 1 1E", "1 1 1W", "2 2 2N","2 2 2S")
measurements::conv_unit(df, from = 'deg_min_sec', to = 'dec_deg')
[1] "1.01694444444444" NA NA NA
Warning message:
In split(as.numeric(unlist(strsplit(x, " "))) * c(3600, 60, 1), :
NAs introduced by coercion
library(stringr)
# "+" for E/N hemispheres, "-" for W/S
ewns <- ifelse(str_extract(df, "\\(?[EWNS,.]+\\)?") %in% c("E", "N"), "+", "-")
# drop the trailing hemisphere letter
dms <- str_sub(df, 1, str_length(df) - 1)
df2 <- paste0(ewns, dms)
df_dec <- measurements::conv_unit(df2,
                                  from = 'deg_min_sec',
                                  to = 'dec_deg')
df_dec
[1] "1.01694444444444" "-1.01694444444444" "2.03388888888889" "-2.03388888888889"
as.numeric(df_dec)
[1] 1.016944 -1.016944 2.033889 -2.033889
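If adding stringr and measurements is not desired, the same hemisphere handling can be sketched in base R. This assumes the "D M S" plus single trailing letter format shown above; dms_to_dec is a hypothetical name:

```r
# Hypothetical helper: "1 1 1W" (degrees minutes seconds + hemisphere) -> decimal degrees
dms_to_dec <- function(x) {
  hemi <- substr(x, nchar(x), nchar(x))              # last character: E, W, N or S
  dms  <- strsplit(substr(x, 1, nchar(x) - 1), " ")  # remaining "D M S"
  dec  <- vapply(dms, function(p) {
    p <- as.numeric(p)
    p[1] + p[2] / 60 + p[3] / 3600
  }, numeric(1))
  ifelse(hemi %in% c("W", "S"), -dec, dec)           # west and south are negative
}

dms_to_dec(c("1 1 1E", "1 1 1W", "2 2 2N", "2 2 2S"))
```

This returns numbers directly, so the final as.numeric() step is not needed.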
Have a look at the degree function in the OSMscale package.
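As Jim Lewis commented before, it seems you are using floating-point (decimal) minutes. In that case you should only concatenate two elements (degrees and decimal minutes), rather than splitting the minutes at the decimal point as in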
dec=c(as.numeric(tmp1[[1]][1]),as.numeric(tmp2[[1]]))
For degrees, minutes and seconds in the form 43°21'8.02 (for which as.character() returns "43°21'8.02\""), I updated your function to:
convert<-function(coord){
  tmp1=strsplit(coord,"°")           # degrees | rest
  tmp2=strsplit(tmp1[[1]][2],"'")    # minutes | seconds
  tmp3=strsplit(tmp2[[1]][2],"\"")   # seconds without the trailing "
  dec=c(as.numeric(tmp1[[1]][1]),as.numeric(tmp2[[1]][1]),as.numeric(tmp3[[1]]))
  # named res rather than c, to avoid masking base::c
  res<-abs(dec[1])+dec[2]/60+dec[3]/3600
  res<-ifelse(dec[1]<0,-res,res)      # restore the sign of negative coordinates
  return(res)
}
This adds handling for negative coordinates, and it works great for me. I still don't understand why the char2dms function in the sp package didn't work for me.
Thanks
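A vectorized sketch of the same degrees-minutes-seconds conversion, assuming strings like 43°21'8.02" with an optional leading minus sign (dms_str_to_dec is a hypothetical name). Replacing every run of non-numeric characters with a space avoids three separate strsplit() calls:

```r
# Hypothetical helper: "43°21'8.02\"" -> decimal degrees (sign taken from degrees)
dms_str_to_dec <- function(x) {
  parts <- strsplit(trimws(gsub("[^0-9.-]+", " ", x)), " ")
  z <- vapply(parts, as.numeric, numeric(3))   # rows: degrees, minutes, seconds
  dec <- abs(z[1, ]) + z[2, ] / 60 + z[3, ] / 3600
  ifelse(z[1, ] < 0, -dec, dec)
}

# "\u00b0" is the degree sign
dms_str_to_dec(c("43\u00b021'8.02\"", "-43\u00b021'8.02\""))
```

Like the function above, it cannot keep the sign of a value such as -0°30'0", because -0 < 0 is FALSE in R.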
Another, less elegant option uses substring() instead of strsplit(). This will only work if all your positions have the same number of digits. For negative coordinates, just multiply by -1 to get the correct decimal degrees.
x$LatDD<-(as.numeric(substring(x$lat, 1,2))
+ (as.numeric(substring(x$lat, 4,9))/60))
x$LongDD<-(as.numeric(substring(x$long, 1,1))
+ (as.numeric(substring(x$long, 3,8))/60))

Using a hash to determine whether two data frames are identical (PART 01)

I created a dataset from the WHO ATC/DDD Index a few months ago, and I want to make sure the online database is unchanged today, so I downloaded it again and tried to use the digest package in R to do the comparison.
The two datasets (in txt format) can be downloaded here. (I am aware that you may think the files are unsafe and may contain viruses, but I don't know how to generate a dummy dataset that replicates the issue, so I have uploaded the actual datasets.)
I have written the short script below:
library(digest)
ddd.old <- read.table("ddd.table.old.txt",header=TRUE,stringsAsFactors=FALSE)
ddd.new <- read.table("ddd.table.new.txt",header=TRUE,stringsAsFactors=FALSE)
ddd.old[,"ddd"] <- as.character(ddd.old[,"ddd"])
ddd.new[,"ddd"] <- as.character(ddd.new[,"ddd"])
ddd.old <- data.frame(ddd.old, hash = apply(ddd.old, 1, digest),stringsAsFactors=FALSE)
ddd.new <- data.frame(ddd.new, hash = apply(ddd.new, 1, digest),stringsAsFactors=FALSE)
ddd.old <- ddd.old[order(ddd.old[,"hash"]),]
ddd.new <- ddd.new[order(ddd.new[,"hash"]),]
Something really interesting happens when I do the check:
> table(ddd.old[,"hash"]%in%ddd.new[,"hash"]) #line01
TRUE
506
> table(ddd.new[,"hash"]%in%ddd.old[,"hash"]) #line02
TRUE
506
> digest(ddd.old[,"hash"])==digest(ddd.new[,"hash"]) #line03
[1] TRUE
> digest(ddd.old)==digest(ddd.new) #line04
[1] FALSE
line01 and line02 show that every row in ddd.old can be found in ddd.new, and vice versa.
line03 shows that the hash columns of both data frames are the same.
line04 shows that the two data frames are different.
What is happening? Both data frames have identical rows (line01 and line02) in the same order (line03), yet they are different (line04)?
Or am I misunderstanding something about digest? Thanks.
Read in the data as before.
ddd.old <- read.table("ddd.table.old.txt",header=TRUE,stringsAsFactors=FALSE)
ddd.new <- read.table("ddd.table.new.txt",header=TRUE,stringsAsFactors=FALSE)
ddd.old[,"ddd"] <- as.character(ddd.old[,"ddd"])
ddd.new[,"ddd"] <- as.character(ddd.new[,"ddd"])
Like Marek said, start by checking for differences with all.equal.
all.equal(ddd.old, ddd.new)
[1] "Component 6: 4 string mismatches"
[2] "Component 8: 24 string mismatches"
So we just need to look at columns 6 and 8.
different.old <- ddd.old[, c(6, 8)]
different.new <- ddd.new[, c(6, 8)]
Hash these columns.
hash.old <- apply(different.old, 1, digest)
hash.new <- apply(different.new, 1, digest)
And find the rows where they don't match.
different_rows <- which(hash.old != hash.new) #which is optional
Finally, combine the datasets.
cbind(different.old[different_rows, ], different.new[different_rows, ])
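One more thing worth ruling out (an assumption, not something confirmed in this thread): digest() hashes a serialization of the whole object, so attributes such as row names take part in the hash. After ddd.old[order(...), ], each data frame keeps its original row numbers, so two data frames with identical values in identical order can still hash differently. A small base-R illustration with made-up data, using identical() and serialize():

```r
# Made-up data: same rows, stored in a different original order
a <- data.frame(x = c(2, 1), y = c("b", "a"), stringsAsFactors = FALSE)
b <- data.frame(x = c(1, 2), y = c("a", "b"), stringsAsFactors = FALSE)

a_sorted <- a[order(a$x), ]
b_sorted <- b[order(b$x), ]

rownames(a_sorted)              # "2" "1": original positions are kept
rownames(b_sorted)              # "1" "2"
identical(a_sorted, b_sorted)   # FALSE, only the row names differ

# Reset row names before comparing or hashing
rownames(a_sorted) <- NULL
rownames(b_sorted) <- NULL
identical(a_sorted, b_sorted)                                     # TRUE
identical(serialize(a_sorted, NULL), serialize(b_sorted, NULL))   # TRUE
```

If this is the cause, resetting the row names of ddd.old and ddd.new after sorting and before calling digest() should make line04 return TRUE.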
