So I have a big function that runs an MCMC algorithm. I believe most of the
expensive operations are multiplications of large matrices, but this Rprof output is rather perplexing.
$by.self
self.time self.pct total.time total.pct
"<Anonymous>" 328.90 81.84 329.34 81.95
"fprod" 46.16 11.49 376.02 93.57
"Dikin_Walk" 7.42 1.85 401.32 99.86
"as.vector" 5.98 1.49 57.56 14.32
".External" 2.54 0.63 2.54 0.63
"-" 1.84 0.46 1.84 0.46
"H_x" 1.16 0.29 225.82 56.19
"fcrossprod" 1.14 0.28 226.12 56.27
Edit: Here are the 3 functions which I define within my big wrapper function:
## first, augment A | b
A_b <- cbind (b, A)
## H(x) is the hessian
H_x <- function(x) {
D <- as.vector(1/(A_b[,1] - fprod(A_b[,-1], x)))
D_squared <- fdiag(D^2)
return(fcrossprod(A, fprod(D_squared, A)))
}
## D(x) is the diagonalized matrix of the log-barrier function of Ax <= b
D_x <- function(x) {
D <- as.vector(1/(A_b[,1] - fprod(A_b[,-1], x)))
return(fdiag(D))
}
## checks whether a point z is in Ellip(x)
ellipsoid <- function(z, x) {
## as.numeric converts the expression into an atom, so we get boolean
return( as.numeric(fcrossprod(z-x, fprod(H_x(x), (z-x)))) <= r^2)
}
The fdiag, fcrossprod, and fprod functions are all RcppArmEigen versions of their R counterparts. I used them because they are substantially faster.
The main algorithm:
> for (i in 1:n) {
>
> zeta <- rnorm(length(b), 0, 1)
> zeta <- r * zeta / sqrt(as.numeric(fcrossprod(zeta,zeta)))
>
> rhs <- fcrossprod(A, fprod(D_x(current.point), zeta))
>
> ## DONE
>
> y <- fprod(fsolve(H_x(current.point)), rhs)
> y <- y + current.point
>
>
> while(!ellipsoid(current.point, y)) {
> zeta <- rnorm(length(b), 0, 1)
>
> ## normalise to be on the m- unit sphere
> ## and then compute lhs as a m-vector
> zeta <- r * zeta / sqrt(sum(zeta * zeta))
>
>
> rhs <- fcrossprod(A, fprod(D_x(current.point), zeta))
>
> ##
> y <- fprod(fsolve(H_x(current.point)), rhs)
> y <- y + current.point
>
>
> if(ellipsoid(current.point, y)) {
>
> probability <- min(1, sqrt(fdet(fprod(fsolve(H_x(current.point)),H_x(y)) )))
>
>
> bool <- sample(c(TRUE, FALSE), 1, prob = c(probability, 1 - probability))
> if(bool) {
> break
> }
> }
> }
And here is the by.total output:
$by.total
total.time total.pct self.time self.pct
"Dikin_Walk" 401.32 99.86 7.42 1.85
"fprod" 376.02 93.57 46.16 11.49
"<Anonymous>" 329.34 81.95 328.90 81.84
"cbind" 268.58 66.83 0.04 0.01
"fcrossprod" 226.12 56.27 1.14 0.28
"H_x" 225.82 56.19 1.16 0.29
"fsolve" 203.82 50.72 0.14 0.03
"ellipsoid" 126.30 31.43 0.56 0.14
"fdet" 64.84 16.13 0.02 0.00
"as.vector" 57.56 14.32 5.98 1.49
"fdiag" 35.68 8.88 0.50 0.12
fprod is defined as:
prodCpp <- 'typedef Eigen::Map<Eigen::MatrixXd> MapMatd;
const MapMatd B(as<MapMatd>(BB));
const MapMatd C(as<MapMatd>(CC));
return wrap(B * C);'
fprod <- cxxfunction(signature(BB = "matrix", CC = "matrix"),
prodCpp, "RcppEigen")
<Anonymous> refers to an anonymous (unnamed) function. If you are running such a function in a loop, most of the time will typically be spent in this function.
Apparently A_b is a matrix and x a vector. Use matrix algebra instead of a loop:
A_b <- matrix(1:16, 4)
x <- 1:3
D <- apply(A_b, 1, function(row) {1 / (row[1] - sum(row[-1] * x))})
D1 <- as.vector(1/(A_b[,1] - A_b[,-1] %*% x))
identical (D, D1)
#[1] TRUE
Edit:
The anonymous function is in the Rcpp magic of fprod:
B <- matrix(rnorm(1e6),1e3)
C <- matrix(rnorm(1e6),1e3)
Rprof()
for (i in 1:30) BC <- fprod(B, C)
Rprof(NULL)
summaryRprof()
#$by.self
# self.time self.pct total.time total.pct
#"<Anonymous>" 4.24 100 4.24 100
#
#$by.total
# total.time total.pct self.time self.pct
#"<Anonymous>" 4.24 100 4.24 100
#"fprod" 4.24 100 0.00 0
#
#$sample.interval
#[1] 0.02
#
#$sampling.time
#[1] 4.24
Most of your time is spent with matrix multiplication. You might benefit from an optimized BLAS, e.g., you could try OpenBLAS.
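Independently of the BLAS in use, one multiplication in H_x looks avoidable: fprod(D_squared, A) multiplies by a dense diagonal matrix, which is the same as scaling row i of A by D[i]^2. The sketch below only checks that equivalence in base R on made-up data. A, b and x here are hypothetical stand-ins for the question's objects, and crossprod and %*% stand in for fcrossprod and fprod. Whether it actually helps in the real sampler would need to be profiled.

```r
set.seed(42)
m <- 200; n <- 10
A <- matrix(rnorm(m * n), m, n)   # hypothetical constraint matrix
x <- rep(0, n)                    # hypothetical interior point
b <- runif(m, 1, 2)               # chosen so that b - A x > 0

D <- as.vector(1 / (b - A %*% x))

# Formulation from the question: dense diagonal matrix, two multiplications
H_dense <- crossprod(A, diag(D^2) %*% A)

# Same Hessian: A * D scales row i of A by D[i] (column-wise recycling),
# and crossprod(M) computes t(M) %*% M, i.e. t(A) %*% diag(D^2) %*% A
H_scaled <- crossprod(A * D)

all.equal(H_dense, H_scaled)
```

The scaled version never forms the m x m diagonal matrix, so it also avoids the fdiag call.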
First of all, ignore "self time", because "total time" is inclusive of that plus callees.
If you are spending any time that you don't need to, you are far more likely to be doing it by calling functions than by crunching.**
Second, don't even look at that.
Rprof produces a file of stack traces.
Just look at several of those, selected at random.
If a function is responsible for 80% of time, you will see it on roughly 4 out of 5 of stack traces.
What's more, you will see who is calling it, and you will see who it is calling, to cause that time to be spent.
Simple numbers do not tell you that.
Sorting the stack traces also does not tell you that.
It would be even better if it gave line numbers at which the calls were made, but it doesn't.
Even so, just showing the functions is still pretty useful.
** Profilers only display "self time" because they always have, and because all the others do it, and few people have woken up to the fact that it's just a distraction. If a function is at the terminus of a stack trace, it's in "self time". Either way it's in "inclusive time".
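A minimal sketch of reading the raw stack traces, assuming a toy workload (slow_inner and slow_outer are hypothetical names, not part of the question). Rprof writes one sampled call stack per line, innermost call first, after a header line giving the sampling interval.

```r
slow_inner <- function(n) sum(sort(runif(n)))           # hypothetical workload
slow_outer <- function() for (i in 1:50) slow_inner(2e5)

prof_file <- tempfile(fileext = ".out")
Rprof(prof_file)
slow_outer()
Rprof(NULL)

traces <- readLines(prof_file)[-1]   # drop the "sample.interval=..." header

# Look at a handful of stacks chosen at random
sample(traces, min(5, length(traces)))

# Fraction of samples in which a given function appears anywhere on the stack
mean(grepl("\"slow_inner\"", traces, fixed = TRUE))
```

Each sampled line shows the whole chain of callers, which is the information the summary tables flatten away.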
I wrote the following piece of code to find all permutations of a given vector:
perm <- function(v, r = NULL, P = NULL) {
l <- length(v)
if (l == 0) {
P <- rbind(P, r)
rownames(P) <- NULL
P
} else {
for (i in 1:l) {
new_r <- c(r, v[i])
new_v <- v[-i]
P <- perm(new_v, new_r, P)
}
P
}
}
P <- perm(1:9) # takes "forever" yet e.g. perm(1:7) is quite fast!?!
P
It does what it should but the problem is that it kind of runs forever if one uses vectors of length > 8 (as above).
My question
I don't really see the problem, I found some recursive implementations that don't look so different yet are much more efficient... So is there a simple way to optimize the code so that it runs faster?
As @akrun states, recursion in R is generally not that efficient. However, if you must have a recursive solution, look no further than gtools::permutations. Here is the implementation:
permGtools <- function(n, r, v) {
if (r == 1)
matrix(v, n, 1)
else if (n == 1)
matrix(v, 1, r)
else {
X <- NULL
for (i in 1:n) X <- rbind(X, cbind(v[i], permGtools(n - 1, r - 1, v[-i])))
X
}
}
By the way, to get the full source code, simply type gtools::permutations in the console and hit enter. For more information see How can I view the source code for a function?
And here are some timings:
system.time(perm(1:8))
user system elapsed
34.074 10.641 44.815
system.time(permGtools(8,8,1:8))
user system elapsed
0.253 0.001 0.255
And just for good measure:
system.time(permGtools(9, 9, 1:9))
user system elapsed
2.512 0.046 2.567
Why is the OP's implementation slower?
Skip to the summary if you don't want to read the details.
For starters, we can simply see that the OP's implementation makes more recursive calls than the implementation in gtools. To show this, we add count <<- count + 1L to the top of each function (N.B. We are using the <<- assignment operator which searches through the parent environments first). E.g:
permGtoolsCount <- function(n, r, v) {
count <<- count + 1L
if (r == 1)
.
.
And now we test a few lengths:
iterationsOP <- sapply(4:7, function(x) {
count <<- 0L
temp <- permCount(1:x)
count
})
iterationsOP
[1] 65 326 1957 13700
iterationsGtools <- sapply(4:7, function(x) {
count <<- 0L
temp <- permGtoolsCount(x, x, 1:x)
count
})
iterationsGtools
[1] 41 206 1237 8660
As you can see, the OP's implementation makes more calls in every case. In fact, it makes about 1.58 times as many recursive calls.
iterationsOP / iterationsGtools
[1] 1.585366 1.582524 1.582053 1.581986
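For reference, here is a complete sketch of the counting experiment, since the counting variants are only shown in abbreviated form. permCount is assumed to be the OP's perm with the counter added, and countCalls is a hypothetical helper introduced here. Lengths 4 to 6 are used to keep the run short.

```r
count <- 0L

# OP's function with a call counter added
permCount <- function(v, r = NULL, P = NULL) {
  count <<- count + 1L
  l <- length(v)
  if (l == 0) {
    P <- rbind(P, r)
    rownames(P) <- NULL
    P
  } else {
    for (i in 1:l) {
      P <- permCount(v[-i], c(r, v[i]), P)
    }
    P
  }
}

# gtools-style function with the same counter
permGtoolsCount <- function(n, r, v) {
  count <<- count + 1L
  if (r == 1) {
    matrix(v, n, 1)
  } else if (n == 1) {
    matrix(v, 1, r)
  } else {
    X <- NULL
    for (i in 1:n) {
      X <- rbind(X, cbind(v[i], permGtoolsCount(n - 1, r - 1, v[-i])))
    }
    X
  }
}

# Hypothetical helper: reset the counter, run f, report the number of calls
countCalls <- function(f) {
  count <<- 0L
  f()
  count
}

callsOP     <- sapply(4:6, function(x) countCalls(function() permCount(1:x)))
callsGtools <- sapply(4:6, function(x) countCalls(function() permGtoolsCount(x, x, 1:x)))

callsOP       # 65 326 1957
callsGtools   # 41 206 1237
```

The OP's version recurses one level deeper (down to an empty vector), while the gtools version stops at a single remaining element, which accounts for the extra calls.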
As we have stated already, recursion in R has a bad reputation. I couldn't find anything pinpointing exactly why this is the case, other than that R does not perform tail-call optimization.
At this point, it seems hard to believe that making about 1.58 times more recursive calls would explain the 175 times speed up we saw above (i.e. 44.815 / 0.255 ~= 175).
We can profile the code with Rprof in order to glean more information:
Rprof("perm.out", memory.profiling = TRUE)
a1 <- perm(1:8)
Rprof(NULL)
summaryRprof("perm.out", memory = "both")$by.total
total.time total.pct mem.total self.time self.pct
"perm" 43.42 100.00 15172.1 0.58 1.34
"rbind" 22.50 51.82 7513.7 22.50 51.82
"rownames<-" 20.32 46.80 7388.7 20.30 46.75
"c" 0.02 0.05 23.7 0.02 0.05
"length" 0.02 0.05 0.0 0.02 0.05
Rprof("permGtools.out", memory.profiling = TRUE)
a2 <- permGtools(8, 8, 1:8)
Rprof(NULL)
summaryRprof("permGtools.out", memory = "both")$by.total
total.time total.pct mem.total self.time self.pct
"rbind" 0.34 100.00 134.8 0.18 52.94
"cbind" 0.34 100.00 134.8 0.08 23.53
"permGtools" 0.34 100.00 134.8 0.06 17.65
"matrix" 0.02 5.88 0.0 0.02 5.88
One thing that jumps out immediately (other than the time) is the huge memory usage of the OP's implementation. The OP's implementation uses roughly 15 GB of memory, whereas the gtools implementation uses only about 135 MB.
Digging Deeper
In the above, we are simply looking at memory usage in a general view by setting the memory parameter to both. There is another setting called tseries that lets you look at the memory usage over time.
head(summaryRprof("perm.out", memory = "tseries"))
vsize.small vsize.large nodes duplications stack:2
0.02 4050448 25558992 49908432 2048 "perm":"perm"
0.04 98808 15220400 1873760 780 "perm":"perm"
0.06 61832 12024184 1173256 489 "perm":"perm"
0.08 45400 0 861728 358 "perm":"perm"
0.1 0 14253568 0 495 "perm":"perm"
0.12 75752 21412320 1436120 599 "perm":"perm"
head(summaryRprof("permGtools.out", memory = "tseries"))
vsize.small vsize.large nodes duplications stack:2
0.02 4685464 39860824 43891512 0 "permGtools":"rbind"
0.04 542080 552384 12520256 0 "permGtools":"rbind"
0.06 0 0 0 0 "permGtools":"rbind"
0.08 767992 1200864 17740912 0 "permGtools":"rbind"
0.1 500208 566592 11561312 0 "permGtools":"rbind"
0.12 0 151488 0 0 "permGtools":"rbind"
There is a lot going on here, but the thing to focus on is the duplications field. From the documentation for summaryRprof we have:
It also records the number of calls to the internal function duplicate in the time interval. duplicate is called by C code when arguments need to be copied.
Comparing the number of copies in each implementation:
sum(summaryRprof("perm.out", memory = "tseries")$duplications)
[1] 121006
sum(summaryRprof("permGtools.out", memory = "tseries")$duplications)
[1] 0
So we see that the OP's implementation requires many copies to be made. I guess this isn't surprising given that the desired object is a parameter in the function prototype. That is, P is the matrix of permutations that is to be returned, and it keeps growing with each iteration. And with each iteration, we are passing it along to perm. You will notice that this is not the case in the gtools implementation, as it simply has two numeric values and a vector for its parameters.
Summary
So there you have it: the OP's original implementation not only makes more recursive calls, but also requires many copies, which in turn bogs down the memory and drastically hurts efficiency.
It may be better to use permuteGeneral from RcppAlgos
P <- perm(1:5) # OP's function
library(RcppAlgos)
P1 <- permuteGeneral(5, 5)
all.equal(P, P1, check.attributes = FALSE)
#[1] TRUE
Benchmarks
On a slightly longer sequence
system.time({
P2 <- permuteGeneral(8, 8)
})
#user system elapsed
# 0.001 0.000 0.001
system.time({
P20 <- perm(1:8) #OP's function
})
# user system elapsed
# 31.254 11.045 42.226
all.equal(P2, P20, check.attributes = FALSE)
#[1] TRUE
Generally, a recursive function can take longer to run, because each recursive call adds function-call overhead.
Background
In R this works:
> df <- data.frame(a=numeric(), b=numeric())
> rbind(df, list(a=1, b=2))
a b
1 1 2
But if I want the list to have a vector, rbind fails:
> df <- data.frame(a=numeric(), b=vector(mode="numeric"))
> rbind(df, list(a=1, b=c(2,3)))
Error in rbind(deparse.level, ...) :
invalid list argument: all variables should have the same length
And if I try to specify the vector length, declaring the dataframe fails:
> df <- data.frame(a=numeric(), b=vector(mode="numeric", length=2))
Error in data.frame(a = numeric(), b = vector(mode = "numeric", length = 2)) :
arguments imply differing number of rows: 0, 2
Finally, if I eschew declaring the dataframe and try rbind two lists directly, it looks like everything is working, but the datatypes are all wrong, and none of the columns appear to exist.
> l1 <- list(a=1, b=c(2,3))
> l2 <- list(a=10, b=c(20,30))
> obj <- rbind(l1, l2)
> obj
a b
l1 1 Numeric,2
l2 10 Numeric,2
> typeof(obj)
[1] "list"
> obj$a
NULL
> obj$b
NULL
> names(obj)
NULL
My setup
I have an embedded device that gathers data every 50 ms and spits out a packet of data. In my script, I'm parsing a waveform that represents the states of that process (process previous frame and transmit, gather new data, dead time where nothing happens) with a state machine. For each packet I'm calculating the duration of the process period, the gathering data period which is subdivided into 8 or 16 acquisition cycles, where I calculate the time of each acquisition cycle, and the remaining dead time.
My list basically looks like list(process=#, cycles=c(#,#,#,#), deadtime=#). Different packet types have different cycle lengths, so I pass that in as a parameter and I want the script to work on any packet type.
My question
Is there a way to declare a dataframe that does what I want, or am I using R in a fundamentally wrong way and I should break each cycle into its own list element? I was hoping to avoid the latter as it will make treating the cycles as a group more difficult.
I will note that I've just started learning R so I'm probably doing some odd things with it.
Expected output
If I were to process 4 packets worth of signal with 3 acq. cycles each, this would be my ideal output:
df <- data.frame(processTime=numeric(), cycles=???, deadtime=numeric())
df <- rbind(df, list(processTime=0.05, cycles=c(0.08, 0.10, 0.07), deadtime=0.38))
etc...
processTime cycles deadtime
1 0.05 0.08 0.10 0.07 0.38
2 0.06 0.07 0.11 0.09 0.36
3 0.07 0.28 0.11 0.00 0.00
4 0.06 0.08 0.08 0.09 0.41
I'll take a different stab. Dealing with just your first 2 records.
processTime<-c(.05,.06)
cycles<-list(list(.08,.10,.07), list(.07,.09,.38))
deadtime<-c(.38,.36)
For cycles, we created a list in which each element is itself a list of 3 elements. So cycles[[1]][1] would refer to .08, cycles[[1]][2] would refer to the second element of the first list, and cycles[[2]][3] would refer to the 3rd item in the second list.
If we use cbind to bind these we get the following:
test<-as.data.frame(cbind(processTime,cycles,deadtime))
test
processTime cycles deadtime
1 0.05 0.08, 0.10, 0.07 0.38
2 0.06 0.07, 0.09, 0.38 0.36
test$cycles[[1]] returns the first list:
test$cycles[[1]]
[[1]]
[[1]][[1]]
[1] 0.08
[[1]][[2]]
[1] 0.1
[[1]][[3]]
[1] 0.07
Whereas the 3rd element of the second list can be called with:
test$cycles[[2]][3]
[[1]]
[1] 0.38
You can also unlist later for calculations:
unlist(test$cycles[[2]])
[1] 0.07 0.09 0.38
To do this iteratively as you requested.
test<-data.frame()
processTime<-c(.05)
cycles<-list(list(.08,.10,.07))
deadtime<-c(.38)
test<-as.data.frame(cbind(processTime,cycles,deadtime))
test
processTime cycles deadtime
1 0.05 0.08, 0.10, 0.07 0.38
processTime<-c(.06)
cycles<-list(list(.07,.09,.38))
deadtime<-c(.36)
test<- rbind(test,as.data.frame(cbind(processTime,cycles,deadtime)))
test
processTime cycles deadtime
1 0.05 0.08, 0.10, 0.07 0.38
2 0.06 0.07, 0.09, 0.38 0.36
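Another option, sketched here with the first two rows of the question's expected output, is to collect the packets in a plain list while parsing and assemble the data frame once at the end, storing cycles as a list column of numeric vectors. The names packets and df are hypothetical. Unlike the cbind approach above, where every column ends up as a list, processTime and deadtime stay ordinary numeric columns.

```r
# One list per parsed packet; cycles can have any length
packets <- list(
  list(processTime = 0.05, cycles = c(0.08, 0.10, 0.07), deadtime = 0.38),
  list(processTime = 0.06, cycles = c(0.07, 0.11, 0.09), deadtime = 0.36)
)

# Scalar fields become regular numeric columns
df <- data.frame(
  processTime = sapply(packets, function(p) p$processTime),
  deadtime    = sapply(packets, function(p) p$deadtime)
)

# A list with one element per row becomes a list column
df$cycles <- lapply(packets, function(p) p$cycles)

df$cycles[[2]]            # numeric vector of the second packet's cycles
sapply(df$cycles, mean)   # treat each packet's cycles as a group
```

Appending to packets inside the parsing loop and building df afterwards also avoids calling rbind once per packet.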
I have a data frame of 3 points in space represented by their longitude and latitude:
myData <- structure(list(lng = c(-37.06852042, -37.07473406, -37.07683313
), lat = c(-11.01471746, -11.02468103, -11.02806217)), .Names = c("lng",
"lat"), row.names = c(NA, 3L), class = "data.frame")
Next, I use the geosphere package to get a distance matrix (in meters, which I convert to km) for the points:
> m <- round(distm(myData)/1000,2)
> rownames(m) <- c("A", "B", "C")
> colnames(m) <- c("A", "B", "C")
> m
A B C
A 0.00 1.30 1.74
B 1.30 0.00 0.44
C 1.74 0.44 0.00
Given this is a distance matrix and I have 6 ways of going to A, B and C (like A -> B -> C, C -> A -> B, and so on), I would like to extract some information from it, like the minimum, the median, and the maximum distance.
To illustrate it, I calculated all the possible ways of my example manually:
ways <- c(abc <- 1.3 + 0.44,
acb <- 1.74 + 0.44,
bac <- 1.3 + 1.74,
bca <- 0.44 + 1.74,
cab <- 1.74 + 1.3,
cba <- 0.44 + 1.3)
> min(ways)
[1] 1.74
> median(ways)
[1] 2.18
> max(ways)
[1] 3.04
How do I automate this task, given that I'll be working with more than 10 locations and this problem has factorial complexity?
I wrote a package called trotter that maps integers to different arrangement types (permutations, combinations and others). For this problem, it seems that you are interested in the permutations of locations. One of the objects in the package is the permutation pseudo-vector that is created using the function ppv.
First install "trotter":
install.packages("trotter")
Then an automated version of your task might look something like:
library(geosphere)
myData <- data.frame(
lng = c(-37.06852042, -37.07473406, -37.07683313),
lat = c(-11.01471746, -11.02468103, -11.02806217)
)
m <- round(distm(myData) / 1000, 2)
locations <- c("A", "B", "C")
rownames(m) <- colnames(m) <- locations
library(trotter)
perms <- ppv(k = length(locations), items = locations)
ways <- c()
for (i in 1:length(perms)) {
perm <- perms[i]
route <- paste(perm, collapse = "")
ways[[route]] <- sum(
sapply(
1:(length(perm) - 1),
function(i) m[perm[i], perm[i + 1]]
)
)
}
Back in the R console:
> ways
ABC ACB CAB CBA BCA BAC
1.74 2.18 3.04 1.74 2.18 3.04
> # What is the minimum route length?
> min(ways)
[1] 1.74
> # Which route (index) is this?
> which.min((ways))
ABC
1
Just remember, like you said, you're dealing with factorial complexity and you might end up waiting a while running this brute force search with more than a few locations...
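For comparison, here is a base R sketch of the same brute-force search that needs no extra packages. The distance matrix is typed in from the question rather than computed with geosphere, and all_perms is a hypothetical helper written for this example. It reproduces the minimum, median and maximum from the manual calculation (1.74, 2.18, 3.04).

```r
# Distance matrix in km, copied from the question
m <- matrix(c(0.00, 1.30, 1.74,
              1.30, 0.00, 0.44,
              1.74, 0.44, 0.00),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

# Hypothetical helper: all permutations of v, one per row
all_perms <- function(v) {
  if (length(v) <= 1) return(matrix(v, nrow = 1))
  do.call(rbind, lapply(seq_along(v), function(i) {
    cbind(v[i], all_perms(v[-i]))
  }))
}

routes <- all_perms(rownames(m))
n <- ncol(routes)

# Length of a route = sum of distances between consecutive stops;
# a two-column character matrix indexes m by (from, to) names
route_length <- apply(routes, 1, function(p) sum(m[cbind(p[-n], p[-1])]))
names(route_length) <- apply(routes, 1, paste, collapse = "")

c(min = min(route_length),
  median = median(route_length),
  max = max(route_length))
```

Every route has the same length as its reverse (ABC and CBA, for instance), so only half of the permutations are really distinct, though the growth is still factorial.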
I would like to take column and row names from a text file and build a sparse matrix using the row and column information (the algorithm can be found in the description below). I have a working solution but it is slow for a text file with over 3,000,000 entries.
Does anyone have any suggestions for a faster algorithm than the one I describe below?
First, I start with a text file which provides column and row names, separated by a space. For example:
aaaa 11111 22222 33333 bbbb 11111 22222 cccc 11111
where {aaaa,bbbb,cccc} are 4 character column names and {11111,22222,33333} are 5 character row names.
Second, I load this text file into R using the scan function:
char_vec <- scan(file = "textFile.txt", what = "character")
which converts the textFile information into a character vector.
Third, I find all of the possible column names and row names:
c_names <- unique(char_vec[nchar(char_vec) == 4])
r_names <- unique(char_vec[nchar(char_vec) == 5])
Fourth, I create a sparse matrix from the data:
library(Matrix)
createMatrix <- function(char_vec=char_vec, c_names=c_names, r_names=r_names)
{
mySparseMatrix <- Matrix(0, nrow = length(r_names), ncol = length(c_names),
sparse = TRUE)
for (i1 in 1:length(char_vec))
{
if (char_vec[i1] %in% c_names)
{
c_index <- match(char_vec[i1], c_names)
}
if (char_vec[i1] %in% r_names)
{
r_index <- match(char_vec[i1], r_names)
mySparseMatrix[r_index, c_index] <- 1
}
}
colnames(mySparseMatrix) <- c_names
rownames(mySparseMatrix) <- r_names
return(mySparseMatrix)
}
This gives this output:
aaaa bbbb cccc
11111 1 1 1
22222 1 1 .
33333 1 . .
To show how fast this algorithm works, I padded out the character vector (albeit in an unrealistic manner but I think it serves its purpose as an example):
char_vec <- rep(c("aaaa", "11111", "22222", "33333", "bbbb", "11111", "22222", "cccc", "11111"), 1000)
and then ran:
system.time(createMatrix(char_vec, c_names, r_names))
Output:
user system elapsed
9.89 0.00 9.94
I have profiled the function using:
Rprof("createMatrixOut.out")
z <- createMatrix(char_vec, c_names, r_names)
Rprof(NULL)
and display a subset of the output using:
summaryRprof("createMatrixOut.out")$by.total[1:10,]
Output:
total.time total.pct self.time self.pct
"createMatrix" 8.08 100.00 0.08 0.99
"[<-" 7.96 98.51 0.08 0.99
"replCmat4" 7.40 91.58 0.04 0.50
"as" 5.64 69.80 0.04 0.50
"asMethod" 5.06 62.62 0.16 1.98
"standardGeneric" 4.68 57.92 0.24 2.97
"new" 4.52 55.94 0.02 0.25
"initialize" 4.40 54.46 0.04 0.50
"callNextMethod" 4.24 52.48 0.08 0.99
".Call" 4.12 50.99 0.60 7.43
I have changed the structure of the data: instead of storing them in a character vector, I create a list:
> lst
$aaaa
[1] "11111" "22222" "33333"
$bbbb
[1] "11111" "22222"
$cccc
[1] "11111"
It is then much faster to iterate through this list:
createMatrix2 <- function(char_vec=char_vec, c_names=c_names, r_names=r_names)
{
# create list
lst <- list()
for (i1 in 1:length(char_vec))
{
if (nchar(char_vec[i1])==4)
{
cn <- char_vec[i1]
} else {
if (!(char_vec[i1] %in% lst[[cn]])){lst[[cn]] <- c(lst[[cn]],char_vec[i1])}
}
}
# create empty matrix
mySparseMatrix <- Matrix(0, nrow = length(r_names), ncol = length(c_names),
sparse = TRUE)
# fill the matrix
for (cn in names(lst)){
c_index <- match(cn, c_names)
for(rn in lst[[cn]]){
r_index <- match(rn, r_names)
mySparseMatrix[r_index, c_index] <- 1
}
}
# names and return
colnames(mySparseMatrix) <- c_names
rownames(mySparseMatrix) <- r_names
return(mySparseMatrix)
}
> system.time(createMatrix(char_vec, c_names, r_names))
user system elapsed
9.60 0.00 10.36
> system.time(createMatrix2(char_vec, c_names, r_names))
user system elapsed
0.06 0.00 0.06
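Both versions still assign into the sparse matrix one element at a time, which is where the profile in the question shows the time going ("[<-", replCmat4). A possible alternative is to compute all row and column indices with vectorised operations and build the matrix with a single call to sparseMatrix from the Matrix package. This is only a sketch on the padded example data, and it assumes the first token in the file is a column name.

```r
library(Matrix)

char_vec <- rep(c("aaaa", "11111", "22222", "33333",
                  "bbbb", "11111", "22222", "cccc", "11111"), 1000)

is_col  <- nchar(char_vec) == 4            # TRUE for column-name tokens
c_names <- unique(char_vec[is_col])
r_names <- unique(char_vec[!is_col])

# For every token, the most recent column name seen so far
# (assumes the first token is a column name)
current_col <- char_vec[is_col][cumsum(is_col)]

# One (row, column) index pair per row-name token, de-duplicated so that
# repeated pairs are not summed by sparseMatrix()
pairs <- unique(cbind(
  i = match(char_vec[!is_col], r_names),
  j = match(current_col[!is_col], c_names)
))

mySparseMatrix <- sparseMatrix(
  i = pairs[, "i"], j = pairs[, "j"], x = rep(1, nrow(pairs)),
  dims = c(length(r_names), length(c_names)),
  dimnames = list(r_names, c_names)
)
mySparseMatrix
```

Because no R-level loop runs over the tokens, this should scale much better to millions of entries, but it is worth timing on the real file.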
Given a column of data (of the type 39600.432, 39600.433, etc) I would like to drop the integer part of the number and keep only the decimals (transforming 39600.432 into 432, and 39600.433 into 433). How can I do this?
Let's say your column is the vector x.
> x <- c(39.456, 976.902)
> x <- x - as.integer(x)
> x
[1] 0.456 0.902
That should work. You can then just multiply by 1000 to convert the current x to integers. You will need some more processing if you want 3.9 to become 9.
> x <- 1000*x
> x
[1] 456 902
Hope that helps!
Many good answers, here's one more using regular expressions.
> g <- c(134.3412,14234.5453)
> gsub("^[^\\.]*\\.", "", g)
[1] "3412" "5453"
To strip the integral part without a subtraction or regex, you can use the modulus operator.
x <- (10000:10010)/100
x
## [1] 100.00 100.01 100.02 100.03 100.04 100.05 100.06 100.07 100.08 100.09 100.10
x %% 1
## [1] 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.10
%% 1 is meaningful in R. This does leave the value as fractional, which may not be ideal for your use.
You are looking for the floor function. But you could do as.integer as well.
Here is an approach using regular expressions
g<-c(134.3412,14234.5453)
r<-regexpr("[0-9]+$",g)
as.numeric(regmatches(g,r))
This should do it:
g <- c(134.3412,14234.5453)
h <- floor(g)
g - h
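Putting the numeric approaches together, here is a short sketch for the values in the question, assuming every number has exactly three decimal places. The fractional part of a double is rarely exact, so the result is rounded after scaling.

```r
x <- c(39600.432, 39600.433)

frac <- x %% 1                  # fractional part, still a double
digits3 <- round(frac * 1000)   # round() absorbs the floating-point error
digits3
# [1] 432 433

# For negative numbers the two "integer part" functions differ:
y <- -1.25
y - trunc(y)   # -0.25 (trunc rounds toward zero, like as.integer)
y - floor(y)   #  0.75 (floor rounds down, which is also what y %% 1 gives)
```

If the number of decimals varies, the regular-expression answers are the safer choice, since they work on the printed representation.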