This may seem a silly question, but I have searched online and have not found a satisfactory answer.
My question is: suppose we have a matrix M and apply the scale() function to it. How can we extract the center and scale of each column with a line of code? I know the centers and scales are printed with the result, but my matrix has many columns, so copying them by hand is cumbersome.
Any ideas? Many thanks!
You are looking for the attributes function:
set.seed(1)
mat = matrix(rnorm(1000),,10) # Suppose you have 10 columns
s = scale(mat) # scale your data
attributes(s)#This gives you the means and the standard deviations:
$`dim`
[1] 100 10
$`scaled:center`
[1] 0.1088873669 -0.0378080766 0.0296735350 0.0516018586 -0.0391342406 -0.0445193567 -0.1995797418
[8] 0.0002549694 0.0100772648 0.0040650015
$`scaled:scale`
[1] 0.8981994 0.9578791 1.0342655 0.9916751 1.1696122 0.9661804 1.0808358 1.0973012 1.0883612 1.0548091
These values can also be obtained as:
colMeans(mat)
[1] 0.1088873669 -0.0378080766 0.0296735350 0.0516018586 -0.0391342406 -0.0445193567 -0.1995797418
[8] 0.0002549694 0.0100772648 0.0040650015
sqrt(diag(var(mat)))
[1] 0.8981994 0.9578791 1.0342655 0.9916751 1.1696122 0.9661804 1.0808358 1.0973012 1.0883612 1.0548091
attributes() returns a list that you can subset the way you want. To pull out a single attribute directly, you can use attr():
attr(s,"scaled:center")
[1] 0.1088873669 -0.0378080766 0.0296735350 0.0516018586 -0.0391342406 -0.0445193567 -0.1995797418
[8] 0.0002549694 0.0100772648 0.0040650015
attr(s,"scaled:scale")
[1] 0.8981994 0.9578791 1.0342655 0.9916751 1.1696122 0.9661804 1.0808358 1.0973012 1.0883612 1.0548091
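Once the centers and scales are extracted, they can be reused. The following is a minimal sketch, assuming the same simulated matrix as above; the names ctr, scl, restored and new_rows are illustrative only. It undoes the scaling and applies the same centering and scaling to new rows.

```r
set.seed(1)
mat <- matrix(rnorm(1000), ncol = 10)
s <- scale(mat)

# Pull the per-column centers and scales stored by scale()
ctr <- attr(s, "scaled:center")
scl <- attr(s, "scaled:scale")

# Undo the scaling: multiply each column by its scale, then add its center
restored <- sweep(sweep(s, 2, scl, "*"), 2, ctr, "+")
all.equal(restored, mat, check.attributes = FALSE)

# Apply the same transformation to new data (hypothetical new rows)
new_rows <- matrix(rnorm(20), ncol = 10)
new_scaled <- scale(new_rows, center = ctr, scale = scl)
```

Passing the stored vectors back through the center and scale arguments keeps new data on the same footing as the original matrix.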
I have calculated the required safety stock using Excel's Goal Seek function. Below is the image.
But now I want to do the same using R.
The function below gives me the same results as Excel when I enter the safety stock (SS) and SD. Now I need to do the reverse calculation: whenever I provide x and SD, I need the SS. Could someone help me with this?
I tried Optix and other similar R packages but couldn't succeed.
opt<-function(SS,SD){
x=-SS*(1-pnorm(SS/SD)) + SD * dnorm(SS/SD,mean=0,sd =1,0)
print(x)
}
Excel Goal seek
Solving f(x)=c for x is the same as solving f(x)-c=0. You can use uniroot to find the root:
f <- function(SS, SD, ESC) {
-SS*(1-pnorm(SS/SD)) + SD * dnorm(SS/SD,mean=0,sd =1,0) - ESC
}
zero <- uniroot(f,c(0,1000),SD=600,ESC=39.3)
zero$root
The second argument is the interval to search: between 0 and 1000. This returns
674.0586
The zero structure has more interesting information:
$root
[1] 674.0586
$f.root
[1] 1.933248e-08
$iter
[1] 8
$init.it
[1] NA
$estim.prec
[1] 6.103516e-05
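The uniroot call can be wrapped so that SS is returned directly for any target value and SD. This is only a sketch: the names expected_shortage and solve_ss, the default upper bound of 10 * SD, and the tolerance are assumptions, not part of the original answer. It also assumes the target lies strictly between 0 and SD * dnorm(0), so that the function changes sign over the search interval.

```r
# The formula from the question, without the print() side effect
expected_shortage <- function(SS, SD) {
  -SS * (1 - pnorm(SS / SD)) + SD * dnorm(SS / SD)
}

# Goal seek: find the SS for which expected_shortage(SS, SD) equals ESC
solve_ss <- function(ESC, SD, upper = 10 * SD) {
  uniroot(function(SS) expected_shortage(SS, SD) - ESC,
          interval = c(0, upper), tol = 1e-8)$root
}

ss <- solve_ss(ESC = 39.3, SD = 600)
ss
expected_shortage(ss, SD = 600)  # should reproduce the target, 39.3
```

Feeding the root back into the forward function is a cheap way to confirm that the search interval was wide enough.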
Given a list animals, call it m, which contains
$bob
[1] 3
$ryan
[1] 4
$dan
[1] 1
How can I sort this guy by the numerical value?
Basically I'd like to see my code look like this
m=sort(m,sortbynumber)
$ryan
[1] 4
$bob
[1] 3
$dan
[1] 1
I can't figure this out unfortunately. Seems like a simple solution.
You can try order
m[order(-unlist(m))]
#$ryan
#[1] 4
#$bob
#[1] 3
#$dan
#[1] 1
Or a slightly more efficient option would be to use the decreasing=TRUE argument of order (from @nicola's comments)
m[order(unlist(m), decreasing=TRUE)]
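For a reproducible run, the list from the question can be built explicitly. This sketch assumes every element is a single number; unlist() would otherwise produce more values than there are list elements and the ordering would no longer line up.

```r
# The list from the question
m <- list(bob = 3, ryan = 4, dan = 1)

# Guard for the assumption that each element holds exactly one number
stopifnot(all(lengths(m) == 1))

# Reorder the list by its values, largest first
sorted <- m[order(unlist(m), decreasing = TRUE)]
names(sorted)
```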
Here is an alternative solution using the hashmap package:
library(hashmap)
a1<-hashmap("hello",1)
a1$insert("hello1",4)
a1$insert("hello2",2)
a1$insert("hello3",3)
sort(a1$data(),decreasing = TRUE)
#OUTPUT
hello1 hello3 hello2 hello
4 3 2 1
I've been struggling with formatting numbers in R using what I feel are very sensible rules. What I would want is to specify a number of significant digits (say 3), keep significant zeroes, and also keep all digits before the decimal point, some examples (with 3 significant digits):
1.23456 -> "1.23"
12.3456 -> "12.3"
123.456 -> "123"
1234.56 -> "1235"
12345.6 -> "12346"
1.50000 -> "1.50"
1.49999 -> "1.50"
Is there a function in R that does this kind of formatting? If not, how could it be done?
I feel these are quite sensible formatting rules, yet I have not managed to find a function in R that formats this way. As far as I can tell from searching, this is not a duplicate of the many similar questions such as this one.
Edit:
Inspired by the two good answers I put together a function myself that I believe works for all cases:
sign_digits <- function(x,d){
s <- format(x,digits=d)
if(grepl("\\.", s) && ! grepl("e", s)) {
n_sign_digits <- nchar(s) -
max( grepl("\\.", s), attr(regexpr("(^[-0.]*)", s), "match.length") )
n_zeros <- max(0, d - n_sign_digits)
s <- paste(s, paste(rep("0", n_zeros), collapse=""), sep="")
}
s
}
format(num, digits=3) comes very close.
format(1.23456,digits=3)
# [1] "1.23"
format(12.3456,digits=3)
# [1] "12.3"
format(123.456,digits=3)
# [1] "123"
format(1234.56,digits=3)
# [1] "1235"
format(12345.6,digits=3)
# [1] "12346"
format(1.5000,digits=3)
# [1] "1.5"
format(1.4999,digits=3)
# [1] "1.5"
Your rules are not actually internally consistent. You want 1234.56 to round down to 1234, yet you want 1.4999 to round up to 1.5.
EDIT: This appears to deal with the very valid point made by @Henrik.
sigDigits <- function(x,d){
z <- format(x,digits=d)
if (!grepl("[.]",z)) return(z)
require(stringr)
return(str_pad(z,d+1,"right","0"))
}
z <- c(1.23456, 12.3456, 123.456, 1234.56, 12345.6, 1.5000, 1.4999)
sapply(z,sigDigits,d=3)
# [1] "1.23" "12.3" "123" "1235" "12346" "1.50" "1.50"
As @jlhoward points out, your rounding rule is not consistent. Hence you should use a conditional statement:
x <- c(1.23456, 12.3456, 123.456, 1234.56, 12345.6, 1.50000, 1.49999)
ifelse(x >= 100, sprintf("%.0f", x), ifelse(x < 100 & x >= 10, sprintf("%.1f", x), sprintf("%.2f", x)))
# "1.23" "12.3" "123" "1235" "12346" "1.50" "1.50"
It's hard to say the intended usage, but it might be better to use consistent rounding. Exponential notation could be an option:
sprintf("%.2e", x)
[1] "1.23e+00" "1.23e+01" "1.23e+02" "1.23e+03" "1.23e+04" "1.50e+00" "1.50e+00"
sig0=\(x,y){
dig=abs(pmin(0,floor(log10(abs(x)))-y+1))
dig[is.infinite(dig)]=y-1
sprintf(paste0("%.",dig,"f"),x)
}
> v=c(1111,111.11,11.1,1.1,1.99,.01,.001,0,-.11,-.9,-.000011)
> paste(sig0(v,2),collapse=" ")
[1] "1111 111 11 1.1 2.0 0.010 0.0010 0.0 -0.11 -0.90 -0.000011"
Or the following is almost the same with the exception that 0 is converted to 0 and not 0.0 (fg is a special version of f where the digits specify significant digits and not digits after the decimal point, and the # flag causes fg to not drop trailing zeroes):
> paste(sub("\\.$","",formatC(v,digits=2,format="fg",flag="#")),collapse=" ")
[1] "1111 111 11 1.1 2.0 0.010 0.0010 0 -0.11 -0.90 -0.000011"
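The magnitude-based approaches above pick the number of decimals from the unrounded value, so a value that rounds up across a power of ten (for example 9.996 with 3 significant digits) gets one decimal too many. The sketch below is one possible adjustment, not taken from any of the answers: the name sig_fixed is hypothetical, and it takes the magnitude from signif(x, d) instead.

```r
sig_fixed <- function(x, d) {
  # Magnitude of the value after rounding to d significant digits
  r <- signif(x, d)
  mag <- floor(log10(abs(r)))
  mag[r == 0] <- 0                  # log10(0) is -Inf; treat zero like a one-digit number
  # Decimals needed to show d significant digits, never fewer than zero
  dec <- pmax(0, d - 1 - mag)
  sprintf(paste0("%.", dec, "f"), x)
}

x <- c(1.23456, 12.3456, 123.456, 1234.56, 12345.6, 1.50000, 1.49999)
sig_fixed(x, 3)
sig_fixed(9.996, 3)
```

On the seven examples from the question this gives the strings the question asks for, and 9.996 comes out as "10.0" rather than "10.00".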
I have two seemingly identical zoo objects created by the same commands from CSV files for different time periods. I try to combine them into one long zoo object, but it fails with an "indexes overlap" error ('merge', 'c' and 'rbind' all produce variants of the same error text). As far as I can see there are no duplicates and the time periods do not overlap. What am I doing wrong? I am using R version 3.0.1 on 64-bit Windows 7, if that makes a difference.
> colnames(z2)
[1] "Amb" "HWS" "Diff"
> colnames(t.tmp)
[1] "Amb" "HWS" "Diff"
> max(index(z2))
[1] "2012-12-06 02:17:45 GMT"
> min(index(t.tmp))
[1] "2012-12-06 03:43:45 GMT"
> anyDuplicated(c(index(z2),index(t.tmp)))
[1] 0
> c(z2,t.tmp)
Error in rbind.zoo(...) : indexes overlap
>
UPDATE: In trying to make a reproducible case I've concluded this is an implementation error due to the large number of rows I'm dealing with: it fails if the final result is more than 311434 rows long.
> nrow(c(z2,head(t.tmp,n=101958)))
Error in rbind.zoo(...) : indexes overlap
> nrow(c(z2,head(t.tmp,n=101957)))
[1] 311434
# but row 101958 inserts fine on its own, so it's not a data problem.
> nrow(c(z2,tail(head(t.tmp,n=101958),n=2)))
[1] 209479
I'm sorry, but I don't have the R scripting skills to produce a zoo of the critical length; hopefully someone might be able to help me out.
UPDATE 2 (responding to Jason's suggestion): The problem is in MATCH, but my R skills aren't sufficient to know how to interpret it. Does it mean MATCH finds a duplicate value in x.t whereas anyDuplicated does not?
> x.t <- c(index(z2),index(t.tmp));
> length(x.t)
[1] 520713
> ix <- ORDER (x.t)
> length(ix)
[1] 520713
> x.t <- x.t[ix]
> length(ix)
[1] 520713
> length(x.t)
[1] 520713
> tx <- table(MATCH(x.t,x.t))
> max(tx)
[1] 2
> tx[which(tx==2)]
311371 311373 311378 311383 311384 311386 311389 311392 311400 311401
2 2 2 2 2 2 2 2 2 2
> anyDuplicated(x.t)
[1] 0
After all the testing and head scratching it seems that the problem I'm having is timezone related. Setting the environment to the same time zone as the original data makes it work just fine.
Sys.setenv(TZ="GMT")
> z3<-rbind(z2,t.tmp)
> nrow(z3)
[1] 520713
Thanks to the question "How to guard against accidental time zone conversion" for the inspiration to look in that direction.
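The thread does not establish exactly why the session time zone triggered the error. As an illustration only (an assumption, not a diagnosis of the data above), here is one way a time zone can make distinct instants look like duplicates: when clocks go back at the end of daylight saving time, two different instants share the same local clock time. The dates and the Europe/London zone are chosen purely for the example.

```r
# Two distinct instants, one hour apart, around the UK clock change of 2012-10-28
x <- as.POSIXct(c("2012-10-28 00:30:00", "2012-10-28 01:30:00"), tz = "UTC")
anyDuplicated(x)        # 0: as instants they differ

# Rendered as local clock times, both read 01:30:00
local <- format(x, format = "%Y-%m-%d %H:%M:%S", tz = "Europe/London")
local
anyDuplicated(local)    # 2: the second string repeats the first
```

Any comparison that goes through local clock time rather than the underlying instant can therefore report duplicates that anyDuplicated() on the POSIXct values does not see.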