I have a data frame, deflator.
I want to get a new data frame inflation which can be calculated by:
inflation[i] = (deflator[i] - deflator[i-4]) / deflator[i-4] * 100
The data frame deflator has 71 numbers:
> deflator
[1] 0.9628929 0.9596746 0.9747274 0.9832532 0.9851884
[6] 0.9797770 0.9913502 1.0100561 1.0176906 1.0092516
[11] 1.0185932 1.0241043 1.0197975 1.0174097 1.0297328
[16] 1.0297071 1.0313232 1.0244618 1.0347808 1.0480411
[21] 1.0322142 1.0351968 1.0403264 1.0447121 1.0504402
[26] 1.0487097 1.0664664 1.0935239 1.0965951 1.1141851
[31] 1.1033155 1.1234482 1.1333870 1.1188136 1.1336276
[36] 1.1096461 1.1226584 1.1287245 1.1529588 1.1582911
[41] 1.1691221 1.1782178 1.1946234 1.1963453 1.1939922
[46] 1.2118189 1.2227960 1.2140535 1.2228828 1.2314258
[51] 1.2570788 1.2572214 1.2607763 1.2744415 1.2982076
[56] 1.3318808 1.3394186 1.3525902 1.3352815 1.3492751
[61] 1.3593859 1.3368135 1.3642940 1.3538567 1.3658135
[66] 1.3710932 1.3888638 1.4262185 1.4309707 1.4328823
[71] 1.4497201
This is a very tricky question for me.
I tried to do this using a for loop:
> d <- data.frame(deflator)
> for (i in 1:71) {d <-rbind(d,c(deflator ))}
I think I might be doing it wrong.
Why use data frames? This is a straightforward vector operation.
inflation <- 100 * (deflator[-(1:4)] - deflator[1:67]) / deflator[1:67]
I agree with @Fhnuzoag that your example suggests calculations on a numeric vector, not a data frame. Here's an additional way to do your calculations, taking advantage of the lag argument of the diff function (with indexes that match those in your question):
lagBy <- 4 # The number of indexes by which to lag
laggedDiff <- diff(deflator, lag = lagBy) # The numerator above
theDenom <- deflator[seq_len(length(deflator) - lagBy)] # The denominator above
inflation <- laggedDiff/theDenom
The first few results (as proportions; multiply by 100 to get percentages as in your formula) are:
head(inflation)
# [1] 0.02315470 0.02094710 0.01705379 0.02725941 0.03299085 0.03008297
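As a quick sanity check, here is a minimal sketch that applies the formula from the question in two ways and confirms they agree. It uses only the first eight values of deflator shown above; the names deflator_head, lag_by, by_index and by_diff are made up for this illustration.

```r
# First eight values of deflator, copied from the question
deflator_head <- c(0.9628929, 0.9596746, 0.9747274, 0.9832532,
                   0.9851884, 0.9797770, 0.9913502, 1.0100561)
lag_by <- 4
n <- length(deflator_head)

current  <- deflator_head[(lag_by + 1):n]   # deflator[i]
previous <- deflator_head[1:(n - lag_by)]   # deflator[i - 4]

# Explicit indexing, following the formula in the question
by_index <- 100 * (current - previous) / previous

# Same calculation via diff(), dropping the last lag_by values for the denominator
by_diff <- 100 * diff(deflator_head, lag = lag_by) / head(deflator_head, -lag_by)

all.equal(by_index, by_diff)
```

Both versions return n - 4 values, because the first four observations have no value four periods earlier to compare against.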
relevant_ods_reordered <- relevant_ods[names(cpm)]
The above seeks to reorder the columns of a data frame, relevant_ods:
Plate1_DMSO_A01 Plate1_DMSO_B01 Plate1_DMSO_C01 Plate1_Lopinavir_D01
OD595 0.431 0.4495 0.4993 0.5785
Plate1_DMSO_E01 Plate1_DMSO_F01 Plate1_DMSO_G01 Plate1_DMSO_H01
OD595 0.5336 0.5133 0.527 0.5413
Plate1_DMSO_C12 Plate1_DMSO_D12 Plate1_Lopinavir_E12 Plate1_DMSO_F12
OD595 0.4137 0.4274 0.5241 0.4264
Plate1_DMSO_G12 Plate1_DMSO_H12
OD595 0.4561 0.4767
to match the order of the columns in a significantly larger dataframe:
[1] "Plate1_DMSO_A01" "Plate1_DMSO_A12"
[3] "Plate1_DMSO_B01" "Plate1_DMSO_B12"
[5] "Plate1_DMSO_C01" "Plate1_DMSO_C12"
[7] "Plate1_DMSO_D12" "Plate1_DMSO_E01"
[9] "Plate1_DMSO_F01" "Plate1_DMSO_F12"
[11] "Plate1_DMSO_G01" "Plate1_DMSO_G12"
[13] "Plate1_DMSO_H01" "Plate1_DMSO_H12"
[15] "Plate1_Lopinavir_D01" "Plate1_Lopinavir_E12"
[17] "Plate1_NS1519_22009_A02" "Plate1_NS1519_22009_A04"
[19] "Plate1_NS1519_22009_A05" "Plate1_NS1519_22009_A06"
[21] "Plate1_NS1519_22009_A07" "Plate1_NS1519_22009_A08"
[23] "Plate1_NS1519_22009_A09" "Plate1_NS1519_22009_A10"
[25] "Plate1_NS1519_22009_A11" "Plate1_NS1519_22009_B02"
[27] "Plate1_NS1519_22009_B03" "Plate1_NS1519_22009_B04"
[29] "Plate1_NS1519_22009_B05" "Plate1_NS1519_22009_B06"
etc.
Unsurprisingly, this returns
Error in `[.data.frame`(relevant_ods, names(cpm)) :
undefined columns selected
because names(cpm) includes columns that do not exist in relevant_ods.
I have also tried:
relevant_ods_reordered <- relevant_ods[names(cpm),]
relevant_ods_reordered <- select(relevant_ods, names(cpm))
relevant_ods_reordered <- match(relevant_ods, names(cpm))
With base R, you need to find the names in common. intersect is good for this and preserves the order of its first argument:
relevant_ods[intersect(names(cpm), names(relevant_ods))]
Or with dplyr, use the select helper any_of:
select(relevant_ods, any_of(names(cpm)))
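To see the base R approach on something small, here is a sketch with a toy data frame. The names relevant_toy and cpm_names and the shortened column names are invented for the example; only the three OD595 values come from the question.

```r
# Toy stand-in for relevant_ods, with columns deliberately out of order
relevant_toy <- data.frame(C01 = 0.4993, A01 = 0.431, B01 = 0.4495)

# Toy stand-in for names(cpm): a longer list that includes names missing from relevant_toy
cpm_names <- c("A01", "A12", "B01", "B12", "C01")

# Keep only the shared names, in the order they appear in cpm_names
common <- intersect(cpm_names, names(relevant_toy))
reordered <- relevant_toy[common]
names(reordered)
```

Indexing with cpm_names directly would fail with the same "undefined columns selected" error, since A12 and B12 are not columns of relevant_toy.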
I generated a series of 10,000 random numbers through:
rand_x = rf(10000, 3, 5)
Now I want to produce another series that contains the running variance at each point, i.e. the column should look like this:
[variance(first two numbers)]
[variance(first three numbers)]
[variance(first four numbers)]
[variance(first five numbers)]
.
.
.
.
[variance of 10,000 numbers]
I have written the code as:
c ( var(rand_x[1:1]) : var(rand_x[1:10000])
but I am only getting 157 elements in the column rather than 10,000. Can someone tell me what I am doing wrong here?
One option is to loop over the indices 2 to 10000 with sapply, take the elements of rand_x from position 1 up to the current index, and apply var; this returns a vector of variances:
out <- sapply(2:10000, function(i) var(rand_x[1:i]))
Your code creates a sequence incrementing by one with the variance of the first two elements as start value and the variance of the whole vector as limit.
var(rand_x[1:2]):var(rand_x[1:n])
# [1] 0.9026262 1.9026262 2.9026262
## compare:
.9026262:3.33433
# [1] 0.9026262 1.9026262 2.9026262
What you want is to loop over the vector indices, using seq_along to get the variances of sequences growing by one. To see what needs to be done, I show you first a (rather slow) for loop.
vars <- numeric() ## initialize numeric vector
for (i in seq_along(rand_x)) {
  vars[i] <- var(rand_x[1:i])
}
vars
# [1] NA 0.9026262 1.4786540 1.2771584 1.7877717 1.6095619
# [7] 1.4483273 1.5653797 1.8121144 1.6192175 1.4821020 3.5005254
# [13] 3.3771453 3.1723564 2.9464537 2.7620001 2.7086317 2.5757641
# [19] 2.4330738 2.4073546 2.4242747 2.3149455 2.3192964 2.2544765
# [25] 3.1333738 3.0343781 3.0354998 2.9230927 2.8226541 2.7258979
# [31] 2.6775278 2.6651541 2.5995346 3.1333880 3.0487177 3.0392603
# [37] 3.0483917 4.0446074 4.0463367 4.0465158 3.9473870 3.8537925
# [43] 3.8461463 3.7848464 3.7505158 3.7048694 3.6953796 3.6605357
# [49] 3.6720684 3.6580296
The first element has to be NA because the variance of one element is not defined (division by zero).
However, growing a vector inside a for loop is slow. A more idiomatic option is a function from the *apply family, e.g. vapply, which allocates the result for you. In vapply we pass numeric(1) (or just 0) as the FUN.VALUE template because the result of each iteration is a numeric of length one.
vars <- vapply(seq_along(rand_x), function(i) var(rand_x[1:i]), numeric(1))
vars
# [1] NA 0.9026262 1.4786540 1.2771584 1.7877717 1.6095619
# [7] 1.4483273 1.5653797 1.8121144 1.6192175 1.4821020 3.5005254
# [13] 3.3771453 3.1723564 2.9464537 2.7620001 2.7086317 2.5757641
# [19] 2.4330738 2.4073546 2.4242747 2.3149455 2.3192964 2.2544765
# [25] 3.1333738 3.0343781 3.0354998 2.9230927 2.8226541 2.7258979
# [31] 2.6775278 2.6651541 2.5995346 3.1333880 3.0487177 3.0392603
# [37] 3.0483917 4.0446074 4.0463367 4.0465158 3.9473870 3.8537925
# [43] 3.8461463 3.7848464 3.7505158 3.7048694 3.6953796 3.6605357
# [49] 3.6720684 3.6580296
Data:
n <- 50
set.seed(42)
rand_x <- rf(n, 3, 5)
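Both the for loop and vapply recompute each variance from scratch, which gets expensive for 10,000 values. As an alternative sketch (not part of the answer above), the running variance can be computed from cumulative sums in one pass. The names running_mean, running_var and reference are made up here, and this textbook formula can lose precision when the values are very large relative to their spread.

```r
set.seed(42)
rand_x <- rf(50, 3, 5)

i <- seq_along(rand_x)
running_mean <- cumsum(rand_x) / i

# Sample variance of the first i values: (sum of squares - i * mean^2) / (i - 1)
running_var <- (cumsum(rand_x^2) - i * running_mean^2) / (i - 1)
running_var[1] <- NA  # 0/0 gives NaN; use NA to match var() on a single value

# Compare against the direct computation
reference <- vapply(i, function(k) var(rand_x[1:k]), numeric(1))
all.equal(running_var, reference)
```

For small inputs like the 50 values used here the difference in speed is negligible, so the vapply version remains the easier one to read.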
I am looking to do multiple two sample t.tests in R.
I want to test 50 indicators that have two levels. So at first I used :
t.test(m~f)
Welch Two Sample t-test
data: m by f
t = 2.5733, df = 174.416, p-value = 0.01091
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
0.05787966 0.43891600
sample estimates:
mean in group FSS mean in group NON-FSS
0.8344209 0.5860231
Here m corresponds to the first indicator I want to test m =Debt.to.equity.ratio.
Here is a list of all the indicators I need to test :
print (indicators)
[1] "Debt.to.equity.ratio" "Deposits.to.loans"
[3] "Deposits.to.total.assets" "Gross.loan.portfolio.to.total.assets"
[5] "Number.of.active.borrowers" "Percent.of.women.borrowers"
[7] "Number.of.loans.outstanding" "Gross.loan.portfolio"
[9] "Average.loan.balance.per.borrower" "Average.loan.balance.per.borrower...GNI.per.capita"
[11] "Average.outstanding.balance" "Average.outstanding.balance...GNI.per.capita"
[13] "Number.of.depositors" "Number.of.deposit.accounts"
[15] "Deposits" "Average.deposit.balance.per.depositor"
[17] "Average.deposit.balance.per.depositor...GNI.per.capita" "Average.deposit.account.balance"
[19] "Average.deposit.account.balance...GNI.per.capita" "Return.on.assets"
[21] "Return.on.equity" "Operational.self.sufficiency"
[23] "FSS" "Financial.revenue..assets"
[25] "Profit.margin" "Yield.on.gross.portfolio..nominal."
[27] "Yield.on.gross.portfolio..real." "Total.expense..assets"
[29] "Financial.expense..assets" "Provision.for.loan.impairment..assets"
[31] "Operating.expense..assets" "Personnel.expense..assets"
[33] "Administrative.expense..assets" "Operating.expense..loan.portfolio"
[35] "Personnel.expense..loan.portfolio" "Average.salary..GNI.per.capita"
[37] "Cost.per.borrower" "Cost.per.loan"
[39] "Borrowers.per.staff.member" "Loans.per.staff.member"
[41] "Borrowers.per.loan.officer" "Loans.per.loan.officer"
[43] "Depositors.per.staff.member" "Deposit.accounts.per.staff.member"
[45] "Personnel.allocation.ratio" "Portfolio.at.risk...30.days"
[47] "Portfolio.at.risk...90.days" "Write.off.ratio"
[49] "Loan.loss.rate" "Risk.coverage"
Instead of changing the indicator name each time in the t.test, I would like to create a loop that will do it automatically and calculate the p.value. I've tried creating a loop but can't make it work due to the nature of the variables = characters.
I would really appreciate any tips on how to go forward!
Thank you very much !
Best
Morgan
I am assuming you are testing each indicator against the same f.
In that case, you can try something like:
p_vals <- NULL
for (this_indicator in indicators) {
  # build the formula "<indicator> ~ f" from the indicator's name
  this_formula <- paste(c(this_indicator, "f"), collapse = "~")
  res <- t.test(as.formula(this_formula))
  p_vals <- c(p_vals, res$p.value)
}
One comment, however: are you doing any multiplicity adjustment for these p-values? Given the large number of tests you are doing, there is a good chance you will be showered with false positives.
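Here is a self-contained sketch of the same idea that also applies a multiplicity adjustment. The data frame dat, its columns ind_a and ind_b, and toy_indicators are simulated stand-ins, not the real indicators, and Benjamini-Hochberg is just one of the methods p.adjust offers.

```r
set.seed(1)

# Simulated data: a two-level factor and two numeric indicators
dat <- data.frame(
  f = factor(rep(c("FSS", "NON-FSS"), each = 20)),
  ind_a = rnorm(40),
  ind_b = rnorm(40, mean = rep(c(0, 1), each = 20))
)
toy_indicators <- c("ind_a", "ind_b")

# One Welch t-test per indicator; the formula is built from the column name
p_vals <- vapply(toy_indicators, function(ind) {
  t.test(as.formula(paste(ind, "~ f")), data = dat)$p.value
}, numeric(1))

# Adjust for multiple testing (Benjamini-Hochberg false discovery rate)
p_adj <- p.adjust(p_vals, method = "BH")
p_adj
```

Passing data = dat keeps the variables inside the data frame, so nothing needs to be attached or copied into the global environment. Note that the real indicator list above contains "FSS" itself, which presumably should be dropped before looping if it is the grouping variable.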
I am trying to turn a vector of length n (say, 14) into a vector of length N (say, 90). For example, my vector is
x<-c(5,3,7,11,12,19,40,2,22,6,10,12,12,4)
and I want to turn it into a vector of length 90, by creating 90 equally "spaced" points on this vector- think of x as a function. Is there any way to do that in R?
Something like this?
> x<-c(5,3,7,11,12,19,40,2,22,6,10,12,12,4)
> seq(min(x),max(x),length=90)
[1] 2.000000 2.426966 2.853933 3.280899 3.707865 4.134831 4.561798
[8] 4.988764 5.415730 5.842697 6.269663 6.696629 7.123596 7.550562
[15] 7.977528 8.404494 8.831461 9.258427 9.685393 10.112360 10.539326
[22] 10.966292 11.393258 11.820225 12.247191 12.674157 13.101124 13.528090
[29] 13.955056 14.382022 14.808989 15.235955 15.662921 16.089888 16.516854
[36] 16.943820 17.370787 17.797753 18.224719 18.651685 19.078652 19.505618
[43] 19.932584 20.359551 20.786517 21.213483 21.640449 22.067416 22.494382
[50] 22.921348 23.348315 23.775281 24.202247 24.629213 25.056180 25.483146
[57] 25.910112 26.337079 26.764045 27.191011 27.617978 28.044944 28.471910
[64] 28.898876 29.325843 29.752809 30.179775 30.606742 31.033708 31.460674
[71] 31.887640 32.314607 32.741573 33.168539 33.595506 34.022472 34.449438
[78] 34.876404 35.303371 35.730337 36.157303 36.584270 37.011236 37.438202
[85] 37.865169 38.292135 38.719101 39.146067 39.573034 40.000000
Try this:
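# data
x <- c(5,3,7,11,12,19,40,2,22,6,10,12,12,4)
# expected new length
N <- 90
# number of points per segment between two neighbouring values
my.length.out <- round((N - length(x)) / (length(x) - 1)) + 1
# new data
# note: this gives 13 segments of 7 points = 91 values here, close to N but not
# exactly N, and each interior value of x appears twice (as the end of one
# segment and the start of the next)
x1 <- unlist(
  lapply(1:(length(x) - 1), function(i)
    seq(x[i], x[i + 1], length.out = my.length.out)))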
#plot
par(mfrow=c(2,1))
plot(x)
plot(x1)
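If exactly N points are needed, another option is linear interpolation with approx from base R's stats package. This is a sketch that assumes the 14 values sit at equally spaced positions 1 to 14; the name x90 is made up for the example.

```r
x <- c(5, 3, 7, 11, 12, 19, 40, 2, 22, 6, 10, 12, 12, 4)
N <- 90

# Treat x as a function of its index and evaluate it at N equally spaced
# positions between 1 and length(x), interpolating linearly in between
x90 <- approx(seq_along(x), x, n = N)$y

length(x90)
```

The first and last values of x are preserved exactly. Replacing approx with spline gives a smooth curve instead, although a spline can overshoot around sharp jumps such as the one from 40 to 2.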
Does anyone know of an R function to perform range standardization on a vector? I'm looking to transform variables to a scale between 0 and 1, while retaining rank order and the relative size of separation between values.
Just to be clear, I'm not looking to standardize variables by mean centering and scaling by the SD, as is done in the function scale().
I tried the functions mmnorm() and rangenorm() in the package 'dprep', but these don't seem to do the job.
s = sort(rexp(100))
range01 <- function(x){(x-min(x))/(max(x)-min(x))}
range01(s)
[1] 0.000000000 0.003338782 0.007572326 0.012192201 0.016055006 0.017161145
[7] 0.019949532 0.023839810 0.024421602 0.027197168 0.029889484 0.033039408
[13] 0.033783376 0.038051265 0.045183382 0.049560233 0.056941611 0.057552543
[19] 0.062674982 0.066001242 0.066420884 0.067689067 0.069247825 0.069432174
[25] 0.070136067 0.076340460 0.078709590 0.080393512 0.085591881 0.087540132
[31] 0.090517295 0.091026499 0.091251213 0.099218526 0.103236344 0.105724733
[37] 0.107495340 0.113332392 0.116103438 0.124050331 0.125596034 0.126599323
[43] 0.127154661 0.133392300 0.134258532 0.138253452 0.141933433 0.146748798
[49] 0.147490227 0.149960293 0.153126478 0.154275371 0.167701855 0.170160948
[55] 0.180313542 0.181834891 0.182554291 0.189188137 0.193807559 0.195903010
[61] 0.208902645 0.211308713 0.232942314 0.236135220 0.251950116 0.260816843
[67] 0.284090255 0.284150541 0.288498370 0.295515143 0.299408623 0.301264703
[73] 0.306817872 0.307853369 0.324882091 0.353241217 0.366800517 0.389474449
[79] 0.398838576 0.404266315 0.408936260 0.409198619 0.415165553 0.433960390
[85] 0.440690262 0.458692639 0.464027428 0.474214070 0.517224262 0.538532221
[91] 0.544911543 0.559945121 0.585390414 0.647030109 0.694095422 0.708385079
[97] 0.736486707 0.787250428 0.870874773 1.000000000
Adding ... will allow you to pass through na.rm = TRUE if you want to omit missing values from the calculation (they will still be present in the results):
range01 <- function(x, ...){(x - min(x, ...)) / (max(x, ...) - min(x, ...))}
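A short sketch of how the ... version behaves with a missing value; the vector with_na is made up for the example.

```r
range01 <- function(x, ...) {
  (x - min(x, ...)) / (max(x, ...) - min(x, ...))
}

with_na <- c(2, NA, 4, 6)

# Without na.rm, min() and max() are NA, so every result is NA
unscaled <- range01(with_na)

# With na.rm = TRUE the NA is ignored for the range but kept in the output
scaled <- range01(with_na, na.rm = TRUE)
scaled
# [1] 0.0  NA 0.5 1.0
```

One caveat: if all non-missing values are equal, max - min is zero and the function returns NaN, so constant vectors may need special handling.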