We conducted an experiment at university, which we tried out ourselves before giving it to real test persons. The problem now is that our testing data is included in the full CSV file, so I need to delete the first 23 "test persons".
They all got a unique code, and I could count how many of those unique codes exist (as you can see below, there are 38). Now I only need the last 15 of them. I tried it with subset, but I don't really know how to filter for those specific last 15 subject IDs (VPcount):
unique(d$VPcount)
uniqueN(d$VPcount)
[1] 7.941675e-312 7.941683e-312 7.941686e-312 7.941687e-312 7.941695e-312 7.941697e-312 7.941734e-312
[8] 7.942134e-312 7.942142e-312 7.942146e-312 7.942176e-312 7.942191e-312 7.942194e-312 7.942199e-312
[15] 7.942268e-312 7.942301e-312 7.942580e-312 7.943045e-312 7.944383e-312 7.944386e-312 7.944388e-312
[22] 7.944388e-312 7.944429e-312 7.944471e-312 7.944477e-312 7.944478e-312 7.944494e-312 7.944500e-312
[29] 7.944501e-312 7.944501e-312 7.944503e-312 7.944503e-312 7.944506e-312 7.944506e-312 7.944506e-312
[36] 7.944506e-312 7.944508e-312 7.944511e-312
[1] 38
You can try:
data <- subset(d, VPcount %in% tail(unique(VPcount), 15))
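Since uniqueN() comes from data.table, an equivalent in data.table syntax might be (a sketch, assuming d is already a data.table):
# keep only rows whose VPcount is among the last 15 unique codes
d <- d[VPcount %in% tail(unique(VPcount), 15)]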
I have RNA-seq data from a time-course experiment (6 time points) involving tens of thousands of genes.
I have used filter() from the tidyverse to find genes that fit certain criteria (reference genes for qPCR), but I don't know how to turn this data into a figure easily. Right now, I'd have to change the format of the dataset completely, which would take so much time as to be impractical.
The goal is just to have a graph for each gene showing the change in expression over time for each condition (different leaf pairs and droughted/well-watered). I have done this for some genes in Excel but would like a quicker way to do it.
The dataset is set out like this:
[1] "gene.id" "LP1.2.02:00.WW" "LP1.2.02:00.WW_1" "LP1.2.02:00.WW_2"
[5] "LP1.2.06:00.WW" "LP1.2.06:00.WW_1" "LP1.2.06:00.WW_2" "LP1.2.10:00.WW"
[9] "LP1.2.10:00.WW_1" "LP1.2.10:00.WW_2" "LP1.2.14:00.WW" "LP1.2.14:00.WW_1"
[13] "LP1.2.14:00.WW_2" "LP1.2.18:00.WW" "LP1.2.18:00.WW_1" "LP1.2.18:00.WW_2"
[17] "LP1.2.22:00.WW" "LP1.2.22:00.WW_1" "LP1.2.22:00.WW_2" "LP3.4.5.02:00.WW"
[21] "LP3.4.5.02:00.WW_1" "LP3.4.5.02:00.WW_2" "LP3.4.5.06:00.WW" "LP3.4.5.06:00.WW_1"
[25] "LP3.4.5.06:00.WW_2" "LP3.4.5.10:00.WW" "LP3.4.5.10:00.WW_1" "LP3.4.5.10:00.WW_2"
[29] "LP3.4.5.14:00.WW" "LP3.4.5.14:00.WW_1" "LP3.4.5.14:00.WW_2" "LP3.4.5.18:00.WW"
[33] "LP3.4.5.18:00.WW_1" "LP3.4.5.18:00.WW_2" "LP3.4.5.22:00.WW" "LP3.4.5.22:00.WW_1"
[37] "LP3.4.5.22:00.WW_2" "LP1.2.02:00.Drought" "LP1.2.02:00.Drought_1" "LP1.2.02:00.Drought_2"
[41] "LP1.2.06:00.Drought" "LP1.2.06:00.Drought_1" "LP1.2.06:00.Drought_2" "LP1.2.10:00.Drought"
[45] "LP1.2.10:00.Drought_1" "LP1.2.10:00.Drought_2" "LP1.2.14:00.Drought" "LP1.2.14:00.Drought_1"
[49] "LP1.2.14:00.Drought_2" "LP1.2.18:00.Drought" "LP1.2.18:00.Drought_1" "LP1.2.18:00.Drought_2"
[53] "LP1.2.22:00.Drought" "LP1.2.22:00.Drought_1" "LP1.2.22:00.Drought_2" "LP3.4.5.02:00.Drought"
[57] "LP3.4.5.02:00.Drought_1" "LP3.4.5.02:00.Drought_2" "LP3.4.5.06:00.Drought" "LP3.4.5.06:00.Drought_1"
[61] "LP3.4.5.06:00.Drought_2" "LP3.4.5.10:00.Drought" "LP3.4.5.10:00.Drought_1" "LP3.4.5.10:00.Drought_2"
[65] "LP3.4.5.14:00.Drought" "LP3.4.5.14:00.Drought_1" "LP3.4.5.14:00.Drought_2" "LP3.4.5.18:00.Drought"
[69] "LP3.4.5.18:00.Drought_1" "LP3.4.5.18:00.Drought_2" "LP3.4.5.22:00.Drought." "LP3.4.5.22:00.Drought"
[73] "LP3.4.5.22:00.Drought_1" "X74" "LP1.2.02:00.WW.mean" "LP1.2.06:00.WW.mean"
[77] "LP1.2.10:00.WW.mean" "LP1.2.14:00.WW.mean" "LP1.2.18:00.WW.mean" "LP1.2.22:00.WW.mean"
[81] "LP1.2.02:00.drought.mean" "LP1.2.06:00.drought.mean" "LP1.2.10:00.drought.mean" "LP1.2.14:00.drought.mean"
[85] "LP1.2.18:00.drought.mean" "LP1.2.22:00.drought.mean" "LP3.4.5.02:00.WW.mean" "LP3.4.5.06:00.WW.mean"
[89] "LP3.4.5.10:00.WW.mean" "LP3.4.5.14:00.WW.mean" "LP3.4.5.18:00.WW.mean" "LP3.4.5.22:00.WW.mean"
[93] "LP3.4.5.02:00.drought.mean" "LP3.4.5.06:00.drought.mean" "LP3.4.5.10:00.drought.mean" "LP3.4.5.14:00.drought.mean"
[97] "LP3.4.5.18:00.drought.mean" "LP3.4.5.22:00.drought.mean"
It's a lot of headings, and as you can see from the titles, they encode the time, leaf pair and condition. So I'm not sure how to translate this into an x~y graph.
I've had several thoughts, including dividing the conditions into different subsets (LP1.2.WW / LP1.2.D / LP3.4.5.WW / LP3.4.5.D), making a subset for time (02:00, 06:00, etc.), and trying to make a graph from that.
#make subset for the time points
Time <- c("02:00", "06:00", "10:00", "14:00", "18:00", "22:00")
#make subsets for each condition (LP1.2. WW/ LP.1.2.D/LP3.4.5.WW/LP.3.4.5.D)
LP1.2.WW.mean <- as.matrix(KG_graph_data[c( "LP1.2.02:00.WW.mean",
"LP1.2.06:00.WW.mean",
"LP1.2.10:00.WW.mean",
"LP1.2.14:00.WW.mean",
"LP1.2.18:00.WW.mean",
"LP1.2.22:00.WW.mean",
"gene.id")])
LP.1.2.D.mean <-
as.matrix(KG_graph_data[c("LP1.2.02:00.drought.mean",
"LP1.2.06:00.drought.mean",
"LP1.2.10:00.drought.mean",
"LP1.2.14:00.drought.mean",
"LP1.2.18:00.drought.mean",
"LP1.2.22:00.drought.mean",
"gene.id")])
LP345.WW.mean <- as.matrix(KG_graph_data[c("LP3.4.5.02:00.WW.mean",
"LP3.4.5.06:00.WW.mean",
"LP3.4.5.10:00.WW.mean",
"LP3.4.5.14:00.WW.mean",
"LP3.4.5.18:00.WW.mean",
"LP3.4.5.22:00.WW.mean",
"gene.id")])
LP345.D.mean <-
as.matrix(KG_graph_data[c("LP3.4.5.02:00.drought.mean",
"LP3.4.5.06:00.drought.mean",
"LP3.4.5.10:00.drought.mean",
"LP3.4.5.14:00.drought.mean",
"LP3.4.5.18:00.drought.mean",
"LP3.4.5.22:00.drought.mean",
"gene.id")])
I tried extracting a particular gene from each matrix to then plot a graph from, but it only worked from one matrix at a time, and even then the resulting table contained no data.
Total_KgGene007565 <- subset(LP1.2.WW.mean, "gene.id"=="KgGene007565",
LP.1.2.D.mean, "gene.id"=="KgGene007565",
LP345.WW.mean, "gene.id"=="KgGene007565",
LP345.D.mean, "gene.id"="KgGene007565")
I am not sure how to proceed from here or if this was the wrong way to approach this.
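One possible way forward, sketched under a few assumptions: as.matrix() on a mix of the character gene.id column and numeric columns coerces everything to character, which is likely why the extracted tables looked empty. Instead of building four matrices, you can reshape the .mean columns into long format and let ggplot2 draw one panel per gene. This assumes the colons survived in KG_graph_data's column names (e.g. the file was read with check.names = FALSE):
library(dplyr)
library(tidyr)
library(ggplot2)

# One row per gene x leaf pair x time x condition,
# parsed out of names like "LP1.2.02:00.WW.mean".
long <- KG_graph_data %>%
  select(gene.id, ends_with(".mean")) %>%
  pivot_longer(
    cols = -gene.id,
    names_pattern = "(LP[0-9.]+?)\\.(\\d{2}:\\d{2})\\.(WW|[Dd]rought)\\.mean",
    names_to = c("leaf_pair", "time", "condition"),
    values_to = "expression"
  )

# One panel per gene; lines distinguish leaf pair and condition.
long %>%
  filter(gene.id == "KgGene007565") %>%
  ggplot(aes(x = time, y = expression,
             colour = condition, linetype = leaf_pair,
             group = interaction(leaf_pair, condition))) +
  geom_line() +
  geom_point() +
  facet_wrap(~gene.id, scales = "free_y")
Swapping the filter() for gene.id %in% c(...) with a vector of gene IDs gives a grid of panels, one per gene, with no per-gene Excel work.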
I am looking to do multiple two sample t.tests in R.
I want to test 50 indicators across two groups. So at first I used:
t.test(m~f)
Welch Two Sample t-test
data: m by f
t = 2.5733, df = 174.416, p-value = 0.01091
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
0.05787966 0.43891600
sample estimates:
mean in group FSS mean in group NON-FSS
0.8344209 0.5860231
Here m corresponds to the first indicator I want to test, m = Debt.to.equity.ratio.
Here is a list of all the indicators I need to test :
print (indicators)
[1] "Debt.to.equity.ratio" "Deposits.to.loans"
[3] "Deposits.to.total.assets" "Gross.loan.portfolio.to.total.assets"
[5] "Number.of.active.borrowers" "Percent.of.women.borrowers"
[7] "Number.of.loans.outstanding" "Gross.loan.portfolio"
[9] "Average.loan.balance.per.borrower" "Average.loan.balance.per.borrower...GNI.per.capita"
[11] "Average.outstanding.balance" "Average.outstanding.balance...GNI.per.capita"
[13] "Number.of.depositors" "Number.of.deposit.accounts"
[15] "Deposits" "Average.deposit.balance.per.depositor"
[17] "Average.deposit.balance.per.depositor...GNI.per.capita" "Average.deposit.account.balance"
[19] "Average.deposit.account.balance...GNI.per.capita" "Return.on.assets"
[21] "Return.on.equity" "Operational.self.sufficiency"
[23] "FSS" "Financial.revenue..assets"
[25] "Profit.margin" "Yield.on.gross.portfolio..nominal."
[27] "Yield.on.gross.portfolio..real." "Total.expense..assets"
[29] "Financial.expense..assets" "Provision.for.loan.impairment..assets"
[31] "Operating.expense..assets" "Personnel.expense..assets"
[33] "Administrative.expense..assets" "Operating.expense..loan.portfolio"
[35] "Personnel.expense..loan.portfolio" "Average.salary..GNI.per.capita"
[37] "Cost.per.borrower" "Cost.per.loan"
[39] "Borrowers.per.staff.member" "Loans.per.staff.member"
[41] "Borrowers.per.loan.officer" "Loans.per.loan.officer"
[43] "Depositors.per.staff.member" "Deposit.accounts.per.staff.member"
[45] "Personnel.allocation.ratio" "Portfolio.at.risk...30.days"
[47] "Portfolio.at.risk...90.days" "Write.off.ratio"
[49] "Loan.loss.rate" "Risk.coverage"
Instead of changing the indicator name each time in the t.test, I would like to create a loop that does it automatically and calculates the p-values. I've tried creating a loop but can't make it work because the indicator names are character strings.
I would really appreciate any tips on how to go forward!
Thank you very much!
Best
Morgan
I am assuming you are running each indicator's t-test against the same grouping variable f.
In that case, you can try something like:
p_vals <- numeric(0)
for (this_indicator in indicators) {
  # build the formula "indicator ~ f" from the character name
  this_formula <- as.formula(paste(this_indicator, "f", sep = " ~ "))
  res <- t.test(this_formula)
  p_vals <- c(p_vals, res$p.value)
}
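If the indicators live as columns of a data frame rather than as free-standing vectors, you can pass it via the data argument; the same idea with sapply (a sketch, the data frame name df is assumed):
# Sketch: indicators as columns of a data frame `df` that also holds f.
p_vals <- sapply(indicators, function(ind) {
  t.test(as.formula(paste(ind, "~ f")), data = df)$p.value
})
This also returns a named vector, so each p-value stays paired with its indicator.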
One comment, however: are you doing any multiplicity adjustment on these p-values? Given the large number of tests you are doing, there is a good chance you will be showered with false positives.
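For example, a Benjamini-Hochberg adjustment is one common choice (a sketch; pick whatever method suits your field):
names(p_vals) <- indicators
p_adj <- p.adjust(p_vals, method = "BH")  # controls the false discovery rate
p_adj[p_adj < 0.05]                       # indicators still significant after adjustment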
I tried to find the two values in the following vector that are closest to 10. The expected values are 10.12099196 and 10.63054170. Your input would be appreciated.
[1] 0.98799517 1.09055728 1.20383713 1.32927166 1.46857509 1.62380423 1.79743107 1.99241551 2.21226576 2.46106916 2.74346924 3.06455219 3.42958354 3.84350238 4.31005838
[16] 4.83051356 5.40199462 6.01590035 6.65715769 7.30532785 7.93823621 8.53773241 9.09570538 9.61755743 10.12099196 10.63018180 11.16783243 11.74870531 12.37719092 13.04922392
[31] 13.75661322 14.49087793 15.24414627 16.00601247 16.75709565 17.46236358 18.06882072 18.51050094 18.71908344 18.63563523 18.22123225 17.46709279 16.40246292 15.09417699 13.63404124
[46] 12.11854915 10.63054170 9.22947285 7.95056000 6.80923943 5.80717982 4.93764782 4.18947450 3.54966795 3.00499094 2.54283599 2.15165780 1.82114213 1.54222565 1.30703661
[61] 1.10879707 0.94170986 0.80084308 0.68201911 0.58171175 0.49695298 0.42525021 0.36451350 0.31299262 0.26922281 0.23197860 0.20023468 0.17313291 0.14995459 0.13009730
[76] 0.11305559 0.09840485 0.08578789 0.07490387 0.06549894 0.05735864
Another alternative could be allowing the user to control the "tolerance" in order to define what "closeness" means; this can be done with a simple function:
close <- function(x, value, tol = NULL) {
  if (!is.null(tol)) {
    # all elements within tol of value
    x[abs(x - value) <= tol]
  } else {
    # all elements, ordered from closest to farthest from value
    x[order(abs(x - value))]
  }
}
Here x is a vector of values, value is the reference value for closeness, and tol is a numeric tolerance: if tol is NULL, the function returns all the values ordered by "closeness" to value; otherwise it returns just the values within tol of value.
> close(x, value=10, tol=.7)
[1] 9.617557 10.120992 10.630182 10.630542
> close(x, value=10)
[1] 10.12099196 9.61755743 10.63018180 10.63054170 9.22947285 9.09570538 11.16783243
[8] 8.53773241 11.74870531 7.95056000 7.93823621 12.11854915 12.37719092 7.30532785
[15] 13.04922392 6.80923943 6.65715769 13.63404124 13.75661322 6.01590035 5.80717982
[22] 14.49087793 5.40199462 4.93764782 15.09417699 4.83051356 15.24414627 4.31005838
[29] 4.18947450 16.00601247 3.84350238 16.40246292 3.54966795 3.42958354 16.75709565
[36] 3.06455219 3.00499094 2.74346924 2.54283599 17.46236358 17.46709279 2.46106916
[43] 2.21226576 2.15165780 1.99241551 18.06882072 1.82114213 1.79743107 18.22123225
[50] 1.62380423 1.54222565 18.51050094 1.46857509 18.63563523 1.32927166 1.30703661
[57] 18.71908344 1.20383713 1.10879707 1.09055728 0.98799517 0.94170986 0.80084308
[64] 0.68201911 0.58171175 0.49695298 0.42525021 0.36451350 0.31299262 0.26922281
[71] 0.23197860 0.20023468 0.17313291 0.14995459 0.13009730 0.11305559 0.09840485
[78] 0.08578789 0.07490387 0.06549894 0.05735864
In the first example I defined "closeness" as a difference of at most 0.7 between value and each element of x. In the second example close returns a vector whose first values are the closest to value and whose last values are the farthest from it.
Since my solution does not provide an easy (practical) way to choose tol, as @Arun pointed out, one way to find the closest values is to leave tol=NULL and ask for the exact number of close values, as in:
> close(x, value=10)[1:3]
[1] 10.120992 9.617557 10.630182
This shows the three values in x closest to 10.
I can't think of a way without using sort. However, you can speed it up by using partial sort.
x[abs(x-10) %in% sort(abs(x-10), partial=1:2)[1:2]]
# [1] 9.617557 10.120992
In case the same values are present more than once, you'll get all of them here. So you can either wrap this with unique, or use match instead, as follows:
x[match(sort(abs(x-10), partial=1:2)[1:2], abs(x-10))]
# [1] 10.120992 9.617557
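A small wrapper generalizing the idea to any target value and any number of neighbours (a sketch, the function name is hypothetical):
closest_n <- function(x, value, n = 2) {
  d <- abs(x - value)
  # the partial sort only guarantees the first n positions, which is all we need
  x[match(sort(d, partial = seq_len(n))[seq_len(n)], d)]
}
closest_n(x, 10, 2)
# [1] 10.120992  9.617557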
dput output:
dput(x)
c(0.98799517, 1.09055728, 1.20383713, 1.32927166, 1.46857509,
1.62380423, 1.79743107, 1.99241551, 2.21226576, 2.46106916, 2.74346924,
3.06455219, 3.42958354, 3.84350238, 4.31005838, 4.83051356, 5.40199462,
6.01590035, 6.65715769, 7.30532785, 7.93823621, 8.53773241, 9.09570538,
9.61755743, 10.12099196, 10.6301818, 11.16783243, 11.74870531,
12.37719092, 13.04922392, 13.75661322, 14.49087793, 15.24414627,
16.00601247, 16.75709565, 17.46236358, 18.06882072, 18.51050094,
18.71908344, 18.63563523, 18.22123225, 17.46709279, 16.40246292,
15.09417699, 13.63404124, 12.11854915, 10.6305417, 9.22947285,
7.95056, 6.80923943, 5.80717982, 4.93764782, 4.1894745, 3.54966795,
3.00499094, 2.54283599, 2.1516578, 1.82114213, 1.54222565, 1.30703661,
1.10879707, 0.94170986, 0.80084308, 0.68201911, 0.58171175, 0.49695298,
0.42525021, 0.3645135, 0.31299262, 0.26922281, 0.2319786, 0.20023468,
0.17313291, 0.14995459, 0.1300973, 0.11305559, 0.09840485, 0.08578789,
0.07490387, 0.06549894, 0.05735864)
I'm not sure your question is clear, so here's another approach. To find the element closest to your first desired value, 10.12099196, subtract that value from the vector, take the absolute value, and then find the index of the smallest element. Explicitly:
delx <- abs(10.12099196 - x)
min.index <- which.min(delx)  # index of the first minimum if there are duplicates
x[min.index]                  # the value itself
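Applying the same idea to both of your target values at once:
targets <- c(10.12099196, 10.63054170)
sapply(targets, function(v) x[which.min(abs(x - v))])
# [1] 10.12099 10.63054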
Apologies if this was not the intent of your question.
Does anyone know of an R function to perform range standardization on a vector? I'm looking to transform variables to a scale between 0 and 1, while retaining rank order and the relative size of separation between values.
Just to be clear, I'm not looking to standardize variables by mean-centering and scaling by the SD, as the scale() function does.
I tried the functions mmnorm() and rangenorm() in the 'dprep' package, but they don't seem to do the job.
s = sort(rexp(100))
range01 <- function(x){(x-min(x))/(max(x)-min(x))}
range01(s)
[1] 0.000000000 0.003338782 0.007572326 0.012192201 0.016055006 0.017161145
[7] 0.019949532 0.023839810 0.024421602 0.027197168 0.029889484 0.033039408
[13] 0.033783376 0.038051265 0.045183382 0.049560233 0.056941611 0.057552543
[19] 0.062674982 0.066001242 0.066420884 0.067689067 0.069247825 0.069432174
[25] 0.070136067 0.076340460 0.078709590 0.080393512 0.085591881 0.087540132
[31] 0.090517295 0.091026499 0.091251213 0.099218526 0.103236344 0.105724733
[37] 0.107495340 0.113332392 0.116103438 0.124050331 0.125596034 0.126599323
[43] 0.127154661 0.133392300 0.134258532 0.138253452 0.141933433 0.146748798
[49] 0.147490227 0.149960293 0.153126478 0.154275371 0.167701855 0.170160948
[55] 0.180313542 0.181834891 0.182554291 0.189188137 0.193807559 0.195903010
[61] 0.208902645 0.211308713 0.232942314 0.236135220 0.251950116 0.260816843
[67] 0.284090255 0.284150541 0.288498370 0.295515143 0.299408623 0.301264703
[73] 0.306817872 0.307853369 0.324882091 0.353241217 0.366800517 0.389474449
[79] 0.398838576 0.404266315 0.408936260 0.409198619 0.415165553 0.433960390
[85] 0.440690262 0.458692639 0.464027428 0.474214070 0.517224262 0.538532221
[91] 0.544911543 0.559945121 0.585390414 0.647030109 0.694095422 0.708385079
[97] 0.736486707 0.787250428 0.870874773 1.000000000
Adding ... will allow you to pass through na.rm = TRUE if you want to omit missing values from the min/max calculation (they will still be present in the results):
range01 <- function(x, ...){(x - min(x, ...)) / (max(x, ...) - min(x, ...))}
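For example:
s_na <- c(s, NA)
range01(s_na, na.rm = TRUE)  # NA is ignored by min/max but kept as NA in the output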