Subsetting for matrices - r

I'm playing around with some data; my matrix, results, looks like this:
results
[,1] [,2] [,3]
[1,] 1.7 0.0015 -1566.276
[2,] 1.7 0.0016 -1564.695
[3,] 1.7 0.0017 -1564.445
[4,] 1.7 0.0018 -1565.373
[5,] 1.7 0.0019 -1567.352
[6,] 1.7 0.0020 -1570.274
[7,] 1.8 0.0015 -1568.299
[8,] 1.8 0.0016 -1565.428
[9,] 1.8 0.0017 -1563.965
[10,] 1.8 0.0018 -1563.750
[11,] 1.8 0.0019 -1564.647
[12,] 1.8 0.0020 -1566.544
[13,] 1.9 0.0015 -1571.798
[14,] 1.9 0.0016 -1567.635
[15,] 1.9 0.0017 -1564.960
[16,] 1.9 0.0018 -1563.602
[17,] 1.9 0.0019 -1563.418
[18,] 1.9 0.0020 -1564.289
[19,] 2.0 0.0015 -1576.673
[20,] 2.0 0.0016 -1571.220
[21,] 2.0 0.0017 -1567.332
[22,] 2.0 0.0018 -1564.831
[23,] 2.0 0.0019 -1563.566
[24,] 2.0 0.0020 -1563.410
I wanted to print the row where the maximum of the 3rd column occurs, so I broke it down into the following two lines:
max(results[,3])
results[results[,3] == -1563.41,]
The result didn't give me the desired row.
I tried putting the value in quotation marks, like so:
max(results[,3])
results[results[,3] == "-1563.41",]
but this didn't work either.
The only code that worked was nesting the max() call inside the subsetting expression, like so:
results[results[,3] == max(results[,3]),]
Could someone please explain why breaking it down into steps didn't work?
I tried turning it into a data frame, which didn't work either, and using filter() from the tidyverse didn't work either.
Thanks.

R prints numbers to a maximum of getOption("digits") significant digits. I suspect that -1563.410 is the maximum rounded to seven significant digits, not the exact maximum. If no element of results[, 3] is exactly equal to the rounded number -1563.410, then
results[results[, 3] == -1563.41, ]
returns a 0-by-3 matrix. Meanwhile, max(results[, 3]) is the exact maximum, so
results[results[, 3] == max(results[, 3]), ]
returns the matrix subset that you want.
Note that if there are n rows whose third element is equal to max(results[, 3]), then the subset is a length-3 vector if n = 1 and an n-by-3 matrix if n > 1 (see ?Extract). If, in the n > 1 case, you only want the first of the n rows, then you should do:
results[which.max(results[, 3]), ]
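The effect is easy to reproduce with made-up numbers (a sketch; the values here are invented, not taken from the matrix above):

```r
# The console rounds to getOption("digits") significant figures (default 7),
# but the stored value keeps full double precision.
x <- c(-1563.409876, -1570.2, -1565.5)
max(x)                       # prints -1563.41, a rounded display
print(max(x), digits = 10)   # -1563.409876, the value actually stored
x == -1563.41                # FALSE FALSE FALSE: nothing equals the rounded value
x[which.max(x)]              # recovers the true maximum without retyping it
```

Comparing against the rounded display instead of the stored value is why the two-step version matched no rows.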

Related

Optimum cut-off values

I have both univariate and multivariate logistic regression models, and I want to find cut-off values with their respective sensitivity and specificity. I want to choose the best cut-off values for both my univariate and multivariate models.
I tried the following code for the univariate models, but I am getting the sensitivity and specificity values as decimals. Is there another way I can get the cut-off values as whole numbers, rather than rounding to the nearest integer? I am also not sure how to use the same code to get the cut-off values for the multivariate model.
Thank you for any help in advance!!
###Cut off values of Var1
library(pROC)
ok <- multiclass.roc(DATA$Outcome, DATA$Var1)
class_1 <- ok$rocs[[1]]
wants <- cbind(sensitivity = class_1$sensitivities, specificity = class_1$specificities, cutt_off = class_1$thresholds)
wants
I am getting the values:
sensitivity specificity cutt_off
[1,] 1.00000 0.000000 Inf
[2,] 1.00000 0.012346 73.500
[3,] 1.00000 0.024691 72.500
[4,] 1.00000 0.049383 71.500
[5,] 1.00000 0.061728 70.500
[6,] 1.00000 0.135802 69.500
[7,] 1.00000 0.172840 68.500
[8,] 0.94118 0.222222 67.500
[9,] 0.88235 0.283951 66.500
[10,] 0.88235 0.320988 65.750
[11,] 0.88235 0.333333 65.250
[12,] 0.88235 0.432099 64.500
[13,] 0.88235 0.506173 63.500
[14,] 0.82353 0.617284 62.500
[15,] 0.82353 0.629630 61.750
[16,] 0.76471 0.629630 61.250
[17,] 0.76471 0.691358 60.500
[18,] 0.70588 0.753086 59.750
[19,] 0.70588 0.777778 59.250
[20,] 0.70588 0.814815 58.500
[21,] 0.64706 0.827160 57.500
[22,] 0.64706 0.876543 56.500
[23,] 0.64706 0.901235 55.250
[24,] 0.58824 0.913580 54.250
[25,] 0.58824 0.938272 53.900
[26,] 0.52941 0.938272 53.400
[27,] 0.41176 0.938272 52.500
[28,] 0.35294 0.950617 51.835
[29,] 0.29412 0.950617 50.835
[30,] 0.29412 0.962963 49.000
[31,] 0.23529 0.975309 47.500
[32,] 0.17647 0.975309 46.000
[33,] 0.11765 0.987654 44.500
[34,] 0.00000 0.987654 42.500
[35,] 0.00000 1.000000 -Inf
To determine the cut-off values for the multivariate model, I tried the following code, but I am getting errors. Also, my model comprises both continuous and categorical variables: Var1, Var2, and Var3 are continuous, and Var4 is categorical, recoded to 0, 1, and 2.
library(pROC)
ok <- multiclass.roc(DATA$Outcome, DATA$var1 + DATA$Var2 + DATA$Var3 + DATA$Var4)
class_1 <- ok$rocs[[1]]
wants <- cbind(sensitivity = class_1$sensitivities, specificity = class_1$specificities, cutt_off = class_1$thresholds)
wants
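One common approach for the multivariate case (a sketch, with simulated data standing in for the question's DATA and Var1-Var4) is to fit the logistic model and pass its fitted probabilities to pROC as a single combined predictor, rather than summing the raw variables:

```r
library(pROC)

set.seed(42)
# Simulated stand-in for the question's data frame
DATA <- data.frame(Outcome = rbinom(100, 1, 0.4),
                   Var1 = rnorm(100), Var2 = rnorm(100), Var3 = rnorm(100),
                   Var4 = factor(sample(0:2, 100, replace = TRUE)))

# Multivariable logistic model; the fitted probability is the combined score
fit  <- glm(Outcome ~ Var1 + Var2 + Var3 + Var4, data = DATA, family = binomial)
roc1 <- roc(DATA$Outcome, fitted(fit))

# All candidate cut-offs with their sensitivity and specificity
coords(roc1, "all", ret = c("threshold", "sensitivity", "specificity"))
```

The cut-offs here are on the probability scale rather than the scale of any single variable, which is the usual trade-off when combining predictors.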

Unexpected output containing plus, minus, and letters produced by subtracting one column of numbers from another in R

I have a data.frame containing a vector of numeric values (prcp_log).
waterdate PRCP prcp_log
<date> <dbl> <dbl>
1 2007-10-01 0 0
2 2007-10-02 0.02 0.0198
3 2007-10-03 0.31 0.270
4 2007-10-04 1.8 1.03
5 2007-10-05 0.03 0.0296
6 2007-10-06 0.19 0.174
I then pass this data through the Christiano-Fitzgerald band-pass filter, using the following command from the mFilter package.
library(mFilter)
US1ORLA0076_cffilter <- cffilter(US1ORLA0076$prcp_log, pl = 180, pu = 365,
                                 root = FALSE, drift = FALSE,
                                 type = "asymmetric", nfix = NULL, theta = 1)
This creates an S3 object containing, among other things, a vector of "trend" values and a vector of "cycle" values, like so:
head(US1ORLA0076_cffilter$trend)
[,1]
[1,] 0.05439408
[2,] 0.07275321
[3,] 0.32150292
[4,] 1.07958965
[5,] 0.07799329
[6,] 0.22082246
head(US1ORLA0076_cffilter$cycle)
[,1]
[1,] -0.05439408
[2,] -0.05295058
[3,] -0.05147578
[4,] -0.04997023
[5,] -0.04843449
[6,] -0.04686915
Plotted:
plot(US1ORLA0076_cffilter)
I then apply the following mathematical operation in an attempt to remove the trend and seasonal components from the original numeric vector:
US1ORLA0076$decomp <- ((US1ORLA0076$prcp_log - US1ORLA0076_cffilter$trend) - US1ORLA0076_cffilter$cycle)
This creates output that includes unexpected elements such as dashes and letters.
head(US1ORLA0076$decomp)
[,1]
[1,] 0.000000e+00
[2,] 0.000000e+00
[3,] 1.387779e-17
[4,] -2.775558e-17
[5,] 0.000000e+00
[6,] 6.938894e-18
What has happened here? What do these additional characters signify? How can I perform this mathematical operation and obtain simply $prcp_log minus both the $trend and $cycle values?
I am happy to provide any additional info that will help; just ask.
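The pattern above can be reproduced with a one-line example: the letters and dashes are R's scientific notation for numbers at machine precision, not corrupted data.

```r
# Floating-point subtraction of nearly equal values leaves a tiny residual,
# which R prints in scientific notation (the "e", "+", and "-" are notation,
# not extra data): 1.387779e-17 means 1.387779 * 10^-17, essentially zero.
y <- 0.1 + 0.2 - 0.3
y                                  # a residual on the order of 1e-17
format(y, scientific = FALSE)      # the same number written out in full
round(y, 10)                       # rounding shows it is zero for practical purposes
```

So the subtraction in the question worked as intended; the decomposed series really is (near) zero wherever trend + cycle reconstructs the original value exactly.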

Sector wise mean wind speed and directions in R

I am trying to get the mean wind speed of a data set based on its mean direction within a sector. It is fairly simple, and the program below does the trick. I am, however, unable to automate it, meaning I have to manually input the values of fsector and esector every time. Also, the output is not in the form I would like. Please suggest a better way, or help me improve this one.
##Dummy Wind Speed and Directional Data.
ws<-c(seq(1,25,by=0.5))
wd<-c(seq(0,360,by=7.346939))
fsector<-22.5 ##Starting point
esector<-45 ##End point
wind <- as.data.frame(cbind(ws,wd))
wind$test<- ifelse(wind$wd > fsector & wind$wd < esector,'mean','greater')
mean<-rbind(aggregate(wind$wd,by=list(wind$test),mean))
meanws<-rbind(aggregate(wind$ws,by=list(wind$test),mean))
mean<-cbind(meanws[2,2],mean[2,2])
mean
It would be great if I could choose the number of sectors and automatically generate the list of mean wind speeds and mean directions. Thanks.
Actually, I'm working with the same data.
First I make a wind rose like this:
And then, depending on the direction, I subset the data:
max(Windspeed[direc >= 11.25 & direc <= 33.75])
min(Windspeed[direc >= 11.25 & direc <= 33.75])
mean(Windspeed[direc >= 11.25 & direc <= 33.75])
I put the direction in degrees.
If that's not what you're looking for, let me know and I'll be happy to help.
Okay, building on the idea by @monse-aleman and a similar question of hers, here, I was able to automate the program to give the required answer. The helper function is:
in_interval <- function(x, interval) {
  stopifnot(length(interval) == 2L)
  interval[1] < x & x < interval[2]
}
Applying the above function to the data set:
##Consider a dummy Wind Speed and Direction Data.
ws<-c(seq(1,25,by=0.5))
wd<-c(seq(0,360,by=7.346939))
## Determine the sector starting and end points.
a<-rbind(0.0 ,22.5 ,45.0 ,67.5 ,90.0 ,112.5 ,135.0 ,157.5 ,180.0 ,202.5 ,225.0 ,247.5 ,270.0 ,292.5 ,315.0,337.5)
b<-rbind(22.5 ,45.0 ,67.5 ,90.0 ,112.5 ,135.0 ,157.5 ,180.0 ,202.5 ,225.0 ,247.5 ,270.0 ,292.5 ,315.0,337.5,360)
sectors<-cbind(a,b)
sectors
## See the table of the sector.
[,1] [,2]
[1,] 0.0 22.5
[2,] 22.5 45.0
[3,] 45.0 67.5
[4,] 67.5 90.0
[5,] 90.0 112.5
[6,] 112.5 135.0
[7,] 135.0 157.5
[8,] 157.5 180.0
[9,] 180.0 202.5
[10,] 202.5 225.0
[11,] 225.0 247.5
[12,] 247.5 270.0
[13,] 270.0 292.5
[14,] 292.5 315.0
[15,] 315.0 337.5
[16,] 337.5 360.0
means <- numeric(16)
for (o in 1:16) {
  means[o] <- mean(ws[in_interval(wd, c(sectors[o, 1], sectors[o, 2]))])
}
means
[1] 2.0 3.5 5.0 6.5 8.0 9.5 11.0 12.5 14.0 15.5 17.0 18.5 20.0 21.5 23.0 24.5
This is the result. Works quite well.
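A vectorized base-R alternative (a sketch using the same dummy data): cut() bins the directions into the sixteen 22.5-degree sectors, and tapply() averages the speeds per bin, reproducing the loop's result without building the sectors matrix by hand.

```r
ws <- seq(1, 25, by = 0.5)
wd <- seq(0, 360, by = 7.346939)

# Bin each direction into one of 16 sectors of 22.5 degrees each.
# cut()'s default intervals are open on the left, matching in_interval()'s
# strict inequalities, so wd = 0 falls into no sector, as in the loop.
sector <- cut(wd, breaks = seq(0, 360, by = 22.5))

# Mean wind speed per sector
tapply(ws, sector, mean)
```

Changing the breaks argument (e.g. seq(0, 360, by = 30) for twelve sectors) is all it takes to vary the number of sectors.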

Correlation between two quantitative variables with NAs and by group

I have this dataset:
dbppre dbppost per1pre per1post per2pre per2post
0.544331824055634 0.426482748529805 1.10388140870983 1.14622255457398 1.007302668 1.489675646
0.44544008292805 0.300746382647025 0.891104906479033 0.876840408251785 0.919450773 0.892276804
0.734783578764543 0.489971007532308 1.02796075709944 0.79655130374748 0.610340504 0.936092006
1.04113077142586 0.386513119551008 0.965359488375859 1.04314173155816 1.122001994 0.638452078
0.333368637355291 0.525460160226716 NA 0.633435747 1.196988457 0.396543005
1.76769244892893 0.726077921840058 1.08060419667991 0.974269083108835 1.245643507 1.292857474
1.41486783 NA 0.910710353033318 1.03435985624106 0.959985314 1.244732938
1.01932795229362 0.624195252685448 1.27809687379565 1.59656046306852 1.076534265 0.848544508
1.3919315726037 0.728230610741795 0.817900465495852 1.24505216554384 0.796182044 1.47318564
1.48912544220417 0.897585509143984 0.878534099910696 1.12148645028777 1.096723799 1.312244217
1.56801709691326 0.816474814896344 1.13655475536592 1.01299018097117 1.226607978 0.863016615
1.34144721808244 0.596169010679233 1.889775937 NA 1.094095173 1.515202105
1.17409999971024 0.626873517936125 0.912837009713984 0.814632450682884 0.898149331 0.887216585
1.06862027138743 0.427855128881696 0.727537839417515 1.15967069522768 0.98168375 1.407271061
1.50406121956726 0.507362673558659 1.780752715 0.658835953 2.008229626 1.231869338
1.44980944220763 0.620658801480513 0.885827192590202 0.651268425772394 1.067548223 0.994736445
1.27975202574336 0.877955236879164 0.595981804265367 0.56002696152466 0.770642278 0.519875921
0.675518080750329 0.38478948746306 0.822745530980815 0.796051785239611 1.16899539 1.16658889
0.839686262472682 0.481534573379965 0.632380676760052 0.656052506855686 0.796504954 1.035781891
.
.
.
As you can see, there are multiple quantitative variables for gene expression data, each gene measured twice, pre- and post-treatment, with some missing values in some of the variables.
Each row corresponds to one individual, and the individuals are also divided into two groups (0 = control, 1 = treated).
I would like to compute a correlation (Spearman or Pearson, depending on normality), but by group, obtaining the correlation value and the p-value, while avoiding the NAs.
Is it possible?
I know how to implement cor.test() function to compare two variables, but I could not find any variable inside this function to take groups into account.
I also discovered the plyr and data.table libraries, which can work by group, but they return just the correlation value without the p-value, and I haven't been able to make them work for variables with NAs.
Suggestions?
You could use the Hmisc package.
library(Hmisc)
set.seed(10)
dt<-matrix(rnorm(100),5,5) #create matrix
dt[1,1]<-NA #introduce NAs
dt[2,4]<-NA #introduce NAs
cors<-rcorr(dt, type="spearman") #spearman correlation
corp<-rcorr(dt, type="pearson") #pearson correlation
> cors
[,1] [,2] [,3] [,4] [,5]
[1,] 1.0 0.4 0.2 0.5 -0.4
[2,] 0.4 1.0 0.1 -0.4 0.8
[3,] 0.2 0.1 1.0 0.4 0.1
[4,] 0.5 -0.4 0.4 1.0 -0.8
[5,] -0.4 0.8 0.1 -0.8 1.0
n
[,1] [,2] [,3] [,4] [,5]
[1,] 4 4 4 3 4
[2,] 4 5 5 4 5
[3,] 4 5 5 4 5
[4,] 3 4 4 4 4
[5,] 4 5 5 4 5
P
[,1] [,2] [,3] [,4] [,5]
[1,] 0.6000 0.8000 0.6667 0.6000
[2,] 0.6000 0.8729 0.6000 0.1041
[3,] 0.8000 0.8729 0.6000 0.8729
[4,] 0.6667 0.6000 0.6000 0.2000
[5,] 0.6000 0.1041 0.8729 0.2000
For further details see the help section: ?rcorr
rcorr returns a list with elements r, the matrix of correlations, n
the matrix of number of observations used in analyzing each pair of
variables, and P, the asymptotic P-values. Pairs with fewer than 2
non-missing values have the r values set to NA.
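For the by-group part with p-values, base R's cor.test() can also be applied per group via split() (a sketch with simulated data; the column names group, x, and y are assumptions, not the question's gene names):

```r
set.seed(1)
# Simulated stand-in: a grouping column plus two numeric variables with NAs
df <- data.frame(group = rep(0:1, each = 20),
                 x = rnorm(40), y = rnorm(40))
df$x[c(3, 25)] <- NA   # introduce missing values

# One cor.test() per group; cor.test() drops incomplete pairs itself
res <- lapply(split(df, df$group), function(d) {
  ct <- cor.test(d$x, d$y, method = "spearman", exact = FALSE)
  c(rho = unname(ct$estimate), p = ct$p.value)
})
res
```

Swapping method = "spearman" for "pearson" covers the normality-dependent choice mentioned in the question.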

Extract numeric value from ACF in R

This is what I am trying to do:
x <- c(1,2,3,3,2,3,4,5,6)
my_acf = acf(x,plot=F)
> my_acf
Autocorrelations of series ‘x’, by lag
0 1 2 3 4 5 6 7 8
1.000 0.497 0.097 -0.047 -0.050 -0.075 -0.231 -0.376 -0.316
I want to extract only 0.497, the correlation coefficient on the first lag, and I want to have it as a numeric value. How can I do that?
Thank You
The answer is to use my_acf$acf[2]. Here is what led me to the solution:
> attributes(my_acf)
$names
[1] "acf" "type" "n.used" "lag" "series" "snames"
$class
[1] "acf"
> my_acf$acf
, , 1
[,1]
[1,] 1.00000000
[2,] 0.49747475
[3,] 0.09722222
[4,] -0.04734848
[5,] -0.04987374
[6,] -0.07512626
[7,] -0.23106061
[8,] -0.37563131
[9,] -0.31565657
> my_acf$acf[2]
[1] 0.4974747
You can also inspect the full array like this:
my_acf$acf
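To make the indexing explicit (a sketch using the same series as above): lag k lives at position k + 1 of my_acf$acf, because the lag-0 autocorrelation (always 1) comes first.

```r
x <- c(1, 2, 3, 3, 2, 3, 4, 5, 6)
my_acf <- acf(x, plot = FALSE)

# Position 1 is lag 0 (always 1); position 2 is lag 1
lag1 <- my_acf$acf[2]
round(lag1, 3)     # 0.497
```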
