Logit-Transformation backwards - r

I've transformed some values from my dataset with the logit transformation from the car package. The variable "var" holds these values, which are percentages.
However, if I transform them back via inv.logit from the boot package, the values don't match the original ones.
data$var
46.4, 69.5, 82.7, 61.7, 76.4, 84.8, 69.1
data["var_logit"] <- logit(data$var, percents=TRUE)
data$var_logit
-0.137013943, 0.778005062, 1.454239241, 0.452148763, 1.102883518, 1.589885549, 0.760443432
data$var_logback <- inv.logit(data$var_logit)
0.46580 0.68525 0.81065 0.61115 0.75080 0.83060 0.68145
It looks like I have to multiply the result by 100 to get back the previous values (or at least some very similar values), but I feel like I'm missing something.
Thanks for the help!

The other thing that's going on here is that car::logit automatically adjusts the data if there are 0 or 1 values:
adjust: adjustment factor to avoid proportions of 0 or 1; defaults to ‘0’ if there are no such proportions in the data, and to ‘.025’ if there are.
library(car)
dat <- c(46.4, 69.5, 82.7, 61.7, 76.4, 84.8, 69.1)
(L1 <- logit(dat, percents=TRUE))
## [1] -0.1442496  0.8236001  1.5645131
## [4]  0.4768340  1.1747360  1.7190001  0.8047985
(L2 <- logit(c(dat,0),percents=TRUE))
## [1] -0.1370139  0.7780051  1.4542392  0.4521488
## [5]  1.1028835  1.5898855  0.7604434 -3.6635616
## Warning message:
## In logit(c(dat, 0), percents = TRUE) : proportions remapped to (0.025, 0.975)
This means you can't invert the results as easily.
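For reference, the remapping that car::logit applies with adjust = a (inferred from the warning and from the numbers above, not quoted from the package source) squeezes proportions from [0, 1] into [a, 1 - a] before taking the logit:
# with adjust = 0.025, p is first mapped to a + (1 - 2*a) * p
a <- 0.025
p <- c(0, 0.464, 1)
log((a + (1 - 2 * a) * p) / (1 - (a + (1 - 2 * a) * p)))
## [1] -3.6635616 -0.1370139  3.6635616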
Here's a function (based on the guts of car::logit, with a little help from Wolfram Alpha because I was too lazy to do the algebra) that inverts the result:
inv.logit <- function(f, a) {
  a <- (1 - 2 * a)  # rescaling factor used by car::logit's adjustment
  (a * (1 + exp(f)) + (exp(f) - 1)) / (2 * a * (1 + exp(f)))
}
zapsmall(inv.logit(L2,a=0.025)*100)
## [1] 46.4 69.5 82.7 61.7 76.4 84.8 69.1 0.0

You set the percents=TRUE flag, which divides your values by 100, and the inverse function (boot::inv.logit) does not know about it, so it returns proportions on the 0-1 scale.
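If none of your values are 0 or 100 (so adjust stays at 0 and nothing is remapped), multiplying the back-transformed proportions by 100 is indeed all you need. A minimal sketch, assuming the car and boot packages are installed:
var <- c(46.4, 69.5, 82.7, 61.7, 76.4, 84.8, 69.1)   # percentages, no 0 or 100
# percents = TRUE divides by 100 before taking the logit; with no 0/100 values
# the adjust argument defaults to 0, so no remapping happens
var_logit <- car::logit(var, percents = TRUE)
# boot::inv.logit() returns proportions on the 0-1 scale, so rescale to percent
var_back <- boot::inv.logit(var_logit) * 100
all.equal(var_back, var)   # TRUE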

Related

How to calculate slope and distance of two vectors in r?

I want to calculate slope and distance of two vectors. I am using the following code
df = structure(list(x = c(92.2, 88.1, 95.8, 83.8, 76.7, 83.3, 101.1,
111.8, 84.3, 81.5, 76.2, 87.1), y = c(84.8, 78.5, 103.1, 90.4,
85.1, 78.2, 98.3, 109.2, 85.6, 86.9, 85.6, 94)), class = "data.frame", row.names = c(NA,
-12L))
x <- df$x
y <- df$y
#Slope
diff(y)/diff(x)
#Distance
dist(df, method = "euclidean")
You can see in the slope output that only 11 values come out. I want the slope for 12-1 as well. How can I get that? And from the distance output I only want the values for the 1-2, 2-3, 3-4, 4-5, 5-6, 6-7, 7-8, 8-9, 9-10, 10-11, 11-12 and 12-1 pairs. How can I achieve that?
The expected output is
Length 7.5 25.8 17.5 8.9 9.5 26.8 15.3 36.2 3.1 5.5 13.8 10.5
Slope 1.54 3.19 1.06 0.75 -1.05 1.13 1.02 0.86 -0.46 0.25 0.77 1.08
I think the diff approach by @Gregor Thomas is concise enough. Here is another option in case you are interested in dist for computing distances.
> d <- rbind(df, df[1, ])
> with(d, diff(y) / diff(x))
[1] 1.5365854 3.1948052 1.0583333 0.7464789 -1.0454545 1.1292135
[7] 1.0186916 0.8581818 -0.4642857 0.2452830 0.7706422 -1.8039216
> (m <- as.matrix(dist(d)))[col(m) - row(m) == 1]
[1] 7.516648 25.776928 17.472550 8.860023 9.548298 26.848650 15.274161
[8] 36.238239 3.087070 5.457105 13.761177 10.519030
There's no nice diff-style function for getting the difference between the last and first vector elements, so you can use (y[12] - y[1]) / (x[12] - x[1]) directly, or, to be more general, tail(x, 1) for the last element and head(x, 1) for the first. Calculate it directly and append it to your slope vector.
For the Euclidean distance between successive points, it's most direct to calculate it yourself: distance = sqrt(diff(x)^2 + diff(y)^2).
(slope = c(diff(y)/diff(x), (head(y, 1) - tail(y, 1)) / (head(x, 1) - tail(x, 1))))
# [1] 1.5365854 3.1948052 1.0583333 0.7464789 -1.0454545 1.1292135 1.0186916
# [8] 0.8581818 -0.4642857 0.2452830 0.7706422 -1.8039216
(distance = sqrt(diff(x)^2 + diff(y)^2))
# [1] 7.516648 25.776928 17.472550 8.860023 9.548298 26.848650 15.274161 36.238239 3.087070 5.457105 13.761177
I'll leave it as an exercise for the reader to add the last distance between the first and last points.
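If you do want that last wrap-around distance spelled out, here is a minimal sketch reusing df, head, and tail from above:
x <- df$x
y <- df$y
# successive distances 1-2 ... 11-12, then append the 12-1 leg
distance <- sqrt(diff(x)^2 + diff(y)^2)
distance <- c(distance, sqrt((tail(x, 1) - head(x, 1))^2 + (tail(y, 1) - head(y, 1))^2))
distance   # 12 values; the last one is ~10.52, matching the dist() answer above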

Turning PCA output into dataframe in R

I've run a principal components analysis. The output pca1$loadings looks like a dataframe, but it's not. Is there a way to turn this into a dataframe?
I'd like to be able to sort the columns of the output. It would also be nice if I could use the output in Excel.
This is the code I used to generate the PCA.
cor <- cor(df[, 1:87]) #correlation matrix with all dv's
pca1 <- principal(cor, nfactors = 87, rotate = "varimax")
pca1$loadings
The object is of class "loadings"; to convert it to a data frame, use as.data.frame.matrix:
pca1 <- psych::principal(cor, nfactors = 87, rotate = "varimax")$loadings
as.data.frame.matrix(pca1)
Using a reproducible example with mtcars:
cor <- cor(mtcars)
pca1 <- psych::principal(cor, nfactors =2, rotate = "varimax")$loadings
as.data.frame.matrix(pca1)
#          RC1     RC2
# mpg   0.6846 -0.6329
# cyl  -0.6373  0.7231
# disp -0.7328  0.6044
# hp   -0.3233  0.8828
# drat  0.8533 -0.2091
# wt   -0.7989  0.4557
# qsec -0.1591 -0.8996
# vs    0.2996 -0.8206
# am    0.9206  0.0774
# gear  0.9066  0.1661
# carb  0.0775  0.8660
A shorter version is to just remove the class attribute
unclass(pca1)
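Since the question also asks about sorting and using the output in Excel, here is a minimal sketch of both, reusing the mtcars loadings from above (the CSV filename is just an example):
loadings_df <- as.data.frame.matrix(pca1)
# sort rows by the absolute loading on the first rotated component
loadings_sorted <- loadings_df[order(abs(loadings_df$RC1), decreasing = TRUE), ]
# write to CSV so the loadings can be opened in Excel
write.csv(loadings_sorted, "pca_loadings.csv", row.names = TRUE)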

Is there an R function to return a parameter in a list that cannot be found by str(list)

I’m trying to return a parameter in a list, but I cannot find the parameter using str(list).
This is my code:
install.packages("meta")
library(meta)
m1 <- metacor(c(0.85, 0.7, 0.95), c(20, 40, 10))
m1
COR 95%-CI %W(fixed) %W(random)
1 0.8500 [0.6532; 0.9392] 27.9 34.5
2 0.7000 [0.4968; 0.8304] 60.7 41.7
3 0.9500 [0.7972; 0.9884] 11.5 23.7
Number of studies combined: k = 3
COR 95%-CI z p-value
Fixed effect model 0.7955 [0.6834; 0.8710] 8.48 < 0.0001
Random effects model 0.8427 [0.6264; 0.9385] 4.87 < 0.0001
How could I save COR (= 0.8427) or the p-value (< 0.0001) for the random effects model as a single parameter?
It seems that the numbers you are looking for (cor = 0.8427) are created in print.meta. The function is quite big, though, so I gave up trying to pinpoint exactly where the value gets calculated and what name it has. I don't think it is even saved within the function, but rather just printed.
Anyway I took the alternative road of capturing the output:
#capture the output of the summary - the fifth line gives us what we want
out <- capture.output(summary(m1))[5]
#capture all the number and return the first
unlist(regmatches(out, gregexpr("[[:digit:]]+\\.*[[:digit:]]*", out)))[1]
#[1] "0.8427"
I assume your problem is accessing the object.
The $ operator will help you: type the variable name, then $, then press Tab, and the available components of the object will appear. According to your question, the values would be:
> m1$cor[1]
[1] 0.85
> mysummary<-summary(m1)
> mysummary$fixed$p
[1] 2.163813e-17
> mysummary$fixed$z
[1] 8.484643
> ifelse(mysummary$fixed$p<0.0001, "<0.0001", "WHATEVER")
[1] "<0.0001"
To select a specific one, you can use [i] where i is an integer (e.g. i = 1 for 0.85).
To get the "< 0.0001" display, I suggest using an ifelse() statement on the p-values or z-values with the corresponding rule. Cheers!
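If you want the random effects numbers directly, here is a minimal sketch, assuming the meta package's usual field names (TE.random, pval.random; check str(m1) if your version differs) and metacor's default sm = "ZCOR" scale:
library(meta)
m1 <- metacor(c(0.85, 0.7, 0.95), c(20, 40, 10))
# The pooled random-effects estimate is stored on the Fisher-z scale
# (sm = "ZCOR" is the default for metacor), so back-transform with tanh()
cor.random <- tanh(m1$TE.random)   # ~ 0.8427
p.random   <- m1$pval.random       # p-value of the random effects model
c(COR = cor.random, p = p.random)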

Binning data in R

I have a vector with around 4000 values. I would just need to bin it into 60 equal intervals for which I would then have to calculate the median (for each of the bins).
v<-c(1:4000)
v is really just a vector. I read about cut, but that requires me to specify the breakpoints. I just want 60 equal intervals.
Use cut and tapply:
> tapply(v, cut(v, 60), median)
(-3,67.7] (67.7,134] (134,201] (201,268]
34.0 101.0 167.5 234.0
(268,334] (334,401] (401,468] (468,534]
301.0 367.5 434.0 501.0
(534,601] (601,668] (668,734] (734,801]
567.5 634.0 701.0 767.5
(801,867] (867,934] (934,1e+03] (1e+03,1.07e+03]
834.0 901.0 967.5 1034.0
(1.07e+03,1.13e+03] (1.13e+03,1.2e+03] (1.2e+03,1.27e+03] (1.27e+03,1.33e+03]
1101.0 1167.5 1234.0 1301.0
(1.33e+03,1.4e+03] (1.4e+03,1.47e+03] (1.47e+03,1.53e+03] (1.53e+03,1.6e+03]
1367.5 1434.0 1500.5 1567.0
(1.6e+03,1.67e+03] (1.67e+03,1.73e+03] (1.73e+03,1.8e+03] (1.8e+03,1.87e+03]
1634.0 1700.5 1767.0 1834.0
(1.87e+03,1.93e+03] (1.93e+03,2e+03] (2e+03,2.07e+03] (2.07e+03,2.13e+03]
1900.5 1967.0 2034.0 2100.5
(2.13e+03,2.2e+03] (2.2e+03,2.27e+03] (2.27e+03,2.33e+03] (2.33e+03,2.4e+03]
2167.0 2234.0 2300.5 2367.0
(2.4e+03,2.47e+03] (2.47e+03,2.53e+03] (2.53e+03,2.6e+03] (2.6e+03,2.67e+03]
2434.0 2500.5 2567.0 2634.0
(2.67e+03,2.73e+03] (2.73e+03,2.8e+03] (2.8e+03,2.87e+03] (2.87e+03,2.93e+03]
2700.5 2767.0 2833.5 2900.0
(2.93e+03,3e+03] (3e+03,3.07e+03] (3.07e+03,3.13e+03] (3.13e+03,3.2e+03]
2967.0 3033.5 3100.0 3167.0
(3.2e+03,3.27e+03] (3.27e+03,3.33e+03] (3.33e+03,3.4e+03] (3.4e+03,3.47e+03]
3233.5 3300.0 3367.0 3433.5
(3.47e+03,3.53e+03] (3.53e+03,3.6e+03] (3.6e+03,3.67e+03] (3.67e+03,3.73e+03]
3500.0 3567.0 3633.5 3700.0
(3.73e+03,3.8e+03] (3.8e+03,3.87e+03] (3.87e+03,3.93e+03] (3.93e+03,4e+03]
3767.0 3833.5 3900.0 3967.0
In the past, I've used this function:
evenbins <- function(x, bin.count = 10, order = TRUE) {
  # split length(x) as evenly as possible across bin.count bins
  bin.size <- rep(length(x) %/% bin.count, bin.count)
  bin.size <- bin.size + ifelse(1:bin.count <= length(x) %% bin.count, 1, 0)
  bin <- rep(1:bin.count, bin.size)
  if (order) {
    # assign bins by rank of x, so each factor level holds increasing values
    bin <- bin[rank(x, ties.method = "random")]
  }
  factor(bin, levels = 1:bin.count, ordered = order)
}
and then I can run it with
v.bin <- evenbins(v, 60)
and check the sizes with
table(v.bin)
and see that they all contain 66 or 67 elements. By default this will order the values, just like cut does, so each of the factor levels will hold increasing values. If you want to bin them based on their original order, use
v.bin <- evenbins(v, 60, order=F)
instead. This just splits the data up in the order it appears.
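To then get the per-bin medians the question asks for, tapply works just as in the cut-based answer above:
v <- 1:4000
v.bin <- evenbins(v, 60)
# median of v within each of the 60 (nearly) equal-sized bins
tapply(v, v.bin, median)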
Another option: compute the break-points with seq and take the median of each pair of consecutive break-points; the result below shows those 59 values. The 60 break-points are probably as close to equally spaced as possible (but probably not exactly equal).
> sq <- seq(1, 4000, length = 60)
> sapply(2:length(sq), function(i) median(c(sq[i-1], sq[i])))
# [1] 34.88983 102.66949 170.44915 238.22881 306.00847 373.78814
# [7] 441.56780 509.34746 577.12712 644.90678 712.68644 780.46610
# ......
Actually, after checking, the bins are pretty darn close to being equal.
> unique(diff(sq))
# [1] 67.77966 67.77966 67.77966
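If you want the actual data medians for equal-width bins built from seq, you can pass the breaks to cut. A sketch (note that 60 bins need 61 break-points, and include.lowest keeps the value 1 in the first bin):
v  <- 1:4000
sq <- seq(min(v), max(v), length = 61)   # 61 breaks define 60 equal-width bins
# median of v inside each bin
tapply(v, cut(v, breaks = sq, include.lowest = TRUE), median)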

How can I exclude all the elements below a number in a vector, and at the same time remove the corresponding element in the other vector in R?

Suppose I have two vectors representing the height and weight of the 97 participants in a study. Now I want to remove all the observations with height below 2 m, and at the same time remove the corresponding observations in the weight vector. What functions should I use in R?
You can get a logical vector by comparing the height vector against the threshold and use it to filter both the height and weight vectors.
height.check <- height < 200 # taken in cm scale
height <- height[!height.check]
weight <- weight[!height.check]
You want a data frame (use ?data.frame for info)
x <- data.frame("Participant"=paste("Participant",1:97,sep="_"),
"Height"=height_vector,
"Weight"=weight_vector)
where height_vector and weight_vector are your data
x2 <- x[x$Height >= 2,]
Since you gave us no data, I produced some fake data.
> height <- c(2.0, 1.75, 2.15, 1.98, 1.45) ## in meters
> weight <- c(200, 178, 180, 198, 205) ## in pounds
We can remove the unwanted values using vector operations:
> height[height >= 2.0]
[1] 2.00 2.15
> weight[height >= 2.0]
[1] 200 180
But it's best to put the two vectors together into a data.frame and then subset on the condition that height is at least 2. This automatically drops the corresponding weights.
> d <- data.frame(height = c(2.0, 1.75, 2.15, 1.98, 1.45),
weight = c(200, 178, 180, 198, 205))
> d[d$height >= 2, ]
  height weight
1   2.00    200
3   2.15    180
