R package for calculating partial coefficient of determination? [closed] - r

Does anyone know of an R package for calculating partial R^2 in multiple regression? I've tried the partial.R2 function from the asbio package, but it gives error messages even with the example from the supplied documentation.
Many thanks.

I've found that the lm.sumSquares function from the lmSupport package provides both partial and semipartial correlations.

Data from 'Applied Linear Statistical Models' by John Neter, Michael H Kutner, William Wasserman, Christopher J. Nachtsheim
Section 7.4 on page 274:
# body fat example from Neter et al. via rhelp archives:
bf.dat <- read.table(text="x1 x2 x3 y
1 19.5 43.1 29.1 11.9
2 24.7 49.8 28.2 22.8
3 30.7 51.9 37.0 18.7
4 29.8 54.3 31.1 20.1
5 19.1 42.2 30.9 12.9
6 25.6 53.9 23.7 21.7
7 31.4 58.5 27.6 27.1
8 27.9 52.1 30.6 25.4
9 22.1 49.9 23.2 21.3
10 25.5 53.5 24.8 19.3
11 31.1 56.6 30.0 25.4
12 30.4 56.7 28.3 27.2
13 18.7 46.5 23.0 11.7
14 19.7 44.2 28.6 17.8
15 14.6 42.7 21.3 12.8
16 29.5 54.4 30.1 23.9
17 27.7 55.3 25.7 22.6
18 30.2 58.6 24.6 25.4
19 22.7 48.2 27.1 14.8
20 25.2 51.0 27.5 21.1 ", header=TRUE)
library(rms) # will also load Hmisc
fit <- ols(y ~ x1 + x2, data=bf.dat)
plt <- plot(anova(fit), what='partial R2')
plt
# x2 x1
#0.066955220 0.007010427
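For comparison, a partial R^2 in the textbook sense (the share of the error sum of squares left by a smaller model that the added predictor removes) can be computed in base R from two nested lm fits. This is only a minimal sketch and not the rms method above: partial_r2 is a hypothetical helper name, and the built-in mtcars data stands in for the body fat data so that the block runs on its own.

```r
# Partial R^2 of the extra terms in `full`, given the terms already in `reduced`:
# (SSE(reduced) - SSE(full)) / SSE(reduced)
# `partial_r2` is a hypothetical helper, not a function from any package.
partial_r2 <- function(reduced, full) {
  sse_reduced <- sum(residuals(reduced)^2)
  sse_full <- sum(residuals(full)^2)
  (sse_reduced - sse_full) / sse_reduced
}

# Stand-in data: built-in mtcars, hp added to a model that already contains wt
reduced <- lm(mpg ~ wt, data = mtcars)
full <- lm(mpg ~ wt + hp, data = mtcars)

pr2_hp <- partial_r2(reduced, full)
print(pr2_hp)
```

With the body fat data above, the corresponding call would be partial_r2(lm(y ~ x1, data = bf.dat), lm(y ~ x1 + x2, data = bf.dat)). The result need not match the values printed by plot(anova(fit)), which may be scaled differently.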

Related

Multiplication of matrices of different sizes

Good evening.
I am working in RStudio.
I have a problem multiplying these two matrices of different sizes. It gets harder because the values in the row where d2$ID == 1 must multiply only the rows where w$sample == 1.
sample and ID identify the same sample.
In other words, for the "subset" d2$ID == 1, every single value ("L1", "ST", "GR", "CB", "HSK", "DDM") has to multiply the whole "subset" w$sample == 1 (4 rows in this case, but not always), that is, all the values in "G2", "G4", "G6", "G8", "G12".
>d2
ID L1 ST GR CB HSK DDM
1 1 0.1662000 0.2337000 0.3637000 0.11110000 0.10100000 0.024300000
2 2 0.1896576 0.2280830 0.3705740 0.09406879 0.09319434 0.024422281
3 3 0.1110259 0.2217769 0.4180797 0.11122498 0.10902635 0.028866094
4 4 0.1558785 0.2008862 0.4222565 0.09805538 0.10218119 0.020742172
5 5 0.1536421 0.1674096 0.4205395 0.14362176 0.08635519 0.028431849
6 6 0.1841964 0.1514189 0.4603306 0.10243621 0.08928011 0.012337688
> w
sample G2 G4 G6 G8 G12
1 1 10.9 15.9 21.4 28.0 37.8
2 1 11.5 16.6 22.2 29.5 38.3
3 1 10.3 15.1 20.7 28.3 36.7
4 1 11.7 18.1 24.8 31.2 39.5
5 2 11.0 16.8 22.4 30.6 38.0
6 2 10.1 15.9 22.5 30.2 36.7
7 2 12.8 17.8 22.8 28.7 37.1
8 2 11.8 16.3 20.8 27.3 34.7
9 2 11.9 16.7 21.6 28.3 34.6
10 3 12.0 18.1 24.2 30.9 40.0
11 3 12.2 17.7 24.2 31.7 40.5
12 4 11.1 16.5 22.7 31.0 39.2
13 4 12.5 19.8 27.4 32.8 38.8
14 4 12.4 19.2 25.8 33.0 39.9
15 4 12.4 19.2 26.2 33.4 38.9
16 4 13.4 18.3 23.7 30.0 38.2
17 5 13.3 18.6 24.0 30.7 38.4
18 5 13.3 18.1 22.9 30.1 36.8
19 5 13.7 19.9 26.5 33.8 43.0
20 5 12.7 18.2 24.6 32.5 41.3
21 6 12.1 17.5 24.3 33.7 42.2
22 6 14.5 20.8 28.4 35.3 43.7
I have already checked a lot of questions but I can't figure it out, especially because most of the information is for matrices of the same size.
I tried filtering the data from d2, but the data set is really big, so that is really inefficient.
I am a beginner; if you consider this easy, I would appreciate at least a hint, please!
I have several data sets like these...
Thanks in advance!
This seems to perform as requested:
res <- apply(w, 1, function(x){ unclass(
# x[1] is the sample number; match() finds the d2 row with the same ID
outer(as.matrix( x[-1] ),
as.matrix( d2[match(x[1], d2$ID), c( "L1", "ST", "GR", "CB", "HSK", "DDM")])))})
str(res)
# result
# num [1:30, 1:22] 1.81 2.64 3.56 4.65 6.28 ...
# - attr(*, "dimnames")=List of 2
# ..$ : NULL
# ..$ : chr [1:22] "1" "2" "3" "4" ...
I almost got it right on the first pass but after some debugging found that I needed to add the as.matrix call to both arguments inside outer (so to speak ;-). To explain my logic ... I wanted to run down each row of w with apply and then use match on the value of the first column (of each row of w) to the unique row of d2. The match function is designed for just this purpose, to return a suitable number to be used for indexing. Then with the rest of the row (x[-1] by the time it was passed through the function call), I would use outer on the row values crossed with the desired row and columns of d2. If you do it without the as.matrix calls you get an error message:
Error in tcrossprod(x, y) :
requires numeric/complex matrix/vector arguments
I don't think that's a very informative error message. Both of the arguments were numeric vectors.
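The matching logic described above can also be sketched with a list result, one small matrix per row of w, which keeps the row and column labels. This is an illustrative variant, not the code that produced the output shown: the two data frames are cut-down stand-ins for d2 and w, and coef_cols, val_cols and idx are names introduced here.

```r
# Small stand-ins for d2 and w (two coefficient columns, two measurement columns)
d2 <- data.frame(ID = 1:2, L1 = c(0.1662, 0.1897), ST = c(0.2337, 0.2281))
w <- data.frame(sample = c(1, 1, 2),
                G2 = c(10.9, 11.5, 11.0),
                G4 = c(15.9, 16.6, 16.8))

coef_cols <- setdiff(names(d2), "ID")
val_cols <- setdiff(names(w), "sample")

# For each row of w, the position of the d2 row with the same ID
idx <- match(w$sample, d2$ID)

# One matrix per row of w: measurements (rows) times coefficients (columns)
res <- lapply(seq_len(nrow(w)), function(i) {
  outer(unlist(w[i, val_cols]), unlist(d2[idx[i], coef_cols]))
})
names(res) <- paste0("w_row_", seq_len(nrow(w)))

res[[1]]
```

Because match is called once on the whole sample column, no filtering of d2 is needed inside the loop.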

Read data from clipboard correctly in R

I want to read data into R from the clipboard, but the dimensions of the result are wrong. How can I read data from the clipboard correctly, and how can I tell which separator the data uses?
My data is this
group month Estimate lwr upr
placebo 0 18.7 17.6 19.9
placebo 6 21.5 20.3 22.7
placebo 12 24.3 22.8 25.7
placebo 18 27.0 25.2 28.9
active 0 18.7 17.6 19.9
active 6 20.8 19.6 22.0
active 12 22.9 21.4 24.3
active 18 25.0 23.1 26.8
Code I tried is this
d1 <- read.delim('clipboard')
d2 <- readClipboard()
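One possible cause, assuming the copied text is separated by spaces rather than tabs: read.delim splits on tabs only, so each line lands in a single column, while read.table splits on any whitespace by default. The sketch below uses the text argument in place of 'clipboard' so that it runs anywhere; txt is an illustrative stand-in for the clipboard contents.

```r
# Stand-in for the clipboard contents, assumed to be space-separated
txt <- "group month Estimate lwr upr
placebo 0 18.7 17.6 19.9
active 6 20.8 19.6 22.0"

# read.delim expects tabs, so the whole line becomes one column
tab_split <- read.delim(text = txt)

# read.table splits on any run of whitespace by default
ws_split <- read.table(text = txt, header = TRUE)

dim(tab_split)
dim(ws_split)
str(ws_split)
```

If this is the cause, read.table('clipboard', header = TRUE) would be the matching call on platforms where R supports the 'clipboard' connection.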

Draw regression line per row in R

I have the following data.
HEIrank1
HEI.ID X2007 X2008 X2009 X2010 X2011 X2012
1 OP 41.8 147.6 90.3 82.9 106.8 63.0
2 MO 20.0 20.8 21.1 20.9 12.6 20.6
3 SD 21.2 32.3 25.7 23.9 25.0 40.1
4 UN 51.8 39.8 19.9 20.9 21.6 22.5
5 WS 18.0 19.9 15.3 13.6 15.7 15.2
6 BF 11.5 36.9 20.0 23.2 18.2 23.8
7 ME 34.2 30.3 28.4 30.1 31.5 25.6
8 IM 7.7 18.1 20.5 14.6 17.2 17.1
9 OM 11.4 11.2 12.2 11.1 13.4 19.2
10 DC 14.3 28.7 20.1 17.0 22.3 16.2
11 OC 28.6 44.0 24.9 27.9 34.0 30.7
12 TH 7.4 10.0 5.8 8.8 8.7 8.6
13 CC 12.1 11.0 12.2 12.1 14.9 15.0
14 MM 11.7 24.2 18.4 18.6 31.9 31.7
15 MC 19.0 13.7 17.0 20.4 20.5 12.1
16 SH 11.4 24.8 26.1 12.7 19.9 25.9
17 SB 13.0 22.8 15.9 17.6 17.2 9.6
18 SN 11.5 18.6 22.9 12.0 20.3 11.6
19 ER 10.8 13.2 20.0 11.0 14.9 14.2
20 SL 44.9 21.6 21.3 26.5 17.0 8.0
I tried the following commands to draw a regression line for each HEI.
year <- c(2007 , 2008 , 2009 , 2010 , 2011, 2012)
op <- as.numeric(HEIrank1[1, -1]) # drop the HEI.ID column so op has one value per year
lm.r <- lm(op~year)
plot(year, op)
abline(lm.r)
I want to draw a regression line for each college in one graph and I do not know how. Can you help me?
Here's my approach with ggplot2 but the graph is uninterpretable with that many lines.
library(ggplot2);library(reshape2)
mdat <- melt(HEIrank1, variable.name="year")
mdat$year <- as.numeric(substring(mdat$year, 2))
ggplot(mdat, aes(year, value, colour=HEI.ID, group=HEI.ID)) +
geom_point() + stat_smooth(se = FALSE, method="lm")
Faceting may be a better way to go:
ggplot(mdat, aes(year, value, group=HEI.ID)) +
geom_point() + stat_smooth(se = FALSE, method="lm") +
facet_wrap(~HEI.ID)
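A base graphics version of the same idea, fitting one line per row and drawing all of them on one plot, might look like the sketch below. It uses only the first two rows of the data so that it runs on its own; HEI, vals and coefs are names introduced here for illustration.

```r
# First two rows of the data above, as a stand-in for HEIrank1
HEI <- data.frame(
  HEI.ID = c("OP", "MO"),
  X2007 = c(41.8, 20.0), X2008 = c(147.6, 20.8), X2009 = c(90.3, 21.1),
  X2010 = c(82.9, 20.9), X2011 = c(106.8, 12.6), X2012 = c(63.0, 20.6)
)
year <- 2007:2012

# Numeric matrix with one row per HEI and one column per year
vals <- as.matrix(HEI[, -1])
rownames(vals) <- HEI$HEI.ID

# Fit value ~ year separately for each row; the result has one column per HEI
coefs <- apply(vals, 1, function(v) coef(lm(v ~ year)))

# Points for every HEI, then one fitted line per HEI in the matching colour
matplot(year, t(vals), pch = 1, col = seq_len(nrow(vals)),
        xlab = "year", ylab = "value")
for (i in seq_len(nrow(vals))) {
  abline(a = coefs[1, i], b = coefs[2, i], col = i)
}
legend("topright", legend = rownames(vals), col = seq_len(nrow(vals)), lty = 1)
```

With all 20 rows the lines will overlap as much as in the single ggplot2 panel, so this is mainly useful for a handful of HEIs at a time.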

Shift time series

I have two weekly time series that show a small correlation (~0.33).
How can I shift one of these series in time so that I can check whether there is a
greater correlation in the data?
Example data:
x = textConnection('1530.2 1980.9 1811 1617 1585.4 1951.8 2146.6 1605 1395.2 1742.6 2206.5 1839.4 1699.1 1665.9 2144.7 2189.1 1718.4 1615.5 2003.3 2267.6 1772.1 1635.2 1836 2261.8 1799.1 1634.9 1638.6 2056.5 2201.4 1726.8 1586.4 1747.9 1982 1695.2 1624.9 1652.4 2011.9 1788.8 1568.4 1540.7 1866.1 2097.3 1601.3 1458.6 1424.4 1786.9 1628.4 1467.4 1476.2 1823 1736.7 1482.7 1334.2 1871.9 1752.9 1471.6 1583.2 1601.4 1987.7 1649.6 1530.9 1547.1 2165.2 1852 1656.9 1605.2 2184.6 1972 1617.6 1491.1 1709.5 2042.2 1667.1 1542.6 1497.6 2090.5 1816.8 1487.5 1468.2 2228.5 1889.9 1690.8 1395.7 1532.8 1934.4 1557.1 1570.6 1453.2 1669.6 1782 1526.1 1411 1608.1 1740.5 1492.3 1477.8 1102.6 1366.1 1701.1 1500.6 1403.2 1787.2 1776.6 1465.3 1429.5')
x = scan(x)
y = textConnection('29.8 22.6 26 24.8 28.9 27.3 26 29.2 28.2 23.9 24.5 23.6 21.1 22 20.7 19.9 22.8 25 21.6 19.1 27.2 23.7 24.2 22.4 25.5 25.4 23.4 24.7 27.4 23.4 25.8 28.8 27.7 23.7 22.9 29.4 22.6 28.6 22.2 27.6 26.2 26.2 29.8 31.5 24.5 28.7 25.9 26.9 25.9 30.5 30.5 29.4 29.3 31.4 30 27.9 28.5 26.4 29.5 28.4 25.1 24.6 21.1 23.6 20.5 23.7 25.3 20.2 23.4 21.1 23.1 24.6 20.7 20.7 26.9 24.1 24.7 25.8 26.7 26 28.9 29.5 27.4 22.1 31.6 25 27.4 30.4 28.9 27.4 22.5 28.4 28.7 31.1 29.3 28.3 30.6 28.6 26 26.2 26.2 26.7 25.6 31.5 30.9')
y = scan(y)
I'm using R with the dtw package, but I'm not familiar with these kinds of algorithms.
Thanks for any help!
You could try the ccf() function in base R. This estimates the cross-correlation function of the two time series.
For example, using your data (see below if interested in how I got the data you pasted into your Question into R objects x and y)
xyccf <- ccf(x, y)
yielding
> xyccf
Autocorrelations of series ‘X’, by lag
-17 -16 -15 -14 -13 -12 -11 -10 -9 -8 -7
0.106 0.092 0.014 0.018 0.011 0.029 -0.141 -0.153 -0.107 -0.141 -0.221
-6 -5 -4 -3 -2 -1 0 1 2 3 4
-0.274 -0.175 -0.277 -0.176 -0.217 -0.253 -0.339 -0.274 -0.267 -0.330 -0.278
5 6 7 8 9 10 11 12 13 14 15
-0.184 -0.120 -0.200 -0.156 -0.184 -0.062 -0.076 -0.117 -0.048 0.015 -0.016
16 17
-0.038 -0.029
and this plot: [plot of the cross-correlation function against lag]
To interpret this, when the lag is positive, y is leading x whereas when the lag is negative x is leading y.
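To make the lag convention concrete, here is a small sketch on simulated data rather than the series from the question. It builds two series so that x[t + 3] is a noisy copy of y[t], picks the lag with the largest absolute cross-correlation, and then checks it by shifting the vectors by hand. The names z, best_lag and shifted_cor are introduced here, and the by-hand shift assumes a non-negative lag.

```r
set.seed(1)
n <- 200
z <- rnorm(n + 3)
y <- z[4:(n + 3)]                 # y[t] = z[t + 3]
x <- z[1:n] + rnorm(n, sd = 0.1)  # x[t + 3] is y[t] plus a little noise

# Cross-correlation without drawing the plot
cc <- ccf(x, y, plot = FALSE)

# Lag with the largest absolute cross-correlation
best_lag <- cc$lag[which.max(abs(cc$acf))]
best_lag

# Check by shifting by hand: pair x[t + k] with y[t] (k >= 0 assumed here)
k <- best_lag
shifted_cor <- cor(x[(1 + k):n], y[1:(n - k)])
shifted_cor
```

The hand-shifted correlation will differ slightly from the ccf value at the same lag, since ccf normalises by the full-series variances.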
Reading your data into R...
x <- scan(text = "1530.2 1980.9 1811 1617 1585.4 1951.8 2146.6 1605 1395.2 1742.6
2206.5 1839.4 1699.1 1665.9 2144.7 2189.1 1718.4 1615.5 2003.3
2267.6 1772.1 1635.2 1836 2261.8 1799.1 1634.9 1638.6 2056.5
2201.4 1726.8 1586.4 1747.9 1982 1695.2 1624.9 1652.4 2011.9
1788.8 1568.4 1540.7 1866.1 2097.3 1601.3 1458.6 1424.4 1786.9
1628.4 1467.4 1476.2 1823 1736.7 1482.7 1334.2 1871.9 1752.9
1471.6 1583.2 1601.4 1987.7 1649.6 1530.9 1547.1 2165.2 1852
1656.9 1605.2 2184.6 1972 1617.6 1491.1 1709.5 2042.2 1667.1
1542.6 1497.6 2090.5 1816.8 1487.5 1468.2 2228.5 1889.9 1690.8
1395.7 1532.8 1934.4 1557.1 1570.6 1453.2 1669.6 1782 1526.1
1411 1608.1 1740.5 1492.3 1477.8 1102.6 1366.1 1701.1 1500.6
1403.2 1787.2 1776.6 1465.3 1429.5")
y <- scan(text = "29.8 22.6 26 24.8 28.9 27.3 26 29.2 28.2 23.9 24.5 23.6 21.1 22
20.7 19.9 22.8 25 21.6 19.1 27.2 23.7 24.2 22.4 25.5 25.4 23.4
24.7 27.4 23.4 25.8 28.8 27.7 23.7 22.9 29.4 22.6 28.6 22.2 27.6
26.2 26.2 29.8 31.5 24.5 28.7 25.9 26.9 25.9 30.5 30.5 29.4 29.3
31.4 30 27.9 28.5 26.4 29.5 28.4 25.1 24.6 21.1 23.6 20.5 23.7
25.3 20.2 23.4 21.1 23.1 24.6 20.7 20.7 26.9 24.1 24.7 25.8 26.7
26 28.9 29.5 27.4 22.1 31.6 25 27.4 30.4 28.9 27.4 22.5 28.4 28.7
31.1 29.3 28.3 30.6 28.6 26 26.2 26.2 26.7 25.6 31.5 30.9")

gnuplot input file 7 columns with decimals

I am trying to graph the following data file:
61.0 16.4 100.0 28.6 28.6 12.2 12.2
59.0 25.4 100.0 21.4 21.4 11.8 11.8
69.0 15.9 100.0 35.7 35.7 11.5 11.5
59.0 23.7 100.0 23.4 23.4 11.8 11.8
49.0 20.4 100.0 18.0 18.0 9.8 9.8
84.0 13.1 90.9 50.8 50.8 16.8 16.8
59.0 16.9 100.0 22.6 22.6 11.8 11.8
71.0 16.9 100.0 32.8 32.8 14.2 14.2
68.0 19.1 100.0 26.2 26.2 13.6 13.6
91.0 13.2 100.0 51.6 51.6 18.2 18.2
57.0 22.8 100.0 29.4 29.4 11.4 11.4
52.0 26.9 100.0 17.8 17.8 10.4 10.4
55.0 21.8 100.0 32.2 32.2 11.0 11.0
68.0 19.1 100.0 29.8 29.8 13.6 13.6
50.0 22.0 100.0 19.0 19.0 10.0 10.0
149.0 12.1 66.7 111.2 111.2 29.8 29.8
69.0 20.3 100.0 29.8 29.8 13.8 13.8
I am very new to gnuplot and I can't seem to figure out the correct code to get this graph:
I was trying something like this:
gnuplot> set output 'datastore1.png'
gnuplot> plot 'desktop1.dat' using 0:1 title "totalio" with lines, 'desktop1.dat' using 0:2 title "readpercentage" with lines, 'desktop1.dat' using 0:3 title "cachehitpercentage" with lines, 'desktop1.dat' using 0:4 title "currentkbpersecond" with lines, 'desktop1.dat' using 0:5 title "maximumkbpersecond" with lines, 'desktop1.dat' using 0:6 title "currentiopersecond" with lines, 'desktop1.dat' using 0:7 title "maximumiopersecond" with lines
gnuplot> quit
However the graph is not exactly correct.
Thanks for the help!
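Not sure what you are trying to plot here, but I think the problem is the zeroth column in the 'using' specification: the file has no such column (gnuplot treats column 0 as a pseudo-column holding the record number). To plot one data column against another, use this instead: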
p 'desktop1.dat' u 1:2, 'desktop1.dat' u 1:3
edit
So when you are plotting against time, you might want to add another column to the data that you read in from the file such that you have
15 61.0 16.4 100.0 28.6 28.6 12.2 12.2
as an example for the first line of your data. Afterwards you can use the plotting command I gave above.
