Draw regression line per row in R

I have the following data.
HEIrank1
HEI.ID X2007 X2008 X2009 X2010 X2011 X2012
1 OP 41.8 147.6 90.3 82.9 106.8 63.0
2 MO 20.0 20.8 21.1 20.9 12.6 20.6
3 SD 21.2 32.3 25.7 23.9 25.0 40.1
4 UN 51.8 39.8 19.9 20.9 21.6 22.5
5 WS 18.0 19.9 15.3 13.6 15.7 15.2
6 BF 11.5 36.9 20.0 23.2 18.2 23.8
7 ME 34.2 30.3 28.4 30.1 31.5 25.6
8 IM 7.7 18.1 20.5 14.6 17.2 17.1
9 OM 11.4 11.2 12.2 11.1 13.4 19.2
10 DC 14.3 28.7 20.1 17.0 22.3 16.2
11 OC 28.6 44.0 24.9 27.9 34.0 30.7
12 TH 7.4 10.0 5.8 8.8 8.7 8.6
13 CC 12.1 11.0 12.2 12.1 14.9 15.0
14 MM 11.7 24.2 18.4 18.6 31.9 31.7
15 MC 19.0 13.7 17.0 20.4 20.5 12.1
16 SH 11.4 24.8 26.1 12.7 19.9 25.9
17 SB 13.0 22.8 15.9 17.6 17.2 9.6
18 SN 11.5 18.6 22.9 12.0 20.3 11.6
19 ER 10.8 13.2 20.0 11.0 14.9 14.2
20 SL 44.9 21.6 21.3 26.5 17.0 8.0
I tried the following commands to draw a regression line for each HEI.
year <- c(2007 , 2008 , 2009 , 2010 , 2011, 2012)
op <- as.numeric(HEIrank1[1, -1])  # drop the HEI.ID column so op has one value per year
lm.r <- lm(op~year)
plot(year, op)
abline(lm.r)
I want to draw a regression line for each college in one graph, but I do not know how. Can you help me?

Here's my approach with ggplot2 but the graph is uninterpretable with that many lines.
library(ggplot2);library(reshape2)
mdat <- melt(HEIrank1, variable.name="year")
mdat$year <- as.numeric(substring(mdat$year, 2))
ggplot(mdat, aes(year, value, colour=HEI.ID, group=HEI.ID)) +
geom_point() + stat_smooth(se = FALSE, method="lm")
Faceting may be a better way to go:
ggplot(mdat, aes(year, value, group=HEI.ID)) +
geom_point() + stat_smooth(se = FALSE, method="lm") +
facet_wrap(~HEI.ID)
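If you would rather stay in base graphics, as in the question, the same idea can be sketched by fitting one lm per row and calling abline for each fit. This is only a minimal sketch: it uses three rows copied from the question's data, and the names vals and fits are made up for illustration.

```r
# Three rows copied from the question's HEIrank1 data frame
HEIrank1 <- data.frame(
  HEI.ID = c("OP", "MO", "SD"),
  X2007 = c(41.8, 20.0, 21.2),
  X2008 = c(147.6, 20.8, 32.3),
  X2009 = c(90.3, 21.1, 25.7),
  X2010 = c(82.9, 20.9, 23.9),
  X2011 = c(106.8, 12.6, 25.0),
  X2012 = c(63.0, 20.6, 40.1)
)
year <- 2007:2012

# Numeric part only: one row per HEI, one column per year
vals <- as.matrix(HEIrank1[, -1])

# Fit value ~ year for every row; result is a 2 x n matrix (intercept, slope)
fits <- apply(vals, 1, function(v) coef(lm(v ~ year)))
colnames(fits) <- HEIrank1$HEI.ID

# Plot all points, then add one regression line per HEI in a matching colour
cols <- seq_len(nrow(vals))
matplot(year, t(vals), pch = 19, col = cols, xlab = "year", ylab = "value")
for (i in seq_len(ncol(fits))) {
  abline(a = fits[1, i], b = fits[2, i], col = cols[i])
}
legend("topright", legend = HEIrank1$HEI.ID, col = cols, pch = 19, lty = 1)
```

With all 20 rows the single panel will be as crowded as the ggplot2 version, so the faceted plot is probably still easier to read.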

Related

Subsetting and Looping a Time Series Data in R

I have a time series dataset covering 30 years. I subset it for the month and day I want (shown in the code below). Is there a way to loop over each month and the days in that month? Also, is there a way to save the plots automatically, in different folders corresponding to each month? Right now I am doing it manually by changing the month and day in the line dfOct31all <- df[which(df$Month==10 & df$Day==31), ] in the code below, then plotting and saving. By the way, I'm using RStudio.
Can someone please guide me?
Thanks!
setwd("WDir")
df <- read.csv("Velocity.csv", header = TRUE)
attach(df)
#Day 31
dfOct31all <- df [ which(df$Month==10 & df$Day==31), ]
dfall31Mbs <- dfOct31all[c(-1,-2,-3)]
densities <- lapply(dfall31Mbs, density)
par(mfcol=c(5,5), oma=c(1,1,0,0), mar=c(1,1,1,0), tcl=-0.1, mgp=c(0,0,0))
plot(densities[[1]], col="black",main = "1000mb",xlab=NA,ylab=NA)
plot(densities[[2]], col="black",main="925mb",xlab=NA,ylab=NA)
plot(densities[[3]], col="black",main="850mb",xlab=NA,ylab=NA)
plot(densities[[4]], col="black",main="700mb",xlab=NA,ylab=NA)
plot(densities[[5]], col="black",main="600mb",xlab=NA,ylab=NA)
plot(densities[[6]], col="black",main="500mb",xlab=NA,ylab=NA)
plot(densities[[7]], col ="black",main="400mb",xlab=NA,ylab=NA)
plot(densities[[8]], col="black",main="300mb",xlab=NA,ylab=NA)
plot(densities[[9]], col="black",main="250mb",xlab=NA,ylab=NA)
plot(densities[[10]], col="black",main="200mb",xlab=NA,ylab=NA)
plot(densities[[11]], col= "black",main="150mb",xlab=NA,ylab=NA)
plot(densities[[12]], col= "black",main="100mb",xlab=NA,ylab=NA)
plot(densities[[13]], col = "black",main="70mb",xlab=NA,ylab=NA)
plot(densities[[14]], col="black",main="50mb",xlab=NA,ylab=NA)
plot(densities[[15]], col="black",main="30mb",xlab=NA,ylab=NA)
plot(densities[[16]], col = "black",main="20mb",xlab=NA,ylab=NA)
plot(densities[[17]], col="black",main="10mb",xlab=NA,ylab=NA)
Snippet of data is shown as well
Year Month Day 1000mb 925mb 850mb 700mb 600mb 500mb 400mb 300mb 250mb 200mb 150mb 100mb 70mb 50mb 30mb 20mb 10mb
1984 10 31 6 6.6 7.9 11.5 14.6 17 20.8 25.8 26.4 25.3 24.4 22.7 19.9 19.2 20.4 24.8 30.8
1985 10 31 5.8 7.1 7.7 11.5 14.7 17.3 25.3 32.6 32.9 32.4 27.1 20.9 14.2 9.7 6.4 7.3 7.4
1986 10 31 4.3 6.1 7.7 11.3 18.4 26.3 34.4 44.5 48.9 46.2 34.5 20.4 13.8 13.2 21.7 31 46.4
1987 10 31 2.2 2.9 4 7 9 13.9 19.9 25.8 26.6 23.7 17.3 12 7 3.1 1.7 5.8 14.1
1988 10 31 2.5 2.1 2.3 6.5 6.4 5.1 7.4 12.1 13.4 16.1 16.7 15.2 8.8 5 2.8 6.2 8.9
1989 10 31 3.4 4 4.7 4.4 4.1 4 4.6 4.8 5.9 5.6 10.9 13.9 12.3 10.4 8.1 8 8
1990 10 31 4 4.9 7.5 14.6 19 21.9 25.7 28.3 29.4 29.2 27.3 18 12.6 10.1 9 12 19.9
1991 10 31 2.8 3.2 4 10.8 12.1 11.2 9.9 9.1 9.9 12.8 18 17.5 10.4 6.3 4.2 7.6 11.7
1992 10 31 5.9 6.9 7.9 13.1 17.9 25.2 34.6 47.3 53.3 53 42.4 21.3 11.6 6 4.6 8.5 12.8
1993 10 31 2.3 1.5 0.4 3.6 6.3 10.1 14.3 19.1 21.6 21.8 18.4 13.6 12.3 9.5 6.9 11 18.1
1994 10 31 2 2.2 3.8 11.6 17 19.8 23.6 24.9 25.5 26.2 28.4 25.2 16.7 13.6 9.3 8.3 9.8
1995 10 31 1.5 2 3.4 7.6 9.1 11.2 13.7 17.9 20.3 21.7 21.1 16.7 13 12.1 14.9 21.4 27.3
1996 10 31 1.9 2.4 3.5 8 11.7 17.4 26.4 35.6 33.3 24.6 12.4 4.1 0.5 3.4 7.2 9.4 11.6
1997 10 31 3.7 4.8 7.8 19.2 24.6 29.6 35.6 41 41.8 42 37.9 23.7 11.2 8.6 4.2 3.8 7
1998 10 31 0.7 1.1 0.9 4.8 8.4 11.4 14 25.3 29.7 25.2 15.9 6.6 2.1 1 4.5 8.9 6.1
1999 10 31 1.9 1.6 2.4 10.7 15.3 19 23.2 29 32.4 31.9 28 20.3 10.8 9.4 12 14.5 16.9
2000 10 31 5.1 5.8 6.7 12.8 18.2 23.9 29.9 40.7 42.2 33.7 23.5 12.7 2.6 1.6 3.8 4.7 5.1
2001 10 31 5.7 6.1 7.1 10.1 10.8 14.7 18.3 22.8 22.3 22.2 22 14 9.5 6.6 5.2 6.5 8.6
2002 10 31 1.4 1.6 1.8 9.2 14.5 19.5 24.8 30 30.5 27.6 22.2 13.9 9.1 7.1 8.5 16.1 23.8
2003 10 31 1.5 1.3 0.7 1 3.5 6 11.7 21.5 21.9 22.9 23 20.7 15.8 12.5 14.5 20.1 26
2004 10 31 5.4 5.6 6.9 14.4 23.3 33.3 46.1 60.9 62.1 54.6 42.9 28 17.3 12.3 10.1 13.6 13.3
2005 10 31 1.7 1.3 3 10.3 15.8 19.5 21.1 22.8 24.1 24.5 24.5 20.6 13.5 10.7 10 10.7 10.4
2006 10 31 2.3 1.5 1.7 8.7 12.5 15.9 18.7 20.5 21.8 24.3 29.9 25.3 18.3 12.8 7.7 8.8 12.4
2007 10 31 3.7 2.7 2.3 2.2 2.6 4.2 6.5 11.9 15.9 19.6 17.2 9.5 6.9 5.7 4.9 5.8 11.7
2008 10 31 7.7 10.8 14.3 20.3 23 25.8 27.4 32.1 35.4 34.8 25.8 13.2 7.1 2.9 2.6 3.4 6
2009 10 31 0.5 0.2 2 9.3 13.5 17.6 18.8 20.8 21.4 21.2 18.9 14.2 11.1 6.4 1.9 3 8
2010 10 31 5.6 6.8 8.5 13.4 16.5 20.3 23.8 26.8 31 28.1 24 15.7 9.9 7 4.8 3.9 1.8
2011 10 31 5.9 6.7 5.6 7.9 10.3 11.8 12.5 16.2 19.5 21.4 17.9 13.2 9.6 7.9 8 8.3 10.8
2012 10 31 4.8 6.3 9.4 19.5 24.2 27.2 27.5 27.3 27.7 30.7 27.5 16.7 10 7.6 8 13.8 19.7
2013 10 31 1.4 1.9 3.9 9.1 13.1 17.3 22.9 29.7 30.4 27.3 23.5 18.2 13.1 6.3 4.4 2.4 9.4
I wrote it out for each day rather than doing a loop.
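One possible way to avoid editing the month and day by hand is to split the data frame by Month and Day and loop over the pieces, writing each plot into a folder named after the month. The following is a minimal sketch under several assumptions: the data are synthetic stand-ins for Velocity.csv with only four pressure levels and two months, the folder and file names (density_plots, month_10, day_31.pdf) are invented, and pdf() is used as the device (png() could be swapped in).

```r
# Synthetic stand-in for Velocity.csv: Year, Month, Day, then one column per level
set.seed(42)
dates <- seq(as.Date("1984-10-01"), as.Date("1993-11-30"), by = "day")
dates <- dates[format(dates, "%m") %in% c("10", "11")]  # keep Oct and Nov only
df <- data.frame(
  Year  = as.integer(format(dates, "%Y")),
  Month = as.integer(format(dates, "%m")),
  Day   = as.integer(format(dates, "%d")),
  `1000mb` = runif(length(dates), 0, 10),
  `925mb`  = runif(length(dates), 0, 12),
  `850mb`  = runif(length(dates), 0, 15),
  `700mb`  = runif(length(dates), 0, 20),
  check.names = FALSE
)

out_root <- file.path(tempdir(), "density_plots")   # hypothetical output folder
level_cols <- setdiff(names(df), c("Year", "Month", "Day"))

# One data frame per existing Month/Day combination
groups <- split(df, list(df$Month, df$Day), drop = TRUE)

for (g in groups) {
  if (nrow(g) < 2) next                 # density() needs at least two values
  m <- g$Month[1]
  d <- g$Day[1]

  # One folder per month, one file per day
  month_dir <- file.path(out_root, sprintf("month_%02d", m))
  dir.create(month_dir, recursive = TRUE, showWarnings = FALSE)

  pdf(file.path(month_dir, sprintf("day_%02d.pdf", d)), width = 8, height = 8)
  par(mfcol = c(2, 2), mar = c(2, 2, 2, 1))
  for (lev in level_cols) {
    plot(density(g[[lev]]), col = "black", main = lev, xlab = NA, ylab = NA)
  }
  dev.off()
}
```

The inner loop over level_cols also replaces the seventeen hand-written plot() calls, since each panel title is just the column name.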

Draw histograms per row over multiple columns in R

I'm using R for the analysis in my master's thesis.
I have the following data frame: STOF: Student to staff ratio
HEI.ID X2007 X2008 X2009 X2010 X2011 X2012
1 OP 41.8 147.6 90.3 82.9 106.8 63.0
2 MO 20.0 20.8 21.1 20.9 12.6 20.6
3 SD 21.2 32.3 25.7 23.9 25.0 40.1
4 UN 51.8 39.8 19.9 20.9 21.6 22.5
5 WS 18.0 19.9 15.3 13.6 15.7 15.2
6 BF 11.5 36.9 20.0 23.2 18.2 23.8
7 ME 34.2 30.3 28.4 30.1 31.5 25.6
8 IM 7.7 18.1 20.5 14.6 17.2 17.1
9 OM 11.4 11.2 12.2 11.1 13.4 19.2
10 DC 14.3 28.7 20.1 17.0 22.3 16.2
11 OC 28.6 44.0 24.9 27.9 34.0 30.7
Then I rank the colleges using these commands:
HEIrank1<-(STOF[,-c(1)])
rank1 <- apply(HEIrank1,2,rank)
> HEIrank11
HEI.ID X2007 X2008 X2009 X2010 X2011 X2012
1 OP 18.0 20 20.0 20.0 20.0 20
2 MO 14.0 9 13.0 13.5 2.0 12
3 SD 15.0 16 17.0 16.0 16.0 19
4 UN 20.0 18 8.0 13.5 14.0 13
5 WS 12.0 8 4.0 7.0 6.0 8
6 BF 6.5 17 9.5 15.0 10.0 14
7 ME 17.0 15 19.0 19.0 17.0 15
8 IM 2.0 6 12.0 8.0 8.5 10
9 OM 4.5 3 2.5 3.0 3.0 11
10 DC 11.0 14 11.0 9.0 15.0 9
11 OC 16.0 19 16.0 18.0 19.0 17
How can I draw a histogram for each HEI (that is, for each row)?
If you use ggplot you won't need to do it as a loop; you can plot them all at once. You also need to reshape your data from wide format into long format. You can use the melt function from the reshape2 package to do so.
library(reshape2)
new.df<-melt(HEIrank11,id.vars="HEI.ID")
names(new.df)=c("HEI.ID","Year","Rank")
In the plot call below, substring just strips the leading X from each year.
library(ggplot2)
# the ranks are already computed, so draw bars with stat="identity" instead of binning
ggplot(new.df, aes(x=HEI.ID,y=Rank,fill=substring(Year,2)))+
geom_bar(stat="identity",position="dodge")
Here's a solution in lattice:
require(lattice)
barchart(X2007+X2008+X2009+X2010+X2011+X2012 ~ HEI.ID,
data=HEIrank11,
auto.key=list(space='right')
)
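For completeness, a grouped bar chart like the two above can also be sketched with base graphics alone. This is a minimal sketch: it uses three rows and three years from the rank table in the question, and the names ranks and mids are made up for illustration.

```r
# Three rows and three years taken from the HEIrank11 table in the question
HEIrank11 <- data.frame(
  HEI.ID = c("OP", "MO", "SD"),
  X2007 = c(18, 14, 15),
  X2008 = c(20, 9, 16),
  X2009 = c(20, 13, 17)
)

# barplot() wants a matrix: one row per year, one column per HEI
ranks <- t(as.matrix(HEIrank11[, -1]))
colnames(ranks) <- HEIrank11$HEI.ID
rownames(ranks) <- sub("^X", "", rownames(ranks))  # strip the leading X

# beside = TRUE puts the years side by side within each HEI
mids <- barplot(ranks, beside = TRUE, legend.text = TRUE,
                xlab = "HEI", ylab = "Rank")
```

barplot returns the bar midpoints, which is handy if you later want to add labels with text().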

Shift time series

I have 2 weekly time-series, which show a small correlation (~0.33).
How can I 'shift in time' one of these series, so that I can check whether there is a
greater correlation in the data?
Example data:
x = textConnection('1530.2 1980.9 1811 1617 1585.4 1951.8 2146.6 1605 1395.2 1742.6 2206.5 1839.4 1699.1 1665.9 2144.7 2189.1 1718.4 1615.5 2003.3 2267.6 1772.1 1635.2 1836 2261.8 1799.1 1634.9 1638.6 2056.5 2201.4 1726.8 1586.4 1747.9 1982 1695.2 1624.9 1652.4 2011.9 1788.8 1568.4 1540.7 1866.1 2097.3 1601.3 1458.6 1424.4 1786.9 1628.4 1467.4 1476.2 1823 1736.7 1482.7 1334.2 1871.9 1752.9 1471.6 1583.2 1601.4 1987.7 1649.6 1530.9 1547.1 2165.2 1852 1656.9 1605.2 2184.6 1972 1617.6 1491.1 1709.5 2042.2 1667.1 1542.6 1497.6 2090.5 1816.8 1487.5 1468.2 2228.5 1889.9 1690.8 1395.7 1532.8 1934.4 1557.1 1570.6 1453.2 1669.6 1782 1526.1 1411 1608.1 1740.5 1492.3 1477.8 1102.6 1366.1 1701.1 1500.6 1403.2 1787.2 1776.6 1465.3 1429.5')
x = scan(x)
y = textConnection('29.8 22.6 26 24.8 28.9 27.3 26 29.2 28.2 23.9 24.5 23.6 21.1 22 20.7 19.9 22.8 25 21.6 19.1 27.2 23.7 24.2 22.4 25.5 25.4 23.4 24.7 27.4 23.4 25.8 28.8 27.7 23.7 22.9 29.4 22.6 28.6 22.2 27.6 26.2 26.2 29.8 31.5 24.5 28.7 25.9 26.9 25.9 30.5 30.5 29.4 29.3 31.4 30 27.9 28.5 26.4 29.5 28.4 25.1 24.6 21.1 23.6 20.5 23.7 25.3 20.2 23.4 21.1 23.1 24.6 20.7 20.7 26.9 24.1 24.7 25.8 26.7 26 28.9 29.5 27.4 22.1 31.6 25 27.4 30.4 28.9 27.4 22.5 28.4 28.7 31.1 29.3 28.3 30.6 28.6 26 26.2 26.2 26.7 25.6 31.5 30.9')
y = scan(y)
I'm using R with the dtw package, but I'm not familiar with these kinds of algorithms.
Thanks for any help!
You could try the ccf() function in base R. This estimates the cross-correlation function of the two time series.
For example, using your data (see below if interested in how I got the data you pasted into your Question into R objects x and y)
xyccf <- ccf(x, y)
yielding
> xyccf
Autocorrelations of series ‘X’, by lag
   -17    -16    -15    -14    -13    -12    -11    -10     -9     -8     -7
 0.106  0.092  0.014  0.018  0.011  0.029 -0.141 -0.153 -0.107 -0.141 -0.221
    -6     -5     -4     -3     -2     -1      0      1      2      3      4
-0.274 -0.175 -0.277 -0.176 -0.217 -0.253 -0.339 -0.274 -0.267 -0.330 -0.278
     5      6      7      8      9     10     11     12     13     14     15
-0.184 -0.120 -0.200 -0.156 -0.184 -0.062 -0.076 -0.117 -0.048  0.015 -0.016
    16     17
-0.038 -0.029
and a plot of the cross-correlation at each lag (figure not reproduced here).
To interpret this, when the lag is positive, y is leading x whereas when the lag is negative x is leading y.
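To actually pick a shift, one option is to take the lag with the largest absolute cross-correlation and then correlate the two series after offsetting them by that lag. The following is a minimal sketch on synthetic data built so that y leads x by three steps. The helper shifted_cor is a made-up name and not part of any package.

```r
# Synthetic example: y[t] equals x[t + 3] up to noise, so y leads x by 3 steps
set.seed(1)
n <- 200
z <- rnorm(n + 3)
x <- z[1:n] + rnorm(n, sd = 0.1)
y <- z[4:(n + 3)]

# Cross-correlation without plotting; lag k estimates cor(x[t + k], y[t])
cc <- ccf(x, y, lag.max = 10, plot = FALSE)
best_lag <- cc$lag[which.max(abs(cc$acf))]

# Hypothetical helper: correlation of x[t + k] with y[t] on the overlapping part
shifted_cor <- function(x, y, k) {
  n <- length(x)
  if (k >= 0) {
    cor(x[(1 + k):n], y[1:(n - k)])
  } else {
    cor(x[1:(n + k)], y[(1 - k):n])
  }
}

best_lag                       # lag with the strongest cross-correlation
shifted_cor(x, y, 0)           # correlation without any shift
shifted_cor(x, y, best_lag)    # correlation after shifting by best_lag
```

On real data, keep in mind that scanning many lags and keeping the best one will tend to overstate the relationship, so treat the result as exploratory.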
Reading your data into R...
x <- scan(text = "1530.2 1980.9 1811 1617 1585.4 1951.8 2146.6 1605 1395.2 1742.6
2206.5 1839.4 1699.1 1665.9 2144.7 2189.1 1718.4 1615.5 2003.3
2267.6 1772.1 1635.2 1836 2261.8 1799.1 1634.9 1638.6 2056.5
2201.4 1726.8 1586.4 1747.9 1982 1695.2 1624.9 1652.4 2011.9
1788.8 1568.4 1540.7 1866.1 2097.3 1601.3 1458.6 1424.4 1786.9
1628.4 1467.4 1476.2 1823 1736.7 1482.7 1334.2 1871.9 1752.9
1471.6 1583.2 1601.4 1987.7 1649.6 1530.9 1547.1 2165.2 1852
1656.9 1605.2 2184.6 1972 1617.6 1491.1 1709.5 2042.2 1667.1
1542.6 1497.6 2090.5 1816.8 1487.5 1468.2 2228.5 1889.9 1690.8
1395.7 1532.8 1934.4 1557.1 1570.6 1453.2 1669.6 1782 1526.1
1411 1608.1 1740.5 1492.3 1477.8 1102.6 1366.1 1701.1 1500.6
1403.2 1787.2 1776.6 1465.3 1429.5")
y <- scan(text = "29.8 22.6 26 24.8 28.9 27.3 26 29.2 28.2 23.9 24.5 23.6 21.1 22
20.7 19.9 22.8 25 21.6 19.1 27.2 23.7 24.2 22.4 25.5 25.4 23.4
24.7 27.4 23.4 25.8 28.8 27.7 23.7 22.9 29.4 22.6 28.6 22.2 27.6
26.2 26.2 29.8 31.5 24.5 28.7 25.9 26.9 25.9 30.5 30.5 29.4 29.3
31.4 30 27.9 28.5 26.4 29.5 28.4 25.1 24.6 21.1 23.6 20.5 23.7
25.3 20.2 23.4 21.1 23.1 24.6 20.7 20.7 26.9 24.1 24.7 25.8 26.7
26 28.9 29.5 27.4 22.1 31.6 25 27.4 30.4 28.9 27.4 22.5 28.4 28.7
31.1 29.3 28.3 30.6 28.6 26 26.2 26.2 26.7 25.6 31.5 30.9")

Inserting another column to a data frame and incrementing its value per row

I have this data frame:
head(df,10)
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
3 36.4 13.1 13.9 36.6 9.26 57.9 28.0 34.96 26049 3492
4 31.1 11.2 12.6 45.1 7.81 48.8 25.9 37.85 17515 2754
5 33.2 13.4 13.2 40.3 8.69 54.3 26.9 35.67 23510 3265
6 34.0 12.8 13.7 39.4 8.77 54.8 26.5 35.19 25151 3305
7 32.7 12.4 13.6 41.3 8.49 53.0 25.9 35.97 25214 3201
8 33.4 13.7 12.5 40.3 8.76 54.7 27.1 36.50 23943 3391
9 35.2 13.8 13.5 37.5 9.20 57.5 27.8 33.08 25647 3385
10 34.6 14.9 14.9 35.6 9.35 58.4 27.8 35.81 27324 3790
11 30.4 13.3 13.0 43.3 8.29 51.8 24.9 38.31 25178 2881
12 32.0 13.3 14.0 40.7 8.58 53.6 26.1 35.97 25677 3162
I have a DateTime value like this:
DateTime<-Sys.time()
I would like to insert another column into this df and increment the DateTime value by 30 seconds for each row.
I'm doing this:
for (i in 1:nrow(df)) {
df[1,]$DateTime<-DateTime
DateTime<-DateTime+30
}
This loop is not doing what I'm trying to do. Any help is greatly appreciated.
df$DateTime <- Sys.time() + 30 * (seq_len(nrow(df))-1)
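The loop in the question indexes df[1,] on every iteration instead of df[i,], so no loop variable ever reaches the other rows. The one-liner above avoids the loop because adding a numeric vector to a POSIXct value adds seconds element-wise. Here is a minimal sketch of that, assuming a small made-up data frame and a fixed start time in place of Sys.time() so the result is reproducible. It also shows seq() as an equivalent way to build the column.

```r
# Small made-up data frame standing in for the question's df
df <- data.frame(V1 = c(36.4, 31.1, 33.2, 34.0),
                 V2 = c(13.1, 11.2, 13.4, 12.8))

# Fixed start time (an assumption; the question uses Sys.time())
start <- as.POSIXct("2024-01-01 00:00:00", tz = "UTC")

# Offsets 0, 30, 60, ... seconds, one per row, added to the start time
df$DateTime <- start + 30 * (seq_len(nrow(df)) - 1)

# Equivalent: let seq() step through the timestamps in 30-second increments
alt <- seq(from = start, by = 30, length.out = nrow(df))

df
```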

gnuplot input file 7 columns with decimals

I am trying to graph the following data file:
61.0 16.4 100.0 28.6 28.6 12.2 12.2
59.0 25.4 100.0 21.4 21.4 11.8 11.8
69.0 15.9 100.0 35.7 35.7 11.5 11.5
59.0 23.7 100.0 23.4 23.4 11.8 11.8
49.0 20.4 100.0 18.0 18.0 9.8 9.8
84.0 13.1 90.9 50.8 50.8 16.8 16.8
59.0 16.9 100.0 22.6 22.6 11.8 11.8
71.0 16.9 100.0 32.8 32.8 14.2 14.2
68.0 19.1 100.0 26.2 26.2 13.6 13.6
91.0 13.2 100.0 51.6 51.6 18.2 18.2
57.0 22.8 100.0 29.4 29.4 11.4 11.4
52.0 26.9 100.0 17.8 17.8 10.4 10.4
55.0 21.8 100.0 32.2 32.2 11.0 11.0
68.0 19.1 100.0 29.8 29.8 13.6 13.6
50.0 22.0 100.0 19.0 19.0 10.0 10.0
149.0 12.1 66.7 111.2 111.2 29.8 29.8
69.0 20.3 100.0 29.8 29.8 13.8 13.8
I am very new to gnuplot and can't seem to figure out the correct commands to produce the graph I want.
I was trying something like this:
gnuplot> set output 'datastore1.png'
gnuplot> plot 'desktop1.dat' using 0:1 title "totalio" with lines, 'desktop1.dat' using 0:2 title "readpercentage" with lines, 'desktop1.dat' using 0:3 title "cachehitpercentage" with lines, 'desktop1.dat' using 0:4 title "currentkbpersecond" with lines, 'desktop1.dat' using 0:5 title "maximumkbpersecond" with lines, 'desktop1.dat' using 0:6 title "currentiopersecond" with lines, 'desktop1.dat' using 0:7 title "maximumiopersecond" with lines
gnuplot> quit
However the graph is not exactly correct.
Thanks for the help!
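Not sure what you are trying to plot here, but I think the issue is that you are using column 0 in the 'using' specifier. In gnuplot that is a pseudo-column holding the record index, not a column from your file. To plot one data column against another, use this instead: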
p 'desktop1.dat' u 1:2, 'desktop1.dat' u 1:3
Edit:
So when you are plotting against time, you might want to add another column to the data that you read in from the file such that you have
15 61.0 16.4 100.0 28.6 28.6 12.2 12.2
as an example for the first line of your data. Afterwards you can use the plotting command I gave above.
