Subsetting and Looping Over Time Series Data in R

I have a time series dataset spanning 30 years. I subset it for the month and day I want (shown in the code below). Is there a way to loop over each month and the days within each month? Also, is there a way to save the plots automatically, in separate folders corresponding to each month? Right now I do it manually by changing the month and day in dfOct31all <- df [ which(df$Month==10 & df$Day==31), ] in the code below, then plotting and saving. By the way, I'm using RStudio.
Can someone please guide me?
Thanks!
setwd("WDir")
df <- read.csv("Velocity.csv", header = TRUE)
attach(df)
#Day 31
dfOct31all <- df [ which(df$Month==10 & df$Day==31), ]
dfall31Mbs <- dfOct31all[c(-1,-2,-3)]
densities <- lapply(dfall31Mbs, density)
par(mfcol=c(5,5), oma=c(1,1,0,0), mar=c(1,1,1,0), tcl=-0.1, mgp=c(0,0,0))
plot(densities[[1]], col = "black", main = "1000mb", xlab = NA, ylab = NA)
plot(densities[[2]], col = "black", main = "925mb", xlab = NA, ylab = NA)
plot(densities[[3]], col = "black", main = "850mb", xlab = NA, ylab = NA)
plot(densities[[4]], col = "black", main = "700mb", xlab = NA, ylab = NA)
plot(densities[[5]], col = "black", main = "600mb", xlab = NA, ylab = NA)
plot(densities[[6]], col = "black", main = "500mb", xlab = NA, ylab = NA)
plot(densities[[7]], col = "black", main = "400mb", xlab = NA, ylab = NA)
plot(densities[[8]], col = "black", main = "300mb", xlab = NA, ylab = NA)
plot(densities[[9]], col = "black", main = "250mb", xlab = NA, ylab = NA)
plot(densities[[10]], col = "black", main = "200mb", xlab = NA, ylab = NA)
plot(densities[[11]], col = "black", main = "150mb", xlab = NA, ylab = NA)
plot(densities[[12]], col = "black", main = "100mb", xlab = NA, ylab = NA)
plot(densities[[13]], col = "black", main = "70mb", xlab = NA, ylab = NA)
plot(densities[[14]], col = "black", main = "50mb", xlab = NA, ylab = NA)
plot(densities[[15]], col = "black", main = "30mb", xlab = NA, ylab = NA)
plot(densities[[16]], col = "black", main = "20mb", xlab = NA, ylab = NA)
plot(densities[[17]], col = "black", main = "10mb", xlab = NA, ylab = NA)
A snippet of the data is shown below:
Year Month Day 1000mb 925mb 850mb 700mb 600mb 500mb 400mb 300mb 250mb 200mb 150mb 100mb 70mb 50mb 30mb 20mb 10mb
1984 10 31 6 6.6 7.9 11.5 14.6 17 20.8 25.8 26.4 25.3 24.4 22.7 19.9 19.2 20.4 24.8 30.8
1985 10 31 5.8 7.1 7.7 11.5 14.7 17.3 25.3 32.6 32.9 32.4 27.1 20.9 14.2 9.7 6.4 7.3 7.4
1986 10 31 4.3 6.1 7.7 11.3 18.4 26.3 34.4 44.5 48.9 46.2 34.5 20.4 13.8 13.2 21.7 31 46.4
1987 10 31 2.2 2.9 4 7 9 13.9 19.9 25.8 26.6 23.7 17.3 12 7 3.1 1.7 5.8 14.1
1988 10 31 2.5 2.1 2.3 6.5 6.4 5.1 7.4 12.1 13.4 16.1 16.7 15.2 8.8 5 2.8 6.2 8.9
1989 10 31 3.4 4 4.7 4.4 4.1 4 4.6 4.8 5.9 5.6 10.9 13.9 12.3 10.4 8.1 8 8
1990 10 31 4 4.9 7.5 14.6 19 21.9 25.7 28.3 29.4 29.2 27.3 18 12.6 10.1 9 12 19.9
1991 10 31 2.8 3.2 4 10.8 12.1 11.2 9.9 9.1 9.9 12.8 18 17.5 10.4 6.3 4.2 7.6 11.7
1992 10 31 5.9 6.9 7.9 13.1 17.9 25.2 34.6 47.3 53.3 53 42.4 21.3 11.6 6 4.6 8.5 12.8
1993 10 31 2.3 1.5 0.4 3.6 6.3 10.1 14.3 19.1 21.6 21.8 18.4 13.6 12.3 9.5 6.9 11 18.1
1994 10 31 2 2.2 3.8 11.6 17 19.8 23.6 24.9 25.5 26.2 28.4 25.2 16.7 13.6 9.3 8.3 9.8
1995 10 31 1.5 2 3.4 7.6 9.1 11.2 13.7 17.9 20.3 21.7 21.1 16.7 13 12.1 14.9 21.4 27.3
1996 10 31 1.9 2.4 3.5 8 11.7 17.4 26.4 35.6 33.3 24.6 12.4 4.1 0.5 3.4 7.2 9.4 11.6
1997 10 31 3.7 4.8 7.8 19.2 24.6 29.6 35.6 41 41.8 42 37.9 23.7 11.2 8.6 4.2 3.8 7
1998 10 31 0.7 1.1 0.9 4.8 8.4 11.4 14 25.3 29.7 25.2 15.9 6.6 2.1 1 4.5 8.9 6.1
1999 10 31 1.9 1.6 2.4 10.7 15.3 19 23.2 29 32.4 31.9 28 20.3 10.8 9.4 12 14.5 16.9
2000 10 31 5.1 5.8 6.7 12.8 18.2 23.9 29.9 40.7 42.2 33.7 23.5 12.7 2.6 1.6 3.8 4.7 5.1
2001 10 31 5.7 6.1 7.1 10.1 10.8 14.7 18.3 22.8 22.3 22.2 22 14 9.5 6.6 5.2 6.5 8.6
2002 10 31 1.4 1.6 1.8 9.2 14.5 19.5 24.8 30 30.5 27.6 22.2 13.9 9.1 7.1 8.5 16.1 23.8
2003 10 31 1.5 1.3 0.7 1 3.5 6 11.7 21.5 21.9 22.9 23 20.7 15.8 12.5 14.5 20.1 26
2004 10 31 5.4 5.6 6.9 14.4 23.3 33.3 46.1 60.9 62.1 54.6 42.9 28 17.3 12.3 10.1 13.6 13.3
2005 10 31 1.7 1.3 3 10.3 15.8 19.5 21.1 22.8 24.1 24.5 24.5 20.6 13.5 10.7 10 10.7 10.4
2006 10 31 2.3 1.5 1.7 8.7 12.5 15.9 18.7 20.5 21.8 24.3 29.9 25.3 18.3 12.8 7.7 8.8 12.4
2007 10 31 3.7 2.7 2.3 2.2 2.6 4.2 6.5 11.9 15.9 19.6 17.2 9.5 6.9 5.7 4.9 5.8 11.7
2008 10 31 7.7 10.8 14.3 20.3 23 25.8 27.4 32.1 35.4 34.8 25.8 13.2 7.1 2.9 2.6 3.4 6
2009 10 31 0.5 0.2 2 9.3 13.5 17.6 18.8 20.8 21.4 21.2 18.9 14.2 11.1 6.4 1.9 3 8
2010 10 31 5.6 6.8 8.5 13.4 16.5 20.3 23.8 26.8 31 28.1 24 15.7 9.9 7 4.8 3.9 1.8
2011 10 31 5.9 6.7 5.6 7.9 10.3 11.8 12.5 16.2 19.5 21.4 17.9 13.2 9.6 7.9 8 8.3 10.8
2012 10 31 4.8 6.3 9.4 19.5 24.2 27.2 27.5 27.3 27.7 30.7 27.5 16.7 10 7.6 8 13.8 19.7
2013 10 31 1.4 1.9 3.9 9.1 13.1 17.3 22.9 29.7 30.4 27.3 23.5 18.2 13.1 6.3 4.4 2.4 9.4

I wrote it out for each day rather than doing a loop.
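One way to avoid writing out every month and day by hand is to loop over the Month/Day combinations and write each figure with png()/dev.off(). This is only a sketch, assuming the column layout shown in the snippet above; the plots/Month_XX folder layout and the file names are illustrative choices, and read.csv() will usually prefix the numeric column names (1000mb, 925mb, ...) with an "X".

# Sketch: loop over every month/day combination, draw the density panels,
# and save one PNG per day inside a folder named after the month.
df <- read.csv("Velocity.csv", header = TRUE)
mb_cols <- names(df)[-(1:3)]                      # the pressure-level columns
for (m in sort(unique(df$Month))) {
  month_dir <- file.path("plots", sprintf("Month_%02d", m))
  dir.create(month_dir, recursive = TRUE, showWarnings = FALSE)
  for (d in sort(unique(df$Day[df$Month == m]))) {
    day_dat <- df[df$Month == m & df$Day == d, mb_cols]
    if (nrow(day_dat) < 2) next                   # density() needs more than one value
    densities <- lapply(day_dat, density)
    png(file.path(month_dir, sprintf("Day_%02d.png", d)), width = 1200, height = 1200)
    par(mfcol = c(5, 5), oma = c(1, 1, 0, 0), mar = c(1, 1, 1, 0),
        tcl = -0.1, mgp = c(0, 0, 0))
    for (j in seq_along(densities)) {
      plot(densities[[j]], col = "black", main = sub("^X", "", mb_cols[j]),
           xlab = NA, ylab = NA)
    }
    dev.off()
  }
}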

Related

How can I export these 2 matrices into one Excel file with proper spacing and formatting in RStudio?

Truck location coordinates:
X[Now] Y[Now]
A 5.4 15.4
B 8.3 9.0
C 6.6 5.2
D 6.5 13.5
E 15.0 1.9
Load location coordinates, from print(Bcd):
Pick-up-X Pick-up-Y Drop-off-X Drop-off-Y
1 18.3 0.5 4.0 13.9
2 11.1 0.1 17.1 18.9
3 20.0 8.9 18.4 7.4
4 4.4 18.2 8.6 15.0
5 12.7 2.9 4.0 0.7
6 5.2 10.7 16.9 18.9
7 18.5 19.0 4.8 9.5
8 8.2 17.3 0.6 4.6
9 11.5 0.5 3.4 11.4
10 2.1 11.3 11.4 0.1
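No answer is recorded for this one, but a minimal sketch with the openxlsx package might look like the following. It assumes the truck coordinates are stored in an object called Trucks (a made-up name for illustration) and the load coordinates in Bcd as printed above; the sheet name, blank-row gap, and output file name are arbitrary choices.

# Sketch: write both matrices to one sheet, with a title row above each and a gap between them.
library(openxlsx)
wb <- createWorkbook()
addWorksheet(wb, "Coordinates")
writeData(wb, "Coordinates", "Truck location coordinates", startRow = 1)
writeData(wb, "Coordinates", Trucks, startRow = 2, rowNames = TRUE)   # Trucks is an assumed object name
writeData(wb, "Coordinates", "Load location coordinates", startRow = nrow(Trucks) + 4)
writeData(wb, "Coordinates", Bcd, startRow = nrow(Trucks) + 5, rowNames = TRUE)
saveWorkbook(wb, "coordinates.xlsx", overwrite = TRUE)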

Deep Nested Tag IDs not found using BeautifulSoup

I am trying to scrape NBA data from https://www.basketball-reference.com/leagues/NBA_2019.html, but I am running into issues where BeautifulSoup drops deeply nested tags.
I tried to use soup.find(id='opponent-stats-per_game') to grab the "Opponent Per Game Stats" table, but I get a None result. If I instead find a div that is higher up in the tree, the deeper children are clipped.
Could someone please offer me some guidance on how this works? I am fairly new to web scraping with BeautifulSoup.
The reference.com sites are partially dynamic. I had the same issue a long while back when trying to figure out football-reference.com.
There are a couple of ways to handle it. One is to use Selenium to render the page first and then grab the tables. You can still use BeautifulSoup for that, but whenever I see <table> tags, my first try is pandas and .read_html(), as that will do most of the work on the tables for you.
This returns a list of dataframes. It's then just a matter of finding which dataframe you want, and then possibly doing a little manipulation of column names and the like to get it the way you need.
Doing this, your opponent stats per game table was in index position 19:
from bs4 import BeautifulSoup
from selenium import webdriver
import pandas as pd
driver = webdriver.Chrome('C:/chromedriver_win32/chromedriver.exe')
page_url = 'https://www.basketball-reference.com/leagues/NBA_2019.html'
driver.get(page_url)
tables = pd.read_html(driver.page_source)
opp_per_gm_df = tables[19]
driver.quit()
Output:
print (opp_per_gm_df)
Rk Team G MP FG ... STL BLK TOV PF PTS
0 1.0 Memphis Grizzlies 77 242.3 37.2 ... 7.7 4.9 15.5 21.7 105.6
1 2.0 Miami Heat 77 240.3 38.2 ... 7.4 4.8 14.2 20.3 105.6
2 3.0 Indiana Pacers* 78 240.3 38.7 ... 7.5 5.2 15.6 20.1 104.3
3 4.0 Utah Jazz* 77 240.6 39.7 ... 8.6 4.7 13.9 22.2 106.1
4 5.0 Denver Nuggets* 77 240.6 39.6 ... 7.5 5.0 13.5 20.5 106.9
5 6.0 Detroit Pistons 77 242.3 40.0 ... 6.9 5.2 14.1 21.5 107.5
6 7.0 Orlando Magic 78 241.3 39.9 ... 6.9 4.4 13.0 18.8 106.5
7 8.0 Boston Celtics* 78 241.3 39.5 ... 6.8 3.8 15.2 19.6 108.0
8 9.0 Toronto Raptors* 78 242.2 40.2 ... 7.7 4.5 15.1 20.6 108.4
9 10.0 Dallas Mavericks 77 241.0 40.9 ... 7.9 4.6 13.1 23.4 109.9
10 11.0 Milwaukee Bucks* 78 241.3 40.3 ... 6.9 4.9 13.4 20.0 108.6
11 12.0 Portland Trail Blazers* 77 242.3 41.1 ... 7.3 5.1 12.4 20.8 110.5
12 13.0 Houston Rockets* 78 241.9 40.4 ... 7.4 4.6 15.0 20.1 109.3
13 14.0 Golden State Warriors* 77 241.6 40.3 ... 7.6 3.7 13.5 19.8 111.4
14 15.0 San Antonio Spurs* 78 241.6 41.6 ... 7.2 4.1 12.2 19.7 110.4
15 16.0 Philadelphia 76ers* 77 241.6 41.5 ... 7.9 4.0 12.9 22.3 112.2
16 17.0 Charlotte Hornets 77 241.9 42.0 ... 7.1 6.1 13.6 20.6 112.2
17 18.0 Oklahoma City Thunder* 78 242.2 40.8 ... 8.2 5.1 16.9 22.6 110.8
18 19.0 Brooklyn Nets 78 243.8 42.2 ... 7.8 5.4 13.5 22.3 112.5
19 20.0 Minnesota Timberwolves 77 241.9 42.0 ... 6.6 5.6 14.7 22.0 114.0
20 21.0 New York Knicks 77 241.3 42.0 ... 7.4 5.7 13.4 21.0 114.1
21 22.0 Chicago Bulls 78 242.9 42.1 ... 7.5 5.6 13.5 18.9 113.4
22 23.0 Los Angeles Clippers* 78 241.6 41.4 ... 8.2 5.9 13.1 24.0 113.4
23 24.0 Los Angeles Lakers 78 241.3 42.1 ... 8.3 5.1 14.3 21.0 113.7
24 25.0 Cleveland Cavaliers 78 241.0 43.0 ... 6.9 5.6 12.5 19.6 113.9
25 26.0 Sacramento Kings 78 240.6 41.9 ... 7.7 5.1 15.9 21.6 114.9
26 27.0 Phoenix Suns 78 242.2 42.2 ... 9.1 5.0 15.6 20.7 116.3
27 28.0 New Orleans Pelicans 78 240.6 43.2 ... 8.4 5.4 13.8 21.3 116.5
28 29.0 Washington Wizards 78 243.2 43.3 ... 7.8 4.6 15.9 21.4 116.9
29 30.0 Atlanta Hawks 78 242.2 42.6 ... 9.9 5.4 15.1 22.0 118.8
30 NaN League Average 78 241.7 41.0 ... 7.7 5.0 14.2 21.0 111.1
[31 rows x 25 columns]

estimate phase and amplitude of a seasonal cycle

I have the following data:
CET <- url("http://www.metoffice.gov.uk/hadobs/hadcet/cetml1659on.dat")
cet <- read.table(CET, sep = "", skip = 6, header = TRUE,
fill = TRUE, na.string = c(-99.99, -99.9))
names(cet) <- c(month.abb, "Annual")
cet <- cet[-nrow(cet), ]
rn <- as.numeric(rownames(cet))
Years <- rn[1]:rn[length(rn)]
annCET <- data.frame(Temperature = cet[, ncol(cet)],Year = Years)
cet <- cet[, -ncol(cet)]
cet <- stack(cet)[,2:1]
names(cet) <- c("Month","Temperature")
cet <- transform(cet, Year = (Year <- rep(Years, times = 12)),
nMonth = rep(1:12, each = length(Years)),
Date = as.Date(paste(Year, Month, "15", sep = "-"),format = "%Y-%b-%d"))
cet <- cet[with(cet, order(Date)), ]
idx <- cet$Year > 1900
cet <- cet[idx,]
cet <- cet[,c('Date','Temperature')]
plot(cet, type = 'l')
This demonstrates the monthly temperature cycle from 1900 to 2014 in England, UK.
I would like to evaluate the phase and amplitude of the seasonal cycle of temperature following the methods outlined in this paper. Specifically, they describe that given 12 monthly values (as we have here) the yearly component can be estimated as

Y(t_0) = (2/12) \sum_{t=0.5}^{11.5} x(t + t_0)\, e^{i 2\pi t / 12},

where X(t) represents the 12 monthly values of surface temperature, x(t + t_0), t = 0.5, ..., 11.5, are the 12 monthly values of the de-meaned monthly temperature, and the factor of two accounts for both positive and negative frequencies.
The amplitude and phase of the seasonal cycle can then be calculated as

A = |Y|

and

\phi = \arctan\big(\mathrm{Im}(Y) / \mathrm{Re}(Y)\big).

They specify that, for each year of data, they calculate the yearly (one cycle per year) sinusoidal component using the Fourier transform, as in the equation shown above.
I'm a bit stuck on how to generate the time series they demonstrate here. Can anyone please provide some guidance on how I can reproduce these methods? Note that I also work in MATLAB, in case anyone has suggestions for how this could be achieved in that environment.
Here is a subset of the data.
Date Temperature
1980-01-15 2.3
1980-02-15 5.7
1980-03-15 4.7
1980-04-15 8.8
1980-05-15 11.2
1980-06-15 13.8
1980-07-15 14.7
1980-08-15 15.9
1980-09-15 14.7
1980-10-15 9
1980-11-15 6.6
1980-12-15 5.6
1981-01-15 4.9
1981-02-15 3
1981-03-15 7.9
1981-04-15 7.8
1981-05-15 11.2
1981-06-15 13.2
1981-07-15 15.5
1981-08-15 16.2
1981-09-15 14.5
1981-10-15 8.6
1981-11-15 7.8
1981-12-15 0.3
1982-01-15 2.6
1982-02-15 4.8
1982-03-15 6.1
1982-04-15 8.6
1982-05-15 11.6
1982-06-15 15.5
1982-07-15 16.5
1982-08-15 15.7
1982-09-15 14.2
1982-10-15 10.1
1982-11-15 8
1982-12-15 4.4
1983-01-15 6.7
1983-02-15 1.7
1983-03-15 6.4
1983-04-15 6.8
1983-05-15 10.3
1983-06-15 14.4
1983-07-15 19.5
1983-08-15 17.3
1983-09-15 13.7
1983-10-15 10.5
1983-11-15 7.5
1983-12-15 5.6
1984-01-15 3.8
1984-02-15 3.3
1984-03-15 4.7
1984-04-15 8.1
1984-05-15 9.9
1984-06-15 14.5
1984-07-15 16.9
1984-08-15 17.6
1984-09-15 13.7
1984-10-15 11.1
1984-11-15 8
1984-12-15 5.2
1985-01-15 0.8
1985-02-15 2.1
1985-03-15 4.7
1985-04-15 8.3
1985-05-15 10.9
1985-06-15 12.7
1985-07-15 16.2
1985-08-15 14.6
1985-09-15 14.6
1985-10-15 11
1985-11-15 4.1
1985-12-15 6.3
1986-01-15 3.5
1986-02-15 -1.1
1986-03-15 4.9
1986-04-15 5.8
1986-05-15 11.1
1986-06-15 14.8
1986-07-15 15.9
1986-08-15 13.7
1986-09-15 11.3
1986-10-15 11
1986-11-15 7.8
1986-12-15 6.2
1987-01-15 0.8
1987-02-15 3.6
1987-03-15 4.1
1987-04-15 10.3
1987-05-15 10.1
1987-06-15 12.8
1987-07-15 15.9
1987-08-15 15.6
1987-09-15 13.6
1987-10-15 9.7
1987-11-15 6.5
1987-12-15 5.6
1988-01-15 5.3
1988-02-15 4.9
1988-03-15 6.4
1988-04-15 8.2
1988-05-15 11.9
1988-06-15 14.4
1988-07-15 14.7
1988-08-15 15.2
1988-09-15 13.2
1988-10-15 10.4
1988-11-15 5.2
1988-12-15 7.5
1989-01-15 6.1
1989-02-15 5.9
1989-03-15 7.5
1989-04-15 6.6
1989-05-15 13
1989-06-15 14.6
1989-07-15 18.2
1989-08-15 16.6
1989-09-15 14.7
1989-10-15 11.7
1989-11-15 6.2
1989-12-15 4.9
1990-01-15 6.5
1990-02-15 7.3
1990-03-15 8.3
1990-04-15 8
1990-05-15 12.6
1990-06-15 13.6
1990-07-15 16.9
1990-08-15 18
1990-09-15 13.2
1990-10-15 11.9
1990-11-15 6.9
1990-12-15 4.3
1991-01-15 3.3
1991-02-15 1.5
1991-03-15 7.9
1991-04-15 7.9
1991-05-15 10.8
1991-06-15 12.1
1991-07-15 17.3
1991-08-15 17.1
1991-09-15 14.7
1991-10-15 10.2
1991-11-15 6.8
1991-12-15 4.7
1992-01-15 3.7
1992-02-15 5.4
1992-03-15 7.5
1992-04-15 8.7
1992-05-15 13.6
1992-06-15 15.7
1992-07-15 16.2
1992-08-15 15.3
1992-09-15 13.4
1992-10-15 7.8
1992-11-15 7.4
1992-12-15 3.6
1993-01-15 5.9
1993-02-15 4.6
1993-03-15 6.7
1993-04-15 9.5
1993-05-15 11.4
1993-06-15 15
1993-07-15 15.2
1993-08-15 14.6
1993-09-15 12.4
1993-10-15 8.5
1993-11-15 4.6
1993-12-15 5.5
1994-01-15 5.3
1994-02-15 3.2
1994-03-15 7.7
1994-04-15 8.1
1994-05-15 10.7
1994-06-15 14.5
1994-07-15 18
1994-08-15 16
1994-09-15 12.7
1994-10-15 10.2
1994-11-15 10.1
1994-12-15 6.4
1995-01-15 4.8
1995-02-15 6.5
1995-03-15 5.6
1995-04-15 9.1
1995-05-15 11.6
1995-06-15 14.3
1995-07-15 18.6
1995-08-15 19.2
1995-09-15 13.7
1995-10-15 12.9
1995-11-15 7.7
1995-12-15 2.3
1996-01-15 4.3
1996-02-15 2.5
1996-03-15 4.5
1996-04-15 8.5
1996-05-15 9.1
1996-06-15 14.4
1996-07-15 16.5
1996-08-15 16.5
1996-09-15 13.6
1996-10-15 11.7
1996-11-15 5.9
1996-12-15 2.9
1997-01-15 2.5
1997-02-15 6.7
1997-03-15 8.4
1997-04-15 9
1997-05-15 11.5
1997-06-15 14.1
1997-07-15 16.7
1997-08-15 18.9
1997-09-15 14.2
1997-10-15 10.2
1997-11-15 8.4
1997-12-15 5.8
1998-01-15 5.2
1998-02-15 7.3
1998-03-15 7.9
1998-04-15 7.7
1998-05-15 13.1
1998-06-15 14.2
1998-07-15 15.5
1998-08-15 15.9
1998-09-15 14.9
1998-10-15 10.6
1998-11-15 6.2
1998-12-15 5.5
1999-01-15 5.5
1999-02-15 5.3
1999-03-15 7.4
1999-04-15 9.4
1999-05-15 12.9
1999-06-15 13.9
1999-07-15 17.7
1999-08-15 16.1
1999-09-15 15.6
1999-10-15 10.7
1999-11-15 7.9
1999-12-15 5
2000-01-15 4.9
2000-02-15 6.3
2000-03-15 7.6
2000-04-15 7.8
2000-05-15 12.1
2000-06-15 15.1
2000-07-15 15.5
2000-08-15 16.6
2000-09-15 14.7
2000-10-15 10.3
2000-11-15 7
2000-12-15 5.8
2001-01-15 3.2
2001-02-15 4.4
2001-03-15 5.2
2001-04-15 7.7
2001-05-15 12.6
2001-06-15 14.3
2001-07-15 17.2
2001-08-15 16.8
2001-09-15 13.4
2001-10-15 13.3
2001-11-15 7.5
2001-12-15 3.6
2002-01-15 5.5
2002-02-15 7
2002-03-15 7.6
2002-04-15 9.3
2002-05-15 11.8
2002-06-15 14.4
2002-07-15 16
2002-08-15 17
2002-09-15 14.4
2002-10-15 10.1
2002-11-15 8.5
2002-12-15 5.7
2003-01-15 4.5
2003-02-15 3.9
2003-03-15 7.5
2003-04-15 9.6
2003-05-15 12.1
2003-06-15 16.1
2003-07-15 17.6
2003-08-15 18.3
2003-09-15 14.3
2003-10-15 9.2
2003-11-15 8.1
2003-12-15 4.8
2004-01-15 5.2
2004-02-15 5.4
2004-03-15 6.5
2004-04-15 9.4
2004-05-15 12.1
2004-06-15 15.3
2004-07-15 15.8
2004-08-15 17.6
2004-09-15 14.9
2004-10-15 10.5
2004-11-15 7.7
2004-12-15 5.4
2005-01-15 6
2005-02-15 4.3
2005-03-15 7.2
2005-04-15 8.9
2005-05-15 11.4
2005-06-15 15.5
2005-07-15 16.9
2005-08-15 16.2
2005-09-15 15.2
2005-10-15 13.1
2005-11-15 6.2
2005-12-15 4.4
2006-01-15 4.3
2006-02-15 3.7
2006-03-15 4.9
2006-04-15 8.6
2006-05-15 12.3
2006-06-15 15.9
2006-07-15 19.7
2006-08-15 16.1
2006-09-15 16.8
2006-10-15 13
2006-11-15 8.1
2006-12-15 6.5
2007-01-15 7
2007-02-15 5.8
2007-03-15 7.2
2007-04-15 11.2
2007-05-15 11.9
2007-06-15 15.1
2007-07-15 15.2
2007-08-15 15.4
2007-09-15 13.8
2007-10-15 10.9
2007-11-15 7.3
2007-12-15 4.9
2008-01-15 6.6
2008-02-15 5.4
2008-03-15 6.1
2008-04-15 7.9
2008-05-15 13.4
2008-06-15 13.9
2008-07-15 16.2
2008-08-15 16.2
2008-09-15 13.5
2008-10-15 9.7
2008-11-15 7
2008-12-15 3.5
2009-01-15 3
2009-02-15 4.1
2009-03-15 7
2009-04-15 10
2009-05-15 12.1
2009-06-15 14.8
2009-07-15 16.1
2009-08-15 16.6
2009-09-15 14.2
2009-10-15 11.6
2009-11-15 8.7
2009-12-15 3.1
2010-01-15 1.4
2010-02-15 2.8
2010-03-15 6.1
2010-04-15 8.8
2010-05-15 10.7
2010-06-15 15.2
2010-07-15 17.1
2010-08-15 15.3
2010-09-15 13.8
2010-10-15 10.3
2010-11-15 5.2
2010-12-15 -0.7
2011-01-15 3.7
2011-02-15 6.4
2011-03-15 6.7
2011-04-15 11.8
2011-05-15 12.2
2011-06-15 13.8
2011-07-15 15.2
2011-08-15 15.4
2011-09-15 15.1
2011-10-15 12.6
2011-11-15 9.6
2011-12-15 6
2012-01-15 5.4
2012-02-15 3.8
2012-03-15 8.3
2012-04-15 7.2
2012-05-15 11.7
2012-06-15 13.5
2012-07-15 15.5
2012-08-15 16.6
2012-09-15 13
2012-10-15 9.7
2012-11-15 6.8
2012-12-15 4.8
2013-01-15 3.5
2013-02-15 3.2
2013-03-15 2.7
2013-04-15 7.5
2013-05-15 10.4
2013-06-15 13.6
2013-07-15 18.3
2013-08-15 16.9
2013-09-15 13.7
2013-10-15 12.5
2013-11-15 6.2
2013-12-15 6.3
2014-01-15 5.7
2014-02-15 6.2
2014-03-15 7.6
2014-04-15 10.2
2014-05-15 12.2
2014-06-15 15.1
2014-07-15 17.7
2014-08-15 14.9
2014-09-15 15.1
2014-10-15 12.5
2014-11-15 8.6
2014-12-15 5.2
Literally, the formula for Y can be represented in MATLAB as:
t = 0.5:1:11.5; %// the 12 monthly values, t = 0.5, 1.5, ..., 11.5
Y = 1/6.*sum(exp(2*pi*i.*t/12).*X(t+t0)); %// add the function for X (the de-meaned monthly values)
phi = atan2(imag(Y), real(Y)); %// seasonal phase
Without knowing the function for X, I can't be sure this can indeed be vectorised or whether you'd have to loop, which can be done like:
t = 0.5:1:11.5; %// the 12 monthly values, t = 0.5, 1.5, ..., 11.5
Ytmp(numel(t),1) = 0; %// initialise output
for ii = 1:numel(t)
    Ytmp(ii,1) = exp(2*pi*i*t(ii)/12)*X(t(ii)+t0);
end
Y = 1/6 * sum(Ytmp);
Just slot in whichever t0 you want, loop over the code above, and you have your time series.
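Since the question is mainly about R, here is a minimal sketch of the same calculation on the cet data frame built above. It assumes every year has a complete set of 12 monthly values, and it uses the exp(i*2*pi*t/12) convention from the MATLAB snippet, which only affects the sign of the phase.

# Sketch: yearly Fourier component, amplitude, and phase of the seasonal cycle.
# Years with missing months are returned as NA.
cet$Year <- as.numeric(format(cet$Date, "%Y"))
mon <- seq(0.5, 11.5, by = 1)                        # month centres within the year
ann <- t(sapply(split(cet$Temperature, cet$Year), function(x) {
  if (length(x) != 12 || anyNA(x)) return(c(amplitude = NA, phase = NA))
  xd <- x - mean(x)                                  # de-meaned monthly values
  Y  <- (2 / 12) * sum(xd * exp(2i * pi * mon / 12)) # yearly (one cycle per year) component
  c(amplitude = Mod(Y), phase = atan2(Im(Y), Re(Y)))
}))
seasonal <- data.frame(Year = as.numeric(rownames(ann)), ann)
head(seasonal)
plot(phase ~ Year, data = seasonal, type = "l")      # e.g. phase of the seasonal cycle over time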

Draw regression line per row in R

I have the following data.
HEIrank1
HEI.ID X2007 X2008 X2009 X2010 X2011 X2012
1 OP 41.8 147.6 90.3 82.9 106.8 63.0
2 MO 20.0 20.8 21.1 20.9 12.6 20.6
3 SD 21.2 32.3 25.7 23.9 25.0 40.1
4 UN 51.8 39.8 19.9 20.9 21.6 22.5
5 WS 18.0 19.9 15.3 13.6 15.7 15.2
6 BF 11.5 36.9 20.0 23.2 18.2 23.8
7 ME 34.2 30.3 28.4 30.1 31.5 25.6
8 IM 7.7 18.1 20.5 14.6 17.2 17.1
9 OM 11.4 11.2 12.2 11.1 13.4 19.2
10 DC 14.3 28.7 20.1 17.0 22.3 16.2
11 OC 28.6 44.0 24.9 27.9 34.0 30.7
12 TH 7.4 10.0 5.8 8.8 8.7 8.6
13 CC 12.1 11.0 12.2 12.1 14.9 15.0
14 MM 11.7 24.2 18.4 18.6 31.9 31.7
15 MC 19.0 13.7 17.0 20.4 20.5 12.1
16 SH 11.4 24.8 26.1 12.7 19.9 25.9
17 SB 13.0 22.8 15.9 17.6 17.2 9.6
18 SN 11.5 18.6 22.9 12.0 20.3 11.6
19 ER 10.8 13.2 20.0 11.0 14.9 14.2
20 SL 44.9 21.6 21.3 26.5 17.0 8.0
I tried the following commands to draw a regression line for one HEI.
year <- c(2007 , 2008 , 2009 , 2010 , 2011, 2012)
op <- as.numeric(HEIrank1[1, -1])  # drop the HEI.ID column
lm.r <- lm(op~year)
plot(year, op)
abline(lm.r)
I want to draw a regression line for each college in one graph, and I do not know how. Can you help me?
Here's my approach with ggplot2, but the graph is uninterpretable with that many lines.
library(ggplot2);library(reshape2)
mdat <- melt(HEIrank1, variable.name="year")
mdat$year <- as.numeric(substring(mdat$year, 2))
ggplot(mdat, aes(year, value, colour=HEI.ID, group=HEI.ID)) +
geom_point() + stat_smooth(se = FALSE, method="lm")
Faceting may be a better way to go:
ggplot(mdat, aes(year, value, group=HEI.ID)) +
geom_point() + stat_smooth(se = FALSE, method="lm") +
facet_wrap(~HEI.ID)

Draw histograms per row over multiple columns in R

I'm using R for the analysis in my master's thesis.
I have the following data frame, STOF (student to staff ratio):
HEI.ID X2007 X2008 X2009 X2010 X2011 X2012
1 OP 41.8 147.6 90.3 82.9 106.8 63.0
2 MO 20.0 20.8 21.1 20.9 12.6 20.6
3 SD 21.2 32.3 25.7 23.9 25.0 40.1
4 UN 51.8 39.8 19.9 20.9 21.6 22.5
5 WS 18.0 19.9 15.3 13.6 15.7 15.2
6 BF 11.5 36.9 20.0 23.2 18.2 23.8
7 ME 34.2 30.3 28.4 30.1 31.5 25.6
8 IM 7.7 18.1 20.5 14.6 17.2 17.1
9 OM 11.4 11.2 12.2 11.1 13.4 19.2
10 DC 14.3 28.7 20.1 17.0 22.3 16.2
11 OC 28.6 44.0 24.9 27.9 34.0 30.7
Then I rank the colleges using these commands:
HEIrank1 <- STOF[, -1]                                     # drop the HEI.ID column
HEIrank11 <- data.frame(HEI.ID = STOF$HEI.ID, apply(HEIrank1, 2, rank))
> HEIrank11
HEI.ID X2007 X2008 X2009 X2010 X2011 X2012
1 OP 18.0 20 20.0 20.0 20.0 20
2 MO 14.0 9 13.0 13.5 2.0 12
3 SD 15.0 16 17.0 16.0 16.0 19
4 UN 20.0 18 8.0 13.5 14.0 13
5 WS 12.0 8 4.0 7.0 6.0 8
6 BF 6.5 17 9.5 15.0 10.0 14
7 ME 17.0 15 19.0 19.0 17.0 15
8 IM 2.0 6 12.0 8.0 8.5 10
9 OM 4.5 3 2.5 3.0 3.0 11
10 DC 11.0 14 11.0 9.0 15.0 9
11 OC 16.0 19 16.0 18.0 19.0 17
I would like to draw a histogram for each HEI (for each row). How can I do that?
If you use ggplot you won't need to do it as a loop; you can plot them all at once. Also, you need to reshape your data so that it's in long format, not wide format. You can use the melt function from the reshape2 package to do so.
library(reshape2)
new.df<-melt(HEIrank11,id.vars="HEI.ID")
names(new.df)=c("HEI.ID","Year","Rank")
Here substring is just getting rid of the X in each year:
library(ggplot2)
ggplot(new.df, aes(x=HEI.ID,y=Rank,fill=substring(Year,2)))+
geom_histogram(stat="identity",position="dodge")
Here's a solution in lattice:
require(lattice)
barchart(X2007+X2008+X2009+X2010+X2011+X2012 ~ HEI.ID,
data=HEIrank11,
auto.key=list(space='right')
)
