I'm using R for the analysis in my master's thesis.
I have the following data frame, STOF (student-to-staff ratio):
HEI.ID X2007 X2008 X2009 X2010 X2011 X2012
1 OP 41.8 147.6 90.3 82.9 106.8 63.0
2 MO 20.0 20.8 21.1 20.9 12.6 20.6
3 SD 21.2 32.3 25.7 23.9 25.0 40.1
4 UN 51.8 39.8 19.9 20.9 21.6 22.5
5 WS 18.0 19.9 15.3 13.6 15.7 15.2
6 BF 11.5 36.9 20.0 23.2 18.2 23.8
7 ME 34.2 30.3 28.4 30.1 31.5 25.6
8 IM 7.7 18.1 20.5 14.6 17.2 17.1
9 OM 11.4 11.2 12.2 11.1 13.4 19.2
10 DC 14.3 28.7 20.1 17.0 22.3 16.2
11 OC 28.6 44.0 24.9 27.9 34.0 30.7
Then I rank the colleges using these commands:
HEIrank1 <- STOF[, -1]   # drop the HEI.ID column
HEIrank11 <- data.frame(HEI.ID = STOF$HEI.ID, apply(HEIrank1, 2, rank))
> HEIrank11
HEI.ID X2007 X2008 X2009 X2010 X2011 X2012
1 OP 18.0 20 20.0 20.0 20.0 20
2 MO 14.0 9 13.0 13.5 2.0 12
3 SD 15.0 16 17.0 16.0 16.0 19
4 UN 20.0 18 8.0 13.5 14.0 13
5 WS 12.0 8 4.0 7.0 6.0 8
6 BF 6.5 17 9.5 15.0 10.0 14
7 ME 17.0 15 19.0 19.0 17.0 15
8 IM 2.0 6 12.0 8.0 8.5 10
9 OM 4.5 3 2.5 3.0 3.0 11
10 DC 11.0 14 11.0 9.0 15.0 9
11 OC 16.0 19 16.0 18.0 19.0 17
How can I draw a histogram for each HEI (i.e., for each row)?
If you use ggplot you won't need to do it in a loop; you can plot them all at once. You also need to reshape your data from wide to long format, which you can do with the melt function from the reshape2 package.
library(reshape2)
new.df <- melt(HEIrank11, id.vars = "HEI.ID")
names(new.df) <- c("HEI.ID", "Year", "Rank")
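For reference, after the melt the first few rows of new.df should look like this (one row per HEI-year combination):
head(new.df)
#   HEI.ID  Year Rank
# 1     OP X2007 18.0
# 2     MO X2007 14.0
# 3     SD X2007 15.0
# 4     UN X2007 20.0
# 5     WS X2007 12.0
# 6     BF X2007  6.5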
The substring call below just strips the leading X from each year label:
library(ggplot2)
ggplot(new.df, aes(x = HEI.ID, y = Rank, fill = substring(Year, 2))) +
  geom_col(position = "dodge")  # geom_col() is the current equivalent of geom_histogram(stat = "identity")
Here's a solution in lattice:
require(lattice)
barchart(X2007 + X2008 + X2009 + X2010 + X2011 + X2012 ~ HEI.ID,
         data = HEIrank11,
         auto.key = list(space = "right"))
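(For this to work HEIrank11 has to be a data frame, as constructed above; the + on the left-hand side of the formula plots the year columns as groups, and auto.key adds the matching legend on the right.)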
Rows 11:13 and 14:16 show duplicate entries in column 'C2_xsampa' for 'm:' and 'n:'. Each 'C2_xsampa' value should correspond to exactly one level of 'Consonant' (Singleton or Geminate), but that is not the case for 'm:' and 'n:'. This yields wrong mean values for the numeric columns.
My question is: how do I find which rows are duplicated? I have manually checked the parent dataset from which the mean values are obtained, and all looks fine there.
Earlier, I was using subset() to rectify the 'real' errors in entry.
Data:
C2_xsampa Consonant Speaker C1.dn C2.dn V1.dn V2.dn total.dn
1 "d_d" Singleton 8.5 11.9 7.82 13.0 7.65 40.3
2 "d_d:" Geminate 9 11.6 11.9 11.4 7.46 42.3
3 "dZ" Singleton 8.31 7.79 7.47 14.9 9.81 40.0
4 "dZ:" Geminate 8.08 7.72 13.4 12.8 9.61 43.6
5 "g" Singleton 9 12.1 11.3 11.9 8.56 43.9
6 "g:" Geminate 8.69 11.3 11.1 12.7 10.2 45.3
7 "k" Singleton 9.5 12.3 14.4 9.71 6.97 43.4
8 "k:" Geminate 9 14.7 16.1 10.1 7.37 48.2
9 "l" Singleton 8.69 11.9 6.33 11.5 10.2 40.0
10 "l:" Geminate 8.81 11.3 10.0 10.0 11.5 42.8
11 "m" Singleton 8.36 13.6 9.11 11.1 9.20 43.0
12 "m:" Geminate 8.85 13.7 10.9 9.95 8.42 43.0
13 "m: " Geminate 14 14.6 12.4 5.66 5.01 37.7
14 "n" Singleton 8 15.1 4.44 11.6 8.99 40.2
15 "n:" Geminate 8.21 21.4 10.1 10.2 9.32 51.0
16 "n: " Geminate 11.3 32.0 10.4 8.09 7.94 58.5
17 "p" Singleton 8.4 11.2 11.9 7.98 6.53 37.7
18 "p:" Geminate 8.81 13.2 12.7 8.57 11.3 45.8
19 "t`" Singleton 9 12.9 10.5 8.69 9.20 41.3
20 "t`:" Geminate 9 13.1 13.1 8.39 10.6 45.2
Thanks.
You could check that the values for the two columns are unique throughout the dataset:
df = df.drop_duplicates(subset=['C2_xsampa', 'Consonant'])
To see the offending rows instead, keep every copy of each repeated pair:
df[df.duplicated(subset=['C2_xsampa', 'Consonant'], keep=False)]
Edit: I just saw the R language tag. I believe distinct(select(df, C2_xsampa, Consonant)) will do.
It seems there are stray symbols and spaces in some of the values of C2_xsampa. Here is a suggestion using {tidyverse}: first remove the whitespace, then count the rows sharing each C2_xsampa/Consonant pair. You can then filter the duplicated rows using the dup column.
library(tidyverse)
dat1 <- dat %>%
  mutate(C2_xsampa = str_trim(C2_xsampa)) %>%  # drop trailing spaces such as "m: "
  group_by(C2_xsampa, Consonant) %>%
  mutate(dup = n()) %>%
  ungroup()
dat1
# # A tibble: 20 x 9
# C2_xsampa Consonant Speaker C1.dn C2.dn V1.dn V2.dn total.dn dup
# <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
# 1 d_d Singleton 8.5 11.9 7.82 13 7.65 40.3 1
# 2 d_d: Geminate 9 11.6 11.9 11.4 7.46 42.3 1
# 3 dZ Singleton 8.31 7.79 7.47 14.9 9.81 40 1
# 4 dZ: Geminate 8.08 7.72 13.4 12.8 9.61 43.6 1
# 5 g Singleton 9 12.1 11.3 11.9 8.56 43.9 1
# 6 g: Geminate 8.69 11.3 11.1 12.7 10.2 45.3 1
# 7 k Singleton 9.5 12.3 14.4 9.71 6.97 43.4 1
# 8 k: Geminate 9 14.7 16.1 10.1 7.37 48.2 1
# 9 l Singleton 8.69 11.9 6.33 11.5 10.2 40 1
# 10 l: Geminate 8.81 11.3 10 10 11.5 42.8 1
# 11 m Singleton 8.36 13.6 9.11 11.1 9.2 43 1
# 12 m: Geminate 8.85 13.7 10.9 9.95 8.42 43 2
# 13 m: Geminate 14 14.6 12.4 5.66 5.01 37.7 2
# 14 n Singleton 8 15.1 4.44 11.6 8.99 40.2 1
# 15 n: Geminate 8.21 21.4 10.1 10.2 9.32 51 2
# 16 n: Geminate 11.3 32 10.4 8.09 7.94 58.5 2
# 17 p Singleton 8.4 11.2 11.9 7.98 6.53 37.7 1
# 18 p: Geminate 8.81 13.2 12.7 8.57 11.3 45.8 1
# 19 t` Singleton 9 12.9 10.5 8.69 9.2 41.3 1
# 20 t`: Geminate 9 13.1 13.1 8.39 10.6 45.2 1
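To pull out just the offending rows afterwards, you can filter on the helper column, e.g. filter(dat1, dup > 1).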
Here is the code for the dataset:
dat <- read.table(
text = '
C2_xsampa Consonant Speaker C1.dn C2.dn V1.dn V2.dn total.dn
1 "d_d" Singleton 8.5 11.9 7.82 13.0 7.65 40.3
2 "d_d:" Geminate 9 11.6 11.9 11.4 7.46 42.3
3 "dZ" Singleton 8.31 7.79 7.47 14.9 9.81 40.0
4 "dZ:" Geminate 8.08 7.72 13.4 12.8 9.61 43.6
5 "g" Singleton 9 12.1 11.3 11.9 8.56 43.9
6 "g:" Geminate 8.69 11.3 11.1 12.7 10.2 45.3
7 "k" Singleton 9.5 12.3 14.4 9.71 6.97 43.4
8 "k:" Geminate 9 14.7 16.1 10.1 7.37 48.2
9 "l" Singleton 8.69 11.9 6.33 11.5 10.2 40.0
10 "l:" Geminate 8.81 11.3 10.0 10.0 11.5 42.8
11 "m" Singleton 8.36 13.6 9.11 11.1 9.20 43.0
12 "m:" Geminate 8.85 13.7 10.9 9.95 8.42 43.0
13 "m: " Geminate 14 14.6 12.4 5.66 5.01 37.7
14 "n" Singleton 8 15.1 4.44 11.6 8.99 40.2
15 "n:" Geminate 8.21 21.4 10.1 10.2 9.32 51.0
16 "n: " Geminate 11.3 32.0 10.4 8.09 7.94 58.5
17 "p" Singleton 8.4 11.2 11.9 7.98 6.53 37.7
18 "p:" Geminate 8.81 13.2 12.7 8.57 11.3 45.8
19 "t`" Singleton 9 12.9 10.5 8.69 9.20 41.3
20 "t`:" Geminate 9 13.1 13.1 8.39 10.6 45.2',
header = TRUE
)
My favorite approach for this is:
subset(dat, duplicated(trimws(C2_xsampa)) | duplicated(trimws(C2_xsampa), fromLast = TRUE))
(The fromLast pass catches the first occurrence of each duplicate as well; trimws is needed here because some entries carry a trailing space.)
I am trying to work with data that has information about electricity prices. The price is recorded every 5 minutes. My objective is to replace the negative values with the mean of the day.
year month day fivemin rrp_nsw rrp_qld rrp_sa rrp_tas rrp_vic
2009 7 1 1 16.9 17.6 16.7 15.7 15.5
2009 7 1 2 17.7 18.8 17.8 -16.1 15.5
2009 7 1 3 -17.7 18.6 18.1 15.9 15.4
2009 7 1 4 16.7 18.6 -17.6 14.3 12.8
2009 7 2 1 -15.6 17.6 16.3 13.2 11.8
2009 7 2 2 13.7 15.7 12.0 -11.1 -12.9
2009 7 2 3 13.7 15.8 11.9 11.1 12.9
2009 7 2 4 -13.9 16.1 -12.1 11.2 12.9
2009 8 1 1 13.8 16.0 12.2 11.2 12.8
2009 8 1 2 -13.7 16.3 11.6 10.6 12.6
2009 8 1 3 13.7 -15.8 11.9 11.0 12.7
2009 8 1 4 13.8 16.0 12.1 11.2 12.9
2009 8 2 1 17.6 -17.6 17.3 16.5 17.1
2009 8 2 2 17.7 17.6 17.3 16.8 17.4
2009 8 2 3 15.8 16.0 15.1 15.0 15.5
2009 8 2 4 -15.4 15.6 14.5 14.6 15.1
2009 9 1 1 14.7 15.0 13.8 14.0 14.5
2009 9 1 2 15.3 15.4 14.3 14.6 15.0
2009 9 1 3 15.3 15.6 14.4 14.5 15.0
2009 9 1 4 14.9 15.7 13.7 13.8 14.5
In order to obtain the mean of each day I use the following code:
Daily_mean <- Base %>%
  arrange(year, month, day, fivemin) %>%   # order the data
  group_by(year, month, day) %>%
  summarise_at(vars(rrp_nsw, rrp_qld, rrp_sa, rrp_tas, rrp_vic), mean)
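On the snippet above, the first row of Daily_mean should come out as roughly:
# year month day rrp_nsw rrp_qld rrp_sa rrp_tas rrp_vic
# 2009     7   1     8.4    18.4   8.75    7.45    14.8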
Once I have the daily mean, I want to replace each negative value with the mean of its day. For example, the 16th observation would become:
2009 8 2 4 8.925 15.6 14.5 14.6 15.1
If someone can help me I would be grateful.
We can use replace inside mutate_at to change the negative values to the mean of that column after grouping by the relevant columns:
library(dplyr)
Base %>%
  arrange(year, month, day, fivemin) %>%
  group_by(year, month, day) %>%
  mutate_at(vars(rrp_nsw, rrp_qld, rrp_sa, rrp_tas, rrp_vic),
            ~ replace(., . < 0, mean(.)))
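For what it's worth, on dplyr 1.0+ the same idea is usually written with across(); a minimal sketch, assuming the price columns all share the rrp_ prefix:
Base %>%
  arrange(year, month, day, fivemin) %>%
  group_by(year, month, day) %>%
  # the daily mean is taken over all values, negatives included, matching the 8.925 example
  mutate(across(starts_with("rrp_"), ~ replace(.x, .x < 0, mean(.x)))) %>%
  ungroup()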
I am trying to scrape NBA data from https://www.basketball-reference.com/leagues/NBA_2019.html, but I am running into issues where BeautifulSoup seems to drop deeply nested tags.
I tried soup.find(id='opponent-stats-per_game') to grab the "Opponent Per Game Stats" table, but I get None. If I instead find a div that is higher up in the tree, it clips the deeper children.
Could someone please offer me some guidance on how this works? I am fairly new to web scraping with BeautifulSoup.
The *-reference.com sites are partially dynamic. I had the same issue a long while back when trying to figure out football-reference.com.
There are a couple of ways to handle it. One is to use Selenium to render the page first, and then go in and grab the tables. You can still use BeautifulSoup at that point, but whenever I see <table> tags, my first try is pandas and .read_html(), as that will do most of the work on tables for you.
This returns a list of dataframes. It's then just a matter of finding which dataframe you want, and then possibly doing a little manipulation of column names and whatnot to get it the way you need.
Doing this, your opponent per game stats table was in index position 19:
from selenium import webdriver
import pandas as pd
# render the page with Selenium so the full DOM, nested tables included, is available
driver = webdriver.Chrome('C:/chromedriver_win32/chromedriver.exe')
page_url = 'https://www.basketball-reference.com/leagues/NBA_2019.html'
driver.get(page_url)
# parse every <table> in the rendered source into a list of dataframes
tables = pd.read_html(driver.page_source)
opp_per_gm_df = tables[19]
driver.quit()
Output:
print (opp_per_gm_df)
Rk Team G MP FG ... STL BLK TOV PF PTS
0 1.0 Memphis Grizzlies 77 242.3 37.2 ... 7.7 4.9 15.5 21.7 105.6
1 2.0 Miami Heat 77 240.3 38.2 ... 7.4 4.8 14.2 20.3 105.6
2 3.0 Indiana Pacers* 78 240.3 38.7 ... 7.5 5.2 15.6 20.1 104.3
3 4.0 Utah Jazz* 77 240.6 39.7 ... 8.6 4.7 13.9 22.2 106.1
4 5.0 Denver Nuggets* 77 240.6 39.6 ... 7.5 5.0 13.5 20.5 106.9
5 6.0 Detroit Pistons 77 242.3 40.0 ... 6.9 5.2 14.1 21.5 107.5
6 7.0 Orlando Magic 78 241.3 39.9 ... 6.9 4.4 13.0 18.8 106.5
7 8.0 Boston Celtics* 78 241.3 39.5 ... 6.8 3.8 15.2 19.6 108.0
8 9.0 Toronto Raptors* 78 242.2 40.2 ... 7.7 4.5 15.1 20.6 108.4
9 10.0 Dallas Mavericks 77 241.0 40.9 ... 7.9 4.6 13.1 23.4 109.9
10 11.0 Milwaukee Bucks* 78 241.3 40.3 ... 6.9 4.9 13.4 20.0 108.6
11 12.0 Portland Trail Blazers* 77 242.3 41.1 ... 7.3 5.1 12.4 20.8 110.5
12 13.0 Houston Rockets* 78 241.9 40.4 ... 7.4 4.6 15.0 20.1 109.3
13 14.0 Golden State Warriors* 77 241.6 40.3 ... 7.6 3.7 13.5 19.8 111.4
14 15.0 San Antonio Spurs* 78 241.6 41.6 ... 7.2 4.1 12.2 19.7 110.4
15 16.0 Philadelphia 76ers* 77 241.6 41.5 ... 7.9 4.0 12.9 22.3 112.2
16 17.0 Charlotte Hornets 77 241.9 42.0 ... 7.1 6.1 13.6 20.6 112.2
17 18.0 Oklahoma City Thunder* 78 242.2 40.8 ... 8.2 5.1 16.9 22.6 110.8
18 19.0 Brooklyn Nets 78 243.8 42.2 ... 7.8 5.4 13.5 22.3 112.5
19 20.0 Minnesota Timberwolves 77 241.9 42.0 ... 6.6 5.6 14.7 22.0 114.0
20 21.0 New York Knicks 77 241.3 42.0 ... 7.4 5.7 13.4 21.0 114.1
21 22.0 Chicago Bulls 78 242.9 42.1 ... 7.5 5.6 13.5 18.9 113.4
22 23.0 Los Angeles Clippers* 78 241.6 41.4 ... 8.2 5.9 13.1 24.0 113.4
23 24.0 Los Angeles Lakers 78 241.3 42.1 ... 8.3 5.1 14.3 21.0 113.7
24 25.0 Cleveland Cavaliers 78 241.0 43.0 ... 6.9 5.6 12.5 19.6 113.9
25 26.0 Sacramento Kings 78 240.6 41.9 ... 7.7 5.1 15.9 21.6 114.9
26 27.0 Phoenix Suns 78 242.2 42.2 ... 9.1 5.0 15.6 20.7 116.3
27 28.0 New Orleans Pelicans 78 240.6 43.2 ... 8.4 5.4 13.8 21.3 116.5
28 29.0 Washington Wizards 78 243.2 43.3 ... 7.8 4.6 15.9 21.4 116.9
29 30.0 Atlanta Hawks 78 242.2 42.6 ... 9.9 5.4 15.1 22.0 118.8
30 NaN League Average 78 241.7 41.0 ... 7.7 5.0 14.2 21.0 111.1
[31 rows x 25 columns]
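As a side note (worth checking against your pandas version), read_html also accepts an attrs argument, so pd.read_html(driver.page_source, attrs={'id': 'opponent-stats-per_game'}) should return just that table without hunting through index positions.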
I have a dataset of time series (30 years). I subset the month and day I want, as shown in the code below. Is there a way to loop over each month and the days in each month? Also, is there a way to save the plots automatically, in different folders corresponding to each month? Right now I am doing it manually, changing the month and day in the line dfOct31all <- df[which(df$Month==10 & df$Day==31), ] and then plotting and saving. By the way, I'm using RStudio.
Can someone please guide me?
Thanks!
setwd("WDir")
df <- read.csv("Velocity.csv", header = TRUE)
attach(df)
#Day 31
dfOct31all <- df [ which(df$Month==10 & df$Day==31), ]
dfall31Mbs <- dfOct31all[c(-1,-2,-3)]
densities <- lapply(dfall31Mbs, density)
par(mfcol=c(5,5), oma=c(1,1,0,0), mar=c(1,1,1,0), tcl=-0.1, mgp=c(0,0,0))
plot(densities[[1]], col = "black", main = "1000mb", xlab = NA, ylab = NA)
plot(densities[[2]], col = "black", main = "925mb", xlab = NA, ylab = NA)
plot(densities[[3]], col = "black", main = "850mb", xlab = NA, ylab = NA)
plot(densities[[4]], col = "black", main = "700mb", xlab = NA, ylab = NA)
plot(densities[[5]], col = "black", main = "600mb", xlab = NA, ylab = NA)
plot(densities[[6]], col = "black", main = "500mb", xlab = NA, ylab = NA)
plot(densities[[7]], col = "black", main = "400mb", xlab = NA, ylab = NA)
plot(densities[[8]], col = "black", main = "300mb", xlab = NA, ylab = NA)
plot(densities[[9]], col = "black", main = "250mb", xlab = NA, ylab = NA)
plot(densities[[10]], col = "black", main = "200mb", xlab = NA, ylab = NA)
plot(densities[[11]], col = "black", main = "150mb", xlab = NA, ylab = NA)
plot(densities[[12]], col = "black", main = "100mb", xlab = NA, ylab = NA)
plot(densities[[13]], col = "black", main = "70mb", xlab = NA, ylab = NA)
plot(densities[[14]], col = "black", main = "50mb", xlab = NA, ylab = NA)
plot(densities[[15]], col = "black", main = "30mb", xlab = NA, ylab = NA)
plot(densities[[16]], col = "black", main = "20mb", xlab = NA, ylab = NA)
plot(densities[[17]], col = "black", main = "10mb", xlab = NA, ylab = NA)
A snippet of the data is shown as well:
Year Month Day 1000mb 925mb 850mb 700mb 600mb 500mb 400mb 300mb 250mb 200mb 150mb 100mb 70mb 50mb 30mb 20mb 10mb
1984 10 31 6 6.6 7.9 11.5 14.6 17 20.8 25.8 26.4 25.3 24.4 22.7 19.9 19.2 20.4 24.8 30.8
1985 10 31 5.8 7.1 7.7 11.5 14.7 17.3 25.3 32.6 32.9 32.4 27.1 20.9 14.2 9.7 6.4 7.3 7.4
1986 10 31 4.3 6.1 7.7 11.3 18.4 26.3 34.4 44.5 48.9 46.2 34.5 20.4 13.8 13.2 21.7 31 46.4
1987 10 31 2.2 2.9 4 7 9 13.9 19.9 25.8 26.6 23.7 17.3 12 7 3.1 1.7 5.8 14.1
1988 10 31 2.5 2.1 2.3 6.5 6.4 5.1 7.4 12.1 13.4 16.1 16.7 15.2 8.8 5 2.8 6.2 8.9
1989 10 31 3.4 4 4.7 4.4 4.1 4 4.6 4.8 5.9 5.6 10.9 13.9 12.3 10.4 8.1 8 8
1990 10 31 4 4.9 7.5 14.6 19 21.9 25.7 28.3 29.4 29.2 27.3 18 12.6 10.1 9 12 19.9
1991 10 31 2.8 3.2 4 10.8 12.1 11.2 9.9 9.1 9.9 12.8 18 17.5 10.4 6.3 4.2 7.6 11.7
1992 10 31 5.9 6.9 7.9 13.1 17.9 25.2 34.6 47.3 53.3 53 42.4 21.3 11.6 6 4.6 8.5 12.8
1993 10 31 2.3 1.5 0.4 3.6 6.3 10.1 14.3 19.1 21.6 21.8 18.4 13.6 12.3 9.5 6.9 11 18.1
1994 10 31 2 2.2 3.8 11.6 17 19.8 23.6 24.9 25.5 26.2 28.4 25.2 16.7 13.6 9.3 8.3 9.8
1995 10 31 1.5 2 3.4 7.6 9.1 11.2 13.7 17.9 20.3 21.7 21.1 16.7 13 12.1 14.9 21.4 27.3
1996 10 31 1.9 2.4 3.5 8 11.7 17.4 26.4 35.6 33.3 24.6 12.4 4.1 0.5 3.4 7.2 9.4 11.6
1997 10 31 3.7 4.8 7.8 19.2 24.6 29.6 35.6 41 41.8 42 37.9 23.7 11.2 8.6 4.2 3.8 7
1998 10 31 0.7 1.1 0.9 4.8 8.4 11.4 14 25.3 29.7 25.2 15.9 6.6 2.1 1 4.5 8.9 6.1
1999 10 31 1.9 1.6 2.4 10.7 15.3 19 23.2 29 32.4 31.9 28 20.3 10.8 9.4 12 14.5 16.9
2000 10 31 5.1 5.8 6.7 12.8 18.2 23.9 29.9 40.7 42.2 33.7 23.5 12.7 2.6 1.6 3.8 4.7 5.1
2001 10 31 5.7 6.1 7.1 10.1 10.8 14.7 18.3 22.8 22.3 22.2 22 14 9.5 6.6 5.2 6.5 8.6
2002 10 31 1.4 1.6 1.8 9.2 14.5 19.5 24.8 30 30.5 27.6 22.2 13.9 9.1 7.1 8.5 16.1 23.8
2003 10 31 1.5 1.3 0.7 1 3.5 6 11.7 21.5 21.9 22.9 23 20.7 15.8 12.5 14.5 20.1 26
2004 10 31 5.4 5.6 6.9 14.4 23.3 33.3 46.1 60.9 62.1 54.6 42.9 28 17.3 12.3 10.1 13.6 13.3
2005 10 31 1.7 1.3 3 10.3 15.8 19.5 21.1 22.8 24.1 24.5 24.5 20.6 13.5 10.7 10 10.7 10.4
2006 10 31 2.3 1.5 1.7 8.7 12.5 15.9 18.7 20.5 21.8 24.3 29.9 25.3 18.3 12.8 7.7 8.8 12.4
2007 10 31 3.7 2.7 2.3 2.2 2.6 4.2 6.5 11.9 15.9 19.6 17.2 9.5 6.9 5.7 4.9 5.8 11.7
2008 10 31 7.7 10.8 14.3 20.3 23 25.8 27.4 32.1 35.4 34.8 25.8 13.2 7.1 2.9 2.6 3.4 6
2009 10 31 0.5 0.2 2 9.3 13.5 17.6 18.8 20.8 21.4 21.2 18.9 14.2 11.1 6.4 1.9 3 8
2010 10 31 5.6 6.8 8.5 13.4 16.5 20.3 23.8 26.8 31 28.1 24 15.7 9.9 7 4.8 3.9 1.8
2011 10 31 5.9 6.7 5.6 7.9 10.3 11.8 12.5 16.2 19.5 21.4 17.9 13.2 9.6 7.9 8 8.3 10.8
2012 10 31 4.8 6.3 9.4 19.5 24.2 27.2 27.5 27.3 27.7 30.7 27.5 16.7 10 7.6 8 13.8 19.7
2013 10 31 1.4 1.9 3.9 9.1 13.1 17.3 22.9 29.7 30.4 27.3 23.5 18.2 13.1 6.3 4.4 2.4 9.4
I wrote it out for each day rather than doing a loop.
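A minimal sketch of the loop-and-save idea, assuming df has Year/Month/Day columns followed by the pressure-level columns as in the snippet; the plots/month_MM/day_DD.png folder layout is just illustrative:
for (m in unique(df$Month)) {
  month_dir <- file.path("plots", sprintf("month_%02d", m))   # e.g. plots/month_10
  dir.create(month_dir, recursive = TRUE, showWarnings = FALSE)
  for (d in unique(df$Day[df$Month == m])) {
    day_df <- df[df$Month == m & df$Day == d, -(1:3)]         # drop Year/Month/Day
    densities <- lapply(day_df, density)
    png(file.path(month_dir, sprintf("day_%02d.png", d)), width = 1200, height = 1200)
    par(mfcol = c(5, 5), oma = c(1, 1, 0, 0), mar = c(1, 1, 1, 0),
        tcl = -0.1, mgp = c(0, 0, 0))
    # read.csv prefixes numeric column names with X (e.g. X1000mb); strip it for the titles
    for (i in seq_along(densities)) {
      plot(densities[[i]], col = "black",
           main = sub("^X", "", names(densities)[i]), xlab = NA, ylab = NA)
    }
    dev.off()
  }
}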
I have the following data.
HEIrank1
HEI.ID X2007 X2008 X2009 X2010 X2011 X2012
1 OP 41.8 147.6 90.3 82.9 106.8 63.0
2 MO 20.0 20.8 21.1 20.9 12.6 20.6
3 SD 21.2 32.3 25.7 23.9 25.0 40.1
4 UN 51.8 39.8 19.9 20.9 21.6 22.5
5 WS 18.0 19.9 15.3 13.6 15.7 15.2
6 BF 11.5 36.9 20.0 23.2 18.2 23.8
7 ME 34.2 30.3 28.4 30.1 31.5 25.6
8 IM 7.7 18.1 20.5 14.6 17.2 17.1
9 OM 11.4 11.2 12.2 11.1 13.4 19.2
10 DC 14.3 28.7 20.1 17.0 22.3 16.2
11 OC 28.6 44.0 24.9 27.9 34.0 30.7
12 TH 7.4 10.0 5.8 8.8 8.7 8.6
13 CC 12.1 11.0 12.2 12.1 14.9 15.0
14 MM 11.7 24.2 18.4 18.6 31.9 31.7
15 MC 19.0 13.7 17.0 20.4 20.5 12.1
16 SH 11.4 24.8 26.1 12.7 19.9 25.9
17 SB 13.0 22.8 15.9 17.6 17.2 9.6
18 SN 11.5 18.6 22.9 12.0 20.3 11.6
19 ER 10.8 13.2 20.0 11.0 14.9 14.2
20 SL 44.9 21.6 21.3 26.5 17.0 8.0
I tried the following commands to draw a regression line (here for the first HEI):
year <- c(2007, 2008, 2009, 2010, 2011, 2012)
op <- as.numeric(HEIrank1[1, -1])   # drop the HEI.ID column before coercing
lm.r <- lm(op ~ year)
plot(year, op)
abline(lm.r)
I want to draw a regression line for each college in one graph and I do not know how. Can you help me?
Here's my approach with ggplot2, but the graph is uninterpretable with that many lines.
library(ggplot2); library(reshape2)
mdat <- melt(HEIrank1, id.vars = "HEI.ID", variable.name = "year")
mdat$year <- as.numeric(substring(mdat$year, 2))
ggplot(mdat, aes(year, value, colour = HEI.ID, group = HEI.ID)) +
  geom_point() + stat_smooth(se = FALSE, method = "lm")
Faceting may be a better way to go:
ggplot(mdat, aes(year, value, group = HEI.ID)) +
  geom_point() + stat_smooth(se = FALSE, method = "lm") +
  facet_wrap(~ HEI.ID)
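If you also want the fitted slopes themselves (one per college), here is a small base-R sketch reusing the mdat built above:
# fit one regression per college and pull out the slope on year
slopes <- sapply(split(mdat, mdat$HEI.ID),
                 function(d) coef(lm(value ~ year, data = d))[["year"]])
round(sort(slopes), 2)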