I'm trying to remove an outlier from a data matrix. The original matrix is called Westdata and I want to remove row 51.
I've tried the following line of code but it doesn't remove the outlier and the new matrix is identical to the old one.
Westdata.Outlier<-Westdata[-51,]
Westdata.Outlier
State Region Pay Spend Area
20 Mont. MN 22.5 3.95 West
21 Wyo. MN 27.2 5.44 West
22 N.Mex. MN 22.6 3.40 West
23 Utah MN 22.3 2.30 West
24 Wash. PA 26.0 3.71 West
25 Calif. PA 29.1 3.61 West
26 Hawaii PA 25.8 3.77 West
46 Idaho MN 21.0 2.51 West
47 Colo. MN 25.9 4.04 West
48 Ariz. MN 26.6 2.83 West
49 Nev. MN 25.6 2.93 West
50 Oreg. PA 25.8 4.12 West
51 Alaska PA 41.5 8.35 West
Any suggestions?
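A likely explanation, judging from the printed row names: this West subset has only 13 rows, and the numbers 20-51 are row names carried over from the full data set, not positions, so Westdata[-51,] asks R to drop a 51st row that does not exist and nothing is removed. A minimal sketch of dropping the row by its name instead (assuming Westdata is a data frame, as the printed output suggests):
Westdata.Outlier <- Westdata[rownames(Westdata) != "51", ]  # drop by row name, not position
# or, targeting the outlier value directly:
Westdata.Outlier <- subset(Westdata, State != "Alaska")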
I have a weather dataset; my data is date-dependent.
I want to predict the temperature from 07 May 2008 until 18 May 2008 (roughly 10-15 observations); my data size is around 200.
I will be using decision trees/random forest, SVM, and a neural network to make my predictions.
I've never handled data like this, so I'm not sure how to sample non-random data.
I want to split the data into 80% training data and 20% test data, but I want to take the split in the original order, not randomly. Is that possible?
install.packages("rattle")
install.packages("RGtk2")
library("rattle")
seed <- 42
set.seed(seed)
fname <- system.file("csv", "weather.csv", package = "rattle")
dataset <- read.csv(fname, encoding = "UTF-8")
dataset <- dataset[1:200,]
dataset <- dataset[order(dataset$Date),]
set.seed(321)
sample_data <- sample(nrow(dataset), nrow(dataset) * 0.8)
test <- dataset[sample_data, ]    # this draws a random 80% of the rows
train <- dataset[-sample_data, ]  # remaining 20%
Output:
> head(dataset)
Date Location MinTemp MaxTemp Rainfall Evaporation Sunshine WindGustDir WindGustSpeed
1 2007-11-01 Canberra 8.0 24.3 0.0 3.4 6.3 NW 30
2 2007-11-02 Canberra 14.0 26.9 3.6 4.4 9.7 ENE 39
3 2007-11-03 Canberra 13.7 23.4 3.6 5.8 3.3 NW 85
4 2007-11-04 Canberra 13.3 15.5 39.8 7.2 9.1 NW 54
5 2007-11-05 Canberra 7.6 16.1 2.8 5.6 10.6 SSE 50
6 2007-11-06 Canberra 6.2 16.9 0.0 5.8 8.2 SE 44
WindDir9am WindDir3pm WindSpeed9am WindSpeed3pm Humidity9am Humidity3pm Pressure9am
1 SW NW 6 20 68 29 1019.7
2 E W 4 17 80 36 1012.4
3 N NNE 6 6 82 69 1009.5
4 WNW W 30 24 62 56 1005.5
5 SSE ESE 20 28 68 49 1018.3
6 SE E 20 24 70 57 1023.8
Pressure3pm Cloud9am Cloud3pm Temp9am Temp3pm RainToday RISK_MM RainTomorrow
1 1015.0 7 7 14.4 23.6 No 3.6 Yes
2 1008.4 5 3 17.5 25.7 Yes 3.6 Yes
3 1007.2 8 7 15.4 20.2 Yes 39.8 Yes
4 1007.0 2 7 13.5 14.1 Yes 2.8 Yes
5 1018.5 7 7 11.1 15.4 Yes 0.0 No
6 1021.7 7 5 10.9 14.8 No 0.2 No
> head(test)
Date Location MinTemp MaxTemp Rainfall Evaporation Sunshine WindGustDir WindGustSpeed
182 2008-04-30 Canberra -1.8 14.8 0.0 1.4 7.0 N 28
77 2008-01-16 Canberra 17.9 33.2 0.0 10.4 8.4 N 59
88 2008-01-27 Canberra 13.2 31.3 0.0 6.6 11.6 WSW 46
58 2007-12-28 Canberra 15.1 28.3 14.4 8.8 13.2 NNW 28
96 2008-02-04 Canberra 18.2 22.6 1.8 8.0 0.0 ENE 33
126 2008-03-05 Canberra 12.0 27.6 0.0 6.0 11.0 E 46
WindDir9am WindDir3pm WindSpeed9am WindSpeed3pm Humidity9am Humidity3pm Pressure9am
182 E N 2 19 80 40 1024.2
77 N NNE 15 20 58 62 1008.5
88 N WNW 4 26 71 28 1013.1
58 NNW NW 6 13 73 44 1016.8
96 SSE ENE 7 13 92 76 1014.4
126 SSE WSW 7 6 69 35 1025.5
Pressure3pm Cloud9am Cloud3pm Temp9am Temp3pm RainToday RISK_MM RainTomorrow
182 1020.5 1 7 5.3 13.9 No 0.0 No
77 1006.1 6 7 24.5 23.5 No 4.8 Yes
88 1009.5 1 4 19.7 30.7 No 0.0 No
58 1013.4 1 5 18.3 27.4 Yes 0.0 No
96 1011.5 8 8 18.5 22.1 Yes 9.0 Yes
126 1022.2 1 1 15.7 26.2 No 0.0 No
> head(train)
Date Location MinTemp MaxTemp Rainfall Evaporation Sunshine WindGustDir WindGustSpeed
7 2007-11-07 Canberra 6.1 18.2 0.2 4.2 8.4 SE 43
9 2007-11-09 Canberra 8.8 19.5 0.0 4.0 4.1 S 48
11 2007-11-11 Canberra 9.1 25.2 0.0 4.2 11.9 N 30
16 2007-11-16 Canberra 12.4 32.1 0.0 8.4 11.1 E 46
22 2007-11-22 Canberra 16.4 19.4 0.4 9.2 0.0 E 26
25 2007-11-25 Canberra 15.4 28.4 0.0 4.4 8.1 ENE 33
WindDir9am WindDir3pm WindSpeed9am WindSpeed3pm Humidity9am Humidity3pm Pressure9am
7 SE ESE 19 26 63 47 1024.6
9 E ENE 19 17 70 48 1026.1
11 SE NW 6 9 74 34 1024.4
16 SE WSW 7 9 70 22 1017.9
22 ENE E 6 11 88 72 1010.7
25 SSE NE 9 15 85 31 1022.4
Pressure3pm Cloud9am Cloud3pm Temp9am Temp3pm RainToday RISK_MM RainTomorrow
7 1022.2 4 6 12.4 17.3 No 0.0 No
9 1022.7 7 7 14.1 18.9 No 16.2 Yes
11 1021.1 1 2 14.6 24.0 No 0.2 No
16 1012.8 0 3 19.1 30.7 No 0.0 No
22 1008.9 8 8 16.5 18.3 No 25.8 Yes
25 1018.6 8 2 16.8 27.3 No 0.0 No
I use mtcars as an example. One option for splitting your data non-randomly into train and test sets is to first compute a sample size based on the number of rows in your data. After that you can use split() to divide the data exactly at 80% of the rows, using the following code:
smp_size <- floor(0.80 * nrow(mtcars))
split <- split(mtcars, rep(1:2, each = smp_size))
With the following code you can turn the split in train and test:
train <- split$`1`
test <- split$`2`
Let's check the number of rows:
> nrow(train)
[1] 25
> nrow(test)
[1] 7
Now the data is split into train and test without losing its order.
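Applied to the weather data from the question, the same ordered split can also be written with plain row indexing, which makes the 80/20 boundary explicit (a sketch, assuming dataset is the date-sorted data frame built in the question):
smp_size <- floor(0.80 * nrow(dataset))           # 160 of the 200 rows
train <- dataset[seq_len(smp_size), ]             # first 80%, in date order
test  <- dataset[(smp_size + 1):nrow(dataset), ]  # last 20%, in date order
nrow(train); nrow(test)
Because the rows are taken in order, the test set is the most recent 20% of dates, which is usually what you want when forecasting forward in time.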
It works if the XPath uses the contains() function:
response.xpath('//table[contains(@class, "wikitable sortable")]')
However, it returns an empty list using the code below:
response.xpath('//table[@class="wikitable sortable jquery-tablesorter"]')
Any explanation as to why it returns an empty list?
For more information: I'm trying to extract the territory rankings table from https://en.wikipedia.org/wiki/List_of_U.S._states_and_territories_by_population as practice.
The exact-match XPath returns nothing because the jquery-tablesorter class is added by JavaScript in the browser; the raw HTML that Scrapy downloads only carries class="wikitable sortable", so contains() matches while the full class string does not. That aside, you can extract the rankings table easily using only pandas, as follows:
Code:
import pandas as pd
dfs = pd.read_html('https://en.wikipedia.org/wiki/List_of_U.S._states_and_territories_by_population',attrs={'class':'wikitable sortable'})
df = dfs[0]#.to_csv('d.csv')
print(df)
Output:
Rank State or territory ... % of the total U.S. pop.[d] % of Elec. Coll.
'20 '10 State or territory ... 2010 Ch.2010–2020 % of Elec. Coll.
0 1.0 1.0 California ... 11.91% –0.11% 10.04%
1 2.0 2.0 Texas ... 8.04% 0.66% 7.43%
2 3.0 4.0 Florida ... 6.01% 0.42% 5.58%
3 4.0 3.0 New York ... 6.19% –0.17% 5.20%
4 5.0 6.0 Pennsylvania ... 4.06% –0.18% 3.53%
5 6.0 5.0 Illinois ... 4.10% –0.28% 3.53%
6 7.0 7.0 Ohio ... 3.69% –0.17% 3.16%
7 8.0 9.0 Georgia ... 3.10% 0.10% 2.97%
8 9.0 10.0 North Carolina ... 3.05% 0.07% 2.97%
9 10.0 8.0 Michigan ... 3.16% –0.15% 2.79%
10 11.0 11.0 New Jersey ... 2.81% –0.04% 2.60%
11 12.0 12.0 Virginia ... 2.56% 0.02% 2.42%
12 13.0 13.0 Washington ... 2.15% 0.15% 2.23%
13 14.0 16.0 Arizona ... 2.04% 0.09% 2.04%
14 15.0 14.0 Massachusetts ... 2.09% 0.00% 2.04%
15 16.0 17.0 Tennessee ... 2.03% 0.03% 2.04%
16 17.0 15.0 Indiana ... 2.07% –0.05% 2.04%
17 18.0 19.0 Maryland ... 1.85% –0.00% 1.86%
18 19.0 18.0 Missouri ... 1.91% –0.08% 1.86%
19 20.0 20.0 Wisconsin ... 1.82% –0.06% 1.86%
20 21.0 22.0 Colorado ... 1.61% 0.12% 1.86%
21 22.0 21.0 Minnesota ... 1.70% 0.01% 1.86%
22 23.0 24.0 South Carolina ... 1.48% 0.05% 1.67%
23 24.0 23.0 Alabama ... 1.53% –0.03% 1.67%
24 25.0 25.0 Louisiana ... 1.45% –0.06% 1.49%
25 26.0 26.0 Kentucky ... 1.39% –0.04% 1.49%
26 27.0 27.0 Oregon ... 1.22% 0.04% 1.49%
27 28.0 28.0 Oklahoma ... 1.20% –0.02% 1.30%
28 29.0 30.0 Connecticut ... 1.14% –0.07% 1.30%
29 30.0 29.0 Puerto Rico ... 1.19% –0.21% —
30 31.0 35.0 Utah ... 0.88% 0.09% 1.12%
31 32.0 31.0 Iowa ... 0.97% –0.02% 1.12%
32 33.0 36.0 Nevada ... 0.86% 0.06% 1.12%
33 34.0 33.0 Arkansas ... 0.93% –0.03% 1.12%
34 35.0 32.0 Mississippi ... 0.95% –0.06% 1.12%
35 36.0 34.0 Kansas ... 0.91% –0.04% 1.12%
36 37.0 37.0 New Mexico ... 0.66% –0.03% 0.93%
37 38.0 39.0 Nebraska ... 0.58% 0.00% 0.93%
38 39.0 40.0 Idaho ... 0.50% 0.05% 0.74%
39 40.0 38.0 West Virginia ... 0.59% –0.06% 0.74%
40 41.0 41.0 Hawaii ... 0.43% 0.00% 0.74%
41 42.0 43.0 New Hampshire ... 0.42% –0.01% 0.74%
42 43.0 42.0 Maine ... 0.42% –0.02% 0.74%
43 44.0 44.0 Rhode Island ... 0.34% –0.01% 0.74%
44 45.0 45.0 Montana ... 0.32% 0.01% 0.74%
45 46.0 46.0 Delaware ... 0.29% 0.01% 0.56%
46 47.0 47.0 South Dakota ... 0.26% 0.00% 0.56%
47 48.0 49.0 North Dakota ... 0.21% 0.02% 0.56%
48 49.0 48.0 Alaska ... 0.23% –0.01% 0.56%
49 50.0 51.0 District of Columbia ... 0.19% 0.01% 0.56%
50 51.0 50.0 Vermont ... 0.20% –0.01% 0.56%
51 52.0 52.0 Wyoming ... 0.18% –0.01% 0.56%
52 53.0 53.0 Guam[8] ... 0.05% –0.00% —
53 54.0 54.0 U.S. Virgin Islands[9] ... 0.03% –0.00% —
54 55.0 55.0 American Samoa[10] ... 0.02% –0.00% —
55 56.0 56.0 Northern Mariana Islands[11] ... 0.02% –0.00% —
56 NaN NaN Contiguous United States ... 98.03% 0.23% 98.70%
57 NaN NaN The fifty states ... 98.50% 0.21% 99.44%
58 NaN NaN The fifty states and D.C. ... 98.69% 0.22% 100.00%
59 NaN NaN Total United States ... — — —
[60 rows x 16 columns]
I have a dataframe which, summarised, looks like this:
CEMETERY SEX CONTEXT RaHD.L RaHD.R
1 Medieval-St. Mary Graces FEMALE 7172 21.2 21.6
2 Medieval-St. Mary Graces MALE 6225 23.9 25.2
3 Medieval-St. Mary Graces MALE 9987 23.9 23.5
4 Medieval-St. Mary Graces MALE 11475 22.4 22.3
5 Medieval-St. Mary Graces MALE 12356 25.8 25.4
6 Medieval-St. Mary Graces MALE 12525 22.4 22.3
7 Medieval-St. Mary Graces MALE 12785 22.9 22.6
8 Medieval-St. Mary Graces MALE 13840 22.5 22.9
9 Medieval-Spital Square FEMALE 383 21.5 22.0
10 Medieval-Spital Square MALE 31 23.3 22.0
17 Post-Medieval-Chelsea Old Church FEMALE 19 20.0 20.6
18 Post-Medieval-Chelsea Old Church FEMALE 31 19.5 20.0
19 Post-Medieval-Chelsea Old Church FEMALE 39 19.6 19.2
41 Post-Medieval-St. Thomas Hospital FEMALE 60 21.8 22.6
43 Post-Medieval-St. Thomas Hospital MALE 83 22.4 23.0
I want to change the values in the CEMETERY column to simply 'Medieval' and 'Post-Medieval', instead of the full cemetery name, or alternatively create a new column stating 'Medieval' or 'Post-Medieval'.
We can use sub to capture the substring up to "Medieval" and then, in the replacement, use the backreference (\\1) for the captured substring:
df1$CEMETERY <- sub("(.*(M|m)edieval).*", "\\1", df1$CEMETERY)
df1$CEMETERY
#[1] "Medieval" "Medieval" "Medieval" "Medieval"
#[5] "Medieval" "Medieval" "Medieval" "Medieval"
#[9] "Medieval" "Medieval" "Post-Medieval" "Post-Medieval"
#[13] "Post-Medieval" "Post-Medieval" "Post-Medieval"
In case the information on the location should be kept, there is an alternative approach which splits the CEMETERY column at the first hyphen after "Medieval" (which includes splitting after "Post-Medieval") and assigns the two parts to two columns PERIOD and CEMETERY:
library(data.table)
setDT(DF)[, c("PERIOD", "CEMETERY") := tstrsplit(CEMETERY, "(?<=Medieval)-", perl = TRUE)][]
CEMETERY SEX CONTEXT RaHD.L RaHD.R PERIOD
1: St. Mary Graces FEMALE 7172 21.2 21.6 Medieval
2: St. Mary Graces MALE 6225 23.9 25.2 Medieval
3: St. Mary Graces MALE 9987 23.9 23.5 Medieval
4: St. Mary Graces MALE 11475 22.4 22.3 Medieval
5: St. Mary Graces MALE 12356 25.8 25.4 Medieval
6: St. Mary Graces MALE 12525 22.4 22.3 Medieval
7: St. Mary Graces MALE 12785 22.9 22.6 Medieval
8: St. Mary Graces MALE 13840 22.5 22.9 Medieval
9: Spital Square FEMALE 383 21.5 22.0 Medieval
10: Spital Square MALE 31 23.3 22.0 Medieval
11: Chelsea Old Church FEMALE 19 20.0 20.6 Post-Medieval
12: Chelsea Old Church FEMALE 31 19.5 20.0 Post-Medieval
13: Chelsea Old Church FEMALE 39 19.6 19.2 Post-Medieval
14: St. Thomas Hospital FEMALE 60 21.8 22.6 Post-Medieval
15: St. Thomas Hospital MALE 83 22.4 23.0 Post-Medieval
The feature used in the regular expression to identify the correct hyphen to split on is called positive look-behind.
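To see the look-behind on its own, outside the data.table call (a small illustration only, not part of the answer's pipeline):
strsplit("Post-Medieval-Chelsea Old Church", "(?<=Medieval)-", perl = TRUE)
# [[1]]
# [1] "Post-Medieval"      "Chelsea Old Church"
The hyphen after "Post" is not preceded by "Medieval", so only the hyphen following "Medieval" is used as the split point.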
Data
DF <- readr::read_table(
" CEMETERY SEX CONTEXT RaHD.L RaHD.R
1 Medieval-St. Mary Graces FEMALE 7172 21.2 21.6
2 Medieval-St. Mary Graces MALE 6225 23.9 25.2
3 Medieval-St. Mary Graces MALE 9987 23.9 23.5
4 Medieval-St. Mary Graces MALE 11475 22.4 22.3
5 Medieval-St. Mary Graces MALE 12356 25.8 25.4
6 Medieval-St. Mary Graces MALE 12525 22.4 22.3
7 Medieval-St. Mary Graces MALE 12785 22.9 22.6
8 Medieval-St. Mary Graces MALE 13840 22.5 22.9
9 Medieval-Spital Square FEMALE 383 21.5 22.0
10 Medieval-Spital Square MALE 31 23.3 22.0
17 Post-Medieval-Chelsea Old Church FEMALE 19 20.0 20.6
18 Post-Medieval-Chelsea Old Church FEMALE 31 19.5 20.0
19 Post-Medieval-Chelsea Old Church FEMALE 39 19.6 19.2
41 Post-Medieval-St. Thomas Hospital FEMALE 60 21.8 22.6
43 Post-Medieval-St. Thomas Hospital MALE 83 22.4 23.0"
)[, -1]
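If data.table is not available, a base R sketch achieves the same thing, starting again from the original DF with the full cemetery names (column names assumed from the question):
DF$PERIOD   <- ifelse(grepl("^Post-", DF$CEMETERY), "Post-Medieval", "Medieval")  # new period column
DF$CEMETERY <- sub("^(Post-)?Medieval-", "", DF$CEMETERY)                         # strip the period prefix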
I am trying to scrape http://www.basketball-reference.com/teams/CHI/2015.html using rvest. I used selectorgadget and found the selector for the table I want to be #advanced. However, rvest wasn't picking it up. Looking at the page source, I noticed that the tables are inside an HTML comment tag <!--
What is the best way to get the tables from inside the comment tags? Thanks!
Edit: I am trying to pull the 'Advanced' table: http://www.basketball-reference.com/teams/CHI/2015.html#advanced::none
You can use the XPath comment() function to select comment nodes, then reparse their contents as HTML:
library(rvest)
# scrape page
h <- read_html('http://www.basketball-reference.com/teams/CHI/2015.html')
df <- h %>% html_nodes(xpath = '//comment()') %>% # select comment nodes
html_text() %>% # extract comment text
paste(collapse = '') %>% # collapse to a single string
read_html() %>% # reparse to HTML
html_node('table#advanced') %>% # select the desired table
html_table() %>% # parse table
.[colSums(is.na(.)) < nrow(.)] # get rid of spacer columns
df[, 1:15]
## Rk Player Age G MP PER TS% 3PAr FTr ORB% DRB% TRB% AST% STL% BLK%
## 1 1 Pau Gasol 34 78 2681 22.7 0.550 0.023 0.317 9.2 27.6 18.6 14.4 0.5 4.0
## 2 2 Jimmy Butler 25 65 2513 21.3 0.583 0.212 0.508 5.1 11.2 8.2 14.4 2.3 1.0
## 3 3 Joakim Noah 29 67 2049 15.3 0.482 0.005 0.407 11.9 22.1 17.1 23.0 1.2 2.6
## 4 4 Aaron Brooks 30 82 1885 14.4 0.534 0.383 0.213 1.9 7.5 4.8 24.2 1.5 0.6
## 5 5 Mike Dunleavy 34 63 1838 11.6 0.573 0.547 0.181 1.7 12.7 7.3 9.7 1.1 0.8
## 6 6 Taj Gibson 29 62 1692 16.1 0.545 0.000 0.364 10.7 14.6 12.7 6.9 1.1 3.2
## 7 7 Nikola Mirotic 23 82 1654 17.9 0.556 0.502 0.455 4.3 21.8 13.3 9.7 1.7 2.4
## 8 8 Kirk Hinrich 34 66 1610 6.8 0.468 0.441 0.131 1.4 6.6 4.1 13.8 1.5 0.6
## 9 9 Derrick Rose 26 51 1530 15.9 0.493 0.325 0.224 2.6 8.7 5.7 30.7 1.2 0.8
## 10 10 Tony Snell 23 72 1412 10.2 0.550 0.531 0.148 2.5 10.9 6.8 6.8 1.2 0.6
## 11 11 E'Twaun Moore 25 56 504 10.3 0.504 0.273 0.144 2.7 7.1 5.0 10.4 2.1 0.9
## 12 12 Doug McDermott 23 36 321 6.1 0.480 0.383 0.140 2.1 12.2 7.3 3.0 0.6 0.2
## 13 13 Nazr Mohammed 37 23 128 8.7 0.431 0.000 0.100 9.6 22.3 16.1 3.6 1.6 2.8
## 14 14 Cameron Bairstow 24 18 64 2.1 0.309 0.000 0.357 10.5 3.3 6.8 2.2 1.6 1.1
OK, got it.
library(stringi)
library(knitr)
library(rvest)
# small helper that parses an HTML string via the XML package
any_version_html <- function(x){
  XML::htmlParse(x)
}
a <- 'http://www.basketball-reference.com/teams/CHI/2015.html#advanced::none'
b <- readLines(a)                      # raw page source, one element per line
c <- paste0(b, collapse = "")          # collapse into a single string
# pull out every <table>...</table> block, including the ones hidden inside comments
d <- as.character(unlist(stri_extract_all_regex(c, '<table(.*?)/table>', omit_no_match = T, simplify = T)))
e <- html_table(any_version_html(d))   # parse each extracted table into a data frame
> kable(summary(e),'rst')
====== ========== ====
Length Class Mode
====== ========== ====
9 data.frame list
2 data.frame list
24 data.frame list
21 data.frame list
28 data.frame list
28 data.frame list
27 data.frame list
30 data.frame list
27 data.frame list
27 data.frame list
28 data.frame list
28 data.frame list
27 data.frame list
30 data.frame list
27 data.frame list
27 data.frame list
3 data.frame list
====== ========== ====
kable(e[[1]],'rst')
=== ================ === ==== === ================== === === =================================
No. Player Pos Ht Wt Birth Date  Exp College
=== ================ === ==== === ================== === === =================================
41 Cameron Bairstow PF 6-9 250 December 7, 1990 au R University of New Mexico
0 Aaron Brooks PG 6-0 161 January 14, 1985 us 6 University of Oregon
21 Jimmy Butler SG 6-7 220 September 14, 1989 us 3 Marquette University
34 Mike Dunleavy SF 6-9 230 September 15, 1980 us 12 Duke University
16 Pau Gasol PF 7-0 250 July 6, 1980 es 13
22 Taj Gibson PF 6-9 225 June 24, 1985 us 5 University of Southern California
12 Kirk Hinrich SG 6-4 190 January 2, 1981 us 11 University of Kansas
3 Doug McDermott SF 6-8 225 January 3, 1992 us R Creighton University
## Realized we should index with some names... but this is somewhat cheating, as we already know the start and end indexes for the table titles. I prefer to parse in the dark.
# Names are in h2-tags
e_names <- as.character(unlist(stri_extract_all_regex(c, '<h2(.*?)/h2>', simplify = T)))
e_names <- gsub("<(.*?)>","",e_names[grep('Roster',e_names):grep('Salaries',e_names)])
names(e) <- e_names
kable(head(e$Salaries), 'rst')
=== ============== ===========
Rk Player Salary
=== ============== ===========
1 Derrick Rose $18,862,875
2 Carlos Boozer $13,550,000
3 Joakim Noah $12,200,000
4 Taj Gibson $8,000,000
5 Pau Gasol $7,128,000
6 Nikola Mirotic $5,305,000
=== ============== ===========
I got this data here:
State Abb Region Change
3 Arizona AZ West 24.6
6 Colorado CO West 16.9
10 Florida FL South 17.6
11 Georgia GA South 18.3
13 Idaho ID West 21.1
29 Nevada NV West 35.1
34 North Carolina NC South 18.5
41 South Carolina SC South 15.3
44 Texas TX South 20.6
45 Utah UT West 23.8
I'm trying to extract a subset where Change > 40.
When I use
subset(uspopchange, rank(Change)>40)
it works
but when I use
subset(uspopchange, Change > 40)
it comes up with nothing.
Furthermore, if I use
subset(uspopchange, Change > 16.9)
it works also.
Why does it do that? Why do I need to use rank() to get my subset?
BTW: the data is from
install.packages("gcookbook")
> library(gcookbook)
> data(uspopchange)
> head(uspopchange[order(uspopchange$Change,decreasing=TRUE),])
State Abb Region Change
29 Nevada NV West 35.1
3 Arizona AZ West 24.6
45 Utah UT West 23.8
13 Idaho ID West 21.1
44 Texas TX South 20.6
34 North Carolina NC South 18.5
There are no rows with Change greater than 40. When you use rank(Change) > 40 in your subset(), it gives you the rows whose Change value has a rank higher than 40. Since there are 50 rows in your data (Change has a length of 50), you are getting the rows that rank 41, 42, 43, ..., 50, i.e. the ten largest values of Change.
> Top10 <- subset(uspopchange, rank(Change)>40)
> Top10[order(Top10$Change,decreasing=TRUE),]
State Abb Region Change
29 Nevada NV West 35.1
3 Arizona AZ West 24.6
45 Utah UT West 23.8
13 Idaho ID West 21.1
44 Texas TX South 20.6
34 North Carolina NC South 18.5
11 Georgia GA South 18.3
10 Florida FL South 17.6
6 Colorado CO West 16.9
41 South Carolina SC South 15.3
##
> uspopchange[order(uspopchange$Change,decreasing=TRUE),][1:10,]
State Abb Region Change
29 Nevada NV West 35.1
3 Arizona AZ West 24.6
45 Utah UT West 23.8
13 Idaho ID West 21.1
44 Texas TX South 20.6
34 North Carolina NC South 18.5
11 Georgia GA South 18.3
10 Florida FL South 17.6
6 Colorado CO West 16.9
41 South Carolina SC South 15.3
Those are equivalent.
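As an aside (dplyr is not used anywhere else in this question, so treat this as an alternative), the same top-ten result can be written as:
library(dplyr)
uspopchange %>% slice_max(Change, n = 10)   # the 10 rows with the largest Change values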