merging dataframes and repeating values (instead of NA) - r

I am attempting to create an administrative map that shows, for each county in California, the number of students who qualify for reduced-price lunch. I have two data frames, ca_counties and mealpcnty, as follows:
head(ca_counties)
long lat group order state county
6965 -121.4785 37.48290 157 6965 california alameda
6966 -121.5129 37.48290 157 6966 california alameda
6967 -121.8853 37.48290 157 6967 california alameda
6968 -121.8968 37.46571 157 6968 california alameda
6969 -121.9254 37.45998 157 6969 california alameda
6970 -121.9483 37.47717 157 6970 california alameda
head(mealpcnty)
county meal_stu long lat group
1 Alameda 3.97956 -122.0228 37.70349 157
2 Butte 5137.23140 -121.5874 39.66873 160
3 Calaveras 309.62441 -120.4988 38.30223 161
4 Contra Costa 2410.72933 -121.9598 38.02434 163
5 El Dorado 2699.99971 -120.4558 38.80930 165
6 Fresno 4105.86829 -119.6279 36.84119 166
What I would like to do is merge the meal_stu column of mealpcnty into ca_counties by the "county" variable, repeating each value as many times as that county appears in ca_counties. I have had no success: I have tried every version of merge() and still get a column full of NA values for meal_stu. I need these values to color the counties on my map.
So, instead of the following
head(Meal.df)
county long lat group order state meal_stu
1 alameda -121.4785 37.48290 157 6965 california NA
2 alameda -121.5129 37.48290 157 6966 california NA
3 alameda -121.8853 37.48290 157 6967 california NA
4 alameda -121.8968 37.46571 157 6968 california NA
5 alameda -121.9254 37.45998 157 6969 california NA
6 alameda -121.9483 37.47717 157 6970 california NA
I would like to see the following, where the meal_stu value associated with each county is repeated:
head(Meal.df)
county long lat group order state meal_stu
1 alameda -121.4785 37.48290 157 6965 california 3.97956
2 alameda -121.5129 37.48290 157 6966 california 3.97956
3 alameda -121.8853 37.48290 157 6967 california 3.97956
4 alameda -121.8968 37.46571 157 6968 california 3.97956
5 alameda -121.9254 37.45998 157 6969 california 3.97956
6 alameda -121.9483 37.47717 157 6970 california 3.97956
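Judging from the printed heads, one likely cause is that county is lowercase in ca_counties ("alameda") but title case in mealpcnty ("Alameda"). merge() matches keys exactly, so no rows match and meal_stu comes back as NA. The sketch below rebuilds a few rows of each data frame from the output above, normalizes the key with tolower() and trimws(), and does a left join. It also keeps only county and meal_stu from mealpcnty, because its own long, lat and group columns would otherwise be duplicated with .x/.y suffixes. It assumes the order column records the drawing order of the polygon points, as its name suggests.

```r
# Small stand-ins for the two data frames shown above (values copied from the head() output)
ca_counties <- data.frame(
  long   = c(-121.4785, -121.5129, -121.8853),
  lat    = c(37.48290, 37.48290, 37.48290),
  group  = c(157, 157, 157),
  order  = c(6965, 6966, 6967),
  state  = "california",
  county = "alameda",
  stringsAsFactors = FALSE
)
mealpcnty <- data.frame(
  county   = c("Alameda", "Butte", "Contra Costa"),
  meal_stu = c(3.97956, 5137.23140, 2410.72933),
  group    = c(157, 160, 163),
  stringsAsFactors = FALSE
)

# Normalize the join key so "Alameda" matches "alameda"
mealpcnty$county <- tolower(trimws(mealpcnty$county))

# Left join: every polygon point is kept and meal_stu is repeated for each row of its county.
# Only county and meal_stu are taken from mealpcnty to avoid clashing long/lat/group columns.
Meal.df <- merge(ca_counties, mealpcnty[, c("county", "meal_stu")],
                 by = "county", all.x = TRUE)

# merge() sorts by the key, so restore the original point order before drawing polygons
Meal.df <- Meal.df[order(Meal.df$order), ]
Meal.df
```

Counties that appear in ca_counties but not in mealpcnty will still get NA, which is usually what you want for a map (they are drawn with the NA fill color).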

Related

Why do some values get filled with NAs when I join these two geographic datasets?

I'm trying to join two datasets: a Natural Earth dataset subset to contain only countries in Europe (europe_map), and a list of locations in Europe (europe_places).
Here are the headers of the datasets:
europe_places
Simple feature collection with 23 features and 4 fields
geometry type: POINT
dimension: XY
bbox: xmin: -9.1393 ymin: 38.7223 xmax: 24.1052 ymax: 58.97
geographic CRS: WGS 84
First 10 features:
Location Year Country Continent geometry
1 Paris 2008 France Europe POINT (2.3522 48.8566)
2 Stavanger 2009 Norway Europe POINT (5.7331 58.97)
3 Paris 2009 France Europe POINT (2.3522 48.8566)
4 Berlin 2010 Germany Europe POINT (13.405 52.52)
5 Prague 2011 Czechia Europe POINT (14.4378 50.0755)
6 Piancavallo 2012 Italy Europe POINT (12.5166 46.10768)
7 Budapest 2012 Hungary Europe POINT (19.0402 47.4979)
8 Aprica 2013 Italy Europe POINT (10.15177 46.15486)
9 Vienna 2014 Austria Europe POINT (16.3738 48.2082)
10 Folgaria 2014 Italy Europe POINT (11.17205 45.9162)
europe_map
Simple feature collection with 6 features and 94 fields
geometry type: GEOMETRY
dimension: XY
bbox: xmin: -8.144824 ymin: 41.89756 xmax: 40.12832 ymax: 60.83188
geographic CRS: WGS 84
featurecla scalerank LABELRANK Country SOV_A3 ADM0_DIF LEVEL TYPE
5 Admin-0 country 6 6 Vatican VAT 0 2 Sovereign country
28 Admin-0 country 4 6 United Kingdom GB1 1 2 Country
29 Admin-0 country 4 6 United Kingdom GB1 1 2 Country
30 Admin-0 country 3 6 United Kingdom GB1 1 2 Country
31 Admin-0 country 1 2 United Kingdom GB1 1 2 Country
33 Admin-0 country 1 3 Ukraine UKR 0 2 Sovereign country
ADMIN ADM0_A3 GEOU_DIF GEOUNIT GU_A3 SU_DIF SUBUNIT SU_A3 BRK_DIFF
5 Vatican VAT 0 Vatican VAT 0 Vatican VAT 0
28 Jersey JEY 0 Jersey JEY 0 Jersey JEY 0
29 Guernsey GGY 0 Guernsey GGY 0 Guernsey GGY 0
30 Isle of Man IMN 0 Isle of Man IMN 0 Isle of Man IMN 0
31 United Kingdom GBR 0 United Kingdom GBR 0 United Kingdom GBR 0
33 Ukraine UKR 0 Ukraine UKR 0 Ukraine UKR 0
NAME NAME_LONG BRK_A3 BRK_NAME BRK_GROUP ABBREV POSTAL
5 Vatican Vatican VAT Vatican <NA> Vat. V
28 Jersey Jersey JEY Jersey Channel Islands Jey. JE
29 Guernsey Guernsey GGY Guernsey Channel Islands Guern. GG
30 Isle of Man Isle of Man IMN Isle of Man <NA> IoMan IM
31 United Kingdom United Kingdom GBR United Kingdom <NA> U.K. GB
33 Ukraine Ukraine UKR Ukraine <NA> Ukr. UA
FORMAL_EN FORMAL_FR NAME_CIAWF
5 State of the Vatican City <NA> Holy See (Vatican City)
28 Bailiwick of Jersey <NA> Jersey
29 Bailiwick of Guernsey <NA> Guernsey
30 <NA> <NA> Isle of Man
31 United Kingdom of Great Britain and Northern Ireland <NA> United Kingdom
33 Ukraine <NA> Ukraine
NOTE_ADM0 NOTE_BRK NAME_SORT NAME_ALT MAPCOLOR7 MAPCOLOR8 MAPCOLOR9
5 <NA> <NA> Vatican (Holy See) Holy See 1 3 4
28 U.K. crown dependency <NA> Jersey <NA> 6 6 6
29 U.K. crown dependency <NA> Guernsey <NA> 6 6 6
30 U.K. crown dependency <NA> Isle of Man <NA> 6 6 6
31 <NA> <NA> United Kingdom <NA> 6 6 6
33 <NA> <NA> Ukraine <NA> 5 1 6
MAPCOLOR13 POP_EST POP_RANK GDP_MD_EST POP_YEAR LASTCENSUS GDP_YEAR
5 2 1000 3 0 2015 NA 0
28 3 98840 8 5080 2017 2001 2015
29 3 66502 8 3465 2017 2001 2015
30 3 88815 8 7428 2017 2006 2014
31 3 64769452 16 2788000 2017 2011 2016
33 3 44033874 15 352600 2017 2001 2016
ECONOMY INCOME_GRP WIKIPEDIA FIPS_10_ ISO_A2 ISO_A3
5 2. Developed region: nonG7 2. High income: nonOECD 0 VT VA VAT
28 2. Developed region: nonG7 2. High income: nonOECD NA JE JE JEY
29 2. Developed region: nonG7 2. High income: nonOECD NA GK GG GGY
30 2. Developed region: nonG7 2. High income: nonOECD NA IM IM IMN
31 1. Developed region: G7 1. High income: OECD NA UK GB GBR
33 6. Developing region 4. Lower middle income NA UP UA UKR
ISO_A3_EH ISO_N3 UN_A3 WB_A2 WB_A3 WOE_ID WOE_ID_EH
5 VAT 336 336 <NA> <NA> 23424986 23424986
28 JEY 832 832 JG CHI 23424857 23424857
29 GGY 831 831 JG CHI 23424827 23424827
30 IMN 833 833 IM IMY 23424847 23424847
31 GBR 826 826 GB GBR -90 23424975
33 UKR 804 804 UA UKR 23424976 23424976
WOE_NOTE
5 Exact WOE match as country
28 Exact WOE match as country
29 Exact WOE match as country
30 Exact WOE match as country
31 Eh ID includes Channel Islands and Isle of Man. UK constituent countries of England (24554868), Wales (12578049), Scotland (12578048), and Northern Ireland (20070563).
33 Exact WOE match as country
ADM0_A3_IS ADM0_A3_US ADM0_A3_UN ADM0_A3_WB CONTINENT REGION_UN SUBREGION
5 VAT VAT NA NA Europe Europe Southern Europe
28 JEY JEY NA NA Europe Europe Northern Europe
29 GGY GGY NA NA Europe Europe Northern Europe
30 IMN IMN NA NA Europe Europe Northern Europe
31 GBR GBR NA NA Europe Europe Northern Europe
33 UKR UKR NA NA Europe Europe Eastern Europe
REGION_WB NAME_LEN LONG_LEN ABBREV_LEN TINY HOMEPART MIN_ZOOM MIN_LABEL
5 Europe & Central Asia 7 7 4 4 1 0 5.0
28 Europe & Central Asia 6 6 4 NA NA 0 5.0
29 Europe & Central Asia 8 8 6 NA NA 0 5.0
30 Europe & Central Asia 11 11 5 NA NA 0 5.0
31 Europe & Central Asia 14 14 4 NA 1 0 1.7
33 Europe & Central Asia 7 7 4 NA 1 0 3.0
MAX_LABEL NE_ID WIKIDATAID NAME_AR NAME_BN NAME_DE
5 10.0 1159321407 Q237 الفاتيكان ভ্যাটিকান সিটি Vatikanstadt
28 10.0 1159320725 Q785 جيرزي জার্সি Jersey
29 10.0 1159320715 Q25230 غيرنزي <NA> Guernsey
30 10.0 1159320721 Q9676 جزيرة مان আইল অব ম্যান Isle of Man
31 6.7 1159320713 Q145 المملكة المتحدة যুক্তরাজ্য Vereinigtes Königreich
33 7.0 1159321345 Q212 أوكرانيا ইউক্রেন Ukraine
NAME_EN NAME_ES NAME_FR NAME_EL NAME_HI
5 Vatican City Ciudad del Vaticano Vatican Βατικανό वैटिकन नगर
28 Jersey Jersey Jersey Τζέρσεϊ जर्सी
29 Guernsey Guernsey Guernesey Γκέρνσεϊ ग्वेर्नसे
30 Isle of Man Isla de Man île de Man Νήσος του Μαν मनुष्य का टापू
31 United Kingdom Reino Unido Royaume-Uni Ηνωμένο Βασίλειο यूनाइटेड किंगडम
33 Ukraine Ucrania Ukraine Ουκρανία युक्रेन
NAME_HU NAME_ID NAME_IT NAME_JA NAME_KO
5 Vatikán Vatikan Città del Vaticano バチカン 바티칸 시국
28 Jersey Jersey Baliato di Jersey ジャージー 저지 섬
29 Guernsey Bailiffség Guernsey Guernsey ガーンジー 건지 섬
30 Man Pulau Man Isola di Man マン島 맨 섬
31 Egyesült Királyság Britania Raya Regno Unito イギリス 영국
33 Ukrajna Ukraina Ucraina ウクライナ 우크라이나
NAME_NL NAME_PL NAME_PT NAME_RU NAME_SV
5 Vaticaanstad Watykan Vaticano Ватикан Vatikanstaten
28 Jersey Jersey Jersey Джерси Jersey
29 Guernsey Guernsey Guernsey Гернси Guernsey
30 Man Wyspa Man Ilha de Man остров Мэн Isle of Man
31 Verenigd Koninkrijk Wielka Brytania Reino Unido Великобритания Storbritannien
33 Oekraïne Ukraina Ucrânia Украина Ukraina
NAME_TR NAME_VI NAME_ZH
5 Vatikan Thành Vatican 梵蒂冈
28 Jersey Jersey 澤西島
29 Guernsey Guernsey 根西岛
30 Man Adası Đảo Man 马恩岛
31 Birleşik Krallık Vương quốc Liên hiệp Anh và Bắc Ireland 英国
33 Ukrayna Ukraina 乌克兰
geometry
5 POLYGON ((12.43916 41.89839...
28 POLYGON ((-2.018652 49.2312...
29 POLYGON ((-2.512305 49.4945...
30 POLYGON ((-4.412061 54.1853...
31 MULTIPOLYGON (((-2.667676 5...
33 MULTIPOLYGON (((38.21436 47...
I used the following code to join the datasets together:
europe.map1<-st_join(europe_places, europe_map, by="Country")
But when I did, the entries for Venice, Lisbon and Copenhagen had NA values, even though their Country values matched those in the europe_map dataset.
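Picking up on the comments above: you have not specified the spatial join correctly. st_join() matches features by their geometry (by default with st_intersects), not by an attribute column such as Country, so any place whose point does not intersect one of the europe_map polygons gets NA. I think this is what you are looking for: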
library(sf)
europe.map1 <- st_join(europe_places, europe_map,
                       join = st_within, # always best to specify the method
                       left = TRUE)
This should work for you. That said, depending on your goal, you may want to swap europe_places and europe_map: st_join() returns the features of its first argument, with the attributes of matching features from the second added. You can find more information about the different types of spatial joins in the sf package documentation.
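If, as by = "Country" suggests, the aim is to match places to countries by name rather than by location, an attribute join does that. Below is a minimal sketch using base merge() on plain data frames; the rows and the SUBREGION values are hypothetical stand-ins. With the real sf objects, you would first drop the polygon geometry from europe_map, for example with sf::st_drop_geometry(europe_map).

```r
# Hypothetical stand-ins: a few rows of europe_places and of europe_map's attribute table.
# With the real sf objects, map_attrs could come from sf::st_drop_geometry(europe_map).
places <- data.frame(
  Location = c("Paris", "Berlin", "Lisbon"),
  Country  = c("France", "Germany", "Portugal"),
  stringsAsFactors = FALSE
)
map_attrs <- data.frame(
  Country   = c("France", "Germany"),
  SUBREGION = c("Western Europe", "Western Europe"),
  stringsAsFactors = FALSE
)

# Left attribute join on the shared Country column: every place is kept
joined <- merge(places, map_attrs, by = "Country", all.x = TRUE)

# Places whose Country has no exact match get NA in the joined columns
joined$Location[is.na(joined$SUBREGION)]
```

Listing the unmatched rows this way is a quick check on whether the NAs come from spelling differences between the two Country columns or from countries missing from europe_map. On the sf object itself, dplyr::left_join(europe_places, map_attrs, by = "Country") should do the same while keeping the point geometry.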

Grouping and/or Counting in R

I'm trying to 're-count' a column in R and am having trouble cleaning up the data. I'm cleaning the data by location, and the problem appears once I change CA to California.
library(dplyr)  # for count()
all_location <- read.csv("all_location.csv", stringsAsFactors = FALSE)
all_location <- count(all_location, location)
all_location <- all_location[with(all_location, order(-n)), ]
all_location
A tibble: 100 x 2
location n
<chr> <int>
1 CA 3216
2 Alaska 2985
3 Nevada 949
4 Washington 253
5 Hawaii 239
6 Montana 218
7 Puerto Rico 149
8 California 126
9 Utah 83
10 NA 72
As shown above, there are both CA and California. Below, I use grep() and replace() to change CA to California. However, the result still shows two separate rows for California.
ca1 <- grep("CA",all_location$location)
all_location$location <- replace(all_location$location,ca1,"California")
all_location
A tibble: 100 x 2
location n
<chr> <int>
1 California 3216
2 Alaska 2985
3 Nevada 949
4 Washington 253
5 Hawaii 239
6 Montana 218
7 Puerto Rico 149
8 California 126
9 Utah 83
10 NA 72
My goal is to combine both into a single total under n.
all_location$location[substr(all_location$location, 1, 5) %in% "Calif" ] <- "California"
This makes sure that everything starting with "Calif" becomes "California".
My guess is that some of the existing values contain a space (e.g. "California "), which is why this is happening.
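One more step seems to be missing: the replacement happens after count(), so the table still holds two separate rows labelled California, and their n values have to be summed again. Here is a base-R sketch using aggregate(), on a stand-in copied from the first rows of the output above (only a few rows are reproduced):

```r
# Stand-in for the counted table shown above (a few of its rows)
all_location <- data.frame(
  location = c("CA", "Alaska", "Nevada", "California"),
  n        = c(3216L, 2985L, 949L, 126L),
  stringsAsFactors = FALSE
)

# Trim stray whitespace, then relabel the exact value "CA"
all_location$location <- trimws(all_location$location)
all_location$location[all_location$location == "CA"] <- "California"

# Sum n within each location so the two California rows collapse into one
all_location <- aggregate(n ~ location, data = all_location, FUN = sum)
all_location <- all_location[order(-all_location$n), ]
all_location
```

The formula interface of aggregate() drops rows whose location is NA, so relabel those first if you need to keep them. With dplyr, count(all_location, location, wt = n) should give the same totals.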

removing duplicate/repeating values in the same data frame column in R

I have a weird data frame where the Player column holds the players' names. The problem is that the first name appears twice: Roy Sievers shows up as RoyRoy Sievers, and I obviously want it to be Roy Sievers.
Would anybody know how to do this?
Here is the full data frame; it's not very long:
Year Player Team Position
1 1949 RoyRoy Sievers St. Louis Browns OF
2 1950 WaltWalt Dropo Boston Red Sox 1B
3 1951 GilGil McDougald New York Yankees 3B
4 1952 HarryHarry Byrd Philadelphia Athletics P
5 1953 HarveyHarvey Kuenn Detroit Tigers SS
6 1954 BobBob Grim New York Yankees P
7 1955 HerbHerb Score Cleveland Indians P
8 1956 LuisLuis Aparicio Chicago White Sox SS
9 1957 TonyTony Kubek New York Yankees SS
10 1958 AlbieAlbie Pearson Washington Senators OF
11 1959 BobBob Allison Washington Senators OF
12 1960 RonRon Hansen Baltimore Orioles SS
13 1961 DonDon Schwall Boston Red Sox P
14 1962 TomTom Tresh New York Yankees SS
15 1963 GaryGary Peters Chicago White Sox P
16 1964 TonyTony Oliva Minnesota Twins OF
17 1965 CurtCurt Blefary Baltimore Orioles OF
18 1966 TommieTommie Agee Chicago White Sox OF
19 1967 RodRod Carew Minnesota Twins 2B
20 1968 StanStan Bahnsen New York Yankees P
21 1969 LouLou Piniella Kansas City Royals OF
22 1970 ThurmanThurman Munson New York Yankees C
23 1971 ChrisChris Chambliss Cleveland Indians 1B
24 1972 CarltonCarlton Fisk Boston Red Sox C
25 1973 AlAl Bumbry Baltimore Orioles OF
26 1974 MikeMike Hargrove Texas Rangers 1B
27 1975 FredFred Lynn Boston Red Sox OF
28 1976 MarkMark Fidrych Detroit Tigers P
29 1977 EddieEddie Murray Baltimore Orioles DH
30 1978 LouLou Whitaker Detroit Tigers 2B
31 1979* JohnJohn Castino Minnesota Twins 3B
32 1979* AlfredoAlfredo Griffin Toronto Blue Jays SS
33 1980 JoeJoe Charboneau Cleveland Indians OF
34 1981 DaveDave Righetti New York Yankees P
35 1982 CalCal Ripken Baltimore Orioles SS
36 1983 RonRon Kittle Chicago White Sox OF
37 1984 AlvinAlvin Davis Seattle Mariners 1B
38 1985 OzzieOzzie Guillén Chicago White Sox SS
39 1986 JoseJose Canseco Oakland Athletics OF
40 1987 MarkMark McGwire Oakland Athletics 1B
41 1988 WaltWalt Weiss Oakland Athletics SS
42 1989 GreggGregg Olson Baltimore Orioles P
43 1990 Sandy Alomar Jr Cleveland Indians C
44 1991 ChuckChuck Knoblauch Minnesota Twins 2B
45 1992 PatPat Listach Milwaukee Brewers SS
46 1993 TimTim Salmon California Angels OF
47 1994 BobBob Hamelin Kansas City Royals DH
48 1995 MartyMarty Cordova Minnesota Twins OF
49 1996 DerekDerek Jeter New York Yankees SS
50 1997 NomarNomar Garciaparra Boston Red Sox SS
51 1998 BenBen Grieve Oakland Athletics OF
52 1999 CarlosCarlos Beltrán Kansas City Royals OF
53 2000 KazuhiroKazuhiro Sasaki Seattle Mariners P
54 2001 IchiroIchiro Suzuki Seattle Mariners OF
55 2002 EricEric Hinske Toronto Blue Jays 3B
56 2003 ÁngelÁngel Berroa Kansas City Royals SS
57 2004 BobbyBobby Crosby Oakland Athletics SS
58 2005 HustonHuston Street Oakland Athletics P
59 2006 JustinJustin Verlander Detroit Tigers P
60 2007 DustinDustin Pedroia Boston Red Sox 2B
61 2008 EvanEvan Longoria Tampa Bay Rays 3B
62 2009 Andrew Bailey Oakland Athletics P
63 2010 NeftalíNeftalí Feliz Texas Rangers P
64 2011 JeremyJeremy Hellickson Tampa Bay Rays P
65 2012 MikeMike Trout Los Angeles Angels OF
66 2013 WilWil Myers Tampa Bay Rays OF
67 2014 JoséJosé Abreu Chicago White Sox 1B
68 2015 CarlosCarlos Correa Houston Astros SS
69 2016 MichaelMichael Fulmer Detroit Tigers P
You can fix this by finding a repeated pattern of at least three word characters and replacing it with a single copy, like this:
gsub("(\\w{3,})\\1", "\\1", Players$Player)
To overwrite the existing column:
Players$Player = gsub("(\\w{3,})\\1", "\\1", Players$Player)
G5W's answer gets you most of the way there, but it would miss two-letter first names such as "Al". This version relies on capitalization rather than character count:
myData$Player <- gsub('([A-Z][a-z]+)\\1', '\\1', myData$Player)
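For the not-so-regex-savvy:
library(stringr)
fun1 <- function(string) {
  first <- word(string, 1)                          # first word, e.g. "RoyRoy"
  rest  <- str_sub(string, str_length(first) + 1)   # remainder, e.g. " Sievers"
  half  <- str_sub(first, 1, str_length(first) %/% 2)
  # only collapse the first word when it really is the same text twice,
  # so names like "Sandy Alomar Jr" are left unchanged
  ifelse(paste0(half, half) == first, paste0(half, rest), string)
}
fun1(df$Player)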
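A regex alternative that does not depend on letter classes, which may not match accented letters such as the Á in ÁngelÁngel depending on the locale, is to look only at the start of the string for some text followed immediately by itself and a space. Here is a sketch using a few names copied from the table:

```r
# A few player names copied from the table above
players <- c("RoyRoy Sievers", "AlAl Bumbry", "ÁngelÁngel Berroa",
             "Sandy Alomar Jr", "Andrew Bailey")

# ^(.+)\\1 followed by a space: the first word is some text repeated twice.
# sub() replaces only that leading match; names without a doubled first word are unchanged.
fixed_names <- sub("^(.+)\\1 ", "\\1 ", players, perl = TRUE)
fixed_names
```

Because the pattern is anchored with ^ and requires a space right after the repeat, names that are not doubled, such as "Sandy Alomar Jr" and "Andrew Bailey", are left alone, and two-letter names like "AlAl" are handled too.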

R - order function

Here is my data
x i
1 D W MCMILLAN MEMORIAL HOSPITAL AL
2 <NA> AK
3 JOHN C LINCOLN DEER VALLEY HOSPITAL AZ
4 ARKANSAS METHODIST MEDICAL CENTER AR
5 SHERMAN OAKS HOSPITAL CA
6 SKY RIDGE MEDICAL CENTER CO
7 MIDSTATE MEDICAL CENTER CT
8 <NA> DE
9 <NA> DC
10 SOUTH FLORIDA BAPTIST HOSPITAL FL
11 UPSON REGIONAL MEDICAL CENTER GA
12 <NA> HI
13 LOST RIVERS DISTRICT HOSPITAL ID
14 JESSE BROWN VA MEDICAL CENTER - VA CHICAGO HEALTHCARE SYSTEM IL
15 COMMUNITY HOSPITAL IN
16 COVENANT MEDICAL CENTER IA
17 COFFEYVILLE REGIONAL MEDICAL CENTER KS
18 KING'S DAUGHTERS' MEDICAL CENTER KY
19 NORTH OAKS MEDICAL CENTER, LLC LA
20 RUMFORD HOSPITAL ME
21 CIVISTA MEDICAL CENTER MD
22 HEYWOOD HOSPITAL MA
23 GENESYS REGIONAL MEDICAL CENTER - HEALTH PARK MI
24 HEALTHEAST WOODWINDS HOSPITAL MN
25 MARION GENERAL HOSPITAL MS
26 LIBERTY HOSPITAL MO
27 FRANCES MAHON DEACONESS HOSPITAL MT
28 ALEGENT HEALTH MEMORIAL HOSPITAL NE
29 BANNER CHURCHILL COMMUNITY HOSPITAL NV
30 FRANKLIN REGIONAL HOSPITAL NH
31 CAPITAL HEALTH MEDICAL CENTER - HOPEWELL NJ
32 ESPANOLA HOSPITAL NM
33 METROPOLITAN HOSPITAL CENTER NY
34 MEDWEST HAYWOOD NC
35 LISBON AREA HEALTH SERVICES ND
36 CINCINNATI VA MEDICAL CENTER OH
37 JACKSON COUNTY MEMORIAL HOSPITAL OK
38 ST ALPHONSUS MEDICAL CENTER - BAKER CITY, INC OR
39 UPMC PASSAVANT PA
40 HOSPITAL METROPOLITANO DR TITO MATTEI PR
41 <NA> RI
42 PALMETTO HEALTH BAPTIST SC
43 BLACK HILLS SURGICAL HOSPITAL LLP SD
44 INDIAN PATH MEDICAL CENTER TN
45 NIX HEALTH CARE SYSTEM TX
46 BEAR RIVER VALLEY HOSPITAL UT
47 <NA> VT
48 <NA> VI
49 CARILION GILES COMMUNITY HOSPITAL VA
50 SWEDISH MEDICAL CENTER WA
51 PLATEAU MEDICAL CENTER WV
52 ST CROIX REG MED CTR WI
53 POWELL VALLEY HOSPITAL WY
54 <NA> GU
I want to order this list by column i, but for some reason it puts GU at the bottom.
When I run
order(z$i)
(z is my table)
I get this as a result
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
> str(z)
'data.frame': 54 obs. of 2 variables:
$ x: Factor w/ 46 levels "D W MCMILLAN MEMORIAL HOSPITAL",..: 1 NA 2 3 4 5 6 NA NA 7 ...
$ i: Factor w/ 54 levels "AL","AK","AZ",..: 1 2 3 4 5 6 7 8 9 10 ...
To me, this means it thinks GU belongs at the bottom of the list. There is also a problem at the top of the list: AL comes before AK, and AZ comes before AR.
Any suggestions as to why it does this?
Thanks
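i is a factor, and order() on a factor sorts by its integer codes, i.e. by the order of its levels. Here the levels follow the order of the rows ("AL","AK","AZ",... in the str() output), so nothing moves. Sorting the character values instead gives alphabetical order:
z[order(as.character(z$i)), ]
will do the trick.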
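Here is a small sketch reproducing the behaviour on five values of i, plus an alternative that re-levels the factor alphabetically once so that order() works on it directly:

```r
# Miniature of z: a factor whose levels follow the row order, as in str(z)
z <- data.frame(i = c("AL", "AK", "AZ", "AR", "GU"), stringsAsFactors = FALSE)
z$i <- factor(z$i, levels = unique(z$i))   # levels "AL","AK","AZ","AR","GU"

order(z$i)                 # sorts by level codes: 1 2 3 4 5, i.e. unchanged
order(as.character(z$i))   # sorts alphabetically: 2 1 4 3 5

# Alternative: re-level the factor alphabetically, then order() on the factor works
z$i <- factor(z$i, levels = sort(levels(z$i)))
z[order(z$i), , drop = FALSE]
```

Re-leveling is useful if the factor is also used later for plotting or tables, since those follow the level order too.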

Using rank() with subset [closed]

Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
This question was caused by a typo or a problem that can no longer be reproduced. While similar questions may be on-topic here, this one was resolved in a way less likely to help future readers.
Closed 8 years ago.
I have this data:
State Abb Region Change
3 Arizona AZ West 24.6
6 Colorado CO West 16.9
10 Florida FL South 17.6
11 Georgia GA South 18.3
13 Idaho ID West 21.1
29 Nevada NV West 35.1
34 North Carolina NC South 18.5
41 South Carolina SC South 15.3
44 Texas TX South 20.6
45 Utah UT West 23.8
I'm trying to extract a subset where Change > 40.
When I use
subset(uspopchange, rank(Change)>40)
it works
but when I use
subset(uspopchange, Change > 40)
it comes up with nothing.
Furthermore, if I use
subset(uspopchange, Change > 16.9)
it also works.
Why does it do that? Why do I need to use rank() to get my subset?
BTW: the data is from
install.packages("gcookbook")
> library(gcookbook)
> data(uspopchange)
> head(uspopchange[order(uspopchange$Change,decreasing=TRUE),])
State Abb Region Change
29 Nevada NV West 35.1
3 Arizona AZ West 24.6
45 Utah UT West 23.8
13 Idaho ID West 21.1
44 Texas TX South 20.6
34 North Carolina NC South 18.5
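There are no rows with Change greater than 40; the largest value is Nevada's 35.1. rank(Change) > 40 in your subset() does not compare values at all: rank() gives 1 to the smallest Change, 2 to the next, and so on, so it keeps the rows whose rank is above 40. Since there are 50 rows in your data (Change has a length of 50), you get the rows ranked 41, 42, 43, ..., 50, i.e. the ten largest values of Change. Change > 16.9 works because it is a threshold on the values and 16.9 lies within their range.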
> Top10 <- subset(uspopchange, rank(Change)>40)
> Top10[order(Top10$Change,decreasing=TRUE),]
State Abb Region Change
29 Nevada NV West 35.1
3 Arizona AZ West 24.6
45 Utah UT West 23.8
13 Idaho ID West 21.1
44 Texas TX South 20.6
34 North Carolina NC South 18.5
11 Georgia GA South 18.3
10 Florida FL South 17.6
6 Colorado CO West 16.9
41 South Carolina SC South 15.3
##
> uspopchange[order(uspopchange$Change,decreasing=TRUE),][1:10,]
State Abb Region Change
29 Nevada NV West 35.1
3 Arizona AZ West 24.6
45 Utah UT West 23.8
13 Idaho ID West 21.1
44 Texas TX South 20.6
34 North Carolina NC South 18.5
11 Georgia GA South 18.3
10 Florida FL South 17.6
6 Colorado CO West 16.9
41 South Carolina SC South 15.3
Those are equivalent: both give the ten states with the largest Change.
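The difference between a value threshold and a rank threshold can be seen on the ten Change values printed in the question. This sketch uses only those values, so rank() > 7 here plays the role that rank() > 40 plays on the full 50 rows:

```r
# The ten Change values printed in the question (uspopchange itself has 50 rows)
change <- c(24.6, 16.9, 17.6, 18.3, 21.1, 35.1, 18.5, 15.3, 20.6, 23.8)

max(change)          # 35.1, so change > 40 selects nothing
sum(change > 40)     # 0

# rank() gives 1 to the smallest value; rank > 7 keeps the 3 largest of these 10
change[rank(change) > 7]                  # 24.6 35.1 23.8 (original order)

# An explicit "top n by value" selection that does not depend on the number of rows
head(sort(change, decreasing = TRUE), 3)  # 35.1 24.6 23.8
```

The sort-based version states the intent ("the n largest") directly, whereas rank(Change) > 40 only means "top 10" because the data happen to have 50 rows.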
