I have thousands of rows of data that look like this:
df <- data.frame(
thing_code = c("X123", "X123", "Y123", "Y123", "Y123", "Y123", "Z123", "Z123", "Z123", "Z123", "A456", "A456", "A456", "A456", "A456"),
year = c("2001", "2001", "2004", "2004", "2004", "2004", "2004", "2004", "2004", "2004", "2007", "2007", "2007", "2007", "2007"),
country = c("Vietnam", "Vietnam", "US", "US", "Singapore", "Vietnam", "Japan", "Vietnam", "Vietnam", "Cambodia", "Vietnam", "Vietnam", "Iran", "China", "Germany"))
I want to count each country group's share of contributions for each thing (identified by thing_code) per year. The categories I want to count are:
Vietnam (local country in this example)
SEAsian (all other southeast asian countries except Vietnam)
Non-local (other countries except Vietnam and SEAsian)
I want to be able to come up with something like this:
# thing_code year location freq percentage
# X123 2001 Vietnam 2 1
# Y123 2004 Vietnam 1 0.25
# Y123 2004 Non-local 2 0.5
# Y123 2004 SEAsian 1 0.25
# Z123 2004 Non-local 1 0.25
# Z123 2004 Vietnam 2 0.5
# Z123 2004 SEAsian 1 0.25
# A456 2007 Vietnam 2 0.4
# A456 2007 Non-local 3 0.6
Here freq is the count for each of the categories above, and percentage is each category's share of the total within a thing_code and year.
So far, my code looks like
Vietnam <- df %>% filter(str_detect(country, "Vietnam"))
thing_code_year <- subset(Vietnam, select=c(thing_code, year))
freq <- table(thing_code_year)
frequency <- as.data.frame(freq)
frequency <- frequency %>% filter(Freq!=0)
but this only gives me the number for Vietnam and will probably take me a long time to obtain those for other categories.
This should give your desired output. You can use case_when to create a new variable, location, that classifies each country using the logic you described. Next, group_by thing_code, year, and the newly created location to count the rows in each category (Vietnam, SEAsian, Non-local). Then group_by thing_code and year only, and divide each count by the group total to get the proportion of each category.
library(dplyr)
df <- data.frame(
thing_code = c("X123", "X123", "Y123", "Y123", "Y123", "Y123", "Z123", "Z123", "Z123", "Z123", "A456", "A456", "A456", "A456", "A456"),
year = c("2001", "2001", "2004", "2004", "2004", "2004", "2004", "2004", "2004", "2004", "2007", "2007", "2007", "2007", "2007"),
country = c("Vietnam", "Vietnam", "US", "US", "Singapore", "Vietnam", "Japan", "Vietnam", "Vietnam", "Cambodia", "Vietnam", "Vietnam", "Iran", "China", "Germany"))
SEAsian <- c("Vietnam", "Singapore", "Cambodia")
df %>%
mutate(location = case_when(
country == "Vietnam" ~ "Vietnam",
country %in% SEAsian[SEAsian != "Vietnam"] ~ "SEAsian",
!country %in% SEAsian ~ "Non-local"
)) %>%
group_by(thing_code, year, location) %>%
summarise(freq = n()) %>%
group_by(thing_code, year) %>%
mutate(percentage = freq/sum(freq))
Output:
thing_code year location freq percentage
<fct> <fct> <chr> <int> <dbl>
1 A456 2007 Non-local 3 0.6
2 A456 2007 Vietnam 2 0.4
3 X123 2001 Vietnam 2 1
4 Y123 2004 Non-local 2 0.5
5 Y123 2004 SEAsian 1 0.25
6 Y123 2004 Vietnam 1 0.25
7 Z123 2004 Non-local 1 0.25
8 Z123 2004 SEAsian 1 0.25
9 Z123 2004 Vietnam 2 0.5
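As a more compact variant of the same idea, the group_by() + summarise(n()) step can be replaced by count(). This is only a sketch, assuming dplyr 0.8 or later (for the name argument of count()). The other_seasian vector is an illustrative, incomplete list rather than a full set of Southeast Asian countries, and the data frame is a shortened copy of the one above.

```r
library(dplyr)

# Shortened copy of the example data
df <- data.frame(
  thing_code = c("X123", "X123", "Y123", "Y123", "Y123", "Y123"),
  year       = c("2001", "2001", "2004", "2004", "2004", "2004"),
  country    = c("Vietnam", "Vietnam", "US", "US", "Singapore", "Vietnam")
)

# Assumption: illustrative, incomplete list of other Southeast Asian countries
other_seasian <- c("Singapore", "Cambodia")

result <- df %>%
  mutate(location = case_when(
    country == "Vietnam"       ~ "Vietnam",
    country %in% other_seasian ~ "SEAsian",
    TRUE                       ~ "Non-local"  # everything else
  )) %>%
  # count() returns one row per combination, with the count in `freq`
  count(thing_code, year, location, name = "freq") %>%
  group_by(thing_code, year) %>%
  mutate(percentage = freq / sum(freq)) %>%
  ungroup()

result
```

Using TRUE as the last case_when condition means any country not matched earlier falls into "Non-local", so no row ends up with a missing location.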
I have a data frame in R with the columns "Bid Yield", "Last Update", and "Tenor" (originally shown as a screenshot).
Throughout the data, the "Tenor" column repeats the same 11 elements, and the "Last Update" column changes once every 11 elements.
I want to make a matrix with the tenors as the column names and the last update dates as the row names. The matrix should be filled with the corresponding bid yields.
I do not know how to create a matrix that puts each bid yield in the right cell.
The end of the dput(my data) output looks like this:
"2Y", "3Y", "4Y", "5Y", "7Y", "10Y", "15Y", "20Y", "25Y", "30Y",
"1Y", "2Y", "3Y", "4Y", "5Y", "7Y", "10Y", "15Y", "20Y", "25Y",
"30Y", "1Y", "2Y", "3Y", "4Y", "5Y", "7Y", "10Y", "15Y", "20Y",
"25Y", "30Y", "1Y", "2Y", "3Y", "4Y", "5Y", "7Y", "10Y", "15Y",
"20Y", "25Y", "30Y", "1Y", "2Y", "3Y", "4Y", "5Y", "7Y", "10Y",
"15Y", "20Y", "25Y", "30Y", "1Y", "2Y", "3Y", "4Y", "5Y", "7Y",
"10Y", "15Y", "20Y", "25Y", "30Y")), .Names = c("Bid Yield",
"Last Update", "Tenor"), row.names = c(NA, -25256L), class = "data.frame")
We can use xtabs
xtabs(BidYield ~ LastUpdate + Tenor, df1)
# Tenor
#LastUpdate 10Y 15Y 1Y 20Y 25Y 2Y 30Y 3Y 4Y 5Y 7Y
# 2011-04-15 4.807 5.233 0.666 5.411 5.504 1.315 5.504 2.105 2.780 3.355 4.180
# 2011-04-18 4.785 5.206 0.653 5.395 5.491 1.280 5.486 2.053 2.727 3.311 4.142
If the column names contain spaces or other non-syntactic characters, wrap each name in backticks:
names(df1)[1:2] <- c("Bid Yield", "Last Update")
xtabs(`Bid Yield` ~ `Last Update` + Tenor, df1)
# Tenor
#Last Update 10Y 15Y 1Y 20Y 25Y 2Y 30Y 3Y 4Y 5Y 7Y
# 2011-04-15 4.807 5.233 0.666 5.411 5.504 1.315 5.504 2.105 2.780 3.355 4.180
# 2011-04-18 4.785 5.206 0.653 5.395 5.491 1.280 5.486 2.053 2.727 3.311 4.142
If the 'Tenor' columns need to appear in natural order (1Y, 2Y, ..., 30Y) rather than alphabetical order, one option is to convert Tenor to a factor whose levels are sorted with gtools::mixedsort:
library(gtools)
df1$Tenor <- factor(df1$Tenor, levels = mixedsort(unique(df1$Tenor)))
xtabs(`Bid Yield` ~ `Last Update` + Tenor, df1)
# Tenor
#Last Update 1Y 2Y 3Y 4Y 5Y 7Y 10Y 15Y 20Y 25Y 30Y
# 2011-04-15 0.666 1.315 2.105 2.780 3.355 4.180 4.807 5.233 5.411 5.504 5.504
# 2011-04-18 0.653 1.280 2.053 2.727 3.311 4.142 4.785 5.206 5.395 5.491 5.486
data
df1 <- structure(list(BidYield = c(0.666, 1.315, 2.105, 2.78, 3.355,
4.18, 4.807, 5.233, 5.411, 5.504, 5.504, 0.653, 1.28, 2.053,
2.727, 3.311, 4.142, 4.785, 5.206, 5.395, 5.491, 5.486),
LastUpdate = c("2011-04-15",
"2011-04-15", "2011-04-15", "2011-04-15", "2011-04-15", "2011-04-15",
"2011-04-15", "2011-04-15", "2011-04-15", "2011-04-15", "2011-04-15",
"2011-04-18", "2011-04-18", "2011-04-18", "2011-04-18", "2011-04-18",
"2011-04-18", "2011-04-18", "2011-04-18", "2011-04-18", "2011-04-18",
"2011-04-18"), Tenor = c("1Y", "2Y", "3Y", "4Y", "5Y", "7Y",
"10Y", "15Y", "20Y", "25Y", "30Y", "1Y", "2Y", "3Y", "4Y", "5Y",
"7Y", "10Y", "15Y", "20Y", "25Y", "30Y")),
class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",
"14", "15", "16", "17", "18", "19", "20", "21", "22"))
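One caveat worth knowing: xtabs sums the values in each cell, so a date/tenor combination that is missing from the data shows up as 0, and duplicated combinations are added together. If that matters, a base R sketch with tapply leaves missing cells as NA instead. The small yields data frame below is made up for illustration.

```r
# Hypothetical toy data: the 2Y quote for 2011-04-18 is deliberately missing
yields <- data.frame(
  BidYield   = c(0.666, 1.315, 4.807, 0.653, 4.785),
  LastUpdate = c("2011-04-15", "2011-04-15", "2011-04-15",
                 "2011-04-18", "2011-04-18"),
  Tenor      = c("1Y", "2Y", "10Y", "1Y", "10Y")
)

# Fix the column order explicitly instead of relying on alphabetical sorting
yields$Tenor <- factor(yields$Tenor, levels = c("1Y", "2Y", "10Y"))

# Each cell holds at most one value here, so mean() simply returns it;
# combinations that do not occur in the data stay NA
m <- tapply(yields$BidYield, list(yields$LastUpdate, yields$Tenor), mean)
m
```

The result is an ordinary numeric matrix with the dates as row names and the tenors as column names.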
I'm running a function in the 'rnoaa' package that finds the 5 nearest weather stations to a data frame that contains nest box locations. This produces a nested tbl_df, a tibble for each nest box ID. I'd like to convert the tbl_df into a tibble or data frame that retains the corresponding nest box ID, but I'm not sure how to do it. Here's my code and an example of the data.
Import the data:
nests<-structure(list(id = structure(1:5, .Label = c("29", "36", "39",
"41", "42", "43", "45", "47", "48", "50", "51", "52", "53", "54",
"55", "57", "58", "59", "60", "61", "62", "64", "65", "67", "69",
"70", "71", "72", "73", "75", "77", "78", "79", "80", "81", "82",
"84", "87", "88", "89", "90", "91", "92", "93", "95", "97", "99",
"100", "102", "106", "108", "109", "110", "118", "123", "124",
"125", "126", "127", "129", "130", "131", "133", "134", "136",
"138", "140", "141", "144", "147", "149", "151", "155", "157",
"158", "160", "161", "162", "163", "165", "167", "168", "169",
"172", "174", "175", "177", "178", "179", "180", "181", "182",
"186", "189", "190", "193", "195", "202", "205", "207", "208",
"215", "217", "218", "225", "229", "230", "236", "240", "241",
"243", "244", "246", "247", "248", "249", "251", "253", "254",
"255", "257", "258", "259", "260", "261", "262", "263", "269",
"270", "276", "292", "294", "295", "296", "297", "298", "300",
"301", "302", "303", "305", "306", "307", "308", "309", "311",
"316", "317", "318", "322", "323", "324", "326", "329", "330",
"331", "332", "333", "334", "335", "336", "337", "338", "339",
"342", "345", "346", "350", "351", "353", "358", "362", "363",
"365", "366", "368", "369", "372", "379", "380", "381", "382",
"384", "386", "387", "388", "390", "391", "392", "393", "394",
"395", "396", "397", "398", "400", "401", "403", "404", "406",
"410", "411", "414", "415", "416", "418", "420", "424", "425",
"426", "428", "429", "430", "432", "433", "435", "436", "440",
"441", "442", "445", "446", "447", "448", "449", "450", "451",
"453", "458", "459", "461", "462", "463", "464", "465", "466",
"469", "470", "471", "478", "479", "488", "490", "497", "503",
"504", "506", "507", "508", "509", "512", "513", "514", "515",
"516", "517", "518", "519", "520", "521", "527", "528", "529",
"530", "531", "534", "540", "542", "545", "552", "553", "554",
"556", "558", "561", "562", "563", "565", "566", "568", "569",
"570", "571", "572", "573", "574", "575", "576", "577", "578",
"580", "583", "584", "585", "591", "592", "595", "606", "608",
"610", "612", "614", "615", "616", "617", "620", "621", "627",
"628", "634", "635", "636", "637", "638", "639", "643", "647",
"648", "651", "652", "653", "654", "656", "661", "662", "663",
"664", "665", "667", "669", "670", "673", "674", "676", "677",
"679", "680", "681", "684", "685", "690", "693", "694", "695",
"706", "708", "716", "717", "719", "720", "728", "757", "759",
"761", "777", "798", "801", "803", "818", "838", "839", "855",
"856", "864", "865", "867", "868", "880", "890", "899", "901",
"914", "915", "924", "985", "998", "999", "1002", "1003", "1004",
"1019", "1020", "1021", "1022", "1058", "1059", "1116", "1139",
"1146", "1164", "1169", "1170", "1178", "1183", "1186", "1188",
"1193", "1211", "1233", "1235", "1236", "1237", "1251", "1263",
"1285", "1288", "1289", "1294", "1296", "1298", "1299", "1300",
"1302", "1303", "1305", "1307", "1310", "1311", "1328", "1331",
"1332", "1333", "1334", "1335", "1455", "1456", "1459", "1461",
"1462", "1463", "1466", "1467", "1469", "1473", "1474", "1475",
"1476", "1478", "1479", "1480", "1482", "1485", "1487", "1503",
"1506", "1520", "1534", "1564", "1572", "1575", "1582", "1587",
"1588", "1592", "1593", "1594", "1597", "1602", "1607", "1611",
"1612", "1613", "1615", "1616", "1617", "1619", "1633", "1656",
"1657", "1658", "1660", "1663", "1664", "1667", "1668", "1669",
"1676", "1677", "1679", "1691", "1704", "1716", "1734", "1735",
"1736", "1766", "1771", "1772", "1773", "1775", "1777", "1783",
"1801", "1814", "1818", "1834", "1835", "1836", "1837", "1838",
"1840", "1843", "1845", "1846", "1847", "1850", "1852", "1856",
"1857", "1858", "1859", "1860", "1882", "1883", "1890", "1891",
"1897", "1899", "1901", "1902", "1909", "1910", "1912", "1914",
"1923", "1926", "1928", "1929", "1935", "1941", "1956", "1958",
"1960", "1968", "1991", "1994", "1998", "2002", "2010", "2012",
"2016", "2019", "2024", "2026", "2029", "2030", "2032", "2033",
"2034", "2035", "2036", "2039", "2042", "2046", "2049", "2053",
"2055", "2056", "2057", "2059", "2093", "2101", "2103", "2121",
"2134", "2146", "2147", "2152", "2184", "2185", "2186", "2187",
"2188", "2190", "2197", "2201", "2239", "2240", "2249", "2250",
"2291", "2313", "2322", "2347", "2351", "2353", "2354", "2355",
"2360", "2361", "2369", "2370", "2372", "2373", "2374", "2375",
"2376", "2402", "2426", "2427", "2445", "2447", "2449", "2459",
"2460", "2462", "2467", "2468", "2469", "2471", "2484", "2485",
"2486", "2488", "2490", "2494", "2496", "2517", "2613", "2623",
"2624", "2625", "2641", "2696", "2697", "2709", "2711", "2712",
"2713", "2714", "2997", "3000", "3004"), class = "factor"), latitude = c(43.29515222,
44.02074565, 44.44193, 44.146666, 43.98897), longitude = c(-89.29077182,
-92.04753707, -121.40635, -121.347223, -121.18639)), .Names = c("id",
"latitude", "longitude"), row.names = c(NA, -5L), class = c("tbl_df",
"tbl", "data.frame"))
This grabs the 5 nearest weather stations and produces a tbl_df:
nearest_station<-meteo_nearby_stations(lat_lon_df = nests, station_data = station_data,
limit = 5, var = c("TAVG"),
year_min = 2011, year_max = 2016)
nearest_station
Finally, I used do.call to produce a single data frame:
ns <- do.call(rbind, lapply(nearest_station, data.frame, stringsAsFactors=FALSE))
head(ns)
While the resulting data table shows the nest box ID next to the weather station ID (under id), the first column really only contains the weather station ID:
id name latitude longitude distance
29.1 USW00014837 MADISON DANE RGNL AP 43.1406 -89.3453 17.74438
29.2 USR0000WDDG DODGEVILLE WISCONSIN 43.1000 -90.0000 61.44939
29.3 USW00014839 MILWAUKEE MITCHELL AP 42.9550 -87.9044 118.69939
29.4 USW00094822 ROCKFORD GTR ROCKFORD AP 42.1928 -89.0931 123.63416
29.5 USW00094908 DUBUQUE RGNL AP 42.3978 -90.7036 152.38709
36.1 USW00014925 ROCHESTER INTL AP 43.9042 -92.4917 37.83807
ns[,1]
USW00014837
Is there a way to keep the nest box information in the weather station data frame?
(sorry for late reply)
If you use something like dplyr::bind_rows, you can do:
dplyr::bind_rows(nearest_station, .id = "nest_box_id")
to get
#> # A tibble: 25 x 6
#> nest_box_id id name latitude longitude distance
#> <chr> <chr> <chr> <dbl> <dbl> <dbl>
#> 29 USW00014837 MADISON DANE RGNL AP 43.1 -89.3 17.7
#> 29 USR0000WDDG DODGEVILLE WISCONSIN 43.1 -90.0 61.4
#> 29 USW00014839 MILWAUKEE MITCHELL AP 43.0 -87.9 119.
#> 29 USW00094822 ROCKFORD GTR ROCKFORD AP 42.2 -89.1 124.
#> 29 USW00094908 DUBUQUE RGNL AP 42.4 -90.7 152.
#> 36 USW00014925 ROCHESTER INTL AP 43.9 -92.5 37.8
#> 36 USW00014920 LA CROSSE MUNI AP 43.9 -91.3 65.5
#> 36 USR0000WBRF BLACK RIVER FALLS WISCONSIN 44.3 -90.8 102.
#> 36 USR0000WAUG AUGUSTA WISCONSIN 44.7 -91.1 105.
#> 36 USW00014922 MINNEAPOLIS/ST PAUL AP 44.9 -93.2 134.
#> # ... with 15 more rows
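This relies on nearest_station being a named list with one data frame per nest box, which the row names in the do.call output above (29.1, 29.2, ...) suggest. A self-contained sketch with a hand-made list: the station IDs and distances are copied from the output above, while the stations object itself is only a stand-in for the real nearest_station.

```r
library(dplyr)

# Stand-in for nearest_station: a named list of data frames,
# where each name is a nest box ID
stations <- list(
  "29" = data.frame(id = c("USW00014837", "USR0000WDDG"),
                    distance = c(17.74438, 61.44939),
                    stringsAsFactors = FALSE),
  "36" = data.frame(id = "USW00014925",
                    distance = 37.83807,
                    stringsAsFactors = FALSE)
)

# The list names become a new column called nest_box_id
combined <- bind_rows(stations, .id = "nest_box_id")
combined
```

The new column is character, so convert it with as.integer() or factor() if the nest box IDs need another type for later joins.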
I think I am facing a (hopefully) small problem, but searching has not turned up anything helpful. I am having trouble with data extracted via the OECD package: I get a dataset in which all the variables are stored in one column. The dataset is in long format, which is nice, but I want each variable to become its own column.
At the moment the column "VAR" contains several variables ("B11", "B12", ...), 11 in all. All variables are measured for many countries (column "COU"). I would like to add new columns to the dataset, one for each variable currently stored in "VAR", containing the corresponding values from the "obsValue" column.
For example, I want to see the value of B11 for Afghanistan in 1999 in one row and for 2000 in another, with the value of B12 for 1999 in the same row as B11 for 1999, and so on. I hope my aim is clear; if not, do not hesitate to ask.
Here is code to reproduce the head of the dataset:
dput(head(MIG,20))
structure(list(CO2 = c("AFG", "AFG", "AFG", "AFG", "AFG", "AFG",
"AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG",
"AFG", "AFG", "AFG", "AFG", "AFG"), VAR = c("B11", "B11", "B11",
"B11", "B11", "B11", "B11", "B11", "B11", "B11", "B11", "B11",
"B11", "B11", "B11", "B11", "B12", "B12", "B12", "B12"), GEN = c("WMN",
"WMN", "WMN", "WMN", "WMN", "WMN", "WMN", "WMN", "WMN", "WMN",
"WMN", "WMN", "WMN", "WMN", "WMN", "WMN", "WMN", "WMN", "WMN",
"WMN"), COU = c("AUS", "AUS", "AUS", "AUS", "AUS", "AUS", "AUS",
"AUS", "AUS", "AUS", "AUS", "AUS", "AUS", "AUS", "AUS", "AUS",
"AUS", "AUS", "AUS", "AUS"), TIME_FORMAT = c("P1Y", "P1Y", "P1Y",
"P1Y", "P1Y", "P1Y", "P1Y", "P1Y", "P1Y", "P1Y", "P1Y", "P1Y",
"P1Y", "P1Y", "P1Y", "P1Y", "P1Y", "P1Y", "P1Y", "P1Y"), obsTime = c("1999",
"2000", "2001", "2002", "2003", "2004", "2005", "2006", "2007",
"2008", "2009", "2010", "2011", "2012", "2013", "2014", "1999",
"2000", "2001", "2004"), obsValue = c(434, 398, 225, 345, 544,
726, 1099, 1607, 1377, 1018, 946, 873, 1131, 903, 1230, 2939,
0, 0, 2, 24), OBS_STATUS = c(NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_), migrants = c(434, 398, 225, 345,
544, 726, 1099, 1607, 1377, 1018, 946, 873, 1131, 903, 1230,
2939, 0, 0, 2, 24)), .Names = c("CO2", "VAR", "GEN", "COU", "TIME_FORMAT",
"obsTime", "obsValue", "OBS_STATUS", "migrants"), row.names = c(NA,
-20L), class = c("tbl_df", "tbl", "data.frame"))
And here is my whole code, including two attempts to solve the problem on my own. Neither works: they either just copy the "obsValue" column or give me a column that says TRUE or FALSE. Note that R needs quite some time to load the dataset.
library(OECD)
library(plyr)
library(dplyr)
search_dataset("migration")
MIG<- get_dataset("MIG")
get_data_structure("MIG")
MIG$migrants <- if(MIG$VAR == "B11")MIG$migrants<-MIG$obsValue else MIG$migrants<-NA
MIG_long <- mutate(MIG,migrants=VAR=="B11")
if(MIG_long$migrants==T)MIG_long$migrants<-MIG_long$obsValue else MIG_long$migrants<-NA
I hope this question is not too basic and that you can work with my explanation. If you have any questions, please ask.
Best wishes,
Marcel
You can use tidyr to spread VAR and obsValue into columns. If you want one row per year, as @atiretoo highlighted, simply remove the migrants column first; because it duplicates obsValue, keeping it would stop the rows for B11 and B12 from collapsing into one row per year.
library(tidyr)
library(dplyr)
MIG %>%
select(-migrants) %>%
spread(VAR, obsValue)
CO2 obsTime B11 B12
(chr) (chr) (dbl) (dbl)
1 AFG 1999 434 0
2 AFG 2000 398 0
3 AFG 2001 225 2
4 AFG 2002 345 NA
5 AFG 2003 544 NA
6 AFG 2004 726 24
7 AFG 2005 1099 NA
8 AFG 2006 1607 NA
9 AFG 2007 1377 NA
10 AFG 2008 1018 NA
11 AFG 2009 946 NA
12 AFG 2010 873 NA
13 AFG 2011 1131 NA
14 AFG 2012 903 NA
15 AFG 2013 1230 NA
16 AFG 2014 2939 NA
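In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(). A sketch of the equivalent call on a cut-down version of the data; the mig_small data frame is an assumption made for illustration, keeping only a few rows and columns of MIG.

```r
library(tidyr)

# Hypothetical cut-down version of MIG: no B12 observation for 2002
mig_small <- data.frame(
  CO2      = "AFG",
  VAR      = c("B11", "B11", "B11", "B12", "B12"),
  obsTime  = c("1999", "2000", "2002", "1999", "2000"),
  obsValue = c(434, 398, 345, 0, 0),
  stringsAsFactors = FALSE
)

# One row per CO2/obsTime, one column per value of VAR;
# year/variable pairs without an observation become NA
mig_wide <- pivot_wider(mig_small, names_from = VAR, values_from = obsValue)
mig_wide
```

As with spread(), every column that is not named in names_from or values_from is treated as an identifier, so a column such as migrants should still be dropped beforehand.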
I have the following data:
new_pairs
x y Freq start.latittude start.longitude start.station end.latitude
1 359 519 929 40.75188 -73.97770 Pershing\nSquare N 40.75510
2 477 465 5032 40.75514 -73.98658 Broadway &\nW 41 St 40.75641
3 484 519 1246 40.75188 -73.97770 Pershing\nSquare N 40.75500
4 484 318 2654 40.75320 -73.97799 E 43 St &\nVanderbilt\nAve 40.75500
5 492 267 1828 40.75098 -73.98765 Broadway &\nW 36 St 40.75020
6 492 498 957 40.74855 -73.98808 Broadway &\nW 32 St 40.75020
7 492 362 1405 40.75173 -73.98754 Broadway &\nW 37 St 40.75020
8 493 477 1582 40.75641 -73.99003 W 41 St &\n8 Ave 40.75680
9 493 529 728 40.75757 -73.99099 W 42 St &\n8 Ave 40.75680
10 529 2021 1748 40.75929 -73.98860 W 45 St &\n8 Ave 40.75757
end.longitude end.station interaction
1 -73.97499 E 47 St &\nPark Av E 47 St &Park Av > PershingSquare N
2 -73.99003 W 41 St &\n8 Ave W 41 St &8 Ave > Broadway &W 41 St
3 -73.98014 W 44 St &\n5 Ave W 44 St &5 Ave > PershingSquare N
4 -73.98014 W 44 St &\n5 Ave W 44 St &5 Ave > E 43 St &VanderbiltAve
5 -73.99093 W 33 St &\n7 Ave W 33 St &7 Ave > Broadway &W 36 St
6 -73.99093 W 33 St &\n7 Ave W 33 St &7 Ave > Broadway &W 32 St
7 -73.99093 W 33 St &\n7 Ave W 33 St &7 Ave > Broadway &W 37 St
8 -73.98291 W 45 St &\n6 Ave W 45 St &6 Ave > W 41 St &8 Ave
9 -73.98291 W 45 St &\n6 Ave W 45 St &6 Ave > W 42 St &8 Ave
10 -73.99099 W 42 St &\n8 Ave W 42 St &8 Ave > W 45 St &8 Ave
I would like to change the plot so that the labels are all centered and the transparency depends on Freq, with lower Freq values more transparent and higher Freq values less transparent.
ggplot(data= new_pairs, aes(x= reorder(interaction, -Freq), y=Freq))+
geom_bar(stat="identity", aes(fill = Freq, alpha = .7)) +
ylab("Bikes received")+ xlab("Station")+
geom_text(aes(x = interaction, label = interaction), vjust="inward",hjust = "inward", size = 4, nudge_y = 1, fontface ="bold")+
theme(axis.text.y=element_blank())+ggtitle("Bikes received via rebalancing")+
coord_flip()+theme(legend.position = "none")
dput(new_pairs)
structure(list(x = structure(c(146L, 253L, 260L, 260L, 268L,
268L, 268L, 269L, 269L, 304L), .Label = c("72", "79", "82", "83",
"116", "119", "120", "127", "128", "137", "143", "144", "146",
"147", "150", "151", "152", "153", "157", "160", "161", "164",
"167", "168", "173", "174", "195", "212", "216", "217", "218",
"223", "224", "225", "228", "229", "232", "233", "236", "237",
"238", "239", "241", "242", "243", "244", "245", "247", "248",
"249", "250", "251", "252", "253", "254", "257", "258", "259",
"260", "261", "262", "263", "264", "265", "266", "267", "268",
"270", "271", "274", "275", "276", "278", "279", "280", "281",
"282", "284", "285", "289", "290", "291", "293", "294", "295",
"296", "297", "298", "300", "301", "302", "303", "304", "305",
"306", "307", "308", "309", "310", "311", "312", "313", "314",
"315", "316", "317", "318", "319", "320", "321", "322", "323",
"324", "325", "326", "327", "328", "329", "330", "331", "332",
"334", "335", "336", "337", "339", "340", "341", "342", "343",
"344", "345", "346", "347", "348", "349", "350", "351", "352",
"353", "354", "355", "356", "357", "358", "359", "360", "361",
"362", "363", "364", "365", "366", "367", "368", "369", "372",
"373", "375", "376", "377", "379", "380", "382", "383", "384",
"385", "386", "387", "388", "389", "390", "391", "392", "393",
"394", "395", "396", "397", "398", "399", "400", "401", "402",
"403", "404", "405", "406", "407", "408", "409", "410", "411",
"412", "414", "415", "416", "417", "418", "419", "420", "421",
"422", "423", "426", "427", "428", "430", "431", "432", "433",
"434", "435", "436", "437", "438", "439", "440", "441", "442",
"443", "444", "445", "446", "447", "448", "449", "450", "453",
"454", "455", "456", "457", "458", "459", "460", "461", "462",
"463", "464", "465", "466", "467", "468", "469", "470", "471",
"472", "473", "474", "475", "476", "477", "478", "479", "480",
"481", "482", "483", "484", "485", "486", "487", "488", "489",
"490", "491", "492", "493", "494", "495", "496", "497", "498",
"499", "500", "501", "502", "503", "504", "505", "507", "508",
"509", "510", "511", "512", "513", "514", "515", "516", "517",
"518", "519", "520", "521", "522", "523", "524", "525", "526",
"527", "528", "529", "530", "531", "532", "533", "534", "536",
"537", "538", "539", "540", "545", "546", "2000", "2002", "2003",
"2004", "2005", "2006", "2008", "2009", "2010", "2012", "2017",
"2021", "2022", "2023", "3002"), class = "factor"), y = structure(c(294L,
241L, 294L, 107L, 66L, 274L, 149L, 253L, 304L, 327L), .Label = c("72",
"79", "82", "83", "116", "119", "120", "127", "128", "137", "143",
"144", "146", "147", "150", "151", "152", "153", "157", "160",
"161", "164", "167", "168", "173", "174", "195", "212", "216",
"217", "218", "223", "224", "225", "228", "229", "232", "233",
"236", "237", "238", "239", "241", "242", "243", "244", "245",
"247", "248", "249", "250", "251", "252", "253", "254", "257",
"258", "259", "260", "261", "262", "263", "264", "265", "266",
"267", "268", "270", "271", "274", "275", "276", "278", "279",
"280", "281", "282", "284", "285", "289", "290", "291", "293",
"294", "295", "296", "297", "298", "300", "301", "302", "303",
"304", "305", "306", "307", "308", "309", "310", "311", "312",
"313", "314", "315", "316", "317", "318", "319", "320", "321",
"322", "323", "324", "325", "326", "327", "328", "329", "330",
"331", "332", "334", "335", "336", "337", "339", "340", "341",
"342", "343", "344", "345", "346", "347", "348", "349", "350",
"351", "352", "353", "354", "355", "356", "357", "358", "359",
"360", "361", "362", "363", "364", "365", "366", "367", "368",
"369", "372", "373", "375", "376", "377", "379", "380", "382",
"383", "384", "385", "386", "387", "388", "389", "390", "391",
"392", "393", "394", "395", "396", "397", "398", "399", "400",
"401", "402", "403", "404", "405", "406", "407", "408", "409",
"410", "411", "412", "414", "415", "416", "417", "418", "419",
"420", "421", "422", "423", "426", "427", "428", "430", "431",
"432", "433", "434", "435", "436", "437", "438", "439", "440",
"441", "442", "443", "444", "445", "446", "447", "448", "449",
"450", "453", "454", "455", "456", "457", "458", "459", "460",
"461", "462", "463", "464", "465", "466", "467", "468", "469",
"470", "471", "472", "473", "474", "475", "476", "477", "478",
"479", "480", "481", "482", "483", "484", "485", "486", "487",
"488", "489", "490", "491", "492", "493", "494", "495", "496",
"497", "498", "499", "500", "501", "502", "503", "504", "505",
"507", "508", "509", "510", "511", "512", "513", "514", "515",
"516", "517", "518", "519", "520", "521", "522", "523", "524",
"525", "526", "527", "528", "529", "530", "531", "532", "533",
"534", "536", "537", "538", "539", "540", "545", "546", "2000",
"2002", "2003", "2004", "2006", "2008", "2009", "2010", "2012",
"2017", "2021", "2022", "2023", "3002"), class = "factor"), Freq = c(929L,
5032L, 1246L, 2654L, 1828L, 957L, 1405L, 1582L, 728L, 1748L),
start.latittude = c(40.75188406, 40.75513557, 40.75188406,
40.75320159, 40.75097711, 40.74854862, 40.75172632, 40.75640548,
40.7575699, 40.75929124), start.longitude = c(-73.97770164,
-73.98658032, -73.97770164, -73.9779874, -73.98765428, -73.98808416,
-73.98753523, -73.9900262, -73.99098507, -73.98859651), start.station = c("Pershing\nSquare N",
"Broadway &\nW 41 St", "Pershing\nSquare N", "E 43 St &\nVanderbilt\nAve",
"Broadway &\nW 36 St", "Broadway &\nW 32 St", "Broadway &\nW 37 St",
"W 41 St &\n8 Ave", "W 42 St &\n8 Ave", "W 45 St &\n8 Ave"
), end.latitude = c(40.75510267, 40.75640548, 40.75500254,
40.75500254, 40.75019995, 40.75019995, 40.75019995, 40.7568001,
40.7568001, 40.7575699), end.longitude = c(-73.97498696,
-73.9900262, -73.98014437, -73.98014437, -73.99093085, -73.99093085,
-73.99093085, -73.98291153, -73.98291153, -73.99098507),
end.station = c("E 47 St &\nPark Av", "W 41 St &\n8 Ave",
"W 44 St &\n5 Ave", "W 44 St &\n5 Ave", "W 33 St &\n7 Ave",
"W 33 St &\n7 Ave", "W 33 St &\n7 Ave", "W 45 St &\n6 Ave",
"W 45 St &\n6 Ave", "W 42 St &\n8 Ave"), interaction = c("E 47 St &Park Av > PershingSquare N",
"W 41 St &8 Ave > Broadway &W 41 St", "W 44 St &5 Ave > PershingSquare N",
"W 44 St &5 Ave > E 43 St &VanderbiltAve", "W 33 St &7 Ave > Broadway &W 36 St",
"W 33 St &7 Ave > Broadway &W 32 St", "W 33 St &7 Ave > Broadway &W 37 St",
"W 45 St &6 Ave > W 41 St &8 Ave", "W 45 St &6 Ave > W 42 St &8 Ave",
"W 42 St &8 Ave > W 45 St &8 Ave")), .Names = c("x", "y",
"Freq", "start.latittude", "start.longitude", "start.station",
"end.latitude", "end.longitude", "end.station", "interaction"
), row.names = c(NA, -10L), class = "data.frame")
Here's an option:
ggplot(data= new_pairs, aes(x= reorder(interaction, -Freq), y=Freq))+
geom_bar(stat="identity", aes(fill = Freq, alpha = Freq)) +
ylab("Bikes received")+ xlab("Station")+
ylim(0, max(new_pairs$Freq)+50) +
geom_text(aes(label = interaction,y=(max(new_pairs$Freq)+50)/2,alpha = Freq), vjust="center",hjust = "center", size = 4, nudge_y = 1, fontface ="bold")+
theme(axis.text.y=element_blank())+ggtitle("Bikes received via rebalancing")+
coord_flip()+theme(legend.position = "none")
You can set a y value in your geom_text aes to put the labels where you want them to be (you use coord_flip so changing y controls the horizontal placement of the text).
I set ylim manually to max(new_pairs$Freq)+50 so that the axis range is known and half of that value can be used as the y position that centers the text labels.
If you want to center the text to each bar, here's a solution (based on the thread I linked above):
library(plyr)
new_pairs <- ddply(new_pairs, .(interaction), transform, pos = cumsum(Freq) - (0.5 * Freq))
ggplot(data= new_pairs, aes(x= reorder(interaction, -Freq), y=Freq))+
geom_bar(stat="identity", aes(fill = Freq, alpha = Freq)) +
ylab("Bikes received")+ xlab("Station")+
geom_text(aes(label = interaction, y = pos, alpha = Freq), vjust="center",hjust = "center", size = 4, nudge_y = 1, fontface ="bold")+
theme(axis.text.y=element_blank())+ggtitle("Bikes received via rebalancing")+
coord_flip()+theme(legend.position = "none")
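Another way to put a label in the middle of each bar, without computing pos by hand, is position_stack(vjust = 0.5). This is a sketch on a made-up three-row data frame (toy and its route names are hypothetical), assuming ggplot2 2.2.0 or later; it is not a drop-in replacement for the plot above.

```r
library(ggplot2)

# Hypothetical toy data standing in for new_pairs
toy <- data.frame(
  route = c("A > B", "C > D", "E > F"),
  Freq  = c(5032, 2654, 929)
)

p <- ggplot(toy, aes(x = reorder(route, Freq), y = Freq)) +
  # alpha is mapped inside aes(), so bars with lower Freq are more transparent
  geom_col(aes(fill = Freq, alpha = Freq)) +
  # position_stack(vjust = 0.5) places each label at the midpoint of its bar
  geom_text(aes(label = route), position = position_stack(vjust = 0.5),
            fontface = "bold") +
  coord_flip() +
  theme(axis.text.y = element_blank(), legend.position = "none")

p
```

Mapping alpha to a constant inside aes(), as in alpha = .7, gives every bar the same transparency; mapping it to Freq is what makes the transparency vary.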