Retrieve Census tract from Coordinates [closed] - r
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 4 years ago.
The community reviewed whether to reopen this question 1 year ago and left it closed:
Original close reason(s) were not resolved
I have a dataset with longitude and latitude coordinates, and I want to retrieve the corresponding census tract for each point. Is there a dataset or API that would allow me to do this?
My dataset looks like this:
lat lon
1 40.61847 -74.02123
2 40.71348 -73.96551
3 40.69948 -73.96104
4 40.70377 -73.93116
5 40.67859 -73.99049
6 40.71234 -73.92416
I want to add a column with the corresponding census tract.
Final output should look something like this (these are not the right numbers, just an example).
lat lon Census_Tract_Label
1 40.61847 -74.02123 5.01
2 40.71348 -73.96551 20
3 40.69948 -73.96104 41
4 40.70377 -73.93116 52.02
5 40.67859 -73.99049 58
6 40.71234 -73.92416 60
The tigris package includes a function called call_geolocator_latlon that should do what you're looking for. Here is some code using your example coordinates:
> library(tigris)
> coord <- data.frame(lat = c(40.61847, 40.71348, 40.69948, 40.70377, 40.67859, 40.71234),
+                     long = c(-74.02123, -73.96551, -73.96104, -73.93116, -73.99049, -73.92416))
>
> coord$census_code <- apply(coord, 1, function(row) call_geolocator_latlon(row['lat'], row['long']))
> coord
lat long census_code
1 40.61847 -74.02123 360470152003001
2 40.71348 -73.96551 360470551001009
3 40.69948 -73.96104 360470537002011
4 40.70377 -73.93116 360470425003000
5 40.67859 -73.99049 360470077001000
6 40.71234 -73.92416 360470449004075
As I understand it, the 15-digit code is several codes concatenated: the first two digits are the state FIPS code, the next three the county, the following six the census tract, and the last four the block. To get just the census tract code I'd use the substr function to pull out those six digits (positions 6 through 11):
> coord$census_tract <- substr(coord$census_code, 6, 11)
> coord
lat long census_code census_tract
1 40.61847 -74.02123 360470152003001 015200
2 40.71348 -73.96551 360470551001009 055100
3 40.69948 -73.96104 360470537002011 053700
4 40.70377 -73.93116 360470425003000 042500
5 40.67859 -73.99049 360470077001000 007700
6 40.71234 -73.92416 360470449004075 044900
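If you want all of the components, a similar substr() sketch (assuming the standard 2 + 3 + 6 + 4 state/county/tract/block layout of the 15-digit code) could be:

```r
# Split a 15-digit census block code into its FIPS components
# (assumes 2 state + 3 county + 6 tract + 4 block digits)
code <- "360470152003001"

state  <- substr(code, 1, 2)    # "36"  (New York)
county <- substr(code, 3, 5)    # "047" (Kings County)
tract  <- substr(code, 6, 11)   # "015200"
block  <- substr(code, 12, 15)  # "3001"
```

The same calls work vectorized over the whole census_code column.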
I hope that helps!
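If you have many points, another option worth mentioning is to skip the per-point geocoder calls entirely: download the tract polygons once with tigris::tracts() and do a local spatial join with the sf package. A rough sketch, assuming all points fall in New York (state FIPS 36); the exact arguments may vary by tigris/sf version:

```r
library(tigris)
library(sf)

coord <- data.frame(lat = c(40.61847, 40.71348),
                    lon = c(-74.02123, -73.96551))

# One download of all NY tract polygons, then a point-in-polygon join
tracts_ny <- tracts(state = "36", class = "sf")
pts <- st_as_sf(coord, coords = c("lon", "lat"), crs = st_crs(tracts_ny))
joined <- st_join(pts, tracts_ny["TRACTCE"])
coord$census_tract <- joined$TRACTCE
```

This is usually much faster than one web request per row.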
Related
Neat way to plot this correlation using ggplot2? [closed]
I have this dataset:

airline avail_seat_km_per_week Number Year
1: Aer Lingus 320906734 2 1985-99
2: Aeroflot* 1197672318 76 1985-99
3: Aerolineas Argentinas 385803648 6 1985-99
4: Aeromexico* 596871813 3 1985-99
5: Air Canada 1865253802 2 1985-99
---
108: United / Continental* 7139291291 14 2000-14
109: US Airways / America West* 2455687887 11 2000-14
110: Vietnam Airlines 625084918 1 2000-14
111: Virgin Atlantic 1005248585 0 2000-14
112: Xiamen Airlines 430462962 2 2000-14

These are some instances of the dataset:

data.frame(airline = c("Aer Lingus", "Aeroflot*", "Aerolineas Argentinas", "Aeromexico*", "Air Canada",
                       "Aer Lingus", "Aeroflot*", "Aerolineas Argentinas", "Aeromexico*", "Air Canada"),
           Number = c(2, 76, 6, 3, 2, 0, 6, 1, 5, 2),
           Year = c("1985-99", "1985-99", "1985-99", "1985-99", "1985-99",
                    "2000-14", "2000-14", "2000-14", "2000-14", "2000-14"))

It records the number of crashes of airlines around the world in two periods, 1985-99 and 2000-14. I want a scatterplot that displays the number of crashes in period 1985-99 against period 2000-14. What is a neat way to do this using the dplyr and ggplot2 packages, preferably with pipes? Please let me know if there is anything I can do to further specify the problem. Appreciate your help!
When asking for help with plots in general, and ggplot in particular, it's helpful if you're very clear about what data goes with each dimension - x, y, color, etc.

library(tidyr)
library(ggplot2)

# (calling your data d)
d %>%
  # widen the data so each plot dimension gets a column
  pivot_wider(names_from = Year, values_from = Number) %>%
  # use backticks for non-standard column names (because of the dash in this case)
  ggplot(aes(x = `1985-99`, y = `2000-14`, color = airline)) +
  geom_point()
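To make the reshaping step concrete, here is a minimal sketch (with a tiny made-up subset of the data) of what pivot_wider produces:

```r
library(tidyr)

d <- data.frame(airline = c("Aer Lingus", "Aeroflot*", "Aer Lingus", "Aeroflot*"),
                Number  = c(2, 76, 0, 6),
                Year    = c("1985-99", "1985-99", "2000-14", "2000-14"))

# One row per airline, one column per period
wide <- pivot_wider(d, names_from = Year, values_from = Number)
wide
# roughly:
#   airline    `1985-99` `2000-14`
#   Aer Lingus         2         0
#   Aeroflot*         76         6
```

Each point in the final scatterplot is one row of this widened table.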
how to increase size of character so as not to truncate the imported field [closed]
Input data is of the format A = integer, B = text [word count max 500]. Importing this data set into R truncates the second column to fit chr. Is there a different class that will ensure no truncation, or a method to increase the size of chr to accommodate the entire text? (Conceptually equivalent to TEXT vs VARCHAR in SQL.)

xdoc <- read.csv("./data/abtest2.csv", header = TRUE, sep = ",", as.is = TRUE)
head(xdoc)
             A
1 601004351600
                           B
1 adsfj al;ds fj;sd jf;klsdj f dsfdfsdf sdf sdf sdf as a dag dfgh tyutr erigkdj fajklsdf j;sdkl ;klajdfsiljuaeiodgjdfl;gdASo ri[3iocvjilgjdfi gjksjfl jgeoutoihjkvhlkasj;aljdsgkjdfghkdm,gfn;lkja;ja;drfjgkihyuirhl jkjfdkl hjgasdhgdfjkgksdjkj r...
I think it's something about the way in which you're viewing the files.

longwords <- replicate(10, paste(sample(letters, 600, replace = TRUE), collapse = ""))
nchar(longwords)
## [1] 600 600 600 600 ...
dd <- data.frame(n = 1:10, w = longwords)
write.csv(dd, file = "tmp.csv", row.names = FALSE)

Now read the data file back in -- it's the same as when it was written out:

xdoc <- read.csv("tmp.csv", as.is = TRUE)
nchar(xdoc$w)
## [1] 600 600 600 600 600 ...

I don't know what kind of limits there are on string length in R other than memory size, but they're long. Perhaps this note from ?as.character is relevant:

> 'as.character' breaks lines in language objects at 500 characters, and inserts newlines. Prior to 2.15.0 lines were truncated.

So something else, either in your viewing procedure or in the way you've processed the data, is messing you up. (head(xdoc) prints each full 600-character random string; that output is omitted here.)
What is membership in community detection? [closed]
I am finding it hard to understand what membership and modularity return and why exactly they are used.

wc <- walktrap.community(karate)
modularity(wc)
membership(wc)
plot(wc, karate)

For the above code, executing membership gives:

[1] 1 1 2 1 5 5 5 1 2 2 5 1 1 2 3 3 5 1 3 1 3 1 3 4 4 4 3 4 2 3 2 2 3

and executing modularity gives:

[1] 0.3532216

I read the documentation but it is still a bit confusing.
The result of walktrap.community is a partition of your graph into communities, which are numbered with ids from 1 to 5 in your case. The membership function gives a vector with a community id for every node in your graph. So in your case node 1 belongs to community 1, and node 3 belongs to community 2. The partition of the graph into communities is based on optimizing a so-called modularity function. When you call modularity you get the final value of that function after the optimization process is complete. A high value of modularity indicates a good partition of the graph into clear communities, while a low value indicates the opposite.
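A minimal, self-contained sketch using igraph's built-in copy of the same karate network (make_graph("Zachary")) and the newer name cluster_walktrap() for the same algorithm:

```r
library(igraph)

# Zachary's karate club network ships with igraph
g <- make_graph("Zachary")

wc <- cluster_walktrap(g)  # same algorithm as walktrap.community()
membership(wc)[1:5]        # community id assigned to the first five nodes
sizes(wc)                  # number of nodes in each community
modularity(wc)             # quality of the partition (higher is better)
```

Every node gets exactly one community id, so length(membership(wc)) always equals the number of nodes in the graph.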
finding the most frequent item using bigmemory techniques and parallel computing? [closed]
How can I find which months have the most frequent delays without using regression? The following CSV is a sample of a 100MB file. I know I should use bigmemory techniques but wasn't sure how to approach this. Here months are stored as integers, not factors.

Year,Month,DayofMonth,DayOfWeek,DepTime,CRSDepTime,ArrTime,CRSArrTime,UniqueCarrier,FlightNum,TailNum,ActualElapsedTime,CRSElapsedTime,AirTime,ArrDelay,DepDelay,Origin,Dest,Distance,TaxiIn,TaxiOut,Cancelled,CancellationCode,Diverted,CarrierDelay,WeatherDelay,NASDelay,SecurityDelay,LateAircraftDelay
2006,1,11,3,743,745,1024,1018,US,343,N657AW,281,273,223,6,-2,ATL,PHX,1587,45,13,0,,0,0,0,0,0,0
2006,1,11,3,1053,1053,1313,1318,US,613,N834AW,260,265,214,-5,0,ATL,PHX,1587,27,19,0,,0,0,0,0,0,0
2006,1,11,3,1915,1915,2110,2133,US,617,N605AW,235,258,220,-23,0,ATL,PHX,1587,4,11,0,,0,0,0,0,0,0
2006,1,11,3,1753,1755,1925,1933,US,300,N312AW,152,158,126,-8,-2,AUS,PHX,872,16,10,0,,0,0,0,0,0,0
2006,1,11,3,824,832,1015,1015,US,765,N309AW,171,163,132,0,-8,AUS,PHX,872,27,12,0,,0,0,0,0,0,0
2006,1,11,3,627,630,834,832,US,295,N733UW,127,122,108,2,-3,BDL,CLT,644,6,13,0,,0,0,0,0,0,0
2006,1,11,3,825,820,1041,1021,US,349,N177UW,136,121,111,20,5,BDL,CLT,644,4,21,0,,0,0,0,20,0,0
2006,1,11,3,942,945,1155,1148,US,356,N404US,133,123,121,7,-3,BDL,CLT,644,4,8,0,,0,0,0,0,0,0
2006,1,11,3,1239,1245,1438,1445,US,775,N722UW,119,120,103,-7,-6,BDL,CLT,644,4,12,0,,0,0,0,0,0,0
2006,1,11,3,1642,1645,1841,1845,US,1002,N104UW,119,120,105,-4,-3,BDL,CLT,644,4,10,0,,0,0,0,0,0,0
2006,1,11,3,1836,1835,NA,2035,US,1103,N425US,NA,120,NA,NA,1,BDL,CLT,644,0,17,0,,1,0,0,0,0,0
2006,1,11,3,NA,1725,NA,1845,US,69,0,NA,80,NA,NA,NA,BDL,DCA,313,0,0,1,A,0,0,0,0,0,0
Let's say your data.frame is called dd. If you want to see the total number of weather delays for each month across all years, you can do:

delay <- aggregate(WeatherDelay ~ Month, dd, sum)
delay[order(-delay$WeatherDelay), ]
Is this closer to what you want? I don't know R well enough to sum the rows, but this at least aggregates them. I am learning, too!

delays <- read.csv("tmp.csv", stringsAsFactors = FALSE)
delay <- aggregate(cbind(ArrDelay, DepDelay, WeatherDelay, NASDelay, SecurityDelay, LateAircraftDelay) ~ Month,
                   delays, sum)
delay

It outputs:

  Month ArrDelay DepDelay WeatherDelay NASDelay SecurityDelay LateAircraftDelay
1     1       10      -16            0        0             0                 0
2     2      -31       -2            0        0             0                 0
3     3        9       -4            0       20             0                 0

Note: I changed your document a bit to provide some diversity in the Months column:

Year,Month,DayofMonth,DayOfWeek,DepTime,CRSDepTime,ArrTime,CRSArrTime,UniqueCarrier,FlightNum,TailNum,ActualElapsedTime,CRSElapsedTime,AirTime,ArrDelay,DepDelay,Origin,Dest,Distance,TaxiIn,TaxiOut,Cancelled,CancellationCode,Diverted,CarrierDelay,WeatherDelay,NASDelay,SecurityDelay,LateAircraftDelay
2006,1,11,3,743,745,1024,1018,US,343,N657AW,281,273,223,6,-2,ATL,PHX,1587,45,13,0,,0,0,0,0,0,0
2006,1,11,3,1053,1053,1313,1318,US,613,N834AW,260,265,214,-5,0,ATL,PHX,1587,27,19,0,,0,0,0,0,0,0
2006,2,11,3,1915,1915,2110,2133,US,617,N605AW,235,258,220,-23,0,ATL,PHX,1587,4,11,0,,0,0,0,0,0,0
2006,2,11,3,1753,1755,1925,1933,US,300,N312AW,152,158,126,-8,-2,AUS,PHX,872,16,10,0,,0,0,0,0,0,0
2006,1,11,3,824,832,1015,1015,US,765,N309AW,171,163,132,0,-8,AUS,PHX,872,27,12,0,,0,0,0,0,0,0
2006,1,11,3,627,630,834,832,US,295,N733UW,127,122,108,2,-3,BDL,CLT,644,6,13,0,,0,0,0,0,0,0
2006,3,11,3,825,820,1041,1021,US,349,N177UW,136,121,111,20,5,BDL,CLT,644,4,21,0,,0,0,0,20,0,0
2006,1,11,3,942,945,1155,1148,US,356,N404US,133,123,121,7,-3,BDL,CLT,644,4,8,0,,0,0,0,0,0,0
2006,3,11,3,1239,1245,1438,1445,US,775,N722UW,119,120,103,-7,-6,BDL,CLT,644,4,12,0,,0,0,0,0,0,0
2006,3,11,3,1642,1645,1841,1845,US,1002,N104UW,119,120,105,-4,-3,BDL,CLT,644,4,10,0,,0,0,0,0,0,0
2006,3,11,3,1836,1835,NA,2035,US,1103,N425US,NA,120,NA,NA,1,BDL,CLT,644,0,17,0,,1,0,0,0,0,0
2006,1,11,3,NA,1725,NA,1845,US,69,0,NA,80,NA,NA,NA,BDL,DCA,313,0,0,1,A,0,0,0,0,0,0
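Since the real file is around 100MB, a common alternative to bigmemory at that scale is data.table, whose fread() reads large CSVs quickly. A hedged sketch of directly answering "which month has the most frequent delays" (shown with a tiny inline table so it is self-contained; in practice you would build flights with fread on your CSV):

```r
library(data.table)

# Tiny stand-in for the real data; in practice: flights <- fread("yourfile.csv")
flights <- data.table(Month    = c(1, 1, 2, 3, 3, 3),
                      ArrDelay = c(6, -5, 20, 7, -4, 15))

# Count flights with a positive arrival delay per month, worst month first
delay_counts <- flights[ArrDelay > 0, .(delayed = .N), by = Month][order(-delayed)]
delay_counts[1]  # -> Month 3, with 2 delayed flights
```

Treating "a delay" as ArrDelay > 0 is an assumption here; swap in DepDelay or a larger threshold as needed.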
How can I parse the text into a data frame? [closed]
Here is the web page from which I want to get the data:

url <- "http://www.treasury.gov/resource-center/data-chart-center/tic/Documents/mfh.txt"
download.file(url, destfile = "/tmp/data")

I can download it from the web in txt format; how can I get the data as a data frame?
I think it is a very interesting question that unfortunately will probably be closed, since the OP doesn't show any effort to resolve it. The question is about extracting a numeric table from a text file. You should first detect the start and end of your table within the text using grep, use read.fwf to read the fixed-width data, then collapse the double header into a simple header using some regular expressions and toString.

Here is my code:

ll <- readLines('allo1.txt')
i1 <- grep('Country', ll)
i2 <- grep('Grand Total', ll)
dat <- read.fwf(textConnection(ll[seq(i1 + 3, i2, 1)]),
                widths = c(20, -1, rep(c(7, -1), 13)))
dat.h <- read.fwf(textConnection(ll[c(i1 - 1, i1)]),
                  widths = c(20, -1, rep(c(7, -1), 13)))
nn <- unlist(lapply(dat.h, function(x) gsub('\\s|[*]', '', toString(rev(unlist(x))))))
names(dat) <- nn

The result looks like:

Country, 2013,Apr 2013,Mar 2013,Feb 2013,Jan 2012,Dec 2012,Nov 2012,Oct 2012,Sep 2012,Aug 2012,Jul 2012,Jun 2012,May 2012,Apr
1 China, Mainland 1264.9 1270.3 1251.9 1214.2 1220.4 1183.1 1169.9 1153.6 1155.2 1160.0 1147.0 1164.0 1164.4
2 Japan 1100.3 1114.3 1105.5 1103.9 1111.2 1117.7 1131.9 1128.5 1120.9 1119.8 1108.4 1107.2 1087.9
3 Carib Bnkng Ctrs 4/ 273.1 283.9 280.3 271.8 266.2 263.5 273.5 261.1 263.9 247.6 244.6 243.2 237.3
4 Oil Exporters 3/ 272.7 265.1 256.8 261.6 262.0 259.1 262.2 267.2 269.1 268.4 270.2 260.6 262.2
5 Brazil 252.6 257.9 256.5 254.1 253.3 255.9 254.1 251.2 259.8 256.5 244.3 245.8 245.9