Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
Here is the web page from which I want to get the data (the data to be parsed):
url="http://www.treasury.gov/resource-center/data-chart-center/tic/Documents/mfh.txt"
download.file(url, destfile="/tmp/data")
I can download it from the web in txt format. How can I get the data as a data frame?
I think this is a very interesting question that unfortunately will probably be closed, since the OP doesn't show any effort to resolve it. The question is about extracting a numeric table from a text file.
You should:
1. First detect the start and end of your table within the text using grep.
2. Use read.fwf to read the fixed-width data.
3. Collapse the double header into a single header using a regular expression and toString.
Here is my code:
ll <- readLines('allo1.txt')   # local copy of the downloaded mfh.txt
i1 <- grep('Country', ll)      # line containing the column header
i2 <- grep('Grand Total', ll)  # last line of the table
## table body: skip the header rows; one 20-char country column,
## then 13 numeric columns of width 7, each separated by 1 skipped char
dat <- read.fwf(textConnection(ll[seq(i1 + 3, i2)]),
                widths = c(20, -1, rep(c(7, -1), 13)))
## re-read the two header lines with the same widths
dat.h <- read.fwf(textConnection(ll[c(i1 - 1, i1)]),
                  widths = c(20, -1, rep(c(7, -1), 13)))
## collapse the two header rows into one name per column
nn <- unlist(lapply(dat.h, function(x)
  gsub('\\s|[*]', '', toString(rev(unlist(x))))))
names(dat) <- nn
Country, 2013,Apr 2013,Mar 2013,Feb 2013,Jan 2012,Dec 2012,Nov 2012,Oct 2012,Sep 2012,Aug 2012,Jul 2012,Jun 2012,May 2012,Apr
1 China, Mainland 1264.9 1270.3 1251.9 1214.2 1220.4 1183.1 1169.9 1153.6 1155.2 1160.0 1147.0 1164.0 1164.4
2 Japan 1100.3 1114.3 1105.5 1103.9 1111.2 1117.7 1131.9 1128.5 1120.9 1119.8 1108.4 1107.2 1087.9
3 Carib Bnkng Ctrs 4/ 273.1 283.9 280.3 271.8 266.2 263.5 273.5 261.1 263.9 247.6 244.6 243.2 237.3
4 Oil Exporters 3/ 272.7 265.1 256.8 261.6 262.0 259.1 262.2 267.2 269.1 268.4 270.2 260.6 262.2
5 Brazil 252.6 257.9 256.5 254.1 253.3 255.9 254.1 251.2 259.8 256.5 244.3 245.8 245.9
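The read.fwf calls above rely on a useful trick: negative widths skip columns. A minimal self-contained illustration (the country names and values are made up to mimic the table):

```r
# two fixed-width fields separated by one character; the -1 width skips it
txt <- c("Japan   1100.3",
         "Brazil   252.6")
dat <- read.fwf(textConnection(txt),
                widths = c(7, -1, 6),  # read 7 chars, skip 1, read 6
                strip.white = TRUE)
names(dat) <- c("Country", "Holdings")
dat$Holdings  # numeric: 1100.3 252.6
```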
I have a dataset with longitude and latitude coordinates. I want to retrieve the corresponding census tract. Is there a dataset or api that would allow me to do this?
My dataset looks like this:
lat lon
1 40.61847 -74.02123
2 40.71348 -73.96551
3 40.69948 -73.96104
4 40.70377 -73.93116
5 40.67859 -73.99049
6 40.71234 -73.92416
I want to add a column with the corresponding census tract.
Final output should look something like this (these are not the right numbers, just an example).
lat lon Census_Tract_Label
1 40.61847 -74.02123 5.01
2 40.71348 -73.96551 20
3 40.69948 -73.96104 41
4 40.70377 -73.93116 52.02
5 40.67859 -73.99049 58
6 40.71234 -73.92416 60
The tigris package includes a function called call_geolocator_latlon that should do what you're looking for. Here is some code using it:
> library(tigris)
> coord <- data.frame(lat = c(40.61847, 40.71348, 40.69948, 40.70377, 40.67859, 40.71234),
+ long = c(-74.02123, -73.96551, -73.96104, -73.93116, -73.99049, -73.92416))
>
> coord$census_code <- apply(coord, 1, function(row) call_geolocator_latlon(row['lat'], row['long']))
> coord
lat long census_code
1 40.61847 -74.02123 360470152003001
2 40.71348 -73.96551 360470551001009
3 40.69948 -73.96104 360470537002011
4 40.70377 -73.93116 360470425003000
5 40.67859 -73.99049 360470077001000
6 40.71234 -73.92416 360470449004075
As I understand it, the 15-digit code is several codes put together: the first two digits are the state, the next three the county, and the following six the tract. To get just the census tract code, use the substr function to pull out digits 6 through 11.
> coord$census_tract <- substr(coord$census_code, 6, 11)
> coord
lat long census_code census_tract
1 40.61847 -74.02123 360470152003001 015200
2 40.71348 -73.96551 360470551001009 055100
3 40.69948 -73.96104 360470537002011 053700
4 40.70377 -73.93116 360470425003000 042500
5 40.67859 -73.99049 360470077001000 007700
6 40.71234 -73.92416 360470449004075 044900
I hope that helps!
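Since the 15-digit block code is purely positional, the other components can be pulled out the same way. A small sketch (the FIPS interpretations in the comments apply to the first code in the example above and are for illustration):

```r
geoid <- "360470152003001"
state  <- substr(geoid, 1, 2)    # "36"  (New York)
county <- substr(geoid, 3, 5)    # "047" (Kings County)
tract  <- substr(geoid, 6, 11)   # "015200"
block  <- substr(geoid, 12, 15)  # "3001" (first digit is the block group)
```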
Input data is of the format A = integer, B = text (word count max 500).
Importing this dataset into R truncates the second column to fit chr. Is there a different class that will ensure no truncation, or a method to increase the size of chr to accommodate the entire text? (Conceptually equivalent to TEXT vs VARCHAR in SQL.)
xdoc <- read.csv("./data/abtest2.csv", header = TRUE, sep = ",", as.is = TRUE)
head(xdoc)
A 1 601004351600
B 1
adsfj al;ds fj;sd jf;klsdj f dsfdfsdf sdf sdf sdf as a dag dfgh tyutr
erigkdj fajklsdf j;sdkl ;klajdfsiljuaeiodgjdfl;gdASo ri[3iocvjilgjdfi
gjksjfl jgeoutoihjkvhlkasj;aljdsgkjdfghkdm,gfn;lkja;ja;drfjgkihyuirhl
jkjfdkl hjgasdhgdfjkgksdjkj r...
I think it's something about the way in which you're viewing the files.
longwords <- replicate(10,paste(
sample(letters,600,replace=TRUE),collapse=""))
nchar(longwords) ## 600 600 600 600 ...
dd <- data.frame(n=1:10,w=longwords)
write.csv(dd,file="tmp.csv",row.names=FALSE)
Now read the data file back in -- it's the same as when it was written out
xdoc <- read.csv("tmp.csv",as.is=TRUE)
nchar(xdoc$w)
## [1] 600 600 600 600 600 ...
I don't know what kind of limits there are on string length in R other than memory size, but they're long. Perhaps this note from ?as.character is relevant ... ?
> ‘as.character’ breaks lines in language objects at 500 characters,
and inserts newlines. Prior to 2.15.0 lines were truncated.
So something else, either in your viewing procedure or in the way you've processed the data, is messing you up.
head(xdoc)
n
1 1
2 2
3 3
4 4
5 5
6 6
w
1 llscwhauaiqfqcftzfqujwqefathrchnneqwkcoktrpnebpylyjkoiqyscegbmdwmiegivulxnqxjlrcjiwrsfbltdrcymcmpeolxpexxcjhrggqjuphahysgocgjtsafueqzrnvcsofeuxfworytsnfrclsxozrmoitlpfunvmoomgijudjrjngynbrpfotbxzktjbctyafofvyjeegwuiavxrzhropgdtkbwsszwetxcgrrsymcjwstrmrqkaqlwuccikpbtjjwssvxvrrldzfjdqtythlhhzslxvhxrojskaxxuhcnmqppbymxvmqzbyhtzqfgljelvcmsmwsdbytqkvhkgyhreomxohpjtcbiffeuqgwrolwqgmmxevifadnqkxgbentgxazfspzztpuulvpqrbioelzhimyxzhrmdltlmynfpkaqldvwhaicmykjmlxmffrqlukqiwdmhrwygkricdozrggopnsknwduqxrmzovnrzcumddwtqzipfwmdijqgnclenqemecguxqfvbfyxcwpswmzrcvnuqohruphgkzljxgovddliiwdsrfobimtcboljtkxcmzfqwi
2 xuevtjfterzujzmauuvbwkszsbvcmyllddxnebwxgbwnqzlxhsppyxfnynjqkbzzuypxqaselnvwciusswranngvzmxgoxpjuawyaxxgtuisnifdcuqukluqlpwaqznbvlgltryvliwpqwmzrssadzocbiputgsyvfatwdhrbpjnhawdfqcssfkpqimyebfihcmkphsaybnyukzdjlggbkmjkogszslcossstvcehuyunrqapaggmvosouccuzpwjcyyqyizkyzqbcbsnsuewjkeicclfbxhlmishlxggnpluoovhlhcvxqqebzihrhtwjsbvrstddpqqpevjxvmprgthqkdiqgzbzvxjthnjuxvmbpijyvnxuwgemztexcpvouuasdikegxfiqdscjsgpjuvkxeweelfrvfuhllswebmxktpofxusqaqzdrbrybytufvuavknulcnikckayqhoxxsbjhwxcidtpxiwjwqpecmseutimbkfyjfbslhbvdrquefmeqggtbfogjoozbrcfsucxokbdvinnuoolriszkrgbeplswmrujgejsolidvyrdutqnejgrlkeoqqpguks
3 ohhbcsacskcpfjptbbvddwuzwbguedjqyowktvrinuzifawboyqgomhqrxahkbbuoyvsfbwwqstreomtzmdlszdndeurvehobdkzzqffxqgpgkcnqbwrrdcewlfbouveqpbwruoqnmbbodjbhetantlffwzpiefnwreimkoxjwswhdpncqgyvaulwehcuyyngidtdpscxysjqcydwbrqvhpjejudsondgltrrmmydrlnbqjaamdfnivundbupuaialqhuvivfiwtzmdahrtsgvaooardpdiwcinxzvrjrfufmjpsmtugrzqfibdyzgznahftzhlraqubtgnbbrrlursixsgzggbxqrjaqpzgmekqrtyawavhbmlcfcluhvwxfwcvjmxmlwkkzsleayftbxiufysupsygpoklqckxcwfpscleyidikrqvudpjzsqebwodmjkndzagemlofmznaoamedremdtrtbvrqmncxcjoydarnqfukqrapgcewncmhrdmpehiosurelobpqxhfiqksimmvcllcsdnefsvkpcwpokzgnpyluvescbztdlsnyduaxnjlrqgtpgkhclexnbd
4 njpjvhthxdkwrhjvzgnjmceketvjoxeaorxyasibcdhgallwbtvdixviamkrjgrgrwmnkxnihclcuxwoyitwnstlfpqqdwaqtilbmihzshpreexixbrqqhzblmkiptpieqhptczxocchzhbdweualevdoqdzbjdcxlosbgvexcbgwopmrvlqoquknwgcoulqdpmvnlsaxchtqxzzdqnnxukbrfvlfyhssidxsmyqkwmghzdkleccscagvkdioydhjyihgesczherzyoiolgmgyefriokqrxvhbpbzszugnogafoonprykardrjhuqrtdacydaefhrhrgvelehknavjuspgvulgaixgfjrgnmzsagbrxekwwegidduogyxohrfsvcahohggbhabwzkgxpqqrabwnkdeprfkrzlqvqwlqocfohhokxgjjvixvszkdhvszunsdqzzcgezdgvluholijbuitornmpjvggkqsqxhlnxsbujtjpriksthpmfqvhcnhvrnxxpjfrrulzjnfbmlemtvlemhtwfzdypabgcljgegdiehklzfgocsfbfmammpceocxddwpqlrmcvjbldkx
5 hawfcjfxgucbgcjggkfplsgcsncipmjnrwatlhwkrjokunomffyvmrvdkenbwahirvimlauvtefealzgkxihtfitevmffqtizbkvdidmgyshuvvwugpddwxxijtexrlnelbhftpczkxlwecmzxwpzfmaosixyzejbgandcuuiknattwgnopcrpfdhgdxdgnvumacvhnwgvlwmplnjroenogsjlrqroivbvibicxprylsoamxmhcumsbdqhvhwsmizemfnvxvlpbrhdqjyotgteomiymxqsyvcimxyxdyiplmohjnoxamibvselbbujdfnvwmycggsvqmhdrcwddpmqlgtuujqaadtinfuwiyghofqkxbgqdqqvqknhfehxhnamlwvingtaqdwmtgvsxplthzhlzolsjlwuvnxrzioxjvxlwcyssfrxljmikbqjfhevynsetwysnevxsczqbekfrpbbomvpphewrhprpabefhssuooubmxjhksqkljgglkewjkxafrorjuwlwjxyvioywztmaaruyekwuwlajfybievzchqviuueoaxosoeglxgbvlrehhnrmgmljruvygkvp
6 wirtvzltqsseidfrlezfrmaakmroyeztniyoiwwumqhuzqehlymaumrxqupxsfxmgmvoesvcgnavlamsqxbnzhesqsdsjajpowlevkwpifqlyinnifvsmyymrpfbmobrealrommitauwzxzkoohoppqwhfgfyqkdienrejptrvmaaoxwvdkmxeddfzynbiayrpfvrayjuvvcekbnfjtqyohyvkivoggovrodqyqxzbzyplmisqcreigwbjvabwoyfjfkgxssnafhicpercfievxgbbgpbqvfeeduletbmanmfckimsbeegeqrtfdmsqftqtmfwkfnjikxzipsjpbjcjncssmajqisellewvhunzgnmncplslsiuqngxecktxwzuyvwvlhdolkoarzcemluebjcvxckolwyebtxodqsbaleppqdluinwlafciqbfgfawcpsgocliyzeqxlkcwvptgicrtuffqdypeqojtfooaapvstolguhdgrwinzwxiglsxenkeghjdpitkxowqdtmekbqfpvtfrhpmebnrkvwdytzrzuigzyesyhssdaoircggxozljfrtoylsmnkkvfxk
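A quick way to convince yourself that the full string is stored even when the console display looks truncated (a minimal sketch):

```r
s <- paste(rep("a", 600), collapse = "")  # a 600-character string
nchar(s)             # 600: the full length is stored
substr(s, 598, 600)  # "aaa": the tail is still intact
```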
Here is some Example Data:
Begin = c("10-10-2010 12:15:35", "10-10-2010 12:20:52", "10-10-2010 12:23:45", "10-10-2010 12:25:01", "10-10-2010 12:30:29")
End = c("10-10-2010 12:24:23", "10-10-2010 12:23:30", "10-10-2010 12:45:15", "10-10-2010 12:32:11", "10-10-2010 12:45:05")
df = data.frame(Begin, End)
I want to count the number of events that have not currently finished when a new event begins and record it in a new column. So for this particular example the end result that is desired would be a column with values: 0, 1, 1, 1, 2
I need this to be coded in R please. I found a way to calculate this in SAS with a lag function but I do not like that method for various reasons and would like something that works better in R.
In reality I have 36,000 rows and this is dealing with power outages.
Someone asked me to post what I have tried. In SAS I was successful with a lag function, as I said, but that method did not work well because it requires a lot of hardcoding and is not efficient.
In R I tried sorting by begin time and numbering the rows 1-36k, then sorting by end time and numbering them 1-36k, and then applying some if-then logic, but I hit a wall and do not think that will work either.
My question was marked for editing to be made available to the community again. The only reason I can imagine is that there are too many possible answers. Well, I didn't edit anything, but I added this excerpt. In programming there will be many answers to any good question that is not the simplest one (and even those have many answers, especially in R). This is a question I know many people will ask over time, and frankly it is hard to find information online on how to do this in R. The answer to this question was very short and it worked perfectly. It would be a shame not to make it available to the community, as the point of Stack Overflow is to build a repertoire of great questions so that they come up when people google things along these lines.
Maybe this helps:
library(lubridate)
library(data.table)
## parse the day-month-year timestamps
df <- as.data.frame(lapply(df, dmy_hms))
dt <- as.data.table(df)
## key on the interval columns and add a row id
setkey(dt, Begin, End)[, id := .I]
## for each event, count the earlier events whose interval overlaps its start
merge(dt, foverlaps(dt, dt)[id > i.id, .N, by = "Begin,End"],
      all.x = TRUE)[, id := NULL][is.na(N), N := 0][]
# Begin End N
# 1: 2010-10-10 12:15:35 2010-10-10 12:24:23 0
# 2: 2010-10-10 12:20:52 2010-10-10 12:23:30 1
# 3: 2010-10-10 12:23:45 2010-10-10 12:45:15 1
# 4: 2010-10-10 12:25:01 2010-10-10 12:32:11 1
# 5: 2010-10-10 12:30:29 2010-10-10 12:45:05 2
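For comparison, the same count can be written as a short base-R sketch: assuming rows are sorted by Begin (as in the example), count for each event the earlier events whose End falls after that event's Begin.

```r
Begin <- as.POSIXct(c("10-10-2010 12:15:35", "10-10-2010 12:20:52",
                      "10-10-2010 12:23:45", "10-10-2010 12:25:01",
                      "10-10-2010 12:30:29"), format = "%d-%m-%Y %H:%M:%S")
End   <- as.POSIXct(c("10-10-2010 12:24:23", "10-10-2010 12:23:30",
                      "10-10-2010 12:45:15", "10-10-2010 12:32:11",
                      "10-10-2010 12:45:05"), format = "%d-%m-%Y %H:%M:%S")
# earlier events still running when event i begins
open_events <- sapply(seq_along(Begin),
                      function(i) sum(End[seq_len(i - 1)] > Begin[i]))
open_events  # 0 1 1 1 2
```

This is O(n^2) in the number of rows, so for 36,000 rows the foverlaps approach above will scale better.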
I have a dataset flex in which one of the columns, Bookings, includes hours and minutes as HH:MM. I need to remove the colons to have HHMM instead.
I tried:
gsub(":", "", flex$Bookings)
but this only prints the changed values to the console; the variables in the dataset do not change.
I also tried:
flex$Bookings<-gsub(":", "", flex$Bookings)
but nothing happens. I know I am missing something simple but cannot figure out what. Thanks for your help.
I am confused by your question. Please take a look at this output.
flex <- data.frame(Bookings = paste0(10:20, ":", 20:10))
flex$Bookings <- gsub(":", "", flex$Bookings)
flex
# Bookings
# 1 1020
# 2 1119
# 3 1218
# 4 1317
# 5 1416
# 6 1515
# 7 1614
# 8 1713
# 9 1812
# 10 1911
# 11 2010
Your syntax, for the question you asked, is correct.
Another option, if your table is very large, is to use the data.table package.
library(data.table)
flex_dt <- data.table(Bookings = paste0(10:20, ":", 20:10))
flex_dt[ , Bookings := gsub(":", "", Bookings)]
This might be too slow for you, but you could consider using a POSIX class and format.Date.
time1 <- c("14:08")
time1.date <- format.Date(as.POSIXct(time1, format="%H:%M"), format="%H%M")
Again, this might be pretty slow, but if your need is specific to date/times, and you don't need a lot of speed, this approach could be helpful to you. Good luck.
I want to find and replace a list of words with another list of words.
Say my data is
1) plz suggst med for flu
2) tgif , m goin to have a blast
3) she is getting anorexic day by day
List of words to be replaced are
1) plz -- please
2) pls -- please
3) sugg -- suggest
4) suggst -- suggest
5) tgif -- thank god its friday
6) lol -- laughed out loud
7) med -- medicine
I would like to have two lists: list "A", the words to be found, and list "B", the words to replace them with, so that I can keep adding terms to these lists as required. I need a mechanism to search for all the words in list "A" and replace them with the corresponding words in list "B".
What is the best way to achieve this in R? Thanks in advance.
Try this:
#messy list
listA <- c("plz suggst med for flu",
"tgif , m goin to have a blast",
"she is getting anorexic day by day")
#lookup table
list_gsub <- read.csv(text="
a,b
plz,please
pls,please
sugg,suggest
suggst,suggest
tgif,thank god its friday
lol,laughed out loud
med,medicine")
#loop through each lookup row, anchoring on word boundaries so that
#e.g. "sugg" cannot also match inside the already-replaced "suggest"
for(x in 1:nrow(list_gsub))
  listA <- gsub(paste0("\\b", list_gsub[x, "a"], "\\b"),
                list_gsub[x, "b"], listA)
#output
listA
#[1] "please suggest medicine for flu"
#[2] "thank god its friday , m goin to have a blast"
#[3] "she is getting anorexic day by day"
Have a look at ?gsub:
x <- c("plz suggst med for flu", "tgif , m goin to have a blast", "she is getting anorexic day by day")
gsub("plz", "please", x)
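To keep the two plain vectors the question asked for, the same idea extends to a loop over paired lists, with word-boundary anchors so short patterns such as "sugg" cannot match inside longer words (a sketch; the vectors are the question's own examples):

```r
A <- c("plz", "pls", "sugg", "suggst", "tgif", "lol", "med")  # find
B <- c("please", "please", "suggest", "suggest",
       "thank god its friday", "laughed out loud", "medicine")  # replace with
x <- c("plz suggst med for flu",
       "tgif , m goin to have a blast")
# \\b restricts each pattern to whole words
for (i in seq_along(A))
  x <- gsub(paste0("\\b", A[i], "\\b"), B[i], x)
x
# [1] "please suggest medicine for flu"
# [2] "thank god its friday , m goin to have a blast"
```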