I'm an undergrad trying to recreate someone else's research, and I cannot for the life of me make sense of the following line of code:
temp_data[, fips := paste0(sprintf("%02d", STATEFP), sprintf("%03d", COUNTYFP))]
temp_data was fread from a csv, and is a "data.table" "data.frame" which I'm reading as either or...
The error message that started all of this:
Error in paste0("%02d", STATEFP) : object 'STATEFP' not found
I've looked into both paste0 and sprintf, and am currently thinking that the line of code is trying to create STATEFP and COUNTYFP from temp_data using paste0 after sprintf interprets the fips code however it needs to...
Here is what the temp_data looks like:
screenshot
Any suggestions that can help me to figure out what's going on here would be greatly appreciated. I'm using R 4.0.1 on/with x86_64-apple-darwin17.0 if that helps any.
Thank you for the screenshot; that was very helpful.
sprintf essentially returns a character vector that combines the format text with the formatted variable values. It looks like STATEFP and COUNTYFP must have been defined earlier in the code, most likely as vectors. This line of code uses these vectors to filter the data in some way, but I cannot say how without knowing what STATEFP and COUNTYFP are.
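For what it's worth, if temp_data is a data.table, names used inside the square brackets are looked up among its columns first, so the line would be adding a new column called fips rather than filtering. If the CSV has no column named STATEFP, that lookup fails with an "object not found" error. A minimal sketch, assuming integer columns and using made-up values in place of the real file:

```r
library(data.table)

# Hypothetical stand-in for temp_data; the real file's contents are not shown in the thread.
temp_data <- data.table(STATEFP = c(1L, 6L, 36L), COUNTYFP = c(1L, 37L, 61L))

# sprintf() zero-pads each integer to a fixed width and returns character strings,
# paste0() glues the state and county pieces together,
# and := stores the result in a new column named fips (modifying temp_data in place).
temp_data[, fips := paste0(sprintf("%02d", STATEFP), sprintf("%03d", COUNTYFP))]

print(temp_data$fips)
# "01001" "06037" "36061"
```

Running names(temp_data) on the real data would show whether the columns exist under exactly those names.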
I have a given df which has some column names that include a "/" (e.g. "Province/State" and "Country/Region").
I want to first group the df by "Country/Region" and then summarize it like this:
confirmed_by_country <- confirmed %>%
group_by("Country/Region") %>%
summarize_at(vars(-Lat, -Long, -"Province/State"), sum)
When I try to run this code, it tells me that the column "Province/State" does not exist. I was warned that column names like these can cause problems, but I still can't figure out what I am doing wrong.
I am also confused why I am only getting this error for "Province/State" and not "Country/Region" in the group_by() function.
Does anyone have an idea what the problem might be? Thanks!
Somehow it made a difference whether I imported the data with read.csv() or read_csv().
It didn't work with read.csv() even if I used backticks but it did when I used read_csv() and backticks. (The column names were also different depending on which one I used.)
If anyone knows why that is, I'm interested!
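On the read.csv() versus read_csv() difference: base R's read.csv() defaults to check.names = TRUE, which runs the header through make.names() and turns characters such as "/" into ".". readr's read_csv() keeps the header as written. A minimal base R sketch, using made-up rows in place of the real file:

```r
# Made-up sample with the same kind of header as the question's data.
csv_text <- paste(
  "Province/State,Country/Region,Lat,Long,Cases",
  "Hubei,China,30.9,112.2,444",
  "Ontario,Canada,51.2,-85.3,1",
  sep = "\n"
)

# Default behaviour: names are made syntactically valid, so "/" becomes ".".
default_names <- names(read.csv(text = csv_text))

# With check.names = FALSE the header is kept exactly as it appears in the file.
kept_names <- names(read.csv(text = csv_text, check.names = FALSE))

print(default_names)
print(kept_names)
```

With the names preserved, dplyr verbs need backticks around them, for example group_by(`Country/Region`). A quoted string there is treated as a constant value rather than a column name, which may be why group_by() itself raised no error.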
I have dug into rlist and purrr and found them to be quite helpful in working with lists of pre-structured data. I tried to solve the problems that came up on my own to improve my coding skills, so thanks to the community for helping out! However, I have now reached a dead end:
I want to write code so that we can drop our Excel files in .xlsm format into a folder and R does the rest.
I import my data using:
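library(readxl)
# pattern is a regular expression, so escape the dot and anchor the extension
vec.files <- list.files(pattern = "\\.xlsm$")
vec.numbers <- gsub("\\.xlsm$", "", vec.files)
list.alldata <- lapply(vec.files, read_excel, sheet = "XYZ")
names(list.alldata) <- vec.numbers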
The data we read in is a combination of characters, dates (...).
When I use the rlist package, everything works fine until I try to filter on names that were not fixed entries in the Excel file (e.g. Measurable 1) but references to another field (e.g. =Table1!A1, or a reference).
If I try to call one of these problematic elements, I get this error:
list.map(list.alldata, NameWhichWasAReferenceToAnotherFieldBefore)
Error in eval(.expr, .evalwith(.data), environment()) :
object 'Namewhichwasareferencetoanotherfieldbefore' not found
I am quite surprised, because if I call
names(list.alldata[[1]])
I get a vector with the correct entries / names.
As I identified read_excel() as the cause of the problem, I tried adding col_names=TRUE, but it did not help. With col_names=FALSE the correct entries are also read into the dataset.
I assume that exporting the data as .csv would help, but this is not an option. Can this be easily done by R in a preliminary loop?
In my workflow, accessing the data by name is essential and there is no workaround, so I really appreciate your help!
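This does not diagnose the rlist error, but one way to check the names and pull a column without relying on non-standard evaluation is to pass the name as a string in base R. A minimal sketch, with a hypothetical two-element list standing in for list.alldata and "Measurable 1" as an assumed column name:

```r
# Hypothetical stand-in for list.alldata; the real sheets are not shown in the question.
list.alldata <- list(
  "001" = data.frame(`Measurable 1` = c(1.5, 2.5), check.names = FALSE),
  "002" = data.frame(`Measurable 1` = c(3.0, 4.0), check.names = FALSE)
)

wanted <- "Measurable 1"

# Which elements contain a column with exactly this name (case and spaces included)?
has_col <- vapply(list.alldata, function(d) wanted %in% names(d), logical(1))

# Extract the column by string from every element that has it.
values <- lapply(list.alldata[has_col], function(d) d[[wanted]])

print(has_col)
print(values)
```

If has_col comes back FALSE for some file, comparing names(list.alldata[[i]]) character by character against the expected name may reveal a difference in case or whitespace.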
I'm using dplyr in R (Microsoft R Open 3.5.3, to be precise). I'm having a slight problem with dplyr whereby I'm sometimes seeing lots of additional information in the data frame I create. For example, for these lines of code:
claims_frame_2 <- left_join(claims_frame,
select(new_policy_frame, c(Lookup_Key_4, Exposure_Year, RowName)),
by = c("Accident_Year" = "Exposure_Year", "Lookup_Key_4" = "Lookup_Key_4")
)
claims_frame_3 <- claims_frame_2 %>% group_by(Claim.Number) %>% filter(RowName == max(RowName))
There is no problem with the left_join command, but when I run the second command (group by/filter), the data structure of the claims_frame_3 object is different from that of the claims_frame_2 object. It suddenly seems to have lots of attributes (something I know little about) attached to the RowName field. See the attached photo.
Does anyone know why this happens and how I can stop it?
I had hoped to put together a small chunk of reproducible code that demonstrated this happening, but so far I haven't been successful. I will keep trying. In the meantime, I'm hoping someone might see this code (from a real project) and immediately know why this is happening!
Grateful for any advice.
Thanks
Alan
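One possible explanation, not confirmed in the thread: group_by() returns a grouped data frame, filter() keeps that grouping, and the grouping metadata is stored as attributes on the object. If that is what the viewer is showing, ungroup() should drop it. A minimal sketch with made-up claims data:

```r
library(dplyr)

# Made-up stand-in for claims_frame_2; only the two relevant columns are included.
claims <- data.frame(
  Claim.Number = c("A", "A", "B"),
  RowName = c(1L, 3L, 2L)
)

# Same group_by/filter pattern as in the question: the result is still grouped.
grouped <- claims %>%
  group_by(Claim.Number) %>%
  filter(RowName == max(RowName))

# ungroup() removes the grouping and the metadata that comes with it.
flat <- ungroup(grouped)

print(class(grouped))
print(class(flat))
```

Comparing names(attributes(grouped)) with names(attributes(flat)) shows which attributes the grouping added.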
I've searched a bit for answered questions related to this, but I still keep running into issues.
I have a data frame with 1.4 million rows loaded into R, containing GPS route data for ~56 vehicles. I used the split() function to parse my data into smaller chunks by bus name (bus name example: '1367/E0007489'). I used the following line of code:
dfs <- split(sater001_paired, f=sater001_paired[, "vehicleName"])
Here sater001_paired is my data frame, and vehicleName is the variable I split by. The number of rows in each chunk is uneven, given that this data was captured in real time.
The problem I'm facing now is attempting to save each of these chunks into their own .csv files. I tried using lapply as such:
lapply(names(dfs), function(x){write.table(dfs[[x]], file = paste("bus", x, sep = ""))})
But R returns an error message, "cannot open the connection". It's likely I'm missing something, as I'm very rusty on using the lapply function.
Any suggestions based off this?
MrFlick has helped me realize the issue I was having here.
So just to close this: the vehicleName column contained a forward slash halfway through each identification code. A slash in a file name is read as a directory separator, so R was trying to open a file inside a directory that did not exist. RStudio on Windows does not take kindly to these characters, and I did not realize this, as I have only recently switched over from primarily using Mac OS.
I replaced the slashes using gsub in the following code:
sater001_paired$vehicleName <- gsub('/', '-', sater001_paired$vehicleName)
The issue is now resolved. Thanks again to MrFlick for the help.
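Putting the pieces together, a minimal sketch of the split-and-save step might look like the following. The three rows, the speed column, and the use of tempdir() as output folder are made up for illustration; the slash is replaced only when building the file name, so the original column stays untouched.

```r
# Made-up miniature of sater001_paired.
sater001_paired <- data.frame(
  vehicleName = c("1367/E0007489", "1367/E0007489", "1402/E0007512"),
  speed = c(10, 12, 8)
)

dfs <- split(sater001_paired, f = sater001_paired[, "vehicleName"])

out_dir <- tempdir()  # assumption: any existing, writable folder works here

paths <- vapply(names(dfs), function(x) {
  # A "/" in a file name would be read as a directory separator, so swap it out.
  safe_name <- gsub("/", "-", x, fixed = TRUE)
  path <- file.path(out_dir, paste0("bus", safe_name, ".csv"))
  write.csv(dfs[[x]], file = path, row.names = FALSE)
  path
}, character(1))

print(basename(paths))
```

write.csv() is used instead of write.table() so that the files are comma-separated to match the .csv extension.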
Hi all and thanks in advance for all your help.
In R, I'm sending a command to an external Windows program using system(command), which in turn outputs lines (with multiple values per line) that I see directly on the R console. They look something like this:
a,b,c,d,e,f,g,h
1,2,3,4,5,6,7,8
3,4,5,7,1,3,4,9
7,5,3,1,8,1,5,7
What I would like to do is create an array that has the top row as column names and each subsequent row from the input should be the values that go into these columns. Any and all help in making this work would be very appreciated.
This is my first foray into this territory so I'm quite stuck as to how to do it. I've meddled with scan(), pipe() and readLines() but haven't been able to succeed. I have no particular attachment to system(command), any function that will run the executable that will give me the output I need is fine by me if it helps achieve what I want.
The comment made by user1935457 did the trick.
read.table(text = system(command, intern=TRUE), sep = ",", header=TRUE)
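To see what that call does without the external program, the sketch below substitutes a character vector for the result of system(command, intern = TRUE), which returns one string per output line. The vector reuses the sample lines from the question.

```r
# Stand-in for system(command, intern = TRUE): one element per line of output.
output_lines <- c(
  "a,b,c,d,e,f,g,h",
  "1,2,3,4,5,6,7,8",
  "3,4,5,7,1,3,4,9",
  "7,5,3,1,8,1,5,7"
)

# header = TRUE turns the first line into column names; the rest become rows.
result <- read.table(text = output_lines, sep = ",", header = TRUE)

print(result)

# If a plain numeric array is wanted rather than a data frame:
result_matrix <- as.matrix(result)
```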