I'm pretty new to R.
I just imported a CSV file into my R environment. I see the name of the dataframe and the name of the columns, but there is information below and I don't know what to make of it.
It looks like it might be records of the data types that R guessed when it imported the data, but I'm not sure. Can you confirm what it's communicating? Thanks!
The functions in the readr package add an extra spec attribute to the data they read in. It won't affect anything in later code. I assume it is intended for debugging: if your data is imported poorly, you can see which column types were used, so you can try something different with a manual override.
It's mentioned a bit in the readr vignette, especially the last two sections. You can access the spec attribute directly with either attr(your_data, "spec") or spec(your_data).
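For example, a minimal look at the attribute (file and column names here are placeholders):

library(readr)

# readr guesses a type for each column at import time
df <- read_csv("my_data.csv")

# two equivalent ways to inspect the guessed column types
spec(df)
attr(df, "spec")

# if a guess was wrong, you can override it on re-import, e.g.:
df <- read_csv("my_data.csv", col_types = cols(id = col_character()))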
I am currently storing the output (a Julia DataFrame) of my Julia simulation in a Parquet file using Parquet.jl. I would also like to save some of the simulation parameters (e.g., a list of (byte-)strings) to that same output file.
Preferably, these parameters would differ per column, as each column is the result of different starting conditions of my code. However, I could also work with a global parameter list and then untangle it afterwards by indexing.
I have found a solution for Python using pyarrow:
https://mungingdata.com/pyarrow/arbitrary-metadata-parquet-table/
Do you know of a way to do this in Julia?
It's not quite done yet, and it's not registered, but my rewrite of the Julia parquet package, Parquet2.jl, does support both custom file metadata and individual column metadata (the keyword arguments metadata and column_metadata in Parquet2.writefile).
I haven't gotten to the documentation for writing yet, but if you are feeling adventurous you can give it a shot. I do expect to finish up this package and register it within the next couple of weeks. I don't have unit tests for writing yet, so of course, if you try it and have problems, please open an issue.
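To give you an idea, a call would look something like this (the exact accepted value types may shift a bit until I register it):

using Parquet2, DataFrames

df = DataFrame(a = 1:3, b = ["x", "y", "z"])

# file-level metadata and per-column metadata; the Dict-of-strings
# form is a sketch of what the keyword arguments take
Parquet2.writefile("output.parquet", df;
                   metadata = Dict("simulation" => "run-42"),
                   column_metadata = Dict("a" => Dict("seed" => "1234")))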
It's probably also worth mentioning that the main use case for which I recommend parquet is when you must have parquet for compatibility reasons. Most of the time, Julia users are probably better off with Arrow.jl, as that format has a number of advantages over parquet for most use cases; please see my FAQ answer on this. Of course, the reason I undertook writing the package is that parquet is arguably the only ubiquitous binary format in the "big data" world, so a robust writer is desperately needed.
I have a large population survey dataset for a project, and the first step is to make exclusions and produce a final dataset for analyses. To organize my work, I want to continue in a new file where I derive the survey variables correctly. Is there a command for continuing work by carrying all the previous data and code over to the new file?
I don't think I understand the problem you have. You can always create multiple .R files and split the code among them as you wish, and you can also arrange those files as you see fit in the file system (group them in the same folder with informative names and comments, etc.).
As for the data side of the problem, you can load your data into R, make any changes/filters needed, and then save it to another file with one of the many functions for writing to disk: write.table() from base R, fwrite() from data.table (which can be MUCH faster), etc.
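For example (file and column names invented for illustration):

library(data.table)

# load the raw survey data
dat <- fread("survey_raw.csv")

# apply your exclusions, then save the final analysis dataset
dat_final <- dat[age >= 18 & !is.na(income)]
fwrite(dat_final, "survey_final.csv")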
I feel that my answer is way too obvious. When you say "project", do you mean "something I have to get done" or an actual project that you can create in RStudio? If it's the first, then I think I have covered it. If it's the second, I never got to use that feature, so I am not going to be able to help :(
Maybe you can elaborate a bit more.
As the title says, I'm not a programmer. I've tried R before, got very confused and abandoned it. I'm a physician, and I do all my statistics either with SPSS or Excel. I'd like to learn some coding for when I get into problems like this:
I have an ASCII file that I'd like to extract data from. The fields are contained within columns of variable width. 90% of the file is useless to me. For example, the fields I'm interested in extracting are encoded in columns 00645-00649, 03315-03319, etc. I'd like to get this into a format I can run stats on in SPSS/Excel. Should I be looking to use R, Python, or something else, or am I totally beyond hope?
Thanks in advance.
It's impossible to say for certain given only the information here, but the DATA LIST command in SPSS may well allow you to read the data into SPSS directly from the current file. If you can specify the column locations of the desired variables, you can specify those on that command, and SPSS will simply skip over the unnamed columns.
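And if you do decide to try R for this, readr's read_fwf() reads fixed-position columns directly. A minimal sketch using the positions from your post (field names and file name invented):

library(readr)

# column positions taken from the question; field names are made up
dat <- read_fwf("my_file.txt",
                col_positions = fwf_cols(field_a = c(645, 649),
                                         field_b = c(3315, 3319)))

# save as CSV so SPSS or Excel can open it
write_csv(dat, "extracted.csv")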
I am using library(eventstudies) (the Event Studies package). In the sample they use:
data(StockPriceReturns)
data(SplitDates)
head(SplitDates)
However, I do not know how to set up my own dataset to use the package. My question is:
How do I look into the StockPriceReturns data?
I appreciate your answer!
I think you want to read a data set into a data frame or table.
I'm not familiar with that package, so I'm not sure about the required format. If the data set you read in matches the schema of StockPriceReturns, I'm sure R will process it just fine. This PDF appears to explain it well.
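A quick way to see what that data looks like is to load it and inspect it:

library(eventstudies)

# load the example datasets bundled with the package
data(StockPriceReturns)
data(SplitDates)

# inspect class, structure, and the first few rows
class(StockPriceReturns)
str(StockPriceReturns)
head(StockPriceReturns)
head(SplitDates)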
I'm using memisc to read in a massive SPSS file in order to reshape it. This seems to work very well.
However, I'd also like to be able to output the results to an SPSS-readable file that retains labels and descriptions--after all, this is the advantage of using the data-set construct memisc has set up.
I've searched the memisc documentation for any mention of an export, save, or output function. I've also tried passing a memisc data set to write.foreign() (from the foreign package), without success.
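For reference, my attempt looks roughly like this; as.data.frame() is my guess at the conversion write.foreign() needs, since it expects a plain data frame:

library(memisc)
library(foreign)

# import the SPSS file with memisc (file name is a placeholder)
ds <- as.data.set(spss.system.file("big_file.sav"))

# write.foreign() wants a plain data.frame; I suspect the memisc
# labels and descriptions get dropped in this conversion
dat <- as.data.frame(ds)

# writes the data as text plus an SPSS syntax file to re-import it
write.foreign(dat, datafile = "reshaped.txt",
              codefile = "reshaped.sps", package = "SPSS")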
Why am I doing this in the first place? Reshaping massive data files in SPSS is a pain. R makes it easy. But the folks I'm doing this for want to maintain the labels and descriptions.
Thanks!