Save column descriptions when exporting SAS dataset to CSV

I've been given SAS data which I'd like to export to CSV, so that I can analyze it using R. I've never used SAS but I found Efficiently convert a SAS dataset into a CSV, which explains how to convert to CSV using code like this:
proc export data=sashelp.class
outfile='c:\temp\sashelp class.csv'
dbms=csv
replace;
run;
This works, but I've noticed that I end up with what I'll call short column names in the CSV, whereas I see long, verbose column descriptions when I look at the data in SAS (i.e. using the SAS software).
I'd like to programmatically save those column descriptions to a txt file, so that I can read them into an R vector. In other words, I'm happy having the short column names in my CSV header (i.e. the first line of my CSV), but I'd like to also have a second file, with only one line, containing the longer column descriptions. How do I do that? I googled and didn't notice anything helpful.
To give an example, the long column descriptions I see in SAS might be something like "Number of elephants in Tanzania", with a corresponding short column name of "ElephTanz".

You can use the SAS "dictionary" library to access this kind of info. The following code creates a table work.column_labels that has two columns: the "short name" you're seeing and the longer label that appears when you view the data in SAS. (Note that the sashelp.class data doesn't happen to have labeled columns, so in this particular example the second column will be empty.)
proc sql;
create table work.column_labels as
select name, label
from dictionary.columns
where libname = 'SASHELP'
and memname = 'CLASS';
quit;
Then you can export this table to a csv using code similar to what you already have.
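On the R side, a minimal sketch for reading that exported label table back into a character vector, as the question asks (the file name and CSV header names here are assumptions, not from the original answer):

labels_df <- read.csv("column_labels.csv", stringsAsFactors = FALSE)
# Header names depend on how proc export writes them; adjust to match.
col_descriptions <- labels_df$Label
names(col_descriptions) <- labels_df$Name
col_descriptions["ElephTanz"]  # e.g. "Number of elephants in Tanzania"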

Related

crawl excel data automatically based on contents

I want to get data from many excel files with the same format like this:
What I want is to output the ID data from column B to a CSV file. I have many files like this, and for each file the number of columns may not be the same, but the ID data will always be in column B.
Is there a package in Julia that can crawl data in this format? If not, what method should I use?
You can use the XLSX package.
If the file in your screenshot is called JAKE.xlsx and the data shown is in a sheet called DataSheet:
data = XLSX.readtable("JAKE.xlsx", "DataSheet")
# `data[1]` is a vector of vectors, each holding one column's data.
# That way, `data[1][2]` corresponds to column B's data.
data[1][2]
This should give you access to a vector with the data you need. After getting the IDs into a vector, you can use the CSV package to create an output file.
If you add a sample xlsx file to your post it might be possible to give you a more complete answer.

Concatenate numbers into strings R programming

I have two columns which contain numeric values. I would like to create a third by concatenating them as a string, which I do using the paste function. But on exporting the data to CSV, the third column gets converted into a date.
Desired output (Column C):
A B C
2 3 2-3
4 5 4-5
A & B is contained in a dataset called concat
code written till now as under
concat$C <- paste(concat$A,concat$B, sep="-", collapse = NULL)
This shows the desired output on screen, but on writing to CSV the values in column C change to date format.
As the comments have pointed out, this is a result of the way Excel (or other applications) interprets column formats. Similar problems happen if you want to export numeric columns with leading 0s, open US-format CSV in countries like Germany, etc.
The easiest solution to all these problems is to not open the .csv in Excel directly.
Instead, open a new, empty Excel workbook and use the Import Assistant in the Data tab. This will allow you to import CSV or any other separated-text format and control the column formats before importing!
Be aware that simply opening .csv, .tsv, etc. in Excel and then saving in the original file format will coerce all data to Excel's assumed formats, so always use the Import Assistant.
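A quick R sketch to confirm the CSV itself is fine and the date conversion only happens inside Excel (data mirrors the example above):

concat <- data.frame(A = c(2, 4), B = c(3, 5))
concat$C <- paste(concat$A, concat$B, sep = "-")
write.csv(concat, "concat.csv", row.names = FALSE)
readLines("concat.csv")  # raw file still contains "2-3" and "4-5"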

Formatting the exported data in R

I have a list formed from merging data together in R, which was extracted from various Excel sheets. When I export this list to Excel, some numbers get stored as text. How can I ensure that numbers/values are stored in number format and text is stored in text format?
Example of a table I have:
Name1 Ed 23 0.45 DNR ST 8732
Name2 Bob - 0.78 Tik GH 999
Name3 Jose 26 0.23 DNR TT 1954
Desired outcome: have exactly the same table exported in excel with numeric values being stored in a numeric/general format.
An easier way would be to use the 'Import Dataset' wizard in the RStudio IDE. The wizard is interactive, and you can see the corresponding code generated as you change options in the wizard (bottom right corner). Say you change one column type that you think is not appropriate; you will then see the code change to reflect that. Once you finish the import, you can save the code for future use.
There is a reason why this happened: when you import data, you need to define the colClasses parameter of read.table or read.csv, whichever function you are using to read the data. Then merge the data frames and check that the class of each variable is right; if not, convert it to the required type. If you have a list, convert it to a data frame, because data frames are handier.
If nothing else works, convert the datatypes manually to their respective types and then write the table. It will work.
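A minimal sketch of that flow (the file name, column classes, and the score column are hypothetical):

df <- read.csv("input.csv",
               colClasses = c("character", "integer", "numeric"))
sapply(df, class)                 # verify each column's class
df$score <- as.numeric(df$score)  # manual conversion if a class is wrong
write.csv(df, "output.csv", row.names = FALSE)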
One caveat: date columns are automatically converted to character when written to a file.
Let me know if this resolves your query.

Read excel data as is using readxl in R

I have to read an Excel file in R. The Excel file has a column with values such as 50%, 20%, ... and another column with dates in the format "12-December-2017", but R converts both columns' data.
I am using the readxl package, and I specified in the col_types parameter that all the columns should be read as text. When I check the data frame, all the column types are character, but the percentage data and dates have changed to decimals and numbers respectively.
excelfile2 <- read_excel(filePath, col_types = rep("text", 8))
I want to read the Excel file as is. Any help will be appreciated.
This is because what you visualize inside Excel is not what is actually stored.
For example, if in Excel you see "12-December-2017", what is stored in reality is the number of days since Excel's date origin (1899-12-30).
My suggestion is to open the Excel file's contents as raw text so you have a grasp of what you are really reading into R.
Then you can either define everything as text in Excel, or apply some transformations in R to convert the day counts into a Date or POSIXct format.
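A minimal sketch of those transformations, assuming hypothetical column names date and pct in the data read by the question's own call:

library(readxl)
excelfile2 <- read_excel(filePath, col_types = rep("text", 8))
# Excel stores dates as serial day counts; R's matching origin is 1899-12-30
excelfile2$date <- as.Date(as.numeric(excelfile2$date), origin = "1899-12-30")
# Percentages are stored as fractions, e.g. 0.5 for 50%
excelfile2$pct <- sprintf("%g%%", as.numeric(excelfile2$pct) * 100)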

X12 seasonal adjustment program from census, problem with input file extensions

I downloaded the X12 seasonal adjustment program located here: http://www.census.gov/srd/www/x12a/x12downv03_pc.html
I followed the setup and got the settings correct. When I go to select a file to input, I have four options for file extensions to import: ".spc", ".mta", ".dta" and ".".
The problem is that I have data in Excel, and I have searched extensively through search engines but cannot figure out a way to get data from Excel into one of these formats so I can do a seasonal adjustment on my data. Thanks
ADDED: After converting to a .dta file (using R, thanks to the comments left below), it looks like the program also makes you convert it to a .spc file. Does anyone have a lead on how to do this? Thanks
My first reaction is to:
(1) export the data from excel in something simple like csv.
(2) import that data into R
(3) use the R library "foreign" to export the data in .dta format.
So with the file "test.csv" containing:
V1,V2
1,2
3,4
5,6
you could do the following to produce "test.dta":
library(foreign)
testdata <- read.csv("test.csv")
write.dta(testdata,"test.dta")
Voila, data in .dta format. Would this work for what you have?
I've only ever used the command-line version of X12, but it sounds like you may be using the windows interface instead? If so the following might not be entirely accurate, but it should be close enough (I hope!).
The .dta and .mta files you refer to are just metafiles containing text lists of either spec files or data files to be processed; in particular the .dta files X12 uses are NOT Stata data format files like those produced by Nathan's R-based answer. It's probably best to ignore using metafiles until you are comfortable enough using the software to adjust a single time series.
You can export your data in tab-separated format (year, month/quarter, value) without headings and use that as your data file. You can also use a simple list of data values separated by spaces, tabs, or newlines and then tell X12ARIMA what the start and end dates of the series are in the .spc file.
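For the tab-separated option, a minimal R sketch (the ts_data data frame and its column layout are hypothetical):

# columns: year, period, value; no header row, tab-separated
write.table(ts_data, "series.dat", sep = "\t",
            row.names = FALSE, col.names = FALSE, quote = FALSE)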
The .spc file doesn't contain the input data, it's a specification file telling X12 where to find the data file and how you want those data to be processed -- you'll have to write them yourself or create them in Win X-12.
Ideally you should write a separate .spc file for each time series to be adjusted; while you can write a .spc file which invokes many of X12's autoselection and identification procedures, it's usually not a good idea to treat the process as a black box, and a bit of manual intervention in the .spc is often necessary to get a good quality adjustment (and essential if there's a seasonal break involved). I find it helpful to start with a fairly generic skeleton .spc file suitable for your computing environment to begin with and then tweak it from there as appropriate for each series.
If you really want to use a single .spc file to adjust multiple series, then you can provide a list of data files in a .dta file and a single .spc file instructing X12ARIMA how to adjust them, but take care to ensure this is appropriate for your data!
The "Getting started with X-12-ARIMA input files on your PC" document on that site is probably a good place to start reading, but you'll probably end up having to consult the complete reference documentation (in particular Chapters 3 and 7) as well.
Edit postscript:
The UK Office for National Statistics has a draft of their guide to seasonal adjustment with X12ARIMA available online here (archive.org), and it is worth a look. It's a good bit easier to work through than the Census Bureau documentation.
Ryan,
This is not elegant, but it might work for you. In this example I'm trying to replicate the spec file from the Example 3.2 in the Census documentation.
Concatenate the data into one text string, then save this single text string using the MS-DOS (TXT) format under the SAVE AS command. To make the text string, first insert two cells above your column header, and in the second one type the following text:
series{title=
Next, insert double quotation marks before and after the text in your column header, like this:
"Monthly Retail Sales of Household Appliance Stores"
Directly below the last data row, insert rows of texts that list the model specifications, like the following:
)
start= 1972.jul}
transform{function = log}
regression{variables=td}
identify{diff=(0,1) sdiff=(0,1)}
So you should have something like the following:
<blank row>
series{title=
"Monthly Retail Sales of Household Appliance Stores"
530
529
...
592
590
start= 1972.jul}
transform{function = log}
regression{variables=td}
identify{diff=(0,1) sdiff=(0,1)}
For the next instructions I am assuming that the text series{title= appears in cell A2 and that cell B1 is empty. In cell B2, insert the following:
=CONCATENATE(B1,A2," ")
Then copy this formula into every cell down the column to concatenate all of the text in column A into a single cell at the end of column B. Finally, copy the final cell to a new spreadsheet's cell A1 using PASTE SPECIAL/VALUE, and save this spreadsheet using SAVE AS: TXT (MS-DOS), but change the extension to ".spc".
Good luck (and from the little I read of the Census documentation - you'll need it).