Concatenate numbers into strings in R - r

I have two columns which contain numeric values. I would like to create a third by concatenating them as a string, which I do using the paste function. But on exporting the data to CSV, the third column gets converted into a date.
Desired output (Column C):
A B C
2 3 2-3
4 5 4-5
Columns A and B are contained in a dataset called concat.
The code written so far is as follows:
concat$C <- paste(concat$A,concat$B, sep="-", collapse = NULL)
This shows the desired output on screen, but on writing to CSV the values in column C change to date format.

As the comments have pointed out, this is a result of the way Excel (or other applications) interprets column formats. Similar problems happen if you want to export numeric columns with leading 0s, open a US-format CSV in countries like Germany, etc.
The easiest solution to all these problems is to not open the .csv in Excel directly.
Instead, open a new, empty Excel workbook and use the Import Assistant in the Data tab. This lets you import CSV or any other separated-text format and control the column formats before importing!
Be aware that simply opening a .csv, .tsv, etc. in Excel and then saving in the original file format will overwrite all data with the formats Excel assumed! So always use the import assistant.
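If the file has to be double-clicked open in Excel anyway, one common workaround is to wrap each value in an Excel text formula so Excel cannot reinterpret it as a date. A sketch in R, using a data frame shaped like the one in the question:

```r
# Sketch: a data frame shaped like the one in the question
concat <- data.frame(A = c(2, 4), B = c(3, 5))

# Writing the value as ="2-3" makes Excel keep it as literal text
concat$C <- paste0('="', concat$A, "-", concat$B, '"')

write.csv(concat, "concat.csv", row.names = FALSE)
```

The quoting trick is Excel-specific; other consumers of the CSV will see the ="..." wrapper, so only use it for files destined for Excel.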

Related

Losing column of data when importing large csv file with read_csv

I'm importing a 2.2GB csv file with 24 million rows using read_csv(). One of the columns (vital sign_date_time), a character variable, is not being read and is importing with only NA values.
I've opened the .csv file up in SQLServer and can confirm the data is there in the file. I've broken the large file up into smaller chunks in macOS terminal. When I import the smaller files, again with read_csv(), the data is also present.
I'm using the import dialog box in RStudio to minimize any typing errors. In the data view section of the dialog box, it shows only NA data in the column in question and is trying to import the column as a logical field. I've tried manually changing this to character type and it still reads only NA values.
Here's a screenshot of the dialog box:
Any ideas about what might be happening?
I was bitten by a similar problem recently, so this is a guess based on that experience.
By default, if the first 1000 entries of a column are NA, readr::read_csv will guess a type from those entries and set all values of that column to NA. You can control this by setting the guess_max argument. Here is the documentation:
guess_max: Maximum number of records to use for guessing column types.
For example,
library(readr)
dat <- read_csv("file.csv", guess_max=100000)
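Alternatively, if you already know which column is affected, you can skip type guessing for it entirely by declaring its type up front. A sketch, assuming the column is named vital_sign_date_time (adjust to match your header exactly):

```r
library(readr)

# Force the problem column to character instead of letting readr guess
dat <- read_csv(
  "file.csv",
  col_types = cols(vital_sign_date_time = col_character())
)
```

Declaring col_types is also faster on a 24-million-row file, since readr no longer has to scan rows to guess.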

crawl excel data automatically based on contents

I want to get data from many excel files with the same format like this:
What I want is to output the ID data from column B to a CSV file. I have many files like this and for each file, the number of columns may not be the same but the ID data will always be in the B column.
Is there a package in Julia that can crawl data in this format? If not, what method should I use?
You can use the XLSX package.
If the file in your screenshot is called JAKE.xlsx and the data shown is in a sheet called DataSheet:
data = XLSX.readtable("JAKE.xlsx", "DataSheet")
# `data[1]` is a vector of vectors, each with data for a column.
# that way, `data[1][2]` corresponds to column B's data.
data[1][2]
This should give you access to a vector with the data you need. After getting the IDs into a vector, you can use the CSV package to create an output file.
If you add a sample xlsx file to your post it might be possible to give you a more complete answer.

Formatting the exported data in R

I have a list formed by merging data together in R, extracted from various Excel sheets. When I export this list to Excel, some numbers get stored as text. How can I ensure that numbers are stored in the number format and text in the text format?
Example of a table I have:
Name1 Ed 23 0.45 DNR ST 8732
Name2 Bob - 0.78 Tik GH 999
Name3 Jose 26 0.23 DNR TT 1954
Desired outcome: have exactly the same table exported in excel with numeric values being stored in a numeric/general format.
An easier way would be to use the 'Import Dataset' wizard in the RStudio IDE. The wizard is interactive, and you can see the corresponding code generated as you change options (bottom right corner). Say you change a column type that you think is not appropriate; you will then see the code change to reflect that. Once you finish the import, you can save the code for future use.
There is a reason why this happened: when you import data, you need to define the colClasses parameter of read.table or read.csv, whichever function you are using. Then merge the data and check whether the class of each variable is right; if not, convert it to the required type. If you have a list, convert it to a data frame first, because data frames are easier to work with.
If nothing else works, convert the columns manually to their respective types and then write the table.
Note also that Date columns are automatically converted to character when written to a file.
Let me know if this resolves your query.
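As a sketch of the manual conversion step: base R's type.convert() re-guesses each column's type after the merge, which fixes numbers that ended up stored as text. The data frame df and the "-" placeholder for missing values are taken from the example table above:

```r
# df is the merged data frame; "-" entries (as in the example table) become NA
df[] <- lapply(df, function(x) {
  type.convert(as.character(x), na.strings = "-", as.is = TRUE)
})

# Columns that are all-numeric text are now numeric; real text stays character
write.csv(df, "out.csv", row.names = FALSE)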

How to split one column containing several values so each column only contains one value?

starting situation as follows:
I've got a csv file with roughly 3000 rows, but only 1 column. Each row contains several values.
Now I want to assign only one value per column.
How do I manage to do that?
Convert the file to .txt format and then open it in MS Excel. Don't open the file directly; open it using the Open option in the File menu. A Text Import Wizard will then appear, which lets you split the data into multiple columns using a delimiter such as a comma or a space. Once you are done, save the file in CSV format.
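If you would rather stay in R than go through Excel's Text Import Wizard, the same split can be done by re-reading the file with the in-cell delimiter declared. A sketch; the semicolon separator and file names are assumptions, so adjust them to whatever actually delimits your values:

```r
# Re-read the one-column file, splitting each row on the assumed ";" delimiter
dat <- read.csv("input.csv", sep = ";", header = FALSE)

# Each value now sits in its own column
write.csv(dat, "output.csv", row.names = FALSE)
```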

Save column descriptions when exporting SAS dataset to CSV

I've been given SAS data which I'd like to export to CSV, so that I can analyze it using R. I've never used SAS but I found Efficiently convert a SAS dataset into a CSV, which explains how to convert to CSV using code like this:
proc export data=sashelp.class
outfile='c:\temp\sashelp class.csv'
dbms=csv
replace;
run;
This works, but I've noticed that I end up with what I'll call short column names in the CSV, whereas I see long, verbose column descriptions when I look at the data in SAS (i.e. using the SAS software).
I'd like to programmatically save those column descriptions to a txt file, so that I can read them into an R vector. In other words, I'm happy having the short column names in my CSV header (i.e. the first line of my CSV), but I'd like to also have a second file, with only one line, containing the longer column descriptions. How do I do that? I googled and didn't notice anything helpful.
To give an example, the long column descriptions I see in SAS might be something like "Number of elephants in Tanzania", with a corresponding short column name of "ElephTanz".
You can use the SAS "dictionary" library to access this kind of info. The following code creates a table work.column_labels that has two columns: the "short name" you're seeing and the longer label that appears when you view the data in SAS. (Note that the sashelp.class data doesn't happen to have labeled columns, so in this particular example the label column will be empty.)
proc sql;
create table work.column_labels as
select Name,label
from dictionary.columns
where libname = 'SASHELP'
and memname = 'CLASS';
quit;
Then you can export this table to a csv using code similar to what you already have.
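On the R side, once the label table has been exported (say to labels.csv, a hypothetical file name), reading the descriptions into a vector is straightforward. A sketch, assuming the CSV has the Name and Label columns created above:

```r
labs <- read.csv("labels.csv", stringsAsFactors = FALSE)

# Named character vector: short column names map to long descriptions,
# e.g. col_descriptions["ElephTanz"] would give the Tanzania description
col_descriptions <- setNames(labs$Label, labs$Name)
```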
