Save big matrix as csv file - header over multiple rows in excel - r

I have a matrix with 168 rows and about 6000 columns. The column names are stock identifiers, the rownames are dates. I would like to export this matrix as a .csv file. I tried the following:
write.csv(OAS_data, "Some Path")
The export works, but the header of the matrix (the stock identifiers) is spread over the first three rows of the .csv file when I open it in Excel: the first two rows each contain 2184 column names, and the remaining names are on the third row. How can I avoid these breaks? The rest of the .csv file looks fine, with no line breaks.
Thank you.

Given Excel's limits on very wide files, your best bet is probably to transpose your data and work with it that way:
write.csv(t(OAS_data), "Some Path")
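For reference, the shape before and after transposing (dimensions taken from the question; current Excel allows at most 1,048,576 rows and 16,384 columns per sheet):
dim(OAS_data)     # 168 rows x ~6000 columns, one very wide header row
dim(t(OAS_data))  # ~6000 rows x 168 columns, a much narrower file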

Related

Writing csvs from dataframes in a list

I have a dataframe (pictured below); its 2nd column, named content, holds a list of dataframes. I also have a column called racenames in the 3rd column, whose values I'd like to use for my CSVs as the code runs. I can't figure out a way to write the list of dataframes to CSV in a loop or any other way.
The code below works for writing a CSV of the first dataframe in the content column, but I would like to write all of the dataframes at once so that I don't have to change the numbers/names by hand for hours. All of the data was scraped in one of my earlier loops.
write.csv(ARCA_separate[[2]][[1]], file = "C:\\Users\\bubba\\Desktop\\RStuff\\Scraping\\ARCA 2012 Season\\*racename*.csv")
Here is what the data I'm working with looks like. The dataframe is called ARCA_separate.
How do I write all of the csvs and grab the corresponding racename in the same row to put into my csv name?
You can try this:
purrr::walk2(ARCA_separate$content, ARCA_separate$racenames, function(x, y) {
  readr::write_csv(x, paste0("C:\\Users\\bubba\\Desktop\\RStuff\\Scraping\\ARCA 2012 Season\\", y, ".csv"))
})
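If purrr and readr are not available, a plain for loop over the rows does the same job (a rough sketch, assuming content and racenames are the columns described in the question):
for (i in seq_len(nrow(ARCA_separate))) {
  write.csv(ARCA_separate$content[[i]],
            file = paste0("C:\\Users\\bubba\\Desktop\\RStuff\\Scraping\\ARCA 2012 Season\\", ARCA_separate$racenames[[i]], ".csv"),
            row.names = FALSE)
}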

Export to CSV, keeping leading zeros when opened in Excel

I have a series of massive data files ranging from 800k to 1.4M rows, and one variable in particular has a fixed length of 12 characters (numeric data padded with leading zeros whenever there are fewer than 12 non-zero digits). The column should look like this:
col
000000000003
000000000102
000000246691
000000000042
102851000324
etc.
I need to export these files for a client to CSV, using R. The final data NEEDS to retain the 12-character structure, but when I open the CSV files in Excel, the zeros disappear. This happens even after converting the entire data frame to character. The code I am using is as follows.
df1 <- df1 %>%
  mutate(across(everything(), as.character))
##### I did this for all data frames #####
export(df1, "df1.csv")
export(df2, "df2.csv")
....
export(df17, "df17.csv")
I've read a few other posts that say this is an excel problem, and that makes sense, but given the number of data files and amount of data, as well as the need for the client to be able to open it in excel, I need a way to do it on the front end in R. Any ideas?
Yes, this is definitely an Excel problem!
To demonstrate: enter your column values in Excel, save the file as a CSV, and then re-open it in Excel; the leading zeros will disappear.
One option is to add a leading non-numeric character, such as a single quote:
df$col <- paste0("'", df$col)
Not great, but it is an option.
A slightly better option is to paste Excel's TEXT function around the character string; Excel will then evaluate the formula when the file is opened:
df$col <- paste0("=Text(", df$col, ", \"000000000000\")")
#or
df$col <- paste0("=\"", df$col, "\"")
write.csv(df, "df2.csv", row.names = FALSE)
Of course, if the CSV file is re-saved from Excel and reopened, the leading zeros will disappear again.
Another option is to save the file directly as a .xlsx file with the writexl, xlsx, or a similar package.
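A minimal sketch of that last approach, assuming the writexl package is installed and col is the 12-character column from the question (character columns are written as text cells, so the leading zeros survive):
library(writexl)
df1$col <- as.character(df1$col)  # make sure the padded strings stay text
write_xlsx(df1, "df1.xlsx")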

R: How to add multiple variables at the end of a data frame from the file name

Apologies if this is a trivial question. I saw others like it, such as "How can I turn a part of the filename into a variable when reading multiple text files into R?", but I still seem to be having some trouble...
I have been given 50000 .txt files. Each file contains a single observation (a single row of data) with exactly 12 variables (number of columns). The name of each .txt file is fairly regular. Specifically, each .txt file has a code at the end indicating the type of observation across three dimensions. An example of this code is 'VL-VL-NE' or 'VL-M-N' or 'H-H-L' (not including the apostrophes). Therefore, an example of a file name could be 'I-love-using-R-20_01_2016-VL-VL-NE.txt'.
My problem is that I want to include this code from the end of the .txt file name in the imported data itself, i.e., I want to add three more variables (columns) at the end of the table corresponding to the three parts of the code at the end of the file name.
Any help would be greatly appreciated.
Because you have exactly the same number of columns in each file, why not import them into R using a loop that looks for all .txt files in a particular directory?
df <- c()
for (x in list.files(pattern = "*.txt")) {
  u <- read.csv(x, header = FALSE)  # adjust sep/header/skip to match your files
  u$Label <- factor(x)              # a column that holds the file name
  df <- rbind(df, u)
}
You'll note that the file name itself becomes a column. Once everything is in R, it should be fairly easy to use a regex to extract the exact elements you need from the file-name column (df$Label), as sketched below.
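A rough sketch of that extraction, assuming the three-part code is always the last three hyphen-separated tokens before the .txt extension (as in 'I-love-using-R-20_01_2016-VL-VL-NE.txt'); the column names dim1, dim2, and dim3 are placeholders:
codes <- sub("\\.txt$", "", as.character(df$Label))  # drop the extension
parts <- strsplit(codes, "-")                        # split on hyphens
last3 <- t(sapply(parts, function(p) tail(p, 3)))    # keep the final three tokens
df$dim1 <- last3[, 1]
df$dim2 <- last3[, 2]
df$dim3 <- last3[, 3]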

Saving a txt file as a delimited csv file in R

I have the following code to read a file and save it as a CSV file. I remove the first 7 lines of the text file and then the 3rd column as well, since I only need the first two columns.
current_file <- paste("Experiment 1 ",i,".cor",sep="")
curfile <- list.files(pattern = current_file)
curfile_data <- read.table(curfile, header=F,skip=7,sep=",")
curfile_data <- curfile_data[-grep('V3',colnames(curfile_data))]
write.csv(curfile_data,curfile)
new_file <- paste("Dev_C",i,".csv",sep="")
new_file
file.copy(curfile, new_file)
The curfile thus holds two column variables, V1 and V2, along with the observation-number column at the beginning.
Now when I use file.copy to copy the contents of curfile into a .csv file and then open the new .csv file in Excel, all the data appears concatenated into a single column. Is there a way to show each of the columns separately? Thanks in advance for your suggestions.
The data in the .txt file looks like this:
"","V1","V2","V3"
"1",-0.02868862,5.442283e-11,76.3
"2",-0.03359281,7.669754e-12,76.35
"3",-0.03801883,-1.497323e-10,76.4
"4",-0.04320051,-6.557672e-11,76.45
"5",-0.04801207,-2.557059e-10,76.5
"6",-0.05325544,-9.986231e-11,76.55
You need to use the Text to Columns feature in Excel, selecting comma as the delimiter. Where this option sits in the menu depends on the version of Excel you are using.
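If the single-column behaviour comes from an Excel install that expects semicolon-separated CSVs (common in locales that use a comma as the decimal mark), an R-side alternative worth trying is write.csv2(); a sketch using the objects from the question:
write.csv2(curfile_data, new_file, row.names = FALSE)  # ";" as separator, "," as decimal mark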

How to read a multiple line cell into R data frame

Example of the data I am trying to read into an R data frame:
Dear friends:
I am trying to read data in which a single Excel cell contains multiple lines into an R data frame. Ideally, I want to keep those multiple lines in a single spot in the data frame, with a separator such as | or ; between the lines.
How could I do that?
The resulting file should be like this:
Patient Segment(s)                               Sponsor(s)   Primary Drugs   Other Drugs
"Comorbid... | Diabetic... | Hypertensive..."    NIDDK        celecoxib       .....
Many thanks!
It may depend on how you access this data. If you export it to a CSV file, the CR-LFs in the cell may break the lines, so you will need to read them with readLines() and then reassemble them with paste(). On the other hand, if you use a package designed to read individual cells, the line breaks may be kept inside single elements. You should display the CSV output, or explain how you plan to access the XLS file and post a portion of it somewhere that people can get to.
On a Mac, it takes Ctrl-Option-Enter to put a CR-LF into a cell. If one is there, exporting produces a result that looks like this in a text editor:
"there is
a test of
alt-ctl-enter
"
and which then looks like this with read.table:
read.table("~/test.csv", header=FALSE)
V1
1 there is \na test of \nalt-ctl-enter\n
#plus a harmless warning about an incomplete line.
So it is read as a single character element in a vector. To replace the "\n" (the CR-LF as represented in R strings) with "|" (pipe), use gsub:
dat <- read.table("~/test.csv", header=FALSE)
gsub("\n", "|", dat$V1)
# [1] "there is |a test of |alt-ctl-enter|"
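A sketch of the other route mentioned above, reading the workbook directly so the embedded line breaks arrive inside the cell (this assumes the readxl package; the file name is hypothetical and the column name is taken from the question's example):
library(readxl)
dat <- read_excel("patients.xlsx")  # hypothetical file; cell line breaks come through as "\n"
dat$`Patient Segment(s)` <- gsub("\n", "|", dat$`Patient Segment(s)`)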
