I'm doing some data analysis in R on a huge .csv file. I can import it just fine. However, after transposing the rows and columns with data.frame(t(data)) and exporting the result, I cannot re-import the data.
This is the code I am using:
write.csv(transposed_data, file = "transposed_data.csv", row.names = FALSE, quote = FALSE)
When I transpose the rows and columns, does something happen to the data that is causing these issues? When using read.csv, the transposed data simply will not open.
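For what it's worth, two things commonly break this round trip: t() returns a character matrix (everything gets coerced), and quote = FALSE lets any comma inside a cell shift the columns on re-import. Below is a minimal round-trip sketch under those assumptions, keeping the default quoting and writing the old column names out as row names (the source file name is assumed):
# Read, transpose, export, and re-import (sketch; assumes the source file is "data.csv")
data <- read.csv("data.csv", check.names = FALSE)
transposed_data <- as.data.frame(t(data))   # t() gives a matrix, so wrap it again
write.csv(transposed_data, "transposed_data.csv", row.names = TRUE)   # keep the old column names
check <- read.csv("transposed_data.csv", row.names = 1, check.names = FALSE)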
I am struggling to write code that imports different files (selecting and renaming columns) with a loop.
In detail, I am looking for an efficient way to create a loop that does the same as follows for several .xlsx files.
library(readxl)
Data <- read_excel('File.xlsx', sheet = "results", range = cell_cols("B:D"),
                   col_names = c("Col1", "Col2", "Col3"))
I have made several attempts, but none of them worked; can anyone suggest a solution?
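A sketch of one way this loop could look, reusing the read_excel() call above; the file list, the names() step, and the optional stacking at the end are assumptions on my part:
library(readxl)

files <- list.files(pattern = "\\.xlsx$")   # all .xlsx files in the working directory

data_list <- lapply(files, function(f) {
  read_excel(f, sheet = "results", range = cell_cols("B:D"),
             col_names = c("Col1", "Col2", "Col3"))
})
names(data_list) <- tools::file_path_sans_ext(files)

# Optionally stack everything into one data frame with a file identifier:
# all_data <- dplyr::bind_rows(data_list, .id = "file")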
I am trying to output multiple small data frames to an Excel file. The data frames are residuals and predicted from mgcv models run from a loop. Each is a separate small data set that I am trying to output to a separate worksheet in the same Excel spreadsheet file.
As far as I can tell, the line of code causing the error is this one:
write.xlsx(resid_pred, parfilename, sheetName = parsheetname, append = TRUE)
where resid_pred is the data frame of residuals and predicted values, parfilename is the file name and path, and parsheetname is the sheet name.
The error message is
Error in saveWorkbook(wb, file = file, overwrite = overwrite) : File already exists!
That makes no sense to me, since the file would HAVE to exist if I am appending to it. Does anyone have a clue?
Amazingly the following code will work:
write.xlsx2(resid_pred, file = parfilename, sheetName = parsheetname,
            col.names = TRUE, rowNames = FALSE, append = TRUE, overwrite = FALSE)
The only difference is that it is write.xlsx2 rather than write.xlsx.
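The error text points at openxlsx::saveWorkbook(), which suggests the write.xlsx() being called is openxlsx's: that function has no append behaviour, and with overwrite left at FALSE it refuses to touch an existing file, hence "File already exists!". write.xlsx2() comes from the xlsx package, which does support append. If openxlsx is acceptable, a sketch that sidesteps appending altogether is to collect the sheets in one workbook inside the loop and save it once at the end (resid_pred_list is a hypothetical named list holding the per-model data frames):
library(openxlsx)

wb <- createWorkbook()
for (nm in names(resid_pred_list)) {
  addWorksheet(wb, sheetName = nm)            # one worksheet per model
  writeData(wb, sheet = nm, x = resid_pred_list[[nm]])
}
saveWorkbook(wb, file = parfilename, overwrite = TRUE)   # written once, at the end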
I have merged two tables together and I want to write them to a .txt file. I have managed to do this; however, when I open the .txt file in Excel, stray symbols (Â) have been added to some values. How do I stop this from happening? I have used the following code:
ICU_PPS <- merge(ICU, PPS, by=c("Study","Lane","Isolate_ID","Sample_Number","MALDI_ID", "WGS","Source"),all=TRUE)
write.table(ICU_PPS,"ICUPPS2.txt", sep="\t", row.names = FALSE)
An example of the values I get in one column:
100_1#175
100_1#176
100_1#177
100_1#179
100_1#18 
100_1#19 
100_1#20 
What I want to achieve:
100_1#175
100_1#176
100_1#177
100_1#179
100_1#18
100_1#19
100_1#20
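If the extra characters are non-breaking spaces (the usual suspect when Excel shows a stray "Â " at the end of values), stripping them before writing should give the clean output above. A sketch under that assumption, which also assumes the affected columns are character rather than factor:
# Remove non-breaking spaces (\u00a0) from every character column before writing
ICU_PPS[] <- lapply(ICU_PPS, function(x) {
  if (is.character(x)) gsub("\u00a0", "", x, fixed = TRUE) else x
})
write.table(ICU_PPS, "ICUPPS2.txt", sep = "\t", row.names = FALSE)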
I have a very large dataset in a csv file. When I open it in data visualization software (Spotfire), I can see that it has more than 7 million rows.
However, I am working in RStudio, and when I try to open the file with read.csv2, being careful with the quotes and other options that could affect my dataset, I end up with a data frame of only about 4 million rows.
Here is the code I use to import the file:
my_data <- as.data.frame(read.csv2(file,
                                   sep = ";",
                                   header = TRUE,
                                   na.strings = c("", " ", "NA"),
                                   quote = "",
                                   check.names = FALSE,
                                   stringsAsFactors = FALSE))
Moreover, when I take a look at the data in RStudio with View(my_data), the rows that were read look perfectly correct.
Is it related to some size limit on files in RStudio, or something like that?
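RStudio itself does not impose a row limit, so a more likely cause is something inside the file that makes the reader stop early or merge records (an embedded NUL byte, inconsistent line endings, and so on). A quick diagnostic sketch; the fread() alternative is my suggestion, not part of the original code:
# Compare the physical line count with the rows that survived parsing
n_lines <- length(readLines(file))   # includes the header line
nrow(my_data)

# data.table::fread is often more forgiving with very large files and usually
# warns about lines it had to skip or repair:
# library(data.table)
# my_data <- fread(file, sep = ";", quote = "", na.strings = c("", " ", "NA"))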
I have thousands of huge CSV files that I need to upload into Postgres. I read that COPY FROM is the fastest way to upload csv files. However, I need to do a bit of pre-processing of the data. As a bare minimum, I need to add the filename (or some sort of file id) so that I can tie the information to its source.
Right now, I'm reading each CSV file into an R data frame, adding a column with the filename, and then writing the data frame to Postgres using
dbWriteTable(con, name = 'my_table', value = my_dataframe, row.names = FALSE, append = TRUE, overwrite= FALSE)
I want to know if there is a better/faster way of importing the csv files.
Thanks.
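One common pattern is to keep the fast path on both sides: read each file with data.table::fread(), add the filename column, and append with the RPostgres driver, whose dbWriteTable() streams rows in with COPY by default. A sketch under those assumptions; the directory name and the source_file column name are made up for the example:
library(DBI)
library(data.table)

files <- list.files("csv_dir", pattern = "\\.csv$", full.names = TRUE)

for (f in files) {
  dt <- fread(f)                      # fast CSV reader
  dt[, source_file := basename(f)]    # tag every row with its source file
  dbWriteTable(con, name = "my_table", value = dt,
               row.names = FALSE, append = TRUE, overwrite = FALSE)
}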