Inconsistency between 'read.csv' and 'write.csv' in R

Inconsistency between 'read.csv' and 'write.csv' in R - r

The R function read.csv works as the following as stated in the manual: "If there is a header and the first row contains one fewer field than the number of columns, the first column in the input is used for the row names." That's good. However, when it comes to the function write.csv, I cannot find a way to write the csv file in a similar way. So, if I have a file.txt as below:
Column_1,Column_2
Row_1,2,3
Row_2,4,5
Then when I read it using a = read.csv('file.txt'), the row and column names are Row_x and Column_x as expected. However, when I write the matrix a to a csv file again, then what I get as a result from write.csv('file2.txt', quote=F) is as below:
,Column_1,Column_2
Row_1,2,3
Row_2,4,5
So, there is a comma in the beginning of this file. And if I would read this file again using a2 = read.csv('file2.txt'), then resulting a2 will not be the same as the previous matrix a. The row names of the matrix a2 will not be Row_x. That's, I do not want a comma in the beginning of the file. How can I get rid of this comma while using write.csv?

The two functions that you have mentioned, read.cvs and write.csv are just a specific form of the more generic functions read.table and write.table.
When I copy your example data into a .csv and try to read it with read.csv, R throws a warning and says that the header line was incomplete. Thus it resorted to special behaviour to fix the error. Because we had an incomplete file, it completed the file by adding an empty element at the top left. R understands that this is a header row, and thus the data appears okay in R, but when we write to a csv, it doesn't understand what is header and what is not. Thus the empty element only appearing in the header row created by R shows up as a regular element. Which you would expect. Basically it made our table into a 3x3 because it can't have a weird number of elements.
You want the extra comma there, because it allows programs to read the column names in the right place. In order to read the file in again you can do the following, assuming test.csv is your data. You can fix this by manually adding the column and row names in R, including the missing element to put everything in place.
To fix the wonky row names, you're going to want to add an extra option specifying which row is the row names (row.names = your_column_number) when you read it back in with the comma correctly in place.
y <- read.csv(file = "foo.csv") #this throws a warning because your input is incorrect
write.csv(y, "foo_out.csv")
x <- read.csv(file = "foo.csv", header = T, row.names = 1) #this will read the first column as the row names.
Play around with read/write.csv, but it might be worth while to move into the more generic functions read.table and write.table. They offer expanded functionality.
To read a csv in the generic function
y <- read.table(file = "foo.csv", sep = ",", header = TRUE)
thus you can specify the delimiter and easily read in excel spreadsheets (separated by tab or "\t") or space delimited files ( " " ).
Hope that helps.

Related

R / openxlsx / Finding the first non-empty cell in Excel file

I'm trying to write data to an existing Excel file from R, while preserving the formatting. I'm able to do so following the answer to this question (Write from R into template in excel while preserving formatting), except that my file includes empty columns at the beginning, and so I cannot just begin to write data at cell A1.
As a solution I was hoping to be able to find the first non-empty cell, then start writing from there. If I run read.xlsx(file="myfile.xlsx") using the openxlsx package, the empty columns and rows are automatically removed, and only the data is left, so this doesn't work for me.
So I thought I would first load the worksheet using wb <- loadWorkbook("file.xlsx") so I have access to getStyles(wb) (which works). However, the subsequent command getTables returns character(0), and wb$tables returns NULL. I can't figure out why this is? Am I right in that these variables would tell me the first non-empty cell?
I've tried manually removing the empty columns and rows preceding the data, straight in the Excel file, but that doesn't change things. Am I on the right path here or is there a different solution?

As suggested by Stéphane Laurent, the package tidyxl offers the perfect solution here.
For instance, I can now search the Excel file for a character value, like my variable names of interest ("Item", "Score", and "Mean", which correspond to the names() of the data.frame I want to write to my Excel file):
require(tidyxl)
colnames <- c("Item","Score","Mean")
excelfile <- "FormattedSheet.xlsx"
x <- xlsx_cells(excelfile)
# Find all cells with character values: return their address (i.e., Cell) and character (i.e., Value)
chars <- x[x$data_type == "character", c("address", "character")]
starting.positions <- unlist(
chars[which(chars$character %in% colnames), "address"]
)
# returns: c(C6, D6, E6)

Importing a .csv with headers that show up as first row of data

I'm importing a csv file into R. I read a post here that said in order to get R to treat the first row of data as headers I needed to include the call header=TRUE.
I'm using the import function for RStudio and there is a Code Preview section in the bottom right. The default is:
library(readr)
existing_data <- read_csv("C:/Users/rruch/OneDrive/existing_data.csv")
View(existing_data)
I've tried placing header=TRUE in the following places:
read_csv(header=TRUE, "C:/Users...)
existing_data.csv", header=TRUE
after 2/existing_data.csv")
Would anyone be able to point me in the right direction?

You should use col_names instead of header. Try this:
library(readr)
existing_data <- read_csv("C:/Users/rruch/OneDrive/existing_data.csv", col_names = TRUE)
There are two different functions to read csv files (actually far more than two): read.csv from utils package and read_csv from readr package. The first one gets header argument and the second one col_names.
You could also try fread function from data.table package. It may be the fastest of all.
Good luck!

It looks like there is one variable name that is correctly identified as a variable name (notice your first column). I would guess that your first row only contains the variable "Existing Product List", and that your other variable names are actually contained in the second row. Open the file in Excel or LibreOffice Calc to confirm.
If it is indeed the case that all of the variable names you've listed (including "Existing Product List") are in the first row, then you're in the same boat as me. In my case, the first row contains all of my variables, however they appear as both variable names and the first row of observations. Turns out the encoding is messed up (which could also be your problem), so my solution was simply to remove the first row.
library(readr)
mydat = read_csv("my-file-path-&-name.csv")
mydat = mydat[-1, ]

R- import CSV file, all data fall into one (the first) column

I'm new, and I have a problem:
I got a dataset (csv file) with the 15 columns and 33,000 rows.
When I view the data in Excel it looks good, but when I try to load the data
into R- studio I have a problem:
I used the code:
x <- read.csv(file = "1energy.csv", head = TRUE, sep="")
View(x)
The result is that the columnnames are good, but the data (row 2 and further) are
all in my first column.
In the first column the data is separated with ; . But when i try the code:
x1 <- read.csv(file = "1energy.csv", head = TRUE, sep=";")
The next problem is: Error in read.table(file = file, header = header, sep = sep, quote = quote, :
duplicate 'row.names' are not allowed
So i made the code:
x1 <- read.csv(file = "1energy.csv", head = TRUE, sep=";", row.names = NULL)
And it looks liked it worked.... But now the data is in the wrong columns (for example, the "name" column contains now the "time" value, and the "time" column contains the "costs" value.
Does anybody know how to fix this? I can rename columns but i think that is not the best way.

Excel, in its English version at least, may use a comma as separator, so you may want to try
x1 <- read.csv(file = "1energy.csv", head = TRUE, sep=",")
I once had a similar problem where header had a long entry that contained a character that read.csv mistook for column separator. In reality, it was a part of a long name that wasn’t quoted properly.
Try skipping header and see if the problem persists
x1 <- read.csv(file = "1energy.csv", skip = 1, head = FALSE, sep=";")
In reply to your comment:
Two things you can do. Simplest one is to assign names manually:
myColNames <- c(“col1.name”,”col2.name”)
names(x1) <- myColNames
The other way is to read just the name row (the first line in your file)
read only the first line, split it into a character vector
nameLine <- readLines(con="1energy.csv", n=1)
fileColNames <- unlist(strsplit(nameLine,”;”))
then see how you can fix the problem, then assign names to your x1 data frame. I don’t know what exactly is wrong with your first line, so I can’t tell you how to fix it.
Yet another cruder option is to open your csv file using a text editor and edit column names.

It happens because of Exel's specifics. The easy solution is just to copy all your data Ctrl+C to Notepad and Save it again from Notepad as filename.csv (don't forget to remove .txt if necessary). It worked well for me. R opened this newly created csv file correctly, all data was separated at columns right.

Open your file in text edit and see if it really is separated with commas...
Sometimes .csv files are separated with tabs instead of commas or semicolon and when opening in excel it has no problem but in R you have to specify the separator like this:
x <- read.csv(file = "1energy.csv", head = TRUE, sep="\t")
I once had the same problem, this was my solution. Hope it works for you.

This problem can arise due to regional settings on the excel application where the .csv file was created.
While in most places a "," separates the columns in a COMMA separated file (which makes sense), in other places it is a ";"
Depending on your regional settings, you can experiment with:
x1 <- read.csv(file = "1energy.csv", head = TRUE, sep=",") #used in North America
or,
x1 <- read.csv(file = "1energy.csv", head = TRUE, sep=";") #used in some parts of Asia and Europe

You could use -
df <- read.csv("filename.csv", sep = ";", quote = "")
It solved one my problems similar to yours.

So i made the code:
x1 <- read.csv(file = "1energy.csv", head = TRUE, sep=";", row.names =
NULL) And it looks liked it worked.... But now the data is in the
wrong columns (for example, the "name" column contains now the "time"
value, and the "time" column contains the "costs" value.
Does anybody know how to fix this? I can rename columns but i think
that is not the best way.
I had the exact same issue. Did quite some research and found out, that the CSV was ill-formed.
In the header line of the CSV there were all the labels (separated by the separator) and then a line break.
Starting with line 2, there was an additional separator at the end of each line. So an example of such an ill-formed CSV file looks like this:
Field1;Field2 <-- see the *missing* semicolon at the end
12;23; <-- see the *trailing* semicolon in each of the data lines
34;67;
45;56;
Such ill-formatted files are even harder to spot for TAB-separated files.
Excel does not care about that, when importing CSV files.
But R does care.
When you use skip=1 you skip the header line that contains part of the mismatch. The data frame will be imported well, but there will be a column of "NA" at the end of each row. And obviously you will not have column names, as these were skipped.
Easiest solution: edit the CSV file and either add an additional separator at the end of the header line as well, or remove the trailing delimiters in the data lines. You can also use generic read and write functions in R for text files to automate that editing.

You can transform the data by arranging the data into many cells corresponding to columns.
1.Open your csv file
2.copy the content and paste it into txt file save and copy its content
3.open new excell file
4.in excell go to the section responsible for data . it is acually called "Data"
5.then on the left side go to external data query , in german "externe Daten abfragen"
6.go ahead step by step and seperate by commas
7. save your file as csv

I had the same problem and it was frustrating...
However, I found the ultimate solution
First take this (csv file) and then convert it online to Json file and download it ... then redo the whole thing backwards (re-convert Jason to csv) online... download the converted file... give it a name...
then put it on your Rstudio
file name <- read.csv(file='name your file.csv')
... took me 4 days to think out of the box... 🙂🙂🙂

Read text file using R into one column

I'm using:
x<-read.table(file,sep="")
in order to read a space-delimited numbers from a .txt file, but I receive the data back in multiple columns because the text file contains multiple lines (whose data is no different but of the same type).
How can I read all of the numbers in the different lines into one column only?

You can use ?scan:
x <- scan(file, what = "numeric")
or something simiar, depending on the structure of your file, should work. You might need to check / adjust the sep parameter.
Description of scan:
Read data into a vector or list from the console or file.
If you want x as a column in a data.frame, you can do
dat <- data.frame(x)
afterwards.

Dropping extra column in Tab delim file

I tried to merge different tab delim files into single file using the following R command.
If you observe, I even save the file using write.table command. Now i need to read the same files for further analysis. The biggest problem I am facing is that there is an extra column without any column name created automatically. If you observe that there is a column (Red colour) created automatically when I use the write.table function.
I want to get rid of that column as it hampers all further calculations.
combine=function(file) {
split_list <- unlist(strsplit(file,split=","))
setwd("D:/combine")
dataset <- do.call("cbind",lapply(split_list,FUN=function(files) { read.table(files,header=TRUE, sep="\t") } ) )
names(dataset)[1]=paste("Probe_ID")
drop=c("ProbeID")
dataset=dataset[,!(names(dataset)%in%drop)]
dataset$X=NULL
write.table(dataset,file="D:/output/illumina.txt",sep="\t",col.names=NA)
return ("illumina.txt")
}

Use the argument row.names=FALSE in write.table.

As #James says -- or use row.names=1 in read.table() to indicate that the first column designates the row identifiers of the table when reading the table back into R.

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex