Skip lines containing only commas in csv files - r

How to read csv files so that lines containing only commas are skipped?
Skip all leading empty lines in read.csv
covers a couple of ideas for the case where such lines appear at the beginning of the file, but what about a more generic solution?

I am not sure about the performance, but you can do this with the help of the readLines(), grepl() and writeLines() functions.
Assume the input file is A.csv and the cleaned output file is B.csv (writing it back out is optional):
test <- readLines('A.csv')
test2 <- test[!grepl('^,+$', test)]  # drop lines that consist only of commas
Either use the test2 variable directly or save it to B.csv:
writeLines(test2, 'B.csv')
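If you want the result as a data frame rather than a new file, you can feed the filtered lines straight into read.csv() through a text connection. A minimal sketch reusing test2 from above (header = TRUE is an assumption about the file):
df <- read.csv(textConnection(test2), header = TRUE)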

Related

How to write a file so that it does not have commas and remains in the same format in R

I am reading a file in R:
data <- read.delim("file.fas", header=TRUE, sep="\t")
However, after I have done some manipulations to the data and write it back out, the output format is not the same. It now contains commas (",") all over:
write.table(x = data, file = "file_1.fas")
How can I avoid this? Should I use a different function to write the file?
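A minimal sketch of one likely fix: by default write.table() quotes character fields and separates columns with a space, so to keep the original tab-delimited layout you can set sep, quote and row.names explicitly (dropping row names is an assumption about the desired output):
write.table(data, file = "file_1.fas", sep = "\t", quote = FALSE, row.names = FALSE)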

Assigning `comment()` to an R object from a .txt file

I am trying to assign a comment to a data frame to store some relevant metadata. I have an unstructured text file wrapped in quote marks, with several line breaks ('\n').
WHO_comment<-read.table(file="WHO comment.txt", sep="\t")
comment(WHO)<-WHO_comment #Read in the comment from .txt due to its length
cat(comment(WHO)) #Database metadata
However, the readout comes in one large block with '\n' read as literal strings. Converting it with as.character() only returns the row name (i.e. '1').
How can I read in this file correctly?
read.table is the wrong function to read a text file. As the name suggests, its purpose is to read tabular data. To read a text file, use readLines, and then paste the individual lines together:
comment(data) = paste(readLines('WHO comment.txt'), collapse = '\n')
Solved it - I needed to use stringsAsFactors=FALSE to read the file in correctly. This code now does what I wanted, which is to assign a comment from a .txt file:
WHO_comment<-read.table(file="WHO comment.txt", sep="\t",stringsAsFactors=FALSE)
comment(WHO)<-WHO_comment #Read in the comment from .txt due to its length
cat(comment(WHO)) #Database metadata

Loading csv into R with `sep=,` as the first line

The program I am exporting my data from (PowerBI) saves the data as a .csv file, but the first line of the file is sep=, and then the second line of the file has the header (column names).
Sample fake .csv file:
sep=,
Initiative,Actual to Estimate (revised),Hours Logged,Revised Estimate,InitiativeType,Client
FakeInitiative1 ,35 %,320.08,911,Platform,FakeClient1
FakeInitiative2,40 %,161.50,400,Platform,FakeClient2
I'm using this command to read the file:
initData <- read.csv("initData.csv",
                     row.names = NULL,
                     header = T,
                     stringsAsFactors = F)
but I keep getting an error about the wrong number of columns (because R infers the column count from the first line).
If I use header=F instead, the file loads, but when I then do names(initData) <- initData[2,] the names contain spaces and illegal characters, which breaks the rest of my program. Obnoxious.
Does anyone know how to tell R to ignore that first line? I can open the .csv in a text editor and delete the first line manually before loading (then everything works fine), but I have to export a bunch of files and doing that every time is tedious.
Any help would be much appreciated.
There are many ways to do that. Here's one:
all_content = readLines("initData.csv")  # read the file as raw lines
skip_first_line = all_content[-1]        # drop the sep=, line
initData <- read.csv(textConnection(skip_first_line),
                     row.names = NULL,
                     header = TRUE,
                     stringsAsFactors = FALSE)
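Another option that should work here is read.csv()'s skip argument, which discards a fixed number of lines before parsing starts; the remaining arguments are carried over from the question:
initData <- read.csv("initData.csv",
                     skip = 1,  # jump over the sep=, line
                     row.names = NULL,
                     header = TRUE,
                     stringsAsFactors = FALSE)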
Your file could also be in a UTF-16 encoding; see hrbrmstr's answer on how to read a UTF-16 file.
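If that turns out to be the case, base R can often handle it through the fileEncoding argument; a sketch assuming little-endian UTF-16 (the exact encoding name may need adjusting for your platform):
initData <- read.csv("initData.csv",
                     skip = 1,
                     fileEncoding = "UTF-16LE",
                     stringsAsFactors = FALSE)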

R how to read a .csv file with different separators

ItemID,Sentiment,SentimentSource,SentimentText
1,0,Sentiment140, ok thats it you win.
2,0,Sentiment140, i think mi bf is cheating on me!!! T_T
3,0,Sentiment140," I'm completely useless rt now. Funny, all I can do is twitter. "
How would you read a csv file like this into R?
Read the file with read.csv(). You can set the sep argument to whatever separator you need, but a comma is already the default, and quoted fields containing commas (like the one in the third row above) are handled automatically.
R: Data Input
For example, reading a csv file with comma as separator into a data frame, choosing the file manually:
df <- read.csv(file.choose())
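As a self-contained sketch, here is the sample from the question passed through read.csv()'s text argument; note that the quoted field on the third row keeps its embedded commas:
txt <- 'ItemID,Sentiment,SentimentSource,SentimentText
1,0,Sentiment140, ok thats it you win.
2,0,Sentiment140, i think mi bf is cheating on me!!! T_T
3,0,Sentiment140," I\'m completely useless rt now. Funny, all I can do is twitter. "'
df <- read.csv(text = txt, stringsAsFactors = FALSE)
df$SentimentText[3]  # the quoted field, commas intact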

Saving a txt file as a delimited csv file in R

I have the following code to read a file and save it as a csv file. I remove the first 7 lines of the text file and then the 3rd column as well, since I only require the first two columns.
current_file <- paste("Experiment 1 ", i, ".cor", sep = "")
curfile <- list.files(pattern = current_file)
curfile_data <- read.table(curfile, header = F, skip = 7, sep = ",")  # skip the first 7 lines
curfile_data <- curfile_data[-grep('V3', colnames(curfile_data))]     # drop the V3 column
write.csv(curfile_data, curfile)
new_file <- paste("Dev_C", i, ".csv", sep = "")
new_file
file.copy(curfile, new_file)
The curfile thus holds the two column variables V1 and V2, along with the observation-number column at the beginning.
Now when I use file.copy to copy the contents of curfile into a .csv file and then open the new .csv file in Excel, all the data appears concatenated in a single column. Is there a way to show each of the individual columns separately? Thanks in advance for your suggestions.
The data in the .txt file looks like this:
"","V1","V2","V3"
"1",-0.02868862,5.442283e-11,76.3
"2",-0.03359281,7.669754e-12,76.35
"3",-0.03801883,-1.497323e-10,76.4
"4",-0.04320051,-6.557672e-11,76.45
"5",-0.04801207,-2.557059e-10,76.5
"6",-0.05325544,-9.986231e-11,76.55
You need to use the Text to Columns feature in Excel, selecting comma as the delimiter. Where this option sits in the menu depends on the version of Excel you are using.
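On the R side you can also sidestep the intermediate copy by writing the cleaned data directly to the target .csv name; a sketch reusing the objects from the question (dropping row names is an assumption):
write.csv(curfile_data, new_file, row.names = FALSE)
Whether Excel then splits the columns automatically on opening can still depend on your locale's list separator, so Text to Columns remains the fallback.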