Is there a problem with read.table when using # as separator? - r

I'm trying to read in a text file (via read.table()) which looks like this:
1#2#3
4#5#6
7#8#9
The "#" should be used as the field separator.
My code:
test_data <- read.table("a_test_file.csv", sep= "#")
I'm only able to read in the first column (the values 1,4,7). Is there something I don't see or any ideas concerning a workaround?
Edit: I just realized that # is used to insert comments, which are not treated as code. Could it be that this sign is kind of 'locked' for any purpose other than creating comments?

Set comment.char to something else, e.g.:
test_data <- read.table("a_test_file.csv", sep = "#", comment.char = "")

read.table comes with a comment.char argument set to # by default. This is why it discards everything from the first # onward on each line.
read.table("a_test_file.csv", sep="#",comment.char="")
will do the job
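A minimal sketch that reproduces the behaviour, assuming a throwaway temporary file in place of a_test_file.csv (the names tmp, with_default and with_fix are illustrative only):

```r
# Write the three sample lines to a temporary file (stand-in for a_test_file.csv).
tmp <- tempfile(fileext = ".csv")
writeLines(c("1#2#3", "4#5#6", "7#8#9"), tmp)

# Default comment.char = "#": everything from the first "#" onward is dropped,
# so only the first column survives.
with_default <- read.table(tmp, sep = "#")

# Comments disabled: "#" now acts only as the field separator.
with_fix <- read.table(tmp, sep = "#", comment.char = "")

dim(with_default)
dim(with_fix)
```

Comparing the two dim() results shows the effect of comment.char directly.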

Related

Problems with displaying .txt file (delimiters)

I have a problem with one task where I have to load some data set, and I have to make sure that missing values are read in properly and that column names are unambiguous.
The format of .txt file:
At the end, data set should contain only country column and median age.
I tried using read.delim, precisely this chunk:
rawdata <- read.delim("rawdata_343.txt", sep = "", stringsAsFactors = FALSE, header = TRUE)
And when I run it, I get this:
It confuses me that if country has multiple words (Turks and Caicos Islands) it assigns every word to another column.
Since I am still a beginner in R, any suggestion would be very helpful for me. Thanks!
Three points to note about your input file: (1) the first two lines at the top are not tabular and should be skipped with skip = 2, (2) your column separators are tabs, which should be specified with sep = "\t", and (3) you have no headers, so header = FALSE. Your command should be:
rawdata <- read.delim("rawdata_343.txt", sep = "\t", stringsAsFactors = FALSE, header = FALSE, skip = 2)
UPDATE: A fourth point is that the first column includes row numbers, so row.names = 1. This also addresses the follow-up comment.
rawdata <- read.delim("rawdata_343.txt", sep = "\t", stringsAsFactors = FALSE, header = FALSE, skip = 2, row.names = 1)
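As a rough sketch of how these arguments interact, the following uses a made-up stand-in for rawdata_343.txt, since the real file is not shown. The file contents, the values in it, and the column names country and median_age are all assumptions for illustration:

```r
# Hypothetical stand-in for rawdata_343.txt: two non-tabular lines, then
# tab-separated rows of (row number, country, median age). Values are invented.
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "Some title line",
  "Some note line",
  "1\tTurks and Caicos Islands\t30.0",
  "2\tCountry B\t50.0",
  "3\tCountry C\tNA"
), tmp)

# skip = 2 drops the two leading lines, sep = "\t" keeps multi-word names in
# one field, and row.names = 1 turns the first column into row names.
rawdata <- read.delim(tmp, sep = "\t", header = FALSE, skip = 2,
                      row.names = 1, stringsAsFactors = FALSE)

# Give the remaining columns unambiguous names.
names(rawdata) <- c("country", "median_age")
rawdata
```

Because the separator is a tab, "Turks and Caicos Islands" stays in a single column, and the literal NA in the last row is read as a missing value.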
It looks like your delimiter that you are specifying in the sep= argument is telling R to consider spaces as the column delimiter. Looking at your data as a .txt file, there is no apparent delimiter (like commas that you would find in a typical .csv). If you can put the data in a tabular form in something like a .csv or .xlsx file, R is much better at reading that data as expected. As it is, you may struggle to get the .txt format to read in a tabular fashion, which is what I assume you want.
P.s. you can use read.csv() if you do end up putting the data in that format.

R csv export with specific decimal places produces output with unwanted spaces

After a couple of steps I am trying to export a csv file with given separators and decimal places in R.
I do calculations like:
Data_Table$`field_output` <- ifelse(Data_Table$`field_1` > 0,
Data_Table$`field_2`,
Data_Table$`field_3`
)
I tried two types of formatting:
Data_Table$`field_output` <- format(round(Data_Table$`field_output`, 2), nsmall = 2)
and
Data_Table$`field_output` <- formatC(Data_Table$`field_output`, digits=2, format="f")
then export:
write.table(Data_Table, file=paste("./output/filename_", datestring, ".csv"), quote=FALSE, sep=";", eol = "\n", dec=".", row.names=FALSE, col.names=TRUE)
My problem is the output
It produces right aligned columns with extra spaces/characters leading the output field
datestring;ID;UOM;field_output
20160831;1;kWh;100628610.00
20160831;2;kWh; 1800000.00
20160831;3;kWh; 252.00
20160831;4;kWh; 0.00
20160831;5;kWh; 0.00
Is there a way to get rid of those, and have a format like this:
datestring;ID;UOM;field_output
20160831;1;kWh;100628610.00
20160831;2;kWh;1800000.00
20160831;3;kWh;252.00
20160831;4;kWh;0.00
20160831;5;kWh;0.00
Thank you for your replies in advance! And excuse me if it's something trivial, or already answered!
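Use gsub to strip the whitespace from the formatted column before writing the file. Calling gsub on the whole data frame would coerce it to a character vector, so apply it to the column, i.e.
Data_Table$`field_output` <- gsub(" ", "", Data_Table$`field_output`)
write.table(Data_Table, ...)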
As you want to write a csv file, maybe try write.csv. It worked for me when opening the file in Excel.
,"datestring","ID","UOM","field_output"
1,20160831,1,"kWh",100628610
2,20160831,2,"kWh",1800000
3,20160831,3,"kWh",252
4,20160831,4,"kWh",0
5,20160831,5,"kWh",0
In my trial the output gained an extra column of row ids. I could have avoided that by setting row.names=ID when reading the data in, or by setting row.names=FALSE in the write.csv() function.
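The padding comes from format(), which pads every element of a vector to a common width. A small sketch of two ways around it, using the values from the question (the variable names x, padded, trimmed and direct are illustrative only):

```r
# Values taken from the example output in the question.
x <- c(100628610, 1800000, 252, 0)

# format() pads every element to the width of the widest one.
padded <- format(round(x, 2), nsmall = 2)

# Option 1: strip the padding afterwards.
trimmed <- trimws(padded)

# Option 2: format each value on its own with sprintf(), which does not pad.
direct <- sprintf("%.2f", x)

print(padded)
print(trimmed)
print(direct)
```

Either trimmed or direct can be assigned back to the column before calling write.table with quote=FALSE.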

How do I write a csv file in R, where my input is written to the file as row?

This is a very simple issue and I'm surprised that there are no examples online.
I have a vector:
vector <- c(1,1,1,1,1)
I would like to write this as a csv as a simple row:
write.csv(vector, file ="myfile.csv", row.names=FALSE)
When I open up the file I've just written, the csv is written as a column of values.
It's as if R decided to put in newlines after each number 1.
Forgive me for being ignorant, but I always assumed that the point of having comma-separated-values was to express a sequence from left to right, of values, separated by commas. Sort of like I just did; in a sense mimicking the syntax of written word. Why does R cling so desperately to the column format when a csv so clearly should be a row?
All linguistic philosophy aside, I have tried to use the transpose function. I've dug through the documentation. Please help! Thanks.
write.csv is designed for data frames and matrices, and R treats a single vector as a matrix with a single column. Try making it into a matrix with one row and multiple columns and it should work as you expect.
write.csv(matrix(vector, nrow=1), file ="myfile.csv", row.names=FALSE)
Not sure what you tried with the transpose function, but that should work too.
write.csv(t(vector), file ="myfile.csv", row.names=FALSE)
Here's what I did:
cat("myVar <- c(",file="myVars.r.txt", append=TRUE);
cat( myVar, file="myVars.r.txt", append=TRUE, sep=", ");
cat(")\n", file="myVars.r.txt", append=TRUE);
this generates a text file that can immediately be re-loaded into R another day using:
source("myVars.r.txt")
Following up on what @Matt said, if you want a csv, try eol=",".
I tried with this:
write.csv(rbind(vector), file ="myfile.csv", row.names=FALSE)
The values are now written across columns, but with a header line of column names.
This one seems to be better:
write.table(rbind(vector), file = "myfile.csv", row.names =FALSE, col.names = FALSE,sep = ",")
Now, the output is being printed as:
1 1 1 1 1
in the .csv file, without column names.
write.table(vector, "myfile.csv", eol=" ", row.names=FALSE, col.names=FALSE)
You can simply change the eol to whatever you want. Here I've made it a space.
You can use cat to append rows to a file. The following code would write a vector as a line to the file:
myVector <- c("a","b","c")
cat(myVector, file="myfile.csv", append = TRUE, sep = ",", eol = "\n")
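This produces a comma-separated file, but with a trailing comma on each line, so it is not a proper CSV file. The reason is that cat() has no eol argument: the "\n" is treated as one more value to print, and sep puts a comma in front of it.
If you want a real CSV file, use the solution given by @vamosrafa. The code is as follows: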
write.table(rbind(myVector), file = "myfile.csv", row.names =FALSE, col.names = FALSE,sep = ",", append = TRUE)
The output will be like this:
"a","b","c"
If the function is called multiple times, it will add lines to the file.
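If you prefer to stay with cat, one possible way to avoid the trailing comma is to collapse the vector into a single string first. This is only a sketch: it writes to a temporary file (out is an illustrative name) and does no quoting, so it assumes the values contain no commas or quotes:

```r
myVector <- c("a", "b", "c")
out <- tempfile(fileext = ".csv")

# paste(..., collapse = ",") builds the whole row as one string, so cat() only
# has to print that string followed by a newline, with no separator between them.
cat(paste(myVector, collapse = ","), "\n", file = out, sep = "", append = TRUE)

# Calling it again appends a second row.
cat(paste(myVector, collapse = ","), "\n", file = out, sep = "", append = TRUE)

readLines(out)
```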
One more:
write.table(as.list(vector), file ="myfile.csv", row.names=FALSE, col.names=FALSE, sep=",")
fwrite from the data.table package is another option:
library(data.table)
vector <- c(1,1,1,1,1)
fwrite(data.frame(t(vector)),file="myfile.csv",sep=",",row.names = FALSE)

Read tab delimited file with unusual characters, then write an exact copy

The problem
I have a tab delimited input file that looks like so:
Variable [1] Variable [2]
111 Something
Nothing 222
The first row represents column names and the next two rows represent column values. As you can see, the column names include both spaces and some tricky signs.
Now, what I want to do is to import this file into R and then output it again to a new text file, making it look exactly the same as the input. For this purpose I have created the following script (assuming that the input file is called "Test.txt"):
file <- "Test.txt"
x <- read.table(file, header = TRUE, sep = "\t")
write.table(x, file = "TestOutput.txt", sep = "\t", col.names = TRUE, row.names = FALSE)
From this, I get an output that looks like this:
"Variable..1." "Variable..2."
"1" "111" "Something"
"2" "Nothing" "222"
Now, there are a couple of problems with this output.
1. The "[" and "]" signs have been converted to dots.
2. The spaces have been converted to dots.
3. Quote signs have appeared everywhere.
How can I make the output file look exactly the same as the input file?
What I've tried so far
Regarding problems one and two, I've tried specifying the column names by creating an internal vector, c("Variable [1]", "Variable [2]"), and then using the col.names option of read.table(). This gives me the exact same output. I've also tried different encodings, through the encoding option of read.table(). If I look at the internally created vector mentioned above, it prints the variable names as they should be printed, so I guess there is a problem in the conversion between the "text -> R" and the "R -> text" phases of the process. That is, if I look at the data frame created by read.table() without any internally created vectors, the column names are wrong.
As for problem number three, I'm pretty much lost and haven't been able to figure out what I should try.
Given the following input file as test.txt:
Variable [1] Variable [2]
111 Something
Nothing 222
Where the columns are tab-separated you can use the following code to create an exact copy:
a <- read.table(file = 'test.txt', check.names = FALSE, sep = '\t', header = TRUE,
                stringsAsFactors = FALSE)
write.table(x = a, file = 'test_copy.txt', quote = FALSE, row.names = FALSE,
            col.names = TRUE, sep = '\t')
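To check that the copy really is exact, the round trip can be sketched with temporary files standing in for test.txt and test_copy.txt (the names infile and outfile are illustrative only):

```r
# Recreate the sample input with real tab characters.
infile  <- tempfile(fileext = ".txt")
outfile <- tempfile(fileext = ".txt")
writeLines(c("Variable [1]\tVariable [2]",
             "111\tSomething",
             "Nothing\t222"), infile)

# check.names = FALSE keeps "Variable [1]" as-is instead of "Variable..1.".
a <- read.table(infile, check.names = FALSE, sep = "\t", header = TRUE,
                stringsAsFactors = FALSE)

# quote = FALSE and row.names = FALSE stop write.table from adding
# quote signs and a row-number column.
write.table(a, file = outfile, quote = FALSE, row.names = FALSE,
            col.names = TRUE, sep = "\t")

# TRUE when both files contain the same lines.
identical(readLines(infile), readLines(outfile))
```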

read.table creates too few rows, but readLines has the right number

I am trying to import a tab separated list into R.
It is 81704 rows long. However, read.table is only creating 31376. Here is my code:
population <- read.table('population.txt', header=TRUE,sep='\t',na.strings = 'NA',blank.lines.skip = FALSE)
There are no # commenting anything out.
Here are the first few lines:
[1] "NAME\tSTATENAME\tPOP_2009" "Alabama\tAlabama\t4708708" "Abbeville city\tAlabama\t2934" "Adamsville city\tAlabama\t4782"
[5] "Addison town\tAlabama\t711"
When I read it raw, readLines gives the right number.
Any ideas are much appreciated!
Difficult to diagnose without seeing the input file, but the usual suspects are quotes and comment characters (even if you think there are none of the latter). You can try:
quote = "", comment.char = ""
as arguments to read.table() and see if that helps.
Use count.fields to check what is in the file:
n <- count.fields('population.txt', sep='\t', blank.lines.skip=FALSE)
Then you can check:
length(n) # should be 81705 (it counts the header, so rows + 1); if so, then:
table(n) # shows you what's wrong
Then read the file with readLines and inspect the rows with the wrong number of fields, e.g. x<-readLines('population.txt'); head(x[n!=3]). The sample header has three columns, so 3 is the expected field count.
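A sketch of the suggested checks on a small made-up file. The original file is not shown, so the place names below are invented, and the apostrophes in them are only one plausible cause of the missing rows: with the default quote setting, an apostrophe can open a quoted string that swallows the following lines.

```r
# Hypothetical stand-in for population.txt; two names contain an apostrophe.
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "NAME\tSTATENAME\tPOP_2009",
  "Place A city\tState X\t100",
  "O'Example town\tState X\t200",
  "Place B city\tState Y\t300",
  "D'Example city\tState Y\t400"
), tmp)

# With quoting and comments disabled, every line should report three fields.
n <- count.fields(tmp, sep = "\t", quote = "", comment.char = "",
                  blank.lines.skip = FALSE)
table(n)

# The same two arguments make read.table keep every data row.
population <- read.table(tmp, header = TRUE, sep = "\t", quote = "",
                         comment.char = "", stringsAsFactors = FALSE)
nrow(population)
```

Running count.fields once with the defaults and once with quote = "" and comparing the two tables is a quick way to see whether quoting is the culprit in the real file.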
