Tab Delimited Data - r

Since I'm not able to find/decipher solutions to my problem I'll try asking.
Up until now I've only worked with other people's data (.csv files) in RStudio but this time around it's my own data, which I entered into Excel. All entries are tab-delimited, and I wouldn't enter data into Excel any other way. After some googling it seems like R and .xlsx files aren't best friends, so I saved my file in various other formats, .csv one of them. Thus I have tab-delimited .csv file.
The problem with loading tab-delimited .csv files also features here, but my problem is not with "reading some of the numeric variables in as factors" (whatever that means), but that data is loaded as semi-colon separated in R:
data <- read.table(file.choose(), sep="\t", header=T)
View(data)
Date.Miles.Time
2015-08-10;5;45
Apart from improper formatting there are 29 observations rather than 24; the last five entries are just ;;. Again, my problem is not the same as in the link but I figure there's no harm in trying Justin's suggestion in the answer, i.e. options(stringsAsFactors=FALSE) and then running the above again, but it achieves nothing.
read.csv() and read.delim() yield the same result. Any suggestions?

Related

Writing to a CSV file producing errors

I am using R to analyze some text data. After doing some aggregation, I had a new dataframe I wanted to write to a csv file, so I can use it in other analyses. The dataframe looks correct in r- it only has 2 columns with text data- but once I write the csv and open it, the text is scattered across different columns. Here is the code I was using:
write.csv(new_df, "4.19 Group 1_agg by user try 2.csv")
I tried adding in an extra bit of code to specify that it should be using UTF-8, since I've heard this could be an encoding error, so the code then looked like this:
write.csv(new_df, "4.19 Group 1_agg by user try 2.csv", fileEncoding = "UTF-8")
I also tried reading in the file differently (using fread instead of read.csv)
Still, the csv file looks wrong/messy in many places. Here is what it should look like:
This is what it looks like currently:
Again, I think the error must be in writing the csv file, because everything looks good in R when I check it using names and head. Any help is appreciated, thank you!

When I upload my excel file to R, the column titles are in the rows and the data seems all jumbled. How do I fix this?

hi literally day one new coder
On the excel sheet, my data looks organized, but when I upload my file to R, it's not able to read the excel properly and the column headers are in the rows and the data seems randomized.
So far I have tried:
library(readxl)
dataset <-read_excel("pathname")
View(dataset)
Also tried:
dataset <-read_excel("pathname", sheet=1, colNames=TRUE)
Also tried to use the package openxlsx
but nothing is giving me the correct, organized data set.
I tried formatting my Excel to a CSV file, and the CSV file looks exactly like the data that shows up on R (both are messed up).
How should I approach this problem?
I deal with importing .xlsx into R frequently. It can be challenging due to the flexibility of the excel platform. I generally use readxl::read_xlsx() to fetch data from .xlsx files. My suggestions:
First, specify exactly the data you want to import with the range argument.
A cell range to read from, as described in cell-specification. Includes typical Excel
ranges like "B3:D87", possibly including the sheet name like "Budget!B2:G14"
Second, if there are there merged cells or other formatting challenges in column headers, I resort to setting col_names = FALSE. And supplying clean names after import with names(df) <- c("first_col", "second_col")
Third, if there are merged cells elsewhere in the spreadsheet I generally I resort to "fixing" them in excel (not ideal but easier for my use case), however, others may have suggestions on a programmatic fix.
It may be helpful to provide a screenshot of your spreadsheet.

R misreading csv files after modifications on Excel

This is more of a curiosity.
Sometimes I modify csv files from Excel rather than R (suppose I manage to find a missing piece of info and I type it in the csv file), of course maintaining commas and quotes as they were.
Every time I do this, R becomes unable to read the csv file, i.e. it imports a single column as it appears on Excel, rather than separating the values (no options like sep= or quote= change this).
Does anyone know why this happens?
Thanks a lot
An example
This was readable:
state,"city","county"
AK,"Anchorage",""
AK,"Haines",""
AK,"Juneau","Juneau"
After adding the missing info under "county", R fails to import it as a data frame, reading it instead as a single vector.
state,"city","county"
AK,"Anchorage","Anchorage"
AK,"Haines","Haines"
AK,"Juneau","Juneau"
Edit:
I'm just running the basic read.csv
df <- read.csv("C:/directory/df.csv")

The woes of endless columns in .csv data in R

So I have a bunch of .csv files that were output by a simulation. I'm writing an R script to run through them and make a histogram of a column in each .csv file. However, the .csv is written in such a way that R does not like it. When I was testing it, I had been originally opening the files in Excel and apparently this changed the format to one R liked. Then when I went back to run the script on the entire folder I discovered that R doesn't like the format.
I was reading the data in as:
x <- read.csv("synch-imit-characteristics-2-tags-2-size-200-cost-0.1run-2-.csv", strip.white=TRUE)
Error in read.table(test, strip.white = TRUE, header = TRUE) :
more columns than column names
Investigating I found that the original .csv file, which R does not like, looks different than after the test one I opened with excel. I copied and pasted the first bit below after opening it in notepad:
cost,0.1
mean-loyalty, mean-hospitality
0.9885449527316088, 0.33240076252915735
weight,1 of p1, 2 of p1,
However, in notepad, there is no apparent formatting. In fact, between rows there is no space at all, ie it is cost,0.1mean-loyalty,mean-hospitality0.988544, etc. So it is weird to me as well that when I cope and paste it from notepad it gets the desired formatting as above. Anyway, moving on, after I had opened it in excel it got transferred to this"
cost,0.1,,,,,,,,
mean-loyalty, mean-hospitality,,,,,,,,
0.989771257,0.335847092,,,,,,,,
weight,1 of p1, etc...
So it seems like the data originally has no separation between rows (but I don't know how excel figures it out, or copying and pasting it) but R doesn't pick up on this. Instead, it views it all as one row (and since I have 40,000+ rows, it doesn't have that many columns). I don't want to have to open and save every file in excel. Is there a way to get R to read the data as desired?
Since when I copy and paste it from notepad it had new lines for the rows, it seems like I just need R to read it knowing that commas separate columns on the same row and a return separates rows. I tried messing around with all the sep="" commands I could find. But I can't figure it out.
To first solve the Notepad issue:
You must have CR (carriage return, \r) characters between the lines (and no LF, \n characters, causing Notepad to see it as one line).
Some programs accept this as well as a new line character, some don't.
You can for example use Notepad++ to replace all '\r' with '\n' or '\r\n', using Replace wih the "Extended" option. First select View > Show Symbol > Show all characters, so see what you are doing.
Finally, to get back to R:
(As it was pointed out, R can actually handle CR as a newline)
read.csv assumes that you have non-empty header names in the first row, but instead you have:
cost,0.1
while later in the data you have a row with more than just two columns:
weight,1 of p1, 2 of p1,
This means that not all columns have a header name (and I wonder if 0.1 was supposed to be a header name anyway).
The two solutions can be:
add a header including all columns, or
as it was pointed out in a comment use header=F.

writing to a tab-delimited file or a csv file

I have a RMA normalized data ( from the CEL files ) and would like to write it into a file that I could open in excel but have some problems.
library(affy)
cel <- ReadAffy()
pre<-rma(cel)
write.table(pre, file="norm.txt", sep="\t")
write.table(pre, file="norma.txt")
The outut is arranged row-wise in the text file that is written using the above command and hence when exported to excel it is in a wrong form and many of the information is cut off as the maximum rows are used up .The output looks the following way :
GSM 133971.CEL 5.85302 3.54678 6.57648 9.45634
GSM 133972.CEL 4.65784 3.64578 3.54213 7.89566
GSM 133973.CEL 6.78543 3.54623 2.54345 7.89767
How to write it in a proper format from CEL files in R to a notepad or excel ?
You need to extract the values from the normalised probes using the exprs function. Something like:
write.csv(exprs(pre), file="output.csv", row.names=FALSE)
should do the trick.
I'm not totally clear about what the problem is, what do you mean by "proper format"? Excel will struggle with huge tables and doing your analysis in R with Bioconductor is likely a better way to go, you could then export a smaller results or summary table to excel.
Nevertheless, if you want the file written columnwise, you could try:
write.csv(t(pre),file="norm.txt")
But excel (at least used to) allow many more rows than columns.

Resources