Number format: writing 1e-5 instead of 0.00001 in R

I've used read.table to read a file that contains numbers such as 0.00001.
When I write them back with write.table, those numbers appear as 1e-5.
How can I keep the old format?

I would just change the scipen option before calling write.table. Note that this will also change how numbers are displayed when printing to the console.
options(scipen=10)
write.table(foo, "foo.txt")
options(scipen=0) # restore the default
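If you would rather not reset the option by hand, options() returns the previous values when setting new ones, so a save-and-restore pattern works as well (a minimal sketch using the same foo):
op <- options(scipen = 10)  # save the old value while setting the new one
write.table(foo, "foo.txt")
options(op)                 # restore scipen to exactly what it was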

You can do this by converting your numbers to strings with formatting as you require, then using the argument quote = FALSE in the call to write.table.
dfr <- data.frame(x = 10^(0:15))
dfr$y <- format(dfr$x, scientific = FALSE)
write.table(dfr, file = "test.txt", quote = FALSE)
Note that you shouldn't need to change the format of the numbers in your file. Pretty much every piece of scientific software and every spreadsheet understands scientific notation for numbers, and also has number formatting options so you can view them how you choose.

If the input is a mixture of scientific notation and explicit notation numbers, then you will be writing your own parser to read in the numbers and keep track of which ones were in which formats. In fact, you'll want to keep a string representation of those numbers lying around so you can write back exactly what was in the input.
However, if you just want write.table() output with consistently explicit notation, try:
write.table(format(_your_table_here_, scientific=FALSE), ...)
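For instance, with a small hypothetical data frame (the file name is also made up):
dfr2 <- data.frame(x = c(0.00001, 123456))
write.table(format(dfr2, scientific = FALSE), "out.txt", quote = FALSE)
# format() converts every column to character, so quote = FALSE keeps the output unquoted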

For maximum control, loop over all rows and print them to a text file, formatted with sprintf:
# Find number of rows in data.frame test
nrows <- nrow(test)
# initialise a character vector with one slot per row
mylines <- vector("character", nrows)
# loop over all rows in the data frame
for (i in 1:nrows) {
  # print out exactly the format you want
  mylines[i] <- sprintf("Line %d: %.2f\t%.2f", i, test[i, "x"], test[i, "y"])
}
# write lines to file
writeLines(mylines,"out.txt")
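Since sprintf() is vectorised, the loop above can also be collapsed into a single call; a sketch assuming the same test data frame with numeric columns x and y:
mylines <- sprintf("Line %d: %.2f\t%.2f", seq_len(nrow(test)), test$x, test$y)
writeLines(mylines, "out.txt")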

Related

Outputting an R dataframe to a .txt file - Align positive and negative values

I am trying to output a dataframe in R to a .txt file. I want the .txt file to ultimately mirror the dataframe output, with columns and rows all aligned. I found this post on SO which mostly gave me the desired output with the following (now modified) code:
gene_names_only <- select(deseq2_hits_table_df, Gene, L2F)
colnames(gene_names_only) <- c()
capture.output(
print.data.frame(gene_names_only, row.names=F, col.names=F, print.gap=0, quote=F, right=F),
file="all_samples_comparison_gene_list.txt"
)
The resultant output, however, does not align negative and positive values.
I ultimately want both positive and negative values to be properly aligned with one another: the '-' sign of -0.00012 should sit in the same column as the '4' of 4.00046. How could I accomplish this?
Two other questions:
The output file has a blank line at the beginning of the output. How can I change this?
The output file also seems to put far more spaces between the left column and the right column than I would want. Is there any way I can change this?
Maybe try a finer-scale treatment of the printing using sprintf and a different format string for positive and negative numbers, e.g.:
> df = data.frame(x=c('PICALM','Luc','SEC22B'),y=c(-2.261085123,-2.235376098,2.227728912))
> sprintf('%-15s%.6f',df$x[1],df$y[1])
[1] "PICALM         -2.261085"
> sprintf('%-15s%.6f',df$x[2],df$y[2])
[1] "Luc            -2.235376"
> sprintf('%-15s%.7f',df$x[3],df$y[3])
[1] "SEC22B         2.2277289"
EDIT:
I don't think write.table or similar functions accept custom format strings, so one option is to create a data frame of formatted strings and then use write.table or writeLines to write to a file, e.g.
dfstr = data.frame(x = sprintf('%-15s', df$x),
                   y = sprintf(paste0('%.', 7 - 1*(df$y < 0), 'f'), df$y))
(The format string for y here is essentially what I previously proposed.) Next, write dfstr directly:
write.table(x=dfstr, file='filename.txt',
            quote=F, row.names=F, col.names=F)
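An alternative to varying the precision is the space flag of C-style formats, which sprintf() also supports: it reserves one blank for the sign, so positive values line up under negative ones at a single precision. A quick sketch with the same df:
sprintf('% .7f', df$y)
# [1] "-2.2610851" "-2.2353761" " 2.2277289"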

How to use fread or read_delim in R on characters with no linebreak

I have several .txt files which need to be imported to R as dataframes for some data analysis. One of these files has no EOL in any form, so I'm left wondering how I would go about to import that.
\"A\";\"B\";\"C\";\"D\";\"D\";\"E\";\"F\";\"G\";\"H\";\"I\";\"J\";\"K\";\"L\";\"M\";\"N\";\"O\";\"P\";\"Q\";\"R\";\"S\";\"T\";\"U\";\"V\"\"1\";4;\"55-555-5555-555\";1234-56-78;\"111\";1510;5;1234-12-17;12345.1234512345;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA\"2\";6;\"22-222-2222-222\";5678-56-78;\"222\";2051;0;NA;0;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA
This is what the first ~500 characters of that .txt file look like. The EOLs would need to be placed like this:
\"A\";\"B\";\"C\";\"D\";\"D\";\"E\";\"F\";\"G\";\"H\";\"I\";\"J\";\"K\";\"L\";\"M\";\"N\";\"O\";\"P\";\"Q\";\"R\";\"S\";\"T\";\"U\";\"V\"
\"1\";4;\"55-555-5555-555\";1234-56-78;\"111\";1510;5;1234-12-17;12345.1234512345;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA
\"2\";6;\"22-222-2222-222\";5678-56-78;\"222\";2051;0;NA;0;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA
Normally I would just gsub a "\n" into the places I need it, but there is no recurring string at those positions, so I don't think gsub would work in this instance.
Seeing how the missing values are clearly indicated with NA, is there a function similar to read_delim that has a "col_number = x" argument? Like the first x values are the headers, the next x values are the values of the first row and so on and so forth?
If it changes anything, these .txt files are rather big (>300 MB).
Big thank you to Julian_Hn. Works like a charm.
I would probably just read this in as a vector and then reformat it as a matrix with the number of columns you know are in the dataset. This essentially does what you want:
str <- "\"A\";\"B\";\"C\";\"D\";\"D\";\"E\";\"F\";\"G\";\"H\";\"I\";\"J\";\"K\";\"L\";\"M\";\"N\";\"O\";\"P\";\"Q\";\"R\";\"S\";\"T\";\"U\";\"V\";\"1\";4;\"55-555-5555-555\";1234-56-78;\"111\";1510;5;1234-12-17;12345.1234512345;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;\"2\";6;\"22-222-2222-222\";5678-56-78;\"222\";2051;0;NA;0;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA"
vec <- strsplit(str,";")[[1]]
# EDIT: add byrow = TRUE to stay in the right format. Thanks Yuriy
table <- matrix(vec,ncol=23,nrow=3, byrow = T)
df <- as.data.frame(table)
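For the actual files (too big to paste into a string), the same idea should work by reading every field with scan(); the file name here is an assumption:
# read all semicolon-separated fields into one long character vector
fields <- scan("data.txt", what = character(), sep = ";", quote = "\"", quiet = TRUE)
# the first 23 fields are the header, the rest fill the rows
mat <- matrix(fields, ncol = 23, byrow = TRUE)
dat <- as.data.frame(mat[-1, , drop = FALSE], stringsAsFactors = FALSE)
names(dat) <- mat[1, ]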

How can a data frame be transformed into a string in csv format in R?

I don't want to write a csv into a file, but to get a string representation of the dataframe with a csv format (to send it over the network).
I'm using R.NET, if it helps to know.
If you are not limited to base functions, you may try readr::format_csv.
library(readr)
format_csv(iris[1:2, 1:3])
# [1] "Sepal.Length,Sepal.Width,Petal.Length\n5.1,3.5,1.4\n4.9,3.0,1.4\n"
If you want a single string in csv format, you could capture the output from write.csv.
Let's use mtcars as an example:
paste(capture.output(write.csv(mtcars)), collapse = "\n")
This reads back into R fine with read.csv(text = ..., row.names = 1). You can make adjustments for the printing of row names and other attributes in write.csv.
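A quick round-trip check of that claim:
csv_string <- paste(capture.output(write.csv(mtcars)), collapse = "\n")
df2 <- read.csv(text = csv_string, row.names = 1)
all.equal(df2, mtcars)  # should be TRUE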
Alternatively:
write.csv(mtcars, textConnection("output", "w"), row.names=FALSE)
which creates a character vector named output in the global environment, one element per line.
You can do
paste0(output, collapse="\n")
to make it one big character string, similar to Rich's answer (but paste0() is marginally faster).

Importing csv file into R - numeric values read as characters

I am aware that there are similar questions on this site, however, none of them seem to answer my question sufficiently.
This is what I have done so far:
I have a csv file which I open in Excel. I manipulate the columns algebraically to obtain a new column "A". I import the file into R using read.csv() and the entries in column A are stored as factors, but I want them stored as numeric. I found this question on the topic:
Imported a csv-dataset to R but the values becomes factors
Following the advice, I include stringsAsFactors = FALSE as an argument in read.csv(), however, as Hong Ooi suggested in the page linked above, this doesn't cause the entries in column A to be stored as numeric values.
A possible solution is to use the advice given in the following page:
How to convert a factor to an integer\numeric without a loss of information?
however, I would like a cleaner solution, i.e. a way to import the file so that the entries of column A are stored as numeric values.
Cheers for any help!
Whatever algebra you are doing in Excel to create the new column could probably be done more effectively in R.
Please try the following: read the raw file (before any Excel manipulation) into R using read.csv(..., stringsAsFactors=FALSE). If that does not work, take a look at ?read.table (which read.csv wraps); there may be some other underlying issue.
For example:
delim = "," # or is it "\t" ?
dec = "." # or is it "," ?
myDataFrame <- read.csv("path/to/file.csv", header=TRUE, sep=delim, dec=dec, stringsAsFactors=FALSE)
Then, say your numeric column is column 4:
myDataFrame[, 4] <- as.numeric(myDataFrame[, 4]) # you can also refer to the column by "itsName"
Lastly, if you need any help accomplishing in R the same tasks that you've done in Excel, there are plenty of folks here who would be happy to help you out.
In read.table (and its relatives) it is the na.strings argument which specifies which strings are to be interpreted as missing values NA. The default value is na.strings = "NA".
If missing values in an otherwise numeric column are coded as something other than "NA", e.g. "." or "N/A", those rows will be interpreted as character, and then the whole column is converted to character.
Thus, if your missing values are anything other than "NA", you need to specify them in na.strings.
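For example (a sketch; the codes "." and "N/A" are assumptions about your file):
df <- read.csv("path/to/file.csv", stringsAsFactors = FALSE,
               na.strings = c("NA", ".", "N/A"))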
If you're dealing with large datasets (i.e. datasets with a high number of columns), the solution noted above can be manually cumbersome, and requires you to know which columns are numeric a priori.
Try this instead.
char_data <- read.csv(input_filename, stringsAsFactors = F)
num_data <- data.frame(data.matrix(char_data))
numeric_columns <- sapply(num_data,function(x){mean(as.numeric(is.na(x)))<0.5})
final_data <- data.frame(num_data[,numeric_columns], char_data[,!numeric_columns])
The code does the following:
Imports your data as character columns.
Creates an instance of your data as numeric columns.
Identifies which columns from your data are numeric (assuming columns with less than 50% NAs upon converting your data to numeric are indeed numeric).
Merges the numeric and character columns into a final dataset.
This essentially automates the import of your .csv file by preserving the data types of the original columns (as character and numeric).
Including this in the read.csv command worked for me: strip.white = TRUE
(I found this solution here.)
Version for data.table, based on code from dmanuge:
library(data.table)

convNumValues <- function(ds) {
  ds <- data.table(ds)
  # numeric view of the data; entries that fail to convert become NA
  dsnum <- data.table(data.matrix(ds))
  # treat a column as numeric if fewer than half its values failed to convert
  num_cols <- sapply(dsnum, function(x) mean(as.numeric(is.na(x))) < 0.5)
  nds <- data.table(dsnum[, .SD, .SDcols = names(num_cols)[which(num_cols)]],
                    ds[, .SD, .SDcols = names(num_cols)[which(!num_cols)]])
  return(nds)
}
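Usage, assuming the data was first read in as all-character columns:
char_data <- read.csv(input_filename, stringsAsFactors = FALSE)
nds <- convNumValues(char_data)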
I had a similar problem. Based on Joshua's premise that excel was the problem I looked at it and found that the numbers were formatted with commas between every third digit. Reformatting without commas fixed the problem.
I had a similar situation with my data file: when I read it in as a csv, all the numeric values were turned into character. In my file there was a value with the word "Filtered" instead of NA. I converted "Filtered" to NA in the vim editor of a Linux terminal with the command %s/Filtered/NA/g, saved the file, and read it into R again; all the values were then num type and no longer char type.
It looks like the character value "Filtered" was forcing all values into char format.
Charu
Hello @Shawn Hemelstrand, here are the steps in detail:
Example: a matrix file file.csv containing the word "Filtered".
I opened file.csv in a Linux command terminal:
vi file.csv
Then press Esc followed by Shift+: and type the following command at the bottom:
%s/Filtered/NA/g
Press Enter.
Then press Esc followed by Shift+: again and type "wq" at the bottom (this saves the file and quits the vim editor).
Then in the R script I read the file:
data<- read.csv("file.csv", sep = ',', header = TRUE)
str(data)
All columns which were earlier char type were now num type.
In case you need more help, it would be easier to share your txt or csv file.
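Incidentally, the same substitution can be done without leaving R by declaring "Filtered" as a missing-value code via na.strings (a sketch):
data <- read.csv("file.csv", na.strings = c("NA", "Filtered"))
str(data)  # the affected columns should now be num rather than chr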

Imported a csv-dataset to R but the values becomes factors

I am very new to R and I am having trouble accessing a dataset I've imported. I'm using RStudio and used the Import Dataset function when importing my csv-file and pasted the line from the console-window to the source-window. The code looks as follows:
setwd("c:/kalle/R")
stuckey <- read.csv("C:/kalle/R/stuckey.csv")
point <- stuckey$PTS
time <- stuckey$MP
However, the data isn't integer or numeric as I am used to but factors so when I try to plot the variables I only get histograms, not the usual plot. When checking the data it seems to be in order, just that I'm unable to use it since it's in factor form.
Both the data import function (here read.csv()) and a global option allow you to set stringsAsFactors=FALSE, which should fix this.
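For example, using the path from the question:
stuckey <- read.csv("C:/kalle/R/stuckey.csv", stringsAsFactors = FALSE)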
By default, read.csv checks the first few rows of your data to see whether to treat each variable as numeric. If it finds non-numeric values, it assumes the variable is character data, and character variables are converted to factors.
It looks like the PTS and MP variables in your dataset contain non-numerics, which is why you're getting unexpected results. You can force these variables to numeric with
point <- as.numeric(as.character(point))
time <- as.numeric(as.character(time))
But any values that can't be converted will become missing. (The R FAQ gives a slightly different method for factor -> numeric conversion but I can never remember what it is.)
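(For reference, the FAQ method, if I recall it correctly, indexes the numeric levels by the factor itself:)
point <- as.numeric(levels(point))[point]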
You can set this globally for all read.csv/read.* commands with
options(stringsAsFactors=F)
Then read the file as follows:
my.tab <- read.table( "filename.csv", as.is=T )
When importing csv data files, the import command should reflect both the separator used between columns (here ";") and the decimal separator for numeric values (a value written as 2,5 uses "," as the decimal mark).
The import command therefore has to be a bit more explicit:
stuckey <- read.csv2("C:/kalle/R/stuckey.csv", header=TRUE, sep=";", dec=",")
This should import all variables as either integer or numeric.
None of these answers mention the colClasses argument which is another way to specify the variable classes in read.csv.
stuckey <- read.csv("C:/kalle/R/stuckey.csv", colClasses = "numeric") # all variables to numeric
or you can specify which columns to convert:
stuckey <- read.csv("C:/kalle/R/stuckey.csv", colClasses = c("PTS" = "numeric", "MP" = "numeric") # specific columns to numeric
Note that if a variable can't be converted to numeric then it will be converted to factor by default, which makes it more difficult to convert to numeric later. Therefore, it can be advisable to read all variables in as character (colClasses = "character") and then convert the specific columns to numeric once the csv is read in:
stuckey <- read.csv("C:/kalle/R/stuckey.csv", colClasses = "character")
point <- as.numeric(stuckey$PTS)
time <- as.numeric(stuckey$MP)
I'm new to R as well and faced the exact same problem. But then I looked at my data and noticed that it was caused by my csv file using a comma as the thousands separator in all numeric columns (e.g. 1,233,444.56 instead of 1233444.56).
I removed the thousands separators in my csv file and then reloaded it into R. My data frame now recognises all columns as numbers.
Perhaps there's a way to handle this within the read.csv call itself.
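As far as I know, base read.csv has no thousands-separator argument, so one in-R alternative is to read the column as character and strip the commas afterwards (a sketch with made-up values):
x <- c("1,233,444.56", "12,345")
as.numeric(gsub(",", "", x))
# [1] 1233444.56   12345.00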
This only worked right for me when including strip.white = TRUE in the read.csv command.
(I found the solution here.)
For me the solution was to set the skip argument (the number of lines to skip at the top of the file before reading data; it defaults to 0 and can be set higher):
mydata <- read.csv(file = "file.csv", header = TRUE, sep = ",", skip = 22)
