R output tsv without scientific notation

I have this variable:
> output
[[1]]
[1] 1.394082e+19 3.481687e+18 1.829848e+19 1.414608e+19 1.694183e+19 1.394082e+19
[[2]]
[1] 1.569580e+19 1.569580e+19 1.204701e+19 1.600159e+19 6.915915e+18 4.672586e+18 1.095256e+19 1.395906e+19 2.199774e+18
[[3]]
[1] 1.602384e+19 2.610937e+18 1.750534e+19 3.749841e+17 1.602384e+19 1.921356e+18 1.490877e+19 1.858905e+17 9.592238e+18
[[4]]
[1] 1.488400e+19 8.239013e+18 1.397958e+19 1.488400e+19 5.659786e+17 1.235961e+18 1.802728e+19
[[5]]
[1] 1.038415e+19 5.060804e+18 3.892644e+18 1.038415e+19
I want to print to a tsv file without scientific notation. Something like
1.56958e+19 1.56958e+19 1.204701e+19 1.600159e+19 6.915915e+18
4.672586e+18 1.095256e+19 1.395906e+19 2.199774e+18
1.602384e+19 2.610937e+18 1.750534e+19 3.749841e+17 1.602384e+19
1.921356e+18 1.490877e+19 1.858905e+17 9.592238e+18
1.4884e+19 8.239013e+18 1.397958e+19 1.4884e+19 5.659786e+17
but without scientific notation. If I try,
x <- c(1:length(output))
for ( val in x ) {
write(format(output[[val]],scientific = FALSE),file="test2.txt", append=TRUE,sep="\t")
}
I get new lines for every number:
13940817034235316224
3481686930273376256
18298477684861163520
14146076755830216704
16941832990896044032
13940817034235316224
15695800775901642752
15695800775901642752
I've tried a few variations, and my limited experience with R makes me think I'm missing something simple. What is it?

You need to set the ncolumns parameter explicitly. According to ?write, the default for ncolumns is:
ncolumns = if(is.character(x)) 1 else 5
Since format converts the numeric vector to a character vector, write defaults to one column per line. Programmatically, you can set ncolumns equal to the length of each vector in the list:
for(vec in output) {
write(format(vec, scientific=FALSE), file="test2.txt", append=TRUE, sep="\t", ncolumns=length(vec))
}
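A self-contained sketch of the fix, using a small made-up list in place of the question's `output` and a temporary file instead of test2.txt. It also passes `trim = TRUE` to format(), which the answer above does not do: by default format() right-justifies numbers to a common width with leading spaces, and those spaces would otherwise end up inside the TSV fields.

```r
# Made-up data standing in for the question's `output` list
output <- list(c(1.394082e+19, 3.481687e+18),
               c(1.569580e+19, 1.204701e+19, 4.672586e+18))

# format() returns character, so write() would default to ncolumns = 1
is.character(format(output[[1]], scientific = FALSE))  # TRUE

out_file <- tempfile(fileext = ".tsv")
for (vec in output) {
  # trim = TRUE drops the padding spaces format() adds for alignment
  write(format(vec, scientific = FALSE, trim = TRUE),
        file = out_file, append = TRUE, sep = "\t", ncolumns = length(vec))
}
tsv_lines <- readLines(out_file)
print(tsv_lines)
```

Each list element becomes one tab-separated line, with the numbers written out in full rather than in e+ notation.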

Related

How do I use file.path() on a list of subdirectories

I want to add "_quants" to a list of folder names contained in samples$sample. When I use the following:
files <- file.path(dir, "quants", samples$sample, "_quants")
> dir
[1] "E:/ubuntu-shared/salmonTutorial/"
> samples$sample
[1] DRR016125 DRR016126 DRR016127 DRR016128 DRR016129 DRR016130 DRR016131 DRR016132 DRR016133 DRR016134 DRR016135 DRR016136 DRR016137 DRR016138 DRR016139
[16] DRR016140
16 Levels: DRR016125 DRR016126 DRR016127 DRR016128 DRR016129 DRR016130 DRR016131 DRR016132 DRR016133 DRR016134 DRR016135 DRR016136 DRR016137 ... DRR016140
I get:
[1] "E:/ubuntu-shared/salmonTutorial//quants/DRR016125/_quants"
How do I remove the double // and append "_quants" to "DRR016125" using file.path() to get the desired:
[1] "E:/ubuntu-shared/salmonTutorial/quants/DRR016125_quants"
[2] "E:/ubuntu-shared/salmonTutorial/quants/DRR016126_quants"
Solution using base::paste0:
dir <- "E:/ubuntu-shared/salmonTutorial/"
samples <- list(sample = c("DRR016125", "DRR016126", "DRR016127"))
paste0(dir, "quants/", samples$sample, "_quants")
[1] "E:/ubuntu-shared/salmonTutorial/quants/DRR016125_quants"
[2] "E:/ubuntu-shared/salmonTutorial/quants/DRR016126_quants"
[3] "E:/ubuntu-shared/salmonTutorial/quants/DRR016127_quants"
paste0 concatenates its arguments element-wise (after converting them to character) with no separator, producing one string per element. Because you passed several samples, you get one path for every sample.
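Since the question asks about file.path(), here is a sketch that keeps it. The double slash comes from the trailing "/" already present in dir, so this strips it with sub() first (an assumption about how you want to normalise dir), then builds the folder name with paste0() and lets file.path() insert the separators. The samples data frame is a made-up stand-in with a factor column, like the one in the question.

```r
dir <- "E:/ubuntu-shared/salmonTutorial/"
# Stand-in for the question's data: sample IDs stored as a factor
samples <- data.frame(sample = factor(c("DRR016125", "DRR016126")))

# Drop the trailing slash so file.path() does not produce "//"
base_dir <- sub("/$", "", dir)
# paste0() attaches the suffix to each ID; file.path() adds the "/" separators
files <- file.path(base_dir, "quants", paste0(samples$sample, "_quants"))
files
# [1] "E:/ubuntu-shared/salmonTutorial/quants/DRR016125_quants"
# [2] "E:/ubuntu-shared/salmonTutorial/quants/DRR016126_quants"
```

paste0() converts the factor to its labels, so the IDs appear as text rather than integer codes.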

How to add strings and numbers into a list from a csv file?

I have a list like the following in R:
data1<-list("A" = 1, "B" = 2, "C" = 3,"D" = 4)
and when I print data1 I have:
$A
[1] 1
$B
[1] 2
$C
[1] 3
$D
[1] 4
I have a csv file with the values:
alt1,alt2,alt3,alt4
appear,certain,dance,example
apply,danger,chance,excellent
where alt1, alt2, ... are the headers of the csv file.
I would like to extract the second row from my csv file so that I get something like data1. I have done the following:
getData=read.csv("test.csv",header=TRUE)
q<-getData[2,]
print(q)
anylist<-list()
anylist[[q[1]]]<-1
anylist[[q[2]]]<-2
anylist[[q[3]]]<-3
anylist[[q[4]]]<-4
print(anylist)
I need anylist to have the same structure as data1; if I wrote it out directly, it would be:
anylist<-list("apply" = 1, "danger" = 2, "chance" = 3,"excellent" = 4)
so when I print anylist I want to print:
$apply
[1] 1
$danger
[1] 2
$chance
[1] 3
$excellent
[1] 4
but I got the error:
Error in anylist[[q[1]]] <- 1 : invalid subscript type 'list'
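The error comes from the indexing: q is a one-row data frame, and single brackets (q[1]) return another data frame, which is a list, so it cannot be used as a [[ ]] subscript. Double brackets (q[[1]]) extract the value itself. A sketch with a made-up two-column stand-in for test.csv:

```r
# Made-up stand-in for the first two columns of test.csv
getData <- data.frame(alt1 = c("appear", "apply"),
                      alt2 = c("certain", "danger"),
                      stringsAsFactors = FALSE)
q <- getData[2, ]

class(q[1])    # "data.frame": single brackets keep the data frame (a list)
class(q[[1]])  # "character": double brackets extract the value

anylist <- list()
anylist[[q[[1]]]] <- 1  # works: the subscript is now the string "apply"
anylist[[q[[2]]]] <- 2
anylist
```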
The following quick function takes a row of data, determines the order, and associates each position with its name, returning a list. Note that the names come out in sorted order. I believe this is what you wanted. If you want to do this for many rows, use apply() with MARGIN=1 and you will get a list of lists. Is this what you were looking for?
getNames=function(row){
retList=as.list(order(row))
names(retList) = as.character(sort(row))
return(retList)
}
Here's a quick validation:
test=c("apply", "danger", "chance", "excellent")
getNames(test)
$apply
[1] 1
$chance
[1] 3
$danger
[1] 2
$excellent
[1] 4
test2=c('alt1','alt2','alt3','alt4')
getNames(test2)
$alt1
[1] 1
$alt2
[1] 2
$alt3
[1] 3
$alt4
[1] 4
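Got it! Read the file with stringsAsFactors=FALSE so the row holds character strings rather than factors, then assign that row as the names of a list: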
getData=read.csv("test.csv",header=TRUE,stringsAsFactors=FALSE)
q<-getData[2,]
n<-as.list(c(1:4))
names(n)<-q
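A self-contained sketch of the answer above, reading the question's CSV contents from a string via read.csv(text = ...) instead of test.csv. It wraps the row in unlist() and as.character(), a small change from names(n)<-q, to make the conversion to a plain character vector explicit.

```r
csv_text <- "alt1,alt2,alt3,alt4
appear,certain,dance,example
apply,danger,chance,excellent"

getData <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = FALSE)
q <- getData[2, ]                    # one-row data frame: apply, danger, ...
n <- as.list(seq_along(q))           # list(1, 2, 3, 4), one entry per column
names(n) <- as.character(unlist(q))  # row values become the list names
n
```

Unlike getNames() above, this keeps the names in their original column order.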

Data loss during read.csv in R

I have a .csv file to be imported into R, which has more than 1K observations. However, when I used the read.csv function as usual, the imported file only has 21 observations. This is strange. I've never seen this before.
t <- read.csv("E:\\AH1_09182014.CSV", header=TRUE, colClasses=c(rep("character",3),rep("numeric",22)), na.strings=c("null","NaN",""), stringsAsFactors=FALSE)
Can anyone help me figure out the problem? I am giving a link to my data file:
https://drive.google.com/file/d/0B86_a8ltyoL3TzBza0x1VTd2OTQ/edit?usp=sharing
You have some messy characters in your data, such as embedded control characters.
A workaround is to read the file in binary mode and then pass the lines you read to read.csv.
This answer proposes a basic function to do those steps.
The function looks like this:
sReadLines <- function(fnam) {
f <- file(fnam, "rb")
res <- readLines(f)
close(f)
res
}
You can use it as follows:
temp <- read.csv(text = sReadLines("~/Downloads/AH1_09182014.CSV"),
stringsAsFactors = FALSE)
Have all lines been read in?
dim(temp)
# [1] 1449 25
Where is that problem line?
unlist(temp[21, ], use.names = FALSE)
# [1] "A-H Log 1" "09/18/2014" "0:19:00" "7.866" "255" "0.009"
# [7] "525" "7" "4468" "76" "4576.76" "20"
# [13] "71" "19" "77" "1222" "33857" "-3382"
# [19] "26\032)" "18.30" "84.80" "991.43" "23713.90" "0.85"
# [25] "10.54"
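Note item [19] above: "\032" is the Ctrl-Z (SUB) control character, which Windows treats as an end-of-file marker when a file is read in text mode. That is likely why the import stopped after row 21.
Because of this, you won't be able to specify all of your column types up front, unless you clean the CSV first.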
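A self-contained sketch of the binary-mode workaround, using a tiny made-up CSV written to a temporary file with a \032 byte inside one field. This variant of sReadLines() closes the connection with on.exit(), a small change from the function above, so the file is closed even if readLines() fails. On non-Windows systems a plain read.csv() may read such a file without trouble, so the sketch only shows that the workaround keeps every row.

```r
sReadLines <- function(fnam) {
  f <- file(fnam, "rb")   # binary mode: bytes are passed through unchanged
  on.exit(close(f))       # close the connection even if readLines() errors
  readLines(f)
}

# Made-up CSV with a Ctrl-Z (\032) byte inside the second data row
csv_file <- tempfile(fileext = ".csv")
writeBin(charToRaw("id,value\n1,a\n2,b\032c\n3,d\n"), csv_file)

temp <- read.csv(text = sReadLines(csv_file), stringsAsFactors = FALSE)
dim(temp)
```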

R: How to remove quotation marks in a vector of strings, but maintain vector format as to call each individual value?

I want to create a vector of names that act as variable names so I can use them later on in a loop.
years=1950:2012
varname=character(length(years))
for(i in 1:length(years))
{
varname[i]=paste("mydata",years[i],sep="")
}
this gives:
> [1] "mydata1950" "mydata1951" "mydata1952" "mydata1953" "mydata1954" "mydata1955" "mydata1956" "mydata1957" "mydata1958"
[10] "mydata1959" "mydata1960" "mydata1961" "mydata1962" "mydata1963" "mydata1964" "mydata1965" "mydata1966" "mydata1967"
[19] "mydata1968" "mydata1969" "mydata1970" "mydata1971" "mydata1972" "mydata1973" "mydata1974" "mydata1975" "mydata1976"
[28] "mydata1977" "mydata1978" "mydata1979" "mydata1980" "mydata1981" "mydata1982" "mydata1983" "mydata1984" "mydata1985"
[37] "mydata1986" "mydata1987" "mydata1988" "mydata1989" "mydata1990" "mydata1991" "mydata1992" "mydata1993" "mydata1994"
[46] "mydata1995" "mydata1996" "mydata1997" "mydata1998" "mydata1999" "mydata2000" "mydata2001" "mydata2002" "mydata2003"
[55] "mydata2004" "mydata2005" "mydata2006" "mydata2007" "mydata2008" "mydata2009" "mydata2010" "mydata2011" "mydata2012"
All I want to do is remove the quotes and be able to call each value individually.
I want:
>[1] mydata1950 mydata1951 mydata1952 mydata1953, #etc...
stored as a variable such that
varname[1]
> mydata1950
varname[2]
> mydata1951
and so on.
I have played around with
cat(varname[i],"\n")
but this just prints values as one line and I can't call each individual string. And
gsub("'",'',varname)
but this doesn't seem to do anything.
Suggestions? Is this possible in R? Thank you.
There are no quotes in that character vector's values. Use:
cat(varname)
... if you want to see the unquoted values. R's print method uses quotes as a signal to your brain that distinct values are present. You can also use:
print(varname, quote=FALSE)
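A short sketch showing that the values themselves contain no quote characters; the quotes are added only when printing. It builds varname with the vectorised paste0() instead of a loop, and also uses base R's noquote(), which the answer does not mention.

```r
years <- 1950:2012
varname <- paste0("mydata", years)   # same result as the loop in the question

nchar(varname[1])                    # 10: "mydata1950" has no quote characters
print(varname[1:3], quote = FALSE)   # prints without quotes
noquote(varname[1:3])                # same display via a "noquote" object
```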
If there are that many named objects in your workspace, then you desperately need to learn to use lists. There are mechanisms for "promoting" character values to names, but using them would be seen as a failure on your part to learn to use the language effectively:
> var <- 2
> eval(as.name('var'))
[1] 2
> eval(parse(text="var"))
[1] 2
> get('var')
[1] 2
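A minimal sketch of the list-based alternative: instead of 63 separate objects named mydata1950 and so on, keep one named list and index it with the strings directly. The per-year data frames here are hypothetical placeholders.

```r
years <- 1950:2012
# Hypothetical per-year data: a tiny data frame for each year
mydata <- lapply(years, function(y) data.frame(year = y, value = y %% 10))
names(mydata) <- paste0("mydata", years)

mydata[["mydata1950"]]            # look up one year by its name
for (nm in names(mydata)[1:3]) {  # loop over names as plain strings
  print(mydata[[nm]]$year)
}
```

The character vector of names is used as-is for lookup, so there is no need to turn strings into variable names.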

store summary output in a list of tables or matrix

How do I read the following character vector "c" into a list of tables? Which is shorter, read.table or strsplit? For example, I can't see how to read the table in c[4:6] with one command.
require(car)
m<-matrix(rnorm(16),4,4,byrow=T)
a<-Anova(lm(m~1),type=3,idata=data.frame(treatment=factor(1:4)),idesign=~treatment)
c<-capture.output(summary(a,multivariate=F))
c
This returns lines 4:6
c[4:6]
If you want to parse this, I would do it in two steps: first read the column values from rows 5:6, then add back the names.
> vals <- read.table(text=c[5:6])
> txt <- " \t SS\t num Df\t Error SS\t den Df\t F\t Pr(>F)"
> names(vals) <- names(read.delim(text=txt))
> vals
X SS num.Df Error.SS den.Df F Pr..F.
1 (Intercept) 0.57613392 1 0.4219563 3 4.09616 0.13614
2 treatment 1.85936442 3 8.2899759 9 0.67287 0.58996
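A self-contained sketch of the same two-step parse that skips the read.delim() trick by passing col.names directly to read.table(). The two lines are copied from the output above rather than produced by car::Anova, and check.names = FALSE (an addition) keeps names such as Pr(>F) unaltered.

```r
# Two table rows copied from the printed summary above
txt_lines <- c("(Intercept) 0.57613392 1 0.4219563 3 4.09616 0.13614",
               "treatment   1.85936442 3 8.2899759 9 0.67287 0.58996")

vals <- read.table(text = txt_lines,
                   col.names = c("term", "SS", "num Df", "Error SS",
                                 "den Df", "F", "Pr(>F)"),
                   check.names = FALSE, stringsAsFactors = FALSE)
vals
```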
EDIT:
You could look at the source code of the summary method and compute the required quantities yourself:
getAnywhere(summary.Anova.mlm)
The original idea does not seem to work.
c2 <- summary(a)
# find out what 'properties' the summary object has
# turns out, it is just the Anova object
class(c2) <- "list"
names(c2)
This returns
[1] "SSP" "SSPE" "P" "df" "error.df"
[6] "terms" "repeated" "type" "test" "idata"
[11] "idesign" "icontrasts" "imatrix" "singular"
and we can access them:
c2$SSP
c2$SSPE
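The same inspection trick works for any S3 summary object. As an illustration that does not need car, here is a sketch on base R's summary.lm() with the built-in mtcars data (both swapped in for the Anova example), using unclass() instead of assigning class(x) <- "list".

```r
fit_summary <- summary(lm(mpg ~ wt, data = mtcars))

# unclass() drops the "summary.lm" class so the raw list components are visible
names(unclass(fit_summary))

# Components are then accessed with $, like c2$SSP above
coef_table <- fit_summary$coefficients  # matrix: Estimate, Std. Error, t value, Pr(>|t|)
coef_table
```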
It is also not a good idea to use the name of R's built-in c() function as a variable name.
