problems in deleting columns in csv file and storing data? - r

I have a csv file that has the following format:
1 3 1 4
1415670_at 1 8.512147859 8.196725061 8.174426394 8.62388149
1415671_at 2 9.119200527 9.190318548 9.149239039 9.211401637
1415672_at 3 10.03383593 9.575728316 10.06998673 9.735217522
1415673_at 4 5.925999419 5.692092375 5.689299161 7.807354922
I manipulated this data by dropping the columns whose header is not 1 or 2:
m<-read.csv("table.csv")
smallerdat <- m[ c(1,2, grep("^X1$|^X2$|X1\\.|X2\\." , names(m) ) ) ]
Now I want to save this result to a csv file again, so I do this:
write.csv(smallerdat,"tablemodified.csv",ncolumns=length(smallerdat),sep=",")
but I got an error that says:
Error in cat(list(...), file, sep, fill, labels, append) :
argument 1 (type 'list') cannot be handled by 'cat'
The question that I have is how I can store into a csv file the modified table.
Any help?

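The error message comes from cat(), which is what write() calls, so it looks as though write() was run rather than write.csv(). ncolumns is an argument of write(), not of write.csv(), and write.csv() sets sep to a comma itself, so both extra arguments can be dropped. Try this instead (edited):
write.csv(smallerdat, file="tablemodified.csv")
My original guess, that the file name must be given as a named argument, applies to the save() function rather than to the write.table variants.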

I was about to tell you to read ?read.csv and note the "See Also" section that pointed to write.csv... but it doesn't.
So, use write.csv. :)
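As a rough end-to-end sketch of the question's workflow (select columns, write, read back): the toy values and the column names probe and id below are made up for illustration, and the anchored pattern is a slightly stricter variant of the one in the question, not the asker's exact code.

```r
# Toy data standing in for table.csv (values are invented for illustration)
m <- data.frame(
  probe = c("1415670_at", "1415671_at"),
  id    = 1:2,
  X1    = c(8.51, 9.12),
  X3    = c(8.20, 9.19),
  X1.1  = c(8.17, 9.15),
  X4    = c(8.62, 9.21)
)

# Keep the two identifier columns plus every column whose header was 1 or 2
# (read.csv turns repeated headers such as "1" into X1, X1.1, ...)
keep <- grep("^X[12](\\.|$)", names(m))
smallerdat <- m[c(1, 2, keep)]

# write.csv() needs only the data and the file; the separator is fixed
out <- tempfile(fileext = ".csv")
write.csv(smallerdat, file = out, row.names = FALSE)

# Read the file back to confirm what was stored
roundtrip <- read.csv(out)
names(roundtrip)
```

Setting row.names = FALSE avoids an extra unnamed first column of row numbers in the output file.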

Related

How to save an object whose name is in a variable?

This is calling for some "tricky R", but this time it's beyond my fantasy :-) I need to save() an object whose name is in the variable var. I tried:
save(get(var), file = ofn)
# Error in save(get(var), file = ofn) : object ‘get(var)’ not found
save(eval(parse(text = var)), file = ofn)
# Error in save(eval(parse(text = var)), file = ofn) :
# object ‘eval(parse(text = var))’ not found
both of which fail, unfortunately. How would you solve this?
Use the list argument. This saves x in the file x.RData. (The list argument can specify a vector of names if you need to save more than one at a time.)
x <- 3
name.of.x <- "x"
save(list = name.of.x, file = "x.RData")
# loading x.RData to check that it worked
rm(x)
load("x.RData")
x
## [1] 3
Note
Regarding the first attempt in the question, which uses get: save needs the name of the object rather than its value. That attempt could be made to work with do.call, converting the character string to a name-class object:
do.call("save", list(as.name(name.of.x), file = "x.RData"))
Regarding the second attempt in the question, which uses eval: write out the save call, substitute the name into it as a name-class object, and then evaluate it:
eval(substitute(save(Name, file = "x.RData"), list(Name = as.name(name.of.x))))
If it's just one object, you can use saveRDS:
a<-1:4
var<-"a"
saveRDS(get(var),file="test.rds")
readRDS(file="test.rds")
[1] 1 2 3 4
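The answer above notes that the list argument accepts a vector of names. A minimal sketch of that case follows; the object names alpha and beta are hypothetical, and loading into a separate environment is just one way to check the result without touching the workspace.

```r
# Two objects whose names are only known as strings (hypothetical names)
alpha <- 1:3
beta  <- "hello"
obj_names <- c("alpha", "beta")

# save() looks the names up and stores the objects under those names
rdata <- tempfile(fileext = ".RData")
save(list = obj_names, file = rdata)

# Load into a fresh environment so existing objects are not overwritten;
# load() returns the names of the objects it restored
restored <- new.env()
loaded <- load(rdata, envir = restored)
ls(restored)
```

Unlike saveRDS, this keeps the original object names inside the file, so load() recreates alpha and beta under those names.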

Error in writing dataframe in csv

I have the data frame below:
df_Place:
Name|Places
----+-----------------------
abc |delhi
bcd |mumbai,delhi
cde |chennai,hyderabad,delhi
def |mumbai
efg |bangalore,mumbai
ghi |delhi,bangalore
I wanted the places in the form of a matrix, so I did the following:
df_Place$matrix<-as.matrix(strsplit(df_Place$Places,","))
This gives the following data frame:
Name|Places |matrix
----+-----------------------+------------------------------
abc |delhi |delhi
bcd |mumbai,delhi |c("mumbai","delhi")
cde |chennai,hyderabad,delhi|c("chennai","hyderabad","delhi")
def |mumbai |mumbai
efg |bangalore,mumbai |c("bangalore","mumbai")
ghi |delhi,bangalore |c("delhi","bangalore")
Now, when I try to write this to a csv file:
write.csv(df_Place,"tx.csv")
I get the following error:
Error in .External2(C_writetable, x, file, nrow(x), p, rnames, sep, eol, :
unimplemented type 'list' in 'EncodeElement'
But if I remove the matrix column, the file is written successfully.
I know this is probably very basic, but can someone explain the reason behind it?
strsplit returns a list, so the new column is a list column (as.matrix does not change that), and write.csv cannot write list columns, hence the "unimplemented type 'list'" error. I found this solution to work (see Outputting a Dataframe in R to a .csv):
# First coerce the data.frame to all-character
df_Place2 = data.frame(lapply(df_Place, as.character), stringsAsFactors=FALSE)
# write file
write.csv(df_Place2,"tx.csv")
Alternatively, you can use fwrite from the data.table package:
library(data.table)
fwrite(df_Place, file ="df_Place.csv")
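To see the cause directly, here is a small base-R sketch that detects list columns, shows that write.csv rejects them, and flattens them before writing. The three-row data frame, the column name split and the ";" separator are assumptions made for the example.

```r
# Hypothetical stand-in for df_Place
df_Place <- data.frame(
  Name   = c("abc", "bcd", "cde"),
  Places = c("delhi", "mumbai,delhi", "chennai,hyderabad,delhi"),
  stringsAsFactors = FALSE
)
df_Place$split <- strsplit(df_Place$Places, ",")

# Which columns are lists?
list_cols <- vapply(df_Place, is.list, logical(1))

# write.csv() refuses the list column; capture the error instead of stopping
attempt <- tryCatch(write.csv(df_Place, tempfile()), error = function(e) e)

# Flatten every list column to one string per row (";" is an assumed separator)
df_flat <- df_Place
df_flat[list_cols] <- lapply(df_flat[list_cols], function(col) {
  vapply(col, paste, character(1), collapse = ";")
})

out <- tempfile(fileext = ".csv")
write.csv(df_flat, out, row.names = FALSE)
```

Compared with coercing the whole data frame via as.character, this leaves the non-list columns untouched and produces plain delimited strings rather than text such as c("mumbai", "delhi").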

Writing and reading a zoo object - errors

I have a zoo object, prices; class(prices) returns "zoo". I then create a file using:
write.zoo(prices, file = "foo", index.name = "time")
The resulting file looks like this:
"time" "AAPL.Adjusted" "SHY.Adjusted"
2013-05-01 60.31 84.12
2013-05-02 61.16 84.11
2013-05-03 61.77 84.08
I then try and read this file with this statement:
myData <- read.zoo("foo")
and I get this error:
Error in read.zoo("foo") :
index has bad entries at data rows: 1 2 3 4
I’ve tried a number of parameter settings and nothing seems to work. Help much appreciated.
Newbie
The file has a header line so try:
z <- read.zoo("foo", header = TRUE, check.names = FALSE)
The check.names part gives nicer looking column names but you could leave it out if that were not important.
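A round-trip sketch of the fix, assuming the zoo package is installed. The prices object is rebuilt here from the three rows shown in the question, and the explicit format argument is an addition that pins the index to Date rather than relying on read.zoo to guess it.

```r
library(zoo)

# Rebuild a small version of the prices object from the question
prices <- zoo(
  cbind(AAPL.Adjusted = c(60.31, 61.16, 61.77),
        SHY.Adjusted  = c(84.12, 84.11, 84.08)),
  order.by = as.Date(c("2013-05-01", "2013-05-02", "2013-05-03"))
)

f <- tempfile()
write.zoo(prices, file = f, index.name = "time")

# The first line of the file is a header, which read.zoo must be told about
readLines(f, n = 1)

z <- read.zoo(f, header = TRUE, check.names = FALSE, format = "%Y-%m-%d")
index(z)
colnames(z)
```

Without header = TRUE the header line is treated as a data row, so its first field cannot be parsed as an index value.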

Reading large fixed format text file in r

I am trying to read a large (> 70 MB) fixed-format text file into R. For a smaller file (< 1 MB), I can use the read.fwf() function as shown below.
condodattest1a <- read.fwf(impfile1,widths=testcsv3$Varlen,col.names=testcsv3$Varname)
When I try to run the line of code below,
condodattest1 <- read.fwf(impfile,widths=testcsv3$Varlen,col.names=testcsv3$Varname)
I get the following error message:
Error: cannot allocate vector of size 2 Kb
The only difference between the 2 lines is the size of the input file.
The formatting for the file I want to import is given in the dataframe called testcsv3. I show a small snippet of the dataframe below:
> head(testcsv3)
Varlen Varname Varclass Varsep Varforfmt
1 2 "V1" "character" 2 "A2.0"
2 15 "V2" "character" 17 "A15.0"
3 28 "V3" "character" 45 "A28.0"
4 3 "V4" "character" 48 "F3.0"
5 1 "V5" "character" 49 "A1.0"
6 3 "V6" "character" 52 "A3.0"
At least part of my problem is that I am reading in all the data as factors when I use read.fwf() and I end up exceeding the memory limit on my computer.
I tried to use read.table() as a way of formatting each variable but it seems I need a text delimiter with that function. There is a suggestion in section 3.3 in the link below that I could use sep to identify the column where every variable starts.
http://data.princeton.edu/R/readingData.html
However, when I use the command below:
condodattest1b <- read.table(impfile1,sep=testcsv3$Varsep,col.names=testcsv3$Varname, colClasses=testcsv3$Varclass)
I get the following error message:
Error in read.table(impfile1, sep = testcsv3$Varsep, col.names = testcsv3$Varname, : invalid 'sep' argument
Finally, I tried to use:
condodattest1c <- read.fortran(impfile1,lengths=testcsv3$Varlen, format=testcsv3$Varforfmt, col.names=testcsv3$Varname)
but I get the following message:
Error in processFormat(format) : missing lengths for some fields
In addition: Warning messages:
1: In processFormat(format) : NAs introduced by coercion
2: In processFormat(format) : NAs introduced by coercion
3: In processFormat(format) : NAs introduced by coercion
All I am trying to do at this point is format the data when they come into r as something other than factors. I am hoping this will limit the amount of memory I am using and allow me to actually input the file. I would appreciate any suggestions about how I can do this. I know the Fortran formats for all the variables and the column at which each variable begins.
Thank you,
Warren
This code may work for you. Pass the field widths as varlen and the corresponding type strings (e.g. "numeric", "character", "integer") as colclasses:
my.readfwf <- function(filename, varlen, colclasses) {
  # first and last character position of each field
  sidx <- cumsum(c(1, head(varlen, -1)))
  eidx <- sidx + varlen - 1
  # read the file as one string per line
  filecontent <- scan(filename, character(0), sep = "\n")
  if (any(diff(nchar(filecontent)) != 0))
    stop("line lengths differ!")
  res <- list()
  for (i in seq_along(varlen)) {
    # substring() is vectorised over lines, so no sapply() is needed
    res[[i]] <- substring(filecontent, sidx[i], eidx[i])
    # convert the extracted text to the requested type
    mode(res[[i]]) <- colclasses[i]
  }
  # assemble the columns into a data frame without creating factors
  attributes(res) <- list(names = paste("V", seq_along(res), sep = ""),
                          row.names = seq_along(res[[1]]),
                          class = "data.frame")
  return(res)
}
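Since the goal in the question is to avoid factors, another option (not part of the answer above) is to stay with read.fwf and pass colClasses, which it forwards to read.table. The two-line file, the widths and the column names below are invented for illustration; whether this alone is enough for a 70 MB file is not something this sketch shows.

```r
# Write a tiny fixed-width file: 2-char state, 8-char type, 3-char size
tmp <- tempfile(fileext = ".txt")
writeLines(c("NYcondo   120",
             "LAhouse    85"), tmp)

dat <- read.fwf(
  tmp,
  widths      = c(2, 8, 3),
  col.names   = c("state", "type", "size"),
  colClasses  = c("character", "character", "numeric"),
  strip.white = TRUE   # drop the padding around character fields
)

sapply(dat, class)
```

Declaring the classes up front means no column is converted to a factor, and R does not have to guess the types from the data.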

R: Output data frame with list to csv

I have the following data frame (info) that looks like this:
> info[1:5,]
field BinningMethod DataType numLevels cumLevel factLevels
1 data_len EQUAL AREA DOUBLE 5 5 (-inf,2.0], (2.0,6.0), [6.0,8.0), [8.0,+inf), MISSING
2 dns_count_add_rr DISCRETE MAPPING DOUBLE 3 8 0.0, 1.0, MISSING
3 dns_count_answers DISCRETE MAPPING DOUBLE 3 11 0.0, 1.0, MISSING
4 dns_count_auth_rr DISCRETE MAPPING DOUBLE 3 14 0.0, 1.0, MISSING
5 dns_count_queries DISCRETE MAPPING DOUBLE 2 16 1.0, MISSING
With class types:
> sapply(info, class)
field BinningMethod DataType numLevels cumLevel factLevels
"character" "character" "character" "numeric" "numeric" "list"
I'd like to output 'info' to a CSV file but do not know how to handle the list field (factLevels). I currently get the following error:
> write.csv( info,
+ file = paste("FIELDS_", modelFile, sep=""),
+ row.names = FALSE, na = "")
Error in write.table(x, file, nrow(x), p, rnames, sep, eol, na, dec, as.integer(quote), : unimplemented type 'list' in 'EncodeElement'
What are some possible solutions to this? The only requirement I have is for a java program to be able to read it in and distinguish the different values.
I see that @Seb has linked (now deleted) to an answer of mine that is loosely on this topic. (Generally speaking, columns of data frames shouldn't be lists in R.) However, if your only purpose is to dump this information into a file, perhaps this will be more relevant to you:
One simple option may be to convert the factLevels column from a list to a character vector by pasting the values together, using a delimiter that does not occur in the values themselves (so not a comma, and here not a hyphen either, because of "-inf"). Perhaps something like:
info$factLevels <- sapply(info$factLevels,
                          FUN = paste, collapse = "|")
Then you'll have to adjust your Java program to split the factor levels on that delimiter, of course.
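A small round-trip sketch of this approach: collapse the list column, write the file, then read it back and split the values again. The two-row info data frame is a cut-down stand-in for the one in the question, and "|" is an assumed delimiter chosen because it does not appear in any of the level strings.

```r
# Cut-down stand-in for the info data frame
info <- data.frame(
  field     = c("data_len", "dns_count_queries"),
  numLevels = c(5, 2),
  stringsAsFactors = FALSE
)
info$factLevels <- list(
  c("(-inf,2.0]", "(2.0,6.0)", "[6.0,8.0)", "[8.0,+inf)", "MISSING"),
  c("1.0", "MISSING")
)

# Collapse each list element into a single "|"-delimited string
info$factLevels <- vapply(info$factLevels, paste, character(1), collapse = "|")

out <- tempfile(fileext = ".csv")
write.csv(info, file = out, row.names = FALSE, na = "")

# A reader splits on the same delimiter to recover the individual levels
back <- read.csv(out, stringsAsFactors = FALSE)
levels_back <- strsplit(back$factLevels, "|", fixed = TRUE)
```

write.csv quotes character fields, so the commas inside values such as (2.0,6.0) do not break the CSV structure; a consumer only needs a CSV parser that honours quotes and then a split on "|".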

Resources